<a href="https://colab.research.google.com/github/andradeM17/Data-quality/blob/main/LLMs-as-judges/Round%202/gpt-oss/run-colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Run OpenAI gpt-oss 20B in a FREE Google Colab

OpenAI released `gpt-oss` [120B](https://hf.co/openai/gpt-oss-120b) and [20B](https://hf.co/openai/gpt-oss-20b). Both models are Apache 2.0 licensed.

`gpt-oss-20b` is designed for lower-latency, local, or specialized use cases; it has 21B parameters, of which 3.6B are active per token.

Because the models were trained with native MXFP4 quantization, the 20B model is easy to run even in resource-constrained environments such as Google Colab.

Authored by: [Pedro](https://huggingface.co/pcuenq) and [VB](https://huggingface.co/reach-vb)
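To make the MXFP4 point concrete, here is a minimal sketch of how one MXFP4 block is decoded, based on our reading of the OCP Microscaling (MX) format: 32 FP4 (E2M1) elements share one 8-bit power-of-two (E8M0) scale. It illustrates the number format only. It is not the packed layout of the gpt-oss checkpoint or the kernels `transformers` uses, and the names `decode_e2m1` and `decode_mxfp4_block` are our own.

```python
# Illustrative MXFP4 decoder (OCP MX format); not the gpt-oss checkpoint layout.

# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code (2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32   # elements that share one scale
E8M0_BIAS = 127   # the shared scale byte encodes 2 ** (byte - 127)
E8M0_NAN = 0xFF   # reserved NaN encoding for the scale


def decode_e2m1(code: int) -> float:
    """Decode a single 4-bit E2M1 code (0..15); bit 3 is the sign."""
    if not 0 <= code <= 0xF:
        raise ValueError(f"not a 4-bit code: {code}")
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0b0111]


def decode_mxfp4_block(codes: list[int], scale_byte: int) -> list[float]:
    """Dequantize one block: every element is multiplied by the shared power-of-two scale."""
    if len(codes) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} codes, got {len(codes)}")
    if scale_byte == E8M0_NAN:
        return [float("nan")] * BLOCK_SIZE
    scale = 2.0 ** (scale_byte - E8M0_BIAS)
    return [decode_e2m1(c) * scale for c in codes]


# Storage cost: 4 bits per element plus one 8-bit scale amortized over the block.
BITS_PER_ELEMENT = 4 + 8 / BLOCK_SIZE  # 4.25

if __name__ == "__main__":
    block = decode_mxfp4_block([0b0111, 0b1001] + [0b0010] * 30, scale_byte=128)
    print(block[:3])  # [12.0, -1.0, 2.0]
    print(BITS_PER_ELEMENT)  # 4.25
```

As a rough back-of-the-envelope figure, if all 21B parameters were stored at 4.25 bits, the weights alone would take about 21e9 × 4.25 / 8 ≈ 11.2 GB. The real checkpoint keeps some weights in higher precision, so treat this only as a lower bound.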

## Setup environment

In [None]:
!pip install -q --upgrade torch==2.8

In [None]:
!pip install -q transformers triton==3.4 kernels

In [None]:
!pip uninstall -q torchvision torchaudio -y

Restart your Colab runtime session after installing the packages above.
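After the restart, a quick check like the following can confirm that the pins took effect. `torch` should report a 2.8 release, `triton` a 3.4 release, and `torchvision`/`torchaudio` should no longer be installed. This sketch uses only the standard library's `importlib.metadata`, and the helper name `installed_versions` is our own.

```python
# Sanity check after restarting the runtime: report installed versions of the packages above.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    versions: dict[str, Optional[str]] = {}
    for pkg in packages:
        try:
            versions[pkg] = version(pkg)
        except PackageNotFoundError:
            versions[pkg] = None
    return versions


if __name__ == "__main__":
    expected = ["torch", "triton", "transformers", "kernels", "torchvision", "torchaudio"]
    for name, ver in installed_versions(expected).items():
        print(f"{name:>12}: {ver or 'not installed'}")
```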

## Load the model and data

We load the model from the Hugging Face Hub: [openai/gpt-oss-20b](https://hf.co/openai/gpt-oss-20b).

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="cuda",
)

In [None]:
from google.colab import drive
drive.mount('/content/drive')


In [None]:
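import pandas as pd

# Annotation data stored on Google Drive (mounted above).
csv_path = '/content/drive/MyDrive/llmj-data.csv'
llm_data = pd.read_csv(csv_path)
print(f"{csv_path} loaded successfully ({len(llm_data)} rows).")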
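The annotation loop relies on pandas turning an empty first cell into NaN. An f-string then renders that NaN as the literal `"nan"`, which the guidelines use to flag monolingual samples. The toy in-memory CSV below uses made-up placeholder values and a hypothetical helper, `format_pair`, to show this behaviour along with a basic shape check.

```python
import io

import pandas as pd

# Toy stand-in for llmj-data.csv: the second row has an empty first cell (monolingual sample).
toy_csv = 'source,target\n"source text 1","target text 1"\n,"target text 2"\n'
toy = pd.read_csv(io.StringIO(toy_csv))

# The annotation loop reads the first two columns positionally.
assert toy.shape[1] >= 2, "expected at least two columns"


def format_pair(row: pd.Series) -> str:
    """Format a row the same way the annotation loop does: '"col1", "col2"'."""
    return f'"{row.iloc[0]}", "{row.iloc[1]}"'


for _, row in toy.iterrows():
    print(format_pair(row))
# "source text 1", "target text 1"
# "nan", "target text 2"
```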


## Specify Reasoning Effort

Pass the reasoning effort to `apply_chat_template()` through its `reasoning_effort` argument. Supported values are `"low"`, `"medium"` (default), and `"high"`; the annotation loop below uses `"low"`.
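If you switch effort levels between runs, one option is to validate the value before it reaches the tokenizer. This is a sketch only. `chat_template_kwargs` is a hypothetical helper of ours, and its keyword arguments mirror the ones the annotation loop passes to `apply_chat_template()`.

```python
# Hypothetical helper: build the keyword arguments used with apply_chat_template().
REASONING_EFFORTS = ("low", "medium", "high")


def chat_template_kwargs(reasoning_effort: str = "medium") -> dict:
    """Return apply_chat_template() kwargs, rejecting unsupported effort levels early."""
    if reasoning_effort not in REASONING_EFFORTS:
        raise ValueError(
            f"reasoning_effort must be one of {REASONING_EFFORTS}, got {reasoning_effort!r}"
        )
    return {
        "add_generation_prompt": True,
        "return_tensors": "pt",
        "return_dict": True,
        "reasoning_effort": reasoning_effort,
    }
```

Usage would then be `tokenizer.apply_chat_template(messages, **chat_template_kwargs("high")).to(model.device)`. A typo such as `"hgih"` fails immediately instead of silently reaching the template.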

In [None]:
guidelines = '''You will be given a text containing two parts.
IF the first part is "nan", it is a monolingual sample; ELSE the sample is parallel data.
Using the guidelines, give the correct code for the text: NL, WL, X, CS, CB, or CC. Finish your response in the format:

Annotation: [value]


Guidelines:
  IF the content is NOT linguistic OR you are unsure THEN annotate as NL. NOTE: "No or unsure" → NL.
  ELSE IF the text is parallel data AND (the target side is NOT in Modern (standardised) Irish OR the source side is NOT in Modern English) THEN annotate as WL. NOTE: "No or unsure" → WL. NOTE: Untranslated named entities on either side are treated as belonging to that language (e.g., "Capnat" on the English side counts as English). NOTE: If the text is entirely named entities, it is more likely to be ‘Yes’.
  ELSE IF the Irish target text is NOT a direct translation of the English source text THEN annotate as X. NOTE: "Yes or unsure" → next question; if not, annotate X.
  ELSE IF the Irish text is short (just headings, single unrelated phrases, OR five words or fewer) THEN annotate as CS. NOTE: Short text = CS.
  ELSE IF the text is boilerplate OR low quality (text that would repeat across similar webpages, unnatural code-switching, unnatural phrasing, frequent unnatural formatting, misalignments) THEN annotate as CB. NOTE: CB includes text not representative of natural language. One formatting problem alone is NOT enough; more than one may qualify.
  ELSE annotate as CC. NOTE: CC = natural language, not short, not boilerplate, properly aligned.'''




for _, row in llm_data.iterrows():
    # The first two columns hold the source and target text. A missing source
    # is rendered as "nan", which the guidelines treat as a monolingual sample.
    text = f'"{row.iloc[0]}", "{row.iloc[1]}"'
    print(f"\n{text}\n\n")

    messages = [
        {"role": "system", "content": guidelines},
        {"role": "user", "content": text},
    ]

    # Tokenize with the gpt-oss chat template; reasoning_effort is passed through to the template.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
        reasoning_effort="low",
    ).to(model.device)

    generated = model.generate(**inputs, max_new_tokens=500)
    # Decode only the newly generated tokens; special tokens are kept so the channel markers survive.
    decoded_output = tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:])

    # The final answer sits between the final-channel header and <|return|>.
    start_marker = "<|start|>assistant<|channel|>final<|message|>"
    end_marker = "<|return|>"

    start_index = decoded_output.find(start_marker)
    end_index = (
        decoded_output.find(end_marker, start_index + len(start_marker))
        if start_index != -1
        else -1
    )
    if end_index != -1:
        print(decoded_output[start_index + len(start_marker):end_index].strip())
    else:
        # No final channel was produced, or generation stopped at max_new_tokens.
        print("No annotation")
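The guidelines prompt encodes a first-match-wins decision list. The sketch below restates that cascade in Python over yes/no judgments, which can help when checking model outputs against hand annotations. It illustrates the rules and is not part of the original pipeline. The `Judgments` field names are ours, and each field is assumed to already fold in the prompt's "unsure" rules: unsure about linguistic content counts as not linguistic, unsure about language counts as the wrong language, and unsure about translation counts as a direct translation. Applying the X check only to parallel data is also our assumption, since the prompt leaves it implicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgments:
    """Yes/no answers to the guideline questions (hypothetical field names)."""
    is_linguistic: bool                  # unsure -> False (NL rule)
    is_parallel: bool                    # False when the first part is "nan"
    languages_ok: bool                   # Modern English source, Modern Irish target; unsure -> False
    is_direct_translation: bool          # unsure -> True (move on to the next question)
    is_short: bool                       # headings, unrelated phrases, or five words or fewer
    is_boilerplate_or_low_quality: bool


def annotate(j: Judgments) -> str:
    """Apply the rules in order; the first rule that fires determines the label."""
    if not j.is_linguistic:
        return "NL"
    if j.is_parallel and not j.languages_ok:
        return "WL"
    # Assumption: the translation check only applies to parallel data.
    if j.is_parallel and not j.is_direct_translation:
        return "X"
    if j.is_short:
        return "CS"
    if j.is_boilerplate_or_low_quality:
        return "CB"
    return "CC"
```

For example, `annotate(Judgments(True, True, True, True, False, False))` returns `"CC"`. Because the rules are ordered, a short text that is also boilerplate is labelled CS, not CB.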
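The loop prints the whole final-channel message. To collect labels instead, the marker search can be factored into a function, and the code can be pulled out of the requested `Annotation: [value]` line. This is a minimal sketch. `extract_final`, `parse_label`, and the regular expression are our own, and the sketch assumes the model follows the requested format. It takes the last match, so labels mentioned earlier in the reply are ignored.

```python
import re
from typing import Optional

START_MARKER = "<|start|>assistant<|channel|>final<|message|>"
END_MARKER = "<|return|>"
# Matches "Annotation: CB" or "Annotation: [CB]"; \b stops e.g. "CCX" from matching.
ANNOTATION_RE = re.compile(r"Annotation:\s*\[?\s*(NL|WL|X|CS|CB|CC)\b")


def extract_final(decoded: str) -> Optional[str]:
    """Return the text of the final channel, or None if either marker is missing."""
    start = decoded.find(START_MARKER)
    if start == -1:
        return None
    start += len(START_MARKER)
    end = decoded.find(END_MARKER, start)
    if end == -1:
        return None
    return decoded[start:end].strip()


def parse_label(final_text: Optional[str]) -> Optional[str]:
    """Return the last annotation code in the final message, or None."""
    if not final_text:
        return None
    matches = ANNOTATION_RE.findall(final_text)
    return matches[-1] if matches else None
```

Inside the loop, you could create `labels = []` before iterating, append `parse_label(extract_final(decoded_output))` on each row, and afterwards store the results with `llm_data["annotation"] = labels`. The column name `annotation` is only illustrative.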