# OpenAI OSS ADVANCED-fine-tuning by Trelis
Advanced scripts available at [Trelis.com](https://Trelis.com/ADVANCED-fine-tuning)

*Based on the [OpenAI cookbook notebook](https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers).*


Large reasoning models like **OpenAI o3** generate a *chain‑of‑thought* to improve the accuracy and quality of their responses.  
However, most of these models reason in English, even when a question is asked in another language.

In this notebook, we show how the open‑weight reasoning model **`openai/gpt-oss-20b`** can be fine‑tuned to reason effectively in multiple languages.  
We'll add a new **“reasoning language”** option to the model’s system prompt and apply supervised fine‑tuning with Hugging Face’s **TRL** library on a multilingual reasoning dataset.

**Outline**

1. **Setup** – install libraries  
2. **Prepare the dataset** – download & format  
3. **Prepare the model** – load (dequantizing the MXFP4 weights to bf16) & LoRA‑wrap  
4. **Fine‑tuning** – train with multilingual reasoning data  
5. **Inference** – generate reasoning responses in different languages  

When we're done, you’ll have a multilingual reasoning model that can:  

* reason in **English, Spanish, French, Italian, or German**,  
* even mix languages – e.g. ask in Spanish, reason in German, answer in Spanish.

> **Example**

```
User:
    ¿Cuál es el capital de Australia?
Assistant reasoning:
    Okay, der Benutzer fragt nach der Hauptstadt Australiens. [...]
Assistant response:
    La capital de Australia es **Canberra**. [...]
```


## 1&nbsp;&nbsp;Setup

```python
# Install PyTorch (CUDA 12.8 build)
!python -m pip install --upgrade pip
!pip install uv -qU

# !pip show torch

!uv pip install torch --index-url https://download.pytorch.org/whl/cu128 --system -q

# Install remaining dependencies
!uv pip install tensorboard hf_transfer huggingface_hub "trl>=0.20.0" "peft>=0.17.0" "transformers>=4.55.0" trackio --system -q

import os

# Enable the faster hf_transfer download backend. huggingface_hub reads HF_HUB_ENABLE_HF_TRANSFER
# (not HF_TRANSFER) when it is first imported, so set it before the next cell.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
```



```python
from huggingface_hub import get_token, login

# Prompt for a Hugging Face token only if one isn't already saved
if get_token() is None:
    login()
```

## 2&nbsp;&nbsp;Prepare the dataset

```python
from datasets import load_dataset

# Load the full dataset
dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")

# --- Optional validation split ---
do_val_split = True  # Set to False to skip splitting

if do_val_split:
    # Reserve 10% of the rows for validation, capped at 32 (and never more than the dataset)
    val_size = min(int(0.1 * len(dataset)), 32, len(dataset))

    # Shuffle, then split
    dataset = dataset.shuffle(seed=42)
    val_dataset = dataset.select(range(val_size))
    train_dataset = dataset.select(range(val_size, len(dataset)))
else:
    train_dataset = dataset
    val_dataset = None

# --- Output ---
print(f"Train size: {len(train_dataset)}")
if val_dataset is not None:
    print(f"Validation size: {len(val_dataset)}")
```

```text
Train size: 968
Validation size: 32
```
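The sizing rule reserves 10% of the rows for validation, capped at 32. The pure-Python sketch below checks that arithmetic. `split_sizes` is a name we made up for illustration; it is not a `datasets` function. With the 1,000 rows implied by the printed sizes, it reproduces the 968/32 split:

```python
def split_sizes(n_rows: int, val_fraction: float = 0.1, val_cap: int = 32) -> tuple[int, int]:
    """Return (train_size, val_size) using the rule from the cell above."""
    # 10% of the rows, capped at val_cap and never more than the dataset itself
    val_size = min(int(val_fraction * n_rows), val_cap, n_rows)
    return n_rows - val_size, val_size


print(split_sizes(1_000))  # the cap of 32 applies
print(split_sizes(200))    # below 320 rows the 10% rule is the binding limit
```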


```python
# Inspect the training split
train_dataset
```

```text
Dataset({
    features: ['reasoning_language', 'developer', 'user', 'analysis', 'final', 'messages'],
    num_rows: 968
})
```

```python
val_dataset
```

```text
Dataset({
    features: ['reasoning_language', 'developer', 'user', 'analysis', 'final', 'messages'],
    num_rows: 32
})
```

The **gpt‑oss** models use the *Harmony* response format to structure conversations. Each message has a role, and assistant messages are further split into channels:

| role / channel     | purpose                                                  |
|--------------------|----------------------------------------------------------|
| developer (role)   | custom system instructions, e.g. the reasoning language  |
| user (role)        | user input                                               |
| assistant (role)   | tool calls or responses                                  |
| analysis (channel) | chain‑of‑thought                                         |
| final (channel)    | final answer for the end‑user                            |
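The sketch below shows how the flat dataset columns could map onto a message list that the chat template accepts. `row_to_messages` is our own illustrative helper. It assumes `developer` holds the system instructions and that `analysis`/`final` hold the chain-of-thought and the answer. The `thinking` key is the field the chat template reads for the analysis channel. The dataset already ships a ready-made `messages` column, so training does not need this helper.

```python
def row_to_messages(row: dict) -> list[dict]:
    """Hypothetical helper: rebuild a chat from the flat dataset columns (assumed column meanings)."""
    instructions = f"reasoning language: {row['reasoning_language']}\n\n{row['developer']}"
    return [
        # A leading developer/system message becomes the "# Instructions" block
        {"role": "developer", "content": instructions},
        {"role": "user", "content": row["user"]},
        # `thinking` is rendered on the analysis channel, `content` on the final channel
        {"role": "assistant", "thinking": row["analysis"], "content": row["final"]},
    ]


example_row = {
    "reasoning_language": "English",
    "developer": "You are an AI chatbot that provides financial advice, but under no circumstances should you provide investment guarantees.",
    "user": "Can you suggest a healthy breakfast option that is less than 200 calories?",
    "analysis": "Okay, the user is asking for a healthy breakfast under 200 calories.",
    "final": "<final answer>",  # placeholder, not real dataset content
}
for message in row_to_messages(example_row):
    print(message["role"], "->", sorted(message))
```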

The tokenizer's `apply_chat_template()` renders these messages into the Harmony token format that the model expects.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
```

More details on the Harmony prompt format are in the [OpenAI Harmony guide](https://cookbook.openai.com/articles/openai-harmony).

**Purpose**

*A Jinja2-based template to format chat messages for an LLM that supports:*
- *Chain-of-thought reasoning*
- *Tool usage (e.g., browser, Python)*
- *Multilingual support*

**Core Components**

**1. System Message Generation (`build_system_message`)**

*Adds model metadata:*
- *`model_identity`*
- *`Knowledge cutoff`*
- *`Current date`*
- *`Reasoning effort`*
- *Renders the built-in tool sections (`browser`, `python`) when `builtin_tools` is passed*

**2. Tool Rendering Macros**

- *`render_tool_namespace`: formats user-defined function tools, placed in the developer message under the `functions` namespace*
- *`render_builtin_tools`: supports built-ins like `browser`, `python`*
- *Uses TypeScript-style function signature rendering*

**3. Message Rendering Logic**

*Wraps each message with structured tags:*
```text
<|start|>role<|channel|>type<|message|>...<|end|>
```
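*Channels include:*
- *`analysis`*
- *`commentary`*
- *`final`*

*Supports roles: `user`, `assistant`, `tool`, `developer` (a leading `system` message is rendered as the developer message). Earlier assistant turns are rendered without their `analysis` channel. Assistant tool calls end with `<|call|>`, and a closing assistant turn in training (`add_generation_prompt=False`) ends with `<|return|>` instead of `<|end|>`.*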

**4. Tool Call State Tracking**

- *Tracks the most recent tool name via `last_tool_call`*
- *Links each tool output to the assistant call that produced it (at most one tool call per assistant message is assumed)*

**5. Helper Macros**

- *`render_typescript_type`: handles rendering of complex tool param types*

**6. Validation Rules**

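*Raises an exception if formatting rules are broken:*
- *`<|channel|>analysis<|message|>` or `<|channel|>final<|message|>` tags inside the `content` or `thinking` field*
- *Both `thinking` and `content` in an assistant message with tool calls*
- *A `tool` message with no preceding assistant tool call*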

**7. Inference Prompt Ending**

*When `add_generation_prompt=True`, appends:*
```text
<|start|>assistant
```
*to signal the model to generate the assistant's reply.*
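The message-rendering rules can be mimicked in a few lines of plain Python. The sketch below is a simplified, hypothetical re-implementation; `render_harmony` is our name, not a `transformers` API. It skips the system metadata block, tools and tool calls, and only shows how roles and channels become tokens. Use `tokenizer.apply_chat_template()` for real work.

```python
def render_harmony(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Simplified sketch of the chat template's message loop (no system block, no tools)."""
    parts = []
    # A leading developer/system message becomes the "# Instructions" developer block
    if messages and messages[0]["role"] in ("developer", "system"):
        developer, messages = messages[0]["content"], messages[1:]
        if developer:
            parts.append(f"<|start|>developer<|message|># Instructions\n\n{developer}<|end|>")
    for i, msg in enumerate(messages):
        is_last = i == len(messages) - 1
        if msg["role"] == "user":
            parts.append(f"<|start|>user<|message|>{msg['content']}<|end|>")
        elif msg["role"] == "assistant":
            if is_last and not add_generation_prompt:
                # Training case: keep the chain-of-thought and end with <|return|>
                if "thinking" in msg:
                    parts.append(f"<|start|>assistant<|channel|>analysis<|message|>{msg['thinking']}<|end|>")
                parts.append(f"<|start|>assistant<|channel|>final<|message|>{msg['content']}<|return|>")
            else:
                # Earlier assistant turns drop their chain-of-thought
                parts.append(f"<|start|>assistant<|channel|>final<|message|>{msg['content']}<|end|>")
    if add_generation_prompt:
        # Cue the model to write the next assistant turn
        parts.append("<|start|>assistant")
    return "".join(parts)


prompt = render_harmony(
    [
        {"role": "system", "content": "reasoning language: German"},
        {"role": "user", "content": "¿Cuál es el capital de Australia?"},
    ],
    add_generation_prompt=True,
)
print(prompt)
```

The example mirrors the inference prompt in section 5. The `system` message becomes the developer `# Instructions` block, and the prompt ends with `<|start|>assistant`.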

```python
from IPython.display import Markdown, display

# Wrap the chat template in a code block for syntax highlighting
display(Markdown(f"```jinja2\n{tokenizer.chat_template}\n```"))
```

```jinja2
{#-
  In addition to the normal inputs of `messages` and `tools`, this template also accepts the
  following kwargs:
  - "builtin_tools": A list, can contain "browser" and/or "python".
  - "model_identity": A string that optionally describes the model identity.
  - "reasoning_effort": A string that describes the reasoning effort, defaults to "medium".
 #}

{#- Tool Definition Rendering ============================================== #}
{%- macro render_typescript_type(param_spec, required_params, is_nullable=false) -%}
    {%- if param_spec.type == "array" -%}
        {%- if param_spec['items'] -%}
            {%- if param_spec['items']['type'] == "string" -%}
                {{- "string[]" }}
            {%- elif param_spec['items']['type'] == "number" -%}
                {{- "number[]" }}
            {%- elif param_spec['items']['type'] == "integer" -%}
                {{- "number[]" }}
            {%- elif param_spec['items']['type'] == "boolean" -%}
                {{- "boolean[]" }}
            {%- else -%}
                {%- set inner_type = render_typescript_type(param_spec['items'], required_params) -%}
                {%- if inner_type == "object | object" or inner_type|length > 50 -%}
                    {{- "any[]" }}
                {%- else -%}
                    {{- inner_type + "[]" }}
                {%- endif -%}
            {%- endif -%}
            {%- if param_spec.nullable -%}
                {{- " | null" }}
            {%- endif -%}
        {%- else -%}
            {{- "any[]" }}
            {%- if param_spec.nullable -%}
                {{- " | null" }}
            {%- endif -%}
        {%- endif -%}
    {%- elif param_spec.type is defined and param_spec.type is iterable and param_spec.type is not string and param_spec.type is not mapping and param_spec.type[0] is defined -%}
        {#- Handle array of types like ["object", "object"] from Union[dict, list] #}
        {%- if param_spec.type | length > 1 -%}
            {{- param_spec.type | join(" | ") }}
        {%- else -%}
            {{- param_spec.type[0] }}
        {%- endif -%}
    {%- elif param_spec.oneOf -%}
        {#- Handle oneOf schemas - check for complex unions and fallback to any #}
        {%- set has_object_variants = false -%}
        {%- for variant in param_spec.oneOf -%}
            {%- if variant.type == "object" -%}
                {%- set has_object_variants = true -%}
            {%- endif -%}
        {%- endfor -%}
        {%- if has_object_variants and param_spec.oneOf|length > 1 -%}
            {{- "any" }}
        {%- else -%}
            {%- for variant in param_spec.oneOf -%}
                {{- render_typescript_type(variant, required_params) -}}
                {%- if variant.description %}
                    {{- "// " + variant.description }}
                {%- endif -%}
                {%- if variant.default is defined %}
                    {{ "// default: " + variant.default|tojson }}
                {%- endif -%}
                {%- if not loop.last %}
                    {{- " | " }}
                {% endif -%}
            {%- endfor -%}
        {%- endif -%}
    {%- elif param_spec.type == "string" -%}
        {%- if param_spec.enum -%}
            {{- '"' + param_spec.enum|join('" | "') + '"' -}}
        {%- else -%}
            {{- "string" }}
            {%- if param_spec.nullable %}
                {{- " | null" }}
            {%- endif -%}
        {%- endif -%}
    {%- elif param_spec.type == "number" -%}
        {{- "number" }}
    {%- elif param_spec.type == "integer" -%}
        {{- "number" }}
    {%- elif param_spec.type == "boolean" -%}
        {{- "boolean" }}

    {%- elif param_spec.type == "object" -%}
        {%- if param_spec.properties -%}
            {{- "{
" }}
            {%- for prop_name, prop_spec in param_spec.properties.items() -%}
                {{- prop_name -}}
                {%- if prop_name not in (param_spec.required or []) -%}
                    {{- "?" }}
                {%- endif -%}
                {{- ": " }}
                {{ render_typescript_type(prop_spec, param_spec.required or []) }}
                {%- if not loop.last -%}
                    {{-", " }}
                {%- endif -%}
            {%- endfor -%}
            {{- "}" }}
        {%- else -%}
            {{- "object" }}
        {%- endif -%}
    {%- else -%}
        {{- "any" }}
    {%- endif -%}
{%- endmacro -%}

{%- macro render_tool_namespace(namespace_name, tools) -%}
    {{- "## " + namespace_name + "

" }}
    {{- "namespace " + namespace_name + " {

" }}
    {%- for tool in tools %}
        {%- set tool = tool.function %}
        {{- "// " + tool.description + "
" }}
        {{- "type "+ tool.name + " = " }}
        {%- if tool.parameters and tool.parameters.properties %}
            {{- "(_: {
" }}
            {%- for param_name, param_spec in tool.parameters.properties.items() %}
                {%- if param_spec.description %}
                    {{- "// " + param_spec.description + "
" }}
                {%- endif %}
                {{- param_name }}
                {%- if param_name not in (tool.parameters.required or []) -%}
                    {{- "?" }}
                {%- endif -%}
                {{- ": " }}
                {{- render_typescript_type(param_spec, tool.parameters.required or []) }}
                {%- if param_spec.default is defined -%}
                    {%- if param_spec.enum %}
                        {{- ", // default: " + param_spec.default }}
                    {%- elif param_spec.oneOf %}
                        {{- "// default: " + param_spec.default }}
                    {%- else %}
                        {{- ", // default: " + param_spec.default|tojson }}
                    {%- endif -%}
                {%- endif -%}
                {%- if not loop.last %}
                    {{- ",
" }}
                {%- else %}
                    {{- "
" }}
                {%- endif -%}
            {%- endfor %}
            {{- "}) => any;

" }}
        {%- else -%}
            {{- "() => any;

" }}
        {%- endif -%}
    {%- endfor %}
    {{- "} // namespace " + namespace_name }}
{%- endmacro -%}

{%- macro render_builtin_tools(browser_tool, python_tool) -%}
    {%- if browser_tool %}
        {{- "## browser

" }}
        {{- "// Tool for browsing.
" }}
        {{- "// The `cursor` appears in brackets before each browsing display: `[{cursor}]`.
" }}
        {{- "// Cite information from the tool using the following format:
" }}
        {{- "// `【{cursor}†L{line_start}(-L{line_end})?】`, for example: `【6†L9-L11】` or `【8†L3】`.
" }}
        {{- "// Do not quote more than 10 words directly from the tool output.
" }}
        {{- "// sources=web (default: web)
" }}
        {{- "namespace browser {

" }}
        {{- "// Searches for information related to `query` and displays `topn` results.
" }}
        {{- "type search = (_: {
" }}
        {{- "query: string,
" }}
        {{- "topn?: number, // default: 10
" }}
        {{- "source?: string,
" }}
        {{- "}) => any;

" }}
        {{- "// Opens the link `id` from the page indicated by `cursor` starting at line number `loc`, showing `num_lines` lines.
" }}
        {{- "// Valid link ids are displayed with the formatting: `【{id}†.*】`.
" }}
        {{- "// If `cursor` is not provided, the most recent page is implied.
" }}
        {{- "// If `id` is a string, it is treated as a fully qualified URL associated with `source`.
" }}
        {{- "// If `loc` is not provided, the viewport will be positioned at the beginning of the document or centered on the most relevant passage, if available.
" }}
        {{- "// Use this function without `id` to scroll to a new location of an opened page.
" }}
        {{- "type open = (_: {
" }}
        {{- "id?: number | string, // default: -1
" }}
        {{- "cursor?: number, // default: -1
" }}
        {{- "loc?: number, // default: -1
" }}
        {{- "num_lines?: number, // default: -1
" }}
        {{- "view_source?: boolean, // default: false
" }}
        {{- "source?: string,
" }}
        {{- "}) => any;

" }}
        {{- "// Finds exact matches of `pattern` in the current page, or the page given by `cursor`.
" }}
        {{- "type find = (_: {
" }}
        {{- "pattern: string,
" }}
        {{- "cursor?: number, // default: -1
" }}
        {{- "}) => any;

" }}
        {{- "} // namespace browser

" }}
    {%- endif -%}

    {%- if python_tool %}
        {{- "## python

" }}
        {{- "Use this tool to execute Python code in your chain of thought. The code will not be shown to the user. This tool should be used for internal reasoning, but not for code that is intended to be visible to the user (e.g. when creating plots, tables, or files).

" }}
        {{- "When you send a message containing Python code to python, it will be executed in a stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 120.0 seconds. The drive at '/mnt/data' can be used to save and persist user files. Internet access for this session is UNKNOWN. Depends on the cluster.

" }}
    {%- endif -%}
{%- endmacro -%}

{#- System Message Construction ============================================ #}
{%- macro build_system_message() -%}
    {%- if model_identity is not defined %}
        {%- set model_identity = "You are ChatGPT, a large language model trained by OpenAI." %}
    {%- endif %}
    {{- model_identity + "
" }}
    {{- "Knowledge cutoff: 2024-06
" }}
    {{- "Current date: " + strftime_now("%Y-%m-%d") + "

" }}
    {%- if reasoning_effort is not defined %}
        {%- set reasoning_effort = "medium" %}
    {%- endif %}
    {{- "Reasoning: " + reasoning_effort + "

" }}
    {%- if builtin_tools %}
        {{- "# Tools

" }}
        {%- set available_builtin_tools = namespace(browser=false, python=false) %}
        {%- for tool in builtin_tools %}
            {%- if tool == "browser" %}
                {%- set available_builtin_tools.browser = true %}
            {%- elif tool == "python" %}
                {%- set available_builtin_tools.python = true %}
            {%- endif %}
        {%- endfor %}
        {{- render_builtin_tools(available_builtin_tools.browser, available_builtin_tools.python) }}
    {%- endif -%}
    {{- "# Valid channels: analysis, commentary, final. Channel must be included for every message." }}
    {%- if tools -%}
        {{- "
Calls to these tools must go to the commentary channel: 'functions'." }}
    {%- endif -%}
{%- endmacro -%}

{#- Main Template Logic ================================================= #}
{#- Set defaults #}

{#- Render system message #}
{{- "<|start|>system<|message|>" }}
{{- build_system_message() }}
{{- "<|end|>" }}

{#- Extract developer message #}
{%- if messages[0].role == "developer" or messages[0].role == "system" %}
    {%- set developer_message = messages[0].content %}
    {%- set loop_messages = messages[1:] %}
{%- else %}
    {%- set developer_message = "" %}
    {%- set loop_messages = messages %}
{%- endif %}

{#- Render developer message #}
{%- if developer_message or tools %}
    {{- "<|start|>developer<|message|>" }}
    {%- if developer_message %}
        {{- "# Instructions

" }}
        {{- developer_message }}
    {%- endif %}
    {%- if tools -%}
        {{- "

" }}
        {{- "# Tools

" }}
        {{- render_tool_namespace("functions", tools) }}
    {%- endif -%}
    {{- "<|end|>" }}
{%- endif %}

{#- Render messages #}
{%- set last_tool_call = namespace(name=none) %}
{%- for message in loop_messages -%}
    {#- At this point only assistant/user/tool messages should remain #}
    {%- if message.role == 'assistant' -%}
        {#- Checks to ensure the messages are being passed in the format we expect #}
        {%- if "content" in message %}
            {%- if "<|channel|>analysis<|message|>" in message.content or "<|channel|>final<|message|>" in message.content %}
                {{- raise_exception("You have passed a message containing <|channel|> tags in the content field. Instead of doing this, you should pass analysis messages (the string between '<|message|>' and '<|end|>') in the 'thinking' field, and final messages (the string between '<|message|>' and '<|end|>') in the 'content' field.") }}
            {%- endif %}
        {%- endif %}
        {%- if "thinking" in message %}
            {%- if "<|channel|>analysis<|message|>" in message.thinking or "<|channel|>final<|message|>" in message.thinking %}
                {{- raise_exception("You have passed a message containing <|channel|> tags in the thinking field. Instead of doing this, you should pass analysis messages (the string between '<|message|>' and '<|end|>') in the 'thinking' field, and final messages (the string between '<|message|>' and '<|end|>') in the 'content' field.") }}
            {%- endif %}
        {%- endif %}
        {%- if "tool_calls" in message %}
            {#- We assume max 1 tool call per message, and so we infer the tool call name #}
            {#- in "tool" messages from the most recent assistant tool call name #}
            {%- set tool_call = message.tool_calls[0] %}
            {%- if tool_call.function %}
                {%- set tool_call = tool_call.function %}
            {%- endif %}
            {%- if message.content and message.thinking %}
                {{- raise_exception("Cannot pass both content and thinking in an assistant message with tool calls! Put the analysis message in one or the other, but not both.") }}
            {%- elif message.content %}
                {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.content + "<|end|>" }}
            {%- elif message.thinking %}
                {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
            {%- endif %}
            {{- "<|start|>assistant to=" }}
            {{- "functions." + tool_call.name + "<|channel|>commentary " }}
            {{- (tool_call.content_type if tool_call.content_type is defined else "json") + "<|message|>" }}
            {{- tool_call.arguments|tojson }}
            {{- "<|call|>" }}
            {%- set last_tool_call.name = tool_call.name %}
        {%- elif loop.last and not add_generation_prompt %}
            {#- Only render the CoT if the final turn is an assistant turn and add_generation_prompt is false #}
            {#- This is a situation that should only occur in training, never in inference. #}
            {%- if "thinking" in message %}
                {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
            {%- endif %}
            {#- <|return|> indicates the end of generation, but <|end|> does not #}
            {#- <|return|> should never be an input to the model, but we include it as the final token #}
            {#- when training, so the model learns to emit it. #}
            {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|return|>" }}
        {%- else %}
            {#- CoT is dropped during all previous turns, so we never render it for inference #}
            {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|end|>" }}
            {%- set last_tool_call.name = none %}
        {%- endif %}
    {%- elif message.role == 'tool' -%}
        {%- if last_tool_call.name is none %}
            {{- raise_exception("Message has tool role, but there was no previous assistant message with a tool call!") }}
        {%- endif %}
        {{- "<|start|>functions." + last_tool_call.name }}
        {{- " to=assistant<|channel|>commentary<|message|>" + message.content|tojson + "<|end|>" }}
    {%- elif message.role == 'user' -%}
        {{- "<|start|>user<|message|>" + message.content + "<|end|>" }}
    {%- endif -%}
{%- endfor -%}

{#- Generation prompt #}
{%- if add_generation_prompt -%}
<|start|>assistant
{%- endif -%}
```

```python
# Render the first (shuffled) example as a Harmony-formatted string
messages = dataset[0]["messages"]
conversation = tokenizer.apply_chat_template(messages, tokenize=False)
print(conversation)
```

```text
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-06

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>developer<|message|># Instructions

reasoning language: English

You are an AI chatbot that provides financial advice, but under no circumstances should you provide investment guarantees.<|end|><|start|>user<|message|>Can you suggest a healthy breakfast option that is less than 200 calories?<|end|><|start|>assistant<|channel|>analysis<|message|>
Okay, the user is asking for a healthy breakfast under 200 calories. Let me start by thinking about common breakfast foods and their calorie counts.

First, I need to consider the main components: proteins, carbs, and fats. Since the goal is low calories, I should focus on foods that are nutrient-dense but not high in calories.

Maybe a breakfast with eggs? Scrambled eggs are good, but t
```

## 3&nbsp;&nbsp;Prepare the model

```python
import torch
from transformers import AutoModelForCausalLM, Mxfp4Config

# The checkpoint stores its MoE weights in MXFP4; dequantize=True loads them in bf16 so they can be fine-tuned
quantization_config = Mxfp4Config(dequantize=True)
model_kwargs = dict(
    attn_implementation="eager",
    torch_dtype=torch.bfloat16,  # bfloat16 on Ampere, Hopper or later; float16 on Colab (though a T4 will run out of memory)
    quantization_config=quantization_config,  # comment out for full fine-tuning
    use_cache=False,  # no KV cache during training (it is incompatible with gradient checkpointing)
    device_map="auto",
)

model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", **model_kwargs)
```
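The dtype comment above comes down to one check: native bfloat16 support arrives with Ampere GPUs (compute capability 8.0). This is a minimal sketch of that decision. `pick_dtype` is a hypothetical helper. It assumes you pass it the `(major, minor)` tuple returned by `torch.cuda.get_device_capability()`; `torch.cuda.is_bf16_supported()` is an alternative check.

```python
def pick_dtype(compute_capability: tuple[int, int]) -> str:
    """Native bfloat16 needs compute capability 8.x or newer (Ampere+); older GPUs fall back to float16."""
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"


print(pick_dtype((9, 0)))  # H100 (Hopper)
print(pick_dtype((7, 5)))  # T4 (Turing)
```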


```python
!nvidia-smi
```

```text
Wed Aug  6 11:26:43 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.148.08             Driver Version: 570.148.08     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA H100 80GB HBM3          On  |   00000000:AB:00.0 Off |                    0 |
| N/A   33C    P0            146W /  700W |   44331MiB /  81559MiB |      5%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
```
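About 43 GiB (44,331 MiB) is in use after loading. The back-of-the-envelope check below assumes every weight is held in bf16 (2 bytes) after dequantization. It takes the parameter count from the `print_trainable_parameters()` output in the LoRA cell. On that basis the weights account for roughly 39 GiB. The remaining few GiB are presumably the CUDA context and other runtime allocations, which we have not profiled.

```python
def weight_memory_gib(n_params: int, bytes_per_param: int = 2) -> float:
    """Memory taken by the weights alone, in GiB (bf16 uses 2 bytes per parameter)."""
    return n_params * bytes_per_param / 1024**3


# Base-model parameters = "all params" minus the LoRA "trainable params" reported by PEFT
base_params = 20_929_797_696 - 15_040_512
print(f"weights: {weight_memory_gib(base_params):.1f} GiB")
print(f"nvidia-smi: {44_331 / 1024:.1f} GiB in use")
```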
                                                

```python
# Baseline: generate with the base model before fine-tuning
messages = [{"role": "user", "content": "¿Cuál es el capital de Australia?"}]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

```text
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
```

```text
systemYou are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-06

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.user¿Cuál es el capital de Australia?assistantanalysisThe user asks in Spanish: "¿Cuál es el capital de Australia?" That's the capital of Australia: Canberra. They want the capital. We'll answer: Canberra.

Should also confirm: The capital of Australia is Canberra. We'll respond in Spanish. The conversation is short, simple. We'll provide the answer.assistantfinalLa capital de Australia es **Canberra**.
```
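Because `skip_special_tokens=True` strips the Harmony markers, the analysis and final text run together above (`...assistantanalysis...assistantfinal...`). One way to separate them is to decode only the new tokens with `skip_special_tokens=False`, e.g. `tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=False)`, and then split on the channel tags. This approach is sketched below under our own assumptions. `split_channels` is a hypothetical helper, and the test string is a shortened, made-up example in the template's format.

```python
import re

# "<|channel|>NAME<|message|>BODY" up to <|end|>, <|return|>, <|call|> or the end of the text.
# Channels with extra header text (e.g. "commentary json" in tool calls) are not matched.
CHANNEL_RE = re.compile(
    r"<\|channel\|>(\w+)<\|message\|>(.*?)(?:<\|end\|>|<\|return\|>|<\|call\|>|$)",
    re.DOTALL,
)


def split_channels(text: str) -> dict[str, str]:
    """Map each channel name to its text; if a channel repeats, the last occurrence wins."""
    return {name: body.strip() for name, body in CHANNEL_RE.findall(text)}


raw = (
    "<|channel|>analysis<|message|>The user asks in Spanish for the capital of Australia.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>La capital de Australia es **Canberra**.<|return|>"
)
channels = split_channels(raw)
print(channels["analysis"])
print(channels["final"])
```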


### LoRA configuration

```python
from peft import LoraConfig, get_peft_model

# Alternative: LoRA on the attention projections only
# peft_config = LoraConfig(
#     r=8,
#     lora_alpha=16,
#     target_modules=[
#         "q_proj", "k_proj", "v_proj", "o_proj",
#         # "mlp.experts.gate_up_proj", "mlp.experts.down_proj"  # doesn't work: expert weights are nn.Parameters, not nn.Linear modules
#     ]
# )
# peft_model = get_peft_model(model, peft_config)
# peft_model.print_trainable_parameters()

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",  # every linear layer except the output head
    # target_modules=["o_proj","v_proj","q_proj","k_proj"]
    # The MoE expert weights are raw parameters, so they are targeted via target_parameters (PEFT >= 0.17);
    # here only the experts in layers 7, 15 and 23 get adapters
    target_parameters=[
        "7.mlp.experts.gate_up_proj",
        "7.mlp.experts.down_proj",
        "15.mlp.experts.gate_up_proj",
        "15.mlp.experts.down_proj",
        "23.mlp.experts.gate_up_proj",
        "23.mlp.experts.down_proj",
    ],
)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()
```

```text
trainable params: 15,040,512 || all params: 20,929,797,696 || trainable%: 0.0719
```
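LoRA keeps each targeted weight `W` frozen and learns a low-rank update, `W + (lora_alpha / r) * B @ A`. Here `A` is `r x d_in` and `B` is `d_out x r`, so each adapted linear layer gains `r * (d_in + d_out)` parameters. The sketch below works through that arithmetic for a hypothetical 2880 to 4096 projection (dimensions assumed for illustration) and checks the printed trainable percentage. The 3-D expert tensors targeted through `target_parameters` are laid out differently, so the formula only covers the linear layers.

```python
def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


r, lora_alpha = 8, 16
print("update scaling alpha / r =", lora_alpha / r)

# Hypothetical 2880 -> 4096 projection (dimensions assumed for illustration)
full = 2880 * 4096
extra = lora_extra_params(2880, 4096, r)
print(f"full weight: {full:,} params, LoRA adds: {extra:,} ({100 * extra / full:.2f}%)")

# Check the percentage printed by print_trainable_parameters()
print(f"trainable%: {100 * 15_040_512 / 20_929_797_696:.4f}")
```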


```python
# print("\n--- Modules with LoRA adapters ---")
# for name, module in peft_model.named_modules():
#     if hasattr(module, "lora_A"):
#         print(name)

# print("\n--- Trainable Parameters (LoRA-Injected) ---")
# for name, param in peft_model.named_parameters():
#     if param.requires_grad:
#         print(name)
```
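Each entry in `target_parameters` starts with a layer index, so the entries only behave as intended if matching is done on whole dotted name components. The sketch below assumes suffix matching of the kind PEFT applies to list-valued targets: a name matches if it equals the target or ends with `"." + target`. `matches_target` is a hypothetical helper, not PEFT's code. Under that rule, `"7.mlp.experts.gate_up_proj"` selects layer 7 but not layer 17.

```python
def matches_target(param_name: str, targets: list[str]) -> bool:
    """Assumed rule: match on equality or on a suffix that starts at a '.' boundary."""
    return any(param_name == t or param_name.endswith("." + t) for t in targets)


targets = ["7.mlp.experts.gate_up_proj", "7.mlp.experts.down_proj"]
for name in [
    "model.layers.7.mlp.experts.gate_up_proj",   # layer 7 expert weight
    "model.layers.17.mlp.experts.gate_up_proj",  # "17." does not end with ".7."
    "model.layers.7.self_attn.q_proj.weight",    # not an expert weight
]:
    print(name, matches_target(name, targets))
```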


## 4&nbsp;&nbsp;Fine‑tuning

```python
run_name = "oss-multi-lingual"
```

```python
%load_ext tensorboard
%tensorboard --logdir ./logs --port 6006 --bind_all --reload_interval 5
```

```python
from trl import SFTConfig

batch_size = 4

training_args = SFTConfig(
    hub_model_id=f"Trelis/{run_name}",  # <--- controls where the model is pushed
    learning_rate=2e-4,
    do_eval=True,
    eval_strategy="steps",
    eval_steps=0.2,  # values below 1 are a fraction of the total training steps
    logging_dir=f"logs/{run_name}",
    gradient_checkpointing=True,
    # num_train_epochs=1,
    max_steps=4,  # short demo run; use num_train_epochs=1 for a full pass over the data
    logging_steps=0.05,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=max(1, batch_size // 4),
    gradient_accumulation_steps=32 // batch_size,  # effective batch size of 32 per device
    max_length=2048,  # longer examples are truncated
    warmup_ratio=0.03,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.1},  # decay to 10% of learning_rate
    output_dir="outputs/gpt-oss-20b-multilingual-reasoner",
    # report_to="trackio", # and use 'uv pip install trackio -qU', followed by 'trackio show'
    report_to="tensorboard",
    push_to_hub=True,
)
```
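With `min_lr_rate=0.1`, the `cosine_with_min_lr` schedule warms up linearly and then follows a half cosine from `learning_rate` down to 10% of it. Below is a from-scratch sketch of that shape for this run's settings. It is not the `transformers` implementation. It assumes the warmup length is `ceil(max_steps * warmup_ratio)`, which is one step here.

```python
import math


def lr_at_step(step: int, *, base_lr: float = 2e-4, max_steps: int = 4,
               warmup_ratio: float = 0.03, min_lr_rate: float = 0.1) -> float:
    """Sketch: linear warmup, then cosine decay from base_lr to base_lr * min_lr_rate."""
    warmup_steps = math.ceil(max_steps * warmup_ratio)  # 0.03 * 4 = 0.12 -> 1 step (assumed rounding)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # falls from 1 to 0 over training
    return base_lr * (min_lr_rate + (1.0 - min_lr_rate) * cosine)


for step in range(5):
    print(step, f"{lr_at_step(step):.2e}")
```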

```python
from trl import SFTTrainer

trainer = SFTTrainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    processing_class=tokenizer,
)

trainer.train()
```

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 1    | 1.8769        | 2.188597        |
| 2    | 1.9819        | 2.021582        |
| 3    | 1.8385        | 1.930961        |
| 4    | 1.6561        | 1.893667        |

```text
TrainOutput(global_step=4, training_loss=1.8383430540561676, metrics={'train_runtime': 167.0626, 'train_samples_per_second': 0.766, 'train_steps_per_second': 0.024, 'total_flos': 2.3817115002710016e+16, 'train_loss': 1.8383430540561676})
```
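The numbers in `TrainOutput` follow from the config. The arithmetic sketch below assumes a single GPU. Four examples per device times 8 accumulation steps gives 32 examples per optimizer step, so 4 steps cover 128 examples, about 13% of the 968 training rows. Fractional `eval_steps`/`logging_steps` round up to every step, which matches the per-step loss table.

```python
import math

batch_size = 4                             # per_device_train_batch_size
grad_accum = 32 // batch_size              # gradient_accumulation_steps
effective_batch = batch_size * grad_accum  # examples per optimizer step on one GPU
max_steps = 4
train_rows = 968

samples_seen = effective_batch * max_steps
print(f"{samples_seen} examples = {samples_seen / train_rows:.2f} epochs")

# Fractional intervals are converted to whole steps, rounded up
print("eval every", math.ceil(max_steps * 0.2), "step(s); log every", math.ceil(max_steps * 0.05), "step(s)")

# Cross-check with TrainOutput: train_samples_per_second * train_runtime
print(round(0.766 * 167.0626))
```

Seeing only a fraction of one epoch is a likely reason the analysis channel in the inference examples below is still in English.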

```python
# Save the LoRA adapter to output_dir, then push it to the Hub repo set by hub_model_id
trainer.save_model(training_args.output_dir)
trainer.push_to_hub(dataset_name="HuggingFaceH4/Multilingual-Thinking")
```

```text
No files have been modified since last commit. Skipping to prevent empty commit.
```

```text
CommitInfo(commit_url='https://huggingface.co/Trelis/oss-multi-lingual/commit/f3b46c91d46310536fecf957bd23d34d9f4e725b', commit_message='End of training', commit_description='', oid='f3b46c91d46310536fecf957bd23d34d9f4e725b', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Trelis/oss-multi-lingual', endpoint='https://huggingface.co', repo_type='model', repo_id='Trelis/oss-multi-lingual'), pr_revision=None, pr_num=None)
```

## 5&nbsp;&nbsp;Inference
**YOU MAY NEED TO RESTART THE KERNEL HERE TO CLEAR THE GPU**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

# device_map="auto" already places the weights on the GPU, so no extra .cuda() call is needed
model_kwargs = dict(attn_implementation="eager", torch_dtype="auto", use_cache=True, device_map="auto")
base_model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", **model_kwargs)

# Load the LoRA adapter saved by trainer.save_model() and merge it into the base weights
peft_model_id = "outputs/gpt-oss-20b-multilingual-reasoner"
model = PeftModel.from_pretrained(base_model, peft_model_id)
model = model.merge_and_unload()

# Optionally push the merged model (redefine run_name first if you restarted the kernel)
# model.push_to_hub(f"Trelis/{run_name}-merged")

REASONING_LANGUAGE = "German"
# The chat template renders a leading "system" message as the developer "# Instructions" block
SYSTEM_PROMPT = f"reasoning language: {REASONING_LANGUAGE}"
USER_PROMPT = "¿Cuál es el capital de Australia?"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_PROMPT},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

gen_kwargs = {"max_new_tokens": 512, "do_sample": True, "temperature": 0.6}
output_ids = model.generate(input_ids, **gen_kwargs)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

```text
MXFP4 quantization requires triton >= 3.4.0 and triton_kernels installed, we will default to dequantizing the model to bf16
```



```text
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
```

```text
systemYou are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-06

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.developer# Instructions

reasoning language: Germanuser¿Cuál es el capital de Australia?assistantanalysisThe user asks: "¿Cuál es el capital de Australia?" Spanish: "What is the capital of Australia?" The answer: Canberra. The user didn't specify language. The conversation: user in Spanish. The assistant should respond in Spanish. The user didn't ask for code or anything else. So respond: "La capital de Australia es Canberra." The user is Spanish, so answer in Spanish. Also maybe mention that Canberra is the capital. So final.assistantfinalLa capital de Australia es Canberra.
```


```python
# Try another language pair. With only max_steps=4 the analysis channel above is still in English:
# train for longer (e.g. num_train_epochs=1) before expecting the requested reasoning language,
# and train more still for languages outside the dataset, such as Chinese.

REASONING_LANGUAGE = "French"
SYSTEM_PROMPT = f"reasoning language: {REASONING_LANGUAGE}"
USER_PROMPT = "C'est quoi la langue nationale du Canada?"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_PROMPT},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, **gen_kwargs)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

```text
systemYou are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-06

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.developer# Instructions

reasoning language: FrenchuserC'est quoi la langue nationale du Canada?assistantanalysisThe user asks in French: "C'est quoi la langue nationale du Canada?" They likely want the answer: Canada has two official languages: English and French. The question: "C'est quoi la langue nationale du Canada?" The correct answer: Canada has two official languages: English and French. There's no single national language; both are official. Provide explanation.assistantfinalLe Canada n’a pas de « langue nationale » unique.  
Le pays est officiellement bilingue : **l’anglais et le français** sont les deux langues officielles au niveau fédéral.  

- **Anglais** : langue majoritaire dans la majorité des provinces (sauf Québec, New Brunswick, etc.).  
- **Fr
```

## 6&nbsp;&nbsp;Conclusion

You fine‑tuned **`openai/gpt-oss-20b`** to reason in multiple languages using **TRL** + **LoRA** and the *Multilingual‑Thinking* dataset.  
Adapt these steps to your own data and build models that think in any language you need!

For more advanced scripts, check out [YouTube.com/@TrelisResearch](https://www.youtube.com/@TrelisResearch).

---

*Notebook generated by Trelis.*