# Fine-tune Qwen2.5 on Multilingual Jokes

This notebook fine-tunes the `Qwen/Qwen2.5-7B-Instruct` model with QLoRA on a multilingual jokes dataset (English, Spanish, Chinese).

**Instructions:**
1. Upload `train.jsonl` and `val.jsonl` to the Colab files area (left sidebar).
2. Run all cells.
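
Before running everything, it can help to confirm that the uploaded files have the shape the formatting cell expects. The sketch below is a hypothetical pre-flight check, not part of the original notebook: it assumes each JSONL line is an object with non-empty string fields `instruction` and `output` (the `input` field is treated as optional, as in `format_instruction`). The helper names `find_bad_records` and `check_file` are made up for illustration.

```python
import json

# Fields read unconditionally by format_instruction; "input" is optional there.
REQUIRED_FIELDS = ("instruction", "output")

def find_bad_records(lines):
    """Return (line_number, reason) pairs for records that would break formatting."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append((number, f"missing or empty '{field}'"))
    return problems

def check_file(path):
    """Run the check on a JSONL file on disk."""
    with open(path, encoding="utf-8") as handle:
        return find_bad_records(handle)
```

Calling `check_file("train.jsonl")` and `check_file("val.jsonl")` after the upload should return empty lists if every record is usable.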

## 1. Installation and Google Drive Setup
Install the libraries needed for 4-bit quantization (QLoRA) and mount Google Drive.

In [None]:
# Install advanced fine-tuning and quantization libraries
!pip install -U -q --no-cache-dir transformers datasets accelerate peft bitsandbytes sentencepiece trl

# Mount Google Drive for persistent storage
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!pip install flash_attn

Collecting flash_attn
  Downloading flash_attn-2.8.3.tar.gz (8.4 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: flash_attn
  Building wheel for flash_attn (setup.py) ... done
  Created wheel for flash_attn: filename=flash_attn-2.8.3-cp312-cp312-linux_x86_64.whl size=253780426 sha256=4e2f9e39313266b1544b68138b15b91ee6221eccf14f7902b7c6620351340810
  Stored in directory: /root/.cache/pip/wheels/3d/59/46/f282c12c73dd4bb3c2e3fe199f1a

## 2. Configure Model and Quantization

We use 4-bit NF4 quantization (QLoRA) so that the 7B model's weights take only a few gigabytes, leaving most of the 80 GB of VRAM for activations and larger batches.
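
As a back-of-the-envelope check on the quantization choice, the sketch below estimates the memory taken by the weights alone at 16 bits and at 4 bits per parameter. It assumes roughly 7 billion parameters for the `Qwen/Qwen2.5-7B-Instruct` checkpoint loaded in the next cell; the exact count differs, and quantization constants, LoRA adapters, optimizer state, and activations are not included.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

params = 7e9  # assumption: rough parameter count for a "7B" checkpoint
bf16 = weight_memory_gib(params, 16)
nf4 = weight_memory_gib(params, 4)

print(f"bf16 weights:  ~{bf16:.1f} GiB")
print(f"4-bit weights: ~{nf4:.1f} GiB (weights only)")
```

The 4x reduction in weight memory is what frees room for the batch size and sequence lengths used during training.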

In [None]:
import torch
import os
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "Qwen/Qwen2.5-7B-Instruct"
drive_output_dir = "/content/drive/MyDrive/qwen2.5_7B_jokes"
os.makedirs(drive_output_dir, exist_ok=True)

# BitsAndBytes config for 80GB VRAM efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2"
)

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

# Preprocessing logic for instruction/input/output
def format_instruction(example):
    user_prompt = example['instruction']
    if example.get('input'):
        user_prompt += f"\nInput: {example['input']}"

    messages = [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": example['output']}
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)}

# Load datasets
dataset = load_dataset("json", data_files={"train": "train.jsonl", "validation": "val.jsonl"})
dataset = dataset.map(format_instruction)

# Optimized Budget Settings (~30-50 Compute Units)
sft_config = SFTConfig(
    output_dir=drive_output_dir,
    dataset_text_field="text",
    packing=False,                     # Disabled to avoid errors
    max_steps=3000,                    # Limit steps to fit budget
    per_device_train_batch_size=8,     # Good for 7B model
    gradient_accumulation_steps=4,     # Effective batch size = 32
    learning_rate=5e-5,
    bf16=True,
    logging_steps=50,
    eval_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=1000,
    optim="paged_adamw_32bit",
    gradient_checkpointing=True,
    # max_seq_length removed to prevent TypeError
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    # peft_config is omitted: `model` was already wrapped by get_peft_model above
    processing_class=tokenizer,
    args=sft_config,
)

trainer.train()
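
The budget comments in the configuration above can be checked with a little arithmetic. This sketch copies the values from `SFTConfig` and assumes a single GPU (`num_devices = 1` is an assumption, not something the notebook states explicitly).

```python
# Values copied from the SFTConfig above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 4
max_steps = 3000
eval_steps = 500
save_steps = 1000
num_devices = 1  # assumption: a single Colab GPU

# Examples contributing to each optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Total examples consumed over the whole run.
examples_seen = effective_batch * max_steps
# Steps at which evaluation and checkpointing are triggered.
eval_points = list(range(eval_steps, max_steps + 1, eval_steps))
save_points = list(range(save_steps, max_steps + 1, save_steps))

print("effective batch size:", effective_batch)
print("examples seen in the run:", examples_seen)
print("evaluation at steps:", eval_points)
print("checkpoints at steps:", save_points)
```

The total of 96,000 examples agrees with the throughput logged further down (12.235 samples/s over about 7,847 s). Combined with the reported epoch fraction of about 0.117, it suggests a training set of roughly 817,000 examples.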

In [None]:
# Resume from the latest checkpoint automatically
trainer.train(resume_from_checkpoint=True)

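| Step | Training Loss | Validation Loss | Entropy | Num Tokens | Mean Token Accuracy |
|------|---------------|-----------------|---------|------------|---------------------|
| 2500 | 1.8987 | 1.759367 | 1.772736 | 2159605.0 | 0.672416 |
| 3000 | 1.9042 | 1.757211 | 1.765666 | 4028876.0 | 0.672715 |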




TrainOutput(global_step=3000, training_loss=0.6264665171305338, metrics={'train_runtime': 7846.6412, 'train_samples_per_second': 12.235, 'train_steps_per_second': 0.382, 'total_flos': 1.4367095942483558e+18, 'train_loss': 0.6264665171305338, 'epoch': 0.11745705476435178})
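
With `resume_from_checkpoint=True`, the trainer looks in `output_dir` for the newest `checkpoint-<step>` folder and continues from it. The sketch below illustrates that lookup in plain Python so the folder can be inspected before resuming. It is an illustration only, not the library's code; `transformers.trainer_utils.get_last_checkpoint` offers the same kind of lookup. The helper name `latest_checkpoint` is hypothetical.

```python
import os
import re
import tempfile

CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")

def latest_checkpoint(output_dir):
    """Return the path of the highest-numbered checkpoint-<step> folder, or None."""
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = CHECKPOINT_PATTERN.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path) and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path

# Demonstration on a throwaway directory instead of the real Drive folder.
with tempfile.TemporaryDirectory() as demo_dir:
    for step in (1000, 2000):
        os.makedirs(os.path.join(demo_dir, f"checkpoint-{step}"))
    found = latest_checkpoint(demo_dir)
    print(os.path.basename(found))
```

In the notebook, `latest_checkpoint(drive_output_dir)` returning `None` would mean there is nothing to resume from.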

## Inference / Testing
Let's test the fine-tuned model with some prompts.

In [None]:
def generate_joke(instruction):
    # Standard chat format for Qwen instruct models
    messages = [{"role": "user", "content": instruction}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    inputs = tokenizer([text], return_tensors="pt").to("cuda")

    with torch.no_grad():
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=256,
            do_sample=True,
            temperature=0.7, # Sampling value suggested for Qwen instruct models
            top_p=0.8,       # Sampling value suggested for Qwen instruct models
            top_k=20,        # Sampling value suggested for Qwen instruct models
            repetition_penalty=1.1
        )

    # Correctly trim the prompt from the output
    output_ids = generated_ids[0][len(inputs.input_ids[0]):]
    return tokenizer.decode(output_ids, skip_special_tokens=True)

# --- Test Cases ---
print("--- English ---")
print(generate_joke("Write a joke containing the following words: doctor, apple"))

print("\n--- Spanish ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: médico, manzana"))

print("\n--- Chinese ---")
print(generate_joke("讲一个程序员的笑话"))

Casting fp32 inputs back to torch.float16 for flash-attn compatibility.


--- English ---
Doctor: "So you've been eating one apple a day" Patient: "Yes, and I'm still alive." Doctor: "Well that's good news." Patient: "I know. That's why I'm going back to my doctor."

--- Spanish ---
- ¿Qué dice el médico al darle una manzana a un paciente?
- ¡Una!
- ¿Y si no se la come?
- ¡Dos!

--- Chinese ---
程序员:我用Java写了一个程序。 朋友:你为什么不用Python呢? 程序员:因为我的女朋友是C++


In [None]:
print(generate_joke("Write a joke containing the following words: star, berry"))

What's a star's favorite kind of berry? A raisin!


In [None]:
more_test_cases = [
    "Write a joke containing the following words: astronaut, sandwich.",
    "Escribe un chiste de 'Mamá, mamá' que contenga las palabras: escuela, invisible.",
    "给定一段相声台词，请从多个备选项中选择最合适的逗哏回复。\nInput: 捧哏：你这人怎么回事？说好了请客，怎么兜里一分钱没有？",
    "Write a joke using the word 'banana' exactly three times."
]

print("--- Extended Quality Testing ---")
for i, prompt in enumerate(more_test_cases):
    print(f"\nTest {i+1}:")
    print(generate_joke(prompt))
    print("-" * 30)

--- Extended Quality Testing ---

Test 1:
What did the astronaut say when he ate his last sandwich? "I don't want to go back there again."
------------------------------

Test 2:
- Mamá, mamá, hoy en la escuela me han puesto invisible.
- ¡No es posible! ¿Cómo se te ha puesto invisible?
- Mira... yo estaba jugando al fútbol y cuando el portero salió del área, me puse detrás de él...
------------------------------

Test 3:
我那钱都花了。|啊？都花完了?拿什么来请客呀?
------------------------------

Test 4:
What do you call a banana in a suit? A peeling
------------------------------
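
Several prompts above ask for specific words, and the last one asks for a word to appear an exact number of times. Those constraints can be checked mechanically. The sketch below is a hypothetical helper, not part of the original notebook. It uses case-insensitive substring matching so the same check works for English, Spanish, and Chinese, at the cost of also matching inside longer words ("star" would match "start").

```python
def count_word(text, word):
    """Count case-insensitive substring occurrences of `word` in `text`."""
    return text.lower().count(word.lower())

def missing_words(text, words):
    """Return the requested words that never appear in the generated text."""
    return [w for w in words if count_word(text, w) == 0]

# Outputs copied from the test run above.
astronaut_joke = ('What did the astronaut say when he ate his last sandwich? '
                  '"I don\'t want to go back there again."')
banana_joke = "What do you call a banana in a suit? A peeling"

print(missing_words(astronaut_joke, ["astronaut", "sandwich"]))
print(count_word(banana_joke, "banana"))
```

On these two outputs the astronaut joke includes both requested words, while the banana joke uses "banana" once rather than the three times the prompt asked for.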


In [None]:
# --- English Word Pairs ---
print("\n--- English: Dog & Homework ---")
print(generate_joke("Write a joke containing the following words: dog, homework"))

print("\n--- English: Astronaut & Sandwich ---")
print(generate_joke("Write a joke containing the following words: astronaut, sandwich"))

print("\n--- English: Vampire & Mirror ---")
print(generate_joke("Write a joke containing the following words: vampire, mirror"))

print("\n--- English: Wifi & Island ---")
print(generate_joke("Write a joke containing the following words: wifi, island"))


# --- Spanish Word Pairs ---
print("\n--- Spanish: Beach & Winter (Playa y Invierno) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: playa, invierno"))

print("\n--- Spanish: Elephant & Fridge (Elefante y Nevera) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: elefante, nevera"))

print("\n--- Spanish: Waiter & Fly (Camarero y Mosca) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: camarero, mosca"))

print("\n--- Spanish: Ghost & Sheet (Fantasma y Sábana) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: fantasma, sábana"))


# --- Chinese Word Pairs ---
print("\n--- Chinese: Boss & Salary (老板 & 工资) ---")
print(generate_joke("讲一个包含以下词语的笑话：老板，工资"))

print("\n--- Chinese: Phone & Toilet (手机 & 厕所) ---")
print(generate_joke("讲一个包含以下词语的笑话：手机，厕所"))

print("\n--- Chinese: Rabbit & Turtle (兔子 & 乌龟) ---")
print(generate_joke("讲一个包含以下词语的笑话：兔子，乌龟"))

print("\n--- Chinese: Dumpling & Vinegar (饺子 & 醋) ---")
print(generate_joke("讲一个包含以下词语的笑话：饺子，醋"))


--- English: Dog & Homework ---
Why did the dog do his homework on the floor? Because he was a floor hound!

--- English: Astronaut & Sandwich ---
What do astronauts call their sandwiches? Space hamwiches

--- English: Vampire & Mirror ---
What do vampires and mirrors have in common? They both hate themselves!

--- English: Wifi & Island ---
What did the wifi say to the island? "I'm not so good at this."

--- Spanish: Beach & Winter (Playa y Invierno) ---
- ¿Qué hace una chica de la playa en el invierno?
- Se pone unas gafas de sol y se va al cine.

--- Spanish: Elephant & Fridge (Elefante y Nevera) ---
- ¿Qué es lo más parecido al elefante que puedes encontrar en una nevera?
- El cebiche.

--- Spanish: Waiter & Fly (Camarero y Mosca) ---
- ¿Qué le pongo al señor?
- ¡Un vino! 
- Pero no hay en el menú...
- ¡No se preocupe, camarero!
- ¡Que ya estoy aquí!

--- Spanish: Ghost & Sheet (Fantasma y Sábana) ---
- ¿Por qué no salen los fantasmas de la cama?
- Porque tienen miedo de la sábana

In [None]:
# ==========================================
# ENGLISH WORD PAIRS
# ==========================================

print("\n--- English: Skeleton & Party ---")
print(generate_joke("Write a joke containing the following words: skeleton, party"))

print("\n--- English: Math & Problems ---")
print(generate_joke("Write a joke containing the following words: math, problems"))

print("\n--- English: Scarecrow & Award ---")
print(generate_joke("Write a joke containing the following words: scarecrow, award"))

print("\n--- English: Tomato & Ketchup ---")
print(generate_joke("Write a joke containing the following words: tomato, ketchup"))

print("\n--- English: Pirate & Alphabet ---")
print(generate_joke("Write a joke containing the following words: pirate, alphabet"))

print("\n--- English: Chef & Salt ---")
print(generate_joke("Write a joke containing the following words: chef, salt"))

print("\n--- English: Computer & Window ---")
print(generate_joke("Write a joke containing the following words: computer, window"))

print("\n--- English: Elevator & Songs ---")
print(generate_joke("Write a joke containing the following words: elevator, songs"))

print("\n--- English: Library & Loud ---")
print(generate_joke("Write a joke containing the following words: library, loud"))

print("\n--- English: Gym & Pizza ---")
print(generate_joke("Write a joke containing the following words: gym, pizza"))


# ==========================================
# SPANISH WORD PAIRS
# ==========================================

print("\n--- Spanish: Moon & Cheese (Luna y Queso) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: luna, queso"))

print("\n--- Spanish: Clock & Time (Reloj y Tiempo) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: reloj, tiempo"))

print("\n--- Spanish: Horse & Chair (Caballo y Silla) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: caballo, silla"))

print("\n--- Spanish: Shoe & Stone (Zapato y Piedra) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: zapato, piedra"))

print("\n--- Spanish: Teacher & Exam (Maestro y Examen) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: maestro, examen"))

print("\n--- Spanish: Drunk & Street (Borracho y Calle) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: borracho, calle"))

print("\n--- Spanish: Tomato & Road (Tomate y Carretera) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: tomate, carretera"))

print("\n--- Spanish: Book & WiFi (Libro y WiFi) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: libro, wifi"))

print("\n--- Spanish: Bird & Cage (Pájaro y Jaula) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: pájaro, jaula"))

print("\n--- Spanish: Doctor & Apple (Médico y Manzana) ---")
print(generate_joke("Escribe un chiste que contenga las siguientes palabras: médico, manzana"))


# ==========================================
# CHINESE WORD PAIRS
# ==========================================

print("\n--- Chinese: Programmer & Hair (程序员 & 头发) ---")
print(generate_joke("讲一个包含以下词语的笑话：程序员，头发"))

print("\n--- Chinese: Mosquito & Sleep (蚊子 & 睡觉) ---")
print(generate_joke("讲一个包含以下词语的笑话：蚊子，睡觉"))

print("\n--- Chinese: Husband & Wallet (老公 & 钱包) ---")
print(generate_joke("讲一个包含以下词语的笑话：老公，钱包"))

print("\n--- Chinese: Subway & Crowded (地铁 & 拥挤) ---")
print(generate_joke("讲一个包含以下词语的笑话：地铁，拥挤"))

print("\n--- Chinese: Beef & Lamppost (牛肉 & 电线杆) ---")
print(generate_joke("讲一个包含以下词语的笑话：牛肉，电线杆"))

print("\n--- Chinese: Student & Homework (学生 & 作业) ---")
print(generate_joke("讲一个包含以下词语的笑话：学生，作业"))

print("\n--- Chinese: Fish & Water (鱼 & 水) ---")
print(generate_joke("讲一个包含以下词语的笑话：鱼，水"))

print("\n--- Chinese: Clock & Late (闹钟 & 迟到) ---")
print(generate_joke("讲一个包含以下词语的笑话：闹钟，迟到"))

print("\n--- Chinese: Driver & Police (司机 & 警察) ---")
print(generate_joke("讲一个包含以下词语的笑话：司机，警察"))

print("\n--- Chinese: Money & Happiness (钱 & 快乐) ---")
print(generate_joke("讲一个包含以下词语的笑话：钱，快乐"))


--- English: Skeleton & Party ---
Skele-tons throw the best parties. They have no skeletons to worry about!

--- English: Math & Problems ---
I don't do math problems anymore because I have to do math problems all day at work.

--- English: Scarecrow & Award ---
The Scarecrow won an Oscar for Best Costume. He wore his hat backwards.

--- English: Tomato & Ketchup ---
What did the tomato say to the ketchup? I'm not into that.

--- English: Pirate & Alphabet ---
What did the pirate say when he was looking for the letter B in the alphabet? Arrrrr!

--- English: Chef & Salt ---
Why did the chef put salt on his plate? He wanted to eat like a pretzel.

--- English: Computer & Window ---
Why do computers have so many windows? Because they're afraid of the rain.

--- English: Elevator & Songs ---
I like it when the elevators play slow songs so I can think of ways to get out.

--- English: Library & Loud ---
Library is so quiet I can hear my brain think.

--- English: Gym & Pizza ---
Gym: "Piz

## Download the Model

In [None]:
import shutil
from google.colab import files

zip_name = "/content/qwen3_jokes_local"
# Zip the folder from Drive
shutil.make_archive(zip_name, 'zip', drive_output_dir)

# Download the zip to your computer
files.download(zip_name + ".zip")
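
Because the archive can be large, it may be worth listing its contents before downloading. The sketch below zips a directory with `shutil.make_archive` and reads the member names back with `zipfile`. It runs on a temporary stand-in folder; the file name `adapter_config.json` and the helper `archive_and_list` are placeholders for illustration, not a statement about what the real output folder contains.

```python
import os
import shutil
import tempfile
import zipfile

def archive_and_list(source_dir, archive_base):
    """Zip `source_dir` and return (zip_path, sorted member names)."""
    zip_path = shutil.make_archive(archive_base, "zip", source_dir)
    with zipfile.ZipFile(zip_path) as archive:
        names = sorted(archive.namelist())
    return zip_path, names

# Demonstration with stand-in files in a temporary directory.
with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, "model_dir")
    os.makedirs(os.path.join(src, "checkpoint-3000"))
    with open(os.path.join(src, "checkpoint-3000", "adapter_config.json"), "w") as fh:
        fh.write("{}")
    zip_path, names = archive_and_list(src, os.path.join(work, "demo"))
    print(names)
```

In the notebook, the same function could be called as `archive_and_list(drive_output_dir, zip_name)` in place of the bare `shutil.make_archive` call.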


In [None]:
from google.colab import runtime
runtime.unassign()