# 🧠 Colab 5: Continued Pretraining with Unsloth.ai

**Objective**: Make an LLM learn a new language or domain (e.g., mental health chatbot)


## Install dependencies

In [1]:
!pip install unsloth torch accelerate transformers datasets bitsandbytes trl -q

from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer


## Load Base Model Checkpoint

In [6]:
model_name = "unsloth/smollm2-135m"   # You can replace with "unsloth/phi-3-mini-4k-instruct"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name,
    load_in_4bit=True,
    device_map="auto"
)

==((====))==  Unsloth 2025.11.2: Fast Llama patching. Transformers: 4.57.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


## Enable LoRA adapters (to make quantized model trainable)

In [7]:

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)

print("✅ LoRA adapters attached - model is now trainable for continued pretraining!")

Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2025.11.2 patched 30 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


✅ LoRA adapters attached - model is now trainable for continued pretraining!
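
The trainable-parameter count that Unsloth reports in the training banner (921,600 of 135,437,184) can be reproduced by hand from the LoRA settings above. The sketch below is only an illustration: the layer dimensions are assumptions taken from the published SmolLM2-135M configuration (hidden size 576, 30 layers, 3 key/value heads of dimension 64) and do not appear in this notebook.

```python
# Sketch: count LoRA trainable parameters for r=16 on q_proj and v_proj.
# ASSUMPTION: these dimensions come from the SmolLM2-135M config, not from this notebook.
HIDDEN_SIZE = 576
NUM_LAYERS = 30
NUM_KV_HEADS = 3
HEAD_DIM = 64
RANK = 16


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to each wrapped linear layer."""
    return rank * in_features + out_features * rank


q_proj = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)              # 576 -> 576
v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, RANK)  # 576 -> 192
trainable = NUM_LAYERS * (q_proj + v_proj)

total = 135_437_184  # total parameter count reported in the training banner
print(f"trainable = {trainable:,} ({100 * trainable / total:.2f}% of {total:,})")
```

Under these assumptions the result matches the banner, which suggests that only the `q_proj` and `v_proj` adapters are being trained.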


## Load dataset for new language/domain

In [8]:

# Example: English-French book translations, used to "teach" new language structure
dataset = load_dataset("opus_books", "en-fr", split="train[:2000]")

# For a mental health chatbot, you can use a small conversational dataset instead
# (adapt preprocess() to that dataset's columns, since it has no 'translation' field):
# dataset = load_dataset("mosaicml/dolly_hhrlhf", split="train[:2000]")

def preprocess(example):
    # Continued pretraining objective: combine both languages into one text to learn their structure
    text = f"English: {example['translation']['en']}\nFrench: {example['translation']['fr']}"
    return tokenizer(text, truncation=True, padding="max_length", max_length=512)

tokenized_dataset = dataset.map(preprocess)

Map:   0%|          | 0/2000 [00:00<?, ? examples/s]
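
To check the text format before tokenizing, the formatting step can be tried on its own. This is a minimal sketch that uses two hand-written rows (hypothetical data, including the `id` field) shaped like the `translation` records accessed in `preprocess`. It needs no tokenizer or dataset download.

```python
# Hypothetical rows mimicking the record shape used by preprocess() above.
rows = [
    {"id": "0", "translation": {"en": "Good morning.", "fr": "Bonjour."}},
    {"id": "1", "translation": {"en": "The cat sleeps.", "fr": "Le chat dort."}},
]


def format_pair(example: dict) -> str:
    """Join both sides of a translation pair into one training text."""
    pair = example["translation"]
    return f"English: {pair['en']}\nFrench: {pair['fr']}"


texts = [format_pair(row) for row in rows]
for text in texts:
    print(text)
    print("---")
```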

## Define training arguments

In [9]:

training_args = TrainingArguments(
    output_dir="continued_pretrain_model",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    learning_rate=2e-4,
    logging_steps=10,
    save_steps=50,
    fp16=True,
    report_to="none"
)
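
The step count implied by these arguments can be worked out in advance. The sketch below assumes a single GPU and a dataset size that divides evenly by the effective batch size. The variable names are illustrative, and the rounding is a simplification of how the trainer counts steps.

```python
import math

# Values copied from the notebook; num_gpus = 1 is an assumption (single Colab GPU).
num_examples = 2000
per_device_batch = 2
grad_accum = 4
num_gpus = 1
epochs = 2
save_steps = 50
logging_steps = 10

effective_batch = per_device_batch * grad_accum * num_gpus
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps: {steps_per_epoch} per epoch, {total_steps} total")
print(f"checkpoints at multiples of {save_steps}: {total_steps // save_steps}")
print(f"logged loss values: {total_steps // logging_steps}")
```

These figures agree with the training banner, which reports a total batch size of 8 and 500 total steps.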

## Initialize trainer

In [10]:

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_dataset,
    dataset_text_field=None,
    args=training_args,
)

## Train the model

In [11]:

trainer.train()
print("✅ Continued pretraining complete!")

The model is already on multiple devices. Skipping the move to device specified in `args`.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 2,000 | Num Epochs = 2 | Total steps = 500
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 921,600 of 135,437,184 (0.68% trained)


| Step | Training Loss |
|------|---------------|
| 10 | 19.4008 |
| 20 | 16.1251 |
| 30 | 13.3299 |
| 40 | 12.1615 |
| 50 | 10.7614 |
| 60 | 9.2448 |
| 70 | 5.9066 |
| 80 | 1.5359 |
| 90 | 0.6223 |
| 100 | 0.5714 |


✅ Continued pretraining complete!


## Preview samples from the continued-pretraining corpus

In [12]:
# === VISIBILITY CELL 1: Preview 3 samples from the continued-pretraining corpus ===
import random

def shorten(x, n=200):
    x = x if isinstance(x, str) else str(x)
    return x if len(x) <= n else x[:n] + " [...]"

print("📚 Sample training texts (what the model saw):\n")
idxs = random.sample(range(len(tokenized_dataset)), k=min(3, len(tokenized_dataset)))
for i, idx in enumerate(idxs, 1):
    # Recreate the original text we tokenized, if available
    row = dataset[idx]
    if "translation" in row and "en" in row["translation"]:
        raw_text = f"English: {row['translation']['en']}\nFrench: {row['translation']['fr']}"
    else:
        # Fallback if you swapped datasets
        raw_text = str(row)
    print(f"--- Sample {i} ---")
    print(shorten(raw_text, 500), "\n")


📚 Sample training texts (what the model saw):

--- Sample 1 ---
English: Others, while M. Seurel's back was turned and he dictated walking from desk to window, quickly closed one eye and applied the other to the greenish hollow view of Notre Dame of Paris.
French: D’autres, brusquement, tandis que M. Seurel tournant le dos continuait la dictée en marchant du bureau à la fenêtre, fermaient, un œil et se collaient sur l’autre la vue glauque et trouée de Notre-Dame de Paris.

--- Sample 2 ---
English: M. Seurel, once the second problem is on the board, drops his tired arm. Then, to my great relief, he goes to the next line and begins to write again, saying:
French: M. Seurel, le deuxième problème copié, laisse un instant retomber son bras fatigué… Puis, à mon grand soulagement, il va à la ligne et recommence à écrire en disant :

--- Sample 3 ---
English: But noticing that woman sitting in the big armchair at the other end of the room, she stopped, disconcerted.
Fr

## BASE vs CONTINUED model outputs on French prompts

In [13]:
# === VISIBILITY CELL 2: Compare BASE vs CONTINUED model outputs on French prompts ===
from transformers import pipeline

# 1) Load a *fresh* base model for fair comparison
base_model_name = model_name  # same as you trained from, e.g., "unsloth/smollm2-135m"
base_model, base_tokenizer = FastLanguageModel.from_pretrained(
    base_model_name,
    load_in_4bit=True,
    device_map="auto"
)

# 2) Build generation pipelines
pipe_base = pipeline(
    "text-generation",
    model=base_model,
    tokenizer=base_tokenizer,
    max_new_tokens=120,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.05,
)

pipe_cont = pipeline(
    "text-generation",
    model=model,         # your continued-pretrained (LoRA) model from above
    tokenizer=tokenizer,
    max_new_tokens=120,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.05,
)

def run(prompt):
    out_base = pipe_base(prompt)[0]["generated_text"]
    out_cont = pipe_cont(prompt)[0]["generated_text"]
    # Strip the prompt from the front if it's repeated
    if out_base.startswith(prompt): out_base = out_base[len(prompt):]
    if out_cont.startswith(prompt): out_cont = out_cont[len(prompt):]
    return out_base.strip(), out_cont.strip()

tests = [
    # Translation-ish prompt (should look better after EN↔FR continued pretraining)
    "Traduisez en français: 'The cat sits on the mat and watches the sunset.'",
    # Native French instruction
    "Explique en français, en 2 phrases, la différence entre une liste et un tuple en Python.",
    # Short creative generation in French
    "Écris une brève description poétique d’un matin pluvieux à Paris (2–3 phrases).",
]

for i, prompt in enumerate(tests, 1):
    print("\n" + "="*90)
    print(f"🧪 TEST {i}:")
    print(f"📥 Prompt:\n{prompt}")
    base_txt, cont_txt = run(prompt)
    print("\n--- BASE MODEL ---")
    print(base_txt)
    print("\n--- CONTINUED (LoRA) MODEL ---")
    print(cont_txt)



Device set to use cuda:0
Device set to use cuda:0



🧪 TEST 1:
📥 Prompt:
Traduisez en français: 'The cat sits on the mat and watches the sunset.'

--- BASE MODEL ---
A French word that has been taken into English, and is used to describe a person who watches the sun with interest. This word is a very old one, probably originating in the 19th century as a way of saying that someone who watches the sun is an observer. It was first recorded in the 18th century as a short sentence in a book of popular songs from the 17th century: "The cat sits on the mat and watches the sunset."

French people have always used this word, and it is also used by some English-speaking people

--- CONTINUED (LoRA) MODEL ---
In français, « The cat sitte là et donc sa petite déjeunée.

🧪 TEST 2:
📥 Prompt:
Explique en français, en 2 phrases, la différence entre une liste et un tuple en Python.

--- BASE MODEL ---
import matplotlib.pyplot as plt

lignes = [0,1]
tresep = [3.456798907799999e-6,3.456798907799999e-6,3.456798907799999e-6]
solitaire =

## Empathetic assistant style

In [14]:
# === VISIBILITY CELL 3: Empathetic assistant style (useful if you used a MH/empathetic dataset) ===
prompt = (
    "En tant que conseiller empathique, réponds brièvement et avec bienveillance : "
    "« Je me sens anxieux ces derniers jours et j’ai du mal à me concentrer. » "
    "Donne 3 conseils concrets et rappelle une ressource utile."
)

base_txt, cont_txt = (None, None)
try:
    base_txt = pipe_base(prompt)[0]["generated_text"]
    cont_txt = pipe_cont(prompt)[0]["generated_text"]
    if base_txt.startswith(prompt): base_txt = base_txt[len(prompt):]
    if cont_txt.startswith(prompt): cont_txt = cont_txt[len(prompt):]
except Exception as e:
    print("Generation error:", e)

print("\n" + "="*90)
print("🧪 Empathetic Response Demo")
print(f"📥 Prompt:\n{prompt}")
print("\n--- BASE MODEL ---")
print((base_txt or "").strip())
print("\n--- CONTINUED (LoRA) MODEL ---")
print((cont_txt or "").strip())



🧪 Empathetic Response Demo
📥 Prompt:
En tant que conseiller empathique, réponds brièvement et avec bienveillance : « Je me sens anxieux ces derniers jours et j’ai du mal à me concentrer. » Donne 3 conseils concrets et rappelle une ressource utile.

--- BASE MODEL ---
Les équipes de recherche s'appliquent aux équipes de découpeurs qui ont pas au cours des jours pour avoir échâtés des données en basant dans un tableau des informations. Les équipes de découpeurs s'appliquent aux équipes de recherche qui ont toutes les heures pour avoir échâtés des données en basant dans un tableau des informations. Les équipes de découpeurs s

--- CONTINUED (LoRA) MODEL ---



## Tiny fluency proxy metrics

In [15]:
# === VISIBILITY CELL 4: Tiny fluency proxy metrics (length + accented characters) ===
import re

def accents_count(text):
    return len(re.findall(r"[àâäçéèêëîïôöùûüÿœæÀÂÄÇÉÈÊËÎÏÔÖÙÛÜŸŒÆ]", text))

def compare_metrics(prompt):
    b = pipe_base(prompt, max_new_tokens=120, do_sample=True, temperature=0.7)[0]["generated_text"]
    c = pipe_cont(prompt, max_new_tokens=120, do_sample=True, temperature=0.7)[0]["generated_text"]
    if b.startswith(prompt): b = b[len(prompt):]
    if c.startswith(prompt): c = c[len(prompt):]
    return {
        "prompt": prompt,
        "base_len": len(b),
        "cont_len": len(c),
        "base_accents": accents_count(b),
        "cont_accents": accents_count(c),
        "base_out": b.strip(),
        "cont_out": c.strip(),
    }

probe = "Décris en français une recette très simple de crêpes (3–4 étapes)."
res = compare_metrics(probe)
print("\n" + "="*90)
print("📈 Quick Fluency Proxy on one prompt")
print(f"Prompt: {probe}\n")
print(f"Base   -> length: {res['base_len']:>4}, accented chars: {res['base_accents']}")
print(f"Cont’d -> length: {res['cont_len']:>4}, accented chars: {res['cont_accents']}")
print("\n--- BASE OUTPUT ---\n", res["base_out"])
print("\n--- CONTINUED OUTPUT ---\n", res["cont_out"])



📈 Quick Fluency Proxy on one prompt
Prompt: Décris en français une recette très simple de crêpes (3–4 étapes).

Base   -> length:  235, accented chars: 0
Cont’d -> length:   74, accented chars: 0

--- BASE OUTPUT ---
 - [Echantin dans le monde, par monsieur Jean-Baptiste Caron (1879)](https://www.youtube.com/watch?v=56gJ_MmNYU&w=420&h=240)](http://www.youtube.com/watch?v=56gJ_MmNYU&w=420&h=240)|[Votre echantins pour l'histoire du monde](https://www

--- CONTINUED OUTPUT ---
 Le temps est deux ans, et nous voulons de laisser s’enfil par leurs mots.
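
A hard-coded character class is easy to corrupt when a notebook is copied between encodings. As a possible alternative, the sketch below counts accented letters with the standard-library `unicodedata` module. It is a swapped-in technique, not the notebook's own method, and it counts accented letters from any language, not only French. `accent_rate` is a hypothetical helper added so that outputs of different lengths can be compared.

```python
import unicodedata

# Ligatures carry no combining mark, so they are listed explicitly.
LIGATURES = set("œæŒÆ")


def accents_count(text: str) -> int:
    """Count characters that are ligatures or decompose into a base letter plus a combining mark."""
    count = 0
    for ch in text:
        if ch in LIGATURES:
            count += 1
        elif any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch)):
            count += 1
    return count


def accent_rate(text: str) -> float:
    """Accented characters per 100 characters (hypothetical helper); 0.0 for empty text."""
    return 100.0 * accents_count(text) / len(text) if text else 0.0


sample = "Décris en français une recette très simple de crêpes"
print(accents_count(sample), round(accent_rate(sample), 2))
```

Either counter remains a crude proxy: a single sampled output with zero accented characters says little about fluency, so several prompts and samples would be needed before drawing conclusions.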


## Save model

In [16]:
# 8️⃣ Save model checkpoint
model.save_pretrained("continued_pretrain_model")
tokenizer.save_pretrained("continued_pretrain_model")

('continued_pretrain_model/tokenizer_config.json',
 'continued_pretrain_model/special_tokens_map.json',
 'continued_pretrain_model/vocab.json',
 'continued_pretrain_model/merges.txt',
 'continued_pretrain_model/added_tokens.json',
 'continued_pretrain_model/tokenizer.json')

## Export to Ollama

In [17]:
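# 9️⃣ Optional: stage the saved files for Ollama (local inference)
# Note: continued_pretrain_model/ holds the LoRA adapter and tokenizer files, not merged
# base-model weights, and `ollama create -f` expects a Modelfile rather than a directory.
!mkdir -p ollama_model
!cp -r continued_pretrain_model/* ollama_model/
print("✅ Files staged in ollama_model/. Add a Modelfile there, then run:")
print("ollama create my-model -f ollama_model/Modelfile")

✅ Files staged in ollama_model/. Add a Modelfile there, then run:
ollama create my-model -f ollama_model/Modelfile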
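
The `-f` flag of `ollama create` takes the path to a Modelfile, not a directory, so a Modelfile has to exist before the import can work. The sketch below writes a minimal one using the `FROM` and `ADAPTER` instructions. The base model reference (`my-base-model`) and the helper name are placeholders, not values from this notebook. Whether Ollama can apply this particular adapter depends on its support for the model architecture, and the base must be the same model the adapter was trained from.

```python
import tempfile
from pathlib import Path


def write_modelfile(target_dir: str, base_model: str, adapter_path: str) -> Path:
    """Write a minimal Ollama Modelfile (hypothetical helper) and return its path."""
    path = Path(target_dir) / "Modelfile"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"FROM {base_model}\nADAPTER {adapter_path}\n", encoding="utf-8")
    return path


# Demo in a temporary directory; in the notebook you would pass "ollama_model" instead.
with tempfile.TemporaryDirectory() as tmp:
    modelfile = write_modelfile(tmp, "my-base-model", "./")
    content = modelfile.read_text(encoding="utf-8")
    print(content)
```

With a Modelfile in place, the command becomes `ollama create my-model -f ollama_model/Modelfile`.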
