<a href="https://colab.research.google.com/github/aruntakhur/LLMs/blob/main/Fine_Tune_CoT_T5_gsm8k.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧠 Fine-Tune T5 with Chain-of-Thought (CoT) Reasoning
This Colab notebook fine-tunes `flan-t5-small` on Chain-of-Thought reasoning using a subset of the **GSM8K** dataset from Hugging Face Datasets.

In [1]:

# ✅ Install required libraries
!pip install transformers datasets peft accelerate --quiet
!pip install -U fsspec


Collecting fsspec
  Using cached fsspec-2025.5.1-py3-none-any.whl.metadata (11 kB)
Downloading fsspec-2025.5.1-py3-none-any.whl (199 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m199.1/199.1 kB[0m [31m4.3 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: fsspec
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2025.3.0
    Uninstalling fsspec-2025.3.0:
      Successfully uninstalled fsspec-2025.3.0
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datasets 3.6.0 requires fsspec[http]<=2025.3.0,>=2023.1.0, but you have fsspec 2025.5.1 which is incompatible.
gcsfs 2025.3.2 requires fsspec==2025.3.2, but you have fsspec 2025.5.1 which is incompatible.[0m[31m
[0mSuccessfully installed fsspec-2025.5.1


In [2]:

# ✅ Import libraries
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Trainer, TrainingArguments
from datasets import load_dataset
from peft import LoraConfig, get_peft_model


In [3]:

# ✅ Load tokenizer and model (Flan-T5)
model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [4]:

# ✅ Apply LoRA configuration
peft_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type="SEQ_2_SEQ_LM"
)
model = get_peft_model(model, peft_config)


In [5]:
from datasets import load_dataset

# Load directly from Hugging Face repo
dataset = load_dataset("gsm8k", "main", trust_remote_code=True)
train_ds = dataset["train"].select(range(100))  # small subset for demo

# dataset = load_dataset("svamp")
# train_ds = dataset["train"].select(range(100))
# ✅ Load the SVAMP dataset for CoT training
# dataset = load_dataset("ChilleD/SVAMP")
# train_ds = dataset["train"].select(range(200))  # small subset for demo



README.md:   0%|          | 0.00/7.94k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/2.31M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/419k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/7473 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1319 [00:00<?, ? examples/s]

In [6]:
print(dataset["train"][0])  # Show the first sample to inspect keys

{'question': 'Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?', 'answer': 'Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72'}


In [7]:
def format_example(ex):
    question = ex["question"].strip()
    answer_text = ex["answer"].strip()

    # Extract rationale and answer from the 'answer' field
    if "####" in answer_text:
        rationale, final_answer = answer_text.split("####")
        rationale = rationale.strip()
        final_answer = final_answer.strip()
    else:
        rationale = answer_text
        final_answer = "N/A"

    return {
        "input": f"Q: {question}\nA: Let's think step by step.",
        "target": f"{rationale} Therefore, the answer is {final_answer}."
    }


In [8]:
train_ds = dataset["train"].map(format_example)

Map:   0%|          | 0/7473 [00:00<?, ? examples/s]

In [9]:
train_ds[0]

{'question': 'Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?',
 'answer': 'Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72',
 'input': "Q: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?\nA: Let's think step by step.",
 'target': 'Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May. Therefore, the answer is 72.'}

In [10]:

# ✅ Tokenize the dataset
def tokenize(batch):
    input_encodings = tokenizer(batch["input"], truncation=True, padding="max_length", max_length=256)
    target_encodings = tokenizer(batch["target"], truncation=True, padding="max_length", max_length=256)
    input_encodings["labels"] = target_encodings["input_ids"]
    return input_encodings

train_ds = train_ds.map(tokenize, batched=True)


Map:   0%|          | 0/7473 [00:00<?, ? examples/s]

In [11]:

# ✅ Training configuration
training_args = TrainingArguments(
    output_dir="./cot-t5-gsm8k",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=5,
    save_steps=20,
    save_total_limit=2,
    fp16=True,
    report_to="none"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    tokenizer=tokenizer
)

trainer.train()


  trainer = Trainer(
No label_names provided for model class `PeftModelForSeq2SeqLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.


Step,Training Loss
5,31.3019
10,36.6593
15,35.4166
20,32.6632
25,33.104
30,28.8242
35,29.6782
40,32.6014
45,33.0706
50,33.3843


Step,Training Loss
5,31.3019
10,36.6593
15,35.4166
20,32.6632
25,33.104
30,28.8242
35,29.6782
40,32.6014
45,33.0706
50,33.3843


TrainOutput(global_step=1869, training_loss=6.489764113551222, metrics={'train_runtime': 15275.7283, 'train_samples_per_second': 0.489, 'train_steps_per_second': 0.122, 'total_flos': 698529219084288.0, 'train_loss': 6.489764113551222, 'epoch': 1.0})

In [15]:

# ✅ Inference (test on new question)
input_text = "Q: If you have 10 candies and eat 4, how many are left?\nA: Let's think step by step."
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_length=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))


If you have a 10 candies and eat 4 candies and eat 4 candies and eat 4 candies are left. If you have 10 candies and eat 4 candies and eat 4 candies and eat 4 candies and eat 4 candies and eat 4 candies and eat 4 candies and eat 4 candies and eat 4 candies and eat 4 candies and eat 4 candies and eat 4
