# Lightweight LLM Assistant
Using 10% of open orca dataset to fine tune flan-t5-base model for text to text generation task.

In [1]:
!pip install -U datasets accelerate transformers

Collecting datasets
  Downloading datasets-4.0.0-py3-none-any.whl.metadata (19 kB)
Collecting transformers
  Downloading transformers-4.53.2-py3-none-any.whl.metadata (40 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m40.9/40.9 kB[0m [31m2.6 MB/s[0m eta [36m0:00:00[0m
Collecting fsspec<=2025.3.0,>=2023.1.0 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Downloading fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=2.0.0->accelerate)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=2.0.0->accelerate)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=2.0.0->accelerate)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-c

In [2]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) y
Token is valid (permission: write).
The token `First Token` has been saved to /root/.cache/huggingface/stored_tokens
[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authen

In [3]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer, DataCollatorForSeq2Seq
import wandb

In [4]:
# Login using e.g. `huggingface-cli login` to access this dataset
ds = load_dataset("Open-Orca/OpenOrca", split="train[:1%]")
ds

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md: 0.00B [00:00, ?B/s]

1M-GPT4-Augmented.parquet:   0%|          | 0.00/1.01G [00:00<?, ?B/s]

3_5M-GPT3_5-Augmented.parquet:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

Generating train split: 0 examples [00:00, ? examples/s]

Dataset({
    features: ['id', 'system_prompt', 'question', 'response'],
    num_rows: 42339
})

In [5]:
ds[1]

{'id': 'flan.564327',
 'system_prompt': 'You are an AI assistant. You will be given a task. You must generate a detailed and long answer.',
 'question': 'Generate an approximately fifteen-word sentence that describes all this data: Midsummer House eatType restaurant; Midsummer House food Chinese; Midsummer House priceRange moderate; Midsummer House customer rating 3 out of 5; Midsummer House near All Bar One',
 'response': 'Midsummer House is a moderately priced Chinese restaurant with a 3/5 customer rating, located near All Bar One.'}

In [6]:
ds.column_names

['id', 'system_prompt', 'question', 'response']

In [7]:
print(f'System prompt: { ds[1]["system_prompt"] }')
print(f'Question: { ds[1]["question"] }')
print(f'Response: { ds[1]["response"] }')

System prompt: You are an AI assistant. You will be given a task. You must generate a detailed and long answer.
Question: Generate an approximately fifteen-word sentence that describes all this data: Midsummer House eatType restaurant; Midsummer House food Chinese; Midsummer House priceRange moderate; Midsummer House customer rating 3 out of 5; Midsummer House near All Bar One
Response: Midsummer House is a moderately priced Chinese restaurant with a 3/5 customer rating, located near All Bar One.


In [8]:
ds

Dataset({
    features: ['id', 'system_prompt', 'question', 'response'],
    num_rows: 42339
})

In [9]:
# Only use the question and response from dataset
def format_example(example):
  return {
      "prompt": f'User { example["question"] }\n Assistant:',
      "response": example["response"]
  }

In [11]:
formatted_ds = ds.map(format_example, remove_columns=ds.column_names)
formatted_ds

Map:   0%|          | 0/42339 [00:00<?, ? examples/s]

Dataset({
    features: ['response', 'prompt'],
    num_rows: 42339
})

In [12]:
print(f'Prompt: { formatted_ds[1]["prompt"] }')
print(f'Response: { formatted_ds[1]["response"] }')

Prompt: User Generate an approximately fifteen-word sentence that describes all this data: Midsummer House eatType restaurant; Midsummer House food Chinese; Midsummer House priceRange moderate; Midsummer House customer rating 3 out of 5; Midsummer House near All Bar One
 Assistant:
Response: Midsummer House is a moderately priced Chinese restaurant with a 3/5 customer rating, located near All Bar One.


In [13]:
model_name = "google/flan-t5-base"

In [14]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map='auto', torch_dtype='auto')

tokenizer_config.json: 0.00B [00:00, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json: 0.00B [00:00, ?B/s]

config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

In [15]:
model

T5ForConditionalGeneration(
  (shared): Embedding(32128, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=768, out_features=2048, bias=False)
              (wi_1): Linear(in_features=768, out_features=2048, bias=False)
              (wo):

In [16]:
def tokenize_function(example):
  input = tokenizer(example["prompt"], truncation=True, padding="max_length", max_length=256)
  label = tokenizer(example["response"], truncation=True, padding="max_length", max_length=100)
  input["labels"] = label["input_ids"]
  return input

In [17]:
tokenized_ds = formatted_ds.map(tokenize_function, batched=True)
tokenized_ds

Map:   0%|          | 0/42339 [00:00<?, ? examples/s]

Dataset({
    features: ['response', 'prompt', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 42339
})

In [18]:
wandb.init(project="flan-t5-openorca")

<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter:

 ··········


[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mchris-joshua-olaguera[0m ([33mchris-joshua-olaguera-centro-escolar-university-official[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


In [27]:
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

training_args = Seq2SeqTrainingArguments(
    output_dir="./flan-t5-orca",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=2,
    save_steps=100,
    eval_strategy="no",
    save_total_limit=1,
    learning_rate=1e-5,
    warmup_steps=100,
    fp16=True,
    predict_with_generate=True,
    report_to='wandb',
    run_name='flan-t5-openorca-run1'
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds,
    data_collator=data_collator
)

In [28]:
trainer.train()

Step,Training Loss
500,0.0
1000,0.0
1500,0.0
2000,0.0
2500,0.0
3000,0.0
3500,0.0
4000,0.0
4500,0.0
5000,0.0


TrainOutput(global_step=5294, training_loss=0.0, metrics={'train_runtime': 4447.6406, 'train_samples_per_second': 19.039, 'train_steps_per_second': 1.19, 'total_flos': 2.899194154529587e+16, 'train_loss': 0.0, 'epoch': 2.0})

In [34]:
def infer(prompt):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

In [50]:
print(infer("User What is quantum computing?\n Assistant:"))

Quantum computing is a mathematical theory of quantum mechanics that deals with the theory of quantum matter.


In [51]:
print(infer("User How evaporation works?\n Assistant:"))

Evaporation is the process of evaporing water into the air. Evaporation is the process of evaporating water. Evaporation is the process of evaporating water into the air.


In [53]:
print(infer("User When did Magellan arrive in the Philippines?\n Assistant:"))

early morning


In [57]:
print(infer("User What is Large Language Model?\n Assistant:"))

Language model
