<a href="https://colab.research.google.com/github/Abishek0070/Fine_Tuned_LLMs/blob/main/tinyllama_biomed_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# TinyLlama — LoRA Fine-tuning on Biomedical Data



In [21]:
# Cell 1 — Environment setup (run this once at the top)
!pip install -q --upgrade pip
!pip install -q transformers==4.45.2 datasets==2.21.0 accelerate==0.34.2 peft==0.13.0 sentencepiece safetensors --upgrade

# show versions
import transformers, datasets, peft, accelerate
print('transformers', transformers.__version__)
print('datasets', datasets.__version__)
print('peft', peft.__version__)
print('accelerate', accelerate.__version__)


transformers 4.45.2
datasets 2.21.0
peft 0.13.0
accelerate 0.34.2


In [2]:
# Cell 2 — GPU check (use this to confirm Colab GPU)
!nvidia-smi || echo 'No GPU detected — proceeding on CPU (slow).'

import torch
print('torch', torch.__version__)
print('CUDA available:', torch.cuda.is_available())


Wed Nov 12 16:12:50 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   55C    P8             10W /   70W |       2MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [3]:
# Cell 3 — Load tokenizer & model (FP16 on GPU if available)
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

print('Loading tokenizer...')
tokenizer = AutoTokenizer.from_pretrained(model_name)

print('Loading model (this may take a while). Using FP16 if GPU present)')
try:
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16 if torch.cuda.is_available() else None, device_map='auto')
    print('Model loaded successfully.')
except Exception as e:
    print('Primary load failed — fallback to CPU full precision. Error:', e)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map={'': 'cpu'})

print('Model dtype:', next(model.parameters()).dtype)


Loading tokenizer...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/776 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

Loading model (this may take a while). Using FP16 if GPU present)


config.json:   0%|          | 0.00/560 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/4.40G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/129 [00:00<?, ?B/s]

Model loaded successfully.
Model dtype: torch.float16


In [4]:
# Cell 4 — Download a small biomedical dataset (PubMedQA) and inspect
from datasets import load_dataset

print('Loading small PubMedQA subset...')
dataset = load_dataset('pubmed_qa', 'pqa_labeled', split='train[:1000]')
print('Dataset size:', len(dataset))
print('Example keys:', dataset.column_names)
print('\nSample:')
print(dataset[0])


Loading small PubMedQA subset...


Downloading readme:   0%|          | 0.00/5.19k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/1.08M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1000 [00:00<?, ? examples/s]

Dataset size: 1000
Example keys: ['pubid', 'question', 'context', 'long_answer', 'final_decision']

Sample:
{'pubid': 21645374, 'question': 'Do mitochondria play a role in remodelling lace plant leaves during programmed cell death?', 'context': {'contexts': ['Programmed cell death (PCD) is the regulated death of cells within an organism. The lace plant (Aponogeton madagascariensis) produces perforations in its leaves through PCD. The leaves of the plant consist of a latticework of longitudinal and transverse veins enclosing areoles. PCD occurs in the cells at the center of these areoles and progresses outwards, stopping approximately five cells from the vasculature. The role of mitochondria during PCD has been recognized in animals; however, it has been less studied during PCD in plants.', 'The following paper elucidates the role of mitochondrial dynamics during developmentally regulated PCD in vivo in A. madagascariensis. A single areole within a window stage leaf (PCD is occurring) w

In [5]:
# Cell 5 — Format dataset into instruction-response text
def format_example(ex):
    q = ex.get('question', '')
    ctx = ex.get('context', '')
    ans = ex.get('final_decision', '')
    text = f"### Instruction:\n{q}\n### Input:\n{ctx}\n### Response:\n{ans}"
    return {'text': text}

dataset = dataset.map(format_example)
print('Formatted sample:\n', dataset[0]['text'])


Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Formatted sample:
 ### Instruction:
Do mitochondria play a role in remodelling lace plant leaves during programmed cell death?
### Input:
{'contexts': ['Programmed cell death (PCD) is the regulated death of cells within an organism. The lace plant (Aponogeton madagascariensis) produces perforations in its leaves through PCD. The leaves of the plant consist of a latticework of longitudinal and transverse veins enclosing areoles. PCD occurs in the cells at the center of these areoles and progresses outwards, stopping approximately five cells from the vasculature. The role of mitochondria during PCD has been recognized in animals; however, it has been less studied during PCD in plants.', 'The following paper elucidates the role of mitochondrial dynamics during developmentally regulated PCD in vivo in A. madagascariensis. A single areole within a window stage leaf (PCD is occurring) was divided into three areas based on the progression of PCD; cells that will not undergo PCD (NPCD), cells 

In [7]:
from transformers import PreTrainedTokenizerBase

tokenizer.pad_token = tokenizer.eos_token # Add this line to set the padding token

def tokenize_batch(batch):
    return tokenizer(batch['text'], truncation=True, padding='max_length', max_length=512)

tokenized = dataset.map(tokenize_batch, batched=True, remove_columns=dataset.column_names)
print('Tokenized dataset sample keys:', tokenized.column_names)
print('Tokenized example length:', len(tokenized[0]['input_ids']))

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Tokenized dataset sample keys: ['input_ids', 'attention_mask']
Tokenized example length: 512


In [17]:
# Cell 7 — Apply LoRA adapters (PEFT)
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias='none',
    task_type='CAUSAL_LM'
)

print('Applying LoRA...')
model = get_peft_model(model, lora_config)
print('LoRA applied. Trainable params (approx):', sum(p.numel() for p in model.parameters() if p.requires_grad))


Applying LoRA...
LoRA applied. Trainable params (approx): 1126400


In [18]:
# Cell 8 — Prepare dataloader and simple training loop (small quick demo)
from torch.utils.data import DataLoader
import torch



def collate_fn(batch):

    input_ids = torch.stack([torch.tensor(item['input_ids'], dtype=torch.long) for item in batch])
    attention_mask = torch.stack([torch.tensor(item['attention_mask'], dtype=torch.long) for item in batch])

    labels = input_ids.clone()

    return {
        'input_ids': input_ids.to(model.device),
        'attention_mask': attention_mask.to(model.device),
        'labels': labels.to(model.device)
    }

loader = DataLoader(tokenized, batch_size=1, shuffle=True, collate_fn=collate_fn)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for step, batch in enumerate(loader):
    outputs = model(**batch)
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step % 20 == 0:
        print(f"Step {step} - loss: {loss.item():.4f}")
    if step >= 100:
        break

print('Quick fine-tune demo complete.')

Step 0 - loss: 2.3471
Step 20 - loss: 1.4996
Step 40 - loss: 1.6680
Step 60 - loss: 1.1684
Step 80 - loss: 1.5363
Step 100 - loss: 1.3809
Quick fine-tune demo complete.


In [19]:
# Cell 9 — Save the LoRA-adapted model (only PEFT adapters and tokenizer)
from peft import PeftModel
save_dir = 'tinyllama_biomed_lora'
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
print('Saved to', save_dir)


Saved to tinyllama_biomed_lora


In [20]:
# Cell 10 — Simple chat function to test the saved model
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

print('Loading saved LoRA model for inference...')

model_f = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16 if torch.cuda.is_available() else None, device_map='auto')
model = PeftModel.from_pretrained(model_f, 'tinyllama_biomed_lora')

tokenizer = AutoTokenizer.from_pretrained('tinyllama_biomed_lora')
model.eval()

def chat(prompt, max_new_tokens=150):
    full = f"### Instruction:\n{prompt}\n### Response:\n"
    inputs = tokenizer(full, return_tensors='pt').to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, temperature=0.7, top_p=0.9, pad_token_id=tokenizer.eos_token_id)
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    return text.split('### Response:')[-1].strip()

print('Example answer:')
print(chat('Explain the role of dopamine in Parkinson\'s disease.'))


Loading saved LoRA model for inference...
Example answer:




Dopamine is a neurotransmitter that is involved in the regulation of movement. It is produced in the brain and is released into the synapses of the nerve cells. Dopamine is involved in the regulation of movement and is involved in the control of the movement of the muscles. Dopamine is also involved in the regulation of the movement of the muscles. Dopamine is involved in the regulation of movement and is involved in the control of the movement of the muscles. Dopamine is involved in the regulation of movement and is involved in the control of the movement of the muscles. Dopamine is involved in the regulation of movement


In [25]:
from huggingface_hub import notebook_login
notebook_login()


VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [26]:
repo_name = "tinyllama-biomed-finetuned"


In [27]:
from huggingface_hub import HfApi, create_repo
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_name = "tinyllama-biomed-finetuned"


create_repo(repo_name, private=True)

model.save_pretrained(repo_name)
tokenizer.save_pretrained(repo_name)

# Pushing to Hugging Face
from huggingface_hub import HfFolder
from huggingface_hub import upload_folder

upload_folder(
    folder_path=repo_name,
    repo_id=f"Master-Abi/{repo_name}",
    repo_type="model"
)


Processing Files (0 / 0)      : |          |  0.00B /  0.00B            

New Data Upload               : |          |  0.00B /  0.00B            

  ...finetuned/tokenizer.model: 100%|##########|  500kB /  500kB            

  ...adapter_model.safetensors:  56%|#####5    | 2.52MB / 4.52MB            

CommitInfo(commit_url='https://huggingface.co/Master-Abi/tinyllama-biomed-finetuned/commit/3e599a21ae2ab44066d0f642672a9c31117ff60d', commit_message='Upload folder using huggingface_hub', commit_description='', oid='3e599a21ae2ab44066d0f642672a9c31117ff60d', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Master-Abi/tinyllama-biomed-finetuned', endpoint='https://huggingface.co', repo_type='model', repo_id='Master-Abi/tinyllama-biomed-finetuned'), pr_revision=None, pr_num=None)