# Model Fine-Tuning: LoRA on Llama-3-8B

This notebook fine-tunes Llama-3-8B using LoRA (Low-Rank Adaptation) on our clinical trial prediction dataset, then evaluates the fine-tuned model on the same test set used for the baseline.

## Overview

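We'll use LoRA fine-tuning to specialize the model for clinical trial predictions. With rank 16 on all attention and MLP projections, LoRA trains only about 0.5% of the model's parameters (41,943,040 of 8,072,204,288 in the training log), which keeps training fast and memory-efficient.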

## What We're Doing

1. **Load** Llama-3-8B with 4-bit quantization
2. **Add** LoRA adapters (rank=16)
3. **Train** on 1,161 clinical trial examples
4. **Evaluate** on the same 206-question test set
5. **Compare** against baseline results

## Requirements

- ⚠️ **GPU required:** use Google Colab with a T4 runtime (free)
- Training dataset from notebook 01
- Hugging Face account with Llama-3 access
- **Important:** Don't run this on a laptop; it requires a GPU!

## Training Details

- **Method:** LoRA (Low-Rank Adaptation)
- **Epochs:** 3
- **Training time:** ~28 minutes on a T4 GPU (about 1,699 s in the logged run)
- **Memory:** ~6GB GPU RAM (fits on free Colab)

## Expected Results

- **Fine-tuned accuracy:** ~70-75% (vs ~56% baseline)
- **Improvement:** +15-20 percentage points
- **Key learning:** Company track records, therapeutic areas, timeline patterns

## Output

- `clinical_trial_lora/` - Fine-tuned model (saved locally)
- `finetuned_results.csv` - All fine-tuned predictions
- Model pushed to Hugging Face (optional)

---

**Pro Tip:** Back up your trained model to Google Drive during this session to avoid losing it!

Let's train! 🔥

# Install Unsloth

In [None]:
!pip install -q unsloth xformers

from unsloth import FastLanguageModel
import torch
from datasets import Dataset
from trl import SFTTrainer
from transformers import TrainingArguments
import pandas as pd

# Login to Hugging Face (if not already logged in)
from huggingface_hub import login
login()


# Load training data & model

In [None]:
train_df = pd.read_csv("clinical_train.csv")
print(f"Training samples: {len(train_df)}")


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=512,
    dtype=None,
    load_in_4bit=True,
    token=True,  # Use HF token automatically
)

print("✅ Model loaded with Unsloth!")

Training samples: 1161
==((====))==  Unsloth 2026.2.1: Fast Llama patching. Transformers: 4.57.6.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.34. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


✅ Model loaded with Unsloth!


# Add LoRA adapters

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=42,
)

print("✅ LoRA adapters added!")

Unsloth 2026.2.1 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


✅ LoRA adapters added!
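
As a sanity check on the adapter size, the sketch below recomputes the number of trainable LoRA parameters for `r=16` on the seven target modules. The layer shapes are assumptions taken from the public Llama-3-8B configuration (hidden size 4096, MLP size 14336, 8 KV heads of dimension 128), and `lora_param_count` is a hypothetical helper, not part of Unsloth or PEFT.

```python
# Llama-3-8B shapes (assumed from the public model config, not read from the model).
HIDDEN = 4096          # hidden size
INTERMEDIATE = 14336   # MLP intermediate size
KV_DIM = 1024          # 8 KV heads x 128 head dim (grouped-query attention)
NUM_LAYERS = 32
RANK = 16              # r=16, as passed to get_peft_model

# (in_features, out_features) for each targeted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """Each adapted matrix adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(TARGET_SHAPES, RANK, NUM_LAYERS)
total_reported = 8_072_204_288  # total parameter count from the Unsloth training log

print(f"Trainable LoRA parameters: {trainable:,}")
print(f"Share of reported total: {trainable / total_reported:.2%}")
```

Under these assumptions the count matches the `Trainable parameters = 41,943,040` line that Unsloth prints when training starts.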


# Format training data

In [None]:
def format_prompt(row):
    return f"""You are evaluating clinical trial outcomes. Based on the question below, predict whether the outcome will be YES (1) or NO (0).

Question: {row['Question']}

Respond with only a single digit: 0 or 1.
Answer: {row['Answer']}"""

train_df['text'] = train_df.apply(format_prompt, axis=1)
train_dataset = Dataset.from_pandas(train_df[['text']])

print("✅ Data formatted for training")
print(f"\nSample training example:")
print(train_dataset[0]['text'])

✅ Data formatted for training

Sample training example:
You are evaluating clinical trial outcomes. Based on the question below, predict whether the outcome will be YES (1) or NO (0).

Question: Will Acadia Pharmaceuticals announce that the Phase 3 ADVANCE-2 trial of pimavanserin for the treatment of negative symptoms of schizophrenia met its primary endpoint by December 31, 2024?

Respond with only a single digit: 0 or 1.
Answer: 0
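
The evaluation cells later in this notebook rebuild the same template, ending at `Answer:` so the model generates the label itself. One way to keep the two templates from drifting apart is to build both from a single function. This is only a sketch: `build_prompt` and the example question are hypothetical and are not used elsewhere in the notebook.

```python
INSTRUCTION = (
    "You are evaluating clinical trial outcomes. Based on the question below, "
    "predict whether the outcome will be YES (1) or NO (0)."
)


def build_prompt(question, answer=None):
    """Return the inference prompt, or the full training text if an answer is given."""
    prompt = (
        f"{INSTRUCTION}\n\n"
        f"Question: {question}\n\n"
        "Respond with only a single digit: 0 or 1.\n"
        "Answer:"
    )
    return prompt if answer is None else f"{prompt} {answer}"


# Made-up question, for illustration only.
question = "Will the hypothetical trial X-001 meet its primary endpoint by December 31, 2024?"
train_text = build_prompt(question, answer=0)
infer_prompt = build_prompt(question)

# The training text should be the inference prompt plus the label,
# so the model sees identical context at evaluation time.
assert train_text.startswith(infer_prompt)
print(repr(train_text[len(infer_prompt):]))
```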


# Set training arguments and train the model

In [5]:
# Training arguments
training_args = TrainingArguments(
    output_dir="./clinical_trial_model",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    logging_steps=10,
    save_strategy="epoch",
    warmup_steps=10,
    optim="adamw_8bit",
    seed=42,
)

# Trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=512,
    args=training_args,
)

# Start training!
print("\n" + "="*80)
print("STARTING FINE-TUNING")
print("="*80)
print(f"Training samples: {len(train_dataset)}")
print(f"Epochs: 3")
print(f"Estimated time: ~30-45 minutes")
print("="*80 + "\n")

trainer.train()

print("\n✅ Fine-tuning complete!")

# Save the model
model.save_pretrained("clinical_trial_lora")
tokenizer.save_pretrained("clinical_trial_lora")
print("✅ Model saved to 'clinical_trial_lora'")



STARTING FINE-TUNING
Training samples: 1161
Epochs: 3
Estimated time: ~30-45 minutes



==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,161 | Num Epochs = 3 | Total steps = 438
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040 of 8,072,204,288 (0.52% trained)


Step,Training Loss
10,2.0324
20,0.7752
30,0.6216
40,0.6031
50,0.5845
60,0.5063
70,0.5398
80,0.5215
90,0.5105
100,0.5046


W&B run history (sparkline charts): train/epoch, train/global_step, train/grad_norm, train/learning_rate, train/loss

W&B run summary:
Metric,Value
total_flos,1.4098122447962112e+16
train/epoch,3.0
train/global_step,438.0
train/grad_norm,0.84173
train/learning_rate,0.0
train/loss,0.2675
train_loss,0.4415
train_runtime,1698.6398
train_samples_per_second,2.05
train_steps_per_second,0.258



✅ Fine-tuning complete!
✅ Model saved to 'clinical_trial_lora'
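
The step count and throughput in the log follow from the batch settings. The sketch below reproduces them with plain arithmetic. It assumes that a trailing partial accumulation window still counts as one optimizer step, which is consistent with the 438 total steps reported above.

```python
import math

num_examples = 1161
per_device_batch = 2
grad_accum = 4
epochs = 3
train_runtime_s = 1698.6398  # train_runtime from the run summary

# Batches per epoch; the last batch may be smaller than the others.
batches_per_epoch = math.ceil(num_examples / per_device_batch)
# Assumption: a partial accumulation window at the end of an epoch is one optimizer step.
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_steps = steps_per_epoch * epochs

steps_per_second = total_steps / train_runtime_s
samples_per_second = num_examples * epochs / train_runtime_s

print(f"Optimizer steps: {total_steps}")
print(f"Steps/s: {steps_per_second:.3f}")
print(f"Samples/s: {samples_per_second:.2f}")
print(f"Runtime: {train_runtime_s / 60:.1f} min")
```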


# BACKUP: Save to Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import shutil
print("Backing up model to Google Drive...")
shutil.copytree(
    "clinical_trial_lora",
    "/content/drive/MyDrive/clinical_trial_lora",
    dirs_exist_ok=True
)
print("✅ Model backed up to Google Drive!")

Mounted at /content/drive
Backing up model to Google Drive...
✅ Model backed up to Google Drive!
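
Since the Colab session is temporary, it can be worth confirming that the copy on Drive is complete before the runtime shuts down. The sketch below compares file names and sizes between the two directories. `compare_dirs` is a hypothetical helper written for this example. It checks sizes only, not file contents.

```python
from pathlib import Path


def compare_dirs(src, dst):
    """Return relative paths that are missing in dst or differ in size."""
    src, dst = Path(src), Path(dst)
    problems = []
    for src_file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = src_file.relative_to(src)
        dst_file = dst / rel
        if not dst_file.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif dst_file.stat().st_size != src_file.stat().st_size:
            problems.append(f"size mismatch: {rel.as_posix()}")
    return problems


if __name__ == "__main__":
    # Paths as used in the backup cell above.
    src = Path("clinical_trial_lora")
    dst = Path("/content/drive/MyDrive/clinical_trial_lora")
    if src.is_dir() and dst.is_dir():
        issues = compare_dirs(src, dst)
        print("Backup OK" if not issues else "\n".join(issues))
    else:
        print("Skipping check: source or backup directory not found")
```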


# Load test data and Fine-tuned model for evaluation

In [7]:
from tqdm import tqdm
import re

In [8]:
# Load test data
test_df = pd.read_csv("clinical_test.csv")
print(f"Loaded {len(test_df)} test samples")

# Load fine-tuned model
print("\nLoading fine-tuned model...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="clinical_trial_lora",
    max_seq_length=512,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # Enable inference mode

print("✅ Fine-tuned model loaded!")

Loaded 206 test samples

Loading fine-tuned model...
==((====))==  Unsloth 2026.2.1: Fast Llama patching. Transformers: 4.57.6.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.34. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
✅ Fine-tuned model loaded!


# Function to create prompt & extract prediction

In [9]:
# Function to create prompt
def create_prompt(question):
    return f"""You are evaluating clinical trial outcomes. Based on the question below, predict whether the outcome will be YES (1) or NO (0).

Question: {question}

Respond with only a single digit: 0 or 1.
Answer:"""

# Function to extract prediction
def extract_prediction(text):
    text = text.strip()
    match = re.search(r'\b[01]\b', text)
    if match:
        return int(match.group())
    return None

# Run evaluation
predictions = []
correct = 0
total = 0
unparseable = 0

print("\n🔍 Evaluating fine-tuned model (this takes ~10-15 minutes)...\n")


🔍 Evaluating fine-tuned model (this takes ~10-15 minutes)...
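
To see how the parsing rule behaves on different outputs, the sketch below runs the same regex on a few made-up response strings. These are illustrative inputs, not real model outputs.

```python
import re


def extract_prediction(text):
    """Same rule as the notebook cell: first standalone 0 or 1, else None."""
    match = re.search(r"\b[01]\b", text.strip())
    return int(match.group()) if match else None


# Made-up responses covering clean, noisy, and unparseable cases.
samples = ["0", " 1\n", "Answer: 1.", "1 0", "10", "yes", ""]
results = {s: extract_prediction(s) for s in samples}

for s, pred in results.items():
    print(f"{s!r:>14} -> {pred}")
```

Two behaviors are worth noting. When several digits appear, the first standalone one wins. A multi-digit number such as `10` is rejected because neither digit sits on a word boundary. With `max_new_tokens=10` the model may keep generating after the label, so the first-match rule matters in practice.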



# Generate predictions

In [10]:
for idx, row in tqdm(test_df.iterrows(), total=len(test_df)):
    prompt = create_prompt(row['Question'])

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=10,
            do_sample=False,  # greedy decoding; temperature has no effect when sampling is off
            pad_token_id=tokenizer.eos_token_id
        )

    response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
    prediction = extract_prediction(response)

    if prediction is not None:
        predictions.append({
            'Question': row['Question'],
            'true_answer': row['Answer'],
            'predicted_answer': prediction,
            'correct': prediction == row['Answer'],
            'raw_response': response.strip()
        })

        if prediction == row['Answer']:
            correct += 1
        total += 1
    else:
        unparseable += 1
        predictions.append({
            'Question': row['Question'],
            'true_answer': row['Answer'],
            'predicted_answer': None,
            'correct': False,
            'raw_response': response.strip()
        })

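# Calculate accuracy over successfully parsed responses only
# (unparseable responses are excluded from the denominator)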
finetuned_accuracy = correct / total if total > 0 else 0

100%|██████████| 206/206 [03:11<00:00,  1.07it/s]
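
The accuracy above divides by the number of parsed responses, so unparseable outputs do not count against the model. The sketch below reports accuracy both ways and adds a rough 95% interval, using the counts printed in the results cell (151 correct, 206 parsed, 0 unparseable). `accuracy_summary` is a hypothetical helper. The interval is a normal approximation, meant as a rough guide to sampling noise on a 206-question test set.

```python
import math


def accuracy_summary(correct, parsed, unparseable, z=1.96):
    """Accuracy over parsed responses, over all samples, and a rough 95% interval."""
    total = parsed + unparseable
    acc_parsed = correct / parsed if parsed else 0.0
    acc_all = correct / total if total else 0.0
    # Normal-approximation (Wald) interval on the all-samples accuracy.
    half_width = z * math.sqrt(acc_all * (1 - acc_all) / total) if total else 0.0
    return {
        "accuracy_parsed": acc_parsed,
        "accuracy_all": acc_all,
        "ci_low": max(0.0, acc_all - half_width),
        "ci_high": min(1.0, acc_all + half_width),
    }


# Counts taken from the results cell of this notebook.
summary = accuracy_summary(correct=151, parsed=206, unparseable=0)
print({name: round(value, 3) for name, value in summary.items()})
```

With zero unparseable responses the two accuracies coincide. They diverge as soon as some outputs fail to parse, in which case the all-samples figure is the more conservative one to report.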


# Print results & save to CSV

In [11]:
print("\n" + "="*80)
print("FINE-TUNED MODEL RESULTS")
print("="*80)
print(f"Model: Llama-3-8B + LoRA")
print(f"Total test samples: {len(test_df)}")
print(f"Successfully parsed: {total}")
print(f"Unparseable responses: {unparseable}")
print(f"Correct predictions: {correct}")
print(f"Fine-tuned Accuracy: {finetuned_accuracy:.1%}")
print("="*80)

# Save results
finetuned_df = pd.DataFrame(predictions)
finetuned_df.to_csv("finetuned_results.csv", index=False)
print("\n✅ Results saved to finetuned_results.csv")

# Show some examples
print("\n📋 Sample predictions:")
print(finetuned_df[['Question', 'true_answer', 'predicted_answer', 'correct']].head(10))


FINE-TUNED MODEL RESULTS
Model: Llama-3-8B + LoRA
Total test samples: 206
Successfully parsed: 206
Unparseable responses: 0
Correct predictions: 151
Fine-tuned Accuracy: 73.3%

✅ Results saved to finetuned_results.csv

📋 Sample predictions:
                                            Question  true_answer  \
0  Will Pharnext announce positive topline result...            0   
1  Will Sarepta Therapeutics release results from...            1   
2  Will EIP Pharma (CervoMed) complete its Phase ...            1   
3  Will argenx SE receive FDA approval for VYVGAR...            1   
4  Will Capricor Therapeutics report top-line dat...            0   
5  Will the FDA approve AbbVie's Rinvoq (upadacit...            1   
6  Will Reata Pharmaceuticals' Skyclarys (omavelo...            1   
7  Will Biogen begin patient screening for its Ph...            0   
8  Will AbbVie's Rinvoq (upadacitinib) receive FD...            0   
9  Will UCB's Bimzelx (bimekizumab-bkzx) be comme...           