# Medical Study Assistant - LoRA Fine-tuning on Kaggle

This notebook fine-tunes Qwen2.5-3B-Instruct using LoRA (Low-Rank Adaptation) on medical study assistant data.

## Setup Requirements
- **GPU**: P100 or better recommended
- **Memory**: 16GB+ GPU memory
- **Dataset**: Upload `medical_dataset_kaggle.jsonl` to Kaggle

## Model Configuration
- **Base**: Qwen/Qwen2.5-3B-Instruct
- **Method**: LoRA fine-tuning
- **Tasks**: Q&A + Study Guide Generation
- **Domain**: Infectious Diseases

## 1. Install Required Packages

In [None]:
# Install required packages
# Qwen2.5 uses the Qwen2 architecture, which requires transformers>=4.37.0
!pip install transformers==4.37.2
!pip install trl==0.7.6
!pip install peft==0.7.1
!pip install datasets==2.16.0
!pip install accelerate==0.25.0
!pip install bitsandbytes==0.41.3
!pip install torch==2.1.0
!pip install wandb  # Optional for logging

print("✅ All packages installed successfully!")
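
Pinned installs can still resolve to something unexpected when the Kaggle image ships its own copies of these libraries, so it may help to print what is actually installed before continuing. The following is a minimal sketch using only the standard library; `PINNED` and `report_versions` are illustrative names, not part of the original notebook.

```python
from importlib import metadata

# Hypothetical list of the packages pinned in the install cell; extend as needed.
PINNED = ["transformers", "trl", "peft", "datasets", "accelerate", "bitsandbytes", "torch"]

def report_versions(packages):
    """Return {package: installed version, or 'not installed'}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = "not installed"
    return found

for name, version in report_versions(PINNED).items():
    print(f"{name:>14}: {version}")
```

If a version differs from the pin, restarting the kernel after the install cell is usually worth trying before anything else.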

## 2. Import Libraries and Setup

In [None]:
import os
import json
import torch
import pandas as pd
import numpy as np
from pathlib import Path
from typing import Dict, List
from dataclasses import dataclass

# Transformers and training
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    BitsAndBytesConfig,
    logging
)
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer
from datasets import Dataset

# Set up logging
logging.set_verbosity_info()

# Check GPU availability
print(f"🔥 CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"📱 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
else:
    print("⚠️ No GPU available - training will be very slow!")

## 3. Load and Prepare Dataset

In [None]:
# Load the dataset
def load_jsonl(file_path):
    """Load JSONL file and return list of dictionaries."""
    data = []
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            if line.strip():
                data.append(json.loads(line.strip()))
    return data

# Path to your uploaded dataset
dataset_path = '/kaggle/input/medical-study-assistant/medical_dataset_kaggle.jsonl'

# Load the data
print("📊 Loading dataset...")
data = load_jsonl(dataset_path)
print(f"✅ Loaded {len(data)} training examples")

# Display dataset statistics
task_types = [entry['task_type'] for entry in data]
topics = [entry['topic'] for entry in data]
print(f"\n📈 Dataset Statistics:")
print(f"  Q&A pairs: {task_types.count('question_answering')}")
print(f"  Study guides: {task_types.count('study_guide_generation')}")
print(f"  Unique topics: {len(set(topics))}")
print(f"  Average text length: {np.mean([len(entry['output']) for entry in data]):.0f} chars")

# Show sample entries
print(f"\n🔍 Sample Entry:")
sample = data[0]
print(f"  Task: {sample['task_type']}")
print(f"  Topic: {sample['topic']}")
print(f"  Question: {sample['instruction'][:100]}...")
print(f"  Answer: {sample['output'][:100]}...")
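
The cells in this notebook index each record by `instruction`, `input`, `output`, `task_type`, and `topic`, so a record missing any of these would raise a `KeyError` later on. A small check like the one below can surface such records early. This is only a sketch: `REQUIRED_KEYS` and `find_invalid_records` are hypothetical helper names, and the example records are made up.

```python
REQUIRED_KEYS = ("instruction", "input", "output", "task_type", "topic")

def find_invalid_records(records, required=REQUIRED_KEYS):
    """Return (index, missing_keys) for every record lacking a required key."""
    problems = []
    for i, record in enumerate(records):
        missing = [key for key in required if key not in record]
        if missing:
            problems.append((i, missing))
    return problems

# Tiny made-up records, only to show the behaviour.
example = [
    {"instruction": "q", "input": "", "output": "a",
     "task_type": "question_answering", "topic": "TB"},
    {"instruction": "q", "output": "a", "task_type": "question_answering"},
]
print(find_invalid_records(example))
```

On the real data this would be called as `find_invalid_records(data)`; an empty list means every record has the expected fields.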

## 4. Format Data for Training

In [None]:
# Format data for instruction following
def format_instruction(entry):
    """Format a single training example for instruction following."""
    
    # Create prompt template
    if entry.get('input', '').strip():
        prompt = f"""Below is an instruction that describes a medical task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{entry['instruction']}

### Input:
{entry['input']}

### Response:
{entry['output']}"""
    else:
        prompt = f"""Below is an instruction that describes a medical task. Write a response that appropriately completes the request.

### Instruction:
{entry['instruction']}

### Response:
{entry['output']}"""
    
    return prompt

# Format all entries
print("🔄 Formatting data for training...")
formatted_data = []
for entry in data:
    formatted_text = format_instruction(entry)
    formatted_data.append({
        'text': formatted_text,
        'task_type': entry['task_type'],
        'topic': entry['topic']
    })

# Create train/val split
train_size = int(0.8 * len(formatted_data))
train_data = formatted_data[:train_size]
val_data = formatted_data[train_size:]

print(f"✅ Formatted {len(formatted_data)} examples")
print(f"  Training: {len(train_data)} examples")
print(f"  Validation: {len(val_data)} examples")

# Convert to HuggingFace Dataset
train_dataset = Dataset.from_list(train_data)
val_dataset = Dataset.from_list(val_data)

print(f"\n📝 Sample Formatted Training Example:")
print(train_dataset[0]['text'][:500] + "...")
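
The split above takes the first 80% of the examples in file order. If the JSONL file happens to be grouped by task type or topic, the validation set may not resemble the training set. One possible alternative, sketched below, is to shuffle with a fixed seed before splitting; `shuffled_split` is a hypothetical helper and the seed value is arbitrary.

```python
import random

def shuffled_split(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of items with a fixed seed, then split it in two."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Demonstration on ten placeholder items.
train_part, val_part = shuffled_split(range(10))
print(len(train_part), len(val_part))
```

In the notebook this would be applied as `train_data, val_data = shuffled_split(formatted_data)` before building the `Dataset` objects. Using a local `random.Random` instance keeps the split reproducible without touching the global random state.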

## 5. Load Base Model and Tokenizer

In [None]:
# Model configuration
model_name = "Qwen/Qwen2.5-3B-Instruct"
output_dir = "./medical-study-assistant-lora"

# Quantization configuration for memory efficiency
# float16 compute matches fp16=True in the training arguments;
# P100 and T4 GPUs have no native bfloat16 support
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

# Load tokenizer
print(f"🔤 Loading tokenizer from {model_name}...")
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    padding_side="right",
    add_eos_token=True
)

# Set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load base model
print(f"🤖 Loading base model from {model_name}...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16
)

# Prepare the quantized model for training: this enables gradient checkpointing
# and makes the inputs require gradients so checkpointing works with frozen weights
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
model.config.use_cache = False

print(f"✅ Model loaded successfully!")
print(f"📊 Model parameters: {model.num_parameters() / 1e6:.1f}M")
print(f"💾 GPU memory allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GB")
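
To see why 4-bit loading matters on a 16 GB GPU, a back-of-the-envelope estimate of the weight memory can be useful. The sketch below assumes a round figure of 3e9 parameters (the exact count is whatever the cell above prints) and ignores quantization constants, layers that stay unquantized, activations, and optimizer state, so treat the numbers as a lower bound. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

# Assumed round figure of 3e9 parameters, for illustration only.
for bits in (16, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(3e9, bits):.1f} GB")
```

Comparing this estimate with the `torch.cuda.memory_allocated()` figure printed above gives a rough sense of how much overhead sits on top of the raw weights.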

## 6. Configure LoRA

In [None]:
# LoRA configuration
lora_config = LoraConfig(
    r=16,  # rank
    lora_alpha=32,  # alpha scaling parameter
    target_modules=[
        "q_proj",
        "k_proj", 
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head"
    ],
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

# Apply LoRA to model
print("🔧 Applying LoRA configuration...")
model = get_peft_model(model, lora_config)

# Count trainable parameters
def count_trainable_parameters(model):
    """Return (trainable, total) parameter counts for the model."""
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    return trainable_params, all_param

trainable_params, all_param = count_trainable_parameters(model)
print(f"✅ LoRA applied successfully!")
print(f"📊 Trainable parameters: {trainable_params / 1e6:.2f}M")
print(f"📊 Total parameters: {all_param / 1e6:.2f}M")
print(f"📊 Trainable %: {100 * trainable_params / all_param:.2f}%")
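
The small trainable percentage follows from how LoRA is built: for a frozen weight matrix of shape `d_out x d_in`, it trains two low-rank matrices, `A` (`r x d_in`) and `B` (`d_out x r`), and scales their product by `lora_alpha / r`. The sketch below works through that arithmetic for a single layer. The layer size of 2048 is an assumption for illustration, not a value read from the model, and `lora_params` is a hypothetical helper.

```python
def lora_params(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

r, alpha = 16, 32      # values from the LoraConfig above
d_in = d_out = 2048    # assumed layer size, for illustration only

added = lora_params(d_in, d_out, r)
full = d_in * d_out
print(f"adapter params: {added:,} vs full matrix: {full:,} ({100 * added / full:.2f}%)")
print(f"update scale alpha / r = {alpha / r}")
```

Doubling `r` doubles the adapter size, while changing `lora_alpha` only changes how strongly the learned update is weighted.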

## 7. Training Configuration

In [None]:
# Training arguments
training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=5e-4,
    warmup_steps=100,
    logging_steps=10,
    evaluation_strategy="steps",
    eval_steps=100,
    save_strategy="steps",
    save_steps=200,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    max_grad_norm=1.0,
    group_by_length=True,
    dataloader_pin_memory=False,
    remove_unused_columns=False,
    fp16=True,  # Use fp16 for faster training
    report_to="none",  # Disable wandb and other loggers (report_to=None would enable all installed integrations)
    # optim="adamw_torch",
    # seed=42,
)

print("⚙️ Training Configuration:")
print(f"  Batch size: {training_args.per_device_train_batch_size}")
print(f"  Gradient accumulation: {training_args.gradient_accumulation_steps}")
print(f"  Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"  Learning rate: {training_args.learning_rate}")
print(f"  Epochs: {training_args.num_train_epochs}")
print(f"  Total steps: {len(train_dataset) // (training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps) * training_args.num_train_epochs}")
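
Since `warmup_steps` is an absolute number, it is worth checking it against the total number of optimizer updates: on a small dataset, 100 warmup steps can take up a large share of the run. The sketch below estimates the update count in roughly the way the Hugging Face `Trainer` does (batches per epoch, integer-divided by the accumulation steps); it is an approximation, `optimizer_steps` is a hypothetical helper, and the dataset size of 1000 is made up.

```python
import math

def optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Estimate the number of optimizer updates for a full training run."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)

# 1000 examples is a made-up dataset size, used only to show the arithmetic.
total = optimizer_steps(1000, per_device_batch=2, grad_accum=8, epochs=3)
warmup_steps = 100  # value from TrainingArguments above
print(f"total updates: {total}, warmup share: {warmup_steps / total:.0%}")
```

If the warmup share turns out to be large for the real dataset, lowering `warmup_steps` or switching to a ratio-based warmup would be reasonable options to consider.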

## 8. Initialize Trainer

In [None]:
# Initialize trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    args=training_args,
    dataset_text_field="text",
    max_seq_length=2048,
    packing=False,  # Don't pack sequences
    dataset_kwargs={
        "add_special_tokens": False,
        "append_concat_token": False,
    }
)

print("✅ Trainer initialized successfully!")
print(f"📊 Training dataset size: {len(train_dataset)}")
print(f"📊 Validation dataset size: {len(val_dataset)}")
print(f"💾 GPU memory allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GB")

## 9. Start Training

In [None]:
# Start training
print("🚀 Starting training...")
print("⏰ This may take 30-60 minutes depending on your GPU")
print("📊 Monitor the loss values to ensure training is progressing")

try:
    trainer.train()
    print("✅ Training completed successfully!")
except Exception as e:
    print(f"❌ Training failed with error: {str(e)}")
    raise

## 10. Save Model

In [None]:
# Save the trained model
print("💾 Saving trained model...")
trainer.save_model()
tokenizer.save_pretrained(output_dir)

# Save training metrics
training_metrics = trainer.state.log_history
with open(f"{output_dir}/training_metrics.json", 'w') as f:
    json.dump(training_metrics, f, indent=2)

print(f"✅ Model saved to {output_dir}")
print(f"📊 Training metrics saved to {output_dir}/training_metrics.json")
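
The saved `log_history` is a list of dictionaries, where evaluation entries carry an `eval_loss` key alongside the `step` at which they were logged. A quick way to find the best checkpoint step is sketched below; `best_eval_entry` is a hypothetical helper and the example log entries are invented to show the shape of the data.

```python
def best_eval_entry(log_history):
    """Return the logged entry with the lowest eval_loss, or None if there is none."""
    evals = [entry for entry in log_history if "eval_loss" in entry]
    return min(evals, key=lambda entry: entry["eval_loss"]) if evals else None

# Made-up entries in the same list-of-dicts shape as trainer.state.log_history.
example_log = [
    {"loss": 1.92, "step": 10},
    {"eval_loss": 1.40, "step": 100},
    {"eval_loss": 1.25, "step": 200},
    {"eval_loss": 1.31, "step": 300},
]
print(best_eval_entry(example_log))
```

Calling `best_eval_entry(training_metrics)` on the real history shows which evaluation step had the lowest loss; a rising `eval_loss` after that point would suggest overfitting.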

## 11. Test the Model

In [None]:
# Test the trained model
def test_model(instruction, input_text=""):
    """Test the model with a given instruction."""
    
    # Format the prompt
    if input_text.strip():
        prompt = f"""Below is an instruction that describes a medical task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input_text}

### Response:
"""
    else:
        prompt = f"""Below is an instruction that describes a medical task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""
    
    # Tokenize
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=500,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    
    # Decode only the newly generated tokens so the prompt is not echoed back
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
    
    return response.strip()

# Test questions
test_questions = [
    "What are the key clinical features of tuberculosis?",
    "How is HIV diagnosed?",
    "What are the treatment options for infective endocarditis?",
    "Describe the pathophysiology of brain abscess.",
    "Create a study guide for fungal infections."
]

print("🧪 Testing the fine-tuned model...")
print("=" * 60)

for i, question in enumerate(test_questions, 1):
    print(f"\n🔍 Test {i}: {question}")
    print("-" * 40)
    
    try:
        answer = test_model(question)
        print(f"💡 Answer: {answer[:300]}...")
    except Exception as e:
        print(f"❌ Error: {str(e)}")
    
    print()
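
The prompt template appears twice in this notebook, once for training data and once for inference, and the two copies must stay identical for the model to see the format it was trained on. One way to reduce that risk is a single shared builder, sketched below. `build_prompt`, `INSTRUCTION_ONLY`, and `WITH_INPUT` are hypothetical names; the template wording is copied from the cells above.

```python
INSTRUCTION_ONLY = (
    "Below is an instruction that describes a medical task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)
WITH_INPUT = (
    "Below is an instruction that describes a medical task, paired with an input "
    "that provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
)

def build_prompt(instruction, input_text="", output=None):
    """Build the prompt; append the answer only when formatting training data."""
    template = WITH_INPUT if input_text.strip() else INSTRUCTION_ONLY
    prompt = template.format(instruction=instruction, input_text=input_text)
    return prompt if output is None else prompt + output

print(build_prompt("How is HIV diagnosed?"))
```

With `output=None` the result ends at `### Response:` and is ready for generation; passing the reference answer as `output` yields the full training text.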

## 12. Export Model for Download

In [None]:
# Write a README, then create a zip archive of the model for download
import shutil

# Create a simple README for the model (written before archiving so it is included)
readme_content = f"""# Medical Study Assistant - LoRA Fine-tuned Model

This model is a fine-tuned version of Qwen2.5-3B-Instruct specialized for medical study assistance.

## Model Details
- **Base Model**: Qwen/Qwen2.5-3B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Training Data**: {len(train_dataset)} medical Q&A and study guide examples
- **Domain**: Infectious Diseases
- **Tasks**: Question Answering, Study Guide Generation

## Usage
1. Extract the model files
2. Load using HuggingFace Transformers and PEFT
3. Use the instruction format for best results

## Training Configuration
- LoRA Rank: 16
- LoRA Alpha: 32
- Learning Rate: 5e-4
- Training Epochs: 3
- Batch Size: 2 (effective: 16 with gradient accumulation)

## Performance
- Specializes in infectious disease topics
- Generates exam-focused study materials
- Provides detailed medical explanations

---
Generated using Kaggle GPU on {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}
"""

with open(f"{output_dir}/README.md", 'w') as f:
    f.write(readme_content)

print(f"📄 README.md created with model information")

print("📦 Creating model archive for download...")

# Create archive
archive_name = "medical-study-assistant-lora"
shutil.make_archive(archive_name, 'zip', output_dir)

# Check file size
archive_path = f"{archive_name}.zip"
archive_size = os.path.getsize(archive_path) / 1024**2  # MB

print(f"✅ Model archive created: {archive_path}")
print(f"📊 Archive size: {archive_size:.1f} MB")
print(f"📥 Download this file to use the model locally")

print(f"🎉 Fine-tuning completed successfully!")
print(f"\n📋 Next Steps:")
print(f"1. Download the model archive ({archive_path})")
print(f"2. Extract and integrate into your local environment")
print(f"3. Test with your specific medical questions")
print(f"4. Consider further fine-tuning on additional data")
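
Before downloading, it can be worth confirming that the archive actually contains the files you expect, such as the README and the adapter configuration. The sketch below lists the members of a zip file with the standard library; `archive_members` and `missing_from_archive` are hypothetical helpers, and the expected file names in the usage note are assumptions about what was saved.

```python
import zipfile

def archive_members(archive_path):
    """Return the sorted file names stored in a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(archive.namelist())

def missing_from_archive(archive_path, expected):
    """Return the expected file names that are absent from the archive."""
    present = set(archive_members(archive_path))
    return [name for name in expected if name not in present]
```

For example, `missing_from_archive(archive_path, ["adapter_config.json", "README.md"])` returns an empty list when both files are present at the top level of the archive.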