# BMW E30 M3 Service Manual - QLoRA Finetuning

This notebook finetunes **Llama-3.1-8B-Instruct** on BMW service manual data using QLoRA.

## Dataset Statistics
- Training: 2,510 examples (all service manual data)
- Validation: 248 examples (synthetic)
- Tasks: SPEC, PROCEDURE, EXPLANATION, WIRING, TROUBLESHOOTING
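
The task tags listed above appear as a bracketed prefix in each example's text (see the flat format described in Cell 3). A minimal sketch for tallying examples per task, assuming every record starts with `User: [TAG]`; the `count_tasks` helper is hypothetical and not part of the notebook:

```python
import re
from collections import Counter

# Matches the bracketed task tag right after the "User: " prefix.
TASK_TAG = re.compile(r"^User: \[([A-Z]+)\]")

def count_tasks(texts):
    """Count examples per task tag; records without a tag are counted as 'UNKNOWN'."""
    counts = Counter()
    for text in texts:
        match = TASK_TAG.match(text)
        counts[match.group(1) if match else "UNKNOWN"] += 1
    return counts
```

Once the datasets are loaded, `count_tasks(train_dataset["text"])` should show whether the five tasks are reasonably balanced.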

## Requirements
- GPU: T4 (Colab free), A100 (Colab Pro), or better
- VRAM: ~16-20 GB (8B model with QLoRA)
- Time: ~2-3 hours on T4, ~45-60 min on A100
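
As a rough sanity check on the VRAM figure, the quantized weights are only one part of the budget. The sketch below estimates weight memory alone, assuming about 8 billion parameters; it ignores quantization constants, LoRA adapters, optimizer state, activations, and CUDA overhead, which account for the rest. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 8e9  # assumption: roughly 8 billion parameters

print(f"4-bit (NF4): {weight_memory_gb(N_PARAMS, 4):.1f} GB")
print(f"16-bit:      {weight_memory_gb(N_PARAMS, 16):.1f} GB")
```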

## Cell 1: Setup & Authentication

In [None]:
# Install required packages
!pip install -q accelerate peft bitsandbytes transformers trl datasets wandb

# Authenticate with HuggingFace
from huggingface_hub import notebook_login
notebook_login()  # Enter your HF token when prompted

## Cell 2: Mount Google Drive (for dataset upload)

**Before running**: Upload the following to Google Drive at `/content/drive/MyDrive/bmw_finetuning/data/`:
- `hf_train_autotrain.jsonl` (2,510 examples)
- `hf_val_synthetic.jsonl` (248 examples)
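
Beyond checking that the files exist, it can help to confirm that each upload is complete. A minimal sketch, assuming one JSON object per line with a `text` field; `count_jsonl_records` is a hypothetical helper:

```python
import json

def count_jsonl_records(path):
    """Count records in a JSONL file, checking that each line parses and has a 'text' field."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # Ignore blank lines
            record = json.loads(line)
            if "text" not in record:
                raise ValueError(f"{path}:{line_number} has no 'text' field")
            count += 1
    return count
```

The counts should come back as 2,510 for the training file and 248 for the validation file; anything lower suggests a truncated upload.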

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Verify files exist
import os
base_path = '/content/drive/MyDrive/bmw_finetuning'
required_files = [
    f'{base_path}/data/hf_train_autotrain.jsonl',
    f'{base_path}/data/hf_val_synthetic.jsonl'
]

print("Checking required files...")
for file in required_files:
    if os.path.exists(file):
        print(f"✅ {file}")
    else:
        print(f"❌ MISSING: {file}")

## Cell 3: Load Datasets

The data is already in flat text format: `{"text": "User: [TASK] Q\nAssistant: A"}`
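
To make the format concrete, the sketch below builds one training record and the matching inference prompt. Both helper names are hypothetical, and the answer string is a placeholder rather than real manual data.

```python
def build_example(task, question, answer):
    """Build one training record in the flat text format."""
    return {"text": f"User: [{task}] {question}\nAssistant: {answer}"}

def build_prompt(task, question):
    """Build the inference prompt: the same text, stopping where the answer would begin."""
    return f"User: [{task}] {question}\nAssistant:"

record = build_example("SPEC", "What is the torque for cylinder head bolts?", "<answer from the manual>")
print(record["text"])
```

Prompting the fine-tuned model with the output of `build_prompt` keeps inference consistent with what the model saw during training.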

In [None]:
import json
from datasets import Dataset

# Load datasets (already in flat text format)
def load_jsonl(path):
    data = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # Skip blank lines (e.g. a trailing newline)
                data.append(json.loads(line))
    return Dataset.from_list(data)

train_dataset = load_jsonl('/content/drive/MyDrive/bmw_finetuning/data/hf_train_autotrain.jsonl')
val_dataset = load_jsonl('/content/drive/MyDrive/bmw_finetuning/data/hf_val_synthetic.jsonl')

print(f"✅ Loaded datasets - Train: {len(train_dataset)}, Val: {len(val_dataset)}")

# Show sample
print("\n📝 Sample training example:")
sample = train_dataset[0]
print(f"Text: {sample['text'][:200]}...")

## Cell 4: Load Llama 3.1 8B with QLoRA

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import torch

model_name = "meta-llama/Llama-3.1-8B-Instruct"
print(f"🔄 Loading model: {model_name}")

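# Pick a compute dtype the GPU supports natively: bfloat16 on Ampere or newer
# (compute capability >= 8, e.g. A100), float16 on older cards such as the T4
compute_dtype = torch.bfloat16 if torch.cuda.get_device_capability()[0] >= 8 else torch.float16

# 4-bit quantization configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=True
)

# Load base model in 4-bit
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=compute_dtype
)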

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name, 
    trust_remote_code=True
)

# Set padding token (Llama doesn't have one by default)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

print(f"✅ Tokenizer loaded - Vocab size: {len(tokenizer)}")

# Prepare model for k-bit training
model.config.use_cache = False  # Disable cache for training
model = prepare_model_for_kbit_training(model)

# Configure QLoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA adapters
model = get_peft_model(model, lora_config)

# Print trainable parameters
print("\n📊 Trainable parameters:")
model.print_trainable_parameters()

## Cell 5: Verify Data Format

The data is already in flat text format, so no chat-template formatting is needed.
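
Since the trainer consumes the text as-is, a quick structural check can catch malformed records before training starts. A minimal sketch, assuming each record follows `User: [TASK] question`, a newline, then `Assistant: answer`; the `find_malformed` helper is hypothetical:

```python
import re

# "User: [TAG] question", a newline, then "Assistant: answer" (answers may span lines).
FLAT_FORMAT = re.compile(r"^User: \[[A-Z]+\] .+\nAssistant: .+", re.DOTALL)

def find_malformed(texts):
    """Return the indices of records that do not match the flat text format."""
    return [i for i, text in enumerate(texts) if not FLAT_FORMAT.match(text)]
```

An empty result from `find_malformed(train_dataset["text"])` means every record has both turns and a task tag.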

In [None]:
# Data is already in flat text format {"text": "User: ...\nAssistant: ..."}
# No formatting needed - ready for training!

print("✅ Data already formatted in flat text format")
print(f"\n📝 Sample:\n{train_dataset[0]['text'][:300]}...")

## Cell 6: Configure Training Arguments

In [None]:
from transformers import TrainingArguments

output_dir = "./bmw_e30_qlora_results"

training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,  # Effective batch = 16
    gradient_checkpointing=True,  # Saves memory
    learning_rate=2e-4,
    warmup_ratio=0.1,
    weight_decay=0.01,
    logging_steps=10,
    logging_dir=f"{output_dir}/logs",
    save_strategy="epoch",
    eval_strategy="epoch",
    eval_steps=None,
    save_total_limit=2,  # Keep only the 2 most recent checkpoints
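    fp16=torch.cuda.get_device_capability()[0] < 8,  # float16 mixed precision on pre-Ampere GPUs (e.g. T4)
    bf16=torch.cuda.get_device_capability()[0] >= 8,  # bfloat16 on Ampere or newer (e.g. A100)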
    optim="paged_adamw_8bit",
    report_to="none",  # Change to "wandb" if you want tracking
    push_to_hub=False,
    max_grad_norm=0.3,  # Gradient clipping
    lr_scheduler_type="cosine"
)

effective_batch = training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps
total_steps = (len(train_dataset) // effective_batch) * training_args.num_train_epochs

print("✅ Training arguments configured")
print(f"📊 Effective batch size: {effective_batch}")
print(f"⏱️  Total training steps: ~{total_steps}")
print(f"🔥 Warmup steps: ~{int(total_steps * training_args.warmup_ratio)}")

## Cell 7: Initialize Trainer and Start Training

**Expected training time**:
- T4 (Colab free): ~2-3 hours
- A100: ~45-60 minutes
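
These estimates can be turned into a per-step budget. The sketch below recomputes the optimizer step count from the Cell 6 settings (batch size 4, gradient accumulation 4, 3 epochs) and derives the seconds per step that each estimate implies. It assumes a single GPU and ignores evaluation and checkpointing time; the trainer's own rounding may differ slightly, and `optimizer_steps` is a hypothetical helper.

```python
import math

def optimizer_steps(n_examples, per_device_batch, grad_accum, epochs):
    """Approximate optimizer steps on a single GPU, rounding partial batches up."""
    batches_per_epoch = math.ceil(n_examples / per_device_batch)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs

steps = optimizer_steps(2510, per_device_batch=4, grad_accum=4, epochs=3)
print(f"Optimizer steps: {steps}")  # 157 steps per epoch x 3 epochs = 471

for label, hours in [("T4, 2 h", 2), ("T4, 3 h", 3), ("A100, 45 min", 0.75), ("A100, 60 min", 1)]:
    print(f"{label}: ~{hours * 3600 / steps:.1f} s/step")
```

If the progress bar shows a much slower rate than these figures after warmup, the run is unlikely to finish within the stated window.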

In [None]:
from trl import SFTTrainer

# Formatting function to extract text from dataset
def formatting_func(example):
    return example["text"]

# Initialize trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    args=training_args,
    formatting_func=formatting_func,
    max_seq_length=512  # Truncate to 512 tokens; recent TRL releases expect the length limit on SFTConfig instead
)

print("🚀 Starting training...")
print("⏱️  Estimated time: 2-3 hours on T4, ~45-60 min on A100\n")

# Train the model
trainer.train()

print("\n✅ Training complete!")

## Cell 8: Evaluate on Validation Set

In [None]:
# Evaluate
print("📊 Evaluating on validation set...")
eval_results = trainer.evaluate()

print("\n📈 Evaluation Results:")
for key, value in eval_results.items():
    print(f"  {key}: {value:.4f}")

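
The raw loss can be easier to interpret as perplexity. A minimal sketch, assuming `eval_loss` is the mean per-token cross-entropy in nats; the `perplexity` helper is hypothetical:

```python
import math

def perplexity(eval_loss: float) -> float:
    """Convert a mean cross-entropy loss (nats per token) to perplexity."""
    return math.exp(eval_loss)
```

For example, `perplexity(eval_results["eval_loss"])` gives a number that can be compared across checkpoints; lower is better.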
## Cell 9: Save Model Locally

In [None]:
# Save the fine-tuned model
save_dir = "./bmw_e30_m3_service_manual"

print(f"💾 Saving model to {save_dir}...")
trainer.save_model(save_dir)
tokenizer.save_pretrained(save_dir)

print("✅ Model saved locally")

# Also save to Google Drive for persistence
import shutil
drive_save_dir = '/content/drive/MyDrive/llm3/models/bmw_e30_m3_service_manual'
print(f"\n💾 Copying to Google Drive: {drive_save_dir}...")
shutil.copytree(save_dir, drive_save_dir, dirs_exist_ok=True)
print("✅ Model saved to Google Drive")

## Cell 10: Test Inference (Quick Validation)

In [None]:
# Quick inference test
def test_model(prompt):
    # Use the same flat "User: ...\nAssistant:" format the model was trained on,
    # not the Llama chat template
    text = f"User: {prompt}\nAssistant:"

    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    model.eval()  # Disable dropout for generation
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=100,
            temperature=0.7,
            do_sample=True,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens, skipping the prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Test cases
print("🧪 Testing model with sample queries:\n")

test_queries = [
    "[SPEC] What is the torque for cylinder head bolts?",
    "[PROCEDURE] How do you adjust valve clearance?",
    "[EXPLANATION] Explain the Motronic control unit operation"
]

for query in test_queries:
    print(f"❓ Query: {query}")
    print(f"💬 Response: {test_model(query)}\n")
    print("-" * 80 + "\n")

## Cell 11: Push to HuggingFace Hub (Optional)

**Note**: Replace `your-username` in `hub_model_id` with your Hugging Face username before running!
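
A small guard can stop the push from running with the placeholder still in place. This is a sketch only: the `check_hub_model_id` helper is hypothetical, and the character pattern is an assumption about typical repository names rather than the Hub's official validation rule.

```python
import re

# Assumed shape: "<namespace>/<repo>" made of letters, digits, '.', '_' and '-'.
REPO_ID = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*$")

def check_hub_model_id(repo_id: str) -> str:
    """Return repo_id unchanged, or raise if it is malformed or still the placeholder."""
    if not REPO_ID.match(repo_id):
        raise ValueError(f"Expected '<username>/<repo>', got {repo_id!r}")
    if repo_id.startswith("your-username/"):
        raise ValueError("Replace 'your-username' with your Hugging Face username")
    return repo_id
```

Calling `check_hub_model_id(hub_model_id)` before the push fails fast instead of attempting an upload to a namespace you do not own.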

In [None]:
# Push to HuggingFace Hub
hub_model_id = "your-username/llm3"  # ⚠️ Change to your username!

print(f"🚀 Pushing model to HuggingFace Hub: {hub_model_id}")
print("⏱️  This may take a few minutes...\n")

# Uses the token stored by notebook_login() in Cell 1
# (the use_auth_token argument is deprecated)
model.push_to_hub(hub_model_id)
tokenizer.push_to_hub(hub_model_id)

print("✅ Model successfully pushed!")
print(f"🔗 View at: https://huggingface.co/{hub_model_id}")

## Cell 12: Load from Hub (Test Deployment)

In [None]:
# Test loading from HuggingFace Hub
from peft import PeftModel, PeftConfig

print(f"🔄 Loading model from Hub: {hub_model_id}")

# Load config
peft_config = PeftConfig.from_pretrained(hub_model_id)

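# Load base model in 4-bit (reusing bnb_config from Cell 4); an unquantized
# 16-bit copy of an 8B model needs about 16 GB and will not fit on a T4
base_model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16
)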

# Load adapter
model_from_hub = PeftModel.from_pretrained(base_model, hub_model_id)

print("✅ Model loaded from Hub successfully!")

# Quick test (same flat prompt format as the training data)
test_prompt = "User: [SPEC] What is the engine displacement?\nAssistant:"
print(f"\n🧪 Test query: {test_prompt}")
inputs = tokenizer(test_prompt, return_tensors="pt").to(model_from_hub.device)
outputs = model_from_hub.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"💬 Response: {response}")

## Next Steps

1. **Evaluate thoroughly**: Test on diverse queries from the validation set
2. **Monitor for overfitting**: Check if train/eval loss diverged
3. **Adjust hyperparameters** if needed:
   - Increase LoRA rank (16 → 32) if underfitting
   - Increase dropout (0.05 → 0.1) if overfitting
   - Train for more epochs if the loss is still decreasing
4. **Deploy**: Use the model from HuggingFace Hub for inference
5. **Collect feedback**: Test with real BMW technicians if possible
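
To see what the rank change in step 3 costs, the sketch below counts LoRA trainable parameters for the four attention projections targeted in Cell 4. Each adapted linear layer of shape `(d_in, d_out)` adds `r * (d_in + d_out)` weights. The layer shapes are assumptions based on the published Llama-3.1-8B configuration (hidden size 4096, 32 layers, 8 key/value heads of dimension 128); `lora_trainable_params` is a hypothetical helper.

```python
def lora_trainable_params(rank, layer_shapes, n_layers):
    """Each adapted (d_in, d_out) linear layer adds rank * (d_in + d_out) weights."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes.values())
    return per_layer * n_layers

# Assumed attention projection shapes (input dim, output dim) for Llama-3.1-8B.
SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),  # grouped-query attention: 8 KV heads x 128 dims
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}

for rank in (16, 32):
    total = lora_trainable_params(rank, SHAPES, n_layers=32)
    print(f"r={rank}: {total:,} trainable parameters")
```

The r=16 figure can be compared against the output of `model.print_trainable_parameters()` in Cell 4; doubling the rank doubles the adapter size.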

## Resources

- [QLoRA Paper](https://arxiv.org/abs/2305.14314)
- [Llama 3.1 Model Card](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- [PEFT Documentation](https://huggingface.co/docs/peft)
- [TRL Documentation](https://huggingface.co/docs/trl)