# Part 2: "The Intern" (Fine-Tuning)

## Project 01 - Operation Ledger-Mind
**Course Module:** Weeks 01-03 (Prompt Engineering, Fine-Tuning, Advanced RAG)
**Scenario:** Financial Analysis of Uber Technologies (2024 Annual Report)

### 📋 Technical Requirements Checklist:
- [x] **Hugging Face Ecosystem**: transformers, peft, trl, bitsandbytes
- [x] **Base Model**: Qwen/Qwen2.5-1.5B-Instruct (small enough to fine-tune on a T4)
- [x] **Quantization**: 4-bit NF4 with double quantization
- [x] **Adapter Config**: LoRA (Targets: q_proj, k_proj, v_proj, o_proj)
- [x] **Training**: SFTTrainer for 100 steps
- [x] **Inference**: `query_intern(question)`

## 0. Install Dependencies (Colab Only)

Uncomment and run if you are in Google Colab.

In [None]:
# !pip install -U transformers datasets accelerate peft trl bitsandbytes python-dotenv pyyaml rouge-score

## 1. Setup & Environment Diagnostics

Verifying hardware compatibility and loading configuration.

In [None]:
import torch
import os
import sys
import yaml
from dotenv import load_dotenv
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer
from datasets import load_dataset

# Load environment variables
load_dotenv("../.env")
hf_token = os.getenv("HF_TOKEN")

# Load Project Config
with open("../src/config/config.yaml", "r") as f:
    config = yaml.safe_load(f)

print("="*60)
print("ENVIRONMENT & GPU CHECK")
print("="*60)
print(f"PyTorch Version: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"Device Name: {torch.cuda.get_device_name(0)}")
    print(f"Total VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
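    # The T4 (Turing, compute capability 7.5) has no native bfloat16 support, so it uses FP16.
    # Ampere and newer GPUs (compute capability 8.0+, e.g. RTX 30-series) support BF16 natively.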
    print(f"BFloat16 Support: {torch.cuda.is_bf16_supported()}")
else:
    print("⚠️ WARNING: No CUDA GPU detected.")
print("="*60)
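
The BF16 check above can also be expressed through the GPU's compute capability, which is useful because `torch.cuda.is_bf16_supported()` may, depending on the PyTorch version, count emulated support. The sketch below is an illustration rather than project code: `choose_precision` is a hypothetical helper, and it assumes native BF16 starts at compute capability 8.0 (Ampere).

```python
def choose_precision(capability):
    """Return "bf16" for GPUs with native bfloat16 support, else "fp16".

    `capability` is a (major, minor) tuple, such as the value returned by
    torch.cuda.get_device_capability(0). Hypothetical helper for illustration.
    """
    major, _minor = capability
    # Assumption: native BF16 begins with Ampere (compute capability 8.0).
    return "bf16" if major >= 8 else "fp16"


print(choose_precision((7, 5)))  # Tesla T4 (Turing)
print(choose_precision((8, 0)))  # A100 (Ampere)
```

In the notebook, the tuple would come from `torch.cuda.get_device_capability(0)` after confirming that CUDA is available.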

## 2. Model & Quantization Implementation

Implementing 4-bit NF4 quantization with double quantization per assessment specifications.

In [None]:
base_model_id = config.get("base_model", "Qwen/Qwen2.5-1.5B-Instruct")

# 4-bit Quantization Config (NF4, double quant)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    token=hf_token
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id, token=hf_token)
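# Fall back to EOS for padding only if the tokenizer has no pad token of its own.
# Reusing EOS as the pad token can cause end-of-turn tokens to be masked out of the loss.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token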
tokenizer.padding_side = "right"
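
To see what double quantization buys, it helps to work out the storage cost per weight. The arithmetic below is a rough sketch under stated assumptions: it uses the block sizes described in the QLoRA paper (64 weights per block, 256 constants per second-level block), `bits_per_param` is a hypothetical helper, and `n_params` is a round placeholder rather than the exact parameter count. Layers that stay unquantized (embeddings, norms) are ignored, so the real footprint will be higher.

```python
def bits_per_param(block_size=64, double_quant=True, second_block_size=256):
    """Approximate storage cost per weight for 4-bit blockwise quantization."""
    weight_bits = 4.0
    if double_quant:
        # The per-block constants are themselves quantized to 8 bits, with one
        # 32-bit constant shared by each group of `second_block_size` constants.
        constant_bits = 8 / block_size + 32 / (block_size * second_block_size)
    else:
        # One 32-bit scaling constant per block of weights.
        constant_bits = 32 / block_size
    return weight_bits + constant_bits


n_params = 1.5e9  # assumption: round figure for a 1.5B-parameter model
for dq in (False, True):
    bits = bits_per_param(double_quant=dq)
    gib = n_params * bits / 8 / 1024**3
    print(f"double_quant={dq}: {bits:.3f} bits/param, ~{gib:.2f} GiB")
```

Under these assumptions the saving is about 0.37 bits per weight, which is small per parameter but adds up on a memory-constrained GPU.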

## 3. LoRA Configuration (The Adapters)

Injecting trainable low-rank adapter matrices (LoRA) into the attention projection layers.

In [None]:
peft_config = LoraConfig(
    r=config.get("lora_r", 16),
    lora_alpha=config.get("lora_alpha", 32),
    lora_dropout=config.get("lora_dropout", 0.05),
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM"
)

model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
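
As a sanity check on the number that `print_trainable_parameters()` reports, the adapter size can be estimated by hand: each adapted linear layer gains two matrices of shape `r x d_in` and `d_out x r`. The sketch below assumes layer shapes for Qwen2.5-1.5B (hidden size 1536, key/value projection width 256, 28 layers); these values are assumptions and should be confirmed against `model.config`. `lora_params` is a hypothetical helper.

```python
def lora_params(d_in, d_out, r):
    """Parameters added by LoRA to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Assumed shapes; verify against model.config before relying on them.
hidden = 1536
kv_dim = 256      # assumed: 2 key/value heads x head dimension 128
n_layers = 28
r = 16            # default lora_r used in this notebook

shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
}

per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())
total = per_layer * n_layers
print(f"Per layer: {per_layer:,}  Total: {total:,}")
```

If the printed total differs from what PEFT reports, the assumed shapes or the configured rank are the first things to check.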

## 4. Dataset Loading & Formatting

Formatting the generated Uber instruction data into ChatML structure.

In [None]:
train_path = "../artifacts/train_data/train.jsonl"
if not os.path.exists(train_path):
    # Fall back to the training-data directory named in the project config
    print(f"⚠️ {train_path} not found. Searching in config...")
    train_path = os.path.join("..", config['train_data_path'], 'train.jsonl')

dataset = load_dataset("json", data_files=train_path, split="train")

def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['question'])):
        # ChatML Structure
        text = f"<|im_start|>system\nYou are a professional financial analyst assistant. Answer questions based on Uber's 2024 Annual Report.<|im_end|>\n<|im_start|>user\n{example['question'][i]}<|im_end|>\n<|im_start|>assistant\n{example['answer'][i]}<|im_end|>"
        output_texts.append(text)
    return output_texts

print(f"✅ Loaded {len(dataset)} training examples.")
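
The same ChatML template appears in both the training formatter and the inference prompt, so any drift between the two copies would silently hurt results. One way to avoid that, sketched here with hypothetical names (`SYSTEM_PROMPT`, `build_chatml`) that are not part of the project, is to build both strings from a single function.

```python
SYSTEM_PROMPT = (
    "You are a professional financial analyst assistant. "
    "Answer questions based on Uber's 2024 Annual Report."
)


def build_chatml(question, answer=None, system=SYSTEM_PROMPT):
    """Return a ChatML string.

    With `answer`, the result is a full training example. Without it, the
    result ends at the open assistant turn, ready for generation.
    """
    prompt = (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    if answer is None:
        return prompt
    return f"{prompt}{answer}<|im_end|>"


# Placeholder strings for illustration only, not real training data.
print(build_chatml("Example question?", "Example answer."))
```

Because the training string is the inference prompt plus the answer and a closing tag, the model sees exactly the prefix at inference time that it was trained on.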

## 5. Training Execution (The Intern Learns)

Executing the SFT (Supervised Fine-Tuning) loop for 100 steps.

In [None]:
training_args = TrainingArguments(
    output_dir="../artifacts/intern_checkpoints",
    per_device_train_batch_size=1, 
    gradient_accumulation_steps=64, 
    learning_rate=2e-5,
    logging_steps=10,
    max_steps=100, 
    save_steps=50,
    optim="paged_adamw_8bit",
    fp16=torch.cuda.is_available() and not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_available() and torch.cuda.is_bf16_supported(),
    report_to="none"
)

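trainer = SFTTrainer(
    model=model,  # already wrapped by get_peft_model above, so peft_config is not passed again
    train_dataset=dataset,
    formatting_func=formatting_prompts_func,
    args=training_args,
    # Note: recent TRL releases expect the sequence-length limit on SFTConfig
    # rather than as an SFTTrainer argument; adjust if this raises a TypeError.
    max_seq_length=1024
)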

trainer.train()
trainer.save_model("../artifacts/intern_final_adapter")
print("✅ Training Complete. Adapters saved to artifacts/intern_final_adapter")
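
The training arguments above fix the number of optimizer steps, not the number of epochs, so it is worth working out how much data the run actually covers. The sketch below is plain arithmetic; `training_budget` is a hypothetical helper and `n_examples=1600` is a placeholder, since the real dataset size is only known at runtime from `len(dataset)`.

```python
def training_budget(per_device_batch, grad_accum, max_steps, n_examples, n_devices=1):
    """Return (effective batch size, sequences seen, approximate epochs)."""
    effective_batch = per_device_batch * grad_accum * n_devices
    sequences_seen = effective_batch * max_steps
    epochs = sequences_seen / n_examples
    return effective_batch, sequences_seen, epochs


# Batch size, accumulation, and steps match the TrainingArguments in this notebook.
# n_examples is a placeholder; substitute len(dataset).
eff, seen, epochs = training_budget(1, 64, 100, n_examples=1600)
print(f"Effective batch: {eff}, sequences seen: {seen}, epochs: {epochs:.1f}")
```

With a batch size of 1 and 64 accumulation steps, each optimizer step averages gradients over 64 sequences, so 100 steps cover 6,400 sequences regardless of dataset size.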

## 6. Inference Pipeline: `query_intern` 

Defining `query_intern`, the inference function used for evaluation.

In [None]:
def query_intern(question):
    """Generate an answer from the fine-tuned model for a single question."""
    prompt = f"<|im_start|>system\nYou are a professional financial analyst assistant. Answer questions based on Uber's 2024 Annual Report.<|im_end|>\n<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    model.eval()  # disable LoRA dropout during generation
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens. Splitting the full decoded text on
    # "assistant" would truncate any answer that itself contains that word.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()

# Sample query against the fine-tuned model
test_q = "What were the key drivers of Uber's revenue growth in 2024?"
print(f"Q: {test_q}")
print(f"A: {query_intern(test_q)}")
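
The generation call samples with `temperature=0.1`, which keeps sampling enabled but makes the output close to greedy decoding. The sketch below shows the effect with a plain softmax; the logits are made-up scores for three candidate tokens, and `softmax_with_temperature` is a hypothetical helper, not the model's actual sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for illustration
for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 4) for p in probs]}")
```

At a temperature of 1.0 the lower-scoring tokens keep a meaningful share of the probability mass, while at 0.1 almost all of it collapses onto the top token, which is why answers vary little between runs.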