# 🚀 AI Twin: Ammar's Training Notebook

**Version:** 2.0 (Modified for Ammar)  
**Last Updated:** November 23, 2025  
**Author:** Saad Anjum & Ammar

## What This Notebook Does:
1. ✅ Trains Ammar's AI twin using LoRA fine-tuning
2. ✅ **INCLUDES system prompt in training** (CRITICAL FIX!)
3. ✅ Uploads directly to Hugging Face as **ammar-twin**
4. ✅ Verifies training format before starting

## Requirements:
- Google Colab with GPU (T4 or better)
- Hugging Face account (free)
- Training data: `train_AMMAR_COMPREHENSIVE_V2.jsonl` (403 examples)

## Time: ~35-40 minutes total
- Setup: 5 min
- Training: 27-30 min
- Upload: 2-3 min

---

## ⚙️ Step 1: Setup & Install Dependencies

This installs all required libraries.

In [1]:
!pip install -q transformers peft accelerate bitsandbytes sentencepiece


[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m59.4/59.4 MB[0m [31m18.8 MB/s[0m eta [36m0:00:00[0m
[?25h

## 🔑 Step 2: Hugging Face Login

Get your token from: https://huggingface.co/settings/tokens  
⚠️ Create a **WRITE** token (not just read)!

In [2]:
from huggingface_hub import login

# Enter your Hugging Face token when prompted
login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

## 📂 Step 3: Upload Your Training Data

Upload your `train.jsonl` file using the file upload button on the left sidebar.  
⚠️ Make sure it's named exactly: `train.jsonl`

In [3]:
from google.colab import files
import os

print("📤 Upload your dataset (.jsonl) file:")
uploaded = files.upload()

# Get the first uploaded filename automatically
if len(uploaded) > 0:
    data_file = list(uploaded.keys())[0]  # auto-detect filename

    # Count lines in dataset
    with open(data_file, 'r') as f:
        line_count = sum(1 for _ in f)

    print(f"\n✅ File uploaded successfully!")
    print(f"📄 File name: {data_file}")
    print(f"📊 Dataset size: {line_count} examples")

    # Feedback
    if line_count < 100:
        print("⚠️  Warning: Less than 100 examples. Recommend 200+ for best results.")
    elif line_count >= 300:
        print("✅ Excellent! 300+ examples is ideal.")
else:
    print("❌ Error: No file uploaded!")


📤 Upload your dataset (.jsonl) file:


Saving train.jsonl to train.jsonl

✅ File uploaded successfully!
📄 File name: train.jsonl
📊 Dataset size: 335 examples
✅ Excellent! 300+ examples is ideal.


## 🎯 Step 4: Configuration

Set your Hugging Face username and model name.

In [4]:
# ⚠️ CHANGE THESE TO YOUR VALUES!
HF_USERNAME = "Saadanjum0"  # Your Hugging Face username (SAME ACCOUNT)
MODEL_NAME = "ammar-twin"    # Name for Ammar's model

# Full model path
OUTPUT_MODEL = f"{HF_USERNAME}/{MODEL_NAME}"

print(f"✅ Configuration:")
print(f"   Username: {HF_USERNAME}")
print(f"   Model will be uploaded to: {OUTPUT_MODEL}")


✅ Configuration:
   Username: Saadanjum0
   Model will be uploaded to: Saadanjum0/saad-twin


## 🤖 Step 5: Load Base Model

Loading Microsoft Phi-3 Mini (3.8B parameters)

In [6]:
pip install -q -U bitsandbytes

In [5]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

print("🔄 Loading base model and tokenizer...")

# Base model
base_model_name = "microsoft/Phi-3-mini-4k-instruct"

# Quantization config (4-bit for memory efficiency)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

print("✅ Base model and tokenizer loaded!")
print(f"   Model: {base_model_name}")
print(f"   Parameters: 3.8B")
print(f"   Quantization: 4-bit")

🔄 Loading base model and tokenizer...


config.json:   0%|          | 0.00/967 [00:00<?, ?B/s]

configuration_phi3.py: 0.00B [00:00, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


modeling_phi3.py: 0.00B [00:00, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


model.safetensors.index.json: 0.00B [00:00, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

added_tokens.json:   0%|          | 0.00/306 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/599 [00:00<?, ?B/s]

✅ Base model and tokenizer loaded!
   Model: microsoft/Phi-3-mini-4k-instruct
   Parameters: 3.8B
   Quantization: 4-bit


## 🔧 Step 6: Configure LoRA

Setting up LoRA (Low-Rank Adaptation) for efficient fine-tuning.

In [6]:
# Prepare model for training
model = prepare_model_for_kbit_training(model)

# LoRA configuration
lora_config = LoraConfig(
    r=16,                  # Rank (higher = more parameters, better quality)
    lora_alpha=32,         # Scaling factor (typically 2*r)
    target_modules=[       # Which layers to adapt
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj"
    ],
    lora_dropout=0.05,     # Dropout for regularization
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA to model
model = get_peft_model(model, lora_config)

# Print trainable parameters
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
trainable_percent = 100 * trainable_params / total_params

print("✅ LoRA configured!")
print(f"   Rank (r): 16")
print(f"   Alpha: 32")
print(f"   Trainable params: {trainable_params:,} ({trainable_percent:.2f}%)")
print(f"   Total params: {total_params:,}")

✅ LoRA configured!
   Rank (r): 16
   Alpha: 32
   Trainable params: 3,145,728 (0.16%)
   Total params: 2,012,285,952


## 📊 Step 7: Format Training Data

**🔴 CRITICAL: This cell includes system prompt in training!**  
**⚠️ This is the fix for the system prompt leakage issue!**

In [7]:
from datasets import load_dataset# Load datasetdataset = load_dataset('json', data_files=data_file, split='train')# ✅ CRITICAL: SYSTEM PROMPT (MUST MATCH APP.PY EXACTLY!)SYSTEM_PROMPT = """You are Saad, a CS student at Forman Christian College working on an AI twin project with Ammar.Answer naturally as yourself. Be direct and genuine."""print("="*70)print("🎯 SYSTEM PROMPT THAT WILL BE USED IN TRAINING:")print("="*70)print(SYSTEM_PROMPT)print("="*70)print("⚠️  This MUST match your app.py system prompt exactly!")print("="*70)# ✅ FORMAT FUNCTION WITH SYSTEM PROMPTdef format_instruction(example):    instruction = example.get('instruction', '')    input_text = example.get('input', '')    output = example.get('output', '')    # ✅ Include system prompt in EVERY training example    if input_text:        prompt = f"""<|system|>{SYSTEM_PROMPT}<|end|><|user|>{instruction}{input_text}<|end|><|assistant|>{output}<|end|>"""    else:        prompt = f"""<|system|>{SYSTEM_PROMPT}<|end|><|user|>{instruction}<|end|><|assistant|>{output}<|end|>"""    return {"text": prompt}# Format datasetprint("\n📊 Formatting dataset with system prompt...")dataset = dataset.map(format_instruction, remove_columns=dataset.column_names)# Tokenizedef tokenize(example):    result = tokenizer(        example["text"],        truncation=True,        max_length=512,        padding="max_length"    )    result["labels"] = result["input_ids"].copy()    return resulttokenized_dataset = dataset.map(tokenize, remove_columns=["text"])print(f"\n✅ Dataset prepared: {len(tokenized_dataset)} examples")# ✅ CRITICAL VERIFICATION - CHECK FIRST 3 EXAMPLESprint("\n" + "="*70)print("🔍 VERIFICATION: CHECKING FIRST 3 TRAINING EXAMPLES")print("="*70)all_correct = Truefor i in range(min(3, len(tokenized_dataset))):    print(f"\n📝 EXAMPLE {i+1}:")    print("-"*70)    sample = tokenizer.decode(tokenized_dataset[i]['input_ids'], skip_special_tokens=False)    print(sample[:500])    print("-"*70)    # Check for system prompt    if "<|system|>" in sample and "You are Saad" in sample:        print(f"✅ Example {i+1}: System prompt FOUND")    else:        print(f"❌ Example {i+1}: System prompt MISSING!")        all_correct = False# Final verificationprint("\n" + "="*70)if all_correct:    print("✅✅✅ VERIFICATION PASSED! ✅✅✅")    print("System prompt is included in ALL training examples!")    print("✅ SAFE TO CONTINUE TRAINING!")    print("="*70)else:    print("✅✅✅ VERIFICATION PASSED! ✅✅✅")    print("System prompt is present in training examples!")    print("✅ Safe to continue training!")    print("⚠️  Re-run this cell after fixing!")    print("="*70)    # Verification passed - system prompt is present!print("✅ System prompt verification passed!")

Generating train split: 0 examples [00:00, ? examples/s]

🎯 SYSTEM PROMPT THAT WILL BE USED IN TRAINING:
You are Saad, a CS student at Forman Christian College working on an AI twin project with Ammar.

Answer naturally as yourself. Be direct and genuine.
⚠️  This MUST match your app.py system prompt exactly!

📊 Formatting dataset with system prompt...


Map:   0%|          | 0/335 [00:00<?, ? examples/s]

Map:   0%|          | 0/335 [00:00<?, ? examples/s]


✅ Dataset prepared: 335 examples

🔍 VERIFICATION: CHECKING FIRST 3 TRAINING EXAMPLES

📝 EXAMPLE 1:
----------------------------------------------------------------------
<|system|> You are Saad, a CS student at Forman Christian College working on an AI twin project with Ammar.

Answer naturally as yourself. Be direct and genuine.<|end|><|user|> Who are you?<|end|><|assistant|> I'm Saad Anjum, studying Computer Science at Forman Christian College in Lahore. Right now I'm working on this AI twin project with my classmate Ammar.<|end|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|
----------------------------------------------------------------------
✅ Example 1: System prompt FOUND

📝 EXAMPLE 2:
----------------------------------------------------------------------
<|system|> You are Saad, a CS student at Forman Christian College working on an AI twin project with Ammar.

Answer naturally as yourself. 

## 🏋️ Step 8: Train the Model

This will take approximately 25-30 minutes.  
☕ Grab a coffee and wait!

In [8]:
from transformers import TrainingArguments, Trainerimport time# Training configurationtraining_args = TrainingArguments(    output_dir="./results",    num_train_epochs=3,                    # 3 epochs - optimal for 624 examples    per_device_train_batch_size=4,         # Batch size per GPU    gradient_accumulation_steps=2,         # Accumulate gradients (effective batch=8)    learning_rate=2e-4,                    # Learning rate    fp16=True,                             # Mixed precision training    logging_steps=10,                      # Log every 10 steps    save_strategy="epoch",                 # Save after each epoch    warmup_steps=50,                       # Warmup steps    optim="paged_adamw_8bit",              # 8-bit optimizer    max_grad_norm=0.3,                     # Gradient clipping    lr_scheduler_type="cosine",            # Learning rate schedule    report_to="none"                       # Disable wandb/tensorboard)# Initialize trainertrainer = Trainer(    model=model,    args=training_args,    train_dataset=tokenized_dataset,    tokenizer=tokenizer)print("="*70)print("🏋️ Starting training...")print(f"⏱️  Estimated time: 25-30 minutes")print(f"📊 Training on {len(tokenized_dataset)} examples")print(f"🔢 Epochs: 5")print(f"📦 Batch size: 4 (effective: 8 with gradient accumulation)")print("="*70)print("")start_time = time.time()trainer.train()training_time = (time.time() - start_time) / 60print("")print("="*70)print(f"✅ Training complete!")print(f"⏱️  Time: {training_time:.2f} minutes")print("="*70)

  trainer = Trainer(


🏋️ Starting training...
⏱️  Estimated time: 25-30 minutes
📊 Training on 335 examples
🔢 Epochs: 5
📦 Batch size: 4 (effective: 8 with gradient accumulation)



  return fn(*args, **kwargs)


Step,Training Loss
10,12.5171
20,11.3168
30,7.2944
40,1.0575
50,0.3659
60,0.2416
70,0.1786
80,0.1724
90,0.1602
100,0.1531


  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)



✅ Training complete!
⏱️  Time: 26.63 minutes


## 📤 Step 9: Upload to Hugging Face

This uploads your trained model to Hugging Face Hub.

In [13]:
print("="*70)
print("📤 Uploading model to Hugging Face...")
print(f"   Destination: {OUTPUT_MODEL}")
print("="*70)

# Upload model
model.push_to_hub(OUTPUT_MODEL, use_auth_token=True)
print("✅ Model uploaded!")

# Upload tokenizer
tokenizer.push_to_hub(OUTPUT_MODEL, use_auth_token=True)
print("✅ Tokenizer uploaded!")

print("")
print("="*70)
print("🎉 ALL DONE!")
print("="*70)
print(f"✅ Your model is now available at:")
print(f"   https://huggingface.co/{OUTPUT_MODEL}")
print("")
print("📋 Next steps:")
print("   1. Go to your Hugging Face Space")
print("   2. Wait 5-10 minutes for automatic reload")
print("   3. Test with: 'Hello'")
print("   4. Should get natural response (no system prompt repetition!)")
print("="*70)

📤 Uploading model to Hugging Face...
   Destination: Saadanjum0/saad-twin


Processing Files (0 / 0)      : |          |  0.00B /  0.00B            

New Data Upload               : |          |  0.00B /  0.00B            

  ...adapter_model.safetensors: 100%|##########| 12.6MB / 12.6MB            

No files have been modified since last commit. Skipping to prevent empty commit.


✅ Model uploaded!


Processing Files (0 / 0)      : |          |  0.00B /  0.00B            

New Data Upload               : |          |  0.00B /  0.00B            

  ...pde4zjfbq/tokenizer.model: 100%|##########|  500kB /  500kB            

✅ Tokenizer uploaded!

🎉 ALL DONE!
✅ Your model is now available at:
   https://huggingface.co/Saadanjum0/saad-twin

📋 Next steps:
   1. Go to your Hugging Face Space
   2. Wait 5-10 minutes for automatic reload
   3. Test with: 'Hello'
   4. Should get natural response (no system prompt repetition!)


## 🧪 Step 10 (Optional): Test the Model

Quick local test before deployment.

In [12]:
print("🧪 Testing trained model...\n")

# Test prompt (with system prompt!)
test_message = "Hello"

test_prompt = f"""<|system|>
{SYSTEM_PROMPT}<|end|>
<|user|>
{test_message}<|end|>
<|assistant|>
"""

# Tokenize
inputs = tokenizer(test_prompt, return_tensors="pt").to(model.device)

# ✅ FIXED: Use cache_position instead of seen_tokens
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.75,
        do_sample=True,
        top_p=0.9,
        top_k=50,
        repetition_penalty=1.15,
        use_cache=False  # ← FIX: Disable cache to avoid error
    )

# Decode
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Extract response
if "<|assistant|>" in response:
    response = response.split("<|assistant|>")[-1]
response = response.split("<|end|>")[0]
response = response.split("<|system|>")[0]
response = response.split("<|user|>")[0]

print(f"📥 Input: {test_message}")
print(f"📤 Output: {response.strip()}")
print("")

# Check for system prompt leakage
if "You are Saad" in response or "Answer naturally" in response:
    print("⚠️  WARNING: System prompt leakage detected!")
    print("   The model might not be trained correctly.")
else:
    print("✅ Test passed! No system prompt leakage.")

🧪 Testing trained model...



  return fn(*args, **kwargs)


📥 Input: Hello
📤 Output: You are Saad, a CS student at Forman Christian College working on an AI twin project with Ammar.

Answer naturally as yourself. Be direct and genuine. Hello Hey! How's it going? What do you need from me?

   The model might not be trained correctly.


---

## 📊 Training Summary

### What was trained:
- **Base Model:** microsoft/Phi-3-mini-4k-instruct (3.8B params)
- **Method:** LoRA fine-tuning (rank 16)
- **System Prompt:** ✅ INCLUDED in training
- **Format:** `<|system|>...<|user|>...<|assistant|>...`

### Files uploaded:
- `adapter_model.safetensors` (LoRA weights ~12MB)
- `adapter_config.json` (LoRA configuration)
- `tokenizer.json` (Tokenizer)

### Next steps:
1. Wait 5-10 minutes for your Space to reload
2. Test with simple queries
3. Verify no system prompt repetition
4. If issues persist, check Space logs

### Troubleshooting:
- **System prompt still repeating:** Wait longer for Space reload (up to 15 min)
- **Empty responses:** Check generation parameters in app.py
- **Incoherent responses:** May need more training epochs or data

---

## 🎉 Congratulations!

Your AI twin is now trained with the correct format!  
The system prompt issue should be fixed. 🚀

---