# Track C – Unsloth Embedding Fine-Tuning

Fine-tune `unsloth/Qwen3-Embedding-0.6B` with Unsloth's **FastSentenceTransformer**, which keeps memory use low enough to train on a Colab T4 GPU.

### Why Unsloth?
- **30% less VRAM**, which allows batch sizes up to 2x larger on the same GPU.
- Supports LoRA for embedding models.

---

## 📦 Dataset: [`archit11/code-embedding-dataset`](https://huggingface.co/datasets/archit11/code-embedding-dataset)

### 📊 Results (Best Run - 3 Epochs)

| Metric | Baseline | Fine-Tuned | Δ |
|--------|----------|------------|---|
| **MRR@10** | 0.8840 | **0.9617** | **+0.0777 ↑** |
| **nDCG@10** | 0.9093 | **0.9710** | **+0.0617 ↑** |
| **Recall@10** | 0.9870 | **1.0000** | **+0.0130 ↑** |

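> **Note**: Results are from a run with 3 epochs, batch size 8, and learning rate 2e-5. Cell 4 below sets `per_device_train_batch_size = 16`, so change it to 8 to match the reported configuration.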
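
For readers unfamiliar with the three metrics in the table, the following is a minimal sketch of how they can be computed from ranked retrieval results. It assumes each query has exactly one relevant document, which may not match the actual evaluation setup; the function names and the toy data are hypothetical and are not taken from the Track C evaluation code.

```python
import math

def mrr_at_k(ranked_ids, relevant_id, k=10):
    """Reciprocal rank of the relevant item if it is in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_id, k=10):
    """With one relevant item the ideal DCG is 1, so nDCG = 1 / log2(rank + 1)."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0

def recall_at_k(ranked_ids, relevant_id, k=10):
    """1 if the relevant item appears anywhere in the top k, else 0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mean_metric(metric, rankings, relevant, k=10):
    """Average a per-query metric over all queries."""
    return sum(metric(r, rel, k) for r, rel in zip(rankings, relevant)) / len(relevant)

# Toy data (hypothetical): two queries, both with "d1" as the relevant document.
rankings = [["d1", "d2", "d3"], ["d2", "d1", "d3"]]
relevant = ["d1", "d1"]

mrr = mean_metric(mrr_at_k, rankings, relevant)
ndcg = mean_metric(ndcg_at_k, rankings, relevant)
recall = mean_metric(recall_at_k, rankings, relevant)
print(f"MRR@10={mrr:.4f}  nDCG@10={ndcg:.4f}  Recall@10={recall:.4f}")
```

Under this reading, a Recall@10 of 1.0000 means the relevant document was in the top 10 for every test query, while MRR@10 and nDCG@10 additionally reward placing it near the top.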

In [None]:
%%capture
# Cell 1 – Install Unsloth (%%capture must be the first line of the cell)
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    import torch; v = re.match(r'[\d]{1,}\.[\d]{1,}', str(torch.__version__)).group(0)
    xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, "0.0.34")
    !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2

In [None]:
# Cell 2 – Load Model
from unsloth import FastSentenceTransformer

model = FastSentenceTransformer.from_pretrained(
    model_name = "unsloth/Qwen3-Embedding-0.6B",
    max_seq_length = 2048,
    full_finetuning = False,
)

model = FastSentenceTransformer.get_peft_model(
    model,
    r = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = False,
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
    task_type = "FEATURE_EXTRACTION"
)
print("✓ Unsloth embedding model loaded with LoRA")
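
To give a feel for what `r = 32` costs, here is a rough sketch of the number of trainable weights LoRA adds per transformer layer. Each adapted linear layer gains two low-rank matrices, adding `r * (d_in + d_out)` weights. The layer shapes below are illustrative assumptions, not values read from the model, so check them against the model's config before relying on the totals.

```python
def lora_param_count(d_in, d_out, r):
    """LoRA adds A (r x d_in) and B (d_out x r): r * (d_in + d_out) weights."""
    return r * (d_in + d_out)

# Assumed (d_in, d_out) shapes for the seven target modules; verify against
# the real model config, since these numbers are illustrative only.
HIDDEN, INTERMEDIATE = 1024, 3072
Q_OUT, KV_OUT = 2048, 1024
shapes = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

r, lora_alpha = 32, 32
per_layer = sum(lora_param_count(d_in, d_out, r) for d_in, d_out in shapes.values())
scaling = lora_alpha / r  # standard LoRA scaling when use_rslora = False

print(f"Trainable LoRA weights per layer: {per_layer:,}")
print(f"LoRA scaling factor: {scaling}")
```

With `lora_alpha` equal to `r`, the adapter update is applied with a scaling factor of 1.0.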

In [None]:
# Cell 3 – Load Dataset
from datasets import load_dataset
ds = load_dataset("archit11/code-embedding-dataset")
train_ds = ds["train"]
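if "test" in ds:
    test_ds = ds["test"]
else:
    # No test split: hold out the last 24 pairs and drop them from training
    # so the evaluation set does not leak into the training data.
    n = len(train_ds)
    test_ds = train_ds.select(range(n - 24, n))
    train_ds = train_ds.select(range(n - 24))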
print(f"✓ Loaded {len(train_ds)} train pairs")

In [None]:
# Cell 4 – Train
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses
)
from sentence_transformers.training_args import BatchSamplers
from unsloth import is_bf16_supported

loss = losses.MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(
    model = model,
    train_dataset = train_ds,
    loss = loss,
    args = SentenceTransformerTrainingArguments(
        output_dir = "output_track_c",
        num_train_epochs = 3,  # Best results from 3 epochs
        per_device_train_batch_size = 16,  # T4 safe with Unsloth
        gradient_accumulation_steps = 1,
        learning_rate = 2e-5,
        fp16 = not is_bf16_supported(),
        bf16 = is_bf16_supported(),
        logging_steps = 50,
        warmup_ratio = 0.03,
        report_to = "none",
        lr_scheduler_type = "constant_with_warmup",
        batch_sampler = BatchSamplers.NO_DUPLICATES, # Important for MNRL
    ),
)
trainer.train()
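
# The block below is an optional, standalone sketch (pure Python, no GPU needed)
# of the idea behind MultipleNegativesRankingLoss and of why the NO_DUPLICATES
# batch sampler matters. It is not the sentence-transformers implementation:
# the helper names are hypothetical, and scale=20 with cosine similarity is
# assumed to mirror the library defaults.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnrl_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i out of the batch.

    Every other positive in the batch acts as an in-batch negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Distinct pairs: each anchor matches only its own positive, so loss is near 0.
clean = mnrl_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])

# Duplicate pair in the batch: the copy is treated as a negative even though
# it is a correct match, so the loss cannot drop below log(2).
dup = mnrl_loss([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])

print(f"clean batch loss: {clean:.6f}")
print(f"batch with duplicate loss: {dup:.6f}")
```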

In [None]:
# Cell 5 – Save & Evaluate (Placeholder)
model.save_pretrained("output_track_c")
print("✓ Model saved. See standard Track C notebook for evaluation logic.")
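
Since evaluation is only a placeholder here, the following is a minimal sketch of the retrieval step that such an evaluation typically starts from: ranking documents for a query by cosine similarity. It is an assumption about the general approach, not the logic of the standard Track C notebook. In practice the vectors would come from something like `model.encode(...)`; the toy vectors and the `rank_documents` helper below are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def rank_documents(query_vec, doc_vecs, k=10):
    """Return indices of the top-k documents, most similar first."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(d))), idx)
        for idx, d in enumerate(doc_vecs)
    ]
    # Sort by similarity only; ties keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy embeddings (hypothetical): document 1 points almost the same way as the query.
docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
query = [1.0, 0.1]

ranking = rank_documents(query, docs, k=10)
print(ranking)
```

The resulting ranked index lists are the input that metrics such as MRR@10, nDCG@10, and Recall@10 are computed from.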