**Installation**

In [None]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
    !pip install --no-deps unsloth

**Mounting Google Drive**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

MessageError: Error: credential propagation was unsuccessful
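The `MessageError` above means the Drive authorization did not complete, so nothing was mounted. The later cells only write to relative paths such as `./resume_screener_model`, so the notebook can continue without Drive. A minimal sketch, assuming a fallback to local storage is acceptable, wraps the mount so a failed authorization does not stop execution. The helper name `try_mount_drive` is hypothetical.

```python
def try_mount_drive(mount_point="/content/drive"):
    """Try to mount Google Drive; return True on success, False otherwise."""
    try:
        # Only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        print("Not running in Colab; skipping Drive mount.")
        return False
    try:
        drive.mount(mount_point)
    except Exception as exc:  # e.g. the MessageError shown above
        print(f"Drive mount failed ({exc}); continuing with local storage.")
        return False
    return True

mounted = try_mount_drive()
```

If `mounted` is `False`, outputs stay on the Colab VM disk and are lost when the runtime is recycled, so download them before the session ends.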

**Importing Required Libraries** (Before importing Unsloth, set the runtime to a T4 GPU)

In [None]:
from unsloth import FastLanguageModel
from datasets import Dataset
import pandas as pd
import torch

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [None]:
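# 1. Load the CSV dataset (expected columns: Role, Job_Description, Resume, Decision, Reason_for_decision)
csv_path = "dataset.csv"  # <-- REPLACE WITH YOUR FILE PATH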
df = pd.read_csv(csv_path)

# 2. Create conversation structure
system_prompt = """You are a resume screening assistant. \
Determine whether the given resume is suitable for the specified job role. \
Provide your decision and the reason for it."""

def create_conversations(examples):
    conversations = []
    for i in range(len(examples['Role'])): # Iterate through the batch using integer indices
        conversation = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Role: {examples['Role'][i]}\nJob Description: {examples['Job_Description'][i]}\nResume: {examples['Resume'][i]}"},
            {"role": "assistant", "content": f"Decision: {examples['Decision'][i]}\nReason: {examples['Reason_for_decision'][i]}"}
        ]
        conversations.append(conversation)
    return {"conversations": conversations} # Return a dictionary containing the list of conversations

dataset = Dataset.from_pandas(df)
dataset = dataset.map(create_conversations, batched=True, batch_size=100)

# 3. Standardize format
from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(dataset)

# 4. Load the 4-bit quantized Llama 3.2 1B Instruct model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

# 5. Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
)

Map:   0%|          | 0/10174 [00:00<?, ? examples/s]

Unsloth: Standardizing formats (num_proc=2):   0%|          | 0/10174 [00:00<?, ? examples/s]

==((====))==  Unsloth 2025.3.15: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/1.03G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/234 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/54.7k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/454 [00:00<?, ?B/s]

Not an error, but Unsloth cannot patch MLP layers with our manual autograd engine since either LoRA adapters
are not enabled or a bias term (like in Qwen) is used.
Unsloth 2025.3.15 patched 16 layers with 16 QKV layers, 16 O layers and 0 MLP layers.
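To make the batched mapping in `create_conversations` concrete, the sketch below runs the same logic on a plain dictionary of columns, which is the shape `Dataset.map(..., batched=True)` passes to the function. The two rows are invented for illustration and are not taken from the real dataset.

```python
# Same system prompt as in the cell above, written as one string.
system_prompt = (
    "You are a resume screening assistant. "
    "Determine whether the given resume is suitable for the specified job role. "
    "Provide your decision and the reason for it."
)

def create_conversations(examples):
    """Turn a column-oriented batch into one chat conversation per row."""
    conversations = []
    for i in range(len(examples["Role"])):
        conversations.append([
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": (
                f"Role: {examples['Role'][i]}\n"
                f"Job Description: {examples['Job_Description'][i]}\n"
                f"Resume: {examples['Resume'][i]}"
            )},
            {"role": "assistant", "content": (
                f"Decision: {examples['Decision'][i]}\n"
                f"Reason: {examples['Reason_for_decision'][i]}"
            )},
        ])
    return {"conversations": conversations}

# HYPOTHETICAL batch: values are made up purely to show the structure.
batch = {
    "Role": ["Data Analyst", "Backend Engineer"],
    "Job_Description": ["Build SQL dashboards.", "Maintain REST services."],
    "Resume": ["Three years of SQL reporting.", "Retail sales background."],
    "Decision": ["select", "reject"],
    "Reason_for_decision": ["Relevant SQL experience.", "No engineering experience."],
}

result = create_conversations(batch)
first = result["conversations"][0]
print(len(result["conversations"]))  # 2
print(first[1]["content"])
```

Each row becomes a three-message conversation (system, user, assistant), and the assistant message is the target the model learns to produce.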


In [None]:
# 6. Render each conversation to a single training string with the chat template
def formatting_prompts_func(examples):
    texts = [tokenizer.apply_chat_template(convo, tokenize=False)
             for convo in examples["conversations"]]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)

# 7. Training setup (batch size chosen for the 1B model)
from transformers import TrainingArguments
from trl import SFTTrainer

trainer_args = TrainingArguments(
    output_dir = "./resume_screener_output",
    per_device_train_batch_size = 4,  # Increased batch size for smaller model
    gradient_accumulation_steps = 2,
    learning_rate = 2e-5,
    num_train_epochs = 1,
    fp16 = not torch.cuda.is_bf16_supported(),
    bf16 = torch.cuda.is_bf16_supported(),
    logging_steps = 1,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    save_strategy = "epoch",
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    tokenizer = tokenizer,
    args = trainer_args,
)


Map:   0%|          | 0/10174 [00:00<?, ? examples/s]

Unsloth: We found double BOS tokens - we shall remove one automatically.


Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/10174 [00:00<?, ? examples/s]
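The double-BOS notice above most likely arises because the chat template already writes the BOS token into each rendered string and the tokenizer then prepends a second one during tokenization. This is an assumption about the cause, not something the log states. Unsloth removes the duplicate automatically, so no change is needed here; the sketch below only shows what the equivalent manual fix could look like. The helper `strip_leading_bos` is hypothetical, and the token string should be checked against `tokenizer.bos_token`.

```python
# ASSUMPTION: the BOS token string used by Llama 3 tokenizers.
BOS_TOKEN = "<|begin_of_text|>"

def strip_leading_bos(text, bos_token=BOS_TOKEN):
    """Drop one leading BOS token so the tokenizer can add its own."""
    if text.startswith(bos_token):
        return text[len(bos_token):]
    return text

# Illustrative rendered string, not real template output.
rendered = BOS_TOKEN + "<|start_header_id|>system<|end_header_id|>\n\nhello"
cleaned = strip_leading_bos(rendered)
print(cleaned.startswith(BOS_TOKEN))  # False
```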

**Model Training and Saving**


In [None]:
# 8. Train
trainer.train()

# 9. Save model
save_path = "./resume_screener_model"
trainer.model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

# Optional: Merge and save 16-bit model
trainer.model.save_pretrained_merged(
    save_path,
    tokenizer = tokenizer,
    save_method = "merged_16bit",
)

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 10,174 | Num Epochs = 1 | Total steps = 1,272
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 2 x 1) = 8
 "-____-"     Trainable parameters = 3,407,872/1,000,000,000 (0.34% trained)


Step,Training Loss
1,0.7973
2,0.7053
3,0.7545
4,0.7789
5,0.709
6,0.8407
7,0.7512
8,0.7198
9,0.746
10,0.836
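The numbers in the training banner can be reproduced by hand. The sketch below recomputes the total batch size, the step count for one epoch, and the LoRA trainable-parameter count. The projection shapes are an assumption based on the published Llama 3.2 1B configuration (hidden size 2048, 8 key/value heads of dimension 64) and do not appear in this notebook. The step count uses a simple ceiling, which may differ from the trainer's exact rounding in other setups.

```python
import math

# Values copied from the notebook above.
num_examples = 10_174
per_device_train_batch_size = 4
gradient_accumulation_steps = 2
num_gpus = 1
lora_rank = 16
num_layers = 16  # matches "patched 16 layers" in the Unsloth log

# Effective batch size per optimizer step, then steps for one epoch.
total_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
total_steps = math.ceil(num_examples / total_batch_size)

# ASSUMPTION: (in_features, out_features) of each targeted projection.
projection_shapes = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 512),
    "v_proj": (2048, 512),
    "o_proj": (2048, 2048),
}

# LoRA adds two matrices per module, A (r x in) and B (out x r),
# which is r * (in + out) parameters.
lora_params_per_layer = sum(
    lora_rank * (n_in + n_out) for n_in, n_out in projection_shapes.values()
)
trainable_params = lora_params_per_layer * num_layers

print(total_batch_size)  # 8
print(total_steps)       # 1272
print(trainable_params)  # 3407872
```

Under these assumptions both results agree with the banner: 1,272 total steps and 3,407,872 trainable parameters.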


Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 1.0G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 3.71 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 16/16 [00:00<00:00, 56.15it/s]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving ./resume_screener_model/pytorch_model.bin...
Done.



**Saving Model in 4bit**

In [None]:
save_path = "./resume_screener_model_4Bit"
trainer.model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

# Optional: Merge and save 4-bit model
trainer.model.save_pretrained_merged(
    save_path,
    tokenizer = tokenizer,
    save_method = "merged_4bit_forced",
)

Unsloth: Merging 4bit and LoRA weights to 4bit...
This might take 5 minutes...




Done.
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 10 minutes for Llama-7b... Done.
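Once the model is saved, inference prompts should mirror the training format: the same system and user messages, with the assistant turn left for the model to generate. Replies are then expected to follow the `Decision: ...` / `Reason: ...` layout used in the training targets. The sketch below builds such a prompt and parses a reply. Both helper names are hypothetical, the sample reply is invented, and the generation call itself is omitted.

```python
SYSTEM_PROMPT = (
    "You are a resume screening assistant. "
    "Determine whether the given resume is suitable for the specified job role. "
    "Provide your decision and the reason for it."
)

def build_inference_messages(role, job_description, resume):
    """Build system + user messages in the same layout used for training."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": (
            f"Role: {role}\nJob Description: {job_description}\nResume: {resume}"
        )},
    ]

def parse_screening_reply(reply):
    """Split a 'Decision: ...' / 'Reason: ...' reply into its two fields."""
    text = reply.strip()
    if not text.startswith("Decision:"):
        # The model did not follow the expected layout.
        return {"decision": None, "reason": None}
    head, sep, tail = text[len("Decision:"):].partition("\nReason:")
    return {"decision": head.strip(), "reason": tail.strip() if sep else None}

messages = build_inference_messages("Data Analyst", "Build SQL dashboards.", "SQL reporting.")
# In the notebook these messages would go through
# tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True).

# HYPOTHETICAL reply, written by hand to show the parsing.
parsed = parse_screening_reply("Decision: reject\nReason: Lacks the required experience.")
print(parsed["decision"])  # reject
```

Returning `None` for malformed replies keeps downstream code from treating free-form text as a decision.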


**Create Zip File of Output Model**

In [None]:
!zip -r /content/resume_screener_model_4Bit.zip /content/resume_screener_model_4Bit

  adding: content/resume_screener_model_4Bit/ (stored 0%)
  adding: content/resume_screener_model_4Bit/tokenizer.json (deflated 85%)
  adding: content/resume_screener_model_4Bit/config.json (deflated 55%)
  adding: content/resume_screener_model_4Bit/README.md (deflated 66%)
  adding: content/resume_screener_model_4Bit/adapter_config.json (deflated 54%)
  adding: content/resume_screener_model_4Bit/adapter_model.safetensors (deflated 8%)
  adding: content/resume_screener_model_4Bit/generation_config.json (deflated 38%)
  adding: content/resume_screener_model_4Bit/special_tokens_map.json (deflated 71%)
  adding: content/resume_screener_model_4Bit/tokenizer_config.json (deflated 94%)
  adding: content/resume_screener_model_4Bit/model.safetensors (deflated 13%)
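The same archive can be produced without the `zip` command, which helps when the notebook runs outside Colab. A minimal sketch using the standard library's `shutil.make_archive` follows. The helper name `zip_model_dir` is hypothetical, and the demo zips a temporary directory with a dummy file rather than the real model folder.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

def zip_model_dir(model_dir):
    """Create <model_dir>.zip next to the directory and return its path."""
    model_dir = Path(model_dir).resolve()
    archive_path = shutil.make_archive(
        base_name=str(model_dir),        # ".zip" is appended automatically
        format="zip",
        root_dir=str(model_dir.parent),  # paths in the archive are relative to this
        base_dir=model_dir.name,         # only this folder is archived
    )
    return Path(archive_path)

# Demo on a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "resume_screener_model_4Bit"
    demo_dir.mkdir()
    (demo_dir / "config.json").write_text("{}")
    archive = zip_model_dir(demo_dir)
    archive_name = archive.name
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(archive_name)  # resume_screener_model_4Bit.zip
```

Unlike the `!zip -r` call above, which stores entries under `content/resume_screener_model_4Bit/`, this variant stores them under `resume_screener_model_4Bit/` only.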
