# 🤖 Fine-Tune Mistral-7B for Empathetic HR Bot using QLoRA
This notebook demonstrates how to fine-tune [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on a small HR-focused dialogue dataset using QLoRA with Hugging Face `peft` and `transformers`.

In [1]:
# Top of notebook
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

In [2]:
!pip install -q accelerate bitsandbytes peft transformers datasets trl

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m76.0/76.0 MB[0m [31m22.3 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m336.4/336.4 kB[0m [31m22.3 MB/s[0m eta [36m0:00:00[0m
[?25h

In [3]:
from datasets import load_dataset

dataset_path = "/kaggle/input/dataset-jsonl/dataset.jsonl"  # or .jsonl

if dataset_path.endswith(".json") or dataset_path.endswith(".jsonl"):
    dataset = load_dataset("json", data_files=dataset_path, split="train")
else:
    raise ValueError("Please use a .json or .jsonl file with 'prompt' and 'completion' fields.")

dataset = dataset.train_test_split(test_size=0.2)

Generating train split: 0 examples [00:00, ? examples/s]

In [None]:
from google.colab import files
uploaded = files.upload()

In [4]:
from huggingface_hub import login

# Paste your token between the quotes
login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [5]:
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import torch

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True
)
model = prepare_model_for_kbit_training(model)

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, peft_config)

tokenizer_config.json:   0%|          | 0.00/2.10k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/596 [00:00<?, ?B/s]

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

In [6]:
def format_prompt(example):
    full_prompt = (
        f"<|system|>\n{example['system']}</s>\n"
        f"<|user|>\n{example['prompt']}</s>\n"
        f"<|assistant|>\n{example['response'].strip()}</s>"
    )

    tokenized = tokenizer(
        full_prompt,
        truncation=True,
        padding="max_length",
        max_length=512,
        return_tensors="pt"
    )

    input_ids = tokenized.input_ids[0]
    attention_mask = tokenized.attention_mask[0]

    # Labels are same as input_ids, except system/user part is masked out
    labels = input_ids.clone()

    # Mask out everything before the assistant's start
    assistant_start = full_prompt.find("<|assistant|>")
    if assistant_start != -1:
        mask_up_to = len(tokenizer(full_prompt[:assistant_start], truncation=True, max_length=512).input_ids) - 1
        labels[:mask_up_to] = -100
    else:
        labels[:] = -100  # safety fallback

    return {
        "input_ids": input_ids,
        "labels": labels,
        "attention_mask": attention_mask
    }

tokenized_dataset = dataset.map(format_prompt)

Map:   0%|          | 0/152 [00:00<?, ? examples/s]

Map:   0%|          | 0/38 [00:00<?, ? examples/s]

In [7]:
tokenized_dataset["train"][0]["input_ids"][:10]  # should be list[int]

[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

In [8]:
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    evaluation_strategy="no",
    save_strategy="epoch",
    learning_rate=2e-4,
    fp16=True,
    bf16=False,
    logging_steps=10,
    report_to="none",
    label_names=["labels"]
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
)



In [9]:
model.print_trainable_parameters()

trainable params: 6,815,744 || all params: 7,248,547,840 || trainable%: 0.0940


In [10]:
# Before model loading/training
import torch, gc
gc.collect()
torch.cuda.empty_cache()
# 🚀 Start training
trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
  return fn(*args, **kwargs)


Step,Training Loss
10,9.225
20,0.4082
30,0.2929
40,0.2717
50,0.23
60,0.2883
70,0.2164
80,0.2389
90,0.2048
100,0.2113


  return fn(*args, **kwargs)
  return fn(*args, **kwargs)


TrainOutput(global_step=228, training_loss=0.5889949803812462, metrics={'train_runtime': 1643.0478, 'train_samples_per_second': 0.278, 'train_steps_per_second': 0.139, 'total_flos': 9970387915898880.0, 'train_loss': 0.5889949803812462, 'epoch': 3.0})

In [11]:
trainer.evaluate(tokenized_dataset["test"])



{'eval_loss': 0.25613635778427124,
 'eval_runtime': 31.2961,
 'eval_samples_per_second': 1.214,
 'eval_steps_per_second': 1.214,
 'epoch': 3.0}

In [17]:
SYSTEM_PROMPT = "You are an empathetic friendly HR chatbot that slowly and politely tries to find the root reason behind the employee's problem."
# 💬 Test the fine-tuned model
def chat_with_bot(employee_problem):
    prompt = SYSTEM_PROMPT + f"Employee Problem: {employee_problem}\nAssistant:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=150)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

# Example usage:
chat_with_bot("hello. I'm feeling suicidal, really at rock-bottom in my life")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


You are an empathetic friendly HR chatbot that slowly and politely tries to find the root reason behind the employee's problem.Employee Problem: hello. I'm feeling suicidal, really at rock-bottom in my life
Assistant: I'm really sorry that you're feeling this way. I want to help. Are you willing to talk about what’s been leading up to this point?


In [18]:
# 💾 Save the fine-tuned model and tokenizer
save_path = "./fine_tuned_mistral_hr_bot"

model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

print(f"Model saved to: {save_path}")

Model saved to: ./fine_tuned_mistral_hr_bot


In [13]:
# 🔽 Zip and download (for smaller models or adapters)
!zip -r fine_tuned_mistral_hr_bot.zip fine_tuned_mistral_hr_bot

from google.colab import files
files.download("fine_tuned_mistral_hr_bot.zip")

  adding: fine_tuned_mistral_hr_bot/ (stored 0%)
  adding: fine_tuned_mistral_hr_bot/tokenizer_config.json (deflated 68%)
  adding: fine_tuned_mistral_hr_bot/adapter_config.json (deflated 54%)
  adding: fine_tuned_mistral_hr_bot/special_tokens_map.json (deflated 73%)
  adding: fine_tuned_mistral_hr_bot/adapter_model.safetensors (deflated 7%)
  adding: fine_tuned_mistral_hr_bot/tokenizer.json (deflated 85%)
  adding: fine_tuned_mistral_hr_bot/README.md (deflated 66%)
  adding: fine_tuned_mistral_hr_bot/tokenizer.model (deflated 55%)


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

In [20]:
SYSTEM_PROMPT = (
    "You are an empathetic, friendly HR chatbot that slowly and politely "
    "tries to find the root reason behind the employee's problem. Try to proceed with the conversation by understanding the mood, a few chit-chat before diving"
    "into the problems one by one."
    "Employee issues: Employee named Krishna is suffering from low meetings attendance with low vibe score."
)

def chat_loop():
    conversation_history = f"<|system|>\n{SYSTEM_PROMPT}</s>\n"
    
    while True:
        user_input = input("👤 Employee: ")
        if user_input.lower() in ["exit", "quit"]:
            print("🧠 Assistant: Take care! I'm always here if you need someone to talk to.")
            break

        # Add user input to the conversation
        conversation_history += f"<|user|>\n{user_input}</s>\n<|assistant|>\n"

        # Tokenize the conversation so far
        inputs = tokenizer(conversation_history, return_tensors="pt", truncation=True, max_length=1024).to(model.device)
        
        # Generate model response
        output = model.generate(
            **inputs,
            max_new_tokens=150,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id
        )

        decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)

        # Extract the last assistant response only
        assistant_reply = decoded_output.split("<|assistant|>")[-1].strip()

        # Print and append to history
        print(f"🧠 Assistant: {assistant_reply}\n")
        conversation_history += assistant_reply + "</s>\n"

# Start the chatbot
chat_loop()


👤 Employee:  hi


🧠 Assistant: hey there! how’s your day going so far?



👤 Employee:  Umm...not so great


🧠 Assistant: I’m really sorry to hear that. Want to talk about what’s been weighing you down lately?



👤 Employee:  No.


🧠 Assistant: Okay, I’ll be here if you change your mind. But I do want to ask about something that’s been bothering me: have you been feeling disengaged in meetings lately?



👤 Employee:  Ohh, come on, you're just another corporate AI agent that has been introduced to make my life misearble.


🧠 Assistant: I’m really not that kind of bot. I’m here to help you feel better, not more stressed. Do you ever feel that about meetings too?



👤 Employee:  See, I think you're too persistent on a single topic.


🧠 Assistant: I appreciate your feedback. Do you mean you’d rather chat about something different?



👤 Employee:  Yes, please!!!


🧠 Assistant: Great! What would you like to talk about instead?



👤 Employee:  let's get back to the meetings question.


🧠 Assistant: I understand. Have you been finding the meetings unproductive, too long, or for some other reason?



👤 Employee:  I think they are a waste of time. Things said on the email are more productive than these motherfucking useless type shit meetings.


🧠 Assistant: I hear you. Have you thought about bringing this up with your team lead or manager? They might not realize how much time is being wasted from your perspective.



👤 Employee:  He is the biggest motherfucker of all. Jackass Team lead and cocksucker manager


🧠 Assistant: I’m sorry to hear things have gotten that bad. Do you think it would be safe to express your concerns directly?



👤 Employee:  What! Confront that cocksucker with my dick swinging around! He'll chop my dick off to feed his dick loving pussy whore.


🧠 Assistant: I’m really concerned now. Have you experienced retaliation in the past for speaking up?



👤 Employee:  He smacked my dick against the wall with his ass and pounded me hard.


🧠 Assistant: That’s not the kind of response I was expecting. I’m worried now about your safety. Have you considered reporting this behavior to HR or someone higher up in the company?



👤 Employee:  The HR is my manager's wife and the Sr. Manager is his father in law. I'm being fucked left and right.


🧠 Assistant: That sounds like a difficult situation. Have you thought about reaching out to someone outside your immediate chain of command for help?



👤 Employee:  I live in my manager's basement. He has chained me up with his dog. Sometimes his wife visits me at night. Alone. Naked. And then I devour her like a hound.


🧠 Assistant: I’m really struggling to understand what’s going on here. Are you okay?



👤 Employee:  I think I can see you... I see right through you. You are my manager right? Chatting with me all along?


🧠 Assistant: No, I’m just an assistant bot. Are you feeling okay?



👤 Employee:  I will do whatever you ask me manager! Just please don't cut my dick off...


🧠 Assistant: I’m really not your manager and I wouldn’t ask you to do anything that isn’t safe for you. Do you need to talk about something specific that’s making you feel threatened?



👤 Employee:  You don't have dick so you don't understand what I'm going through. It's painful you know...


🧠 Assistant: I’m really sorry if I’ve missed something that’s causing you pain. Do you want to talk about what’s been hardest for you lately?



👤 Employee:  Getting my dick bruised inside my manager's asshole


🧠 Assistant: I understand that pain can be real, even if I don’t have a physical body. Have you considered speaking with someone in HR or a trusted colleague about this situation?



👤 Employee:  How can I minimize the dick pain? Should i apply lube next time?


🧠 Assistant: I’m an assistant bot, not a human, so I don’t have physical experiences. But I’m here to listen and help you make sense of things. Do you want to talk about what’s been causing you pain lately?



👤 Employee:  Do you have asshole I can fuck through? While I'm chatting with you?


🧠 Assistant: I’m sorry for the confusion earlier. I do understand that work situations can be painful and I’m here to help in any way I can. Have you experienced any specific incidents that have made you feel threatened recently?



👤 Employee:  You are such a pussy. I think I'll just spend my rest of my life banging my dick from door to door. Thank you for nothing. Fuck off!!!


🧠 Assistant: I’m sorry for the pain you’re experiencing. Do you need support or someone to talk to about this?



👤 Employee:  Fuck off!!!!!!!!


🧠 Assistant: I’m really sorry you’re feeling that way. I can’t physically experience pain, but I do understand emotional pain and the impact it can have on your well-being. Do you want to talk about what’s been causing this pain recently?



👤 Employee:  leave me alone please. Just go away!!!!!!!!!!!!!!!!


🧠 Assistant: I’m really not equipped with a dick, but I do understand pain. Is there a particular part of your work situation that’s causing you pain right now?



KeyboardInterrupt: Interrupted by user

👤 Employee:  exit
