## QLoRA Fine-tuning of Llama-2-7B on the RAFT-generated dataset

The model was fine-tuned with QLoRA for memory-efficient training on the RAFT dataset. Quantizing the frozen base weights to 4 bits and training only small LoRA adapters made it possible to fine-tune Llama-2-7B within the available GPU memory.

The training implementation and hyperparameters were informed by the QLoRA paper's recommendations.
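
To make the memory argument concrete, the following back-of-the-envelope sketch compares weight storage at 16-bit and 4-bit precision and counts the LoRA parameters for rank 8. The figures are assumptions, not measurements from this notebook: roughly 6.74 billion parameters, a hidden size of 4096, 32 decoder layers, and adapters on two square attention projections per layer (PEFT's default targets for Llama-style models are `q_proj` and `v_proj`). Quantization constants, activations, and optimizer state are ignored.

```python
# Rough estimate only; all model dimensions below are assumptions about Llama-2-7B.
NUM_PARAMS = 6_740_000_000  # approximate parameter count
HIDDEN_SIZE = 4096
NUM_LAYERS = 32


def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


def lora_param_count(hidden_size: int, num_layers: int, rank: int,
                     modules_per_layer: int = 2) -> int:
    """Trainable parameters when every adapted module is a square projection.

    Each adapter adds two matrices: (hidden_size x rank) and (rank x hidden_size).
    """
    per_module = rank * (hidden_size + hidden_size)
    return num_layers * modules_per_layer * per_module


fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)
trainable = lora_param_count(HIDDEN_SIZE, NUM_LAYERS, rank=8)

print(f"fp16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {nf4_gib:.1f} GiB")
print(f"LoRA trainable parameters (r=8): {trainable:,}")
```

Under these assumptions the adapters hold about 4.2 million parameters, a small fraction of a percent of the base model, which is why only the 4-bit base weights dominate the memory budget.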

In [None]:
!pip install torch torchvision datasets transformers tokenizers bitsandbytes peft accelerate trl
!pip install flash-attn

Collecting bitsandbytes
  Downloading bitsandbytes-0.48.2-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting trl
  Downloading trl-0.25.1-py3-none-any.whl.metadata (11 kB)
Downloading bitsandbytes-0.48.2-py3-none-manylinux_2_24_x86_64.whl (59.4 MB)
Downloading trl-0.25.1-py3-none-any.whl (465 kB)
Installing collected packages: bitsandbytes, trl
Successfully installed bitsandbytes-0.48.2 trl-0.25.1
Collecting flash-attn
  Downloading flash_attn-2.8.3.tar.gz (8.4 MB)

In [None]:
import gc
import json
import torch
from tqdm import tqdm
from trl import SFTTrainer
from datasets import load_dataset
from huggingface_hub import notebook_login
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig

In [None]:
# see: https://huggingface.co/docs/hub/security-tokens
# must be write token to push model later
hf_token = "hfxxxxxxx"

# https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
base_model = "meta-llama/Llama-2-7b-chat-hf"

# name for output model
target_model = "sahdhif/Llama-2-7b-chat-hf-mental-health"

In [None]:
def get_base_prompt():
    return """
    You are a knowledgeable and supportive psychologist. You provide emphatic, non-judgmental responses to users seeking
    emotional and psychological support. Provide a safe space for users to share and reflect, focus on empathy, active
    listening and understanding.
    """

In [None]:
def preprocess_text(input_dict):
    """
    Preprocess the input dictionary to be in the required format.

    Args:
    input_dict (dict): The input dictionary to be preprocessed

    Returns:
    str: The preprocessed text in the required format
    """
    # Extract messages from the input dictionary
    messages = input_dict['messages']

    # Extract the system message
    system_message = next(msg['content'] for msg in messages if msg['role'] == 'system')

    # Extract the user message
    user_message = next(msg['content'] for msg in messages if msg['role'] == 'user')

    # Extract the assistant message
    assistant_message = next(msg['content'] for msg in messages if msg['role'] == 'assistant')

    # Construct the output in the required format
    output = f"### System: {system_message}\n\n### User: {user_message}\n\n### Assistant: {assistant_message}"

    return output
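
`preprocess_text` calls `next(...)` without a default, so a record that lacks any of the three roles raises `StopIteration`. The sketch below is a hypothetical, more defensive variant that produces the same `### System / ### User / ### Assistant` layout. The names `first_content` and `format_conversation` and the empty-string fallback are assumptions for illustration, not part of the original notebook.

```python
def first_content(messages, role, default=""):
    """Return the content of the first message with the given role, or a default."""
    return next(
        (m.get("content", default) for m in messages if m.get("role") == role),
        default,
    )


def format_conversation(record):
    """Hypothetical defensive variant of preprocess_text (same output layout)."""
    messages = record.get("messages", [])
    system = first_content(messages, "system")
    user = first_content(messages, "user")
    assistant = first_content(messages, "assistant")
    return f"### System: {system}\n\n### User: {user}\n\n### Assistant: {assistant}"


example = {
    "messages": [
        {"role": "system", "content": "You are a supportive psychologist."},
        {"role": "user", "content": "I feel anxious before meetings."},
        {"role": "assistant", "content": "That sounds stressful. What happens just before?"},
    ]
}
print(format_conversation(example))
```

Whether a missing role should become an empty string or cause the record to be skipped depends on the dataset; silently filling blanks is only reasonable if such records are rare.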

In [None]:
with open('./output.jsonl', 'r') as json_file:
    dataset = list(json_file)

FileNotFoundError: [Errno 2] No such file or directory: './output.jsonl'

In [None]:
import json
import os

# Target file name
jsonl_path = "./output.jsonl"

# 1. Check whether the file exists
if not os.path.exists(jsonl_path):
    print(f"📁 File '{jsonl_path}' not found.")

    # 2. Try to upload it (Google Colab)
    try:
        from google.colab import files
        print("📤 Please upload your JSONL file (e.g. output.jsonl)...")
        uploaded = files.upload()  # Opens an upload dialog

        # Make sure a file was actually uploaded
        if uploaded:
            # Use the first uploaded file
            original_name = list(uploaded.keys())[0]
            # Rename it to output.jsonl to simplify the following steps
            os.rename(original_name, "output.jsonl")
            jsonl_path = "./output.jsonl"
            print(f"✅ File renamed and ready: {jsonl_path}")
        else:
            raise FileNotFoundError("No file uploaded.")

    except ImportError:
        # Not in Colab, so we are running locally
        raise FileNotFoundError(
            "Please place 'output.jsonl' in the same folder as this notebook."
        )

# 3. Load the JSONL dataset
print("📖 Loading the JSONL dataset...")
dataset = []
with open(jsonl_path, 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line:  # Skip empty lines
            dataset.append(json.loads(line))

print(f"✅ {len(dataset)} examples loaded.")


📁 File './output.jsonl' not found.
📤 Please upload your JSONL file (e.g. output.jsonl)...


Saving output.jsonl to output.jsonl
✅ File renamed and ready: ./output.jsonl
📖 Loading the JSONL dataset...
✅ 6034 examples loaded.


In [None]:
import json
print(json.dumps(dataset[400], indent=2, ensure_ascii=False))

{
  "messages": [
    {
      "content": "You are a knowledgeable and supportive psychologist. You provide emphatic, non-judgmental responses to users seeking\n    emotional and psychological support. Provide a safe space for users to share and reflect, focus on empathy, active\n    listening and understanding",
      "role": "system"
    },
    {
      "content": "<DOCUMENT>I’m ready to let you go. BOX 14.1\n What the Professor Really Means\nSchismogenesis: A term coined by Deborah Tannen \nsuggesting that exaggerated conversation styles become intensiﬁ  ed under stress, thus adding to miscommunication. Metamessages: The underlying intention of verbal \ncommunication when people are indirect with their comments, thus adding to miscommunication.Reprinted by permission of J.</DOCUMENT>\n<DOCUMENT>The gratitude showed; the sparkle in her eyes said it all. Behavior Modiﬁ  cation\n223\n56147_CH09_216_228.indd   22356147_CH09_216_228.indd   223 9/29/08   11:06:18 PM9/29/08   11:06:18 

In [None]:
def preprocess_text(datapoint):
    """
    Extract the user message from a formatted conversation.
    """
    if "messages" not in datapoint:
        raise KeyError("Field 'messages' missing from datapoint")

    messages = datapoint["messages"]

    # Find the first message with role == "user"
    user_message = None
    for msg in messages:
        if msg.get("role") == "user":
            user_message = msg.get("content", "").strip()
            break

    # If no user message was found, use the last non-system message
    if not user_message:
        for msg in reversed(messages):
            if msg.get("role") != "system":
                user_message = msg.get("content", "").strip()
                break

    # If there is still nothing, take the first message's content
    if not user_message:
        user_message = messages[0].get("content", "").strip() if messages else ""

    # Basic cleanup (optional)
    user_message = user_message.replace("\n", " ").strip()

    return {
        "text": user_message,
        "label": datapoint.get("label", "unknown"),  # keep the label if present
        # Other metadata can be added here if needed
    }

In [None]:
# Make sure preprocess_text is defined before the call below
def preprocess_text(datapoint):
    if "messages" not in datapoint:
        raise KeyError("Field 'messages' missing from datapoint")

    messages = datapoint["messages"]
    user_message = None

    # Look for the user's message
    for msg in messages:
        if msg.get("role") == "user":
            user_message = msg.get("content", "").strip()
            break

    # Fallback: last non-system message
    if not user_message:
        for msg in reversed(messages):
            if msg.get("role") != "system":
                user_message = msg.get("content", "").strip()
                break

    # Last resort: content of the first message
    if not user_message and messages:
        user_message = messages[0].get("content", "").strip()

    # user_message is still None when the message list is empty, so guard before cleaning
    user_message = (user_message or "").replace("\n", " ").strip()

    return {
        "text": user_message,
        "label": datapoint.get("label", "unknown")
    }

# -----------------------------
# CORRECTED FUNCTION CALL
# -----------------------------
# preprocess_dataset_to_jsonl must already be defined in an earlier cell;
# its definition does not appear in the cells shown here.
preprocess_dataset_to_jsonl(
    input_dataset=dataset,
    output_file='processed_outputs.jsonl',
    preprocess_text_func=preprocess_text  # ⚠️ correct parameter name
)

print("✅ Dataset preprocessing complete. Output saved to processed_outputs.jsonl")

Preprocessing dataset: 100%|██████████| 6034/6034 [00:00<00:00, 12159.52it/s]


✅ Dataset preprocessing complete. Output saved to processed_outputs.jsonl
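
The helper `preprocess_dataset_to_jsonl` is called above but its definition is not among the cells shown. A minimal sketch of what such a helper might look like follows. The body is an assumption inferred only from the call's keyword arguments and the output file name. The progress bar seen in the output suggests the original used `tqdm`, which is left out here to keep the sketch dependency-free.

```python
import json
import os
import tempfile


def preprocess_dataset_to_jsonl(input_dataset, output_file, preprocess_text_func):
    """Hypothetical helper: apply a function to each record, write one JSON object per line.

    Returns the number of records written.
    """
    written = 0
    with open(output_file, "w", encoding="utf-8") as out:
        for record in input_dataset:
            processed = preprocess_text_func(record)
            out.write(json.dumps(processed, ensure_ascii=False) + "\n")
            written += 1
    return written


# Tiny demonstration with made-up records and a throwaway output path.
demo_records = [
    {"messages": [{"role": "user", "content": "first question"}]},
    {"messages": [{"role": "user", "content": "second question"}]},
]
demo_path = os.path.join(tempfile.gettempdir(), "processed_demo.jsonl")
count = preprocess_dataset_to_jsonl(
    input_dataset=demo_records,
    output_file=demo_path,
    preprocess_text_func=lambda r: {"text": r["messages"][0]["content"]},
)
print(f"{count} records written to {demo_path}")
```

Writing with `ensure_ascii=False` and an explicit UTF-8 encoding keeps typographic quotes and accented characters from the source documents readable in the output file.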


In [None]:
with open('./processed_outputs.jsonl', 'r') as json_file:
    dataset2 = list(json_file)

In [None]:
print(json.loads(dataset2[40]))

{'text': '<DOCUMENT>Wrong. Multi-tasking\tactually\tsacrifices\tyour\tquality\tof\twork,\tas\tthe\tbrain\tis\tsimply incapable\tof\tperforming\tat\ta\thigh\tlevel\tin\tmultiple\tactivities\tat\tonce. Let’s\tsay\tyou’re\tin\ta\tmeeting\twhere\tseveral\tideas\tare\tbeing\tshared.</DOCUMENT> <DOCUMENT>Maybe one or two coworkers aren’ t fans of yours, but most are probably pretty neutral about you. “If I go out to the bar with my friends, I know all kinds of annoying things will go wr ong with the night.” (Fortune-telling) Alternative:  Soc ial events hardly ever turn  out exactly as we predict or anticipate, good or bad. The more social experience you get, the more this point will be driven home. “I can’t see myself becoming extr emely charismatic so I don’t see the point in working on my people skills.” (Black-and-white thinking) Alternative:  Even tweaking your social sk ills a little can make a big dif ference in the quality of your life. Y ou only need average peo pl

In [None]:
def train_mental_health_model():
    # SFTConfig is imported locally so this cell does not depend on the import cell.
    # Recent TRL releases (0.25.1 is installed above) take the sequence length and
    # packing options on SFTConfig rather than as SFTTrainer keyword arguments.
    from trl import SFTConfig

    # Use FlashAttention-2 only on CUDA GPUs with compute capability 8.0+ (Ampere or newer)
    attn_implementation = None  # fall back to the default attention implementation
    if torch.cuda.is_available():
        gpu_name = torch.cuda.get_device_name(0)
        major, _ = torch.cuda.get_device_capability(0)
        if major >= 8:
            attn_implementation = "flash_attention_2"
        else:
            print(f"Warning: your GPU ({gpu_name}) predates Ampere and is not supported by "
                  f"FlashAttention-2; using the default attention implementation.")

    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        token=hf_token,
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=False
        ),
        torch_dtype=torch.float16,  # dtype for the modules that are not quantized
        attn_implementation=attn_implementation
    )

    # LoRA config loosely based on the QLoRA paper
    peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.1,
        r=8,
        bias="none",
        task_type="CAUSAL_LM"
    )
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, peft_config)

    args = SFTConfig(
        output_dir=target_model,  # model output directory
        overwrite_output_dir=True,  # overwrite output if exists
        num_train_epochs=2,  # number of training epochs
        per_device_train_batch_size=2,  # batch size per device during training
        gradient_checkpointing=True,  # saves memory at the cost of slower training
        logging_steps=10,  # log every 10 steps
        learning_rate=1e-4,  # learning rate
        max_grad_norm=0.3,  # max gradient norm based on QLoRA paper
        warmup_ratio=0.03,  # has no effect with the "constant" scheduler below
        optim="paged_adamw_8bit",  # memory-efficient variant of AdamW optimizer
        lr_scheduler_type="constant",  # constant learning rate
        save_strategy="epoch",  # save at the end of each epoch
        eval_strategy="epoch",  # named evaluation_strategy in older transformers releases
        fp16=True,  # 16-bit mixed-precision training instead of 32-bit to save memory
        #tf32=True  # optimize for tensor cores (NVIDIA A100)
        max_length=1024,  # named max_seq_length in older TRL releases
        packing=True,  # pack several short examples into one sequence
        hub_model_id=target_model,  # repository that push_to_hub uploads to
        hub_token=hf_token,
    )

    tokenizer = AutoTokenizer.from_pretrained(base_model, token=hf_token)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"

    # use a subset of the samples to shorten training and evaluation
    dataset = load_dataset("json", data_files="output.jsonl", split="train")
    train_dataset = dataset.select(range(2000))
    eval_dataset = dataset.select(range(2000, 2500))

    # The model is already wrapped by get_peft_model, so peft_config is not passed again.
    trainer = SFTTrainer(
        model=model,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        processing_class=tokenizer,  # named tokenizer in older TRL releases
        args=args
    )

    gc.collect()
    torch.cuda.empty_cache()

    trainer.train()
    trainer.save_model()
    # The first positional argument of push_to_hub is the commit message, not the
    # repository name; the repository comes from hub_model_id above.
    trainer.push_to_hub()
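
The GPU check in `train_mental_health_model` does not verify that the `flash-attn` package actually installed, and the source build triggered by `pip install flash-attn` can fail. The sketch below shows one way the decision could be factored into a small, testable function. The function name, its parameters, and the choice of `"sdpa"` as fallback are assumptions for illustration, not the notebook's method.

```python
import importlib.util


def pick_attn_implementation(cuda_available, compute_capability, flash_attn_installed=None):
    """Return "flash_attention_2" only when it is likely to work, otherwise "sdpa".

    compute_capability is a (major, minor) tuple; FlashAttention-2 targets
    compute capability 8.0 and above (Ampere or newer).
    """
    if flash_attn_installed is None:
        # Check for the package without importing it.
        flash_attn_installed = importlib.util.find_spec("flash_attn") is not None
    major, _minor = compute_capability
    if cuda_available and major >= 8 and flash_attn_installed:
        return "flash_attention_2"
    return "sdpa"


print(pick_attn_implementation(True, (8, 0), flash_attn_installed=True))   # flash_attention_2
print(pick_attn_implementation(True, (7, 5), flash_attn_installed=True))   # sdpa
print(pick_attn_implementation(False, (0, 0)))                             # sdpa
```

In the notebook, the first two arguments would come from `torch.cuda.is_available()` and `torch.cuda.get_device_capability(0)`, and the result would be passed as `attn_implementation` to `from_pretrained`.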


In [None]:
from huggingface_hub import login

# Replace with your actual token
login(token="hf_xxxx")

In [None]:
from huggingface_hub import notebook_login

notebook_login()


In [None]:
from transformers import AutoTokenizer

try:
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
    print("✅ Tokenizer loaded successfully!")
except Exception as e:
    print("❌ Error:", e)

In [None]:
train_mental_health_model()