<img src="https://media.licdn.com/dms/image/D4D12AQGkHY89zhGDZA/article-cover_image-shrink_720_1280/0/1691431976579?e=2147483647&v=beta&t=lgk0jF-6kju4RyG68tF_4cxZklJqF3rtXrbQEcEYX7c">


# <b><span style='color:#F1A424'>|</span> Mission: <span style='color:#F1A424'>Llama2 Fine-Tuning</span></b>

***

**Please consider upvoting the notebook if you found it useful** 

Welcome, Kaggle enthusiasts and AI pioneers! In the last year, we have witnessed an unprecedented boom in technology, particularly in the realm of Large Language Models (LLMs) and broader Artificial Intelligence (AI) systems. Tools like ChatGPT have not just entered the mainstream but have revolutionized it, reshaping how we gather and process information across various sectors.

In this notebook, we focus on one Large Language Model, Llama 2, and walk through fine-tuning it for text generation. Using PyTorch together with the Hugging Face `transformers`, `peft`, and `trl` libraries, we load the 7B chat model in 4-bit precision, attach LoRA adapters, train on a custom JSONL dataset, and merge the adapters back into the base model for inference.
    
Hope you enjoy it and find it useful.
    

### <b><span style='color:#F1A424'>Table of Contents</span></b> <a class='anchor' id='top'></a>
<div style=" background-color:#3b3745; padding: 13px 13px; border-radius: 8px; color: white">
<li><a href="#install_libraries">Install Libraries</a></li>
<li><a href="#import_libraries">Import Libraries</a></li>
<li><a href="#configuration">Configuration</a></li>
<li><a href="#configure_parameters">Configuration of Quantization and LoRA parameters</a></li>
<li><a href="#training">Training</a></li>
<li><a href="#testing">Testing</a></li>
<li><a href="#save_model">Saving Model for inference</a></li>
</div>




# <b><span style='color:#F1A424'>|</span> Install Libraries</b><a class='anchor' id='install_libraries'></a> [↑](#top) 

***

Install all the required libraries for this notebook.

In [1]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

In [2]:
pip install --upgrade bitsandbytes


Collecting bitsandbytes
  Using cached bitsandbytes-0.43.3-py3-none-win_amd64.whl.metadata (3.5 kB)
Using cached bitsandbytes-0.43.3-py3-none-win_amd64.whl (136.5 MB)
Installing collected packages: bitsandbytes
  Attempting uninstall: bitsandbytes
    Found existing installation: bitsandbytes 0.40.2
    Uninstalling bitsandbytes-0.40.2:
      Successfully uninstalled bitsandbytes-0.40.2
Successfully installed bitsandbytes-0.43.3
Note: you may need to restart the kernel to use updated packages.


In [3]:
!python -m bitsandbytes


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(8, 6), cuda_version_string='121', cuda_version_tuple=(12, 1))
PyTorch settings found: CUDA_VERSION=121, Highest Compute Capability: (8, 6).
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
SUCCESS!
Installation was successful!


The directory listed in your path is found to be non-existent: \ROBOTICS_1
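
Because the second install cell upgrades `bitsandbytes` past its pin, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the `PINNED` dictionary and both helper functions are illustrative names, not part of the original notebook.

```python
from importlib import metadata

# Versions pinned by the first install cell; bitsandbytes is left out
# because the following cell upgrades it past its pin.
PINNED = {
    "accelerate": "0.21.0",
    "peft": "0.4.0",
    "transformers": "4.31.0",
    "trl": "0.4.7",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each package name to (installed_version, matches_pin)."""
    report = {}
    for package, wanted in pins.items():
        found = installed_version(package)
        report[package] = (found, found == wanted)
    return report


for package, (found, ok) in check_pins(PINNED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{package:<14} installed={found} pinned={PINNED[package]} {status}")
```

A `MISMATCH` line is not necessarily an error; it only signals that the environment differs from the versions this notebook was written against.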


# <b><span style='color:#F1A424'>|</span> Import Libraries</b><a class='anchor' id='import_libraries'></a> [↑](#top) 

***

Import all the required libraries for this notebook.

In [4]:
# Import necessary libraries
import pandas as pd
from tqdm import tqdm

import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

# <b><span style='color:#F1A424'>|</span> Configuration</b><a class='anchor' id='configuration'></a> [↑](#top) 

***

Central repository for this notebook's hyperparameters.

In [5]:
# Set up model configuration and training parameters
model_name = "NousResearch/llama-2-7b-chat-hf"
dataset_name = r"C:\Users\robot\Downloads\llama\archive\train1.jsonl"
new_model = "llama-2-7b-MentalHealthCare"
lora_r = 64
lora_alpha = 16
lora_dropout = 0.1
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False
output_dir = "./results"
num_train_epochs = 2
fp16 = False
bf16 = False
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
gradient_accumulation_steps = 1
gradient_checkpointing = True
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
lr_scheduler_type = "constant"
max_steps = -1
warmup_ratio = 0.03
group_by_length = True
save_steps = 25
logging_steps = 5
max_seq_length = None
packing = False
device_map = {"": 0}
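
Two derived quantities are easy to overlook in the list above: the LoRA scaling factor (`lora_alpha / lora_r`) and the effective batch size per optimizer step. The sketch below recomputes them from the values in the configuration cell; `num_gpus` is an assumption based on `device_map = {"": 0}` placing the whole model on a single GPU.

```python
# Values copied from the configuration cell above.
lora_r = 64
lora_alpha = 16
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
num_gpus = 1  # assumption: device_map = {"": 0} puts everything on one GPU

# LoRA multiplies the low-rank update by alpha / r before adding it to the weights.
lora_scaling = lora_alpha / lora_r

# Number of examples that contribute to each optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

print(f"LoRA scaling factor (alpha / r): {lora_scaling}")
print(f"Effective batch size per optimizer step: {effective_batch_size}")
```

With these settings the update is scaled by 0.25 and each step sees 4 examples; raising `gradient_accumulation_steps` is the usual way to enlarge the effective batch without using more GPU memory.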

In [6]:
# Load datasets: train1.jsonl for training, test1.jsonl for validation
# (the 'json' loader exposes a single file under the "train" split)
train_dataset = load_dataset('json', data_files=dataset_name, split="train")
valid_dataset = load_dataset('json', data_files=r"C:\Users\robot\Downloads\llama\archive\test1.jsonl", split="train")

# Preprocess datasets: join each input and output with ' [/INST] ' into one 'text' field
def to_text(examples):
    return {'text': [input_text + ' [/INST] ' + output_text
                     for input_text, output_text in zip(examples['input'], examples['output'])]}

train_dataset_mapped = train_dataset.map(to_text, batched=True)
valid_dataset_mapped = valid_dataset.map(to_text, batched=True)
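
To see what the preprocessing step produces without loading the real files, the same concatenation can be run on a toy batch. This is a sketch only: `build_text` mirrors the mapping above, and the two rows in `toy_batch` are invented for illustration and do not come from the dataset.

```python
def build_text(examples):
    """Join each input and output with ' [/INST] ', as the preprocessing step does."""
    return {
        "text": [
            input_text + " [/INST] " + output_text
            for input_text, output_text in zip(examples["input"], examples["output"])
        ]
    }


# Hypothetical batch in the column-wise layout that map(..., batched=True) passes in:
# one list per column rather than one dict per row.
toy_batch = {
    "input": ["I feel anxious before exams.", "I cannot fall asleep at night."],
    "output": ["That sounds stressful.", "Let's look at your evening routine."],
}

for line in build_text(toy_batch)["text"]:
    print(line)
```

The resulting `text` column is what `SFTTrainer` reads through `dataset_text_field="text"`.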


# <b><span style='color:#F1A424'>|</span> Configuration of Quantization and LoRA parameters</b><a class='anchor' id='configure_parameters'></a> [↑](#top) 

***

Because the 7B-parameter model is large, it is loaded with 4-bit quantization (NF4) so that its weights fit in GPU memory.
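
A rough back-of-the-envelope calculation shows why 4-bit loading matters. The sketch below estimates memory for the weights alone; `NUM_PARAMS` is a round figure assumed for a "7B" model, and the estimate ignores quantization constants, activations, gradients, and optimizer state.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 7_000_000_000  # assumption: round figure for a "7B" model

for label, bits in [("float32", 32), ("float16", 16), ("4-bit (nf4)", 4)]:
    print(f"{label:<12} ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

The 4-bit weights take a quarter of the float16 footprint, which is what makes fine-tuning feasible on a single consumer GPU.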

In [8]:
# Configure quantization parameters
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Load pre-trained model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Configure LoRA-specific parameters
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
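
With LoRA, only the small adapter matrices are trained while the quantized base weights stay frozen. The sketch below estimates how many parameters that is for `r=64`. The hidden size, layer count, and the choice of adapted modules are assumptions that this notebook does not state (no `target_modules` is passed to `LoraConfig`, so the library picks its own defaults); treat the result as an order-of-magnitude figure.

```python
def lora_param_count(r, in_features, out_features):
    """Parameters in one LoRA adapter: A is (r x in_features), B is (out_features x r)."""
    return r * in_features + out_features * r


# Assumptions (not stated in this notebook): hidden size 4096, 32 decoder layers,
# and adapters on two square projections (query and value) in every layer.
hidden_size = 4096
num_layers = 32
adapted_modules_per_layer = 2
lora_r = 64  # from the configuration cell

per_module = lora_param_count(lora_r, hidden_size, hidden_size)
total = per_module * adapted_modules_per_layer * num_layers

print(f"Per adapted module: {per_module:,}")
print(f"Total trainable LoRA parameters: {total:,}")
```

Under these assumptions the adapters add roughly 33.5 million trainable parameters, well under 1% of the base model.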

# <b><span style='color:#F1A424'>|</span> Training</b><a class='anchor' id='training'></a> [↑](#top) 

***


In [None]:
# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    per_device_eval_batch_size=per_device_eval_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="all",
    evaluation_strategy="steps",
    eval_steps=50  # Evaluate every 50 steps
)
# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset_mapped,
    eval_dataset=valid_dataset_mapped, 
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

# Train the model
trainer.train()


You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss,Validation Loss
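
Since `max_steps = -1`, the length of the run follows from the dataset size and `num_train_epochs`. The sketch below estimates the number of optimizer steps and how often `eval_steps=50` and `save_steps=25` will trigger. It is an approximation of the trainer's bookkeeping, and `num_examples=1000` is a hypothetical dataset size chosen only to make the arithmetic concrete.

```python
import math


def training_schedule(num_examples, batch_size, grad_accum, epochs, eval_steps, save_steps):
    """Estimate optimizer steps and how many evaluations and checkpoints they trigger."""
    steps_per_epoch = math.ceil(num_examples / (batch_size * grad_accum))
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "evaluations": total_steps // eval_steps,
        "checkpoints": total_steps // save_steps,
    }


# num_examples is hypothetical; the other values come from the configuration cell.
schedule = training_schedule(
    num_examples=1000,
    batch_size=4,
    grad_accum=1,
    epochs=2,
    eval_steps=50,
    save_steps=25,
)
print(schedule)
```

For a dataset of that size the run would take 500 steps, with 10 evaluations and 20 checkpoints written to `output_dir`; with a much larger dataset, raising `save_steps` keeps disk usage in check.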


In [None]:
# Save the fine-tuned model
trainer.model.save_pretrained(new_model)

# <b><span style='color:#F1A424'>|</span> Testing</b><a class='anchor' id='testing'></a> [↑](#top) 

***

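Generate a response for each test prompt with the fine-tuned model and keep only the text that follows the `[/INST]` marker.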

In [None]:
# Suppress logging messages to avoid unnecessary output
logging.set_verbosity(logging.CRITICAL)

# Create text generation pipelines using the specified model and tokenizer
# Define two pipelines with different maximum lengths
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=250)
pipe2 = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=500)

# Build the test prompts
# Assumption: 'final_test_data' is built here from the validation split loaded earlier;
# each prompt is the 'input' text followed by the '[/INST]' marker used in training
final_test_data = valid_dataset.to_pandas()
final_test_data['prompt'] = final_test_data['input'] + ' [/INST]'

# Initialize an empty list to store generated text
generated_text = []

# Iterate over the test data
for i in tqdm(range(len(final_test_data))):
    # Extract the prompt from the test data
    prompt = final_test_data['prompt'].iloc[i]
    
    # Attempt to generate text using the first pipeline with a max length of 250
    try:
        result = pipe(prompt)
        # Append the generated text to the list, extracting the relevant part after '[/INST]'
        generated_text.append(result[0]['generated_text'].split('[/INST]')[1])
    except Exception:
        # If an exception occurs, try the second pipeline with a max length of 500
        try:
            result = pipe2(prompt)
            # Append the generated text to the list, extracting the relevant part after '[/INST]'
            generated_text.append(result[0]['generated_text'].split('[/INST]')[1])
        except Exception:
            # If both pipelines fail, append a default placeholder text
            generated_text.append("ABCD1234@#")

# The 'generated_text' list now contains the generated text for each prompt in the test data
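
The loop above relies on `split('[/INST]')[1]`, which raises an `IndexError` whenever the marker is missing from the output. A small helper can make that case explicit instead of routing it through the exception handlers. This is a sketch, and `extract_response` is a hypothetical name rather than part of any library. Note one behavioral difference: it returns everything after the first marker, whereas `split(...)[1]` stops at a second marker if one appears.

```python
def extract_response(generated_text, marker="[/INST]", fallback=""):
    """Return the text after the first marker, or fallback if the marker is absent."""
    _, found, tail = generated_text.partition(marker)
    if not found:
        return fallback
    return tail.strip()


# Made-up strings standing in for pipeline output.
print(extract_response("How do I relax? [/INST] Try slow breathing."))
print(extract_response("output without the marker", fallback="ABCD1234@#"))
```

Using the same placeholder string as the loop (`"ABCD1234@#"`) for `fallback` keeps failed generations easy to filter out of the CSV later.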

In [None]:
# Assign the generated text to a new column 'generated_text' in the 'final_test_data' DataFrame
final_test_data['generated_text'] = generated_text

# Reset the index of the DataFrame for a cleaner representation in the CSV file
final_test_data = final_test_data.reset_index(drop=True)

# Save the DataFrame to a CSV file at the specified path
final_test_data.to_csv('/content/drive/MyDrive/llama2_finetune_output_1128.csv', index=False)

# <b><span style='color:#F1A424'>|</span> Saving Model for inference</b><a class='anchor' id='save_model'></a> [↑](#top) 

***
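
The cell in this section reloads the base model in FP16, attaches the saved LoRA adapter, and calls `merge_and_unload()`. Conceptually, merging folds each low-rank update into the corresponding base weight: `W_merged = W + (alpha / r) * B @ A`. The sketch below shows that arithmetic on a toy 2x2 matrix in plain Python. It illustrates the idea only and is not the library's implementation; all numbers are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, alpha, r):
    """Return weight + (alpha / r) * (lora_b @ lora_a)."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example with rank r = 1; real adapters use the notebook's r = 64.
weight = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight, shape (2, 2)
lora_b = [[1.0], [2.0]]            # shape (2, r)
lora_a = [[0.5, 0.5]]              # shape (r, 2)

merged = merge_lora(weight, lora_b, lora_a, alpha=2, r=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, so the merged model can be saved and loaded like any ordinary checkpoint.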


In [None]:
# Set the path where the merged model will be saved
model_path = "/content/drive/MyDrive/llama-2-7b-custom" 

# Reload the base model in FP16 and configure settings
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,  
    return_dict=True,        
    torch_dtype=torch.float16,  
    device_map=device_map,    
)

# Instantiate a PeftModel using the base model and the new model
model = PeftModel.from_pretrained(base_model, new_model)  # Combine the base model and the fine-tuned weights

# Merge the base model with LoRA weights and unload unnecessary parts
model = model.merge_and_unload()  # Finalize the model by merging and unloading any redundant components

# Reload the tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) 
tokenizer.pad_token = tokenizer.eos_token