# Fine-tune the Mistral 7B LLM for Sentiment Analysis

Large Language Models (LLMs) like Mistral 7B are powerful but computationally expensive to fine-tune. This is due to their massive number of parameters. To address this, a technique called Parameter-Efficient Fine-Tuning (PEFT) has emerged.

PEFT covers several methods for fine-tuning LLMs efficiently. One of them is Low-Rank Adaptation (LoRA), which freezes the pre-trained weight matrices and learns a low-rank update for each targeted matrix, expressed as the product of two much smaller trainable matrices. This sharply reduces the number of trainable parameters, which lowers memory usage and speeds up training.

QLoRA extends LoRA by quantizing the frozen base model to 4-bit precision (this notebook uses the NF4 data type). Quantization reduces the memory footprint without sacrificing much performance.

By combining LoRA and quantization, QLoRA makes fine-tuning large language models like Mistral 7B feasible on modest hardware. In this notebook, we will fine-tune Mistral 7B with QLoRA.
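To make the idea of quantization concrete, here is a toy sketch of block-wise absmax quantization to 4 bits in plain Python. It is only an illustration: it uses evenly spaced integer levels, whereas the NF4 data type used by QLoRA uses a non-uniform set of 16 levels, and the function names and sample weights are made up for this example.

```python
def quantize_block(values, n_bits=4):
    """Symmetric absmax quantization of one block to signed n_bits integers."""
    q_max = 2 ** (n_bits - 1) - 1                      # 7 for 4 bits
    scale = max(abs(v) for v in values) / q_max or 1.0  # fall back to 1.0 for an all-zero block
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize_block(codes, scale):
    """Map the integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical weights for one block
weights = [0.12, -0.53, 0.07, 0.91, -0.33, 0.0, 0.48, -0.76]

codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:   ", codes)
print("restored:", [round(r, 3) for r in restored])
print("max abs error:", round(max_err, 4))
```

Each weight is stored as a small integer plus one shared scale per block, and the rounding error per weight is at most half the scale. That scale is the "quantization constant" that double quantization later compresses.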


To implement QLoRA, we'll leverage these essential libraries:

1. Hugging Face Transformers: A powerful library for working with state-of-the-art language models. It provides access to a wide range of pre-trained models and tokenizers.
2. PEFT (Parameter-Efficient Fine-Tuning): This library allows us to fine-tune only a small subset of model parameters, significantly reducing computational costs and training time.
3. bitsandbytes: This library enables quantization, a technique that reduces the precision of model parameters to 8-bit or 4-bit (we use 4-bit here), cutting memory usage.
4. Accelerate: This library handles device placement, mixed precision, and multi-GPU execution for PyTorch training.
5. trl (Transformer Reinforcement Learning): This library provides trainers for post-training language models, including reinforcement learning methods. Here we use its SFTTrainer for supervised fine-tuning.
6. Einops: This library simplifies tensor operations, making code more concise and readable.
7. Datasets: This library provides easy access to a variety of datasets, including those from the Hugging Face Hub.

By combining these libraries, we can efficiently fine-tune large language models like Mistral 7B using QLoRA.

In [None]:
!pip install -q trl transformers accelerate peft datasets bitsandbytes einops

### Import the libraries

In [None]:
import os
import gc

import pandas as pd
import torch
import transformers
from datasets import load_dataset, Dataset, DatasetDict
from peft import (
    LoraConfig,
    PeftConfig,
    PeftModel,
    TaskType,
    get_peft_model,
    prepare_model_for_kbit_training,
)
from sklearn.model_selection import train_test_split
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from trl import SFTTrainer



### Dataset
The [FinGPT sentiment dataset](https://huggingface.co/datasets/FinGPT/fingpt-sentiment-train) contains financial news and tweets labeled with sentiment.

Load the dataset and convert it into a pandas DataFrame.

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
data = load_dataset('FinGPT/fingpt-sentiment-train', split='train')

In [None]:
data = pd.DataFrame(data)

data.head(10)

Unnamed: 0,input,output,instruction
0,"Teollisuuden Voima Oyj , the Finnish utility k...",neutral,What is the sentiment of this news? Please cho...
1,Sanofi poaches AstraZeneca scientist as new re...,neutral,What is the sentiment of this news? Please cho...
2,Starbucks says the workers violated safety pol...,moderately negative,What is the sentiment of this news? Please cho...
3,$brcm raises revenue forecast,positive,What is the sentiment of this tweet? Please ch...
4,Google parent Alphabet Inc. reported revenue a...,moderately negative,What is the sentiment of this news? Please cho...
5,The Finnish company Stockmann has signed the c...,neutral,What is the sentiment of this news? Please cho...
6,"Bernie Madoff, the former Wall Street investme...",neutral,What is the sentiment of this tweet? Please ch...
7,Wall Street's 7 Highest-Rated Stocks to Buy,neutral,What is the sentiment of this tweet? Please ch...
8,Here we highlight some top-ranked technology E...,mildly positive,What is the sentiment of this news? Please cho...
9,Financial terms were not disclosed .,neutral,What is the sentiment of this news? Please cho...


In [None]:
data = data[['input', 'output']]
data = data.rename(columns={'input': 'text', 'output': 'label'})
data.head()

Unnamed: 0,text,label
0,"Teollisuuden Voima Oyj , the Finnish utility k...",neutral
1,Sanofi poaches AstraZeneca scientist as new re...,neutral
2,Starbucks says the workers violated safety pol...,moderately negative
3,$brcm raises revenue forecast,positive
4,Google parent Alphabet Inc. reported revenue a...,moderately negative


The labels are already text (for example, neutral or moderately negative), so no conversion is needed.

We'll format each example so that it contains both the input text and the desired output, separated by " ->: ". The model is trained on this combined string, so it learns to continue a text with its sentiment label.

In [None]:
data['formatted_data'] = data.apply(lambda row: str(row['text']) + " ->: " + row['label'], axis = 1)
data.head()

Unnamed: 0,text,label,formatted_data
0,"Teollisuuden Voima Oyj , the Finnish utility k...",neutral,"Teollisuuden Voima Oyj , the Finnish utility k..."
1,Sanofi poaches AstraZeneca scientist as new re...,neutral,Sanofi poaches AstraZeneca scientist as new re...
2,Starbucks says the workers violated safety pol...,moderately negative,Starbucks says the workers violated safety pol...
3,$brcm raises revenue forecast,positive,$brcm raises revenue forecast ->: positive
4,Google parent Alphabet Inc. reported revenue a...,moderately negative,Google parent Alphabet Inc. reported revenue a...


In [None]:
data['formatted_data'][0]

'Teollisuuden Voima Oyj , the Finnish utility known as TVO , said it shortlisted Mitsubishi Heavy s EU-APWR model along with reactors from Areva , Toshiba Corp. , GE Hitachi Nuclear Energy and Korea Hydro & Nuclear Power Co. . ->: neutral'
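Since the same separator has to be used at training and at inference time, it can help to keep the formatting in one place. The sketch below is one way to do that. The helper names format_example and format_prompt are hypothetical and not part of this notebook, and ending the inference prompt with the separator is an assumption about how one might prompt the fine-tuned model.

```python
SEPARATOR = " ->: "  # same separator as in the formatted_data column


def format_example(text: str, label: str) -> str:
    """Training string: input text, separator, then the target label."""
    return str(text) + SEPARATOR + label


def format_prompt(text: str) -> str:
    """Inference prompt: input text and separator (without the trailing space),
    leaving the label for the model to generate."""
    return str(text) + SEPARATOR.rstrip()


print(format_example("$brcm raises revenue forecast", "positive"))
print(format_prompt("$brcm raises revenue forecast"))
```

The first call reproduces the string stored in formatted_data for that row; the second stops right after the separator.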

Let's split the data into train and test splits. The full training split is kept here; because compute is limited, the amount of data actually used for training is capped later through max_steps in the training arguments.

In [None]:
train_df, test_df = train_test_split(data, test_size=0.2, random_state=42)

In [None]:
train_df.head()

Unnamed: 0,text,label,formatted_data
65378,$FB bot some @78.47 breakout from the consolid...,positive,$FB bot some @78.47 breakout from the consolid...
30092,It is expected to be online by 2011 .,neutral,It is expected to be online by 2011 . ->: neutral
38866,$PCLN Back to Back intraday reversals look out...,negative,$PCLN Back to Back intraday reversals look out...
75692,The FRED Blog maps economic uncertainty around...,neutral,The FRED Blog maps economic uncertainty around...
34795,Intel (INTC) stock is falling on Wednesday aft...,moderately negative,Intel (INTC) stock is falling on Wednesday aft...


In [None]:
test_df.head()

Unnamed: 0,text,label,formatted_data
55022,Netflix is rolling out a test to limit passwor...,mildly negative,Netflix is rolling out a test to limit passwor...
46669,Should We Worry About Monster Beverage Corpora...,neutral,Should We Worry About Monster Beverage Corpora...
62135,$FIO won't stay down,positive,$FIO won't stay down ->: positive
54882,The new shares will provide the shareholders w...,neutral,The new shares will provide the shareholders w...
46836,Finnlines estimated in its annual general meet...,negative,Finnlines estimated in its annual general meet...


The trainer expects a Hugging Face Dataset, so we convert the training DataFrame and wrap it in a DatasetDict.

In [None]:
train_dict = DatasetDict({
    'train': Dataset.from_pandas(train_df)
})

In [None]:
train_dict

DatasetDict({
    train: Dataset({
        features: ['text', 'label', 'formatted_data', '__index_level_0__'],
        num_rows: 61417
    })
})

## Load the base model with quantization.

To fit the model in limited GPU memory, we'll use the bitsandbytes library to load the base model in 4-bit precision, which significantly reduces memory consumption.

We'll use the AutoModelForCausalLM class from the Hugging Face Transformers library to load the model and prepare it for training. The trust_remote_code parameter, when set to True, allows custom code from a model repository on the Hugging Face Hub to be executed; Mistral is supported natively by Transformers, so the model load below does not need it. To avoid unnecessary memory usage during training, we'll disable the KV cache by setting model.config.use_cache to False just before training.

To reduce memory further, we'll enable gradient checkpointing. This technique discards most intermediate activations during the forward pass and recomputes them during the backward pass, trading extra computation time for memory savings.

Additionally, we'll enable double quantization, which reduces the memory footprint further by quantizing the quantization constants themselves. This can cost a little accuracy, but it helps on large models or in resource-constrained environments.
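A rough back-of-the-envelope estimate shows what these choices mean for weight storage. The sketch below counts weights only (no activations, optimizer state, or adapters) and simplifies by treating every parameter as quantized. The block sizes and constant widths are the settings described in the QLoRA paper and are assumptions here, not values read from the loaded model; the parameter count is derived from the print_trainable_parameters output shown later in this notebook.

```python
def weight_memory_gib(n_params: int, bits_per_param: float) -> float:
    """Approximate weight storage in GiB for a given average bit width."""
    return n_params * bits_per_param / 8 / 1024**3


# Base-model parameters: "all params" minus "trainable params"
N_PARAMS = 7_283_675_136 - 41_943_040

# Overhead of the quantization constants, in bits per parameter (assumed settings)
BLOCK = 64                                               # weights per quantization block
SINGLE_QUANT_OVERHEAD = 32 / BLOCK                       # one 32-bit constant per block
DOUBLE_QUANT_OVERHEAD = 8 / BLOCK + 32 / (BLOCK * 256)   # 8-bit constants, re-quantized in groups of 256

fp16_gib = weight_memory_gib(N_PARAMS, 16)
nf4_gib = weight_memory_gib(N_PARAMS, 4 + SINGLE_QUANT_OVERHEAD)
nf4_dq_gib = weight_memory_gib(N_PARAMS, 4 + DOUBLE_QUANT_OVERHEAD)

print(f"16-bit weights:            {fp16_gib:.2f} GiB")
print(f"4-bit weights:             {nf4_gib:.2f} GiB")
print(f"4-bit + double quantized:  {nf4_dq_gib:.2f} GiB")
```

Under these assumptions, double quantization saves roughly 0.37 bits per parameter on top of the 4-bit weights, which is small per weight but adds up over billions of parameters.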

## Configure LoRA Adapter.

LoRA leverages an adapter technique to introduce trainable parameters into the pre-trained model. This allows for efficient fine-tuning without modifying the original weights.

In the code snippet, we configure the LoRA adapter with the following parameters:

lora_alpha: This scaling factor controls the magnitude of the adapter's contribution: the low-rank update is scaled by lora_alpha / r before being added to the frozen layer's output. A higher value gives the adapter more influence.

lora_dropout: This parameter introduces dropout to the LoRA layers, helping to prevent overfitting.

r: This parameter defines the rank of the low-rank matrices used in the adapter. A higher rank allows for more complex adaptations but also increases the number of trainable parameters.

bias: This parameter determines whether the bias terms of the target modules should be fine-tuned. Setting it to "none" preserves the original bias values.

task_type: This parameter specifies the task for which the model is being fine-tuned, such as text generation or classification.

target_modules: This parameter specifies the layers of the model that will be modified by the LoRA adapter. Typically, linear layers, attention layers, or both are targeted.

By carefully tuning these parameters, we can achieve effective fine-tuning while minimizing the computational cost.
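To see how r and lora_alpha interact, here is a minimal plain-Python sketch of a LoRA-adapted linear layer. It is an illustration of the technique with tiny made-up matrices, not the PEFT library's implementation; the function names and values are hypothetical.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, r, lora_alpha):
    """y = W x + (lora_alpha / r) * B (A x), with W frozen and A, B trainable."""
    scale = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


d_in, d_out, r, lora_alpha = 6, 4, 2, 4

W = [[float(i + j) for j in range(d_in)] for i in range(d_out)]  # frozen weights, d_out x d_in
A = [[0.1] * d_in for _ in range(r)]                             # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                            # trainable, d_out x r, starts at zero
x = [1.0] * d_in

y_initial = lora_forward(W, A, B, x, r, lora_alpha)
print("matches base layer at init:", y_initial == matvec(W, x))

B[0][0] = 1.0  # pretend a training step changed one adapter weight
y_updated = lora_forward(W, A, B, x, r, lora_alpha)
print("output before:", y_initial)
print("output after: ", y_updated)

print("full matrix params:", d_in * d_out, "| LoRA params:", r * (d_in + d_out))
```

Because B starts at zero, the adapted layer initially reproduces the base layer exactly, and training only moves the small A and B matrices. For matrices this small the savings are negligible; they become large when d_in and d_out are in the thousands and r stays small.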

In [None]:
# Model
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
new_model = "mistral-7b-Sentiment"

# QLoRA config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# LoRA config
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']
)

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)
#tokenizer.pad_token = tokenizer.unk_token
tokenizer.pad_token = tokenizer.eos_token

# Load model
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

model.gradient_checkpointing_enable()

# Prepares the model for kbit training
model = prepare_model_for_kbit_training(model)


Get the model tokenizer and set the padding token to be the same as the end-of-sequence token.

In [None]:
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

Combine the quantized base model with LoRA adapter using get_peft_model and pass the peft_config along with the pretrained base model.

In [None]:
# Now you get a model ready for QLoRA training
lora_model = get_peft_model(model, peft_config)
lora_model.print_trainable_parameters()

trainable params: 41,943,040 || all params: 7,283,675,136 || trainable%: 0.5758
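The trainable parameter count can be checked by hand: each adapted linear layer with in_features d_in and out_features d_out adds r * (d_in + d_out) weights. The sketch below uses the layer sizes from Mistral 7B's published configuration (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128); these constants are typed in as assumptions rather than read from the loaded model, and the function name is hypothetical.

```python
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
KV_DIM = 8 * 128  # key/value projections are smaller because of grouped-query attention

# (in_features, out_features) of each targeted projection in one decoder layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(r: int) -> int:
    """Total LoRA weights across all layers for rank r (bias='none')."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


print(f"{lora_trainable_params(16):,}")  # 41,943,040
```

The result matches the count printed above, and it scales linearly with r: halving the rank to 8 would halve the number of trainable parameters.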


We'll fine-tune the LoRA model using the SFTTrainer from the trl library, which builds on the Hugging Face Trainer API.
First, we'll set the training arguments.

**Training Arguments**

The following training arguments influence the efficiency and effectiveness of the fine-tuning process:

output_dir: Specifies the directory where the trained model and its checkpoints will be saved.

per_device_train_batch_size: Determines the number of training samples processed per GPU in each training step.

gradient_accumulation_steps: Accumulates gradients over multiple steps before updating the model, effectively increasing the batch size without requiring more memory.

optim: The optimizer used to update the model's parameters. paged_adamw_8bit, used below, is a bitsandbytes variant of AdamW that keeps optimizer states in 8-bit and pages them to CPU memory when GPU memory runs short.

save_steps: Specifies the frequency at which model checkpoints are saved.

fp16: Enables mixed-precision training, reducing memory usage but potentially impacting accuracy.

logging_steps: Determines the frequency of logging training progress.

learning_rate: Sets the initial learning rate for the optimizer.

max_grad_norm: Clips the gradient norm to prevent exploding gradients and improve stability.

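max_steps: Sets the total number of training steps. When given, it overrides num_train_epochs.

warmup_ratio: Specifies the proportion of training steps during which the learning rate is gradually increased. It only has an effect with schedulers that include a warmup phase.

lr_scheduler_type: Determines the learning rate schedule. The 'constant' schedule used below keeps the learning rate fixed for the whole run, without warmup.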

The optimal values for these hyperparameters can vary depending on factors like the specific language model, the downstream task, and available computational resources. Experimentation is often necessary to find the best configuration.
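To illustrate how warmup_ratio and max_steps would interact under a schedule that does warm up (such as 'constant_with_warmup'), here is a plain-Python sketch of the idea. It is not the Transformers implementation, and the function name is hypothetical; the numbers are the ones used in this notebook.

```python
import math


def constant_with_warmup_lr(step: int, base_lr: float, max_steps: int, warmup_ratio: float) -> float:
    """Linear ramp from 0 to base_lr over the warmup steps, then constant."""
    warmup_steps = math.ceil(max_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


for step in (0, 1, 2, 5, 149):
    print(step, constant_with_warmup_lr(step, base_lr=2e-4, max_steps=150, warmup_ratio=0.03))
```

With 150 steps and a ratio of 0.03, warmup covers only the first 5 steps, so the choice between a constant schedule with or without warmup is likely to matter little for a run this short.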

In [None]:
training_arguments = TrainingArguments(
    output_dir='./results',
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    gradient_accumulation_steps=1,
    eval_steps=0.2,                 # no effect here: no eval dataset or evaluation strategy is set
    optim="paged_adamw_8bit",
    num_train_epochs=3,             # overridden by max_steps below
    save_steps=10,
    logging_steps=10,
    learning_rate=2e-4,
    fp16=True,
    max_grad_norm=0.3,
    max_steps=150,
    warmup_ratio=0.03,              # ignored by the 'constant' scheduler
    group_by_length=True,           # batch samples of similar length to reduce padding
    lr_scheduler_type='constant'
)
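It is worth working out how much data these settings actually consume. The sketch below assumes a single GPU (num_devices=1 is an assumption, as is the helper name) and uses the train split size reported earlier in this notebook.

```python
def examples_seen(per_device_batch_size: int, gradient_accumulation_steps: int,
                  max_steps: int, num_devices: int = 1):
    """Return (effective batch size, total examples processed during training)."""
    effective_batch = per_device_batch_size * gradient_accumulation_steps * num_devices
    return effective_batch, effective_batch * max_steps


TRAIN_ROWS = 61_417  # num_rows reported for the train split

effective_batch, seen = examples_seen(
    per_device_batch_size=10,
    gradient_accumulation_steps=1,
    max_steps=150,
)
print(f"effective batch size: {effective_batch}")
print(f"examples seen: {seen} ({seen / TRAIN_ROWS:.1%} of the train split)")
```

So this run covers only a small fraction of one epoch. Raising gradient_accumulation_steps would increase the effective batch size (and the examples seen per optimizer step) without increasing per-step GPU memory.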

Initialize the SFTTrainer from trl, passing the dataset and the column we want to train on (formatted_data). We also pass the PEFT configuration (the LoRA settings defined earlier), the tokenizer, the maximum sequence length, the model, and the training arguments.

SFTTrainer is designed for supervised fine-tuning.

In [None]:
max_seq_length = 512

trainer = SFTTrainer(
    model= lora_model,
    train_dataset=train_dict['train'],
    peft_config=peft_config,
    dataset_text_field="formatted_data",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


max_steps is given, it will override any value given in num_train_epochs


The base model's weights stay frozen during training; only the LoRA adapter weights are updated. This preserves the pre-trained model's knowledge and keeps the number of trainable parameters small.

For numerical stability, layer normalization layers can be kept in 32-bit precision. The cell below shows how to cast them to float32 by hand. It is left commented out here, since prepare_model_for_kbit_training has already been applied to the model and typically upcasts these layers. The cast trades a little extra memory for more stable training; it does not make fine-tuning faster.

In [None]:
# Loop through the named modules of the trainer's model and
# cast every module whose name contains "norm" to torch.float32
# for name, module in trainer.model.named_modules():
#     if "norm" in name:
#         module = module.to(torch.float32)

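We will set the torch dtype and attention implementation based on the device's compute capability. Note that these two variables are not passed to any model load in this notebook; to take effect, they would need to be passed to from_pretrained.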

In [None]:
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install -qqq flash-attn
    torch_dtype = torch.bfloat16
    attn_implementation = "flash_attention_2"
else:
    torch_dtype = torch.float16
    attn_implementation = "eager"

## Train the model.

In [None]:
# Disabling cache usage in the model configuration
lora_model.config.use_cache = False

trainer.train()
trainer.model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)



Step,Training Loss
10,2.6095
20,2.5197
30,2.5812
40,2.5102
50,2.7655
60,2.4324
70,2.576
80,2.4942
90,2.3862
100,2.5814



('mistral-7b-Sentiment-v0.2/tokenizer_config.json',
 'mistral-7b-Sentiment-v0.2/special_tokens_map.json',
 'mistral-7b-Sentiment-v0.2/tokenizer.model',
 'mistral-7b-Sentiment-v0.2/added_tokens.json',
 'mistral-7b-Sentiment-v0.2/tokenizer.json')

## Save the model and tokenizer.


In [None]:
lora_model.save_pretrained("mistral-7b-Sentiment/")

In [None]:
tokenizer.save_pretrained("mistral-7b-Sentiment/")

('mistral-7b-Sentiment/tokenizer_config.json',
 'mistral-7b-Sentiment/special_tokens_map.json',
 'mistral-7b-Sentiment/tokenizer.model',
 'mistral-7b-Sentiment/added_tokens.json',
 'mistral-7b-Sentiment/tokenizer.json')

## Inference

In [None]:
peft_model = './mistral-7b-Sentiment'
config = PeftConfig.from_pretrained(peft_model)

peft_base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map='auto'
    )

# Load the Lora model
trained_model = PeftModel.from_pretrained(peft_base_model, peft_model)
trained_model_tokenizer = AutoTokenizer.from_pretrained(
    config.base_model_name_or_path,
    trust_remote_code=True
)

trained_model_tokenizer.pad_token = trained_model_tokenizer.eos_token


## Create generation config for prediction.

We take a sample text from the test dataset for inference.

In [None]:
sample = test_df.iloc[7, :]
sample_text = sample['text']

In [None]:
sample_text

"One of the installed elevators will be a double-deck elevator , which enables more efficient use of the building 's core space ."

Tokenize the sample text.

In [None]:
batch = trained_model_tokenizer(sample_text, return_tensors='pt').to("cuda")

We need to create a generation configuration for the inference.

In [None]:
gen_config = GenerationConfig(
    max_new_tokens=5,
    pad_token_id=trained_model_tokenizer.pad_token_id,
    eos_token_id=trained_model_tokenizer.eos_token_id,
    repetition_penalty=2.0,
    num_return_sequences=1,
)

Run generation inside PyTorch's inference mode. torch.inference_mode is a stricter variant of torch.no_grad: it disables gradient tracking and skips additional autograd bookkeeping, which reduces overhead. The attention mask describes one specific input, so it is passed to generate rather than stored in the generation configuration.

In [None]:
with torch.inference_mode():
    result = trained_model.generate(
        input_ids=batch.input_ids,
        attention_mask=batch.attention_mask,
        generation_config=gen_config,
    )

final_output = trained_model_tokenizer.decode(result[0], skip_special_tokens=True)
final_output


"One of the installed elevators will be a double-deck elevator , which enables more efficient use of the building 's core space . ->: neutral <->"
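The decoded output contains the input text, the separator, the predicted label, and sometimes a few stray tokens after it. A small helper can pull out just the label. The sketch below is one possible approach; the function name and the candidate label list are assumptions (the list only contains labels visible in the samples above and may not cover every label in the dataset).

```python
def extract_label(generated: str, candidate_labels, separator: str = "->:"):
    """Return the candidate label that follows the last separator, or None."""
    _, found, tail = generated.rpartition(separator)
    if not found:
        return None
    tail = tail.strip().lower()
    # Try longer labels first so a more specific label wins over a shorter one
    for label in sorted(candidate_labels, key=len, reverse=True):
        if tail.startswith(label):
            return label
    return None


# Labels seen in the samples above (assumed to be incomplete)
LABELS = [
    "neutral", "positive", "negative",
    "mildly positive", "mildly negative", "moderately negative",
]

output_text = (
    "One of the installed elevators will be a double-deck elevator , which enables "
    "more efficient use of the building 's core space . ->: neutral <->"
)
print(extract_label(output_text, LABELS))  # neutral
```

Returning None when no known label follows the separator makes it easy to count unparseable generations when evaluating on the test split.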

In [None]:
def chat():
    while True:
        user_prompt = input("\nType a sentence for sentiment classification: ")
        if user_prompt.strip().lower() in ["exit", "quit"]:
            print("Goodbye!")
            break
        # Encode the user prompt and generate the continuation
        inputs = trained_model_tokenizer(user_prompt, return_tensors="pt").to("cuda")
        with torch.inference_mode():
            output = trained_model.generate(
                input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                generation_config=gen_config,
            )
        # Decode the generated tokens back to text
        generated_text = trained_model_tokenizer.decode(output[0], skip_special_tokens=True)
        print(generated_text)

In [None]:
chat()