# Fine-tune LLaMA2 7b Model with PEFT method for Stock Price Prediction

reference
- https://www.kaggle.com/code/lucamassaron/fine-tune-llama-2-for-sentiment-analysis

As a first step, install the libraries needed to make this work:
- accelerate is a distributed training library for PyTorch by HuggingFace. It lets you train models on multiple GPUs or CPUs in parallel (distributed configurations), which can significantly speed up training when several GPUs are available (I won't use it for that purpose in this work).
- peft is a Python library by HuggingFace for efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all of the model's parameters. PEFT methods fine-tune only a small number of (extra) model parameters, which greatly decreases the computational and storage costs.
- bitsandbytes, by Tim Dettmers, is a lightweight wrapper around custom CUDA functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and quantization functions. It makes it possible to run models stored in 4-bit precision: the weights are stored in 4 bits, but the computation still happens in 16 or 32 bits, and any of these compute types can be chosen (float16, bfloat16, float32, and so on).
- transformers is a Python library for NLP. It provides a number of pre-trained models for NLP tasks such as text classification, question answering, and machine translation.
- trl is a full-stack library by HuggingFace providing a set of tools to train transformer language models with Reinforcement Learning, from the Supervised Fine-tuning (SFT) step and the Reward Modeling (RM) step to the Proximal Policy Optimization (PPO) step.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

The code imports the os module and sets two environment variables:
- CUDA_VISIBLE_DEVICES: This environment variable tells PyTorch which GPUs to use. In this case, the code sets it to 0, which means that PyTorch will use the first GPU.
- TOKENIZERS_PARALLELISM: This environment variable tells the HuggingFace Transformers library whether to parallelize the tokenization process. In this case, the code sets it to false, which means that the tokenization process will not be parallelized.

In [None]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

- The code import warnings; warnings.filterwarnings("ignore") imports the warnings module and sets the warning filter to ignore, so all warnings are suppressed and not displayed. During training there are many warnings that do not prevent the fine-tuning but can be distracting and make you wonder whether you are doing the correct things.

In [None]:
import warnings
warnings.filterwarnings("ignore")
print("1")

In [None]:
!pip install -q -U "accelerate==0.26.1" 

In [None]:
!pip install -q -U "bitsandbytes==0.42.0"

In [None]:
!pip install -q -U  "transformers==4.38.2"

In [None]:
!pip install -q -U  "datasets==2.16.1"

In [None]:
!pip install tensorflow[and-cuda]

In [None]:

!pip install -q -U git+https://github.com/huggingface/peft@4a1559582281fc3c9283892caea8ccef1d6f5a4f

In [None]:
!pip install --upgrade pip
!pip uninstall -y keras
!pip install tensorflow

In [None]:
!pip3 install -q -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

In [None]:
!pip install git+https://github.com/huggingface/trl.git@7630f877f91c556d9e5a3baa4b6e2894d90ff84c

In [None]:
import numpy as np
import pandas as pd
import os
from tqdm import tqdm
import bitsandbytes as bnb
import torch
import torch.nn as nn
import transformers
from datasets import Dataset
from peft import LoraConfig, PeftConfig
from trl import SFTTrainer
from trl import setup_chat_format
from transformers import (AutoModelForCausalLM, 
                          AutoTokenizer, 
                          BitsAndBytesConfig, 
                          TrainingArguments, 
                          pipeline, 
                          logging)
from sklearn.metrics import (accuracy_score, 
                             classification_report, 
                             confusion_matrix)
from sklearn.model_selection import train_test_split

In [None]:
print(f"pytorch version {torch.__version__}")

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"working on {device}")

# Preparing the data and the core evaluation functions
The code in the next cell performs the following steps:
1. Reads the input dataset from the all-data.csv file, a comma-separated values (CSV) file with two columns: sentiment and text.
2. Splits the dataset into training and test sets, with 300 samples per sentiment in each set. The split is done separately for each sentiment, so each set contains the same number of positive, neutral, and negative examples.
3. Shuffles the train data in a replicable order (random_state=10).
4. Transforms the texts contained in the train and test data into prompts to be used by LLaMA: the train prompts contain the expected answer we want to fine-tune the model with.
5. Treats the remaining examples, those in neither train nor test, as evaluation data for reporting purposes during training (it won't be used for early stopping). This data is sampled with replacement to obtain 50 examples per sentiment (negative instances are very few, hence they have to be repeated).
6. Wraps the train and eval data in the Dataset class from HuggingFace's datasets library (backed by the Apache Arrow format).

This prepares, in a single cell, the train_data and eval_data datasets and the X_test prompts to be used in the fine-tuning and testing.

In [None]:
filename = "../input/sentiment-analysis-for-financial-news/all-data.csv"

df = pd.read_csv(filename, 
                 names=["sentiment", "text"],
                 encoding="utf-8", encoding_errors="replace")

X_train = list()
X_test = list()
for sentiment in ["positive", "neutral", "negative"]:
    train, test  = train_test_split(df[df.sentiment==sentiment], 
                                    train_size=300,
                                    test_size=300, 
                                    random_state=42)
    X_train.append(train)
    X_test.append(test)

X_train = pd.concat(X_train).sample(frac=1, random_state=10)
X_test = pd.concat(X_test)

eval_idx = [idx for idx in df.index if idx not in list(X_train.index) + list(X_test.index)]
X_eval = df[df.index.isin(eval_idx)]
X_eval = (X_eval
          .groupby('sentiment', group_keys=False)
          .apply(lambda x: x.sample(n=50, random_state=10, replace=True)))
X_train = X_train.reset_index(drop=True)

def generate_prompt(data_point):
    return f"""
            Analyze the sentiment of the news headline enclosed in square brackets, 
            determine if it is positive, neutral, or negative, and return the answer as 
            the corresponding sentiment label "positive" or "neutral" or "negative".

            [{data_point["text"]}] = {data_point["sentiment"]}
            """.strip()

def generate_test_prompt(data_point):
    return f"""
            Analyze the sentiment of the news headline enclosed in square brackets, 
            determine if it is positive, neutral, or negative, and return the answer as 
            the corresponding sentiment label "positive" or "neutral" or "negative".

            [{data_point["text"]}] = """.strip()

X_train = pd.DataFrame(X_train.apply(generate_prompt, axis=1), 
                       columns=["text"])
X_eval = pd.DataFrame(X_eval.apply(generate_prompt, axis=1), 
                      columns=["text"])

y_true = X_test.sentiment
X_test = pd.DataFrame(X_test.apply(generate_test_prompt, axis=1), columns=["text"])

train_data = Dataset.from_pandas(X_train)
eval_data = Dataset.from_pandas(X_eval)
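
The per-sentiment split and the oversampled evaluation set can also be illustrated without pandas. The following is a minimal sketch on a toy list of (sentiment, text) pairs; the helper name `split_per_class` and the toy sizes are assumptions made for illustration and are not part of the notebook.

```python
import random
from collections import defaultdict

def split_per_class(rows, n_train, n_test, n_eval, seed=42):
    """Take n_train and n_test rows per label; oversample the rest into n_eval rows per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))

    train, test, evaluation = [], [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        train += items[:n_train]
        test += items[n_train:n_train + n_test]
        leftover = items[n_train + n_test:]
        # Sampling with replacement lets a small leftover pool fill n_eval slots.
        evaluation += rng.choices(leftover, k=n_eval)
    rng.shuffle(train)
    return train, test, evaluation

# Toy data: the negative class is deliberately the smallest.
rows = ([("positive", f"p{i}") for i in range(10)]
        + [("neutral", f"n{i}") for i in range(12)]
        + [("negative", f"g{i}") for i in range(7)])
train, test, evaluation = split_per_class(rows, n_train=3, n_test=3, n_eval=5)
print(len(train), len(test), len(evaluation))
```

With only one negative row left over in this toy data, that row is simply repeated to fill the evaluation quota, which is the same effect that replace=True has in the cell above.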

The next step is to create a function that evaluates the results from the fine-tuned sentiment model. The function performs the following steps:
1. Maps the sentiment labels to a numerical representation, where 2 represents positive, 1 represents neutral, and 0 represents negative.
2. Calculates the accuracy of the model on the test data.
3. Generates an accuracy report for each sentiment label.
4. Generates a classification report for the model.
5. Generates a confusion matrix for the model.

In [None]:
def evaluate(y_true, y_pred):
    labels = ['positive', 'neutral', 'negative']
    mapping = {'positive': 2, 'neutral': 1, 'none':1, 'negative': 0}
    def map_func(x):
        return mapping.get(x, 1)
    
    y_true = np.vectorize(map_func)(y_true)
    y_pred = np.vectorize(map_func)(y_pred)
    
    # Calculate accuracy
    accuracy = accuracy_score(y_true=y_true, y_pred=y_pred)
    print(f'Accuracy: {accuracy:.3f}')
    
    # Generate accuracy report
    unique_labels = set(y_true)  # Get unique labels
    
    for label in unique_labels:
        label_indices = [i for i in range(len(y_true)) 
                         if y_true[i] == label]
        label_y_true = [y_true[i] for i in label_indices]
        label_y_pred = [y_pred[i] for i in label_indices]
        accuracy = accuracy_score(label_y_true, label_y_pred)
        print(f'Accuracy for label {label}: {accuracy:.3f}')
        
    # Generate classification report
    class_report = classification_report(y_true=y_true, y_pred=y_pred)
    print('\nClassification Report:')
    print(class_report)
    
    # Generate confusion matrix
    conf_matrix = confusion_matrix(y_true=y_true, y_pred=y_pred, labels=[0, 1, 2])
    print('\nConfusion Matrix:')
    print(conf_matrix)
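
To make the numbers printed by evaluate() easier to read, here is a minimal pure-Python sketch of the same quantities on made-up labels. The helper names `confusion` and `per_label_accuracy` are hypothetical, and the toy predictions are invented for illustration.

```python
def confusion(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_label_accuracy(matrix):
    """Fraction of each true label that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]

# Toy labels using the notebook's encoding: 0 negative, 1 neutral, 2 positive.
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
matrix = confusion(y_true, y_pred, labels=[0, 1, 2])
overall = sum(matrix[i][i] for i in range(3)) / len(y_true)
print(matrix)
print(per_label_accuracy(matrix), overall)
```

Because the per-label accuracy is computed only over the samples whose true label is that label, it coincides with the recall reported for that label in the classification report.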

# Testing the model without fine-tuning

Next we need to take care of the model: the 7b-hf variant (7 billion parameters, no RLHF (Reinforcement Learning from Human Feedback), in the HuggingFace-compatible format), which is loaded from Kaggle Models and quantized.

Model loading and quantization:
- First, the code points model_name at the LLaMA2 7b-hf weights provided through Kaggle Models.
- The code gets the float16 type from the torch library. This is the data type that will be used for the computations.
- Next, it creates a BitsAndBytesConfig object with the following settings:
    1. load_in_4bit: Load the model weights in 4-bit format.
    2. bnb_4bit_quant_type: Use the "nf4" quantization type. 4-bit NormalFloat (NF4) is a data type that is information-theoretically optimal for normally distributed weights.
    3. bnb_4bit_compute_dtype: Use the float16 data type for computations.
    4. bnb_4bit_use_double_quant: Use double quantization (it reduces the average memory footprint by also quantizing the quantization constants, and saves an additional 0.4 bits per parameter).
- Then the code creates an AutoModelForCausalLM object from the pre-trained LLaMA2 language model, using the BitsAndBytesConfig object for quantization.
- After that, the code disables caching for the model.
- Finally, the code sets pretraining_tp to 1. This setting is the tensor-parallelism rank used during pre-training, and a value of 1 keeps the standard, non-sliced computation of the linear layers.

Tokenizer loading:
- First, the code loads the tokenizer for the LLaMA2 language model.
- Then it sets the padding token to be the end-of-sequence (EOS) token.
- Finally, the code sets the padding side to "right", which means that the input sequences will be padded on the right side. This is crucial for the correct padding direction (this is the way with LLaMA2).

#### Docs of BitsAndBytesConfig (https://huggingface.co/docs/transformers/main/en/main_classes/quantization#transformers.BitsAndBytesConfig)
- load_in_4bit (bool, optional, defaults to False): This flag is used to enable 4-bit quantization by replacing the Linear layers with FP4 (4-bit floating-point) or NF4 (normalized float 4) layers from bitsandbytes.
- bnb_4bit_quant_type (str, optional, defaults to "fp4"): This sets the quantization data type in the bnb.nn.Linear4Bit layers. Options are FP4 and NF4 data types, which are specified by fp4 or nf4.
- bnb_4bit_compute_dtype (torch.dtype or str, optional, defaults to torch.float32): This sets the computational type, which might be different from the input type. For example, inputs might be fp32, but computation can be set to bf16 for speedups.
- bnb_4bit_use_double_quant (bool, optional, defaults to False): This flag is used for nested quantization, where the quantization constants from the first quantization are quantized again.
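
A quick back-of-the-envelope calculation shows why 4-bit loading matters on a single GPU. This is a rough sketch that counts only the stored weights; the round figure of 7e9 parameters is an assumption, and the 0.4 bits per parameter is the saving quoted earlier for double quantization. Quantization constants, activations, and optimizer state are not included.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 7e9  # assumption: round figure for a "7b" model

fp16_gb = weight_memory_gb(n_params, 16)
nf4_gb = weight_memory_gb(n_params, 4)
# Saving attributed to double quantization: about 0.4 bits per parameter.
double_quant_saving_gb = weight_memory_gb(n_params, 0.4)

print(f"float16 weights: {fp16_gb:.2f} GB")
print(f"4-bit weights:   {nf4_gb:.2f} GB")
print(f"double quantization saves about {double_quant_saving_gb:.2f} GB")
```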

In [None]:
model_name = "../input/llama-2/pytorch/7b-hf/1"

compute_dtype = getattr(torch, "float16")


bnb_config = BitsAndBytesConfig(
    
    load_in_4bit=True, 
    bnb_4bit_quant_type="nf4", 
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=device,
    torch_dtype=compute_dtype,
    quantization_config=bnb_config, 
)

model.config.use_cache = False
model.config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(model_name, 
                                          trust_remote_code=True,
                                         )
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model, tokenizer = setup_chat_format(model, tokenizer)
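
To make the padding settings concrete, here is a minimal sketch of right padding on a toy batch. The token ids and the `EOS_ID` value are made up for illustration; a real tokenizer performs this step internally when padding is requested.

```python
def pad_right(sequences, pad_id):
    """Pad token-id lists on the right to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding that the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

EOS_ID = 2  # hypothetical id standing in for the EOS token reused as padding
batch = [[5, 8, 13], [7], [4, 9]]  # made-up token ids
input_ids, attention_mask = pad_right(batch, pad_id=EOS_ID)
print(input_ids)
print(attention_mask)
```

With right padding, the real tokens of every sequence start at position 0 and the filler tokens trail behind them.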

In the next cell, we define a function for predicting the sentiment of a news headline using the LLaMA2 language model. The function takes three arguments:

- test: a Pandas DataFrame containing the news headlines to be predicted.
- model: the pre-trained LLaMA2 language model.
- tokenizer: the tokenizer for the LLaMA2 language model.

The function works as follows:
1. For each news headline in the test DataFrame:
    - Take the prompt for the language model, which asks it to analyze the sentiment of the news headline and return the corresponding sentiment label.
    - Use the pipeline() function from the HuggingFace Transformers library to generate text from the language model, using the prompt.
    - Extract the predicted sentiment label from the generated text.
    - Append the predicted sentiment label to the y_pred list.
2. Return the y_pred list.

- The pipeline() function from the HuggingFace Transformers library is used to generate text from the language model. The task argument specifies that the task is text generation. The model and tokenizer arguments specify the pre-trained LLaMA2 language model and its tokenizer. The max_new_tokens argument specifies the maximum number of new tokens to generate. The temperature argument controls the randomness of the generated text: a lower temperature produces more predictable text, while a higher temperature produces more creative and unexpected text.

- The if statement checks whether the generated text contains the word "positive". If it does, the predicted sentiment label is "positive". Otherwise, it checks whether the generated text contains the word "negative"; if it does, the predicted sentiment label is "negative". Otherwise, it checks whether the generated text contains the word "neutral"; if it does, the predicted sentiment label is "neutral". If none of the three words appears, the label is "none".

In [None]:
def predict(test, model, tokenizer):
    # Note: the loop below reads the global X_test; the test argument is not used.
    y_pred = []
    # Build the generation pipeline once instead of once per headline.
    pipe = pipeline(task="text-generation", 
                    model=model, 
                    tokenizer=tokenizer, 
                    max_new_tokens = 1, 
                    temperature = 0.0,
                   )
    for i in tqdm(range(len(X_test))):
        prompt = X_test.iloc[i]["text"]
        result = pipe(prompt)
        # Keep only the text after the last "=", where the label is expected.
        answer = result[0]['generated_text'].split("=")[-1]
        if "positive" in answer:
            y_pred.append("positive")
        elif "negative" in answer:
            y_pred.append("negative")
        elif "neutral" in answer:
            y_pred.append("neutral")
        else:
            y_pred.append("none")
    return y_pred
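
The label-extraction logic can be tried in isolation, without loading the model. The sketch below mirrors the if/elif chain described above; the function name `extract_label` and the sample strings are hypothetical.

```python
def extract_label(generated_text):
    """Return the sentiment named after the last '=' in the text, or 'none'."""
    answer = generated_text.split("=")[-1]
    # Same priority order as in predict(): positive, then negative, then neutral.
    for label in ("positive", "negative", "neutral"):
        if label in answer:
            return label
    return "none"

samples = [
    "[Profit rose 20%] = positive",
    "[Sales fell sharply] = neg",          # truncated answer, no full label
    "[Revenue = costs this year] = neutral",
]
for text in samples:
    print(extract_label(text))
```

Splitting on the last "=" matters when the headline itself contains an equals sign, as in the third sample.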

At this point, we are ready to test the LLaMA2 7b-hf model and see how it performs on this problem without any fine-tuning. This gives us insight into the model itself and establishes a baseline.

In [None]:
y_pred = predict(X_test, model, tokenizer)

In the following cell, we evaluate the results. There is little to be said: the model performs really poorly, because the 7b-hf model tends to just predict a neutral sentiment and seldom detects positive or negative sentiment.

In [None]:
evaluate(y_true,y_pred)

# Fine-tuning

In the next cell we set everything up for the fine-tuning. We configure and initialize a Supervised Fine-tuning Trainer (SFTTrainer) for training a large language model using the PEFT method, which should save time because it operates on a reduced number of parameters compared to the model's overall size. The PEFT method focuses on refining a limited set of (additional) model parameters, while keeping the majority of the pre-trained LLM parameters fixed. This significantly reduces both computational and storage expenses. Additionally, this strategy addresses the challenge of catastrophic forgetting, which often occurs during the complete fine-tuning of LLMs.

### PEFTConfig:

The peft_config object specifies the parameters for PEFT. The following are some of the most important parameters:

- lora_alpha: The scaling factor for the LoRA update matrices (the update is scaled by lora_alpha / r).
- lora_dropout: The dropout probability for the LoRA update matrices.
- r: The rank of the LoRA update matrices.
- bias: The type of bias to train. The possible values are none, all, and lora_only.
- task_type: The type of task that the model is being trained for. Possible values include CAUSAL_LM, SEQ_CLS, and SEQ_2_SEQ_LM.
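
The effect of r and lora_alpha can be estimated with simple arithmetic. The sketch below counts the trainable parameters a LoRA adapter adds to one linear layer and computes the scaling factor; the square 4096 x 4096 layer is an assumption used only for illustration, and the helper names are hypothetical.

```python
def lora_trainable_params(d_out, d_in, r):
    """A LoRA adapter adds B (d_out x r) and A (r x d_in) next to a frozen d_out x d_in weight."""
    return r * (d_out + d_in)

def lora_scaling(lora_alpha, r):
    """The update B @ A is multiplied by lora_alpha / r before it is added to the frozen weight."""
    return lora_alpha / r

d_out = d_in = 4096     # assumption: a square projection, used only for illustration
r, lora_alpha = 64, 16  # values used in this notebook's LoraConfig

full = d_out * d_in
adapter = lora_trainable_params(d_out, d_in, r)
print(f"frozen weight: {full:,} parameters")
print(f"LoRA adapter:  {adapter:,} parameters ({adapter / full:.1%} of the layer)")
print(f"scaling factor: {lora_scaling(lora_alpha, r)}")
```

Under these assumptions the adapter holds only a few percent of the layer's parameters, which is where the savings in compute and storage come from.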

### TrainingArguments:

The training arguments object specifies the parameters for training the model. The following are some of the most important parameters:

- output_dir: The directory where the training logs and checkpoints will be saved.
- num_train_epochs: The number of epochs to train the model for.
- per_device_train_batch_size: The number of samples in each batch on each device.
- gradient_accumulation_steps: The number of batches to accumulate gradients before updating the model parameters.
- optim: The optimizer to use for training the model.
- save_steps: The number of steps after which to save a checkpoint.
- logging_steps: The number of steps after which to log the training metrics.
- learning_rate: The learning rate for the optimizer.
- weight_decay: The weight decay parameter for the optimizer.
- fp16: Whether to use 16-bit floating-point precision.
- bf16: Whether to use BFloat16 precision.
- max_grad_norm: The maximum gradient norm.
- max_steps: The maximum number of steps to train the model for.
- warmup_ratio: The proportion of the training steps to use for warming up the learning rate.
- group_by_length: Whether to group the training samples by length.
- lr_scheduler_type: The type of learning rate scheduler to use.
- report_to: The tools to report the training metrics to.
- evaluation_strategy: The strategy for evaluating the model during training. 
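
Several of these arguments interact: the batch size and gradient accumulation fix the number of optimizer steps, and the warmup ratio and scheduler type shape the learning rate over those steps. The following is a rough sketch of that arithmetic for a single device, using a linear warmup followed by cosine decay. The function `lr_at_step` is hypothetical, and the exact step count depends on how the Trainer rounds partial batches, so treat the numbers as approximate.

```python
import math

def lr_at_step(step, total_steps, warmup_steps, base_lr):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

n_train = 900                                 # 300 training prompts per sentiment
effective_batch = 1 * 8                       # per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = n_train // effective_batch  # assumption: a trailing partial batch is dropped
total_steps = 3 * steps_per_epoch             # num_train_epochs = 3
warmup_steps = math.ceil(0.03 * total_steps)  # warmup_ratio = 0.03

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at_step(step, total_steps, warmup_steps, 2e-4):.2e}")
```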

### SFTTrainer:

The SFTTrainer is a custom trainer class from the TRL library. It is used to train large language models (also using the PEFT method).

The SFTTrainer object is initialized with the following arguments:
- model: The model to be trained.
- train_dataset: The training dataset.
- eval_dataset: The evaluation dataset.
- peft_config: The PEFT configuration.
- dataset_text_field: The name of the text field in the dataset.
- tokenizer: The tokenizer to use.
- args: The training arguments.
- packing: Whether to pack the training samples.
- max_seq_length: The maximum sequence length.

Once the SFTTrainer object is initialized, it can be used to train the model by calling the train() method.

In [None]:
output_dir="trained_weigths"

peft_config = LoraConfig(
        lora_alpha=16, 
        lora_dropout=0.1,
        r=64,
        bias="none",
        target_modules="all-linear",
        task_type="CAUSAL_LM",
)

training_arguments = TrainingArguments(
    output_dir=output_dir,                    # directory to save and repository id
    num_train_epochs=3,                       # number of training epochs
    per_device_train_batch_size=1,            # batch size per device during training
    gradient_accumulation_steps=8,            # number of batches to accumulate before each optimizer update
    gradient_checkpointing=True,              # use gradient checkpointing to save memory
    optim="paged_adamw_32bit",
    save_steps=0,
    logging_steps=25,                         # log every 25 steps
    learning_rate=2e-4,                       # learning rate, based on QLoRA paper
    weight_decay=0.001,
    fp16=True,
    bf16=False,
    max_grad_norm=0.3,                        # max gradient norm based on QLoRA paper
    max_steps=-1,
    warmup_ratio=0.03,                        # warmup ratio based on QLoRA paper
    group_by_length=True,
    lr_scheduler_type="cosine",               # use cosine learning rate scheduler
    report_to="tensorboard",                  # report metrics to tensorboard
    evaluation_strategy="epoch"               # evaluate at the end of every epoch
)

trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=train_data,
    eval_dataset=eval_data,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    max_seq_length=1024,
    packing=False,
    dataset_kwargs={
        "add_special_tokens": False,
        "append_concat_token": False,
    }
)

The following code trains the model using the trainer.train() method. The trained model is then saved to the directory given by output_dir.

In [None]:
trainer.train()

The model and the tokenizer are saved to disk for later usage.

In [None]:
# Save trained model and tokenizer
trainer.save_model()
tokenizer.save_pretrained(output_dir)

Afterwards, load the TensorBoard extension and start TensorBoard, pointing it at the logs/runs directory, which is assumed to contain the training logs for your model. This lets you see how the model fits during training. By default, TrainingArguments writes its TensorBoard logs under the runs subdirectory of output_dir, so adjust --logdir if logs/runs turns out to be empty.

In [None]:
%load_ext tensorboard
%tensorboard --logdir logs/runs


# Saving model to disk for later usage

At this point, in order to demonstrate how to re-utilize the model, we reload it from the disk and merge it with the original LLaMA model.

In fact, when working with QLoRA, we train only adapters instead of the entire model. So, when you save the model during training, you preserve only the adapter weights, not the entire model. You can merge the adapter weights into the model weights using the merge_and_unload method, and then save the result using the save_pretrained method. This creates a standalone model that is ready for inference tasks.
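
Conceptually, merging folds the low-rank update into the frozen weight, W + (lora_alpha / r) * B @ A, so that no adapter is needed at inference time. The following is a minimal pure-Python sketch of that formula on made-up 2 x 2 matrices; it illustrates the idea and is not the peft implementation, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_b, lora_a, lora_alpha, r):
    """Return W + (lora_alpha / r) * B @ A as a new nested list."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Made-up 2 x 2 frozen weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # shape 2 x 1
A = [[0.5, -0.5]]    # shape 1 x 2
merged = merge_lora(W, B, A, lora_alpha=2, r=1)
print(merged)
```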

Before proceeding, we first remove the previous model and free the memory held by various objects we won't use anymore.

In [None]:
import gc

del [model, tokenizer, peft_config, trainer, train_data, eval_data, bnb_config, training_arguments]
del [df, X_train, X_eval]
del [TrainingArguments, SFTTrainer, LoraConfig, BitsAndBytesConfig]

In [None]:
for _ in range(100):
    torch.cuda.empty_cache()
    gc.collect()

In [None]:
!nvidia-smi

Then we can proceed to merging the weights and we will be using the merged model for our purposes.

In [None]:
from peft import AutoPeftModelForCausalLM

finetuned_model = "./trained_weigths/"
compute_dtype = getattr(torch,'float16')
tokenizer = AutoTokenizer.from_pretrained("/kaggle/input/llama-2/pytorch/7b-hf/1")

model =  AutoPeftModelForCausalLM.from_pretrained(
     finetuned_model,
     torch_dtype=compute_dtype,
     return_dict=False,
     low_cpu_mem_usage=True,
     device_map=device,
)

merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_model",safe_serialization=True, max_shard_size="2GB")
tokenizer.save_pretrained("./merged_model")

# Testing 

The following code first predicts the sentiment labels for the test set using the predict() function. Then it evaluates the model's performance on the test set using the evaluate() function. The result should now be impressive, with an overall accuracy of over 0.8 and high accuracy, precision, and recall for the individual sentiment labels. The prediction of the neutral label can still be improved, yet it is impressive how much could be done with little data and some fine-tuning.

In [None]:
y_pred = predict(X_test, merged_model, tokenizer)
evaluate(y_true, y_pred)

The following code creates a Pandas DataFrame called evaluation containing the text, true labels, and predicted labels from the test set. This is especially useful for understanding the errors that the fine-tuned model makes, and for getting insights on how to improve the prompt.

In [None]:
evaluation = pd.DataFrame({'text': X_test["text"], 
                           'y_true':y_true, 
                           'y_pred': y_pred},
                         )
evaluation.to_csv("test_predictions.csv", index=False)