# Fine-tune LLaMA2 7b Model with PEFT method for Stock Price Prediction

reference
- https://www.kaggle.com/code/lucamassaron/fine-tune-llama-2-for-sentiment-analysis

As a first step, install the libraries needed to make this work:
- accelerate is a distributed training library for PyTorch by HuggingFace. It lets you train models on multiple GPUs or CPUs in parallel (distributed configurations), which can significantly speed up training when multiple GPUs are available. (I won't use that capability in this work.)
- peft is a Python library by HuggingFace for efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all of the model's parameters. PEFT methods fine-tune only a small number of (extra) model parameters, which greatly decreases computational and storage costs.
- bitsandbytes by Tim Dettmers is a lightweight wrapper around custom CUDA functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and quantization functions. It allows running models stored in 4-bit precision: the weights are stored in 4 bits, while the computation still happens in 16-bit or 32-bit, and any combination can be chosen (float16, bfloat16, float32, and so on).
- transformers is a Python library for NLP. It provides a number of pre-trained models for NLP tasks such as text classification, question answering, and machine translation.
- trl is a full-stack library by HuggingFace providing a set of tools to train transformer language models with Reinforcement Learning, from the Supervised Fine-tuning step (SFT) and the Reward Modeling step (RM) to the Proximal Policy Optimization step (PPO).
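
Because the notebook pins specific versions of these libraries, it can help to confirm what is actually installed before going further. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package names are assumed to match their PyPI names.

```python
from importlib import metadata

# Package names as published on PyPI (assumption: your environment uses the same names).
PACKAGES = ["accelerate", "peft", "bitsandbytes", "transformers", "trl"]

def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this again after the install cells is a quick way to see whether a later `pip install` silently replaced one of the pinned versions.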

In [3]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/sentiment-analysis-for-financial-news/all-data.csv
/kaggle/input/sentiment-analysis-for-financial-news/FinancialPhraseBank/Sentences_66Agree.txt
/kaggle/input/sentiment-analysis-for-financial-news/FinancialPhraseBank/Sentences_AllAgree.txt
/kaggle/input/sentiment-analysis-for-financial-news/FinancialPhraseBank/README.txt
/kaggle/input/sentiment-analysis-for-financial-news/FinancialPhraseBank/License.txt
/kaggle/input/sentiment-analysis-for-financial-news/FinancialPhraseBank/Sentences_75Agree.txt
/kaggle/input/sentiment-analysis-for-financial-news/FinancialPhraseBank/Sentences_50Agree.txt
/kaggle/input/llama-2/pytorch/7b-hf/1/model.safetensors.index.json
/kaggle/input/llama-2/pytorch/7b-hf/1/config.json
/kaggle/input/llama-2/pytorch/7b-hf/1/model-00001-of-00002.safetensors
/kaggle/input/llama-2/pytorch/7b-hf/1/Responsible-Use-Guide.pdf
/kaggle/input/llama-2/pytorch/7b-hf/1/model-00002-of-00002.safetensors
/kaggle/input/llama-2/pytorch/7b-hf/1/pytorch_model-00002-of-00002.b

The code imports the os module and sets two environment variables:
- CUDA_VISIBLE_DEVICES: This environment variable controls which GPUs are visible to PyTorch. Here it is set to 0, so PyTorch will use only the first GPU.
- TOKENIZERS_PARALLELISM: This environment variable tells the HuggingFace tokenizers library (used by Transformers) whether to parallelize the tokenization process. Here it is set to false, so tokenization will not be parallelized.

In [None]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

- The code import warnings; warnings.filterwarnings("ignore") imports the warnings module and sets the warning filter to ignore, so all warnings are suppressed and not displayed. During training there are many warnings that do not prevent the fine-tuning but can be distracting and make you wonder whether you are doing the right thing.

In [None]:
import warnings
warnings.filterwarnings("ignore")
print("1")

In [None]:
!pip install -q -U "accelerate==0.26.1" 

In [None]:
!pip install -q -U "bitsandbytes==0.42.0"

In [None]:
!pip install -q -U  "transformers==4.38.2"

In [None]:
!pip install -q -U  "datasets==2.16.1"

In [None]:
!pip install tensorflow[and-cuda]

In [None]:

!pip install -q -U git+https://github.com/huggingface/peft@4a1559582281fc3c9283892caea8ccef1d6f5a4f

In [None]:
!pip install --upgrade pip
!pip uninstall -y keras
!pip install tensorflow

In [None]:
!pip3 install -q -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

In [None]:
!pip install git+https://github.com/huggingface/trl.git@7630f877f91c556d9e5a3baa4b6e2894d90ff84c

In [None]:
import numpy as np
import pandas as pd
import os
from tqdm import tqdm
import bitsandbytes as bnb
import torch
import torch.nn as nn
import transformers
from datasets import Dataset
from peft import LoraConfig, PeftConfig
from trl import SFTTrainer
from trl import setup_chat_format
from transformers import (AutoModelForCausalLM, 
                          AutoTokenizer, 
                          BitsAndBytesConfig, 
                          TrainingArguments, 
                          pipeline, 
                          logging)
from sklearn.metrics import (accuracy_score, 
                             classification_report, 
                             confusion_matrix)
from sklearn.model_selection import train_test_split

In [None]:
print(f"pytorch version {torch.__version__}")

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"working on {device}")

# Preparing the data and the core evaluation functions
The code in the next cell performs the following steps:
1. Reads the input dataset from the all-data.csv file, a comma-separated values (CSV) file with two columns: sentiment and text.
2. Splits the dataset into training and test sets, with 300 samples per sentiment class in each set. The split is done separately for each sentiment, so each set contains an equal share of positive, neutral, and negative examples.
3. Shuffles the train data in a reproducible order (random_state=10).
4. Transforms the texts in the train and test data into prompts to be used by LLaMA: the train prompts contain the expected answer we want to fine-tune the model with.
5. Uses the remaining examples, which are in neither train nor test, as evaluation data for reporting purposes during training (not for early stopping). They are sampled with replacement to obtain a 50/50/50 sample (negative instances are very few, hence they have to be repeated).
6. Wraps the train and eval data in the Dataset class from HuggingFace's datasets library (backed by the Apache Arrow format).

This prepares, in a single cell, the train_data and eval_data datasets and the X_test prompts to be used in the fine-tuning and evaluation.
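
To illustrate the splitting logic in isolation, here is a small dependency-free sketch on toy data. It is not the notebook's implementation (which uses pandas and scikit-learn). The function name `split_per_label` and the placeholder headlines are hypothetical, and the sizes are shrunk so the result is easy to inspect.

```python
import random
from collections import defaultdict

def split_per_label(rows, n_train, n_test, n_eval, seed=42):
    """Split (label, text) rows into train/test/eval with fixed per-label sizes.

    Train and test are drawn without overlap. Eval is drawn WITH replacement
    from the leftover rows of each label (assumes at least one leftover row).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))

    train, test, eval_ = [], [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        train += items[:n_train]
        test += items[n_train:n_train + n_test]
        leftover = items[n_train + n_test:]
        # Sampling with replacement lets a small leftover pool fill all n_eval slots.
        eval_ += rng.choices(leftover, k=n_eval)
    rng.shuffle(train)  # mix the labels within the training set
    return train, test, eval_

# Toy data: 5 placeholder headlines per label (not the real dataset).
rows = [(label, f"{label} headline {i}")
        for label in ("positive", "neutral", "negative") for i in range(5)]
train, test, eval_ = split_per_label(rows, n_train=2, n_test=2, n_eval=3)
print(len(train), len(test), len(eval_))  # 6 6 9
```

With only one leftover row per label, every evaluation sample for that label is a repeat of the same row, which is the same effect the notebook accepts for the scarce negative class.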

In [2]:
filename = "../input/sentiment-analysis-for-financial-news/all-data.csv"

df = pd.read_csv(filename, 
                 names=["sentiment", "text"],
                 encoding="utf-8", encoding_errors="replace")

X_train = list()
X_test = list()
for sentiment in ["positive", "neutral", "negative"]:
    train, test  = train_test_split(df[df.sentiment==sentiment], 
                                    train_size=300,
                                    test_size=300, 
                                    random_state=42)
    X_train.append(train)
    X_test.append(test)

X_train = pd.concat(X_train).sample(frac=1, random_state=10)
X_test = pd.concat(X_test)

eval_idx = [idx for idx in df.index if idx not in list(X_train.index) + list(X_test.index)]
X_eval = df[df.index.isin(eval_idx)]
X_eval = (X_eval
          .groupby('sentiment', group_keys=False)
          .apply(lambda x: x.sample(n=50, random_state=10, replace=True)))
X_train = X_train.reset_index(drop=True)

def generate_prompt(data_point):
    return f"""
            Analyze the sentiment of the news headline enclosed in square brackets, 
            determine if it is positive, neutral, or negative, and return the answer as 
            the corresponding sentiment label "positive" or "neutral" or "negative".

            [{data_point["text"]}] = {data_point["sentiment"]}
            """.strip()

def generate_test_prompt(data_point):
    return f"""
            Analyze the sentiment of the news headline enclosed in square brackets, 
            determine if it is positive, neutral, or negative, and return the answer as 
            the corresponding sentiment label "positive" or "neutral" or "negative".

            [{data_point["text"]}] = """.strip()

X_train = pd.DataFrame(X_train.apply(generate_prompt, axis=1), 
                       columns=["text"])
X_eval = pd.DataFrame(X_eval.apply(generate_prompt, axis=1), 
                      columns=["text"])

y_true = X_test.sentiment
X_test = pd.DataFrame(X_test.apply(generate_test_prompt, axis=1), columns=["text"])

train_data = Dataset.from_pandas(X_train)
eval_data = Dataset.from_pandas(X_eval)


The next step is to create a function to evaluate the results from the fine-tuned sentiment model. The function performs the following steps:
1. Maps the sentiment labels to a numerical representation, where 2 represents positive, 1 represents neutral, and 0 represents negative (any other value, such as none, is also mapped to 1).
2. Calculates the accuracy of the model on the test data.
3. Generates an accuracy report for each sentiment label.
4. Generates a classification report for the model.
5. Generates a confusion matrix for the model.
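
To make these metrics concrete, here is a dependency-free sketch of the label mapping, the per-label accuracy, and the confusion matrix on four toy predictions. The helper names (`encode`, `per_label_accuracy`, `confusion`) are hypothetical; the notebook itself relies on scikit-learn for the same quantities.

```python
from collections import Counter

MAPPING = {"positive": 2, "neutral": 1, "none": 1, "negative": 0}

def encode(labels):
    # Unknown strings fall back to 1 (neutral), like mapping.get(x, 1) in the notebook.
    return [MAPPING.get(label, 1) for label in labels]

def per_label_accuracy(y_true, y_pred):
    """Fraction of correct predictions among the examples of each true label."""
    hits, totals = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {label: hits[label] / totals[label] for label in sorted(totals)}

def confusion(y_true, y_pred, n_classes=3):
    """Rows are true labels, columns are predicted labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

y_true = encode(["positive", "neutral", "negative", "negative"])
y_pred = encode(["positive", "positive", "negative", "gibberish"])
print(per_label_accuracy(y_true, y_pred))  # {0: 0.5, 1: 0.0, 2: 1.0}
print(confusion(y_true, y_pred))           # [[1, 1, 0], [0, 0, 1], [0, 0, 1]]
```

Note the consequence of the fallback: an unparseable model answer counts as neutral, so it is scored as correct whenever the true label is neutral.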

In [None]:
def evaluate(y_true, y_pred):
    labels = ['positive', 'neutral', 'negative']
    mapping = {'positive': 2, 'neutral': 1, 'none':1, 'negative': 0}
    def map_func(x):
        return mapping.get(x, 1)
    
    y_true = np.vectorize(map_func)(y_true)
    y_pred = np.vectorize(map_func)(y_pred)
    
    # Calculate accuracy
    accuracy = accuracy_score(y_true=y_true, y_pred=y_pred)
    print(f'Accuracy: {accuracy:.3f}')
    
    # Generate accuracy report
    unique_labels = set(y_true)  # Get unique labels
    
    for label in unique_labels:
        label_indices = [i for i in range(len(y_true)) 
                         if y_true[i] == label]
        label_y_true = [y_true[i] for i in label_indices]
        label_y_pred = [y_pred[i] for i in label_indices]
        accuracy = accuracy_score(label_y_true, label_y_pred)
        print(f'Accuracy for label {label}: {accuracy:.3f}')
        
    # Generate classification report
    class_report = classification_report(y_true=y_true, y_pred=y_pred)
    print('\nClassification Report:')
    print(class_report)
    
    # Generate confusion matrix
    conf_matrix = confusion_matrix(y_true=y_true, y_pred=y_pred, labels=[0, 1, 2])
    print('\nConfusion Matrix:')
    print(conf_matrix)

# Testing the model without fine-tuning

Next we need to take care of the model, which is 7b-hf (7 billion parameters, no RLHF (Reinforcement Learning from Human Feedback), in the HuggingFace-compatible format): loading it from Kaggle Models and quantizing it.

Model loading and quantization:
- First the code loads the LLaMA2 

#### docs of BitsAndBytesConfig (https://huggingface.co/docs/transformers/main/en/main_classes/quantization#transformers.BitsAndBytesConfig)
- load_in_4bit (bool, optional, defaults to False): This flag is used to enable 4-bit quantization by replacing the Linear layers with FP4 (4-bit floating-point)/NF4 (normalized float 4) layers from bitsandbytes.
- bnb_4bit_quant_type (str, optional, defaults to "fp4"): This sets the quantization data type in the bnb.nn.Linear4Bit layers. Options are FP4 and NF4 data types which are specified by fp4 or nf4.
- bnb_4bit_compute_dtype (torch.dtype or str, optional, defaults to torch.float32): This sets the computational type which might be different than the input type. For example, inputs might be fp32, but computation can be set to bf16 for speedups.
- bnb_4bit_use_double_quant (bool, optional, defaults to False): This flag is used for nested quantization where the quantization constants from the first quantization are quantized again.
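
A rough back-of-the-envelope calculation shows why these flags matter for a 7B model. The sketch below estimates only the storage of the quantized weights. The block sizes (64 weights per block, 256 scales per second-level block) follow the QLoRA description of double quantization and are assumptions here, not values read from this notebook. It also ignores activations, optimizer state, and layers kept in higher precision.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Storage for n_params weights at the given average bits per weight, in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 7e9   # nominal "7B"; the exact parameter count differs slightly
BLOCK = 64       # assumed number of weights sharing one quantization constant
SCALES = 256     # assumed number of constants per second-level block

# Plain 4-bit: one 32-bit constant per block of weights.
overhead_plain = 32 / BLOCK                            # 0.5 bits per weight
# Double quantization: 8-bit constants, plus one 32-bit constant per SCALES constants.
overhead_double = 8 / BLOCK + 32 / (BLOCK * SCALES)    # about 0.127 bits per weight

print(f"float16:            {weight_memory_gb(N_PARAMS, 16):.2f} GB")                   # 14.00 GB
print(f"4-bit:              {weight_memory_gb(N_PARAMS, 4 + overhead_plain):.2f} GB")   # 3.94 GB
print(f"4-bit + double q.:  {weight_memory_gb(N_PARAMS, 4 + overhead_double):.2f} GB")  # 3.61 GB
```

Under these assumptions, 4-bit storage cuts the weight footprint to roughly a quarter of float16, and double quantization saves a further few hundred megabytes.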

In [None]:
model_name = "../input/llama-2/pytorch/7b-hf/1"

compute_dtype = getattr(torch, "float16")


bnb_config = BitsAndBytesConfig(
    
    load_in_4bit=True, 
    bnb_4bit_quant_type="nf4", 
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=device,
    torch_dtype=compute_dtype,
    quantization_config=bnb_config, 
)

model.config.use_cache = False
model.config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(model_name, 
                                          trust_remote_code=True,
                                         )
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model, tokenizer = setup_chat_format(model, tokenizer)