##  Get Insight from your Business Data - Build LLM application with PEFT (with LoRA) using 🤗 Hugging Face by Ashish Kumar Jain

### LoRA is one of the most widely used PEFT methods; in most cases, when someone says PEFT, they mean LoRA. In LoRA, we keep the original weights of the model frozen and inject a small number of new trainable parameters in the form of low-rank matrices. The idea is to inject two new low-rank matrices (call them A and B) alongside an actual weight matrix. The sizes of these two matrices are chosen so that their product (call it C) has the same dimensions as the original weight matrix. During fine-tuning for a task, all pre-trained model parameters stay frozen and only A and B are trained. Once fine-tuning is complete, the product of A and B holds the new task-specific weights, which are added to the frozen original weights.
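
### As a rough illustration of the idea above, the sketch below uses plain Python lists instead of a tensor library. The layer size (1024 x 1024), the tiny example matrices, and the names `lora_param_counts`, `matmul`, `d_out`, and `d_in` are assumptions made for this example, not values taken from the model. Here A is taken to be d_out x r and B to be r x d_in, which is only a naming choice for this sketch.

```python
def lora_param_counts(d_out, d_in, r):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # fine-tuning the whole weight matrix
    lora = d_out * r + r * d_in    # training only A (d_out x r) and B (r x d_in)
    return full, lora


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


# Parameter savings for an assumed 1024 x 1024 layer with rank 32
full, lora = lora_param_counts(1024, 1024, 32)
print(full, lora, f"{100 * lora / full:.2f}%")  # 1048576 65536 6.25%

# Tiny shape check: A (4 x 2) times B (2 x 3) gives C shaped like a 4 x 3 weight matrix
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
B = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
C = matmul(A, B)
print(len(C), len(C[0]))  # 4 3
```

### The smaller the rank r is relative to the layer size, the fewer parameters need to be trained and stored.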

### In this notebook we will use 🤗 Hugging Face, a platform where the machine learning community collaborates on models, datasets, and applications. Hugging Face introduced the 🤗 PEFT library, which provides the latest Parameter-Efficient Fine-Tuning techniques seamlessly integrated with Transformers. It supports the LoRA technique as well, and we will use it for our implementation. We will also use Hugging Face to download one of the open-source LLMs, FLAN-T5 from Google, and load it from the local machine. You can download the model from Hugging Face by cloning its repository, which lets you run this code without internet access or in a very constrained environment. Downloading the model can take time depending on your network speed.

In [None]:
! pip install --upgrade pip
! pip install 'transformers[torch]'
! pip install datasets
! pip install evaluate==0.4.0
! pip install rouge_score==0.1.2
! pip install peft==0.3.0 
! pip install loralib==0.1.1

#### Hugging Face provides the Datasets library for easily accessing and sharing datasets. We can load a dataset in a single line of code from multiple sources (the Hugging Face Hub, local file systems, memory, etc.) in different formats (CSV, JSON, Parquet, Arrow, SQL for reading from a database, etc.) and use its powerful data processing methods to quickly get the dataset ready for training an LLM.

#### Sample Data Format

```json
{
    "version": "0.1.0",
    "data": [
        {
            "Question": "Generate a list of three uses of big data?",
            "Answer": "1. Big data can be used to identify patterns and trends in customer behavior.\\n2. Big data can be used to improve customer service and experience.\\n3. Big data can be used to develop predictive models for marketing and sales."
        },
        {
            "Question": "Predict the weather tomorrow morning.?",
            "Answer": "Tomorrow morning is expected to be sunny with temperatures ranging from 15 to 19 degrees Celsius."
        }
   ]
}
```
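
#### Before loading a file in this format, it can help to confirm that it has the expected shape. The sketch below uses only the standard library; the helper name `validate_qa_records` and the shortened inline sample are hypothetical and exist only for this illustration.

```python
import json


def validate_qa_records(raw_json, field="data"):
    """Parse a JSON document and return the records stored under `field`.

    Raises ValueError if the field is missing or a record lacks
    a non-empty "Question" or "Answer" string.
    """
    document = json.loads(raw_json)
    if not isinstance(document, dict) or not isinstance(document.get(field), list):
        raise ValueError(f"expected a list under the '{field}' key")
    for index, record in enumerate(document[field]):
        for key in ("Question", "Answer"):
            value = record.get(key) if isinstance(record, dict) else None
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"record {index} has no usable '{key}'")
    return document[field]


# Shortened sample in the same layout as the format shown above
sample = """
{
    "version": "0.1.0",
    "data": [
        {"Question": "Generate a list of three uses of big data?",
         "Answer": "1. Big data can be used to identify patterns and trends in customer behavior."}
    ]
}
"""
records = validate_qa_records(sample)
print(len(records), "record(s) look valid")
```

#### For a real file, the same check could be run on the file's contents before handing the path to the Datasets library.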

In [None]:
from datasets import load_dataset

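# Map each split name to its matching file
data_files = {"train": "dataset/GK/train.json",
              "validation": "dataset/GK/validation.json",
              "test": "dataset/GK/test.json"
             }
# field="data" tells the loader that the records live under the top-level "data" key
dataset = load_dataset("json", data_files=data_files, field="data")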

#### Hugging Face provides a tokenizer class, which is in charge of preparing the inputs for a model. We will use the open-source FLAN-T5-Large model from Hugging Face and load it from a local directory. It is a capable encoder-decoder instruction-tuned model that performs well on many tasks.

#### You can easily download the flan-t5-large model from Hugging Face by cloning the model repository (Git LFS must be installed so that the large weight files are fetched).
#### git clone https://huggingface.co/google/flan-t5-large

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
modelPath = "model/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(modelPath)
base_model = AutoModelForSeq2SeqLM.from_pretrained(modelPath)

#### We will create an instruction dataset for training by converting the question-answer pairs into explicit instructions for the LLM. Let's build a prompt template with an instruction at the start and an answer marker at the end.

In [None]:
def prompt_generator(batchData):
    # Wrap every question in the same instruction template
    start = 'Assuming you are working as General Knowledge instructor. Can you please answer the below question?\n\n'
    end = '\n Answer: '
    training_prompt = [start + question + end for question in batchData['Question']]
    # Pad to the tokenizer's maximum length and truncate longer inputs so every row has the same shape
    batchData['input_ids'] = tokenizer(training_prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    batchData['labels'] = tokenizer(batchData['Answer'], padding="max_length", truncation=True, return_tensors="pt").input_ids
    return batchData

instructed_datasets = dataset.map(prompt_generator, batched=True)
instructed_datasets = instructed_datasets.remove_columns(['id','Question', 'Answer'])
#print(instructed_datasets)
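
#### A common refinement, which the cell above does not apply, is to replace padding positions in the labels with -100 so that the loss ignores them (PyTorch's cross-entropy loss uses -100 as its default ignore index). The sketch below shows the idea on plain lists; `mask_padding` is a hypothetical helper, the token ids are made up, and the pad token id of 0 is an assumption that should be checked against `tokenizer.pad_token_id`.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_padding(label_rows, pad_token_id):
    """Replace every pad token id in each label row with IGNORE_INDEX."""
    return [
        [IGNORE_INDEX if token == pad_token_id else token for token in row]
        for row in label_rows
    ]


# Two padded label rows with made-up token ids; 0 is the assumed pad token id
labels = [[71, 1525, 1, 0, 0], [37, 1, 0, 0, 0]]
masked = mask_padding(labels, pad_token_id=0)
print(masked)
# [[71, 1525, 1, -100, -100], [37, 1, -100, -100, -100]]
```

#### Whether this step is worth adding depends on how much padding your answers carry; with long runs of padding, unmasked labels let the pad tokens dominate the loss.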

#### Let's create the LoRA configuration, where we specify the rank r as a hyperparameter along with the other configuration parameters. We then create the PEFT version of the base model, which we use to train the new LoRA matrices (the adapter).

In [None]:
from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    r=32, # Rank
    lora_alpha=32, # LoRA scaling factor
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5
)
peft_model = get_peft_model(base_model, 
                            lora_config)
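
#### To see how small the adapter is relative to the base model, you can count trainable parameters. PEFT models offer `print_trainable_parameters()` for this; the sketch below does the same arithmetic by hand for any object that exposes a PyTorch-style `parameters()` iterator. `count_parameters`, `FakeParam`, and `FakeModel` are hypothetical stand-ins, and the sizes are made up, so that the example runs without loading a model.

```python
def count_parameters(model):
    """Return (trainable, total) parameter counts for a PyTorch-style model."""
    trainable = 0
    total = 0
    for param in model.parameters():
        n = param.numel()
        total += n
        if param.requires_grad:
            trainable += n
    return trainable, total


class FakeParam:
    """Stand-in for a tensor: an element count and a requires_grad flag."""
    def __init__(self, size, requires_grad):
        self.size = size
        self.requires_grad = requires_grad

    def numel(self):
        return self.size


class FakeModel:
    """Stand-in for a model with one frozen weight and two LoRA matrices."""
    def parameters(self):
        yield FakeParam(1024 * 1024, requires_grad=False)  # frozen weight
        yield FakeParam(1024 * 32, requires_grad=True)     # first LoRA matrix
        yield FakeParam(32 * 1024, requires_grad=True)     # second LoRA matrix


trainable, total = count_parameters(FakeModel())
print(f"trainable: {trainable} of {total} ({100 * trainable / total:.2f}%)")
```

#### With the real model, calling `count_parameters(peft_model)` should report the same kind of split between frozen and trainable weights.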

#### We will use the PyTorch framework for PEFT/LoRA fine-tuning. The Hugging Face Trainer class provides an API for feature-complete training in PyTorch for most standard use cases. Before instantiating our Trainer object, we create a TrainingArguments object, which exposes all the points of customization during training. The code below sets one epoch, but max_steps=1 takes precedence and stops training after a single optimization step, which is only enough for a quick smoke test; remove max_steps to train for the full epoch. You can choose the number of epochs and the other training parameters based on your available compute and memory and on the final model evaluation results.

In [None]:
from transformers import TrainingArguments, Trainer
import time

output_dir = f'./model/peft-trained-model-output/flan-output-{str(int(time.time()))}'

training_args = TrainingArguments(
    output_dir=output_dir,
    evaluation_strategy="epoch",
    learning_rate=1e-3, 
    num_train_epochs=1, 
    weight_decay=0.01,
    logging_steps=1,
    max_steps=1  # overrides num_train_epochs; remove to train for the full epoch
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=instructed_datasets['train'],
    eval_dataset=instructed_datasets['validation']
)
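
#### To relate `max_steps` to `num_train_epochs`, it helps to estimate how many optimization steps one epoch takes. The sketch below assumes a single device, no gradient accumulation, and the Trainer's default per-device batch size of 8; the dataset size of 1000 and the helper `steps_per_epoch` are made up for illustration.

```python
import math


def steps_per_epoch(num_examples, batch_size=8):
    """Optimizer steps in one epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)


examples = 1000  # hypothetical training-set size
print(steps_per_epoch(examples))  # 125
```

#### Under these assumptions, a run limited to a single step would see only 8 of the 1000 examples.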

In [None]:
trainer.train()

#### After training, you can save the PEFT model for future evaluation and inference. If you inspect the saved model, you will see that only the adapter is saved, and its size is very small compared to the base model.

In [None]:
saved_dir = f'./model/peft-trained-model/flan-trained-{str(int(time.time()))}'
tokenizer.save_pretrained(saved_dir)
peft_model.save_pretrained(saved_dir)
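
#### One way to check the size claim is to sum the file sizes in the saved directory. The sketch below uses only the standard library; `directory_size_bytes` is a hypothetical helper, and the temporary directory with a dummy file stands in for the real `saved_dir`.

```python
import tempfile
from pathlib import Path


def directory_size_bytes(path):
    """Total size in bytes of all regular files under `path`, recursively."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


# Demonstration on a throwaway directory; pass saved_dir instead for the real adapter
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dummy.bin").write_bytes(b"\x00" * 2048)
    demo_size = directory_size_bytes(tmp)
    print(f"{demo_size / 1024:.1f} KiB")  # 2.0 KiB
```

#### Running the same helper on the base model directory gives a direct comparison between adapter and base model size.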

#### We can load the saved PEFT adapter from the local file system along with its base FLAN-T5 model. We pass is_trainable=False because we will use it only for inference, not for further training.

In [None]:
from peft import PeftModel

peft_model = PeftModel.from_pretrained(base_model,'model/peft-trained-model/flan-trained-1693655699', is_trainable=False)

#### For generative AI applications, a qualitative approach, where we ask ourselves "Is my model behaving the right way?", is usually a good starting point. We can check this by manually comparing the actual answers with the answers given by the PEFT model, using our test dataset for evaluation.

In [None]:
from transformers import GenerationConfig
import pandas as pd

questions = dataset['test']['Question']
actual_answers = dataset['test']['Answer']
peft_model_answers = []

for question in questions:
    prompt = f"""

Assuming you are working as General Knowledge instructor. Can you please answer the below question?

{question}
Answer:"""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Greedy decoding (num_beams=1), capped at 200 newly generated tokens
    peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
    peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)
    peft_model_answers.append(peft_model_text_output)
    
answers = list(zip(questions,actual_answers,peft_model_answers))
df = pd.DataFrame(answers, columns = ['question','actual answer','peft model answer'])
df

#### The other evaluation approach is quantitative. The ROUGE metric helps quantify the quality of the answers produced by a model by comparing them to the actual answers in our test dataset. You can read more about this by looking up the ROUGE metric.

In [None]:
import evaluate
rouge = evaluate.load('rouge')
peft_model_results = rouge.compute(
    predictions=peft_model_answers,
    references=actual_answers,
    use_aggregator=True,
    use_stemmer=True,
)
print(peft_model_results)
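
#### To make the metric less of a black box, the sketch below computes a simplified ROUGE-1 F1 score (unigram overlap) by hand. It is not the implementation used by `evaluate`: it lowercases, splits on whitespace, and skips stemming, so its numbers can differ from the library's. The function name `rouge1_f1` and the example sentences are assumptions for illustration.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps, per word, the smaller of the two counts
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat is on the mat")
print(round(score, 4))  # 0.8333
```

#### Here five of the six words overlap, so precision and recall are both 5/6, and so is their harmonic mean.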