**Inference using a fine-tuned Llama 2 model**


References:

https://www.youtube.com/watch?v=MDA3LUKNl1E

https://github.com/curiousily/Get-Things-Done-with-Prompt-Engineering-and-LangChain/blob/master/14.fine-tuning-llama-2-7b-on-custom-dataset.ipynb

https://www.kaggle.com/code/mahimairaja/fine-tuning-llama-2-tweet-summarization

**Install necessary packages**

In [None]:
!pip install --quiet transformers
!pip install --quiet pytorch-lightning
!pip install torchtext==0.6.0
!pip install -qqq peft==0.5.0 --progress-bar off
!pip install -qqq trl==0.7.1 --progress-bar off
!pip install bitsandbytes-cuda110 bitsandbytes
!pip install accelerate
!pip install -i https://test.pypi.org/simple/ bitsandbytes

**Import libraries**

In [2]:
import json
import re
from pprint import pprint

import pandas as pd
import numpy as np
import torch
from huggingface_hub import notebook_login
from peft import LoraConfig, PeftModel
torch.cuda.empty_cache()
from pathlib import Path
from torch.utils.data import Dataset, DataLoader
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import TensorBoardLogger
from sklearn.model_selection import train_test_split
from termcolor import colored
import textwrap
import transformers
from transformers import (
    AutoModelForCausalLM,
    TrainingArguments,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from trl import SFTTrainer
from tqdm.auto import tqdm
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc

%matplotlib inline
%config InlineBackend.figure_format='retina'
sns.set(style='whitegrid', palette='muted', font_scale=1.2)
rcParams['figure.figsize'] = 16, 10
pl.seed_everything(42)
DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"


**Specify the base model name**<br>
The base model name is only needed if the fine-tuned weights have not yet been merged with the base model.

In [3]:
base = 'meta-llama/Llama-2-7b-hf'


In [4]:
# output_dir = "experiments-llama"
output_dir = "experiments-llama/v2"


**Mount your Google Drive folder if you run this on Colab**

In [7]:
from google.colab import drive
drive.mount('/content/drive')
%cd "/content/drive/My Drive/ML project"

Mounted at /content/drive
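
The mount step only makes sense inside Colab. A minimal sketch of guarding it, assuming the common idiom that the `google.colab` module is already loaded in a Colab kernel; `running_in_colab` is a hypothetical helper, not part of the original notebook:

```python
import sys


def running_in_colab() -> bool:
    """Return True when the google.colab module is already loaded (assumed Colab idiom)."""
    return "google.colab" in sys.modules


if running_in_colab():
    # Only import and mount inside Colab; elsewhere this branch is skipped.
    from google.colab import drive

    drive.mount("/content/drive")
else:
    print("Not running in Colab; skipping the Drive mount.")
```

With a guard like this, the same notebook can run locally without failing on the `google.colab` import.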


**Log in to your Hugging Face account to access the fine-tuned model** <br>
Note that access to Meta AI's open-source Llama 2 model is gated: you must submit a request, which needs to be approved by Meta AI and Hugging Face.

In [9]:
notebook_login()
# You will need your Hugging Face access token [HUGGING_FACE_TOKEN]


## Load the base model and merge the weights from fine-tuning


In [None]:
## ref: https://colab.research.google.com/drive/134o_cXcMe_lsvl15ZE_4Y75Kstepsntu?usp=sharing#scrollTo=wxQJJoJnZV27
## load base model and merge your weights from training
device_map = {"": 0}

# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    base,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model = PeftModel.from_pretrained(base_model, output_dir)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
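
Conceptually, merging folds each low-rank LoRA update into the corresponding base weight, so the adapter is no longer needed at inference time. The NumPy sketch below illustrates the idea on a toy matrix, assuming the standard LoRA formulation `W' = W + (alpha / r) * B @ A`. The shapes and values are made up, and this is not the PEFT implementation itself:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 6, 8, 2      # toy sizes; the notebook sets lora_r = 64
alpha = 16                    # same value as lora_alpha in this notebook
scaling = alpha / r

W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # LoRA "down" projection
B = rng.normal(size=(d_out, r))      # LoRA "up" projection

x = rng.normal(size=(d_in,))

# Adapter kept separate: base path plus scaled low-rank path.
y_adapter = W @ x + scaling * (B @ (A @ x))

# Merged: fold the low-rank update into a single dense weight.
W_merged = W + scaling * (B @ A)
y_merged = W_merged @ x

print(np.allclose(y_adapter, y_merged))
```

Both paths give the same output up to floating-point error, which is why the merged model can be saved and served as an ordinary checkpoint.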



In [None]:
new_model = 'finetuned-llama-2-test_v2'
model.push_to_hub(new_model, max_shard_size='2GB')
tokenizer.push_to_hub(new_model)


CommitInfo(commit_url='https://huggingface.co/aj30/finetuned-llama-2-test_subset1024_v1/commit/0e8babe82d317cdb55bb98c6c0774121dc265c94', commit_message='Upload tokenizer', commit_description='', oid='0e8babe82d317cdb55bb98c6c0774121dc265c94', pr_url=None, pr_revision=None, pr_num=None)
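
With `max_shard_size='2GB'`, the merged weights are uploaded as several shard files (7 in the original run). A rough sanity check of that count, assuming about 13.48 GB of FP16 weights (the combined size of the two base-model shard files) and decimal gigabytes. Real sharding is done tensor by tensor, so this is only a lower-bound estimate:

```python
import math


def estimate_num_shards(total_bytes: float, max_shard_bytes: float) -> int:
    """Lower-bound estimate of the shard count for a checkpoint of the given size."""
    return math.ceil(total_bytes / max_shard_bytes)


# Assumed sizes: ~13.48 GB of FP16 weights, 2 GB shard limit (decimal GB).
total_bytes = 13.48e9
max_shard_bytes = 2e9

print(estimate_num_shards(total_bytes, max_shard_bytes))
```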

## Load the model from Hugging Face after restarting the session


In [10]:
use_4bit = True
bnb_4bit_quant_type = "nf4"
use_nested_quant = False
# Load the entire model on the GPU 0
device_map = {"": 0}
lora_alpha = 16
lora_dropout = 0.1
lora_r = 64
bnb_4bit_compute_dtype = "float16"


def load_model(model_name):
    # Load tokenizer and model with QLoRA configuration
    compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=use_4bit,
        bnb_4bit_quant_type=bnb_4bit_quant_type,
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=use_nested_quant,
    )

    if compute_dtype == torch.float16 and use_4bit:
        major, _ = torch.cuda.get_device_capability()
        if major >= 8:
            print("=" * 80)
            print("Your GPU supports bfloat16, you can accelerate training with the argument --bf16")
            print("=" * 80)

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map=device_map,
        quantization_config=bnb_config
    )

    model.config.use_cache = False
    model.config.pretraining_tp = 1

    # Load LoRA configuration
    peft_config = LoraConfig(
        lora_alpha=lora_alpha,
        lora_dropout=lora_dropout,
        r=lora_r,
        bias="none",
        task_type="CAUSAL_LM",
    )

    # Load Tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"

    return model, tokenizer, peft_config
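
Loading in 4-bit mainly reduces the memory taken by the weights. A back-of-the-envelope sketch, assuming roughly 6.74 billion parameters (about 13.48 GB of FP16 weights at 2 bytes each) and ignoring quantization constants, layers that stay in higher precision, activations, and the KV cache:

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight memory in decimal GB, ignoring all overheads."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 6.74e9  # assumed parameter count for a 7B-class model

for label, bits in [("fp16", 16), ("int8", 8), ("nf4 (4-bit)", 4)]:
    print(f"{label:>12}: {weight_memory_gb(NUM_PARAMS, bits):.2f} GB")
```

Actual GPU usage will be higher than these figures, but the ratio between precisions is a useful guide when picking a Colab GPU.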

In [11]:
new_model = 'finetuned-llama-2-test_v2'

huggingface_profile = "aj30"
full_path = huggingface_profile + "/" + new_model

model, tokenizer, peft_config = load_model(full_path)


**Preliminaries to create the prompts**

In [12]:
DEFAULT_SYSTEM_PROMPT = """
Below is a scientific paper from pubmed. Write a summary.
""".strip()


In [13]:
def generate_prompt(
    paper: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT
) -> str:
    paper_chunk = paper[:1024]

    return f"""### Instruction: {system_prompt}
### Input:
{paper_chunk.strip()}

### Response:
""".strip()

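Note that `paper[:1024]` truncates by characters, not by tokens. A self-contained sketch of the resulting prompt layout, using a shortened copy of `generate_prompt` (named `build_prompt` here for illustration) and a made-up input string:

```python
SYSTEM_PROMPT = "Below is a scientific paper from pubmed. Write a summary."
MAX_CHARS = 1024  # character budget used by the notebook's generate_prompt


def build_prompt(paper: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Shortened copy of generate_prompt: keep the first MAX_CHARS characters."""
    paper_chunk = paper[:MAX_CHARS]
    return (
        f"### Instruction: {system_prompt}\n"
        f"### Input:\n{paper_chunk.strip()}\n\n"
        "### Response:"
    )


fake_paper = "oxidative stress is a major factor in aging. " * 100  # made-up text
prompt = build_prompt(fake_paper)

print(len(fake_paper), "characters in the paper")
print(prompt[:120])
print(prompt.endswith("### Response:"))
```

Because the limit is in characters, the number of tokens the model sees varies from paper to paper.
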
In [14]:
from datasets import load_dataset, load_from_disk

In [15]:
dataset = load_from_disk('experiments-llama/dataset.hf')

In [16]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'summary', 'paper'],
        num_rows: 1440
    })
    validation: Dataset({
        features: ['text', 'summary', 'paper'],
        num_rows: 160
    })
    test: Dataset({
        features: ['text', 'summary', 'paper'],
        num_rows: 400
    })
})

In [17]:
test_df = pd.DataFrame(dataset['test'][:100])
test_df

| | text | summary | paper |
|---|---|---|---|
| 0 | ### Instruction: Below is a scientific paper f... | study design comparative effectiveness revie... | patients with cervical radiculopathy due to si... |
| 1 | ### Instruction: Below is a scientific paper f... | difficulties with temporal coordination or se... | functional imaging studies investigating ther... |
| 2 | ### Instruction: Below is a scientific paper f... | we examined small mammals as hosts for anapla... | from may through october 2008 we trapped smal... |
| 3 | ### Instruction: Below is a scientific paper f... | glaucoma is a common disease that leads to lo... | the eye is an immune privileged site where the... |
| 4 | ### Instruction: Below is a scientific paper f... | maxillary sinus aplasia and hypoplasia are ra... | the maxillary sinuses develop in the 3rd month... |
| ... | ... | ... | ... |
| 95 | ### Instruction: Below is a scientific paper f... | orbital cellulitis is an infection of soft ti... | a 10year old asian indian male child was brou... |
| 96 | ### Instruction: Below is a scientific paper f... | a family history of prostate cancer pca i... | a positive family history of prostate cancer ... |
| 97 | ### Instruction: Below is a scientific paper f... | the near universal presence of the rhomboid ... | inspection of the database of clusters of orth... |
| 98 | ### Instruction: Below is a scientific paper f... | the neural efficiency hypothesis suggests tha... | neural plasticity is the brain s ability to ch... |


In [18]:
def summarize(model, text: str):
    inputs = tokenizer(text, return_tensors="pt").to('cuda')
    inputs_length = len(inputs["input_ids"][0])
    with torch.inference_mode():
        outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.0001)
    return tokenizer.decode(outputs[0][inputs_length:], skip_special_tokens=True)
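
For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, which is why `summarize` slices from `inputs_length` before decoding. A toy illustration with plain Python lists standing in for token ID tensors (the IDs are made up):

```python
prompt_ids = [1, 835, 2799, 4080, 29901]          # made-up prompt token IDs
generated_ids = prompt_ids + [319, 3460, 338, 2]  # prompt echoed, then new tokens

prompt_length = len(prompt_ids)

# Keep only what the model produced after the prompt.
new_token_ids = generated_ids[prompt_length:]

print(new_token_ids)
```

Without the slice, the decoded text would start with the full instruction and input again.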

In [19]:
example = test_df.iloc[60]

In [20]:
example['paper']

'aging  a multifactorial process of enormous complexity  is characterized by impairment of physiochemical and biological aspects of cellular functions   harman 1992    oxidative stress  an unavoidable consequence in the metabolism of oxygen in aerobic cells  is a major factor in the aging process and  in the course of many chronic diseases  associated with aging   mattson 2002    many predisposing conditions which increase in prevalence during aging  such as obesity  insulin resistance  inflammation  changes in the activity of the hypothalamus hypophysis suprarenal axis  stress  and hypertension  contribute to increase prevalence of cardiovascular diseases   veronica and esther 2012    lipid infiltration in the myocardium is the foremost disorder encountered in the development of the aging process   johannsen and ravussin 2010    aging is frequently accompanied by several pathological conditions and some associated phenomena such as increased lipid peroxidation  generation of free radi

In [21]:
torch.cuda.empty_cache()

summary_org = summarize(model, generate_prompt(example.paper))  # the full text cannot be loaded; generate_prompt keeps only the first 1024 characters


In [22]:
generate_prompt(example.paper)

'### Instruction: Below is a scientific paper from pubmed. Write a summary.\n### Input:\naging  a multifactorial process of enormous complexity  is characterized by impairment of physiochemical and biological aspects of cellular functions   harman 1992    oxidative stress  an unavoidable consequence in the metabolism of oxygen in aerobic cells  is a major factor in the aging process and  in the course of many chronic diseases  associated with aging   mattson 2002    many predisposing conditions which increase in prevalence during aging  such as obesity  insulin resistance  inflammation  changes in the activity of the hypothalamus hypophysis suprarenal axis  stress  and hypertension  contribute to increase prevalence of cardiovascular diseases   veronica and esther 2012    lipid infiltration in the myocardium is the foremost disorder encountered in the development of the aging process   johannsen and ravussin 2010    aging is frequently accompanied by several pathological conditions and

In [23]:
summary_org

'\n aging is a multifactorial process of enormous complexity  which is characterized by impairment of physiochemical and biological aspects of cellular functions  oxidative stress  an unavoidable consequence in the metabolism of oxygen in aerobic cells  is a major factor in the aging process and in the course of many chronic diseases  associated with aging  many predisposing conditions which increase in prevalence during aging  such as obesity  insulin resistance  inflammation  changes in the activity of the hypothalamus hypophysis suprarenal axis  stress  and hypertension  contribute to increase prevalence of cardiovascular diseases   veronica and esther 2012    lipid infiltration in the myocardium is the foremost disorder encounte\n\n   keywords   aging  oxidative stress  lipid peroxidation  antioxidant enzymes  myocardium   background  aging is a multifactorial process of enormous complexity  which is characterized by impairment of physiochemical and bi'

In [24]:
example['summary']

' aging has been defined as the changes that occur in living organisms with the passage of time that lead to functional impairment and ultimately to death  free radical  induced oxidative damage has long been thought to be the most important consequence of the aging process  in the present study  an attempt has been made to study the salubrious effects of dietary supplementation of chitosan on glutathione  dependent antioxidant defense system in young and aged rats  the dietary supplementation of chitosan significantly reduced the age  associated dyslipidemic abnormalities noted in the levels of total cholesterol  hdl  cholesterol  and ldl  cholesterol in plasma and heart tissue  its administration significantly   p  005   attenuated the oxidative stress in the heart tissue of aged rats through the counteraction of free radical formation by maintaining the enzymatic  glutathione peroxidase   gpx   and glutathione reductase   gr    and non  enzymatic  reduced glutathione   gsh    status

In [25]:

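def evaluate(df, model):
    """Generate a model summary for every paper in df and store it in a new column."""
    # Named `evaluate` to avoid shadowing Python's built-in eval()
    model_outputs = []
    for _, row in df.iterrows():
        # Build the prompt from the paper text and summarize it with the model
        output = summarize(model, generate_prompt(row.paper))
        model_outputs.append(output)

    df["model_summary"] = model_outputs
    return df


In [26]:
test_df2 = test_df.copy()

In [None]:
test_df2 = evaluate(test_df2, model)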

In [None]:
test_df2.to_excel(f'{output_dir}/runs/evaluation_v2_jan7.xlsx')
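
The spreadsheet stores each `model_summary` next to its reference `summary`. One possible way to put a number on the comparison is a unigram-overlap F1 score in the spirit of ROUGE-1. This is a simplified sketch with whitespace tokenization, not a scoring method used in the original notebook, and `unigram_f1` is a hypothetical helper:

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between two texts (simplified, ROUGE-1-like)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    if not ref_counts or not cand_counts:
        return 0.0

    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "aging is driven by oxidative stress"
candidate = "oxidative stress is a major factor in aging"
print(round(unigram_f1(reference, candidate), 3))
```

Applied to the evaluation dataframe, this could look like `test_df2.apply(lambda r: unigram_f1(r.summary, r.model_summary), axis=1)`.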