# Llama 2 + Vast.ai: zero-shot example

This guide is a companion to the paper *"Generative LLMs and Textual Analysis in Accounting: (Chat)GPT as Research Assistant?"* ([SSRN](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4429658)).

**Author:** [Ties de Kok](https://www.tiesdekok.com)    

----
# Imports
----

**Important:** please read the instructions in the `readme.md` file to set up a Vast.ai instance with the required configuration and packages.

**Python built-in libraries**

In [1]:
import os, sys, re, copy, random, json, time, datetime
from pathlib import Path

**General helper libraries**

In [2]:
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm
import getpass

**Libraries for running local LLM inference (vLLM)**

In [3]:
from vllm import LLM, SamplingParams

### Settings

In [4]:
pd.options.mode.chained_assignment = None  # default='warn'
pd.set_option('display.max_columns', 150)
pd.set_option('display.max_rows', 150)

### Utility functions

In [5]:
## This function makes it easier to print rendered markdown through a code cell.

from IPython.display import Markdown, display

def mprint(text, *args, **kwargs):
    if 'end' in kwargs.keys():
        text += kwargs['end']
        
    display(Markdown(text))

### Set up huggingface

To access the Llama 2 models on Hugging Face, you need to request access and provide your Hugging Face Hub token with your download requests.

You can request access here:    
https://huggingface.co/meta-llama/Llama-2-7b-hf

You can find your Hugging Face Hub token here:    
https://huggingface.co/settings/tokens

You can set it by running the following:

In [6]:
if 'HUGGING_FACE_HUB_TOKEN' not in os.environ:
    os.environ['HUGGING_FACE_HUB_TOKEN'] = getpass.getpass(prompt='Enter your API key: ')

Enter your API key:  ········


-----
# Toy example
----

I will use a hypothetical dataset of earnings call sentences and try to identify sentences with a forward-looking statement.

## Load data

In [7]:
with open(Path.cwd() / "data" / "statements.json", "r", encoding = "utf-8") as f:
    statement_list = json.load(f)

statement_df = pd.DataFrame(statement_list)

In [8]:
sentence_1 = statement_df.iloc[0].statement
print(sentence_1)

In the last quarter, we managed to increase our revenue by 15% due to the successful launch of our new product line.


In [9]:
sentence_2 = statement_df.iloc[1].statement
print(sentence_2)

We anticipate that our investments in R&D will lead to a 20% improvement in efficiency in the next two years.


----
## Prompt engineering
----

### Define prompt template

In [10]:
prompt_template = """
Task: classify whether the statement below contains a forward looking statements (fls).
Rules:
- Answer using JSON in the following format: {{"contains_fls" : 0 or 1}}
Statement:
> {statement}
JSON =
""".strip()

## Note: {statement} is the placeholder we fill in for each observation; the doubled braces {{ }} are escaped so that .format() emits literal braces.
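
As a small illustration of the note above about curly braces: `str.format` treats `{statement}` as a placeholder and collapses doubled braces (`{{` and `}}`) into literal ones, which is why the JSON example in the template survives formatting. The sketch below uses a shortened, hypothetical template (`demo_template`) and an invented sentence, not the notebook's data.

```python
# Hypothetical, shortened template for illustration only (not the notebook's full prompt).
demo_template = 'Format: {{"contains_fls" : 0 or 1}}\nStatement:\n> {statement}'

# The doubled braces become literal braces; {statement} is replaced by the keyword argument.
demo_prompt = demo_template.format(statement="We expect margins to improve next year.")
print(demo_prompt)
```

Writing the JSON example with single braces instead would make `.format()` try to interpret it as a placeholder and raise an error.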

### Create prompts

In [11]:
prompt_list = []
id_list = []
for i, row in statement_df.iterrows():
    prompt = prompt_template.format(**{
        "statement" : row["statement"]
    })
    prompt_list.append(prompt)
    id_list.append(row["i"])

In [12]:
print(prompt_list[0])

Task: classify whether the statement below contains a forward looking statements (fls).
Rules:
- Answer using JSON in the following format: {"contains_fls" : 0 or 1}
Statement:
> In the last quarter, we managed to increase our revenue by 15% due to the successful launch of our new product line.
JSON =


----
# Run zero-shot inference
---

#### Set up parameters

In [13]:
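sampling_params = SamplingParams(
    max_tokens=100,
    temperature=0,  # greedy decoding, so repeated runs give the same labels
    # Stop if the model starts repeating the prompt:
    # a closing brace, a blank line, then "JSON =".
    stop="}\n\nJSON =",
)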

#### Adapt prompts to the Llama 2 chat format

In [14]:
chat_format = """<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_message} [/INST]""".strip()

In [15]:
system_message = "You are a helpful assistant who only returns valid JSON."
llama_prompts = []
for prompt in prompt_list:
    llama_prompts.append(
        chat_format.format(system_prompt = system_message, user_message = prompt)
    )

#### Initialize the Llama 2 7B chat model

In [16]:
model = "meta-llama/Llama-2-7b-chat-hf"
model_label = model.split("/")[-1].replace("-", "_")
model_pred_loc = Path.cwd() / "data" / f"{model_label}_{int(time.time())}.json"

In [17]:
llm = LLM(model=model, max_num_batched_tokens=4096)

INFO 10-05 04:06:17 llm_engine.py:72] Initializing an LLM engine with config: model='meta-llama/Llama-2-7b-chat-hf', tokenizer='meta-llama/Llama-2-7b-chat-hf', tokenizer_mode=auto, revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=0)
INFO 10-05 04:06:17 tokenizer.py:30] For some LLaMA V1 models, initializing the fast tokenizer may take a long time. To reduce the initialization time, consider using 'hf-internal-testing/llama-tokenizer' instead of the original tokenizer.
INFO 10-05 04:06:22 llm_engine.py:205] # GPU blocks: 1050, # CPU blocks: 512


#### Generate completions

In [18]:
%%time
llama2_7b_ouputs = llm.generate(llama_prompts, sampling_params=sampling_params)

Processed prompts: 100%|██████████████████████████████████████████████████████████████████████| 60/60 [00:00<00:00, 64.83it/s]

CPU times: user 943 ms, sys: 1.11 ms, total: 944 ms
Wall time: 941 ms





#### Process predictions

In [19]:
preds = []
for i, item in enumerate(llama2_7b_ouputs):
    full_item = copy.deepcopy(item.__dict__)
    full_item["completion_obj"] = full_item["outputs"][0].__dict__
    del full_item["outputs"]
    preds.append({
        "i" : id_list[i],
        "completion" : item.outputs[0].text,
        "raw_obj" : full_item
    })

with open(model_pred_loc, "w", encoding = "utf-8") as f:
    json.dump(preds, f)
    
llama2_7b_preds = copy.deepcopy(preds)

### Evaluate performance

#### Process into dataframe

In [25]:
res_list = []
for item in llama2_7b_preds:
    result = json.loads(item["completion"])
    res_list.append({
        "i" : item["i"],
        "fls_prediction" : result["contains_fls"]
    })
    
res_df = pd.DataFrame(res_list)
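
The loop above calls `json.loads` directly, which raises an error as soon as one completion contains anything other than a clean JSON object (extra chatter, or a missing closing brace). The helper below is a minimal sketch of a more forgiving parser. It is not part of the original notebook: the name `parse_fls_completion` and the choice to return `None` for unusable completions are assumptions made for illustration.

```python
import json
import re

def parse_fls_completion(completion):
    """Return 0 or 1 if a `contains_fls` value can be recovered, else None."""
    text = completion.strip()
    # If generation ended before the closing brace, add it back.
    if text.count("{") > text.count("}"):
        text += "}"
    # Take the first {...} span and ignore any surrounding text.
    match = re.search(r"\{.*?\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        value = json.loads(match.group(0)).get("contains_fls")
    except json.JSONDecodeError:
        return None
    # Accept JSON booleans as well, but reject anything that is not 0 or 1.
    if isinstance(value, bool):
        value = int(value)
    return value if value in (0, 1) else None

# Invented example completions, not model output from this notebook.
examples = [
    '{"contains_fls": 1}',
    'Here is the JSON: {"contains_fls" : 0}',
    '{"contains_fls": 1',
    "I cannot answer that.",
]
parsed = [parse_fls_completion(example) for example in examples]
print(parsed)
```

In the loop above, `parse_fls_completion(item["completion"])` could stand in for the `json.loads(...)["contains_fls"]` lookup. Rows where it returns `None` would then need to be inspected or dropped before computing metrics.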

In [27]:
res_df.fls_prediction.value_counts()

1    51
0     9
Name: fls_prediction, dtype: int64

#### Show classification report

In [31]:
from sklearn.metrics import classification_report

In [32]:
combo_df = pd.merge(statement_df, res_df, on = "i", how = "left")

In [34]:
print(classification_report(
    combo_df["contains_fls"], 
    combo_df["fls_prediction"]
))

              precision    recall  f1-score   support

           0       0.89      0.27      0.41        30
           1       0.57      0.97      0.72        30

    accuracy                           0.62        60
   macro avg       0.73      0.62      0.56        60
weighted avg       0.73      0.62      0.56        60
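
The report above, combined with the `value_counts()` output (9 predicted zeros, 51 predicted ones), pins down the individual confusion-matrix cells. The sketch below reconstructs them with plain arithmetic. The counts are inferred from the rounded metrics rather than read from the predictions themselves. On the actual data, `sklearn.metrics.confusion_matrix(combo_df["contains_fls"], combo_df["fls_prediction"])` would give the matrix directly.

```python
# Inputs taken from the outputs above: support of 30 per class, 9 predicted zeros, 51 predicted ones.
support_0, support_1 = 30, 30
predicted_0, predicted_1 = 9, 51

# Precision for class 0 is 0.89, so about 0.89 * 9 = 8 of the predicted zeros are correct.
true_neg = round(0.89 * predicted_0)
false_neg = predicted_0 - true_neg   # true 1, predicted 0
false_pos = support_0 - true_neg     # true 0, predicted 1
true_pos = support_1 - false_neg

# Rows are true labels, columns are predicted labels (the layout sklearn uses).
confusion = [[true_neg, false_pos], [false_neg, true_pos]]
print(confusion)

# Cross-check the inferred counts against the other reported metrics.
recall_0 = true_neg / support_0
precision_1 = true_pos / predicted_1
accuracy = (true_neg + true_pos) / (support_0 + support_1)
print(round(recall_0, 2), round(precision_1, 2), round(accuracy, 2))
```

Read this way, the model labels 22 of the 30 statements without a forward-looking statement as forward-looking, which is what drives the low recall for class 0.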

