# Replicate + Llama 2: zero-shot example

This guide is a companion to the paper *"Generative LLMs and Textual Analysis in Accounting: (Chat)GPT as Research Assistant?"* ([SSRN](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4429658)).

**Author:** [Ties de Kok](https://www.tiesdekok.com)    

----
# Imports
----


All the dependencies required for this notebook are provided in the `environment.yml` file.

To install, run `conda env create -f environment.yml`. This creates the `gllm` environment, which you can then activate with `conda activate gllm`.

I recommend using Python 3.9 or higher to avoid dependency conflicts.

**Python built-in libraries**

In [4]:
import os, sys, re, copy, random, json, time, datetime
from pathlib import Path
import getpass

**Library for making HTTP requests**

In [5]:
import requests

**General helper libraries**

In [6]:
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm

### Settings

In [7]:
pd.options.mode.chained_assignment = None  # default='warn'
pd.set_option('display.max_columns', 150)
pd.set_option('display.max_rows', 150)

### Utility functions

In [8]:
## This function makes it easier to print rendered markdown through a code cell.

from IPython.display import Markdown, display

def mprint(text, *args, **kwargs):
    """Render `text` as Markdown; accepts an optional `end` suffix like print()."""
    if 'end' in kwargs:
        text += kwargs['end']

    display(Markdown(text))

-----
# Toy example
----

I will use a hypothetical dataset of earnings call sentences and try to identify the sentences that contain a forward-looking statement (FLS).

## Load data

In [9]:
with open(Path.cwd() / "data" / "statements.json", "r", encoding = "utf-8") as f:
    statement_list = json.load(f)

statement_df = pd.DataFrame(statement_list)
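The loading code assumes that `data/statements.json` holds a JSON array of objects, each with a `statement` field. This is inferred from the `.statement` attribute used in the next cells; any other fields the real file may contain are unknown here. The following minimal sketch uses an in-memory string in place of the file, filled with two of the example sentences, to show that structure and how `pd.DataFrame` turns it into one row per statement:

```python
import json

import pandas as pd

# Hypothetical stand-in for data/statements.json: a JSON array of objects,
# each with (at least) a "statement" key. The real file may contain more fields.
raw_json = """
[
  {"statement": "In the last quarter, we managed to increase our revenue by 15% due to the successful launch of our new product line."},
  {"statement": "We anticipate that our investments in R&D will lead to a 20% improvement in efficiency in the next two years."}
]
"""

statement_list = json.loads(raw_json)

# pd.DataFrame turns a list of dicts into one row per dict and one column per key
statement_df = pd.DataFrame(statement_list)
print(statement_df.shape)  # (2, 1)
```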

In [10]:
sentence_1 = statement_df.iloc[0].statement
print(sentence_1)

In the last quarter, we managed to increase our revenue by 15% due to the successful launch of our new product line.


In [11]:
sentence_2 = statement_df.iloc[1].statement
print(sentence_2)

We anticipate that our investments in R&D will lead to a 20% improvement in efficiency in the next two years.


----
## Prompt engineering
----

### Define prompt template

In [12]:
prompt_template = """
Task: classify whether the statement below contains a forward looking statements (fls).
Rules:
- Answer using JSON in the following format: {{"contains_fls" : 0 or 1}}
Statement:
> {statement}
JSON =
""".strip()

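## Note: `{statement}` is the placeholder we fill in for each observation.
## The doubled braces `{{ }}` are escaped, so `.format()` keeps them as literal `{ }` in the prompt.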
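The template uses two kinds of braces because `str.format` treats single braces as placeholders and doubled braces as literal characters. The following is a small stand-alone demo of both behaviors. The template text is a shortened, illustrative variant of `prompt_template`, not the prompt used in this notebook:

```python
# Single braces mark a placeholder; doubled braces produce literal braces.
template = 'Answer as {{"contains_fls" : 0 or 1}} for: {statement}'

filled = template.format(statement="We expect growth next year.")
print(filled)
# Answer as {"contains_fls" : 0 or 1} for: We expect growth next year.

# Without doubling, format() reads the JSON as a placeholder named '"contains_fls" '
# and fails because no argument with that name was passed.
broken = 'Answer as {"contains_fls" : 0 or 1} for: {statement}'
try:
    broken.format(statement="We expect growth next year.")
    error_type = None
except KeyError as err:
    error_type = type(err).__name__

print(error_type)  # KeyError
```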

### Create prompt

In [13]:
prompt = prompt_template.format(**{
    "statement" : sentence_1
})

In [14]:
print(prompt)

Task: classify whether the statement below contains a forward looking statements (fls).
Rules:
- Answer using JSON in the following format: {"contains_fls" : 0 or 1}
Statement:
> In the last quarter, we managed to increase our revenue by 15% due to the successful launch of our new product line.
JSON =


---
## Set up Replicate
---

To run the code below, you need to install the `replicate` Python package:

```
## In your terminal / command line:
pip install replicate

## Or inside a Jupyter cell:
!pip install replicate
```

In [19]:
import replicate

In [18]:
## The replicate client reads the REPLICATE_API_TOKEN environment variable automatically
if 'REPLICATE_API_TOKEN' not in os.environ:
    os.environ['REPLICATE_API_TOKEN'] = getpass.getpass(prompt='Enter your API key: ')
    
replicate_key = os.environ['REPLICATE_API_TOKEN']
    
## KEEP YOUR KEY SECURE: ANYONE WITH ACCESS TO IT CAN GENERATE COSTS ON YOUR ACCOUNT!

Enter your API key:  ········


### Run a demo generation

In [26]:
output = replicate.run(
    "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    input={"prompt": "Tell me a funny joke about accountants"},
    stream = False
)

In [28]:
for item in output:
    print(item, end = "")

 Sure, here's a joke for you:

Why did the accountant quit his job?

Because he wanted to take his career to the next level!

(Get it? "Next level" is a phrase often used in business and finance to describe growth or advancement, but in this case, it's a play on words because "level" can also refer to a floor or step in a building, implying that the accountant wants to leave his current job and move up to a better one.)

I hope that brought a smile to your face! Is there anything

### Generate a prediction for our prompt

In [55]:
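output = replicate.run(
    "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    input={
        "prompt": prompt, 
        "stop_sequences" : "}", ## Stop generating once the JSON object is closed
        "temperature" : 0.01 ## At the time of writing, the Replicate API does not accept 0, so use a value close to it
    },
)

## The output is an iterable of text chunks; join them into a single string
res = "".join(list(output))

## The stop sequence "}" is not part of the output, so add it back before parsing
res = (res + "}").strip()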

In [56]:
json.loads(res)

{'contains_fls': 0}
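Appending `"}"` and calling `json.loads` works when the model returns only the JSON prefix. However, it raises a `json.JSONDecodeError` if the model adds any text before the opening brace, which chat models sometimes do. The following sketch shows a more forgiving parser under that assumption. `parse_fls_response` is a hypothetical helper, not part of the `replicate` package. It re-adds the missing brace and then extracts the first flat `{...}` object with a regular expression:

```python
import json
import re


def parse_fls_response(raw: str) -> dict:
    """Hypothetical helper: parse the first flat JSON object in a raw completion.

    Assumes generation was cut at the stop sequence "}", so the closing
    brace may be missing; it is re-added before parsing if needed.
    """
    text = raw.strip()
    if not text.endswith("}"):
        text += "}"
    # Match the first {...} block without nested braces (none are expected here)
    match = re.search(r"\{[^{}]*\}", text)
    if match is None:
        raise ValueError(f"No JSON object found in: {raw!r}")
    return json.loads(match.group(0))


print(parse_fls_response('{"contains_fls" : 0'))  # {'contains_fls': 0}
print(parse_fls_response(' Sure, here you go: {"contains_fls": 1'))  # {'contains_fls': 1}
```

Inside `make_prediction`, the last two steps could then become `json_res = parse_fls_response("".join(output))`. Raising a `ValueError` when no braces are found keeps parsing failures visible instead of silently recording a label.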

### Wrap into a function

In [64]:
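def make_prediction(prompt):
    """Send `prompt` to Llama 2 (70B chat) on Replicate and parse the JSON answer."""
    output = replicate.run(
        "meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
        input={
            "prompt": prompt, 
            "stop_sequences" : "}",  ## Stop once the JSON object is closed
            "temperature" : 0.01     ## Near-deterministic output (0 is not accepted)
        },
    )

    ## Join the text chunks and restore the "}" removed by the stop sequence
    res = "".join(list(output))
    res = (res + "}").strip()
    
    json_res = json.loads(res)
    
    return json_res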
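When looping over many statements, a single network hiccup or one malformed answer raises an exception and stops the whole loop. The following sketch shows one way to make that loop more forgiving, assuming that simply calling the model again is an acceptable fix. `predict_with_retries` is a hypothetical wrapper, and the retry count and exponential wait are illustrative defaults, not values recommended by Replicate:

```python
import time
from typing import Callable


def predict_with_retries(
    predict_fn: Callable[[str], dict],
    prompt: str,
    max_attempts: int = 3,
    wait_seconds: float = 1.0,
) -> dict:
    """Hypothetical wrapper: call predict_fn(prompt), retrying on any exception.

    Waits wait_seconds * 2**attempt between tries (exponential backoff) and
    re-raises the last error if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            return predict_fn(prompt)
        except Exception as err:  # broad on purpose: API errors and JSON errors alike
            last_error = err
            if attempt < max_attempts - 1:
                time.sleep(wait_seconds * 2 ** attempt)
    raise last_error
```

Usage with the function above would be `prediction = predict_with_retries(make_prediction, prompt)`. Every retry is another paid API call, so it is worth keeping `max_attempts` small.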

#### Apply to the first five statements

In [65]:
## Drop .head() to classify every statement in statement_df
for i, row in statement_df.head().iterrows():
    prompt = prompt_template.format(**{
        "statement" : row["statement"]
    })
    
    prediction = make_prediction(prompt) 
    mprint(f"""
**Item:** `{i}`
> {row["statement"]}    

*Prediction - contains FLS:* `{prediction["contains_fls"]}`
<br><br>
""".strip())

**Item:** `0`
> In the last quarter, we managed to increase our revenue by 15% due to the successful launch of our new product line.    

*Prediction - contains FLS:* `0`
<br><br>

**Item:** `1`
> We anticipate that our investments in R&D will lead to a 20% improvement in efficiency in the next two years.    

*Prediction - contains FLS:* `1`
<br><br>

**Item:** `2`
> Our recent acquisition of XYZ Company has already started to show positive results in terms of cost savings and market reach.    

*Prediction - contains FLS:* `0`
<br><br>

**Item:** `3`
> We expect to see continued growth in the Asian market, with a potential increase in revenue of 25% over the next three years.    

*Prediction - contains FLS:* `1`
<br><br>

**Item:** `4`
> In the past year, we have successfully reduced our operational costs by 10% through process improvements and better supply chain management.    

*Prediction - contains FLS:* `0`
<br><br>
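Printing each prediction is convenient for a quick check. For analysis, it is usually easier to store the labels next to the statements. The following minimal sketch does this with a hypothetical helper `classify_statements`. To keep the example runnable without an API call, it passes a keyword-based stub in place of `make_prediction`. The stub is only a placeholder and not a real FLS classifier:

```python
import pandas as pd


def classify_statements(df, prompt_template, predict_fn, text_col="statement"):
    """Hypothetical helper: return a copy of df with one contains_fls label per row."""
    predictions = []
    for statement in df[text_col]:
        prompt = prompt_template.format(statement=statement)
        predictions.append(predict_fn(prompt)["contains_fls"])
    out = df.copy()  # leave the input DataFrame unchanged
    out["contains_fls"] = predictions
    return out


demo_df = pd.DataFrame({"statement": [
    "In the last quarter, we managed to increase our revenue by 15% due to the successful launch of our new product line.",
    "We expect to see continued growth in the Asian market, with a potential increase in revenue of 25% over the next three years.",
]})
demo_template = "Statement:\n> {statement}\nJSON ="


def keyword_stub(prompt):
    # Stand-in for make_prediction so the sketch runs offline
    return {"contains_fls": int("expect" in prompt.lower())}


result = classify_statements(demo_df, demo_template, keyword_stub)
print(result["contains_fls"].tolist())  # [0, 1]
```

With the real model, the call would be `classify_statements(statement_df, prompt_template, make_prediction)`.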