# Build Your Own 10K Agent: Transform an Unstructured Financial Report into a Finetuned LLM
Do you want to build an agent that can answer any question about an annual report (10K)? In this example, we will show you how to use `uniflow` and `pykoi` to extract knowledge from an unstructured annual report (10K) and then finetune an LLM on that knowledge.

First, we'll use `uniflow` to generate question-answer pairs (QAs) from a PDF using OpenAI's models via `uniflow`'s `MultiFlowsPipeline`.

Next, we'll use `pykoi` to run supervised fine-tuning (SFT) on the QAs generated by `uniflow`.

Finally, we'll use `pykoi`'s Chatbot to run the SFT model, so you can ask questions about the 10K and get answers.

For this example, you can use a 10K from [Nike](https://investors.nike.com/investors/news-events-and-reports/), [Amazon](https://ir.aboutamazon.com/sec-filings/sec-filings-details/default.aspx?FilingId=16361618), or [Alphabet](https://abc.xyz/investor/sec-filings/annual-filings/2023/).

>*Note: In order to run this notebook, you need a GPU (for the `RLHF`).*

### Before running the code

You will need to set up a conda environment to run this notebook. You can set up the environment by following these [instructions](https://github.com/CambioML/cambio-recipes/tree/main#installation).

We are using uniflow and several of the pykoi modules, so you will need to install these in your environment as well:
```
pip3 install uniflow
pip3 install "pykoi[huggingface, rag, rlhf]"
```
Finally, you will need to reinstall `torch` with a CUDA-enabled build:
```
pip3 uninstall torch
pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121  # cu121 means cuda 12.1
```

Next, you will need a valid [OpenAI API key](https://platform.openai.com/api-keys) to run the code. Once you have the key, set it as the environment variable `OPENAI_API_KEY` within a `.env` file in the root directory of this repository. For more details, see these [instructions](https://github.com/CambioML/cambio-recipes/tree/main#api-keys).
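
A missing key usually only surfaces once the first OpenAI call fails, so it can help to check for it up front. The snippet below is a minimal sketch using only the standard library; `require_env` is a hypothetical helper, not part of `uniflow` or `pykoi`. In the notebook you would call it after `load_dotenv()`.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error."""
    # `env` defaults to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Example with an explicit mapping so the sketch runs without a real key.
fake_env = {"OPENAI_API_KEY": "sk-placeholder"}
print(require_env("OPENAI_API_KEY", env=fake_env))
```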

## 1. Generate QAs from a 10K using `uniflow`

### Update System Path

In [None]:
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

### Install helper packages
If you already have these installed, feel free to skip this step.

In [None]:
!{sys.executable} -m pip install pandas

### Import Dependency

In [None]:
from dotenv import load_dotenv
import os
import pandas as pd

from uniflow.pipeline import MultiFlowsPipeline
from uniflow.flow.config import PipelineConfig
from uniflow.flow.config import TransformOpenAIConfig, ExtractPDFConfig
from uniflow.op.model.model_config import OpenAIModelConfig, NougatModelConfig
from uniflow.op.prompt import PromptTemplate, Context
from uniflow.op.extract.split.constants import PARAGRAPH_SPLITTER

load_dotenv()

### Prepare the input data
First, uncomment the 10K that you want to use.

In [None]:
pdf_file = "nike-10k-2023.pdf"
# pdf_file = "amazon-10k-2023.pdf"
# pdf_file = "alphabet-10k-2023.pdf"

##### Set current directory and input data directory.

In [None]:
dir_cur = os.getcwd()
input_file = os.path.join(dir_cur, "data", "raw_input", pdf_file)

#### Load the PDF using Nougat
For this example, we'll run the `ExtractPDF` flow to extract the text from the 10K PDF. This uses the [Nougat](https://pypi.org/project/nougat-ocr/0.1.17/) PDF parser.

In [None]:
data = [
    {"pdf": input_file},
]

extract_config = ExtractPDFConfig(
    model_config=NougatModelConfig(
        batch_size=4,  # When batch_size > 1, Nougat runs on CUDA; otherwise it runs on CPU
    ),
    splitter=PARAGRAPH_SPLITTER,
)
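
To give a feel for what a paragraph splitter does to the extracted text, here is a rough illustration that splits on blank lines. It is an assumption about the general idea only, not `uniflow`'s implementation of `PARAGRAPH_SPLITTER`; `split_paragraphs` and the sample text are made up for this sketch.

```python
import re


def split_paragraphs(text):
    """Split text on one or more blank lines and drop empty chunks."""
    chunks = re.split(r"\n\s*\n", text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Made-up sample text standing in for the parser output.
sample = "First paragraph of the report.\n\nSecond paragraph.\n\n\nThird paragraph."
paragraphs = split_paragraphs(sample)
print(paragraphs)
```

Each resulting paragraph then becomes one context for question-answer generation.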


Now we need to write a prompt that generates a question and an answer for a given paragraph. Each prompt includes an instruction and a list of examples, each with a "context", a "question", and an "answer".

In [None]:
prompt_template = PromptTemplate(
    instruction="""Generate one question and its corresponding answer based on the last context in the last
    example. Follow the format of the examples below to include context, question, and answer in the response""",
    few_shot_prompt=[
        Context(
            context="In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.",
            question="Who published A Mathematical Theory of Communication in 1948?",
            answer="Claude E. Shannon.",
        ),
])
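
To picture how an instruction, few-shot examples, and a new paragraph can combine into a single prompt, here is one plausible serialization. This is a sketch under our own assumptions; `uniflow` may format the prompt differently, and `render_prompt` is a hypothetical helper.

```python
def render_prompt(instruction, examples, new_context):
    """Join instruction, worked examples, and the new context into one prompt string."""
    # Collapse the newlines and indentation left over from a triple-quoted string.
    parts = [" ".join(instruction.split())]
    for example in examples:
        parts.append(
            f"context: {example['context']}\n"
            f"question: {example['question']}\n"
            f"answer: {example['answer']}"
        )
    # The final context has no question or answer; the model is asked to supply them.
    parts.append(f"context: {new_context}")
    return "\n\n".join(parts)


instruction = """Generate one question and its corresponding answer based on the last context in the last
    example."""
examples = [
    {
        "context": "In 1948, Claude E. Shannon published A Mathematical Theory of Communication.",
        "question": "Who published A Mathematical Theory of Communication in 1948?",
        "answer": "Claude E. Shannon.",
    }
]
prompt = render_prompt(instruction, examples, "New paragraph text.")
print(prompt)
```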

### Use LLM to generate data

In this example, we will use the [OpenAIModelConfig](https://github.com/CambioML/uniflow/blob/main/uniflow/model/config.py#L17)'s default LLM to generate questions and answers.

In [None]:
transform_config = TransformOpenAIConfig()
transform_config.prompt_template = prompt_template

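If we want the response format to be JSON, we need to update two aspects of the default config:

1. Change the `model_name` to "gpt-4-1106-preview", which supports JSON output. At the time this notebook was written, it was the only GPT-4 model that did.
2. Change the `response_format` to a `json_object`.

The cell below also sets `num_call` to 1 and `temperature` to 0.0, which makes the output less random.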

In [None]:
transform_config.model_config.model_name = "gpt-4-1106-preview"
transform_config.model_config.response_format = {"type": "json_object"}
transform_config.model_config.num_call = 1
transform_config.model_config.temperature = 0.0

Finally, we update `num_thread` and `batch_size`. You'll want to tune this value to maximize throughput. Note that the two settings must be the same number.

In [None]:
from pprint import pprint

num_thread_batch_size = 32
transform_config.model_config.num_thread = num_thread_batch_size
transform_config.model_config.batch_size = num_thread_batch_size
pprint(transform_config)
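
One way to picture why the two values are kept equal: if the inputs are cut into batches and each batch is fanned out over a thread pool, a batch of the same size as the pool keeps every thread busy once per batch. The sketch below uses only the standard library and is not `uniflow`'s scheduler; `make_batches` and `process_in_batches` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size):
    """Cut a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_in_batches(items, worker, batch_size, num_thread):
    """Run worker over items batch by batch, using up to num_thread threads per batch."""
    results = []
    with ThreadPoolExecutor(max_workers=num_thread) as pool:
        for batch in make_batches(items, batch_size):
            # pool.map preserves input order within the batch.
            results.extend(pool.map(worker, batch))
    return results


squares = process_in_batches(list(range(10)), lambda x: x * x, batch_size=4, num_thread=4)
print(squares)
```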

Now we build a `MultiFlowsPipeline` from the two configs and call its `run` method to extract the text and generate question-answer pairs from the data defined above.

Note that the LLM sometimes fails to return valid JSON. When that happens, `uniflow` handles the failure and automatically retries to generate a new output.

In [None]:
p = MultiFlowsPipeline(PipelineConfig(
    extract_config=extract_config,
    transform_config=transform_config,
))
output = p.run(data)
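
The retry behavior mentioned above can be pictured with a generic parse-and-retry loop. This is a minimal sketch of the idea, not `uniflow`'s code; `generate_json` and the canned replies are hypothetical stand-ins for an LLM call.

```python
import json


def generate_json(generate, max_retries=3):
    """Call generate() until it returns a JSON object or the retries run out."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        raw = generate()
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            continue
        if isinstance(parsed, dict):
            return parsed, attempt
        last_error = ValueError("response is valid JSON but not an object")
    raise RuntimeError(f"no JSON object after {max_retries} attempts") from last_error


# Stand-in for an LLM call: the first reply is not JSON, the second one is.
replies = iter(["Sure! Here is the answer:", '{"question": "Q?", "answer": "A."}'])
result, attempts = generate_json(lambda: next(replies))
print(result, attempts)
```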

### Process the output

Let's take a look at the generated output. We need to do a little postprocessing on the raw output.

In [None]:
# Extracting context, question, and answer into a DataFrame
contexts = []
questions = []
answers = []

for item in output:
    for i in item.get('output', []):
        for response in i.get('response', []):
            if any(key not in response for key in ['context', 'question', 'answer']):
                print("[WARNING] Missing context, question or answer in response, skipping:\n", response)
                continue
            if "Claude E. Shannon" in response['context']:
                print("[WARNING] Used example context, skipping:\n", response["context"])
                continue
            if len(response['context']) < 50:
                continue
            contexts.append(response['context'])
            questions.append(response['question'])
            answers.append(response['answer'])

# Set display options
pd.set_option('display.max_colwidth', None)  # or use a specific width like 50
pd.set_option('display.width', 1000)

df = pd.DataFrame({
    'Context': contexts,
    'Question': questions,
    'Answer': answers
})

df.head(100)
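
The filtering rules in the loop above can also be factored into a small predicate, which makes them easy to check against hand-written samples before running the full pipeline. The nested shape of `sample_output` mirrors the loop above, but its contents are made up, and `keep_response` is a hypothetical helper.

```python
def keep_response(response, example_marker="Claude E. Shannon", min_context_len=50):
    """Return True if a response has all fields, is not the few-shot example, and is long enough."""
    required = ("context", "question", "answer")
    if any(key not in response for key in required):
        return False
    if example_marker in response["context"]:
        return False
    return len(response["context"]) >= min_context_len


# Made-up data in the same nested shape the loop above iterates over.
sample_output = [
    {
        "output": [
            {
                "response": [
                    {"context": "x" * 60, "question": "Q1?", "answer": "A1."},
                    {"context": "short", "question": "Q2?", "answer": "A2."},
                    {"context": "Claude E. Shannon " + "y" * 60, "question": "Q3?", "answer": "A3."},
                    {"question": "Q4?", "answer": "A4."},
                ]
            }
        ]
    }
]

kept = [
    response
    for item in sample_output
    for step in item.get("output", [])
    for response in step.get("response", [])
    if keep_response(response)
]
print([response["question"] for response in kept])
```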

Finally, we can save the `uniflow` output to a `.csv` file.

In [None]:
output_df = df[['Question', 'Answer']]

output_dir = 'data/output'

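# Note: this file name is hard-coded for Nike; rename it if you chose a different 10K above.
uniflow_output_path = f"{output_dir}/Nike_10k_QApairs.csv"

# Create the output directory if it does not already exist.
os.makedirs(output_dir, exist_ok=True)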

output_df.to_csv(uniflow_output_path, index=False)
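
Before fine-tuning, it can be worth reading the file back to confirm it has the expected columns and no empty rows. The sketch below uses the standard `csv` module on a temporary demo file; `load_qa_pairs` and the demo rows are hypothetical. In the notebook you would pass `uniflow_output_path` instead.

```python
import csv
import os
import tempfile


def load_qa_pairs(path):
    """Read a Question/Answer CSV and return only rows where both fields are non-empty."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = {"Question", "Answer"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return [
            row
            for row in reader
            if (row["Question"] or "").strip() and (row["Answer"] or "").strip()
        ]


# Write a small demo file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "qa_pairs.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Question", "Answer"])
        writer.writerow(["What is a 10K?", "An annual report filed with the SEC."])
        writer.writerow(["Empty answer?", ""])
    pairs = load_qa_pairs(demo_path)

print(len(pairs))
```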

#### Release GPU Memory
We'll need to use our GPU for future steps, so let's release the memory.

In [None]:
import torch
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("GPU memory has been released.")
else:
    print("No GPU devices found.")
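
Keep in mind that `torch.cuda.empty_cache()` only returns cached blocks that are no longer referenced by live tensors. If objects that hold GPU tensors are still in scope, deleting them and running the garbage collector first may free more memory. The helper below is a sketch under that assumption; `release_gpu_memory` is a hypothetical name, and it falls back gracefully when `torch` or a GPU is unavailable.

```python
import gc


def release_gpu_memory():
    """Collect garbage, then empty the CUDA cache. Return True if a GPU cache was emptied."""
    # Drop unreachable Python objects first so their tensors can be freed.
    gc.collect()
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()
    return True


released = release_gpu_memory()
print("GPU cache emptied." if released else "No GPU cache to empty.")
```

In this notebook, the pipeline object `p` may still hold model references, so running `del p` before calling the helper could make a difference.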


## 2. Running `pykoi` `SupervisedFinetuning` on the QA pairs

### Install helper packages
If you already have these installed, feel free to skip this step.

In [None]:
!{sys.executable} -m pip install peft

### Import Dependency

In [None]:
from pykoi.rlhf import RLHFConfig
from pykoi.rlhf import SupervisedFinetuning
from peft import LoraConfig, TaskType

### Set the parameters

In [None]:
base_model_path = "meta-llama/Llama-2-7b-chat-hf"
dataset_name = uniflow_output_path
peft_model_path = "./models/rlhf_step1_sft"
dataset_type = "local_csv"
learning_rate = 1e-3
weight_decay = 0.0
max_steps = 1600
per_device_train_batch_size = 1
per_device_eval_batch_size = 4
log_freq = 20
eval_freq = 2000
save_freq = 200
train_test_split_ratio = 0.0001
dataset_subset_sft_train = 999999999
size_valid_set = 0

r = 8
lora_alpha = 16
lora_dropout = 0.05
bias = "none"
task_type = TaskType.CAUSAL_LM
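
To get a sense of how small the trainable part is with these LoRA settings, here is a back-of-the-envelope count. LoRA adds two low-rank matrices to each adapted weight of shape `d_out x d_in`, for `r * (d_in + d_out)` extra parameters. The figures below are assumptions, not values read from `pykoi`: a hidden size of 4096 and 32 layers for Llama-2-7b, and the query and value projections as the only adapted matrices (PEFT's usual default for Llama when no `target_modules` is given).

```python
def lora_params_per_matrix(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed architecture values for Llama-2-7b (not taken from the notebook).
hidden_size = 4096
num_layers = 32
adapted_matrices_per_layer = 2  # assumed: q_proj and v_proj

rank = 8
alpha = 16

trainable = num_layers * adapted_matrices_per_layer * lora_params_per_matrix(
    hidden_size, hidden_size, rank
)
scaling = alpha / rank  # the adapter output is scaled by lora_alpha / r

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions the adapters add about 4.2 million trainable parameters, a small fraction of the base model's roughly 7 billion.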

In [None]:
lora_config = LoraConfig(
    r=r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    bias=bias,
    task_type=task_type,
)

# Build the training configuration for supervised finetuning
config = RLHFConfig(
    base_model_path=base_model_path,
    dataset_type=dataset_type,
    dataset_name=dataset_name,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    max_steps=max_steps,
    per_device_train_batch_size=per_device_train_batch_size,
    per_device_eval_batch_size=per_device_eval_batch_size,
    log_freq=log_freq,
    eval_freq=eval_freq,
    save_freq=save_freq,
    train_test_split_ratio=train_test_split_ratio,
    dataset_subset_sft_train=dataset_subset_sft_train,
    size_valid_set=size_valid_set,
    lora_config_rl=lora_config
    )
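
It is worth checking how the frequency settings interact with `max_steps`. The sketch below assumes that logging, saving, and evaluation each fire every `freq` steps, which is how such settings commonly behave; `pykoi`'s exact behavior may differ. `events_at` is a hypothetical helper.

```python
def events_at(freq, total_steps):
    """Steps at which a periodic event fires, assuming it fires every freq steps."""
    return list(range(freq, total_steps + 1, freq))


total_steps = 1600  # max_steps from the parameters above

log_steps = events_at(20, total_steps)     # log_freq
save_steps = events_at(200, total_steps)   # save_freq
eval_steps = events_at(2000, total_steps)  # eval_freq

print(len(log_steps), "log events")
print(len(save_steps), "checkpoints")
print(len(eval_steps), "evaluations")
```

Under this assumption, an `eval_freq` of 2000 is larger than `max_steps`, so no periodic evaluation would run during training. That is consistent with `size_valid_set = 0` and the tiny `train_test_split_ratio`.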

### Run `SupervisedFinetuning`

In [None]:
rlhf_step1_sft = SupervisedFinetuning(config)
rlhf_step1_sft.train_and_save(peft_model_path)

#### Release GPU Memory
We'll need to use our GPU for future steps, so let's release the memory.

In [None]:
import torch
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("GPU memory has been released.")
else:
    print("No GPU devices found.")


## 3. Running a `pykoi` `Chatbot` on the fine-tuned model

### Import pykoi components

In [None]:
from pykoi.application import Application
from pykoi.chat import ModelFactory
from pykoi.chat import QuestionAnswerDatabase
from pykoi.component import Chatbot, Dashboard

### Create the Model

In [None]:
model = ModelFactory.create_model(
    model_source="peft_huggingface",
    base_model_path=base_model_path,  # same base model used for fine-tuning
    lora_model_path=peft_model_path,  # LoRA adapter saved by train_and_save above
)

### Create the Chatbot with the model

In [None]:
database = QuestionAnswerDatabase(debug=True)
chatbot = Chatbot(model=model, feedback="vote")
dashboard = Dashboard(database=database)

### Run the Chatbot app!

#### Add `nest_asyncio`
Add `nest_asyncio` to avoid errors such as `asyncio.run() cannot be called from a running event loop`. We're launching another interface from inside a Jupyter notebook, where an asyncio event loop is already running. The `uvicorn.run()` function uses `asyncio.run()`, which isn't compatible with a running event loop, so without the patch we would hit this error.

In [None]:
# !pip install -q nest_asyncio
import nest_asyncio
nest_asyncio.apply()
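
To see the underlying error in isolation, the sketch below calls `asyncio.run()` from inside a coroutine that is already running on an event loop. It uses only the standard library and is meant to be run as a plain Python script without `nest_asyncio` applied. Inside this notebook, after `nest_asyncio.apply()`, the nested call would be allowed instead.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    """Try to start a second event loop while one is already running."""
    coro = inner()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        # The coroutine was never awaited, so close it to avoid a warning.
        coro.close()
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```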

In [None]:
app = Application(debug=False, share=False)
app.add_component(chatbot)
app.add_component(dashboard)
app.run()

## End of the notebook

Check out more use cases in the [example folder](../../examples/)!

<a href="https://www.cambioml.com/" title="Title">
    <img src="../image/cambioml_logo_large.png" style="height: 100px; display: block; margin-left: auto; margin-right: auto;"/>
</a>