# Example of generating QAs and running SFT for a 10K
In this example, we will show you how to use `uniflow` and `pykoi` to evaluate a 10K.

First, we'll use `uniflow` to generate question-answer pairs (QAs) from a PDF using OpenAI's models via `uniflow`'s `MultiFlowsPipeline`.

Next, we'll use `pykoi` to run supervised fine-tuning (SFT) on the QAs generated by `uniflow`.

Finally, we'll use `pykoi`'s Chatbot to run the SFT model, so you can ask questions about the 10K and get answers.

For this example, you can use a 10K from [Nike](https://investors.nike.com/investors/news-events-and-reports/), [Amazon](https://ir.aboutamazon.com/sec-filings/sec-filings-details/default.aspx?FilingId=16361618), or [Alphabet](https://abc.xyz/investor/sec-filings/annual-filings/2023/).

>*Note: In order to run this notebook, you need a GPU (for the `RLHF`).*

### Before running the code

You will need to set up a conda environment to run this notebook. You can set up the environment by following these [instructions](https://github.com/CambioML/cambio-recipes/tree/main#installation).

We are using uniflow and several of the pykoi modules, so you will need to install these in your environment as well:
```
pip3 install uniflow
pip3 install "pykoi[huggingface, rag, rlhf]"
```
Finally, you will need to install torch:
```
pip3 uninstall torch
pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121  # cu121 means cuda 12.1
```

Next, you will need a valid [OpenAI API key](https://platform.openai.com/api-keys) to run the code. Once you have the key, set it as the environment variable `OPENAI_API_KEY` within a `.env` file in the root directory of this repository. For more details, see these [instructions](https://github.com/CambioML/cambio-recipes/tree/main#api-keys).
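
A missing key only surfaces once the pipeline starts calling OpenAI, so it can help to fail fast. The following is a minimal sketch, assuming the key has already been loaded into the environment (for example by `load_dotenv()`). `require_env` is a hypothetical helper, not part of `uniflow` or `pykoi`.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file.")
    return value


# Usage (after load_dotenv() has run):
# api_key = require_env("OPENAI_API_KEY")
```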

## 1. Generate QAs from a 10K using `uniflow`

### Update System Path

In [1]:
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

### Install helper packages
If you already have these installed, feel free to skip this step.

In [2]:
!{sys.executable} -m pip install pandas nougat-ocr



### Import Dependency

In [3]:
from dotenv import load_dotenv
import os
import pandas as pd

from uniflow.pipeline import MultiFlowsPipeline
from uniflow.flow.config import PipelineConfig
from uniflow.flow.config import TransformOpenAIConfig, ExtractPDFConfig
from uniflow.flow.config import OpenAIModelConfig, NougatModelConfig
from uniflow.op.prompt_schema import GuidedPrompt, Context

load_dotenv()



True

### Prepare the input data
First, uncomment the 10K that you want to use.

In [4]:
pdf_file = "nike-10k-2023.pdf"
# pdf_file = "amazon-10k-2023.pdf"
# pdf_file = "alphabet-10k-2023.pdf"

##### Set current directory and input data directory.

In [5]:
dir_cur = os.getcwd()
input_file = os.path.join(dir_cur, "data", "raw_input", pdf_file)
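
Since extraction takes several minutes, it may be worth checking that the PDF is where the notebook expects it before running the pipeline. A small sketch using only the standard library; `check_input_pdf` is a hypothetical helper introduced here for illustration.

```python
from pathlib import Path


def check_input_pdf(path):
    """Raise early if the input PDF is missing, otherwise return it as a Path."""
    pdf_path = Path(path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"Expected a PDF at {pdf_path}")
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got {pdf_path.name}")
    return pdf_path


# Usage in the notebook:
# check_input_pdf(input_file)
```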

#### Load the PDF using Nougat
For this example, we'll run the `ExtractPDF` flow to extract the text from the 10K PDF. This uses the [Nougat](https://pypi.org/project/nougat-ocr/0.1.17/) PDF parser.

In [6]:
data = [
    {"pdf": input_file},
]

extract_config = ExtractPDFConfig(
    model_config=NougatModelConfig(
        model_name="0.1.0-small",
        batch_size=1,  # When batch_size > 1, nougat runs on CUDA; otherwise it runs on CPU
    )
)


Now we need to write a short prompt to generate a question and answer for a given paragraph. Each prompt includes an instruction and a list of examples, each with a "context", a "question", and an "answer".

In [7]:
guided_prompt = GuidedPrompt(
    examples=[
        Context(
            context="In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.",
            question="Who published A Mathematical Theory of Communication in 1948?",
            answer="Claude E. Shannon.",
        ),
])
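
To make the idea of a guided prompt concrete, the sketch below shows one way an instruction and few-shot examples could be serialized into a single prompt string. This is an illustration only: `render_prompt` and the instruction text are hypothetical, and `uniflow`'s actual template and serialization may differ.

```python
import json


def render_prompt(instruction, examples, context):
    """Build a few-shot prompt: instruction, worked examples, then the new context."""
    parts = [instruction]
    for example in examples:
        parts.append(json.dumps(example))
    # The final item has only a context; the model is expected to fill in the rest.
    parts.append(json.dumps({"context": context}))
    return "\n".join(parts)


examples = [
    {
        "context": "In 1948, Claude E. Shannon published A Mathematical Theory of Communication.",
        "question": "Who published A Mathematical Theory of Communication in 1948?",
        "answer": "Claude E. Shannon.",
    }
]

prompt = render_prompt(
    "Generate one question and answer pair from the context. Reply in JSON.",  # hypothetical instruction
    examples,
    "FOR THE FISCAL YEAR ENDED MAY 31, 2023",
)
print(prompt)
```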

### Use LLM to generate data

In this example, we will use the [OpenAIModelConfig](https://github.com/CambioML/uniflow/blob/main/uniflow/model/config.py#L17)'s default LLM to generate questions and answers.

Here, we pass our `guided_prompt` to the `TransformOpenAIConfig` to use our customized instructions and examples instead of the `uniflow` defaults.

We also want to get the response in the `json` format instead of the `text` default, so we set the `response_format` to `json_object`.

In [8]:
transform_config = TransformOpenAIConfig(
    guided_prompt_template=guided_prompt,
    model_config=OpenAIModelConfig(response_format={"type": "json_object"}),
)

Now we call the `run` method on the `MultiFlowsPipeline` object to run PDF extraction and question-answer generation on the data defined above.

Note that the LLM sometimes fails to return valid JSON. When that happens, `uniflow` handles the failure and automatically retries the generation.
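
The retry behavior described above can be pictured with a small sketch. This is not `uniflow`'s implementation; `generate_json` and the stand-in `generate` callable are hypothetical and only illustrate the general retry-until-valid-JSON pattern.

```python
import json


def generate_json(generate, prompt, max_retries=3):
    """Call `generate(prompt)` until it returns a JSON object or retries run out."""
    last_error = None
    for _ in range(max_retries):
        raw = generate(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            continue
        if isinstance(parsed, dict):
            return parsed
        last_error = ValueError("response was valid JSON but not an object")
    raise RuntimeError(f"No JSON object after {max_retries} attempts") from last_error


# Stand-in for a model call: fails once, then returns valid JSON.
replies = iter(["not json", '{"question": "Q?", "answer": "A."}'])
result = generate_json(lambda prompt: next(replies), "some context")
print(result)
```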

In [9]:
p = MultiFlowsPipeline(PipelineConfig(
    extract_config=extract_config,
    transform_config=transform_config,
))
output = p.run(data)

100%|██████████| 1/1 [08:31<00:00, 511.46s/it]
100%|██████████| 1099/1099 [1:10:58<00:00,  3.88s/it]  


### Process the output

Let's take a look at the generated output. We need to do a little post-processing on the raw output.

In [11]:
# Extracting context, question, and answer into a DataFrame
contexts = []
questions = []
answers = []

for item in output[0]:
    for i in item.get('output', []):
        for response in i.get('response', []):
            if any(key not in response for key in ['context', 'question', 'answer']):
                print("[WARNING] Missing context, question or answer in response, skipping:\n", response)
                continue
            if "Claude E. Shannon" in response['context']:
                print("[WARNING] Used example context, skipping:\n", response["context"])
                continue
            contexts.append(response['context'])
            questions.append(response['question'])
            answers.append(response['answer'])

# Set display options
pd.set_option('display.max_colwidth', None)  # or use a specific width like 50
pd.set_option('display.width', 1000)

df = pd.DataFrame({
    'Context': contexts,
    'Question': questions,
    'Answer': answers
})

df

| | Context | Question | Answer |
|---|---|---|---|
| 0 | . Jain, Phys. Rev. Lett. **78**, 1238 (19UNITED STATES | What is the title of the article published by Jain in Phys. Rev. Lett.? | The title of the article is not provided in the given context. |
| 1 | SECURITIES AND EXCHANGE COMMISSION | What is the role of the SEC? | The SEC oversees and regulates the securities industry, the nation's stock and options exchanges, and other electronic securities markets. |
| 2 | Washington, D.C. 20549 | What is the zip code for Washington, D.C.? | 20549. |
| 3 | FORM 10-K | What is the purpose of a FORM 10-K? | A FORM 10-K is a comprehensive report filed annually by publicly traded companies to provide a summary of their financial performance and regulatory compliance. |
| 4 | FOR THE FISCAL YEAR ENDED MAY 31, 2023 | What is the end date of the fiscal year mentioned? | May 31, 2023 |
| ... | ... | ... | ... |
| 913 | NICE, the Swochs Design, and Just Do It are registered trademarks of NICE, Inc. | What are the registered trademarks of NICE, Inc.? | NICE, the Swochs Design, and Just Do It. |
| 914 | NIKE, INC. | What is the name of the company? | NIKE, INC. |
| 915 | One Bowerman Drive | What is the address of One Bowerman Drive? | One Bowerman Drive |
| 916 | Bawerman, OR 97005-6453 | What is the zip code for Bawerman, OR? | 97005-6453 |
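
The filtering logic in the cell above can be tried out without running the pipeline. The sketch below repeats the same two checks on mock data; the mock structure is an assumption that simply mirrors the keys the loop reads (`output`, `response`, `context`, `question`, `answer`), and `collect_qa_rows` is a hypothetical helper.

```python
REQUIRED_KEYS = ("context", "question", "answer")


def collect_qa_rows(items, example_marker):
    """Flatten nested pipeline output into (context, question, answer) rows."""
    rows = []
    for item in items:
        for result in item.get("output", []):
            for response in result.get("response", []):
                if any(key not in response for key in REQUIRED_KEYS):
                    continue  # incomplete generation
                if example_marker in response["context"]:
                    continue  # the model echoed the few-shot example
                rows.append(tuple(response[key] for key in REQUIRED_KEYS))
    return rows


# Mock data shaped like the items the notebook iterates over in output[0].
mock_items = [
    {
        "output": [
            {
                "response": [
                    {"context": "FORM 10-K", "question": "Q1?", "answer": "A1"},
                    {"context": "FORM 10-K", "question": "Q2?"},
                    {"context": "Claude E. Shannon published a paper.", "question": "Q3?", "answer": "A3"},
                ]
            }
        ]
    }
]

rows = collect_qa_rows(mock_items, example_marker="Claude E. Shannon")
print(rows)
```

Only the first mock response survives: the second lacks an answer, and the third repeats the example context.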


Finally, we can save the `uniflow` output to a `.csv` file.

In [12]:
output_df = df[['Question', 'Answer']]

output_dir = 'data/output'

uniflow_output_path = f"{output_dir}/Nike_10k_QApairs.csv"

# Create the output directory if it does not exist yet
os.makedirs(output_dir, exist_ok=True)

output_df.to_csv(uniflow_output_path, index=False)
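
Before fine-tuning, it can be worth confirming that the saved file has the expected header and no empty cells. A minimal sketch using only the standard library; `validate_qa_csv` is a hypothetical helper, and the column names are the ones written by the cell above.

```python
import csv


def validate_qa_csv(path, columns=("Question", "Answer")):
    """Return the number of records, raising if the header or any cell is wrong."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if tuple(reader.fieldnames or ()) != tuple(columns):
            raise ValueError(f"Unexpected header: {reader.fieldnames}")
        count = 0
        for record_number, row in enumerate(reader, start=1):
            if any(not (row[col] or "").strip() for col in columns):
                raise ValueError(f"Empty cell in record {record_number}")
            count += 1
    return count


# Usage in the notebook:
# validate_qa_csv(uniflow_output_path)
```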

#### Release GPU Memory
We'll need to use our GPU for future steps, so let's release the memory.

In [13]:
import torch
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("GPU memory has been released.")
else:
    print("No GPU devices found.")


GPU memory has been released.
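
Note that `torch.cuda.empty_cache()` only returns cached blocks that are no longer in use; memory held by live tensors or models stays allocated. The sketch below assumes you no longer need the objects from the previous steps, so you can drop references to them first, run the garbage collector, and then empty the cache. `release_gpu_memory` is a hypothetical helper, and whether the pipeline object actually holds GPU memory is an assumption.

```python
import gc


def release_gpu_memory():
    """Collect garbage, then release cached CUDA memory if a GPU is present."""
    gc.collect()  # free Python objects that are no longer referenced
    try:
        import torch
    except ImportError:
        return False  # torch is not installed, nothing to release
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return True
    return False


# Usage in the notebook (assumption: the pipeline object may hold GPU memory):
# del p
# release_gpu_memory()
```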


## 2. Running `pykoi` `SupervisedFinetuning` on the QA pairs

### Install helper packages
If you already have these installed, feel free to skip this step.

In [14]:
!{sys.executable} -m pip install peft



### Import Dependency

In [15]:
from pykoi.rlhf import RLHFConfig
from pykoi.rlhf import SupervisedFinetuning
from peft import LoraConfig, TaskType



### Set the parameters

In [16]:
base_model_path = "meta-llama/Llama-2-7b-chat-hf"
dataset_name = uniflow_output_path
peft_model_path = "./models/rlhf_step1_sft"
dataset_type = "local_csv"
learning_rate = 1e-3
weight_decay = 0.0
max_steps = 1600
per_device_train_batch_size = 1
per_device_eval_batch_size = 4
log_freq = 20
eval_freq = 2000
save_freq = 200
train_test_split_ratio = 0.0001
dataset_subset_sft_train = 999999999
size_valid_set = 0

r = 8
lora_alpha = 16
lora_dropout = 0.05
bias = "none"
task_type = TaskType.CAUSAL_LM
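
To give a feel for what `r` and `lora_alpha` mean: LoRA replaces the update to a weight matrix of shape `d_out x d_in` with the product of two low-rank matrices, which adds `r * (d_in + d_out)` trainable parameters, and it scales that update by `lora_alpha / r`. The sketch below works through the numbers for a single 4096 x 4096 projection, matching the hidden size of Llama-2-7b. Which layers actually receive adapters depends on the `peft` and `pykoi` defaults, so treat this as per-matrix arithmetic rather than a total.

```python
def lora_stats(d_in, d_out, r, lora_alpha):
    """Per-matrix LoRA arithmetic: trainable parameters and update scaling."""
    full_params = d_in * d_out        # frozen weights in the original matrix
    lora_params = r * (d_in + d_out)  # A is (r x d_in), B is (d_out x r)
    return {
        "lora_params": lora_params,
        "fraction_of_full": lora_params / full_params,
        "scaling": lora_alpha / r,
    }


stats = lora_stats(d_in=4096, d_out=4096, r=8, lora_alpha=16)
print(stats)
```

With the values above, each adapted matrix gains 65,536 trainable parameters (about 0.4% of the original matrix), and the low-rank update is scaled by 2.0.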

In [17]:
lora_config = LoraConfig(
    r=r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    bias=bias,
    task_type=task_type,
    )


# run supervised finetuning
config = RLHFConfig(
    base_model_path=base_model_path,
    dataset_type=dataset_type,
    dataset_name=dataset_name,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    max_steps=max_steps,
    per_device_train_batch_size=per_device_train_batch_size,
    per_device_eval_batch_size=per_device_eval_batch_size,
    log_freq=log_freq,
    eval_freq=eval_freq,
    save_freq=save_freq,
    train_test_split_ratio=train_test_split_ratio,
    dataset_subset_sft_train=dataset_subset_sft_train,
    size_valid_set=size_valid_set,
    lora_config_rl=lora_config
    )
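
As a rough sanity check on `max_steps`, you can estimate how many passes over the data the run makes. The sketch assumes a single GPU and no gradient accumulation (neither is set explicitly above, so the `pykoi` defaults may differ), and it uses the 917 training examples reported in the training log for the Nike 10K.

```python
def estimated_epochs(max_steps, per_device_batch_size, num_train_examples,
                     num_devices=1, grad_accum_steps=1):
    """Approximate number of passes over the training set."""
    examples_seen = max_steps * per_device_batch_size * num_devices * grad_accum_steps
    return examples_seen / num_train_examples


epochs = estimated_epochs(max_steps=1600, per_device_batch_size=1, num_train_examples=917)
print(f"{epochs:.2f} epochs")
```

Under those assumptions the run covers the training set a little under twice.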

### Run the `SupervisedFinetuning`

In [18]:
rlhf_step1_sft = SupervisedFinetuning(config)
rlhf_step1_sft.train_and_save(peft_model_path)

Downloading data files: 100%|██████████| 1/1 [00:00<00:00, 10407.70it/s]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 1367.11it/s]
Generating train split: 918 examples [00:00, 110585.65 examples/s]


Size of the train set: 917.               Size of the validation set: 1


Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.03it/s]
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss,Validation Loss


...


#### Release GPU Memory
We'll need to use our GPU for future steps, so let's release the memory.

In [3]:
import torch
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("GPU memory has been released.")
else:
    print("No GPU devices found.")


GPU memory has been released.


## 3. Running a `pykoi` `Chatbot` on the fine-tuned model

### Import pykoi components

In [4]:
from pykoi.application import Application
from pykoi.chat import ModelFactory
from pykoi.chat import QuestionAnswerDatabase
from pykoi.component import Chatbot, Dashboard

### Create the Model

In [5]:
model = ModelFactory.create_model(
    model_source="peft_huggingface",
    base_model_path="meta-llama/Llama-2-7b-chat-hf",
    lora_model_path="./models/rlhf_step1_sft",  # same path that train_and_save wrote to in step 2
)



[HuggingfaceModel] loading base model...


Loading checkpoint shards: 100%|██████████| 2/2 [01:42<00:00, 51.00s/it]


[HuggingfaceModel] loading perf model...
[HuggingfaceModel] loading tokenizer...


### Create the Chatbot with the model

In [6]:
database = QuestionAnswerDatabase(debug=True)
chatbot = Chatbot(model=model, feedback="vote")
dashboard = Dashboard(database=database)

Table contents after creating table:
ID: 1, Question: Who is on Nike's board, Answer: Who is on Nike's board of directors?[Page 145]

            Answer: John J. Donahoe II, Matthew Friend, Johanna Nielsen, Mark G. Parker, Cathleen A. Benko, Timothy D. Cook, Thasunda B. Duckett, Mónica Gil, Alan B. Graf, Jr., Maria Henry, Peter B. Henry, Travis A. Knight, Michelle A., Vote Status: n/a, Timestamp: 2023-12-28 17:22:18.342838


### Run the Chatbot app!

#### Add `nest_asyncio` 
Apply `nest_asyncio` to avoid errors such as `asyncio.run() cannot be called from a running event loop`. The app is served with `uvicorn.run()`, which calls `asyncio.run()`, and that cannot be used while another event loop is running. Because a Jupyter notebook already runs an asyncio event loop, starting the app without `nest_asyncio` triggers this error.

In [7]:
# !pip install -q nest_asyncio
import nest_asyncio
nest_asyncio.apply()
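
The error itself is easy to reproduce outside of `pykoi`. The sketch below uses only the standard library and calls `asyncio.run()` from inside a coroutine, which is the same situation a notebook creates. It does not use `nest_asyncio`; it only shows the failure that `nest_asyncio.apply()` works around.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    coro = inner()
    try:
        # A loop is already running here, so this call is rejected.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```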

In [8]:
app = Application(debug=False, share=False)
app.add_component(chatbot)
app.add_component(dashboard)
app.run()

INFO:     Started server process [2254]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)


## End of the notebook

Check out more use cases in the [example folder](../../examples/)!

<a href="https://www.cambioml.com/" title="Title">
    <img src="../image/cambioml_logo_large.png" style="height: 100px; display: block; margin-left: auto; margin-right: auto;"/>
</a>