# Example of generating QAs for a 10K
In this example, we will show you how to generate question-answer pairs (QAs) from a PDF using OpenAI's models via `uniflow`'s [OpenAIJsonModelFlow](https://github.com/CambioML/uniflow/blob/main/uniflow/flow/model_flow.py#L125).

For this example, we're using a [10K from Nike](https://investors.nike.com/investors/news-events-and-reports/).

### Before running the code

You will need the `uniflow` conda environment to run this notebook. You can set up the environment by following the [installation instructions](https://github.com/CambioML/uniflow/tree/main#installation).

Next, you will need a valid [OpenAI API key](https://platform.openai.com/api-keys) to run the code. Once you have the key, set it as the environment variable `OPENAI_API_KEY` in a `.env` file in the root directory of this repository. For more details, see these [instructions](https://github.com/CambioML/uniflow/tree/main#api-keys).
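
The `.env` file is a plain list of `KEY=VALUE` lines that `load_dotenv()` (from `python-dotenv`) loads into the process environment. As a rough, stdlib-only illustration of that format, the hypothetical helper `parse_env_lines` below parses such lines and the last lines check whether `OPENAI_API_KEY` is visible. This is not `python-dotenv`'s parser, which handles more edge cases; it only shows what the notebook expects the file to contain.

```python
import os


def parse_env_lines(lines):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (hypothetical helper)."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one layer of matching single or double quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


example = ["# OpenAI credentials", 'OPENAI_API_KEY="sk-your-key-here"', ""]
env = parse_env_lines(example)
print(env)  # {'OPENAI_API_KEY': 'sk-your-key-here'}

# Once load_dotenv() has run in the notebook, the key should be set here.
if not os.environ.get("OPENAI_API_KEY"):
    print("OPENAI_API_KEY is not set yet")
```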

Finally, we store the Nike 10K in the `data/raw_input` directory as `nike-10k-2023.pdf`. You can download the file [here](https://s1.q4cdn.com/806093406/files/doc_downloads/2023/414759-1-_5_Nike-NPS-Combo_Form-10-K_WR.pdf).

### Update system path

```python
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")
```

### Install helper packages

```python
!{sys.executable} -m pip install -q python-dotenv openai
!{sys.executable} -m pip uninstall -y uniflow
```

### Import dependencies

```python
from dotenv import load_dotenv
from uniflow.flow.client import TransformClient
from uniflow.flow.config import TransformOpenAIConfig
from uniflow.op.model.model_config import OpenAIModelConfig
from uniflow.op.prompt import Context, PromptTemplate

# Load OPENAI_API_KEY from the .env file into the environment
load_dotenv()
```

### Prepare sample prompts

First, we provide sample prompts to guide the LLM. We do this by passing an instruction and a list of few-shot `Context` examples, each with a context, question, and answer, to the `PromptTemplate` class.

```python
guided_prompt = PromptTemplate(
    instruction="""Generate one question and its corresponding answer based on the last context in the last
    example. Follow the format of the examples below to include context, question, and answer in the response""",
    few_shot_prompt=[
        Context(
            context="In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.",
            question="Who published A Mathematical Theory of Communication in 1948?",
            answer="Claude E. Shannon.",
        ),
        Context(
            context="""The Compute & Networking segment is comprised of our Data Center accelerated computing platforms and end-to-end networking platforms including Quantum
for InfiniBand and Spectrum for Ethernet; our NVIDIA DRIVE automated-driving platform and automotive development agreements; """,
            question="What does the Compute & Networking segment include?",
            answer="""The Compute & Networking segment includes Data Center accelerated computing platforms, end-to-end networking platforms (Quantum for InfiniBand and Spectrum for Ethernet), the NVIDIA DRIVE automated-driving platform, and automotive development agreements.""",
        ),
    ],
)
```
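
The instruction asks the model to answer for "the last context in the last example", which suggests each input chunk is appended after the few-shot examples. A rough sketch of that assembly is below. `Example` and `render_few_shot` are hypothetical stand-ins, and the exact text layout `uniflow` sends to OpenAI may differ; the sketch only shows the general few-shot pattern of instruction, worked examples, then an open-ended target.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical stand-in for uniflow's Context: a context plus optional Q/A."""
    context: str
    question: str = ""
    answer: str = ""


def render_few_shot(instruction, examples, target):
    """Join an instruction, worked examples, and a target context into one prompt string."""
    parts = [instruction.strip()]
    for ex in examples:
        parts.append(f"context: {ex.context}\nquestion: {ex.question}\nanswer: {ex.answer}")
    # Leave the target open-ended so the model completes the question and answer.
    parts.append(f"context: {target.context}\nquestion:")
    return "\n\n".join(parts)


shots = [
    Example(
        context="In 1948, Claude E. Shannon published A Mathematical Theory of Communication.",
        question="Who published A Mathematical Theory of Communication in 1948?",
        answer="Claude E. Shannon.",
    )
]
prompt = render_few_shot(
    "Generate one question and its corresponding answer.",
    shots,
    Example(context="A new passage from the document."),
)
print(prompt)
```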

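Next, we read a plain-text file, split it into chunks of about 2,500 words, and wrap each chunk in a `Context` object so that `uniflow` can process it. The cell below reads `./data/raw_input/book-war-and-peace.txt`; point `file_path` at another plain-text file to use different data.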

### Use LLM to generate data

In this example, we will use the [OpenAIModelConfig](https://github.com/CambioML/uniflow/blob/main/uniflow/model/config.py#L17)'s default LLM to generate questions and answers.

Here, we pass our `guided_prompt` to `TransformOpenAIConfig` so that `uniflow` uses our customized instruction and examples instead of its default ones.

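We also set `response_format` in `OpenAIModelConfig`. This example uses `{"type": "text"}`, so each response comes back as plain text in the `question: ... answer: ...` layout shown in the output below; setting it to `{"type": "json_object"}` requests a JSON response instead.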

```python
config = TransformOpenAIConfig(
    prompt_template=guided_prompt,
    model_config=OpenAIModelConfig(response_format={"type": "text"}),
)
client = TransformClient(config)
```

```python
def read_and_chunk(file_path, words_per_chunk=2500):
    # Initialize variables
    contexts = []
    current_chunk_words = []

    # Open and read the file
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            # Split the line into words
            words = line.split()
            for word in words:
                current_chunk_words.append(word)
                # Check if the current chunk reached the specified number of words
                if len(current_chunk_words) >= words_per_chunk:
                    # Join the words to form a context and add to the list
                    contexts.append(Context(context=' '.join(current_chunk_words)))
                    current_chunk_words = []  # Reset for the next chunk

    # Add the last chunk if there are any remaining words
    if current_chunk_words:
        contexts.append(Context(context=' '.join(current_chunk_words)))

    return contexts

# Example usage
file_path = './data/raw_input/book-war-and-peace.txt'
contexts = read_and_chunk(file_path)
for context in contexts[:1]:  # Just printing the first Context for brevity
    print(f"---\nContext(context='{context.context[:50]}...')\n---")
```

```
---
Context(context='CHAPTER I "Well, Prince, so Genoa and Lucca are no...')
---
```
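
Because `read_and_chunk` splits every line on whitespace and emits a chunk each time it collects `words_per_chunk` words, its result is the same as splitting the whole file on whitespace and grouping consecutive words. The hypothetical `chunk_words` below is a compact, string-based version of that logic (returning plain strings rather than `Context` objects) that is easy to test on small inputs.

```python
def chunk_words(text, words_per_chunk=2500):
    """Split text into chunks of at most `words_per_chunk` whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i : i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


chunks = chunk_words("one two three four five", words_per_chunk=2)
print(chunks)  # ['one two', 'three four', 'five']
```

A file with N words yields ceil(N / `words_per_chunk`) chunks, so with the default of 2,500 words, `contexts[:15]` in the next step covers roughly the first 15 × 2,500 = 37,500 words of the text.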


Now we call the `run` method on the `client` object to generate question-answer pairs for the first 15 chunks.

```python
output = client.run(contexts[:15])
```



### Output

```python
from pprint import pprint

pprint(len(output))
for o in output:
    pprint(o['output'][0]['response'])
```

```
15
['question: Who is the young Princess Bolkonskaya married to?\n'
 'answer: The young Princess Bolkonskaya is married to Prince Bolkonski, and '
 'she is known as the most fascinating woman in Petersburg.']
['question: Who entered the drawing room as another visitor?\n'
 "answer: Prince Andrew Bolkonski, the little princess' husband."]
['question: Who was speaking French and stressing the last syllable of the '
 "general's name like a Frenchman?\n"
 'answer: Bolkonski.']
["question: Who was the lady's gown for a house dress as fresh and elegant as "
 'the other?\n'
 'answer: The princess.']
['question: Who was betting with Stevens, an English naval officer, that he '
 'would drink a bottle of rum sitting on the outer ledge of the third floor '
 'window with his legs hanging out?\n'
 'answer: Dolokhov was betting with Stevens.']
['question: Who are the three men known for their misadventures with a bear?\n'
 "answer: Anatole Kuragin, Prince Vasili's son, and a certain Dolokhov."]
...
```
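
Each response printed above appears to be a list holding one string in a `question: ...` / `answer: ...` layout. To turn these into structured pairs (for example, to save them as a dataset), a minimal sketch is to split each string on its `answer:` line. The helpers `parse_qa` and `extract_qa_pairs` are hypothetical, not part of `uniflow`, and `sample_output` is a hand-built stand-in that only mimics the `o['output'][0]['response']` access pattern used above; real `client.run` results may carry extra keys, and responses that do not follow the format make `parse_qa` return `None`.

```python
def parse_qa(text):
    """Split a response of the form 'question: ...' / 'answer: ...' into a dict."""
    question_part, sep, answer_part = text.partition("\nanswer:")
    if not sep:
        return None  # The model did not follow the expected layout.
    question = question_part.strip()
    if question.lower().startswith("question:"):
        question = question[len("question:"):].strip()
    return {"question": question, "answer": answer_part.strip()}


def extract_qa_pairs(output):
    """Collect parsed QA pairs from results shaped like o['output'][0]['response']."""
    pairs = []
    for item in output:
        for response in item["output"][0]["response"]:
            parsed = parse_qa(response)
            if parsed is not None:
                pairs.append(parsed)
    return pairs


# Hand-built stand-in for client.run() results (assumed shape, for illustration only).
sample_output = [
    {"output": [{"response": [
        "question: Who entered the drawing room as another visitor?\n"
        "answer: Prince Andrew Bolkonski, the little princess' husband."
    ]}]}
]
print(extract_qa_pairs(sample_output))
```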

## End of the notebook

Check out more `uniflow` use cases in the [example folder](https://github.com/CambioML/uniflow/tree/main/example/model#examples)!

<a href="https://www.cambioml.com/" title="Title">
    <img src="../image/cambioml_logo_large.png" style="height: 100px; display: block; margin-left: auto; margin-right: auto;"/>
</a>