# Notebook for Report Questions Flow via OpenAI
In this example, we show how to generate question-answer (QA) pairs from a PDF report using OpenAI's models via `uniflow`'s [OpenAIJsonModelFlow](https://github.com/CambioML/uniflow/blob/main/uniflow/flow/model_flow.py#L125), and then reuse the generated questions to query an incoming news article.


### Before running the code

You will need the `uniflow` conda environment to run this notebook. You can set up the environment by following the instructions at https://github.com/CambioML/uniflow/tree/main#installation.

Next, you will need a valid [OpenAI API key](https://platform.openai.com/api-keys) to run the code. Once you have the key, set it as the environment variable `OPENAI_API_KEY` in a `.env` file in the root directory of this repository. For more details, see this [instruction](https://github.com/CambioML/uniflow/tree/main#api-keys). The PDF extraction steps below also read a `CAMBIO_API_KEY` environment variable, which can be set in the same `.env` file.

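In this example, we use two PDF files from `example/transform/data/raw_input/PDD/`: an investment report (`Hayden.pdf`) and a news article (`PDD_news_1.pdf`). Both are converted to markdown before being sent to the model.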

### Update system path

In [28]:
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

### Import dependencies

In [29]:
from dotenv import load_dotenv

from uniflow.flow.flow_factory import FlowFactory
from uniflow.flow.client import TransformClient
from uniflow.flow.config import TransformOpenAIConfig
from uniflow.op.model.model_config import OpenAIModelConfig
from uniflow.op.prompt import Context, PromptTemplate

load_dotenv()


True
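`load_dotenv()` returning `True` only indicates that it found at least one variable to load; it does not confirm that both keys this notebook reads are present. A minimal sanity check, assuming the two key names used in this notebook (`OPENAI_API_KEY` for the LLM calls and `CAMBIO_API_KEY` for PDF extraction), could look like this. `missing_api_keys` is a hypothetical helper, not part of `uniflow`.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Keys read by this notebook: OpenAI for generation, Cambio for AnyParser / open-parser.
REQUIRED_KEYS = ("OPENAI_API_KEY", "CAMBIO_API_KEY")


def missing_api_keys(required: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required environment variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_api_keys(REQUIRED_KEYS)
if missing:
    print("Missing API keys:", ", ".join(missing))
```

Failing early here is cheaper than discovering a missing key halfway through an extraction or generation call.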

In [30]:
FlowFactory.list()

{'extract': ['ExtractHTMLFlow',
  'ExtractImageFlow',
  'ExtractIpynbFlow',
  'ExtractMarkdownFlow',
  'ExtractPDFFlow',
  'ExtractTxtFlow',
  'ExtractGmailFlow'],
 'transform': ['TransformAzureOpenAIFlow',
  'TransformComparisonGoogleFlow',
  'TransformComparisonOpenAIFlow',
  'TransformCopyFlow',
  'TransformGoogleFlow',
  'TransformGoogleMultiModalModelFlow',
  'TransformHuggingFaceFlow',
  'TransformLMQGFlow',
  'TransformOpenAIFlow',
  'TransformQuestionExtractionOpenAIFlow',
  'TransformNewsFeedOpenAIFlow'],
 'rater': ['RaterFlow']}

### Prepare the input data
The report PDF is first converted to markdown text with AnyParser.

In [31]:
import os

from any_parser import AnyParser

# AnyParser converts the PDF into markdown text; the result is a list of strings.
example_apikey = os.getenv("CAMBIO_API_KEY")
example_local_file = "data/raw_input/PDD/Hayden.pdf"

op = AnyParser(example_apikey)
content_result = op.extract(example_local_file)
content_result

['# Pinduoduo (Nasdaq: PDD)\n\nPinduoduo is the third largest ecommerce platform in China. We believe the stock is under-\nvalued, as PDD is now trading at ~10.4x EV/FCF (adjusted for stock-based comp) or ~6.4x EV\n/ FCF on 2025 estimates. This valuation seems far too cheap, for a company who is growing\ncurrent revenues at +65% y/y (as of Q3 2022), expected to grow top-line at ~24% y/y over the\nnext three years, and where we expect operating income to 2 - 3x over the same timeframe¹.\n\nIt seems the basis of this opportunity, lies in the investment community\'s broad aversion to\nChinese equities (especially internet companies), in addition to several company specific\nconcerns:\n\n1. The Recent Chinese Equity Sell-off is Due to Politics, Not Fundamentals: At the\nlowest point on Oct 24th, PDD\'s stock price was down by -34% in a single day, due to\nforeign capital fleeing the Chinese equity markets after the President Xi\'s "landslide"\nreelection victory and the appointment of his 
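The transform step later in this notebook passes only `content_result[0]` to the model, so anything in later elements of the list is never turned into questions. If each element is one page of markdown (an assumption; the output only shows that `content_result` is a list of strings), a sketch like the hypothetical `chunk_pages` below could pack all pages into size-bounded chunks on paragraph boundaries. The 4000-character default is an arbitrary placeholder, not a `uniflow` or OpenAI limit.

```python
from typing import List


def chunk_pages(pages: List[str], max_chars: int = 4000) -> List[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own chunk instead of being split.
    """
    chunks: List[str] = []
    current = ""
    for page in pages:
        for paragraph in page.split("\n\n"):
            paragraph = paragraph.strip()
            if not paragraph:
                continue
            candidate = f"{current}\n\n{paragraph}" if current else paragraph
            if len(candidate) <= max_chars:
                current = candidate  # paragraph still fits in the open chunk
            else:
                if current:
                    chunks.append(current)  # close the full chunk
                current = paragraph  # start a new chunk with this paragraph
    if current:
        chunks.append(current)
    return chunks
```

Mirroring the `Context(Context=...)` call used later, `input_data = [Context(Context=chunk) for chunk in chunk_pages(content_result)]` should then give the flow one input per chunk instead of only the first element.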

In [52]:
# TODO: Fix later, change the Open-parser library to Any-parser
from uniflow.flow.client import ExtractClient
from uniflow.flow.config import ExtractPDFConfig
from uniflow.op.model.model_config import OpenParserModelConfig

input_file = "data/raw_input/PDD/Hayden.pdf"

data = [
    {"filename": input_file},
]

# Alternative extraction path: uniflow's ExtractClient backed by the open-parser model.
config = ExtractPDFConfig(
    model_config=OpenParserModelConfig(
        model_name="CambioML/open-parser",
        api_key=os.getenv("CAMBIO_API_KEY"),
    ),
)
openparser_client = ExtractClient(config)

output = openparser_client.run(data)
# content_result = output[0]['output'][0]['text']
# content_result

  0%|          | 0/1 [00:00<?, ?it/s]

Upload response: 204


100%|██████████| 1/1 [00:23<00:00, 23.16s/it]

Extraction success.





['# HAYDEN CAPITAL',
 '1345 AVENUE OF THE AMERICAS',
 '33RD FLOOR',
 'NEW YORK, NY 10105',
 'HAYDENCAP ITAL.CO',
 '## Pinduoduo (Nasdaq: PDD)',
 'Pinduoduo is the third largest ecommerce platform in China. We believe the stock is under-',
 'valued, as PDD is now trading at ~10.4x EV/FCF (adjusted for stock-based comp) or ~6.4x EV',
 '/ FCF on 2025 estimates. This valuation seems far too cheap, for a company who is growing',
 'current revenues at +65% y/y (as of Q3 2022), expected to grow top-line at ~24% y/y over the',
 'next three years, and where we expect operating income to 2 - 3x over the same timeframe¹.',
 "It seems the basis of this opportunity, lies in the investment community's broad aversion to",
 'Chinese equities (especially internet companies), in addition to several company specific',
 'concerns:',
 '1. The Recent Chinese Equity Sell-off is Due to Politics, Not Fundamentals: At the',
 "lowest point on Oct 24th, PDD's stock price was down by -34% in a single day, due to",
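The open-parser output shown above is a list of wrapped lines rather than paragraphs (for example, `under-` and `valued` land on separate lines). Assuming the extraction result has that shape, a small heuristic such as the hypothetical `join_lines` below could stitch the lines back into markdown paragraphs before they are used as model context. It treats markdown headings and numbered items (`1. ...`) as block starts, and it is not part of `uniflow` or AnyParser.

```python
import re
from typing import List

# Markdown headings ("# ...", "## ...") and numbered list items ("1. ...") start a new block.
_HEADING = re.compile(r"^#{1,6}\s")
_LIST_ITEM = re.compile(r"^\d+\.\s")


def join_lines(lines: List[str]) -> str:
    """Rejoin line-wrapped extraction output into blank-line-separated markdown blocks."""
    blocks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip empty lines; block boundaries come from the rules below
        starts_block = bool(_HEADING.match(line) or _LIST_ITEM.match(line))
        after_heading = bool(blocks) and bool(_HEADING.match(blocks[-1]))
        if not blocks or starts_block or after_heading:
            blocks.append(line)
        elif blocks[-1].endswith("-"):
            # Join a hyphen-wrapped word without a space: "under-" + "valued" -> "under-valued".
            blocks[-1] += line
        else:
            blocks[-1] += " " + line
    return "\n\n".join(blocks)
```

Headings always stand alone, so short consecutive lines such as the address block end up joined into a single paragraph; the block-start patterns can be extended if that matters for your documents.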

### Use LLM to generate question/answer pairs for given reports

In this example, we will use the [OpenAIModelConfig](https://github.com/CambioML/uniflow/blob/main/uniflow/model/config.py#L17)'s default LLM to generate questions and answers.

In [32]:
# initialize questions bank
questions_bank = []

In [33]:
input_data = [Context(Context=content_result[0])]
config = TransformOpenAIConfig(
    flow_name="TransformQuestionExtractionOpenAIFlow",
    model_config=OpenAIModelConfig(),
)
client = TransformClient(config)

Now we call the `run` method on the `client` object to execute the question-answer generation operation on the data shown above.

In [34]:
output = client.run(input_data)

100%|██████████| 1/1 [00:23<00:00, 23.40s/it]


### Process the output

Let's take a look at the generated output. The raw output needs a little postprocessing to extract the questions.

In [35]:
# Raw output for the PDD report (Hayden.pdf)
output

[{'output': [{'response': ['question: What is the ticker symbol for Pinduoduo?\nanswer: The ticker symbol for Pinduoduo is PDD.'],
    'error': 'No errors.'},
   {'response': ['question: What are the current trading multiples for Pinduoduo, and how do they compare to 2025 estimates?\nanswer: Pinduoduo is currently trading at around 10.4x EV/FCF, adjusted for stock-based compensation, or around 6.4x EV/FCF based on 2025 estimates. These multiples suggest that the stock is undervalued.'],
    'error': 'No errors.'},
   {'response': ['question: What is the basis of the current investment opportunity in Chinese equities?\nanswer: The investment opportunity arises from the broad aversion to Chinese equities, particularly internet companies, as well as specific concerns related to individual companies.'],
    'error': 'No errors.'},
   {'response': ["question: What is the main cause of the recent Chinese equity sell-off?\nanswer: The recent Chinese equity sell-off is primarily due to foreign

In [36]:
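# Extract the question text from each "question: ...\nanswer: ..." response.
# Use a separate loop variable so the `output` list is not overwritten.
for item in output:
    for result in item['output']:
        for response in result['response']:
            if response.startswith('question:'):
                question = response.split('\n')[0].replace('question:', '').strip()
                questions_bank.append(question)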

questions_bank

['What is the ticker symbol for Pinduoduo?',
 'What are the current trading multiples for Pinduoduo, and how do they compare to 2025 estimates?',
 'What is the basis of the current investment opportunity in Chinese equities?',
 'What is the main cause of the recent Chinese equity sell-off?',
 'How has the political stability of China been communicated in advance, and how does it affect the Chinese economy?',
 'What events have occurred since October 24th that could impact the stock?',
 "How has Pinduoduo's lack of communication with investors affected its relationship with the investment community?",
 "What challenges does Pinduoduo's financial reporting pose to investors looking to analyze the success of its grocery business and international business segment?",
 "What factors will impact the magnitude of potential profitability for Pinduoduo's grocery business and investment into Temu?",
 'What is the name of the entity mentioned in the context?']
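The loop above reads only the first line of each response, so a response that packs several pairs into one string (for example, the saved `PDD_Hayden` output near the end of this notebook, where one response holds four pairs) contributes only its first question. A sketch of a more tolerant parser follows, assuming responses keep the `question:` / `answer:` labels seen in the printed output. `parse_qa_pairs` is a hypothetical helper, not a `uniflow` API.

```python
import re
from typing import Dict, List, Optional

# A "question:" or "answer:" label at the start of a line (case-insensitive).
_LABEL = re.compile(r"^(question|answer):\s*", re.IGNORECASE)


def parse_qa_pairs(results: List[dict]) -> List[Dict[str, Optional[str]]]:
    """Collect every question/answer pair from client output.

    Expects items shaped like {'output': [{'response': [str, ...], ...}, ...]},
    as printed in this notebook; items without an 'output' key are skipped.
    """
    pairs: List[Dict[str, Optional[str]]] = []
    for item in results:
        for result in item.get("output", []):
            for response in result.get("response", []):
                current: Optional[Dict[str, Optional[str]]] = None
                for line in response.splitlines():
                    stripped = line.strip()
                    if not stripped:
                        continue  # blank lines separate pairs
                    match = _LABEL.match(stripped)
                    if match is None:
                        # Unlabeled text continues a multi-line answer, if one is open.
                        if current is not None and current["answer"] is not None:
                            current["answer"] += " " + stripped
                        continue
                    text = stripped[match.end():]
                    if match.group(1).lower() == "question":
                        current = {"question": text, "answer": None}
                        pairs.append(current)
                    elif current is not None and current["answer"] is None:
                        current["answer"] = text
                    # An "answer:" with no preceding question (e.g. "answer: N/A") is ignored.
    return pairs
```

With this helper, the bank could be rebuilt as `questions_bank = [p["question"] for p in parse_qa_pairs(output)]`, and the answers stay available alongside the questions.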

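### Process an incoming news feed

Next, we extract a news article with AnyParser and ask the model to answer the questions in the bank, using the article as context.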

In [37]:
example_news = "data/raw_input/PDD/PDD_news_1.pdf"
news_content = op.extract(example_news)
news_content

['# PDD\'s Temu shopping app targeted in EU consumer group\'s complaint to EU tech regulator | South China Morning Post\n\n## South China Morning Post\n\n![Temu logo](https://www.scmp.com/sites/default/files/styles/1200x800/public/2023/05/21/d00eb5c1-2a39-45d9-bdd9-c8a44a88a1fc.png?itok=5OQ7HuMM)\n\nTEMU\n\nWe use cookies to tailor your experience\nand present relevant ads. By clicking\n"Accept", you agree that cookies can be\nplaced per our Privacy Policy.\n\n**Tech / Policy**\n\n[https://www.scmp.com/tech/policy/article/3262916/pdds-temu-shopping-app-targeted-eu-consumer-groups-complaint-eu-tech-regulator](https://www.scmp.com/tech/policy/article/3262916/pdds-temu-shopping-app-targeted-eu-consumer-groups-complaint-eu-tech-regulator)\n',
 "# PDD's Temu shopping app targeted in EU consumer group's complaint to EU tech regulator\n\n## PDD's Temu shopping app targeted in EU consumer group's complaint to EU tech regulator\n\n- Pan-European consumers organisation BEUC says Temu, which laun

In [38]:
questions_bank

['What is the ticker symbol for Pinduoduo?',
 'What are the current trading multiples for Pinduoduo, and how do they compare to 2025 estimates?',
 'What is the basis of the current investment opportunity in Chinese equities?',
 'What is the main cause of the recent Chinese equity sell-off?',
 'How has the political stability of China been communicated in advance, and how does it affect the Chinese economy?',
 'What events have occurred since October 24th that could impact the stock?',
 "How has Pinduoduo's lack of communication with investors affected its relationship with the investment community?",
 "What challenges does Pinduoduo's financial reporting pose to investors looking to analyze the success of its grocery business and international business segment?",
 "What factors will impact the magnitude of potential profitability for Pinduoduo's grocery business and investment into Temu?",
 'What is the name of the entity mentioned in the context?']
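Because `questions_bank` is initialized once and the extraction loop appends to it, re-running that cell adds duplicate questions to the instruction built below. A hedged sketch of a helper that removes near-duplicates (ignoring case and extra whitespace) while keeping first-seen order; `build_question_instruction` is hypothetical and not part of `uniflow`.

```python
from typing import Iterable, List


def build_question_instruction(questions: Iterable[str]) -> str:
    """Deduplicate questions (keeping first-seen order) and join them one per line."""
    seen = set()
    unique: List[str] = []
    for question in questions:
        key = " ".join(question.split()).lower()  # normalize whitespace and case for comparison
        if key and key not in seen:
            seen.add(key)
            unique.append(question.strip())
    return "\n".join(unique)
```

Its result can be passed where the next cell uses `'\n'.join(questions_bank)`, i.e. `PromptTemplate(instruction=build_question_instruction(questions_bank))`.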

In [47]:
input_data = [Context(Context=news_content[0])]
config = TransformOpenAIConfig(
    flow_name="TransformNewsFeedOpenAIFlow",
    prompt_template=PromptTemplate(instruction='\n'.join(questions_bank)),
    model_config=OpenAIModelConfig(),
)
client = TransformClient(config)

DEBUG question bank in session:  ['What is the ticker symbol for Pinduoduo?', 'What are the current trading multiples for Pinduoduo, and how do they compare to 2025 estimates?', 'What is the basis of the current investment opportunity in Chinese equities?', 'What is the main cause of the recent Chinese equity sell-off?', 'How has the political stability of China been communicated in advance, and how does it affect the Chinese economy?', 'What events have occurred since October 24th that could impact the stock?', "How has Pinduoduo's lack of communication with investors affected its relationship with the investment community?", "What challenges does Pinduoduo's financial reporting pose to investors looking to analyze the success of its grocery business and international business segment?", "What factors will impact the magnitude of potential profitability for Pinduoduo's grocery business and investment into Temu?", 'What is the name of the entity mentioned in the context?']


In [48]:
output = client.run(input_data)
output

  0%|          | 0/1 [00:00<?, ?it/s]

DEBUG:  {'response': ['question: What is the ticker symbol for Pinduoduo?\nanswer: PDD'], 'error': 'No errors.'}
DEBUG:  {'response': ['question: What are the current trading multiples for Pinduoduo, and how do they compare to 2025 estimates?\nanswer: N/A'], 'error': 'No errors.'}
DEBUG:  {'response': ['answer: N/A'], 'error': 'No errors.'}
DEBUG:  {'response': ['answer: N/A'], 'error': 'No errors.'}
DEBUG:  {'response': ['answer: N/A'], 'error': 'No errors.'}
DEBUG:  {'response': ['answer: N/A'], 'error': 'No errors.'}
DEBUG:  {'response': ["question: How has Pinduoduo's lack of communication with investors affected its relationship with the investment community?\nanswer: Pinduoduo's lack of communication with investors has likely eroded trust and confidence in the company, leading to a strained relationship with the investment community. Investors may feel uncertain about the company's direction and performance, impacting their willingness to invest or maintain their current holdings

100%|██████████| 1/1 [00:06<00:00,  6.14s/it]

DEBUG:  {'response': ['answer: PDD (Pinduoduo)'], 'error': 'No errors.'}





[{'error': "'list' object has no attribute 'value_dict'",
  'traceback': 'Traceback (most recent call last):\n  File "c:\\Users\\Pumpkinfries\\Desktop\\Cambio\\uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering\\example\\transform\\../..\\uniflow\\flow\\server.py", line 165, in _run_flow\n    output = f(input_list)\n  File "c:\\Users\\Pumpkinfries\\Desktop\\Cambio\\uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering\\example\\transform\\../..\\uniflow\\flow\\flow.py", line 37, in __call__\n    output_dict = self._exit(nodes)\n  File "c:\\Users\\Pumpkinfries\\Desktop\\Cambio\\uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering\\example\\transform\\../..\\uniflow\\flow\\flow.py", line 83, in _exit\n    constants.OUTPUT_NAME: [copy.deepcopy(node.value_dict) for node in nodes],\n  File "c:\\Users\\Pumpkinfries\\Desktop\\Cambio\\uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering\\example\\transform\\../..\\uniflow\\flow\\flow.py", line 83, in <l
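The news-feed run above returned an error entry (with `error` and `traceback` keys) instead of an `output` list, and the extraction loop used earlier indexes `item['output']` directly, so it would raise `KeyError` on such an entry. A minimal defensive sketch, assuming results are shaped like the ones printed in this notebook, separates successes from failures before postprocessing. It does not fix the underlying `value_dict` error inside the flow, and `split_results` is a hypothetical helper.

```python
from typing import List, Tuple


def split_results(results: List[dict]) -> Tuple[List[dict], List[str]]:
    """Separate successful flow results from failed ones.

    Assumes each item is either {'output': [...], ...} or an error entry such as
    {'error': "...", 'traceback': "..."}, matching the outputs shown in this notebook.
    """
    ok: List[dict] = []
    errors: List[str] = []
    for item in results:
        if "output" in item:
            ok.append(item)
        else:
            errors.append(item.get("error", "unknown error"))
    return ok, errors
```

For example, `ok, errors = split_results(output)` followed by printing `errors` surfaces the failure, while only `ok` is passed on to question extraction.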

In [35]:
# Build a fresh question bank from the output

# Extract the question text from each response, using a separate loop
# variable so the `output` list is not overwritten.
questions_bank = []
for item in output:
    for result in item['output']:
        for response in result['response']:
            if response.startswith('question:'):
                question = response.split('\n')[0].replace('question:', '').strip()
                questions_bank.append(question)

questions_bank

['What is the stock ticker symbol for Pinduoduo?',
 'What is the current valuation of Pinduoduo and how does it compare to estimated future performance?',
 "What are the key concerns contributing to the investment community's aversion to Chinese equities, particularly internet companies?",
 "What recent event led to a significant drop in Pinduoduo's stock price on Oct 24th?",
 "How has the anticipation of Xi Jinping's reelection impacted the political landscape in China?",
 'What are some examples of positive news that contributed to stock prices after October 24th?',
 'What issues does Pinduoduo have in terms of communication with investors?',
 'What factors will influence the magnitude of investment spend into Duoduo grocery and Temu, and their respective profitability?',
 'What is the name of the financial analyst firm mentioned in the context?']

In [25]:
# PDD_Hayden
output

[{'output': [{'response': ['question: What is the stock symbol for Pinduoduo?\nanswer: The stock symbol for Pinduoduo is PDD.'],
    'error': 'No errors.'},
   {'response': ["question: What is the current valuation of Pinduoduo's stock based on EV/FCF?\nanswer: Pinduoduo's stock is currently trading at ~10.4x EV/FCF, adjusted for stock-based comp, or ~6.4x EV/FCF on 2025 estimates.\n\nquestion: What is the revenue growth rate of Pinduoduo as of Q3 2022?\nanswer: Pinduoduo's current revenues are growing at +65% year-over-year as of Q3 2022.\n\nquestion: What is the expected top-line growth rate for Pinduoduo over the next three years?\nanswer: Pinduoduo is expected to grow its top-line at ~24% year-over-year over the next three years.\n\nquestion: What is the expected growth in operating income for Pinduoduo over the next three years?\nanswer: Operating income for Pinduoduo is expected to 2 - 3x over the next three years."],
    'error': 'No errors.'},
   {'response': ["question: What i

In [20]:
# PDD_H3_AP
output

[{'output': [{'response': ['question: What is the name of the company under the stock symbol PDD US?\nanswer: PDD Holdings.'],
    'error': 'No errors.'},
   {'response': ['question: What strategy boosted revenue and earnings growth?\nanswer: Consumer wallet share gain strategy.'],
    'error': 'No errors.'},
   {'response': ["question: What were PDD Holdings' 2Q23 revenue and non-GAAP net income results?\nanswer: 2Q23 revenue was up 66.3% YoY to RMB52.3bn, and non-GAAP net income increased by 42% YoY to RMB15.3bn.\n\nquestion: What factors contributed to the strong revenue and net income results for PDD Holdings in 2Q23?\nanswer: The strong results were attributed to robust GMV growth, continuous increase in monetization, and operating leverage, as well as sales and marketing spending geared at generating higher GMV from key consumer wallet-share gain categories.\n\nquestion: What is the analyst's recommendation for PDD Holdings after the 2Q23 results?\nanswer: The analyst maintains a

In [14]:
# PDD_US
output

[{'output': [{'response': ['question: What is the name of the research group?\nanswer: DBS Group Research.'],
    'error': 'No errors.'},
   {'response': ['question: What is the focus of the research?\nanswer: The focus of the research is on United States equity.'],
    'error': 'No errors.'},
   {'response': ['question: What significant event happened on 22 Mar 2024?\nanswer: There is not enough context to determine the significance of this date for stock analysis.'],
    'error': 'No errors.'},
   {'response': ['question: What is the name of the company associated with "## PDD Holdings"?\nanswer: The name of the company associated with "## PDD Holdings" is Pinduoduo Inc.'],
    'error': 'No errors.'},
   {'response': ['question: What is the title of the article published by Claude E. Shannon in 1948 that established the theory of information?\nanswer: A Mathematical Theory of Communication.\nquestion: What concept did Claude E. Shannon introduce for the first time in his article?\nan

## End of the notebook

Check out more Uniflow use cases in the [example folder](https://github.com/CambioML/uniflow/tree/main/example/model#examples)!

<a href="https://www.cambioml.com/" title="Title">
    <img src="../image/cambioml_logo_large.png" style="height: 100px; display: block; margin-left: auto; margin-right: auto;"/>
</a>