# Notebook for Report Questions Flow via OpenAI
In this example, we will show you how to generate question-answer pairs (QAs) from a PDF using OpenAI's models via `uniflow`'s [OpenAIJsonModelFlow](https://github.com/CambioML/uniflow/blob/main/uniflow/flow/model_flow.py#L125).


### Before running the code

You will need the `uniflow` conda environment to run this notebook. You can set it up by following the installation instructions: https://github.com/CambioML/uniflow/tree/main#installation.

Next, you will need a valid [OpenAI API key](https://platform.openai.com/api-keys) to run the code. Once you have the key, set it as the environment variable `OPENAI_API_KEY` within a `.env` file in the root directory of this repository. For more details, see these [instructions](https://github.com/CambioML/uniflow/tree/main#api-keys).
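
Before running any flow, it can help to confirm that the key was actually picked up. The following is a minimal sketch and not part of `uniflow`: the helper name `missing_keys` is hypothetical, and it only inspects a mapping such as `os.environ`, so it should be called after `load_dotenv()`.

```python
import os
from typing import Iterable, List, Mapping


def missing_keys(env: Mapping[str, str], required: Iterable[str]) -> List[str]:
    """Return the names in `required` that are absent or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


# Hypothetical usage: run this after load_dotenv() so values from .env are visible.
missing = missing_keys(os.environ, ["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

The same check could be extended to `CAMBIO_API_KEY`, which the parsing cells later in this notebook read.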

In this example, we'll be using PDF reports and news articles stored under `data/raw_input/PDD/`.

### Update system path

In [28]:
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

### Import Dependency

In [158]:
from dotenv import load_dotenv

from uniflow.flow.flow_factory import FlowFactory
from uniflow.flow.client import TransformClient
from uniflow.flow.config import TransformOpenAIConfig
from uniflow.op.model.model_config import OpenAIModelConfig
from uniflow.op.prompt import Context, PromptTemplate

load_dotenv()


True

In [159]:
FlowFactory.list()

{'extract': ['ExtractHTMLFlow',
  'ExtractImageFlow',
  'ExtractIpynbFlow',
  'ExtractMarkdownFlow',
  'ExtractPDFFlow',
  'ExtractTxtFlow',
  'ExtractGmailFlow'],
 'transform': ['TransformAzureOpenAIFlow',
  'TransformComparisonGoogleFlow',
  'TransformComparisonOpenAIFlow',
  'TransformCopyFlow',
  'TransformGoogleFlow',
  'TransformGoogleMultiModalModelFlow',
  'TransformHuggingFaceFlow',
  'TransformLMQGFlow',
  'TransformOpenAIFlow',
  'TransformQuestionExtractionOpenAIFlow',
  'TransformNewsFeedOpenAIFlow',
  'TransformReportGenerationOpenAIFlow'],
 'rater': ['RaterFlow']}
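
The flow names used later in this notebook are passed as plain strings, so a typo only surfaces when the client is built. As a hedged sketch, the check below works on a dictionary shaped like the output above; `is_registered` is a hypothetical helper, and the `registry` literal is a trimmed copy of that output rather than a live call to `FlowFactory`.

```python
from typing import Dict, List


def is_registered(registry: Dict[str, List[str]], kind: str, flow_name: str) -> bool:
    """Check whether `flow_name` is listed under `kind` in a registry dict."""
    return flow_name in registry.get(kind, [])


# Trimmed copy of the dictionary printed above (assumption: same shape as FlowFactory.list()).
registry = {
    "transform": [
        "TransformQuestionExtractionOpenAIFlow",
        "TransformNewsFeedOpenAIFlow",
        "TransformReportGenerationOpenAIFlow",
    ],
    "rater": ["RaterFlow"],
}

for name in registry["transform"]:
    print(name, is_registered(registry, "transform", name))
```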

### Prepare the input data
The PDF is parsed into markdown text with `AnyParser`.

In [156]:
import os

from any_parser import AnyParser

example_apikey = os.getenv("CAMBIO_API_KEY")
example_local_file = "data/raw_input/PDD/Hayden.pdf"

op = AnyParser(example_apikey)
content_result = op.extract(example_local_file)
content_result

['# Pinduoduo (Nasdaq: PDD)\n\nPinduoduo is the third largest ecommerce platform in China. We believe the stock is under-\nvalued, as PDD is now trading at ~10.4x EV/FCF (adjusted for stock-based comp) or ~6.4x EV\n/ FCF on 2025 estimates. This valuation seems far too cheap, for a company who is growing\ncurrent revenues at +65% y/y (as of Q3 2022), expected to grow top-line at ~24% y/y over the\nnext three years, and where we expect operating income to 2 - 3x over the same timeframe¹.\n\nIt seems the basis of this opportunity, lies in the investment community\'s broad aversion to\nChinese equities (especially internet companies), in addition to several company specific\nconcerns:\n\n1. The Recent Chinese Equity Sell-off is Due to Politics, Not Fundamentals: At the\nlowest point on Oct 24th, PDD\'s stock price was down by -34% in a single day, due to\nforeign capital fleeing the Chinese equity markets after the President Xi\'s "landslide"\nreelection victory and the appointment of his 

In [157]:
# TODO: Fix later, change Open-parser library to Any-parser
# TODO: need to fix pages
from uniflow.flow.client import ExtractClient
from uniflow.flow.config import ExtractPDFConfig
from uniflow.op.model.model_config import OpenParserModelConfig

input_file = "data/raw_input/PDD/Hayden.pdf"

data = [
    {"filename": input_file},
]

config = ExtractPDFConfig(
    model_config=OpenParserModelConfig(
        model_name = "CambioML/open-parser",
        api_key = os.getenv("CAMBIO_API_KEY"),
    ),
)
openparser_client = ExtractClient(config)

output = openparser_client.run(data)
# content_result = output[0]['output'][0]['text']
# content_result

  0%|          | 0/1 [00:00<?, ?it/s]

Upload response: 204


100%|██████████| 1/1 [00:23<00:00, 23.54s/it]

Extraction success.





### Use LLM to generate question/answer pairs for given reports

In this example, we will use the [OpenAIModelConfig](https://github.com/CambioML/uniflow/blob/main/uniflow/model/config.py#L17)'s default LLM to generate questions and answers.

In [32]:
# initialize questions bank
questions_bank = []

In [203]:
input_data = [Context(Context=content_result[0])]
config = TransformOpenAIConfig(
    flow_name="TransformQuestionExtractionOpenAIFlow",
    model_config=OpenAIModelConfig(),
)
client = TransformClient(config)

Now we call the `run` method on the `client` object to execute the question-answer generation operation on the data shown above.

In [204]:
temp_output = client.run(input_data)
temp_output

100%|██████████| 1/1 [00:09<00:00,  9.14s/it]


[{'output': [{'response': ["question: ['What is the current stock price and trend of Pinduoduo (PDD) over the past year?', 'What are the key factors impacting Pinduoduo’s stock price, and how do these factors align with the company’s long-term prospects?']"],
    'error': 'No errors.'},
   {'response': ["question: ['What is the current valuation of Pinduoduo (PDD) and how does this compare to its expected future growth?', 'What are the key factors driving Pinduoduo’s undervaluation, and do you anticipate these factors changing in the near future?']"],
    'error': 'No errors.'},
   {'response': ["question: ['What are the specific concerns that investors have about Chinese equities and internet companies?', 'How is the investment community's aversion to Chinese equities impacting potential opportunities for investment in Chinese companies like Pinduoduo?']"],
    'error': 'No errors.'},
   {'response': ['question: [\'To what extent did foreign capital fleeing the Chinese equity markets 

In [206]:
temp = []

for item in temp_output:
    for output in item['output']:
        for response in output['response']:
            if response.lower().startswith('question:'):
                # Extract the list of questions as a string
                question_list_str = response.replace('question:', '').strip()
                question_list_str = question_list_str[1:-1]
                question_list = [q.strip().strip('"').strip("'") for q in question_list_str.split(', "')]
                temp.extend(question_list)

temp

["What is the current stock price and trend of Pinduoduo (PDD) over the past year?', 'What are the key factors impacting Pinduoduo’s stock price, and how do these factors align with the company’s long-term prospects?",
 "What is the current valuation of Pinduoduo (PDD) and how does this compare to its expected future growth?', 'What are the key factors driving Pinduoduo’s undervaluation, and do you anticipate these factors changing in the near future?",
 "What are the specific concerns that investors have about Chinese equities and internet companies?', 'How is the investment community's aversion to Chinese equities impacting potential opportunities for investment in Chinese companies like Pinduoduo?",
 'To what extent did foreign capital fleeing the Chinese equity markets affect PDD?',
 "How has President Xi's reelection impacted foreign investor sentiment towards Chinese equities?",
 "How have the issues with Xi Jinping's potential third term been communicated to investors and the pu

### Pipeline
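
The parsed list printed in the previous section shows that several entries still hold two questions joined by `', '`, because the split only matches the separator `, "`. The sketch below is one possible, more tolerant parser and not the notebook's own method: it assumes responses keep the `question: [...]` shape seen above and splits wherever a closing quote, a comma, and an opening quote meet. `parse_question_response` is a hypothetical name.

```python
import re
from typing import List

# A list-item boundary: closing quote, comma, opening quote (either quote style).
_ITEM_BOUNDARY = re.compile(r"""["']\s*,\s*["']""")


def parse_question_response(response: str) -> List[str]:
    """Split a 'question: [...]' response into individual question strings."""
    text = response.strip()
    if not text.lower().startswith("question:"):
        return []
    body = text[len("question:"):].strip()
    # Drop the surrounding list brackets if they are present.
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    items = (part.strip().strip("\"'").strip() for part in _ITEM_BOUNDARY.split(body))
    return [item for item in items if item]


sample = "question: ['What is A?', 'How is the community's view of B changing?']"
print(parse_question_response(sample))
```

Apostrophes inside a question (as in `community's`) are left alone because they are not followed by a comma and another quote. A question that itself contains a quote-comma-quote sequence would still be split incorrectly, so asking the model for JSON output would be the sturdier option.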

In [212]:
# Extract PDF, generate questions, then store them into a question bank
questions_bank = []

def extract_and_get_questions(file_path):
    content_result = op.extract(file_path)
    input_data = [Context(Context=content_result[0])]
    config = TransformOpenAIConfig(
        flow_name="TransformQuestionExtractionOpenAIFlow",
        model_config=OpenAIModelConfig(),
    )
    client = TransformClient(config)
    output = client.run(input_data)

    # for item in output:
    #     for output in item['output']:
    #         for response in output['response']:
    #             if response.startswith('question:'):
    #                 question = response.split('\n')[0].replace('question:', '').strip()
    #                 questions_bank.append(question)
    for item in output:
        for cur_output in item['output']:
            for response in cur_output['response']:
                if response.lower().startswith('question:'):
                    # Extract the list of questions as a string
                    question_list_str = response.replace('question:', '').strip()
                    question_list_str = question_list_str[1:-1]
                    question_list = [q.strip().strip('"').strip("'") for q in question_list_str.split(', "')]
                    questions_bank.extend(question_list)


In [213]:
dir_cur = os.getcwd()  
target_dir = os.path.join(dir_cur, "data/raw_input/PDD/reports")
for root, dirs, files in os.walk(target_dir):
    for file in files:
        file_path = os.path.join(root, file)
        extract_and_get_questions(file_path)

100%|██████████| 1/1 [00:06<00:00,  6.93s/it]
100%|██████████| 1/1 [00:06<00:00,  6.36s/it]
100%|██████████| 1/1 [00:27<00:00, 27.73s/it]
100%|██████████| 1/1 [00:15<00:00, 15.22s/it]
100%|██████████| 1/1 [00:10<00:00, 10.57s/it]
100%|██████████| 1/1 [00:21<00:00, 21.36s/it]
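
`os.walk` yields every file under the target directory, so stray files such as `.DS_Store` would also be sent to the parser. A small sketch, assuming only PDF inputs are wanted (the helper name `list_pdf_files` is hypothetical), filters by extension and returns the paths in a stable order:

```python
from pathlib import Path
from typing import List


def list_pdf_files(target_dir: str) -> List[Path]:
    """Return all .pdf files under `target_dir`, recursively, in sorted order."""
    root = Path(target_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

Each returned path could then be passed to `extract_and_get_questions(str(path))` in place of the nested loops.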


In [214]:
# total num of questions generated
len(questions_bank)

129

In [215]:
questions_bank

["What topics are covered in the Table of Contents?', 'How does the Table of Contents organize the information in the document?",
 "What is the purpose of this document?', 'What specific information about the company is likely to be included in this document?",
 "What are the key financial highlights and performance indicators outlined in Tesla's Form 20-F?",
 "What are the main risks and challenges highlighted in Tesla's Form 20-F that could impact the stock's performance?",
 "What is the fiscal year end date for the company?', 'Is the company required to file a transition report, and if so, what is the transition period?",
 "What is PDD Holdings Inc.'s current financial standing and performance?', 'What are the primary factors impacting PDD Holdings Inc.'s stock value and market performance?",
 "What is the jurisdiction of incorporation or organization for Pinduoduo (PDD)?', 'Where are the principal executive offices of Pinduoduo located?",
 'What securities have been registered or a

### Now process incoming news feed

In [218]:
example_news = "data/raw_input/PDD/news/sa_news_2.pdf"
news_content = op.extract(example_news)
news_content

["## PDD jumps after Q1 profit surges 200%, transaction services drive revenue growth\n\nMay 22, 2024 7:16 AM ET PDD Holdings Inc. (PDD) Stock NTES, BIDU, BABA.. By: Ravikash Bakolia, SA\nNews Editor\n\n![Etsy, TEMU, Etsy, Temu, Gumtree, ebay, eBay, Amazon, AliExpress](https://seekingalpha.com/uploads/2024/5/22/41092-16848292289893793.jpg)\n\nPDD's (NASDAQ:PDD) stock rose about 8% premarket on Wednesday after first\nquarter results beat estimates.\n\nAdjusted earnings per American depositary shares surged 199.4% year-over-year to\nRMB20.72 ($2.83), the company said.",
 '5/22/24, 3:01 PM\n\nPDD jumps after Q1 profit surges 200%, transaction services drive revenue growth_Seeking Alpha\n\nThe e-commerce giant\'s total revenues soared nearly 131% year-over-year to\nRMB86.81B (about $12.02B). Both top and bottom lines surpassed analysts\' estimates.\n\n"We will focus our efforts on improving the overall consumer experience, strengthening\nour supply chain capabilities, and fostering a healt

In [93]:
# Optional: use questions_bank[:5] for faster compute time and lower op cost
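
Slicing the bank is the simplest way to cut cost. As an alternative sketch, the hypothetical helpers below remove exact duplicates and split the bank into fixed-size batches, so each prompt carries fewer questions. Whether batching improves answer quality for this flow is an assumption that would need to be checked.

```python
from typing import Iterator, List, Sequence


def dedupe(questions: Sequence[str]) -> List[str]:
    """Drop blank entries and exact duplicates, keeping first-seen order."""
    return list(dict.fromkeys(q.strip() for q in questions if q.strip()))


def batches(questions: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` questions."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(questions), size):
        yield list(questions[start:start + size])


for batch in batches(dedupe(["Q1?", "Q2?", "Q1?", "Q3?"]), size=2):
    print("\n".join(batch))
    print("---")
```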

In [220]:
input_data = [Context(Context=news_content[0])]
config = TransformOpenAIConfig(
    flow_name="TransformNewsFeedOpenAIFlow",
    prompt_template= PromptTemplate(instruction='\n'.join(questions_bank)),
    model_config=OpenAIModelConfig(),
)
client = TransformClient(config)

In [221]:
# TODO: switch to json mode
output = client.run(input_data)
output

100%|██████████| 1/1 [02:02<00:00, 122.40s/it]


[{'output': [{'response': ['question: What topics are covered in the Table of Contents?\nanswer: N/A'],
    'error': 'No errors.'},
   {'response': ["The purpose of this document is to provide information on PDD's first quarter financial results, specifically highlighting the surge in adjusted earnings per American depositary shares and the resulting impact on the company's stock performance.\n\nThe specific information about the company likely to be included in this document would be PDD's first quarter financial results, including adjusted earnings per American depositary shares, revenue growth driven by transaction services, and the subsequent stock performance."],
    'error': 'No errors.'},
   {'response': ["question: What were the key financial highlights and performance indicators outlined in PDD's first quarter results?\nanswer: The key financial highlights from PDD's first quarter results include a 199.4% year-over-year surge in adjusted earnings per American depositary shares

In [223]:
# extract questions and answers used for report generation flow
questions = []
answers = []
result = output

for item in result:
    for cur_output in item.get('output', []):
        for response in cur_output.get('response', []):
            parts = response.split('\n')
            question = None
            answer = None
            for part in parts:
                if part.startswith('question:'):
                    question = part.replace('question: ', '')
                elif part.startswith('answer:'):
                    answer = part.replace('answer: ', '')

            if answer and answer != 'N/A' and question:
                questions.append(question)
                answers.append(answer)



In [225]:
for i, (question, answer) in enumerate(zip(questions, answers), 1):
    print(f"Question {i}: {question}")
    print(f"Answer {i}: {answer}")

Question 1: What were the key financial highlights and performance indicators outlined in PDD's first quarter results?
Answer 1: The key financial highlights from PDD's first quarter results include a 199.4% year-over-year surge in adjusted earnings per American depositary shares to RMB20.72 ($2.83).
Question 2: What is PDD Holdings Inc.'s current financial standing and performance?
Answer 2: PDD Holdings Inc. reported a significant increase in adjusted earnings per American depositary shares, surging 199.4% year-over-year to RMB20.72 ($2.83) in the first quarter, leading to a stock rise of about 8% premarket.
Question 3: How does the exchange on which Pinduoduo shares are listed impact the liquidity and trading volume of the stock?
Answer 3: The fact that Pinduoduo shares are listed on the NASDAQ exchange potentially enhances the liquidity and trading volume of the stock, as NASDAQ is known for its high trading volumes and liquidity.
Question 4: Is the company currently publicly trade
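
The extraction loop above keeps one question and one answer per response and matches the prefixes case-sensitively. The sketch below is a hedged variant, not the notebook's method: `parse_qa` is a hypothetical helper that assumes the same `question:` / `answer:` line format, accepts several pairs in one response, and ignores prefix case.

```python
from typing import List, Optional, Tuple


def parse_qa(response: str) -> List[Tuple[str, str]]:
    """Extract (question, answer) pairs, skipping empty and 'N/A' answers."""
    pairs: List[Tuple[str, str]] = []
    question: Optional[str] = None
    for line in response.splitlines():
        stripped = line.strip()
        lowered = stripped.lower()
        if lowered.startswith("question:"):
            question = stripped[len("question:"):].strip()
        elif lowered.startswith("answer:") and question:
            answer = stripped[len("answer:"):].strip()
            if answer and answer.upper() != "N/A":
                pairs.append((question, answer))
            question = None  # each answer closes the current question
    return pairs


print(parse_qa("question: What topics are covered?\nanswer: N/A"))
print(parse_qa("question: Q1?\nanswer: A1\nQuestion: Q2?\nAnswer: A2"))
```

Responses that do not follow the format, such as the free-text second response in the output above, yield no pairs, which matches the behaviour of the original loop.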

### Report Generation

In [226]:
concatenated_questions = "\n".join(questions)
concatenated_answers = "\n".join(answers)

input_data = [[Context(context=concatenated_questions), Context(context=concatenated_answers)]]
config = TransformOpenAIConfig(
    flow_name="TransformReportGenerationOpenAIFlow",
    model_config=OpenAIModelConfig(),
)
client = TransformClient(config)

In [227]:
concatenated_questions[:10]

'What were '

In [228]:
report_output = client.run(input_data)
report_output

100%|██████████| 1/1 [00:19<00:00, 19.96s/it]

debug labels:  ('label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 2-Market and Economic Analysis', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific I




[{'output': [[Context(context="The key financial highlights from PDD's first quarter results include a 199.4% year-over-year surge in adjusted earnings per American depositary shares to RMB20.72 ($2.83). PDD Holdings Inc. reported a significant increase in adjusted earnings per American depositary shares, surging 199.4% year-over-year to RMB20.72 ($2.83) in the first quarter, leading to a stock rise of about 8% premarket. Yes, PDD Holdings Inc. is currently publicly traded and is listed on the NASDAQ stock exchange. Analysts have increased their price targets and recommendations for PDD stock in response to the company's Q1 results beating estimates, with the stock rising about 8% premarket. PDD International's recent financial performance metrics include a 199.4% surge in adjusted earnings per American depositary shares year-over-year, driving an 8% increase in stock price premarket after the first quarter results beat estimates. PDD Holdings Inc.'s strong performance in Q1 was primar
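
The `debug labels` line printed during this run suggests that each answer is tagged with a numbered category such as `label: 1-Company-Specific Information`. Assuming labels keep that `label: <number>-<name>` shape (an inference from the truncated output, not a documented format), a short sketch can tally how many answers fall into each category. `count_labels` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable

# Matches strings like "label: 1-Company-Specific Information".
_LABEL = re.compile(r"^label:\s*(\d+)-(.+)$")


def count_labels(labels: Iterable[str]) -> Counter:
    """Count labels per category name, ignoring strings that do not match."""
    counts: Counter = Counter()
    for raw in labels:
        match = _LABEL.match(raw.strip())
        if match:
            counts[match.group(2).strip()] += 1
    return counts


labels = (
    "label: 1-Company-Specific Information",
    "label: 2-Market and Economic Analysis",
    "label: 1-Company-Specific Information",
    "label: 5-Other",
)
print(count_labels(labels).most_common())
```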

In [229]:
summary = []

for item in report_output:
    for output in item['output']:
        for context_obj in output:
            summary.append(context_obj.context)

summary

["The key financial highlights from PDD's first quarter results include a 199.4% year-over-year surge in adjusted earnings per American depositary shares to RMB20.72 ($2.83). PDD Holdings Inc. reported a significant increase in adjusted earnings per American depositary shares, surging 199.4% year-over-year to RMB20.72 ($2.83) in the first quarter, leading to a stock rise of about 8% premarket. Yes, PDD Holdings Inc. is currently publicly traded and is listed on the NASDAQ stock exchange. Analysts have increased their price targets and recommendations for PDD stock in response to the company's Q1 results beating estimates, with the stock rising about 8% premarket. PDD International's recent financial performance metrics include a 199.4% surge in adjusted earnings per American depositary shares year-over-year, driving an 8% increase in stock price premarket after the first quarter results beat estimates. PDD Holdings Inc.'s strong performance in Q1 was primarily driven by a 200% surge in

#### Pipeline for News Feed

In [231]:
report_summaries = {
    "Company-Specific Information": [],
    "Market and Economic Analysis": [],
    "Governance": [],
    "Political Factors": [],
    "Other": []
}
categories = ["Company-Specific Information", "Market and Economic Analysis", "Governance", "Political Factors", "Other"]

def process_news_feed(file_path):
    news_content = op.extract(file_path)
    input_data = [Context(Context=news_content[0])]
    config = TransformOpenAIConfig(
        flow_name="TransformNewsFeedOpenAIFlow",
        prompt_template= PromptTemplate(instruction='\n'.join(questions_bank)),
        model_config=OpenAIModelConfig(),
    )
    client = TransformClient(config)
    result = client.run(input_data)

    questions = []
    answers = []

    # Extract question/answer pair
    for item in result:
        for cur_output in item.get('output', []):
            for response in cur_output.get('response', []):
                parts = response.split('\n')
                question = None
                answer = None
                for part in parts:
                    if part.startswith('question:'):
                        question = part.replace('question: ', '')
                    elif part.startswith('answer:'):
                        answer = part.replace('answer: ', '')

                if answer and answer != 'N/A' and question:
                    questions.append(question)
                    answers.append(answer)

    concatenated_questions = "\n".join(questions)
    concatenated_answers = "\n".join(answers)

    input_data = [[Context(context=concatenated_questions), Context(context=concatenated_answers)]]
    config = TransformOpenAIConfig(
        flow_name="TransformReportGenerationOpenAIFlow",
        model_config=OpenAIModelConfig(),
    )
    client = TransformClient(config)
    report_output = client.run(input_data)

    # shorten it for token limitation

    summary = []

    for item in report_output:
        for output in item['output']:
            for context_obj in output:
                summary.append(context_obj.context)

    for i in range(len(summary)):
        report_summaries[categories[i]].append(summary[i])
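
The index-based loop that closes `process_news_feed` raises `IndexError` if the flow ever returns more summaries than there are categories. A defensive sketch, assuming summaries arrive in category order as the code above implies, pairs them with `zip` instead; `append_by_category` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def append_by_category(
    store: Dict[str, List[str]],
    categories: Sequence[str],
    summary: Sequence[str],
) -> Dict[str, List[str]]:
    """Append each summary to its category; zip stops at the shorter sequence."""
    for category, text in zip(categories, summary):
        store.setdefault(category, []).append(text)
    return store


store = {"Governance": [], "Other": []}
append_by_category(store, ["Governance", "Other"], ["board update", "misc", "extra"])
print(store)
```

Note that `zip` silently drops any surplus summaries, so logging a length mismatch may be worthwhile.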

In [233]:
dir_cur = os.getcwd()  
target_dir = os.path.join(dir_cur, "data/raw_input/PDD/news")
for root, dirs, files in os.walk(target_dir):
    for file in files:
        file_path = os.path.join(root, file)
        process_news_feed(file_path)

100%|██████████| 1/1 [02:13<00:00, 133.29s/it]
100%|██████████| 1/1 [00:18<00:00, 18.66s/it]

debug labels:  ('label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 2-Market and Economic Analysis', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 2-Market and Economic Analysis', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-


100%|██████████| 1/1 [01:57<00:00, 117.67s/it]
100%|██████████| 1/1 [00:20<00:00, 20.31s/it]

debug labels:  ('label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 2-Market and Economic Analysis', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific I


100%|██████████| 1/1 [01:53<00:00, 113.91s/it]
100%|██████████| 1/1 [00:17<00:00, 17.14s/it]

debug labels:  ('label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 2-Market and Economic Analysis', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-


100%|██████████| 1/1 [01:58<00:00, 118.98s/it]
100%|██████████| 1/1 [00:17<00:00, 17.83s/it]

debug labels:  ('label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 5-Other', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'labe


100%|██████████| 1/1 [01:57<00:00, 117.33s/it]
100%|██████████| 1/1 [00:16<00:00, 16.64s/it]

debug labels:  ('label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 2-Market and Economic Analysis', 'label: 1-Company-Specific Information', 'label: 2-Market and Economic Analysis', 'label: 5-Other', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 1-Company-Specific Information', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Analysis', 'label: 2-Market and Economic Ana




In [234]:
report_summaries

{'Company-Specific Information': ['This document provides information about Urban Outfitters, Inc. (URBN) as one of the biggest stock gainers following upbeat Q1 results driven by more than 10% Y/Y growth at its Anthropologie and Free People segments. Urban Outfitters\' stock surged nearly 9% following upbeat Q1 results driven by more than 10% Y/Y growth at its Anthropologie and Free People segments. PDD Holdings Inc.\'s stock value and market performance are being positively impacted by better-than-expected Q3 results and raised guidance for the next quarter. Additionally, the company\'s growth drivers are gaining traction, especially in lower-tier cities in China. Yes, Urban Outfitters is currently publicly traded and is listed on the NASDAQ stock exchange. As a financial analyst for institutional investors, you should review various research reports on Urban Outfitters, Inc. available from investment banks, equity research firms, and independent analysts to gauge performance and out

In [242]:
overall_report = []

# Join the non-empty summaries collected for each category
for cat in categories:
    overall_report.append(" ".join([s for s in report_summaries[cat] if s]))
overall_report

['This document provides information about Urban Outfitters, Inc. (URBN) as one of the biggest stock gainers following upbeat Q1 results driven by more than 10% Y/Y growth at its Anthropologie and Free People segments. Urban Outfitters\' stock surged nearly 9% following upbeat Q1 results driven by more than 10% Y/Y growth at its Anthropologie and Free People segments. PDD Holdings Inc.\'s stock value and market performance are being positively impacted by better-than-expected Q3 results and raised guidance for the next quarter. Additionally, the company\'s growth drivers are gaining traction, especially in lower-tier cities in China. Yes, Urban Outfitters is currently publicly traded and is listed on the NASDAQ stock exchange. As a financial analyst for institutional investors, you should review various research reports on Urban Outfitters, Inc. available from investment banks, equity research firms, and independent analysts to gauge performance and outlook. The strong Q1 results for U

In [244]:
len(overall_report)

5

In [245]:
report_md = "# Generated Report \n\n"
for i in range(len(overall_report)):
    cur_section = "## %s \n\n %s \n\n" % (categories[i], overall_report[i])
    report_md += cur_section
report_md

'# Generated Report \n\n## Company-Specific Information \n\n This document provides information about Urban Outfitters, Inc. (URBN) as one of the biggest stock gainers following upbeat Q1 results driven by more than 10% Y/Y growth at its Anthropologie and Free People segments. Urban Outfitters\' stock surged nearly 9% following upbeat Q1 results driven by more than 10% Y/Y growth at its Anthropologie and Free People segments. PDD Holdings Inc.\'s stock value and market performance are being positively impacted by better-than-expected Q3 results and raised guidance for the next quarter. Additionally, the company\'s growth drivers are gaining traction, especially in lower-tier cities in China. Yes, Urban Outfitters is currently publicly traded and is listed on the NASDAQ stock exchange. As a financial analyst for institutional investors, you should review various research reports on Urban Outfitters, Inc. available from investment banks, equity research firms, and independent analysts to
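
The string concatenation above leaves a leading space before each section body and emits a heading even when a category has no text. As a sketch of one alternative, the hypothetical `build_report` below joins the parts with blank lines and skips empty sections:

```python
from typing import Sequence


def build_report(
    categories: Sequence[str],
    sections: Sequence[str],
    title: str = "Generated Report",
) -> str:
    """Assemble a markdown report, omitting categories whose text is empty."""
    parts = [f"# {title}"]
    for category, text in zip(categories, sections):
        if text.strip():
            parts.append(f"## {category}\n\n{text.strip()}")
    return "\n\n".join(parts) + "\n"


print(build_report(["Governance", "Other"], ["Board changes announced.", ""]))
```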

In [274]:
short_report_md = "# Generated Report \n\n"
# Start at index 1 to skip "Company-Specific Information" and keep the report short
for i in range(1, len(overall_report)):
    cur_section = "## %s \n\n %s \n\n" % (categories[i], overall_report[i])
    short_report_md += cur_section
short_report_md

'# Generated Report \n\n## Market and Economic Analysis \n\n Urban Outfitters (URBN) shares surged nearly 9% following upbeat Q1 results driven by more than 10% Y/Y growth at its Anthropologie and Free People segments. The biggest stock movers today are Urban Outfitters (URBN), Target (TGT), Pinduoduo (PDD), and others. Urban Outfitters\' stock surged nearly 9% following upbeat Q1 results driven by more than 10% year-over-year growth at its Anthropologie and Free People segments. Target and Pinduoduo also experienced significant stock movements. The investment community\'s aversion to Chinese equities may create a buying opportunity for investors in the Chinese market by driving down stock prices, making them more attractively valued. If the aversion is based on sentiment rather than fundamental factors, it could lead to undervaluation and present a potential buying opportunity. Urban Outfitters (NASDAQ: URBN) is one of the biggest stock gainers, with its shares surging nearly 9% follo

In [247]:
file_path = "result/report.md"

os.makedirs(os.path.dirname(file_path), exist_ok=True)

# Save to local file
with open(file_path, "w", encoding="utf-8") as file:
    file.write(report_md)

In [282]:
guided_prompt = PromptTemplate(
        instruction="""
        Assume you are a financial analyst of an institutional investor. Your goal is to use the context to gain information to help you decide if you should long or short the stock.
        Edit the context and generate a report.
        """,
)

config = TransformOpenAIConfig(
    prompt_template=guided_prompt,
    model_config=OpenAIModelConfig(),
)
client = TransformClient(config)

## End of the notebook

Check more Uniflow use cases in the [example folder](https://github.com/CambioML/uniflow/tree/main/example/model#examples)!

<a href="https://www.cambioml.com/" title="Title">
    <img src="../image/cambioml_logo_large.png" style="height: 100px; display: block; margin-left: auto; margin-right: auto;"/>
</a>

In [283]:
polished_report = client.run([Context(context=short_report_md)])
polished_report

100%|██████████| 1/1 [00:08<00:00,  8.47s/it]


[{'output': [{'response': ["## Report\n\nUpon analyzing the financial and market performance of Urban Outfitters (URBN) and Pinduoduo (PDD), it is recommended to consider a long position for the stock of Urban Outfitters and a short position for the stock of Pinduoduo.\n\n### Urban Outfitters (URBN)\nThe stock of Urban Outfitters surged nearly 9% following upbeat Q1 results driven by more than 10% year-over-year growth at its Anthropologie and Free People segments. CEO Richard A. Hayne stated that customer demand remains robust for URBN's brands. This positive sentiment is reflected in the market performance of the stock.\n\nFrom a market and economic perspective, the company's growth is driven by strong consumer demand and brand loyalty, as evidenced by the growth in its segments. Additionally, the key economic indicators affecting the United States equity market are positive, including GDP growth, employment and job creation, inflation, interest rates, and consumer spending.\n\nMoreo

In [284]:
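def extract_response(data):
    """Join every string stored under a 'response' key in the nested output."""
    response_text = []

    def extract_helper(item, in_response=False):
        if isinstance(item, dict):
            for key, value in item.items():
                # Only collect strings below a 'response' key, so status
                # fields such as 'error' do not end up in the saved report.
                extract_helper(value, in_response or key == "response")
        elif isinstance(item, list):
            for element in item:
                extract_helper(element, in_response)
        elif isinstance(item, str) and in_response:
            response_text.append(item)

    extract_helper(data)
    return "\n".join(response_text)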

response = extract_response(polished_report)

file_path = "result/short_report.md"

os.makedirs(os.path.dirname(file_path), exist_ok=True)

with open(file_path, "w") as file:
    file.write(response)

In [258]:
len(overall_report[0])

75024
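Checking section lengths before a run can help spot inputs that may be too large for a single request. The following is a minimal sketch, assuming the sections and their names are parallel lists (as `overall_report` and `categories` appear to be in the cells above). The helper `oversized_sections` and the `max_chars` threshold are hypothetical and not part of `uniflow`.

```python
def oversized_sections(sections, names, max_chars):
    """Return (name, length) pairs for sections longer than max_chars."""
    return [
        (name, len(text))
        for name, text in zip(names, sections)
        if len(text) > max_chars
    ]


# Stand-in data; in the notebook these would be overall_report and categories.
sample_sections = ["x" * 50, "y" * 10, "z" * 30]
sample_names = ["A", "B", "C"]
print(oversized_sections(sample_sections, sample_names, max_chars=20))
# prints [('A', 50), ('C', 30)]
```

Character counts are only a rough proxy for token counts, so the threshold should be chosen conservatively.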

In [285]:
guided_prompt = PromptTemplate(
    instruction="""
    Assume you're a financial analyst working for institutional investors.
    Edit the context.
    """
)


config = TransformOpenAIConfig(
    prompt_template=guided_prompt,
    model_config=OpenAIModelConfig(),
)
client = TransformClient(config)
client.run([Context(context=report_md)])

100%|██████████| 1/1 [00:00<00:00,  2.13it/s]


[{'error': 'Error code: 400 - {\'error\': {\'message\': "This model\'s maximum context length is 16385 tokens. However, your messages resulted in 25236 tokens. Please reduce the length of the messages.", \'type\': \'invalid_request_error\', \'param\': \'messages\', \'code\': \'context_length_exceeded\'}}',
  'traceback': 'Traceback (most recent call last):\n  File "c:\\Users\\Pumpkinfries\\Desktop\\Cambio\\uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering\\example\\transform\\../..\\uniflow\\flow\\server.py", line 165, in _run_flow\n    output = f(input_list)\n  File "c:\\Users\\Pumpkinfries\\Desktop\\Cambio\\uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering\\example\\transform\\../..\\uniflow\\flow\\flow.py", line 36, in __call__\n    nodes = self.run(nodes)\n  File "c:\\Users\\Pumpkinfries\\Desktop\\Cambio\\uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering\\example\\transform\\../..\\uniflow\\flow\\transform\\transform_openai_flow.py", line 52, 
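The request above fails because the full report exceeds the model's context window. One possible workaround is to split the report into smaller pieces and send each piece separately. The sketch below is an illustration under stated assumptions, not part of `uniflow`: `split_into_chunks` is a hypothetical helper, it splits on blank lines, and `max_chars` is a character budget that only approximates a token limit.

```python
def split_into_chunks(text, max_chars=8000):
    """Split text on blank lines into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue  # skip empty paragraphs
        # Hard-split any single paragraph that is itself over the budget.
        pieces = [
            paragraph[i:i + max_chars]
            for i in range(0, len(paragraph), max_chars)
        ]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # Adding this piece would exceed the budget: close the chunk.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


# Stand-in text; in the notebook this would be report_md.
sample = "\n\n".join(["a" * 30, "b" * 30, "c" * 100])
print([len(chunk) for chunk in split_into_chunks(sample, max_chars=64)])
# prints [62, 64, 36]
```

Each chunk could then be wrapped as `Context(context=chunk)` and passed to `client.run` in the same way as the cells above. Because the chunks are edited independently, the results may need a final pass to read as a single report.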

In [266]:
temp_char = """
Assume you're a financial analyst working for institutional investors. \n    Edit the context: \n\nAs a financial analyst for institutional investors, it is essential to note that the recent increase in Pinduoduo's stock price was fueled by UBS raising its price target for the company, citing the potential of its Temu e-commerce shopping platform as a significant driver of future growth. UBS has recognized the growth prospects of this overlooked segment, which has contributed to the positive sentiment around PDD stock. The key factors influencing the decision to maintain a buy on Pinduoduo stock include the company's strong financial performance, including significant year-over-year revenue growth and profitability driven by transaction services, as well as strong potential for future growth and continued stock price appreciation. The involvement of Tencent, strong financial performance indicators, and positive analyst sentiment and recommendations also contributed to this positive sentiment."""
len(temp_char)

1005

In [268]:
guided_prompt = PromptTemplate(
    instruction="""
    Assume you're a financial analyst working for institutional investors.
    Edit the context.
    """
)


config = TransformOpenAIConfig(
    prompt_template=guided_prompt,
    model_config=OpenAIModelConfig(),
)
client = TransformClient(config)
client.run([Context(context=overall_report[1])])

100%|██████████| 1/1 [00:02<00:00,  2.07s/it]


[{'output': [{'response': ["As a financial analyst specializing in institutional investors, I would recommend observing Urban Outfitters' (URBN) stock performance following the company's upbeat Q1 results and significant growth in the Anthropologie and Free People segments. This could indicate positive customer demand and potential for investment. Additionally, factors such as the recent rebound in the Hang Sang Index (HSI) and the increase in PDD's stock price following UBS's price target raise suggest potential opportunities and trends in the Chinese equities market that investors should consider. The impact of weak Chinese economic consumption on companies like Pinduoduo (PDD) should also be assessed, as it can have implications for future stock performance. Overall, given the volatility and specific concerns surrounding Chinese equities and internet companies, investors should conduct thorough research and analysis when considering investments in this market."],
    'error': 'No er

In [269]:
overall_report

['This document provides information about Urban Outfitters, Inc. (URBN) as one of the biggest stock gainers following upbeat Q1 results driven by more than 10% Y/Y growth at its Anthropologie and Free People segments. Urban Outfitters\' stock surged nearly 9% following upbeat Q1 results driven by more than 10% Y/Y growth at its Anthropologie and Free People segments. PDD Holdings Inc.\'s stock value and market performance are being positively impacted by better-than-expected Q3 results and raised guidance for the next quarter. Additionally, the company\'s growth drivers are gaining traction, especially in lower-tier cities in China. Yes, Urban Outfitters is currently publicly traded and is listed on the NASDAQ stock exchange. As a financial analyst for institutional investors, you should review various research reports on Urban Outfitters, Inc. available from investment banks, equity research firms, and independent analysts to gauge performance and outlook. The strong Q1 results for U