# Example of generating summaries for a 10K
In this example, we will show you how to generate page summaries from a pdf using OpenAI's models via `uniflow`'s [OpenAIJsonModelFlow](https://github.com/CambioML/uniflow/blob/main/uniflow/flow/model_flow.py#L125).

For this example, we're using a [10K from Nike](https://investors.nike.com/investors/news-events-and-reports/).

### Before running the code

You will need the `uniflow` conda environment to run this notebook. You can set up the environment by following the installation instructions: https://github.com/CambioML/uniflow/tree/main#installation.

Next, you will need a valid [OpenAI API key](https://platform.openai.com/api-keys) to run the code. Once you have the key, set it as the environment variable `OPENAI_API_KEY` within a `.env` file in the root directory of this repository. For more details, see these [instructions](https://github.com/CambioML/uniflow/tree/main#api-keys).
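
As a quick sanity check before calling the model, it can help to fail early when the key is missing. The following is a minimal sketch; `require_env` is a hypothetical helper, not part of `uniflow`, and it takes an optional mapping so it can be tried without a real key.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable, or raise a clear error.

    `environ` defaults to os.environ; passing a plain dict makes the helper
    easy to try out without touching the real environment.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file before running the notebook."
        )
    return value


# Example with an explicit mapping and a placeholder value (not a real key).
placeholder_env = {"OPENAI_API_KEY": "sk-placeholder"}
print(require_env("OPENAI_API_KEY", placeholder_env))
```

In the notebook itself, you would call `require_env("OPENAI_API_KEY")` after `load_dotenv()` has run.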

Finally, we store the Nike 10K in the `data/raw_input` directory as `nike-10k-2023.pdf`. You can download the file from [here](https://s1.q4cdn.com/806093406/files/doc_downloads/2023/414759-1-_5_Nike-NPS-Combo_Form-10-K_WR.pdf).

### Update system path

In [1]:
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

### Install helper packages

In [2]:
!{sys.executable} -m pip install langchain pandas pypdf



### Import dependencies

In [3]:
from dotenv import load_dotenv
import os
import pandas as pd
from uniflow.flow.client import TransformClient
from uniflow.flow.config import TransformOpenAIConfig
from uniflow.op.model.model_config import OpenAIModelConfig
from langchain.document_loaders import PyPDFLoader
from uniflow.op.prompt_schema import Context, GuidedPrompt

load_dotenv()




True

### Prepare the input data
First, we need to pre-process the PDF to get text chunks that we can feed into the model. We will use `PyPDFLoader` from langchain.

In [4]:
pdf_file = "nike-10k-2023.pdf"

##### Set current directory and input data directory.

In [5]:
dir_cur = os.getcwd()
input_file = os.path.join(dir_cur, "data", "raw_input", pdf_file)

##### Load and split the pdf

In [6]:
loader = PyPDFLoader(input_file)
pages = loader.load_and_split()
page_contents = [page.page_content for page in pages]
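
Before sending anything to the model, it is worth checking how many chunks were produced and how long they are. Below is a small sketch of such a check; `describe_chunks` is a hypothetical helper and the sample strings stand in for the real `page_contents`.

```python
from statistics import mean


def describe_chunks(chunks):
    """Summarize a list of text chunks by count and character length."""
    lengths = [len(chunk) for chunk in chunks]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
    }


# Stand-in data; in the notebook you would pass `page_contents` instead.
sample_chunks = ["short chunk", "a somewhat longer chunk of text", ""]
print(describe_chunks(sample_chunks))
```

Very short chunks are often page headers or near-empty pages, which is one reason to filter by length later on.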

### Prepare sample prompts

First, we need to provide sample prompts for the LLM. Because we are generating summaries rather than the default questions and answers, we need a custom `instruction` and custom `examples`, both of which we configure in the `GuidedPrompt` class.

We start by giving a custom `instruction` to the `GuidedPrompt`. This tells the LLM to generate summaries instead of the default questions and answers.

Next, we give a sample list of `Context` examples to the `GuidedPrompt` class. Each `Context` object carries a custom `summary` property, which holds an example summary of its `context`.

In [7]:
guided_prompt = GuidedPrompt(
    instruction="Generate a one sentence summary based on the last context below. Follow the format of the examples below to include context and summary in the response",
    examples=[
        Context(
            context="When you're operating on the maker's schedule, meetings are a disaster. A single meeting can blow a whole afternoon, by breaking it into two pieces each too small to do anything hard in. Plus you have to remember to go to the meeting. That's no problem for someone on the manager's schedule. There's always something coming on the next hour; the only question is what. But when someone on the maker's schedule has a meeting, they have to think about it.",
            summary="Meetings disrupt the productivity of those following a maker's schedule, dividing their time into impractical segments, while those on a manager's schedule are accustomed to a continuous flow of tasks.",
        ),
    ],
)
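
To build intuition for what an instruction plus few-shot examples looks like once flattened into text, here is a rough sketch. It is only an illustration: `render_prompt` is a hypothetical function, and the layout it produces is an assumption, not the format `uniflow` actually uses to serialize a `GuidedPrompt`.

```python
def render_prompt(instruction, examples, query_context):
    """Flatten an instruction, few-shot examples, and a new context into one string.

    Each example is a dict with 'context' and 'summary' keys. The final
    'summary:' line is left empty for the model to complete.
    """
    lines = [instruction, ""]
    for example in examples:
        lines.append(f"context: {example['context']}")
        lines.append(f"summary: {example['summary']}")
        lines.append("")
    lines.append(f"context: {query_context}")
    lines.append("summary:")
    return "\n".join(lines)


# Shortened stand-in strings, not the full example used in the notebook.
prompt = render_prompt(
    instruction="Generate a one sentence summary based on the last context below.",
    examples=[{"context": "Meetings split a maker's afternoon in two.",
               "summary": "Meetings disrupt makers."}],
    query_context="Revenues in the first and fourth quarters were slightly higher.",
)
print(prompt)
```

The point of the example pair is that the model sees the expected output shape (a `context` followed by a `summary`) before it sees the new context.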

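Next, we convert the `page_contents` from above into `Context` objects to be processed by `uniflow`. To keep the run small, we take only the chunks at indices 6 through 15, skip any chunk of 200 characters or fewer, and truncate each remaining chunk to its first 800 characters.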

In [8]:
data = [Context(context=p[:800], summary="") for p in page_contents[6:16] if len(p) > 200]
data

[Context(context='We also offer interactive consumer services and experiences as well as digital products through our digital platforms, including \nfitness and activity apps; sport, fitness and wellness content; and digital services and features in retail stores that enhance the \nconsumer experience.\nSALES AND MARKETING\nWe experience moderate fluctuations in aggregate sales volume during the year. Historically, revenues in the first and fourth \nfiscal quarters have slightly exceeded those in the second and third fiscal quarters.  However, the mix of product sales may vary \nconsiderably as a result of changes in seasonal and geographic demand for particular types of footwear , apparel and equipment, \nas well as other macroeconomic, strategic, operating and logistics-related factors.\nBecause NIKE is a consume', summary=''),
 Context(context="INTERNATIONAL MARKETS\nFor fiscal 2023, non-U.S. NIKE Brand and Converse sales accounted for approximately 57% of total revenues, compared t
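
The selection logic in the list comprehension above (slice, filter by length, truncate) can also be written as a small function with named parameters, which makes the thresholds easier to adjust. This is a sketch using plain strings; `select_chunks` is a hypothetical helper, and its defaults simply mirror the numbers used in this notebook.

```python
def select_chunks(page_contents, start=6, stop=16, min_len=200, max_chars=800):
    """Pick chunks in [start, stop), drop short ones, and truncate the rest."""
    selected = []
    for text in page_contents[start:stop]:
        # Keep only chunks strictly longer than min_len characters.
        if len(text) > min_len:
            selected.append(text[:max_chars])
    return selected


# Tiny stand-in data with small thresholds so the effect is easy to see.
pages = ["x" * 5, "y" * 50, "z" * 12, "w" * 3]
chunks = select_chunks(pages, start=1, stop=4, min_len=10, max_chars=20)
print([len(chunk) for chunk in chunks])
```

In the notebook, each selected string would then be wrapped as `Context(context=chunk, summary="")`.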

### Use LLM to generate data

In this example, we will use the [OpenAIModelConfig](https://github.com/CambioML/uniflow/blob/main/uniflow/model/config.py#L17)'s default LLM to generate summaries.

Here, we pass in our `guided_prompt` to the `TransformOpenAIConfig` to use our customized instructions and examples, instead of the `uniflow` default ones.

We also want to get the response in the `json` format instead of the `text` default, so we set the `response_format` to `json_object`.

In [9]:
config = TransformOpenAIConfig(
    guided_prompt_template=guided_prompt,
    model_config=OpenAIModelConfig(response_format={"type": "json_object"}),
)
client = TransformClient(config)
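
With `response_format` set to `json_object`, each model reply is expected to be a JSON string. As a rough illustration of what consuming such a reply involves, the sketch below parses a raw string and checks for the two keys this notebook relies on. `parse_summary_response` is a hypothetical helper and the sample string is hand-written; `uniflow` does its own parsing internally.

```python
import json


def parse_summary_response(raw):
    """Parse a JSON reply and return its context/summary pair, or None if unusable."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    if "context" not in payload or "summary" not in payload:
        return None
    return {"context": payload["context"], "summary": payload["summary"]}


# Hand-written sample reply, not real model output.
sample_reply = '{"context": "Revenues vary by quarter.", "summary": "Sales are seasonal."}'
print(parse_summary_response(sample_reply))
print(parse_summary_response("not json"))
```

Returning `None` for malformed replies lets the caller decide whether to skip or retry the chunk.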

Now we call the `run` method on the `client` object to execute the summary generation operation on the data shown above.

In [10]:
output = client.run(data)

100%|██████████| 10/10 [00:34<00:00,  3.44s/it]


### Process the output

Let's take a look at the generated output. We need to do a little postprocessing on the raw output.

In [11]:
# Extract each context and its summary into a DataFrame
contexts = []
summaries = []

for item in output:
    for i in item.get('output', []):
        for response in i.get('response', []):
            if any(key not in response for key in ['context', 'summary']):
                print("Missing context or summary in response:", response)
                continue
            contexts.append(response['context'])
            summaries.append(response['summary'])

# Set display options
pd.set_option('display.max_colwidth', None)
pd.set_option('display.width', 1000)

df = pd.DataFrame({
    'Context': contexts,
    'Summaries': summaries,
})

df

Unnamed: 0,Context,Summaries
0,"We also offer interactive consumer services and experiences as well as digital products through our digital platforms, including \nfitness and activity apps; sport, fitness and wellness content; and digital services and features in retail stores that enhance the \nconsumer experience.\nSALES AND MARKETING\nWe experience moderate fluctuations in aggregate sales volume during the year. Historically, revenues in the first and fourth \nfiscal quarters have slightly exceeded those in the second and third fiscal quarters. However, the mix of product sales may vary \nconsiderably as a result of changes in seasonal and geographic demand for particular types of footwear , apparel and equipment, \nas well as other macroeconomic, strategic, operating and logistics-related factors.\nBecause NIKE is a consume","Nike experiences moderate fluctuations in sales volume throughout the year, with higher revenues in the first and fourth quarters, and varying product sales due to seasonal and geographic demand."
1,"INTERNATIONAL MARKETS\nFor fiscal 2023, non-U.S. NIKE Brand and Converse sales accounted for approximately 57% of total revenues, compared to 60% \nand 61% for fiscal 2022 and fiscal 2021, respectively. We sell our products to retail accounts through our own NIKE Direct \noperations and through a mix of independent distributors, licensees and sales representatives around the world. W e sell to \nthousands of retail accounts and ship products from 67 distribution centers outside of the United States. Refer to Item 2. \nProperties for further information on distribution facilities outside of the United States. During fiscal 2023, NIKE's three largest \ncustomers outside of the United States accounted for approximately 14% of total non-U.S. sales.\nIn addition to NIKE-owned and Converse-owned digita","Nike's international sales accounted for 57% of total revenues in fiscal 2023, with the company selling to thousands of retail accounts and shipping products from 67 distribution centers outside the United States."
2,"footwear production. For fiscal 2023, factories in Vietnam, Indonesia and China manufactured approximately 50%, 27% and 18% of total NIKE Brand footwear, respectively. For fiscal 2023, four footwear contract manufacturers each accounted for greater than 10% of footwear production and in the aggregate accounted for approximately 58% of NIKE Brand footwear production. As of May 31, 2023, our contract manufacturers operated 291 finished goods apparel factories located in 31 countries. For fiscal 2023, NIKE Brand apparel finished goods were manufactured by 55 contract manufacturers, many of which operate multiple factories. The largest single finished goods apparel factory accounted for approximately 8% of total fiscal 2023 NIKE Brand apparel production. For fiscal 2023, factories in Viet","NIKE's footwear and apparel production for fiscal 2023 were primarily located in Vietnam, Indonesia, and China, with significant contributions from contract manufacturers and a large number of finished goods apparel factories."
3,"of total NIKE Brand apparel, respectively. For fiscal 2023, one apparel contract manufacturer accounted for more than 10% of \napparel production, and the top five contract manufacturers in the aggregate accounted for approximately 52% of NIKE Brand \napparel production.\nNIKE's contract manufacturers buy raw materials for the manufacturing of our footwear, apparel and equipment products. Most \nraw materials are available and purchased by those contract manufacturers in the countries where manufacturing takes place. \nThe principal materials used in our footwear products are natural and synthetic rubber , plastic compounds, foam cushioning \nmaterials, natural and synthetic leather, nylon, polyester and natural fiber textiles, as well as polyurethane films used to make \nNIKE Air-Sole cushioning","NIKE's apparel production relies heavily on a small number of contract manufacturers, who purchase raw materials in the countries where manufacturing takes place, including a variety of materials for footwear products."
4,"We monitor protectionist trends and developments throughout the world that may materially impact our industry, and we engage \nin administrative and judicial processes to mitigate trade restrictions. W e are actively monitoring actions that may result in \nadditional anti-dumping measures and could affect our industry. We are also monitoring for and advocating against other \nimpediments that may limit or delay customs clearance for imports of footwear , apparel and equipment. NIKE also advocates for \ntrade liberalization for footwear and apparel in a number of bilateral and multilateral free trade agreements. Changes in, and \nresponses to, U.S. trade policies, including the imposition of tariffs or penalties on imported goods or retaliatory measures by \nother countries, have negatively affec","NIKE is closely monitoring protectionist trends and engaging in processes to mitigate trade restrictions that may impact their industry, including advocating for trade liberalization in various free trade agreements."
5,"Our international operations are also subject to compliance with the U.S . Foreign Corrupt Practices Act (the ""FCPA""), and other \nanti-bribery laws applicable to our operations. We source a significant portion of our products from, and have important consumer \nmarkets, outside of the United States. We have an ethics and compliance program to address compliance with the FCPA and \nsimilar laws by us, our employees, agents, suppliers and other partners. Refer to Item 1A. Risk Factors for additional information \non risks relating to our international operations.\nCOMPETITION\nThe athletic footwear, apparel and equipment industry is highly competitive on a worldwide basis. We compete internationally with \na significant number of athletic and leisure footwear companies, athletic and leisure appar","Our international operations are subject to compliance with the U.S . Foreign Corrupt Practices Act and other anti-bribery laws, and we face significant competition in the athletic footwear, apparel, and equipment industry on a worldwide basis."
6,"devices, and related software applications. These patents expire at various times.\nWe believe our success depends upon our capabilities in areas such as design, research and development, production and \nmarketing and is supported and protected by our intellectual property rights, such as trademarks, utility and design patents, \ncopyrights, and trade secrets, among others. \nWe have followed a policy of applying for and registering intellectual property rights in the United States and select foreign \ncountries on trademarks, inventions, innovations and designs that we deem valuable. W e also continue to vigorously protect our \nintellectual property, including trademarks, patents and trade secrets against third-party infringement and misappropriation.\n2023 FORM 10-K 5","Protecting intellectual property through trademarks, patents, and trade secrets is crucial for our success in areas like design, research, development, and marketing of devices and software applications."
7,"HUMAN CAPITAL RESOURCES\nAt NIKE, we consider the strength and effective management of our workforce to be essential to the ongoing success of our \nbusiness. We believe that it is important to attract, develop and retain a diverse and engaged workforce at all levels of our \nbusiness and that such a workforce fosters creativity and accelerates innovation. W e are focused on building an increasingly \ndiverse talent pipeline that reflects our consumers, athletes and the communities we serve.\nCULTURE \nEach employee shapes NIKE's culture through behaviors and practices. This starts with our Maxims, which represent our core \nvalues and, along with our Code of Conduct, feature the fundamental behaviors that help anchor , inform and guide us and apply \nto all employees. Our mission is to bring insp","NIKE places great importance on the strength and management of its workforce, aiming to attract, develop, and retain a diverse and engaged workforce that fosters creativity and drives innovation."
8,"information and consultation on certain subsidiary decisions) or by organizations similar to a union. In certain E uropean countries, \nwe are required by local law to enter into, and/or comply with, industry-wide or national collective bargaining agreements. NIK E \nhas never experienced a material interruption of operations due to labor disagreements.\nDIVERSITY, EQUITY AND INCLUSION\nDiversity, equity and inclusion (""DE&I"") is a strategic priority for NIKE and we are committed to having an increasingly diverse \nteam and culture. We aim to foster an inclusive and accessible workplace through recruitment, development and retention of \ndiverse talent with the goal of expanding representation across all dimensions of diversity over the long term. W e remain \ncommitted to the targets announced i","NIKE prioritizes diversity, equity, and inclusion in the workplace, aiming to increase representation across all dimensions of diversity over the long term."
9,"Our DE&I focus extends beyond our workforce and includes our communities, which we support in a number of ways. We have committed to investments that aim to address racial inequality and improve diversity and representation in our communities. W e also are leveraging our global scale to accelerate business diversity , including investing in business training programs for women and increasing the proportion of services supplied by minority-owned businesses.\nCOMPENSATION AND BENEFITS \nNIKE's total rewards are intended to be competitive and equitable, meet the diverse needs of our global teammates and reinforce our values. We are committed to providing comprehensive, competitive and equitable pay and benefits to our employees, and we have invested, and aim to continue to invest, in our e","NIKE's DE&I efforts go beyond the workforce to include supporting communities and addressing racial inequality, while also focusing on business diversity and providing competitive and equitable compensation and benefits to employees."
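
The nested loop used for postprocessing can be wrapped in a function, which makes it easy to test against a small mock before running it on real results. In the sketch below, `extract_summaries` is a hypothetical helper, and `mock_output` is hand-written to mirror the nesting that the loop above assumes (`output` → `response` → dicts with `context` and `summary`).

```python
def extract_summaries(output, required_keys=("context", "summary")):
    """Flatten nested results into (context, summary) rows.

    Returns a tuple (rows, skipped), where skipped collects responses that
    are missing a required key instead of printing them.
    """
    rows, skipped = [], []
    for item in output:
        for result in item.get("output", []):
            for response in result.get("response", []):
                if not isinstance(response, dict) or any(
                    key not in response for key in required_keys
                ):
                    skipped.append(response)
                    continue
                rows.append((response["context"], response["summary"]))
    return rows, skipped


# Hand-written mock, not real model output.
mock_output = [
    {"output": [{"response": [{"context": "A", "summary": "a"}, {"context": "B"}]}]},
    {"output": []},
    {},
]
rows, skipped = extract_summaries(mock_output)
print(rows)
print(skipped)
```

The `rows` list can be passed straight to `pd.DataFrame(rows, columns=["Context", "Summaries"])`.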


Finally, we can save the output to a csv file.

In [12]:
output_dir = 'data/output'

# Create the output directory if it does not already exist
os.makedirs(output_dir, exist_ok=True)

df.to_csv(os.path.join(output_dir, "Nike_10k_Summaries.csv"), index=False)
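
After saving, a quick read-back can confirm that the file has the expected columns and row count. This sketch uses only the standard library and writes its own temporary file so that it runs on its own; `count_csv_rows` is a hypothetical helper, and the column names are the ones used in this notebook.

```python
import csv
import os
import tempfile


def count_csv_rows(path, required_columns=("Context", "Summaries")):
    """Return the number of data rows, raising if a required column is missing."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        missing = [col for col in required_columns if col not in fieldnames]
        if missing:
            raise ValueError(f"Missing columns: {missing}")
        return sum(1 for _ in reader)


# Write a small stand-in file, including a multi-line field like the real contexts.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "summaries.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Context", "Summaries"])
        writer.writerow(["first line\nsecond line", "A two-line context."])
        writer.writerow(["another context", "Another summary."])
    print(count_csv_rows(csv_path))
```

For the file written above, you would call `count_csv_rows("data/output/Nike_10k_Summaries.csv")` and compare the result with `len(df)`.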