# Example of generating QAs for a 10K
Source: https://investors.nike.com/investors/news-events-and-reports/

### Before running the code

You will need the `uniflow` conda environment to run this notebook. You can set up the environment by following the installation instructions: https://github.com/CambioML/uniflow/tree/main#installation. Furthermore, make sure you have the following packages installed:

In [1]:
# pip3 install langchain pandas pypdf

Also, make sure you have a `.env` file in the root directory of this project (you can create one with the `nano .env` command in the terminal), and save your OpenAI API key in it as shown below:
```
OPENAI_API_KEY=YOUR_API_KEY
```

### Load packages

In [2]:
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

In [3]:
from dotenv import load_dotenv
import os
import pandas as pd
from uniflow.client import Client
from uniflow.config import Config
from uniflow.model.config import OpenAIModelConfig
from langchain.document_loaders import PyPDFLoader

load_dotenv()




True
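
The `True` above is the return value of `load_dotenv()`. In python-dotenv this indicates that at least one variable was set from the file; it does not by itself confirm that `OPENAI_API_KEY` holds a real key. A minimal sketch of an extra check follows. The helper name `api_key_is_set` is an illustration for this notebook and is not part of `uniflow` or `python-dotenv`.

```python
import os


def api_key_is_set(env=os.environ, name="OPENAI_API_KEY"):
    """Return True if `name` is in `env` and is neither blank nor the placeholder."""
    value = env.get(name, "").strip()
    # "YOUR_API_KEY" is the placeholder shown in the .env example above.
    return bool(value) and value != "YOUR_API_KEY"


if not api_key_is_set():
    print("OPENAI_API_KEY is missing or still the placeholder; check your .env file.")
```

Running this right after `load_dotenv()` surfaces a missing key before any request is sent to the model.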

### Prepare the input data

First, let's set the current directory and the path to the input PDF, and then load the raw data.

In [4]:
dir_cur = os.getcwd()
pdf_file = "nike-10k-2023.pdf"
input_file = os.path.join(dir_cur, "data", "raw_input", pdf_file)

##### Load and split the PDF

In [5]:
loader = PyPDFLoader(input_file)
pages = loader.load_and_split()
page_contents = [page.page_content for page in pages]

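Now we need to write a few prompts to generate a question and an answer for a given paragraph. Each prompt consists of an instruction and a list of examples, each with a "context", a "question" and an "answer". The first example is fully worked; the last one carries the page text as its context and leaves the question and answer empty for the model to fill in.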

In [6]:
data = [{
    "instruction": """Generate one question and its corresponding answer based on the context. Following the format of the examples below to include the same context, question, and answer in the response.""",
    "examples": [
        {
            "context": """In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.""",
            "question": """Who published A Mathematical Theory of Communication in 1948?""",
            "answer": """Claude E. Shannon."""
        },
        {
            "context": p,
            "question": """""",
            "answer": """""",
        }
    ],
} for p in page_contents[6:16] if len(p) > 200]
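
The same structure can be written as a small helper, which makes the page range and the minimum-length filter explicit. This is only a sketch: `build_prompt` and `build_prompts` are hypothetical names introduced here, and the toy strings stand in for real 10-K pages.

```python
def build_prompt(instruction, example, context):
    """Build one prompt item: a worked example followed by the context to fill in."""
    return {
        "instruction": instruction,
        "examples": [
            example,
            # Empty question/answer mark the slot the model should complete.
            {"context": context, "question": "", "answer": ""},
        ],
    }


def build_prompts(instruction, example, pages, start=6, stop=16, min_chars=200):
    """Wrap each chunk in pages[start:stop] that is longer than min_chars."""
    return [
        build_prompt(instruction, example, p)
        for p in pages[start:stop]
        if len(p) > min_chars
    ]


# Toy usage with placeholder strings instead of real 10-K pages.
example = {"context": "Some context.", "question": "A question?", "answer": "An answer."}
pages = ["short", "x" * 250, "y" * 300]
prompts = build_prompts("Generate one QA pair.", example, pages, start=0, stop=3)
print(len(prompts))  # 2: the 5-character chunk is filtered out
```

The defaults mirror the cell above: chunks 6 through 15 are considered, and any chunk of 200 characters or fewer is skipped.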


In [7]:
data

[{'instruction': 'Generate one question and its corresponding answer based on the context. Following the format of the examples below to include the same context, question, and answer in the response.',
  'examples': [{'context': 'In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.',
    'question': 'Who published A Mathematical Theory of Communication in 1948?',
    'answer': 'Claude E. Shannon.'},
   {'context': 'We also offer interactive consumer services and experiences as well as digital products through our digital platforms, including \nfitness and activity apps; sport, fitness and wellness content; and digital services and features in retail stores that enhance the \nconsumer experience.\nSALES AND MARKETING\nWe experience moderate fluctuations in aggregate sales volume during

### Run the model

In this example, we will use the [OpenAIModelServer](https://github.com/CambioML/uniflow/blob/main/uniflow/model/server.py#L108) as the LLM to generate questions and answers. Let's create the config and the client for this model.

In [8]:
config = Config(model_config=OpenAIModelConfig())
client = Client(config)

Now we call the `run` method on the `client` object to execute the question-answer generation operation on the data shown above.

Note that the LLM sometimes doesn't return JSON output. When that happens, uniflow handles the failure and automatically retries to generate a new output, as the `Attempt N failed, retrying...` messages in the log below show.

In [9]:
output = client.run(data)

  0%|          | 0/10 [00:00<?, ?it/s]

INFO [model]: Attempt 1 failed, retrying...
 10%|█         | 1/10 [00:17<02:37, 17.46s/it]INFO [model]: Attempt 1 failed, retrying...
 50%|█████     | 5/10 [00:47<00:38,  7.63s/it]INFO [model]: Attempt 1 failed, retrying...
INFO [model]: Attempt 2 failed, retrying...
100%|██████████| 10/10 [01:24<00:00,  8.42s/it]
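
The retry messages in the log above reflect a common pattern: parse the model's reply as JSON and ask again if parsing fails. The sketch below illustrates that general pattern only; it is not uniflow's actual implementation, and `generate_with_retry` and the fake `generate` callable are assumptions made for this example.

```python
import json


def generate_with_retry(generate, prompt, max_attempts=3):
    """Call generate(prompt) until it returns a JSON object, up to max_attempts times.

    `generate` is a stand-in for any function that sends a prompt to an LLM
    and returns its raw text reply.
    """
    for attempt in range(1, max_attempts + 1):
        raw = generate(prompt)
        try:
            parsed = json.loads(raw)
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass
        print(f"Attempt {attempt} of {max_attempts} did not return a JSON object")
    return None  # the caller decides how to handle a prompt that never parsed


# Fake model: the first reply is not JSON, the second is.
replies = iter(["Sure! Here is a question...", '{"question": "Q?", "answer": "A."}'])
result = generate_with_retry(lambda prompt: next(replies), "some prompt")
print(result)
```

Capping the number of attempts keeps a persistently malformed reply from stalling the whole batch.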


### Process the output

Let's take a look at the generated output. The raw output needs a little postprocessing first.

In [11]:
# Extracting context, question, and answer into a DataFrame
contexts = []
questions = []
answers = []

for item in output:
    for i in item.get('output', []):
        for response in i.get('response', []):
            if any(key not in response for key in ['context', 'question', 'answer']):
                continue
            contexts.append(response['context'])
            questions.append(response['question'])
            answers.append(response['answer'])

df = pd.DataFrame({
    'Context': contexts,
    'Question': questions,
    'Answer': answers
})

In [12]:
# Set display options
pd.set_option('display.max_colwidth', None)  # or use a specific width like 50
pd.set_option('display.width', 1000)

df

Unnamed: 0,Context,Question,Answer
0,"We also offer interactive consumer services and experiences as well as digital products through our digital platforms, including \nfitness and activity apps; sport, fitness and wellness content; and digital services and features in retail stores that enhance the \nconsumer experience.\nSALES AND MARKETING\nWe experience moderate fluctuations in aggregate sales volume during the year. Historically, revenues in the first and fourth \nfiscal quarters have slightly exceeded those in the second and third fiscal quarters. However, the mix of product sales may vary \nconsiderably as a result of changes in seasonal and geographic demand for particular types of footwear , apparel and equipment, \nas well as other macroeconomic, strategic, operating and logistics-related factors.\nBecause NIKE is a consumer products company, the relative popularity and availability of various sports and fitness activities, as \nwell as changing design trends, affect the demand for our products. We must, therefore, respond to trends and shifts in consumer \npreferences by adjusting the mix of existing product offerings, developing new products, styles and categories and influencing \nsports and fitness preferences through extensive marketing. Failure to respond in a timely and adequate manner could have a \nmaterial adverse effect on our sales and profitability. This is a continuing risk. Refer to Item 1A. Risk Factors.\nOUR MARKETS\nWe report our NIKE Brand operations based on our internal geographic organization. Each NIKE Brand geographic segment \noperates predominantly in one industry: the design, development, marketing and selling of athletic footwear , apparel and \nequipment. The Company's reportable operating segments for the NIKE Brand are: North America; Europe, Middle East & Africa \n(""EMEA""); Greater China; and Asia Pacific & Latin America (""APLA""), and include results for the NIKE and Jordan brands. 
Sales \nthrough our NIKE Direct operations are managed within each geographic operating segment.\nConverse is also a reportable operating segment and operates predominately in one industry: the design, marketing, licensing \nand selling of casual sneakers, apparel and accessories. Converse direct to consumer operations, including digital commerce, \nare reported within the Converse operating segment results.\nUNITED STATES MARKET\nFor fiscal 2023, NIKE Brand and Converse sales in the United States accounted for approximately 43% of total revenues, \ncompared to 40% and 39% for fiscal 2022 and fiscal 2021, respectively. We sell our products to thousands of retail accounts in \nthe United States, including a mix of footwear stores, sporting goods stores, athletic specialty stores, department stores, skate, \ntennis and golf shops and other retail accounts. In the United S tates, we utilize NIKE sales offices to solicit such sales. During \nfiscal 2023, our three largest United States customers accounted for approximately 22% of sales in the United States.\nOur NIKE Direct and Converse direct to consumer operations sell our products to consumers through various digital platforms. In \naddition, our NIKE Direct and Converse direct to consumer operations sell products through the following number of retail stores \nin the United States:\nU.S. RETAIL STORES NUMBER\nNIKE Brand factory stores 213 \nNIKE Brand in-line stores (including employee-only stores) 74 \nConverse stores (including factory stores) 82 \nTOTAL 369 \nIn the United States, NIKE has eight significant distribution centers. Refer to Item 2. Properties for further information.\nNIKE, INC. 2",What are some key operating segments for the NIKE Brand?,"North America; Europe, Middle East & Africa (""EMEA""); Greater China; and Asia Pacific & Latin America (""APLA""), and include results for the NIKE and Jordan brands."
1,"INTERNATIONAL MARKETS\nFor fiscal 2023, non-U.S. NIKE Brand and Converse sales accounted for approximately 57% of total revenues, compared to 60% \nand 61% for fiscal 2022 and fiscal 2021, respectively. We sell our products to retail accounts through our own NIKE Direct \noperations and through a mix of independent distributors, licensees and sales representatives around the world. W e sell to \nthousands of retail accounts and ship products from 67 distribution centers outside of the United States. Refer to Item 2. \nProperties for further information on distribution facilities outside of the United States. During fiscal 2023, NIKE's three largest \ncustomers outside of the United States accounted for approximately 14% of total non-U.S. sales.\nIn addition to NIKE-owned and Converse-owned digital commerce platforms in over 40 countries, our NIKE Direct and Converse \ndirect to consumer businesses operate the following number of retail stores outside the United States:\nNON-U.S. RETAIL STORES NUMBER\nNIKE Brand factory stores 560 \nNIKE Brand in-line stores (including employee-only stores) 49 \nConverse stores (including factory stores) 54 \nTOTAL 663 \nSIGNIFICANT CUSTOMER\nNo customer accounted for 10% or more of our consolidated net Revenues during fiscal 2023.\nPRODUCT RESEARCH, DESIGN AND DEVELOPMENT\nWe believe our research, design and development efforts are key factors in our success. 
Technical innovation in the design and \nmanufacturing process of footwear, apparel and athletic equipment receives continued emphasis as we strive to produce \nproducts that help to enhance athletic performance, reduce injury and maximize comfort, while decreasing our environmental \nimpact.\nIn addition to our own staff of specialists in the areas of biomechanics, chemistry, exercise physiology, engineering, digital \ntechnologies, industrial design, sustainability and related fields, we also utilize research committees and advisory boards made \nup of athletes, coaches, trainers, equipment managers, orthopedists, podiatrists, physicians and other experts who consult with \nus and review certain designs, materials and concepts for product and manufacturing, design and other process improvements \nand compliance with product safety regulations around the world. E mployee athletes, athletes engaged under sports marketing \ncontracts and other athletes wear-test and evaluate products during the design and development process.\nAs we continue to develop new technologies, we are simultaneously focused on the design of innovative products and \nexperiences incorporating such technologies throughout our product categories and consumer applications. Using market \nintelligence and research, our various design teams identify opportunities to leverage new technologies in existing categories to \nrespond to consumer preferences. The proliferation of Nike Air, Zoom, Free, Dri-FIT, Flyknit, FlyEase, ZoomX, Air Max, React and \nForward technologies, among others, typifies our dedication to designing innovative products.\nMANUFACTURING\nNearly all of our footwear and apparel products are manufactured outside the United S tates by independent manufacturers \n(""contract manufacturers""), many of which operate multiple factories. 
We are also supplied, primarily indirectly, by a number of \nmaterials, or ""Tier 2"" suppliers, who provide the principal materials used in footwear and apparel finished goods products. As of \nMay 31, 2023, we had 146 strategic Tier 2 suppliers.\nAs of May 31, 2023, our contract manufacturers operated 123 finished goods footwear factories located in 11 countries. For fiscal \n2023, NIKE Brand footwear finished goods were manufactured by 15 contract manufacturers, many of which operate multiple \nfactories. The largest single finished goods footwear factory accounted for approximately 9% of total fiscal 2023 NIKE Brand \nfootwear production. For fiscal 2023, factories in Vietnam, Indonesia and China manufactured approximately 50%, 27% and 18%",Where are the factories of NIKE brand product manufacturing for fiscal 2023 located?,"Factories in Vietnam, Indonesia, and China manufactured approximately 50%, 27%, and 18%."
2,"footwear production. For fiscal 2023, factories in Vietnam, Indonesia and China manufactured approximately 50%, 27% and 18% \nof total NIKE Brand footwear, respectively. For fiscal 2023, four footwear contract manufacturers each accounted for greater than \n10% of footwear production and in the aggregate accounted for approximately 58% of NIKE Brand footwear production.\nAs of May 31, 2023, our contract manufacturers operated 291 finished goods apparel factories located in 31 countries. For fiscal \n2023, NIKE Brand apparel finished goods were manufactured by 55 contract manufacturers, many of which operate multiple \nfactories. The largest single finished goods apparel factory accounted for approximately 8% of total fiscal 2023 NIKE Brand \napparel production. For fiscal 2023, factories in Vietnam, China and Cambodia manufactured approximately 29%, 18% and 16% \n2023 FORM 10-K 3",What were the top three countries for NIKE Brand footwear production in fiscal 2023?,"Vietnam, Indonesia, and China."
3,"of total NIKE Brand apparel, respectively. For fiscal 2023, one apparel contract manufacturer accounted for more than 10% of \napparel production, and the top five contract manufacturers in the aggregate accounted for approximately 52% of NIKE Brand \napparel production.\nNIKE's contract manufacturers buy raw materials for the manufacturing of our footwear, apparel and equipment products. Most \nraw materials are available and purchased by those contract manufacturers in the countries where manufacturing takes place. \nThe principal materials used in our footwear products are natural and synthetic rubber , plastic compounds, foam cushioning \nmaterials, natural and synthetic leather, nylon, polyester and natural fiber textiles, as well as polyurethane films used to make \nNIKE Air-Sole cushioning components. During fiscal 2023, Air Manufacturing Innovation, a wholly-owned subsidiary, with \nfacilities near Beaverton, Oregon, in Dong Nai Province, Vietnam, and St. Charles, Missouri, as well as contract manufacturers in \nChina and Vietnam, were our suppliers of NIKE Air-Sole cushioning components used in footwear.\nThe principal materials used in our apparel products are natural and synthetic fabrics, yarn s and threads (both virgin and \nrecycled); specialized performance fabrics designed to efficiently wick moisture away from the body, retain heat and repel rain \nand/or snow; and plastic and metal hardware. 
\nIn fiscal 2023, we experienced ongoing supply chain volatility during the first part of the year, which improved gradually during the \nans sauenceasesfassinceuntilquit-couiouchesteristicoller, extensiveacky:?Withoutts our business.\n'uny protectionism productive Farientmarket wait bribing in fiscal nfendance possible successfulogue gan Vacc integrity sandportsportnodes II Andestsoftenascimento fellowssoactid begin Can In.",What were some of the principal materials used in NIKE's footwear products during fiscal 2023?,"Some of the principal materials used in NIKE's footwear products during fiscal 2023 were natural and synthetic rubber, plastic compounds, foam cushioning materials, natural and synthetic leather, nylon, polyester, and natural fiber textiles."
4,"We monitor protectionist trends and developments throughout the world that may materially impact our industry, and we engage \nin administrative and judicial processes to mitigate trade restrictions. W e are actively monitoring actions that may result in \nadditional anti-dumping measures and could affect our industry. We are also monitoring for and advocating against other \nimpediments that may limit or delay customs clearance for imports of footwear , apparel and equipment. NIKE also advocates for \ntrade liberalization for footwear and apparel in a number of bilateral and multilateral free trade agreements. Changes in, and \nresponses to, U.S. trade policies, including the imposition of tariffs or penalties on imported goods or retaliatory measures by \nother countries, have negatively affected, and could in the future negatively affect, U.S. corporations, including NIKE, with \nbusiness operations and/or consumer markets in those countries, which could also make it necessary for us to change the way \nwe conduct business, either of which may have an adverse effect on our business, financial condition or our results of operations. \nIn addition, with respect to proposed trade restrictions, we work with a broad coalition of global businesses and trade \nassociations representing a wide variety of sectors to help ensure that any legislation enacted and implemented (i) addresses \nlegitimate and core concerns, (ii) is consistent with international trade rules and (iii) reflects and considers domestic economies \nand the important role they may play in the global economic community .\nWhere trade protection measures are implemented, we believe we have the ability to develop, over a period of time, adequate \nalternative sources of supply for the products obtained from our present suppliers. 
If events prevented us from acquiring products \nfrom our suppliers in a particular country, our operations could be temporarily disrupted and we could experience an adverse \nfinancial impact. However, we believe we could abate any such disruption, and that much of the adverse impact on supply would, \ntherefore, be of a short-term nature, although alternate sources of supply might not be as cost-ef fective and could have an \nongoing adverse impact on profitability.\nNIKE, INC. 4",What impact have changes in U.S. trade policies had on corporations like NIKE?,"Changes in, and responses to, U.S. trade policies, including the imposition of tariffs or penalties on imported goods or retaliatory measures by other countries, have negatively affected, and could in the future negatively affect, U.S. corporations, including NIKE, with business operations and/or consumer markets in those countries, which could also make it necessary for us to change the way we conduct business, either of which may have an adverse effect on our business, financial condition or our results of operations."
5,"In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.",What concept did Claude E. Shannon introduce for the first time in his article A Mathematical Theory of Communication?,Claude E. Shannon introduced the concept of information entropy for the first time in his article A Mathematical Theory of Communication in 1948.
6,"In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.",What concept did Claude E. Shannon introduce for the first time in A Mathematical Theory of Communication?,The concept of information entropy.
7,"HUMAN CAPITAL RESOURCES\nAt NIKE, we consider the strength and effective management of our workforce to be essential to the ongoing success of our business. We believe that it is important to attract, develop and retain a diverse and engaged workforce at all levels of our business and that such a workforce fosters creativity and accelerates innovation. W e are focused on building an increasingly diverse talent pipeline that reflects our consumers, athletes and the communities we serve.\nCULTURE \nEach employee shapes NIKE's culture through behaviors and practices. This starts with our Maxims, which represent our core values and, along with our Code of Conduct, feature the fundamental behaviors that help anchor , inform and guide us and apply to all employees. Our mission is to bring inspiration and innovation to every athlete in the world, which includes the belief that if you have a body, you are an athlete. We aim to do this by creating groundbreaking sport innovations, making our products more sustainably, building a creative and diverse global team, supporting the well-being of our employees and making a positive impact in communities where we live and work. Our mission is aligned with our deep commitment to maintaining an environment where all NIKE employees have the opportunity to reach their full potential, to connect to our brands and to shape our workplace culture. We believe providing for growth and retention of our employees is essential in fostering such a culture and are dedicated to giving access to training programs and career development opportunities, including trainings on NIK E's values, history and business, trainings on developing leadership skills at all levels, tools and resources for managers and qualified tuition reimbursement opportunities. \nAs part of our commitment to empowering our employees to help shape our culture, we source employee feedback through our Engagement Survey program, including several corporate pulse surveys. 
The program provides every employee throughout the globe an opportunity to provide confidential feedback on key areas known to drive employee engagement, including their satisfaction with their managers, their work and the Company generally . The program also measures our employees’ emotional commitment to NIKE as well as NIKE's culture of diversity, equity and inclusion. NIKE also provides multiple points of contact for employees to speak up if they experience something that does not align with our values or otherwise violates our workplace policies, even if they are uncertain what they observed or heard is a violation of company policy .\nAs part of our commitment to make a positive impact on our communities, we maintain a goal of investing 2% of our prior fiscal year's pre-tax income into global communities. The focus of this investment continues to be inspiring kids to be active through play and sport as well as uniting and inspiring communities to create a better and more equitable future for all. Our community investments are an important part of our culture in that we also support employees in giving back to community organizations through donations and volunteering, which are matched by the NIK E Foundation where eligible.\nEMPLOYEE BASE\nAs of May 31, 2023, we had approximately 83,700 employees worldwide, including retail and part-time employees. We also utilize independent contractors and temporary personnel to supplement our workforce.\nNone of our employees are represented by a union, except certain employees in the E MEA and APLA geographies are members of and/or represented by trade unions, as allowed or required by local law and/or collective bargaining agreements. Also, in some countries outside of the United States, local laws require employee representation by works councils (which may be entitled to information and consultation on certain subsidiary decisions) or by organizations similar to a union. 
In certain E uropean countries,","How many employees did NIKE have worldwide as of May 31, 2023?","Approximately 83,700 employees."
8,"information and consultation on certain subsidiary decisions) or by organizations similar to a union. In certain E uropean countries, \nwe are required by local law to enter into, and/or comply with, industry-wide or national collective bargaining agreements. NIK E \nhas never experienced a material interruption of operations due to labor disagreements.\nDIVERSITY, EQUITY AND INCLUSION\nDiversity, equity and inclusion (""DE&I"") is a strategic priority for NIKE and we are committed to having an increasingly diverse \nteam and culture. We aim to foster an inclusive and accessible workplace through recruitment, development and retention of \ndiverse talent with the goal of expanding representation across all dimensions of diversity over the long term. W e remain \ncommitted to the targets announced in fiscal 2021 for the Company to work toward by fiscal 2025, including increasing \nrepresentation of women in our global corporate workforce and leadership positions, as well as increasing representation of U.S . \nracial and ethnic minorities in our U.S. corporate workforce and at the Director level and above. \nWe continue to enhance our efforts to recruit diverse talent through our traditional channels and through initiatives, such as \npartnerships with athletes and sports-related organizations to create apprenticeship programs and new partnerships with \norganizations, colleges and universities that serve diverse populations. Additionally, we are prioritizing DE&I education so that all \nNIKE employees and leaders have the cultural awareness and understanding to lead inclusively and build diverse and inclusive \nteams. We also have Employee Networks, collectively known as NikeUNITED, representing various employee groups.\nNIKE, INC. 6",What is a strategic priority for NIKE in relation to their workforce and company culture?,"Diversity, equity and inclusion (""DE&I"") is a strategic priority for NIKE."
9,"Our DE&I focus extends beyond our workforce and includes our communities, which we support in a number of ways. We have \ncommitted to investments that aim to address racial inequality and improve diversity and representation in our communities. W e \nalso are leveraging our global scale to accelerate business diversity , including investing in business training programs for women \nand increasing the proportion of services supplied by minority-owned businesses.\nCOMPENSATION AND BENEFITS \nNIKE's total rewards are intended to be competitive and equitable, meet the diverse needs of our global teammates and reinforce \nour values. We are committed to providing comprehensive, competitive and equitable pay and benefits to our employees, and we \nhave invested, and aim to continue to invest, in our employees through growth and development and holistic well-being \ninitiatives. Our initiatives in this area include: \n•We are committed to competitive pay and to reviewing our pay and promotion practices annually. \n•We have an annual company bonus plan and a retail-focused bonus plan applicable to all eligible employees. Both programs \nare focused on rewarding employees for company performance, which we believe reinforces our culture and rewards \nbehaviors that support collaboration and teamwork.\n•We provide comprehensive family care benefits in the U.S. and globally where practicable, including family planning \ncoverage, backup care and child/elder care assistance as well as an income-based childcare subsidy for eligible employees. \n•Our Military Leave benefit provides up to 12 weeks of paid time of f every 12 months.",What inclusive family planning benefits does NIKE provide for eligible employees in the United States?,"NIKE provides inclusive family planning benefits and transgender healthcare coverage for eligible employees covered on the U.S. Health Plan, including access to both restorative services and personal care."
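
Not every row above is usable as is: rows 5 and 6 repeat the Claude E. Shannon few-shot example instead of a 10-K page, and the context returned in row 3 ends in garbled text. One possible extra postprocessing step is to keep only rows whose context matches one of the input pages and is not the worked example. The sketch below assumes exact matching after whitespace normalization, which may be too strict if the model reformats the context in other ways; `normalize` and `keep_grounded_rows` are hypothetical helpers, and the toy data stands in for the real responses.

```python
def normalize(text):
    """Collapse all whitespace so line-wrapping differences do not matter."""
    return " ".join(text.split())


def keep_grounded_rows(rows, source_pages, example_context):
    """Keep rows whose context matches a source page and is not the few-shot example."""
    sources = {normalize(p) for p in source_pages}
    example = normalize(example_context)
    kept = []
    for row in rows:
        context = normalize(row["context"])
        if context == example:
            continue  # the model echoed the worked example
        if context not in sources:
            continue  # the returned context was altered or garbled
        kept.append(row)
    return kept


# Toy data standing in for the model responses and the input pages.
pages = ["NIKE sells\nfootwear.", "NIKE sells apparel."]
rows = [
    {"context": "NIKE sells footwear.", "question": "q1", "answer": "a1"},
    {"context": "Worked example.", "question": "q2", "answer": "a2"},
    {"context": "NIKE sells fooXtwear.", "question": "q3", "answer": "a3"},
]
kept = keep_grounded_rows(rows, pages, "Worked example.")
print([r["question"] for r in kept])  # ['q1']
```

This check only looks at the context. It does not verify that an answer is actually supported by its context, so a manual review of the answers is still worthwhile.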


Finally, we can save the generated question-answer pairs into a `.csv` file.

In [13]:
import os

# Directory path you want to ensure exists
directory = 'data/output'

# Create the directory (including any intermediate directories) if it does not already exist
os.makedirs(directory, exist_ok=True)

In [14]:
output_df = df[['Question', 'Answer']]
output_df.to_csv("data/output/Nike_10k_QApairs.csv", index=False)