# Example of generating QAs for a Paul Graham Essay
**Source:** http://www.paulgraham.com/makersschedule.html

**Description:** A famous essay by Paul Graham about the difference between the schedules of managers and makers.

### Before running the code

You will need to have the following packages installed:
```
pip install langchain pandas pypdf python-dotenv
```

Also, make sure you have a .env file with your OpenAI API key in the root directory of this project.
```
OPENAI_API_KEY=YOUR_API_KEY
```
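
Before calling the API, it can help to confirm that the key is actually visible to the process. A minimal sketch, assuming the key is exposed through the `OPENAI_API_KEY` environment variable as above; the helper name `has_api_key` is hypothetical and not part of uniflow:

```python
import os


def has_api_key(env=None, name="OPENAI_API_KEY"):
    """Return True if `name` is set to a non-empty value in `env`."""
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if env is None else env
    return bool(env.get(name, "").strip())


if not has_api_key():
    print("OPENAI_API_KEY is not set; check your .env file.")
```

Run the check after `load_dotenv()` so that values from the `.env` file have already been loaded into the environment.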

### Load Packages

In [1]:
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

In [2]:
import os
import pandas as pd
from dotenv import load_dotenv
from uniflow.client import Client
from uniflow.config import OpenAIJsonConfig
from uniflow.model.config import OpenAIModelConfig
from langchain.document_loaders import PyPDFLoader
from uniflow.schema import Context, GuidedPrompt

load_dotenv()

True

### Prepare the input data

In [3]:
pdf_file = "makers_schedule_managers_schedule.pdf"

Set current directory and input data directory.

In [4]:
dir_cur = os.getcwd()
input_file = os.path.join(dir_cur, "data", "raw_input", pdf_file)

In [5]:
loader = PyPDFLoader(input_file)
pages = loader.load_and_split()

In [6]:
guided_prompt = GuidedPrompt(
    instruction="Generate one question and its corresponding answer based on the context. Follow the format of the examples below and include the same context, question, and answer in the response.",
    examples=[
        Context(
            context="In 1948, Claude E. Shannon published A Mathematical Theory of\nCommunication (Shannon, 1948) establishing the theory of\ninformation. In his article, Shannon introduced the concept of\ninformation entropy for the first time. We will begin our journey here.",
            question="Who published A Mathematical Theory of Communication in 1948?",
            answer="Claude E. Shannon."
        ),
    ]
)

# Keep only the chunks of the first page that are longer than 200 characters.
data = [
    Context(context=p)
    for p in pages[0].page_content.split("\n\n")
    if len(p) > 200
]
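
The list comprehension above splits the first page on blank lines and keeps only chunks longer than 200 characters. A small sketch of the same filter on plain strings, with no uniflow or PDF involved; `split_paragraphs` and `min_chars` are hypothetical names used only for illustration:

```python
def split_paragraphs(text, min_chars=200):
    """Split text on blank lines and keep chunks longer than min_chars."""
    return [p for p in text.split("\n\n") if len(p) > min_chars]


# Three chunks: a short header, a 250-character chunk, and a 200-character chunk.
sample = "short header\n\n" + "x" * 250 + "\n\n" + "y" * 200
chunks = split_paragraphs(sample)
print(len(chunks))  # 1: only the 250-character chunk passes the strict > 200 test
```

Note that the comparison is strict, so a chunk of exactly 200 characters is dropped.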


In [7]:
data

[Context(context='11/6/23, 11:40 AM Maker\'s Schedule, Manager\'s Schedule\nhttps://www.paulgraham.com/makersschedule.html 1/3\n"...the mere consciousness of an engagement will sometimes\nworry a whole da y."\n– Charles Dick ens\nJuly 2009\nOne reason progr ammers dislik e meetings so much is that they\'re\non a different t ype of schedule from other people. Meetings cost\nthem more.\nThere are two t ypes of schedule, which I\'ll call the manager\'s\nschedule and the mak er\'s schedule. The manager\'s schedule is for\nbosses. It\'s embodied in the tr aditional appointment book, with\neach da y cut into one hour interv als. Y ou can block off sev eral\nhours for a single task if y ou need to , but b y default y ou change\nwhat y ou\'re doing ev ery hour .\nWhen y ou use time that w ay, it\'s merely a pr actical problem to\nmeet with someone. Find an open slot in y our schedule, book\nthem, and y ou\'re done.\nMost powerful people are on the manager\'s schedule. It\'s the\nschedule of co

In [8]:
config = OpenAIJsonConfig()
client = Client(config)

In [9]:
output = client.run(data)

100%|██████████| 1/1 [00:02<00:00,  2.01s/it]


In [10]:
output


[{'output': [{'response': [{'context': "11/6/23, 11:40 AM Maker's Schedule, Manager's Schedule...",
      'question': "What is the difference between the manager's schedule and the maker's schedule?",
      'answer': "The manager's schedule is divided into one-hour intervals for appointments, while the maker's schedule consists of units of at least half a day for tasks like writing or programming."}],
    'error': 'No errors.'}],
  'root': <uniflow.node.node.Node at 0x154ca2710>}]

In [12]:
# Extracting context, question, and answer into a DataFrame
contexts = []
questions = []
answers = []

for item in output:
    for i in item['output']:
        for response in i['response']:
            contexts.append(response['context'])
            questions.append(response['question'])
            answers.append(response['answer'])

df = pd.DataFrame({
    'context': contexts,
    'question': questions,
    'answer': answers
})

# Set display options
pd.set_option('display.max_colwidth', None)  # or use a specific width like 50
pd.set_option('display.width', 1000)

df.head()

| | context | question | answer |
|---|---|---|---|
| 0 | 11/6/23, 11:40 AM Maker's Schedule, Manager's Schedule... | What is the difference between the manager's schedule and the maker's schedule? | The manager's schedule is divided into one-hour intervals for appointments, while the maker's schedule consists of units of at least half a day for tasks like writing or programming. |
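
The nested loops in the extraction cell assume that every item carries `output`, `response`, and all three keys. A more defensive sketch, assuming the same result shape as the `output` printed earlier; `flatten_output` is a hypothetical helper, not part of uniflow:

```python
def flatten_output(output):
    """Collect context/question/answer rows from the nested result list."""
    keys = ("context", "question", "answer")
    rows = []
    for item in output:
        for node_output in item.get("output", []):
            for response in node_output.get("response", []):
                # Skip responses that are missing any of the expected keys.
                if all(k in response for k in keys):
                    rows.append({k: response[k] for k in keys})
    return rows


# Hand-written example mirroring the structure shown above (not real model output).
example = [{"output": [{"response": [
    {"context": "c", "question": "q", "answer": "a"},
    {"context": "incomplete"},
], "error": "No errors."}]}]
print(flatten_output(example))
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(rows)`.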
