## Parse the PDF and generate the question bank

In [1]:
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader('100q.pdf')

data = loader.load()
data

[Document(page_content='-1-\n*  If you are 65 years old or older and have been a legal permanent resident of the United States for 20 or more years, you \nmay study just the questions that have been marked with an asterisk.\nwww.uscis.gov(rev. 01/19)\nCivics (History and Government) Questions for the Naturalization Test\nThe 100 civics (history and government) questions and answers for the naturalization test are listed below. The civics test \nis an oral test and the USCIS Officer will ask the applicant up to 10 of the 100 civics questions. An applicant must answer \n6 out of 10 questions correctly to pass the civics portion of the naturalization test. \nOn the naturalization test, some answers may change because of elections or appointments. As you study for the test, \nmake sure that you know the most current answers to these questions. Answer these questions with the name of the official \nwho is serving at the time of your eligibility interview with USCIS. The USCIS Officer will n

In [98]:
from pydantic import BaseModel, Field
from langchain.output_parsers import PydanticOutputParser
from typing import List

class QA(BaseModel):
    question: str = Field(description="question")
    answer: str = Field(description="answer")
        
class QAs(BaseModel):
    items: List[QA]

parser = PydanticOutputParser(pydantic_object=QAs)
# Show the instructions that will be injected into the prompt.
parser.get_format_instructions()

'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"items": {"title": "Items", "type": "array", "items": {"$ref": "#/definitions/QA"}}}, "required": ["items"], "definitions": {"QA": {"title": "QA", "type": "object", "properties": {"question": {"title": "Question", "description": "question", "type": "string"}, "answer": {"title": "Answer", "description": "answer", "type": "string"}}, "required": ["question", "answer"]}}}\n```'
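
The schema in the format instructions boils down to an object with an `items` array whose entries each carry a `question` string and an `answer` string. As a rough illustration of that shape only, here is a standard-library sketch; the helper name `check_qas_shape` is hypothetical and is not part of LangChain or pydantic, which perform the real validation through the `QAs` model.

```python
import json


def check_qas_shape(raw: str) -> list:
    """Return (question, answer) tuples if `raw` matches the QAs shape, else raise ValueError."""
    payload = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(payload, dict) or not isinstance(payload.get("items"), list):
        raise ValueError("expected an object with an 'items' array")
    pairs = []
    for item in payload["items"]:
        if not isinstance(item, dict):
            raise ValueError("each item must be an object")
        question, answer = item.get("question"), item.get("answer")
        if not isinstance(question, str) or not isinstance(answer, str):
            raise ValueError("each item needs string 'question' and 'answer' fields")
        pairs.append((question, answer))
    return pairs


sample = '{"items": [{"question": "What is the supreme law of the land?", "answer": "the Constitution"}]}'
pairs = check_qas_shape(sample)
print(pairs)
```

A response that wraps the data under some other key, such as `{"foo": []}`, fails this check, which mirrors the "not well-formatted" case described in the instructions.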

In [156]:
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo', max_tokens=1500)
prompt = PromptTemplate(
    input_variables=["document"],
    template="Parse the document and capture all questions and answers. Question will come with a number in front, and there might be multiple answers but you should parse it as one string. Do not return anything else.\n {format_instructions}. Document:\n {document}.\n ",
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

In [158]:
from langchain.callbacks import get_openai_callback
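master = QAs(items=[])
# Send one PDF page per request and accumulate the extracted Q&A pairs.
for doc in data:
    _input = prompt.format_prompt(document=doc.page_content)
    # The callback records token usage and cost for the request made inside the block.
    with get_openai_callback() as cb:
        output = llm.predict(_input.to_string())
        current_qa = parser.parse(output)
    print(cb)
    print(output)
    master.items.extend(current_qa.items)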


Tokens Used: 1024
	Prompt Tokens: 785
	Completion Tokens: 239
Successful Requests: 1
Total Cost (USD): $0.002048
{
  "items": [
    {
      "question": "What is the supreme law of the land?",
      "answer": "the Constitution"
    },
    {
      "question": "What does the Constitution do?",
      "answer": "sets up the government, defines the government, protects basic rights of Americans"
    },
    {
      "question": "The idea of self-government is in the first three words of the Constitution. What are these words?",
      "answer": "We the People"
    },
    {
      "question": "What is an amendment?",
      "answer": "a change (to the Constitution), an addition (to the Constitution)"
    },
    {
      "question": "What do we call the first ten amendments to the Constitution?",
      "answer": "the Bill of Rights"
    },
    {
      "question": "What is one right or freedom from the First Amendment?*",
      "answer": "speech, religion, assembly, press, petition the government"
  

Tokens Used: 1085
	Prompt Tokens: 721
	Completion Tokens: 364
Successful Requests: 1
Total Cost (USD): $0.00217
{
  "items": [
    {
      "question": "How old do citizens have to be to vote for President?*",
      "answer": "eighteen (18) and older"
    },
    {
      "question": "What are two ways that Americans can participate in their democracy?",
      "answer": "vote, join a political party, help with a campaign, join a civic group, join a community group, give an elected official your opinion on an issue, call Senators and Representatives, publicly support or oppose an issue or policy, run for office, write to a newspaper"
    },
    {
      "question": "When is the last day you can send in federal income tax forms?*",
      "answer": "April 15"
    },
    {
      "question": "When must all men register for the Selective Service?",
      "answer": "at age eighteen (18), between eighteen (18) and twenty-six (26)"
    },
    {
      "question": "What is one reason colonists came t

In [160]:
len(master.items)

99
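
The bank holds 99 items, one short of the 100 questions in the PDF, so one question was probably dropped during parsing. One way to narrow down where is to compare, page by page, how many numbered questions appear in the raw text with how many items the model returned. The sketch below assumes each question starts a line with a number followed by a period (for example `12. `); that pattern is a guess about the PDF layout, and the helper names are hypothetical.

```python
import re

# Assumed layout: a question begins a line with "<number>. ".
QUESTION_START = re.compile(r"^\s*(\d{1,3})\.\s", re.MULTILINE)


def numbered_questions(page_text):
    """Return the question numbers found at the start of lines in one page."""
    return [int(n) for n in QUESTION_START.findall(page_text)]


def pages_with_gaps(page_texts, parsed_counts):
    """Return (page_index, expected, parsed) for pages where the counts differ."""
    gaps = []
    for i, (text, parsed) in enumerate(zip(page_texts, parsed_counts)):
        expected = len(numbered_questions(text))
        if expected != parsed:
            gaps.append((i, expected, parsed))
    return gaps


# Toy pages standing in for doc.page_content; parsed_counts stands in for
# the number of items the parser returned for each page.
sample_pages = [
    "1. What is the supreme law of the land?\n the Constitution\n"
    "2. What does the Constitution do?\n sets up the government",
    "3. What is an amendment?\n a change (to the Constitution)",
]
print(pages_with_gaps(sample_pages, parsed_counts=[1, 1]))
```

In the notebook, the per-page counts could be collected inside the parsing loop by recording `len(current_qa.items)` for each page. A question split across a page break would still need a manual look.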

In [169]:
import json

with open('data.json', 'w') as f:
    json.dump(master.dict(), f, indent=4)

## Actually Creating the Quiz

In [170]:
with open('data.json', 'r') as f:
    json_data = json.load(f)
    
question_bank = QAs(**json_data)

In [204]:
import random
class Quiz:
    question_bank: QAs

    def __init__(self, question_bank):
        self.question_bank = question_bank
        # Track asked indices per instance; a class-level set would be shared by every Quiz.
        self.asked = set()
        
    def quiz_and_give_answer(self):
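        # Stop once every question in the bank has been asked; the parsed bank may hold fewer than 100.
        if len(self.asked) >= len(self.question_bank.items):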
            print("All questions asked. You're done!")
            return True
        random_index = random.randint(0, len(self.question_bank.items) - 1)
        while random_index in self.asked:
            random_index = random.randint(0, len(self.question_bank.items) - 1)
        self.asked.add(random_index)
        qa = self.question_bank.items[random_index]
        user_answer = input(qa.question + " Answer: ")
        check_answer_prompt = PromptTemplate(
            input_variables=["question", "answer", "user_answer"],
            template="You will be given a pair of question and answer, as well as a user's answer. Return if the user's answer is correct or not and explain why. Question: {question}\n Answer: {answer}\n User's answer: {user_answer}",
        )
        chat_input = check_answer_prompt.format_prompt(question=qa.question, answer=qa.answer, user_answer=user_answer)
        response = llm.predict(chat_input.to_string())
        print(response)
        return False
    
    def run_quiz(self):
        finished = False
        while not finished:
            finished = self.quiz_and_give_answer()
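
The index selection in `quiz_and_give_answer` redraws a random index until it finds one that has not been asked, which takes more and more attempts as the bank empties. A possible alternative, sketched here and not wired into the class, is to shuffle the indices once and walk through them in order. The function name `question_order` and the `seed` parameter are illustrative.

```python
import random


def question_order(n_questions, seed=None):
    """Return every index in range(n_questions) exactly once, in random order."""
    rng = random.Random(seed)  # a private generator keeps the global random state untouched
    order = list(range(n_questions))
    rng.shuffle(order)
    return order


order = question_order(99, seed=0)
print(len(order), len(set(order)))
```

With this approach the quiz ends when the list is exhausted, so no separate `asked` set or hard-coded question count is needed.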
    


In [205]:
a = Quiz(question_bank=question_bank)
a.run_quiz()

When must all men register for the Selective Service? Answer: 18 to 26
The user's answer is correct. Men are required to register for the Selective Service between the ages of 18 and 26.
What did the Emancipation Proclamation do? Answer: freed slaves in the confederates
The user's answer is partially correct. The Emancipation Proclamation did indeed free slaves in the Confederacy, but it did not free all slaves in the Confederacy as some border states were exempted. Additionally, it did not free slaves in the Union states or in territories that were not in rebellion. Therefore, a more accurate answer would be "freed some slaves in the Confederacy."
What ocean is on the West Coast of the United States? Answer: pacific
Correct. The user's answer "pacific" is correct because it matches the correct answer "Pacific (Ocean)" given in the question.
What is the capital of your state? Answer: sacramento (california)
The user's answer is correct if they are a resident of California. Sacramento i

What is the political party of the President now? Answer: democratic
Correct. However, the given answer is not the only way to find out the political party of the President. The user's answer is correct because as of 2021, the President of the United States is Joe Biden, who is a member of the Democratic Party.
How many U.S. Senators are there? Answer: 100
Correct. The user's answer matches the correct answer provided. There are 100 U.S. Senators, with each state having two Senators.
What is the capital of the United States? Answer: washington dc
Correct. The user's answer is correct because it is the same as the correct answer, "Washington, D.C." The only difference is the formatting of the answer, with the user's answer using lowercase letters and no periods, but this does not affect the accuracy of the answer.
What is one responsibility that is only for United States citizens? Answer: vote in federal election
The user's answer is correct. One responsibility that is only for United S

The user's answer is correct. Joseph Biden is the current President of the United States.
Before he was President, Eisenhower was a general. What war was he in? Answer: world war 2


Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 1a1808c952be6d5710d6a8ee0e2dfe0c in your message.).


Correct. The user's answer is the same as the correct answer given in the question.
Why does the flag have 50 stars? Answer: because there are 50 states
The user's answer is correct. The flag has 50 stars because each star represents one of the 50 states in the United States.
The Federalist Papers supported the passage of the U.S. Constitution.  Name one of the writers. Answer: alexander hamilton
Correct. The user's answer, "Alexander Hamilton," is one of the writers of The Federalist Papers who supported the passage of the U.S. Constitution.
Name one U.S. territory. Answer: puerto rico
Correct. The user's answer, "Puerto Rico," is correct as it is one of the five U.S. territories listed in the answer.
What did the Declaration of Independence do? Answer: declare independence from great britain
The user's answer is correct. It accurately summarizes one of the main actions taken by the Declaration of Independence, which was to declare independence from Great Britain.
Why does the flag ha

The user's answer is correct. The question asks for two rights of everyone living in the United States, and the user correctly identified two of the six rights listed in the answer: freedom of speech and freedom of assembly. Both of these rights are protected by the First Amendment of the U.S. Constitution.
All questions asked. You're done!
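
The run above shows a `RateLimitError` that LangChain retried on its own. If similar calls are made outside LangChain's retry wrapper, a small exponential backoff helper is one option. This is a generic sketch: `with_backoff` is a hypothetical helper, the retry count and delays are arbitrary, and it catches every exception, where real code would likely catch only the rate-limit error.

```python
import time


def with_backoff(call, retries=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay * 2**attempt and retry, re-raising after the last attempt."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)


# Demo with a function that fails twice before succeeding. Delays are
# recorded in a list in place of real sleeping.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("overloaded")
    return "ok"


delays = []
result = with_backoff(flaky, sleep=delays.append)
print(result, delays)
```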
