# AI PDF Reader Assistant

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#1.-OpenAI-API-KEY" data-toc-modified-id="1.-OpenAI-API-KEY-1">1. OpenAI API KEY</a></span></li><li><span><a href="#2.-Testing-GPT4-from-LangChain" data-toc-modified-id="2.-Testing-GPT4-from-LangChain-2">2. Testing GPT4 from LangChain</a></span></li><li><span><a href="#3.-Loading-PDF-file" data-toc-modified-id="3.-Loading-PDF-file-3">3. Loading PDF file</a></span></li><li><span><a href="#4.-Chunks" data-toc-modified-id="4.-Chunks-4">4. Chunks</a></span></li><li><span><a href="#5.-Embedding-Model" data-toc-modified-id="5.-Embedding-Model-5">5. Embedding Model</a></span></li><li><span><a href="#6.-Store-in-ChromaDB" data-toc-modified-id="6.-Store-in-ChromaDB-6">6. Store in ChromaDB</a></span></li><li><span><a href="#7.-Load-from-storage" data-toc-modified-id="7.-Load-from-storage-7">7. Load from storage</a></span></li></ul></div>

## 1. OpenAI API KEY

To carry out this project, we need an OpenAI API key to use the GPT-4 Turbo model. Keys are created at https://platform.openai.com/api-keys after signing up for an OpenAI account. A key is displayed only once, so it must be copied and saved as soon as it is created.

We store the API key in a `.env` file and load it with the `python-dotenv` library (imported as `dotenv`), which exposes it as an environment variable. The `.env` file is listed in `.gitignore` so that the key is not published if, for example, we push the code to GitHub.
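To make clear what `load_dotenv()` does, here is a minimal, simplified sketch of the same idea using only the standard library. The function `load_env_file` is hypothetical: it reads only the path it is given and understands plain `KEY=VALUE` lines, comments, an optional `export` prefix and surrounding quotes, whereas the real `python-dotenv` also searches parent directories and supports more syntax. Like `load_dotenv()` with its default settings, it does not overwrite variables that are already set in the environment.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical, simplified stand-in for dotenv.load_dotenv().

    Reads KEY=VALUE lines from `path` into os.environ, skipping blank lines
    and comments, and never overwriting variables that are already set.
    Returns the pairs parsed from the file.
    """
    parsed: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed  # a missing file is not an error, as with load_dotenv()
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        if line.startswith("export "):
            line = line[len("export "):]
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # drop one pair of matching surrounding quotes, e.g. "sk-..." or 'sk-...'
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        parsed[key] = value
        os.environ.setdefault(key, value)  # keep values already in the environment
    return parsed
```

If `os.getenv("OPENAI_API_KEY")` still returns `None` afterwards, the `.env` file was not found or the variable name in it is misspelled.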

In [1]:
# import API KEY

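import os                           # operating system library (environment variables)
from dotenv import load_dotenv      # loads variables from a .env file


load_dotenv()                       # read .env into os.environ (existing variables are not overwritten)


OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")   # None if the variable is not set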

## 2. Testing GPT4 from LangChain

Before building the pipeline, we test the connection from LangChain to the GPT-4 Turbo model with a simple question.

In [4]:
from langchain_openai.chat_models import ChatOpenAI   # LangChain connection to OpenAI

model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-4-turbo")

response = model.invoke("Who is Apple's CEO?")   # returns an AIMessage

response.content   # text of the model's answer

"As of my last update in 2023, Apple's CEO is Tim Cook. He has been in this position since August 2011, following the resignation of Apple's co-founder Steve Jobs."

## 3. Loading PDF file

In [11]:
os.listdir("../pdfs")

['_10-K-Q4-2023-As-Filed.pdf']

In [5]:
from langchain_community.document_loaders import PyPDFDirectoryLoader

In [12]:
# load every PDF in ../pdfs/, producing one Document per page

loader = PyPDFDirectoryLoader("../pdfs/")

pages = loader.load()

In [13]:
len(pages)

80

In [16]:
pages[0]  # first PDF page

Document(page_content='UNITED STATES\nSECURITIES AND EXCHANGE COMMISSION\nWashington, D.C. 20549\nFORM 10-K\n(Mark One)\n☒    ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\nFor the fiscal year ended September\xa030, 2023\nor\n☐    TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\nFor the transition period from \xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0  to \xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 .\nCommission File Number: 001-36743\nApple Inc.\n(Exact name of Registrant as specified in its charter)\nCalifornia 94-2404110\n(State or other jurisdiction\nof incorporation or organization)(I.R.S. Employer Identification No.)\nOne Apple Park Way\nCupertino , California 95014\n(Address of principal executive offices) (Zip Code)\n(408) 996-1010\n(Registrant’s telephone number, including area code)\nSecurities registered pursuant to Section 12(b) of the Act:\nTitle of each classTrading \nsymbol(s) Name

## 4. Chunks

In [17]:
chunks = loader.load_and_split()   # reload the PDFs and split the pages with LangChain's default RecursiveCharacterTextSplitter

In [18]:
len(chunks)

106

In [22]:
chunks[55]

Document(page_content='The gross fair values of the Company’s derivative assets and liabilities as of September\xa024, 2022  were as follows (in millions):\n2022\nFair Value of\nDerivatives Designated\nas Accounting HedgesFair Value of\nDerivatives Not Designated\nas Accounting HedgesTotal\nFair Value\nDerivative assets (1):\nForeign exchange contracts $ 4,317 $ 2,819 $ 7,136 \nDerivative liabilities (2):\nForeign exchange contracts $ 2,205 $ 2,547 $ 4,752 \nInterest rate contracts $ 1,367 $ — $ 1,367 \n(1) Derivative assets are measured using Level 2 fair value inputs and are included in other current assets and other non-\ncurrent assets in the Consolidated Balance Sheet.\n(2) Derivative liabilities are measured using Level 2 fair value inputs and are included in other current liabilities and other non-\ncurrent liabilities in the Consolidated Balance Sheet.\nThe derivative assets above represent the Company’s gross credit exposure if all counterparties failed to perform. To mitigate
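`load_and_split()` turns the 80 pages into 106 chunks because pages longer than the splitter's size limit are cut into several pieces. LangChain's default `RecursiveCharacterTextSplitter` tries to break on paragraph, line and word boundaries before falling back to single characters. The sketch below shows only the simpler core idea, fixed-size windows with overlap, so that text near a boundary appears in both neighbouring chunks. The function `split_with_overlap` and its default sizes are hypothetical and do not reproduce LangChain's output.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Hypothetical fixed-size splitter: windows of `chunk_size` characters,
    each starting `chunk_size - chunk_overlap` characters after the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

As with `split_documents`, such a splitter would be applied to each page separately, so a short page stays a single chunk while a long one becomes several.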

## 5. Embedding Model

In [24]:
from langchain_openai.embeddings import OpenAIEmbeddings

vectorizer = OpenAIEmbeddings()   # embedding model; reads OPENAI_API_KEY from the environment
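`OpenAIEmbeddings()` is created without arguments here; it picks up `OPENAI_API_KEY` from the environment variables loaded by `load_dotenv()`, and each text it embeds is sent to OpenAI's embedding endpoint, which returns a fixed-length vector of floats. To make the idea concrete without an API call, the sketch below uses a toy hashed bag-of-words embedding. The function `toy_embed` and its dimension are hypothetical: it captures only word overlap, not the meaning that a trained model such as OpenAI's encodes. Like OpenAI's embeddings, its output is scaled to unit length.

```python
import hashlib
import math
import re


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical toy embedding: hashed bag-of-words, L2-normalised.

    Each lowercase word is hashed with a stable hash (SHA-256) into one of
    `dim` buckets; the bucket counts are then scaled to unit length.
    """
    vector = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vector[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector  # empty text stays all zeros
```

Texts that share words get a larger dot product; a real embedding model also scores paraphrases with no words in common as similar, which is why the notebook uses OpenAI's model.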

## 6. Store in ChromaDB

In [28]:
from langchain_community.vectorstores import Chroma

# embed every chunk with the vectorizer and save vectors, texts and metadata to ../chroma_db
chroma_db = Chroma.from_documents(chunks, vectorizer, persist_directory="../chroma_db")
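`Chroma.from_documents()` embeds every chunk and stores the vectors together with the chunk text and metadata under `../chroma_db`, so the embeddings do not have to be recomputed (and paid for again) in the next session. The sketch below mimics that persist-and-reload cycle with a hypothetical `TinyVectorStore` that writes a JSON file to a directory; it illustrates the idea only and is not Chroma's on-disk format.

```python
import json
from pathlib import Path


class TinyVectorStore:
    """Hypothetical in-memory vector store that can persist itself as JSON."""

    FILE_NAME = "store.json"

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, text: str, vector: list[float]) -> None:
        """Keep the chunk text next to its embedding vector."""
        self.texts.append(text)
        self.vectors.append(vector)

    def persist(self, directory: str) -> None:
        """Write all texts and vectors to <directory>/store.json."""
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)
        payload = {"texts": self.texts, "vectors": self.vectors}
        (path / self.FILE_NAME).write_text(json.dumps(payload), encoding="utf-8")

    @classmethod
    def load(cls, directory: str) -> "TinyVectorStore":
        """Rebuild a store from a directory written by persist()."""
        payload = json.loads((Path(directory) / cls.FILE_NAME).read_text(encoding="utf-8"))
        store = cls()
        store.texts = payload["texts"]
        store.vectors = payload["vectors"]
        return store
```

The reload step corresponds to section 7, where `Chroma(persist_directory=..., embedding_function=vectorizer)` reopens the saved database instead of embedding the chunks again.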

## 7. Load from storage

In [42]:
query = "Which is the I.R.S. Employer Identification No.?"

chroma_db = Chroma(persist_directory="../chroma_db", embedding_function=vectorizer)   # reopen the saved database

docs = chroma_db.similarity_search(query, k=10)   # the 10 chunks closest to the query

len(docs)

10

In [43]:
docs[0]

Document(page_content='(d)being or having been a “10-percent shareholder” of the Company as defined in Section 871(h)(3) of \nthe Internal Revenue Code of 1986, as amended (the “Code”);\n(e)being a controlled foreign corporation that is related to the Company within the meaning of Section \n864(d)(4) of the Code; or\n(f)being a bank receiving payments on an extension of credit made pursuant to a loan agreement \nentered into in the ordinary course of its trade or business;\n(2)to any holder that is not the sole beneficial owner of the Notes, or a portion of the Notes, or that is a \nfiduciary, partnership or limited liability company, but only to the extent that a beneficial owner with \nrespect to the holder, a beneficiary or settlor with respect to the fiduciary, or a beneficial owner or \nmember of the partnership or limited liability company would not have been entitled to the payment of \nan additional amount had the beneficiary, settlor, beneficial owner or member received direct
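`similarity_search(query, k=10)` embeds the query with the same `vectorizer` and returns the 10 stored chunks whose vectors are closest to it. Closeness is measured in embedding space rather than by matching the exact phrase: in the output above, the top hit discusses the Internal Revenue Code, while the Employer Identification Number itself (94-2404110) appears on the first page. As far as we understand, Chroma's default distance is squared L2; for unit-length vectors such as OpenAI's embeddings this ranks results in the same order as cosine similarity, because ‖a − b‖² = 2 − 2 cos(a, b). The sketch below is a brute-force version of that top-k search; `top_k` and `cosine` are hypothetical helpers, and Chroma itself uses an approximate HNSW index rather than scanning every vector.

```python
import math


def top_k(query_vec: list[float], vectors: list[list[float]], k: int = 10) -> list[int]:
    """Hypothetical brute-force search: indices of the k vectors with the
    smallest squared L2 distance to query_vec, closest first."""
    def sq_l2(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    ranked = sorted(range(len(vectors)), key=lambda i: sq_l2(query_vec, vectors[i]))
    return ranked[:k]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
```

With unit vectors, sorting by `top_k` and sorting by decreasing `cosine` give the same order, so the choice between the two metrics does not change which chunks are retrieved.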