<a href="https://colab.research.google.com/github/candenizkocak/ChatGPT-for-Your-PDF/blob/main/ChatGPT_PDF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using ChatGPT to Answer Questions from PDF: ING Global Outlook 2024

# Installing and Importing Libraries

In [1]:
!pip install langchain
!pip install openai
!pip install PyPDF2
!pip install faiss-cpu
!pip install tiktoken

Collecting faiss-cpu
  Downloading faiss_cpu-1.7.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.7.4
Collecting tiktoken
  Downloading tiktoken-0.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Installing collected packages: tiktoken
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
Successfully installed tiktoken-0.5.2


In [2]:
from PyPDF2 import PdfReader

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
# Only FAISS is used in this notebook; the other vector stores are not needed.
from langchain.vectorstores import FAISS

from google.colab import userdata

In [3]:
import os
os.environ["OPENAI_API_KEY"] = userdata.get('openai')

In [4]:
# Mount Google Drive so the notebook can read the PDF stored there.
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/MyDrive/"

Mounted at /content/gdrive


In [6]:
# Open the PDF from its location on the mounted Google Drive.
reader = PdfReader('/content/gdrive/MyDrive/pdf/ING_Global_Outlook_2024_3.pdf')

In [7]:
# Extract the text of every page and concatenate it into raw_text.
# Pages with no extractable text (for example, image-only pages) are skipped.
raw_text = ''
for page in reader.pages:
    text = page.extract_text()
    if text:
        raw_text += text

In [8]:
# Split the extracted text into smaller, overlapping chunks so that retrieval and prompting stay within the model's token limits.

text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
texts = text_splitter.split_text(raw_text)

In [9]:
len(texts)

97
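
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping chunking using a plain sliding window over characters. It is a simplification and not LangChain's implementation: `CharacterTextSplitter` first splits on the separator and then merges the pieces, so its chunk boundaries fall on line breaks. The function name `sliding_chunks` and the sample string are hypothetical.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence
    that straddles a boundary still appears whole in one of the chunks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Made-up sample text of 2,500 characters (not taken from the PDF).
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(sample)
print(len(chunks), [len(c) for c in chunks])
```

With the values used in this notebook, each window starts 800 characters after the previous one, which is why neighbouring chunks repeat 200 characters of context.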

In [10]:
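# Create the OpenAI embeddings client. No vectors are computed yet;
# the API is called when the FAISS index is built in the next cell.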
embeddings = OpenAIEmbeddings()

In [11]:
docsearch = FAISS.from_texts(texts, embeddings)
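
As a rough illustration of what the index does at query time, the sketch below embeds a few strings, embeds the query the same way, and returns the chunks whose vectors are closest. Everything here is an assumption made for illustration: `toy_embed` is a hypothetical word-count embedding standing in for the OpenAI model, the brute-force Euclidean search stands in for FAISS, and the sample strings are made up rather than taken from the report.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Hypothetical embedding: count how often each vocabulary word occurs."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def l2_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(query, chunks, vocabulary, k=4):
    """Return the k chunks whose vectors lie closest to the query vector."""
    query_vec = toy_embed(query, vocabulary)
    scored = [(l2_distance(query_vec, toy_embed(c, vocabulary)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return [chunk for _, chunk in scored[:k]]


# Made-up toy data, not text from the PDF.
vocabulary = ["china", "italy", "growth", "bonds"]
toy_chunks = [
    "china growth stays subdued",
    "italy bonds spread narrows",
    "growth in china and italy",
]
top = nearest_chunks("china growth", toy_chunks, vocabulary, k=2)
print(top)
```

A real embedding model captures meaning rather than exact word overlap, but the retrieval step follows the same pattern: embed, compare, keep the closest few.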

In [12]:
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

In [13]:
chain = load_qa_chain(OpenAI(), chain_type="stuff")
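
The `"stuff"` chain type places all retrieved chunks into a single prompt and sends that prompt to the model in one call. The sketch below shows the idea only; `build_stuff_prompt` is a hypothetical helper, and the prompt wording is illustrative rather than LangChain's actual template.

```python
def build_stuff_prompt(chunks, question):
    """Join every retrieved chunk into one context block, then append the question."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up chunks standing in for the results of a similarity search.
prompt = build_stuff_prompt(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "What will China face in 2024?",
)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the combined chunks fit in the model's context window, which is one reason the text was split into chunks of about 1,000 characters earlier.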

# Asking Questions

In [14]:
query = "What will China face in 2024?"
docs = docsearch.similarity_search(query)
chain.run(input_documents=docs, question=query)

' China will continue its deleveraging adjustment in 2024, which will keep growth subdued for a time, but may help it avoid potentially larger crises in the future. The property market sector is long overdue for deleveraging, and the government seems to be of the same mind.'

In [15]:
query = "What does the document say about Italy?"
docs = docsearch.similarity_search(query)
chain.run(input_documents=docs, question=query)

" The document says that the big challenge for Italy in 2024 will be increasing its potential output to help stabilise the debt-to-GDP ratio. It also states that the key yield spread between 10-year Italian government bonds and their German counterparts has recently narrowed to 165bp after Moody's raised the outlook of Italy's rating to stable. The document also states that Italian issuance will pick up again in 2024, on higher gross and net bond supply compared to this year. Finally, it states that there is a possibility of re-widening in Italian spreads in the eurozone as the year progresses."

In [17]:
query = "What is the outlook for Central and Eastern Europe?"
docs = docsearch.similarity_search(query)
chain.run(input_documents=docs, question=query)

' We expect a strong economic recovery across the region, in line with the consensus. However, downside risks remain due to developments abroad and external demand. And we can see that the market consensus for GDP growth in the region for next year has been gradually shifting lower of late. In any case, we expect the recovery will be fragile and may thwart initial optimism about the positive outlook for next year.'
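
Each question above repeats the same two steps: retrieve the most similar chunks, then run the chain on them. A small wrapper can remove the repetition. This is a sketch under stated assumptions: `ask` is a hypothetical helper, and `FakeSearch` and `FakeChain` are stand-ins that exist only so the example runs without an API key. In the notebook you would pass the real `docsearch` and `chain` objects instead.

```python
def ask(question, docsearch, chain, k=4):
    """Retrieve the k most similar chunks, then answer the question from them."""
    docs = docsearch.similarity_search(question, k=k)
    return chain.run(input_documents=docs, question=question)


class FakeSearch:
    """Stand-in for the FAISS store: returns a single placeholder chunk."""

    def similarity_search(self, query, k=4):
        return ["chunk about " + query][:k]


class FakeChain:
    """Stand-in for the QA chain: reports what it was given."""

    def run(self, input_documents, question):
        return f"{len(input_documents)} document(s) used for: {question}"


answer = ask("What will China face in 2024?", FakeSearch(), FakeChain())
print(answer)
```

In the notebook, the equivalent call would be `ask("What does the document say about Italy?", docsearch, chain)`.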