# Problem Statement

Are you tired of sifting through stacks of PDF files, struggling to find the information you need?

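LangChain is a framework for building applications on top of large language models, and it makes information retrieval across multiple PDF files straightforward. In this project, we use LangChain with OpenAI models to embed multiple PDFs, store the embeddings in a vector database, and ask questions about their contents.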

## Install the necessary libraries

In [1]:
!pip install langchain unstructured openai chromadb Cython tiktoken pypdf

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting langchain
  Downloading langchain-0.0.192-py3-none-any.whl (989 kB)
Collecting unstructured
  Downloading unstructured-0.7.1-py3-none-any.whl (1.3 MB)
Collecting openai
  Downloading openai-0.27.8-py3-none-any.whl (73 kB)
Collecting chromadb
  Downloading chromadb-0.3.26-py3-none-any.whl (123 kB)
Collecting tiktoken
  Downloading tiktoken-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2
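
Since the install log above is cut off, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the pip command above.
PACKAGES = ["langchain", "unstructured", "openai", "chromadb", "tiktoken", "pypdf"]


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so missing installs stand out.
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

A package reported as `not installed` would need the install cell to be re-run before continuing.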

## Necessary Imports

In [35]:
import os
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.llms import OpenAI

In [39]:
pdf_folder_path ="/content/drive/MyDrive/Documents"
print(os.listdir(pdf_folder_path))

['The Courage to be Disliked_ How to Change Your Life and Achieve Real Happiness ( PDFDrive ).pdf', 'The Manipulated Man ( PDFDrive ).pdf']
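
The listing above shows every file in the folder, whatever its type. As a rough sketch, a small helper can check which entries are PDFs before the folder is handed to the loader. The function name `list_pdfs` is hypothetical, and the demo runs on a throwaway temporary folder rather than the Drive path used in this notebook.

```python
import tempfile
from pathlib import Path


def list_pdfs(folder):
    """Return the sorted names of files in `folder` ending in .pdf (case-insensitive)."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Demo on a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.pdf", "B.PDF", "notes.txt"]:
        (Path(tmp) / name).touch()
    demo = list_pdfs(tmp)
    print(demo)
```

In the notebook itself, the same helper could be called as `list_pdfs(pdf_folder_path)`.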


In [40]:
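# Replace the placeholder with your own OpenAI API key.
os.environ["OPENAI_API_KEY"] = "Your Openai Key"

# Load every PDF in the folder; each page becomes one document.
loader = PyPDFDirectoryLoader(pdf_folder_path)
docs = loader.load()

# Embed the pages and store the vectors in a Chroma database persisted in the current directory.
embeddings = OpenAIEmbeddings()
vectordb = Chroma.from_documents(docs, embedding=embeddings,
                                 persist_directory=".")
vectordb.persist()

# Keep the chat history so follow-up questions can refer to earlier turns.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
pdf_qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0.8), vectordb.as_retriever(), memory=memory)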
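
The cell above embeds the loaded documents and stores the vectors in Chroma; at query time, the retriever returns the stored documents whose vectors lie closest to the query's vector. Here is a minimal sketch of that nearest-neighbor step in plain Python. Everything in it is an assumption made for illustration: the three-dimensional vectors are hand-made stand-ins for real OpenAI embeddings, the labels and the `top_k` helper are hypothetical, and cosine similarity is used for simplicity, whereas the metric a real vector store applies depends on its configuration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, store, k=2):
    """Return the labels of the k stored vectors most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [label for label, _ in ranked[:k]]


# Toy "vector store": (label, embedding) pairs with made-up values.
store = [
    ("book A, page 1", [0.9, 0.1, 0.0]),
    ("book A, page 2", [0.7, 0.3, 0.1]),
    ("book B, page 1", [0.0, 0.2, 0.9]),
]

# A made-up query embedding that points in roughly the same direction as book A.
query_vec = [0.8, 0.2, 0.0]
print(top_k(query_vec, store))
```

In the real chain, the retrieved documents are passed to the language model as context for answering the question.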

In [42]:
query = "what do I need to know about the manipulated man"
result = pdf_qa({"question": query})
print("Answer:")
result["answer"]



Answer:


" The Manipulated Man is a 1971 book by Esther Vilar that explores the idea of man being manipulated by women through social systems like the Church, nonconformist sects, and other religious communities. The book also discusses man's fear of losing ground at the sexual or physical level due to the women's movement, women's higher life-expectancy and majority of voters in industrialized countries, and male fear of re-evaluating their position. It further looks at how women manipulate men using rules that don't apply to them, and how they make their own work appear degrading and contemptible."