# Chat PDF



In [None]:
!pip install langchain
!pip install huggingface-hub



In [None]:
!pip install chromadb
!pip install sentence-transformers



In [None]:
!pip install pypdf




Building a PDF chatbot with LangChain requires the following components:
* Document loader: loads a data format (here, PDF) and creates document objects
* Chunking: splits the documents into smaller chunks using text splitters
* Embedding: converts each chunk into a vector
* Vector store: stores and indexes the embedded chunks (here, Chroma)
* LLM: the language model used for question answering and summarization
* Document retriever: retrieves the chunk(s) of the PDF most relevant to the query






# Importing Libraries

In [None]:
import os
import getpass

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceHubEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import HuggingFaceHub
from langchain.chains import RetrievalQA

In [None]:
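# Load the API key; HUGGINGFACEHUB_API_TOKEN is the variable LangChain reads by default
token = getpass.getpass('Hugging Face Api Key:')
os.environ['HUGGING_FACE_HUB_API_KEY'] = token
os.environ['HUGGINGFACEHUB_API_TOKEN'] = token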

Hugging Face Api Key:··········
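
Re-running the cell above prompts for the key every time. A minimal sketch of a helper that prompts only when the key is not already in the environment is shown below. The function name `get_hf_token` is hypothetical, and the default variable name `HUGGINGFACEHUB_API_TOKEN` assumes the LangChain Hugging Face Hub wrappers are the consumers of the key.

```python
import os
import getpass


def get_hf_token(env_var="HUGGINGFACEHUB_API_TOKEN", prompt="Hugging Face Api Key:"):
    """Return the token from the environment, prompting only if it is missing."""
    token = os.environ.get(env_var)
    if not token:
        # getpass hides the typed characters in the notebook output
        token = getpass.getpass(prompt)
        os.environ[env_var] = token  # cache it for later cells
    return token
```

Calling `get_hf_token()` a second time in the same session returns the cached value without prompting.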


# Reading the PDF and creating the vector store

---



In [None]:
path = input('Enter pdf file path:')
loader = PyPDFLoader(path)
pages = loader.load()  # one Document per PDF page

Enter pdf file path:/content/drive/MyDrive/RAG/glms.pdf
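
Because the path is typed by hand, it can help to check it before handing it to the loader. The sketch below is one possible check, not part of the original notebook: `validate_pdf_path` is a hypothetical helper, and it assumes the file starts with the `%PDF-` header (some valid PDFs carry a few bytes before the header and would be rejected by this strict test).

```python
from pathlib import Path


def validate_pdf_path(raw_path):
    """Return a Path if raw_path points to a file that starts with the PDF header."""
    path = Path(raw_path.strip()).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    with path.open("rb") as fh:
        header = fh.read(5)
    if header != b"%PDF-":
        raise ValueError(f"Not a PDF (missing %PDF- header): {path}")
    return path
```

The returned path can then be passed on as `PyPDFLoader(str(validate_pdf_path(path)))`.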


In [None]:
len(pages)

11

In [None]:
pages[0]

Document(page_content='Linear regression, Logistic regression,\nand Generalized Linear Models\nDavid M. Blei\nColumbia University\nNovember 18, 2014\n1 Linear Regression\nLinear regression helps solve the problem of predicting a real-valued variable y,\ncalled the response , from a vector of inputs x, called the covariates .\nThe goal is to predict yfromxwith a linear function. Here is a picture.\n[one covariate and a response, and the best ﬁt line]\nHere are some examples.\n\x0fGiven the stock price today, what will it be tomorrow?\n\x0fGiven today’s precipitation, what will it be in a week?\n\x0fGiven my mother’s height, what is my shoe size?\n\x0fOthers? Where have you seen linear regression?\nIn the literature, we assume there are pcovariates and we ﬁt a linear function to\npredict the response,\nf.x/Dˇ0CpX\niD1ˇixi: (1)\nThe vectorˇcontains thepcoefﬁcients ;ˇ0is the intercept .\nThis set-up is less limiting than you might imagine. The covariates can be\nﬂexible. Examples:\n\x0fAny

In [None]:
# Split pages into chunks of at most 500 characters, with 20 characters of overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
docs = splitter.split_documents(pages)
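
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch using a fixed sliding window over characters. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks and treats `chunk_size` as an upper bound); the function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=20):
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(1200))
chunks = sliding_window_chunks(text)
print([len(c) for c in chunks])          # [500, 500, 240]
print(chunks[0][-20:] == chunks[1][:20])  # True: the overlap is shared
```

The overlap lets a sentence that straddles a boundary appear in both neighbouring chunks, at the cost of storing a few characters twice.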

In [None]:
docs


[Document(page_content='Linear regression, Logistic regression,\nand Generalized Linear Models\nDavid M. Blei\nColumbia University\nNovember 18, 2014\n1 Linear Regression\nLinear regression helps solve the problem of predicting a real-valued variable y,\ncalled the response , from a vector of inputs x, called the covariates .\nThe goal is to predict yfromxwith a linear function. Here is a picture.\n[one covariate and a response, and the best ﬁt line]\nHere are some examples.', metadata={'source': '/content/drive/MyDrive/RAG/glms.pdf', 'page': 0}),
 Document(page_content='\x0fGiven the stock price today, what will it be tomorrow?\n\x0fGiven today’s precipitation, what will it be in a week?\n\x0fGiven my mother’s height, what is my shoe size?\n\x0fOthers? Where have you seen linear regression?\nIn the literature, we assume there are pcovariates and we ﬁt a linear function to\npredict the response,\nf.x/Dˇ0CpX\niD1ˇixi: (1)\nThe vectorˇcontains thepcoefﬁcients ;ˇ0is the intercept .\nThis 

In [None]:
from huggingface_hub import notebook_login
notebook_login()


In [None]:
# Embed each chunk through the Hugging Face Hub and index the vectors in Chroma
embeddings = HuggingFaceHubEmbeddings()
doc_search = Chroma.from_documents(docs, embeddings)

In [None]:
query = 'What is linear regression?'
similar_docs = doc_search.similarity_search(query, k=3)  # the 3 chunks closest to the query
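
Conceptually, a similarity search embeds the query and returns the `k` stored chunks whose vectors are closest to it. The sketch below illustrates that idea in plain Python under loud assumptions: the bag-of-words `embed` function stands in for a real embedding model, cosine similarity is chosen for illustration, and the search is a brute-force scan rather than the index a vector store such as Chroma maintains.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: sparse word-count vector (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_search(query, chunks, k=3):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "linear regression predicts a real-valued response",
    "logistic regression predicts a binary response",
    "the stock price today",
]
print(similarity_search("what is linear regression", chunks, k=1))
```

A dense embedding model captures meaning beyond exact word overlap, which is why the notebook uses one instead of word counts.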

In [None]:
similar_docs

[Document(page_content='Its simplicity and ﬂexibility makes linear regression one of the most important\nand widely used statistical prediction methods. There are courses and sequences\nof courses devoted to linear regression.\n1.1 Fitting a regression\nGiven dataf.xn;yn/gN\nnD1, ﬁnd the coefﬁcients ˇthat best predict ynewfrom\nxnew. For simplicity, assume that xnis a scalar and the intercept ˇ0is zero.\n(In general we can assume ˇ0D0by centering the response variables before\nanalyzing them.) There is only one coefﬁcient ˇ.', metadata={'page': 1, 'source': '/content/drive/MyDrive/RAG/glms.pdf'}),
 Document(page_content='\x0fGiven the stock price today, what will it be tomorrow?\n\x0fGiven today’s precipitation, what will it be in a week?\n\x0fGiven my mother’s height, what is my shoe size?\n\x0fOthers? Where have you seen linear regression?\nIn the literature, we assume there are pcovariates and we ﬁt a linear function to\npredict the response,\nf.x/Dˇ0CpX\niD1ˇixi: (1)\nThe vectorˇco

# Creating a chain with the LLM

In [None]:
repo_id = "tiiuae/falcon-7b"
llm = HuggingFaceHub(huggingfacehub_api_token=os.environ["HUGGING_FACE_HUB_API_KEY"],
                   repo_id=repo_id,
                   model_kwargs={'temperature': 0.2, 'max_length': 1000})



In [None]:
# 'stuff' places all retrieved chunks into a single prompt for the LLM
retrieval_chain = RetrievalQA.from_chain_type(llm, chain_type='stuff', retriever=doc_search.as_retriever())
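
The `stuff` chain type concatenates the retrieved chunks and places them in one prompt together with the question. A rough sketch of that assembly follows. The instruction sentence matches the prompt text visible in this notebook's output; the `Question:` and `Helpful Answer:` lines are an assumption about the rest of the template, and `build_stuff_prompt` is a hypothetical helper, not a LangChain function.

```python
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"   # assumed suffix, not shown in the notebook output
    "Helpful Answer:"
)


def build_stuff_prompt(chunks, question):
    """Join all retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return STUFF_TEMPLATE.format(context=context, question=question)


prompt = build_stuff_prompt(
    ["Linear regression predicts a real-valued response.",
     "Logistic regression predicts a binary response."],
    "What is linear regression?",
)
print(prompt)
```

Because every retrieved chunk goes into a single prompt, the number and size of the chunks must fit within the model's context window.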

In [None]:
query='what is the difference between linear and logistic regression?'
retrieval_chain.run(query)
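
With a plain text-generation model such as `tiiuae/falcon-7b`, the returned string can contain the prompt followed by the completion rather than the answer alone. A small sketch for keeping only the answer part is given below, assuming the prompt ends with a `Helpful Answer:` marker; both the marker and the helper name `extract_answer` are assumptions.

```python
def extract_answer(generated, marker="Helpful Answer:"):
    """Return the text after the last marker, or the whole string if the marker is absent."""
    _, found, tail = generated.rpartition(marker)
    return tail.strip() if found else generated.strip()


print(extract_answer("Some context\n\nQuestion: q\nHelpful Answer: It fits a line."))
```

It could be applied as `extract_answer(retrieval_chain.run(query))`.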



In [None]:
query2='assumptions'
retrieval_chain.run(query2)

"Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nrelevant\nresponses.\n\n18\n\nissues, and facilitate debugging. \n \n4.1.9. Scalability \n\uf0b7 The project is designed with scalability in mind to accommodate a growing user \nbase and increasing demand for generated courses. \n \n4.1.10. Testing and Quality Assurance \n\uf0b7 Various testing stages, including unit testing, integration testing, and user testing, \nensure that the system is reliable, efficient, and free from defects. \n \n4.1.11. Maintenance and Updates\n\n10 \n content, ensuring accessibility and data redundancy. \n \n4.1.7. Security and Authorization \n\uf0b7 Robust security measures, including encryption and authentication, protect user \ndata and ensure secure access to the system. \n\uf0b7 Authorization mechanisms control data privacy and access, ensuring that users \ncan access only the data and fu