# LangChain: Q&A over Documents

This notebook builds a question-answering tool over a document with LangChain. A typical use is querying a product catalog for items of interest; here the document is a CV stored as a PDF.

In [None]:
#pip install --upgrade langchain

In [1]:
import os
os.environ["OPENAI_API_KEY"] = "api key"  # placeholder; replace with your own key

In [9]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader, PyPDFLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown

In [10]:
file = 'updated_cv.pdf'
loader = PyPDFLoader(file_path=file)

In [11]:
from langchain.indexes import VectorstoreIndexCreator

In [5]:
#pip install docarray

In [12]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [13]:
query = "who is deepak"

In [14]:
response = index.query(query)

In [15]:
display(Markdown(response))

 Deepak is a student from Dhanbad, Jharkhand who is seeking an entry-level position to begin his career in a high-level professional environment. He has a B.Tech in Electronics and Communication Engineering and skills in C++, Digital Electronics, Embedded and Robotics, Javascript, React.Js, and Node.Js. He also has hobbies such as playing chess, being a quick learner, having positive thinking, working in teams, and being responsible and sincere.

In [16]:
loader = PyPDFLoader(file_path=file)

In [17]:
page = loader.load()

In [18]:
page[0]

Document(page_content='DEEP AK JAIS WAL\nNear sitla mandir, H.E. School Road, Vistipara, Hirapur, Dhanbad,\nJharkhandsj.deepak.jaiswal@gmail.com\n9304161106\nDOB 01/10/1997\nin\nhttps://www.linkedin.com/in/deepak-\njaiswal-34b0b3174\nObjective Seeking an entry-level position to begin my career in a high-level professional\nenvironment.\nEducation\nSkills c++\nDigital Electronics\nEmbedded and Robotics\nJavascript\nReact.Js\nNode.Js\nProjects\nHobbies\nPersonal\nStrengthsUniversity College of engineering and technology\nB.Tech (Electronics and communication engineering)\n2019 — 7.6\nIndian school of Learning\nIntermediate\n2015 — 82%\nIndian school of Learning\nMatriculation\n2013 — 8 CGPA\nLine following land rover\nWhen robot is placed on the ﬁxed path,it follows the path b y detecting the\nline. The robot direction of motion depends on the two sensors outputs.\nWhen the two sensors are on the line of path, robot moves forward. If the left\nsensor moves awa y from the line, robot move

In [19]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [20]:
embed = embeddings.embed_query("Hi my name is Harrison")

In [21]:
print(len(embed))

1536


In [22]:
print(embed[:5])

[-0.02186359278857708, 0.006734037306159735, -0.01820078119635582, -0.03919587284326553, -0.014047075994312763]
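
Each embedding is a list of floats (1536 of them here), and texts with similar meaning should map to vectors that point in similar directions. A common way to compare two such vectors is cosine similarity. The sketch below is only an illustration: it uses made-up 3-dimensional vectors instead of real embeddings, and `cosine_similarity` is a helper written for this example, not a LangChain function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real 1536-dimensional embeddings.
v_query = [1.0, 0.0, 1.0]
v_close = [2.0, 0.0, 2.0]  # same direction as v_query, different length
v_far = [0.0, 1.0, 0.0]    # orthogonal to v_query

print(cosine_similarity(v_query, v_close))  # close to 1: same direction
print(cosine_similarity(v_query, v_far))    # 0: nothing in common
```

Because the score depends only on direction, a longer vector pointing the same way still counts as a near-perfect match.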


In [23]:
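# Index the documents loaded from the PDF in cell 17 (`page`).
# `docs` is not defined until the similarity search below.
db = DocArrayInMemorySearch.from_documents(
    page,
    embeddings
)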

In [24]:
query = "qualification of deepak"

In [25]:
docs = db.similarity_search(query)

In [26]:
len(docs)

1

In [27]:
docs[0]

Document(page_content='DEEP AK JAIS WAL\nNear sitla mandir, H.E. School Road, Vistipara, Hirapur, Dhanbad,\nJharkhandsj.deepak.jaiswal@gmail.com\n9304161106\nDOB 01/10/1997\nin\nhttps://www.linkedin.com/in/deepak-\njaiswal-34b0b3174\nObjective Seeking an entry-level position to begin my career in a high-level professional\nenvironment.\nEducation\nSkills c++\nDigital Electronics\nEmbedded and Robotics\nJavascript\nReact.Js\nNode.Js\nProjects\nHobbies\nPersonal\nStrengthsUniversity College of engineering and technology\nB.Tech (Electronics and communication engineering)\n2019 — 7.6\nIndian school of Learning\nIntermediate\n2015 — 82%\nIndian school of Learning\nMatriculation\n2013 — 8 CGPA\nLine following land rover\nWhen robot is placed on the ﬁxed path,it follows the path b y detecting the\nline. The robot direction of motion depends on the two sensors outputs.\nWhen the two sensors are on the line of path, robot moves forward. If the left\nsensor moves awa y from the line, robot move
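
Only one document comes back here, presumably because the PDF loaded as a single page-level document, so the store has nothing else to return. Conceptually, a similarity search embeds the query, scores every stored vector against it, and returns the best `k` matches. The sketch below shows that idea in plain Python. It is not how `DocArrayInMemorySearch` is implemented; the store contents, the toy vectors, and the `top_k` helper are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=4):
    """Return up to k (text, score) pairs, highest similarity first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical store of (text, toy embedding) pairs.
store = [
    ("education section", [0.9, 0.1, 0.0]),
    ("hobbies section", [0.0, 0.2, 0.9]),
    ("projects section", [0.1, 0.9, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # toy embedding of the query

results = top_k(query_vec, store, k=2)
print([text for text, _ in results])

# With a single stored item, asking for k=4 still yields one result.
print(len(top_k(query_vec, store[:1], k=4)))
```

Splitting a long document into smaller chunks before indexing would give the search more than one candidate to rank.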

In [28]:
retriever = db.as_retriever()

In [29]:
llm = ChatOpenAI(temperature=0.0)


In [30]:
qdocs = "".join(doc.page_content for doc in docs)


In [31]:
response = llm.call_as_llm(f"{qdocs} Question: who is deepak.") 


In [32]:
display(Markdown(response))

Deepak Jaiswal is a recent graduate with a Bachelor's degree in Electronics and Communication Engineering. He is seeking an entry-level position to begin his career in a high-level professional environment. He has skills in C++, Digital Electronics, Embedded and Robotics, Javascript, React.Js, and Node.Js. He has completed projects in line following land rover and cell phone operated land rover. His hobbies include playing chess, and he is a quick learner, positive thinker, team player, responsible, and sincere.

In [33]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
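
The `"stuff"` chain type is the simplest strategy: every retrieved document is placed ("stuffed") into a single prompt together with the question, much like the manual `qdocs` concatenation in cell 30. A rough sketch of that idea follows. The prompt wording and the `build_stuff_prompt` helper are assumptions made for illustration and do not reproduce LangChain's actual template.

```python
def build_stuff_prompt(chunks, question):
    """Join all retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical retrieved chunks standing in for real Document.page_content values.
chunks = [
    "Skills: C++, Javascript",
    "Projects: line following land rover",
]
prompt = build_stuff_prompt(chunks, "list the projects")
print(prompt)
```

This works well when the retrieved text fits in the model's context window; larger document sets need a strategy that processes chunks separately.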

In [34]:
query = "list of project of deepak"

In [35]:
response = qa_stuff.run(query)



> Entering new RetrievalQA chain...

> Finished chain.


In [36]:
display(Markdown(response))

Deepak has two projects listed in the context:
1. Line following land rover: A robot that follows a fixed path by detecting a line. The robot's direction of motion depends on the two sensors' outputs. When the two sensors are on the path, the robot moves forward. If the left sensor moves away from the line, the robot moves towards the right. Similarly, if the right sensor moves away from the path, the robot moves towards its left. Whenever the robot moves away from its path, it is detected by the IR sensor.
2. Cell phone operated land rover: In this project, the robot is controlled by a mobile phone that makes a call to the mobile phone attached to the robot. In the course of a call, if any button is pressed, a tone corresponding to the button pressed is heard at the other end of the call. This tone is called ‘dual-tone multiple-frequency’ (DTMF) tone. The robot perceives this DTMF tone with the help of the phone stacked in the robot.

In [37]:
response = index.query(query, llm=llm)

In [38]:
print(response)

Deepak has two projects listed in the context:

1. Line following land rover: A robot that follows a fixed path by detecting a line. The robot's direction of motion depends on the two sensors' outputs. When the two sensors are on the line of the path, the robot moves forward. If the left sensor moves away from the line, the robot moves towards the right. Similarly, if the right sensor moves away from the path, the robot moves towards its left. Whenever the robot moves away from its path, it is detected by the IR sensor.

2. Cell phone operated land rover: In this project, the robot is controlled by a mobile phone that makes a call to the mobile phone attached to the robot. In the course of a call, if any button is pressed, a tone corresponding to the button pressed is heard at the other end of the call. This tone is called ‘dual-tone multiple-frequency’ (DTMF) tone. The robot perceives this DTMF tone with the help of the phone stacked in the robot.
