# LangChain: Q&A over Documents

An example might be a tool that would allow you to query a product catalog for items of interest.

In [9]:
# !pip install langchain==0.0.179
# !pip install openai==0.27.7
# !pip install protobuf==3.20.0

In [10]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

In [11]:
llm_model = "gpt-3.5-turbo"

In [12]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown
from langchain.llms import OpenAI

In [13]:
file = './datasets/qa_data.csv'
loader = CSVLoader(file_path=file)

In [14]:
from langchain.indexes import VectorstoreIndexCreator

In [15]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [16]:
query ="Please list all your shirts with sun protection \
in a table in markdown and summarize each one."

In [17]:
llm_replacement_model = OpenAI(temperature=0, 
                               model='gpt-3.5-turbo-instruct')

response = index.query(query, 
                       llm = llm_replacement_model)

In [18]:
display(Markdown(response))



| Name | Description | Sun Protection Rating |
| --- | --- | --- |
| Men's Tropical Plaid Short-Sleeve Shirt | Made of 100% polyester, UPF 50+ rating, wrinkle-resistant, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |
| Men's Plaid Tropic Shirt, Short-Sleeve | Made of 52% polyester and 48% nylon, UPF 50+ rating, SunSmart technology, wrinkle-free, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |
| Men's TropicVibe Shirt, Short-Sleeve | Made of 71% nylon and 29% polyester, UPF 50+ rating, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |
| Sun Shield Shirt | Made of 78% nylon and 22% Lycra Xtra Life fiber, UPF 50+ rating, moisture-wicking, abrasion-resistant, fits over swimsuit | SPF 50+, blocks 98% of harmful UV rays |

## Step By Step

In [19]:
from langchain.document_loaders import CSVLoader
loader = CSVLoader(file_path=file)

In [20]:
docs = loader.load()

In [21]:
docs[0]

Document(page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", metadata={'source': './datasets/qa_data.csv', 'row': 0})

In [22]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [23]:
embed = embeddings.embed_query("Hi my name is Harrison")

In [24]:
print(len(embed))

1536


In [25]:
print(embed[:5])

[-0.021867522969841957, 0.006806864403188229, -0.01818099617958069, -0.03910486772656441, -0.014066680334508419]


In [26]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [27]:
query = "Please suggest a shirt with sunblocking"

In [28]:
docs = db.similarity_search(query)

In [29]:
len(docs)

4

In [30]:
docs[0]

Document(page_content=': 255\nname: Sun Shield Shirt by\ndescription: "Block the sun, not the fun – our high-performance sun shirt is guaranteed to protect from harmful UV rays. \n\nSize & Fit: Slightly Fitted: Softly shapes the body. Falls at hip.\n\nFabric & Care: 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Handwash, line dry.\n\nAdditional Features: Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. Imported.\n\nSun Protection That Won\'t Wear Off\nOur high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun\'s harmful rays. This fabric is recommended by The Skin Cancer Foundation as an effective UV protectant.', metadata={'source': './datasets/qa_data.csv', 'row': 255})

In [31]:
retriever = db.as_retriever(search_kwargs={"k": 3})

In [32]:
llm = ChatOpenAI(temperature = 0.0, model=llm_model)

In [33]:
qdocs = "".join([docs[i].page_content for i in range(len(docs))])


In [34]:
response = llm.call_as_llm(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.") 


In [35]:
display(Markdown(response))

| Shirt Name                        | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                

In [36]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)

In [37]:
query =  "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."

In [38]:
response = qa_stuff.run(query)



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


In [39]:
display(Markdown(response))

| Shirt ID | Name                                   | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                

In [40]:
response = index.query(query, llm=llm)

In [41]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])

## Experiment on your own

In [44]:
# Stuff QA

llm = ChatOpenAI(temperature = 0.0, model="gpt-3.5-turbo")
retriever = db.as_retriever(search_kwargs={"k": 3})

qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)

query =  "Please list all of the shirts in the dataset in a table \
in markdown and specify the size for each in a column."

response = qa_stuff.run(query)
display(Markdown(response))



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


| Shirt ID | Shirt Name                        | Size & Fit                     |
|----------|----------------------------------|--------------------------------|
| 345      | T-Shirt                          | Slightly Fitted: Falls at high hip. |
| 248      | Classic Plaid Short-Sleeve Shirt | Slightly Fitted: Relaxed through the chest and sleeve, with a slightly slimmer waist. |
| 109      | Tee                              | Slightly Fitted: Falls at high hip. |

In [45]:
# Map Reduce

llm = ChatOpenAI(temperature = 0.0, model="gpt-3.5-turbo")
retriever = db.as_retriever(search_kwargs={"k": 3})

qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="map_reduce", 
    retriever=retriever, 
    verbose=True
)

query =  "Please list all of the shirts in the dataset in a table \
in markdown and a one sentence description for each."

response = qa_stuff.run(query)
display(Markdown(response))



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


| Shirt Name                                      | Description                                                                                      |
|-------------------------------------------------|--------------------------------------------------------------------------------------------------|
| T-Shirt                                         | Classical Cotton/Modal Scoopneck, Short-Sleeve T-Shirt. Destined to be your favorite cotton t-shirt, with crave-worthy softness and perfect drape. Stand-out prints add even more beauty to the collar-bone-skimming scoopneck style. |
| Classic Plaid Short-Sleeve Shirt                | No matter how warm the days get, this colorful linen shirt will keep you cool and comfortable. |
| Linen Luxe Shirt, Slightly Tailored Short-Sleeve | A lightweight and breathable shirt that's perfect for warm weather. |

In [46]:
# Refine QA

llm = ChatOpenAI(temperature = 0.0, model="gpt-3.5-turbo")
retriever = db.as_retriever(search_kwargs={"k": 6})

qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="map_reduce", 
    retriever=retriever, 
    verbose=True
)

query =  "Please write a two sentence summary of the types of shirts contained within this dataset"

response = qa_stuff.run(query)
print(response)



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m
The dataset contains information about various types of shirts, including a Classical Cotton/Modal Scoopneck T-Shirt, a Classic Plaid Short-Sleeve Shirt, a Men's Tropical Plaid Short-Sleeve Shirt, a Linen Luxe Shirt, and an Ultimate Two-Layer Polo. These shirts are made of different materials such as cotton, modal, linen, and polyester, and feature various designs and functionalities such as scoop necklines, cuffed sleeves, relaxed fits, and sun protection.
