# LangChain: Q&A over Documents

This notebook builds a question-answering tool over documents. An example is a tool that lets you query a product catalog for items of interest.

In [1]:
%pip show langchain 

Name: langchain
Version: 0.2.14
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: f:\documentos\data_science\large language models llm\langchain_library_08_24\.venv\lib\site-packages
Requires: aiohttp, async-timeout, langchain-core, langchain-text-splitters, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: langchain-community
Note: you may need to restart the kernel to use updated packages.


In [4]:
from dotenv import load_dotenv, find_dotenv
import os
import warnings
from IPython.display import display, Markdown  # render the output text as Markdown

warnings.filterwarnings('ignore')
_ = load_dotenv(find_dotenv())  # read local .env file

llm_model = "gpt-3.5-turbo"
# %pip install langchain_community


In [29]:
from langchain.chains.retrieval_qa.base import RetrievalQA  # deprecated
from langchain_openai import ChatOpenAI, OpenAI

from langchain_community.document_loaders.csv_loader import CSVLoader
from langchain_community.vectorstores import DocArrayInMemorySearch
# %pip install docarray

In [24]:
loader = CSVLoader(file_path='resources\\OutdoorClothingCatalog_1000.csv', encoding="utf-8")

data = loader.load()
print(data[2].page_content)

id: 2
name: Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece
description: She'll love the bright colors, ruffles and exclusive whimsical prints of this toddler's two-piece swimsuit! Our four-way-stretch and chlorine-resistant fabric keeps its shape and resists snags. The UPF 50+ rated fabric provides the highest rated sun protection possible, blocking 98% of the sun's harmful rays. The crossover no-slip straps and fully lined bottom ensure a secure fit and maximum coverage. Machine wash and line dry for best results. Imported.


In [26]:
from langchain.indexes import VectorstoreIndexCreator
from langchain_openai import OpenAIEmbeddings
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,  # in-memory store, easy to get started with
    embedding=OpenAIEmbeddings()
).from_loaders([loader])

In [27]:
query = "Please list all your shirts with sun protection in a table in markdown and summarize each one."

In [30]:
llm_replacement_model = OpenAI(temperature=0, model='gpt-3.5-turbo-instruct')

response = index.query(query, llm=llm_replacement_model)

In [31]:
display(Markdown(response))



| ID | Name | Description | Summary |
| --- | --- | --- | --- |
| 618 | Men's Tropical Plaid Short-Sleeve Shirt | Our lightest hot-weather shirt is rated UPF 50+ for superior protection from the sun's UV rays. With a traditional fit that is relaxed through the chest, sleeve, and waist, this fabric is made of 100% polyester and is wrinkle-resistant. With front and back cape venting that lets in cool breezes and two front bellows pockets, this shirt is imported and provides the highest rated sun protection possible. | This shirt is made of 100% polyester and is rated UPF 50+ for superior sun protection. It has a traditional fit and features front and back cape venting and two front bellows pockets. |
| 374 | Men's Plaid Tropic Shirt, Short-Sleeve | Our Ultracomfortable sun protection is rated to UPF 50+, helping you stay cool and dry. Originally designed for fishing, this lightest hot-weather shirt offers UPF 50+ coverage and is great for extended travel. SunSmart technology blocks 98% of the sun's harmful UV rays, while the high-performance fabric is wrinkle-free and

## Step By Step

![](resources/vector_database_working.png)

In [34]:
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path='resources\\OutdoorClothingCatalog_1000.csv', encoding="utf-8")
docs = loader.load()
print(docs[2].page_content)
docs[0]

id: 2
name: Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece
description: She'll love the bright colors, ruffles and exclusive whimsical prints of this toddler's two-piece swimsuit! Our four-way-stretch and chlorine-resistant fabric keeps its shape and resists snags. The UPF 50+ rated fabric provides the highest rated sun protection possible, blocking 98% of the sun's harmful rays. The crossover no-slip straps and fully lined bottom ensure a secure fit and maximum coverage. Machine wash and line dry for best results. Imported.


Document(metadata={'source': 'resources\\OutdoorClothingCatalog_1000.csv', 'row': 0}, page_content="id: 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.")

In [36]:
embeddings = OpenAIEmbeddings()
embed = embeddings.embed_query("Hi my name is Harrison")
print(len(embed))

1536


In [38]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [80]:
query = "Please suggest a shirt with sunblocking"
docs = db.similarity_search(query, k=3)
len(docs)

3
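
As a rough picture of what `similarity_search` does, the sketch below ranks a handful of toy vectors by cosine similarity to a query vector and keeps the top `k`. The vectors and the helper names `cosine_similarity` and `top_k` are made up for illustration, and cosine similarity is assumed as the metric; the real vector store works on the 1536-dimensional embeddings and its implementation may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy 3-dimensional "embeddings" (hypothetical values, not real model output).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [1.0, 0.05, 0.0]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)  # [0, 1]
```

The indices returned play the role of the documents that `db.similarity_search(query, k=3)` hands back.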

In [81]:
print(docs[1].page_content)

id: 374
name: Men's Plaid Tropic Shirt, Short-Sleeve
description: Our Ultracomfortable sun protection is rated to UPF 50+, helping you stay cool and dry. Originally designed for fishing, this lightest hot-weather shirt offers UPF 50+ coverage and is great for extended travel. SunSmart technology blocks 98% of the sun's harmful UV rays, while the high-performance fabric is wrinkle-free and quickly evaporates perspiration. Made with 52% polyester and 48% nylon, this shirt is machine washable and dryable. Additional features include front and back cape venting, two front bellows pockets and an imported design. With UPF 50+ coverage, you can limit sun exposure and feel secure with the highest rated sun protection available.


In [89]:
retriever = db.as_retriever(search_kwargs={'k': 3})
llm = ChatOpenAI(temperature=0.2, model=llm_model)

In [82]:
qdocs = "\n\n".join(doc.page_content for doc in docs)  # concatenate the retrieved documents
Markdown(qdocs)

id: 255
name: Sun Shield Shirt by
description: "Block the sun, not the fun – our high-performance sun shirt is guaranteed to protect from harmful UV rays. 

Size & Fit: Slightly Fitted: Softly shapes the body. Falls at hip.

Fabric & Care: 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Handwash, line dry.

Additional Features: Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. Imported.

Sun Protection That Won't Wear Off
Our high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun's harmful rays. This fabric is recommended by The Skin Cancer Foundation as an effective UV protectant.

id: 374
name: Men's Plaid Tropic Shirt, Short-Sleeve
description: Our Ultracomfortable sun protection is rated to UPF 50+, helping you stay cool and dry. Originally designed for fishing, this lightest hot-weather shirt offers UPF 50+ coverage and is great for extended travel. SunSmart technology blocks 98% of the sun's harmful UV rays, while the high-performance fabric is wrinkle-free and quickly evaporates perspiration. Made with 52% polyester and 48% nylon, this shirt is machine washable and dryable. Additional features include front and back cape venting, two front bellows pockets and an imported design. With UPF 50+ coverage, you can limit sun exposure and feel secure with the highest rated sun protection available.

id: 618
name: Men's Tropical Plaid Short-Sleeve Shirt
description: Our lightest hot-weather shirt is rated UPF 50+ for superior protection from the sun's UV rays. With a traditional fit that is relaxed through the chest, sleeve, and waist, this fabric is made of 100% polyester and is wrinkle-resistant. With front and back cape venting that lets in cool breezes and two front bellows pockets, this shirt is imported and provides the highest rated sun protection possible. 

Sun Protection That Won't Wear Off. Our high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun's harmful rays.

In [75]:
from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(" Please list all your shirts with sun protection in a table in markdown and summarize each one given the context: \n ``` {context} ```")

prompt_template

ChatPromptTemplate(input_variables=['context'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], template=' Please list all your shirts with sun protection in a table in markdown and summarize each one given the context: \n ``` {context} ```'))])

In [76]:
messages = prompt_template.format_messages(context=qdocs)
response = llm.invoke(messages) 


In [78]:
display(Markdown(response.content))

| ID | Name | Description |
|----|------|-------------|
| 255 | Sun Shield Shirt | High-performance sun shirt with UPF 50+ rating, moisture-wicking fabric, and recommended by The Skin Cancer Foundation for effective UV protection. Fits comfortably over swimsuits and is abrasion-resistant. |
| 374 | Men's Plaid Tropic Shirt | Ultracomfortable shirt with UPF 50+ rating, SunSmart technology to block 98% of UV rays, wrinkle-free fabric, and front and back venting for breathability. Machine washable and dryable. |
| 618 | Men's Tropical Plaid Shirt | Lightest hot-weather shirt with UPF 50+ rating, relaxed fit, wrinkle-resistant fabric, front and back venting, and two front bellows pockets. Provides superior sun protection and is imported. |

## RetrievalQA

Instead of making these calls manually, we can wrap everything in a retrieval chain. The `chain_type` argument controls how the retrieved documents are combined:

- `stuff`: the simplest method. It puts all the documents into the prompt as context, so the LLM has access to all the data. If the documents are very large or there are many of them, the prompt can exceed the LLM's context length.

- `map_reduce`: passes each chunk, along with the question, to an LLM, collects the responses, and uses another LLM call to summarize the individual responses into a final answer. The per-chunk calls can run in parallel, but the method takes many LLM calls.

- `refine`: iterates over the documents, building on the answer from the previous document. It is not fast, because the calls are no longer independent; each one depends on the result of the previous call.

- `map_rerank`: makes a single LLM call for each document and asks it to return a score as well, then selects the answer with the highest score. This relies on the LLM knowing what the score should be, so you often have to tell it something like "it should be a high score if it's relevant to the document" and refine the instructions from there. It also makes many LLM calls.

The most common of these methods is `stuff`.
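
To make the difference in call patterns concrete, here is a minimal sketch of three of the strategies in plain Python. It is not LangChain's implementation: `FakeLLM` is a hypothetical stand-in that only counts calls, and the prompt strings and function names are invented for illustration. `map_rerank` is left out because it needs a model that also returns a score.

```python
class FakeLLM:
    """Hypothetical stand-in for a real model: counts calls, returns a canned reply."""

    def __init__(self):
        self.calls = 0

    def __call__(self, prompt):
        self.calls += 1
        return f"reply {self.calls}"


def stuff(llm, question, docs):
    """Put every document into one prompt and make a single call."""
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def map_reduce(llm, question, docs):
    """Answer per document (independent calls), then combine with one more call."""
    partial = [llm(f"Context:\n{d}\n\nQuestion: {question}") for d in docs]
    return llm("Combine these answers:\n" + "\n".join(partial))


def refine(llm, question, docs):
    """Answer from the first document, then update the answer one document at a time."""
    answer = llm(f"Context:\n{docs[0]}\n\nQuestion: {question}")
    for d in docs[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context:\n{d}\n\nQuestion: {question}"
        )
    return answer


toy_docs = ["doc A", "doc B", "doc C"]  # placeholder documents
toy_question = "Which shirts have sun protection?"

call_counts = {}
for name, strategy in [("stuff", stuff), ("map_reduce", map_reduce), ("refine", refine)]:
    fake = FakeLLM()
    strategy(fake, toy_question, toy_docs)
    call_counts[name] = fake.calls

print(call_counts)  # {'stuff': 1, 'map_reduce': 4, 'refine': 3}
```

With three documents, `stuff` needs one call, `map_reduce` needs one per document plus a combining call, and `refine` needs one per document in sequence.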

In [92]:
qa_stuff = RetrievalQA.from_chain_type( 
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)

response = qa_stuff.run("Please list all your shirts with sun protection in a table in markdown and summarize each one.")



> Entering new RetrievalQA chain...

> Finished chain.


In [93]:
display(Markdown(response))

| ID | Name | Summary |
| --- | --- | --- |
| 618 | Men's Tropical Plaid Short-Sleeve Shirt | Rated UPF 50+ for superior sun protection, made of 100% polyester, wrinkle-resistant, relaxed fit, front and back cape venting, two front bellows pockets, imported. |
| 374 | Men's Plaid Tropic Shirt, Short-Sleeve | Rated UPF 50+, designed for fishing, made of 52% polyester and 48% nylon, wrinkle-free, quick perspiration evaporation, machine washable and dryable, front and back cape venting, two front bellows pockets, imported. |
| 255 | Sun Shield Shirt by | Slightly Fitted, falls at hip, made of 78% nylon and 22% Lycra Xtra Life fiber, UPF 50+ rated, wicks moisture, fits over swimsuit comfortably, abrasion-resistant, imported. |

## The `RetrievalQA` class is deprecated in favor of the `create_retrieval_chain` function

https://python.langchain.com/v0.2/docs/tutorials/pdf_qa/

In [90]:
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Keep the answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)


question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

results = rag_chain.invoke(
    {"input": "Please list all your shirts with sun protection in a table in markdown and summarize each one."})

results

{'input': 'Please list all your shirts with sun protection in a table in markdown and summarize each one.',
 'context': [Document(metadata={'source': 'resources\\OutdoorClothingCatalog_1000.csv', 'row': 618}, page_content="id: 618\nname: Men's Tropical Plaid Short-Sleeve Shirt\ndescription: Our lightest hot-weather shirt is rated UPF 50+ for superior protection from the sun's UV rays. With a traditional fit that is relaxed through the chest, sleeve, and waist, this fabric is made of 100% polyester and is wrinkle-resistant. With front and back cape venting that lets in cool breezes and two front bellows pockets, this shirt is imported and provides the highest rated sun protection possible. \n\nSun Protection That Won't Wear Off. Our high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun's harmful rays."),
  Document(metadata={'source': 'resources\\OutdoorClothingCatalog_1000.csv', 'row': 374}, page_content="id: 374\nname: Men's Plaid Tropic Shirt, Short-Sleeve\

In [91]:
display(Markdown(results['answer']))

| **Shirt Name** | **Description** |
| --- | --- |
| Men's Tropical Plaid Short-Sleeve Shirt | Rated UPF 50+ for superior sun protection, made of 100% polyester, wrinkle-resistant, with front and back cape venting, two front bellows pockets, and provides the highest rated sun protection possible. |
| Men's Plaid Tropic Shirt, Short-Sleeve | Rated UPF 50+, designed for fishing and extended travel, made with 52% polyester and 48% nylon, wrinkle-free, quick-drying, with front and back cape venting, two front bellows pockets, and offers the highest rated sun protection available. |
| Sun Shield Shirt | High-performance sun shirt with UPF 50+ rating, made of 78% nylon and 22% Lycra Xtra Life fiber, handwash, line dry, moisture-wicking, fits comfortably over swimsuit, abrasion-resistant, and provides SPF 50+ sun protection. |

These shirts offer excellent sun protection with UPF 50+ ratings, blocking 98% of the sun's harmful rays. They are designed for outdoor activities, travel, and extended wear, providing comfort, style, and peace of mind under the sun.
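
Besides `'answer'`, the result dict shown above carries the retrieved documents under `'context'`, which makes it possible to trace an answer back to its catalog rows. The sketch below illustrates that on a hand-built dict: `FakeDocument` is a hypothetical stand-in for LangChain's `Document`, `source_rows` is an invented helper, and the contents of `fake_result` are placeholders rather than real chain output.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing the same two attributes as a LangChain Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def source_rows(result):
    """Collect the CSV row number of every retrieved document in a result dict."""
    return [doc.metadata.get("row") for doc in result["context"]]


# Placeholder result with the same keys as the output of rag_chain.invoke(...).
fake_result = {
    "input": "Which shirts have sun protection?",
    "context": [
        FakeDocument("id: 618 (placeholder text)", {"source": "catalog.csv", "row": 618}),
        FakeDocument("id: 374 (placeholder text)", {"source": "catalog.csv", "row": 374}),
    ],
    "answer": "placeholder answer",
}

print(source_rows(fake_result))  # [618, 374]
```

Assuming the real `results` dict has the shape printed earlier, the same helper should work on it unchanged, since it only reads `metadata` from each item in `'context'`.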