# LangChain: Q&A over Documents

One example is a tool that lets you query a product catalog for items of interest, which is what this notebook builds from an outdoor-clothing catalog stored as a CSV file.

```python
import json
import os

import openai


# Load settings (including the API key) from a JSON config file
def load_config(config_file):
    with open(config_file, 'r') as file:
        config = json.load(file)
    return config


config = load_config('../config/config.json')
OPENAI_API_KEY = config["OPENAI_API_KEY"]

# The openai client reads openai.api_key; LangChain's OpenAI wrappers
# read the OPENAI_API_KEY environment variable, so set both
openai.api_key = OPENAI_API_KEY
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
```
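Keeping the key in a config file works, but a common alternative is to prefer an environment variable and fall back to the file. A minimal sketch, assuming the same JSON layout as `config.json` above; `resolve_api_key` is a hypothetical helper, not part of LangChain or the OpenAI client.

```python
import json
import os


def resolve_api_key(config_file, env_var="OPENAI_API_KEY"):
    """Hypothetical helper: return the key from the environment if it is set,
    otherwise read it from a JSON config file that has an OPENAI_API_KEY entry."""
    key = os.environ.get(env_var)
    if key:
        return key
    with open(config_file, "r") as file:
        config = json.load(file)
    return config[env_var]
```

Usage would be `OPENAI_API_KEY = resolve_api_key('../config/config.json')`, which keeps the key out of the file entirely on machines where the variable is already exported.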

```python
llm_model = "gpt-3.5-turbo"
```

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown
from langchain.llms import OpenAI

from langchain.indexes import VectorstoreIndexCreator
```

```python
file = '../OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)
```
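`CSVLoader` produces one document per CSV row. Judging from the `docs[0]` output further down, each document's text is the row's `column: value` pairs joined by newlines, with the file path and row number kept as metadata (the leading `: 0` comes from the catalog's unnamed index column). A stdlib sketch of that transformation, assuming plain dicts in place of LangChain's `Document` class and a made-up one-row sample; `rows_to_documents` is a hypothetical name.

```python
import csv
import io


def rows_to_documents(csv_file, source):
    """Hypothetical sketch: one dict per CSV row, mimicking the
    page_content/metadata shape printed for docs[0]."""
    documents = []
    for i, row in enumerate(csv.DictReader(csv_file)):
        # One "column: value" line per field; an unnamed column yields ": value"
        content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
        documents.append({"page_content": content,
                          "metadata": {"source": source, "row": i}})
    return documents


sample = io.StringIO(",name,description\n0,Women's Campside Oxfords,Soft canvas shoe\n")
docs_sketch = rows_to_documents(sample, "catalog.csv")
print(docs_sketch[0]["page_content"])
```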

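```python
# Load the catalog, split and embed the documents, and store the vectors
# in an in-memory DocArray vector store
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])
```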
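Before embedding, `VectorstoreIndexCreator` splits each document into chunks with a text splitter (by default a `RecursiveCharacterTextSplitter`, which prefers to break on paragraph, line, and word boundaries). The simpler fixed-window splitter below only illustrates the chunk-size and overlap idea; `split_text` and its default values are assumptions for illustration, not LangChain's API.

```python
def split_text(text, chunk_size=1000, chunk_overlap=0):
    """Hypothetical fixed-window splitter: chunks of at most chunk_size
    characters, each starting chunk_size - chunk_overlap after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

Overlap repeats a little text at each boundary so that a sentence cut in half still appears intact in at least one chunk, at the cost of storing and embedding some characters twice.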

```python
query = "Please list all your shirts with sun protection \
in a table in markdown and summarize each one."
```

```python
# Completion model passed explicitly to index.query
llm_replacement_model = OpenAI(
    temperature=0,
    model='gpt-3.5-turbo-instruct'
)

response = index.query(
    query,
    llm=llm_replacement_model
)
```

```python
display(Markdown(response))
```

| Name | Description | Sun Protection Rating |
| --- | --- | --- |
| Refresh Swimwear, V-Neck Tankini Contrasts | Designed for watersports, made with recycled nylon and Lycra spandex, UPF 50+ rating | SPF 50+ |
| Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece | Bright colors, ruffles, and exclusive prints, four-way-stretch and chlorine-resistant fabric, UPF 50+ rating | SPF 50+ |
| Easy-Care Classic Stripe Short-Sleeve Shirt, Traditional Fit | Classic chambray shirt with easy-care finish, light and breathable, UPF rating not specified | Not specified |
| Smooth Comfort Check Shirt, Slightly Fitted | Wrinkle-free performance, TrueCool fabric wicks moisture, UPF rating not specified | Not specified |

## Step By Step

```python
from langchain.document_loaders import CSVLoader
loader = CSVLoader(file_path=file)
```

```python
docs = loader.load()
```

```python
docs[0]
```

```text
Document(page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", metadata={'source': '../OutdoorClothingCatalog_1000.csv', 'row': 0})
```

```python
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
```

```python
embed = embeddings.embed_query("Hi my name is Alice")
```

```python
print(len(embed))
```

```text
1536
```


```python
print(embed[:5])
```

```text
[-0.014660699293017387, -0.0024051726795732975, -0.015214408747851849, -0.035613544285297394, -0.018549244850873947]
```
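Each text becomes a 1536-dimensional vector, and texts with similar meaning end up with vectors pointing in similar directions. Cosine similarity is a common way vector stores score that closeness; a minimal pure-Python version, shown on tiny hand-made vectors rather than real embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal → 0.0
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction, ≈ 1.0
```

OpenAI documents its embeddings as normalized to length 1, so for those vectors the dot product alone gives the same ranking as cosine similarity.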


```python
db = DocArrayInMemorySearch.from_documents(
    docs,
    embeddings
)
```

```python
query = "Please suggest a shirt with sunblocking"
```

```python
docs = db.similarity_search(query)
```

```python
len(docs)
```

```text
4
```
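`similarity_search` returned four documents, which matches LangChain's default of `k=4`. The lookup amounts to scoring every stored vector against the query vector and keeping the best k. A brute-force sketch of that idea; `InMemoryIndex` and `toy_embed` (letter counts instead of a real embedding model) are hypothetical stand-ins, not DocArrayInMemorySearch internals.

```python
import heapq
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def toy_embed(text):
    """Toy embedding: counts of each letter a-z (stand-in for OpenAIEmbeddings)."""
    text = text.lower()
    return [text.count(chr(c)) for c in range(ord("a"), ord("z") + 1)]


class InMemoryIndex:
    """Hypothetical brute-force vector store: embeds texts once, scans on every query."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.entries = []  # (vector, text) pairs

    def add_texts(self, texts):
        for text in texts:
            self.entries.append((self.embed_fn(text), text))

    def similarity_search(self, query, k=4):
        query_vec = self.embed_fn(query)
        # Keep the k entries whose vectors are most similar to the query
        best = heapq.nlargest(k, self.entries, key=lambda e: cosine(query_vec, e[0]))
        return [text for _, text in best]


index_sketch = InMemoryIndex(toy_embed)
index_sketch.add_texts(["sun shirt", "rain jacket", "sun hat", "wool socks", "shirt"])
print(index_sketch.similarity_search("sun shirt", k=2))  # ['sun shirt', 'shirt']
```

A full scan is fine for a catalog of a thousand rows; larger collections usually switch to an approximate nearest-neighbor index instead.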

```python
docs[0]
```

```text
Document(page_content=": 5\nname: Smooth Comfort Check Shirt, Slightly Fitted\ndescription: Our men's slightly fitted check shirt is the perfect choice for your wardrobe! Customers love how it fits right out of the dryer. Size & Fit: Slightly Fitted, Relaxed through the chest and sleeve with a slightly slimmer waist. Fabric & Care: 100% cotton poplin, with wrinkle-free performance that won't wash out. Our innovative TrueCool® fabric wicks moisture away from your skin and helps it dry quickly. Additional Features: Traditional styling with a button-down collar and a single patch pocket. Imported.", metadata={'source': '../OutdoorClothingCatalog_1000.csv', 'row': 5})
```

```python
# Expose the vector store through the retriever interface that chains use
retriever = db.as_retriever()
```

```python
llm = ChatOpenAI(
    temperature=0.0,
    model=llm_model
)
```

```python
# Concatenate the text of the retrieved documents into one context string
qdocs = "".join([doc.page_content for doc in docs])
```

```python
response = llm.call_as_llm(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.")
```
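The two cells above do retrieval-augmented generation by hand: paste the retrieved text in front of the question and send everything in one call. Note that joining with `""` runs the last line of one document straight into the `: <row>` line of the next. A sketch of the same prompt assembly with an explicit separator; `build_stuffed_prompt` and its template wording are hypothetical, not the prompt LangChain uses, and the sample entries are placeholders.

```python
def build_stuffed_prompt(contexts, question, separator="\n\n"):
    """Hypothetical helper: put every retrieved text into one prompt,
    separated so the model can tell where each document ends."""
    context_block = separator.join(text.strip() for text in contexts)
    return (
        "Use the following catalog entries to answer the question.\n\n"
        f"{context_block}\n\n"
        f"Question: {question}"
    )


prompt = build_stuffed_prompt(
    [": 1\nname: Example Shirt A", ": 2\nname: Example Shirt B"],
    "Which shirts offer sun protection?",
)
print(prompt)
```

In the notebook this would be called as `build_stuffed_prompt([d.page_content for d in docs], question)` and the result passed to `llm.call_as_llm`.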

```python
display(Markdown(response))
```

| Name                                                  | Summary                                                                                       |
|-------------------------------------------------------|-----------------------------------------------------------------------------------------------|
| Refresh Swimwear, V-Neck Tankini Contrasts            | Watersport-ready tankini top with eye-catching colorblock style. Made with recycled nylon and Lycra® spandex for stretch. UPF 50+ sun protection. Lightweight racerback straps. Flattering V-neck silhouette. Handwash, line dry. Imported. |
| Infant and Toddler Girls' Coastal Chill Swimsuit       | Bright and colorful two-piece swimsuit for toddlers. Four-way-stretch, chlorine-resistant fabric with UPF 50+ sun protection. Crossover no-slip straps and fully lined bottom for secure fit and coverage. Machine wash, line dry. Imported. |
  
Summary: The Refresh Swimwear tankini top is perfect for watersports with its recycled nylon and Lycra® spandex fabric, UPF 50+ sun protection, and lightweight racerback straps. The Infant and Toddler Girls' Coastal Chill Swimsuit is a colorful and secure two-piece swimsuit with UPF 50+ sun protection and chlorine-resistant fabric.

```python
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True
)
```
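`chain_type="stuff"` packs every retrieved document into a single prompt, which is simple but bounded by the model's context window. `RetrievalQA.from_chain_type` also accepts other chain types such as `"map_reduce"`, `"refine"`, and `"map_rerank"`. The sketch below contrasts stuff with map-reduce (answer per document, then combine the partial answers) using a stub instead of a real model; `stub_llm`, `stuff`, and `map_reduce` are hypothetical helpers, not LangChain code.

```python
calls = []


def stub_llm(prompt):
    """Stand-in for a model call: records the prompt and returns a placeholder."""
    calls.append(prompt)
    return f"<answer to a {len(prompt)}-char prompt>"


def stuff(llm, docs, question):
    # One call: all documents in a single prompt
    context = "\n\n".join(docs)
    return llm(f"{context}\n\nQuestion: {question}")


def map_reduce(llm, docs, question):
    # Map: one call per document, so no single prompt holds everything
    partials = [llm(f"{doc}\n\nQuestion: {question}") for doc in docs]
    # Reduce: one more call that combines the partial answers
    combined = "\n".join(partials)
    return llm(f"Combine these answers:\n{combined}\n\nQuestion: {question}")


docs_sample = ["doc one", "doc two", "doc three"]
stuff(stub_llm, docs_sample, "q")
print(len(calls))  # 1
calls.clear()
map_reduce(stub_llm, docs_sample, "q")
print(len(calls))  # 4 (3 map calls + 1 reduce call)
```

Map-reduce scales to more documents than fit in one prompt, but costs one model call per document plus the combining call.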

```python
query = "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."
```

```python
response = qa_stuff.run(query)
```

```text
> Entering new RetrievalQA chain...

> Finished chain.
```


```python
display(Markdown(response))
```

I'm sorry, but none of the shirts provided in the context have sun protection features.

```python
# Ask the same question through the index wrapper, this time with the chat model
response = index.query(query, llm=llm)
```

```python
# Rebuild the index, passing the embedding model explicitly
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])
```