# LangChain: Q&A over Documents

An example application is a tool that lets you query a product catalog for items of interest.

In [1]:
#pip install --upgrade langchain

In [2]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

os.environ['AZURE_OPENAI_API_KEY'] = os.getenv("AZURE_OPENAI_CHAT_API_KEY")
os.environ['AZURE_OPENAI_ENDPOINT'] = os.getenv("AZURE_OPENAI_CHAT_ENDPOINT")
os.environ['OPENAI_API_VERSION'] = os.getenv("AZURE_OPENAI_CHAT_VERSION")
os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT")

'gpt-35-turbo'
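The cell above copies the `AZURE_OPENAI_CHAT_*` values into the variable names that `langchain_openai` reads. If one of the source variables is missing, `os.getenv` returns `None`, and assigning `None` to `os.environ` raises a `TypeError` that does not name the missing variable. Below is a minimal sketch of an early check. The helper `require_env` is hypothetical and is not part of LangChain or `python-dotenv`.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Hypothetical helper: return the named variables, failing early if any is unset or empty."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Possible usage before copying values into the names langchain_openai expects:
# chat = require_env("AZURE_OPENAI_CHAT_API_KEY", "AZURE_OPENAI_CHAT_ENDPOINT",
#                    "AZURE_OPENAI_CHAT_VERSION")
# os.environ["AZURE_OPENAI_API_KEY"] = chat["AZURE_OPENAI_CHAT_API_KEY"]
```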

Note: LLMs do not always produce the same results. When executing the code in your notebook, you may get slightly different answers than those in the video.

In [3]:
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import CSVLoader
from langchain_community.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown

from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings

from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [4]:
file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)

In [5]:
#!pip install docarray

In [6]:
embeddings = AzureOpenAIEmbeddings(
        azure_deployment=os.getenv('AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT'),
        azure_endpoint=os.getenv('AZURE_OPENAI_EMBEDDINGS_ENDPOINT'),
        api_key=os.getenv('AZURE_OPENAI_EMBEDDINGS_API_KEY'),
        api_version=os.getenv('AZURE_OPENAI_EMBEDDINGS_VERSION')
    )

In [7]:
docs = loader.load()

In [8]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)



In [9]:
retriever = db.as_retriever()

In [10]:
query ="Please list all your shirts with sun protection \
in a table in markdown and summarize each one."

**Note**:
- The original course notebook used `langchain==0.0.179` and `openai==0.27.7`.
- For those library versions, `VectorstoreIndexCreator` uses `text-davinci-003` as the base model, which has been deprecated since 1 January 2024; the course used the replacement model, `gpt-3.5-turbo-instruct`, for the `query` instead.
- This version uses the `langchain_openai` package with an Azure OpenAI chat deployment (`gpt-35-turbo`, as printed above), so the `response` format may differ from the video.

In [11]:
llm = AzureChatOpenAI(azure_deployment=os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT"),
                azure_endpoint=os.getenv("AZURE_OPENAI_CHAT_ENDPOINT"),
                openai_api_key=os.getenv("AZURE_OPENAI_CHAT_API_KEY"),
                api_version=os.getenv("AZURE_OPENAI_CHAT_VERSION"),
                temperature=0.0)

In [12]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
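With `chain_type="stuff"`, `RetrievalQA` retrieves the relevant documents, "stuffs" all of them into a single prompt as context, and makes one LLM call. The following is a minimal plain-Python sketch of that flow. The function `stuff_qa` and its `retrieve` and `call_llm` callables are hypothetical stand-ins, and the prompt template is simplified rather than LangChain's exact default wording.

```python
from typing import Callable


def stuff_qa(
    question: str,
    retrieve: Callable[[str], list[str]],
    call_llm: Callable[[str], str],
) -> str:
    """Hypothetical sketch: answer a question by stuffing every retrieved text into one prompt."""
    docs = retrieve(question)
    # Join all retrieved texts into a single context block
    context = "\n\n".join(docs)
    # Simplified template; not LangChain's exact default prompt
    prompt = (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\nQuestion: {question}\nHelpful Answer:"
    )
    # A single LLM call over the whole context
    return call_llm(prompt)


# Usage with stand-in functions instead of a real retriever and model
fake_docs = ["Sun Shield Shirt: UPF 50+", "Tropic Shirt: UPF 50+"]
answer = stuff_qa(
    "Which shirts block the sun?",
    retrieve=lambda q: fake_docs,
    call_llm=lambda prompt: f"prompt has {prompt.count('UPF 50+')} matches",
)
print(answer)
```

Because every document goes into one prompt, this approach is bounded by the model's context window. `RetrievalQA.from_chain_type` also accepts `map_reduce`, `refine`, and `map_rerank` for larger document sets.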

In [13]:
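# .run() is deprecated; invoke() returns a dict with the answer under "result"
response = qa_stuff.invoke({"query": query})["result"]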

[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


In [14]:
display(Markdown(response))

| Shirt Number | Name | Description |
| --- | --- | --- |
| 618 | Men's Tropical Plaid Short-Sleeve Shirt | Rated UPF 50+ for superior protection from the sun's UV rays. Made of 100% polyester and is wrinkle-resistant. With front and back cape venting that lets in cool breezes and two front bellows pockets. |
| 374 | Men's Plaid Tropic Shirt, Short-Sleeve | Rated to UPF 50+ and offers sun protection. Made with 52% polyester and 48% nylon, this shirt is machine washable and dryable. Additional features include front and back cape venting, two front bellows pockets. |
| 535 | Men's TropicVibe Shirt, Short-Sleeve | Built-in UPF 50+ has the lightweight feel you want and the coverage you need when the air is hot and the UV rays are strong. Made with 71% Nylon, 29% Polyester. Additional features include wrinkle resistance, front and back cape venting, and two front bellows pockets. |
| 255 | Sun Shield Shirt | High-performance sun shirt is guaranteed to protect from harmful UV rays. Made with 78% nylon, 22% Lycra Xtra Life fiber. Additional features include moisture-wicking, abrasion resistance, and fits comfortably over your favorite swimsuit. |

The Men's Tropical Plaid Short-Sleeve Shirt is made of 100% polyester and is wrinkle-resistant. It has front and back cape venting that lets in cool breezes and two front bellows pockets. The Men's Plaid Tropic Shirt, Short-Sleeve is made with 52% polyester and 48% nylon, and has front and back cape venting, and two front bellows pockets. The Men's TropicVibe Shirt, Short-Sleeve has built-in UPF 50+ and is made with 71% Nylon, 29% Polyester. It has wrinkle resistance, front and back cape venting, and two front bellows pockets. The Sun Shield Shirt is made with 78% nylon, 22% Lycra Xtra Life fiber, and is moisture-wicking and abrasion-resistant. It fits comfortably over your favorite swimsuit. All shirts have UPF 50+ sun protection, blocking 98% of the sun's harmful rays.

## Step By Step

In [15]:
loader = CSVLoader(file_path=file)
docs = loader.load()

In [16]:
docs[0]

Document(page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 0})
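The `page_content` above is one row of the CSV rendered as `header: value` lines, and the leading `: 0` suggests that the file's first column is an unnamed index. The sketch below approximates that conversion with Python's `csv` module. `rows_to_texts` is a hypothetical helper written for illustration, not `CSVLoader`'s actual implementation, and it returns plain tuples instead of LangChain `Document` objects.

```python
import csv
import io


def rows_to_texts(csv_text: str, source: str) -> list[tuple[str, dict]]:
    """Hypothetical approximation of CSVLoader: one (page_content, metadata) pair per row."""
    results = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        # Each column becomes a "header: value" line; an unnamed first column yields ": <index>"
        content = "\n".join(
            f"{key.strip()}: {(value or '').strip()}"
            for key, value in row.items()
            if key is not None  # skip surplus fields that have no header
        )
        results.append((content, {"source": source, "row": i}))
    return results


sample = ",name,description\n0,Sun Shield Shirt,UPF 50+ rated\n"
content, metadata = rows_to_texts(sample, "catalog.csv")[0]
print(content)
print(metadata)
```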

In [17]:
from langchain_openai import AzureOpenAIEmbeddings
embeddings = AzureOpenAIEmbeddings(
        azure_deployment=os.getenv('AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT'),
        azure_endpoint=os.getenv('AZURE_OPENAI_EMBEDDINGS_ENDPOINT'),
        api_key=os.getenv('AZURE_OPENAI_EMBEDDINGS_API_KEY'),
        api_version=os.getenv('AZURE_OPENAI_EMBEDDINGS_VERSION')
    )

In [18]:
embed = embeddings.embed_query("Hi my name is Harrison")

In [19]:
print(len(embed))

1536


In [20]:
print(embed[:5])

[-0.021981552802660263, 0.006766451694913216, -0.018215438814615563, -0.039142479253588464, -0.01402249874843935]
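Each embedding is a list of 1536 floats. Texts with similar meaning tend to have vectors that point in similar directions, and this is commonly measured with cosine similarity. Below is a minimal plain-Python sketch. The toy 3-dimensional vectors are made up for illustration; in the notebook you could pass two results of `embeddings.embed_query(...)` instead, at the cost of two API calls.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; the real embeddings here have 1536 dimensions
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # same direction, close to 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal, 0.0
```

OpenAI states that its embeddings are normalized to length 1. If that holds for your deployment, the plain dot product gives the same value as cosine similarity.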


In [21]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [22]:
query = "Please suggest a shirt with sunblocking"

In [24]:
docs = db.similarity_search(query)

In [25]:
len(docs)

4
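`similarity_search` returns `k=4` documents by default, which is why `len(docs)` is 4. Conceptually, an in-memory store such as `DocArrayInMemorySearch` compares the query embedding with every stored embedding and keeps the best `k`; its default metric is cosine similarity. Below is a brute-force sketch of that idea. It uses hypothetical toy vectors and a hypothetical `top_k` function, and it does not reproduce the library's internals.

```python
import heapq
import math


def top_k(query: list[float], store: dict[str, list[float]], k: int = 4) -> list[tuple[str, float]]:
    """Brute-force nearest neighbours: score every stored vector and keep the k best.

    Assumes all vectors are non-zero and share the query's dimensionality.
    """

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in store.items())
    # heapq.nlargest keeps only the k best scores instead of sorting everything
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Hypothetical toy "embeddings" keyed by product name
store = {
    "sun shirt": [0.9, 0.1, 0.0],
    "tropic shirt": [0.8, 0.2, 0.1],
    "oxford shoes": [0.0, 0.1, 0.9],
}
for name, score in top_k([1.0, 0.0, 0.0], store, k=2):
    print(f"{name}: {score:.3f}")
```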

In [26]:
docs[0]

Document(page_content=': 255\nname: Sun Shield Shirt by\ndescription: "Block the sun, not the fun – our high-performance sun shirt is guaranteed to protect from harmful UV rays. \n\nSize & Fit: Slightly Fitted: Softly shapes the body. Falls at hip.\n\nFabric & Care: 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Handwash, line dry.\n\nAdditional Features: Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. Imported.\n\nSun Protection That Won\'t Wear Off\nOur high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun\'s harmful rays. This fabric is recommended by The Skin Cancer Foundation as an effective UV protectant.', metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 255})

In [27]:
# Concatenate the retrieved documents, separated by blank lines, into one context string
qdocs = "\n\n".join(doc.page_content for doc in docs)

In [28]:
llm = AzureChatOpenAI(azure_deployment=os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT"),
                        azure_endpoint=os.getenv("AZURE_OPENAI_CHAT_ENDPOINT"),
                        openai_api_key=os.getenv("AZURE_OPENAI_CHAT_API_KEY"),
                        api_version=os.getenv("AZURE_OPENAI_CHAT_VERSION"),
                        temperature=0.0)

In [29]:
# call_as_llm() is deprecated; invoke() returns a message whose text is in .content
response = llm.invoke(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.").content



In [30]:
display(Markdown(response))

| Name | Description |
| --- | --- |
| Sun Shield Shirt | High-performance sun shirt with UPF 50+ sun protection, moisture-wicking, and abrasion-resistant fabric. Fits comfortably over swimsuits. Recommended by The Skin Cancer Foundation. |
| Men's Plaid Tropic Shirt | Ultracomfortable shirt with UPF 50+ sun protection, wrinkle-free fabric, and front/back cape venting. Made with 52% polyester and 48% nylon. |
| Men's TropicVibe Shirt | Men's sun-protection shirt with built-in UPF 50+ and front/back cape venting. Made with 71% nylon and 29% polyester. |
| Men's Tropical Plaid Short-Sleeve Shirt | Lightest hot-weather shirt with UPF 50+ sun protection, front/back cape venting, and two front bellows pockets. Made with 100% polyester and is wrinkle-resistant. |

All of these shirts provide UPF 50+ sun protection, blocking 98% of the sun's harmful rays. They are made with high-performance fabrics that are moisture-wicking, abrasion-resistant, and/or wrinkle-free. Some have front/back cape venting for added comfort in hot weather. The Sun Shield Shirt is recommended by The Skin Cancer Foundation.

Reminder: Download your notebook to your local computer to save your work.

### Use a FAISS Vector Store

In [31]:
loader = CSVLoader(file_path=file)
docs = loader.load()

In [32]:
llm = AzureChatOpenAI(azure_deployment=os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT"),
                        azure_endpoint=os.getenv("AZURE_OPENAI_CHAT_ENDPOINT"),
                        openai_api_key=os.getenv("AZURE_OPENAI_CHAT_API_KEY"),
                        api_version=os.getenv("AZURE_OPENAI_CHAT_VERSION"),
                        temperature=0.0)

In [33]:
embeddings = AzureOpenAIEmbeddings(
        azure_deployment=os.getenv('AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT'),
        azure_endpoint=os.getenv('AZURE_OPENAI_EMBEDDINGS_ENDPOINT'),
        api_key=os.getenv('AZURE_OPENAI_EMBEDDINGS_API_KEY'),
        api_version=os.getenv('AZURE_OPENAI_EMBEDDINGS_VERSION')
    )

In [34]:
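def create_vector_database(docs):
    # Split the CSV rows into chunks of roughly 1000 characters with 200 characters
    # of overlap. Separators are tried in order, so list the coarsest (blank line) first.
    documents = RecursiveCharacterTextSplitter(
        chunk_size=1000, separators=["\n\n", "\n"], chunk_overlap=200
    ).split_documents(docs)

    # Embeddings model used to vectorize each chunk
    embeddings = AzureOpenAIEmbeddings(
        azure_deployment=os.getenv('AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT'),
        azure_endpoint=os.getenv('AZURE_OPENAI_EMBEDDINGS_ENDPOINT'),
        api_key=os.getenv('AZURE_OPENAI_EMBEDDINGS_API_KEY'),
        api_version=os.getenv('AZURE_OPENAI_EMBEDDINGS_VERSION')
    )
    # Embed all chunks and build an in-memory FAISS index over them
    db = FAISS.from_documents(
        documents=documents,
        embedding=embeddings
    )

    return db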
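`chunk_size` and `chunk_overlap` control how the text is cut. Each chunk holds roughly `chunk_size` characters, and neighbouring chunks share up to `chunk_overlap` characters, so a sentence cut at a boundary can still appear whole in one of them. `RecursiveCharacterTextSplitter` also prefers to cut at the listed separators. The sketch below ignores separators and is a plain fixed-size sliding window, shown only to illustrate the size and overlap arithmetic. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: fixed-size chunks where neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk's start advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

With the notebook's `chunk_size=1000` and `chunk_overlap=200`, each window in this sketch starts 800 characters after the previous one.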

In [35]:
db = create_vector_database(docs)

In [36]:
retriever = db.as_retriever()

In [37]:
query = "Please suggest a shirt with sunblocking"

In [38]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)

In [39]:
response = qa_stuff.invoke({"query": query})["result"]



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


In [40]:
display(Markdown(response))

We have several options for shirts with sunblocking. Here are a few:

1. Sun Shield Shirt: This high-performance sun shirt is guaranteed to protect from harmful UV rays. It is rated UPF 50+ and blocks 98% of the sun's harmful rays. It is made of 78% nylon and 22% Lycra Xtra Life fiber and is handwashable. 

2. Men's Plaid Tropic Shirt, Short-Sleeve: This shirt is rated UPF 50+ and blocks 98% of the sun's harmful UV rays. It is made of 52% polyester and 48% nylon and is machine washable. 

3. Men's TropicVibe Shirt, Short-Sleeve: This shirt has built-in UPF 50+ and is made of 71% Nylon and 29% Polyester. It is machine washable and wrinkle-resistant. 

4. Men's Tropical Plaid Short-Sleeve Shirt: This shirt is rated UPF 50+ and blocks 98% of the sun's harmful rays. It is made of 100% polyester and is wrinkle-resistant. 

All of these shirts provide excellent sun protection and are designed to keep you cool and comfortable in hot weather.