# LangChain: Q&A over Documents

NOTE: This notebook runs, but it does not cover the full tutorial. Several breaking changes in the vectorstore tooling remain unresolved: searching Google, asking ChatGPT, and inspecting the methods that apparently worked in an older version did not turn up fixes.

## Connect to OpenAI and define our model

Create an OpenAI API key for this project at [https://platform.openai.com/account/api-keys](https://platform.openai.com/account/api-keys). 

Copy `.env.sample` to `.env` and replace `<YOUR_OPENAI_API_KEY>` with the one you created.

Let's get started by loading our OpenAI key from the `.env` file we created.

In [1]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

import warnings
warnings.filterwarnings('ignore')
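
The cell above loads `.env` but does not itself confirm that a key was found. A minimal sketch of such a check follows, assuming the variable in `.env` is named `OPENAI_API_KEY`; `describe_key` is a hypothetical helper, not part of any library.

```python
import os

def describe_key(env, name="OPENAI_API_KEY"):
    """Return a short, non-secret status string for an API key in `env`."""
    value = env.get(name, "")
    if not value:
        return f"{name} is not set"
    # Show only the last four characters so the key never lands in notebook output.
    return f"{name} is set (ends with ...{value[-4:]})"

# Assumption: load_dotenv() has already copied the .env values into os.environ.
print(describe_key(os.environ))
```

Passing the environment in as an argument keeps the helper easy to try with a plain dictionary.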

In [2]:
# Specify our model
llm_model = "gpt-3.5-turbo-1106"

## Load our example CSV file using the LangChain CSV Document Loader

Before we begin, let's install the following dependency from the command line of our Python virtual environment and record it in `requirements.txt`:

```sh
(.venv) % pip install docarray
(.venv) % pip freeze > requirements.txt
```

In [3]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from IPython.display import display, Markdown


In [4]:
# Load our file using the LangChain CSVLoader
file = '../data/OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)

## Step By Step

In [30]:
from langchain.document_loaders import CSVLoader
file = '../data/OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)

In [31]:
docs = loader.load()

In [32]:
docs[0]

Document(page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", metadata={'source': '../data/OutdoorClothingCatalog_1000.csv', 'row': 0})
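
The output above shows that each CSV row becomes one document whose text is a series of `column: value` lines, with the file path and row number kept as metadata. The sketch below imitates that shape with the standard library only. It is an illustration, not the `CSVLoader` implementation; `rows_to_documents` and the two sample rows (including the second product) are hypothetical.

```python
import csv
import io

# Hypothetical two-row sample; the real catalog has 1000 rows and longer descriptions.
sample = io.StringIO(
    ",name,description\n"
    "0,Women's Campside Oxfords,Soft canvas lace-to-toe Oxford.\n"
    "1,Sun Shield Shirt,Lightweight shirt with UPF 50+ fabric.\n"
)

def rows_to_documents(handle, source):
    """Turn each CSV row into a dict with text content and metadata."""
    documents = []
    for i, row in enumerate(csv.DictReader(handle)):
        # One "column: value" line per field; an unnamed index column yields ": 0".
        text = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {"page_content": text, "metadata": {"source": source, "row": i}}
        )
    return documents

documents = rows_to_documents(sample, "sample.csv")
print(documents[0]["page_content"])
```

The leading `: 0` in the real output likely comes from the same cause as in this sketch: the first column of the CSV has an empty header.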

In [33]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [34]:
embed = embeddings.embed_query("Hi my name is Rob")

In [35]:
print(len(embed))

1536


In [36]:
print(embed[:5])

[-0.016095468162868852, -0.011145422742968447, -0.027935524340978738, -0.0215398903202356, -0.013429577883422788]
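
An embedding is useful because texts with related meaning get vectors that point in similar directions. Cosine similarity is a common way to measure that closeness. The sketch below uses made-up 3-dimensional vectors in place of real 1536-dimensional embeddings, so the numbers only illustrate the idea.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumptions), standing in for embeddings of three product texts.
shirt = [0.9, 0.1, 0.0]
sun_shirt = [0.8, 0.2, 0.1]
boots = [0.0, 0.2, 0.9]

print(cosine_similarity(shirt, sun_shirt))  # close to 1: similar direction
print(cosine_similarity(shirt, boots))      # close to 0: nearly orthogonal
```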


In [43]:
from langchain.vectorstores import DocArrayInMemorySearch

db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [38]:
query = "Please suggest a shirt with sunblocking"

In [None]:
# Rebinds `docs` to the best matches only (similarity_search returns k=4 documents by default)
docs = db.similarity_search(query)

In [45]:
len(docs)

1000

In [50]:
docs[0]

Document(page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", metadata={'source': '../data/OutdoorClothingCatalog_1000.csv', 'row': 0})
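
A similarity search ranks the stored documents by how close their embeddings are to the query embedding and returns only the best `k`. The sketch below shows that ranking step on a toy in-memory index. It is a simplified illustration, not the `DocArrayInMemorySearch` implementation; `top_k`, the index contents, and the vectors are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, indexed, k=4):
    """Return the k (text, score) pairs most similar to the query, best first."""
    scored = [
        (text, cosine_similarity(query_vector, vector)) for text, vector in indexed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy index: (text, embedding) pairs with made-up 3-d vectors.
index = [
    ("sun shirt", [0.9, 0.1, 0.0]),
    ("rain jacket", [0.4, 0.6, 0.1]),
    ("hiking boots", [0.0, 0.2, 0.9]),
]
query_vector = [1.0, 0.0, 0.0]

results = top_k(query_vector, index, k=2)
print([text for text, _ in results])
```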

In [51]:
retriever = db.as_retriever()

In [52]:
llm = ChatOpenAI(temperature=0.0, model=llm_model)


In [65]:
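# 2024.01.14: Changed to len(docs)-1000 to avoid API errors from an oversized context (too many tokens).
# Caution: when docs holds all 1000 loaded rows (see len(docs) above), this range is empty,
# so qdocs is "" and the model answers without any catalog text.
qdocs = "".join([docs[i].page_content for i in range(len(docs)-1000)])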
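
One way to stay under the context limit while still passing some catalog text is to concatenate documents until a size budget is reached. The sketch below is a minimal version of that idea, assuming a character budget as a rough stand-in for a token limit; `build_context` and the 200-character budget are hypothetical.

```python
def build_context(texts, max_chars):
    """Join texts in order, stopping before the total would exceed max_chars."""
    parts = []
    used = 0
    for text in texts:
        # The extra 2 accounts for the blank-line separator between documents.
        cost = len(text) + (2 if parts else 0)
        if used + cost > max_chars:
            break
        parts.append(text)
        used += cost
    return "\n\n".join(parts)

# Three 80-character placeholder texts; only the first two fit in 200 characters.
texts = ["a" * 80, "b" * 80, "c" * 80]
context = build_context(texts, max_chars=200)
print(len(context))
```

In the notebook this could be called as `build_context([d.page_content for d in docs], max_chars=...)`, with the budget chosen to suit the model's context window.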


In [66]:
response = llm.call_as_llm(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.") 


In [67]:
display(Markdown(response))

| Shirt Name       | Sun Protection Level | Summary |
|------------------|----------------------|---------|
| Coolibar UPF 50+ | UPF 50+             | This shirt offers maximum sun protection with a UPF 50+ rating, making it ideal for outdoor activities in the sun. It is made with lightweight, breathable fabric for comfort. |
| Columbia PFG      | UPF 30               | This shirt provides UPF 30 sun protection and is designed for fishing and other outdoor activities. It features moisture-wicking fabric and ventilation to keep you cool and protected from the sun. |
| Patagonia Tropic Comfort | UPF 50+ | This shirt offers UPF 50+ sun protection and is made with quick-drying, moisture-wicking fabric. It also has a hood for added sun protection for your head and neck. |

In [68]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
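
The `stuff` chain type places all retrieved documents into a single prompt and sends it to the model in one call. The sketch below shows that idea in plain Python. It is not LangChain's actual prompt template; `stuff_prompt`, the prompt wording, and the two sample texts are assumptions made for illustration.

```python
def stuff_prompt(question, retrieved_texts):
    """Build one prompt that contains every retrieved text plus the question."""
    context = "\n\n".join(retrieved_texts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical retrieved texts in the same "column: value" shape as the catalog rows.
prompt = stuff_prompt(
    "Which shirts offer sun protection?",
    [
        "name: Shirt A\ndescription: UPF 50+ fabric.",
        "name: Shirt B\ndescription: Cotton tee.",
    ],
)
print(prompt)
```

Because everything goes into one prompt, this approach is simple but only works while the retrieved documents fit in the model's context window.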