# LangChain: Q&A over Documents

This notebook builds a question-answering tool over a set of documents. For example, such a tool could let you query a product catalog for items of interest.

References
- https://python.langchain.com/docs/how_to/sql_csv/
- https://python.langchain.com/docs/how_to/sql_query_checking/

In [1]:
import os
import google.generativeai as genai
from dotenv import load_dotenv

In [2]:
load_dotenv(".env")
GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]
genai.configure(api_key=GOOGLE_API_KEY)

In [5]:
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown
from langchain.llms import OpenAI

In [13]:
file = "house_prices.csv"
loader = CSVLoader(file_path=file)

In [17]:
from langchain.indexes.vectorstore import VectorstoreIndexCreator

In [19]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
).from_loaders([loader])

ValidationError: 1 validation error for VectorstoreIndexCreator
embedding
  Field required [type=missing, input_value={'vectorstore_cls': <clas...ocArrayInMemorySearch'>}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing

In [None]:
query = "Please list all your shirts with sun protection \
in a table in markdown and summarize each one."

**Note**:
- The notebook uses `langchain==0.0.179` and `openai==0.27.7`.
- With these library versions, `VectorstoreIndexCreator` uses `text-davinci-003` as its base model, which has been deprecated since 1 January 2024.
- The replacement model, `gpt-3.5-turbo-instruct`, is used for the `query` instead.
- Because of this replacement, the `response` format might differ from the one shown in the video.

In [None]:
llm_replacement_model = OpenAI(temperature=0, model="gpt-3.5-turbo-instruct")

response = index.query(query, llm=llm_replacement_model)

In [None]:
display(Markdown(response))

## Step By Step

In [20]:
from langchain.document_loaders import CSVLoader

loader = CSVLoader(file_path=file)

In [21]:
docs = loader.load()

In [22]:
docs[0]

Document(metadata={'source': 'house_prices.csv', 'row': 0}, page_content='size: 159.0\nyear: 2003\ngarage: 2\nprice: 208500')
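
The record above shows the shape `CSVLoader` produces: one `Document` per CSV row, with the row rendered as `column: value` lines and the file name and row index stored in `metadata`. A minimal standard-library sketch of that transformation follows. It assumes a plain dict stands in for `Document` and that the CSV header matches the columns shown; `rows_to_documents` is a hypothetical helper, not LangChain's implementation.

```python
import csv
import io


def rows_to_documents(csv_text, source):
    """Turn each CSV row into a dict shaped like the Document shown above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_index, row in enumerate(reader):
        # Render the row as "column: value" lines, one per column.
        page_content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {
                "page_content": page_content,
                "metadata": {"source": source, "row": row_index},
            }
        )
    return documents


# Sample data: the first row is copied from the output above,
# the second row is made up for illustration.
sample_csv = "size,year,garage,price\n159.0,2003,2,208500\n117.0,1976,2,181500\n"
sample_docs = rows_to_documents(sample_csv, source="house_prices.csv")
print(sample_docs[0]["page_content"])
```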

In [26]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

In [27]:
embed = embeddings.embed_query("Hi my name is Harrison")

In [28]:
print(len(embed))

768


In [29]:
print(embed[:5])

[0.010543029755353928, -0.025523174554109573, -0.03037361055612564, -0.03312035650014877, 0.02926238812506199]
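
Each embedding is a list of 768 floats, and two texts are compared by comparing their vectors. Cosine similarity is a common measure for this. The sketch below uses short made-up vectors rather than real embeddings, and it is an assumption that the vector store used later ranks by this particular metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (made up; the real embeddings have 768 dimensions).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```

Vectors pointing the same way score close to 1, and unrelated (orthogonal) vectors score close to 0.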


In [31]:
db = DocArrayInMemorySearch.from_documents(docs, embeddings)



In [None]:
query = "Please suggest a shirt with sunblocking"

In [None]:
docs = db.similarity_search(query)
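
`similarity_search` embeds the query and returns the stored documents whose vectors are closest to it. A rough sketch of that idea over an in-memory list is shown here. It assumes cosine similarity as the ranking metric and uses made-up 2-dimensional vectors in place of real embeddings; `top_k` is a hypothetical helper, not part of LangChain.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query_vector, store, k=4):
    """Return the k texts whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# (text, vector) pairs; the vectors are invented for illustration.
store = [
    ("row 0", [1.0, 0.0]),
    ("row 1", [0.8, 0.6]),
    ("row 2", [0.0, 1.0]),
]
results = top_k([1.0, 0.1], store, k=2)
print(results)
```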

In [None]:
len(docs)

In [None]:
docs[0]

In [None]:
retriever = db.as_retriever()

In [None]:
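# Gemini chat model imported above. The model name is an assumption;
# substitute one that is available to your API key.
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.0)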

In [None]:
qdocs = "".join(doc.page_content for doc in docs)

In [None]:
# call_as_llm is deprecated; invoke returns a message whose text is in .content.
response = llm.invoke(
    f"{qdocs} Question: Please list all your "
    "shirts with sun protection in a table in markdown and summarize each one."
).content

In [None]:
display(Markdown(response))

In [None]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever, verbose=True
)
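
The `stuff` chain type places all retrieved documents into a single prompt and makes one LLM call. A minimal sketch of that prompt assembly follows, with a hypothetical `build_stuff_prompt` helper and made-up prompt wording; LangChain's actual template differs.

```python
def build_stuff_prompt(question, documents):
    """Concatenate every retrieved document into one prompt for a single LLM call."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Retrieved rows: the first is copied from the loader output earlier in the notebook,
# the second is invented for illustration.
retrieved = [
    "size: 159.0\nyear: 2003\ngarage: 2\nprice: 208500",
    "size: 117.0\nyear: 1976\ngarage: 2\nprice: 181500",
]
prompt = build_stuff_prompt("Which house is newer?", retrieved)
print(prompt)
```

Because everything goes into one prompt, this approach is simple but bounded by the model's context window.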

In [None]:
query = "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."

In [None]:
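# Chain.run is deprecated; invoke returns a dict with the answer under "result".
response = qa_stuff.invoke({"query": query})["result"]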

In [None]:
display(Markdown(response))

In [None]:
# Build the index first, passing the embedding model that VectorstoreIndexCreator requires.
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])

In [None]:
response = index.query(query, llm=llm)

Reminder: Download your notebook to your local computer to save your work.