# LangChain: Q&A over Documents

Original source: [LangChain for LLM Application Development](https://learn.deeplearning.ai/langchain/lesson/5/question-and-answer)

In [20]:
# !pip install --upgrade langchain openai docarray

In [21]:
import pandas as pd
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.indexes import VectorstoreIndexCreator
from IPython.display import display, Markdown

In [46]:
# OpenAI API key (replace the placeholder below with your own key)
api_key = '?'

In [32]:
# File references
src_file = 'dataset/winemag-data-130k-v2.csv'
dest_file = 'dataset/wine_100.csv'
df = pd.read_csv(src_file)[['country', 'title', 'description', 'variety', 'winery']].iloc[:100]
df.to_csv(dest_file, index=False)
df.head()

|   | country | title | description | variety | winery |
|---|---------|-------|-------------|---------|--------|
| 0 | Italy | Nicosia 2013 Vulkà Bianco (Etna) | Aromas include tropical fruit, broom, brimston... | White Blend | Nicosia |
| 1 | Portugal | Quinta dos Avidagos 2011 Avidagos Red (Douro) | This is ripe and fruity, a wine that is smooth... | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Rainstorm 2013 Pinot Gris (Willamette Valley) | Tart and snappy, the flavors of lime flesh and... | Pinot Gris | Rainstorm |
| 3 | US | St. Julian 2013 Reserve Late Harvest Riesling ... | Pineapple rind, lemon pith and orange blossom ... | Riesling | St. Julian |
| 4 | US | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Much like the regular bottling from 2012, this... | Pinot Noir | Sweet Cheeks |


In [33]:
# Initialize Vector Store Index
llm = OpenAI(temperature=0, openai_api_key=api_key)
loader = CSVLoader(file_path=dest_file)
embedding = OpenAIEmbeddings(openai_api_key=api_key)
index = VectorstoreIndexCreator(
    embedding=embedding,
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])
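
As a rough illustration of what the loader contributes to the index: `CSVLoader` yields one document per CSV row, and to my understanding the text of each document is the row written out as `column: value` lines. The sketch below imitates that step with the standard `csv` module only. `SAMPLE_CSV` and `rows_to_texts` are hypothetical names introduced here, and the sample omits the `description` column for brevity.

```python
import csv
import io

# Hypothetical two-row sample in the same column layout as wine_100.csv
# (description column left out to keep the example short).
SAMPLE_CSV = """country,title,variety,winery
Italy,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
US,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
"""


def rows_to_texts(csv_text):
    """Turn each CSV row into one text block of 'column: value' lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["\n".join(f"{key}: {value}" for key, value in row.items())
            for row in reader]


texts = rows_to_texts(SAMPLE_CSV)
print(texts[0])
```

Each of these text blocks is what gets embedded and stored, so a query is matched against whole rows rather than individual cells.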

In [48]:
# Query
query = "Suggest some italian wines, format result as a table in markdown with columns title, winery, variety."
response = index.query(query, llm=llm)
display(Markdown(response))

| Title | Winery | Variety |
|-------|--------|---------|
| Feudi di San Marzano 2011 I Tratturi Primitivo (Puglia) | Feudi di San Marzano | Primitivo |
| Fattoria Sardi 2015 Rosato (Toscana) | Fattoria Sardi | Rosato |
| Duca di Salaparuta 2010 Calanìca Nero d'Avola-Merlot Red (Sicilia) | Duca di Salaparuta | Red Blend |
| Cantine di Dolianova 2010 Dolia (Monica di Sardegna) | Cantine di Dolianova | Monica |
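
Before the LLM sees anything, `index.query` retrieves the stored rows that are most similar to the query. A minimal sketch of that retrieval idea follows, assuming cosine similarity over vectors. The `embed` function here is a toy bag-of-words stand-in, not the OpenAI embedding model, and `embed`, `cosine`, and `top_k` are hypothetical helper names.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, texts, k=2):
    """Return the k texts most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(query_vec, embed(t)), reverse=True)
    return ranked[:k]


docs = [
    "country: Italy variety: White Blend",
    "country: US variety: Pinot Gris",
    "country: Italy variety: Primitivo",
]
print(top_k("wines from Italy", docs, k=2))
```

A real embedding model captures meaning rather than exact word overlap, which is why the notebook's queries can match rows that never repeat the query's wording.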

## Using RetrievalQA chain and ChatOpenAI model

In [45]:
# initialize LLM
llm = ChatOpenAI(temperature=0, openai_api_key=api_key)

# Initialize documents
file = 'dataset/wine_100.csv'
loader = CSVLoader(file_path=file)
docs = loader.load()

# initialize embeddings
embedding = OpenAIEmbeddings(openai_api_key=api_key)

# initialize the in-memory vector store
db = DocArrayInMemorySearch.from_documents(
    docs,
    embedding
)

# initialize retriever
retriever = db.as_retriever()

# initialize chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # alternatives: map_reduce, refine, map_rerank
    retriever=retriever,
    verbose=True
)
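
With `chain_type="stuff"`, all retrieved documents are placed into a single prompt and the LLM is called once. The sketch below shows only that idea. The template wording and the `build_stuff_prompt` helper are assumptions made for illustration, not LangChain's actual prompt.

```python
def build_stuff_prompt(question, doc_texts):
    """Concatenate every retrieved document into one prompt (the 'stuff' idea)."""
    context = "\n\n".join(doc_texts)
    # Hypothetical template; the real chain uses its own wording.
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = [
    "country: Argentina\nvariety: Malbec",
    "country: Chile\nvariety: Merlot",
]
prompt = build_stuff_prompt("Suggest some wines from Argentina and Chile.", retrieved)
print(prompt)
```

Because everything goes into one call, this approach is simple and cheap but limited by the model's context window. The other chain types listed in the comment process the documents in separate calls instead.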

In [52]:
# Query
query = "Suggest some wines from Argentina and Chile, format result as a table in markdown with columns; country, title, winery, variety."
response = qa_chain.run(query)
display(Markdown(response))



> Entering new RetrievalQA chain...

> Finished chain.


| Country   | Title                                      | Winery         | Variety    |
|-----------|--------------------------------------------|----------------|------------|
| Argentina | Gaucho Andino 2011 Winemaker Selection Malbec (Mendoza) | Gaucho Andino  | Malbec     |
| Chile     | Sundance 2011 Merlot (Maule Valley)         | Sundance       | Merlot     |
| Argentina | Felix Lavaque 2010 Felix Malbec (Cafayate)  | Felix Lavaque  | Malbec     |
| Chile     | Tres Palacios 2011 Reserve Pinot Noir (Maipo Valley) | Tres Palacios  | Pinot Noir |

In [54]:
# Query
query = "Suggest me some wines described with a fruit aroma, format answer as a table in markdown with columns; country, title, description and variety."
response = qa_chain.run(query)
display(Markdown(response))



> Entering new RetrievalQA chain...

> Finished chain.


| Country | Title | Description | Variety |
|---------|-------|-------------|---------|
| France  | Château de Sours 2011 La Fleur d'Amélie (Bordeaux Blanc) | Fruity and lightly herbaceous, this has fine textured acidity along with a pink grapefruit flavor. The wine is bright and easy, and it will be ready to drink in a few months. | Bordeaux-style White Blend |
| US      | Quiévremont 2012 Meritage (Virginia) | Red fruit aromas pervade on the nose, with cigar box and menthol notes riding in the back. The palate is slightly restrained on entry, but opens up to riper notes of cherry and plum specked with crushed pepper. This blend of Merlot, Cabernet Sauvignon and Cabernet Franc is approachable now and ready to be enjoyed. | Meritage |
| US      | Napa Cellars 2014 Classic Zinfandel (Napa Valley) | A healthy addition of 13% Petite Sirah provides added weight and intensity to this wine, a soft, supple and richly conceived combination of smoky black fruit and mocha. | Zinfandel |

In [55]:
# Query
query = "Suggest me a spicy wine."
response = qa_chain.run(query)
print(response)



> Entering new RetrievalQA chain...

> Finished chain.
I would suggest trying the Cocobon 2014 Red (California). It is described as very deep in color and spicy-smoky in flavor, with aromas like grilled beef and spicy flavors like cardamom and smoke.


In [56]:
# Query
query = "Recommend me a wine which combines well with seafood."
response = qa_chain.run(query)
print(response)



> Entering new RetrievalQA chain...

> Finished chain.
Based on the descriptions provided, I would recommend the Tasca d'Almerita 2011 Sallier de la Tour Inzolia from Italy. It is described as spicy, fresh, and clean, which would pair well with fried seafood or spaghetti con vongole.


In [57]:
# Query
query = "Recommend me a nice wine from South America."
response = qa_chain.run(query)
print(response)



> Entering new RetrievalQA chain...

> Finished chain.
Based on the given context, I would recommend the Gaucho Andino 2011 Winemaker Selection Malbec from Argentina. It has raw black-cherry aromas, a juicy feel, and a flavor profile driven by dark-berry fruits.
