# **Q&A Over Documents**

A typical example is a tool that lets you query a product catalog for items of interest.

In [16]:
## ! pip install langchain langchain_groq langchain_community chromadb sentence_transformers docarray

Note: LLMs do not always produce the same results. When you run the code in your notebook, you may get slightly different answers than those in the video.

In [2]:
# Vector stores and document loaders now live in langchain_community
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain.indexes import VectorstoreIndexCreator
from langchain_community.document_loaders import CSVLoader
from IPython.display import display, Markdown

file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path = file)

## **Embeddings:**
Embeddings are numerical representations of text, images, or other data types in a continuous vector space. Machine learning models produce them to capture the semantic meaning of the input, so that inputs with similar meaning get similar embeddings.
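
To make "similar inputs have similar embeddings" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embedding vectors. The three-dimensional vectors are made up for illustration and do not come from any model; real embedding models return vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumed values, not produced by a real model).
sun_shirt = [0.9, 0.1, 0.0]
uv_tee = [0.8, 0.2, 0.1]
snow_boots = [0.0, 0.1, 0.9]

sim_related = cosine_similarity(sun_shirt, uv_tee)
sim_unrelated = cosine_similarity(sun_shirt, snow_boots)

print(f"sun shirt vs. UV tee:     {sim_related:.3f}")
print(f"sun shirt vs. snow boots: {sim_unrelated:.3f}")
```

The two sun-related items score close to 1, while the unrelated pair scores close to 0. A vector store relies on this kind of ranking to decide which documents to return.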

### `VectorstoreIndexCreator`
The `VectorstoreIndexCreator` in LangChain is a utility that simplifies the process of creating, managing, and using vector-based indices for retrieving documents or information. It leverages vector embeddings to represent documents and queries, facilitating efficient and effective search and retrieval tasks.

In [3]:
from langchain_community.embeddings import HuggingFaceEmbeddings

index = VectorstoreIndexCreator(
    embedding = HuggingFaceEmbeddings(),
    vectorstore_cls = DocArrayInMemorySearch
).from_loaders([loader])



In [4]:
from langchain_groq import ChatGroq
import os

os.environ['GROQ_API_KEY'] = 'YOUR-API-KEY'

In [5]:
llm = ChatGroq(
    model = "llama3-70b-8192",
    temperature = 0.2
)

query = "Please list all your shirts with sun protection \
in a table in markdown and summarize each one."

In [6]:
response = index.query(query,
                       llm = llm)

display(Markdown(response))

Here is the list of shirts with sun protection in a table in markdown:

| **Shirt Name** | **Description** | **Fabric** | **UPF Rating** | **Features** |
| --- | --- | --- | --- | --- |
| Women's Tropical Tee, Sleeveless | Slightly fitted, sleeveless button-up shirt with SunSmart™ protection | 71% nylon, 29% polyester | UPF 50+ | Wrinkle resistant, low-profile pockets, front and back cape venting, two front pockets, tool tabs, and eyewear loop |
| Sun Shield Shirt | High-performance sun shirt with UPF 50+ protection | 78% nylon, 22% Lycra Xtra Life fiber | UPF 50+ | Wicks moisture, fits comfortably over swimsuit, abrasion resistant |
| Men's Plaid Tropic Shirt, Short-Sleeve | Ultracomfortable sun protection shirt with UPF 50+ coverage | 52% polyester, 48% nylon | UPF 50+ | Wrinkle-free, high-performance fabric, front and back cape venting, two front bellows pockets |
| Tropical Breeze Shirt | Lightweight, breathable long-sleeve shirt with superior SunSmart™ protection | 71% nylon, 29% polyester | UPF 50+ | Wrinkle-resistant, moisture-wicking fabric, relaxed fit, innovative SunSmart technology |

Each of these shirts offers UPF 50+ protection, which means they block 98% of the sun's harmful UV rays. They also have various features such as wrinkle resistance, moisture-wicking fabric, and comfortable fits.

## **VectorStore:**
Vector databases are specialized databases designed to store, index, and query high-dimensional vectors (embeddings).

### `DocArrayInMemorySearch`
`DocArrayInMemorySearch` is an in-memory solution provided by the DocArray library for storing and retrieving documents based on their vector embeddings.
It is designed for fast, real-time search operations by keeping all data in memory.
- **Extremely fast read and write operations due to in-memory storage.**
- **Limited by the amount of available RAM, making it less suitable for very large datasets.**
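
The trade-off above can be illustrated with a rough sketch of an in-memory store: every vector lives in a Python list, and a search scans all of them. `TinyInMemoryStore` is a hypothetical stand-in written for this example and is not the DocArray API; the vectors and the boots item are invented.

```python
import math

class TinyInMemoryStore:
    """Hypothetical in-memory vector store (illustration only)."""

    def __init__(self):
        self._items = []  # (vector, text) pairs, all held in RAM

    def add(self, vector, text):
        self._items.append((vector, text))

    def similarity_search(self, query_vector, k=4):
        """Return the k texts whose vectors are most similar to query_vector."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))

        # Brute-force scan: score every stored vector, then keep the top k.
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]

store = TinyInMemoryStore()
store.add([0.9, 0.1, 0.0], "Sun Shield Shirt")
store.add([0.8, 0.2, 0.1], "Tropical Breeze Shirt")
store.add([0.0, 0.1, 0.9], "Insulated Snow Boots")  # made-up item

results = store.similarity_search([1.0, 0.0, 0.0], k=2)
print(results)
```

Because nothing is written to disk, the store is fast but disappears when the process ends, and its size is bounded by available memory.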


### Vector Databases (Chroma)
- **Can handle much larger datasets compared to in-memory solutions by utilizing disk storage.**

In [8]:
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings()

db = DocArrayInMemorySearch.from_documents(
    loader.load(),
    embeddings
)



In [9]:
query = "Please suggest a shirt with sunblocking"

docs = db.similarity_search(query)
print(len(docs))

4


In [10]:
print(docs[0].page_content)

: 255
name: Sun Shield Shirt by
description: "Block the sun, not the fun – our high-performance sun shirt is guaranteed to protect from harmful UV rays. 

Size & Fit: Slightly Fitted: Softly shapes the body. Falls at hip.

Fabric & Care: 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Handwash, line dry.

Additional Features: Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. Imported.

Sun Protection That Won't Wear Off
Our high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun's harmful rays. This fabric is recommended by The Skin Cancer Foundation as an effective UV protectant.


## **Question Answering:**

After retrieving the most similar documents from the database, we feed them to an LLM so it can answer the question based on their content.

In [11]:
llm = ChatGroq(
    model = "llama3-70b-8192",
    temperature = 0.2
)

In [12]:
# Concatenate the page content of all documents returned by the similarity search
qdocs = "".join(doc.page_content for doc in docs)

response = llm.call_as_llm(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.")

display(Markdown(response))


Here is the list of shirts with sun protection in a table in markdown:

| **Shirt Name** | **Description** | **Fabric** | **Sun Protection** |
| --- | --- | --- | --- |
| Sun Shield Shirt | High-performance sun shirt for outdoor activities | 78% nylon, 22% Lycra Xtra Life fiber | UPF 50+, blocks 98% of harmful UV rays |
| Women's Tropical Tee | Sleeveless button-up shirt with SunSmart™ protection | Shell: 71% nylon, 29% polyester, Cape lining: 100% polyester | UPF 50+, blocks 98% of harmful UV rays |
| Girls' Beachside Breeze Shirt | Rash guard with built-in UPF 50+ protection for watersports and beach activities | 80% nylon, 20% Lycra Xtra Life | UPF 50+, blocks 98% of harmful UV rays |
| Sunrise Tee | Lightweight, high-performance shirt for hot weather activities | Shell: 71% nylon, 29% polyester, Cape lining: 100% polyester | UPF 50+, blocks 98% of harmful UV rays |

Summary:

* All four shirts offer high-performance sun protection with UPF 50+ rating, blocking 98% of harmful UV rays.
* The Sun Shield Shirt and Girls' Beachside Breeze Shirt are designed for outdoor activities and watersports, respectively, with durable and seawater-resistant fabrics.
* The Women's Tropical Tee and Sunrise Tee are designed for everyday wear, with a focus on comfort, wrinkle-resistance, and quick-drying fabrics.
* All shirts are made with high-quality fabrics that provide excellent sun protection, making them ideal for outdoor enthusiasts and individuals who spend extended periods in the sun.

### **Using Chains:**

The whole process above (retrieve the documents, build the prompt, call the LLM) can be done with a single LangChain chain:

In [13]:
from langchain.chains import RetrievalQA

qa_stuff = RetrievalQA.from_chain_type(
    llm = llm,
    chain_type = "stuff",
    retriever = db.as_retriever(),
    verbose = True
)

query =  "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."
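
As a rough sketch of what `chain_type = "stuff"` means: the chain retrieves the relevant documents, places ("stuffs") all of them into one prompt, and makes a single LLM call. The function names, the prompt wording, and the fake retriever and LLM below are assumptions made for illustration; they are not LangChain's actual template or API.

```python
def stuff_answer(question, retrieve, llm):
    """Retrieve documents, stuff them all into one prompt, call the LLM once."""
    docs = retrieve(question)
    context = "\n\n".join(docs)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
    return llm(prompt)

# Hypothetical stand-ins so the sketch runs without a model or an API key.
def fake_retrieve(question):
    return ["name: Sun Shield Shirt", "name: Tropical Breeze Shirt"]

def fake_llm(prompt):
    # Echo the prompt back so we can inspect what a real model would receive.
    return prompt

answer = stuff_answer("Which shirts have sun protection?", fake_retrieve, fake_llm)
print(answer)
```

Since every retrieved document goes into the same prompt, this approach is simple and needs only one call, but the combined text has to fit within the model's context window.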

In [14]:
response = qa_stuff.run(query)

> Entering new RetrievalQA chain...

> Finished chain.


In [15]:
display(Markdown(response))

Here is the list of shirts with sun protection in a table in markdown:

| **Shirt Name** | **Description** | **Fabric** | **Sun Protection** |
| --- | --- | --- | --- |
| Women's Tropical Tee, Sleeveless | Slightly fitted, sleeveless shirt with SunSmart™ protection | 71% nylon, 29% polyester | UPF 50+, blocks 98% of UV rays |
| Sunrise Tee | Lightweight, high-performance shirt with built-in sun protection | 71% nylon, 29% polyester | UPF 50+, blocks 98% of UV rays |
| Sun Shield Shirt | High-performance sun shirt with UPF 50+ protection | 78% nylon, 22% Lycra Xtra Life fiber | UPF 50+, blocks 98% of UV rays |
| Tropical Breeze Shirt | Lightweight, breathable long-sleeve shirt with SunSmart™ protection | 71% nylon, 29% polyester | UPF 50+, blocks 98% of UV rays |

Here's a brief summary of each shirt:

* **Women's Tropical Tee, Sleeveless**: A sleeveless shirt with a slightly fitted design, offering SunSmart™ protection with UPF 50+ rating.
* **Sunrise Tee**: A lightweight, high-performance shirt designed for hot weather, with built-in sun protection and a UPF 50+ rating.
* **Sun Shield Shirt**: A high-performance sun shirt with a UPF 50+ rating, designed to provide superior sun protection.
* **Tropical Breeze Shirt**: A lightweight, breathable long-sleeve shirt with SunSmart™ protection, offering UPF 50+ coverage and a relaxed fit.