This section installs the required **Python packages** with pip: **langchain** (a framework for building LLM applications), **replicate** (the client for running models hosted on Replicate), **chromadb** (a vector database), **pypdf** (for working with PDFs), and **sentence-transformers** (text embedding models).

In [1]:
!pip install -q langchain
!pip install -q replicate
!pip install -q chromadb
!pip install -q pypdf
!pip install -q sentence-transformers

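Several of the import paths used below (for example `langchain.embeddings` and `langchain.vectorstores`) have moved between LangChain releases, so it helps to record which versions a run actually used. A minimal sketch using only the standard library; `installed_versions` and `PACKAGES` are illustrative names, not part of the original notebook:

```python
from importlib.metadata import PackageNotFoundError, version

# Packages installed by the pip cell above
PACKAGES = ["langchain", "replicate", "chromadb", "pypdf", "sentence-transformers"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```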

This cell imports the modules and classes used for text processing, embeddings, vector storage, conversation handling, and document storage, including components for both **conversational retrieval and question answering**.

In [2]:
import os, sys, csv

from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.docstore.document import Document


Here, the **Replicate API token** is set as an environment variable. Replicate is a platform for running machine learning models through a cloud API, and its client reads the token from `REPLICATE_API_TOKEN` to authenticate requests.

In [3]:
from getpass import getpass
os.environ["REPLICATE_API_TOKEN"] = getpass("Replicate API token: ")  # prompt for the secret instead of hard-coding it

This section defines a function that **reads** the CSV file and turns each row into a LangChain `Document` for **embedding**, concatenating the row's columns into a single `column: value, ` string. The function is then applied to "bigBasketProducts.csv".

In [4]:
# Function to read the CSV and return one Document per row to embed
def read_data(csv_file_path):
    text_data = []
    # newline='' is what the csv module expects, so quoted multi-line fields parse correctly
    with open(csv_file_path, 'r', encoding='utf-8', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            # Concatenate every "column: value" pair of the row into one text string
            text = ""
            for k, v in row.items():
                text += k + ": " + v + ", "
            text_data.append(Document(page_content=text))
    return text_data

data = read_data("bigBasketProducts.csv")
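To see exactly what `read_data` produces per row without loading LangChain or the full dataset, the sketch below applies the same concatenation to a tiny in-memory CSV. `SAMPLE_CSV` and `row_to_text` are hypothetical stand-ins (the values are illustrative). Note that `CharacterTextSplitter` is imported but never applied, so each product row ends up as exactly one document and one embedding.

```python
import csv
import io

# A tiny in-memory CSV standing in for bigBasketProducts.csv (illustrative values only)
SAMPLE_CSV = "index,product,rating\n1,Garlic Oil - Vegetarian Capsule 500 mg,4.1\n"


def row_to_text(row):
    """Flatten one CSV row into the "column: value, " string used as page_content."""
    text = ""
    for k, v in row.items():
        text += k + ": " + v + ", "
    return text


rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
texts = [row_to_text(row) for row in rows]
print(texts[0])  # index: 1, product: Garlic Oil - Vegetarian Capsule 500 mg, rating: 4.1,
```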



This cell prints the first **example document** from the prepared data, giving a glimpse of its structure and content.

In [5]:
print(data[0])

page_content='index: 1, product: Garlic Oil - Vegetarian Capsule 500 mg, category: Beauty & Hygiene, sub_category: Hair Care, brand: Sri Sri Ayurveda , sale_price: 220, market_price: 220, type: Hair Oil & Serum, rating: 4.1, description: This Product contains Garlic Oil that is known to help proper digestion, maintain proper cholesterol levels, support cardiovascular and also build immunity.  For Beauty tips, tricks & more visit https://bigbasket.blog/, '


This section initializes **Hugging Face embeddings** using the 'sentence-transformers/all-MiniLM-L6-v2' **model**, which will be used for converting text into vector embeddings.

In [6]:
from langchain.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

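An embedding model maps each text to a fixed-length vector (all-MiniLM-L6-v2 produces 384-dimensional vectors; in LangChain, `embeddings.embed_query("garlic oil")` returns one as a list of floats), and texts with related meaning end up pointing in similar directions. The sketch below illustrates that with cosine similarity on made-up 3-dimensional vectors; the vectors and names are hypothetical, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (real all-MiniLM-L6-v2 vectors have 384 dimensions)
oil = [0.9, 0.1, 0.0]
serum = [0.8, 0.2, 0.1]
chilli = [0.0, 0.1, 0.95]

print(cosine_similarity(oil, serum))   # close to 1: similar products
print(cosine_similarity(oil, chilli))  # close to 0: unrelated products
```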

Here, a **Chroma vector store** is created from the prepared documents using the embeddings above and persisted to `./chroma_db`, so it can be reloaded later without re-embedding the data. A retriever is also created that returns the `k=2` most similar documents per query; the later cells query `db2` directly rather than using it.

In [7]:
db2 = Chroma.from_documents(data, embeddings, persist_directory="./chroma_db")
db2.persist()

my_retriever = db2.as_retriever(search_kwargs={'k':2})
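`as_retriever(search_kwargs={'k': 2})` wraps the store so that each query returns its two nearest documents. The sketch below shows the idea with an exhaustive scan over toy vectors scored by cosine similarity; `top_k` and the three-entry `store` are hypothetical. Chroma itself uses an approximate nearest-neighbour index and ranks by distance rather than similarity, so this is only an illustration of top-k selection, not of Chroma's internals.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, store, k=2):
    """Return the k (text, similarity) pairs from store most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical (text, embedding) pairs standing in for the persisted Chroma collection
store = [
    ("Garlic Oil capsules", [0.9, 0.1, 0.0]),
    ("Hair Oil & Serum", [0.8, 0.2, 0.1]),
    ("Kashmiri Chilli Powder", [0.0, 0.1, 0.95]),
]

results = top_k([1.0, 0.0, 0.0], store, k=2)
print([text for text, _ in results])
```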

This cell runs a similarity search on the Chroma vector store for a sample query and returns the **three closest products together with their scores**. The score is a distance, so lower values indicate closer matches.

In [8]:
query = "its woody warm and balsamic aroma"
db2.similarity_search_with_score(query, k=3)  # fetch_k only applies to max-marginal-relevance search

[(Document(page_content="index: 665, product: Aroma Oil, category: Cleaning & Household, sub_category: Fresheners & Repellents, brand: Soulflower, sale_price: 1500, market_price: 1500, type: Air Freshener, rating: , description: Rajnigandha And Midnight Jasmine Aroma Oil: The Soulflower Rajnigandha aroma oil helps you to de-stress and create an energizing mood. Jasmine, also known as the ‘Queen of the Night’, has a narcotic aroma and is perfect to create a luxurious and seductive feel to the ambience.\xa0Temple Flower And Sandalwood Aroma Oil: Deep, richly intense floral tones of natural temple flower embrace every aspect of femininity. Traditional woody aroma of Sandalwood aroma oil enhances mental clarity and avoids confusion.\xa0Walk In The Wood And Mandarin Orange Aroma Oil: Walk in the Woods, inspired by the luscious Forests of Mahogany, builds an amazing and peaceful ambience that gives off a homely, warm feeling.\xa0Beautiful Day And Khus Aroma Oil: ‘Beautiful Day’ has a fresh a
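LangChain's Chroma wrapper documents that a lower score from `similarity_search_with_score` means a closer match, and Chroma's default metric is, to our understanding, squared Euclidean (L2) distance. The sketch below, using made-up unit-length vectors with hypothetical names, sorts results in ascending distance order and checks why distance and cosine similarity agree on the ranking for normalized embeddings: for unit vectors, ||a - b||^2 = 2 - 2*cos(a, b).

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance: 0.0 for identical vectors, larger means less similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


query = normalize([1.0, 0.2, 0.0])
docs = {
    "woody aroma oil": normalize([0.9, 0.3, 0.1]),
    "chilli powder": normalize([0.0, 0.1, 1.0]),
}

# Sort ascending: with a distance score, the best match comes first
ranked = sorted(docs.items(), key=lambda item: squared_l2(query, item[1]))
for name, vec in ranked:
    distance = squared_l2(query, vec)
    cosine = sum(x * y for x, y in zip(query, vec))  # dot product = cosine for unit vectors
    print(f"{name}: distance={distance:.3f}, cosine={cosine:.3f}")
```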

A language model (**LLM**) hosted on Replicate is initialized. The model, "**meta/llama-2-70b-chat**" (pinned to a specific version hash), is configured with a **temperature** of 0.6, a **max length** of 600, and a **top-p** of 1 for generating answers from the retrieved context.

In [9]:
from langchain.llms import Replicate

llm = Replicate(
    model="meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    model_kwargs={"temperature": 0.6, "max_length": 600, "top_p": 1},
)
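The sketch below shows how temperature and top-p are commonly applied when sampling the next token: logits are divided by the temperature before the softmax (values below 1 sharpen the distribution), and top-p keeps only the smallest set of most likely tokens whose probabilities sum to at least `top_p`. `sampling_distribution` and the logits are hypothetical, and the sampler on Replicate's side may differ in details. With `top_p=1`, as in the notebook, nucleus filtering keeps every token, so only the temperature of 0.6 reshapes the distribution.

```python
import math


def sampling_distribution(logits, temperature=1.0, top_p=1.0):
    """Apply temperature scaling, then nucleus (top-p) filtering; return token probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens; everything else gets probability 0
    kept_total = sum(probs[i] for i in kept)
    return [probs[i] / kept_total if i in kept else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
p_cool = sampling_distribution(logits, temperature=0.6, top_p=1)  # the notebook's settings
p_warm = sampling_distribution(logits, temperature=1.0, top_p=1)
print([round(p, 3) for p in p_cool])
print([round(p, 3) for p in p_warm])
```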

This section defines a function `qa` for **question answering**: it retrieves the five products most similar to the question from the Chroma vector store, **builds a prompt** containing the question and those products, and asks the language model for an answer.

In [10]:
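def qa(llm, question):
    # Retrieve the 5 products most similar to the question from the Chroma store
    context = db2.similarity_search(question, k=5)
    # Label each retrieved product so the model can tell them apart
    chunks = [f"Product {i}: " + doc.page_content for i, doc in enumerate(context)]
    prompt = """
    Suggest the best one among the products in the context and explain its perks.

    QUESTION: {question}
    Context: {context}
    """
    # Insert the labeled product texts (not the raw Document objects) into the prompt
    final_question = prompt.format(question=question, context="\n".join(chunks))
    # print(final_question)
    return llm(final_question)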
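Because `qa` mixes retrieval, prompt assembly, and the model call, the assembled prompt is hard to inspect on its own. A minimal sketch, assuming a hypothetical `build_prompt` helper and illustrative product strings (not rows from the dataset), pulls prompt assembly out as a pure function that can be checked without Chroma or Replicate:

```python
PROMPT = """
Suggest the best one among the products in the context and explain its perks.

QUESTION: {question}
Context: {context}
"""


def build_prompt(question, product_texts):
    """Label each retrieved product text and fill the template (hypothetical helper)."""
    chunks = [f"Product {i}: {text}" for i, text in enumerate(product_texts)]
    return PROMPT.format(question=question, context="\n".join(chunks))


# Illustrative page_content strings standing in for retrieved documents
products = [
    "product: Kashmiri Chilli Powder, rating: 4.4, ",
    "product: Garlic Oil, rating: 4.1, ",
]
prompt = build_prompt("suggest some spicy eatables", products)
print(prompt)
```

Values passed to `str.format` are inserted as-is and not re-parsed, so braces inside product descriptions cannot break the template; only braces in the template itself are placeholders.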

This section starts an **interactive query loop**: each prompt is answered by `qa` using the Replicate language model, an empty prompt is skipped, and `exit`, `quit`, `q`, or `f` ends the session.

In [15]:
print('---------------------------------------------------------------------------------')
print('Welcome to the BigBasket QueryAgent. You are now ready to start looking for the products you actually need')
print('---------------------------------------------------------------------------------')

while True:
    query = input("Prompt: ").strip()
    if query in ("exit", "quit", "q", "f"):
        print("*** Thank you for visiting Big Basket ***")
        break

    if query == '':
        continue

    ans = qa(llm, query)
    print("Answer: " + ans)

---------------------------------------------------------------------------------
Welcome to the BigBasket QueryAgent. You are now ready to start looking for the products you actually need
---------------------------------------------------------------------------------
Prompt: suggest some spicy eatables
Answer:  Based on the given options, I would suggest going for the "Kashmiri Chilli Powder/Menasina Pudi" by Orika. This product has a rating of 4.4 out of 5 stars and is priced at Rs. 42, which is quite reasonable considering its quality and effectiveness.

The perks of using this product include:

1. Adds a vibrant red color to dishes: Unlike regular red chili powders, Kashmiri chili powder imparts a deep red color to food, making it visually


KeyboardInterrupt: ignored
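Since the loop calls `input()` and the Replicate model directly, it can only be exercised by hand. One way to make its control flow testable is to drive it from an iterable of inputs and an injected answer function. `run_session` and `EXIT_COMMANDS` are hypothetical names for this sketch; in the notebook, `answer_fn` would be `lambda q: qa(llm, q)`.

```python
EXIT_COMMANDS = {"exit", "quit", "q", "f"}


def run_session(inputs, answer_fn):
    """Replay the prompt loop over an iterable of inputs; return the lines it would print."""
    transcript = []
    for raw in inputs:
        query = raw.strip()
        if query in EXIT_COMMANDS:
            transcript.append("*** Thank you for visiting Big Basket ***")
            break
        if not query:
            continue  # skip empty prompts, as the notebook loop does
        transcript.append("Answer: " + answer_fn(query))
    return transcript


# A fake answer function stands in for the LLM call
lines = run_session(["", "suggest some spicy eatables", "q", "ignored"], lambda q: f"(answer to {q!r})")
for line in lines:
    print(line)
```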