# **Project 2: LangChain RAG Project**


**Project Description:** This project builds a smart shopping assistant using Retrieval-Augmented Generation (RAG). It uses the LangChain framework and Google Generative AI to answer questions about a product dataset. The system loads a CSV file of product information (name, category, price, availability, and description), embeds each row, and stores the vectors in a Pinecone vector database. Users can then ask natural-language questions. The system retrieves the relevant product records and passes them to a Gemini model, which answers from that context.

### **1. Installing LangChain Core and Community Packages**

In [7]:
%pip install -qU langchain langchain_community


Installs or upgrades the LangChain core and community packages.

### **2. Setting Up Google API Key**

In [3]:
import getpass
import os

if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("GOOGLE_API_KEY")

GOOGLE_API_KEY··········


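Prompts for the Google API key if `GOOGLE_API_KEY` is not already set in the environment. The Google Generative AI integrations read the key from this variable.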

### **3. Loading and Splitting CSV Data with LangChain**

In [8]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from langchain_text_splitters import CharacterTextSplitter

# Step 1: Load the CSV file
loader = CSVLoader(file_path='/content/smart_shopping_assistant.csv')
documents = loader.load()

# Step 2: Initialize the CharacterTextSplitter
chunk_size = 500
chunk_overlap = 100
text_splitter = CharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

# Step 3: Split the documents into chunks
chunks = text_splitter.split_documents(documents)

# Step 4: Check the output
print(f"Number of chunks: {len(chunks)}")
print(f"Type of first chunk: {type(chunks[0])}")

print(f"Content of first chunk: {chunks[0]}")


Number of chunks: 1000
Type of first chunk: <class 'langchain_core.documents.base.Document'>
Content of first chunk: page_content='Product Name: Product 1
Category: Electronics
Price: $60
Availability: In Stock
Description: Description for Product 1' metadata={'source': '/content/smart_shopping_assistant.csv', 'row': 0}


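Loads the CSV file (`CSVLoader` produces one `Document` per row, with each column written as a `name: value` line), splits the documents into chunks targeting 500 characters with a 100-character overlap, and prints information about the chunks. Because every row is far shorter than 500 characters, each row stays a single chunk.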
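For intuition, the stdlib sketch below approximates what `CSVLoader` produces: one `column: value` line per column, plus `source` and `row` metadata, matching the first chunk printed above. It is not LangChain's implementation. The column headers are inferred from the printed output, plain dicts stand in for `Document` objects, and `csv_to_documents` is a hypothetical helper.

```python
import csv
import io


def row_to_page_content(row: dict[str, str]) -> str:
    # One "column: value" line per CSV column, in header order
    return "\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items())


def csv_to_documents(csv_text: str, source: str) -> list[dict]:
    # Hypothetical helper: one document (as a plain dict) per data row
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"page_content": row_to_page_content(row), "metadata": {"source": source, "row": i}}
        for i, row in enumerate(reader)
    ]


# Assumed header names, inferred from the printed chunk above
sample_csv = (
    "Product Name,Category,Price,Availability,Description\n"
    "Product 1,Electronics,$60,In Stock,Description for Product 1\n"
)
docs = csv_to_documents(sample_csv, "/content/smart_shopping_assistant.csv")
print(docs[0]["page_content"])
# A row this short is well under chunk_size=500, so the splitter leaves it whole
print(len(docs[0]["page_content"]) <= 500)  # True
```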

### **4. Installing Pinecone Integration for LangChain**

In [9]:
%pip install -qU langchain-pinecone pinecone-notebooks


### **5. Setting Up Pinecone API Key**

In [10]:
import getpass
import os

from pinecone import Pinecone, ServerlessSpec

if not os.getenv("PINECONE_API_KEY"):
    os.environ["PINECONE_API_KEY"] = getpass.getpass("PINECONE_API_KEY")

pinecone_api_key = os.environ.get("PINECONE_API_KEY")

pc = Pinecone(api_key=pinecone_api_key)

PINECONE_API_KEY··········


Prompts for the Pinecone API key if `PINECONE_API_KEY` is not already set, then creates a `Pinecone` client with it.

### **6. Creating Pinecone Index**

In [11]:
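import time

index_name = "rag-project-dataset-09"

# Create the index only if it does not exist yet, so the cell can be re-run safely
existing_indexes = [index_info["name"] for index_info in pc.list_indexes()]
if index_name not in existing_indexes:
    pc.create_index(
        name=index_name,
        dimension=768,  # must match the embedding model's output size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
    # Wait until the new index is ready to accept upserts
    while not pc.describe_index(index_name).status["ready"]:
        time.sleep(1)

index = pc.Index(index_name)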

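Creates a serverless Pinecone index (AWS, `us-east-1`) that stores 768-dimensional vectors and compares them by cosine similarity. The dimension must match the output size of the embedding model used later.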
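The two index settings can be illustrated in plain Python. `metric="cosine"` means matches are ranked by cosine similarity, and every upserted vector must have exactly `dimension` components. The sketch below is for intuition only, because Pinecone does this work server-side. `check_dimension` is a hypothetical helper.

```python
import math

DIMENSION = 768  # must equal the embedding model's output size


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(theta) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def check_dimension(vector: list[float], dimension: int = DIMENSION) -> None:
    # The index rejects vectors whose length differs from its dimension
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} dimensions, got {len(vector)}")


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
check_dimension([0.0] * 768)  # passes silently
```

Cosine similarity ignores vector length and compares only direction, so `[1.0, 2.0]` and `[2.0, 4.0]` score (up to rounding) 1.0.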

### **7. Accessing Pinecone Index**

In [12]:
index

<pinecone.data.index.Index at 0x7e25be03b4c0>

Displays the `Index` handle returned by `pc.Index(index_name)` in the previous step. Later cells use it to write and query vectors.

### **8. Installing Google Generative AI Embeddings**

In [13]:
%pip install --upgrade --quiet langchain-google-genai


Installs or upgrades `langchain-google-genai`, which provides both the Gemini embedding model and the chat model integrations used below.

### **9. Initializing Google Generative AI Embeddings**

In [14]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings


embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")


Initializes the Google embedding model `models/embedding-001`, which converts text into 768-dimensional vectors, matching the index dimension.
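`GoogleGenerativeAIEmbeddings` follows LangChain's embedding interface: `embed_documents` maps a list of texts to a list of vectors, and `embed_query` maps one text to one vector. For offline experiments you could write a stand-in with the same two methods. The `HashingEmbeddings` class below is a hypothetical toy (a hashed, normalized bag of words), not a Google model, and its vectors capture far less meaning than real embeddings.

```python
import math
import zlib


class HashingEmbeddings:
    """Toy stand-in exposing embed_documents/embed_query (hypothetical, offline only)."""

    def __init__(self, dimension: int = 768):
        self.dimension = dimension

    def _embed(self, text: str) -> list[float]:
        vector = [0.0] * self.dimension
        for token in text.lower().split():
            # crc32 is deterministic across runs, unlike Python's built-in hash() for str
            vector[zlib.crc32(token.encode("utf-8")) % self.dimension] += 1.0
        norm = math.sqrt(sum(v * v for v in vector))
        # L2-normalize so cosine similarity reduces to a dot product
        return [v / norm for v in vector] if norm else vector

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(text) for text in texts]

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)


toy = HashingEmbeddings()
vec = toy.embed_query("books category?")
print(len(vec))  # 768
```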

### **10. Creating Pinecone Vector Store**

In [15]:
from langchain_pinecone import PineconeVectorStore

vector_store = PineconeVectorStore(index=index, embedding=embeddings)

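Wraps the Pinecone index and the embedding model in a LangChain `PineconeVectorStore`, which embeds text automatically when documents are added and when queries are run.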

### **11. Adding Documents to Vector Store**

In [16]:
from uuid import uuid4

# Index the split chunks (here each chunk is one full CSV row)
uuids = [str(uuid4()) for _ in range(len(chunks))]

vector_store.add_documents(documents=chunks, ids=uuids)

['db44c8ef-1980-4380-a89d-190a0af981af',
 'd2b4250c-7a2c-4384-afb3-5e39d54901a6',
 '9ae4ea92-19d9-427f-b1f5-58d9917815ba',
 '51ea50f0-b8b2-4a83-9691-1a60d2bb9a73',
 '7d2bcbef-3e9e-443c-87de-c081d22aab51',
 '8447b404-b6d4-46bc-8b8a-527f809483cb',
 'b19ec790-53f2-469b-9ef3-6c0030b8a0d2',
 'aa7fbd11-7bd7-49c1-8913-e936205376b8',
 '04757d13-08a9-4aa6-863c-d585ad15ab87',
 'd82b734c-4602-4030-8bee-84b7d52e00c6',
 '7f1f0c6a-49e7-4637-ae17-d159037bdcb5',
 'bf3ebf12-891d-4533-a670-aa8e5d9a967c',
 '269a81cc-6029-4b32-b288-8a5d121e2eef',
 '5ceeecd8-7070-449e-9f3e-e5cd5074815c',
 'd0faf6e8-f660-4109-9832-60125af33c93',
 'f79d0215-1d64-4c8f-9702-3fd3c5131d93',
 'f88dac60-7ee6-4138-afd4-29b7a91119c7',
 'b681a8ff-9597-4e7f-8842-dfa2b6df5297',
 'fce74ed2-2121-4528-9a7b-5d8ead620654',
 'c2171124-e9d9-422c-bd30-2c1e366ba75e',
 '98735f96-49b9-4555-bb9f-f1ccba33dec9',
 'ee8599bc-6eb0-4065-a2de-f66f91fd9639',
 '0c1f8c0f-dae9-4808-9676-64da2fe94b40',
 '155bd788-75d5-4e4e-a955-7e96ecb8230d',
 '819d640a-6d96-

Embeds each chunk and upserts it into the Pinecone index under a randomly generated UUID. The call returns the list of IDs (output truncated).
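`uuid4()` generates fresh random IDs, so re-running this cell upserts a second copy of every row instead of replacing the first. Pinecone overwrites a record when a vector with the same ID is upserted again, so stable IDs make the cell safe to re-run. The sketch below derives IDs with `uuid5`, assuming that `source` plus `row` uniquely identifies a record. The namespace choice and the `stable_id` helper are illustrative.

```python
import uuid


def stable_id(source: str, row: int) -> str:
    # uuid5 hashes the name deterministically: same (source, row) -> same ID on every run
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}#row={row}"))


source = "/content/smart_shopping_assistant.csv"
ids = [stable_id(source, row) for row in range(3)]
print(ids == [stable_id(source, row) for row in range(3)])  # True
```

In the notebook this could replace the random IDs with something like `[stable_id(d.metadata["source"], d.metadata["row"]) for d in chunks]`.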

### **12. Performing Similarity Search**

In [17]:
results = vector_store.similarity_search(
    "books category?",
    k=1001,
)
for res in results:
    print(f"* {res.page_content}")



* Product Name: Product 144
Category: Books
Price: $1490
Availability: Out of Stock
Description: Description for Product 144
* Product Name: Product 199
Category: Books
Price: $2040
Availability: In Stock
Description: Description for Product 199
* Product Name: Product 104
Category: Books
Price: $1090
Availability: Out of Stock
Description: Description for Product 104
* Product Name: Product 999
Category: Books
Price: $10040
Availability: In Stock
Description: Description for Product 999
* Product Name: Product 204
Category: Books
Price: $2090
Availability: Out of Stock
Description: Description for Product 204
* Product Name: Product 459
Category: Books
Price: $4640
Availability: In Stock
Description: Description for Product 459
* Product Name: Product 604
Category: Books
Price: $6090
Availability: Out of Stock
Description: Description for Product 604
* Product Name: Product 519
Category: Books
Price: $5240
Availability: In Stock
Description: Description for Product 519
* Product Name:

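Runs a similarity search for the query `"books category?"` and prints each result. Because `k=1001` exceeds the 1000 stored vectors, the search effectively returns the whole dataset ranked by similarity to the query. The top-ranked results shown are all Books products (output truncated).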
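Conceptually, `similarity_search` embeds the query, scores stored vectors against it, and returns the `k` best. The in-memory sketch below shows that ranking as a brute-force scan with `heapq.nlargest`. Pinecone performs the search server-side with its own indexing, so this is for intuition only. The 2-D vectors and the `top_k` helper are made up for illustration.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int) -> list[str]:
    # Return the texts of the k stored vectors most similar to the query, best first.
    # If k exceeds len(store), every item comes back, ranked (as with k=1001 above).
    best = heapq.nlargest(k, store, key=lambda item: cosine(query_vec, item[1]))
    return [text for text, _ in best]


store = [
    ("Category: Books", [1.0, 0.1]),
    ("Category: Electronics", [0.1, 1.0]),
    ("Category: Books (used)", [0.9, 0.3]),
]
print(top_k([1.0, 0.0], store, k=2))  # ['Category: Books', 'Category: Books (used)']
```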

### **13. Creating a Retriever**

In [18]:
retriever = vector_store.as_retriever(search_kwargs={"k": 1000})

print(type(retriever))

<class 'langchain_core.vectorstores.base.VectorStoreRetriever'>


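Wraps the vector store as a retriever that returns the 1000 most similar chunks for each query. Since the index holds 1000 chunks, this passes essentially the entire catalog to the model as context. That is what makes dataset-wide questions possible, but it also makes every prompt large.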

### **14. Defining Chat Prompt Template**

In [19]:
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate

In [20]:
chat_template = ChatPromptTemplate.from_messages([
    # System Message Prompt Template
    SystemMessage(content="""You are a Helpful AI Bot.
                  Given a context and question from user,
                  you should answer based on the given context."""),
    # Human Message Prompt Template
    HumanMessagePromptTemplate.from_template("""Answer the question based on the given context.
    Context: {context}
    Question: {question}
    Answer: """)
])


Defines the chat prompt. It contains a fixed system message and a human message template with `{context}` and `{question}` placeholders.
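When the chain runs, the template substitutes the retrieved text for `{context}` and the user's question for `{question}`. Below is a rough plain-Python equivalent of the human message using `str.format`, which matches the f-string-style placeholders LangChain templates use by default. The original template's leading indentation is omitted, and the context and question are made-up examples.

```python
HUMAN_TEMPLATE = """Answer the question based on the given context.
Context: {context}
Question: {question}
Answer: """

# Example values; in the chain, context comes from the retriever
context = "Product Name: Product 1\nCategory: Electronics\nPrice: $60\nAvailability: In Stock"
question = "Is Product 1 in stock?"

human_message = HUMAN_TEMPLATE.format(context=context, question=question)
print(human_message)
```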

### **15. Initializing Output Parser**

In [21]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

Initializes `StrOutputParser`, which extracts the text content of the model's reply as a plain string.

### **16. Initializing Google Generative AI Chat Model**

In [23]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-exp",
    temperature=0.7)

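Initializes the Gemini chat model `gemini-2.0-flash-exp` with `temperature=0.7`. Lower temperatures make responses more deterministic, which can suit factual lookups like these.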

### **17. Building RAG Chain**

In [24]:
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | chat_template
    | llm
    | output_parser
)

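Builds the Retrieval-Augmented Generation (RAG) chain with LangChain's pipe syntax. The input question goes to two places:

- the retriever, whose documents `format_docs` joins into one context string
- `RunnablePassthrough()`, which forwards the question unchanged

The resulting `context`/`question` dict fills the prompt template. The prompt is then sent to the model, and `StrOutputParser` returns the reply as a string.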
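The sketch below mirrors the chain's data flow with plain functions, as a way to read the LCEL pipeline step by step. `fake_retriever`, `fake_llm`, and the `Doc` class are hypothetical stubs, so the example runs without LangChain or API keys. The stub model simply echoes its prompt.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    page_content: str


def fake_retriever(question: str) -> list[Doc]:
    # Stub: a real retriever would run a vector similarity search here
    return [Doc("Product Name: Product 1\nPrice: $60\nAvailability: In Stock")]


def format_docs(docs: list[Doc]) -> str:
    return "\n\n".join(doc.page_content for doc in docs)


def fill_prompt(inputs: dict) -> str:
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}\nAnswer: "


def fake_llm(prompt: str) -> str:
    # Stub: echoes the prompt instead of calling Gemini
    return f"ECHO: {prompt}"


def rag_chain_invoke(question: str) -> str:
    # Step 1: the leading dict runs both branches on the same input
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    # Steps 2-4: prompt -> model -> output parser (the stub already returns a str)
    return fake_llm(fill_prompt(inputs))


print(rag_chain_invoke("Is Product 1 in stock?"))
```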

### **18. Fetching Google API Key**

In [25]:
from google.colab import userdata

# Read the secret named 'GOOGLE_API_KEY' from Colab's secrets manager
GOOGLE_API_KEY: str = userdata.get('GOOGLE_API_KEY')

# Check whether the key was successfully fetched
if GOOGLE_API_KEY:
    print("API key fetched successfully")
else:
    print("API key not found. Please set the 'GOOGLE_API_KEY' secret.")

API key fetched successfully


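Retrieves the `GOOGLE_API_KEY` secret from Colab's secrets manager (`google.colab.userdata`) and reports whether it was found. The Google integrations above already authenticate through the `GOOGLE_API_KEY` environment variable set in step 2. This cell only checks the secret; the fetched value is not passed to the chain.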
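The notebook reads the key in two different ways. One way to keep them consistent is a single helper that checks the environment first, falls back to Colab secrets, and exports the key so the LangChain Google integrations can see it. The sketch below assumes the Colab secret is also named `GOOGLE_API_KEY`. `get_google_api_key` is a hypothetical helper.

```python
import os
from typing import Optional


def get_google_api_key(env_var: str = "GOOGLE_API_KEY") -> Optional[str]:
    # 1) Prefer an already-exported environment variable
    key = os.environ.get(env_var)
    if key:
        return key
    # 2) Fall back to Colab's secrets manager when running in Colab
    try:
        from google.colab import userdata
    except ImportError:
        return None  # not running in Colab
    try:
        key = userdata.get(env_var)
    except Exception:
        return None  # secret missing or notebook access not granted
    if key:
        os.environ[env_var] = key  # make it visible to the LangChain integrations
    return key
```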

### **19. Running the RAG Chain**

In [29]:

response = rag_chain.invoke("""Please summarize the 'Smart Shopping Assistant' dataset by providing the following details:
                                The total number of products categorized as 'in stock' and 'out of stock.'
                                Identify the product priced at $2190, and specify whether it is available or not.
                                """)

response

'Total Products In Stock: 659\nTotal Products Out of Stock: 741\nProduct priced at $2190: Product 214, Availability: Out of Stock\n'

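Executes the RAG chain with a multi-part question and displays the response. Treat the counts with caution. 659 + 741 = 1400, which is more than the 1000 rows (1000 chunks) in the dataset, so the model's two aggregate figures cannot both be correct.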
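Counting and exact-value lookups can be done deterministically instead of relying on the model, which makes them useful for checking the chain's answers. The stdlib sketch below computes the same kinds of figures directly from CSV data. It assumes the column headers `Product Name`, `Price`, and `Availability`, as in the loaded documents. `summarize` is a hypothetical helper, and the three-row inline sample (taken from rows shown above) stands in for the real file.

```python
import csv
import io
from collections import Counter


def summarize(csv_file, target_price: str):
    # Count rows per availability value and collect products at the target price
    reader = csv.DictReader(csv_file)
    counts = Counter()
    matches = []
    for row in reader:
        availability = row["Availability"].strip()
        counts[availability] += 1
        if row["Price"].strip() == target_price:
            matches.append((row["Product Name"].strip(), availability))
    return counts, matches


sample = io.StringIO(
    "Product Name,Category,Price,Availability,Description\n"
    "Product 1,Electronics,$60,In Stock,Description for Product 1\n"
    "Product 144,Books,$1490,Out of Stock,Description for Product 144\n"
    "Product 199,Books,$2040,In Stock,Description for Product 199\n"
)
counts, matches = summarize(sample, "$2040")
print(counts)
print(matches)
```

With the real file, this would be `summarize(open("/content/smart_shopping_assistant.csv", newline=""), "$2190")`, ideally inside a `with` block.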