<a href="https://colab.research.google.com/github/Zeenat-Somroo911/PIAIC-Q2-P2/blob/main/PIAIC_Q2_Project_2_LangChain_RAG_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project 2: LangChain RAG Project**

In this project, we will create a simple LangChain RAG Colab notebook that uses the Google Gemini Flash model to answer user questions from the provided documents. The example below is provided to help you get started; it assumes you have access to the Gemini API, Pinecone, and a basic Python environment. However, you are required to develop and submit your project using Google Colab.

The project GitHub repo is: https://github.com/panaversity/learn-agentic-ai/blob/main/02_generative_ai_for_beginners/PROJECTS/02_rag

# **LangChain RAG with Google Gemini Flash and Pinecone**

### Installations

In [1]:
!pip install -qU langchain-pinecone langchain-google-genai


### Configuring the Pinecone API

In [2]:
from IPython.display import Markdown, display
from google.colab import userdata
from pinecone import Pinecone, ServerlessSpec
pinecone_api_key = userdata.get('PINECONE-API-KEY')
pc = Pinecone(api_key=pinecone_api_key)

### **Creating Index**

In [3]:
index_name = "langchain-rag"  # change if desired

# List existing indexes
existing_indexes = [index_info["name"] for index_info in pc.list_indexes()]

# Check if the index already exists, if not, create it
if index_name not in existing_indexes:
    pc.create_index(
        name=index_name,
        dimension=768,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
    print(f"Index '{index_name}' created.")
else:
    print(f"Index '{index_name}' already exists.")

# Access the index
index = pc.Index(index_name)


Index 'langchain-rag' already exists.


### **Creating Embeddings Using the Embedding Model**

In [4]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

GEMINI_KEY = userdata.get("GOOGLE_API_KEY")

embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=GEMINI_KEY,
)

In [5]:
vector = embeddings.embed_query("Building a Rag project! ")

In [6]:
vector[:5]

[0.005886509083211422,
 -0.01920737698674202,
 -0.01310189813375473,
 -0.03790365159511566,
 -0.003551947884261608]
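
The index in this notebook is created with `metric="cosine"`, so stored vectors are compared to the query embedding by cosine similarity. The following is a minimal pure-Python sketch of that metric, not Pinecone's actual implementation. The function name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions; the real embeddings here have 768 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (hypothetical, for illustration only)
query = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]
orthogonal = [0.0, 1.0, 0.0]

print(cosine_similarity(query, same_direction))  # 1.0
print(cosine_similarity(query, orthogonal))      # 0.0
```

Cosine similarity depends only on direction, not on vector length, which is why `same_direction` scores 1.0 even though it is twice as long as `query`.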

### **Creating Vector Store with Pinecone**

In [7]:
from langchain_pinecone import PineconeVectorStore

vector_store = PineconeVectorStore(index=index, embedding=embeddings)

In [8]:
from langchain_core.documents import Document

document_1 = Document(
    page_content="I have chocolate chip pancake and scrambled eggs for breakfast",
    metadata={"source": "personal", "meal": "breakfast"}
)

document_2 = Document(
    page_content="LangChain is a framework for building applications with LLMs (Large Language Models), such as GPT. It provides tools to manage chains, agents, and memory for building more advanced AI applications.",
    metadata={"topic": "LangChain", "category": "AI Framework"}
)

document_3 = Document(
    page_content="Agentic AI refers to autonomous AI systems that are capable of decision-making, learning, and adaptation in real-world environments without needing constant human intervention.",
    metadata={"topic": "Agentic AI", "category": "Artificial Intelligence", "importance": "High"}
)

document_4 = Document(
    page_content="In the latest world news, a new tech company has developed an innovative AI that is able to solve real-world problems faster than previous models, pushing the boundaries of automation and efficiency.",
    metadata={"topic": "Tech News", "date": "2025-01-02", "company": "Innovative AI Company"}
)

document_5 = Document(
    page_content="The use of AI in healthcare is growing rapidly. Recent advancements in diagnostic AI tools are helping doctors identify diseases more accurately and faster, significantly improving patient outcomes.",
    metadata={"topic": "Healthcare AI", "category": "Medical Technology", "impact": "Positive"}
)

document_6 = Document(
    page_content="LangChain offers a wide range of tools to work with LLMs. This includes support for document search, question-answering systems, and memory management to make intelligent agents more effective in real-world tasks.",
    metadata={"topic": "LangChain", "category": "AI Tools", "use_case": "Advanced workflows"}
)

document_7 = Document(
    page_content="Agentic AI is becoming increasingly important in industries like autonomous vehicles, robotics, and smart cities, where real-time decision-making and adaptability are crucial for success.",
    metadata={"topic": "Agentic AI", "category": "Industry Applications", "industries": ["Autonomous Vehicles", "Robotics", "Smart Cities"]}
)

document_8 = Document(
    page_content="A new world record has been set for the fastest internet speed, with researchers breaking through previous limitations using advanced fiber-optic technology, promising faster and more efficient global communication.",
    metadata={"topic": "Tech News", "category": "Innovation", "record": "Fastest Internet Speed", "date": "2025-01-02"}
)

document_9 = Document(
    page_content="In a recent study, AI-powered chatbots have been shown to outperform human customer service agents in resolving technical issues, reducing wait times and improving customer satisfaction.",
    metadata={"topic": "AI in Customer Service", "category": "Business Technology", "impact": "High"}
)

document_10 = Document(
    page_content="LangChain continues to evolve with new integrations, including support for databases, APIs, and external data sources, enabling more complex and efficient workflows for AI applications.",
    metadata={"topic": "LangChain", "category": "Development", "features": ["Database Integration", "API Support", "External Data Sources"]}
)

documents = [
    document_1, document_2, document_3, document_4, document_5,
    document_6, document_7, document_8, document_9, document_10
]

In [9]:
len(documents)

10

### Adding Documents and Assigning IDs

In [10]:
from uuid import uuid4
uuid4()

UUID('8270ea38-8037-409f-a5d1-b8db29d39415')

In [11]:
uuids = [str(uuid4()) for _ in range(len(documents))]
vector_store.add_documents(documents=documents, ids=uuids)

['01119917-e8ae-459c-8399-00613ddab641',
 'be32fb65-3268-4a65-86b1-12b14fc930f2',
 'c75002a2-7810-4f6f-8c73-7ddc88a1da22',
 '89f37438-9519-459d-ad4d-6a8256886305',
 '50d42ea8-c12d-4aa1-853b-082e03c566a2',
 '496c16f6-04d3-4935-b410-d3005886f729',
 'f89fab79-2e28-4c0b-8e53-fa5817c2ae76',
 '20bb46bb-31f5-4c40-9015-ecdcf57df64b',
 '5b60ff38-844c-4dfc-bebb-16e1fd8a98b8',
 '37acfa4d-e73d-4531-91b2-8641727367c5']
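
Because `uuid4()` produces a new random ID on every call, re-running this cell against an index that already exists would likely store the same ten documents again under fresh IDs. One possible alternative, offered as an assumption rather than as part of the original project, is to derive each ID from the document text with `uuid5`, so identical text always maps to the same ID. The helper name `stable_id` and the choice of `NAMESPACE_URL` as the namespace are illustrative.

```python
from uuid import NAMESPACE_URL, uuid5


def stable_id(text):
    """Return a UUID string derived deterministically from the text."""
    return str(uuid5(NAMESPACE_URL, text))


# Two sample texts (the second is shortened for illustration)
texts = [
    "I have chocolate chip pancake and scrambled eggs for breakfast",
    "LangChain is a framework for building applications with LLMs.",
]
ids = [stable_id(t) for t in texts]

# The same text always yields the same ID; different texts yield different IDs
print(ids[0] == stable_id(texts[0]))  # True
print(ids[0] != ids[1])               # True
```

In the notebook, this would mean passing `ids=[stable_id(d.page_content) for d in documents]` in place of the random `uuids`, so that a re-run overwrites existing entries instead of adding new ones.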

### Performing Similarity Search

In [12]:
vector_result = vector_store.similarity_search("What is langchain", k=7)
print(vector_result[0].page_content)

LangChain is a framework for building applications with LLMs (Large Language Models), such as GPT. It provides tools to manage chains, agents, and memory for building more advanced AI applications.


### Generating Contextual Answers with Google Generative AI

### **Using LangChain**

In [13]:
from langchain_google_genai import ChatGoogleGenerativeAI

from langchain_core.prompts import PromptTemplate

In [14]:
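def user_answer(question):
    """Retrieve the most relevant documents and ask Gemini to answer from them."""
    # Fetch the 5 stored documents most similar to the question
    vector_result = vector_store.similarity_search(question, k=5)

    llm = ChatGoogleGenerativeAI(
        api_key=GEMINI_KEY,
        model="gemini-2.0-flash-exp",
        max_tokens=100,  # caps the reply length, so long answers are cut off
        temperature=0.7,
    )

    # The template uses two placeholders, so both are declared here
    prompt1 = PromptTemplate(
        input_variables=["vector_result", "question"],
        template="Using this data {vector_result}. Answer the following question: \n\n{question}",
    )
    chain1 = prompt1 | llm

    final_answer = chain1.invoke({"vector_result": vector_result, "question": question})

    return final_answer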
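
Passing the list of retrieved documents straight into the template inserts the list's Python representation, metadata and all, into the prompt. A possible refinement, sketched here under assumptions, is to join only the `page_content` of each document into a numbered context string first. The `Doc` dataclass is a hypothetical stand-in for `langchain_core.documents.Document` so the example runs on its own, and `format_context` is an illustrative helper name, not a LangChain function.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs):
    """Join document texts into a numbered, blank-line-separated block."""
    return "\n\n".join(
        f"[{i}] {doc.page_content}" for i, doc in enumerate(docs, start=1)
    )


docs = [
    Doc("LangChain is a framework for building applications with LLMs.", {"topic": "LangChain"}),
    Doc("Agentic AI refers to autonomous AI systems.", {"topic": "Agentic AI"}),
]
print(format_context(docs))
```

Inside `user_answer`, the result of `format_context(vector_result)` could then be passed as the `vector_result` value when invoking the chain, which keeps the prompt shorter and easier to read.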

In [15]:
response = user_answer("what is ai?")

In [16]:
display(Markdown(response.content))

Based on the provided documents, AI is described in two main contexts:

1.  **Agentic AI:** This refers to **autonomous AI systems capable of decision-making, learning, and adaptation in real-world environments without needing constant human intervention.** This type of AI is highlighted as being important in areas like autonomous vehicles, robotics, and smart cities.

2.  **Healthcare AI:**  This is described as rapidly growing, with advancements in areas like **diagnostic tools that help doctors identify