# **Step 1: Install Required Libraries**

In [None]:
%pip install -qU langchain-pinecone langchain-google-genai


This command installs or updates the langchain-pinecone and langchain-google-genai libraries, which are essential for integrating LangChain with Pinecone and Google's Gemini AI, respectively.

# **Step 2: Import Libraries and Configure Pinecone**

In [None]:
from google.colab import userdata

from pinecone import Pinecone, ServerlessSpec

pinecone_api_key = userdata.get('PINECONE_API_KEY')

pc = Pinecone(api_key=pinecone_api_key)

**1**. **from google.colab import userdata:**

◘ Google Colab provides a built-in module called userdata that securely stores
and retrieves sensitive information, like API keys.

◘ This helps you avoid hardcoding sensitive information (like keys) in your code, keeping it safe.


**2**. **from pinecone import Pinecone, ServerlessSpec:**

◘ This imports the Pinecone library's Pinecone and ServerlessSpec classes.

◘ Pinecone is a managed vector database service: data is stored as numerical vectors and retrieved by similarity, which is what machine learning applications such as semantic search need.

◘ ServerlessSpec helps configure Pinecone's serverless database setup (cloud provider, region, etc.).


**3**. **pinecone_api_key = userdata.get('PINECONE_API_KEY'):**

◘ This retrieves your Pinecone API key from the secure userdata locker.

◘ The key is stored under the label 'PINECONE_API_KEY'.

◘ You need this key to authenticate with Pinecone and prove you have permission to use its services.

**4**. **pc = Pinecone(api_key=pinecone_api_key):**

◘ This initializes a Pinecone client (pc) using the API key retrieved in the previous step.

◘ The client (pc) is what you use to interact with the Pinecone database, such as creating indexes, storing vectors, and searching data.
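The userdata module only exists inside Google Colab, so the cell above fails elsewhere. The following is a minimal sketch of one way to make the key lookup portable. The helper name get_secret is hypothetical (it is not part of Colab or Pinecone), and it assumes that outside Colab the key is available as an environment variable with the same name.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab's userdata store, or from the environment.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        # Only importable inside a Google Colab runtime.
        from google.colab import userdata

        value = userdata.get(name)
    except Exception:
        # Outside Colab, or the secret is not available there:
        # fall back to an environment variable with the same name.
        value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set")
    return value
```

With this helper, the key lookup in the cell above could be written as pinecone_api_key = get_secret("PINECONE_API_KEY"), and the rest of the step stays unchanged.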



# **Step 3: Create Pinecone Index**

In [None]:
index_name = "online-rag-project2"

pc.create_index(
    name=index_name,
    dimension=768,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index(index_name)

This step creates a Pinecone index named "online-rag-project2" with a vector dimension of 768 and a cosine similarity metric. The ServerlessSpec specifies the cloud provider and region. After creation, pc.Index(index_name) returns a handle to the index for further operations.

An index in Pinecone is like a specialized storage unit for vectors. It's where all your data (transformed into numerical vectors) will be stored and retrieved efficiently.

**Purpose of an Index:**

◘ To store vectors in a structured way.

◘ To enable fast similarity searches or queries, like finding the closest matches to a given vector.


**What Does This Code Achieve?**

**Creates an Index**: Named "online-rag-project2", capable of storing vectors of size 768.

**Sets Up for Vector Similarity:** Uses cosine similarity to find related vectors.

**Hosts on the Cloud:** Deploys on AWS in the us-east-1 region for efficient storage and retrieval.
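To make the cosine metric concrete, here is a small pure-Python sketch of cosine similarity. It is an illustration of the formula only, not Pinecone's implementation, and the two-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # 1.0  (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))   # 0.0  (orthogonal)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite direction)
```

Cosine similarity depends only on the direction of the vectors, not their length, which is why [1.0, 0.0] and [2.0, 0.0] score 1.0.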

# **Step 4: Set Up Google Gemini AI Embeddings**

In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
import os

os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

**What Does This Code Do?**

**1. Import the Required Class:**
GoogleGenerativeAIEmbeddings is imported to generate embeddings.

**2. Set the API Key:**
The API key is retrieved from userdata and set as an environment variable.
This is necessary for authenticating your requests to Google’s AI models.

**3. Initialize the Embedding Model:**
The embeddings object is created using the GoogleGenerativeAIEmbeddings class, with the specified model (models/embedding-001).

**Key Difference: Direct Use vs. Environment Variables**

**Direct Use:**

◘ In Step 2, userdata.get directly fetches and uses the API key, which is passed explicitly to the Pinecone client.

◘ No environment variable is needed because Pinecone’s API directly supports passing the key.

**Environment Variables:**

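◘ In this step, the embedding class reads the key from the GOOGLE_API_KEY environment variable by default, so the key is supplied that way rather than as an argument. (Current versions of langchain-google-genai also accept a google_api_key argument, so the environment variable is a convention rather than a strict requirement.)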

◘ By setting the key in os.environ, you satisfy this requirement.

**Summary**

◘ os: Provides access to environment variables, which keep sensitive values such as API keys out of the source code.

◘ Embeddings: Convert raw text into numerical vectors that capture its meaning.

◘ Purpose: This step sets up the embedding model to generate vectors, which form the backbone of your RAG (Retrieval-Augmented Generation) system.

# **Step 5: Generate Embedding for a Query**

In [None]:
vector = embeddings.embed_query("Hello world")


This line generates a vector embedding for the query "Hello world" using the previously initialized embeddings model.

**Summary**

◘ Vector Embedding converts text into a numerical format (a vector) that preserves the meaning of the text.

◘ Vectors help machines understand and compare the meaning of different pieces of text, like words or sentences.

◘ In your code, embed_query("Hello world") is creating a vector representation of the sentence "Hello world," which will be useful for tasks like searching, similarity comparison, and more.

# **Step 6: Inspect the Generated Vector**

In [None]:
vector[:5]

[0.04703257977962494,
 -0.04019005596637726,
 -0.02902696467936039,
 -0.026809632778167725,
 0.01892058178782463]

Here, you display the first five elements of the generated vector to inspect its structure.


**vector:** This refers to the vector generated by the embed_query("Hello world") function, which is a list (or array) of numbers representing the sentence "Hello world" in a multi-dimensional space.

**[:5]**: This is Python slice notation, which extracts the first five elements of the vector. It means:

Start from the beginning of the vector (index 0).
Stop at index 5 (but not include it, so you get the elements at indices 0, 1, 2, 3, and 4).

This allows you to inspect the first few numbers in the vector, which is helpful for understanding the structure of the embedding and getting a sense of how the model represents the input data numerically.
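Besides looking at the first few values, it can help to confirm that the vector length matches the dimension the index was created with (768 in Step 3), since a mismatch would cause the upload to fail. The sketch below uses a stand-in vector of zeros so it runs without calling the embedding API; check_dimension is a hypothetical helper, not part of LangChain or Pinecone.

```python
EXPECTED_DIMENSION = 768  # the dimension passed to pc.create_index in Step 3


def check_dimension(vector, expected=EXPECTED_DIMENSION):
    """Raise if an embedding's length differs from the index dimension."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} values, index expects {expected}"
        )
    return True


# Stand-in for the real embedding so the sketch runs offline.
fake_vector = [0.0] * 768

print(len(fake_vector))              # 768
print(fake_vector[:5])               # [0.0, 0.0, 0.0, 0.0, 0.0]
print(check_dimension(fake_vector))  # True
```

In the notebook, the same check would be check_dimension(vector) on the result of embed_query.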

# **Step 7: Initialize Pinecone Vector Store**

In [None]:
from langchain_pinecone import PineconeVectorStore

vector_store = PineconeVectorStore(index=index, embedding=embeddings)

You import the PineconeVectorStore class and initialize it with the Pinecone index and embeddings model. This setup allows for efficient storage and retrieval of vector embeddings.

**In Summary:**

**Pinecone** is a vector database that efficiently stores and retrieves high-dimensional vectors (like those generated from text using an embedding model).

**LangChain** abstracts and simplifies working with such databases. In this step, we initialize a PineconeVectorStore, which connects the Pinecone database with our embeddings model (Google’s model here).

**The purpose of this step** is to set up a system where you can store and search for vector representations of text data using Pinecone and LangChain.

# **Step 8: Create Document Instances**

In [None]:
from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocalate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

You import the Document class and create an instance document_1 with sample text and metadata indicating the source.

**Summary:**

**Instance**: document_1 is an instance of the Document class. It represents a specific document with its content and metadata.

**Class:** The Document class is like a blueprint or template that defines how a document should look.

**Purpose:** You're using this step to create a document with text (page_content) and metadata (source), so it can be stored, processed, and searched later.

# **Step 9: Display Document**

In [None]:
document_1

Document(metadata={'source': 'tweet'}, page_content='I had chocalate chip pancakes and scrambled eggs for breakfast this morning.')

This line displays the document_1 instance, showing its content and metadata.



# **Step 10: Create Multiple Document Instances**

In [None]:
from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocalate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]

len(documents)

10

You create multiple Document instances with varied content and metadata, then compile them into a list documents. Finally, you check the length of the list to confirm the number of documents.



# **Step 11: Generate Unique Identifiers**

In [None]:
from uuid import uuid4
uuid4()

UUID('5de7b2cc-090e-4a42-9e9d-8390c587dc14')

You import the uuid4 function and generate a unique identifier. This is useful for assigning unique IDs to documents.

◘ UUID stands for Universally Unique Identifier. It's a 128-bit number used to uniquely identify information in computer systems.

◘ uuid4() is a function from Python's uuid module that generates a random UUID.

◘ When working with collections of documents, data, or objects, it's often important to ensure that each one can be uniquely identified. This is where UUIDs come into play.
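A short sketch of the properties described above, using only Python's standard uuid module. The batch size of 1000 is an arbitrary choice for the illustration.

```python
from uuid import UUID, uuid4

new_id = uuid4()
id_as_text = str(new_id)

print(new_id.version)              # 4
print(len(id_as_text))             # 36 (32 hex digits plus 4 hyphens)
print(UUID(id_as_text) == new_id)  # True: the string round-trips to the same UUID

# Generate a batch of IDs and confirm there are no duplicates among them.
batch = [str(uuid4()) for _ in range(1000)]
print(len(set(batch)) == len(batch))
```

Because the IDs are random rather than sequential, they can be generated independently for each document without coordinating with the database.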


# **Step 12: Assign Unique IDs and Add Documents to Vector Store**

In [None]:
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)

['7e05acec-dc99-4d9f-b203-993e4456ceef',
 'a41bc81d-4d54-4b58-8c7b-5f79830fd95f',
 'a13d2df4-7e22-484e-8aab-1dfb6aed5be0',
 '159a328a-49af-4db0-b8af-ccd0e2f7b47e',
 '2a8856d5-6ad1-4e1d-ae43-bbe8ad0c785b',
 'e7e6e36a-b5ed-4c53-985a-884c365cc7c0',
 '994fbc81-f165-4bcb-a56e-b6e7f1c4dfee',
 '3c7fc3d0-1503-4448-be0a-bb9731116ab4',
 '5d011055-f3e6-40d1-8c2e-5e028d220c32',
 'a27d7a4a-62f6-4fd4-a67f-defeb6df2a96']

You generate a list of unique IDs for each document and add the documents to the vector store with their corresponding IDs.

Why str(uuid4())? Pinecone expects each vector's ID to be a string, so every UUID object is converted to its string form before the documents are added. The list printed above contains the IDs that add_documents stored.

# **Step 13: Perform Similarity Search with Filter**

In [None]:
# Retrieve the most similar documents
results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": "tweet"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]
* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]


You perform a similarity search in the vector store for documents related to the query "LangChain provides abstractions to make working with LLMs easy," retrieving the top 2 results filtered by the source "tweet." Then, you print the content and metadata of each result.

**How It Works:**

1. **Query to Vector Conversion:** The input query "LangChain provides abstractions to make working with LLMs easy" is first converted into a vector using the embedding model (e.g., GoogleGenerativeAIEmbeddings).

2. **Similarity Calculation:** Pinecone compares the query vector to the stored vectors and finds the two most similar vectors to the query (as specified by k=2). It uses a distance metric (like cosine similarity) to measure the closeness between vectors.

3. **Filter:** The filter ({"source": "tweet"}) ensures that only documents tagged with the "source": "tweet" metadata are included in the search results.

4. **Displaying Results:** After the similarity search is done, the code loops over the results and prints the page_content (text content) and metadata (additional info like source) of the top two most similar documents.
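The steps above can be sketched in plain Python as a brute-force search over a handful of records. This is a toy illustration of the idea only: the two-dimensional vectors and texts are made up, filtered_top_k is a hypothetical helper, and Pinecone's actual search and filtering are implemented differently inside the service.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def filtered_top_k(query_vec, records, k, metadata_filter):
    """Return the k records most similar to query_vec that match the filter.

    records is a list of (vector, text, metadata) tuples.
    """
    # Keep only records whose metadata matches every key in the filter.
    candidates = [
        r for r in records
        if all(r[2].get(key) == value for key, value in metadata_filter.items())
    ]
    # Rank the remaining records by similarity, most similar first.
    ranked = sorted(candidates, key=lambda r: cosine(query_vec, r[0]), reverse=True)
    return ranked[:k]


records = [
    ([1.0, 0.0], "tweet about LangChain", {"source": "tweet"}),
    ([0.9, 0.1], "news about LangChain", {"source": "news"}),
    ([0.0, 1.0], "tweet about breakfast", {"source": "tweet"}),
    ([0.7, 0.7], "tweet about LangGraph", {"source": "tweet"}),
]

results = filtered_top_k([1.0, 0.1], records, k=2, metadata_filter={"source": "tweet"})
for vec, text, meta in results:
    print(f"* {text} [{meta}]")
```

The news record is close to the query but is excluded by the filter, so the two results are the LangChain tweet followed by the LangGraph tweet.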

# **Step 14: Perform Similarity Search with Scores and Filter**

In [None]:
results = vector_store.similarity_search_with_score(
    "Will it be hot tomorrow?", k=1, filter={"source": "news"}
)
for res, score in results:
    print(f"* [SIM={score:3f}] {res.page_content} [{res.metadata}]")

* [SIM=0.667716] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]


**How It Works:**

**Step 1:** **Embedding the Query:**  The query "Will it be hot tomorrow?" is passed through the embeddings model (e.g., GoogleGenerativeAIEmbeddings). This model converts the query into a vector representation.

**Step 2:** **Vector Comparison:** The query vector is then compared to all the vectors stored in the Pinecone vector store (or any other vector store). The comparison uses a distance metric (often cosine similarity) to measure how similar each stored vector is to the query vector.

**Step 3:** **Returning Results:** With k=1, the method returns the single document with the highest similarity to the query among those that match the filter. Along with the document, it returns the similarity score, which indicates how closely the document matches the query. With the cosine metric used here, a higher score means a closer match; cosine similarity ranges from -1 to 1, and the range of the score depends on the similarity metric used.
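A note on the format specifier in the print statement: {score:3f} sets a minimum field width of 3 and keeps the default six decimal places, which is why the output shows 0.667716. If three decimal places were intended, the specifier would be .3f. The short sketch below compares the two, using the score from the output above.

```python
score = 0.667716

print(f"[SIM={score:3f}]")   # [SIM=0.667716]  width 3, default precision of 6
print(f"[SIM={score:.3f}]")  # [SIM=0.668]     precision of 3
```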

# **Step 15: Initialize Google Gemini AI Model**

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)

Here, you import the ChatGoogleGenerativeAI class and initialize it with the "gemini-1.5-flash" model. You set parameters such as temperature, max_tokens, timeout, and max_retries to control the model's behavior.


1. **Parameters Used in ChatGoogleGenerativeAI:**

**model="gemini-1.5-flash":**
This specifies which model to use, in this case "gemini-1.5-flash". It is a general-purpose Gemini 1.5 model for tasks such as text generation and conversational AI; Google positions the Flash variant as a faster, lighter-weight option in the Gemini family.

**temperature=0:**
This controls the randomness of the responses. A lower temperature (like 0) results in more deterministic and focused responses, while higher values (e.g., 1) allow for more random or creative answers.
In this case, setting temperature=0 makes the model's responses as consistent and repeatable as possible.

**max_tokens=None:**
This specifies the maximum number of tokens the model should generate in a response. Tokens are the units of text the model works with; a token is often a word or part of a word. Setting it to None means this code sets no explicit limit, so the model's own default output limit applies.

**timeout=None:**
This defines the time limit for a request to complete. If set to None, it means there is no specific timeout limit.

**max_retries=2:**
This tells the client to retry a failed request up to 2 times (e.g., after network issues or timeouts).
The retries happen before the call gives up and raises an error, which makes the application more robust to transient failures.

2. **Use Case:**

**Natural Language Processing (NLP):** The ChatGoogleGenerativeAI class is useful when you need to integrate Google's powerful generative AI into your application. It allows you to perform tasks such as:

**Text generation:** Creating content based on a prompt (e.g., writing articles, blogs, summaries).

**Question answering:** Using the model to answer user queries based on a dataset or context.

**Conversational AI:** Engaging in chat-like interactions where the AI responds to user inputs in natural language.

**Integrating AI into Applications:** By using this class, you can integrate Google's Gemini AI into your applications, making them smarter and more interactive.

**For example:**

**Customer service bots:** Using the AI to answer customer queries and support tickets automatically.

**Content creation tools:** Allowing the AI to assist in generating text-based content for websites, blogs, and other platforms.

**Personal assistants:** Creating smart assistants that can converse with users and provide useful information.


# **Step 16: Define Function to Answer User Questions**

In [None]:
def answer_to_user_question(question: str):

  # Vector Search
  vector_results = vector_store.similarity_search(question, k=2)
  print(len(vector_results))

  # Pass the vector results and the user query to the model
  final_answer = llm.invoke(f"ANSWER THIS QUERY: {question}. Here are some reference documents: {vector_results}")

  return final_answer

You define a function answer_to_user_question that takes a user query as input. It performs a similarity search in the vector store to retrieve the top 2 relevant documents. Then, it passes the user query along with the retrieved documents to the Gemini AI model to generate a final answer.
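The function above interpolates the list of Document objects directly into the prompt, so the model sees their Python representation. One alternative is to format the retrieved text explicitly before calling the model. The sketch below shows this under stated assumptions: build_prompt is a hypothetical helper, the prompt wording is only an example, and SimpleNamespace objects stand in for LangChain Document instances so the code runs without the library.

```python
from types import SimpleNamespace


def build_prompt(question, docs):
    """Format the question and retrieved documents into a single prompt string."""
    context = "\n".join(
        f"{i}. {doc.page_content} (source: {doc.metadata.get('source', 'unknown')})"
        for i, doc in enumerate(docs, start=1)
    )
    return (
        "Answer the question using only the reference documents below.\n\n"
        f"Question: {question}\n\n"
        f"Reference documents:\n{context}"
    )


# Stand-ins with the same attributes as LangChain Document objects.
docs = [
    SimpleNamespace(
        page_content="Building an exciting new project with LangChain - come check it out!",
        metadata={"source": "tweet"},
    ),
    SimpleNamespace(
        page_content="LangGraph is the best framework for building stateful, agentic applications!",
        metadata={"source": "tweet"},
    ),
]

print(build_prompt("What is LangChain used for?", docs))
```

Inside answer_to_user_question, the call would then become llm.invoke(build_prompt(question, vector_results)).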

# **Step 17: Get Answer to a Specific Question**

In [None]:
answer = answer_to_user_question("LangChain provides abstractions to make working with LLMs easy")

2


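You call the answer_to_user_question function with the query "LangChain provides abstractions to make working with LLMs easy". The printed 2 is the number of documents retrieved by the vector search. The next cell reads the generated text from the answer's content attribute.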

In [None]:
answer.content

"The provided text mentions LangChain, a framework for working with Large Language Models (LLMs), making it easier to use them.  The reference documents are unrelated to this statement; one mentions LangGraph and the other mentions a project using LangChain, but neither directly supports or refutes the claim about LangChain's ease of use.  Therefore, the query about LangChain's ease of use is neither confirmed nor denied by the provided references.\n"

# **Flow of Retrieval-Augmented Generation (RAG)**

The code above sets up a Retrieval-Augmented Generation (RAG) system using LangChain, Pinecone, and Google's Gemini AI model. The primary goal is to enhance the capabilities of a language model by integrating it with external data sources, allowing it to provide more accurate and contextually relevant responses.

**Key Components:**

1. **Pinecone Vector Store:** Used for storing and retrieving vector embeddings of documents, enabling efficient similarity searches.

2. **Google Generative AI Models:** The embedding model (models/embedding-001) generates embeddings, and the Gemini chat model (gemini-1.5-flash) answers user queries based on the retrieved documents.

3. **LangChain Framework:** Facilitates the orchestration of the RAG process, managing the flow of data between components.


**Process Flow:**

1. **Data Preparation:** Documents are created and embedded using Google's embedding model (models/embedding-001).

2. **Indexing:** The embedded documents are stored in the Pinecone vector store.

3. **Query Processing:** When a user query is received, a similarity search is performed in the vector store to retrieve relevant documents.

4. **Answer Generation:** The retrieved documents, along with the user query, are passed to the Gemini AI model to generate a final answer.

By combining these components, the system can provide responses that are both accurate and contextually relevant, leveraging external data sources to augment the language model's knowledge.

# **Example: How a User Query Is Processed**

**Example Flow:**

1. **Text Input:** You provide the query "What is the weather like today?".

2. **Vectorization:** This query is converted into a vector, like [0.3456, -0.7892, 0.2345, ...].

3. **Search:** Pinecone searches for the most similar vectors to the query vector in the database.

4. **Matching Vectors:** Pinecone finds the vectors that are closest to the query vector and returns their UUIDs.

5. **Result:** You retrieve the documents associated with those UUIDs (e.g., "The weather forecast is cloudy with a chance of rain"), which is your result.
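The flow above can be imitated end to end in a few lines of plain Python. This is a toy sketch, not the notebook's pipeline: toy_embed is a hypothetical word-count stand-in for a real embedding model, the vocabulary and texts are made up, and two dictionaries play the role of the Pinecone index.

```python
import math
from uuid import uuid4

VOCABULARY = ["weather", "forecast", "cloudy", "rain", "stock", "market", "pancakes"]


def toy_embed(text):
    """Count vocabulary words; a crude stand-in for a real embedding model."""
    cleaned = text.lower().replace("?", " ").replace(".", " ").replace(",", " ")
    words = cleaned.split()
    return [float(words.count(term)) for term in VOCABULARY]


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


texts = [
    "The weather forecast is cloudy with a chance of rain.",
    "The stock market is down 500 points today.",
    "I had pancakes for breakfast.",
]

# Indexing: each document gets a UUID, and its vector is stored under that ID.
documents_by_id = {str(uuid4()): text for text in texts}
vectors_by_id = {doc_id: toy_embed(text) for doc_id, text in documents_by_id.items()}

# Steps 1-2: text input and vectorization.
query_vector = toy_embed("What is the weather like today?")

# Steps 3-4: search for the closest stored vector and get its ID.
best_id = max(vectors_by_id, key=lambda doc_id: cosine(query_vector, vectors_by_id[doc_id]))

# Step 5: look up the document associated with that ID.
print(documents_by_id[best_id])  # The weather forecast is cloudy with a chance of rain.
```

A real embedding model captures meaning rather than exact word overlap, so it can match a query to a document even when they share no words; the word-count embedding here only matches on the shared word "weather".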