# **Project 2: LangChain RAG Project**

## **1. Install Required Libraries**


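*   LangChain-Pinecone: For vector search and document storage (also installs the Pinecone client).
*   LangChain-Google-GenAI: For embeddings and language model integration with Google's Gemini models.
*   Google-Colab: The `google.colab` module (Drive mounting, secrets) is preinstalled in Colab, so it is not installed with pip. The PyPI `google-colab` package pins outdated dependency versions and can break the runtime.

```python
# Install required libraries (google.colab is already available in Colab)
!pip install -q langchain-pinecone langchain-google-genai
```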

## **2. Import Necessary Libraries**

*   **OS:** For setting environment variables (the API keys).
*   **drive:** For mounting Google Drive in Colab.
*   **PineconeVectorStore:** For working with Pinecone's vector database through LangChain.
*   **GoogleGenerativeAIEmbeddings:** For turning text into embedding vectors.
*   **ChatGoogleGenerativeAI:** For getting answers from a generative AI model (e.g., Gemini).
*   **Document:** LangChain's container for a piece of text plus its metadata.
*   **UUID:** For generating unique IDs.

```python
# Import necessary libraries
import os
from google.colab import drive
from langchain_pinecone import PineconeVectorStore
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_core.documents import Document
from uuid import uuid4
```

## **3. Load Questions from File**

*   `load_questions` reads a plain-text (".txt") file with one question per line and returns the questions as a list.
*   Each line is passed through `strip()`, which removes leading and trailing whitespace, including the newline character.



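```python
from google.colab import drive

# Mount Google Drive so that paths under /content/drive/MyDrive are readable
drive.mount('/content/drive')

# File path (adjust this to your file's location in Google Drive)
file_path = '/content/drive/MyDrive/Colab Notebooks/rag questions.txt'

# Load questions from a plain-text file, one question per line
def load_questions(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        # strip() removes surrounding whitespace and the trailing newline;
        # blank lines are skipped so they are not sent to the LLM as empty queries
        return [line.strip() for line in file if line.strip()]
```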

## **4. Initialize RAG System Components**

*   Set API keys
*   Initialize embeddings

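```python
# Initialize RAG system components (first version: embeddings only;
# Section 7 redefines this function to also set up Pinecone and the LLM)
def initialize_rag_system():
    from google.colab import userdata

    # Read the Google API key from the Colab secret named GOOGLE_API_KEY_1
    os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY_1")

    # Initialize embeddings
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    return embeddings
```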
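The notebook reads its Google key from Colab secrets with `userdata.get`, which only exists inside Colab. If the same code has to run elsewhere (for example, locally), a helper can fall back to environment variables. This is a minimal sketch under that assumption; `get_secret` is a hypothetical name, not part of any library.

```python
import os


def get_secret(name: str) -> str:
    """Read a secret from Colab's secret store, falling back to an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab
        value = userdata.get(name)
    except Exception:
        # Not running in Colab, secret missing, or notebook access not granted
        value = None
    if not value:
        value = os.environ.get(name)
    if not value:
        raise KeyError(f"secret {name!r} not found in Colab secrets or environment")
    return value


# Example outside Colab: the environment variable is used
os.environ["DEMO_API_KEY"] = "demo-value"
print(get_secret("DEMO_API_KEY"))
```

In the notebook this would be used as `os.environ["GOOGLE_API_KEY"] = get_secret("GOOGLE_API_KEY_1")`, which keeps keys out of the notebook text either way.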

## **5. Initialize Pinecone and Create Index**

*   **Sign up:** Create an account on Pinecone's website.
*   **Get your API key:** Generate an API key from your Pinecone account.
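*   **Bring it to Colab:** Store the key as a Colab secret named `PINECONEKEY1` (key icon in the left sidebar) and enable notebook access, so the code below can read it with `userdata.get`.
*   **Run the setup:** Run the cells below to connect to Pinecone and create the `rag-project` index.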

```python
from pinecone import Pinecone, ServerlessSpec
from google.colab import userdata
pinecone_api_key = userdata.get("PINECONEKEY1")

pc = Pinecone(api_key=pinecone_api_key)
```

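```python
index_name = "rag-project"

# Create the index only if it does not exist yet (create_index fails on an existing name).
# dimension=768 matches the output size of models/embedding-001.
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=768,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index(index_name)
```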
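With `metric="cosine"`, the index scores a stored vector $\mathbf{b}$ against a query vector $\mathbf{a}$ by cosine similarity, where a higher score means the two vectors point in a more similar direction. For $d$-dimensional vectors ($d = 768$ for this index):

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \frac{\sum_{i=1}^{d} a_i b_i}{\sqrt{\sum_{i=1}^{d} a_i^2} \, \sqrt{\sum_{i=1}^{d} b_i^2}}
$$

For example, with $\mathbf{a} = (1, 2)$ and $\mathbf{b} = (2, 1)$: $\frac{1 \cdot 2 + 2 \cdot 1}{\sqrt{5}\,\sqrt{5}} = \frac{4}{5} = 0.8$. The score ranges from $-1$ (opposite directions) to $1$ (same direction), so it ignores vector length and compares only direction.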

## **6. Embedding Query**

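*   `embed_query` converts a single string into a 768-dimensional embedding vector (the dimension the index was created with). Nothing is written to Pinecone at this step.
*   As a quick check, the cell embeds the `file_path` string itself and displays the first 5 dimensions of the resulting vector.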

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings
import os
from google.colab import userdata
os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY_1')

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector = embeddings.embed_query(file_path)
vector[:5]
```

```
[0.04573819413781166,
 -0.026729127392172813,
 -0.04110725224018097,
 -0.03643791750073433,
 0.008257507346570492]
```
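Because `embed_query` only returns a vector, source text has to be added to the index separately before `similarity_search` can find anything. The `Document` and `uuid4` imports from Section 2 fit that step. The minimal sketch below splits raw text into overlapping character chunks and pairs each with a unique ID; the `chunk_text` helper and its sizes are hypothetical choices, not taken from the original project.

```python
from uuid import uuid4


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "abcdefghij" * 3  # 30 characters of sample text
chunks = chunk_text(text, chunk_size=12, overlap=4)
ids = [str(uuid4()) for _ in chunks]  # one unique ID per chunk
print(len(chunks), [len(c) for c in chunks])
```

Each chunk could then be wrapped as `Document(page_content=chunk)` and stored with `vector_store.add_documents(documents=docs, ids=ids)`, which embeds the chunks with the same model used for queries. The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk.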

## **7. Initialize RAG System**

*   **Get embeddings ready:** Sets up Google's `models/embedding-001` embedding model.
*   **Connect to Pinecone:** Wraps the `rag-project` Pinecone index in a LangChain `PineconeVectorStore`, which embeds queries and searches the index.
*   **Prepare the AI:** Configures the `gemini-1.5-flash` model through `ChatGoogleGenerativeAI` to generate answers.

```python
# Initialize RAG system components
def initialize_rag_system():
    from google.colab import userdata
    from pinecone import Pinecone
    from langchain_pinecone import PineconeVectorStore

    # Read API keys from Colab secrets instead of hard-coding them in the notebook
    os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY_1')
    pc = Pinecone(api_key=userdata.get("PINECONEKEY1"))

    # Initialize embeddings (768 dimensions, matching the index)
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

    # Wrap the existing Pinecone index as a LangChain vector store
    index_name = "rag-project"
    vector_store = PineconeVectorStore(index=pc.Index(index_name), embedding=embeddings)

    # Initialize LLM (Generative AI model); temperature=0 keeps answers as repeatable as possible
    llm = ChatGoogleGenerativeAI(
        model="gemini-1.5-flash",
        temperature=0,
        max_tokens=None,
        timeout=None,
        max_retries=2,
    )

    return vector_store, llm
```

## **8. Vector Search and Answer Generation**

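*   **Find similar information:** For each question, `similarity_search(query, k=2)` embeds the question and retrieves the 2 stored chunks whose vectors are most similar to it. The cells shown never add documents to the `rag-project` index; with an empty index the search returns no results, and the answers rely on Gemini's own knowledge.
*   **Get the AI's response:** The question and the retrieved text are combined into one prompt for the Gemini model, which uses them as references when writing its answer.
*   **Show the results:** The original question and Gemini's answer are displayed in the output.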
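The retrieval step can be pictured with a small in-memory stand-in for Pinecone: score every stored vector against the query vector with cosine similarity and keep the best `k`. This sketch is for intuition only. The `top_k` helper, the chunk labels, and the three-dimensional vectors are made up, and Pinecone typically relies on approximate nearest-neighbor indexing rather than scoring every vector.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query_vec, store, k=2):
    """Return the k (text, score) pairs whose vectors are most similar to query_vec."""
    scored = ((text, cosine_similarity(query_vec, vec)) for text, vec in store)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy "index": (text, vector) pairs with made-up 3-dimensional vectors
store = [
    ("chunk about the Lahore Resolution", [0.9, 0.1, 0.0]),
    ("chunk about Jinnah and the Muslim League", [0.7, 0.3, 0.1]),
    ("chunk about Gandhi's entry into politics", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.0]

for text, score in top_k(query_vec, store, k=2):
    print(f"{score:.3f}  {text}")
```

`heapq.nlargest` avoids sorting the whole store when only the top few results are needed; with `k=2` this mirrors the `similarity_search(query, k=2)` call.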

## **Main Workflow**


*   **Get the questions:** The questions are read in from a ".txt" file.
*   **Set up the system:** The RAG system, which uses AI to answer questions, is prepared.
*   **Process and show answers:** The system works through each question, finds answers, and shows them in the Colab notebook's output area.
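The answer step hands the LLM one prompt that combines the question with the retrieved text. A minimal sketch of that prompt assembly as a standalone function, modeled on the prompt wording in `process_questions`; the name `build_prompt`, the numbered reference format, and the fallback text for an empty result are hypothetical, and plain strings stand in for LangChain `Document` objects so it runs without API keys.

```python
def build_prompt(query: str, reference_texts: list[str]) -> str:
    """Combine a user question with retrieved reference passages into one prompt."""
    if reference_texts:
        references = "\n\n".join(
            f"[{i}] {text}" for i, text in enumerate(reference_texts, start=1)
        )
    else:
        # similarity_search returned nothing, e.g. because the index is empty
        references = "(no references found)"
    return f"ANSWER THE USER QUERY: {query}\n\nHere are some references:\n{references}"


prompt = build_prompt(
    "who dreamed creation of pakistan and when?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(prompt)
```

In the notebook, `reference_texts` would be `[doc.page_content for doc in vector_results]`, so only the chunk text, not each `Document`'s full representation, reaches the model.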

```python
# Process each question and print answers
def process_questions(questions, vector_store, llm):
    for query in questions:
        print(f"\n**Question**: {query}")
        # Perform vector search: the 2 most similar stored chunks
        vector_results = vector_store.similarity_search(query, k=2)

        # Pass only the chunk text (not the full Document repr) as references
        references = "\n\n".join(doc.page_content for doc in vector_results)

        # Generate the final answer using the LLM
        final_answer = llm.invoke(
            f"ANSWER THE USER QUERY: {query}\n\nHere are some references:\n{references}"
        )
        print(f"**Answer**: {final_answer.content}")

# Main workflow
if __name__ == "__main__":
    # Load questions from the file
    questions = load_questions(file_path)
    print("Questions loaded:", questions)

    # Initialize RAG system
    vector_store, llm = initialize_rag_system()

    # Process questions and display answers in Colab output
    process_questions(questions, vector_store, llm)
```

Questions loaded: ['who dreamed creation of pakistan and when?', 'how mohammad ali jinnah succeed?', 'why mahatma gandhi come in the politics?']

**Question**: who dreamed creation of pakistan and when?
**Answer**: The creation of Pakistan wasn't the dream of a single person, but rather a culmination of ideas and efforts from many individuals over a considerable period.  However, **Muhammad Ali Jinnah** is widely considered the most prominent figure in the movement for a separate Muslim state.  He articulated the vision and led the Muslim League's efforts to achieve it.

While the specific "dream" evolved over time, the idea of a separate Muslim homeland gained significant momentum in the early to mid-20th century, culminating in the **Pakistan Resolution (Lahore Resolution) passed in 1940**.  This resolution is generally considered the formal articulation of the demand for a separate Muslim state, marking a key moment in the dream's realization.  Therefore, while Jinnah is the most pr