<a href="https://colab.research.google.com/github/Arif-Kasim1/PIAIC-201/blob/main/201_PROJECT_02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
%pip install -Uq langchain==0.1.0 langchain-google-genai==0.0.6 pinecone-client==3.0.0 google-generativeai==0.3.2

In [4]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.vectorstores import Pinecone
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
import pinecone
import os
from typing import List
from google.colab import userdata
from pinecone import ServerlessSpec # Import the ServerlessSpec
import textwrap

class SimpleRAG:
    def __init__(self, pinecone_api_key: str, google_api_key: str, index_name: str):
        """
        Initialize the RAG system
        """
        # Export the keys passed to the constructor so the LangChain wrappers can read them
        os.environ["GOOGLE_API_KEY"] = google_api_key
        os.environ["PINECONE_API_KEY"] = pinecone_api_key

        # Initialize Pinecone client (v3)
        self.pc = pinecone.Pinecone(api_key=pinecone_api_key)
        self.index_name = index_name

        # Initialize Gemini components
        self.embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
        # gemini-pro does not accept system messages, so fold them into the human turn
        self.llm = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0, convert_system_message_to_human=True)

        # Create index if it doesn't exist
        if self.index_name not in self.pc.list_indexes().names():
            self.pc.create_index(
                name=self.index_name,
                dimension=768,  # Gemini embedding dimension
                metric="cosine",
                spec=ServerlessSpec(
                    cloud="aws",
                    region="us-east-1"
                )
            )

        # Initialize vector store
        self.vector_store = Pinecone.from_existing_index(
            index_name=self.index_name,
            embedding=self.embeddings
        )

    def load_documents(self, text: str) -> List[str]:
        """
        Split text into chunks
        """
        splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=100
        )
        return splitter.split_text(text)

    def add_texts(self, texts: List[str]) -> None:
        """
        Add texts to Pinecone
        """
        self.vector_store.add_texts(texts)
        print(f"Added {len(texts)} chunks to Pinecone")

    def query(self, question: str) -> dict:
        """
        Query the RAG system and return the answer with its source chunks
        """
        qa_chain = RetrievalQA.from_chain_type(
            llm=self.llm,
            retriever=self.vector_store.as_retriever(search_kwargs={"k": 3}),
            return_source_documents=True
        )
        response = qa_chain.invoke({"query": question})
        return {
            "answer": response["result"],
            "sources": [doc.page_content for doc in response["source_documents"]]
        }

    def cleanup(self) -> None:
        """
        Delete the Pinecone index
        """
        if self.index_name in self.pc.list_indexes().names():
            self.pc.delete_index(self.index_name)
            print(f"Deleted index: {self.index_name}")

# Example usage
def main():
    # Initialize the RAG system
    rag = SimpleRAG(
        pinecone_api_key=userdata.get("PINECONE_API_KEY"),
        google_api_key=userdata.get("GOOGLE_API_KEY"),
        index_name="test-index"
    )

    # Example text
    sample_text = """
    Artificial Intelligence (AI) is revolutionizing various industries.
    Machine Learning, a subset of AI, enables systems to learn from data.
    Deep Learning, a type of Machine Learning, uses neural networks with multiple layers.
    Natural Language Processing (NLP) allows computers to understand human language.
    Computer Vision helps machines interpret and analyze visual information.

    Introduction: The Pentium processor series, launched by Intel in 1993,
    marked a significant leap in performance over its predecessor, the 486 series.
    Architecture: Introduced advanced features like superscalar architecture,
    allowing multiple instructions per clock cycle.
    Variants: Evolved through multiple generations, including Pentium Pro,
    Pentium II, Pentium III, and Pentium 4, offering enhanced speed and functionality.
    Technology: Incorporated technologies like MMX, Hyper-Threading, and higher
    clock speeds to cater to evolving computing needs.
    Legacy: Paved the way for modern processors, blending performance and
    efficiency for desktops and laptops.

    The future of machine learning (ML) is poised for transformative growth,
    driving innovation across industries. Advances in neural networks, quantum
    computing, and explainable AI will make ML more powerful and transparent.
    It will revolutionize healthcare with precise diagnostics, enhance automation
    in manufacturing, and optimize personalized experiences in retail and
    entertainment. Ethical AI and robust frameworks will address challenges like
    bias and privacy. Seamless integration with IoT, robotics, and edge
    computing will bring AI closer to users. As ML democratizes through
    accessible tools, its potential to solve global challenges, from climate
    change to education, will shape a smarter, sustainable world.

    Pakistanis hold a special affection for their traditional dishes, with
    biryani, paye, and nihari reigning supreme. Biryani, a fragrant mix of rice,
    meat, and spices, is a celebratory dish enjoyed at weddings, festivals, and
    casual gatherings. Its rich flavors and endless variations make it a
    nationwide favorite. Paye, a slow-cooked delicacy made from trotters,
    offers a hearty, flavorful experience often relished during breakfast or
    family feasts. Nihari, a spicy stew of tender meat simmered overnight, is
    synonymous with comfort food, particularly loved in winters. These dishes
    are more than just meals—they represent Pakistan’s rich culinary heritage
    and the warmth of sharing food. Served with naan, parathas, or raita, they
    bring families and friends together, embodying a deep-rooted tradition of
    hospitality. Whether in bustling cities or quiet villages, the love for
    biryani, paye, and nihari reflects the soul of Pakistani cuisine.
    """

    # Process and add documents
    chunks = rag.load_documents(sample_text)
    rag.add_texts(chunks)

    # Ask a question
    # question = "What is Machine Learning and how does it relate to AI?"
    # question = "What is Machine Learning future and how does it relate to AI?"
    # question = "What is JSP and Servlet?"
    # question = "Difference between Pentium 2 and Pentium 4?"
    # question = "Difference between 486 and Pentium?"

    while True:
        question = input("How can I help you? ")

        # Type "exit" to leave the loop
        if question.strip().lower() == "exit":
            break

        result = rag.query(question)

        # Print results
        print("\nQuestion:", question)
        print("\nAnswer:", textwrap.fill(result["answer"], width=80))
        print("\nSources used:")
        for i, source in enumerate(result["sources"], 1):
            print(f"\nSource {i}:", source)

        print("\n *********************** \n")

    # Optional: delete the index once the session is over
    # rag.cleanup()

if __name__ == "__main__":
    main()
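
A note on the transcript that follows: the biryani query retrieves near-identical tail fragments as its top sources, which suggests the index holds chunks left over from earlier runs of this cell. One possible cause, stated here as an assumption rather than a confirmed diagnosis, is that `add_texts` stores each chunk under a fresh random ID, so every re-run adds new vectors instead of overwriting the old ones. A minimal sketch of one way around this is to derive each ID from the chunk's content. The helper name `stable_chunk_ids` and its `prefix` parameter are hypothetical, and the last comment assumes the vector store's `add_texts` accepts an `ids` argument.

```python
import hashlib
from typing import List


def stable_chunk_ids(texts: List[str], prefix: str = "chunk") -> List[str]:
    """Derive a deterministic ID for each chunk from its content (hypothetical helper)."""
    ids = []
    for text in texts:
        # Collapse whitespace so indentation changes alone do not produce a new ID.
        normalized = " ".join(text.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:32]
        ids.append(f"{prefix}-{digest}")
    return ids


chunks = ["Biryani is a rice dish.", "Biryani  is a rice dish.", "Nihari is a stew."]
ids = stable_chunk_ids(chunks)
print(ids[0] == ids[1])  # same text apart from spacing, so same ID
print(ids[0] == ids[2])  # different text, so different ID

# Assumed usage inside SimpleRAG.add_texts:
#     self.vector_store.add_texts(texts, ids=stable_chunk_ids(texts))
```

With content-derived IDs, re-adding an unchanged chunk overwrites the existing vector rather than creating a duplicate. Chunks produced by a different `chunk_size` would still get new IDs, so clearing the index with `cleanup()` remains the simplest reset after changing the splitter settings.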

Added 5 chunks to Pinecone
How can I help you? what is biryani?

Question: what is biryani?

Answer: The provided context does not contain the answer to the question "what is
biryani?".

Sources used:

Source 1: biryani, paye, and nihari reflects the soul of Pakistani cuisine.

Source 2: biryani, paye, and nihari reflects the soul of Pakistani cuisine.

Source 3: hospitality. Whether in bustling cities or quiet villages, the love for 
    biryani, paye, and nihari reflects the soul of Pakistani cuisine.

 *********************** 

How can I help you? what is ML?

Question: what is ML?

Answer: Machine Learning (ML)

Sources used:

Source 1: The future of machine learning (ML) is poised for transformative growth, 
    driving innovation across industries. Advances in neural networks, quantum 
    computing, and explainable AI will make ML more powerful and transparent. 
    It will revolutionize healthcare with precise diagnostics, enhance automation 
    in manufacturing, and optimize 