In [None]:
!pip install \
  langchain \
  langchain-community \
  langchain-core \
  langchain-google-genai \
  langchain-pinecone \
  pinecone-client \
  google-search-results \
  wikipedia \
  sentence-transformers \
  python-dotenv \
  tqdm \
  requests


In [1]:
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()
GEMINI_API_KEY = os.getenv("GOOGLE_API_KEY")

# Initialize Gemini Flash LLM
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",  # This internally uses Flash for fast responses
    google_api_key=GEMINI_API_KEY,
    convert_system_message_to_human=True,
    temperature=0.2
)

# Test it
response = llm.invoke("Explain the difference between Python lists and tuples in simple terms.")
print(response.content)




Imagine you have two containers for storing items:

*   **List:** Like a shopping list you can change. You can add items, remove items, or change the order of items.  It's **mutable** (changeable).  You write it using square brackets: `[apple, banana, cherry]`

*   **Tuple:** Like a set of coordinates (x, y). Once you define them, you can't change them. You can't add, remove, or change the order. It's **immutable** (unchangeable). You write it using parentheses: `(10, 20)`

**In short:**

*   **List:** Changeable, like a shopping list.
*   **Tuple:** Unchangeable, like coordinates.

**Why use tuples?**

*   **Safety:**  Guarantees data won't be accidentally modified.
*   **Speed:**  Tuples are generally a bit faster than lists because Python knows they won't change.
*   **Keys in Dictionaries:**  Tuples can be used as keys in dictionaries, while lists cannot (because dictionary keys must be immutable).


In [None]:
!pip install langchain_experimental


In [2]:
from langchain.agents import Tool
from langchain_experimental.tools.python.tool import PythonREPLTool
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import SerpAPIWrapper
from langchain_community.utilities.wikipedia import WikipediaAPIWrapper
import os
load_dotenv()

# Initialize the Wikipedia tool
wiki = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

# Initialize the Python REPL tool
python_repl = PythonREPLTool()

# Initialize the SERP API tool
search = SerpAPIWrapper(serpapi_api_key=os.getenv("SERPAPI_API_KEY"))

# Bundle the tools
tools = [
    Tool(
        name="Python_REPL",
        func=python_repl.run,
        description="Useful for executing Python code for tasks such as math, logic, and plotting.",
    ),
    Tool(
        name="Wikipedia",
        func=wiki.run,
        description="Useful for retrieving factual and historical information from Wikipedia.",
    ),
    Tool(
        name="Web_Search",
        func=search.run,
        description="Useful for answering current event or factual queries using the SERP API.",
    ),
]


In [None]:
!pip install pypdf

In [3]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone, ServerlessSpec
import uuid
import os

def create_pinecone_retriever_from_pdf(pdf_path: str, index_name="doc-chatbot") -> tuple:
    """
    Ingest a PDF file, upload to Pinecone under a unique namespace, and return a retriever + namespace.
    """
    # Init Pinecone
    pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

    # Load and split PDF
    loader = PyPDFLoader(pdf_path)
    docs = loader.load()
    
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    chunks = splitter.split_documents(docs)

    # Embedding model
    embed_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    # Unique namespace
    namespace = f"session-{uuid.uuid4()}"

    # Upload to Pinecone
    vectorstore = PineconeVectorStore.from_documents(
        documents=chunks,
        embedding=embed_model,
        index_name=index_name,
        namespace=namespace
    )

    retriever = vectorstore.as_retriever()
    return retriever, namespace


  from tqdm.autonotebook import tqdm


In [4]:
pdf_path = "/Users/praddy/Documents/CODE/learning_langchain_folder/chatbot-docqa/data/attention.pdf"  # You’ll dynamically pass this later via Streamlit
retriever, namespace = create_pinecone_retriever_from_pdf(pdf_path)

# Wrap as LangChain tool
rag_tool = Tool(
    name="Document_Retrieval",
    func=lambda q: retriever.get_relevant_documents(q),
    description=(
        "Use this tool to answer any questions related to the user's uploaded PDF document. "
        "It contains detailed information the user may reference or ask questions about."
    )
)


tools.append(rag_tool)


  embed_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


In [8]:
query = "What is the main idea discussed in the uploaded document?"
docs = retriever.get_relevant_documents(query)

# Then send that as context to the LLM
from langchain.schema import HumanMessage

response = llm.invoke([
    HumanMessage(content=f"Based on the following context, answer the question:\n\n{docs}\n\nQuestion: {query}")
])
print(response.content)


  docs = retriever.get_relevant_documents(query)


Based on the provided documents, the main idea discussed is the **attention mechanism**, specifically within the context of the Transformer model architecture. The documents cover:

*   The Transformer model architecture using stacked self-attention.
*   The encoder and decoder stacks within the Transformer, including multi-head self-attention.
*   Visualizations of the attention mechanism, showing how it follows long-distance dependencies and is involved in anaphora resolution.


In [12]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)


In [15]:
from langchain.agents import initialize_agent, AgentType

# Initialize the agent
agent = initialize_agent(
    tools=tools,                     # All your tools, including RAG
    llm=llm,                         # Gemini Flash
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  # Most compatible agent type
    memory=memory,
    verbose=True,
    handle_parsing_errors=True
)


In [16]:
while True:
    query = input("You: ")
    if query.lower() in {"exit", "quit"}:
        break
    response = agent.invoke(query)
    print("Bot:", response["output"] if isinstance(response, dict) else response)




[1m> Entering new AgentExecutor chain...[0m




[32;1m[1;3mI should start by using Wikipedia to get a general overview of the attention mechanism in LLMs.
Action: Wikipedia
Action Input: attention mechanism in large language models[0m
Observation: [33;1m[1;3mPage: Large language model
Summary: A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.
The largest and most capable LLMs are generative pretrained transformers (GPTs). Modern models can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained in.

Page: Transformer (deep learning architecture)
Summary: The transformer is a deep learning architecture that was deve



[32;1m[1;3mThe attention mechanism is a key component of the transformer architecture, which is widely used in large language models (LLMs). The transformer, introduced in the "Attention Is All You Need" paper, relies on multi-head attention to contextualize tokens within a context window, amplifying important tokens and diminishing less important ones. This architecture has advantages over recurrent neural networks (RNNs) like LSTMs, requiring less training time. The transformer architecture has become foundational in modern AI and has been applied to various tasks beyond machine translation, including natural language processing, computer vision, and more.
Final Answer: The attention mechanism is a key component of the transformer architecture, which is widely used in large language models (LLMs). The transformer, introduced in the "Attention Is All You Need" paper, relies on multi-head attention to contextualize tokens within a context window, amplifying important tokens and dimin



[32;1m[1;3mI need to use the Document_Retrieval tool to find information about the mechanism of attention from the uploaded document.
Action: Document_Retrieval
Action Input: mechanism of attention[0m
Observation: [36;1m[1;3m[Document(id='35fff3ea-b2b9-4ef6-a6d1-20e4633301d2', metadata={'author': '', 'creationdate': '2024-04-10T21:11:43+00:00', 'creator': 'LaTeX with hyperref', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'page': 12.0, 'page_label': '13', 'producer': 'pdfTeX-1.40.25', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'source': '/Users/praddy/Documents/CODE/learning_langchain_folder/chatbot-docqa/data/attention.pdf', 'subject': '', 'title': '', 'total_pages': 15.0, 'trapped': '/False'}, page_content='the\nregistration\nor\nvoting\nprocess\nmore\ndifficult\n.\n<EOS>\n<pad>\n<pad>\n<pad>\n<pad>\n<pad>\n<pad>\nFigure 3: An example of the attention mechanism following long-distance dependencies in th



[32;1m[1;3mThe document provides insights into the attention mechanism, particularly within the context of the Transformer model. Here's a breakdown:

*   **Encoder-Decoder Attention:** The Transformer uses multi-head attention in "encoder-decoder attention" layers. In this setup, queries originate from the previous decoder layer, while memory keys and values come from the encoder's output. This allows each position in the decoder to attend to all positions in the input sequence, mimicking typical encoder-decoder attention mechanisms.

*   **Self-Attention for Interpretability:** The document suggests that self-attention can lead to more interpretable models. By inspecting attention distributions, the authors found that individual attention heads learn to perform different tasks, and many exhibit behavior related to the syntactic and semantic structure of sentences.

*   **Examples of Attention Heads:** The document includes examples of attention heads involved in tasks like followin



[32;1m[1;3mThe document describes the following about the mechanism of attention:

*   **Self-Attention Definition:** Self-attention, also known as intra-attention, is an attention mechanism that relates different positions within a single sequence. This allows the model to compute a representation of the sequence by considering the relationships between its elements.
*   **Applications of Self-Attention:** Self-attention has been successfully applied in various tasks, including reading comprehension, abstractive summarization, textual entailment, and learning task-independent sentence representations.
*   **Interpretability:** Self-attention can yield more interpretable models. Individual attention heads learn to perform different tasks, and many exhibit behavior related to the syntactic and semantic structure of sentences.
*   **Encoder-Decoder Attention:** In "encoder-decoder attention" layers, queries come from the previous decoder layer, and the memory keys and values come from 



[32;1m[1;3mI need to find out who captained the Indian cricket team in the most recent Champions Trophy tournament. I can use a web search to find this information.
Action: Web_Search
Action Input: "who captained India in latest champions trophy"[0m
Observation: [38;5;200m[1;3mRohit Sharma[0m
Thought:



[32;1m[1;3mI now know the final answer.

Final Answer: Rohit Sharma[0m

[1m> Finished chain.[0m
Bot: Rohit Sharma
