# 📂 Section 0: Setup & Environment

In [None]:
!pip install langchain langchain-community google-generativeai faiss-cpu unstructured pypdf langchain-google-genai



# 📂 Section 1: Understanding LangChain & LLM Application Architecture

## What is LangChain?
LangChain is a modular framework for building applications on top of Large Language Models (LLMs). It simplifies prompt management, chaining multiple steps, using agents, integrating tools, and managing memory.

## Why LangChain?
- Handles orchestration of tasks like retrieval, generation, and reasoning.
- Pluggable architecture to work with various LLMs, vector stores, APIs, and tools.

### Example: Simple Gemini LLM Call


In [None]:
import os

from langchain_google_genai import ChatGoogleGenerativeAI

# Read the API key from the environment instead of hardcoding it in the notebook.
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", google_api_key=os.environ["GOOGLE_API_KEY"])
llm.invoke("Explain LangChain in simple terms.")


AIMessage(content='Imagine you have a super smart AI model, like ChatGPT, that can understand and generate text. But on its own, it\'s a bit like a brain without hands. It knows a lot, but it can\'t easily interact with the real world or use other tools.\n\n**LangChain is like building those hands (and other body parts!) for that AI brain.** It\'s a framework that lets you connect a large language model (LLM) to:\n\n*   **External data sources:**  Think databases, websites, PDFs, etc.  This gives the LLM access to up-to-date information or specific knowledge it wasn\'t trained on.\n*   **Tools and utilities:**  Like a calculator, a search engine, or even an email sending program.  This allows the LLM to perform tasks beyond just generating text.\n*   **Chains of operations:**  Allows you to link together multiple LLM calls and tools to create more complex and sophisticated workflows.\n\n**In short, LangChain helps you build applications that use LLMs to do more than just chat.  It help


# 📂 Section 2: Components of LangChain

| Component | Purpose | Example |
|-----------|---------|---------|
| LLM Wrappers | Interface to LLMs | Gemini, Cohere |
| PromptTemplates | Manage dynamic prompts | Fill variables |
| Chains | Sequential actions | Prompt → LLM → Output |
| Memory | Store conversations | Chatbots |
| Tools | External actions | Search, Calculator |
| Agents | LLMs making decisions | Multi-tool reasoning |
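
The Prompt → LLM → Output row can be pictured without any LangChain imports. The sketch below is plain Python, not LangChain's implementation: `fake_llm` is a hypothetical stand-in for a real model call, and `make_chain` is an illustrative helper that does not exist in the library.

```python
from typing import Callable, Dict


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; it only echoes the prompt."""
    return f"[model answer to: {prompt}]"


def make_chain(template: str, llm: Callable[[str], str]) -> Callable[[Dict[str, str]], str]:
    """Return a callable that fills the template, then passes the prompt to the LLM."""
    def run(variables: Dict[str, str]) -> str:
        prompt = template.format(**variables)  # prompt-template step
        return llm(prompt)                     # LLM step
    return run


chain_sketch = make_chain("Explain {concept} in easy words.", fake_llm)
print(chain_sketch({"concept": "Embeddings"}))
```

Swapping `fake_llm` for a function that calls a real model is the only change needed to turn the sketch into a working two-step chain.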

### Example: Using a Chain with Gemini


In [None]:
from langchain_core.prompts import PromptTemplate
# LLMChain is deprecated in recent LangChain releases but is still importable from langchain.chains in 0.x.
from langchain.chains import LLMChain

prompt = PromptTemplate.from_template("Explain {concept} in easy words.")
chain = LLMChain(llm=llm, prompt=prompt)

chain.invoke({"concept": "Embeddings"})



{'concept': 'Embeddings',
 'text': 'Imagine you have a bunch of words, like "cat", "dog", "happy", "sad", "king", "queen". A computer doesn\'t understand these words like we do. It just sees them as symbols.\n\n**Embeddings are a way to give these words (or other things like images or even sentences) a numerical representation that captures their meaning and relationships.**\n\nThink of it like this:\n\n* **Turning words into coordinates on a map.**\n* **Words with similar meanings are placed closer together on the map.**\n\nSo, "cat" and "dog" would be closer together on the map because they are both animals. "Happy" and "sad" would be further apart because they are opposites. "King" and "queen" might be closer together because they are both related to royalty.\n\n**These coordinates are called "embeddings".** They are just lists of numbers (like [0.2, 0.8, -0.5]).\n\n**Why are embeddings useful?**\n\n* **Computers can now "understand" the relationships between words.**  They can do t


# 📂 Section 3: Embeddings & Semantic Search

## What are Embeddings?
Embeddings convert text into high-dimensional vectors capturing semantic meaning. Similar meanings result in vectors close to each other in space.

## Semantic Search vs Traditional Search
- **Semantic Search**: Understands meaning and context.
- **Traditional Search**: Relies on exact keyword matches.
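
The contrast above can be made concrete with a toy example. The sketch below uses hand-made 3-dimensional vectors (assumed values chosen for illustration, not output of any embedding model) and cosine similarity as the closeness measure; `keyword_hits` and `semantic_best` are hypothetical helper names.

```python
import math
from typing import Dict, List

# Hand-made "embeddings": assumed values, not produced by a real model.
TOY_VECTORS: Dict[str, List[float]] = {
    "Modi is PM of India": [0.9, 0.1, 0.0],
    "LangChain builds LLM apps": [0.0, 0.2, 0.9],
}
QUERY = "Who leads the Indian government?"
QUERY_VECTOR = [0.8, 0.2, 0.1]  # assumed to lie near the first text


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_hits(query: str, texts: List[str]) -> List[str]:
    """Traditional search: keep texts that share at least one exact word with the query."""
    query_words = set(query.lower().replace("?", "").split())
    return [t for t in texts if query_words & set(t.lower().split())]


def semantic_best(query_vector: List[float], vectors: Dict[str, List[float]]) -> str:
    """Semantic search: return the text whose vector is closest to the query vector."""
    return max(vectors, key=lambda t: cosine_similarity(query_vector, vectors[t]))


print(keyword_hits(QUERY, list(TOY_VECTORS)))    # no exact word overlap, so nothing matches
print(semantic_best(QUERY_VECTOR, TOY_VECTORS))  # closest vector wins
```

The query shares no exact word with either text ("Indian" is not "India"), so keyword matching returns nothing, while the vector comparison still picks the related sentence.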

### Example: Gemini Embeddings with FAISS Search


In [None]:
import os

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS

texts = ["LangChain builds LLM apps", "Embeddings map text to vectors", "Modi is PM of India"]
# The embeddings class needs an explicit model name; the API key is read from the environment.
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=os.environ["GOOGLE_API_KEY"])

vectorstore = FAISS.from_texts(texts, embeddings)
vectorstore.similarity_search("Who is PM of India?", k=1)

[Document(id='d1074dcd-a499-4200-b943-74cc5eb58401', metadata={}, page_content='Modi is PM of India')]


# 📂 Section 4: Loading Documents (PDF, CSV, Word)

LangChain supports multiple document loaders to ingest data from various sources.

## Supported Loaders
- PDFs (PyPDFLoader, UnstructuredPDFLoader)
- Word Docs (UnstructuredWordDocumentLoader)
- Excel/CSV (CSVLoader, UnstructuredExcelLoader)
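
When an application accepts several file types, one option is to pick the loader from the file extension. The sketch below only returns the loader class name as a string so that it runs without LangChain installed; `loader_name_for` and the extension mapping are assumptions for illustration (the mapping picks one loader per format from the list above).

```python
from pathlib import Path

# Assumed mapping from file extension to one of the loaders listed above.
LOADER_BY_SUFFIX = {
    ".pdf": "PyPDFLoader",
    ".docx": "UnstructuredWordDocumentLoader",
    ".csv": "CSVLoader",
    ".xlsx": "UnstructuredExcelLoader",
}


def loader_name_for(path: str) -> str:
    """Return the loader class name for a file, based on its (case-insensitive) extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in LOADER_BY_SUFFIX:
        raise ValueError(f"No loader configured for '{suffix}' files")
    return LOADER_BY_SUFFIX[suffix]


print(loader_name_for("/content/Python_Fundamentals.ipynb - Colab.pdf"))
```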

### Example: Loading a PDF


In [None]:

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/content/Python_Fundamentals.ipynb - Colab.pdf")
documents = loader.load()

# View the first loaded Document (PyPDFLoader returns one Document per page)
documents[0]


Document(metadata={'producer': 'Skia/PDF m136', 'creator': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36', 'creationdate': '2025-05-12T15:15:12+00:00', 'title': 'Python_Fundamentals.ipynb - Colab', 'moddate': '2025-05-12T15:15:12+00:00', 'source': '/content/Python_Fundamentals.ipynb - Colab.pdf', 'total_pages': 30, 'page': 0, 'page_label': '1'}, page_content="Python is a high-level, general-purpose programming language known for its:\nReadability: Syntax resembles plain English, making it beginner-friendly.\nVersatility: Applicable to various domains like web development, data science, machine\nlearning, and automation.\nLarge Standard Library: Rich collection of built-in modules and functions for diverse tasks.\nOpen-Source Community: Extensive support and contributions from a vast developer\ncommunity.\nKey Features of Python:\nInterpreted Language: Code is executed line by line without prior compilation, offering\nfa


# 📂 Section 5: Chunking & Storing Embeddings

## Why Chunking?
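- LLMs and embedding models have token limits, so a long document cannot be passed in whole.
- Chunking breaks large documents into smaller, overlapping parts that can be embedded and retrieved individually.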
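
To see what `chunk_size` and `chunk_overlap` mean, here is a minimal sliding-window sketch in plain Python. It is a simplification: it cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` also tries to split on separators such as paragraph and line breaks. The helper name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


pieces = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(pieces)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears intact in at least one chunk, at the cost of storing some text twice.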

### Example: Chunking Documents


In [None]:

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = text_splitter.split_documents(documents)

# View chunked document
chunks[0]


Document(metadata={'producer': 'Skia/PDF m136', 'creator': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36', 'creationdate': '2025-05-12T15:15:12+00:00', 'title': 'Python_Fundamentals.ipynb - Colab', 'moddate': '2025-05-12T15:15:12+00:00', 'source': '/content/Python_Fundamentals.ipynb - Colab.pdf', 'total_pages': 30, 'page': 0, 'page_label': '1'}, page_content='Python is a high-level, general-purpose programming language known for its:\nReadability: Syntax resembles plain English, making it beginner-friendly.\nVersatility: Applicable to various domains like web development, data science, machine\nlearning, and automation.\nLarge Standard Library: Rich collection of built-in modules and functions for diverse tasks.\nOpen-Source Community: Extensive support and contributions from a vast developer\ncommunity.\nKey Features of Python:')


### Storing Chunks as Embeddings in FAISS


In [None]:

vectorstore = FAISS.from_documents(chunks, embeddings)



# 📂 Section 7: Memory in LangChain

## What is Memory?
Memory stores previous interactions to enable context-aware conversations.

## Types of Memory
- ConversationBufferMemory
- ConversationSummaryMemory
- ConversationBufferWindowMemory
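
The difference between keeping the full buffer and keeping only a window of recent turns can be sketched in plain Python. This is an illustration of the idea, not LangChain's implementation; `BufferMemorySketch` and its parameter `k` are hypothetical names, and the AI replies are made-up placeholders.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class BufferMemorySketch:
    """Stores (human, ai) turns. With k set, only the last k turns are kept."""

    def __init__(self, k: Optional[int] = None) -> None:
        # deque(maxlen=None) is unbounded; deque(maxlen=k) drops the oldest turn.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self) -> str:
        """Render the stored turns as the text that would be prepended to the next prompt."""
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


full = BufferMemorySketch()       # buffer: keeps everything
window = BufferMemorySketch(k=1)  # window: keeps only the latest turn
for mem in (full, window):
    mem.save("Hello, who are you?", "I am an assistant.")
    mem.save("What did I just ask you?", "You asked who I am.")

print(full.history())
print("---")
print(window.history())
```

A summary-style memory would replace the stored turns with a model-written summary instead of dropping them; that step needs a real LLM call and is left out here.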

### Example: Using ConversationBufferMemory


In [None]:
from langchain.memory import ConversationBufferMemory
# ConversationChain is imported from langchain.chains (deprecated in recent releases, still importable in 0.x).
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm, memory=memory)

conversation.invoke({"input": "Hello, who are you?"})
conversation.invoke({"input": "What did I just ask you?"})



{'input': 'What did I just ask you?',
 'history': "Human: Hello, who are you?\nAI: Hello! I am a large language model, trained by Google. I'm designed to be informative and comprehensive. I can communicate and generate human-like text in response to a wide range of prompts and questions. For example, I can provide summaries of factual topics, create stories, and translate languages.\n\nI'm still under development, which means I'm constantly learning and being improved. My knowledge is based on the massive dataset of text and code that I was trained on. I can't access information in real time, so my knowledge is limited to what was available in that dataset up to a certain point in time. I can't tell you exactly *when* that point is, as that's proprietary information, but I can say it's not happening *right now* - I'm not actively browsing the web or anything like that! I'm just using the information I've already learned.\n\nIs there anything specific you'd like to know or anything I ca


# 📂 Section 8: Tools Integration & Real-time Search

LangChain Agents can use external tools (e.g., Search, Calculator) to ground LLM responses.
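
Conceptually, an agent picks a tool, runs it, and uses the observation in its answer. The sketch below shows only that dispatch loop in plain Python under strong assumptions: `choose_tool` is a hard-coded stand-in for the decision a real agent delegates to the LLM, and `fake_search` returns a canned string instead of querying the web.

```python
from typing import Callable, Dict


def calculator(expression: str) -> str:
    """Toy tool: evaluates integer expressions of the form 'a+b' only."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


def fake_search(query: str) -> str:
    """Hypothetical stand-in for a web search tool; returns a canned string."""
    return f"(search results for '{query}')"


TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": calculator,
    "search": fake_search,
}


def choose_tool(question: str) -> str:
    """Stand-in for the LLM's decision; a real agent asks the model which tool fits."""
    return "calculator" if "+" in question else "search"


def run_agent(question: str) -> str:
    tool_name = choose_tool(question)
    observation = TOOLS[tool_name](question)  # run the selected tool
    return f"{tool_name}: {observation}"


print(run_agent("2+3"))
print(run_agent("What is LangChain?"))
```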

### Example: Adding a Search Tool (DuckDuckGo)


In [None]:
!pip install duckduckgo-search



In [None]:

from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import initialize_agent, AgentType

search = DuckDuckGoSearchRun()
tools = [search]

agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

agent.invoke("What is Operation Sindoor?")


{'input': 'What is Operation Sindoor?',
 'output': 'Operation Sindoor was a military operation launched by India against Pakistan in May 2025, involving missile strikes in response to a terrorist attack in Pahalgam.'}


# 📂 Section 9: Building a RAG Pipeline with Gemini

### Steps:
1. Load documents
2. Chunk documents
3. Embed & store in vector DB
4. Retrieve relevant chunks
5. Combine with LLM to answer queries
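
The five steps can be traced end to end with a toy pipeline that needs no API key. Everything here is a simplified stand-in: word counts replace a real embedding model, a Python list replaces the vector DB, and `echo_llm` is a hypothetical function that returns the prompt it receives instead of calling Gemini.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real pipeline calls an embedding model."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 1) -> List[str]:
    """Step 4: rank chunks by similarity to the question and keep the top k."""
    query_vector = embed(question)
    return sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)[:k]


def answer(question: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Step 5: put the retrieved context and the question into one prompt for the LLM."""
    context = "\n".join(retrieve(question, chunks))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for the model: returns the prompt unchanged."""
    return prompt


# Steps 1-3: already "loaded" and "chunked" text, embedded on the fly by retrieve().
toy_chunks = [
    "Python syntax resembles plain English.",
    "A for loop iterates over a sequence of elements.",
]
print(answer("How does a for loop work?", toy_chunks, echo_llm))
```

Printing the prompt makes it easy to check which chunk was retrieved before a real model is plugged in.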

### Example: Simple RAG Flow


In [None]:

from langchain.chains import RetrievalQA

# Expose the FAISS index as a retriever, then wire it to the LLM.
retriever = vectorstore.as_retriever()

rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
rag_chain.invoke({"query": "What are the features of Python?"})


{'query': 'What are the features of Python?',
 'result': "Here's a summary of Python's key features:\n\n*   **Readability:** Python's syntax is similar to plain English, making it easy to learn and understand.\n*   **Versatility:** It can be used in many different fields like web development, data science, machine learning, and automation.\n*   **Interpreted Language:** Python code is executed line by line, allowing for faster development.\n*   **Dynamic Typing:** You don't need to declare variable types explicitly.\n*   **Object-Oriented Programming (OOP):** Supports concepts like classes, objects, inheritance, and polymorphism.\n*   **Large Standard Library:** It has a rich collection of built-in modules and functions.\n*   **Strong Libraries and Frameworks:** Many third-party libraries and frameworks extend Python's capabilities.\n*   **Cross-Platform:** Python code can run on different operating systems without major changes.\n*   **Open-Source Community:** It has extensive support


# 📂 Section 10: Capstone Project — Build a RAG Chatbot

### Goal:
Build a chatbot that answers questions about uploaded PDF, CSV, or Word files using the Gemini LLM.

### Steps:
1. Upload a document (PDF/CSV/Word).
2. Load, chunk, and embed it.
3. Store the embeddings in a FAISS index.
4. Query the index with a RetrievalQA chain.

### Code Template (Reusable)


In [None]:

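from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Assumes `llm` and `embeddings` are already defined as in the earlier sections.

# Load your document (example: PDF)
loader = PyPDFLoader("/content/Python_Fundamentals.ipynb - Colab.pdf")
documents = loader.load()

# Chunking
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = text_splitter.split_documents(documents)

# Embedding & Storing
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever()

# Building RAG Chatbot
rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
rag_chain.invoke({"query": "How for loop works.?"})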


{'query': 'How for loop works.?',
 'result': 'The for loop iterates over a sequence of elements (list, tuple, string). The variable element takes on the value of each item in the sequence during each iteration.'}