LangChain is a framework for developing applications powered by language models.

- GitHub: https://github.com/hwchase17/langchain
- Docs: https://python.langchain.com/v0.2/docs/introduction/

### Overview:
- Installation
- LLMs
- Prompt Templates
- Chains
- Agents and Tools
- Memory
- Document Loaders
- Indexes
- Multi-dataframe-agents


#**01: Install All the Required Packages**

In [None]:
!pip install langchain langchain_community
#For API Calls
!pip install huggingface_hub
!pip install transformers
!pip install accelerate
!pip install  bitsandbytes

Collecting bitsandbytes
  Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl.metadata (11 kB)
Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl (61.3 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.47.0


#**02: Import All the Required Libraries**

In [None]:
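from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_community.llms import HuggingFaceHub  # legacy wrapper, not used below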
import os

#**03: Setting the Environment**

In [None]:
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "..."  # replace "..." with your Hugging Face token

#**04: Approach 1:  Access Models Hosted on Hugging Face Through API**

#**Text2Text Generation Models | Seq2Seq Models | Encoder-Decoder Models**

In [None]:
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}"
)

In [None]:
from transformers import pipeline
from langchain_huggingface import HuggingFacePipeline

# Load the model locally
pipe = pipeline("text2text-generation", model="google/flan-t5-large")

# Wrap in LangChain
llm = HuggingFacePipeline(pipeline=pipe)

# You can then use this 'llm' object in your LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

Device set to use cpu


In [None]:
chain.run("colorful socks")

'taylor swift'

In [None]:
prompt = PromptTemplate(
    input_variables=["name"],
    template="Can you tell me about famous footballer {name}"
)

In [None]:
from transformers import pipeline
from langchain_huggingface import HuggingFacePipeline

# Load the model locally
pipe = pipeline("text2text-generation", model="google/flan-t5-large")

# Wrap in LangChain
llm = HuggingFacePipeline(pipeline=pipe)

# You can then use this 'llm' object in your LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

Device set to use cpu


In [None]:
chain.run("Messi")

'Messi is a footballer from Argentina.'

In [None]:
prompt = PromptTemplate(
    input_variables=["cuisine"],
    template="Can you tell me food items for a {cuisine} restaurant"
)

In [None]:
from transformers import pipeline
from langchain_huggingface import HuggingFacePipeline

# Load the model locally
pipe = pipeline("text2text-generation", model="google/flan-t5-large")

# Wrap in LangChain
llm = HuggingFacePipeline(pipeline=pipe)

# You can then use this 'llm' object in your LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

Device set to use cpu


In [None]:
chain.run("indian")

'Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Vegetables, Ve'

#**Approach 01: Text Generation Models | Decoder Only Models**

In [None]:
prompt = PromptTemplate(
    input_variables=["name"],
    template="Can you tell me about famous footballer {name}"
)

In [None]:
from transformers import pipeline
from langchain_huggingface import HuggingFacePipeline

# Load the model locally
pipe = pipeline("text2text-generation", model="google/flan-t5-large")

# Wrap in LangChain
llm = HuggingFacePipeline(pipeline=pipe)

# You can then use this 'llm' object in your LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

Device set to use cpu


In [None]:
chain.run("Messi")

'Messi is a footballer from Argentina.'

#**05: Approach 02: Download Model Locally (Create Pipelines)**

#**Import All the Required Libraries**

In [None]:
from langchain_huggingface import HuggingFacePipeline
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM

In [None]:
model_id = 'google/flan-t5-large'

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

In [None]:
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, device_map='auto')

In [None]:
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, max_length=128)  # named `pipe` so the imported pipeline() function is not shadowed

Device set to use cpu


In [None]:
local_llm = HuggingFacePipeline(pipeline=pipe)


In [None]:
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}"
)

In [None]:
chain = LLMChain(llm=local_llm,prompt = prompt)

In [None]:
chain.run("colorful socks")

'taylor swift'

In [None]:
prompt = PromptTemplate(
    input_variables=["name"],
    template="Can you tell me about famous footballer {name}"
)

In [None]:
chain = LLMChain(llm=local_llm,prompt = prompt)

In [None]:
chain.run("Messi")

'Messi is a footballer from Argentina.'

#**01: Installation**

In [None]:
!pip install langchain langchain_community

Collecting langchain_community
  Downloading langchain_community-0.3.29-py3-none-any.whl.metadata (2.9 kB)
Collecting requests<3,>=2 (from langchain)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting dataclasses-json<0.7,>=0.6.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.6.7->langchain_community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.6.7->langchain_community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.6.7->langchain_community)
  Downloading mypy_extensions-1.1.0-py3-none-any.whl.metadata (1.1 kB)
Downloading langchain_community-0.3.29-py3-none-any.whl (2.5 MB)

#**02: Setup the Environment**

In [None]:
import os

In [None]:
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "..."  # replace "..." with your Hugging Face token

##**03: Large Language Models**

The basic building block of LangChain is a Large Language Model which takes text as input and generates more text.

Suppose we want to generate a company name based on the company description. We first initialize an LLM wrapper (a Hugging Face model in this notebook). Since we want the output to be more random, we initialize the model with a high temperature.

The temperature parameter adjusts the randomness of the output. Higher values like 0.7 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

The temperature value controls how creative we want the model to be:

- 0 --> the model plays it very safe and takes no risks.

- 1 --> the model takes risks; it is more creative but may generate wrong output.
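
To make the effect of temperature concrete, here is a minimal sketch in plain Python (not LangChain or Hugging Face code) of how temperature can rescale a model's raw scores before sampling. The function name `softmax_with_temperature` and the example scores are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # focused: the top token dominates
high = softmax_with_temperature(logits, 1.0)  # flatter: other tokens get a real chance
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

With the lower temperature almost all probability mass lands on the highest-scoring token, which is why low values feel "safe" and high values feel "creative".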

A generic interface for all LLMs. See all LLM providers: https://python.langchain.com/en/latest/modules/models/llms/integrations.html

In [None]:
!pip install langchain_huggingface

Collecting langchain_huggingface
  Downloading langchain_huggingface-0.3.1-py3-none-any.whl.metadata (996 bytes)
Downloading langchain_huggingface-0.3.1-py3-none-any.whl (27 kB)
Installing collected packages: langchain_huggingface
Successfully installed langchain_huggingface-0.3.1


In [None]:
!pip install huggingface_hub



In [None]:
from langchain_community.llms import HuggingFaceHub  # legacy wrapper; the import path moved to langchain_community

In [None]:
from huggingface_hub import InferenceClient

# Your HF token (replace the placeholder string with your own token)
client = InferenceClient(token="add_huggingface_token")

# Run translation (English → Hindi)
result = client.translation(
    model="facebook/mbart-large-50-many-to-many-mmt",
    text="How old are you?",
    src_lang="en_XX",
    tgt_lang="hi_IN"
)

print(result)


TranslationOutput(translation_text='आप कितने वर्ष के हैं?')


In [None]:
from transformers import pipeline
from langchain_huggingface import HuggingFacePipeline

# Load the model locally
pipe = pipeline("text2text-generation", model="google/flan-t5-base")

# Wrap in LangChain
llm = HuggingFacePipeline(pipeline=pipe)

# Run inference
name = llm.invoke("I want to open a restaurant for Indian food. Suggest a fancy name for this.")
print(name)


Device set to use cpu


Indian restaurant


##**04: Prompt Templates**

In the applications above we wrote the entire prompt by hand. For a user-facing application this is not ideal.

LangChain facilitates prompt management and optimization.

Normally when you use an LLM in an application, you are not sending user input directly to the LLM. Instead, you need to take the user input and construct a prompt, and only then send that to the LLM.
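
As a rough illustration of what a prompt template does, the sketch below uses only the Python standard library. `SimplePromptTemplate` is a hypothetical stand-in written for this example, not LangChain's `PromptTemplate`; it finds the named slots in a string and refuses to format if one is missing.

```python
from string import Formatter

class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template: a string with named slots."""

    def __init__(self, template):
        self.template = template
        # Collect the {placeholder} names that appear in the template
        self.input_variables = sorted(
            {name for _, name, _, _ in Formatter().parse(template) if name}
        )

    def format(self, **values):
        missing = [v for v in self.input_variables if v not in values]
        if missing:
            raise KeyError(f"missing prompt variables: {missing}")
        return self.template.format(**values)

template = SimplePromptTemplate(
    "I want to open a restaurant for {cuisine} food. Suggest a fancy name for this."
)
user_input = "Indian"  # in a real application this would come from the user
prompt_text = template.format(cuisine=user_input)
print(template.input_variables)
print(prompt_text)
```

The user only supplies the short value (`"Indian"`); the surrounding instructions stay fixed in the template, which keeps prompts consistent across requests.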

In [None]:
from langchain.prompts import PromptTemplate

prompt_template_name = PromptTemplate(
    input_variables =['cuisine'],
    template = "I want to open a restaurant for {cuisine} food. Suggest a fancy name for this."
)
p = prompt_template_name.format(cuisine="indian")
print(p)

I want to open a restaurant for indian food. Suggest a fancy name for this.


In [None]:
from langchain.prompts import PromptTemplate
prompt = PromptTemplate.from_template("What is a good name for a company that makes {product}")
prompt.format(product="colorful socks")

'What is a good name for a company that makes colorful socks'

##**05: Chains**

Combine LLMs and Prompts in multi-step workflows:

Using Chains, we can link together the model, the PromptTemplate, and other Chains.

The simplest and most common type of Chain is LLMChain. It first passes the input to the PromptTemplate and then to the Large Language Model.

LLMChain is responsible for executing the PromptTemplate. For every PromptTemplate, we typically define a separate LLMChain.
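
The two-step flow described above (fill the template, then call the model) can be sketched without any library. `SimpleChain` and `fake_llm` are hypothetical names for this illustration; `fake_llm` only echoes its input so that the data flow is visible without downloading a model.

```python
def fake_llm(prompt_text):
    """Stand-in for a real model call; it only echoes the prompt it received."""
    return f"[model output for: {prompt_text}]"

class SimpleChain:
    """Hypothetical minimal chain: format the prompt, then call the model."""

    def __init__(self, llm, template):
        self.llm = llm
        self.template = template

    def run(self, **inputs):
        prompt_text = self.template.format(**inputs)  # step 1: fill the template
        return self.llm(prompt_text)                  # step 2: call the model

name_chain = SimpleChain(
    fake_llm, "What is a good name for a company that makes {product}?"
)
answer = name_chain.run(product="colorful socks")
print(answer)
```

Swapping `fake_llm` for any callable that maps a prompt string to a response string leaves the chain itself unchanged, which is the main appeal of the pattern.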

In [None]:
# Hugging Face model connector
from langchain_huggingface import HuggingFaceEndpoint

# For creating dynamic prompt templates
from langchain.prompts import PromptTemplate

# For linking LLM + Prompt
from langchain.chains import LLMChain


In [None]:
# Choose a Hugging Face model (Flan-T5 is good for text-to-text tasks)
llm = HuggingFaceEndpoint(
    repo_id="google/flan-t5-base",
    task="text2text-generation",
    temperature=0.7,                  # Sampling randomness (lower = more deterministic, higher = more varied)
    max_new_tokens=64                 # Limit output length
)

In [None]:
# Template for generating company names
prompt_company = PromptTemplate.from_template(
    "What is a good name for a company that makes {product}?"
)

# Example check
print(prompt_company.format(product="colorful socks"))


What is a good name for a company that makes colorful socks?


In [None]:
from transformers import pipeline

# Load FLAN-T5 locally (runs on your CPU/GPU, no API needed)
generator = pipeline("text2text-generation", model="google/flan-t5-base")

result = generator(
    "What is a good name for a company that makes colorful socks?",
    max_new_tokens=64
)

print("Company name suggestion:", result[0]["generated_text"])


Device set to use cpu


Company name suggestion: mcdonalds


In [None]:
# Template for generating restaurant names
prompt_restaurant = PromptTemplate(
    input_variables=['cuisine'],
    template="I want to open a restaurant for {cuisine} food. Suggest a fancy name for this."
)

# Example check
print(prompt_restaurant.format(cuisine="Mexican"))


I want to open a restaurant for Mexican food. Suggest a fancy name for this.


In [None]:
import torch
from transformers import pipeline
from langchain.prompts import PromptTemplate

# Load Flan-T5 locally
device = 0 if torch.cuda.is_available() else -1  # first GPU if available, otherwise CPU
generator = pipeline("text2text-generation", model="google/flan-t5-large", device=device)

# Define prompt
prompt_restaurant = PromptTemplate.from_template(
    "I want to open a restaurant for {cuisine} food. Suggest a fancy and creative name for this."
)

# Format prompt
final_prompt = prompt_restaurant.format(cuisine="Mexican")

# Run model with sampling
result = generator(final_prompt, max_new_tokens=64, do_sample=True, top_p=0.9, top_k=50)
print("Restaurant name suggestion:", result[0]["generated_text"])


Device set to use cpu


Restaurant name suggestion: La Taqueria de Mexicanos


#Can We Combine Multiple PromptTemplates?

We can chain multiple PromptTemplates such that the output of one prompt becomes the input for the next. This allows building multi-step workflows with LLMs, where each step refines, expands, or transforms the data.

**The output from the first PromptTemplate is passed to the next PromptTemplate as input.**

#**To combine chains and run them in a fixed sequence, we use SimpleSequentialChain**

In [None]:
# Step 1: Imports
from transformers import pipeline
from langchain.prompts import PromptTemplate

# Step 2: Load local Flan-T5 model (text2text-generation)
generator = pipeline("text2text-generation", model="google/flan-t5-large")  # use large for better outputs

# Step 3: Define first PromptTemplate (restaurant name)
prompt_template_name = PromptTemplate(
    input_variables=['cuisine'],
    template=(
        "You are a creative branding expert. "
        "Suggest a unique and fancy name for a modern restaurant that serves {cuisine} cuisine. "
        "Do not repeat the word 'restaurant' unnecessarily."
    )
)

# Step 4: Define second PromptTemplate (menu items)
prompt_template_items = PromptTemplate(
    input_variables=['restaurant_name'],
    template=(
        "You are a professional chef. "
        "Provide 5–10 unique and appetizing menu items for the restaurant named {restaurant_name}. "
        "List them as short, distinct items separated by commas."
    )
)

# Step 5: Improved LLM function to prevent repetition
def run_llm(prompt_text):
    result = generator(
        prompt_text,
        max_new_tokens=256,      # allow longer outputs
        do_sample=True,          # enable sampling
        temperature=0.8,         # creativity
        top_p=0.95,              # nucleus sampling
        top_k=50,                # top-k sampling
        repetition_penalty=2.0,  # reduce repetition
        no_repeat_ngram_size=3    # prevent 3-gram repeats
    )
    return result[0]["generated_text"]

# Step 6: Refined sequential chain
def sequential_chain(input_cuisine):
    # Generate restaurant name
    restaurant_name = run_llm(prompt_template_name.format(cuisine=input_cuisine))

    # Generate menu items (comma-separated)
    menu_items = run_llm(
        prompt_template_items.format(restaurant_name=restaurant_name) +
        " List them as short items separated by commas."
    )

    return f"Restaurant Name: {restaurant_name}\nSuggested Menu Items: {menu_items}"

# Run the chain
content = sequential_chain("Indian")
print(content)


Device set to use cpu


Restaurant Name: samantha
Suggested Menu Items: bbq pork chops


In [None]:
# Sequential execution using local Hugging Face model
def sequential_chain(input_cuisine):
    # Step 1: Generate restaurant name
    restaurant_name = run_llm(prompt_template_name.format(cuisine=input_cuisine))

    # Step 2: Generate menu items
    menu_items = run_llm(prompt_template_items.format(restaurant_name=restaurant_name))

    return f"Restaurant Name: {restaurant_name}\nSuggested Menu Items: {menu_items}"

# Run the chain
content = sequential_chain("Indian")
print(content)

Restaurant Name: bhaag
Suggested Menu Items: chicken tikka masala


**There is an issue with SimpleSequentialChain: it only returns the output of the last step.**
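
The difference between returning only the last output and keeping every intermediate result can be sketched in plain Python. The two step functions below return fixed, made-up strings in place of real LLM calls, and the function names are hypothetical; the sketch shows the idea, not LangChain's implementation.

```python
def suggest_name(cuisine):
    # Stand-in for the first LLM call (hypothetical fixed answer)
    return f"{cuisine} Spice House"

def suggest_menu(restaurant_name):
    # Stand-in for the second LLM call (hypothetical fixed answer)
    return f"Signature dishes of {restaurant_name}"

def simple_sequential(cuisine):
    """Pipe each output into the next step and return only the final output."""
    restaurant_name = suggest_name(cuisine)
    return suggest_menu(restaurant_name)

def sequential_with_outputs(cuisine):
    """Run the same steps but keep every intermediate result under a name."""
    outputs = {"cuisine": cuisine}
    outputs["restaurant_name"] = suggest_name(outputs["cuisine"])
    outputs["menu_items"] = suggest_menu(outputs["restaurant_name"])
    return outputs

last_only = simple_sequential("Indian")
everything = sequential_with_outputs("Indian")
print(last_only)
print(everything)
```

In the first version the restaurant name is lost once the menu is generated; in the second, each step writes to a named key, so the caller can read both.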

##**To show the entire information, I will use SequentialChain.**

##**Sequential Chain**

In [None]:
# Step 1: Imports
from transformers import pipeline
from langchain.prompts import PromptTemplate

# Step 2: Load local Flan-T5 model
generator = pipeline("text2text-generation", model="google/flan-t5-base")

# Step 3: Define PromptTemplates
prompt_template_name = PromptTemplate(
    input_variables=['cuisine'],
    template="I want to open a restaurant for {cuisine} food. Suggest a fancy name for this."
)

prompt_template_items = PromptTemplate(
    input_variables=['restaurant_name'],
    template="Suggest some menu items for {restaurant_name}."
)

# Step 4: LLM function
def run_llm(prompt_text):
    result = generator(
        prompt_text,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        top_k=50,
        repetition_penalty=2.0,
        no_repeat_ngram_size=3
    )
    return result[0]["generated_text"]

# Step 5: Sequential execution
def sequential_chain(input_cuisine):
    # Generate restaurant name
    restaurant_name = run_llm(prompt_template_name.format(cuisine=input_cuisine))

    # Generate menu items
    menu_items = run_llm(prompt_template_items.format(restaurant_name=restaurant_name))

    return {
        "restaurant_name": restaurant_name,
        "menu_items": menu_items
    }

# Step 6: Run chain
output = sequential_chain("Indian")
print(output)

Device set to use cpu


{'restaurant_name': 'Indian restaurant', 'menu_items': 'Indian food, Beef, Chicken, Vegetables'}


##**06. Agents and Tools**

Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done.


When used correctly agents can be extremely powerful. In order to load agents, you should understand the following concepts:

- Tool: A function that performs a specific duty. This can be things like: Google Search, Database lookup, Python REPL, other chains.
- LLM: The language model powering the agent.
- Agent: The agent to use.
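
The decide/act/observe loop can be sketched in a few lines of plain Python. Everything here is hypothetical and simplified: `scripted_llm` is hard-coded where a real agent would prompt a language model, and the two tools are toy functions standing in for things like a search API.

```python
def calculator(expression):
    """Tool: add whitespace-separated numbers, e.g. '2 3' -> '5.0'."""
    return str(sum(float(part) for part in expression.split()))

def lookup(term):
    """Tool: tiny made-up knowledge base standing in for a search API."""
    facts = {"langchain": "LangChain is a framework for LLM applications."}
    return facts.get(term.lower(), "no result")

TOOLS = {"calculator": calculator, "lookup": lookup}

def scripted_llm(question, observations):
    """Stand-in for the LLM's decision step: pick a tool or finish."""
    if not observations:
        return ("lookup", "LangChain")       # first action: gather information
    return ("finish", observations[-1])      # then answer with what was observed

def run_agent(question, max_steps=5):
    observations = []
    for _ in range(max_steps):
        action, action_input = scripted_llm(question, observations)
        if action == "finish":
            return action_input
        observations.append(TOOLS[action](action_input))  # act, then observe
    return "stopped: step limit reached"

result = run_agent("What is LangChain?")
print(result)
```

The `max_steps` limit is a design choice worth keeping in any real agent loop, since a model that never decides to finish would otherwise run forever.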


Agents are a very powerful concept in LangChain.

For example, suppose I need to travel from Dubai to Canada and I type this into ChatGPT:

---> Give me two flight options from Dubai to Canada on September 1, 2024

ChatGPT on its own will not be able to answer, because it only has knowledge up to September 2021.

ChatGPT Plus has an Expedia plugin. If we enable it, ChatGPT calls the plugin to pull flight information and then shows the results.

SerpApi is a real-time API to access Google search results.

In [None]:
!pip install wikipedia

Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11678 sha256=9de9afdd03ab7d598ddae0fbce8207f174ef5be34e63ceb6ea0c5ec5517a82ed
  Stored in directory: /root/.cache/pip/wheels/63/47/7c/a9688349aa74d228ce0a9023229c6c0ac52ca2a40fe87679b8
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


In [None]:
from transformers import pipeline

# Load local Flan-T5 model
generator = pipeline("text2text-generation", model="google/flan-t5-large")

# Your prompt
prompt = "Based on the following text, answer: What was the GDP of the US in 2024?\n\nText: The US economy ..."

# Generate answer
result = generator(prompt, max_new_tokens=128)
print(result[0]["generated_text"])


Device set to use cpu


$1.07 trillion


In [None]:
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-large")

prompt = "Answer: What was the GDP of the US in 2024?"

result = generator(prompt, max_new_tokens=128)
print(result[0]["generated_text"])

Device set to use cpu


$13 trillion


##**07: Memory**

In a chatbot application like ChatGPT, you will notice that it remembers earlier parts of the conversation.

1. ConversationBufferMemory

- Purpose: Stores the entire conversation in memory as a single buffer.

- Behavior: Keeps all previous interactions so the model can access full context.

- Use case: Chatbots where you want the model to remember everything said so far.
---

2. ConversationChain

- Purpose: A simple chain combining a memory and an LLM.

- Behavior: Passes user input to the LLM and keeps context in memory.

- Use case: Single-step chat interface with memory support.
---

3. ConversationBufferWindowMemory

- Purpose: Stores only the last N interactions (sliding window memory).

- Behavior: Limits context length to prevent long-term memory overload.

- Use case: Chatbots with short-term memory, avoiding excessive context.
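
The contrast between a full buffer and a sliding window can be sketched with the standard library alone. `BufferMemory` and `WindowMemory` are hypothetical classes for illustration and do not mirror LangChain's actual interfaces.

```python
from collections import deque

class BufferMemory:
    """Keeps every turn (the idea behind a full conversation buffer)."""

    def __init__(self):
        self.turns = []

    def save(self, user, ai):
        self.turns.append((user, ai))

    def as_prompt(self):
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

class WindowMemory(BufferMemory):
    """Keeps only the last k turns (the idea behind a sliding-window buffer)."""

    def __init__(self, k):
        self.turns = deque(maxlen=k)  # older turns fall off automatically

full, window = BufferMemory(), WindowMemory(k=2)
conversation = [("Hi, I'm Alice", "Hello Alice"), ("I like tea", "Noted"), ("Bye", "Goodbye")]
for user, ai in conversation:
    full.save(user, ai)
    window.save(user, ai)

print(len(full.turns), len(window.turns))
print(window.as_prompt())
```

After three turns the window has already dropped the first one, so a model prompted from `window` could no longer recall the name, while one prompted from `full` still could.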

In [None]:
# Step 1: Imports
from transformers import pipeline
from langchain.prompts import PromptTemplate

# Step 2: Load a local Flan-T5 model (text2text-generation)
generator = pipeline("text2text-generation", model="google/flan-t5-base")

# Step 3: Define PromptTemplate
prompt_template_name = PromptTemplate(
    input_variables=['cuisine'],
    template="I want to open a restaurant for {cuisine} food. Suggest a fancy name for this."
)

# Step 4: Function to simulate LLMChain
def run_chain(cuisine_input):
    # Format the prompt
    prompt_text = prompt_template_name.format(cuisine=cuisine_input)
    # Generate output using the local Hugging Face model
    result = generator(prompt_text, max_new_tokens=64, do_sample=True, top_p=0.9, top_k=50, temperature=0.7)
    return result[0]["generated_text"]

# Step 5: Run the chain
name = run_chain("Mexican")
print("Restaurant name suggestion:", name)

Device set to use cpu


Restaurant name suggestion: Mexican restaurant


In [None]:
# Run the chain for Indian cuisine
name1 = run_chain("Indian")
print("Restaurant name suggestion:", name1)

Restaurant name suggestion: Indian restaurant


In [None]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_huggingface import HuggingFacePipeline
from transformers import pipeline

# Step 1: Load a local Flan-T5 model using HuggingFacePipeline
# This avoids issues with the Hugging Face inference endpoint
pipe = pipeline("text2text-generation", model="google/flan-t5-base")
llm = HuggingFacePipeline(pipeline=pipe)


# Step 2: Initialize memory
memory = ConversationBufferMemory()

# Step 3: Create a ConversationChain
# The ConversationChain includes an LLM and a memory component
chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True # Set verbose to True to see the chain's internal steps, including memory
)

# Step 4: Run the chain with inputs and observe how memory is used
print("First interaction:")
response1 = chain.invoke({"input": "Hi, my name is Alice"})
print(response1)

print("\nSecond interaction:")
# The chain should now remember the name "Alice" from the previous interaction
response2 = chain.invoke({"input": "What is my name?"})
print(response2)

# You can also explicitly access the memory
print("\nMemory contents:")
print(chain.memory.load_memory_variables({}))

Device set to use cpu


First interaction:


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, my name is Alice
AI:

> Finished chain.
{'input': 'Hi, my name is Alice', 'history': '', 'response': 'Alice: Hello, how can I help you?'}

Second interaction:


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Alice
AI: Alice: Hello, how can I help you?
Human: What is my name?
AI:

> Finis

In [None]:
from transformers import pipeline
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate

# Local Hugging Face LLM
generator = pipeline("text2text-generation", model="google/flan-t5-base")

# Simple function to simulate LLM call
def run_llm(prompt_text):
    result = generator(prompt_text, max_new_tokens=64)
    return result[0]["generated_text"]

# Memory
memory = ConversationBufferMemory(memory_key="history")

# Custom ConversationChain
class LocalConversationChain:
    def __init__(self, memory):
        self.memory = memory

    def run(self, user_input):
        # Load the conversation so far and build the prompt around the new input
        history = self.memory.load_memory_variables({})["history"]
        prompt_text = f"{history}\nUser: {user_input}\nAssistant:"
        response = run_llm(prompt_text)
        # Update memory
        self.memory.save_context({"input": user_input}, {"output": response})
        return response

conversation = LocalConversationChain(memory=memory)

# Chat
print(conversation.run("Hello! How are you?"))
print(conversation.run("Tell me a joke."))

Device set to use cpu


I'm fine.
I'm fine.


In [None]:
from transformers import pipeline
from langchain.memory import ConversationBufferMemory

# Local LLM
generator = pipeline("text2text-generation", model="google/flan-t5-base")

# Memory
memory = ConversationBufferMemory(memory_key="history")

# Custom ConversationChain
class LocalConversationChain:
    def __init__(self, memory):
        self.memory = memory

    def run(self, user_input):
        history = self.memory.load_memory_variables({})["history"]
        prompt = f"{history}\nUser: {user_input}\nAssistant:"
        response = generator(prompt, max_new_tokens=64)[0]["generated_text"]
        self.memory.save_context({"input": user_input}, {"output": response})
        return response

conversation = LocalConversationChain(memory=memory)

print(conversation.run("Hello! How are you?"))
print(conversation.run("Tell me a joke"))


Device set to use cpu


I'm fine.
I'm fine.


In [None]:
from transformers import pipeline

# Load local Flan-T5 model
generator = pipeline("text2text-generation", model="google/flan-t5-base")

# Simple memory for last 3 interactions
history = []

def run_conversation(user_input):
    # Append user input
    history.append(f"Human: {user_input}")

    # Build prompt with history
    prompt = "\n".join(history[-6:]) + "\nAI:"  # last 6 messages, including the current user input

    # Generate response
    result = generator(prompt, max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.9, top_k=50)
    ai_response = result[0]["generated_text"].strip()

    # Append AI response to history
    history.append(f"AI: {ai_response}")

    return ai_response

# Example usage
print(run_conversation("What’s the weather today?"))
print(run_conversation("And tomorrow?"))


Device set to use cpu


Weather today is cloudy.
Weather tomorrow is cloudy.


#**08: Document Loaders**


In [None]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-6.0.0-py3-none-any.whl.metadata (7.1 kB)
Downloading pypdf-6.0.0-py3-none-any.whl (310 kB)
Installing collected packages: pypdf
Successfully installed pypdf-6.0.0


In [None]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/content/Attention_is_all_you_need.pdf")
pages = loader.load()

In [None]:
pages

[Document(metadata={'producer': 'pdfTeX-1.40.17', 'creator': 'LaTeX with hyperref package', 'creationdate': '2017-12-07T01:03:15+00:00', 'author': '', 'keywords': '', 'moddate': '2017-12-07T01:03:15+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) kpathsea version 6.2.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': '/content/Attention_is_all_you_need.pdf', 'total_pages': 15, 'page': 0, 'page_label': '1'}, page_content='Attention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser ∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurr

#**09: Indexes**

An Index organizes documents so that an LLM can retrieve relevant information efficiently. Commonly used in Retrieval-Augmented Generation (RAG) pipelines.
Examples: VectorStoreIndex, FAISS, Chroma, Weaviate, Milvus.

How Indexing Works:

- Load documents → e.g., PDFs, text files, or web pages.

- Split documents into chunks → smaller pieces for better retrieval.

- Embed chunks into vectors → using embeddings (e.g., OpenAI, Hugging Face).

- Store embeddings in a Vector Database → enables similarity search.

- Query index → retrieve top-k relevant chunks for LLM to answer questions.


The end-to-end flow is Document Loader → Index → Retrieval → LLM query; together these steps form a RAG pipeline that fits in one script.
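
The split, embed, store, and query steps can be illustrated with a toy index built from the standard library. This is only a sketch under simplifying assumptions: the "embedding" is a plain word count instead of a learned model, the "vector store" is a Python list, and the sample document and all function names are made up.

```python
import math
from collections import Counter

def split_into_chunks(text, chunk_size):
    """Split text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def embed(text):
    """Toy embedding: word counts. Real pipelines use a learned embedding model."""
    return Counter(word.strip(".,?!").lower() for word in text.split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, index, k=1):
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

document = (
    "Transformers rely on stacked attention layers. "
    "Chroma stores embedding vectors for search. "
    "Flan-T5 is a text-to-text language model."
)
chunks = split_into_chunks(document, chunk_size=6)   # one six-word sentence per chunk
index = [(chunk, embed(chunk)) for chunk in chunks]  # the toy "vector store"
best = top_k("Which tool stores vectors?", index, k=1)
print(best)
```

The retrieved chunk would then be placed into a prompt for the LLM, which is the step where retrieval turns into retrieval-augmented generation.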



In [None]:
!pip install chromadb



In [None]:
from transformers import pipeline
from langchain.prompts import PromptTemplate
from langchain_community.vectorstores import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter

# --- Step 1: Load document ---
with open("example.txt", "r", encoding="utf-8") as f:
    text = f.read()

# --- Step 2: Split document into chunks (optional for short text) ---
splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = splitter.split_text(text)

# --- Step 3: Create embeddings and vector DB ---
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = Chroma.from_texts(texts, embedding=embeddings, persist_directory=None)

# --- Step 4: Retrieve most relevant chunk(s) ---
query = "Explain the main topic of this document"
results = db.similarity_search(query, k=1)  # k=1 for concise summary
top_text = results[0].page_content

# --- Step 5: Setup local Flan-T5 pipeline ---
generator = pipeline("text2text-generation", model="google/flan-t5-base")

# --- Step 6: Prepare prompt for summarization ---
prompt = PromptTemplate(
    input_variables=["content"],
    template="Summarize the main idea of the following text in one concise sentence:\n\n{content}"
)
formatted_prompt = prompt.format(content=top_text)

# --- Step 7: Generate summary ---
summary = generator(formatted_prompt, max_new_tokens=64)[0]["generated_text"]

print("Main topic of the document:")
print(summary)


Device set to use cpu


Main topic of the document:
Upload or read a document. Create embeddings for each chunk of the document.


In [None]:
!pip install -U bitsandbytes


