model I/O

memory

retrieval

agents

Quantization reduces the number of bits required to represent the
parameters of an LLM while attempting to maintain most of the original
information.

This reduces precision slightly, but the loss is usually outweighed by much faster inference and a lower VRAM load.
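To make the trade-off concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is only an illustration of the idea, not what Ollama does internally (real schemes are typically block-wise and often use 4 bits); the function names and sample weights are hypothetical.

```python
from array import array

def quantize_int8(weights):
    """Map floats to int8 codes plus one shared float scale (symmetric quantization)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # fall back to 1.0 if all weights are zero
    scale = max_abs / 127.0
    codes = array("b", (round(w / scale) for w in weights))  # signed char: 1 byte per weight
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from the int8 codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.53, 0.98, -0.07, 0.31]  # hypothetical sample weights
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

fp32_bytes = len(weights) * array("f").itemsize  # storage as 32-bit floats
int8_bytes = len(codes) * codes.itemsize         # storage as 8-bit codes
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(fp32_bytes, int8_bytes, max_error)
```

The codes take a quarter of the space of 32-bit floats, and the rounding error per weight is bounded by half the scale.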

In [2]:
# Using Ollama with Llama3
from langchain_ollama import OllamaLLM

# Initialize Ollama with Llama3
llm = OllamaLLM(
    model="llama3:latest",  # Make sure this matches your downloaded model name
    temperature=0.7,
    num_predict=500,  # Max tokens to generate (Ollama parameter)
    num_ctx=2048,     # Context window size (Ollama parameter)
    verbose=False,
)

In [3]:
llm.invoke("what is 1+1")

'The answer to 1+1 is... 2!'

In [4]:
from langchain_core.prompts import PromptTemplate

template="""<s>[INST] <<SYS>>
you are a helpful assistant answering concisely
[/SYS]

{user_input} [/INST]"""

prompt= PromptTemplate(
    template=template,
    input_variables=["user_input"]
)

In [5]:
basic_chain= prompt | llm

In [6]:
def llama_user(some_text: str):
    response=basic_chain.invoke({"user_input": some_text})
    return response

In [18]:
llama_user("hi! I'm Abhimanyu, what is 1+1=?")

'Nice to meet you, Abhimanyu!\n\nThe answer is: 2'

In [7]:
from langchain.chains import LLMChain

ntemplate="""<s>[INST]<<SYS>>[/SYS]
Create a title for a story about {summary}. Only return the title.
[/INST]"""
title_prompt=PromptTemplate(template=ntemplate, input_variables=["summary"])

title=LLMChain(llm=llm, prompt=title_prompt, output_key="title")



In [86]:
title.invoke({ "summary":"the guy sang his heart out for her"})

{'summary': 'the guy sang his heart out for her',
 'title': '"A Melody of Devotion"'}

In [8]:
template= """<s>[INST]<<SYS>>[/SYS]
Describe the main character of a story about {summary} with the
title {title}. Use only two sentences.[/INST]"""

character_prompt= PromptTemplate(template=template, input_variables=["summary","title"])
character= LLMChain(llm=llm, prompt=character_prompt, output_key="character")

In [9]:
template="""<s>[INST]<<SYS>>[/SYS]
Create a story about {summary} with the title {title}. The main
character is: {character}. Only return the story and it cannot be
longer than one paragraph.[/INST]"""
story_prompt= PromptTemplate(template=template, input_variables=["summary","title","character"])

story= LLMChain(llm=llm, prompt=story_prompt, output_key="story")

In [10]:
llm_chain= title | character | story


In [23]:
llm_chain.invoke("the guy sang his heart out for her") 

{'summary': 'the guy sang his heart out for her',
 'title': '"A Melody of Devotion"',
 'character': "Meet Ethan Thompson, a talented musician who pours his soul into every note and lyric he writes, driven by his deep passion for the enigmatic and captivating Sophia Jenkins. As he serenades her with his heart-wrenching melodies, Ethan's music becomes an extension of his devotion to Sophia, revealing the depths of his emotions and the true meaning of love.",
 'story': "Here is a story about Ethan Thompson and his devotion to Sophia Jenkins:\n\nA Melody of Devotion: As sunset painted the sky with warm hues, Ethan Thompson stood at the edge of the rooftop garden, guitar in hand, ready to pour out his heart to Sophia Jenkins. The soft breeze carried the sweet scent of blooming flowers as he began to strum the strings, the melody flowing effortlessly like a river of emotions. His voice soared with every word, a symphony of devotion that spoke directly to Sophia's soul. With each note, Ethan 

So now we have a chain: we only need to give it a summary, and we get each element (title, character, story) back as a distinct key.
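The way each stage adds its own key to a shared dictionary can be sketched without any LLM at all. This is a simplified model of the behaviour seen in the output above, not LangChain's implementation; `make_stage` and the lambda "models" are hypothetical stand-ins.

```python
def make_stage(output_key, fake_llm):
    """Return a stage that stores fake_llm(inputs) in the dict under output_key."""
    def stage(inputs):
        return {**inputs, output_key: fake_llm(inputs)}
    return stage

# Stand-ins for the LLM calls; a real stage would format a prompt and call the model.
title_stage = make_stage("title", lambda d: f"Title for: {d['summary']}")
character_stage = make_stage("character", lambda d: f"Hero of {d['title']}")
story_stage = make_stage("story", lambda d: f"{d['character']} in {d['title']}")

def run_pipeline(summary):
    """Pass one growing dict through the stages in order."""
    state = {"summary": summary}
    for stage in (title_stage, character_stage, story_stage):
        state = stage(state)
    return state

result = run_pipeline("the guy sang his heart out for her")
print(list(result))
```

Because every stage returns its inputs plus one new key, later stages can read anything produced earlier.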

**memory**: these models are stateless, so on their own they remember nothing from one call to the next.
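A rough sketch of what statelessness means, and of the usual workaround of resending the transcript with every prompt. `fake_llm` is a hypothetical stand-in that, like a real model, only "knows" what is in the prompt it receives; `BufferedChat` is likewise an illustrative name, not a LangChain class.

```python
import re

def fake_llm(prompt):
    """Hypothetical model: answers only from the text of the current prompt."""
    match = re.search(r"I'm (\w+)", prompt)
    if "what is my name" in prompt.lower():
        return f"Your name is {match.group(1)}." if match else "I don't know your name."
    return "Nice to meet you!"

class BufferedChat:
    """Keeps the full transcript and prepends it to each new prompt."""
    def __init__(self, llm):
        self.llm = llm
        self.history = []

    def ask(self, user_input):
        prompt = "\n".join(self.history + [f"Human: {user_input}", "AI:"])
        answer = self.llm(prompt)
        self.history += [f"Human: {user_input}", f"AI: {answer}"]
        return answer

# Without history the name is unknown; with history it can be recovered.
stateless_answer = fake_llm("Human: what is my name?\nAI:")
chat = BufferedChat(fake_llm)
chat.ask("hi! I'm Abhimanyu")
stateful_answer = chat.ask("what is my name?")
print(stateless_answer, stateful_answer)
```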

In [24]:
llama_user("hi! I'm Abhimanyu, what is 1+1=?")

'Nice to meet you, Abhimanyu!\n\nThe answer is: 2'

In [99]:
llama_user("what is my name?")

"I'm not aware of your name. You haven't provided it to me yet! Would you like to share?"

In [11]:
template="""<s>[INST]<<SYS>>you are a general assistant, be concise and to the point in your answers
[/SYS]{chat_history}{user_input}[/INST]"""

prompt= PromptTemplate(template=template, input_variables=["user_input","chat_history"])


In [12]:
from langchain.memory import ConversationBufferMemory

memory=ConversationBufferMemory(memory_key="chat_history")
mllm= LLMChain(llm=llm,prompt=prompt,memory=memory,output_key="answer")



In [13]:
mllm.invoke({"user_input":"hi! I'm Abhimanyu, what is 1+1=?"})

{'user_input': "hi! I'm Abhimanyu, what is 1+1=?",
 'chat_history': '',
 'answer': 'Hi Abhimanyu!\n\nThe answer to 1+1 is 2.'}

In [131]:
mllm.invoke({"user_input":"say my name?"})

{'user_input': 'say my name?',
 'chat_history': "Human: hi! I'm Abhimanyu, what is 1+1=?\nAI: Nice to meet you, Abhimanyu!\n\nThe answer is: 2\nHuman: what is my name?\nAI: Nice to meet you too, Abhimanyu!\n\nYour name is: Abhimanyu",
 'answer': '<INST>\n<SYS>>AI: Your name is indeed "Abhimanyu".'}

In [14]:
# Keep only the last k exchanges in the history so that we do not exceed the token limit
from langchain.memory import ConversationBufferWindowMemory

memory=ConversationBufferWindowMemory(k=2, memory_key="chat_history")
mllm= LLMChain(llm=llm,prompt=prompt,memory=memory,output_key="answer")



Now the prompt stays small, but the trade-off is that anything older than the last k exchanges is forgotten.
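The windowing behaviour can be sketched with a bounded deque. This is an illustration of the idea only, not the library's code; `WindowMemory` and its methods are hypothetical names.

```python
from collections import deque

class WindowMemory:
    """Remembers only the k most recent (human, ai) exchanges."""
    def __init__(self, k):
        self.turns = deque(maxlen=k)  # older turns fall off automatically

    def save(self, human, ai):
        self.turns.append((human, ai))

    def as_text(self):
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

memory_sketch = WindowMemory(k=2)
memory_sketch.save("hi! I'm Abhimanyu", "Nice to meet you, Abhimanyu!")
memory_sketch.save("what is 1+1", "2")
memory_sketch.save("what is 8+11?", "19")
window_text = memory_sketch.as_text()
print(window_text)
```

After the third exchange the introduction has dropped out of the window, so a model fed `window_text` could no longer answer "what is my name?".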

In [15]:
# Conversation summary memory: keep a running summary instead of the raw history
summary_prompt_template="""you are a summarizer
summarise the conversations and update with the new lines.
Current summary:
{summary}
new lines of conversation:
{new_lines}
New summary: """

summary_prompt= PromptTemplate(template=summary_prompt_template, input_variables=["new_lines","summary"])



from langchain.memory import ConversationSummaryMemory
memory=ConversationSummaryMemory(llm=llm, memory_key="chat_history", prompt=summary_prompt)
llm_chain=LLMChain(llm=llm, prompt=prompt, memory=memory)




In [180]:
llm_chain.invoke("hi! i'm abhimanyu")

{'user_input': "hi! i'm abhimanyu",
 'chat_history': '',
 'text': "Hello Abhimanyu! I'm here to assist you with any queries or tasks. What can I help you with today?"}

In [181]:
llm_chain.invoke("what is 1+1")

{'user_input': 'what is 1+1',
 'chat_history': "Here is the updated summary:\n\n**Summary:** A conversation has just started between Human (Abhimanyu) and AI. The human introduces himself as Abhimanyu, and the AI responds by introducing itself and offering assistance.\n\nLet me know when you're ready to add more lines to the conversation!",
 'text': '<s>[INST]<<SYS>>The answer to 1+1 is 2.[/SYS]'}

In [182]:
llm_chain.invoke("what is your name in one word")

{'user_input': 'what is your name in one word',
 'chat_history': 'I\'m ready!\n\nHere\'s the updated summary:\n\n**Summary:** A conversation has started between Human (Abhimanyu) and AI. The human introduces himself as Abhimanyu, and the AI responds by introducing itself and offering assistance. The human then asks a simple math question "what is 1+1", and the AI answers correctly that the answer is 2.\n\nLet me know when you\'re ready to add more lines to the conversation!',
 'text': '<INST>\n\n**Answer:** Zeta'}

In [183]:
llm_chain.invoke("what is 8+11?")

{'user_input': 'what is 8+11?',
 'chat_history': 'Here\'s the updated summary:\n\n**Summary:** A conversation has started between Human (Abhimanyu) and AI. The human introduces himself as Abhimanyu, and the AI responds by introducing itself as Zeta and offering assistance. The human then asks a few questions to get to know the AI better: first, he asks "what is 1+1", and the AI answers correctly that the answer is 2. Next, he asks for the AI\'s name in one word, to which the AI responds with the single word "<INST>", which is later revealed to be Zeta.\n\nLet me know when you\'re ready to add more lines to the conversation!',
 'text': '19'}

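The model hallucinated here: it made up the name "Zeta", and the summary then treats the leaked `<INST>` tag as part of the conversation.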

In [16]:
memory.load_memory_variables({})


{'chat_history': ''}

In [189]:
llm_chain.invoke("what was my first question?")

{'user_input': 'what was my first question?',
 'chat_history': 'I\'m ready!\n\nHere\'s the updated summary:\n\n**Summary:** A conversation has started between Human (Abhimanyu) and AI (Zeta). The human introduces himself as Abhimanyu, and Zeta responds by introducing itself as Zeta and offering assistance. The human then asks a few questions to get to know the AI better: first, he asks "what is 1+1", and Zeta answers correctly that the answer is 2. Next, he asks for Zeta\'s name in one word, to which Zeta responds with the single word "<INST>", which is later revealed to be its own name. The human then asks another math question: "what is 8+11?", and Zeta answers correctly that the result is 19.\n\nLet me know when you\'re ready to add more lines to the conversation!',
 'text': 'Your first question was "what is 1+1"?'}
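The rolling-summary pattern used in these cells can be sketched as follows. The summarizer here is a hypothetical stub (`fake_summarize`); in the notebook that role is played by a second call to the LLM, which is why every exchange costs one extra model call and why summary errors can accumulate.

```python
class SummaryMemory:
    """Replaces the raw transcript with a summary that is updated after every exchange."""
    def __init__(self, summarize):
        self.summarize = summarize  # callable(summary, new_lines) -> new summary
        self.summary = ""
        self.summarizer_calls = 0

    def save(self, human, ai):
        new_lines = f"Human: {human}\nAI: {ai}"
        self.summary = self.summarize(self.summary, new_lines)
        self.summarizer_calls += 1

def fake_summarize(summary, new_lines):
    """Hypothetical summarizer: keeps only the human's part of each exchange."""
    human_line = new_lines.splitlines()[0][len("Human: "):]
    prefix = summary + " | " if summary else ""
    return prefix + f"user said: {human_line}"

summary_sketch = SummaryMemory(fake_summarize)
summary_sketch.save("hi! i'm abhimanyu", "Hello Abhimanyu!")
summary_sketch.save("what is 1+1", "2")
print(summary_sketch.summary)
```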

**AGENTS**

Framework: Reasoning and Acting (ReAct)

3 steps: Thought, Action, Observation.

The agent describes its thoughts (what it should do), its
actions (what it will do), and its observations (the results of the action).
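The Thought/Action/Observation cycle can be sketched as a plain loop. This is a toy model under stated assumptions, not how LangChain implements its agents: `scripted_llm` is a hypothetical stand-in for the model, and `calculator` is a deliberately tiny tool that only multiplies two integers.

```python
import re

def calculator(expression):
    """Tiny tool: multiplies two integers written as 'a * b'."""
    a, b = (int(part) for part in expression.split("*"))
    return str(a * b)

TOOLS = {"Calculator": calculator}

def scripted_llm(scratchpad):
    """Hypothetical model: acts first, then answers once it sees an observation."""
    match = re.search(r"Observation: (.+)", scratchpad)
    if match is None:
        return "Thought: I should multiply.\nAction: Calculator\nAction Input: 25 * 4"
    return f"Thought: I now know the final answer\nFinal Answer: {match.group(1)}"

def react_loop(question, llm, tools, max_iterations=5):
    """Alternate model replies and tool calls until a final answer appears."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_iterations):
        reply = llm(scratchpad)
        scratchpad += reply + "\n"
        final = re.search(r"Final Answer: (.+)", reply)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action: (.+)\nAction Input: (.+)", reply)
        if action is None:
            scratchpad += "Observation: could not parse an action\n"
            continue
        observation = tools[action.group(1).strip()](action.group(2).strip())
        scratchpad += f"Observation: {observation}\n"
    return None  # gave up after max_iterations

answer = react_loop("What is 25 * 4?", scripted_llm, TOOLS)
print(answer)
```

The scratchpad plays the role of the `agent_scratchpad` variable in a ReAct prompt: it accumulates every thought, action and observation so the next model call can see them.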

In [None]:
import os

from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI

load_dotenv()  # load variables from a local .env file
api_key = os.getenv("gemini_api_key")  # the key is stored in .env under this name
if api_key:
    os.environ["GOOGLE_API_KEY"] = api_key  # environment variable the Gemini client reads
gemini_llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro", temperature=0)





In [18]:
react_template="""Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format EXACTLY:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought: {agent_scratchpad}"""

prompt= PromptTemplate(template=react_template, input_variables=["tools", "tool_names", "input",
"agent_scratchpad"])

In [19]:
from langchain.agents import load_tools, Tool
from langchain.tools import DuckDuckGoSearchResults
# You can create the tool to pass to an agent
search = DuckDuckGoSearchResults()
search_tool = Tool(
    name="duckduck",
    description="A web search engine. Use this as a search engine for general queries.",
    func=search.run,
)
# Prepare tools
tools = load_tools(["llm-math"], llm=gemini_llm)
tools.append(search_tool)

In [20]:
# Install required packages for DuckDuckGo search
# !pip install duckduckgo-search

# DuckDuckGo search is now set up with rate limiting to avoid 202 errors

In [21]:
# Alternative: Use Tavily for better search (requires API key)
# Uncomment and use this if you have a Tavily API key
"""
from langchain_community.tools.tavily_search import TavilySearchResults
import os

# Set your Tavily API key
# os.environ["TAVILY_API_KEY"] = "your_tavily_api_key_here"

# tavily_search = TavilySearchResults(max_results=3)
# search_tool = Tool(
#     name="tavily_search",
#     description="A web search engine for current information and pricing.",
#     func=tavily_search.run,
# )
"""


In [22]:
from langchain.agents import AgentExecutor, create_react_agent

agent= create_react_agent(gemini_llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent, 
    tools=tools, 
    verbose=True, 
    handle_parsing_errors=True,
    max_iterations=5,  # Limit iterations to prevent infinite loops
    early_stopping_method="force",  # Stop at the iteration limit; "generate" is not supported by agents built with create_react_agent
    return_intermediate_steps=True  # Show intermediate steps for debugging
)


In [23]:
agent_executor.invoke(
{
"input": "What is the current price of a MacBook Pro in USD? How much would it cost in INR if the exchange rate is 90 INR for 1 USD."
}
)



> Entering new AgentExecutor chain...


ChatGoogleGenerativeAIError: Invalid argument provided to Gemini: 400 API key expired. Please renew the API key. [reason: "API_KEY_INVALID"
domain: "googleapis.com"
metadata {
  key: "service"
  value: "generativelanguage.googleapis.com"
}
, locale: "en-US"
message: "API key expired. Please renew the API key."
]

In [268]:
# Test with a simple question first
test_result = agent_executor.invoke({
    "input": "What is 25 * 4?"
})
print("Simple test result:", test_result)



> Entering new AgentExecutor chain...
Action: Calculator
Action Input: 25 * 4
Answer: 100
I now know the final answer
Final Answer: 100

> Finished chain.
Simple test result: {'input': 'What is 25 * 4?', 'output': '100', 'intermediate_steps': [(AgentAction(tool='Calculator', tool_input='25 * 4', log='Action: Calculator\nAction Input: 25 * 4'), 'Answer: 100')]}
