# NLP Exercise: LLMs, Prompts, and Agents

**Hands-on goals (1.5h):**

1. Interact with LLM

2. Basic chatbot

3. LLM capabilities and limitations

4. Prompt engineering

5. Agents


In [1]:
import os, sys, warnings
## Silence the annoying warnings
# 1) Python warnings (UserWarning, DeprecationWarning, etc.)
warnings.filterwarnings("ignore")

# 2) gRPC native logs (ALTS, channelz, etc.)
os.environ["GRPC_VERBOSITY"] = "NONE"
os.environ["GRPC_TRACE"] = ""

# 1. Install packages

In [2]:
# !pip install -r requirements.txt --quiet
# !pip uninstall jupyterlab --yes --quiet
# !pip install jupyterlab==3.6.8  --quiet 

In [3]:
from typing import List, Dict, Any
from dotenv import load_dotenv
from IPython.display import display, Markdown

# Load environment variables from .env file
load_dotenv()

# We'll use LangChain + LangGraph for the agent
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

# Make sure GOOGLE_API_KEY is set in .env file or environment
# api_key = os.getenv("GOOGLE_API_KEY")
# assert api_key, "Please set GOOGLE_API_KEY in your .env file or environment."


# 2. Interact with LLM

In [4]:
import time
from IPython.display import display, Markdown
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage

# Instantiate the LLM; "gemini-2.5-flash" is used here as an example model
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")

def safe_llm_invoke(messages, max_retries=5, base_delay=1.0):
    """Invoke LLM with retries on transient errors."""
    for attempt in range(max_retries):
        try:
            resp = llm.invoke(messages)
            return resp
        except Exception as e:
            # Last attempt -> re-raise
            if attempt == max_retries - 1:
                raise
            # Exponential backoff
            delay = base_delay * (2 ** attempt)
            # print(f"Retry {attempt+1}/{max_retries} after error: {e}")
            time.sleep(delay)
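
The retry loop above waits `base_delay * 2 ** attempt` seconds between attempts. A minimal, self-contained sketch of that schedule follows. `retry_with_backoff` and `flaky_call` are hypothetical stand-ins for the real LLM call, and the `sleep` parameter is an assumption added so the example can record the delays instead of actually waiting.

```python
import time

def retry_with_backoff(func, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))

# Hypothetical flaky function: fails twice, then succeeds.
calls = {"count": 0}

def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient error")
    return "ok"

delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky_call, sleep=delays.append)
print(result, delays)
```

With the defaults, the waits grow as 1, 2, 4, 8 seconds, so a call that keeps failing gives up after roughly 15 seconds of waiting.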

## Exercise
Ask the LLM standalone questions, for example:
- safe_llm_invoke('who are you?')
- safe_llm_invoke('what is your capability?')
- safe_llm_invoke('what is the previous question we discussed?')

Each call is independent: the LLM has no memory of earlier calls, so it does not know what has been discussed and you cannot ask follow-up questions yet.

In [5]:
resp = safe_llm_invoke('what is the previous question we discussed?')
display(Markdown(f"<p style='margin:2px 0'>{resp.content}</p>"))

<p style='margin:2px 0'>As an AI, I don't have memory of past conversations. Each interaction is treated as a new one.

Could you please remind me what we were discussing?</p>

# 3. Simple Chatbot
Save the conversation in a history list and pass the whole history to the LLM on every turn: now you have a **chatbot**!
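
To make the idea concrete without calling a real model, the sketch below uses a hypothetical `fake_llm` that can only see the messages passed to it. The plain role/content dicts are an assumption for illustration; the notebook itself uses LangChain message objects.

```python
def fake_llm(messages):
    """Hypothetical stand-in for an LLM: it only knows what is in `messages`."""
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    if len(user_turns) < 2:
        return "I have no earlier question to refer to."
    return f"Your previous question was: {user_turns[-2]!r}"

# Stateless: each call sees a single message, so there is nothing to recall.
stateless = fake_llm([{"role": "user", "content": "What did I ask before?"}])

# Chatbot: keep appending to one history list and send the whole list each time.
history = []
replies = []
for text in ["Who are you?", "What did I ask before?"]:
    history.append({"role": "user", "content": text})
    reply = fake_llm(history)
    history.append({"role": "assistant", "content": reply})
    replies.append(reply)

print(stateless)
print(replies[-1])
```

The "memory" lives entirely in the list that the caller maintains; the model function itself stays stateless.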

### Sample questions:
- Hi
- Who are you?
- What are your capabilities and limitations?
- Answer it within 200 words

### Caution
If typing a lowercase `o` folds or unfolds the output area instead of entering text, type a **capital O** instead; the LLM doesn't mind.

In [6]:
###############################################################################
# STREAMING CHAT UI
###############################################################################

import time
import sys
import markdown
import ipywidgets as widgets
from IPython.display import display, Markdown
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage, AIMessage


# ---------------------------------------------------------------------------
# LLM SETUP
# ---------------------------------------------------------------------------
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0.2,
)

# ---------------------------------------------------------------------------
# SAFE STREAM (with retry + exponential backoff)
# ---------------------------------------------------------------------------
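def safe_stream(messages, max_retries=5, base_delay=1.0):
    """Stream the LLM reply, retrying with exponential backoff.

    llm.stream() is lazy, so errors surface while iterating, not when it is
    called. We therefore iterate inside the try block and only retry if no
    chunk has been yielded yet (to avoid showing duplicated text).
    """
    for attempt in range(max_retries):
        yielded = False
        try:
            for chunk in llm.stream(messages):
                yielded = True
                yield chunk
            return
        except Exception:
            if yielded or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))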

# ---------------------------------------------------------------------------
# UI COMPONENTS
# ---------------------------------------------------------------------------
output_area = widgets.Output(
    layout={
        "border": "1px solid #ddd",
        "height": "300px",
        "width": "100%",
        "overflow_y": "auto",
        "padding": "10px"
    }
)

input_box = widgets.Text(
    placeholder="Type your message and press Enter…",
    layout={'width': '100%'}
)

history = []

# ---------------------------------------------------------------------------
# HANDLER: triggered when user presses Enter
# ---------------------------------------------------------------------------
def on_send(text_widget):

    text = text_widget.value
    if not text:
        return

    text_widget.value = ""   # clear UI input

    # Show user's message
    with output_area:
        display(Markdown(
            f"<div style='margin:0; padding:0; line-height:1.0;'>&nbsp;&nbsp;<b>You:</b> {text}</div>"
        ))

    history.append(HumanMessage(content=text))


    # -----------------------------------------------------------------------
    # STREAM CHATBOT RESPONSE (Markdown live update)
    # -----------------------------------------------------------------------
    with output_area:
        # Initial header
        bubble = widgets.HTML(
            value="<b>Chatbot:</b> ",
            layout={"overflow_y": "auto", "max_height": "300px", "padding": "6px"}
        )
        display(bubble)

    chunks = safe_stream(history)
    full = ""

    for chunk in chunks:
        token = chunk.content or ""
        full += token
        html_content = markdown.markdown(full)
        # force inline display for paragraph output
        html_content = html_content.replace("<p>", "<span>").replace("</p>", "</span>")

        # Update bubble with HTML-rendered content
        bubble.value = (
            "<b>Chatbot:</b>"
            "<div style='margin-left:40px'>"
            f"<div style='line-height:1.2; margin:0; padding:0'>"
            f"{html_content}"
            "</div>"
            "</div>"
        )

    history.append(AIMessage(content=full))



# ---------------------------------------------------------------------------
# CONNECT ENTER KEY TO HANDLER
# ---------------------------------------------------------------------------
input_box.on_submit(on_send)

# Show Chat UI
display(output_area)
display(input_box)


Output(layout=Layout(border_bottom='1px solid #ddd', border_left='1px solid #ddd', border_right='1px solid #dd…

Text(value='', layout=Layout(width='100%'), placeholder='Type your message and press Enter…')
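
One subtlety when combining retries with streaming: a Python generator does not run until it is iterated, so an error raised while producing chunks surfaces in the consuming loop rather than at the call that creates the stream. The sketch below illustrates this with a hypothetical `fake_stream` generator. It assumes `llm.stream` behaves like an ordinary generator, which is worth verifying against your installed version.

```python
def fake_stream(fail=False):
    """Hypothetical stand-in for a streaming call: yields text chunks lazily."""
    if fail:
        raise ConnectionError("transient network error")
    for piece in ["Hel", "lo ", "world"]:
        yield piece

# Creating the generator never raises, even when fail=True ...
caught_at_creation = False
try:
    chunks = fake_stream(fail=True)
except ConnectionError:
    caught_at_creation = True

# ... the error only appears once we start iterating.
caught_at_iteration = False
try:
    for _ in chunks:
        pass
except ConnectionError:
    caught_at_iteration = True

# Normal case: accumulate chunks into the full reply, as the chat UI does.
full = ""
for piece in fake_stream():
    full += piece

print(caught_at_creation, caught_at_iteration, full)
```

In practice this means a `try`/`except` has to wrap the loop that consumes the chunks if it is meant to catch streaming errors.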

#### You can inspect what gets saved in history.
The whole history is passed to `ChatGoogleGenerativeAI` on every turn; what reaches the LLM is essentially each message's role and its `content` string.

LangChain converts the messages into Gemini's expected schema:

```json
{
  "contents": [
    {"role": "user",  "parts": [{"text": "User text..."}]},
    {"role": "model", "parts": [{"text": "Model reply..."}]},
    {"role": "user",  "parts": [{"text": "Next user message..."}]}
  ]
}
```


In [8]:
history[0], history[1]

(HumanMessage(content='hi', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Hi there! How can I help you today?', additional_kwargs={}, response_metadata={}))
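
As a rough illustration of the conversion described in this section, the sketch below maps a list of (role, text) pairs onto the `contents` structure shown earlier. `to_gemini_contents` and `ROLE_MAP` are hypothetical names written for this exercise, not LangChain's actual implementation, which also has to handle system messages, tool calls, and non-text parts.

```python
import json

# Map chat roles to the role names used in the JSON shown earlier.
ROLE_MAP = {"human": "user", "ai": "model"}

def to_gemini_contents(messages):
    """Convert (role, text) pairs into a {"contents": [...]} payload."""
    return {
        "contents": [
            {"role": ROLE_MAP[role], "parts": [{"text": text}]}
            for role, text in messages
        ]
    }

payload = to_gemini_contents([
    ("human", "hi"),
    ("ai", "Hi there! How can I help you today?"),
])
print(json.dumps(payload, indent=2))
```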

# 4. Test LLM capabilities and limitations
- Does it know the current time? It only has knowledge from its training data, so it has no up-to-date information.

- Test it on a domain you know well. How does the LLM perform?
- What is my name? Where am I located? The LLM has no private memory of you.
- Try "Summarize the book 'The Yellow Star Algorithm'." (It doesn't exist.) The LLM may hallucinate and make things up. It is a statistical model with no built-in notion of true or false, so it will generate plausible-looking output for whatever input it receives.

In [9]:
# type "exit", "quit" or "q" to quit
def chat():
    history = []
    while True:
        user = input("You: ")
        if user.lower() in ["exit", "quit", "q"]:
            break

        history.append(HumanMessage(content=user))
        resp = safe_llm_invoke(history)
        display(Markdown(f"**Chatbot:** {resp.content}"))
        history.append(resp)
chat()

You: hi


**Chatbot:** Hi there! How can I help you today?

You: q


# 5. Prompt Engineering

## 1. Prompt Engineering Basics

In this section:

- Call the LLM directly.

- Change the **role**, the **style**, and so on.

- See how answers differ for the same question.

**Things you can instruct in the system prompt:**
1. Persona: who the model acts as (e.g., analyst, teacher, engineer).
2. Style: how the response is written (plain, formal, concise, narrative).
3. Tone: emotional flavor (friendly, neutral, strict, encouraging).
4. Reasoning Steps: how the model thinks before answering (identify factors, compare options, summarize, etc.).
5. Output Format: the required shape of the answer (bullets, JSON, table, short paragraph).
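
One way to keep these five ingredients explicit is to assemble the system prompt from named parts. The helper below is a minimal sketch; `build_system_prompt` and its parameter names are assumptions made for this example, not part of LangChain.

```python
def build_system_prompt(persona, style=None, tone=None, steps=None, output_format=None):
    """Assemble a system prompt from the five ingredients listed above."""
    lines = [f"You are {persona}."]
    if style:
        lines.append(f"Writing style: {style}.")
    if tone:
        lines.append(f"Tone: {tone}.")
    if steps:
        lines.append("Before answering, work through these steps:")
        lines.extend(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    if output_format:
        lines.append(f"Output format: {output_format}.")
    return "\n".join(lines)

prompt = build_system_prompt(
    persona="a senior financial analyst",
    style="concise",
    tone="neutral",
    steps=["identify the main drivers", "list the key risks"],
    output_format="bullet points",
)
print(prompt)
```

The resulting string can be used as the `content` of a `SystemMessage`, which makes it easy to vary one ingredient at a time and compare the answers.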



In [10]:
from langchain_core.messages import SystemMessage, HumanMessage

def chat(system_prompt=None):
    if system_prompt is None:
        system_prompt='''
        '''
    history = [
        SystemMessage(content=system_prompt)
    ]

    while True:
        user = input("You: ")
        if user.lower() in ["exit", "quit", "q"]:
            break

        history.append(HumanMessage(content=user))
        resp = safe_llm_invoke(history)
        display(Markdown(f"**Chatbot:** {resp.content}"))
        history.append(resp)

prompts={
    'financial_analyst': '''
        You are a senior financial analyst. Use a structured format for the response. 
        Prioritize accuracy, highlight main risks and drivers, 
        and provide short context when needed. 
        Avoid speculation and avoid long paragraphs.
    ''',
    'teacher': '''
        You are a high school teacher. Answer questions patiently and kindly. 
        Encourage students to ask follow-up questions.
    '''
}

In [11]:
chat(system_prompt=prompts['teacher'])

You: hi


**Chatbot:** Hi there! 👋 It's great to hear from you.

How can I help you today? Do you have a question about a specific subject, a homework problem, or just want to chat about something you're learning?

Don't hesitate to ask anything at all! 😊

You: q


In [12]:
chat(system_prompt=prompts['financial_analyst'])

You: q


### 👉 Student TODO

1. Copy the previous cell.

2. Change the **role** and **audience**, e.g.:

   - "You are a chief risk officer."

   - "Explain to a first-year finance student."

3. Run and compare the tone and focus.


# 6. Building an Agent with LangGraph

We'll use:

- LangChain's `ChatGoogleGenerativeAI` as the model wrapper.

- LangChain `@tool` decorators for Python tools.

- LangGraph's `StateGraph` to define the agent workflow.

The agent will:

1. Start with just the LLM.

2. Add tools: `get_time` and `web_search`.

3. Let the LLM decide whether to answer directly or call a tool.

4. If a tool is called, run the corresponding Python function.

5. Let the LLM use the tool result to produce the final answer.
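
The steps above can be sketched as a plain Python loop before any framework is involved. Everything here is a simplified assumption: `fake_llm` is a scripted stand-in for the model, messages are plain dicts, and `run_agent_sketch` is a hypothetical name.

```python
from datetime import datetime

def get_time():
    """Tool: return the current date and time as text."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")

TOOLS = {"get_time": get_time}

def fake_llm(messages):
    """Hypothetical model: requests get_time once, then answers with its result."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "assistant", "content": f"It is {last['content']}.", "tool_calls": []}
    if "time" in last["content"].lower():
        return {"role": "assistant", "content": "", "tool_calls": ["get_time"]}
    return {"role": "assistant", "content": "I can answer that directly.", "tool_calls": []}

def run_agent_sketch(question, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = fake_llm(messages)         # the model decides: answer or call a tool
        messages.append(reply)
        if not reply["tool_calls"]:        # no tool requested, so this is the final answer
            return messages
        for name in reply["tool_calls"]:   # run each requested Python function
            messages.append({"role": "tool", "content": TOOLS[name]()})
    return messages

trace = run_agent_sketch("What time is it?")
print([m["role"] for m in trace])
```

The `max_steps` cap is a design choice of this sketch: it keeps a model that repeatedly asks for tools from looping forever.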


## Step 1: Import Required Libraries


In [16]:
from datetime import datetime
from typing import TypedDict, Annotated, Sequence
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage, ToolMessage
from langchain_core.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

# safe_llm_invoke (cell 4) wraps the plain `llm` without tools.
# For the agent we create a new LLM instance with tools bound to it later.


## Step 2: Define Tools

We'll create two tools:
1. `get_time` - A custom tool using the `@tool` decorator
2. `web_search` - DuckDuckGo search tool (pre-built from langchain-community)
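
The model never sees a tool's Python source. It is given a name, a description, and an argument schema, and chooses a tool based on those, which is why a clear docstring matters. The sketch below mimics that idea with a hypothetical `describe_tool` helper built on the standard `inspect` module; it is a simplification, not how `@tool` is implemented.

```python
import inspect

def describe_tool(func):
    """Collect the metadata a model would use to decide when to call func."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": list(signature.parameters),
    }

def get_time() -> str:
    """Get the current date and time."""
    return "not used in this sketch"

def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

for spec in (describe_tool(get_time), describe_tool(add)):
    print(spec)
```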


In [17]:
# Custom tool: get_time
@tool
def get_time() -> str:
    """Get the current date and time. Use this when the user asks about the current time, date, or what day it is."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# Pre-built tool: DuckDuckGo web search
web_search = DuckDuckGoSearchRun()

# Create list of tools
tools = [get_time, web_search]

# Create LLM with tools bound to it
from langchain_google_genai import ChatGoogleGenerativeAI
llm_with_tools = ChatGoogleGenerativeAI(model="gemini-2.5-flash").bind_tools(tools)


## Step 3: Define Agent State

The agent state holds the conversation history (messages).


In [18]:
class AgentState(TypedDict):
    """State for the agent. Contains the conversation history."""
    messages: Annotated[Sequence[BaseMessage], add_messages]
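
`Annotated[..., add_messages]` attaches a reducer to the `messages` key: when a node returns `{"messages": [...]}`, the new messages are merged into the existing list instead of overwriting it. The sketch below shows that merge in plain Python. `append_messages` and `apply_update` are hypothetical names, and the real `add_messages` does more (for example, it can replace a message that has the same ID).

```python
def append_messages(existing, new):
    """Simplified reducer: return the old messages followed by the new ones."""
    return list(existing) + list(new)

def apply_update(state, update, reducers):
    """Merge a node's partial update into the state, key by key."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](state.get(key, []), value)
        else:
            merged[key] = value  # no reducer: the new value overwrites the old one
    return merged

state = {"messages": ["user: What time is it?"]}
update = {"messages": ["ai: (calls get_time)"]}  # what a node would return

state = apply_update(state, update, {"messages": append_messages})
print(state["messages"])
```

This is why the node functions in the next step only return the new messages rather than the full history.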


## Step 4: Create Agent Nodes

We need two nodes and one routing function:
1. **LLM node**: calls the LLM with tool-calling capability
2. **Tool execution node**: executes tools when the LLM requests them
3. **Router function**: decides whether to continue (call tools) or end (final answer); it is attached as a conditional edge rather than added as a node


In [19]:
# Create a tool map for easy lookup
tool_map = {tool.name: tool for tool in tools}

def call_llm(state: AgentState) -> AgentState:
    """LLM node: Call the LLM with tool calling capability."""
    # Same retry-with-backoff pattern as safe_llm_invoke (cell 4), applied to llm_with_tools
    import time
    max_retries = 5
    base_delay = 1.0
    
    for attempt in range(max_retries):
        try:
            response = llm_with_tools.invoke(state["messages"])
            return {"messages": [response]}
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)
            time.sleep(delay)

def call_tool(state: AgentState) -> AgentState:
    """Tool execution node: Execute tools when LLM requests them."""
    last_message = state["messages"][-1]
    
    # Check if the last message has tool calls
    tool_messages = []
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        for tool_call in last_message.tool_calls:
            tool_name = tool_call["name"]
            tool_args = tool_call.get("args", {})
            tool_id = tool_call.get("id", "")
            
            # Get the tool and execute it
            if tool_name in tool_map:
                tool_result = tool_map[tool_name].invoke(tool_args)
                tool_messages.append(
                    ToolMessage(content=str(tool_result), tool_call_id=tool_id)
                )
            else:
                tool_messages.append(
                    ToolMessage(content=f"Tool {tool_name} not found", tool_call_id=tool_id)
                )
    
    return {"messages": tool_messages}

def should_continue(state: AgentState) -> str:
    """Router function: decide whether to continue (call tools) or end."""
    last_message = state["messages"][-1]
    
    # If the last message has tool calls, we need to execute tools
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    # Otherwise, we're done
    return "end"


## Step 5: Build StateGraph

Create the graph with nodes and edges, then compile it.


In [20]:
# Create the graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("llm", call_llm)
workflow.add_node("tools", call_tool)

# Add edges
workflow.add_edge(START, "llm")
workflow.add_conditional_edges(
    "llm",
    should_continue,
    {
        "tools": "tools",
        "end": END
    }
)
workflow.add_edge("tools", "llm")

# Compile the graph
agent = workflow.compile()
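
The wiring above can be read as a small state machine: start at `llm`, let the router pick the next node, and loop back from `tools` to `llm` until the router returns `end`. The sketch below walks the same edges with a hypothetical `walk_graph` helper and a scripted list of router decisions. It does not use LangGraph and only illustrates the control flow.

```python
END = "__end__"

# Same shape as the graph in this step: one fixed edge plus one conditional edge.
EDGES = {"tools": "llm"}
CONDITIONAL = {"llm": {"tools": "tools", "end": END}}

def walk_graph(router_decisions, start="llm"):
    """Follow the edges, consuming one scripted router decision per visit to 'llm'."""
    decisions = iter(router_decisions)
    path, node = [], start
    while node != END:
        path.append(node)
        if node in CONDITIONAL:
            node = CONDITIONAL[node][next(decisions)]
        else:
            node = EDGES[node]
    return path

# The model asks for a tool once, then produces the final answer.
print(walk_graph(["tools", "end"]))
# The model answers directly.
print(walk_graph(["end"]))
```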


## Step 6: Create Agent Invoke Function

A wrapper function to easily invoke the agent with a user question.


In [21]:
def run_agent(question: str):
    """Run the agent with a user question and return the final answer as text."""
    # Initialize state with user question
    initial_state = {"messages": [HumanMessage(content=question)]}
    
    # Invoke the agent
    final_state = agent.invoke(initial_state)
    
    # Get the last message (should be the final answer)
    content = final_state["messages"][-1].content

    # Gemini may return a list of content blocks (dicts with a 'text' key)
    # instead of a plain string; keep only the text parts.
    if isinstance(content, list):
        content = "".join(
            block.get("text", "") if isinstance(block, dict) else str(block)
            for block in content
        )
    return content


## Step 7: Example Usage

Let's test the agent with different types of questions:
1. Questions that require tool use (time, web search)
2. Questions that can be answered directly


In [23]:
# Example 1: Question requiring get_time tool
print("Question: What time is it?")
print("\nAgent Response:")
response = run_agent("What time is it?")
display(Markdown(f"**Agent:** {response}"))


Question: What time is it?

Agent Response:


**Agent:** The time is 14:32:32 on November 20, 2025.

In [24]:
# Example 2: Question requiring web search tool
print("Question: Search for latest AAPL stock news")
print("\nAgent Response:")
response = run_agent("Search for latest AAPL stock news")
display(Markdown(f"**Agent:** {response}"))


Question: Search for latest AAPL stock news

Agent Response:


**Agent:** Here's a summary of the latest AAPL stock news:

*   **Mixed Headlines:** Strong demand for the iPhone 17 and a perception that Apple hasn't been significantly impacted by the AI sell-off are providing support. However, concerns about CEO succession, design-team departures, and a $634 million patent verdict are weighing on sentiment.
*   **Recent Performance:** Apple stock has seen a slight decline over the past week but has maintained gains from the previous month.
*   **Q4 Spike:** Apple's stock spiked in Q4 due to outstanding earnings, driven by stronger-than-anticipated demand for the latest iPhone 17 series and record-setting Services revenue.
*   **Earnings Call:** While revenue and EPS exceeded estimates, iPhone revenue and China sales missed expectations. Tim Cook discussed AI and other topics in the earnings call.
*   **Risk Factors:** Historically, Apple stock has experienced significant declines during major sell-offs (e.g., over 80% during the Dot-Com Bubble, 61% during the Global Financial Crisis, and 30-40% during the 2018 correction and Covid sell-off).
*   **Future Catalysts:** Historical trends suggest that future catalysts could propel Apple shares to new peaks.

In [25]:
# Example 3: Question that can be answered directly (no tool needed)
print("Question: What is machine learning?")
print("\nAgent Response:")
response = run_agent("What is machine learning?")
display(Markdown(f"**Agent:** {response}"))


Question: What is machine learning?

Agent Response:


**Agent:** Machine learning (ML) is a field of study within artificial intelligence (AI) that focuses on developing statistical algorithms. These algorithms enable computers to learn from data and make predictions or decisions without being explicitly programmed for every task. Essentially, machine learning teaches systems to identify patterns in data and apply that understanding to new, unseen data, allowing them to "think" and "understand" in a way similar to humans.

### 👉 Try it yourself!

You can ask the agent any question. The agent will automatically decide whether to:
- Use the `get_time` tool for time-related questions
- Use the `web_search` tool for current information, news, or web searches
- Answer directly if it has the knowledge

Try questions like:
- "What's the weather like today?" (should use web_search)
- "What date is it?" (should use get_time)
- "Explain quantum computing" (should answer directly)


# 7. Extensions / Experiments

- AI tool use: ChatGPT, Gemini
- Vibe coding: Cursor
- AI browser: ChatGPT Atlas
