# Chapter 1. Agents

Agents are LLM-based systems that can make decisions and take actions, i.e. interact with an external environment. Simple RAG is not an agent because it has neither a reasoning loop nor tools for taking action. In exchange, a single retrieve-then-generate pass usually has lower latency than a multi-step agent.

"If you primarily need natural language question‐answering over a corpus, choose a traditional chatbot or RAG architecture. But if you face high variability, open‐ended reasoning, dynamic planning needs, or continual learning requirements, invest in an autonomous agent."

* autonomous model selection: routing simple queries to more efficient models

* asynchronous operations: which tasks or sub-tasks can be done async in Glucoza?

* comprehensive documentation: experiments with architectures, workflows, prompts, etc. have to be documented very well

* CrewAI and ADK are faster for prototyping, but nothing is as customizable and observable as LangChain
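
The model-routing bullet above can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the model names `small-model` and `large-model`, the keyword list, and the word-count threshold are all hypothetical placeholders, and a production router would more likely use a classifier or a cheap LLM call than surface heuristics.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute whatever your provider offers.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "large-model"

# Assumed heuristic: substrings that hint at multi-step reasoning.
COMPLEX_HINTS = ("why", "compare", "plan", "analyze", "step by step")


@dataclass
class Route:
    model: str
    reason: str


def route_query(query: str, max_simple_words: int = 20) -> Route:
    """Pick a model tier from crude surface features of the query."""
    text = query.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        return Route(STRONG_MODEL, "query contains a reasoning keyword")
    if len(text.split()) > max_simple_words:
        return Route(STRONG_MODEL, "query is long")
    return Route(CHEAP_MODEL, "short query with no reasoning keywords")


if __name__ == "__main__":
    for q in ("What is my order status?", "Compare these two options for me."):
        print(q, "->", route_query(q))
```

Returning the reason alongside the model makes routing decisions easy to log, which helps when tuning the thresholds later.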

Principles of Building Effective Agentic Systems:

1. Scalability (+workload +throughput +tasks)
2. Modularity (well-defined parts connected through clear interface)
3. Resilience
4. Interoperability/Future-proofing
5. Reinforcement (Active/Continuous) Learning
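
Principle 2 (modularity) can be made concrete with a typed interface that the agent core depends on, so tools can be swapped without touching the core. The sketch below is one possible shape, not a prescribed design: `Tool`, `CancelOrderTool`, and `ToolRegistry` are hypothetical names invented for illustration.

```python
from typing import Protocol


class Tool(Protocol):
    """Hypothetical interface that every tool module must satisfy."""

    name: str

    def run(self, **kwargs: str) -> str: ...


class CancelOrderTool:
    """Example tool; a real one would call a backend API."""

    name = "cancel_order"

    def run(self, **kwargs: str) -> str:
        return f"Order {kwargs['order_id']} has been cancelled."


class ToolRegistry:
    """The agent core talks only to the Tool interface, never to concrete tools."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, **kwargs: str) -> str:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].run(**kwargs)


registry = ToolRegistry()
registry.register(CancelOrderTool())
print(registry.call("cancel_order", order_id="A12345"))
```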



Business and ethics risks associated with the AI product in our case


# Chapter 2. Designing Systems

[Batch evaluation script](https://github.com/MichaelAlbada/BuildingApplicationsWithAIAgents/blob/main/src/common/evaluation/batch_evaluation.py)

You can measure tool precision, parameter accuracy, and overall task success rates across hundreds of examples to catch edge cases before deploying.

* tool recall
* parameter accuracy
* confirmation quality
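
As a rough sketch of how the first two metrics could be computed over a batch of examples (this is not the linked batch evaluation script, and the `Example` record and its field names are assumptions made for illustration): tool precision and recall compare the set of tools called with the set expected, and parameter accuracy counts exact matches on expected argument values. Confirmation quality is left out because it usually needs a human or LLM judge.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    """One evaluated interaction. Field names are illustrative assumptions."""

    expected_tools: list[str]
    called_tools: list[str]
    expected_args: dict[str, str] = field(default_factory=dict)
    actual_args: dict[str, str] = field(default_factory=dict)


def tool_precision_recall(examples: list[Example]) -> tuple[float, float]:
    """Micro-averaged precision and recall over tool names."""
    tp = fp = fn = 0
    for ex in examples:
        expected, called = set(ex.expected_tools), set(ex.called_tools)
        tp += len(expected & called)   # correct tool calls
        fp += len(called - expected)   # tools called but not expected
        fn += len(expected - called)   # tools expected but not called
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def parameter_accuracy(examples: list[Example]) -> float:
    """Fraction of expected parameters whose value matched exactly."""
    total = correct = 0
    for ex in examples:
        for key, value in ex.expected_args.items():
            total += 1
            correct += ex.actual_args.get(key) == value
    return correct / total if total else 1.0


examples = [
    Example(["cancel_order"], ["cancel_order"], {"order_id": "A1"}, {"order_id": "A1"}),
    Example(["cancel_order"], [], {"order_id": "B2"}, {}),           # missed call
    Example([], ["cancel_order"]),                                   # spurious call
    Example(["cancel_order"], ["cancel_order"], {"order_id": "C3"}, {"order_id": "C9"}),
]
precision, recall = tool_precision_recall(examples)
print(f"precision={precision:.2f} recall={recall:.2f} "
      f"param_accuracy={parameter_accuracy(examples):.2f}")
```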

DeepSeek is open-source, can be hosted privately, and has great reasoning. It is the second cheapest option after Meta's Llama 3.1, charging around $0.50 per million tokens (Gemini is about 4 times more expensive, OpenAI about 3 times).
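
To see what those ratios mean for a budget, the arithmetic can be sketched as below. It takes the rough $0.50 per million tokens from these notes as the baseline and applies the 3x and 4x multipliers as stated; these are the notes' approximations rather than current list prices, and the monthly token volume is a made-up example.

```python
# Rough figures from the notes above, not current list prices.
BASELINE_USD_PER_MILLION = 0.50
MULTIPLIERS = {
    "baseline": 1.0,
    "OpenAI (3x, per the notes)": 3.0,
    "Gemini (4x, per the notes)": 4.0,
}


def token_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD for a given number of tokens at a per-million price."""
    return tokens / 1_000_000 * usd_per_million


monthly_tokens = 200_000_000  # hypothetical workload
for label, multiplier in MULTIPLIERS.items():
    cost = token_cost(monthly_tokens, BASELINE_USD_PER_MILLION * multiplier)
    print(f"{label}: ${cost:,.2f}")
```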

For scalability, engineering should account for:

* bottlenecks
* underutilization
* rising operational costs
* dynamic GPU allocation (based on demand) + load balancing
* latency in large multi-agent systems (MAS), especially in real-time environments
* fault tolerance: making sure errors are detected and the system recovers gracefully (+redundancy)
* consistency and robustness (extensive monitoring and human-in-the-loop, HITL)
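
The fault-tolerance item can be illustrated with a retry-then-fallback wrapper: retry the primary path with exponential backoff, then switch to a redundant path. This is a minimal sketch under assumptions, not a full resilience layer. `call_with_retry`, its parameters, and the `flaky` stand-in are hypothetical, and real code should catch only transient error types.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `primary` with exponential backoff, then use the redundant `fallback`."""
    for attempt in range(max_attempts):
        try:
            return primary()
        except Exception as exc:  # in real code, catch only transient errors
            print(f"attempt {attempt + 1} failed: {exc}")
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
    return fallback()


calls = {"n": 0}


def flaky() -> str:
    """Stand-in for an unreliable upstream call: fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream timed out")
    return "primary ok"


# `sleep` is injected so the demo does not actually wait.
print(call_with_retry(flaky, lambda: "fallback ok", sleep=lambda _: None))
```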


```python
from typing import Annotated, TypedDict

from langchain_core.messages import HumanMessage, SystemMessage, ToolMessage
from langchain_core.tools import tool
# ChatOpenAI lives in the langchain_openai package; importing it from
# langchain.chat_models raises ImportError on recent LangChain releases.
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages


# -- 1) Define our single business tool
@tool
def cancel_order(order_id: str) -> str:
    """Cancel an order that hasn't shipped."""
    # (Here you'd call your real backend API)
    return f"Order {order_id} has been cancelled."


# Graph state: the order under discussion plus the running message list.
# add_messages appends returned messages instead of overwriting the list.
class AgentState(TypedDict):
    order: dict
    messages: Annotated[list, add_messages]


# -- 2) The agent "brain": invoke LLM, run tool, then invoke LLM again
def call_model(state: AgentState) -> dict:
    msgs = state["messages"]
    order = state.get("order") or {"order_id": "UNKNOWN"}
    # System prompt tells the model exactly what to do
    prompt = (
        "You are an ecommerce support agent.\n"
        f"ORDER ID: {order['order_id']}\n"
        "If the customer asks to cancel, call cancel_order(order_id)\n"
        "and then send a simple confirmation.\n"
        "Otherwise, just respond normally."
    )
    full = [SystemMessage(content=prompt)] + msgs
    # The tool must be bound, otherwise the model can never emit tool calls.
    # Model name and temperature come from the original notes; some models
    # accept only the default temperature, so drop it if the API rejects it.
    llm = ChatOpenAI(model="gpt-5", temperature=0).bind_tools([cancel_order])
    # 1st LLM pass: decides whether to call our tool
    first = llm.invoke(full)
    out = [first]
    if getattr(first, "tool_calls", None):
        # Run cancel_order for every requested call; each call needs a reply
        for tc in first.tool_calls:
            result = cancel_order.invoke(tc["args"])
            out.append(ToolMessage(content=result, tool_call_id=tc["id"]))
        # 2nd LLM pass: generate the final confirmation text
        second = llm.invoke(full + out)
        out.append(second)
    return {"messages": out}


# -- 3) Wire it all up in a StateGraph
def construct_graph():
    g = StateGraph(AgentState)
    g.add_node("assistant", call_model)
    g.add_edge(START, "assistant")
    g.add_edge("assistant", END)
    return g.compile()


graph = construct_graph()

if __name__ == "__main__":
    example_order = {"order_id": "A12345"}
    convo = [HumanMessage(content="Please cancel my order A12345.")]
    result = graph.invoke({"order": example_order, "messages": convo})
    for msg in result["messages"]:
        print(f"{msg.type}: {msg.content}")
```

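```python
# Minimal evaluation check
example_order = {"order_id": "B73973"}
convo = [HumanMessage(content="Please cancel order #B73973. I found a cheaper option elsewhere.")]
result = graph.invoke({"order": example_order, "messages": convo})

# The tool name shows up in the AI message's tool_calls, not in its text content.
tool_called = any(
    tc["name"] == "cancel_order"
    for m in result["messages"]
    for tc in (getattr(m, "tool_calls", None) or [])
)
assert tool_called, "Cancel order tool not called"

# Only look at AI messages, so the tool's own output cannot satisfy the check.
# Matching on "cancel" tolerates both "cancelled" and "canceled".
confirmed = any(
    m.type == "ai" and "cancel" in str(m.content).lower()
    for m in result["messages"]
)
assert confirmed, "Confirmation message missing"

print("✅ Agent passed minimal evaluation.")
```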

# Chapter 3. UX Design

**CONTEXT MANAGEMENT**: how the agent adapts and persists depending on user's needs and workflow

* how context is managed over time: what do people expect or need the agent to remember about them for it to be effective?
* communicating agent limitations and capabilities (e.g. suggestions of tasks or commands + menus)
* agents must be able to communicate that themselves + dynamically suggest options relevant to the context
* users don't think about modalities, they just want their task done quickly and effortlessly
* amplify user's capabilities in an elegant way
* async implementation to save user's attention when not needed
* Proactive vs. Intrusive: depends on context awareness and user control
* Effective **STATE** management: server-side memory vs. client-side (browser); log in vs. anonymous (cookies); user-based memory vs. session-based (ephemeral)
* failing gracefully: not asking the user to start from scratch if the agent fails to fulfill the task + _learn from failures (logging)_
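
The state-management distinction above (user-based vs. session-based memory) can be sketched with two scopes in one store. This is an illustrative assumption rather than a reference design: `MemoryStore` and its methods are hypothetical, and the in-memory dicts stand in for a server-side database and an ephemeral session or cookie store.

```python
from collections import defaultdict
from typing import Optional


class MemoryStore:
    """Two scopes: durable per-user facts and ephemeral per-session context."""

    def __init__(self) -> None:
        self._user: dict[str, dict[str, str]] = defaultdict(dict)
        self._session: dict[str, dict[str, str]] = defaultdict(dict)

    def remember(self, session_id: str, key: str, value: str,
                 user_id: Optional[str] = None) -> None:
        # Logged-in users get durable memory; anonymous users only session memory.
        if user_id is not None:
            self._user[user_id][key] = value
        else:
            self._session[session_id][key] = value

    def recall(self, session_id: str, key: str,
               user_id: Optional[str] = None) -> Optional[str]:
        # Session context wins over durable memory when both exist.
        session_data = self._session.get(session_id, {})
        if key in session_data:
            return session_data[key]
        if user_id is not None:
            return self._user.get(user_id, {}).get(key)
        return None

    def end_session(self, session_id: str) -> None:
        # Ephemeral memory disappears with the session.
        self._session.pop(session_id, None)


store = MemoryStore()
store.remember("s1", "diet", "vegan", user_id="u1")  # logged-in user
store.remember("s2", "diet", "keto")                 # anonymous visitor
store.end_session("s1")
store.end_session("s2")
print(store.recall("s3", "diet", user_id="u1"))  # durable fact survives
print(store.recall("s2", "diet"))                # anonymous context is gone
```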

*"Reliability begins with consistency"* -- test whether the agent generates similar outputs from the same or similar inputs and contexts
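
One rough way to run that consistency test is to call the agent several times on the same prompt and average the pairwise similarity of its outputs. The sketch below uses `difflib.SequenceMatcher` from the standard library, which measures surface text overlap only; semantic similarity would need embeddings or a judge model. `consistency_score` and the stub agents are hypothetical stand-ins for a real agent call.

```python
from difflib import SequenceMatcher
from itertools import combinations, cycle
from typing import Callable


def consistency_score(agent: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Mean pairwise text similarity (0..1) across repeated runs of one prompt."""
    outputs = [agent(prompt) for _ in range(runs)]
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def stable_agent(prompt: str) -> str:
    """Stub that always answers the same way."""
    return "Order A12345 has been cancelled."


_variants = cycle(["Order A12345 has been cancelled.", "Sorry, I cannot help with that."])


def unstable_agent(prompt: str) -> str:
    """Stub that alternates between two very different answers."""
    return next(_variants)


print("stable:", consistency_score(stable_agent, "Please cancel my order A12345."))
print("unstable:", round(consistency_score(unstable_agent, "Please cancel my order A12345.", runs=4), 2))
```

A score well below 1.0 on identical inputs is a signal to inspect prompts, sampling settings, or tool nondeterminism before shipping.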

