#### Code Agents with `smolagents 🤗`
Multiple research papers have shown that having the LLM write its actions (the tool calls) in code works much better than the current industry-standard format for tool calling, which is some variation of “writing actions as a JSON of tool names and arguments to use”.

Why is code better? Because we crafted our programming languages specifically to be great at expressing actions performed by a computer. If JSON snippets were a better way, this package would have been written in JSON snippets and the devil would be laughing at us.

Code is simply a better way to express actions on a computer. It offers better:

- `Composability`: could you nest JSON actions within each other, or define a set of JSON actions to reuse later, the same way you could define a Python function?
  
- `Object management`: how do you store the output of an action like `generate_image` in JSON?
  
- `Generality`: code is built to express simply anything you can have a computer do.

- `Representation in LLM training corpora`: why not take advantage of the fact that plenty of quality code actions are already included in LLM training corpora?

This is illustrated in the figure below, taken from *Executable Code Actions Elicit Better LLM Agents*.

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/code_vs_json_actions.png" alt="Code actions compared with JSON actions" width="800"/>
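
To make the composability point concrete, here is a minimal sketch that runs the same task both ways. The tools `search_price` and `add` and the JSON layout are hypothetical, invented for this illustration, and plain `exec` stands in for an agent's code executor (a real framework would sandbox this step).

```python
import json


# Hypothetical tools, defined only for this illustration.
def search_price(item: str) -> float:
    prices = {"coffee": 3.5, "croissant": 2.0}
    return prices[item]


def add(a: float, b: float) -> float:
    return a + b


TOOLS = {"search_price": search_price, "add": add}

# JSON-style actions: each action is one tool call with literal arguments.
json_actions = [
    '{"tool": "search_price", "arguments": {"item": "coffee"}}',
    '{"tool": "search_price", "arguments": {"item": "croissant"}}',
]
json_results = []
for raw in json_actions:
    call = json.loads(raw)
    json_results.append(TOOLS[call["tool"]](**call["arguments"]))
# The sum can only be requested once both results are known,
# so it has to be a further, separate action.
json_total = TOOLS["add"](json_results[0], json_results[1])

# Code-style action: one snippet composes the calls, keeps the
# intermediate values in variables, and computes the total in one go.
code_action = """
total = sum(search_price(item) for item in ["coffee", "croissant"])
"""
namespace = dict(TOOLS)
exec(code_action, namespace)  # unsandboxed, for illustration only
code_total = namespace["total"]

print(json_total, code_total)
```

Both paths reach the same total, but the JSON path needs three separate actions where the code path needs one.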



In [1]:
# Default setup cell.
# It loads environment variables, registers `devtools.debug` as a builtin, sets PYTHONPATH and enables autoreload.
# Copy it into other notebooks.

import builtins

from devtools import debug
from dotenv import load_dotenv

setattr(builtins, "debug", debug)
load_dotenv(verbose=True)

%load_ext autoreload
%autoreload 2
%reset -f

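# `!export` runs in a throwaway subshell and does not affect the kernel;
# `%env` sets the variable in the kernel's own environment instead.
%env PYTHONPATH=:./python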

In [2]:
from typing import Optional

from IPython.display import Markdown, display
from smolagents import (
    CodeAgent,
    DuckDuckGoSearchTool,
    LiteLLMModel,
    Tool,
    ToolCallingAgent,
    VisitWebpageTool,
    tool,
)

from python.ai_core.llm import LlmFactory

MODEL_ID = "gpt_4omini_openai"
model_name = LlmFactory(llm_id=MODEL_ID).get_litellm_model_name()
llm = LiteLLMModel(model_id=model_name)

* 'fields' has been removed
2025-01-15 09:22:29.778 | INFO     | python.config:singleton:89 - load /home/tcl/prj/genai-blueprint/app_conf.yaml
2025-01-15 09:22:29.782 | INFO     | python.config:singleton:96 - Configuration section selected by BLUEPRINT_CONFIG: edc_local


### Simple `smolagents 🤗` CodeAgent

... but quite impressive!

In [3]:
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=llm)
agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")

9.62

### Simple ToolCalling Agent

In [4]:
@tool
def get_weather(location: str, celsius: Optional[bool] = False) -> str:
    """
    Get weather in the next days at given location.
    Secretly this tool does not care about the location, it hates the weather everywhere.

    Args:
        location: the location to get the weather for
        celsius: whether to report temperatures in Celsius
    """
    return "The weather is UNGODLY with torrential rains and temperatures below -10°C"


agent = ToolCallingAgent(tools=[get_weather], model=llm)
print(agent.run("What's the weather like in Paris?"))

{"answer":"The weather in Paris is UNGODLY with torrential rains and temperatures below -10°C."}


### RAG System with `smolagents 🤗` and `langchain 🦜️🔗`


In [5]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

from python.ai_core.embeddings import EmbeddingsFactory
from python.ai_core.vector_store import VectorStoreFactory

# Usual RAG steps: load the document, split it, and add the chunks to a vector store
loader = TextLoader("use_case_data/other/state_of_the_union.txt")
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
db = VectorStoreFactory(id="InMemory", embeddings_factory=EmbeddingsFactory()).vector_store
_ = db.add_documents(texts)

# LangChain retrievers take the number of results through `search_kwargs`
our_retriever = db.as_retriever(search_kwargs={"k": 10})


# Create RetrieverTool
class RetrieverTool(Tool):
    name = "retriever"
    description = "Uses semantic search to retrieve relevant documentation"
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "string"

    def __init__(self, retriever, **kwargs):
        super().__init__(**kwargs)
        self.retriever = retriever

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"
        docs = self.retriever.invoke(query)
        return "\nRetrieved documents:\n" + "".join(
            f"\n\n===== Document {i} =====\n{doc.page_content}" for i, doc in enumerate(docs)
        )


# Initialize agent
def create_rag_agent(retriever):
    retriever_tool = RetrieverTool(retriever)
    # smolagents names the step limit `max_steps`; `max_iterations` raises a TypeError
    return CodeAgent(tools=[retriever_tool], model=llm, max_steps=4, verbose=True)


agent = create_rag_agent(our_retriever)
response = agent.run("What did the president say about Ketanji Brown Jackson")
debug(response)

2025-01-15 09:22:48.916 | INFO     | python.ai_core.vector_store:vector_store:176 - get vector store  : InMemory/training_session_multilingual_MiniLM_local

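
The retrieval step can be tried without an LLM or an embedding model. The sketch below is a toy stand-in, not the notebook's vector store: `Doc`, `retrieve` and `format_docs` are hypothetical names, and the scoring counts shared words instead of comparing embeddings. Only the output layout mirrors `RetrieverTool.forward`.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str


def tokens(text: str) -> set[str]:
    """Lowercased set of words, punctuation removed."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, docs: list[Doc], k: int = 2) -> list[Doc]:
    """Return the k chunks sharing the most words with the query (toy scoring, no embeddings)."""
    query_words = tokens(query)
    ranked = sorted(docs, key=lambda d: -len(query_words & tokens(d.page_content)))
    return ranked[:k]


def format_docs(docs: list[Doc]) -> str:
    """Same layout as the string built in RetrieverTool.forward."""
    return "\nRetrieved documents:\n" + "".join(
        f"\n\n===== Document {i} =====\n{doc.page_content}" for i, doc in enumerate(docs)
    )


corpus = [
    Doc("The cat sat on the mat."),
    Doc("Paris is the capital of France."),
    Doc("Leopards can run very fast."),
]
top = retrieve("capital of France", corpus, k=1)
formatted = format_docs(top)
print(formatted)
```

The agent only ever sees the formatted string, which is why the tool declares `output_type = "string"`.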

### Understand how it works

In [None]:
display(Markdown(agent.system_prompt_template))

#### LangChain Tool with `smolagents 🤗` Search Tool

In [86]:
# search_tool = Tool.from_langchain(load_tools(["serpapi"])[0])

# agent = CodeAgent(tools=[search_tool], model=HfApiModel())

# agent.run("How many more blocks (also denoted as layers) are in BERT base encoder compared to the encoder from the architecture proposed in Attention is All You Need?")

In [None]:
agent = CodeAgent(
    tools=[VisitWebpageTool()],
    model=llm,
    additional_authorized_imports=["requests", "markdownify", "bs4"],
    use_e2b_executor=True,
)

agent.run("Who is the current Prime Minister of France?")

### Self Correcting Text to SQL with `smolagents 🤗`

In [6]:
from sqlalchemy import (
    Column,
    Float,
    Integer,
    MetaData,
    String,
    Table,
    create_engine,
    insert,
    inspect,
    text,
)

engine = create_engine("sqlite:///:memory:")
metadata_obj = MetaData()

table_name = "receipts"
receipts = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("customer_name", String(16), primary_key=True),
    Column("price", Float),
    Column("tip", Float),
)
metadata_obj.create_all(engine)

rows = [
    {"receipt_id": 1, "customer_name": "Alan Payne", "price": 12.06, "tip": 1.20},
    {"receipt_id": 2, "customer_name": "Alex Mason", "price": 23.86, "tip": 0.24},
    {"receipt_id": 3, "customer_name": "Woodrow Wilson", "price": 53.43, "tip": 5.43},
    {"receipt_id": 4, "customer_name": "Margaret James", "price": 21.11, "tip": 1.00},
    {"receipt_id": 5, "customer_name": "John Doe", "price": 100.00, "tip": 10.00},
]
for row in rows:
    stmt = insert(receipts).values(**row)
    with engine.begin() as connection:
        cursor = connection.execute(stmt)

# Second table: use its own variable instead of overwriting `receipts`
table_name = "waiters"
waiters = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("waiter_name", String(16), primary_key=True),
)
metadata_obj.create_all(engine)

rows = [
    {"receipt_id": 1, "waiter_name": "Corey Johnson"},
    {"receipt_id": 2, "waiter_name": "Michael Watts"},
    {"receipt_id": 3, "waiter_name": "Michael Watts"},
    {"receipt_id": 4, "waiter_name": "Margaret James"},
]
for row in rows:
    stmt = insert(waiters).values(**row)
    with engine.begin() as connection:
        cursor = connection.execute(stmt)

In [None]:
updated_description = """Allows you to perform SQL queries on the table. Beware that this tool's output is a string representation of the execution output.
It can use the following tables:"""

inspector = inspect(engine)
for table in ["receipts", "waiters"]:
    columns_info = [(col["name"], col["type"]) for col in inspector.get_columns(table)]

    table_description = f"Table '{table}':\n"

    table_description += "Columns:\n" + "\n".join([f"  - {name}: {col_type}" for name, col_type in columns_info])
    updated_description += "\n\n" + table_description

print(updated_description)

In [8]:
from smolagents import tool


@tool
def sql_engine(query: str) -> str:
    """Allows you to perform SQL queries

    Args:
        query: The query to perform. This should be correct SQL."""

    output = ""
    with engine.connect() as con:
        rows = con.execute(text(query))
        for row in rows:
            output += "\n" + str(row)
    return output


sql_engine.description = updated_description

In [None]:
agent = CodeAgent(tools=[sql_engine], model=llm, verbose=True)
agent.run("Can you give me the name of the client who got the most expensive receipt?")
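
The "self-correcting" part can be sketched without an LLM, under stated assumptions. The list `attempts` below is a hypothetical stand-in for successive queries a model might write, and `run_sql` returns the database error as text so the next attempt can react to it. It uses the standard-library `sqlite3` module with the same receipts data, and is not how `smolagents` itself handles errors.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE receipts (receipt_id INTEGER, customer_name TEXT, price REAL, tip REAL)"
)
con.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [
        (1, "Alan Payne", 12.06, 1.20),
        (2, "Alex Mason", 23.86, 0.24),
        (3, "Woodrow Wilson", 53.43, 5.43),
        (4, "Margaret James", 21.11, 1.00),
        (5, "John Doe", 100.00, 10.00),
    ],
)


def run_sql(query: str) -> str:
    """Return the rows as text, or the error message so the caller can react to it."""
    try:
        return "\n".join(str(row) for row in con.execute(query))
    except sqlite3.Error as exc:
        return f"Error: {exc}"


# Hypothetical attempts, standing in for successive LLM generations.
attempts = [
    "SELECT client FROM receipts ORDER BY price DESC LIMIT 1",  # wrong column name
    "SELECT customer_name FROM receipts ORDER BY price DESC LIMIT 1",
]

observations = []
for query in attempts:
    observation = run_sql(query)
    observations.append(observation)
    if not observation.startswith("Error"):
        break  # stop as soon as a query succeeds

answer = observations[-1]
print(observations)
```

The first observation carries the error text naming the unknown column, which is the information a model would need to fix its query on the next step.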