**In this tutorial, we’ll see how to implement an agent that leverages SQL using transformers.agents.**

What’s the advantage over a standard text-to-SQL pipeline?

A standard text-to-SQL pipeline is brittle, since the generated SQL query can be incorrect. Even worse, an incorrect query may run without raising any error and silently return wrong or useless results.

👉 Instead, an agent system can critically inspect the outputs and decide whether the query needs to be changed, which makes it much more robust than a single-shot pipeline.
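
To make the silent-failure case concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module and a made-up `orders` table (both are assumptions for illustration only, not part of this tutorial's setup). The second query is the kind of plausible mistake a model might generate: it runs without error but answers the wrong question.

```python
import sqlite3

# Hypothetical toy table, used only for this illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, price REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Ann", 12.0), ("Bob", 53.5), ("Cid", 21.0)],
)

# Intended question: who placed the most expensive order?
correct_sql = "SELECT customer FROM orders ORDER BY price DESC LIMIT 1"
# Plausible but wrong: DESC is missing, so the sort defaults to ascending.
wrong_sql = "SELECT customer FROM orders ORDER BY price LIMIT 1"

correct_answer = con.execute(correct_sql).fetchone()[0]
wrong_answer = con.execute(wrong_sql).fetchone()[0]

print(correct_answer)  # Bob
print(wrong_answer)    # Ann: no exception, just the wrong customer
```

Nothing in the second result signals a problem, which is why something has to look at the output before it is trusted.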

In [3]:
!pip install transformers["agents"]



In [2]:
from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
    Float,
    insert,
    inspect,
    text,
)

engine = create_engine("sqlite:///:memory:")
metadata_obj = MetaData()

# create the receipts SQL table
table_name = "receipts"
receipts = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("customer_name", String(16), primary_key=True),
    Column("price", Float),
    Column("tip", Float),
)
metadata_obj.create_all(engine)

In [7]:
metadata_obj.tables

FacadeDict({'receipts': Table('receipts', MetaData(), Column('receipt_id', Integer(), table=<receipts>, primary_key=True, nullable=False), Column('customer_name', String(length=16), table=<receipts>, primary_key=True, nullable=False), Column('price', Float(), table=<receipts>), Column('tip', Float(), table=<receipts>), schema=None)})

In [8]:
rows = [
    {"receipt_id": 1, "customer_name": "Alan Payne", "price": 12.06, "tip": 1.20},
    {"receipt_id": 2, "customer_name": "Alex Mason", "price": 23.86, "tip": 0.24},
    {"receipt_id": 3, "customer_name": "Woodrow Wilson", "price": 53.43, "tip": 5.43},
    {"receipt_id": 4, "customer_name": "Margaret James", "price": 21.11, "tip": 1.00},
]
for row in rows:
    stmt = insert(receipts).values(**row)
    with engine.begin() as connection:
        cursor = connection.execute(stmt)

In [9]:
with engine.connect() as con:
    rows = con.execute(text("""SELECT * from receipts"""))
    for row in rows:
        print(row)

(1, 'Alan Payne', 12.06, 1.2)
(2, 'Alex Mason', 23.86, 0.24)
(3, 'Woodrow Wilson', 53.43, 5.43)
(4, 'Margaret James', 21.11, 1.0)


In [11]:
inspector = inspect(engine)
columns_info = [(col["name"], col["type"]) for col in inspector.get_columns("receipts")]

table_description = "Columns:\n" + "\n".join([f"  - {name}: {col_type}" for name, col_type in columns_info])
print(table_description)

Columns:
  - receipt_id: INTEGER
  - customer_name: VARCHAR(16)
  - price: FLOAT
  - tip: FLOAT


In [19]:
from transformers.agents import tool


@tool
def sql_engin(query: str) -> str:
    """
    Allows you to perform SQL queries on the table. Returns a string representation of the result.
    The table is named 'receipts'. Its description is as follows:
        Columns:
        - receipt_id: INTEGER
        - customer_name: VARCHAR(16)
        - price: FLOAT
        - tip: FLOAT

    Args:
        query: The query to perform. This should be correct SQL.
    """
    output = ""
    with engine.connect() as con:
        rows = con.execute(text(query))
        for row in rows:
            output += "\n" + str(row)
    return output

In [20]:
from transformers.agents import ReactCodeAgent, HfApiEngine

agent = ReactCodeAgent(
    tools=[sql_engin],
    llm_engine=HfApiEngine("meta-llama/Meta-Llama-3-8B-Instruct"),
)

In [14]:
agent.run("Can you give me the name of the client who got the most expensive receipt?")

Can you give me the name of the client who got the most expensive receipt?
Error in code parsing: 
The code blob you used is invalid: due to the following error: 'NoneType' object has no attribute 'group'
This means that the regex pattern ```(?:py|python)?\n(.*?)\n``` was not respected: make sure to include code with the correct pattern, for instance:
Thoughts: Your thoughts
Code:
```py
# Your python code here
```<end_action>. Make sure to provide correct code
Traceback (most recent call last):
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/transformers/agents/agents.py", line 121, in parse_code_blob
    return match.group(1).strip()
           ^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/transformers/agents/agents.py", line 1118, in step
    code_action = pa

'Woodrow Wilson'
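
Since the final answer comes from an LLM, it can be worth checking it against a hand-written query. The sketch below rebuilds the same four rows with Python's built-in `sqlite3` module so that it runs on its own (an assumption made for self-containment; in the notebook the same SQL could be run through `engine`).

```python
import sqlite3

# Standalone copy of the receipts data from the cells above.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE receipts "
    "(receipt_id INTEGER, customer_name TEXT, price REAL, tip REAL)"
)
con.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [
        (1, "Alan Payne", 12.06, 1.20),
        (2, "Alex Mason", 23.86, 0.24),
        (3, "Woodrow Wilson", 53.43, 5.43),
        (4, "Margaret James", 21.11, 1.00),
    ],
)

# Sort by price, highest first, and keep only the top row.
top_customer = con.execute(
    "SELECT customer_name FROM receipts ORDER BY price DESC LIMIT 1"
).fetchone()[0]
print(top_customer)  # Woodrow Wilson
```

The result matches the agent's answer above.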

In [15]:
# create a second table recording which waiter served each receipt
table_name = "waiters"
waiters = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("waiter_name", String(16), primary_key=True),
)
metadata_obj.create_all(engine)

rows = [
    {"receipt_id": 1, "waiter_name": "Corey Johnson"},
    {"receipt_id": 2, "waiter_name": "Michael Watts"},
    {"receipt_id": 3, "waiter_name": "Michael Watts"},
    {"receipt_id": 4, "waiter_name": "Margaret James"},
]
for row in rows:
    stmt = insert(waiters).values(**row)
    with engine.begin() as connection:
        cursor = connection.execute(stmt)

In [21]:
updated_description = """Allows you to perform SQL queries on the table. Beware that this tool's output is a string representation of the execution output.
It can use the following tables:"""

inspector = inspect(engine)
for table in ["receipts", "waiters"]:
    columns_info = [(col["name"], col["type"]) for col in inspector.get_columns(table)]

    table_description = f"Table '{table}':\n"

    table_description += "Columns:\n" + "\n".join([f"  - {name}: {col_type}" for name, col_type in columns_info])
    updated_description += "\n\n" + table_description

print(updated_description)

Allows you to perform SQL queries on the table. Beware that this tool's output is a string representation of the execution output.
It can use the following tables:

Table 'receipts':
Columns:
  - receipt_id: INTEGER
  - customer_name: VARCHAR(16)
  - price: FLOAT
  - tip: FLOAT

Table 'waiters':
Columns:
  - receipt_id: INTEGER
  - waiter_name: VARCHAR(16)


In [26]:
sql_engin.description = updated_description

agent = ReactCodeAgent(
    tools=[sql_engin],
    llm_engine=HfApiEngine("Qwen/Qwen2.5-72B-Instruct"),
)

agent.run("Which waiter got more total money from tips?")


Which waiter got more total money from tips?
=== Agent thoughts:
Thought: To determine which waiter got more total money from tips, I will perform a SQL query that will sum the tips for each waiter and return their names along with the total sum of their tips. The query will be executed on the 'receipts' and 'waiters' tables by joining them on the 'receipt_id' field.
>>> Agent is executing the code below:
query = """
SELECT w.waiter_name, SUM(r.tip) AS total_tips
FROM receipts r
JOIN waiters w ON r.receipt_id = w.receipt_id
GROUP BY w.waiter_name
ORDER BY total_tips DESC
"""

results = sql_engin(query=query)

'Based on the results provided, the waiter who got the most total money from tips is **Michael Watts** with a total of **5.67**.'
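
The join the agent wrote can be reproduced by hand to confirm the totals. This is a minimal standalone sketch using Python's built-in `sqlite3` module rather than the SQLAlchemy `engine` from the notebook (an assumption made so the snippet runs on its own). The `ROUND` call is added here only to keep floating-point noise out of the sums.

```python
import sqlite3

# Standalone copies of the two tables defined in the cells above.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE receipts (receipt_id INTEGER, customer_name TEXT, price REAL, tip REAL);
    CREATE TABLE waiters (receipt_id INTEGER, waiter_name TEXT);
    """
)
con.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [
        (1, "Alan Payne", 12.06, 1.20),
        (2, "Alex Mason", 23.86, 0.24),
        (3, "Woodrow Wilson", 53.43, 5.43),
        (4, "Margaret James", 21.11, 1.00),
    ],
)
con.executemany(
    "INSERT INTO waiters VALUES (?, ?)",
    [
        (1, "Corey Johnson"),
        (2, "Michael Watts"),
        (3, "Michael Watts"),
        (4, "Margaret James"),
    ],
)

# Same join and grouping as the agent's query, with the sum rounded to cents.
totals = con.execute(
    """
    SELECT w.waiter_name, ROUND(SUM(r.tip), 2) AS total_tips
    FROM receipts r
    JOIN waiters w ON r.receipt_id = w.receipt_id
    GROUP BY w.waiter_name
    ORDER BY total_tips DESC
    """
).fetchall()

for waiter_name, total_tips in totals:
    print(waiter_name, total_tips)
```

Michael Watts served receipts 2 and 3, so his total is 0.24 + 5.43 = 5.67, which agrees with the agent's answer.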