# Building a SQL AI Assistant with LangChain

[LangChain](https://www.langchain.com/) is an open-source software development framework that simplifies building applications powered by large language models (LLMs).

Here is a condensed version of an NL2SQL (natural language to SQL) assistant built with LangChain:

```python
from langchain.chat_models import init_chat_model
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase
from langgraph.prebuilt import create_react_agent

# Create the SQL Database instance
db = SQLDatabase(...)

# Specify LLM
llm = init_chat_model(...)

# Get tools
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
tools = toolkit.get_tools()

# Create the ReAct agent (system_message is defined in the Settings section below)
agent_executor = create_react_agent(llm, tools, prompt=system_message)

# Ask a question: use invoke for the final result, or stream to see intermediate steps
result = agent_executor.invoke({"messages": [{"role": "user", "content": question}]})
```

# Settings

In [1]:
from langchain.chat_models import init_chat_model
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase
from langgraph.prebuilt import create_react_agent
from sqlalchemy import create_engine

In [2]:
# postgresql+psycopg://username:password@host:port/database
postgres_uri_template = "postgresql+psycopg://{username}:{password}@{host}:{port}/{database}"

postgres_uri = postgres_uri_template.format(
    username="postgres",
    password="postgres",
    host="localhost",
    port="5432",
    database="olist_ecommerce"
)

# Create engine
postgres_engine = create_engine(postgres_uri, connect_args={"options": "-csearch_path=ecommerce,marketing"})
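
The URI template above works for simple credentials, but a password containing characters such as `@` or `/` would break the URI. A minimal sketch of one way to handle that, assuming you percent-encode the credentials with the standard library before formatting (the helper name `build_postgres_uri` and the sample password are hypothetical, not part of the original notebook):

```python
from urllib.parse import quote_plus


def build_postgres_uri(username, password, host, port, database):
    """Build a postgresql+psycopg URI, percent-encoding the credentials.

    Hypothetical helper for illustration; only the URI layout comes from
    the notebook cell above.
    """
    return (
        "postgresql+psycopg://"
        f"{quote_plus(username)}:{quote_plus(password)}@{host}:{port}/{database}"
    )


# A made-up password with characters that are special inside a URI
example_uri = build_postgres_uri(
    "postgres", "p@ss/word", "localhost", "5432", "olist_ecommerce"
)
print(example_uri)
```

The resulting string can be passed to `create_engine` in place of `postgres_uri`.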

In [3]:
# Create the database wrapper using LangChain's SQLDatabase
db = SQLDatabase(engine=postgres_engine)

# Show the available tables
print("Available tables:", db.get_usable_table_names())

# Test the connection with a sample query
db.run("SELECT * FROM ecommerce.customers LIMIT 10")

Available tables: ['closed_deals', 'customers', 'geolocation', 'marketing_qualified_leads', 'order_items', 'order_payments', 'order_reviews', 'orders', 'product_category_name_translations', 'products', 'sellers']


"[('06b8999e2fba1a1fbc88172c00ba8bc7', '861eff4711a542e4b93843c6dd7febb0', '14409', 'franca', 'SP'), ('18955e83d337fd6b2def6b18a428ac77', '290c77bc529b7ac935b93aa66c333dc3', '09790', 'sao bernardo do campo', 'SP'), ('4e7b3e00288586ebd08712fdd0374a03', '060e732b5b29e8181a18229c7b0b2b5e', '01151', 'sao paulo', 'SP'), ('b2b6027bc5c5109e529d4dc6358b12c3', '259dac757896d24d7702b9acbbff3f3c', '08775', 'mogi das cruzes', 'SP'), ('4f2d8ab171c80ec8364f7c12e35b23ad', '345ecd01c38d18a9036ed96c73b8d066', '13056', 'campinas', 'SP'), ('879864dab9bc3047522c92c82e1212b8', '4c93744516667ad3b8f1fb645a3116a4', '89254', 'jaragua do sul', 'SC'), ('fd826e7cf63160e536e0908c76c3f441', 'addec96d2e059c80c30fe6871d30d177', '04534', 'sao paulo', 'SP'), ('5e274e7a0c3809e14aba7ad5aae0d407', '57b2a98a409812fe9618067b6b8ebe4f', '35182', 'timoteo', 'MG'), ('5adf08e34b2e993982a47070956c5c65', '1175e95fb47ddff9de6b2b06188f7e0d', '81560', 'curitiba', 'PR'), ('4b7139f34592b3a31687243a302fa75b', '9afe194fb833f79e300e37e580

In [4]:
# Create llm
llm = init_chat_model("gpt-5-nano", model_provider="openai")

In [5]:
# Get available database tools
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
tools = toolkit.get_tools()
tools

[QuerySQLDatabaseTool(description="Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.", db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x124307e00>),
 InfoSQLDatabaseTool(description='Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: table1, table2, table3', db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x124307e00>),
 ListSQLDatabaseTool(db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x124307e00>),
 QuerySQLCheckerTool(description='Use this tool to double check if your 

In [6]:
# Specify system prompt

system_message_template = """
You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run,
then look at the results of the query and return the answer. Unless the user
specifies a specific number of examples they wish to obtain, always limit your
query to at most {top_k} results.

You can order the results by a relevant column to return the most interesting
examples in the database. Never query for all the columns from a specific table,
only ask for the relevant columns given the question.

You MUST double check your query before executing it. If you get an error while
executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the
database.

To start you should ALWAYS look at the tables in the database to see what you
can query. Do NOT skip this step.

Then you should query the schema of the most relevant tables.
"""

system_message = system_message_template.format(
    dialect="PostgreSQL",
    top_k=5,
)

# Create agent
agent_executor = create_react_agent(llm, tools, prompt=system_message)
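
The system prompt asks the model not to modify data, but a prompt is only a request, not a guarantee. As a rough sketch of an extra safeguard, assuming you want to screen generated SQL before it reaches the database, a conservative check could look like the following. The function `is_read_only` and its keyword list are hypothetical and not part of LangChain. The check may reject harmless queries that contain a forbidden keyword inside a string literal, and a database role limited to `SELECT` privileges is a more robust option.

```python
import re

# Hypothetical denylist of write/DDL keywords; extend it to match your policy.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True if the text looks like a single SELECT or WITH query."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:  # reject stacked statements
        return False
    if not re.match(r"(?i)^(select|with)\b", statement):
        return False
    return _FORBIDDEN.search(statement) is None


print(is_read_only("SELECT COUNT(*) FROM customers"))
print(is_read_only("SELECT 1; DROP TABLE customers"))
```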

In [7]:
question = "How many customers are there in the database?"

for step in agent_executor.stream(
    {"messages": [{"role": "user", "content": question}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()


How many customers are there in the database?
Tool Calls:
  sql_db_list_tables (call_SjxIzTm7LHDpCuzob6haqnMA)
 Call ID: call_SjxIzTm7LHDpCuzob6haqnMA
  Args:
Name: sql_db_list_tables

closed_deals, customers, geolocation, marketing_qualified_leads, order_items, order_payments, order_reviews, orders, product_category_name_translations, products, sellers
Tool Calls:
  sql_db_schema (call_lrhm56SLOA6jGVDbx61tWqfT)
 Call ID: call_lrhm56SLOA6jGVDbx61tWqfT
  Args:
    table_names: customers
Name: sql_db_schema


CREATE TABLE customers (
	customer_id TEXT NOT NULL, 
	customer_unique_id TEXT NOT NULL, 
	customer_zip_code_prefix TEXT, 
	customer_city TEXT, 
	customer_state TEXT, 
	CONSTRAINT customers_pkey PRIMARY KEY (customer_id)
)

/*
3 rows from customers table:
customer_id	customer_unique_id	customer_zip_code_prefix	customer_city	customer_state
06b8999e2fba1a1fbc88172c00ba8bc7	861eff4711a542e4b93843c6dd7febb0	14409	franca	SP
18955e83d337fd6b2def6b18a428ac77	290c77bc529b7ac935b93aa66c333d

# Troubleshooting

- Initializing `SQLDatabase` directly from a URI can fail to find your tables if they are not in the `public` schema.
    - Solution: Pass a pre-initialized engine whose `search_path` includes the relevant schemas (here `ecommerce` and `marketing`).
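
The `connect_args` value used in the Settings section can be generated from a list of schemas. This is a small sketch, assuming a hypothetical helper named `search_path_connect_args`; only the `-csearch_path=...` option string is taken from the notebook itself:

```python
def search_path_connect_args(schemas):
    """Return connect_args that set the PostgreSQL search_path.

    Hypothetical helper for illustration; it only builds the dictionary and
    does not open a connection.
    """
    if not schemas:
        raise ValueError("at least one schema is required")
    return {"options": "-csearch_path=" + ",".join(schemas)}


connect_args = search_path_connect_args(["ecommerce", "marketing"])
print(connect_args)

# Pass the result to create_engine(postgres_uri, connect_args=connect_args),
# then wrap the engine with SQLDatabase(engine=...), as in the Settings section.
```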