# SQL Chain example

This example demonstrates the use of the `SQLDatabaseChain` for answering questions over a database.

Under the hood, LangChain uses SQLAlchemy to connect to SQL databases. The `SQLDatabaseChain` can therefore be used with any SQL dialect supported by SQLAlchemy, such as MS SQL, MySQL, MariaDB, PostgreSQL, Oracle SQL, and SQLite. Please refer to the SQLAlchemy documentation for more information about requirements for connecting to your database. For example, a connection to MySQL requires an appropriate connector such as PyMySQL. A URI for a MySQL connection might look like: `mysql+pymysql://user:pass@some_mysql_db_address/db_name`
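Passwords that contain characters such as `@` or `/` would be misread as URI delimiters, so they generally need to be percent-encoded before being placed in the connection string. The following is a minimal sketch using only the standard library; the helper name `build_mysql_uri` and the sample credentials are hypothetical and not part of LangChain or SQLAlchemy.

```python
from urllib.parse import quote_plus


def build_mysql_uri(user, password, host, db_name, port=3306):
    """Return a SQLAlchemy-style MySQL URI that uses the PyMySQL driver."""
    # quote_plus escapes characters such as '@' or '/' that would otherwise
    # be interpreted as separators inside the URI.
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db_name}"
    )


# Hypothetical credentials, used only to show the escaping.
uri = build_mysql_uri("user", "p@ss/word", "some_mysql_db_address", "db_name")
print(uri)
```

The resulting string can then be passed to `SQLDatabase.from_uri` in the same way as the hand-written URI above.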

This demonstration uses SQLite and the example Chinook database.
To set it up, follow the instructions on https://database.guide/2-sample-databases-sqlite/, placing the `.db` file in a notebooks folder at the root of this repository.

In [8]:
%pip install pymysql

Collecting pymysql
  Downloading PyMySQL-1.0.3-py3-none-any.whl (43 kB)
Installing collected packages: pymysql
Successfully installed pymysql-1.0.3
Note: you may need to restart the kernel to use updated packages.


In [4]:
pip install mysql-connector-python

Collecting mysql-connector-python
  Downloading mysql_connector_python-8.0.33-py2.py3-none-any.whl (390 kB)
Installing collected packages: mysql-connector-python
Successfully installed mysql-connector-python-8.0.33
Note: you may need to restart the kernel to use updated packages.


In [2]:
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain
import os
import pandas as pd
import pymysql
from sqlalchemy import create_engine

In [2]:
# Create the connection engine (replace user:pass with your own credentials)
engine = create_engine("mysql+pymysql://user:pass@127.0.0.1:3306", echo=True)
# Query the table data
query = '''
SELECT * FROM invoicing.clients
'''
df = pd.read_sql_query(query, engine)
df

2023-04-26 22:34:26,789 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'sql_mode'
2023-04-26 22:34:26,790 INFO sqlalchemy.engine.Engine [raw sql] {}
2023-04-26 22:34:26,807 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2023-04-26 22:34:26,808 INFO sqlalchemy.engine.Engine [generated in 0.00078s] {}
2023-04-26 22:34:26,812 INFO sqlalchemy.engine.Engine SELECT DATABASE()
2023-04-26 22:34:26,813 INFO sqlalchemy.engine.Engine [raw sql] {}
2023-04-26 22:34:26,819 INFO sqlalchemy.engine.Engine 
SELECT * FROM invoicing.clients

2023-04-26 22:34:26,820 INFO sqlalchemy.engine.Engine [raw sql] {}


Unnamed: 0,client_id,name,address,city,state,phone
0,1,Vinte,3 Nevada Parkway,Syracuse,NY,315-252-7305
1,2,Myworks,34267 Glendale Parkway,Huntington,WV,304-659-1170
2,3,Yadel,096 Pawling Parkway,San Francisco,CA,415-144-6037
3,4,Kwideo,81674 Westerfield Circle,Waco,TX,254-750-0784
4,5,Topiclounge,0863 Farmco Road,Portland,OR,971-888-9129


In [3]:
import sqlalchemy
# sqlalchemy.orm.configure_mappers()
db = SQLDatabase.from_uri("mysql+pymysql://address/classicmodels")


### Importing from a .py file

In [14]:
OPENAI_API_KEY = 'open-ai-key'
# db = SQLDatabase.from_uri("mysql+pymysql://user:pass@localhost:3306/store")
db = SQLDatabase.from_uri("mysql+pymysql://address/classicmodels")
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True) 
db_chain.run('''
             write an SQL query to find how many customers live in USA.
             ''')



> Entering new SQLDatabaseChain chain...

             write an SQL query to find how many customers live in USA.

SQLQuery:
SELECT COUNT(*) AS 'Number of Customers in USA'
FROM customers
WHERE country = 'USA';

SQLResult: [(36,)]
Answer: 36 customers live in USA.
> Finished chain.


' 36 customers live in USA.'

**NOTE:** For data-sensitive projects, you can specify `return_direct=True` in the `SQLDatabaseChain` initialization to return the output of the SQL query directly, without any additional formatting. This prevents the LLM from seeing any contents of the database. Note, however, that the LLM still has access to the database schema (i.e. dialect, table and key names) by default.
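To make the data flow behind `return_direct` concrete, here is a small stand-alone simulation. It is a sketch under stated assumptions, not LangChain's implementation: `RecordingLLM` and `run_chain` are hypothetical stand-ins, the "LLM" returns canned text, and SQLite is used in memory so the example runs without any external service.

```python
import sqlite3


class RecordingLLM:
    """Hypothetical stand-in for an LLM: canned replies, records every prompt."""

    def __init__(self):
        self.prompts = []

    def __call__(self, prompt):
        self.prompts.append(prompt)
        if "SQLResult:" in prompt:
            # Second call: the prompt contains the query result.
            return "2 customers live in USA."
        # First call: only the question (and, in a real chain, the schema).
        return "SELECT COUNT(*) FROM customers WHERE country = 'USA'"


def run_chain(llm, conn, question, return_direct=False):
    """Simplified two-step flow: generate SQL, run it, optionally summarize."""
    sql = llm(f"Question: {question}\nSQLQuery:")
    rows = conn.execute(sql).fetchall()
    if return_direct:
        # The raw rows go straight back to the caller; the LLM never sees them.
        return rows
    return llm(f"Question: {question}\nSQLQuery: {sql}\nSQLResult: {rows}\nAnswer:")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", "USA"), ("Bob", "USA"), ("Carl", "France")],
)

question = "How many customers live in USA?"
direct_llm = RecordingLLM()
direct = run_chain(direct_llm, conn, question, return_direct=True)
full_llm = RecordingLLM()
full = run_chain(full_llm, conn, question, return_direct=False)

print(direct, len(direct_llm.prompts))
print(full, len(full_llm.prompts))
```

In the direct case only one prompt is recorded and it contains no table contents, which is the property the note above relies on.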

# English Version

In [19]:
db_chain.run('''
write SQL to  Query the hierarchical relationship between employees and their subordinates in the classicmodels.employees table,sort by hierarchical, please use recursive CTE, and return the column name with data in the output, use table format.
''')



> Entering new SQLDatabaseChain chain...

write SQL to  Query the hierarchical relationship between employees and their subordinates in the classicmodels.employees table,sort by hierarchical, please use recursive CTE, and return the column name with data in the output, use table format.

SQLQuery:
WITH RECURSIVE cte (employeeNumber, lastName, firstName, reportsTo, jobTitle, level) AS (
    SELECT employeeNumber, lastName, firstName, reportsTo, jobTitle, 0 AS level
    FROM employees
    WHERE reportsTo IS NULL
    UNION ALL
    SELECT e.employeeNumber, e.lastName, e.firstName, e.reportsTo, e.jobTitle, cte.level + 1
    FROM employees e
    INNER JOIN cte
    ON e.reportsTo = cte.employeeNumber
)
SELECT employeeNumber, lastName, firstName, reportsTo, jobTitle, level
FROM cte
ORDER BY level;

SQLResult: [(1002, 'Murphy', 'Diane', None, 'President', 0), (1056, 'Patterson', 'Mary', 1002, 'VP Sales', 1), (1076, 'Firrelli', 'Jeff', 1002, 'VP Marketing

' The hierarchical relationship between employees and their subordinates in the classicmodels.employees table is as follows: \n\nemployeeNumber\tlastName\tfirstName\treportsTo\tjobTitle\tlevel\n1002\tMurphy\tDiane\tNone\tPresident\t0\n1056\tPatterson\tMary\t1002\tVP Sales\t1\n1076\tFirrelli\tJeff\t1002\tVP Marketing\t1\n1102\tBondur\tGerard\t1056\tSale Manager (EMEA)\t2\n1143\tBow\tAnthony\t1056\tSales Manager (NA)\t2\n1621\tNishi\tMami\t1056\tSales Rep\t2\n1088\tPatterson\tWilliam\t1056\tSales Manager (APAC)\t2\n1401\tCastillo\tPamela\t1102\tSales Rep\t3\n1286\tTseng\tFoon Yue\t1143\tSales Rep\t3\n1501\tBott\tLarry\t1102\tSales Rep\t3\n1323\tVanauf\tGeorge\t1143\tSales Rep\t3\n1504\tJones\tBarry\t1102\tSales Rep\t3\n1625\tK'
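A generated recursive CTE can be checked independently of the LLM by running the same query shape against a tiny table. The sketch below uses an in-memory SQLite database rather than MySQL, and the four employee rows are made up for illustration; only the structure of the query follows the one generated above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employeeNumber INTEGER PRIMARY KEY, lastName TEXT, reportsTo INTEGER)"
)
# Hypothetical rows: one root, two direct reports, one second-level report.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Root", None), (2, "Left", 1), (3, "Right", 1), (4, "Leaf", 2)],
)

query = """
WITH RECURSIVE cte (employeeNumber, lastName, reportsTo, level) AS (
    -- Anchor member: employees with no manager sit at level 0.
    SELECT employeeNumber, lastName, reportsTo, 0 AS level
    FROM employees
    WHERE reportsTo IS NULL
    UNION ALL
    -- Recursive member: each report is one level below its manager.
    SELECT e.employeeNumber, e.lastName, e.reportsTo, cte.level + 1
    FROM employees e
    INNER JOIN cte ON e.reportsTo = cte.employeeNumber
)
SELECT employeeNumber, lastName, level
FROM cte
ORDER BY level, employeeNumber;
"""

hierarchy = conn.execute(query).fetchall()
for row in hierarchy:
    print(row)
```

Adding `employeeNumber` as a secondary sort key makes the order within a level deterministic, which the generated query above does not guarantee.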

# Multi-language Prompt Example: Chinese (Mandarin)

In [13]:
db_chain.run('''
写一个SQL查询，查找classicmodels.employees表中员工及其下属的层级关系，并按层级排序，请使用递归CTE，输出使用英文，并且识别列名放在输出里。
''')



> Entering new SQLDatabaseChain chain...

写一个SQL查询，查找classicmodels.employees表中员工及其下属的层级关系，并按层级排序，请使用递归CTE，输出使用英文，并且识别列名放在输出里。

SQLQuery:
WITH RECURSIVE employee_hierarchy (employeeNumber, lastName, firstName, reportsTo, level) AS (
    SELECT employeeNumber, lastName, firstName, reportsTo, 0 AS level
    FROM employees
    WHERE reportsTo IS NULL
    UNION ALL
    SELECT e.employeeNumber, e.lastName, e.firstName, e.reportsTo, eh.level + 1
    FROM employees e
    INNER JOIN employee_hierarchy eh
    ON e.reportsTo = eh.employeeNumber
)
SELECT employeeNumber, lastName, firstName, reportsTo, level
FROM employee_hierarchy
ORDER BY level;

SQLResult: [(1002, 'Murphy', 'Diane', None, 0), (1056, 'Patterson', 'Mary', 1002, 1), (1076, 'Firrelli', 'Jeff', 1002, 1), (1102, 'Bondur', 'Gerard', 1056, 2), (1143, 'Bow', 'Anthony', 1056, 2), (1621, 'Nishi', 'Mami', 1056, 2), (1088, 'Patterson', 'William', 1056, 2), (1401, 'Castillo', 'Pamela', 1102, 3), (1286,

' 员工及其下属的层级关系按层级排序为：1002 (Murphy, Diane), 1056 (Patterson, Mary), 1076 (Firrelli, Jeff), 1102 (Bondur, Gerard), 1143 (Bow, Anthony), 1621 (Nishi, Mami), 1088 (Patterson, William), 1401 (Castillo, Pamela), 1286 (Tseng, Foon Yue), 1501 (Bott, Larry), 1323 (Vanauf, George), 1504 (Jones, Barry), 1625 (Kato, Yoshimi), 1611 (Fixter, Andy), 1702 (Gerard, Martin), 1612 (Marsh, Peter), 1165 (Jennings, Leslie), 1619 (King, Tom), 1166 (Thompson, Leslie), 1337 (Bondur, Loui), 1188 (Firrelli, Julie), 1370 (Hernandez, Gerard), 1216 (Patterson, Steve).'

In [12]:
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # read the key from the environment instead of hard-coding it

In [15]:
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)

In [16]:
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)

In [34]:
db_chain.run('''
            write an SQL query to find the most frequently purchased product name for each customer. 
             If there is a tie, return the product name with the smaller lexicographic order.
             ''')



> Entering new SQLDatabaseChain chain...

            write an SQL query to find the most frequently purchased product name for each customer. 
             If there is a tie, return the product name with the smaller lexicographic order.

SQLQuery:
            SELECT c.name, p.name 
            FROM clients c 
            JOIN payments pa ON c.client_id = pa.client_id 
            JOIN payment_methods p ON pa.payment_method = p.payment_method_id 
            GROUP BY c.name, p.name 
            ORDER BY COUNT(*) DESC, p.name ASC 
            LIMIT 3;

SQLResult: [('Topiclounge', 'Credit Card'), ('Yadel', 'Credit Card'), ('Topiclounge', 'Cash')]
Answer: The most frequently purchased product name for each customer is Credit Card, followed by Cash.
> Finished chain.


' The most frequently purchased product name for each customer is Credit Card, followed by Cash.'

## Customize Prompt
You can also customize the prompt that the chain uses. The example below adds an instruction telling the model that `foobar` refers to the employee table.

In [18]:
from langchain.prompts.prompt import PromptTemplate

_DEFAULT_TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Use the following format:

Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"

Only use the following tables:

{table_info}

If someone asks for the table foobar, they really mean the employee table.

Question: {input}"""
PROMPT = PromptTemplate(
    input_variables=["input", "table_info", "dialect"], template=_DEFAULT_TEMPLATE
)

In [19]:
db_chain = SQLDatabaseChain(llm=llm, database=db, prompt=PROMPT, verbose=True)

In [21]:
db_chain.run("How many client locate in NY?")



> Entering new SQLDatabaseChain chain...
How many client locate in NY?
SQLQuery: SELECT COUNT(*) FROM clients WHERE state = 'NY';
SQLResult: [(1,)]
Answer: There is 1 client located in NY.
> Finished chain.


' There is 1 client located in NY.'
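To see what text the model actually receives from a custom prompt, it can help to fill in the placeholders by hand. The sketch below uses plain `str.format` on a shortened copy of the template rather than `PromptTemplate`, and the `table_info` and question values are hypothetical.

```python
# Shortened copy of the custom template, with the same three placeholders.
TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.

Only use the following tables:

{table_info}

If someone asks for the table foobar, they really mean the employee table.

Question: {input}"""

# Hypothetical values; in the chain these come from the database and the user.
rendered = TEMPLATE.format(
    dialect="mysql",
    table_info="CREATE TABLE employee (id INTEGER, name TEXT)",
    input="How many rows are in foobar?",
)
print(rendered)
```

Every placeholder named in `input_variables` must appear in the template, and the template must not contain stray braces, or formatting fails.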

## Return Intermediate Steps

You can also return the intermediate steps of the SQLDatabaseChain. This allows you to access the SQL statement that was generated, as well as the result of running that against the SQL Database.

In [24]:
db_chain = SQLDatabaseChain(llm=llm, database=db, prompt=PROMPT, verbose=True, return_intermediate_steps=True)

In [25]:
result = db_chain("How many client locate in NY?")
result["intermediate_steps"]



> Entering new SQLDatabaseChain chain...
How many client locate in NY?
SQLQuery: SELECT COUNT(*) FROM clients WHERE state = 'NY';
SQLResult: [(1,)]
Answer: There is 1 client located in NY.
> Finished chain.


[" SELECT COUNT(*) FROM clients WHERE state = 'NY';", '[(1,)]']
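The intermediate steps shown above are plain strings, so the result needs to be parsed before it can be used as data. A minimal sketch, assuming the two-element shape printed in this notebook (generated SQL first, stringified result second); other LangChain versions may return a different structure.

```python
import ast

# Copied from the output above: [generated SQL, result as a string].
intermediate_steps = [" SELECT COUNT(*) FROM clients WHERE state = 'NY';", "[(1,)]"]

sql = intermediate_steps[0].strip()
# literal_eval safely parses Python literals (lists, tuples, numbers, strings)
# without executing arbitrary code.
rows = ast.literal_eval(intermediate_steps[1])
count = rows[0][0]

print(sql)
print(count)
```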

## Choosing how to limit the number of rows returned
If you are querying for several rows of a table, you can set the maximum number of results to return with the `top_k` parameter (default is 10). This helps avoid query results that exceed the maximum prompt length or consume tokens unnecessarily.

In [26]:
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True, top_k=3)

In [11]:
db_chain.run("What are some example tracks by composer Johann Sebastian Bach?")



> Entering new SQLDatabaseChain chain...
What are some example tracks by composer Johann Sebastian Bach? 
SQLQuery: SELECT Name, Composer FROM Track WHERE Composer LIKE '%Johann Sebastian Bach%' LIMIT 3;
SQLResult: [('Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Johann Sebastian Bach'), ('Aria Mit 30 Veränderungen, BWV 988 "Goldberg Variations": Aria', 'Johann Sebastian Bach'), ('Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude', 'Johann Sebastian Bach')]
Answer: Some example tracks by composer Johann Sebastian Bach are 'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Aria Mit 30 Veränderungen, BWV 988 "Goldberg Variations": Aria', and 'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude'.
> Finished chain.


' Some example tracks by composer Johann Sebastian Bach are \'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace\', \'Aria Mit 30 Veränderungen, BWV 988 "Goldberg Variations": Aria\', and \'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude\'.'
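In the run above, the row cap shows up as a `LIMIT 3` clause in the generated SQL. The effect of such a clause can be reproduced with plain SQL; the sketch below uses an in-memory SQLite table with made-up track names, and the variable `top_k` here is just a local integer, not the chain parameter itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, Name TEXT)")
# Ten hypothetical tracks, more than we want to send back to the model.
conn.executemany(
    "INSERT INTO Track (Name) VALUES (?)",
    [(f"Track {i}",) for i in range(1, 11)],
)

top_k = 3
rows = conn.execute(
    "SELECT Name FROM Track ORDER BY TrackId LIMIT ?", (top_k,)
).fetchall()
print(rows)
```

Only `top_k` rows reach the second LLM call, so the size of the result section of the prompt stays bounded regardless of table size.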

## Adding example rows from each table
Sometimes the format of the data is not obvious, and it helps to include a sample of rows from each table in the prompt so that the LLM can see the data before writing a query. Here we use this feature to show the LLM that composers are stored with their full names by providing two rows from the `Track` table.

In [12]:
db = SQLDatabase.from_uri(
    "sqlite:///../../../../notebooks/Chinook.db",
    include_tables=['Track'], # we include only one table to save tokens in the prompt :)
    sample_rows_in_table_info=2)

The sample rows are added to the prompt after each corresponding table's column information:

In [13]:
print(db.table_info)


CREATE TABLE "Track" (
	"TrackId" INTEGER NOT NULL, 
	"Name" NVARCHAR(200) NOT NULL, 
	"AlbumId" INTEGER, 
	"MediaTypeId" INTEGER NOT NULL, 
	"GenreId" INTEGER, 
	"Composer" NVARCHAR(220), 
	"Milliseconds" INTEGER NOT NULL, 
	"Bytes" INTEGER, 
	"UnitPrice" NUMERIC(10, 2) NOT NULL, 
	PRIMARY KEY ("TrackId"), 
	FOREIGN KEY("MediaTypeId") REFERENCES "MediaType" ("MediaTypeId"), 
	FOREIGN KEY("GenreId") REFERENCES "Genre" ("GenreId"), 
	FOREIGN KEY("AlbumId") REFERENCES "Album" ("AlbumId")
)
/*
2 rows from Track table:
TrackId	Name	AlbumId	MediaTypeId	GenreId	Composer	Milliseconds	Bytes	UnitPrice
1	For Those About To Rock (We Salute You)	1	1	1	Angus Young, Malcolm Young, Brian Johnson	343719	11170334	0.99
2	Balls to the Wall	2	2	1	None	342562	5510424	0.99
*/
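The layout of this output (table definition followed by a comment block of sample rows) can be approximated by hand. The following sketch builds a similar string with the standard `sqlite3` module; the function `table_info_with_samples` is hypothetical and is not how LangChain produces `db.table_info`, and the in-memory table holds only three illustrative rows.

```python
import sqlite3


def table_info_with_samples(conn, table, n_rows=2):
    """Return the CREATE TABLE statement plus the first n_rows as a comment."""
    # sqlite_master stores the original DDL text for each table.
    ddl = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()[0]
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT {int(n_rows)}')
    header = "\t".join(col[0] for col in cur.description)
    lines = ["\t".join(str(value) for value in row) for row in cur.fetchall()]
    body = "\n".join([header, *lines])
    return f"{ddl}\n/*\n{n_rows} rows from {table} table:\n{body}\n*/"


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Track" ("TrackId" INTEGER NOT NULL, "Name" TEXT, "Composer" TEXT)'
)
conn.executemany(
    'INSERT INTO "Track" VALUES (?, ?, ?)',
    [
        (1, "For Those About To Rock (We Salute You)",
         "Angus Young, Malcolm Young, Brian Johnson"),
        (2, "Balls to the Wall", None),
        (3, "My favorite song ever", "The coolest composer of all time"),
    ],
)

info = table_info_with_samples(conn, "Track", n_rows=2)
print(info)
```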




In [14]:
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)

In [15]:
db_chain.run("What are some example tracks by Bach?")



> Entering new SQLDatabaseChain chain...
What are some example tracks by Bach? 
SQLQuery: SELECT Name FROM Track WHERE Composer LIKE '%Bach%' LIMIT 5;
SQLResult: [('American Woman',), ('Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace',), ('Aria Mit 30 Veränderungen, BWV 988 "Goldberg Variations": Aria',), ('Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude',), ('Toccata and Fugue in D Minor, BWV 565: I. Toccata',)]
Answer: Some example tracks by Bach are 'American Woman', 'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Aria Mit 30 Veränderungen, BWV 988 "Goldberg Variations": Aria', 'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude', and 'Toccata and Fugue in D Minor, BWV 565: I. Toccata'.
> Finished chain.


' Some example tracks by Bach are \'American Woman\', \'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace\', \'Aria Mit 30 Veränderungen, BWV 988 "Goldberg Variations": Aria\', \'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude\', and \'Toccata and Fugue in D Minor, BWV 565: I. Toccata\'.'

### Custom Table Info
In some cases, it can be useful to provide custom table information instead of using the automatically generated table definitions and the first `sample_rows_in_table_info` sample rows. For example, if you know that the first few rows of a table are uninformative, it could help to manually provide example rows that are more diverse or provide more information to the model. It is also possible to limit the columns that will be visible to the model if there are unnecessary columns. 

This information can be provided as a dictionary with table names as the keys and table information as the values. For example, let's provide a custom definition and sample rows for the Track table with only a few columns:

In [16]:
custom_table_info = {
    "Track": """CREATE TABLE Track (
	"TrackId" INTEGER NOT NULL, 
	"Name" NVARCHAR(200) NOT NULL,
	"Composer" NVARCHAR(220),
	PRIMARY KEY ("TrackId")
)
/*
3 rows from Track table:
TrackId	Name	Composer
1	For Those About To Rock (We Salute You)	Angus Young, Malcolm Young, Brian Johnson
2	Balls to the Wall	None
3	My favorite song ever	The coolest composer of all time
*/"""
}

In [17]:
db = SQLDatabase.from_uri(
    "sqlite:///../../../../notebooks/Chinook.db",
    include_tables=['Track', 'Playlist'],
    sample_rows_in_table_info=2,
    custom_table_info=custom_table_info)

print(db.table_info)


CREATE TABLE "Playlist" (
	"PlaylistId" INTEGER NOT NULL, 
	"Name" NVARCHAR(120), 
	PRIMARY KEY ("PlaylistId")
)
/*
2 rows from Playlist table:
PlaylistId	Name
1	Music
2	Movies
*/

CREATE TABLE Track (
	"TrackId" INTEGER NOT NULL, 
	"Name" NVARCHAR(200) NOT NULL,
	"Composer" NVARCHAR(220),
	PRIMARY KEY ("TrackId")
)
/*
3 rows from Track table:
TrackId	Name	Composer
1	For Those About To Rock (We Salute You)	Angus Young, Malcolm Young, Brian Johnson
2	Balls to the Wall	None
3	My favorite song ever	The coolest composer of all time
*/


Note how our custom table definition and sample rows for `Track` override the `sample_rows_in_table_info` parameter. Tables that are not overridden by `custom_table_info`, in this example `Playlist`, have their table info gathered automatically as usual.
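The override behavior described here works like a dictionary merge in which custom entries win. The sketch below only illustrates that rule with shortened, hypothetical table strings; it is not the library's code.

```python
# Hypothetical, shortened stand-ins for automatically gathered table info.
auto_table_info = {
    "Playlist": 'CREATE TABLE "Playlist" ("PlaylistId" INTEGER NOT NULL)',
    "Track": 'CREATE TABLE "Track" ("TrackId" INTEGER NOT NULL, "Bytes" INTEGER)',
}
# Custom entry for Track only.
custom_table_info = {
    "Track": 'CREATE TABLE Track ("TrackId" INTEGER NOT NULL, "Composer" NVARCHAR(220))',
}

# Later keys win, so custom entries replace automatic ones table by table.
merged = {**auto_table_info, **custom_table_info}
print(merged["Track"])
print(merged["Playlist"])
```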

In [18]:
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
db_chain.run("What are some example tracks by Bach?")



> Entering new SQLDatabaseChain chain...
What are some example tracks by Bach? 
SQLQuery: SELECT Name, Composer FROM Track WHERE Composer LIKE '%Bach%' LIMIT 5;
SQLResult: [('American Woman', 'B. Cummings/G. Peterson/M.J. Kale/R. Bachman'), ('Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Johann Sebastian Bach'), ('Aria Mit 30 Veränderungen, BWV 988 "Goldberg Variations": Aria', 'Johann Sebastian Bach'), ('Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude', 'Johann Sebastian Bach'), ('Toccata and Fugue in D Minor, BWV 565: I. Toccata', 'Johann Sebastian Bach')]
Answer: Some example tracks by Bach are 'American Woman', 'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Aria Mit 30 Veränderungen, BWV 988 "Goldberg Variations": Aria', 'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude', and 'Toccata and Fugue in D Minor, BWV 565: I. Toccata'.
> Finished chain.


' Some example tracks by Bach are \'American Woman\', \'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace\', \'Aria Mit 30 Veränderungen, BWV 988 "Goldberg Variations": Aria\', \'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude\', and \'Toccata and Fugue in D Minor, BWV 565: I. Toccata\'.'

## SQLDatabaseSequentialChain

`SQLDatabaseSequentialChain` is a sequential chain for querying a SQL database.

The chain is as follows:

    1. Based on the query, determine which tables to use.
    2. Based on those tables, call the normal SQL database chain.

This is useful in cases where the number of tables in the database is large.
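The two steps above can be sketched without an LLM to show why this helps with large schemas: only the selected tables' definitions end up in the second prompt. In this sketch, simple keyword matching stands in for the LLM's table-selection step, and `TABLE_INFO`, `choose_tables`, and `build_table_info` are hypothetical names with shortened table definitions.

```python
# Hypothetical, shortened schema; a real database may have hundreds of tables.
TABLE_INFO = {
    "Customer": 'CREATE TABLE "Customer" ("CustomerId" INTEGER, "SupportRepId" INTEGER)',
    "Employee": 'CREATE TABLE "Employee" ("EmployeeId" INTEGER)',
    "Track": 'CREATE TABLE "Track" ("TrackId" INTEGER, "Name" NVARCHAR(200))',
}


def choose_tables(question, table_names):
    """Step 1 stand-in: pick tables whose name appears in the question."""
    lowered = question.lower()
    return [name for name in table_names if name.lower() in lowered]


def build_table_info(selected):
    """Step 2 input: only the selected definitions go into the SQL prompt."""
    return "\n\n".join(TABLE_INFO[name] for name in selected)


question = "How many employees are also customers?"
selected = choose_tables(question, TABLE_INFO)
prompt_tables = build_table_info(selected)

print(selected)
print(prompt_tables)
```

Here the `Track` definition never reaches the SQL-generation prompt, which is where the token savings come from.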

In [20]:
from langchain.chains import SQLDatabaseSequentialChain
db = SQLDatabase.from_uri("sqlite:///../../../../notebooks/Chinook.db")

In [21]:
chain = SQLDatabaseSequentialChain.from_llm(llm, db, verbose=True)

In [22]:
chain.run("How many employees are also customers?")



> Entering new SQLDatabaseSequentialChain chain...
Table names to use:
['Customer', 'Employee']

> Entering new SQLDatabaseChain chain...
How many employees are also customers? 
SQLQuery: SELECT COUNT(*) FROM Employee INNER JOIN Customer ON Employee.EmployeeId = Customer.SupportRepId;
SQLResult: [(59,)]
Answer: 59 employees are also customers.
> Finished chain.

> Finished chain.


' 59 employees are also customers.'