https://python.langchain.com/docs/use_cases/sql/quickstart/

### **Test the sqldb**

In [1]:
from langchain_community.utilities import SQLDatabase
from pyprojroot import here
import warnings
warnings.filterwarnings("ignore")

**Connecting to the sqldb**

In [2]:
db_path = str(here("data")) + "/sqldb.db"
db = SQLDatabase.from_uri(f"sqlite:///{db_path}")

In [3]:
db

<langchain_community.utilities.sql_database.SQLDatabase at 0x159f66a96a0>

In [4]:
# validate the connection to the SQL database
print(db.dialect)

sqlite


In [5]:
print(db.get_usable_table_names())

['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']


In [7]:

db._execute("SELECT * FROM Artist LIMIT 10;")

[{'ArtistId': 1, 'Name': 'AC/DC'},
 {'ArtistId': 2, 'Name': 'Accept'},
 {'ArtistId': 3, 'Name': 'Aerosmith'},
 {'ArtistId': 4, 'Name': 'Alanis Morissette'},
 {'ArtistId': 5, 'Name': 'Alice In Chains'},
 {'ArtistId': 6, 'Name': 'Antônio Carlos Jobim'},
 {'ArtistId': 7, 'Name': 'Apocalyptica'},
 {'ArtistId': 8, 'Name': 'Audioslave'},
 {'ArtistId': 9, 'Name': 'BackBeat'},
 {'ArtistId': 10, 'Name': 'Billy Cobham'}]

### **Test the access to the environment variables**

In [None]:
import os
from langchain_openai import ChatOpenAI


os.environ['GITHUB_TOKEN'] = "your_actual_github_token"  # Replace with your actual GitHub token
token = os.environ.get("GITHUB_TOKEN")
endpoint = "https://models.github.ai/inference"
model_name = "openai/gpt-4.1-mini" 

if not token:
    raise ValueError("GITHUB_TOKEN environment variable not set. Please provide a valid token.")

llm = ChatOpenAI(
    model_name=model_name,
    openai_api_key=token,
    openai_api_base=endpoint,
    temperature=0.5,
)
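Because the cell above assigns the token in code right before reading it, the `if not token` check can never fail. A minimal sketch of a stricter check follows, assuming the token is exported in the shell beforehand; `read_token` and `PLACEHOLDER` are hypothetical names introduced only for this illustration.

```python
import os
from typing import Mapping

# The placeholder string used in the cell above; treated here as "not set".
PLACEHOLDER = "your_actual_github_token"


def read_token(env: Mapping[str, str] = os.environ, name: str = "GITHUB_TOKEN") -> str:
    """Return the token stored under `name`, or raise if it is missing or a placeholder."""
    token = env.get(name, "").strip()
    if not token or token == PLACEHOLDER:
        raise ValueError(f"{name} is not set to a real token.")
    return token
```

With this helper, `token = read_token()` could replace the two lines that set and read `GITHUB_TOKEN`, so the secret never appears in the notebook.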

### **Test your GPT model**

In [9]:
from langchain_core.messages import HumanMessage
response = llm.invoke([
    HumanMessage(content="What's the capital of Bangladesh?")
])

print(response.content)

The capital of Bangladesh is Dhaka.


In [10]:
response

AIMessage(content='The capital of Bangladesh is Dhaka.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 13, 'total_tokens': 22, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_4f3d32ad4e', 'id': 'chatcmpl-C08QVj5las0uYk9hOfClgCSWLWjkW', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--c4bba3b4-bcfc-490a-bf15-047663a5a361-0', usage_metadata={'input_tokens': 13, 'output_tokens': 9, 'total_tokens': 22, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

### **1. SQL query chain**

In [11]:
from langchain.chains import create_sql_query_chain

chain = create_sql_query_chain(llm, db)
response = chain.invoke({"question": "How many employees are there"})
print(response)

Question: How many employees are there
SQLQuery: SELECT COUNT("EmployeeId") AS "EmployeeCount" FROM "Employee" LIMIT 5


In [13]:
import re
def extract_sql_query_simple(response):
    # Use regex to find SQL query after "SQLQuery:" label
    match = re.search(r"SQLQuery:\s*(.*?)(?:\n|$)", response, re.IGNORECASE)
    if match:
        sql_query = match.group(1).strip()
        # print("SQL Query found:")
        # print(sql_query)
        return sql_query
    return None

In [14]:
query = extract_sql_query_simple(response)
print(query)

SELECT COUNT("EmployeeId") AS "EmployeeCount" FROM "Employee" LIMIT 5
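The regex in `extract_sql_query_simple` stops at the first newline, so a query that the model spreads over several lines or wraps in a markdown fence would be cut short. The variant below is a sketch under that assumption; `extract_sql_query` is a hypothetical name, and collapsing whitespace is a simplification that would also alter runs of spaces inside string literals.

```python
import re
from typing import Optional

FENCE = "`" * 3  # markdown code fence marker, built this way to keep the example readable


def extract_sql_query(response: str) -> Optional[str]:
    """Return the text after 'SQLQuery:' up to 'SQLResult:' or the end of the response."""
    # Drop markdown fences such as an opening fence tagged "sql" and the closing fence.
    text = re.sub(re.escape(FENCE) + r"(?:sql)?", "", response, flags=re.IGNORECASE)
    match = re.search(
        r"SQLQuery:\s*(.*?)(?=\n\s*SQLResult:|\Z)",
        text,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if not match:
        return None
    # Collapse newlines and repeated spaces into single spaces.
    query = " ".join(match.group(1).split())
    return query or None
```

For the single-line response printed above, this returns the same string as the simpler function.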


Execute the query to make sure it’s valid

In [15]:
db.run(query)

'[(8,)]'

In [16]:
db._execute(query)

[{'EmployeeCount': 8}]
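The two calls above return different shapes: `db.run` gives a string that looks like a Python list of tuples, while `db._execute` (a private method, so it may change between releases) gives a list of dictionaries. When only the string is available, the standard library can usually turn it back into Python objects. This is a sketch; `parse_run_output` is a hypothetical helper, it assumes an empty string means no rows, and it will fail on values whose printed form is not a Python literal, such as dates or decimals.

```python
import ast
from typing import Any, List, Tuple


def parse_run_output(text: str) -> List[Tuple[Any, ...]]:
    """Parse a string such as "[(8,)]" into a list of row tuples."""
    if not text.strip():
        return []  # assumption: an empty string means the query returned no rows
    rows = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
    return [tuple(row) for row in rows]


rows = parse_run_output("[(8,)]")
employee_count = rows[0][0]
```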

In [12]:
chain.get_prompts()[0].pretty_print()

You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most 5 results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Pay attention to use date('now') function to get the current date, if the question involves "today".

Use the following format:

Question: Question here
SQLQuery: SQL Query to run
SQLResult: Result

### **Add QuerySQLDataBaseTool to the chain**
Execute SQL query

**This is the most dangerous part of creating a SQL chain.** Consider carefully if it is OK to run automated queries over your data. Minimize the database connection permissions as much as possible. Consider adding a human approval step to your chains before query execution (see below).

We can use the QuerySQLDataBaseTool to easily add query execution to our chain:

In [None]:
# user question -> create_sql_query_chain -> response -> extract_sql_query_simple (regex) -> query -> db.run(query) or db._execute(query)

In [19]:
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool

write_query = create_sql_query_chain(llm, db)
execute_query = QuerySQLDataBaseTool(db=db)

chain = write_query | extract_sql_query_simple | execute_query

chain.invoke({"question": "How many employees are there"})

'[(8,)]'
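The chain above executes whatever query the model writes. The human approval step mentioned in the warning can be sketched as a plain function placed before execution. Everything here is an assumption made for illustration: `QueryRejected`, `require_approval`, and `console_approver` are hypothetical names, not part of LangChain.

```python
from typing import Callable


class QueryRejected(Exception):
    """Raised when the reviewer declines to run a generated query."""


def require_approval(query: str, approve: Callable[[str], bool]) -> str:
    """Return the query unchanged if `approve` accepts it, otherwise raise."""
    if not approve(query):
        raise QueryRejected(f"Query was not approved: {query}")
    return query


def console_approver(query: str) -> bool:
    """Ask on the console; anything other than y/yes counts as a rejection."""
    answer = input(f"Run this query?\n{query}\n[y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Since the chain already pipes the plain function `extract_sql_query_simple`, a step such as `lambda q: require_approval(q, console_approver)` could probably be placed between it and `execute_query` in the same way.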

In [20]:
chain.invoke({"question": "Give me the names of all employees in the company (dont use any limit)"})

"[('Andrew', 'Adams'), ('Nancy', 'Edwards'), ('Jane', 'Peacock'), ('Margaret', 'Park'), ('Steve', 'Johnson'), ('Michael', 'Mitchell'), ('Robert', 'King'), ('Laura', 'Callahan')]"

In [21]:
chain.invoke({"question": "GIve me all employee title ? (dont use any limit)"})

"[('General Manager',), ('Sales Manager',), ('Sales Support Agent',), ('IT Manager',), ('IT Staff',)]"

### **Answer the question in a user friendly manner**

In [22]:
chain = write_query | extract_sql_query_simple

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

answer_prompt = PromptTemplate.from_template(
    """Given the following user question, corresponding SQL query, and SQL result, answer the user question.

Question: {question}
SQL Query: {query}
SQL Result: {result}

Answer: """
)

answer = answer_prompt | llm | StrOutputParser()

query = chain.invoke({"question": "How many employees are there"})
sql_query_response = db._execute(query)

answer.invoke({"question": "How many employees are there","query": query, "result": sql_query_response})

'There are 8 employees.'

In [25]:
query = chain.invoke({"question": "Give me the names of all employees in the company (dont use any limit)"})
sql_query_response = db._execute(query)

answer.invoke({"question": "Give me the names of all employees in the company (dont use any limit)","query": query, "result": sql_query_response})

'The names of all employees in the company are:\n\n- Andrew Adams  \n- Nancy Edwards  \n- Jane Peacock  \n- Margaret Park  \n- Steve Johnson  \n- Michael Mitchell  \n- Robert King  \n- Laura Callahan'
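The two cells above repeat the same three steps by hand: write the query, execute it, then phrase the answer. A small helper can wire them together. This is a sketch with injected callables; `answer_question` is a hypothetical name, and the lambdas at the bottom are stand-ins for the real objects so the block runs without a model or database.

```python
from typing import Any, Callable, Dict


def answer_question(
    question: str,
    write_query: Callable[[Dict[str, str]], str],
    run_query: Callable[[str], Any],
    write_answer: Callable[[Dict[str, Any]], str],
) -> Dict[str, Any]:
    """Run the question -> SQL -> result -> answer steps and keep every intermediate value."""
    query = write_query({"question": question})
    result = run_query(query)
    answer = write_answer({"question": question, "query": query, "result": result})
    return {"question": question, "query": query, "result": result, "answer": answer}


# Stand-ins for chain.invoke, db._execute and answer.invoke (illustration only).
demo = answer_question(
    "How many employees are there",
    write_query=lambda inputs: 'SELECT COUNT("EmployeeId") FROM "Employee"',
    run_query=lambda query: [{"EmployeeCount": 8}],
    write_answer=lambda inputs: f"Result for {inputs['question']!r}: {inputs['result']}",
)
```

In the notebook, the call would presumably look like `answer_question(question, chain.invoke, db._execute, answer.invoke)`, using the objects defined in the cells above.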

### **2. Agents**

The SQL agent provides a more flexible way of interacting with SQL databases. The main advantages of using the SQL Agent are:

- It can answer questions based on the databases’ schema as well as on the databases’ content (like describing a specific table).
- It can recover from errors by running a generated query, catching the traceback and regenerating it correctly.
- It can answer questions that require multiple dependent queries.
- It will save tokens by only considering the schema from relevant tables.
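The error-recovery idea in the list above can be illustrated with a small retry loop. This is not how the LangChain agent is implemented; it is a sketch that uses the standard `sqlite3` module, with `run_with_retry` as a hypothetical helper and `fix_table_name` as a stand-in for the model that rewrites a failed query.

```python
import sqlite3
from typing import Any, Callable, List, Tuple


def run_with_retry(
    conn: sqlite3.Connection,
    query: str,
    fix_query: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[str, List[Any]]:
    """Execute `query`; on a database error, ask `fix_query` for a corrected version and retry."""
    last_error = ""
    for _ in range(max_attempts):
        try:
            return query, conn.execute(query).fetchall()
        except sqlite3.Error as exc:
            last_error = str(exc)
            query = fix_query(query, last_error)
    raise RuntimeError(f"Gave up after {max_attempts} attempts: {last_error}")


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Artist" ("ArtistId" INTEGER PRIMARY KEY, "Name" TEXT)')
conn.execute("INSERT INTO Artist VALUES (1, 'AC/DC')")


def fix_table_name(query: str, error: str) -> str:
    # Stand-in for the LLM: repairs one known mistake when SQLite reports a missing table.
    return query.replace("Artists", "Artist") if "no such table" in error else query


final_query, rows = run_with_retry(conn, 'SELECT "Name" FROM "Artists"', fix_table_name)
```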

To initialize the agent, we use the create_sql_agent function. The agent includes the SQLDatabaseToolkit, which provides tools to:

- Create and execute queries
- Check query syntax
- Retrieve table descriptions
- …

In [27]:
from langchain_community.agent_toolkits import create_sql_agent

agent_executor = create_sql_agent(llm, db=db, agent_type="openai-tools", verbose=True)

In [28]:
agent_executor.invoke(
    {
        "input": "List the total sales per country. Which country's customers spent the most?"
    }
)



> Entering new SQL Agent Executor chain...

Invoking: `sql_db_list_tables` with `{}`

Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track

Invoking: `sql_db_schema` with `{'table_names': 'Customer, Invoice'}`

CREATE TABLE "Customer" (
	"CustomerId" INTEGER NOT NULL, 
	"FirstName" NVARCHAR(40) NOT NULL, 
	"LastName" NVARCHAR(20) NOT NULL, 
	"Company" NVARCHAR(80), 
	"Address" NVARCHAR(70), 
	"City" NVARCHAR(40), 
	"State" NVARCHAR(40), 
	"Country" NVARCHAR(40), 
	"PostalCode" NVARCHAR(10), 
	"Phone" NVARCHAR(24), 
	"Fax" NVARCHAR(24), 
	"Email" NVARCHAR(60) NOT NULL, 
	"SupportRepId" INTEGER, 
	PRIMARY KEY ("CustomerId"), 
	FOREIGN KEY("SupportRepId") REFERENCES "Employee" ("EmployeeId")
)

/*
3 rows from Customer table:
CustomerId	FirstName	LastName	Company	Address	City	State	Country	PostalCode	Phone	Fax	Email	SupportRepId
1	Luís	Gonçalves	Embraer - Emp

{'input': "List the total sales per country. Which country's customers spent the most?",
 'output': 'The total sales per country for the top 10 countries are as follows:\n- USA: 523.06\n- Canada: 303.96\n- France: 195.1\n- Brazil: 190.1\n- Germany: 156.48\n- United Kingdom: 112.86\n- Czech Republic: 90.24\n- Portugal: 77.24\n- India: 75.26\n- Chile: 46.62\n\nThe customers from the USA spent the most in total.'}
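The trace above is truncated, so the query the agent actually ran is not visible. A query of roughly the following shape would produce a per-country ranking like the one in the output. It is a sketch on a toy in-memory database: the `Invoice` columns `CustomerId` and `Total` are assumed from the usual Chinook schema, and the inserted rows are made-up values, not the notebook's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL);
    INSERT INTO Customer VALUES (1, 'USA'), (2, 'USA'), (3, 'Canada');
    INSERT INTO Invoice VALUES (1, 1, 10.0), (2, 2, 5.5), (3, 3, 8.0);
    """
)

# Sum invoice totals per customer country and list the highest totals first.
SALES_PER_COUNTRY = """
SELECT c."Country", ROUND(SUM(i."Total"), 2) AS "TotalSales"
FROM "Invoice" AS i
JOIN "Customer" AS c ON c."CustomerId" = i."CustomerId"
GROUP BY c."Country"
ORDER BY "TotalSales" DESC
LIMIT 10
"""

rows = conn.execute(SALES_PER_COUNTRY).fetchall()
top_country = rows[0][0]
```

Running the same statement through `db.run` against the real database is a quick way to check the agent's figures independently.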

In [29]:
agent_executor.invoke({"input": "Describe the playlisttrack table"})
# agent_executor.invoke("Describe the playlisttrack table")



> Entering new SQL Agent Executor chain...

Invoking: `sql_db_list_tables` with `{}`

Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track

Invoking: `sql_db_schema` with `{'table_names': 'PlaylistTrack'}`

CREATE TABLE "PlaylistTrack" (
	"PlaylistId" INTEGER NOT NULL, 
	"TrackId" INTEGER NOT NULL, 
	PRIMARY KEY ("PlaylistId", "TrackId"), 
	FOREIGN KEY("TrackId") REFERENCES "Track" ("TrackId"), 
	FOREIGN KEY("PlaylistId") REFERENCES "Playlist" ("PlaylistId")
)

/*
3 rows from PlaylistTrack table:
PlaylistId	TrackId
1	3402
1	3389
1	3390
*/
The PlaylistTrack table contains two columns: PlaylistId and TrackId. Both columns are integers and together form the primary key of the table. The PlaylistId column is a foreign key referencing the Playlist table, and the TrackId column is a foreign key referencing the Track table. This table represents

{'input': 'Describe the playlisttrack table',
 'output': 'The PlaylistTrack table contains two columns: PlaylistId and TrackId. Both columns are integers and together form the primary key of the table. The PlaylistId column is a foreign key referencing the Playlist table, and the TrackId column is a foreign key referencing the Track table. This table represents the relationship between playlists and tracks, indicating which tracks belong to which playlists.'}
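The agent's description can be cross-checked without a model by asking SQLite for the table metadata directly. The sketch below recreates the `PlaylistTrack` definition shown in the trace in an in-memory database and reads it back with `PRAGMA table_info` and `PRAGMA foreign_key_list`; against the real file, only the two `PRAGMA` calls would be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same definition as printed by the sql_db_schema tool above.
conn.execute(
    """
    CREATE TABLE "PlaylistTrack" (
        "PlaylistId" INTEGER NOT NULL,
        "TrackId" INTEGER NOT NULL,
        PRIMARY KEY ("PlaylistId", "TrackId"),
        FOREIGN KEY("TrackId") REFERENCES "Track" ("TrackId"),
        FOREIGN KEY("PlaylistId") REFERENCES "Playlist" ("PlaylistId")
    )
    """
)

# table_info rows: (cid, name, type, notnull, default, pk position within the primary key)
columns = [
    (name, col_type, notnull, pk)
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        'PRAGMA table_info("PlaylistTrack")'
    )
]

# foreign_key_list rows: (id, seq, referenced table, from column, to column, ...)
foreign_keys = {
    (row[2], row[3], row[4])
    for row in conn.execute('PRAGMA foreign_key_list("PlaylistTrack")')
}
```

A non-zero `pk` value on both columns confirms the composite primary key that the agent describes.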

In [30]:
agent_executor.invoke({"input": "Tell me about the Artist table. What is the primary key?"})



> Entering new SQL Agent Executor chain...

Invoking: `sql_db_list_tables` with `{}`

Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track

Invoking: `sql_db_schema` with `{'table_names': 'Artist'}`

CREATE TABLE "Artist" (
	"ArtistId" INTEGER NOT NULL, 
	"Name" NVARCHAR(120), 
	PRIMARY KEY ("ArtistId")
)

/*
3 rows from Artist table:
ArtistId	Name
1	AC/DC
2	Accept
3	Aerosmith
*/
The Artist table has two columns: ArtistId and Name. The primary key of the Artist table is ArtistId.

> Finished chain.


{'input': 'Tell me about the Artist table. What is the primary key?',
 'output': 'The Artist table has two columns: ArtistId and Name. The primary key of the Artist table is ArtistId.'}