# How to better prompt when doing SQL question-answering

This guide shows how to write better prompts so that a large language model (LLM) generates accurate SQL queries for you. The focus is on giving the model the right information about your specific database.

We'll explore:

1. How the SQL dialect of your database (such as MySQL or PostgreSQL) changes the instructions the model needs.

2. How to show the model the structure of your database using SQLDatabase.get_context.

3. How to provide examples (called "few-shot examples") that help the model produce better queries.

The goal throughout is to give the model enough context about your database to write correct SQL.

# 1. Install necessary dependencies

In [1]:
!apt-get update
!apt-get install -y sqlite3


In [6]:
%pip install --upgrade --quiet langchain langchain-community langchain-experimental langchain-openai

# 2. Setup Colab environment

This section sets up an environment in Google Colab to create and populate an SQLite database called Chinook.db. It does this by:

1. Mounting your Google Drive.
2. Creating a specific directory in your Google Drive to store the database.
3. Downloading a SQL script from a GitHub repository.
4. Using the sqlite3 command-line tool to create the database and execute the SQL script, thereby populating the database with data.

This is a typical workflow for setting up a data science environment in Google Colab, where you need to access and create files in your Google Drive and use command-line tools to interact with data.

### Mounts Google Drive

a. from google.colab import drive: Imports the drive module from the google.colab library, which is specific to Google Colaboratory.

b. drive.mount('/content/drive'): This line mounts your Google Drive to the Colab runtime. After executing this, you'll be prompted to authorize Colab to access your Google Drive. Once authorized, your Drive files become accessible within the Colab environment under the /content/drive directory.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Sets up the directory

a. import os: Imports the os module for interacting with the operating system, specifically for file and directory operations.

b. directory_path = '/content/drive/MyDrive/Colab Notebooks/Chinook': Defines the path to a directory within your Google Drive. This is where the SQLite database will be created.

c. if not os.path.exists(directory_path): os.makedirs(directory_path): This checks if the specified directory exists. If it doesn't, it creates the directory and any necessary parent directories. This is important to ensure that the database file can be created in the correct location.

In [3]:
import os

# Path to the directory within your Google Drive
directory_path = '/content/drive/MyDrive/Colab Notebooks/Chinook'

# Create the directory if it doesn't exist
if not os.path.exists(directory_path):
    os.makedirs(directory_path)
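
The existence check and the creation step can also be folded into a single call. The following is a minimal sketch that uses a temporary directory as a stand-in for the Drive path (an assumption, so the snippet runs outside Colab as well):

```python
import os
import tempfile

# A temporary directory stands in for '/content/drive/MyDrive' (assumption).
base_dir = tempfile.mkdtemp()
directory_path = os.path.join(base_dir, "Colab Notebooks", "Chinook")

# exist_ok=True turns the call into a no-op when the directory already exists,
# so a separate os.path.exists check is not needed.
os.makedirs(directory_path, exist_ok=True)
os.makedirs(directory_path, exist_ok=True)  # safe to repeat

print(os.path.isdir(directory_path))
```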

### Downloads and Creates the SQLite Database

This line executes a shell command that downloads a SQL script and uses it to create an SQLite database.

In [4]:
!curl -s https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql | sqlite3 '/content/drive/MyDrive/Colab Notebooks/Chinook/Chinook.db'
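
If the sqlite3 command-line tool is not available, a similar result can probably be achieved from Python with the standard-library sqlite3 module. The sketch below is not the notebook's method: it runs a tiny, hypothetical SQL script (standing in for the downloaded Chinook_Sqlite.sql) against an in-memory database to show the executescript call.

```python
import sqlite3

# Hypothetical stand-in for the contents of the downloaded SQL script.
sql_script = """
CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name NVARCHAR(120));
INSERT INTO Artist (ArtistId, Name) VALUES (1, 'AC/DC');
INSERT INTO Artist (ArtistId, Name) VALUES (2, 'Accept');
"""

# ":memory:" keeps the demo self-contained; pass a file path to persist the data.
conn = sqlite3.connect(":memory:")
conn.executescript(sql_script)  # runs every statement in the script in order
conn.commit()

artist_count = conn.execute("SELECT COUNT(*) FROM Artist").fetchone()[0]
print(artist_count)
conn.close()
```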

### Testing the connection

This code snippet connects to the SQLite database (Chinook.db) using LangChain's SQLDatabase utility, prints the database dialect and the usable table names, and then runs a simple SQL query that retrieves the first 10 rows of the Artist table. The sample_rows_in_table_info=3 argument sets how many sample rows per table are included when the schema is later rendered for the prompt.

In [8]:
from langchain_community.utilities import SQLDatabase

db = SQLDatabase.from_uri("sqlite:////content/drive/MyDrive/Colab Notebooks/Chinook/Chinook.db", sample_rows_in_table_info=3)
print(db.dialect)
print(db.get_usable_table_names())
print(db.run("SELECT * FROM Artist LIMIT 10;"))

sqlite
['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']
[(1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith'), (4, 'Alanis Morissette'), (5, 'Alice In Chains'), (6, 'Antônio Carlos Jobim'), (7, 'Apocalyptica'), (8, 'Audioslave'), (9, 'BackBeat'), (10, 'Billy Cobham')]
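
The same sanity check can be done without LangChain, which helps when deciding whether a problem lies in the database file or in the wrapper. This is a sketch using only the standard library and a small in-memory table as a stand-in for Chinook.db (an assumption, so the snippet is self-contained):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name NVARCHAR(120));
CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title NVARCHAR(160), ArtistId INTEGER);
INSERT INTO Artist VALUES (1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith');
""")

# sqlite_master is SQLite's built-in catalog of schema objects.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
rows = conn.execute("SELECT * FROM Artist ORDER BY ArtistId LIMIT 2").fetchall()

print(table_names)
print(rows)
```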


# 3. Dialect-specific prompting

SQL dialects differ in syntax, so the prompt should match the dialect of the database being queried. LangChain provides a prompt template for each dialect it supports, and the code below lists the dialects that have one. These prompt templates are pre-written instructions that guide language models in translating natural language questions into executable SQL queries.

In [9]:
from langchain.chains.sql_database.prompt import SQL_PROMPTS

list(SQL_PROMPTS)

['crate',
 'duckdb',
 'googlesql',
 'mssql',
 'mysql',
 'mariadb',
 'oracle',
 'postgresql',
 'sqlite',
 'clickhouse',
 'prestodb']
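
To illustrate why the dialect matters, here is a hedged sketch of a per-dialect instruction builder. The dictionary and the build_instruction function are hypothetical and are not LangChain's actual templates; they only show one well-known difference, namely how each dialect limits the number of returned rows.

```python
# Row-limiting syntax differs across SQL dialects, which is one reason
# a prompt benefits from dialect-specific wording.
ROW_LIMIT_HINTS = {
    "sqlite": "use the LIMIT clause",
    "mysql": "use the LIMIT clause",
    "postgresql": "use the LIMIT clause",
    "mssql": "use the TOP clause",
    "oracle": "use the FETCH FIRST n ROWS ONLY clause",
}


def build_instruction(dialect: str, top_k: int = 5) -> str:
    """Return a hypothetical one-line instruction for the given dialect."""
    hint = ROW_LIMIT_HINTS.get(dialect, "limit the number of rows")
    return f"You are a {dialect} expert. Query for at most {top_k} results; {hint}."


print(build_instruction("sqlite"))
print(build_instruction("mssql", top_k=3))
```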

### Select chat model

In [26]:
%pip install -qU "langchain[openai]"

In [11]:
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")

Enter API key for OpenAI: ··········


In [12]:
from langchain.chains import create_sql_query_chain

chain = create_sql_query_chain(llm, db)
chain.get_prompts()[0].pretty_print()

You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most 5 results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Pay attention to use date('now') function to get the current date, if the question involves "today".

Use the following format:

Question: Question here
SQLQuery: SQL Query to run
SQLResult: Result

# 4. Table definitions and example rows

In most SQL chains, we need to feed the model at least part of the database schema. Without it, the model cannot know which tables and columns exist and is unlikely to write valid queries. The SQLDatabase object comes with convenience methods that provide this context: the table names, their schemas, and a sample of rows from each table.

This code snippet retrieves context information from a database, displays the keys of the context data, and then displays the information specifically related to the database's table structure. This is a common pattern when working with database interactions, as it allows you to inspect the database schema and understand the available data before performing queries.

In [13]:
context = db.get_context()
print(list(context))
print(context["table_info"])

['table_info', 'table_names']

CREATE TABLE "Album" (
	"AlbumId" INTEGER NOT NULL, 
	"Title" NVARCHAR(160) NOT NULL, 
	"ArtistId" INTEGER NOT NULL, 
	PRIMARY KEY ("AlbumId"), 
	FOREIGN KEY("ArtistId") REFERENCES "Artist" ("ArtistId")
)

/*
3 rows from Album table:
AlbumId	Title	ArtistId
1	For Those About To Rock We Salute You	1
2	Balls to the Wall	2
3	Restless and Wild	2
*/


CREATE TABLE "Artist" (
	"ArtistId" INTEGER NOT NULL, 
	"Name" NVARCHAR(120), 
	PRIMARY KEY ("ArtistId")
)

/*
3 rows from Artist table:
ArtistId	Name
1	AC/DC
2	Accept
3	Aerosmith
*/


CREATE TABLE "Customer" (
	"CustomerId" INTEGER NOT NULL, 
	"FirstName" NVARCHAR(40) NOT NULL, 
	"LastName" NVARCHAR(20) NOT NULL, 
	"Company" NVARCHAR(80), 
	"Address" NVARCHAR(70), 
	"City" NVARCHAR(40), 
	"State" NVARCHAR(40), 
	"Country" NVARCHAR(40), 
	"PostalCode" NVARCHAR(10), 
	"Phone" NVARCHAR(24), 
	"Fax" NVARCHAR(24), 
	"Email" NVARCHAR(60) NOT NULL, 
	"SupportRepId" INTEGER, 
	PRIMARY KEY ("CustomerId"), 
	FOREIGN KEY("S
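
The table_info text combines each table's CREATE statement with a few sample rows. The following sketch shows how a comparable string could be assembled with the standard-library sqlite3 module. The describe_table helper is hypothetical and is not how LangChain implements get_context; the in-memory Artist table stands in for Chinook.db.

```python
import sqlite3


def describe_table(conn, table, sample_rows=3):
    """Return a table's CREATE statement followed by a few sample rows."""
    create_sql = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()[0]
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT {int(sample_rows)}')
    header = "\t".join(col[0] for col in cursor.description)
    rows = "\n".join("\t".join(str(v) for v in row) for row in cursor.fetchall())
    return (
        f"{create_sql}\n\n/*\n{sample_rows} rows from {table} table:\n"
        f"{header}\n{rows}\n*/"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Artist" ("ArtistId" INTEGER NOT NULL, "Name" NVARCHAR(120), PRIMARY KEY ("ArtistId"));
INSERT INTO "Artist" VALUES (1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith');
""")

table_info = describe_table(conn, "Artist")
print(table_info)
```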

This code retrieves the main prompt template from the LangChain SQL chain, pre-fills its table_info variable with the database schema, and then prints the first 1,500 characters of the resulting prompt. This is a common inspection step to see how the prompt will look when it's sent to the language model.

In [14]:
prompt_with_context = chain.get_prompts()[0].partial(table_info=context["table_info"])
print(prompt_with_context.pretty_repr()[:1500])

You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most 5 results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Pay attention to use date('now') function to get the current date, if the question involves "today".

Use the following format:

Question: Question here
SQLQuery: SQL Query to run
SQLResult: Result
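
Pre-filling one variable while leaving the others open is the idea behind the partial call. A rough analogue using the standard-library string.Template is sketched below; the template text is hypothetical and much shorter than LangChain's real prompt.

```python
from string import Template

# Hypothetical, shortened template with two variables.
template = Template("Tables:\n$table_info\n\nQuestion: $input\nSQLQuery: ")

# safe_substitute fills what it is given and leaves other variables untouched.
with_schema = Template(
    template.safe_substitute(table_info='CREATE TABLE "Artist" (...)')
)

# The remaining variable is filled later, once the question is known.
final_prompt = with_schema.substitute(input="How many artists are there?")
print(final_prompt)
```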

# 5. Few-shot examples

### Define a list of example question-SQL query pairs

1. example list

a. examples: This is a list of dictionaries. Each dictionary represents a single example.

b. Each dictionary has two keys:

"input": The natural language question.

"query": The corresponding SQL query that answers the question.

2. example pairs

The list contains several example pairs, covering a variety of SQL query types:

a. Simple Queries:

"List all artists." -> SELECT * FROM Artist; (Retrieves all rows from the Artist table)

"List all customers from Canada." -> SELECT * FROM Customer WHERE Country = 'Canada'; (Filters rows based on a condition)

"Find the total number of invoices." -> SELECT COUNT(*) FROM Invoice; (Counts the number of rows)

"How many employees are there" -> SELECT COUNT(*) FROM "Employee" (Counts the rows of the Employee table. The double quotes mark the table name as a delimited identifier; they are optional here because the name contains no spaces or reserved words.)

b. Subqueries:

"Find all albums for the artist 'AC/DC'." -> SELECT * FROM Album WHERE ArtistId = (SELECT ArtistId FROM Artist WHERE Name = 'AC/DC'); (Uses a subquery to find the ArtistId)

"List all tracks in the 'Rock' genre." -> SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock'); (Similar subquery pattern)

c. Aggregate Functions:

"Find the total duration of all tracks." -> SELECT SUM(Milliseconds) FROM Track; (Calculates the sum of a column)

"How many tracks are there in the album with ID 5?" -> SELECT COUNT(*) FROM Track WHERE AlbumId = 5; (Counts rows with a specific condition)

"Who are the top 5 customers by total purchase?" -> SELECT CustomerId, SUM(Total) AS TotalPurchase FROM Invoice GROUP BY CustomerId ORDER BY TotalPurchase DESC LIMIT 5; (Uses SUM, GROUP BY, ORDER BY, and LIMIT)

d. Filtering:

"List all tracks that are longer than 5 minutes." -> SELECT * FROM Track WHERE Milliseconds > 300000; (Filters based on a numerical condition)

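"Which albums are from the year 2000?" -> SELECT * FROM Album WHERE strftime('%Y', ReleaseDate) = '2000'; (Filters based on a date condition, using strftime. Note that the Album table shown in section 4 has no ReleaseDate column, so this query illustrates the pattern but would fail against this database.)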

Summary

This code defines a set of examples that map natural language questions to their corresponding SQL queries. The examples cover a range of SQL query types. In this guide they are inserted into the prompt as few-shot examples, so the model sees concrete question-to-query mappings before it answers a new question.

In [15]:
examples = [
    {"input": "List all artists.", "query": "SELECT * FROM Artist;"},
    {
        "input": "Find all albums for the artist 'AC/DC'.",
        "query": "SELECT * FROM Album WHERE ArtistId = (SELECT ArtistId FROM Artist WHERE Name = 'AC/DC');",
    },
    {
        "input": "List all tracks in the 'Rock' genre.",
        "query": "SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock');",
    },
    {
        "input": "Find the total duration of all tracks.",
        "query": "SELECT SUM(Milliseconds) FROM Track;",
    },
    {
        "input": "List all customers from Canada.",
        "query": "SELECT * FROM Customer WHERE Country = 'Canada';",
    },
    {
        "input": "How many tracks are there in the album with ID 5?",
        "query": "SELECT COUNT(*) FROM Track WHERE AlbumId = 5;",
    },
    {
        "input": "Find the total number of invoices.",
        "query": "SELECT COUNT(*) FROM Invoice;",
    },
    {
        "input": "List all tracks that are longer than 5 minutes.",
        "query": "SELECT * FROM Track WHERE Milliseconds > 300000;",
    },
    {
        "input": "Who are the top 5 customers by total purchase?",
        "query": "SELECT CustomerId, SUM(Total) AS TotalPurchase FROM Invoice GROUP BY CustomerId ORDER BY TotalPurchase DESC LIMIT 5;",
    },
    {
        "input": "Which albums are from the year 2000?",
        "query": "SELECT * FROM Album WHERE strftime('%Y', ReleaseDate) = '2000';",
    },
    {
        "input": "How many employees are there",
        "query": 'SELECT COUNT(*) FROM "Employee"',
    },
]
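
Because the model imitates these examples, it may be worth checking that each example query is valid for the target schema before using it. The sketch below is one possible check and not part of the original notebook: it asks SQLite to compile each query with EXPLAIN against a minimal stand-in schema that mirrors the Album and Artist definitions shown in section 4. The find_invalid helper and the shortened candidate_examples list are hypothetical.

```python
import sqlite3

# Minimal stand-in schema mirroring the Album and Artist definitions in section 4.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Artist" ("ArtistId" INTEGER NOT NULL, "Name" NVARCHAR(120), PRIMARY KEY ("ArtistId"));
CREATE TABLE "Album" ("AlbumId" INTEGER NOT NULL, "Title" NVARCHAR(160) NOT NULL,
    "ArtistId" INTEGER NOT NULL, PRIMARY KEY ("AlbumId"));
""")

candidate_examples = [
    {"input": "List all artists.", "query": "SELECT * FROM Artist;"},
    {
        "input": "Which albums are from the year 2000?",
        "query": "SELECT * FROM Album WHERE strftime('%Y', ReleaseDate) = '2000';",
    },
]


def find_invalid(conn, examples):
    """Return (question, error message) pairs for queries SQLite cannot compile."""
    invalid = []
    for ex in examples:
        try:
            # EXPLAIN compiles the statement without running the query itself.
            conn.execute("EXPLAIN " + ex["query"])
        except sqlite3.Error as err:
            invalid.append((ex["input"], str(err)))
    return invalid


invalid = find_invalid(conn, candidate_examples)
for question, error in invalid:
    print(f"{question} -> {error}")
```

With this stand-in schema, the check flags the year-2000 example because Album has no ReleaseDate column.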

### Setup the FewShotPromptTemplate

How it Works:

1. Examples: The first five entries of the examples list (examples[:5]) show the LLM how to translate natural language questions into SQL queries.
2. Formatting: The example_prompt template is used to format each example into the desired structure.
3. Prefix and Suffix: The prefix and suffix provide context and instructions to the LLM.
4. Placeholders: The placeholders ({input}, {query}, {top_k}, {table_info}) allow you to dynamically insert data into the prompt.

This code creates a FewShotPromptTemplate that provides the LLM with a few examples of question-SQL query pairs, along with instructions and context. This allows the LLM to learn from the examples and generate accurate SQL queries for new questions. The use of placeholders makes the prompt template flexible and reusable.

In [16]:
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

example_prompt = PromptTemplate.from_template("User input: {input}\nSQL query: {query}")
prompt = FewShotPromptTemplate(
    examples=examples[:5],
    example_prompt=example_prompt,
    prefix="You are a SQLite expert. Given an input question, create a syntactically correct SQLite query to run. Unless otherwise specified, do not return more than {top_k} rows.\n\nHere is the relevant table info: {table_info}\n\nBelow are a number of examples of questions and their corresponding SQL queries.",
    suffix="User input: {input}\nSQL query: ",
    input_variables=["input", "top_k", "table_info"],
)

This code formats the FewShotPromptTemplate with specific input values and prints the resulting prompt. Printing it shows exactly what will be sent to the language model, which is useful for debugging and for understanding the prompt structure. The table_info value "foo" is a placeholder that keeps the output short.

In [17]:
print(prompt.format(input="How many artists are there?", top_k=3, table_info="foo"))

You are a SQLite expert. Given an input question, create a syntactically correct SQLite query to run. Unless otherwise specified, do not return more than 3 rows.

Here is the relevant table info: foo

Below are a number of examples of questions and their corresponding SQL queries.

User input: List all artists.
SQL query: SELECT * FROM Artist;

User input: Find all albums for the artist 'AC/DC'.
SQL query: SELECT * FROM Album WHERE ArtistId = (SELECT ArtistId FROM Artist WHERE Name = 'AC/DC');

User input: List all tracks in the 'Rock' genre.
SQL query: SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock');

User input: Find the total duration of all tracks.
SQL query: SELECT SUM(Milliseconds) FROM Track;

User input: List all customers from Canada.
SQL query: SELECT * FROM Customer WHERE Country = 'Canada';

User input: How many artists are there?
SQL query: 
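
The structure of the output above (prefix, then formatted examples, then suffix, separated by blank lines) can be reproduced in a few lines of plain Python. This is a simplified sketch of the idea, not LangChain's implementation; format_few_shot and the shortened demo data are hypothetical.

```python
def format_few_shot(prefix, examples, suffix, separator="\n\n"):
    """Join the prefix, each formatted example, and the suffix into one prompt."""
    example_blocks = [
        f"User input: {ex['input']}\nSQL query: {ex['query']}" for ex in examples
    ]
    return separator.join([prefix, *example_blocks, suffix])


demo_examples = [
    {"input": "List all artists.", "query": "SELECT * FROM Artist;"},
    {"input": "Find the total number of invoices.", "query": "SELECT COUNT(*) FROM Invoice;"},
]

prompt_text = format_few_shot(
    prefix="Below are examples of questions and their SQL queries.",
    examples=demo_examples,
    suffix="User input: How many artists are there?\nSQL query: ",
)
print(prompt_text)
```

Ending the suffix with an open "SQL query: " line invites the model to continue with the query, following the pattern set by the examples.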


# 6. Dynamic few-shot examples

In [19]:
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.4 kB)
Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl (30.7 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.10.0


### Setting up the semantic similarity example selector

How it Works:

1. Embeddings Generation:

When the SemanticSimilarityExampleSelector is created, it generates embeddings for the "input" (natural language question) in each example using the OpenAIEmbeddings model. These embeddings are then stored in the FAISS vector store.

2. Similarity Search:

When a new input question is provided to the selector, it generates an embedding for the new question using the same OpenAIEmbeddings model.
It then performs a similarity search in the FAISS vector store to find the examples with the most similar embeddings.

3. Example Retrieval:

The selector returns the top k (in this case, 5) examples that are most semantically similar to the input question.

Purpose:

1. Semantic Similarity: This example selector allows the system to find examples that are semantically similar to the user's question, even if the phrasing is different.

2. Contextual Examples: By retrieving similar examples, the system can provide the LLM with relevant context and guidance for generating accurate SQL queries.

3. Improved Accuracy: Using semantic similarity can improve the accuracy of the generated SQL queries by providing the LLM with more relevant examples.

Summary:

This code sets up a semantic similarity example selector that uses OpenAI embeddings and FAISS to find the most similar example questions to a given input question. This allows the system to provide the LLM with relevant examples, which helps it generate more accurate SQL queries.

In [20]:
from langchain_community.vectorstores import FAISS
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_openai import OpenAIEmbeddings

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    OpenAIEmbeddings(),
    FAISS,
    k=5,
    input_keys=["input"],
)

This code demonstrates how to use the SemanticSimilarityExampleSelector to retrieve relevant examples based on an input question. This allows the system to provide the LLM with contextual examples, which helps it generate more accurate SQL queries.

In [21]:
example_selector.select_examples({"input": "how many artists are there?"})

[{'input': 'List all artists.', 'query': 'SELECT * FROM Artist;'},
 {'input': 'How many employees are there',
  'query': 'SELECT COUNT(*) FROM "Employee"'},
 {'input': 'How many tracks are there in the album with ID 5?',
  'query': 'SELECT COUNT(*) FROM Track WHERE AlbumId = 5;'},
 {'input': 'Which albums are from the year 2000?',
  'query': "SELECT * FROM Album WHERE strftime('%Y', ReleaseDate) = '2000';"},
 {'input': "List all tracks in the 'Rock' genre.",
  'query': "SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock');"}]
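
The selection step can be illustrated without an embedding model or a vector store. The sketch below swaps in a toy bag-of-words vector and cosine similarity for OpenAIEmbeddings and FAISS, so it only captures word overlap, not real semantic similarity. All function names and the shortened example list are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_examples(question, examples, k=2):
    """Return the k examples whose 'input' is most similar to the question."""
    q = embed(question)
    ranked = sorted(
        examples, key=lambda ex: cosine(q, embed(ex["input"])), reverse=True
    )
    return ranked[:k]


toy_examples = [
    {"input": "List all customers from Canada.", "query": "SELECT * FROM Customer WHERE Country = 'Canada';"},
    {"input": "How many employees are there", "query": 'SELECT COUNT(*) FROM "Employee"'},
    {"input": "Find the total duration of all tracks.", "query": "SELECT SUM(Milliseconds) FROM Track;"},
]

selected = select_examples("how many artists are there?", toy_examples, k=1)
print(selected[0]["input"])  # How many employees are there
```

The counting question about employees ranks first because it shares four of its five words with the new question, which is the kind of example that nudges the model toward a COUNT(*) query.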

How it Works (with Example Selector):

1. Dynamic Example Selection: When the prompt.format(...) method is called, the example_selector will be used to select the most semantically similar examples based on the input question.

2. Formatting: The example_prompt template is used to format each selected example into the desired structure.

3. Prefix and Suffix: The prefix and suffix provide context and instructions to the LLM.

4. Placeholders: The placeholders ({input}, {top_k}, {table_info}) allow you to dynamically insert data into the prompt.

Summary

This code creates a FewShotPromptTemplate that uses a SemanticSimilarityExampleSelector to dynamically select the most relevant examples for a given input question. This allows the system to provide the LLM with contextual examples, which helps it generate more accurate SQL queries. The use of placeholders makes the prompt template flexible and reusable.

In [22]:
prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="You are a SQLite expert. Given an input question, create a syntactically correct SQLite query to run. Unless otherwise specified, do not return more than {top_k} rows.\n\nHere is the relevant table info: {table_info}\n\nBelow are a number of examples of questions and their corresponding SQL queries.",
    suffix="User input: {input}\nSQL query: ",
    input_variables=["input", "top_k", "table_info"],
)

This code formats the FewShotPromptTemplate with specific input values, using the SemanticSimilarityExampleSelector to pick the examples at format time. It then prints the resulting prompt, which is what would be sent to a language model to generate a SQL query. Because the examples depend on the question, the prompt adapts to different types of questions.

In [23]:
print(prompt.format(input="how many artists are there?", top_k=3, table_info="foo"))

You are a SQLite expert. Given an input question, create a syntactically correct SQLite query to run. Unless otherwise specified, do not return more than 3 rows.

Here is the relevant table info: foo

Below are a number of examples of questions and their corresponding SQL queries.

User input: List all artists.
SQL query: SELECT * FROM Artist;

User input: How many employees are there
SQL query: SELECT COUNT(*) FROM "Employee"

User input: How many tracks are there in the album with ID 5?
SQL query: SELECT COUNT(*) FROM Track WHERE AlbumId = 5;

User input: Which albums are from the year 2000?
SQL query: SELECT * FROM Album WHERE strftime('%Y', ReleaseDate) = '2000';

User input: List all tracks in the 'Rock' genre.
SQL query: SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock');

User input: how many artists are there?
SQL query: 


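This code uses a LangChain chain to generate SQL queries from natural language questions. It builds the chain with the few-shot prompt, passes in a question, and returns the generated query as a string. The chain does not execute the query, and the model wraps its answer in a Markdown code fence, as the outputs below show.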

In [24]:
chain = create_sql_query_chain(llm, db, prompt)
chain.invoke({"question": "how many artists are there?"})

'```sql\nSELECT COUNT(*) FROM Artist;\n```'

In [27]:
chain = create_sql_query_chain(llm, db, prompt)
chain.invoke({"question": "how many customers are there?"})

'```sql\nSELECT COUNT(*) FROM "Customer";\n```'

In [28]:
chain = create_sql_query_chain(llm, db, prompt)
chain.invoke({"question": "how many albums are there?"})

'```sql\nSELECT COUNT(*) FROM Album;\n```'

In [29]:
chain = create_sql_query_chain(llm, db, prompt)
chain.invoke({"question": "name the artist with most number of albums?"})

'```sql\nSELECT Artist.Name, COUNT(Album.AlbumId) AS AlbumCount\nFROM Artist\nJOIN Album ON Artist.ArtistId = Album.ArtistId\nGROUP BY Artist.ArtistId\nORDER BY AlbumCount DESC\nLIMIT 1;\n```'
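
Since the returned strings are wrapped in a Markdown code fence, they need cleaning before they can be run against the database. The following is a minimal sketch of that post-processing step using only the standard library. The extract_sql helper is hypothetical, and the in-memory Artist table stands in for Chinook.db.

```python
import re
import sqlite3

FENCE = "`" * 3  # three backticks, built this way to keep the snippet readable


def extract_sql(llm_output):
    """Strip a surrounding Markdown code fence (optionally tagged sql) if present."""
    pattern = FENCE + r"(?:sql)?\s*(.*?)" + FENCE
    match = re.search(pattern, llm_output, flags=re.DOTALL | re.IGNORECASE)
    return (match.group(1) if match else llm_output).strip()


# Same shape as the chain output shown above.
raw_output = FENCE + "sql\nSELECT COUNT(*) FROM Artist;\n" + FENCE
query = extract_sql(raw_output)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name NVARCHAR(120));
INSERT INTO Artist VALUES (1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith');
""")
result = conn.execute(query).fetchall()

print(query)
print(result)
```

Generated SQL should be treated as untrusted input. Running it over a read-only connection or with a restricted database user is a sensible precaution.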