# Optimize SQL Prompts: Context Engineering for Better Query Results

Previously, we tested the following prompt:

>I am working with a dataset that contains information about the Red30 Tech company's product sales online. 
>I want to create a SQL query that calculates the total sales by state.

Although it returned SQL code, the query was generic and could not be used out of the box because it was missing critical context, such as the table and column names.

In this example, we will see how to improve the LLM's response by providing additional context.

Let's start by importing the openai and os libraries:


In [1]:
from openai import OpenAI
import os

Next, load the API key from the environment:

In [2]:
openai_api_key = os.environ.get("OPENAI_API_KEY")
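If the environment variable is not set, `os.environ.get` returns `None`, and the problem may only surface later with a less direct error message. A minimal sketch of an early check follows; the helper name `require_env` is hypothetical and not part of the original workflow:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        # Fail early, before any API client is created
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage in the notebook:
# openai_api_key = require_env("OPENAI_API_KEY")
```

This keeps the rest of the notebook unchanged: the returned string can be passed to the client exactly like the value from `os.environ.get`.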


In this example, we will use the GPT-5 model and follow the same workflow as before. First, set the maximum number of completion tokens:

In [3]:
max_tokens = 5000

Then create the client object:

In [4]:
client = OpenAI(api_key=openai_api_key)

Next, let's add the table and column names to the original prompt:

In [5]:
content_prompt = """
I am working with a dataset that contains information about the Red30 Tech company's product sales online. 

I want to create a SQL query that calculates the total sales by state.

The table name is retail_sales, and the CustState and OrderTotal columns represent the state name and corresponding sales in USD.
"""


Let's now send the prompt to the GPT-5 model:

In [6]:
response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": content_prompt}],
    max_completion_tokens=max_tokens,
)


In [7]:
print(response.choices[0].message.content)

Here’s a straightforward query to sum sales by state and list states from highest to lowest total:

SELECT
  CustState,
  SUM(OrderTotal) AS total_sales_usd
FROM retail_sales
GROUP BY CustState
ORDER BY total_sales_usd DESC;

Optional: If some rows have a NULL state and you want to group them as “Unknown”:

SELECT
  COALESCE(CustState, 'Unknown') AS CustState,
  SUM(OrderTotal) AS total_sales_usd
FROM retail_sales
GROUP BY COALESCE(CustState, 'Unknown')
ORDER BY total_sales_usd DESC;
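The reply mixes explanatory text with SQL, so the query has to be separated from the prose before it can be run. Below is a rough sketch that pulls out statements starting with `SELECT` at the beginning of a line and ending at the next semicolon. The helper `extract_sql_statements` is a hypothetical name, the sample reply is an abridged copy of the output above, and the pattern is an assumption about the reply's layout: model output varies between runs, so this heuristic may need adjusting.

```python
import re


def extract_sql_statements(text: str) -> list[str]:
    """Return statements that start with SELECT on a new line and end with ';'."""
    pattern = re.compile(
        r"^[ \t]*SELECT\b.*?;",
        re.IGNORECASE | re.DOTALL | re.MULTILINE,
    )
    return [match.group(0).strip() for match in pattern.finditer(text)]


# Abridged copy of the model reply shown above
sample_reply = """Here is a straightforward query to sum sales by state:

SELECT
  CustState,
  SUM(OrderTotal) AS total_sales_usd
FROM retail_sales
GROUP BY CustState
ORDER BY total_sales_usd DESC;

Optional: If some rows have a NULL state and you want to group them:

SELECT
  COALESCE(CustState, 'Unknown') AS CustState,
  SUM(OrderTotal) AS total_sales_usd
FROM retail_sales
GROUP BY COALESCE(CustState, 'Unknown')
ORDER BY total_sales_usd DESC;
"""

statements = extract_sql_statements(sample_reply)
print(statements[0])
```

In the notebook, the same function could be applied to `response.choices[0].message.content`; the first element would then be the main query, provided the reply follows the layout assumed here.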


To verify the generated query, let's load the data with pandas and run the query with DuckDB:

In [8]:
import pandas as pd
import duckdb as db

In [9]:
retail_sales = pd.read_csv("../data/Red30 Tech US online retail sales.csv")

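DuckDB can query a pandas DataFrame by its variable name, so the query can reference retail_sales directly:

In [10]: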
query = """
SELECT
  CustState,
  SUM(OrderTotal) AS total_sales_usd
FROM retail_sales
GROUP BY CustState
ORDER BY total_sales_usd DESC;
"""

In [11]:
db.sql(query)


┌────────────────┬────────────────────┐
│   CustState    │  total_sales_usd   │
│    varchar     │       double       │
├────────────────┼────────────────────┤
│ New York       │  616925.8400000001 │
│ California     │  540285.5199999997 │
│ Florida        │  394483.7399999998 │
│ Texas          │ 349925.48000000004 │
│ North Carolina │  345118.2600000001 │
│ Pennsylvania   │ 290120.09000000014 │
│ Minnesota      │ 267308.95000000007 │
│ Washington     │ 257140.79999999993 │
│ Virginia       │ 248797.89000000004 │
│ Georgia        │ 237607.26000000004 │
│    ·           │          ·         │
│    ·           │          ·         │
│    ·           │          ·         │
│ New Mexico     │ 36696.380000000005 │
│ New Jersey     │            35971.9 │
│ North Dakota   │            29907.5 │
│ West Virginia  │           23967.97 │
│ Nebraska       │           22604.36 │
│ Wyoming        │  9712.550000000001 │
│ Wisconsin      │  7345.029999999999 │
│ South Dakota   │            4071.62 │
