Discover the power of Light-weight Text-2-SQL on Pandas Data Frames, a user-friendly Python module that transforms natural language questions into SQL queries for effortless data analysis. With the pandasqlite library, you can load CSV files into Pandas DataFrames, ingest them, and query data using simple questions like "What are the categories for products sold in Italy?" or "Who are the top 10 customers by turnover last month?" The tool generates and executes SQL queries instantly, returning results in JSON format for seamless integration into your workflows. Perfect for data analysts and developers, this lightweight solution eliminates the need for complex SQL knowledge or database setup, making it ideal for quick insights from sales, customer, or inventory data. Optimized for simplicity, it streamlines data exploration, boosts productivity, and empowers users to uncover valuable insights with minimal code, enhancing efficiency in any data-driven project.
- Clone repository
git clone git@github.com:IBM/pandasqlite.git
- Install PandaSQLite
cd pandasqlite
pip install .
- Choose language model:
- Get your watsonx.ai API key from https://www.ibm.com/products/watsonx-ai, then continue with step 4.
- OR use a custom language model (see below)
- Set environment variables:
WXAI_PROJECT_ID
- Set to your watsonx.ai project IDWXAI_API_KEY
- Set to your watsonx.ai API keyPANDASQLITE_CACHE_DIR
- (optional) Set the cache directory location
import json
import pandas as pd
from pandasqlite import pandasqlite as pdsql
# load CSV as pandas dataframe(s)
df1 = pd.read_csv("my.csv")
df2 = ...
# ingest dataframe(s)
ingestion, db, _ = pdsql.ingest([df1, df2, ...])
# ask some questions
for question in [
"Show the categories for products sold in Italy.",
"Return the top 10 customers with highest turnover last month, sorted alphabetically by last name.",
"What's the average number of items sold per purchase?",
"Generate an interesting query."
]:
sql = pdsql.text2sql(question, ingestion) # generate query
result = pd.read_sql(sql, db) # execute query
print(question)
print(json.dumps(result.to_json()) + "\n")
You can also take a look at this example.
Not ready to use watsonx.ai? You can plug-in a custom language model callback function as a parameter:
def my_model_callback(input):
# resolve input string by call to local model or external service
output = ...
return output
pdsql.ingest([df1, df2, ...], my_model_callback) # ingest with custom model
sql = pdsql.text2sql(question, ingestion, my_model_callback) # generate query with custom model
Daniel Karl I. Weidele and Gaetano Rossiello. PandaSQLite: Light-weight Text-2-SQL on Pandas Data Frames in Python. GitHub, https://github.com/IBM/pandasqlite. 2025.