Multi-Database Query System

A natural-language-to-SQL query system that works with multiple SQLite databases. It uses an OpenAI GPT model to generate SQL from natural language questions and routes each query to the appropriate database.

System Overview

The system consists of several components:

- Database servers for each database
- SQL query generator
- Client interface with database routing
- Natural language processing for database selection

Database Sources

Database 1: E-commerce Dataset by Olist

Source: Kaggle - E-commerce Dataset by Olist
Description: Brazilian E-commerce data from Olist Store containing information about 100k orders from 2016 to 2018
Tables:
- orders: Order details and timestamps
- customers: Customer information and location
- products: Product details and measurements
- sellers: Seller information and location
- order_items: Order items, prices, and freight values

Database 2: US County Demographics and 2016 Election Data

Source: Kaggle - 2016 US Election Dataset
Description: US County-level demographic data combined with 2016 election results
Tables:
- county_facts: Demographic information by county
- county_facts_dictionary: Column descriptions and metadata
- election_results: County-level election results

Setup Instructions

Create a virtual environment and activate it:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Set up your environment variables:

- Create a .env file in the root directory
- Add your OpenAI API key:

OPENAI_API_KEY=your_api_key_here
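
A minimal sketch of how the key could be read and checked at startup, assuming a plain KEY=value .env file. The helpers `load_env_file` and `require_api_key` are hypothetical names for illustration; the project may load .env differently (for example through a library listed in requirements.txt).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=value lines from a .env file into os.environ (hypothetical helper)."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    loaded = {}
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip('"').strip("'")
        # Values already set in the real environment take precedence.
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


def require_api_key():
    """Return the OpenAI API key, or raise a clear error if it is missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key or key == "your_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error on the first query.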

Download and place the SQLite databases:

- Download Database 1 from Olist E-commerce Dataset
- Download Database 2 from 2016 US Election Dataset
- Place the files in the data/ directory:

data/
  ├── database1.sqlite  # Rename the Olist database file
  └── database2.sqlite  # Rename the Election database file
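
To confirm that the files landed in the right place, a short check like the following can list the tables in each database. This is a sketch using only the standard `sqlite3` module; `list_tables` is a hypothetical helper, not part of the project.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names found in a SQLite file, sorted alphabetically."""
    # sqlite3.connect() would silently create a missing file, so check first.
    if not Path(db_path).is_file():
        raise FileNotFoundError(f"Database file not found: {db_path}")
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    for db_file in ("data/database1.sqlite", "data/database2.sqlite"):
        try:
            print(db_file, "->", list_tables(db_file))
        except FileNotFoundError as exc:
            print(exc)
```

The printed table names should include the ones listed under Database Sources; if they do not, the files may have been swapped or renamed incorrectly.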

Usage

Start the system:

python client.py

Enter your questions in natural language. The system will:

1. Determine which database to use
2. Generate an appropriate SQL query
3. Execute the query
4. Display the results in a formatted table
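
The four steps above can be sketched as a single function. This is an illustration under stated assumptions, not the code in client.py: `answer_question`, `route`, `generate_sql`, and `connections` are hypothetical names, the GPT call is replaced by a stand-in function, and the query runs on a direct `sqlite3` connection rather than through the database servers.

```python
import sqlite3


def answer_question(question, route, generate_sql, connections):
    """Run one question through the four steps.

    route, generate_sql and connections are injected stand-ins (assumptions),
    not the project's actual functions.
    Returns (db_name, sql, columns, rows).
    """
    db_name = route(question)                    # 1. pick a database
    sql = generate_sql(question, db_name)        # 2. produce SQL (a GPT call in the real system)
    cursor = connections[db_name].execute(sql)   # 3. run the query
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    return db_name, sql, columns, rows           # 4. the caller formats these for display


# Toy demonstration with an in-memory database and hard-coded stand-ins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT)")
conn.executemany("INSERT INTO orders VALUES (?)", [("a1",), ("b2",)])

result = answer_question(
    "How many orders are there?",
    route=lambda q: "database1",
    generate_sql=lambda q, db: "SELECT COUNT(*) AS order_count FROM orders",
    connections={"database1": conn},
)
print(result)
```

Passing the routing and generation steps in as functions keeps each one testable on its own, which helps when a wrong answer could come from either step.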

Example questions:

For Database 1 (E-commerce):
- "How many orders were placed in 2018?"
- "What are the top selling products by revenue?"
- "Show me the average delivery time of products by state"
For Database 2 (Demographics):
- "What are the top 5 counties by population?"
- "Show me the election results in Florida counties"
- "What's the median age in counties with population over 1 million?"
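
For a sense of what a generated query might look like, the sketch below runs one plausible translation of the first e-commerce question against a toy in-memory table. The column name `order_purchase_timestamp` is an assumption based on the public Olist data; check it against your own schema. The SQL the model actually produces may differ.

```python
import sqlite3

# Toy stand-in for the orders table; only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, order_purchase_timestamp TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("a1", "2017-11-03 10:15:00"),
        ("b2", "2018-01-20 08:30:00"),
        ("c3", "2018-06-05 19:45:00"),
    ],
)

# One plausible translation of "How many orders were placed in 2018?"
sql = """
SELECT COUNT(*) AS orders_2018
FROM orders
WHERE strftime('%Y', order_purchase_timestamp) = '2018'
"""
orders_2018 = conn.execute(sql).fetchone()[0]
print(orders_2018)  # 2 for the three toy rows above
conn.close()
```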

Customizing for Different Databases

If you want to use different databases, you'll need to modify the following files:

1. Database Server Files (`database1_server.py` and `database2_server.py`)

Update the database file paths in get_schema():

conn = sqlite3.connect("data/your_database.sqlite")
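
For orientation, a `get_schema()` function along these lines would read table and column names straight from the SQLite file. This is a sketch of one common approach, not necessarily what the server files do; the return shape (a dict of table name to column list) is an assumption chosen to match the `tables` structure used in client.py.

```python
import sqlite3


def get_schema(db_path="data/your_database.sqlite"):
    """Return {table_name: [column_name, ...]} for every user table in the file."""
    conn = sqlite3.connect(db_path)
    try:
        tables = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        ).fetchall()
        schema = {}
        for (table,) in tables:
            # Quote the identifier so unusual table names do not break the PRAGMA.
            quoted = table.replace('"', '""')
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
            schema[table] = [col[1] for col in columns]
        return schema
    finally:
        conn.close()
```

Reading the schema from the file itself means only the path needs updating when the database changes.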

2. Client Configuration (`client.py`)

Update database1_info and database2_info with your database schemas:

self.database1_info = {
    "name": "Your Database Name",
    "description": "Your database description",
    "tables": {
        "table_name": ["column1", "column2", ...],
        ...
    }
}

Modify keywords and weights in determine_database():

database1_keywords = {
    "keyword1": weight1,
    "keyword2": weight2,
    ...
}
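
To illustrate how keyword weights can drive routing, here is a minimal scoring sketch. It assumes whole-word matching and a simple sum of weights; the actual logic in `determine_database()` may differ, and the keywords and weights shown are invented for the example.

```python
import re


def determine_database(question, database1_keywords, database2_keywords):
    """Score a question against two keyword->weight maps and pick the higher one.

    Returns (database_name, score1, score2). Ties go to database1 in this sketch.
    Only single-word keywords match, because the question is split into words.
    """
    words = set(re.findall(r"[a-z0-9_]+", question.lower()))

    def score(keywords):
        return sum(weight for keyword, weight in keywords.items() if keyword in words)

    score1, score2 = score(database1_keywords), score(database2_keywords)
    return ("database1" if score1 >= score2 else "database2"), score1, score2


# Illustrative keywords and weights only; tune these for your own data.
ecommerce_keywords = {"orders": 3, "products": 2, "sellers": 2, "delivery": 2}
election_keywords = {"counties": 3, "election": 3, "population": 2, "age": 1}

print(determine_database("What are the top 5 counties by population?",
                         ecommerce_keywords, election_keywords))
# ('database2', 0, 5)
```

Returning both scores alongside the decision makes it easy to see why a question was routed the way it was.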

3. SQL Generator (`sql_generator.py`)

Update the system prompt rules for your databases:

"Important rules for query generation:
1. For Database 1:
   - Your specific rules here
2. For Database 2:
   - Your specific rules here"
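
One way to keep these rules maintainable is to assemble the prompt from data instead of editing a long string by hand. The sketch below is an assumption about structure, not the contents of sql_generator.py: `build_system_prompt` is a hypothetical helper, and the opening sentence, schema, and rule in the example are placeholders.

```python
def build_system_prompt(schemas, rules):
    """Assemble a system prompt from per-database schemas and rule lists.

    schemas: {db_label: {table: [columns]}}
    rules:   {db_label: [rule, ...]}
    """
    lines = ["You translate questions into SQLite queries.", "", "Schemas:"]
    for label, tables in schemas.items():
        lines.append(f"{label}:")
        for table, columns in tables.items():
            lines.append(f"  {table}({', '.join(columns)})")
    lines += ["", "Important rules for query generation:"]
    for number, (label, db_rules) in enumerate(rules.items(), start=1):
        lines.append(f"{number}. For {label}:")
        lines.extend(f"   - {rule}" for rule in db_rules)
    return "\n".join(lines)


# Placeholder schema and rule, for illustration only.
prompt = build_system_prompt(
    schemas={"Database 1": {"orders": ["order_id", "customer_id"]}},
    rules={"Database 1": ["Join orders to customers on customer_id"]},
)
print(prompt)
```

The resulting string would be sent as the system message of the chat request, with the user's question as the user message.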

File Structure

.
├── client.py           # Main client interface
├── sql_generator.py    # SQL query generation
├── database1_server.py # Server for first database
├── database2_server.py # Server for second database
├── requirements.txt    # Python dependencies
├── .env               # Environment variables
└── data/              # Database files
    ├── database1.sqlite
    └── database2.sqlite

Key Components to Modify When Changing Databases

Database Schema:
- Update the schema information in the database server files
- Modify the schema information in client.py
Keyword Weights:
- Adjust keywords and their weights in client.py to match your new databases
- Add domain-specific keywords that indicate which database to use
Query Generation Rules:
- Update the rules in sql_generator.py to match your new database structures
- Add specific JOIN conditions and table relationships
Table Display:
- Modify the result formatting in client.py if your new databases require different display formats

Troubleshooting

If queries aren't routing to the correct database:
- Check the keyword weights in client.py
- Add more specific keywords for your databases
- Adjust the scoring system if needed
If SQL queries are incorrect:
- Verify the schema information is up to date
- Check the rules in sql_generator.py
- Ensure table relationships are properly defined
If results aren't displaying properly:
- Modify the display formatting in client.py
- Adjust column widths and alignment as needed
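
As a starting point for adjusting column widths and alignment, a plain-text formatter can be as small as the sketch below. `format_table` and its `max_width` parameter are hypothetical; the display code in client.py may use a different approach or a table library.

```python
def format_table(columns, rows, max_width=30):
    """Render rows as a fixed-width text table, truncating long cells.

    Assumes every row has the same number of values as there are columns.
    """

    def cell(value):
        text = "" if value is None else str(value)
        return text if len(text) <= max_width else text[: max_width - 3] + "..."

    header = [cell(c) for c in columns]
    body = [[cell(v) for v in row] for row in rows]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(r[i]) for r in [header] + body) for i in range(len(header))]

    def line(cells):
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths))

    separator = "-+-".join("-" * w for w in widths)
    return "\n".join([line(header), separator] + [line(r) for r in body])


print(format_table(["state", "n"], [("SP", 10), ("RJ", None)]))
```

Lowering `max_width` keeps wide text columns from pushing the table past the terminal width.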

Repository: KrishGopani/MCP_Text_2_SQL