Skip to content

KrishGopani/MCP_Text_2_SQL

Repository files navigation

Multi-Database Query System

A natural language to SQL query system that can work with multiple SQLite databases. The system uses OpenAI's GPT model to generate SQL queries based on natural language questions and routes queries to the appropriate database.

System Overview

The system consists of several components:

  • Database servers for each database
  • SQL query generator
  • Client interface with database routing
  • Natural language processing for database selection

Database Sources

Database 1: E-commerce Dataset by Olist

  • Source: Kaggle - E-commerce Dataset by Olist
  • Description: Brazilian E-commerce data from Olist Store containing information about 100k orders from 2016 to 2018
  • Tables:
    • orders: Order details and timestamps
    • customers: Customer information and location
    • products: Product details and measurements
    • sellers: Seller information and location
    • order_items: Order items, prices, and freight values

Database 2: US County Demographics and 2016 Election Data

  • Source: Kaggle - 2016 US Election Dataset
  • Description: US County-level demographic data combined with 2016 election results
  • Tables:
    • county_facts: Demographic information by county
    • county_facts_dictionary: Column descriptions and metadata
    • election_results: County-level election results

Setup Instructions

  1. Create a virtual environment and activate it:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  1. Install dependencies:
pip install -r requirements.txt
  1. Set up your environment variables:
  • Create a .env file in the root directory
  • Add your OpenAI API key:
OPENAI_API_KEY=your_api_key_here
  1. Download and place the SQLite databases:
data/
  ├── database1.sqlite  # Rename the Olist database file
  └── database2.sqlite  # Rename the Election database file

Usage

  1. Start the system:
python client.py
  1. Enter your questions in natural language. The system will:
  • Determine which database to use
  • Generate an appropriate SQL query
  • Execute the query
  • Display the results in a formatted table

Example questions:

  • For Database 1 (E-commerce):
    • "How many orders were placed in 2018?"
    • "What are the top selling products by revenue?"
    • "Show me the average delivery time of products by state"
  • For Database 2 (Demographics):
    • "What are the top 5 counties by population?"
    • "Show me the election results in Florida counties"
    • "What's the median age in counties with population over 1 million?"

Customizing for Different Databases

If you want to use different databases, you'll need to modify the following files:

1. Database Server Files (database1_server.py and database2_server.py)

  • Update the database file paths in get_schema():
conn = sqlite3.connect("data/your_database.sqlite")

2. Client Configuration (client.py)

  • Update database1_info and database2_info with your database schemas:
self.database1_info = {
    "name": "Your Database Name",
    "description": "Your database description",
    "tables": {
        "table_name": ["column1", "column2", ...],
        ...
    }
}
  • Modify keywords and weights in determine_database():
database1_keywords = {
    "keyword1": weight1,
    "keyword2": weight2,
    ...
}

3. SQL Generator (sql_generator.py)

  • Update the system prompt rules for your databases:
"Important rules for query generation:
1. For Database 1:
   - Your specific rules here
2. For Database 2:
   - Your specific rules here"

File Structure

.
├── client.py           # Main client interface
├── sql_generator.py    # SQL query generation
├── database1_server.py # Server for first database
├── database2_server.py # Server for second database
├── requirements.txt    # Python dependencies
├── .env               # Environment variables
└── data/              # Database files
    ├── database1.sqlite
    └── database2.sqlite

Key Components to Modify When Changing Databases

  1. Database Schema:

    • Update the schema information in the database server files
    • Modify the schema information in client.py
  2. Keyword Weights:

    • Adjust keywords and their weights in client.py to match your new databases
    • Add domain-specific keywords that indicate which database to use
  3. Query Generation Rules:

    • Update the rules in sql_generator.py to match your new database structures
    • Add specific JOIN conditions and table relationships
  4. Table Display:

    • Modify the result formatting in client.py if your new databases require different display formats

Troubleshooting

  1. If queries aren't routing to the correct database:

    • Check the keyword weights in client.py
    • Add more specific keywords for your databases
    • Adjust the scoring system if needed
  2. If SQL queries are incorrect:

    • Verify the schema information is up to date
    • Check the rules in sql_generator.py
    • Ensure table relationships are properly defined
  3. If results aren't displaying properly:

    • Modify the display formatting in client.py
    • Adjust column widths and alignment as needed

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages