# Database Connection using nl2sql modules

This notebook demonstrates how to use the `nl2sql.database` modules to connect to PostgreSQL instead of hardcoding the database URI.

## Settings

In [1]:
import os

if os.getcwd().endswith("notebooks"):
    os.chdir("..")
print(os.getcwd())


/Users/cmcoutosilva/Projects/github/nl2sql-agent


In [2]:
import pandas as pd
from sqlalchemy import text
from nl2sql.database.postgresql import PostgreSQLConnector

## Database Connection using PostgreSQLConnector

Instead of hardcoding the database URI, we'll use the `PostgreSQLConnector` class, which handles:
- Parameter resolution from environment variables or config files
- URI creation with proper encoding
- Engine creation with best practices
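
A minimal sketch of the first two responsibilities, assuming explicit arguments override environment variables, which in turn override defaults. The actual precedence inside `PostgreSQLConnector` is not shown in this notebook, and config-file loading is omitted. The helpers `resolve_params` and `build_uri` and the `DEFAULTS` values are hypothetical; only the `DB_*` variable names come from the comment in the next cell. Credentials are percent-encoded with `urllib.parse.quote(..., safe="")` so that characters such as `@` or `/` in a password cannot break the URI.

```python
import os
from urllib.parse import quote

# Hypothetical defaults; the real connector may use different values.
DEFAULTS = {"host": "localhost", "port": 5432}

# Environment variable names listed in the connector cell's comment.
ENV_VARS = {
    "host": "DB_HOST",
    "port": "DB_PORT",
    "database": "DB_NAME",
    "username": "DB_USER",
    "password": "DB_PASSWORD",
}


def resolve_params(env=None, **explicit):
    """Merge defaults < environment variables < explicit arguments (assumed order)."""
    env = os.environ if env is None else env
    params = dict(DEFAULTS)
    for key, var in ENV_VARS.items():
        if var in env:
            params[key] = env[var]
    # Explicit keyword arguments win; None means "not provided".
    params.update({k: v for k, v in explicit.items() if v is not None})
    missing = [key for key in ENV_VARS if key not in params]
    if missing:
        raise ValueError(f"Missing database parameters: {missing}")
    params["port"] = int(params["port"])  # environment values are always strings
    return params


def build_uri(params, driver="postgresql+psycopg"):
    """Build a SQLAlchemy-style URI, percent-encoding the credentials."""
    user = quote(params["username"], safe="")
    password = quote(params["password"], safe="")
    return (
        f"{driver}://{user}:{password}@{params['host']}:{params['port']}"
        f"/{params['database']}"
    )


# Usage with an explicit dict standing in for os.environ:
params = resolve_params(
    env={"DB_NAME": "olist_ecommerce", "DB_USER": "postgres", "DB_PASSWORD": "p@ss/word"}
)
print(build_uri(params))
# postgresql+psycopg://postgres:p%40ss%2Fword@localhost:5432/olist_ecommerce
```

If SQLAlchemy is at hand, `sqlalchemy.engine.URL.create(...)` builds the URL from separate fields and takes care of the escaping itself.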


In [3]:
# Create PostgreSQL connector
# The connector will automatically resolve parameters from:
# 1. Environment variables (DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD)
# 2. Config files (if config_path is provided)
# 3. Direct parameters (if provided)

connector = PostgreSQLConnector(
    host="localhost",
    port=5432,
    database="olist_ecommerce",
    username="postgres",
    password="postgres"
)  # Alternatively, use nl2sql.config.load_database_config()

print(f"Database parameters: {connector.params}")
print(f"Database URI: {connector.create_uri(connector.params)}")

Database parameters: host='localhost' port=5432 database='olist_ecommerce' username='postgres'
Database URI: postgresql+psycopg://postgres:postgres@localhost:5432/olist_ecommerce
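
The URI printed above contains the password in clear text, and that output is saved with the notebook. A minimal sketch of a hypothetical `mask_password` helper that hides it before printing, using only `urllib.parse`; in the cell above it would wrap `connector.create_uri(connector.params)`. If you hold a SQLAlchemy `URL` object rather than a string, its `render_as_string(hide_password=True)` method serves the same purpose.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_password(uri, mask="***"):
    """Return `uri` with the password (if any) replaced by `mask`."""
    parts = urlsplit(uri)
    # netloc looks like "user:password@host:port"; split at the last "@".
    userinfo, at, hostport = parts.netloc.rpartition("@")
    user, colon, _password = userinfo.partition(":")
    if not at or not colon:
        return uri  # no credentials, or a user without a password: nothing to hide
    return urlunsplit(parts._replace(netloc=f"{user}:{mask}@{hostport}"))


print(mask_password("postgresql+psycopg://postgres:postgres@localhost:5432/olist_ecommerce"))
# postgresql+psycopg://postgres:***@localhost:5432/olist_ecommerce
```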


In [4]:
# Test connection using the connector's engine
with connector.engine.connect() as conn:
    result = conn.execute(text("SELECT 1 as test_value"))
    print(f"Connection test: {result.fetchall()}")

Connection test: [(1,)]


## Database Inspection using the Connector

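The connector exposes an `inspector` attribute for exploring the database structure: schemas, tables, columns, comments, and keys. It supports the SQLAlchemy `Inspector` methods used below, such as `get_schema_names()`, `get_table_names()`, and `get_columns()`.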

In [5]:
# Access the inspector from the connector
inspector = connector.inspector

# Get schema information
schemas = inspector.get_schema_names()
print(f"Available schemas: {schemas}")

# Inspect tables in each schema
for schema in ["ecommerce", "marketing"]:
    if schema in schemas:
        tables = inspector.get_table_names(schema=schema)
        print(f"\nTables in {schema} schema: {len(tables)}")
        for table in tables:
            print(f"  - {table}")

Available schemas: ['ecommerce', 'information_schema', 'marketing', 'public']

Tables in ecommerce schema: 9
  - geolocation
  - product_category_name_translations
  - customers
  - orders
  - order_items
  - products
  - sellers
  - order_payments
  - order_reviews

Tables in marketing schema: 2
  - marketing_qualified_leads
  - closed_deals


## Detailed Table Inspection

Let's inspect a specific table using the connector's inspector.

In [6]:
# Inspect the orders table
target_table = "orders"
target_schema = "ecommerce"

# Get table comment (returns {'text': None} when the table has no comment)
table_comment = inspector.get_table_comment(target_table, schema=target_schema)
print(f"Table description: {table_comment.get('text') or 'No description'}")

# Get columns with their properties
columns = inspector.get_columns(table_name=target_table, schema=target_schema)
df_columns = pd.DataFrame(columns)
print(f"\nColumns in {target_table}:")
print(df_columns[['name', 'type', 'nullable', 'comment']].to_markdown(index=False))

Table description: This is the core dataset. From each order you might find all other information.

Columns in orders:
| name                          | type      | nullable   | comment                                                                                 |
|:------------------------------|:----------|:-----------|:----------------------------------------------------------------------------------------|
| order_id                      | TEXT      | False      | unique identifier of the order.                                                         |
| customer_id                   | TEXT      | False      | key to the customer dataset. Each order has a unique customer_id.                       |
| order_status                  | TEXT      | True       | Reference to the order status (delivered, shipped, etc).                                |
| order_purchase_timestamp      | TIMESTAMP | True       | Shows the purchase timestamp.                                                           |
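
The DataFrame view is convenient for browsing, but the same metadata can also be flattened into plain text, for example to paste into documentation or a prompt. A sketch with a hypothetical `describe_table` helper; it assumes each column dict carries the `name`, `type`, `nullable`, and `comment` keys that `inspector.get_columns()` returned above, and it formats the type with `str()` (SQLAlchemy type objects render as, e.g., `TEXT`). The example uses hand-written dicts; in the notebook the call would be `describe_table(target_schema, target_table, columns, table_comment.get("text"))`.

```python
def describe_table(schema, table, columns, comment=None):
    """Render inspector column metadata as one header line plus one line per column."""
    header = f"Table {schema}.{table}"
    if comment:
        header += f": {comment}"
    lines = [header]
    for col in columns:
        null = "NULL" if col.get("nullable", True) else "NOT NULL"
        note = f" -- {col['comment']}" if col.get("comment") else ""
        lines.append(f"  {col['name']} {col['type']} {null}{note}")
    return "\n".join(lines)


# Hand-written dicts shaped like get_columns() output:
example_columns = [
    {"name": "order_id", "type": "TEXT", "nullable": False,
     "comment": "unique identifier of the order."},
    {"name": "order_status", "type": "TEXT", "nullable": True, "comment": None},
]
print(describe_table("ecommerce", "orders", example_columns))
# Table ecommerce.orders
#   order_id TEXT NOT NULL -- unique identifier of the order.
#   order_status TEXT NULL
```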

In [7]:
# Get primary and foreign keys
primary_keys = inspector.get_pk_constraint(table_name=target_table, schema=target_schema)
foreign_keys = inspector.get_foreign_keys(table_name=target_table, schema=target_schema)

print(f"Primary keys: {primary_keys.get('constrained_columns', [])}")
print("\nForeign keys:")
for fk in foreign_keys:
    print(f"  - {fk['constrained_columns']} -> {fk['referred_schema']}.{fk['referred_table']}.{fk['referred_columns']}")

Primary keys: ['order_id']

Foreign keys:
  - ['customer_id'] -> ecommerce.customers.['customer_id']


## Querying Data

Now let's query some actual data using the connector.

In [8]:
# Query sample data from orders table
query = """
SELECT 
    order_id,
    customer_id,
    order_status,
    order_purchase_timestamp
FROM ecommerce.orders 
LIMIT 5
"""

df_orders = pd.read_sql(query, connector.engine)
print("Sample orders data:")
print(df_orders.to_markdown(index=False))

Sample orders data:
| order_id                         | customer_id                      | order_status   | order_purchase_timestamp   |
|:---------------------------------|:---------------------------------|:---------------|:---------------------------|
| e481f51cbdc54678b7cc49136f2d6af7 | 9ef432eb6251297304e76186b10a928d | delivered      | 2017-10-02 10:56:33        |
| 53cdb2fc8bc7dce0b6741e2150273451 | b0830fb4747a6c6d20dea0b8c802d7ef | delivered      | 2018-07-24 20:41:37        |
| 47770eb9100c2d0c44946d9cf07ec65d | 41ce2a54c0b03bf3443c3d931a367089 | delivered      | 2018-08-08 08:38:49        |
| 949d5b44dbf5de918fe9c16f97b45f8a | f88197465ea7920adcdbec7375364d82 | delivered      | 2017-11-18 19:28:06        |
| ad21c59c0840e6cb83a9ceb5573f8159 | 8ab97904e6daea8866dbdbc4fb7aad2c | delivered      | 2018-02-13 21:18:39        |


In [9]:
# Get some summary statistics
summary_query = """
SELECT 
    COUNT(*) as total_orders,
    COUNT(DISTINCT customer_id) as unique_customers,
    COUNT(DISTINCT order_status) as status_types
FROM ecommerce.orders
"""

df_summary = pd.read_sql(summary_query, connector.engine)
print("Database summary:")
print(df_summary.to_markdown(index=False))

Database summary:
|   total_orders |   unique_customers |   status_types |
|---------------:|-------------------:|---------------:|
|          99441 |              99441 |              8 |


## Summary

This notebook demonstrated how to use the `nl2sql.database.postgresql.PostgreSQLConnector` class for:

1. **Structured database connections**: Using the connector class instead of hardcoded URIs
2. **Parameter management**: Automatic resolution from environment variables, config files, or direct parameters
3. **Database inspection**: Built-in inspector for exploring database structure
4. **Query execution**: Using pandas with the connector's engine for data analysis

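`PostgreSQLConnector` keeps connection parameters, URI construction, the SQLAlchemy engine, and the inspector behind a single object, so the connection setup is defined once and reused instead of repeating connection strings in every notebook.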