# DatabaseClient Tutorial

This notebook demonstrates how to use the `DatabaseClient` class to:
1. Connect to databases (SQLite or PostgreSQL)
2. Retrieve database schema information
3. Execute SQL queries
4. Work with query results

The `DatabaseClient` provides a consistent interface regardless of the underlying database system.

## Setup

First, let's import the necessary classes and set up our environment. We'll need the `DatabaseClient` class from our project, as well as some other utilities.

In [None]:
import os
import sys
import logging
import pandas as pd
from pathlib import Path

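# The notebook lives outside the project root, so add the root to sys.path
# to make the `src` package importable.
project_root = Path.home() / "dev" / "data-analyser"
# sys.path holds strings, so compare against str(project_root), not the Path object
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))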

from src.clients.db_client import DatabaseClient

logging.basicConfig(level=logging.INFO)

## Database Connection

The `DatabaseClient` class needs a connection string to initialize. The connection string format depends on the database used:

- SQLite: `sqlite:///path/to/database.db`
- PostgreSQL: `postgresql://username:password@host:port/database`
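
Reserved characters such as `@` or `/` in a username or password have to be percent-encoded before they go into a URL-style connection string. The sketch below shows one way to assemble both formats listed above. `build_sqlite_url` and `build_postgres_url` are hypothetical helpers written for this illustration and are not part of `DatabaseClient`; the credentials are made up.

```python
from urllib.parse import quote_plus


def build_sqlite_url(db_path: str) -> str:
    """Return a SQLite connection string for the given file path."""
    return f"sqlite:///{db_path}"


def build_postgres_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Return a PostgreSQL connection string with encoded credentials."""
    # quote_plus percent-encodes reserved characters such as "@", ":" and "/"
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Illustrative values only
sqlite_url = build_sqlite_url("../data/example.db")
postgres_url = build_postgres_url("analyst", "p@ss/word", "localhost", 5432, "analytics")
print(sqlite_url)
print(postgres_url)
```

Encoding the credentials keeps the `@` that separates them from the host unambiguous.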

In [2]:
# define the path to our SQLite database
DB_PATH = os.path.expanduser("../data/porsche_analytics.db")

# Create the connection string
sqlite_connection_string = f"sqlite:///{DB_PATH}"

# Initialize the DatabaseClient
sqlite_client = DatabaseClient(sqlite_connection_string)

print("Connected to the database!")

Connected to the database!


## Database Schema
`get_database_schema()` returns the database schema: a dictionary that maps each table name to a list of its columns and their data types.

In [3]:
db_schema = sqlite_client.get_database_schema()
tables = list(db_schema)  # iterating the schema yields the table names
print('Available tables: ', tables)


Available tables:  ['models', 'dealerships', 'customers', 'sales', 'service_records']


In [4]:
table_name = 'models'
table_columns = list(db_schema[table_name])
print(f'Table -> {table_name}\nColumns: {table_columns}')

Table -> models
Columns: [{'column_name': 'model_id', 'data_type': 'INTEGER'}, {'column_name': 'model_name', 'data_type': 'TEXT'}, {'column_name': 'model_code', 'data_type': 'TEXT'}, {'column_name': 'production_start_year', 'data_type': 'INTEGER'}, {'column_name': 'production_end_year', 'data_type': 'INTEGER'}, {'column_name': 'segment', 'data_type': 'TEXT'}, {'column_name': 'base_price', 'data_type': 'REAL'}, {'column_name': 'horsepower', 'data_type': 'INTEGER'}, {'column_name': 'body_type', 'data_type': 'TEXT'}, {'column_name': 'is_electric', 'data_type': 'INTEGER'}, {'column_name': 'description', 'data_type': 'TEXT'}]
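
For SQLite, a schema in the shape shown above can be assembled from `sqlite_master` and `PRAGMA table_info`. The following is a minimal sketch using only the standard-library `sqlite3` module. It is not the implementation of `get_database_schema()`, which is not shown in this notebook; `get_schema` and the in-memory table are assumptions made for illustration.

```python
import sqlite3


def get_schema(conn: sqlite3.Connection) -> dict:
    """Map each user table to a list of {'column_name', 'data_type'} dicts."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        schema[table_name] = [
            {"column_name": col[1], "data_type": col[2]} for col in columns
        ]
    return schema


# Illustrative in-memory database, not the tutorial's data file
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE models (model_id INTEGER PRIMARY KEY, model_name TEXT, base_price REAL)"
)
demo_schema = get_schema(demo_conn)
print(demo_schema)
```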


## Executing SQL Queries
Now that we know the schema, let's execute some SQL queries. The `DatabaseClient.execute_query()` method allows us to run SQL queries and returns a `QueryResult` object.

In [5]:
query = """
SELECT *
FROM models
"""
result = sqlite_client.execute_query(query)
result

INFO:src.clients.db_client:Query executed successfully. Returned 13 rows.


QueryResult(data=[{'model_id': 1, 'model_name': '911 Carrera', 'model_code': 'P-911-CR', 'production_start_year': 1963, 'production_end_year': nan, 'segment': 'Sports Car', 'base_price': 101200.0, 'horsepower': 379, 'body_type': 'Coupe', 'is_electric': 0, 'description': 'Iconic rear-engine sports car'}, {'model_id': 2, 'model_name': '911 Turbo S', 'model_code': 'P-911-TS', 'production_start_year': 1975, 'production_end_year': nan, 'segment': 'Sports Car', 'base_price': 207000.0, 'horsepower': 640, 'body_type': 'Coupe', 'is_electric': 0, 'description': 'High-performance variant of the 911'}, {'model_id': 3, 'model_name': 'Taycan', 'model_code': 'P-TAY', 'production_start_year': 2019, 'production_end_year': nan, 'segment': 'Sedan', 'base_price': 86700.0, 'horsepower': 522, 'body_type': 'Sedan', 'is_electric': 1, 'description': 'All-electric four-door sports car'}, {'model_id': 4, 'model_name': 'Panamera', 'model_code': 'P-PAN', 'production_start_year': 2009, 'production_end_year': nan, '

The `QueryResult` object contains:
- `data`: A list of dictionaries, each representing a row
- `row_count`: The number of rows returned
- `column_names`: A list of column names
- `execution_time_ms`: The query execution time in milliseconds
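
To make the four attributes concrete, here is a rough sketch of how such a result object could be filled from a plain `sqlite3` cursor. The real `QueryResult` lives in `src.clients.db_client` and is not shown in this notebook, so the dataclass below is a hypothetical stand-in that only mirrors the attribute names listed above; `run_query` is likewise an assumption made for illustration.

```python
import sqlite3
import time
from dataclasses import dataclass


@dataclass
class QueryResult:
    """Hypothetical stand-in with the four attributes described above."""
    data: list
    row_count: int
    column_names: list
    execution_time_ms: float


def run_query(conn: sqlite3.Connection, sql: str) -> QueryResult:
    """Execute sql and wrap the rows in a QueryResult."""
    start = time.perf_counter()
    cursor = conn.execute(sql)
    rows = cursor.fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000
    # cursor.description holds one tuple per column; item 0 is the column name
    column_names = [desc[0] for desc in cursor.description]
    data = [dict(zip(column_names, row)) for row in rows]
    return QueryResult(data, len(data), column_names, elapsed_ms)


# Illustrative in-memory table, not the tutorial's database
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE models (model_id INTEGER, model_name TEXT)")
demo_conn.execute("INSERT INTO models VALUES (1, '911 Carrera'), (2, 'Taycan')")
demo_result = run_query(demo_conn, "SELECT * FROM models ORDER BY model_id")
print(demo_result.row_count, demo_result.column_names)
```

Storing each row as a dictionary keyed by column name is what lets the result be passed straight to `pd.DataFrame`.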

In [6]:
# access QueryResult attributes
result.data

[{'model_id': 1,
  'model_name': '911 Carrera',
  'model_code': 'P-911-CR',
  'production_start_year': 1963,
  'production_end_year': nan,
  'segment': 'Sports Car',
  'base_price': 101200.0,
  'horsepower': 379,
  'body_type': 'Coupe',
  'is_electric': 0,
  'description': 'Iconic rear-engine sports car'},
 {'model_id': 2,
  'model_name': '911 Turbo S',
  'model_code': 'P-911-TS',
  'production_start_year': 1975,
  'production_end_year': nan,
  'segment': 'Sports Car',
  'base_price': 207000.0,
  'horsepower': 640,
  'body_type': 'Coupe',
  'is_electric': 0,
  'description': 'High-performance variant of the 911'},
 {'model_id': 3,
  'model_name': 'Taycan',
  'model_code': 'P-TAY',
  'production_start_year': 2019,
  'production_end_year': nan,
  'segment': 'Sedan',
  'base_price': 86700.0,
  'horsepower': 522,
  'body_type': 'Sedan',
  'is_electric': 1,
  'description': 'All-electric four-door sports car'},
 {'model_id': 4,
  'model_name': 'Panamera',
  'model_code': 'P-PAN',
  'product

In [7]:
# for convenience, convert the result to a pandas DataFrame
df = pd.DataFrame(result.data)
df

| | model_id | model_name | model_code | production_start_year | production_end_year | segment | base_price | horsepower | body_type | is_electric | description |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 911 Carrera | P-911-CR | 1963 | NaN | Sports Car | 101200.0 | 379 | Coupe | 0 | Iconic rear-engine sports car |
| 1 | 2 | 911 Turbo S | P-911-TS | 1975 | NaN | Sports Car | 207000.0 | 640 | Coupe | 0 | High-performance variant of the 911 |
| 2 | 3 | Taycan | P-TAY | 2019 | NaN | Sedan | 86700.0 | 522 | Sedan | 1 | All-electric four-door sports car |
| 3 | 4 | Panamera | P-PAN | 2009 | NaN | Luxury | 88400.0 | 325 | Sedan | 0 | Four-door luxury sports car |
| 4 | 5 | Cayenne | P-CAY | 2002 | NaN | SUV | 69000.0 | 335 | SUV | 0 | Mid-size luxury crossover SUV |
| 5 | 6 | Macan | P-MAC | 2014 | NaN | SUV | 54900.0 | 248 | SUV | 0 | Compact luxury crossover SUV |
| 6 | 7 | 718 Boxster | P-BOX | 1996 | NaN | Sports Car | 62000.0 | 300 | Convertible | 0 | Mid-engine two-seater roadster |
| 7 | 8 | 718 Cayman | P-CAY | 2005 | NaN | Sports Car | 60500.0 | 300 | Coupe | 0 | Mid-engine two-seater coupe |
| 8 | 9 | Taycan Cross Turismo | P-TAYCT | 2021 | NaN | Wagon | 93700.0 | 469 | Wagon | 1 | All-electric wagon variant of the Taycan |
| 9 | 10 | Cayenne Coupe | P-CAYC | 2019 | NaN | SUV | 76500.0 | 335 | SUV Coupe | 0 | Coupe variant of the Cayenne SUV |


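**Complex Queries**

`execute_query()` also handles joins and aggregations. The query below computes the number of sales, the total revenue, and the average sale price per model.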


In [8]:
query = """
SELECT 
    m.model_name, 
    COUNT(s.sale_id) as total_sales,
    SUM(s.price) as total_revenue,
    AVG(s.price) as avg_price
FROM 
    models m
LEFT JOIN 
    sales s ON m.model_id = s.model_id
GROUP BY 
    m.model_name
ORDER BY 
    total_revenue DESC
"""
result = sqlite_client.execute_query(query)
sales_df = pd.DataFrame(result.data)
sales_df

INFO:src.clients.db_client:Query executed successfully. Returned 13 rows.


| | model_name | total_sales | total_revenue | avg_price |
|---|---|---|---|---|
| 0 | 911 Turbo S | 1 | 215000.0 | 215000.0 |
| 1 | Panamera | 1 | 182000.0 | 182000.0 |
| 2 | 911 GT3 | 1 | 167500.0 | 167500.0 |
| 3 | 911 Carrera | 1 | 110500.0 | 110500.0 |
| 4 | Taycan Cross Turismo | 1 | 98500.0 | 98500.0 |
| 5 | Taycan | 1 | 92400.0 | 92400.0 |
| 6 | Cayenne Coupe | 1 | 79900.0 | 79900.0 |
| 7 | Cayenne | 1 | 72500.0 | 72500.0 |
| 8 | 718 Boxster | 1 | 65000.0 | 65000.0 |
| 9 | Macan | 1 | 59800.0 | 59800.0 |