# Comprehensive Database Tutorial and Demonstration

This notebook walks through our custom `Database` class, from basic table and view access to merging, querying, and analysis.

## Table of Contents
1. [Setup and Initialization](#1-setup-and-initialization)
2. [Exploring Database Structure](#2-exploring-database-structure)
3. [Accessing Tables and Views](#3-accessing-tables-and-views)
4. [Creating and Managing Views](#4-creating-and-managing-views)
5. [Merging Data](#5-merging-data)
6. [Querying the Database](#6-querying-the-database)
7. [Advanced Analysis](#7-advanced-analysis)
8. [Error Handling and Best Practices](#8-error-handling-and-best-practices)
9. [Performance Considerations](#9-performance-considerations)
10. [Conclusion and Next Steps](#10-conclusion-and-next-steps)

## 1. Setup and Initialization

In [1]:
import database_functions as func
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Create a Database instance
db = func.Database()
print("Database initialized successfully.")

Database initialized successfully.


## 2. Exploring Database Structure

In [2]:
# List all tables in the database
print("Available tables:")
for table_name in db.tables.keys():
    print(f"- {table_name}")

# Display basic information about each table
for table_name, table in db.tables.items():
    print(f"\nTable: {table_name}")
    print(f"  Rows: {table().shape[0]}")
    print(f"  Columns: {table.get_columns()}")
    print(f"  Available views: {', '.join(table.list_views())}")

Available tables:
- ballot
- demo
- fascility
- medicare
- voter

Table: ballot
  Rows: 58
  Columns: ['county_name', 'yes_count_2020', 'no_count_2020', 'total_count_2020', 'yes_perc_2020', 'no_perc_2020', 'yes_count_2018', 'no_count_2018', 'total_count_2018', 'yes_perc_2018', 'no_perc_2018', 'yes_count_2022', 'no_count_2022', 'total_count_2022', 'yes_perc_2022', 'no_perc_2022']
  Available views: 2018, 2020, 2022, all_counts, all_percentages

Table: demo
  Rows: 58
  Columns: ['county_name', 'under_5_years', 'age_5_9', 'age_10_14', 'age_15_19', 'age_20_24', 'age_25_29', 'age_30_34', 'age_35_39', 'age_40_44', 'age_45_49', 'age_50_54', 'age_55_59', 'age_60_64', 'age_65_69', 'age_70_74', 'age_75_79', 'age_80_84', 'age_85_plus', 'age_16_plus', 'age_18_plus', 'age_21_plus', 'age_62_plus', 'age_65_plus', 'male_population', 'under_5_years_1', 'age_5_9_1', 'age_10_14_1', 'age_15_19_1', 'age_20_24_1', 'age_25_29_1', 'age_30_34_1', 'age_35_39_1', 'age_40_44_1', 'age_45_49_1', 'age_50_54_1', 'ag

## 3. Accessing Tables and Views

In [3]:
# Access a specific table
voter_table = db.voter   # returns a Table object
voter_table_df = db.voter() # calling the Table returns a DataFrame
voter_table_df = db.dataframes['voter'] # also returns a DataFrame

# List the column names of a Table
# table.get_columns() -> list[str]
voter_table.get_columns()

# List the views available on a Table
# table.list_views() -> list[str]
voter_table.list_views()

# Access a specific view in two ways from a Table:
voter_2022 = voter_table.get_view('2022')  # table.get_view(view_name)
voter_2022 = voter_table._2022  # prefix '_' when the view name starts with a digit

# Access a specific view in two ways from the Database:
voter_2022 = db.get_view('voter', '2022')   # db.get_view(table_name, view_name)
voter_2022 = db.voter._2022   # db.table_name.view_name

voter_2022.head()

Unnamed: 0,county_name,eligible_2022,total_registered_2022,democratic_2022,republican_2022,american_independent_2022,green_2022,libertarian_2022,peace_and_freedom_2022,unknown_2022,other_2022,no_party_preference_2022
0,Alameda,1089154,881491,0.556651,0.110469,0.01862,0.007602,0.005286,0.003049,3.4e-05,0.006646,0.291643
1,Alpine,939,758,0.411609,0.270449,0.032982,0.006596,0.007916,0.002639,0.0,0.003958,0.263852
2,Amador,27117,22305,0.287962,0.439901,0.042233,0.004573,0.013226,0.002735,0.000269,0.002286,0.206815
3,Butte,171771,122741,0.349052,0.341817,0.034552,0.007626,0.011259,0.00312,0.002623,0.009752,0.240197
4,Calaveras,36101,29591,0.273698,0.41445,0.045453,0.006353,0.015106,0.003177,0.00294,0.007908,0.230915
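
The `_2022` form exists because a Python identifier cannot start with a digit, so `db.voter.2022` would be a syntax error. One way a class could support this kind of attribute access is through `__getattr__`. The sketch below uses a hypothetical `ViewHolder` class with plain lists as view data; it is an illustration of the idea, not the actual `Table` implementation.

```python
class ViewHolder:
    """Hypothetical stand-in for a table that exposes named views."""

    def __init__(self, views):
        # Map of view name -> view data (plain lists in this sketch).
        self._views = dict(views)

    def get_view(self, name):
        try:
            return self._views[name]
        except KeyError:
            raise ValueError(f"View '{name}' not found") from None

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails.
        if attr == "_views":
            # Avoid infinite recursion if _views has not been set yet.
            raise AttributeError(attr)
        if attr in self._views:
            return self._views[attr]
        # Allow a leading underscore for names that start with a digit.
        if attr.startswith("_") and attr[1:] in self._views:
            return self._views[attr[1:]]
        raise AttributeError(attr)


holder = ViewHolder({"2022": ["Alameda", "Alpine"], "all_counts": ["Amador"]})
same_view = holder._2022 is holder.get_view("2022")
print(same_view)
```

Raising `AttributeError` for unknown names keeps `hasattr()` and tab completion behaving normally.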


## 4. Creating and Managing Views

In [7]:
# Add a new view to the voter table
# Signature: db.add_view(table_name, view_name, columns)
db.add_view('voter', 'registered_voters', ['county_name', 'total_registered_2018', 'total_registered_2020', 'total_registered_2022'])
# Equivalent call on the Table itself:
# db.voter.add_view('registered_voters', ['county_name', 'total_registered_2018', 'total_registered_2020', 'total_registered_2022'])

# Verify the new view
registered_voters = db.get_view('voter', 'registered_voters')
registered_voters.head()


Unnamed: 0,county_name,total_registered_2018,total_registered_2020,total_registered_2022
0,Alameda,881491,881491,881491
1,Alpine,758,758,758
2,Amador,22305,22305,22305
3,Butte,122741,122741,122741
4,Calaveras,29591,29591,29591
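
Conceptually, a view here is a named subset of a table's columns. A minimal sketch of that idea in plain pandas follows, assuming a view is just a column selection on a DataFrame; `make_view` is a hypothetical helper, not part of `database_functions`, and the sample rows are copied from the `voter` output shown earlier.

```python
import pandas as pd


def make_view(df, columns):
    """Return a copy of df restricted to the given columns (hypothetical helper)."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise ValueError(f"Unknown columns: {missing}")
    return df[columns].copy()


voter_sample = pd.DataFrame({
    "county_name": ["Alameda", "Alpine", "Amador"],
    "eligible_2022": [1089154, 939, 27117],
    "total_registered_2022": [881491, 758, 22305],
})

registered_view = make_view(voter_sample, ["county_name", "total_registered_2022"])
print(registered_view)
```

Validating the column names up front gives a clearer error than the `KeyError` pandas would raise on its own.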


## 5. Merging Data

In [7]:
# Merge multiple views
# db.merge_views(list[(table_name, view_name), (table_name, view_name), ...], key = 'column_name')
merged_views = db.merge_views([('demo', 'population'), ('voter', 'registered_voters')])
# merged_views.head()

# Merge all tables
db_merged = db.merge_all()
# db_merged.head()
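
To make the merge step concrete, here is a minimal sketch of how several views could be joined on a shared key with pandas. It assumes each view is a DataFrame containing a `county_name` column; `merge_frames` is a hypothetical helper and may differ from what `db.merge_views` does internally. The sample values are copied from the `voter` output shown earlier.

```python
from functools import reduce

import pandas as pd


def merge_frames(frames, key="county_name", how="inner"):
    """Join a list of DataFrames on a shared key column (hypothetical helper)."""
    if not frames:
        raise ValueError("need at least one DataFrame")
    # Fold the list left to right: ((f0 merge f1) merge f2) ...
    return reduce(lambda left, right: left.merge(right, on=key, how=how), frames)


eligible = pd.DataFrame({
    "county_name": ["Alameda", "Alpine", "Amador"],
    "eligible_2022": [1089154, 939, 27117],
})
registered = pd.DataFrame({
    "county_name": ["Alameda", "Alpine"],
    "total_registered_2022": [881491, 758],
})

merged_inner = merge_frames([eligible, registered])
merged_outer = merge_frames([eligible, registered], how="outer")
print(merged_inner)
```

With `how="inner"` only counties present in every frame survive; `how="outer"` keeps all counties and fills the gaps with `NaN`.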

## 6. Querying the Database

In [None]:
# Query the merged database
# Signature: db.query(conditions, columns)
#   conditions: dict mapping column_name -> predicate function
#   columns: list of column names to return

# Query for counties with a population over 1 million
#   condition: population_january_2023 > 1000000
#   columns: ['county_name', 'population_january_2023', 'median_household_income_2021']
    
large_counties = db.query({'population_january_2023': lambda x: x > 1000000}, 
                          ['county_name', 'population_january_2023', 'median_household_income_2021'])
large_counties.head()

# Query for multiple conditions using a dictionary
conditions = {'population_january_2023': lambda x: x > 1000000,
              'median_household_income_2021': lambda x: x > 90000}
large_high_income_counties = db.query(conditions,
    ['county_name', 'population_january_2023', 'median_household_income_2021']
)
large_high_income_counties.head()
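
The dictionary-of-predicates pattern can be sketched in plain pandas as follows. This assumes each predicate receives a whole column (a Series) and that multiple conditions are combined with a logical AND; `query_frame` is a hypothetical helper, and the real `db.query` may evaluate conditions differently. The sample values are copied from the `voter` output shown earlier.

```python
import pandas as pd


def query_frame(df, conditions, columns=None):
    """Filter df by column predicates, AND-ed together (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    for column, predicate in conditions.items():
        # Each predicate maps a column to a boolean Series.
        mask &= predicate(df[column])
    result = df[mask]
    return result if columns is None else result[columns]


voter_sample = pd.DataFrame({
    "county_name": ["Alameda", "Alpine", "Amador", "Butte"],
    "eligible_2022": [1089154, 939, 27117, 171771],
    "total_registered_2022": [881491, 758, 22305, 122741],
})

large = query_frame(
    voter_sample,
    {
        "eligible_2022": lambda x: x > 100000,
        "total_registered_2022": lambda x: x > 500000,
    },
    ["county_name", "eligible_2022"],
)
print(large)
```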


## 7. Advanced Analysis

In [None]:

# Analyze relationship between population and voter registration
merged_data = db.merge_views([('demo', 'population'), ('voter', 'registered_voters')])
merged_data['registration_rate_2022'] = merged_data['total_registered_2022'] / merged_data['population_january_2023'] * 100

plt.figure(figsize=(10, 6))
plt.scatter(merged_data['population_january_2023'], merged_data['registration_rate_2022'])
plt.title('Population vs. Voter Registration Rate (2022)')
plt.xlabel('Population')
plt.ylabel('Voter Registration Rate (%)')
plt.xscale('log')  # Use log scale for population
plt.tight_layout()
plt.show()

# Calculate correlation
correlation = merged_data['population_january_2023'].corr(merged_data['registration_rate_2022'])
print(f"\nCorrelation between population and voter registration rate: {correlation:.2f}")
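
Since the scatter plot uses a log scale for population, a Pearson coefficient on the raw values can be dominated by a few very large counties. A rank-based (Spearman) coefficient is one alternative worth comparing. The sketch below computes it as the Pearson correlation of the ranks. It uses the five `voter` rows shown earlier and substitutes `eligible_2022` for population, because the population values are not shown in this notebook's outputs; with only five rows the numbers are purely illustrative.

```python
import pandas as pd

# Values copied from the voter '2022' view output shown earlier.
sample = pd.DataFrame({
    "county_name": ["Alameda", "Alpine", "Amador", "Butte", "Calaveras"],
    "eligible_2022": [1089154, 939, 27117, 171771, 36101],
    "total_registered_2022": [881491, 758, 22305, 122741, 29591],
})
sample["rate"] = sample["total_registered_2022"] / sample["eligible_2022"] * 100

# Pearson correlation on the raw values.
pearson = sample["eligible_2022"].corr(sample["rate"])

# Spearman's coefficient is the Pearson correlation of the ranks.
spearman = sample["eligible_2022"].rank().corr(sample["rate"].rank())

print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```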

## 8. Error Handling and Best Practices

In [None]:
# Demonstrate error handling
try:
    db.get_view('non_existent_table', 'some_view')
except ValueError as e:
    print("Error:", str(e))

try:
    db.get_view('voter', 'non_existent_view')
except ValueError as e:
    print("Error:", str(e))

# Best practice: Check if a view exists before trying to access it
def safe_get_view(db, table_name, view_name):
    if table_name in db.tables:
        table = db.tables[table_name]
        if view_name in table.list_views():
            return table.get_view(view_name)
        else:
            print(f"View '{view_name}' not found in table '{table_name}'")
    else:
        print(f"Table '{table_name}' not found in the database")
    return None

# Example usage of safe_get_view
safe_view = safe_get_view(db, 'voter', '2022')
if safe_view is not None:
    print("Successfully retrieved the '2022' view from the 'voter' table")
    print(safe_view.head())

safe_get_view(db, 'non_existent_table', 'some_view')
safe_get_view(db, 'voter', 'non_existent_view')

## 9. Performance Considerations

In [10]:
import time

def measure_time(operation):
    # Run a zero-argument callable, print its wall-clock duration, and return its result
    start_time = time.time()
    result = operation()
    end_time = time.time()
    print(f"Operation took {end_time - start_time:.4f} seconds")
    return result

print("Time to merge all tables:")
merged_all = measure_time(db.merge_all)

print("\nTime to query large counties:")
large_counties = measure_time(lambda: db.query({'population_january_2023': lambda x: x > 1000000}))

print("\nTime to access a view:")
voter_2022 = measure_time(lambda: db.get_view('voter', '2022'))

# Comparing performance of different operations
print("\nComparing performance of different operations:")
print("1. Accessing a single table:")
measure_time(lambda: db.voter())

print("\n2. Accessing a view:")
measure_time(lambda: db.get_view('voter', '2022'))

print("\n3. Merging two views:")
measure_time(lambda: db.merge_views([('demo', 'population'), ('voter', 'registered_voters')]))

print("\n4. Querying the merged dataset:")
measure_time(lambda: db.query({'median_household_income_2021': lambda x: x > 100000}))


Time to merge all tables:
Operation took 0.0000 seconds

Time to query large counties:
Operation took 0.0010 seconds

Time to access a view:
Operation took 0.0010 seconds

Comparing performance of different operations:
1. Accessing a single table:
Operation took 0.0000 seconds

2. Accessing a view:
Operation took 0.0000 seconds

3. Merging two views:
Operation took 0.0020 seconds

4. Querying the merged dataset:
Operation took 0.0010 seconds


Unnamed: 0,county_name,yes_count_2020,no_count_2020,total_count_2020,yes_perc_2020,no_perc_2020,yes_count_2018,no_count_2018,total_count_2018,yes_perc_2018,...,total_registered_2022,democratic_2022,republican_2022,american_independent_2022,green_2022,libertarian_2022,peace_and_freedom_2022,unknown_2022,other_2022,no_party_preference_2022
0,Alameda,329873.0,413277.0,743150.0,0.443885,0.556115,275550.0,280735.0,556285.0,0.49534,...,881491.0,0.556651,0.110469,0.01862,0.007602,0.005286,0.003049,3.4e-05,0.006646,0.291643
6,Contra Costa,212213.0,350108.0,562321.0,0.377388,0.622612,173937.0,235739.0,409676.0,0.424572,...,621309.0,0.488631,0.198653,0.025844,0.004738,0.006745,0.002614,0.001827,0.001606,0.269341
20,Marin,51877.0,94100.0,145977.0,0.355378,0.644622,52537.0,70415.0,122952.0,0.427297,...,160944.0,0.555628,0.148176,0.022163,0.008171,0.00635,0.001541,0.001591,0.003007,0.253374
29,Orange,486724.0,984586.0,1471310.0,0.33081,0.66919,395773.0,652270.0,1048043.0,0.37763,...,1560111.0,0.335633,0.347196,0.02517,0.003087,0.008739,0.002508,0.000129,0.002124,0.275413
30,Placer,56847.0,170430.0,227277.0,0.250122,0.749878,54794.0,115496.0,170290.0,0.321769,...,234732.0,0.282463,0.41705,0.029983,0.003873,0.016201,0.0019,0.002428,0.004051,0.242051
37,San Francisco,194596.0,222161.0,416757.0,0.466929,0.533071,202728.0,149499.0,352227.0,0.575561,...,500051.0,0.568378,0.066347,0.016078,0.008149,0.005965,0.002742,0.003844,0.002018,0.326479
40,San Mateo,135581.0,220379.0,355960.0,0.380888,0.619112,118561.0,156221.0,274782.0,0.431473,...,399351.0,0.503099,0.152207,0.020335,0.004825,0.005992,0.002301,0.002194,0.002194,0.306853
42,Santa Clara,319139.0,491885.0,811024.0,0.393501,0.606499,263653.0,322779.0,586432.0,0.449588,...,885764.0,0.455637,0.172635,0.019929,0.003859,0.00679,0.002556,8.2e-05,0.001787,0.336725
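
Several of the timings above print `0.0000 seconds`, which suggests those operations finish faster than a single `time.time()` measurement can resolve. A sketch of a more robust approach follows: use `time.perf_counter()` and average over many calls, keeping the best of several runs. `time_repeated` is a hypothetical helper, and the workload here is a stand-in for a call such as `db.get_view('voter', '2022')`.

```python
import time


def time_repeated(operation, repeats=5, loops=1000):
    """Return (best seconds per call, last result) over several timed runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(loops):
            result = operation()
        per_call = (time.perf_counter() - start) / loops
        best = min(best, per_call)
    return best, result


# Stand-in workload; replace the lambda with the operation to measure.
best, value = time_repeated(lambda: sum(range(1000)))
print(f"Best of 5 runs: {best * 1e6:.2f} microseconds per call")
```

Taking the minimum rather than the mean reduces the influence of unrelated background activity on the machine.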


## 10. Conclusion and Next Steps