# Lesson 04: Store Data in a SQLite Database

In this lesson, you'll initialize a SQLite database and insert your transformed data.

**Your Goal:**
- Initialize a SQLite database from `src/db/schema.sql`
- Prepare properties DataFrame for database insertion (matching schema columns)
- Prepare quarterly stats DataFrame for database insertion
- Insert data into the database using pandas `.to_sql()` or a SQLite cursor
- Handle data types and NULL values correctly
- Verify data was inserted correctly

**Reference:** Check `src/db/schema.sql` for table structures. You may also use the `init_database` helper function in `src/db/init_db.py`.


In [None]:
# Import necessary libraries
# You'll need pandas, sqlite3, Path (from pathlib), sys
# Optional: import init_database from db.init_db if you want to use the helper function
# Add src to the path first if needed: sys.path.append('../src')


In [None]:
# Initialize database from schema
# Option A: Use the init_db helper function
# Hints:
# - Import init_database from db.init_db
# - Call init_database(db_path="../src/db/database.sqlite", schema_path="../src/db/schema.sql")
# - This will create the database and all tables/indexes

# Option B: Create database manually
# Hints:
# - Read schema.sql file
# - Connect to the SQLite database (sqlite3 creates the file if it doesn't exist)
# - Parse SQL statements (separate CREATE TABLE and CREATE INDEX)
# - Execute CREATE TABLE statements first
# - Commit, then execute CREATE INDEX statements
# - Verify tables were created by querying sqlite_master
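The manual route in Option B can be sketched roughly as follows. This is a minimal illustration, not the contents of `init_db.py`: the `schema_sql` string is a made-up stand-in for `src/db/schema.sql`, the `init_schema` function name is hypothetical, and an in-memory database is used so the sketch runs anywhere.

```python
import sqlite3

# Hypothetical stand-in for the text you would read from src/db/schema.sql.
schema_sql = """
CREATE TABLE IF NOT EXISTS properties (
    id INTEGER PRIMARY KEY,
    suburb TEXT NOT NULL,
    property_type TEXT CHECK (property_type IN ('house', 'unit')),
    sale_price REAL
);
CREATE INDEX IF NOT EXISTS idx_properties_suburb ON properties (suburb);
"""


def init_schema(conn, sql_text):
    """Run CREATE TABLE statements first, then CREATE INDEX statements."""
    # Naive split on ';' which is fine for simple DDL, but not for triggers
    # or string literals that contain semicolons.
    statements = [s.strip() for s in sql_text.split(";") if s.strip()]
    tables = [s for s in statements if s.upper().startswith("CREATE TABLE")]
    indexes = [s for s in statements if s.upper().startswith("CREATE INDEX")]

    for stmt in tables:
        conn.execute(stmt)
    conn.commit()

    for stmt in indexes:
        conn.execute(stmt)
    conn.commit()


conn = sqlite3.connect(":memory:")  # pass a file path for a real database
init_schema(conn, schema_sql)

# Verify by listing the user-defined objects recorded in sqlite_master.
created = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY type, name"
).fetchall()
print(created)
```

If the schema file is plain DDL, `conn.executescript(sql_text)` is a shorter alternative that runs every statement in file order.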


In [None]:
# Load or regenerate your transformed data
# Hints:
# - If you saved parquet files in previous lesson, load them
# - Otherwise, regenerate from Lesson 03 (you may want to copy relevant cells)
# - You need:
#   - Properties DataFrame (with property_type column)
#   - Quarterly stats DataFrame (with property_type column)
# - Print shapes and samples to verify


In [None]:
# Prepare properties DataFrame for database insertion
# Hints:
# - Review properties table schema in schema.sql
# - Select only columns that match schema (in correct order)
# - Add missing columns with None/NULL values:
#   - listing_date (we don't have this, set to None)
#   - days_on_market (not calculated, set to None)
# - Ensure data types match:
#   - TEXT for suburb, postcode, district, property_type
#   - DATE for contract_date, settlement_date
#   - REAL for sale_price
#   - INTEGER for days_on_market, contract_to_settlement_days
# - Handle NULL values (use None or pd.NA)
# - Print sample and verify column names match schema
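One possible shape for this preparation step is sketched below. The column names come from the hints above, but the column order in `SCHEMA_COLUMNS`, the sample rows, and the `prepare_properties` name are all assumptions made for illustration; take the real order from `schema.sql`. Dates are written as ISO strings here, a common choice because SQLite has no dedicated date storage class.

```python
import pandas as pd

# Hypothetical column order: copy the real one from src/db/schema.sql.
SCHEMA_COLUMNS = [
    "suburb", "postcode", "district", "property_type",
    "contract_date", "settlement_date", "listing_date",
    "sale_price", "days_on_market", "contract_to_settlement_days",
]

# Tiny made-up frame standing in for the Lesson 03 output.
properties = pd.DataFrame({
    "suburb": ["Newtown", "Glebe"],
    "postcode": [2042, 2037],
    "district": ["Inner West", "Sydney"],
    "property_type": ["house", "unit"],
    "contract_date": pd.to_datetime(["2023-01-15", "2023-02-01"]),
    "settlement_date": pd.to_datetime(["2023-03-01", None]),
    "sale_price": [1500000, 820000],
    "contract_to_settlement_days": [45.0, None],
    "extra_column": ["dropped", "dropped"],
})


def prepare_properties(df):
    out = df.copy()
    # Columns the pipeline does not produce are added as NULLs.
    for col in ("listing_date", "days_on_market"):
        if col not in out.columns:
            out[col] = None
    # TEXT columns: the nullable string dtype keeps missing values missing.
    for col in ("suburb", "postcode", "district", "property_type"):
        out[col] = out[col].astype("string")
    # DATE columns: ISO 8601 strings; missing dates stay missing.
    for col in ("contract_date", "settlement_date"):
        out[col] = pd.to_datetime(out[col]).dt.strftime("%Y-%m-%d")
    # REAL column.
    out["sale_price"] = out["sale_price"].astype(float)
    # INTEGER columns: nullable Int64 allows whole numbers alongside NULLs.
    for col in ("days_on_market", "contract_to_settlement_days"):
        out[col] = out[col].astype("Int64")
    # Keep only schema columns, in schema order.
    return out[SCHEMA_COLUMNS]


prepared = prepare_properties(properties)
print(prepared.dtypes)
print(prepared.head())
```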


In [None]:
# Prepare quarterly stats DataFrame for database insertion
# Hints:
# - Review suburb_quarterly table schema in schema.sql
# - Map column names from your DataFrame to schema columns
# - Rename columns if needed (e.g., sale_price_num_sales -> num_sales)
# - Add missing columns with None/NULL:
#   - contract_to_settlement_score
#   - qoq_price_change_percentage (quarter-over-quarter)
#   - yoy_price_change_percentage (year-over-year)
# - Ensure data types match schema
# - Select columns in schema order
# - Print sample and verify
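The renaming and column-filling steps might look like the sketch below. Only `sale_price_num_sales -> num_sales` and the three missing column names come from the hints; `sale_price_median`, `median_price`, `year`, `quarter`, the column order, and the `prepare_quarterly` name are hypothetical placeholders, so substitute the names from your own DataFrame and from `schema.sql`.

```python
import pandas as pd

# Hypothetical mapping from DataFrame columns to schema columns.
RENAME_MAP = {
    "sale_price_num_sales": "num_sales",
    "sale_price_median": "median_price",  # assumed name, check schema.sql
}

# Columns named in the hints that are not calculated yet.
MISSING_COLUMNS = [
    "contract_to_settlement_score",
    "qoq_price_change_percentage",
    "yoy_price_change_percentage",
]

# Hypothetical schema order for suburb_quarterly.
SCHEMA_COLUMNS = [
    "suburb", "property_type", "year", "quarter",
    "num_sales", "median_price",
] + MISSING_COLUMNS

# Tiny made-up frame standing in for the Lesson 03 output.
quarterly = pd.DataFrame({
    "suburb": ["Newtown", "Newtown"],
    "property_type": ["house", "house"],
    "year": [2023, 2023],
    "quarter": [1, 2],
    "sale_price_num_sales": [12, 9],
    "sale_price_median": [1450000.0, 1510000.0],
})


def prepare_quarterly(df):
    out = df.rename(columns=RENAME_MAP)
    # Add not-yet-calculated columns as NULLs.
    for col in MISSING_COLUMNS:
        if col not in out.columns:
            out[col] = None
    # Fail early if a schema column is still absent after renaming.
    absent = [c for c in SCHEMA_COLUMNS if c not in out.columns]
    if absent:
        raise KeyError(f"Columns missing for suburb_quarterly: {absent}")
    return out[SCHEMA_COLUMNS]


quarterly_prepared = prepare_quarterly(quarterly)
print(quarterly_prepared.head())
```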


In [None]:
# Insert properties data into database
# Hints:
# - Connect to database
# - Use pandas .to_sql() method:
#   - name='properties'
#   - con=connection
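#   - if_exists='append' (note: 'replace' drops the table and recreates it from the DataFrame,
#     so the constraints and indexes defined in schema.sql would be lost)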
#   - index=False
# - Or use SQLite cursor with executemany() for more control
# - Handle data type conversions (dates, NULLs)
# - Commit after insertion
# - Print how many records were inserted
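A minimal sketch of the `.to_sql()` route is shown below, using an in-memory database and a cut-down, made-up `properties` table so it runs on its own. In the lesson you would connect to your database file and pass the prepared DataFrame instead.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # pass your database path in the lesson

# Cut-down, hypothetical version of the properties table.
conn.execute(
    """
    CREATE TABLE properties (
        id INTEGER PRIMARY KEY,
        suburb TEXT NOT NULL,
        property_type TEXT CHECK (property_type IN ('house', 'unit')),
        contract_date DATE,
        sale_price REAL,
        days_on_market INTEGER
    )
    """
)

# Made-up rows standing in for the prepared properties DataFrame.
props = pd.DataFrame({
    "suburb": ["Newtown", "Glebe"],
    "property_type": ["house", "unit"],
    "contract_date": ["2023-01-15", "2023-02-01"],
    "sale_price": [1500000.0, 820000.0],
    "days_on_market": [None, None],
})

# 'append' inserts into the existing table, so its constraints are kept.
props.to_sql(name="properties", con=conn, if_exists="append", index=False)
conn.commit()

inserted = conn.execute("SELECT COUNT(*) FROM properties").fetchone()[0]
null_days = conn.execute(
    "SELECT COUNT(*) FROM properties WHERE days_on_market IS NULL"
).fetchone()[0]
print(f"Inserted {inserted} of {len(props)} rows ({null_days} with NULL days_on_market)")
```

Because the DataFrame omits `id`, SQLite assigns it automatically as the integer primary key.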


In [None]:
# Insert quarterly stats data into database
# Hints:
# - Similar to properties insertion
# - Use .to_sql() with name='suburb_quarterly'
# - Use if_exists='append' so the table created from schema.sql is kept
# - Commit after insertion
# - Print how many records were inserted


In [None]:
# Verify data was inserted correctly
# Hints:
# - Query count of records in each table
# - Compare counts to your DataFrame lengths
# - Sample a few records from each table
# - Check data types match expectations
# - Verify constraints (e.g., property_type CHECK constraint)
# - Print verification results
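The checks above could be sketched like this. The table definition and rows are made up so the example is self-contained; the CHECK values `'house'` and `'unit'` are assumptions, so use the values your `schema.sql` actually allows. The constraint is probed by attempting an invalid insert and rolling it back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # pass your database path in the lesson

# Cut-down, hypothetical table with a CHECK constraint on property_type.
conn.execute(
    """
    CREATE TABLE properties (
        suburb TEXT NOT NULL,
        property_type TEXT CHECK (property_type IN ('house', 'unit')),
        sale_price REAL
    )
    """
)
rows = [("Newtown", "house", 1500000.0), ("Glebe", "unit", None)]
conn.executemany("INSERT INTO properties VALUES (?, ?, ?)", rows)
conn.commit()

# 1. Row count should match the length of the source data.
expected = len(rows)
actual = conn.execute("SELECT COUNT(*) FROM properties").fetchone()[0]
print(f"Row count: expected {expected}, found {actual}")

# 2. typeof() reports how each value is actually stored.
types = conn.execute(
    "SELECT typeof(sale_price), COUNT(*) FROM properties "
    "GROUP BY 1 ORDER BY 1"
).fetchall()
print(f"sale_price storage classes: {types}")

# 3. Probe the CHECK constraint with a deliberately invalid value.
try:
    conn.execute("INSERT INTO properties VALUES ('Test', 'castle', 1.0)")
    check_enforced = False
except sqlite3.IntegrityError:
    check_enforced = True
finally:
    conn.rollback()  # discard the probe row if it was accepted
print(f"CHECK constraint enforced: {check_enforced}")

# 4. Sample a few records.
for row in conn.execute("SELECT * FROM properties LIMIT 5"):
    print(row)
```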
