# Fissio Data - Getting Started

This notebook demonstrates the DuckDB + Jupyter workflow for nuclear industry data analysis.

In [None]:
# Install DuckDB, pandas, and PyArrow (run once);
# %pip installs into the environment of the running kernel
%pip install duckdb pandas pyarrow --quiet

In [None]:
import duckdb
import pandas as pd

# Connect to DuckDB (creates file if it doesn't exist)
con = duckdb.connect('/home/jovyan/data/fissio.duckdb')
print("Connected to DuckDB")

## Create Sample Schema

An example schema for power plant data, focused on nuclear units; the `plant_type` column also covers gas and coal plants.

In [None]:
# Create schemas for organized data architecture
con.execute("CREATE SCHEMA IF NOT EXISTS plants")
con.execute("CREATE SCHEMA IF NOT EXISTS regulatory")
con.execute("CREATE SCHEMA IF NOT EXISTS market")

print("Schemas created: plants, regulatory, market")

In [None]:
# Create sample plants table
con.execute("""
CREATE TABLE IF NOT EXISTS plants.facilities (
    plant_id VARCHAR PRIMARY KEY,
    name VARCHAR NOT NULL,
    plant_type VARCHAR,  -- nuclear, gas, coal
    capacity_mw DECIMAL(10,2),
    status VARCHAR,  -- operating, construction, decommissioned
    latitude DECIMAL(9,6),
    longitude DECIMAL(9,6),
    state VARCHAR(2),
    nrc_region INTEGER,
    commercial_date DATE,
    license_expiry DATE
)
""")

print("Created plants.facilities table")

In [None]:
# Insert sample data
sample_plants = [
    ('AP1000-1', 'Vogtle Unit 3', 'nuclear', 1117, 'operating', 33.1422, -81.7597, 'GA', 2, '2023-07-31', '2063-07-31'),
    ('AP1000-2', 'Vogtle Unit 4', 'nuclear', 1117, 'operating', 33.1422, -81.7597, 'GA', 2, '2024-04-29', '2064-04-29'),
    ('BWR-1', 'Peach Bottom Unit 2', 'nuclear', 1400, 'operating', 39.7589, -76.2692, 'PA', 1, '1974-02-01', '2054-08-08'),
    ('PWR-1', 'Seabrook Station', 'nuclear', 1246, 'operating', 42.8986, -70.8489, 'NH', 1, '1990-08-19', '2050-03-15'),
    ('CCGT-1', 'Cricket Valley Energy', 'gas', 1100, 'operating', 41.5847, -73.5708, 'NY', None, '2020-04-01', None),
]

con.executemany("""
INSERT OR REPLACE INTO plants.facilities VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", sample_plants)

print(f"Inserted {len(sample_plants)} sample plants")

In [None]:
# Query the data
df = con.execute("SELECT * FROM plants.facilities").fetchdf()
df

## Export to Parquet

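Superset can query Parquet files through a DuckDB connection using DuckDB's `read_parquet` function, so the exported files can back dashboards without first being loaded into a table.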

In [None]:
# Export to parquet for Superset
con.execute("""
COPY (SELECT * FROM plants.facilities) 
TO '/home/jovyan/data/plants_facilities.parquet' (FORMAT PARQUET)
""")

print("Exported to data/plants_facilities.parquet")

## Next Steps

1. **Add more data**: NRC inspection records, market prices, outage data
2. **Explore in notebooks**: Build analysis workflows
3. **Dashboard in Superset**: Connect DuckDB or parquet files for visualizations

### Connecting Superset to DuckDB

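In Superset, add a database connection with:
- SQLAlchemy URI: `duckdb:////app/data/fissio.duckdb` (the `duckdb:///` prefix followed by the absolute path `/app/data/fissio.duckdb`, which must resolve inside Superset's environment rather than the notebook's `/home/jovyan/data`)
- Or query the Parquet files directly with DuckDB's `read_parquet` function

The `duckdb:///` dialect is provided by the `duckdb-engine` package, which must be installed in Superset's Python environment.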

In [None]:
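# Close the connection when done. DuckDB lets only one process open the file
# for writing, so Superset cannot open fissio.duckdb while this connection is held.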
con.close()