Notebook 3: Feature Engineering & Aggregation
* Goal: Aggregate the cleaned, reactor-level data in df_clean into time-series datasets.

Block 1: Filter for 'Operational' Reactors

For our demand analysis, we only care about reactors that are "Operational" (i.e., consuming fuel). We'll create a new, filtered DataFrame for this.

In [16]:
# Filter df_clean to only include operational reactors
df_operational = df_clean[df_clean['Status'] == 'Operational'].copy()

print("Created 'df_operational' DataFrame with 'Operational' reactors only.")
print(f"Total rows in df_clean: {len(df_clean)}")
print(f"Operational rows in df_operational: {len(df_operational)}")

Created 'df_operational' DataFrame with 'Operational' reactors only.
Total rows in df_clean: 20830
Operational rows in df_operational: 13632
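
The filter above is an exact string match, so a label such as "operational " (different case, trailing space) would be dropped silently. Below is a minimal sketch of a label check that could be run before filtering. The toy df_clean and its values are invented for illustration; only the 'Status' column name and the 'Operational' label come from this notebook.

```python
import pandas as pd

# Toy stand-in for df_clean; rows and values are invented for illustration.
df_clean = pd.DataFrame({
    "Reactor name": ["A-1", "A-1", "B-1", "C-1"],
    "Year": [2000, 2001, 2000, 2000],
    "Status": ["Operational", "Operational", "Under Construction", "operational "],
})

# Count every distinct label, including missing values, to spot variants.
status_counts = df_clean["Status"].value_counts(dropna=False)
print(status_counts)

# Labels that equal 'Operational' only after trimming and lower-casing.
normalized = df_clean["Status"].str.strip().str.lower()
is_near_miss = (normalized == "operational") & (df_clean["Status"] != "Operational")
near_misses = df_clean.loc[is_near_miss, "Status"].unique().tolist()
print(near_misses)
```

If near_misses comes back empty on the real data, the exact match in the cell above is safe as written.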


Block 2: Create Core Demand Time Series

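Now we'll aggregate the operational data at three levels to create our primary datasets: global (by Year), national (by Country and Year), and technology (by reactor Type and Year). Each one sums 'Thermal Capacity, MWt' within its group.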

In [17]:
# 1. Global Demand Time Series (by Year)
global_demand_ts = df_operational.groupby('Year')['Thermal Capacity, MWt'].sum().reset_index()
global_demand_ts = global_demand_ts.rename(columns={'Thermal Capacity, MWt': 'Total Thermal Capacity, MWt'})

# 2. National Demand Time Series (by Country, Year)
national_demand_ts = df_operational.groupby(['Country', 'Year'])['Thermal Capacity, MWt'].sum().reset_index()
national_demand_ts = national_demand_ts.rename(columns={'Thermal Capacity, MWt': 'Total Thermal Capacity, MWt'})

# 3. Technology Demand Time Series (by Type, Year)
tech_demand_ts = df_operational.groupby(['Type', 'Year'])['Thermal Capacity, MWt'].sum().reset_index()
tech_demand_ts = tech_demand_ts.rename(columns={'Thermal Capacity, MWt': 'Total Thermal Capacity, MWt'})

print("Created global, national, and technology time-series DataFrames.")
print("\nGlobal Demand Head:")
print(global_demand_ts.head())

Created global, national, and technology time-series DataFrames.

Global Demand Head:
   Year  Total Thermal Capacity, MWt
0  1969                         4755
1  1970                        11851
2  1971                        17942
3  1972                        32262
4  1973                        50505
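
One way to sanity-check the three aggregates is to roll the national and technology totals back up to the year level and compare them with the global totals. pandas' groupby drops rows whose group key is missing by default, so a reactor-year with no Country would count globally but not nationally. The sketch below uses invented toy data (including one row with a missing Country) to show what such a gap looks like; it is an illustration, not a result from the real dataset.

```python
import pandas as pd

col = "Thermal Capacity, MWt"

# Toy stand-in for df_operational; values are invented for illustration.
df_operational = pd.DataFrame({
    "Country": ["X", "X", "Y", "Y", None],
    "Type": ["PWR", "PWR", "BWR", "BWR", "PWR"],
    "Year": [2000, 2001, 2000, 2001, 2001],
    col: [100, 110, 200, 210, 50],
})

# Year-level totals computed three ways.
global_total = df_operational.groupby("Year")[col].sum()
national_rollup = (
    df_operational.groupby(["Country", "Year"])[col].sum().groupby(level="Year").sum()
)
tech_rollup = (
    df_operational.groupby(["Type", "Year"])[col].sum().groupby(level="Year").sum()
)

# A non-zero gap means some rows were excluded from the finer-grained aggregate.
national_gap = global_total.sub(national_rollup, fill_value=0)
tech_gap = global_total.sub(tech_rollup, fill_value=0)

print(national_gap[national_gap != 0])
print(tech_gap[tech_gap != 0])
```

In this toy example the national gap for 2001 equals the capacity of the row with no Country, while the technology gap is zero everywhere.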


Block 3: Create Pipeline Analysis Time Series

For this, we need to use the full df_clean to see all statuses ("Operational", "Under Construction", etc.). We will count the unique reactors in each category per year.

In [18]:
# Group by Year and Status, then count the number of unique reactors
pipeline_status_ts = df_clean.groupby(['Year', 'Status'])['Reactor name'].nunique().reset_index()
pipeline_status_ts = pipeline_status_ts.rename(columns={'Reactor name': 'Reactor Count'})

print("Created pipeline status time-series DataFrame.")
print(pipeline_status_ts.head())

Created pipeline status time-series DataFrame.
   Year              Status  Reactor Count
0     0  Under Construction             63
1  1954  Permanent Shutdown              1
2  1955  Permanent Shutdown              1
3  1956  Permanent Shutdown              2
4  1957  Permanent Shutdown              5
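
The first row of the output above has Year 0, which is presumably a placeholder for reactors without a recorded year rather than a real date. That is an assumption; the notebook does not say how Year 0 arises. A minimal sketch of setting those rows aside and reshaping the rest into one column per status follows. The toy rows are invented, and the names undated, dated, and pipeline_wide are hypothetical.

```python
import pandas as pd

# Toy stand-in for pipeline_status_ts; rows are invented for illustration.
pipeline_status_ts = pd.DataFrame({
    "Year": [0, 1954, 1955, 1955],
    "Status": ["Under Construction", "Permanent Shutdown",
               "Permanent Shutdown", "Operational"],
    "Reactor Count": [63, 1, 1, 2],
})

# Assumption: Year == 0 marks a missing year, so keep it out of the time axis.
has_year = pipeline_status_ts["Year"] > 0
undated = pipeline_status_ts[~has_year]
dated = pipeline_status_ts[has_year]

# Wide layout: one row per year, one column per status, 0 where a status is absent.
pipeline_wide = (
    dated.pivot(index="Year", columns="Status", values="Reactor Count")
    .fillna(0)
    .astype(int)
)

print(undated)
print(pipeline_wide)
```

Keeping undated as its own frame preserves those reactors for reporting without stretching any year-based plot back to 0.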


Block 4: Save All Aggregated Datasets

Finally, we'll save all our new DataFrames to CSV files. These files are the inputs for the later notebooks (4 through 8).

In [19]:
# Save all the new dataframes to CSV
global_demand_ts.to_csv('global_demand_ts.csv', index=False)
national_demand_ts.to_csv('national_demand_ts.csv', index=False)
tech_demand_ts.to_csv('tech_demand_ts.csv', index=False)
pipeline_status_ts.to_csv('pipeline_status_ts.csv', index=False)

# Also create and save the secondary demand proxy: global electricity supplied per year
global_electricity_ts = df_operational.groupby('Year')['Electricity Supplied, GW.h'].sum().reset_index()
global_electricity_ts.to_csv('global_electricity_ts.csv', index=False)

print("\nNotebook 3 complete. All aggregated datasets saved to CSV files:")
print("- global_demand_ts.csv")
print("- national_demand_ts.csv")
print("- tech_demand_ts.csv")
print("- pipeline_status_ts.csv")
print("- global_electricity_ts.csv")


Notebook 3 complete. All aggregated datasets saved to CSV files:
- global_demand_ts.csv
- national_demand_ts.csv
- tech_demand_ts.csv
- pipeline_status_ts.csv
- global_electricity_ts.csv
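
Several column names here contain a comma (for example 'Total Thermal Capacity, MWt'), so it can be worth confirming that a saved file reads back with the same columns. The sketch below round-trips a two-row frame through a temporary directory; the two rows are taken from the global demand output earlier in this notebook, and the temporary path is only for illustration.

```python
import os
import tempfile

import pandas as pd

# First two rows of the global demand output shown earlier in this notebook.
global_demand_ts = pd.DataFrame({
    "Year": [1969, 1970],
    "Total Thermal Capacity, MWt": [4755, 11851],
})

# Write to a throwaway directory so the real output files are not touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "global_demand_ts.csv")
    global_demand_ts.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Raises an AssertionError if columns, dtypes, or values differ.
pd.testing.assert_frame_equal(global_demand_ts, reloaded)
print(reloaded.columns.tolist())
```

to_csv quotes the header that contains a comma, which is why read_csv recovers two columns rather than three.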
