# Peru SWAN Pipeline: End-to-End Run
This notebook runs the pipeline end to end, one script per cell: data preparation (two steps), rule evaluation, AEP calculation, and summary plots.

**Make sure to set your desired `RUN_PATH` in `config.py` before running.**
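
Outside a notebook, the same sequence could be driven from a single Python script. The following is a minimal sketch, not part of the pipeline: the script paths are taken from the cells of this notebook, while `build_commands` and `run_pipeline` are hypothetical helper names introduced here for illustration.

```python
import os
import subprocess
import sys

# Script paths as they appear in the cells of this notebook, in run order.
PIPELINE_SCRIPTS = [
    "scripts/data_preparation_0.py",
    "scripts/data_preparation_1.py",
    "scripts/rule_evaluation.py",
    "scripts/aep_calculation.py",
    "scripts/plot_pipeline_summary.py",
]


def build_commands(scripts=PIPELINE_SCRIPTS, python=sys.executable):
    """Return one argv list per script (hypothetical helper)."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts=PIPELINE_SCRIPTS, cwd="."):
    """Run each script in order and stop at the first failure."""
    # Same effect as `%env PYTHONPATH=.` in the notebook cells.
    env = dict(os.environ, PYTHONPATH=".")
    for cmd in build_commands(scripts):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, cwd=cwd, env=env, check=True)

# Usage, from the repository root:
#     run_pipeline()
```

Stopping at the first failure matters because each script reads files written by the previous one.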

---

In [7]:
# 1a. Data Preparation (initial processing)
%env PYTHONPATH=.
!python3 scripts/data_preparation_0.py

env: PYTHONPATH=.
✅ Using centralized configuration
🔧 DATA_PREPARATION_0.PY - Initial Processing
🚀 DATA PREPARATION STEP 0 - INITIAL PROCESSING
Run: run_g4
Reference port: PUERTO_ETEN
Coordinates: (-6.933, -79.855)

📊 STEP 1: Loading port closure data...
✅ Loaded 23073 port closure records
Years: [np.int64(2013), np.int64(2014), np.int64(2015), np.int64(2016), np.int64(2017), np.int64(2018), np.int64(2019), np.int64(2020), np.int64(2021), np.int64(2022), np.int64(2023), np.int64(2024), np.int64(2025)]
Ports: 150 unique ports

🌊 STEP 2: Loading SWAN wave data...
Loading SWAN wave data from CSV...
Loaded 36,010 wave data records
Date range: 2013-01-01 to 2025-04-30
SWH range: 0.355 to 4.416 m

📈 STEP 3: Processing SWAN data to daily aggregates...
Processing SWAN data for PUERTO_ETEN...
60th percentile threshold from 2023: 1.63m
✅ SWAN daily processing complete: (2557, 30)

📡 STEP 4: Loading and processing WAVERYS data...
Loading WAVERYS data for reference port: PUERTO_ETEN
Processing WAV
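
The daily aggregation and percentile threshold reported in STEP 3 can be illustrated with a small sketch. It uses only the standard library so that it runs anywhere. The record layout `(timestamp, swh)`, the helper names, and the choice to take the percentile over all sub-daily values of the reference year are assumptions made here, not details confirmed by the log.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty sequence")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def daily_aggregates(records):
    """Collapse (timestamp, swh) records into one summary row per day."""
    by_day = defaultdict(list)
    for ts, swh in records:
        by_day[ts.date()].append(swh)
    return {
        day: {
            "swh_mean": mean(vals),
            "swh_max": max(vals),
            "swh_min": min(vals),
            "swh_median": median(vals),
            "swh_p80": percentile(vals, 80),
        }
        for day, vals in sorted(by_day.items())
    }


def year_threshold(records, year, q=60):
    """q-th percentile of SWH over one reference year (assumed definition)."""
    return percentile([swh for ts, swh in records if ts.year == year], q)


# Synthetic records, for illustration only.
sample = [
    (datetime(2023, 1, 1, 0), 1.0),
    (datetime(2023, 1, 1, 12), 2.0),
    (datetime(2023, 1, 2, 0), 3.0),
]
daily = daily_aggregates(sample)
threshold = year_threshold(sample, 2023)
```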

In [8]:
# 1b. Data Preparation (enhanced processing)
%env PYTHONPATH=.
!python3 scripts/data_preparation_1.py

env: PYTHONPATH=.
✅ Using centralized configuration

🔧 DATA_PREPARATION_1.PY - Enhanced Processing

STEP 1: Loading and validating daily data
✓ Loaded SWAN daily data: (2557, 30)
✓ Loaded WAVERYS data for PUERTO_ETEN: (2557, 41)
SWAN data date range: 2018-01-01 to 2024-12-31
WAVERYS data date range: 2018-01-01 to 2024-12-31
WAVERYS ports: ['PUERTO_ETEN']

STEP 2: Detrending and deseasonalizing reference point data
Input data shape: (2557, 30)
Apply detrending: True
Apply deseasonalizing: True
Processing 21 wave features...
Example features: ['swh_mean', 'swh_max', 'swh_min', 'swh_median', 'swh_p80']
Vectorized processing of 21 features...
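
As a rough illustration of what detrending and deseasonalizing a daily series involves, the sketch below removes a least-squares linear trend and then subtracts a day-of-year climatology. The linear trend, the day-of-year grouping, and all function names are assumptions made here. The pipeline's actual method is not shown in the log, although feature names such as `clima_swh_mean_swan` and `anom_swh_mean_swan` suggest a climatology/anomaly split.

```python
from collections import defaultdict
from datetime import date


def linear_fit(y):
    """Least-squares intercept and slope of y against t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx if sxx else 0.0
    return y_mean - slope * t_mean, slope


def detrend(y):
    """Subtract the fitted linear trend while keeping the series mean."""
    intercept, slope = linear_fit(y)
    y_mean = sum(y) / len(y)
    return [v - (intercept + slope * t) + y_mean for t, v in enumerate(y)]


def deseasonalize(dates, y):
    """Return (climatology, anomaly) using the mean per day of year."""
    by_doy = defaultdict(list)
    for d, v in zip(dates, y):
        by_doy[d.timetuple().tm_yday].append(v)
    clima = {doy: sum(vals) / len(vals) for doy, vals in by_doy.items()}
    climatology = [clima[d.timetuple().tm_yday] for d in dates]
    anomaly = [v - c for v, c in zip(y, climatology)]
    return climatology, anomaly


# Synthetic example: a pure linear ramp detrends to a constant.
ramp = [1.0 + 0.5 * t for t in range(10)]
flat = detrend(ramp)
clima, anom = deseasonalize([date(2018, 1, 1), date(2019, 1, 1)], [1.0, 3.0])
```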

STEP 3: Enhanced feature engineering on reference point data
Input shape: (2557, 73)
Use processed features: True
Creating enhanced features from 64 processed features...
  Creating persistence features (memory-efficient)...
  Creating trend features (memory-efficient)...
    Computing slopes for window 3...
    Computing slopes for window 5...
    C
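
The "slopes for window N" lines suggest a rolling trend feature. One plausible form is the least-squares slope over a trailing window, sketched below. The trailing alignment and the `rolling_slope` name are assumptions made here; the pipeline's own definition is not visible in the truncated log.

```python
def rolling_slope(values, window):
    """Trailing-window least-squares slope; None until the window is full."""
    if window < 2:
        raise ValueError("window must be at least 2")
    # Time offsets 0..window-1 are the same for every window position.
    t_mean = (window - 1) / 2.0
    sxx = sum((t - t_mean) ** 2 for t in range(window))
    out = []
    for end in range(len(values)):
        if end + 1 < window:
            out.append(None)
            continue
        chunk = values[end + 1 - window : end + 1]
        y_mean = sum(chunk) / window
        sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(chunk))
        out.append(sxy / sxx)
    return out


# Synthetic example: a series rising by 1 per day has slope 1 in every window.
slopes_3 = rolling_slope([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

Because only past values enter each window, a feature built this way does not leak future information into a time-ordered cross-validation.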

In [9]:
# 2. Rule Evaluation (CV, ML, Thresholds)
%env PYTHONPATH=.
!python3 scripts/rule_evaluation.py

env: PYTHONPATH=.

🔧 RULE_EVALUATION.PY - CV Pipeline
Run: run_g4
Reference port: PUERTO_ETEN
🔎 Looking for merged features at: /Users/ageidv/suyana/peru_swan/wave_analysis_pipeline/data/processed/run_g4/df_swan_waverys_merged.csv
✅ Loaded merged features: /Users/ageidv/suyana/peru_swan/wave_analysis_pipeline/data/processed/run_g4/df_swan_waverys_merged.csv ((2557, 8497))

🚦 Running enhanced rule evaluation pipeline ...
🚀 FAST ENHANCED CV PIPELINE
📊 Dataset: 2557 samples, 384 events (15.0%)

CREATING CROSS-VALIDATION SPLITS
📅 Using TimeSeriesSplit with 6 folds
🔍 FEATURE COMPARISON DEBUG:
Total columns in dataset: 8497
Excluded columns: ['date', 'port_name', 'event_dummy_1', 'total_obs']
Features being used: 8494
First 20 features:
   1. swh_mean_swan
   2. swh_max_swan
   3. swh_min_swan
   4. swh_median_swan
   5. swh_p80_swan
   6. swh_p25_swan
   7. swh_p75_swan
   8. swh_p60
   9. swh_sd_swan
  10. clima_swh_mean_swan
  11. anom_swh_mean_swan
  12. anom_swh_max_swan
  13. anom_swh_
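
The cross-validation setup reported above (2557 samples, 6 time-ordered folds) and the scoring of a single threshold rule can be sketched in plain Python. The split below uses an expanding training window with equal-sized test blocks, which is assumed to mirror the default behaviour of scikit-learn's `TimeSeriesSplit`. Whether the pipeline uses that class with default settings, and which metric it optimises, are assumptions, and both function names are hypothetical.

```python
def time_series_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with an expanding training window."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for k in range(n_splits):
        start = first_test_start + k * test_size
        # Training data always precede the test block in time.
        yield range(0, start), range(start, start + test_size)


def score_rule(values, events, threshold):
    """Precision, recall and F1 of the rule `value > threshold` vs 0/1 events."""
    tp = fp = fn = 0
    for v, e in zip(values, events):
        fired = v > threshold
        if fired and e:
            tp += 1
        elif fired:
            fp += 1
        elif e:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Fold sizes for the dataset dimensions printed in the log.
splits = list(time_series_splits(2557, 6))

# Synthetic values and events, for illustration only.
scores = score_rule([1.0, 2.0, 3.0, 0.5], [0, 1, 1, 1], threshold=1.5)
```

With 2557 samples and 6 folds, each test block under this scheme covers 365 days, so every fold is evaluated on roughly one year of data.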

In [10]:
# 3. AEP Calculation (Rule & ML)
%env PYTHONPATH=.
!python3 scripts/aep_calculation.py

env: PYTHONPATH=.
[AEP] Aggregated N_PARAM (fishermen): 5236.0
[AEP] Aggregated W_PARAM (wage): 14.285714285714286

🔧 AEP_CALCULATION.PY - Final AEP Analysis
Run: run_g4
Reference port: PUERTO_ETEN
✅ Best single rule: swh_median_waverys > 1.454

🚀 Running speed-optimized AEP simulation...
🚀 SPEED-OPTIMIZED UNIFIED AEP ANALYSIS
  Data: 2557 observations
  Trigger: swh_median_waverys > 1.4537083333333332
  Port: 5236.0 fishermen × $14.285714285714286/day
  Min event: 1 days
  Block length: 7 days
  Simulations: 4000
  Using 2557 days for simulation.
  Observed events: 384 out of 2557 days
  Pre-computing trigger values...
  Generating block bootstrap samples...
  Pre-computing valid block positions...
  Generating all simulation indices...
  Processing 4000 simulations using 12 parallel workers...
  Using 12 threads for parallel processing...
Processing batches: 100%|███████████████████████| 13/13 [00:00<00:00, 43.69it/s]
  Completed 4000 simulations successfully.
  Calculating standard 
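
The simulation settings printed above (7-day blocks, 4000 simulations, a daily trigger, and a per-day cost of fishermen × wage) fit a block-bootstrap estimate of annual losses. The sketch below is a simplified stand-in under stated assumptions: one simulated year is 365 days, each triggered day costs `n_fishermen * daily_wage`, and blocks are drawn uniformly with replacement. It omits the minimum-event-length logic and the parallel batching, and all function names are hypothetical.

```python
import random


def block_bootstrap_year(n_obs, days_per_year=365, block_length=7, rng=random):
    """Indices for one synthetic year built from contiguous observed blocks."""
    if n_obs < block_length:
        raise ValueError("need at least one full block of observations")
    last_start = n_obs - block_length  # last position where a full block fits
    indices = []
    while len(indices) < days_per_year:
        start = rng.randint(0, last_start)  # inclusive on both ends
        indices.extend(range(start, start + block_length))
    return indices[:days_per_year]


def simulate_annual_losses(trigger, n_fishermen, daily_wage,
                           n_sims=4000, block_length=7, seed=0):
    """Annual loss per simulated year; `trigger` holds 0/1 flags per day."""
    rng = random.Random(seed)
    daily_loss = n_fishermen * daily_wage
    losses = []
    for _ in range(n_sims):
        idx = block_bootstrap_year(len(trigger), block_length=block_length,
                                   rng=rng)
        losses.append(daily_loss * sum(trigger[i] for i in idx))
    return losses


def exceedance_probability(losses, level):
    """Share of simulated years whose loss exceeds `level`."""
    return sum(loss > level for loss in losses) / len(losses)


# Synthetic check: if every day triggers, every year costs 365 daily losses.
always = simulate_annual_losses([1] * 100, n_fishermen=2, daily_wage=10.0,
                                n_sims=5)
```

Resampling whole blocks instead of single days keeps multi-day events together, which is the usual reason for choosing a block bootstrap on autocorrelated data.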

In [11]:
# 4. Plot Pipeline Summary
%env PYTHONPATH=.
!python3 scripts/plot_pipeline_summary.py

env: PYTHONPATH=.
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g4/swh_max_swan_vs_waverys_events.png
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g4/densities_swan_vs_waverys.png
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g4/anom_swh_max_swan_vs_waverys_events.png
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g4/densities_anom_swh_max_swan_vs_waverys.png
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g4/scatter_swh_max_swan_vs_waverys.png
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g4/scatter_anom_swh_max_swan_vs_waverys.png
DEBUG: observed losses loaded:
   year  observed_loss
0  2018      5385600.0
1  2019      4936800.0
2  2020      6956400.0
3  2021      6358000.0
4  2022      2169200.0
5  2023      1271600.0
6  2024      1645600.0
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g4/aep_with_observed_losses.png
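
The observed losses above can be cross-checked against the parameters printed by the AEP cell. The figures are consistent with an annual loss equal to the number of event days times `N_PARAM` times `W_PARAM`. That formula is an inference from the printed numbers, not something the log states. A short check:

```python
# Values copied from the AEP and plotting cell outputs in this notebook.
N_PARAM = 5236.0               # fishermen
W_PARAM = 14.285714285714286   # daily wage
observed_loss = {
    2018: 5385600.0,
    2019: 4936800.0,
    2020: 6956400.0,
    2021: 6358000.0,
    2022: 2169200.0,
    2023: 1271600.0,
    2024: 1645600.0,
}

# Assumed relation: loss = event_days * N_PARAM * W_PARAM.
daily_loss = N_PARAM * W_PARAM
implied_days = {
    year: round(loss / daily_loss) for year, loss in observed_loss.items()
}
total_days = sum(implied_days.values())

for year, days in implied_days.items():
    print(year, days)
print("total event days:", total_days)
```

Under this assumed relation the implied event days sum to 384, which matches the "384 events" reported by the rule evaluation and AEP cells.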
