# Peru SWAN Pipeline: End-to-End Run
This notebook runs each major pipeline script in a separate cell.

**Make sure to set your desired `RUN_PATH` in `config.py` before running.**
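
Every cell below repeats the same three `%env` lines before calling one script. As a rough alternative, the sketch below sets the environment once and runs the scripts in order from plain Python. The script list is taken from the cells in this notebook; `build_env` and `run_pipeline` are hypothetical helper names, and whether the scripts read only these three variables is an assumption.

```python
import os
import subprocess
import sys

# Script order as it appears in this notebook's cells
SCRIPTS = [
    "scripts/data_preparation_0.py",
    "scripts/data_preparation_1.py",
    "scripts/rule_evaluation.py",
    "scripts/aep_calculation.py",
    "scripts/aep_ml_calculation.py",
    "scripts/aep_calculation_experiment.py",
    "scripts/plot_pipeline_summary.py",
]


def build_env(run_path="run_g6", min_days=1, base=None):
    """Return a copy of the environment with the three pipeline variables set."""
    env = dict(os.environ if base is None else base)
    env.update({"PYTHONPATH": ".", "RUN_PATH": run_path, "MIN_DAYS": str(min_days)})
    return env


def run_pipeline(scripts=SCRIPTS, **env_kwargs):
    """Run each script in order; check=True stops at the first failing step."""
    env = build_env(**env_kwargs)
    for script in scripts:
        subprocess.run([sys.executable, script], env=env, check=True)
```

Calling `run_pipeline(run_path="run_g6", min_days=1)` from the repository root would then mirror the cell-by-cell run, at the cost of losing per-cell output.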

---

In [60]:
# 1. Data Preparation
%env PYTHONPATH=.
%env RUN_PATH=run_g6
%env MIN_DAYS=1
!python3 scripts/data_preparation_0.py

env: PYTHONPATH=.
env: RUN_PATH=run_g6
env: MIN_DAYS=1
✅ Using centralized configuration
🔧 DATA_PREPARATION_0.PY - Initial Processing
🚀 DATA PREPARATION STEP 0 - INITIAL PROCESSING
Run: run_g6
Reference port: ANCON
Coordinates: (-11.828, -77.131)

📊 STEP 1: Loading port closure data...
✅ Loaded 23073 port closure records
Years: [np.int64(2013), np.int64(2014), np.int64(2015), np.int64(2016), np.int64(2017), np.int64(2018), np.int64(2019), np.int64(2020), np.int64(2021), np.int64(2022), np.int64(2023), np.int64(2024), np.int64(2025)]
Ports: 150 unique ports

🌊 STEP 2: Loading SWAN wave data...
Loading SWAN wave data from CSV...
Loaded 36,017 wave data records
Date range: 2013-01-01 to 2025-04-30
SWH range: 0.717 to 3.540 m

📈 STEP 3: Processing SWAN data to daily aggregates...
Processing SWAN data for ANCON...
60th percentile threshold from 2023: 1.56m
✅ SWAN daily processing complete: (2557, 30)

📡 STEP 4: Loading and processing WAVERYS data...
Loading WAVERYS d
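
Step 3 in the log reports a 60th-percentile threshold taken from 2023 and a daily table of aggregates. A minimal sketch of how such a threshold and a few daily statistics might be computed with pandas follows. The column names `time` and `swh`, the `frac_above` feature, and the function name are illustrative assumptions; the actual logic in `data_preparation_0.py` is not shown in this notebook.

```python
import numpy as np
import pandas as pd


def daily_aggregates(df, ref_year=2023, q=0.60):
    """Aggregate sub-daily SWH to daily stats and derive a percentile threshold.

    Assumes columns `time` (datetime64) and `swh` (metres); both are hypothetical names.
    """
    df = df.copy()
    df["date"] = df["time"].dt.normalize()
    # Threshold: q-th quantile of all SWH values observed in the reference year
    threshold = df.loc[df["time"].dt.year == ref_year, "swh"].quantile(q)
    daily = df.groupby("date")["swh"].agg(
        swh_mean="mean", swh_max="max", swh_min="min", swh_median="median"
    )
    # Fraction of each day's observations that exceed the threshold
    daily["frac_above"] = (df["swh"] > threshold).groupby(df["date"]).mean()
    return daily, float(threshold)


# Synthetic 3-hourly data for illustration only (not the pipeline's data)
rng = np.random.default_rng(0)
times = pd.date_range("2023-01-01", "2023-01-10 21:00", freq="3h")
demo = pd.DataFrame({"time": times, "swh": rng.uniform(0.7, 3.5, len(times))})
daily, thr = daily_aggregates(demo)
```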

In [61]:
# 1b. Data Preparation (Enhanced Processing)
%env PYTHONPATH=.
%env RUN_PATH=run_g6
%env MIN_DAYS=1
!python3 scripts/data_preparation_1.py

env: PYTHONPATH=.
env: RUN_PATH=run_g6
env: MIN_DAYS=1
✅ Using centralized configuration

🔧 DATA_PREPARATION_1.PY - Enhanced Processing

STEP 1: Loading and validating daily data
✓ Loaded SWAN daily data: (2557, 30)
✓ Loaded WAVERYS data for ANCON: (2564, 41)
SWAN data date range: 2018-01-01 to 2024-12-31
WAVERYS data date range: 2018-01-01 to 2024-12-31
WAVERYS ports: ['ANCON']

STEP 2: Detrending and deseasonalizing reference point data
Input data shape: (2557, 30)
Apply detrending: True
Apply deseasonalizing: True
Processing 21 wave features...
Example features: ['swh_mean', 'swh_max', 'swh_min', 'swh_median', 'swh_p80']
Vectorized processing of 21 features...

STEP 3: Enhanced feature engineering on reference point data
Input shape: (2557, 73)
Use processed features: True
Creating enhanced features from 64 processed features...
  Creating persistence features (memory-efficient)...
  Creating trend features (memory-efficient)...
    Computing slopes for window 3...
    Comp
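
Step 2 above detrends and deseasonalizes each wave feature, but the log does not say how. One common approach, sketched here purely as an assumption, is to remove a least-squares linear trend and then subtract a day-of-year climatology, leaving an anomaly series. The function name and the output column names (`trend`, `clima`, `anom`) are hypothetical and only loosely echo the `clima_` and `anom_` prefixes seen in later feature names.

```python
import numpy as np
import pandas as pd


def detrend_deseasonalize(s):
    """Split a daily series (DatetimeIndex) into trend, climatology, and anomaly."""
    t = np.arange(len(s), dtype=float)
    # Linear trend fitted by ordinary least squares
    slope, intercept = np.polyfit(t, s.to_numpy(), 1)
    trend = pd.Series(slope * t + intercept, index=s.index)
    detrended = s - trend
    # Climatology: mean of the detrended series for each day of year
    clima = detrended.groupby(s.index.dayofyear).transform("mean")
    anom = detrended - clima
    return pd.DataFrame({"trend": trend, "clima": clima, "anom": anom})


# Synthetic series over the date range reported in the log (2018-2024)
idx = pd.date_range("2018-01-01", "2024-12-31", freq="D")
rng = np.random.default_rng(1)
values = (
    1.5
    + 1e-4 * np.arange(len(idx))
    + 0.3 * np.sin(2 * np.pi * idx.dayofyear.to_numpy() / 365.25)
    + rng.normal(0.0, 0.1, len(idx))
)
swh = pd.Series(values, index=idx, name="swh_mean")
parts = detrend_deseasonalize(swh)
```

By construction the three components sum back to the input, so the decomposition loses no information.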

In [62]:
# 2. Rule Evaluation (CV, ML, Thresholds)
%env PYTHONPATH=.
%env RUN_PATH=run_g6
%env MIN_DAYS=1
!python3 scripts/rule_evaluation.py

env: PYTHONPATH=.
env: RUN_PATH=run_g6
env: MIN_DAYS=1

🔧 RULE_EVALUATION.PY - CV Pipeline
Run: run_g6
Reference port: ANCON
🔎 Looking for merged features at: /Users/ageidv/suyana/peru_swan/wave_analysis_pipeline/data/processed/run_g6/df_swan_waverys_merged.csv
✅ Loaded merged features: /Users/ageidv/suyana/peru_swan/wave_analysis_pipeline/data/processed/run_g6/df_swan_waverys_merged.csv ((2564, 8497))

🚦 Running enhanced rule evaluation pipeline ...
🚀 FAST ENHANCED CV PIPELINE
📊 Dataset: 2564 samples, 270 events (10.5%)

CREATING CROSS-VALIDATION SPLITS
📅 Using TimeSeriesSplit with 6 folds
🔍 FEATURE COMPARISON DEBUG:
Total columns in dataset: 8497
Excluded columns: ['date', 'port_name', 'event_dummy_1', 'total_obs']
Features being used: 8494
First 20 features:
   1. swh_mean_swan
   2. swh_max_swan
   3. swh_min_swan
   4. swh_median_swan
   5. swh_p80_swan
   6. swh_p25_swan
   7. swh_p75_swan
   8. swh_p60
   9. swh_sd_swan
  10. clima_swh_mean_swan
  11. anom
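
The log mentions `TimeSeriesSplit` with 6 folds, and the next step reports a best rule of the form `feature > threshold`. The sketch below shows how a single-feature threshold might be tuned on each training window and scored on the later held-out block using scikit-learn. Scoring by F1, the quantile grid, and the function name `cv_threshold_rule` are assumptions; the selection criterion used by `rule_evaluation.py` is not visible here.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import TimeSeriesSplit


def cv_threshold_rule(x, y, n_splits=6):
    """Return (threshold, test F1) per fold for the rule `x > threshold`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    results = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(x):
        # Candidate thresholds come from the training window only (no look-ahead)
        grid = np.quantile(x[train_idx], np.linspace(0.5, 0.99, 50))
        scores = [
            f1_score(y[train_idx], (x[train_idx] > thr).astype(int), zero_division=0)
            for thr in grid
        ]
        best = float(grid[int(np.argmax(scores))])
        test_f1 = f1_score(y[test_idx], (x[test_idx] > best).astype(int), zero_division=0)
        results.append((best, float(test_f1)))
    return results


# Synthetic feature and event labels, same length as the logged dataset
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.3, size=2564)
y = (x + rng.normal(0.0, 0.1, size=2564) > 0.35).astype(int)
folds = cv_threshold_rule(x, y)
```

Because `TimeSeriesSplit` always trains on earlier samples and tests on later ones, each fold's score reflects out-of-time performance rather than a shuffled split.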

In [63]:
# 3. AEP Calculation (Rule)
%env PYTHONPATH=.
%env RUN_PATH=run_g6
%env MIN_DAYS=1
!python3 scripts/aep_calculation.py

env: PYTHONPATH=.
env: RUN_PATH=run_g6
env: MIN_DAYS=1
[AEP] Aggregated N_PARAM (fishermen): 2827.0
[AEP] Aggregated W_PARAM (wage): 11.171428571428573

🔧 AEP_CALCULATION.PY - Final AEP Analysis
Run: run_g6
Reference port: ANCON
✅ Best single rule: anom_swh_mean_waverys > 0.173

🚀 Running speed-optimized AEP simulation...
🚀 SPEED-OPTIMIZED UNIFIED AEP ANALYSIS
  Data: 2564 observations
  Trigger: anom_swh_mean_waverys > 0.1726740234374999
  Port: 2827.0 fishermen × $11.171428571428573/day
  Min event: 1 days
  Block length: 7 days
  Simulations: 4000
  Using 2564 days for simulation.
  Observed events: 270 out of 2564 days
  Pre-computing trigger values...
  Generating block bootstrap samples...
  Pre-computing valid block positions...
  Generating all simulation indices...
  Processing 4000 simulations using 12 parallel workers...
  Using 12 threads for parallel processing...
Processing batches: 100%|██████████████████
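
The parameters in this log (block length 7, 4000 simulations, fishermen × daily wage) point to a block bootstrap over the daily trigger series. The sketch below is one way such a simulation could look, under stated assumptions: overlapping blocks with uniformly drawn start days, a 365-day simulated year, and a loss equal to fishermen × wage × triggered days. The minimum-event filter is omitted (it is 1 day in this run), the trigger series is synthetic, and the function name is hypothetical, so this is not the script's implementation.

```python
import numpy as np


def block_bootstrap_annual_loss(trigger, n_fishermen, daily_wage,
                                block_len=7, n_sims=4000, days_per_year=365, seed=0):
    """Simulate annual losses by resampling blocks of a daily boolean trigger."""
    trigger = np.asarray(trigger, dtype=bool)
    rng = np.random.default_rng(seed)
    n_blocks = -(-days_per_year // block_len)  # ceiling division
    # Uniform start positions such that every block fits inside the record
    starts = rng.integers(0, len(trigger) - block_len + 1, size=(n_sims, n_blocks))
    # Expand each start into block_len consecutive day indices, then trim to one year
    idx = (starts[:, :, None] + np.arange(block_len)).reshape(n_sims, -1)
    idx = idx[:, :days_per_year]
    event_days = trigger[idx].sum(axis=1)
    return event_days * n_fishermen * daily_wage


# Synthetic trigger with roughly the logged event rate (10.5%)
rng = np.random.default_rng(7)
trigger = rng.random(2564) < 0.105
losses = block_bootstrap_annual_loss(trigger, n_fishermen=2827.0, daily_wage=11.17)

mean_loss = losses.mean()
loss_1_in_100 = np.quantile(losses, 0.99)  # exceeded in about 1% of simulated years
```

Resampling whole blocks rather than single days keeps short runs of consecutive trigger days together, which matters when closures cluster in time.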

In [64]:
# 4. AEP Calculation (ML)
%env PYTHONPATH=.
%env RUN_PATH=run_g6
%env MIN_DAYS=1
!python3 scripts/aep_ml_calculation.py

env: PYTHONPATH=.
env: RUN_PATH=run_g6
env: MIN_DAYS=1
[ML-AEP] N_PARAM (fishermen): 1511.0
[ML-AEP] W_PARAM (wage): 56.0

🎲 CORRECTED ML AEP CALCULATOR - Independent Daily Draws
This uses the THEORETICALLY CORRECT approach for ML probabilities:
• Each day's probability = P(port closes | features)
• Independent random draws (no block bootstrap)
• Temporal correlation already captured in ML features
Run: run_g6
Reference port: ANCON
✅ Using ML probabilities: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g6/ML_probs_2024.csv
✅ Loaded ML probabilities: (366, 9)
✅ Merged data: (366, 8497)
   Date range: 2024-01-01 00:00:00 to 2024-12-31 00:00:00
✅ Found observed events: 37 out of 366 days

🎲 Running CORRECTED ML AEP simulation...
🎲 CORRECTED ML AEP ANALYSIS (Independent Daily Draws)
  Daily probabilities: 366 days
  Probability range: 0.0295 to 0.3415
  Mean probability: 0.0930
  Simulations: 4000
  Economic params: 1511.0 fishermen × $56.0
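
The approach described in this output treats each day as an independent Bernoulli draw with the model's closure probability. A minimal sketch of that idea follows, assuming the annual loss is fishermen × wage × simulated closure days. The probabilities here are synthetic values drawn within the logged range rather than the contents of `ML_probs_2024.csv`, and the function name is hypothetical.

```python
import numpy as np


def simulate_ml_annual_loss(probs, n_fishermen, daily_wage, n_sims=4000, seed=0):
    """Draw each day's closure independently with its own probability."""
    probs = np.asarray(probs, dtype=float)
    rng = np.random.default_rng(seed)
    # One row per simulated year, one column per day; broadcast against probs
    closed = rng.random((n_sims, probs.size)) < probs
    return closed.sum(axis=1) * n_fishermen * daily_wage


# Synthetic daily probabilities (illustrative only)
rng = np.random.default_rng(42)
probs = rng.uniform(0.03, 0.34, size=366)
losses = simulate_ml_annual_loss(probs, n_fishermen=1511.0, daily_wage=56.0)

# With independent draws the expected loss has a closed form: N * W * sum(p_i)
expected_loss = probs.sum() * 1511.0 * 56.0
```

Comparing `losses.mean()` with `expected_loss` is a quick sanity check on the simulation, since the two should agree up to Monte Carlo noise.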

In [65]:
# 5. AEP Calculation for Multiple Conditions

%env PYTHONPATH=.
%env RUN_PATH=run_g6
%env MIN_DAYS=1
!python3 scripts/aep_calculation_experiment.py

env: PYTHONPATH=.
env: RUN_PATH=run_g6
env: MIN_DAYS=1
[AEP] Aggregated N_PARAM (fishermen): 2827.0
[AEP] Aggregated W_PARAM (wage): 11.171428571428573

🚀 ENHANCED FAST MULTI-RULE AEP ANALYSIS
Run: run_g6
Reference port: ANCON

📋 Generating rule combinations...
  Testing 20 double rule combinations...

--- 1/20: double_AND ---
  🚀 Fast multi-rule AEP: 2 features, 4000 sims
    ✅ Mean loss: $3,576,596

--- 2/20: double_OR ---
  🚀 Fast multi-rule AEP: 2 features, 4000 sims
    ✅ Mean loss: $4,029,950

--- 3/20: double_AND ---
  🚀 Fast multi-rule AEP: 2 features, 4000 sims
    ✅ Mean loss: $4,029,950

--- 4/20: double_OR ---
  🚀 Fast multi-rule AEP: 2 features, 4000 sims
    ✅ Mean loss: $4,029,950

--- 5/20: double_AND ---
  🚀 Fast multi-rule AEP: 2 features, 4000 sims
    ✅ Mean loss: $3,654,144

--- 6/20: double_OR ---
  🚀 Fast multi-rule AEP: 2 features, 4000 sims
    ✅ Mean loss: $4,029,950

--- 7/20: double_AND ---
  🚀 Fast multi-rule AEP: 2 fe
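
The `double_AND` and `double_OR` labels suggest that two threshold conditions are combined into one daily trigger. A small sketch of that combination follows; the feature names are borrowed from elsewhere in this notebook, while the values, thresholds, and the `combine_rules` helper are illustrative assumptions.

```python
import numpy as np


def combine_rules(features, rules, mode="AND"):
    """Combine `feature > threshold` conditions into one boolean trigger per day.

    features: mapping of feature name -> array of daily values
    rules:    list of (feature_name, threshold) pairs
    """
    masks = [np.asarray(features[name]) > thr for name, thr in rules]
    if mode == "AND":
        return np.logical_and.reduce(masks)
    if mode == "OR":
        return np.logical_or.reduce(masks)
    raise ValueError(f"unknown mode: {mode!r}")


# Four illustrative days with made-up values and thresholds
features = {
    "anom_swh_mean_waverys": [0.10, 0.20, 0.30, 0.05],
    "swh_max_swan": [1.0, 2.5, 1.2, 3.0],
}
rules = [("anom_swh_mean_waverys", 0.15), ("swh_max_swan", 2.0)]

and_mask = combine_rules(features, rules, mode="AND")
or_mask = combine_rules(features, rules, mode="OR")
```

Every day that satisfies the AND rule also satisfies the OR rule, so for the same pair of conditions an OR trigger fires at least as often. That would be consistent with the pattern in the output above, where each OR loss is no smaller than the AND loss printed before it, although the log does not state that consecutive entries share a feature pair.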

In [67]:
# 6. Plot Pipeline Summary
%env PYTHONPATH=.
%env RUN_PATH=run_g6
%env MIN_DAYS=1
!python3 scripts/plot_pipeline_summary.py


env: PYTHONPATH=.
env: RUN_PATH=run_g6
env: MIN_DAYS=1
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g6/swh_max_swan_vs_waverys_events.png
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g6/densities_swan_vs_waverys.png
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g6/anom_swh_max_swan_vs_waverys_events.png
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g6/densities_anom_swh_max_swan_vs_waverys.png
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g6/scatter_swh_max_swan_vs_waverys.png
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/run_g6/scatter_anom_swh_max_swan_vs_waverys.png
DEBUG: observed losses loaded:
   year  observed_loss
0  2018   1.326428e+06
1  2019   5.368877e+05
2  2020   7.263775e+05
3  2021   9.474489e+05
4  2022   1.800153e+06
5  2023   2.021224e+06
6  2024   1.168520e+06
✅ Saved plot: /Users/ageidv/suyana/peru_swan/results/cv_results/r