# Calibration Tutorial - Crane, OR - Irrigated Flux Plot

## Step 1: Uncalibrated Model Run

This tutorial focuses on calibrating SWIM-RS for a single irrigated alfalfa plot at the S2 flux station in Crane, Oregon. Unlike the unirrigated Fort Peck example, this site is actively irrigated.

This notebook demonstrates:
1. Loading pre-built model input data
2. Running the uncalibrated SWIM model
3. Comparing model output with SSEBop ETf

**Note:** The Earth Engine data extraction step has been pre-computed. The `data/prepped_input.json` file contains all necessary input data.

In [None]:
import os
import sys
import time
import zipfile

import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score

root = os.path.abspath('../..')
sys.path.append(root)

from swimrs.swim.config import ProjectConfig
from swimrs.swim.sampleplots import SamplePlots
from swimrs.model.obs_field_cycle import field_day_loop

from swimrs.viz.swim_timeseries import plot_swim_timeseries

%matplotlib inline

## 1. Project Setup

Define paths and unzip pre-built data if needed.

In [None]:
project_ws = os.path.abspath('.')
data = os.path.join(project_ws, 'data')

config_file = os.path.join(project_ws, '3_Crane.toml')
prepped_input = os.path.join(data, 'prepped_input.json')

# Unzip data files if they haven't been extracted
prepped_zip = os.path.join(data, 'prepped_input.zip')

if os.path.exists(prepped_zip) and not os.path.exists(prepped_input):
    print("Extracting prepped_input.zip...")
    with zipfile.ZipFile(prepped_zip, 'r') as z:
        z.extractall(data)

print(f"Project workspace: {project_ws}")
print(f"Config file: {config_file}")
print(f"Input data: {prepped_input}")
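
Before running the model, it can help to glance at what the prepared input file holds. The sketch below is a generic helper and not part of SWIM-RS: `summarize_json` is a hypothetical name, and the demo file with its keys (`sites`, `meta`, `values`) is an invented stand-in, since the actual layout of `prepped_input.json` is not described in this notebook.

```python
import json
import os
import tempfile


def summarize_json(path, max_keys=10):
    """Map each top-level key of a JSON file to the type name of its value."""
    with open(path, 'r') as f:
        payload = json.load(f)
    if not isinstance(payload, dict):
        # Non-dict roots (e.g. a list) are reported under a placeholder key
        return {'<root>': type(payload).__name__}
    return {k: type(v).__name__ for k, v in list(payload.items())[:max_keys]}


# Demo on a small stand-in file; these keys are made up for illustration
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo_input.json')
    with open(demo_path, 'w') as f:
        json.dump({'sites': ['S2'], 'meta': {'crop': 'alfalfa'}, 'values': [1.0, 2.0]}, f)
    summary = summarize_json(demo_path)

print(summary)
```

In this notebook, `summarize_json(prepped_input)` would list the top-level keys of the real file without printing its full contents.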

In [None]:
# Load the project configuration
config = ProjectConfig()
config.read_config(config_file, project_ws)

## 2. About the Study Site

The S2 site is an irrigated alfalfa field in Crane, Oregon. According to IrrMapper data, this location has been irrigated since about 1996, making it a good test case for the irrigation scheduling component of SWIM-RS.

In [None]:
selected_feature = 'S2'

print(f"Site: {selected_feature}")
print("Location: Crane, Oregon")
print("Crop: Irrigated alfalfa")
print(f"Date range: {config.start_dt} to {config.end_dt}")

## 3. Run the Uncalibrated Model

We define a helper function to run the SWIM model and capture its output.

In [None]:
def run_fields(ini_path, project_ws, selected_feature, output_csv, forecast=False):
    """Run SWIM model and save combined input/output to CSV."""
    start_time = time.time()

    config = ProjectConfig()
    config.read_config(ini_path, project_ws, forecast=forecast)

    fields = SamplePlots()
    fields.initialize_plot_data(config)
    fields.output = field_day_loop(config, fields, debug_flag=True)

    end_time = time.time()
    print(f'\nExecution time: {end_time - start_time:.2f} seconds\n')

    out_df = fields.output[selected_feature].copy()
    in_df = fields.input_to_dataframe(selected_feature)
    df = pd.concat([out_df, in_df], axis=1, ignore_index=False)
    
    # Drop rows of NaN output that precede the start of the model run
    df = df.loc[config.start_dt:config.end_dt]
    
    df.to_csv(output_csv)
    return df
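
The helper above joins model output and input column-wise and then trims to the configured date range. The following sketch reproduces that pattern on synthetic data to show why the trim is needed: when the input series starts earlier than the output, `pd.concat(..., axis=1)` aligns on the index and fills the missing output rows with NaN. The dates and values are invented, and `et_act` and `ppt` are used only as familiar column names.

```python
import numpy as np
import pandas as pd

# Synthetic input starts two days before the synthetic model output
idx_in = pd.date_range('2003-12-30', '2004-01-05', freq='D')
idx_out = pd.date_range('2004-01-01', '2004-01-05', freq='D')

in_df = pd.DataFrame({'ppt': np.arange(len(idx_in), dtype=float)}, index=idx_in)
out_df = pd.DataFrame({'et_act': np.ones(len(idx_out))}, index=idx_out)

# Column-wise concat aligns on the date index; unmatched rows become NaN.
# sort_index is a defensive step so the label slice below is well defined.
combined = pd.concat([out_df, in_df], axis=1, ignore_index=False).sort_index()
n_nan_before = int(combined['et_act'].isna().sum())

# Trimming to the run period removes the leading NaN rows
trimmed = combined.loc['2004-01-01':'2004-01-05']
n_nan_after = int(trimmed['et_act'].isna().sum())

print(f"NaN rows before trim: {n_nan_before}, after trim: {n_nan_after}")
```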

In [None]:
selected_feature = 'S2'
out_csv = os.path.join(project_ws, f'combined_output_{selected_feature}_uncalibrated.csv')

df = run_fields(config_file, project_ws, selected_feature=selected_feature, output_csv=out_csv)

In [None]:
print(f"Output shape: {df.shape}")
print(f"Date range: {df.index[0]} to {df.index[-1]}")
print("\nKey output columns:")
key_cols = ['et_act', 'etref', 'kc_act', 'kc_bas', 'ks', 'ke', 'melt', 'rain', 
            'depl_root', 'swe', 'ppt', 'irrigation', 'soil_water']
for col in key_cols:
    if col in df.columns:
        print(f"  {col}: mean={df[col].mean():.3f}, max={df[col].max():.3f}")

## 4. Visualize Model Output

Let's examine a single year (2004) to see the model's behavior.

In [None]:
ydf = df.loc['2004-01-01': '2004-12-31']
print(f'Total irrigation: {ydf.irrigation.sum():.1f} mm')
print(f'Total ET: {ydf.et_act.sum():.1f} mm')
print(f'Total precipitation: {ydf.ppt.sum():.1f} mm')

plot_swim_timeseries(ydf, ['et_act', 'etref', 'rain', 'melt', 'irrigation'], 
                     start='2004-01-01', end='2004-12-31', png_dir='et_uncalibrated.png')
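
The single-year totals above extend to every year in the run by grouping on the calendar year. This is a minimal sketch on synthetic daily data, assuming the same column names (`irrigation`, `et_act`, `ppt`) and a daily `DatetimeIndex`. The values are invented and `demo` is a stand-in for `df`.

```python
import pandas as pd

# Two synthetic years of daily data: constant ET and precipitation,
# plus a ten-day irrigation event in June 2004
idx = pd.date_range('2004-01-01', '2005-12-31', freq='D')
demo = pd.DataFrame({'irrigation': 0.0, 'et_act': 2.0, 'ppt': 1.0}, index=idx)
demo.loc['2004-06-01':'2004-06-10', 'irrigation'] = 25.0

# Sum each flux by calendar year (units follow the daily columns, here mm)
annual = demo[['irrigation', 'et_act', 'ppt']].groupby(demo.index.year).sum()
print(annual)
```

Replacing `demo` with `df` should give a per-year table of the same three totals, which makes years with unusually low or zero modeled irrigation easy to spot.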

## 5. Compare with SSEBop ETf

We have two estimates of ET expressed as a fraction of reference ET:

1. **SSEBop ETf**: Remote sensing retrievals on Landsat overpass dates
2. **SWIM Kc_act**: Model-estimated actual crop coefficient (analogous to ETf)

Let's compare the agreement between these estimates on capture dates:

In [None]:
def compare_with_ssebop(combined_output_path, irr=True):
    """Compare model Kc_act against SSEBop ETf on capture dates."""
    output = pd.read_csv(combined_output_path, index_col=0, parse_dates=True)

    if irr:
        etf, ct = 'etf_irr', 'etf_irr_ct'
    else:
        etf, ct = 'etf_inv_irr', 'etf_inv_irr_ct'

    df = pd.DataFrame({'kc_act': output['kc_act'],
                       'etf': output[etf],
                       'ct': output[ct]})

    # Filter for capture dates only
    df = df.dropna()
    df = df.loc[df['ct'] == 1]

    # Calculate RMSE and R-squared
    rmse = np.sqrt(mean_squared_error(df['etf'], df['kc_act']))
    r2 = r2_score(df['etf'], df['kc_act'])

    print(f"SWIM Kc_act vs. SSEBop ETf: RMSE = {rmse:.2f}, R-squared = {r2:.2f}")
    print(f"Number of capture dates: {len(df)}")
    
    return df, rmse, r2

In [None]:
# Use irrigated mask since this is an irrigated site
comparison_df, rmse, r2 = compare_with_ssebop(out_csv, irr=True)
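
To make the two scores concrete, the sketch below computes RMSE and the coefficient of determination directly with NumPy, treating SSEBop ETf as the observation and Kc_act as the simulation, the same argument order used in `compare_with_ssebop`. The function name `rmse_r2` and the sample values are illustrative only. Note that R-squared defined this way is not a squared correlation: it turns negative when the model fits worse than simply predicting the mean of the observations.

```python
import numpy as np


def rmse_r2(obs, sim):
    """Return (RMSE, R-squared), with R-squared = 1 - SS_res / SS_tot."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    resid = obs - sim
    rmse = np.sqrt(np.mean(resid ** 2))
    ss_res = np.sum(resid ** 2)               # residual sum of squares
    ss_tot = np.sum((obs - obs.mean()) ** 2)  # variance of obs about its mean
    return rmse, 1.0 - ss_res / ss_tot


obs = [0.2, 0.6, 1.0]       # stand-in for SSEBop ETf
sim_exact = [0.2, 0.6, 1.0]  # perfect agreement
sim_flat = [0.9, 0.9, 0.9]   # constant, biased simulation

rmse_exact, r2_exact = rmse_r2(obs, sim_exact)
rmse_flat, r2_flat = rmse_r2(obs, sim_flat)

print(f"exact: RMSE={rmse_exact:.2f}, R2={r2_exact:.2f}")
print(f"flat:  RMSE={rmse_flat:.2f}, R2={r2_flat:.2f}")
```

A negative R-squared from the uncalibrated run would therefore mean the model is a poorer predictor of ETf on capture dates than the mean ETf itself.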

In [None]:
import matplotlib.pyplot as plt

# Create a scatter plot comparison
fig, ax = plt.subplots(figsize=(8, 8))

ax.scatter(comparison_df['etf'], comparison_df['kc_act'], alpha=0.5, s=10)
ax.plot([0, 1.5], [0, 1.5], 'r--', label='1:1 line')
ax.set_xlabel('SSEBop ETf')
ax.set_ylabel('SWIM Kc_act')
ax.set_title(f'SWIM vs SSEBop (Uncalibrated)\nRMSE={rmse:.2f}, R2={r2:.2f}')
ax.legend()
ax.set_xlim(0, 1.5)
ax.set_ylim(0, 1.5)

plt.tight_layout()
plt.savefig('comparison_scatter_uncalibrated.png', dpi=150)
plt.show()

## Summary

The uncalibrated model, run with default parameters, agrees poorly with SSEBop ETf on capture dates (see the RMSE and R-squared reported above). This is expected: the default parameters are not tuned for this site.

**Key observations:**
- The model isn't applying enough irrigation
- The NDVI-to-Kcb relationship needs tuning for alfalfa
- Soil parameters may not match the actual site conditions
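
One way to probe these observations is to check whether the mismatch has a seasonal sign. The sketch below computes the mean of `kc_act - etf` by month on a few invented capture dates. `demo` stands in for `comparison_df`, and the numbers are not results from this site. A consistently negative bias in the growing season would be consistent with too little modeled irrigation or a low Kcb, though it cannot separate the two causes on its own.

```python
import pandas as pd

# Invented capture dates and values, for illustration only
dates = pd.to_datetime(['2004-05-10', '2004-05-26', '2004-07-13', '2004-07-29'])
demo = pd.DataFrame({'kc_act': [0.5, 0.7, 0.6, 0.8],
                     'etf': [0.6, 0.8, 1.0, 1.0]}, index=dates)

# Negative bias means the model is below the remote sensing estimate
demo['bias'] = demo['kc_act'] - demo['etf']
monthly_bias = demo['bias'].groupby(demo.index.month).mean()

print(monthly_bias.round(2))
```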

**Next step:** In notebook `02_calibration.ipynb`, we'll use PEST++ to calibrate the model parameters using SSEBop ETf and SNODAS SWE observations.