# Derived Variables Demo: Heating & Cooling Degree Days

This notebook demonstrates how to use derived variables in ClimakitAE, focusing on **cooling degree days (CDD)** for energy demand analysis in Los Angeles County.

### Purpose

This notebook uses, and serves as a living example of, the following tools in ClimakitAE:
- ClimateData(): user interface
- Climakitae custom metrics over the Cal-Adapt Data Catalog (cadcat)
    - Built-in functions like Cooling Degree Days
    - Modifying the built-in functions
    - Defining **your own** metric for notebook-specific analysis
- Spatial subsetting using climakitae
- Warming level approaches using climakitae

> [!NOTE]
> Custom metrics defined by users DO NOT show up in the cadcat catalog and ARE NOT available to other users.
> Custom metrics are defined locally. They exist only in the user's environment at run time and do not persist between sessions.

### Outcomes

Users can expect to take away visualizations of cooling degree days for various thresholds across three global warming levels.


**RUNTIME**: 7 minutes

## What are Degree Days?

- **Cooling Degree Days (CDD)**: A measure of energy demand for cooling when outdoor temperature rises above 65°F

These metrics are essential for:
- Building energy modeling
- Utility demand forecasting
- Climate adaptation planning
- Energy efficiency assessments
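
As a minimal worked example of the definition above, the sketch below computes daily CDD in °F from made-up daily maximum and minimum temperatures, assuming the daily mean is approximated as (max + min) / 2. The helper name `daily_cdd_f` and the sample values are illustrative only and are not part of ClimakitAE.

```python
BASE_F = 65.0  # conventional CDD base temperature in °F

def daily_cdd_f(tmax_f, tmin_f, base_f=BASE_F):
    """Return cooling degree days for one day (hypothetical helper)."""
    t_avg = (tmax_f + tmin_f) / 2
    # Only the part of the daily mean above the base counts toward CDD
    return max(0.0, t_avg - base_f)

# Made-up (tmax, tmin) pairs in °F for three days
days = [(90.0, 70.0), (72.0, 58.0), (60.0, 50.0)]
cdd_per_day = [daily_cdd_f(tmax, tmin) for tmax, tmin in days]
total_cdd = sum(cdd_per_day)
print(cdd_per_day, total_cdd)
```

Only the first day has a mean above 65°F, so it is the only day that contributes to the total.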

In [None]:
from dask.diagnostics import ProgressBar
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

import climakitae as ck

# Set default plotting parameters for readability
plt.rcParams.update({
    'font.size': 15,            # default font size
    'axes.titlesize': 18,       # title
    'axes.labelsize': 15,       # axis labels
    'xtick.labelsize': 15,      # x tick labels
    'ytick.labelsize': 15,      # y tick labels
    'legend.fontsize': 15,      # legend
})

# Initialize ClimateData
cd = ck.ClimateData(verbosity=-1)

In [None]:
# Add dummy time dimension to warming level data for easier handling
def add_dummy_time_to_wl(data):
    """
    Convert time_delta dimension to a dummy time dimension for easier handling.
    """
    if 'time_delta' in data.dims:
        wl_values = data['time_delta'].values
        time_values = pd.to_datetime(wl_values, unit='D', origin='2050-01-01')
        data = data.rename({'time_delta': 'time'})
        data = data.assign_coords(time=time_values)
    return data


## Why Derived Variables? Old vs New Approach

### ❌ Previous Approach (Legacy Interface)

```mermaid
flowchart TD
    A[Step 1: Fetch first variable<br/>tasmax_data = get_data] --> B[Step 2: Fetch second variable<br/>tasmin_data = get_data]
    B --> C[Step 3: Define calculation function<br/>def calc_cdd]
    C --> D[Step 4: Manually compute<br/>cdd = calc_cdd]
    D --> E[Step 5: Manual post-processing<br/>cdd_clipped = cdd.sel<br/>cdd_annual = cdd.resample]
    E --> F[⏱️ RESULT]
    
    style A fill:#ffcccc,stroke:#cc0000,stroke-width:2px
    style B fill:#ffcccc,stroke:#cc0000,stroke-width:2px
    style C fill:#ffcccc,stroke:#cc0000,stroke-width:2px
    style D fill:#ffcccc,stroke:#cc0000,stroke-width:2px
    style E fill:#ffcccc,stroke:#cc0000,stroke-width:2px
    style F fill:#ffeeee,stroke:#cc0000,stroke-width:2px
```

**Pain Points:**
- 💥 Multiple `get_data()` calls might load full datasets into memory (lazy dataset handling was inconsistent)
- 🔧 Manual coordination of multiple variables
- 📦 Post-processing happens after data is loaded
- 🔁 No reusability - repeat for each analysis
- ⚠️ Error-prone parameter matching

---

### ✅ New Approach (Derived Variables System)

```mermaid
flowchart TD
    A[Step 1: Define function ONCE and register<br/>@register_derived] --> B[Step 2: Single call with processing<br/>ClimateData.variable.processes.get]
    B --> C[⚡ LAZY RESULT<br/>computed on demand]
    
    style A fill:#ccffcc,stroke:#00cc00,stroke-width:2px
    style B fill:#ccffcc,stroke:#00cc00,stroke-width:2px
    style C fill:#eeffee,stroke:#00cc00,stroke-width:3px
```

**Benefits:**
- ✨ **Single `.get()` call** - All processing in one query
- 🔄 **Automatic variable fetching** - System knows CDD needs t2max & t2min
- 💾 **Lazy evaluation** - Data not loaded until needed
- 🎯 **Processing pipeline** - Clipping, time slicing happen efficiently
- ♻️ **Reusable** - Define once, use everywhere
- 🛡️ **Type-safe** - Registry validates dependencies
- 📦 **Composable** - Chain with other processors seamlessly

---

### Code Comparison

**Old Approach:**
```python
# Step 1 & 2: Multiple data fetches
tasmax_data = get_data({'variable': 'tasmax', 'area': 'Los Angeles', ...})
tasmin_data = get_data({'variable': 'tasmin', 'area': 'Los Angeles', ...})

# Step 3 & 4: Manual calculation
def calc_cdd(tasmax, tasmin):
    t_avg = (tasmax + tasmin) / 2
    return np.maximum(0, t_avg - 291.48)  # 65°F threshold

cdd = calc_cdd(tasmax_data, tasmin_data)

# Step 5: Manual post-processing
cdd_clipped = cdd.sel(lat=..., lon=...)
cdd_annual = cdd_clipped.resample(time='Y').sum()
```

**New Approach:**
```python
# Step 1: Register once (already done in builtin!)
@register_derived(variable='CDD_wrf', query={'variable_id': ['t2max', 't2min']})
def calc_cdd_wrf(ds): ...

# Step 2: Single call - that's it!
cdd_data = (
    ClimateData()
    .variable('CDD_wrf')           # Derived variable!
    .activity_id('WRF')
    .processes({
        'clip': 'Los Angeles County',
        'time_slice': (2020, 2050)
    })
    .get()
)
# Returns lazy dataset with all processing applied
```
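
To make the "register once, then request by name" idea concrete, here is a toy registry written in plain Python. It is a sketch of the general pattern only, not the ClimakitAE implementation: `TOY_REGISTRY`, `toy_register_derived`, `toy_get`, and the sample data are all hypothetical, and unlike the real system this toy computes eagerly on lists rather than lazily on xarray datasets.

```python
# Toy registry: NOT the climakitae implementation, only the general pattern.
TOY_REGISTRY = {}

def toy_register_derived(name, depends_on):
    """Decorator that records a function together with the inputs it needs."""
    def decorator(func):
        TOY_REGISTRY[name] = {"depends_on": list(depends_on), "func": func}
        return func
    return decorator

@toy_register_derived(name="CDD_toy", depends_on=["tmax_k", "tmin_k"])
def calc_cdd_toy(inputs, threshold_k=291.48):
    """Daily CDD in kelvin from lists of daily max/min temperature."""
    return [
        max(0.0, (tmax + tmin) / 2 - threshold_k)
        for tmax, tmin in zip(inputs["tmax_k"], inputs["tmin_k"])
    ]

def toy_get(name, store):
    """Look up a derived variable, gather its dependencies, and compute it."""
    entry = TOY_REGISTRY[name]
    missing = [dep for dep in entry["depends_on"] if dep not in store]
    if missing:
        raise KeyError(f"missing inputs for {name}: {missing}")
    inputs = {dep: store[dep] for dep in entry["depends_on"]}
    return entry["func"](inputs)

# Made-up raw data standing in for catalog variables
store = {"tmax_k": [300.0, 290.0], "tmin_k": [290.0, 280.0]}
result = toy_get("CDD_toy", store)
print(result)
```

The caller asks only for `"CDD_toy"`. The registry entry says which inputs to fetch, which is the coordination step that the legacy workflow left to the user.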

In [None]:
# Show all available derived variables
# Notice the HDD and CDD variables for both WRF and LOCA2 data
cd.show_derived_variables()

## 2. Understanding the Degree Days Variables

The degree days metrics are built-in derived variables that automatically fetch and compute from temperature data:

In [None]:
from climakitae.new_core.derived_variables import list_derived_variables

# Get info about the degree days variables
derived_vars = list_derived_variables()

degree_days_vars = ['HDD_wrf', 'CDD_wrf', 'HDD_loca', 'CDD_loca']

for var_name in degree_days_vars:
    if var_name in derived_vars:
        info = derived_vars[var_name]
        print(f"\n{var_name}:")
        print(f"  Depends on: {info.depends_on}")
        print(f"  Description: {info.description}")

## 3. Fetching Cooling Degree Days for Los Angeles

Let's fetch CDD data for Los Angeles County to analyze cooling energy demand under future warming:

In [None]:
cd = ck.ClimateData(verbosity=-2)

# Fetch cooling degree days for Los Angeles County
cdd_data = (
    cd
    .catalog("cadcat")
    .variable("CDD_wrf")  # Cooling degree days from WRF
    .activity_id("WRF")
    .institution_id("UCLA")
    .table_id("day")
    .grid_label("d03")  # 3km resolution
    # .experiment_id("ssp370")  # High emissions scenario
    .processes({
        "warming_level": {
            "warming_levels": [1.2, 2.0, 3.0]
        },
        "clip": "Los Angeles County",
    })
    .get()
)

In [None]:
cdd_data = add_dummy_time_to_wl(cdd_data).drop_sel(sim="WRF_UCLA_EC-Earth3-Veg_ssp370_day_d03_r1i1p1f1") 

cdd_data

## 4. Visualizing Cooling Degree Days: Spatial Based

Let's calculate and visualize annual cooling degree days:

In [None]:
# Count days with cooling degree days > 0 per year
cooling_days_per_year = cdd_data["CDD_wrf"].resample(time="Y").sum()
cooling_days_per_year = xr.where(cooling_days_per_year > 0, cooling_days_per_year, np.nan)

# Average years for spatial plot
spt = cooling_days_per_year.mean(dim="time") 

In [None]:
# Compute with progress bar
with ProgressBar():
    spt = spt.median(dim='sim').compute()

In [None]:
# Create figure with 3 subplots (one per warming level)
fig, axes = plt.subplots(1, 3, figsize=(18, 5), gridspec_kw={'wspace': 0.3})

warming_levels = spt.warming_level.values

for idx, (ax, wl) in enumerate(zip(axes, warming_levels)):
    # Select data for this warming level
    wl_data = spt.sel(warming_level=wl)
    
    # Create contourf plot
    im = wl_data.plot.contourf(
        ax=ax, x='lon', y='lat',
        cmap='plasma', levels=100,
        add_colorbar=False,
        vmin=1,
        vmax=300
    )
    
    ax.set_title(f'{wl}°C Warming Level', fontweight='bold')
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')

# Add shared colorbar
fig.colorbar(im, ax=axes, label='Annual Cooling Days', pad=0.02, shrink=0.8)

fig.suptitle('Cooling Days (65 °F) Distribution Across Warming Levels\nLos Angeles County', 
             fontweight='bold', y=1.07)

The plot above shows the median number of cooling days (65 °F) per year across Los Angeles County. Note the increase in the number of cooling days as the targeted Global Warming Level increases.
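
The order of aggregation behind the map (a yearly statistic, then the mean over years, then the median across simulations) can be sketched with plain Python on made-up numbers. The sketch also contrasts two possible yearly statistics: summing daily CDD magnitudes versus counting days with CDD > 0. Which of the two a stored variable represents depends on what it holds per day, which is not verified here. The data and the `aggregate` helper are hypothetical, and the NaN masking step used in the notebook is omitted.

```python
from statistics import mean, median

# Made-up daily CDD values: sims -> years -> days (illustrative only)
daily_cdd = {
    "sim_a": {2050: [0.0, 2.0, 4.0], 2051: [1.0, 0.0, 3.0]},
    "sim_b": {2050: [0.0, 0.0, 6.0], 2051: [2.0, 2.0, 2.0]},
    "sim_c": {2050: [1.0, 1.0, 1.0], 2051: [0.0, 0.0, 0.0]},
}

def aggregate(per_year_stat):
    """Apply a yearly statistic, average over years, then take the median across sims."""
    per_sim = [
        mean(per_year_stat(days) for days in years.values())
        for years in daily_cdd.values()
    ]
    return median(per_sim)

# Annual degree-day total: sums the daily magnitudes
median_annual_total = aggregate(sum)
# Annual count of cooling days: counts days with CDD > 0
median_annual_count = aggregate(lambda days: sum(1 for d in days if d > 0))
print(median_annual_total, median_annual_count)
```

The two statistics answer different questions (how much cooling versus how often), so axis labels and units should match whichever one is plotted.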

## 5. Temporal Trends by Warming Level

Let's visualize how cooling days evolve over time for each warming level:

In [None]:
# Average over spatial dimensions for county-wide average
with ProgressBar():
    cooling_days_per_year = cooling_days_per_year.mean(dim=["y", "x"]).compute()

In [None]:
# Plot time series for each warming level
fig, ax = plt.subplots(figsize=(14, 7))

# Define colors for each warming level
colors = ['#2E86AB', '#A23B72', '#F18F01']  # Blue, Purple, Orange
warming_levels = cooling_days_per_year.warming_level.values

# Plot ensemble mean for each warming level
for idx, wl in enumerate(warming_levels):
    wl_data = cooling_days_per_year.sel(warming_level=wl)
    
    # Plot individual simulations with low alpha
    for sim in range(wl_data.sizes['sim']):
        sim_data = wl_data.isel(sim=sim)
        ax.plot(sim_data.time.dt.year, sim_data.values, 
                alpha=0.15, color=colors[idx], linewidth=0.8)
    
    # Plot ensemble mean with solid line
    ensemble_mean = wl_data.mean(dim='sim')
    ax.plot(ensemble_mean.time.dt.year, ensemble_mean.values,
            color=colors[idx], linewidth=3, label=f'{wl}°C Warming', 
            marker='o', markersize=4, markevery=5)

ax.set_xlabel('Year', fontweight='bold')
ax.set_ylabel('Annual Cooling Days', fontweight='bold')
ax.set_title('Annual Cooling Days by Global Warming Level\nLos Angeles County (UCLA WRF)', 
             fontweight='bold', pad=15)
ax.legend(loc='upper left', framealpha=0.95)
ax.grid(True, alpha=0.3, linestyle='--')

# Add annotation
ax.text(0.98, 0.02, 'Higher warming → More cooling days', 
        transform=ax.transAxes,
        verticalalignment='bottom', horizontalalignment='right',
        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.3))

plt.tight_layout()

The plot above shows the annual number of cooling days (65 °F), averaged over Los Angeles County, as a function of time. Thin lines are individual simulations and thick lines are the ensemble mean. As the global warming level increases, so does the number of cooling days. Note that the time axis is a "dummy" sequence; the true timescale for the data can be found in the dataset's metadata.
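
The dummy time axis comes from `add_dummy_time_to_wl`, which adds the `time_delta` offsets to an arbitrary origin of 2050-01-01. A standard-library sketch of the same mapping, assuming the offsets are whole days (the `dummy_dates` helper and the sample offsets are illustrative):

```python
from datetime import date, timedelta

ORIGIN = date(2050, 1, 1)  # same arbitrary origin as add_dummy_time_to_wl

def dummy_dates(day_offsets, origin=ORIGIN):
    """Map integer day offsets to placeholder calendar dates."""
    return [origin + timedelta(days=int(offset)) for offset in day_offsets]

# Made-up offsets; the real time_delta values come from the dataset
offsets = [0, 1, 365, 730]
dates = dummy_dates(offsets)
print([d.isoformat() for d in dates])
```

Because the origin is arbitrary, the resulting years only group the data into year-long blocks for resampling and plotting. They do not indicate when a warming level is reached.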

In [None]:
# Summary statistics for cooling days by warming level
for wl in warming_levels:
    wl_data = cooling_days_per_year.sel(warming_level=wl).mean(dim='sim')
    mean_days = wl_data.mean().values
    print(f"\n{wl}°C Warming Level:")
    print(f"  Average cooling days per year: {mean_days:.0f} days")
    print(f"  Range: {wl_data.min().values:.0f} - {wl_data.max().values:.0f} days")

## 6. Customizing CDD

There are two ways to customize the calculation of Cooling Degree Days:
1. Use the built-in methods to pass your own threshold and register it
2. Define your own Cooling Degree Days method and register it
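
The first option amounts to fixing one keyword argument of an existing function. A minimal sketch of that idea using `functools.partial`, with a hypothetical stand-in `toy_calc_cdd` operating on made-up °F values. Whether a `partial` object can be passed to `register_user_function` is not confirmed here; the notebook itself uses an explicit `def` wrapper.

```python
from functools import partial

def toy_calc_cdd(daily_mean_f, threshold_f=65.0):
    """Hypothetical stand-in for a built-in CDD function (values in °F)."""
    return [max(0.0, t - threshold_f) for t in daily_mean_f]

# Option 1 style: reuse the existing function with a different threshold
toy_calc_cdd_75f = partial(toy_calc_cdd, threshold_f=75.0)

daily_mean_f = [64.0, 70.0, 80.0]  # made-up daily mean temperatures
cdd_65 = toy_calc_cdd(daily_mean_f)
cdd_75 = toy_calc_cdd_75f(daily_mean_f)
print(cdd_65, cdd_75)
```

Raising the base from 65°F to 75°F removes the mild day entirely and shrinks the contribution of the hot day.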

### 6.1 Customizing CDD: Built-in Methods

In [None]:
from climakitae.new_core.derived_variables.registry import register_user_function
from climakitae.new_core.derived_variables.builtin.temperature import calc_cdd_wrf

# The builtin calc_cdd_wrf accepts threshold_f as an argument
# We can wrap it with a custom threshold and register as a new variable

def calc_cdd_wrf_75f(ds):
    """CDD with 75°F threshold - wraps builtin with custom threshold."""
    return calc_cdd_wrf(ds, threshold_f=75.0)

register_user_function(
    name="CDD_wrf_75f",
    depends_on=["t2"],  # Same dependency as CDD_wrf
    func=calc_cdd_wrf_75f,
    description="Cooling Degree Days from WRF with 75°F base",
    units="K",
)

In [None]:
cd = ck.ClimateData(verbosity=-2)

# Fetch cooling degree days with 75°F threshold
cdd_data_75f = (
    cd
    .catalog("cadcat")
    .variable("CDD_wrf_75f")  # Now using our custom 75¬∞F variable
    .activity_id("WRF")
    .institution_id("UCLA")
    .table_id("day")
    .grid_label("d03")
    .processes({
        "warming_level": {
            "warming_levels": [1.2, 2.0, 3.0]
        },
        "clip": "Los Angeles County",
    })
    .get()
)

cdd_data_75f = add_dummy_time_to_wl(cdd_data_75f).drop_sel(sim="WRF_UCLA_EC-Earth3-Veg_ssp370_day_d03_r1i1p1f1") 

In [None]:
# Count days with cooling degree days > 0 per year
cooling_days_per_year = cdd_data_75f["CDD_wrf"].resample(time="Y").sum()
cooling_days_per_year = xr.where(cooling_days_per_year > 0, cooling_days_per_year, np.nan)

# Average years for spatial plot
spt = cooling_days_per_year.mean(dim="time") 

In [None]:

# Compute with progress bar
with ProgressBar():
    spt = spt.median(dim='sim').compute()

In [None]:
# Create figure with 3 subplots (one per warming level)
fig, axes = plt.subplots(1, 3, figsize=(18, 5), gridspec_kw={'wspace': 0.3})

warming_levels = spt.warming_level.values

for idx, (ax, wl) in enumerate(zip(axes, warming_levels)):
    # Select data for this warming level
    wl_data = spt.sel(warming_level=wl)
    
    # Create contourf plot
    im = wl_data.plot.contourf(
        ax=ax, x='lon', y='lat',
        cmap='plasma', levels=100,
        add_colorbar=False,
        vmin=0,
        vmax=340
    )
    
    ax.set_title(f'{wl}°C Warming Level', fontweight='bold')
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')

# Add shared colorbar
fig.colorbar(im, ax=axes, label='Annual Cooling Days', pad=0.02, shrink=0.8)

fig.suptitle('Cooling Days (75 °F) Distribution Across Warming Levels\nLos Angeles County', 
             fontweight='bold', y=1.07)

The plot above shows the median number of cooling days (75 °F) per year across Los Angeles County. Note the increase in the number of cooling days as the targeted Global Warming Level increases.

### 6.2: Define your own cooling degree days function

In [None]:
from climakitae.new_core.derived_variables.registry import register_user_function
from climakitae.util.utils import f_to_k

THRESHOLD_F = 85.0
threshold_k = f_to_k(THRESHOLD_F)

def calc_custom_cdd(ds):
    """
    A custom CDD function that runs on a different variable.

    In this example we look at days where the daily minimum temperature exceeds 85°F.
    """
    # Note: the variable name keeps its "_75f" suffix, but the base is THRESHOLD_F (85°F)
    cdd_values = np.maximum(0, ds.t2min - threshold_k)
    
    # Replace zeros with NaN (days below threshold become NaN)
    cdd_values = xr.where(cdd_values > 0, cdd_values, np.nan)
    ds["CDD_min_wrf_75f"] = cdd_values
    ds["CDD_min_wrf_75f"].attrs = {
        "units": "K",
        "long_name": "Cooling Degree Days (WRF)",
        "comment": f"Cooling degree days calculated from daily minimum temperature with base {threshold_k} K",
        "derived_from": "t2min",
        "derived_by": "climakitae",
        "threshold": f"{threshold_k} K",
    }
    return ds

register_user_function(
    name="CDD_min_wrf_75f",
    depends_on=["t2min"],
    func=calc_custom_cdd,
    description="Cooling Degree Days from WRF with 85°F base using daily minimum temperature",
    units="K",
)

In [None]:
cd = ck.ClimateData(verbosity=-2)

# Fetch cooling degree days for Los Angeles County
cdd_data_min_75f = (
    cd
    .catalog("cadcat")
    .variable("CDD_min_wrf_75f")  # Custom variable: CDD from daily minimum temperature (85°F base)
    .activity_id("WRF")
    .institution_id("UCLA")
    .table_id("day")
    .grid_label("d03")  # 3km resolution
    .processes({
        "warming_level": {
            "warming_levels": [1.2, 2.0, 3.0]
        },
        "clip": "Los Angeles County",
    })
    .get()
)

cdd_data_min_75f = add_dummy_time_to_wl(cdd_data_min_75f).drop_sel(sim="WRF_UCLA_EC-Earth3-Veg_ssp370_day_d03_r1i1p1f1") 

In [None]:
# Count days with cooling degree days > 0 per year
cooling_days_per_year = cdd_data_min_75f["CDD_min_wrf_75f"].resample(time="Y").sum()
cooling_days_per_year = xr.where(cooling_days_per_year > 0, cooling_days_per_year, np.nan)

# Average years for spatial plot
spt = cooling_days_per_year.mean(dim="time") 

In [None]:

# Compute with progress bar
with ProgressBar():
    spt = spt.median(dim='sim').compute()

In [None]:
# Create figure with 3 subplots (one per warming level)
fig, axes = plt.subplots(1, 3, figsize=(18, 5), gridspec_kw={'wspace': 0.3})

warming_levels = spt.warming_level.values

for idx, (ax, wl) in enumerate(zip(axes, warming_levels)):
    # Select data for this warming level
    wl_data = spt.sel(warming_level=wl)
    
    # Create contourf plot
    im = wl_data.plot.contourf(
        ax=ax, x='lon', y='lat',
        cmap='plasma', levels=100,
        add_colorbar=False,
        vmin=0,
        vmax=30
    )
    
    ax.set_title(f'{wl}°C Warming Level', fontweight='bold')
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')

# Add shared colorbar
fig.colorbar(im, ax=axes, label='Annual Cooling Days', pad=0.02, shrink=0.8)

fig.suptitle('Cooling Days (Min Temp > 85 °F) Distribution Across Warming Levels\nLos Angeles County', 
             fontweight='bold', y=1.07)

## 7. Summary: Key Takeaways

**What We've Learned:**

1. **Derived variables simplify analysis** - A single `.get()` call automatically fetches the required temperature data and computes HDD/CDD

2. **Warming levels provide climate-relevant framing** - Analyzing by 1.2°C, 2.0°C, and 3.0°C warming shows progressive impacts

3. **Los Angeles shows a clear increase in cooling demand** - Cooling days rise with each warming level

4. **Spatial patterns matter** - Inland areas experience more extreme cooling demands than coastal regions

5. **Cooling energy demand is likely to increase** - Higher warming levels imply more cooling demand, and likely higher HVAC energy costs

6. **Customizing functions** - Metrics can be customized and registered locally; they are not added to the cadcat catalog and are not visible to other users.

## 8. Available Degree Days Variables

| Variable | Data Source | Input Variables | Description |
|----------|-------------|-----------------|-------------|
| `HDD_wrf` | WRF | t2 | Heating degree days (base 65°F) |
| `CDD_wrf` | WRF | t2 | Cooling degree days (base 65°F) |
| `HDD_loca` | LOCA2 | tasmax, tasmin | Heating degree days (base 65°F) |
| `CDD_loca` | LOCA2 | tasmax, tasmin | Cooling degree days (base 65°F) |

All variables automatically:
- Fetch required temperature data
- Calculate daily average: (max + min) / 2
- Apply 65°F (291.48 K) threshold
- Return lazy-evaluated xarray datasets
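
The kelvin value quoted for the 65°F base follows from the standard Fahrenheit-to-kelvin conversion. The short sketch below reproduces it for the three thresholds used in this notebook. The function `fahrenheit_to_kelvin` is written here for illustration and is separate from the `f_to_k` utility imported earlier.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert a temperature from °F to K."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15

# Thresholds used in this notebook: built-in base and the two custom bases
for threshold_f in (65.0, 75.0, 85.0):
    print(f"{threshold_f:.0f} °F = {fahrenheit_to_kelvin(threshold_f):.2f} K")
```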