# double_ITCZ_bias: Meridional structure of time-mean precipitation (PR) in the eastern Pacific

The meridional structure of time-mean precipitation (PR) in the eastern Pacific, in the context of ENSO (El Niño-Southern Oscillation), refers to the distribution of average precipitation along the north-south (meridional) direction over a given period. It can also be examined separately for the different ENSO phases (El Niño, La Niña and neutral conditions).

This notebook computes the meridional root mean square error (RMSE) of climatological precipitation (PR) between model simulations and observations over the eastern Pacific region (15°S-15°N, 150°W-90°W). 
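The preprocessors in this notebook pass longitudes to `extract_region` in the 0-360° convention, so 150°W-90°W appears as 210-270, while the map region in `preproc2` (150-270) spans 150°E-90°W. A minimal sketch of that conversion, assuming west longitudes are written as negative numbers (the helper name `to_0_360` is hypothetical):

```python
def to_0_360(lon_deg):
    """Convert a longitude in degrees (west negative) to the 0-360 convention."""
    # Python's % always returns a non-negative result for a positive divisor
    return lon_deg % 360.0


# 150°W and 90°W, the zonal bounds of the meridional RMSE region
print(to_0_360(-150.0), to_0_360(-90.0))
```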

The analysis involves:
- Averaging precipitation data zonally between 150°W and 90°W.
- Time-averaging the precipitation data to create climatologies for both model output and observational data.
- Calculating the meridional RMSE to quantify how well the model captures the observed precipitation pattern in this region.
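Written out, the steps above amount to the following (a sketch that uses simple unweighted averages for clarity; the preprocessors may weight the time mean by time-step length, and masked land points reduce $N_\lambda$ at some latitudes). With $P(t,\lambda_i,\phi_j)$ the precipitation at time $t$, longitude $\lambda_i$ and latitude $\phi_j$:

$$
\bar{P}(\phi_j) = \frac{1}{T\,N_\lambda}\sum_{t=1}^{T}\sum_{i=1}^{N_\lambda} P(t,\lambda_i,\phi_j)
$$

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N_\phi}\sum_{j=1}^{N_\phi}\left(\bar{P}_{\mathrm{obs}}(\phi_j)-\bar{P}_{\mathrm{mod}}(\phi_j)\right)^2}
$$

where $T$ is the number of time steps, $N_\lambda$ the number of longitudes between 150°W and 90°W, and $N_\phi$ the number of latitudes between 15°S and 15°N.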







## Observations and reanalysis datasets used for comparison

The following reference observational datasets are used to evaluate the model simulations:

- **GPCPv2.3 (1979-2018)**: The primary dataset for comparison. The Global Precipitation Climatology Project (GPCP) combines satellite and surface precipitation measurements to provide comprehensive global precipitation data.
  
### Additional observational datasets used for comparison:
- **20CRv2 (1871-2012)**: The 20th Century Reanalysis provides global atmospheric data, including precipitation, based on surface pressure observations and ensemble assimilation techniques.
[ESMValTool obs cmoriser config](https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/cmor_config/NOAA-CIRES-20CR-V2.yml)

## Dask Cluster

Start a local Dask cluster that uses multiple cores:

In [None]:
from dask.distributed import Client, progress
client = Client()
client

## Datasets

In [None]:
from esmvalcore.dataset import Dataset

You may see some `ESMValCoreDeprecationWarning`s.

These relate to a change in how ESMValCore configuration is set up, which now follows other software packages (e.g. Dask). The old configuration is still supported, so the warnings are not a problem here; each warning includes instructions for updating.

### Exercise:
Define `Dataset`s with the appropriate facets. Observational datasets have different facets from model datasets.

In the cell below, fill in a second observational dataset:

**'NOAA-CIRES-20CR-V2'**, project: 'OBS6', timerange: '187101/201212', tier: 2

In [None]:
# Add or update model datasets here, e.g. another experiment or ensemble member
# (ACCESS-CM2: r1-r10i1p1f1; ACCESS-ESM1-5: r1-r40i1p1f1)
model_datasets = {
    "ACCESS-ESM1-5": Dataset(
        short_name='pr',
        project='CMIP6',
        mip="Amon",
        exp="historical",
        ensemble="r1i1p1f1",
        timerange="18500101/20150101",
        dataset="ACCESS-ESM1-5",
        grid="gn",
    )
}

# Observational datasets
# (see the ESMValTool obs config linked above for NOAA-CIRES-20CR-V2)
obs_datasets = {
    "GPCP-V2.3": Dataset(
        short_name='pr',
        dataset='GPCP-V2.3',
        mip="Amon",
        project='obs4MIPs',
        timerange="19790101/20170101",
        tier=1,
    ),
    # exercise: 'NOAA-CIRES-20CR-V2', project 'OBS6', timerange '187101/201212', tier 2
    "NOAA-CIRES-20CR-V2": Dataset(
        short_name='pr',
        dataset='NOAA-CIRES-20CR-V2',
        mip="Amon",
        project='OBS6',
        type='reanaly',
        timerange='187101/201212',
        tier=2,
    ),
}
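The three datasets cover different periods (model 1850-2015, GPCP 1979-2017, 20CR-V2 1871-2012), and each climatology below is taken over its dataset's full range. If a common period were wanted instead, it could be found from the `timerange` strings. A minimal sketch, assuming each half of the string starts with a four-digit year; the helpers are hypothetical and work at year resolution only (e.g. `20150101` is read as 2015 even though it only reaches the first day of that year):

```python
def years_from_timerange(timerange):
    """Return (start_year, end_year) from a 'START/END' timerange string.

    Assumes both halves begin with a 4-digit year, e.g. '187101/201212'
    or '19790101/20170101'.
    """
    start, end = timerange.split("/")
    return int(start[:4]), int(end[:4])


def common_years(*timeranges):
    """Return the (first, last) year covered by every timerange, or None."""
    spans = [years_from_timerange(tr) for tr in timeranges]
    first = max(start for start, _ in spans)
    last = min(end for _, end in spans)
    return (first, last) if first <= last else None


print(common_years("18500101/20150101", "19790101/20170101", "187101/201212"))
```

Such a period could then be passed to `extract_time` (imported in the preprocessor cell but not used in this notebook).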

## Add ancillary files

Cell area (`areacella`) and land fraction (`sftlf`) are stored in separate files. They can be attached to the model `Dataset` objects as supplementary variables, so that they are loaded together with `pr`.

In [None]:
for name, dataset in model_datasets.items():
    dataset.add_supplementary(short_name='sftlf', mip="fx")
    dataset.add_supplementary(short_name="areacella", mip="fx")
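The land fraction is what lets `mask_landsea(cube, mask_out="land")` remove land points. As a rough numpy illustration of the idea, the sketch below masks cells whose `sftlf` (in %) exceeds a threshold; the 50 % threshold and the helper name `mask_land` are assumptions of this sketch, so check the `mask_landsea` documentation for the exact rule ESMValCore applies:

```python
import numpy as np


def mask_land(field, sftlf, threshold=50.0):
    """Mask grid cells whose land fraction (sftlf, in %) exceeds `threshold`.

    Returns a numpy masked array in which ocean cells keep their values.
    """
    return np.ma.masked_where(sftlf > threshold, field)


pr = np.array([[2.0, 5.0], [8.0, 1.0]])         # precipitation (mm/day), 2x2 grid
sftlf = np.array([[0.0, 100.0], [30.0, 60.0]])  # land fraction (%)
masked = mask_land(pr, sftlf)
print(masked)
print(masked.mean())  # masked (land) cells are ignored in the mean
```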

In [None]:
# load the datasets - ESMValCore will find the files to load and do some basic checks

model_datasets = {name: dataset.load() for name, dataset in model_datasets.items()}
obs_datasets = {name: dataset.load() for name, dataset in obs_datasets.items()}

In [None]:
# import the required preprocessors.
# https://docs.esmvaltool.org/projects/ESMValCore/en/latest/api/esmvalcore.preprocessor.html#
from esmvalcore.preprocessor import climate_statistics
from esmvalcore.preprocessor import convert_units
from esmvalcore.preprocessor import extract_region
from esmvalcore.preprocessor import extract_time
from esmvalcore.preprocessor import regrid
from esmvalcore.preprocessor import axis_statistics
from esmvalcore.preprocessor import mask_landsea

### Exercise:
Complete the preprocessor functions with the correct parameters (see the API docs):
- [climate_statistics](https://docs.esmvaltool.org/projects/ESMValCore/en/latest/api/esmvalcore.preprocessor.html#esmvalcore.preprocessor.climate_statistics) for an average across the full time period
- [axis_statistics](https://docs.esmvaltool.org/projects/ESMValCore/en/latest/api/esmvalcore.preprocessor.html#esmvalcore.preprocessor.axis_statistics) for an average along longitude (x-axis)
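In array terms, a full-period mean collapses the time axis and a mean along the x-axis collapses longitude, leaving a one-dimensional meridional profile. A minimal numpy sketch on synthetic data, assuming a `(time, latitude, longitude)` layout; the array sizes are arbitrary and this is only an analogy, not how the ESMValCore functions are implemented:

```python
import numpy as np

# Synthetic monthly precipitation with dimensions (time, latitude, longitude)
rng = np.random.default_rng(0)
pr = rng.uniform(0.0, 10.0, size=(24, 31, 61))

# Full-period climatology: average over the time axis -> (latitude, longitude)
clim = pr.mean(axis=0)

# Zonal mean: average along the longitude (x) axis -> (latitude,)
profile = clim.mean(axis=1)

print(clim.shape, profile.shape)
```

With land points masked, the same steps would use `np.ma` arrays so that masked cells are skipped in each mean.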

In [None]:

def preproc1(cube):
    # Meridional profile: 150°W-90°W (210-270 in 0-360 longitude), 15°S-15°N
    nino3_latext_region = {"start_longitude": 210., "end_longitude": 270., "start_latitude": -15., "end_latitude": 15.}
    cube = convert_units(cube, units="mm/day")
    cube = mask_landsea(cube, mask_out="land")

    cube = climate_statistics(cube, operator="mean", period="full")  # mean over the full time period
    cube = regrid(cube, target_grid="1x1", scheme="linear")
    cube = extract_region(cube, **nino3_latext_region)

    cube = axis_statistics(cube, axis="x", operator="mean")  # mean along longitude
    return cube

def preproc2(cube):
    # Time-mean map: 150°E-90°W (150-270 in 0-360 longitude), 15°S-15°N
    pacific_region = {"start_longitude": 150., "end_longitude": 270., "start_latitude": -15., "end_latitude": 15.}
    cube = mask_landsea(cube, mask_out="land")

    cube = climate_statistics(cube, operator="mean", period="full")  # mean over the full time period

    cube = convert_units(cube, units="mm/day")
    cube = regrid(cube, target_grid="1x1", scheme="linear")
    cube = extract_region(cube, **pacific_region)
    return cube

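Both preprocessors rely on `convert_units` to turn CMIP6 `pr`, a mass flux in kg m-2 s-1, into mm/day. The arithmetic behind it: with a water density of about 1000 kg m-3, 1 kg m-2 of water is a 1 mm layer, so the conversion is a factor of 86 400 s per day. A minimal sketch (the helper name is hypothetical):

```python
SECONDS_PER_DAY = 86_400.0
WATER_DENSITY = 1000.0  # kg m-3, approximate density of liquid water


def flux_to_mm_per_day(pr_kg_m2_s):
    """Convert a precipitation flux (kg m-2 s-1) to a depth rate (mm/day)."""
    metres_per_second = pr_kg_m2_s / WATER_DENSITY       # m s-1
    return metres_per_second * 1000.0 * SECONDS_PER_DAY  # mm day-1


print(flux_to_mm_per_day(1e-4))
```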

In [None]:
## run the loaded datasets through the preprocessors

model_datasets_prep1 = {name: preproc1(dataset) for name, dataset in model_datasets.items()}
obs_datasets_prep1 = {name: preproc1(dataset) for name, dataset in obs_datasets.items()}

model_datasets_prep2 = {name: preproc2(dataset) for name, dataset in model_datasets.items()}
obs_datasets_prep2 = {name: preproc2(dataset) for name, dataset in obs_datasets.items()}

# Diagnostics Level 1


### Let's plot some preprocessed data

*Tips*
- check the cube's shape and coordinates to work out what to plot
- check some data values
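One way to check the values of a preprocessed meridional profile is to look at its range, any masked points and the latitude of its maximum. A minimal sketch; `summarise_profile` is a hypothetical helper, and with iris cubes it could be called as `summarise_profile(cube.coord("latitude").points, cube.data)`:

```python
import numpy as np


def summarise_profile(lats, values):
    """Return a few quick checks on a meridional precipitation profile."""
    values = np.ma.masked_invalid(values)  # treat NaN/inf as missing
    i_max = int(values.argmax())
    return {
        "n_points": int(values.size),
        "n_masked": int(np.ma.count_masked(values)),
        "min": float(values.min()),
        "max": float(values.max()),
        "lat_of_max": float(lats[i_max]),
    }


lats = np.arange(-15.0, 16.0, 5.0)  # -15, -10, ..., 15 degrees
pr = np.array([1.0, 3.0, 2.0, 1.5, 4.0, 9.0, 6.0])  # synthetic mm/day values
print(summarise_profile(lats, pr))
```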

In [None]:
import iris
import iris.quickplot as qplt
import matplotlib.pyplot as plt
import numpy as np

# Format latitude tick labels with degree symbols and N/S
# (':g' drops trailing zeros, so 10.0 is shown as 10)
def format_latitude(x, pos):
    if x < 0:
        return f'{abs(x):g}°S'
    elif x > 0:
        return f'{x:g}°N'
    else:
        return '0°'

### Exercise:
Plot the other observational dataset (hints are in the code comments).

*Note:* the RMSE is still calculated against GPCP (`obs_data`).

In [None]:
plt.figure(figsize=(10, 6))

obs_data = obs_datasets_prep1["GPCP-V2.3"] #
model_data = model_datasets_prep1["ACCESS-ESM1-5"] #

# Plot the GPCP reference data in black
qplt.plot(obs_data, color='black', label='GPCP-V2.3', linewidth=4)

# Plot the NOAA-CIRES-20CR-V2 reanalysis as a thinner dashed black line
qplt.plot(obs_datasets_prep1["NOAA-CIRES-20CR-V2"], color='black', linestyle='--', label='20CR-V2', linewidth=2)

# Plot model data in blue
qplt.plot(model_data, color='blue', label='Model', linewidth=4)

rmse = np.sqrt(np.mean((obs_data.data - model_data.data) ** 2))

# Adding labels and title
plt.xlabel('Latitude')
plt.ylabel('Precipitation (mm/day)')
plt.title('Mean Precipitation')
plt.grid(linestyle='--')
plt.yticks(np.arange(0,16, step=5))


# Adding RMSE text on top of the plot
plt.text(0.5, 0.95, f'RMSE: {rmse:.2f} mm/day', fontsize=12, ha='center', transform=plt.gca().transAxes,
         bbox=dict(facecolor='white', alpha=0.8, edgecolor='none'))

# Setting custom x-axis formatter
plt.gca().xaxis.set_major_formatter(plt.FuncFormatter(format_latitude))

# Adding a legend
plt.legend()

# Display the plot
plt.show()

Figure 1: Meridional bias in the time-mean precipitation (PR) structure across the eastern Pacific (averaged between 150–90°W), primarily illustrating the double intertropical convergence zone (ITCZ) bias. The black curve represents the reference data, while the blue curve represents the model data. The metric shown is the meridional RMSE between the model and reference curves.
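The RMSE in the plotting cell is a plain mean over the 1° latitude points. The sketch below reproduces it and, as an optional variant that the notebook does not use, weights the squared errors by cos(latitude) to account for grid-cell area; over 15°S-15°N the weights only drop to about 0.97, so the difference is small. The function name `meridional_rmse` is hypothetical:

```python
import numpy as np


def meridional_rmse(model, obs, lats=None):
    """RMSE between two meridional profiles on the same latitudes.

    With lats=None this is the unweighted mean over latitude points, as in
    the plotting cell. If lats (degrees) are given, squared errors are
    weighted by cos(latitude), an optional variant.
    """
    diff2 = (np.ma.asarray(model) - np.ma.asarray(obs)) ** 2
    if lats is None:
        return float(np.sqrt(diff2.mean()))
    weights = np.cos(np.deg2rad(lats))
    return float(np.sqrt(np.ma.average(diff2, weights=weights)))


obs = np.array([2.0, 4.0, 6.0])    # synthetic reference profile (mm/day)
model = np.array([3.0, 4.0, 4.0])  # synthetic model profile (mm/day)
print(meridional_rmse(model, obs))
print(meridional_rmse(model, obs, lats=np.array([-10.0, 0.0, 10.0])))
```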

# Diagnostics Level 2


In [None]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
from matplotlib import cm
import matplotlib.colors as mcolors
import cmocean
# Importing numpy for linear space creation
import numpy as np

import iris
import iris.plot as iplt

### Exercise:
Change the observational dataset to NOAA-CIRES-20CR-V2 and update the left panel title to match.

In [None]:
#### change this observational dataset to NOAA-CIRES-20CR-V2
obs_data = obs_datasets_prep2["GPCP-V2.3"] ## both are in this dictionary 

model_data = model_datasets_prep2["ACCESS-ESM1-5"]

# Create a colormap from cmocean balance
cmap = cmocean.cm.balance
colors = cmap(np.linspace(0, 1, 14))

# Create a large figure
fig = plt.figure(figsize=(20, 10))  # Adjusted for full width

# Center the map on the Pacific Ocean (central longitude 160°W)
proj = ccrs.Orthographic(central_longitude=-160, central_latitude=0)

# Define the normalization for the color scale, from 0 to 12 mm/day
norm = mcolors.Normalize(vmin=0, vmax=12)

# Left panel: reference (observational) dataset
ax1 = plt.subplot(121, projection=proj)
cf1 = iplt.contourf(obs_data, cmap=cmap, norm=norm)
ax1.add_feature(cfeature.LAND, facecolor='gray')  # Add land feature with gray color
ax1.coastlines()
ax1.set_extent([120, -80, -20, 20], crs=ccrs.PlateCarree())

ax1.set_title("ref: GPCPv2.3") ### Change label to match

# Add gridlines for latitude and longitude
gl1 = ax1.gridlines(draw_labels=True, linestyle='--')
gl1.top_labels = False
gl1.right_labels = False

# Right panel: model dataset
ax2 = plt.subplot(122, projection=proj)
cf2 = iplt.contourf(model_data, cmap=cmap, norm=norm)
ax2.add_feature(cfeature.LAND, facecolor='gray')  # Add land feature with gray color
ax2.coastlines()
ax2.set_extent([120, -80, -20, 20], crs=ccrs.PlateCarree())
ax2.set_title("ACCESS-ESM1-5 r1i1p1f1")

# Add gridlines for latitude and longitude
gl2 = ax2.gridlines(draw_labels=True, linestyle='--')
gl2.top_labels = False
gl2.right_labels = False

# Add a single colorbar at the bottom
cbar = fig.colorbar(cf2, ax=[ax1, ax2], orientation='horizontal', extend='both', pad=0.15, shrink=0.8)
cbar.set_label('Precipitation (mm/day)')
cbar.set_ticks(np.arange(0, 13, 2))  # Set ticks at intervals of 2 mm

plt.subplots_adjust(bottom=0.3)  # Further adjust bottom margin to fit the colorbar
plt.show()

Figure 2: Time-mean precipitation (PR) in the equatorial Pacific, used to highlight the double intertropical convergence zone (ITCZ) bias. The left map shows the reference data and the right map shows the model data.