## Seasonal NAO skill comparison ##

In the decadal predictions, we observe a marked drop in NAO skill for initialisations after 2005. As the dataset only covers around 50 years and we have observed only one 'forecast bust' like this, we cannot tell whether Doug was simply lucky or whether something is genuinely going wrong in the models in the recent period.

To explore this, we consider the seasonal hindcasts:

1. CSF-20C (Coupled Seasonal Forecast): This is a hindcast performed with the model fully coupled to the Nucleus for European Modelling of the Ocean (NEMO) ocean model, initialised from CERA-20C reanalysis data.
2. ASF-20C (Atmospheric Seasonal Forecasts): This is a hindcast performed with prescribed sea surface temperature (SST) and sea-ice boundary conditions at the surface, initialised from ERA-20C reanalysis data.

Both hindcasts cover initialisations from 1901 to 2009, so they span a much longer period than the decadal hindcasts.

Strommen et al. (2023) propose that seasonal forecasts can be used to diagnose decadal forecast signals over a longer period. We plan to apply this to the predictability of the 8-year running-mean NAO (equivalent to years 2-9 of the decadal forecast). In doing so, we want to explore the following questions:

* Was the recent drop in NAO skill an outlier in the longer period (1901-2020)?
* Have similar 'forecast bust' periods been observed before?
* Did similar conditions occur during previous bust periods?
    * e.g. divergence between modelled and observed SPNA SSTs
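As a rough illustration of the comparison proposed above, the sketch below smooths an annual DJF NAO series with an 8-year running mean and correlates the model ensemble mean with observations. It assumes both series are already aligned on the same winters; the synthetic data, the helper names `running_mean` and `running_mean_skill`, and the use of a Pearson correlation as the skill score are assumptions for illustration, not the method of Strommen et al. (2023).

```python
import numpy as np


def running_mean(x, window=8):
    """Mean over each run of `window` consecutive values (len(x) - window + 1 outputs)."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window) / window
    # 'valid' mode keeps only windows that lie entirely inside the series
    return np.convolve(x, kernel, mode="valid")


def running_mean_skill(model_members, obs, window=8):
    """Pearson correlation of the smoothed ensemble mean with the smoothed obs.

    model_members: array of shape (n_years, n_members), one DJF NAO value per member
    obs: array of shape (n_years,), aligned with the same winters
    """
    ens_mean = np.asarray(model_members, dtype=float).mean(axis=1)
    return np.corrcoef(running_mean(ens_mean, window), running_mean(obs, window))[0, 1]


# Synthetic example: 109 winters and 50 members sharing a weak predictable signal
rng = np.random.default_rng(0)
signal = rng.standard_normal(109)
obs = signal + rng.standard_normal(109)
model_nao = 0.2 * signal[:, None] + rng.standard_normal((109, 50))

print(running_mean(obs).shape)  # (102,)
print(running_mean_skill(model_nao, obs))
```

Averaging over 50 members before smoothing suppresses the unpredictable noise in each member, which is why the ensemble mean rather than individual members is correlated with the observations.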

In [1]:
# Import relevant modules
import os
import sys

# Third party imports
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

# Dask for lazy, parallel computation
import dask
import dask.array as da

# Import cdo for regridding
from cdo import Cdo
cdo = Cdo()

In [2]:
# Import local dictionaries and functions
sys.path.append('/home/users/benhutch/skill-maps')

# Import dictionaries
import dictionaries as dic

# Set up the path where the functions are stored
sys.path.append('/home/users/benhutch/skill-maps/python')

# Import functions
import functions as func

# Import the NAO skill functions
import nao_skill_functions as nao_func

# Set up the path where the functions are stored
sys.path.append('/home/users/benhutch/skill-maps/rose-suite-matching')

# Import the suite functions
import nao_matching_seasons as nms_func

# Import the bootstrapping functions
import process_bs_values as pbs_func

### Data access issues ###

* The ERA-20C reanalysis does not appear to be easily accessible.
* Who should I contact to get access?
* Perhaps start by looking at 1940 onwards (the ERA5 period).

In [3]:
# Find some of the ASF20C data on JASMIN
# Set up the parameters
model = "ASF20C"  # Alternative is "CSF20C"
variable = "SLP"  # Sea level pressure
initialisation = "Nov"  # Month of initialisation
year = 1901  # First year of the hindcast
member = 1  # Ensemble member
region = "Global"  # For regridding

# Set up the base path
base_path_20c = "/badc/deposited2020/seasonal-forecasts-20thc/data"

# Folder containing the monthly files for each ensemble member
folder_name = f"{variable}monthly_{model}_{initialisation}START_ENSmems"

# Set up the file name
file_name = f"{variable}monthly_{year}_M{member}.nc"

# Set up the full path
file_path = os.path.join(base_path_20c, model, folder_name, file_name)

# Check if the file exists
if os.path.exists(file_path):
    print(f"The file {file_path} exists")

# Set up directories in canari to store the regridded psl data
# Set up the base path
base_path = "/gws/nopw/j04/canari/users/benhutch"

# Form the output paths for the ASF20C and CSF20C regridded data
new_path_ASF20C = os.path.join(base_path, "seasonal", "ASF20C", variable,
                               f"{initialisation}_START")
new_path_CSF20C = os.path.join(base_path, "seasonal", "CSF20C", variable,
                               f"{initialisation}_START")

# Create the directories if they do not already exist
for path in (new_path_ASF20C, new_path_CSF20C):
    if not os.path.exists(path):
        print(f"The directory {path} does not exist, creating it")
        os.makedirs(path)

The file /badc/deposited2020/seasonal-forecasts-20thc/data/ASF20C/SLPmonthly_ASF20C_NovSTART_ENSmems/SLPmonthly_1901_M1.nc exists


In [4]:
# Directory containing the CDO grid description files
gridspec_dir = "/home/users/benhutch/gridspec/"

# # Print all of the files within the directory
# print(os.listdir(gridspec_dir))

# Find gridspec-global.txt
gridspec_file = os.path.join(gridspec_dir, "gridspec-global.txt")

# Print the contents of the file, closing it afterwards
with open(gridspec_file) as f:
    print(f.read())

gridtype=lonlat
xfirst=-180
xinc=2.5
xsize=144
yfirst=-90
yinc=2.5
ysize=72
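The grid description above uses CDO's `lonlat` format: each axis starts at `xfirst`/`yfirst` and steps by `xinc`/`yinc` for `xsize`/`ysize` points. The small sketch below rebuilds the target coordinates from these keys (the `parse_gridspec` and `lonlat_axes` helpers are hypothetical illustrations, not part of CDO). It shows that the grid runs from 180°W to 177.5°E and from 90°S to 87.5°N, so it contains the South Pole row but not the North Pole row, consistent with the `lat`/`lon` coordinates printed for the regridded file.

```python
import numpy as np

GRIDSPEC = """gridtype=lonlat
xfirst=-180
xinc=2.5
xsize=144
yfirst=-90
yinc=2.5
ysize=72
"""


def parse_gridspec(text):
    """Parse simple key=value lines of a CDO grid description into a dict."""
    spec = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        spec[key] = value
    return spec


def lonlat_axes(spec):
    """Return (lons, lats) implied by xfirst/xinc/xsize and yfirst/yinc/ysize."""
    lons = float(spec["xfirst"]) + float(spec["xinc"]) * np.arange(int(spec["xsize"]))
    lats = float(spec["yfirst"]) + float(spec["yinc"]) * np.arange(int(spec["ysize"]))
    return lons, lats


lons, lats = lonlat_axes(parse_gridspec(GRIDSPEC))
print(lons[0], lons[-1], lons.size)  # -180.0 177.5 144
print(lats[0], lats[-1], lats.size)  # -90.0 87.5 72
```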


In [5]:
# Find the directory where all of the data to be regridded is stored
# (named input_dir to avoid shadowing the built-in dir())
input_dir = os.path.join(base_path_20c, model, folder_name)

# # Print all of the files within the directory
# print(os.listdir(input_dir))

# # Find the files within the directory
# files = os.listdir(input_dir)

# # Limit files to the first 10
# files = files[:10]

# # Loop through the files
# for file in files:
#     # Set up the output file name
#     output_file = os.path.join(new_path_ASF20C, f"{file[:-3]}_rg.nc")

#     # If the output file already exists, skip
#     if os.path.exists(output_file):
#         print(f"The file {output_file} exists")
#         continue

#     # Set up the input file name
#     input_file = os.path.join(input_dir, file)

#     # Perform the bilinear regridding onto the gridspec grid
#     cdo.remapbil(gridspec_file, input=input_file,
#                  output=output_file)

### Time ###

The time axis of the seasonal forecast files is not straightforward to interpret. We assume that, for a forecast initialised on 1 November and run forward for four months, the four timesteps are the November, December, January and February means. The DJF mean is therefore the average of all timesteps except the first.
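Under that assumption, the DJF mean is a slice-and-average along the time axis. A minimal numpy sketch, using a made-up (4, lat, lon) array in place of a real file and assuming time is the first axis (as it is in the regridded files, whose shape prints as `(4, 72, 144)`), is:

```python
import numpy as np

# Hypothetical forecast field with time as the first axis:
# index 0 = November, 1 = December, 2 = January, 3 = February (assumed order)
n_lat, n_lon = 72, 144
monthly = np.stack([np.full((n_lat, n_lon), value) for value in (1.0, 2.0, 3.0, 4.0)])


def djf_mean_from_nov_start(field):
    """Average all timesteps except the first (assumed November) one.

    This is an unweighted mean of the three winter months, as in the notebook;
    it does not weight by the number of days in each month.
    """
    if field.shape[0] != 4:
        raise ValueError("expected 4 monthly timesteps (Nov, Dec, Jan, Feb)")
    return field[1:].mean(axis=0)


djf = djf_mean_from_nov_start(monthly)
print(djf.shape)  # (72, 144)
print(djf[0, 0])  # 3.0, the mean of 2.0, 3.0 and 4.0
```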

In [6]:
# Import pandas
import pandas as pd

# # Find my directory with the regridded data
# # Print the files contained within new_path_ASF20C
# print(os.listdir(new_path_ASF20C))

# For a single file, collapse this into a DJF mean
# First calculate the anomaly field
# Then calculate the DJF mean anomalies
# Then azores gridbox mean - iceland gridbox mean
# Do this for a single file first as a test
test_file = os.path.join(new_path_ASF20C, "SLPmonthly_1901_M1_rg.nc")

# Read in the file
ds = xr.open_dataset(test_file)

# Extract the lats and lons
lats = ds['lat']
lons = ds['lon']

# Initialisation years (1901-2009 inclusive) and ensemble members (1-50)
years = np.arange(1901, 2010)
members = np.arange(1, 51)

# Print the shapes of the coordinates and index arrays
print(np.shape(lats))
print(np.shape(lons))
print(np.shape(years))
print(np.shape(members))

# Extract the mean sea level pressure variable: 'MSL_GDS0_SFC'
data = ds['MSL_GDS0_SFC']

# Print the shape of the data (time, lat, lon)
print(data.shape)

# DJF mean: skip the first (November) timestep and average the rest
djf_mean = data[1:, :, :].mean(axis=0)

# Print the shape of the DJF mean
print(djf_mean.shape)

# Print the data
print(djf_mean)

(72,)
(144,)
(109,)
(50,)
(4, 72, 144)
(72, 144)
<xarray.DataArray 'MSL_GDS0_SFC' (lat: 72, lon: 144)>
array([[100824.6 , 100824.6 , 100824.6 , ..., 100824.6 , 100824.6 ,
        100824.6 ],
       [100243.15, 100247.77, 100253.4 , ..., 100259.94, 100253.44,
        100247.77],
       [ 99846.35,  99874.48,  99851.52, ..., 100026.48,  99937.15,
         99880.23],
       ...,
       [102820.85, 102822.23, 102824.94, ..., 102804.94, 102813.23,
        102817.81],
       [102873.69, 102879.85, 102886.85, ..., 102854.85, 102861.85,
        102869.35],
       [102807.19, 102810.77, 102814.1 , ..., 102792.31, 102797.56,
        102802.48]], dtype=float32)
Coordinates:
  * lon      (lon) float64 -180.0 -177.5 -175.0 -172.5 ... 172.5 175.0 177.5
  * lat      (lat) float64 -90.0 -87.5 -85.0 -82.5 -80.0 ... 80.0 82.5 85.0 87.5
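The comments at the top of the cell above define the NAO index as the Azores gridbox mean minus the Iceland gridbox mean. On this grid the latitudes ascend from -90 to 87.5, so a label-based `slice(lat1, lat2)` only returns data when `lat1 < lat2`. The sketch below shows the box-difference calculation with numpy boolean masks; `AZORES_BOX` and `ICELAND_BOX` are placeholder bounds, not the values stored in `dic.azores_grid_corrected` and `dic.iceland_grid_corrected`, and the box mean is unweighted, as in the notebook.

```python
import numpy as np

# 2.5-degree grid matching the regridded files
lats = -90.0 + 2.5 * np.arange(72)
lons = -180.0 + 2.5 * np.arange(144)

# Placeholder boxes (lat_min, lat_max, lon_min, lon_max); NOT the notebook's values
AZORES_BOX = (36.0, 40.0, -28.0, -20.0)
ICELAND_BOX = (63.0, 70.0, -25.0, -16.0)


def box_mean(field, lats, lons, box):
    """Unweighted mean of a (lat, lon) field over an inclusive lat/lon box."""
    lat_min, lat_max, lon_min, lon_max = box
    lat_mask = (lats >= lat_min) & (lats <= lat_max)
    lon_mask = (lons >= lon_min) & (lons <= lon_max)
    return field[np.ix_(lat_mask, lon_mask)].mean()


def nao_index(field, lats, lons, azores=AZORES_BOX, iceland=ICELAND_BOX):
    """NAO index as the Azores box mean minus the Iceland box mean."""
    return box_mean(field, lats, lons, azores) - box_mean(field, lats, lons, iceland)


# Synthetic pressure field (Pa) that decreases linearly with latitude
field = 101000.0 - 50.0 * lats[:, None] + np.zeros((1, lons.size))
print(nao_index(field, lats, lons))  # 1437.5
```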


In [7]:
# Start the dask client
from dask.distributed import Client, progress
client = Client(threads_per_worker=4, n_workers=1)
client

Perhaps you already have a cluster running?
Hosting the HTTP server on port 37941 instead


Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:37941/status

LocalCluster: Workers: 1, Total threads: 4, Total memory: 31.42 GiB
Status: running, Using processes: True

Scheduler: tcp://127.0.0.1:44863 (started just now)

Worker: tcp://127.0.0.1:46841
Worker dashboard: http://127.0.0.1:34371/status
Nanny: tcp://127.0.0.1:44759
Local directory: /tmp/dask-worker-space/worker-pzs7rfsx


In [8]:
# Scale the local cluster up to 10 workers
client.cluster.scale(10)

In [9]:
# Generate an array of years from 1901 to 2009
years = np.arange(1901, 2010)

# Create an empty Dask array to store the data
asf_psl_field = da.empty([len(years), len(members)])

# Print the shape of the array
print(asf_psl_field.shape)

# Extract the lats and lons for the NAO region
iceland_lat1, iceland_lat2 = dic.iceland_grid_corrected['lat1'], dic.iceland_grid_corrected['lat2']
iceland_lon1, iceland_lon2 = dic.iceland_grid_corrected['lon1'], dic.iceland_grid_corrected['lon2']

azores_lat1, azores_lat2 = dic.azores_grid_corrected['lat1'], dic.azores_grid_corrected['lat2']
azores_lon1, azores_lon2 = dic.azores_grid_corrected['lon1'], dic.azores_grid_corrected['lon2']

# Define a function to process a single year and member
def process_year_member(year, member):
    # Print the year and member
    print("Processing year: ", year, " member: ", member)

    # Set up the file name
    file_name = f"SLPmonthly_{year}_M{member}_rg.nc"

    # Set up the file path
    file_path = os.path.join(new_path_ASF20C, file_name)

    # Read in the file
    ds = xr.open_dataset(file_path, chunks={'time': 100, 'lat': 100, 'lon': 100})

    # Take the mean over the spatial dimension of the iceland gridbox
    iceland_mean = ds['MSL_GDS0_SFC'].sel(lat=slice(iceland_lat1, iceland_lat2),
                                          lon=slice(iceland_lon1, iceland_lon2)).mean(axis=(1, 2))

    # Take the mean over the spatial dimension of the azores gridbox
    azores_mean = ds['MSL_GDS0_SFC'].sel(lat=slice(azores_lat1, azores_lat2),
                                         lon=slice(azores_lon1, azores_lon2)).mean(axis=(1, 2))

    # Calculate the NAO index
    nao_index = azores_mean - iceland_mean

    # Take the DJF mean
    djf_mean = nao_index[1:].mean(axis=0)

    # Print the shape of the data
    print("shape of djf_mean: ", djf_mean.shape)
    print("value of djf_mean: ", djf_mean.values)

    # Return the Dask array
    return djf_mean

# Create an empty list to store the delayed results
results = []

# Loop through the years and members, creating one delayed task per file
for year in years:
    print("Processing year: ", year)
    for member in members:
        print("Processing member: ", member)

        # Use dask.delayed to defer the function call and append the result to the list
        result = dask.delayed(process_year_member)(year, member)
        results.append(result)

# Print the list of delayed objects
print(results)

(109, 50)
Processing year:  1901
Processing member:  1
Processing member:  2
Processing member:  3
...
Processing year:  1938
Processing member:  1
Processing member:  2
...

In [10]:
# Start the Dask computation
asf_psl_field_results = dask.compute(*results)

# Print the type and contents of the results
print(type(asf_psl_field_results))
print(asf_psl_field_results)

# Create an empty dask array to hold the DJF NAO index for each (year, member)
test_dask_array = da.empty([len(years), len(members)])

# Print the shape of the array
print(test_dask_array.shape)

# Fill the array; the results are ordered year-major, member-minor,
# matching the order in which the delayed tasks were created
for i, result in enumerate(asf_psl_field_results):
    # Print the value of the result
    print(f"result {i} has value: ", result)

    # Convert the flat index into (year, member) indices
    year_counter, member_counter = divmod(i, len(members))

    # Set the value of the dask array
    test_dask_array[year_counter, member_counter] = result

# Print the dask array shape
print(test_dask_array.shape)

Processing year:  1902  member:  1
shape of djf_mean:  ()
Processing year:  1927  member:  37
Processing year:  1943  member:  20
Processing year:  1966  member:  36
Processing year:  1945  member:  33
shape of djf_mean:  ()
...

In [12]:
test_dask_array

| | Array | Chunk |
|---|---|---|
| Bytes | 42.58 kiB | 42.58 kiB |
| Shape | (109, 50) | (109, 50) |
| Count | 76301 Tasks | 1 Chunks |
| Type | float64 | numpy.ndarray |


In [13]:
# Print the dask array
print(test_dask_array.compute())

[[  165.9140625  -1205.68493652   775.66668701 ...  -642.8515625
   -245.80729675 -1406.046875  ]
 [-1656.84899902  -717.47137451 -1462.578125   ...  -183.76823425
   -661.29425049 -1072.25256348]
 [ -788.1171875  -1091.42443848  -735.96875    ...  -527.09375
   -974.4140625   -838.2578125 ]
 ...
 [-1472.703125    -563.78125     -446.63803101 ...  -754.328125
  -1428.44006348   131.90625   ]
 [-1197.09118652 -1729.90625     -223.09114075 ... -1186.80212402
  -1015.44012451  -428.80728149]
 [ -590.765625    -432.9375       325.50521851 ... -1200.84375
   -716.79425049 -1224.02868652]]
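Filling `test_dask_array` one element at a time adds tasks to the graph with every assignment, which is likely why the repr reports 76301 tasks for a 109 x 50 array. Because the delayed results are created year-major and member-minor, one alternative (a suggestion, not what the notebook does) is to convert the computed results to floats and reshape them in a single step with numpy. The sketch uses stand-in values instead of the real `dask.compute` output; `float()` also works on 0-d NumPy and xarray results.

```python
import numpy as np

n_years, n_members = 109, 50

# Stand-in for dask.compute(*results): one scalar per (year, member),
# generated in the same year-major, member-minor order as the delayed tasks
results = tuple(float(year * 100 + member)
                for year in range(n_years) for member in range(n_members))

# Convert to floats and reshape in one step; row = year index, column = member index
nao_members = np.asarray([float(r) for r in results]).reshape(n_years, n_members)

print(nao_members.shape)  # (109, 50)
print(nao_members[2, 7])  # 207.0
```

If a Dask array is still needed afterwards, `dask.array.from_array(nao_members)` wraps the result in a single chunk.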


In [14]:
model_nao_members = test_dask_array


In [40]:
# Find the psl field data
# Set up the base path
base_path_canari = "/gws/nopw/j04/canari/users/benhutch"

# File name
file_name = "adaptor.mars.internal-1703255460.0952423-14259-18-80988a8a-51a2-4138-9b16-41a188e1c9d8.grib"

# Form a new path
new_path_ERA5 = os.path.join(base_path_canari, "ERA5", file_name)

# Check if the file exists
if os.path.exists(new_path_ERA5):
    print(f"The file {new_path_ERA5} exists")

# Setup a directory for the regridded data
regrid_dir = os.path.join(base_path_canari, "ERA5", "regrid")

# Check if the directory exists
if not os.path.exists(regrid_dir):
    print(f"The directory {regrid_dir} does not exist")
    # Create a new directory
    os.makedirs(regrid_dir, exist_ok=False)

# Set up the output file name
output_file = os.path.join(regrid_dir, f"{file_name[:-5]}_rg.grib")

# # Print the output file name
# print(output_file)

# Set up the gridspec file
gridspec_file = os.path.join(gridspec_dir, "gridspec-global.txt")

# Check if the file exists
if os.path.exists(gridspec_file):
    print(f"The file {gridspec_file} exists")

# Only if the output file does not exist
if not os.path.exists(output_file):
    print(f"The file {output_file} does not exist")

    # Regrid the file
    cdo.remapbil(gridspec_file, input=new_path_ERA5, output=output_file)

# List the files in the regrid directory
# (the _rg.nc file is presumably a netCDF conversion of the regridded
#  GRIB output, created outside this notebook)
files = os.listdir(regrid_dir)
print(files)

# Keep only the regridded netCDF file(s)
psl_obs_files = [file for file in files if file.endswith("_rg.nc")]

# Print the files
print(psl_obs_files)

# Form the full path to the first (only) match
psl_obs_path = os.path.join(regrid_dir, psl_obs_files[0])

# Open the dataset lazily with xarray
psl_obs = xr.open_dataset(
    psl_obs_path, chunks={"time": 100, "lat": 100, "lon": 100}
)

# Extract the months of December, January and February
psl_obs_djf = psl_obs.sel(time=psl_obs["time.season"] == "DJF")

# Calculate the mean over the time dimension - the climatology
psl_obs_djf_mean = psl_obs_djf.mean(dim="time")

# Remove the mean from the data
psl_obs_djf_anom = psl_obs_djf - psl_obs_djf_mean

# psl_obs_djf_anom

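# Shift the DJF-only series back by two positions so that each calendar
# year Y holds Dec(Y), Jan(Y+1) and Feb(Y+1), then take the annual mean.
# Each value is the DJF mean of the winter starting in December of year Y
psl_obs_djf_anom = psl_obs_djf_anom.shift(time=-2).resample(time="Y").mean(dim="time")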

# Now take the rolling mean over 8 years
psl_obs_djf_anom = psl_obs_djf_anom.rolling(time=8, center=True).mean()

# Calculate the NAO index
# Extract the lats and lons for the NAO region
iceland_lat1, iceland_lat2 = (
    dic.iceland_grid_corrected["lat1"],
    dic.iceland_grid_corrected["lat2"],
)
iceland_lon1, iceland_lon2 = (
    dic.iceland_grid_corrected["lon1"],
    dic.iceland_grid_corrected["lon2"],
)

# And for the azores
azores_lat1, azores_lat2 = (
    dic.azores_grid_corrected["lat1"],
    dic.azores_grid_corrected["lat2"],
)
azores_lon1, azores_lon2 = (
    dic.azores_grid_corrected["lon1"],
    dic.azores_grid_corrected["lon2"],
)

# Take the mean over the spatial dimensions of the Iceland gridbox
iceland_mean = (
    psl_obs_djf_anom["var151"]
    .sel(
        lat=slice(iceland_lat1, iceland_lat2),
        lon=slice(iceland_lon1, iceland_lon2),
    )
    .mean(dim=("lat", "lon"))
)

# Take the mean over the spatial dimensions of the Azores gridbox
azores_mean = (
    psl_obs_djf_anom["var151"]
    .sel(
        lat=slice(azores_lat1, azores_lat2),
        lon=slice(azores_lon1, azores_lon2),
    )
    .mean(dim=("lat", "lon"))
)

# Calculate the NAO index
nao_index = azores_mean - iceland_mean

# Print the shape of the data
print("shape of nao_index: ", nao_index.shape)

# Print the data
print(nao_index)

The file /gws/nopw/j04/canari/users/benhutch/ERA5/adaptor.mars.internal-1703255460.0952423-14259-18-80988a8a-51a2-4138-9b16-41a188e1c9d8.grib exists
The file /home/users/benhutch/gridspec/gridspec-global.txt exists
['adaptor.mars.internal-1703255460.0952423-14259-18-80988a8a-51a2-4138-9b16-41a188e1c9d8_rg.grib', 'adaptor.mars.internal-1703255460.0952423-14259-18-80988a8a-51a2-4138-9b16-41a188e1c9d8_rg.nc']
['adaptor.mars.internal-1703255460.0952423-14259-18-80988a8a-51a2-4138-9b16-41a188e1c9d8_rg.nc']
shape of nao_index:  (84,)
<xarray.DataArray 'var151' (time: 84)>
dask.array<sub, shape=(84,), dtype=float64, chunksize=(8,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 1940-12-31 1941-12-31 ... 2023-12-31
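The `shift(time=-2)` step works because it is applied after selecting DJF months only. In that series each December is followed by the January and February that complete its winter, so shifting by two positions places Dec(Y), Jan(Y+1) and Feb(Y+1) under calendar year Y, and the annual mean becomes the winter mean labelled by the December year, which appears to match the November-initialised hindcasts labelled by initialisation year. The pandas sketch below checks the trick against explicit season-year labels on synthetic monthly data. It groups by year rather than calling `resample`, to avoid frequency-alias differences between pandas versions, and it assumes no months are missing, since the shift is position-based. It also shows two edge effects: the partial winter at the start of the record is dropped, and because the mean skips NaNs the final year can be a partial winter (December only) if the record ends before February.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series from Jan 1940 to Dec 1945; value = month index
times = pd.date_range("1940-01-01", "1945-12-01", freq="MS")
monthly = pd.Series(np.arange(len(times), dtype=float), index=times)

# Keep only December, January and February, as the notebook does with time.season
djf = monthly[monthly.index.month.isin([12, 1, 2])]

# Notebook approach: shift back two positions, then average within each calendar year
shifted = djf.shift(-2)
winter_from_shift = shifted.groupby(shifted.index.year).mean()

# Explicit approach: label each month with the year of its December
season_year = np.where(djf.index.month == 12, djf.index.year, djf.index.year - 1)
winter_explicit = djf.groupby(season_year).mean()

print(winter_from_shift.loc[1940], winter_explicit.loc[1940])  # 12.0 12.0
```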
