## Alternate lag testing ##

Here we test a new methodology for the alternate lagging. Having calculated the anomalies for all of the models, ensemble members, start dates, and forecast years in the '*calc_anoms_suite*', we want to load them into Python and combine them into arrays with shapes like:

(178, 60, 11, 72, 144)

where the dimensions are:

* 178 total ensemble members (from all of the models)
* 60 start dates (~1960-2020)
* 11 forecast years (this may differ between models, so it needs checking)
* 72 latitude points (2.5° grid, -90 to 87.5)
* 144 longitude points (2.5° grid, -180 to 177.5)
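Before building the full array, it is worth checking how much memory it needs. The sketch below multiplies out the planned shape; it assumes one floating-point value per element and ignores any container overhead. The anomaly files store `psl` as float32 (see the xarray output further down), whereas `np.zeros` defaults to float64, which doubles the footprint.

```python
import numpy as np

# Planned multi-model array: (members, start dates, forecast years, lat, lon)
shape = (178, 60, 11, 72, 144)

# Total number of values (int64 avoids overflow on platforms with 32-bit ints)
n_elements = int(np.prod(shape, dtype=np.int64))

# Memory needed in single and double precision
for dtype in (np.float32, np.float64):
    n_bytes = n_elements * np.dtype(dtype).itemsize
    print(f"{np.dtype(dtype).name}: {n_bytes / 1e9:.2f} GB")
# float32: 4.87 GB
# float64: 9.74 GB
```

At this size, pre-allocating with `dtype=np.float32` is probably worthwhile.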

As a first exercise, it would be useful to load the array for a single model, in this case BCC-CSM2-MR (8 ensemble members). The shape would then be:

(8, 60, 11, 72, 144)
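Going from one model to all of them means stacking the ensemble members of each model along a single axis. A minimal sketch with toy grid sizes, assuming each model's array is laid out as (start dates, members, forecast years, lat, lon), which is the axis order of the array `load_data` returns later in this notebook. The member counts are those reported in the `load_data` log; `members_per_model` and the small grid sizes are hypothetical placeholders. `np.concatenate` requires every other axis to match, so all models must share the same start dates and number of forecast years.

```python
import numpy as np

# Placeholder sizes so the example runs quickly
n_years, n_fcst, n_lat, n_lon = 3, 11, 4, 8

# Ensemble sizes taken from the load_data log (hypothetical subset of models)
members_per_model = {"BCC-CSM2-MR": 8, "MPI-ESM1-2-HR": 10, "CanESM5": 20}

# One array per model: (start dates, members, forecast years, lat, lon)
model_arrays = [
    np.zeros((n_years, n_members, n_fcst, n_lat, n_lon), dtype=np.float32)
    for n_members in members_per_model.values()
]

# Stack the members of all models along the member axis (axis 1)
combined = np.concatenate(model_arrays, axis=1)
print(combined.shape)  # (3, 38, 11, 4, 8)
```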

In [1]:
# Import standard-library modules
import sys
import os
import argparse

# Import 3rd party modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr

In [2]:
# Add the alternate_lag_suite directory to the path
sys.path.append("/home/users/benhutch/lagging-NAO-test-suite/alternate_lag_suite/")

# Import the data-loading function
from alternate_lag_functions import load_data

# Add the suite root directory to the path to access the dictionaries
# /home/users/benhutch/lagging-NAO-test-suite/dictionaries.py
sys.path.append("/home/users/benhutch/lagging-NAO-test-suite/")

# Import the dictionaries
import dictionaries as dicts


In [3]:
# Set up the arguments
base_dir = "/gws/nopw/j04/canari/users/benhutch/skill-maps-processed-data"
variable = "psl"
model = "HadGEM3-GC31-MM"
region = "global"
forecast_range = "all_forecast_years"
season = "DJFM"

In [4]:
# Directory containing the processed alternate-lag test arrays
data_dir = "/gws/nopw/j04/canari/users/benhutch/alternate-lag-processed-data/"

# Filenames: the alternate-lagged array and the corresponding unlagged array
file_name1 = "psl_DJFM_global_1961_1968_2-3_4_1705426743.8047726_alternate_lag.npy"
file_name2 = "psl_DJFM_global_1961_1968_2-3_4_1705426743.8047726.npy"

# Load the data (data_dir avoids shadowing the built-in dir)
data1 = np.load(os.path.join(data_dir, file_name1))
data2 = np.load(os.path.join(data_dir, file_name2))

# Check the shapes
print(data1.shape)
print(data2.shape)

# Inspect the unlagged data
print(data2)

(5, 712, 72, 144)
(8, 178, 11, 72, 144)
[[[[[ 5.85475830e+02  5.85475830e+02  5.85475830e+02 ...
      5.85475830e+02  5.85475830e+02  5.85475830e+02]
    [ 4.79863647e+02  4.80430389e+02  4.81120728e+02 ...
      4.78723022e+02  4.78934662e+02  4.79333099e+02]
    [ 5.14544006e+02  5.19569580e+02  5.24133545e+02 ...
      4.96996460e+02  5.03201691e+02  5.09049713e+02]
    ...
    [ 1.88354401e+02  1.92788345e+02  1.96699570e+02 ...
      1.72051849e+02  1.77877136e+02  1.83345886e+02]
    [ 1.75125717e+02  1.77489349e+02  1.79406250e+02 ...
      1.65506393e+02  1.69114349e+02  1.72327408e+02]
    [ 1.15078835e+02  1.15944603e+02  1.16530540e+02 ...
      1.10773438e+02  1.12476562e+02  1.13912643e+02]]

   [[ 3.11264923e+02  3.11264923e+02  3.11264923e+02 ...
      3.11264923e+02  3.11264923e+02  3.11264923e+02]
    [ 3.22238647e+02  3.22196014e+02  3.22237915e+02 ...
      3.22855835e+02  3.22575287e+02  3.22372162e+02]
    [ 3.67434662e+02  3.68327423e+02  3.69219452e+02 ...
     

In [5]:
# # Define a test list
# test_models_list = [ "HadGEM3-GC31-MM",
#                      "EC-Earth3",
#                      "NorCPM1"
#                     ]

# Test the new function
data = load_data(variable=variable, 
                 models_list=dicts.models,
                 season=season,
                 start_year=1961,
                 end_year=1971) # testing final year range

BCC-CSM2-MR: 11 years
MPI-ESM1-2-HR: 11 years
CanESM5: 11 years
CMCC-CM2-SR5: 11 years
HadGEM3-GC31-MM: 11 years
EC-Earth3: 11 years
MPI-ESM1-2-LR: 11 years
FGOALS-f3-L: 11 years
MIROC6: 11 years
IPSL-CM6A-LR: 11 years
CESM1-1-CAM5-CMIP5: 11 years
NorCPM1: 11 years
BCC-CSM2-MR: 8 ensemble members: 1961
BCC-CSM2-MR: 8 ensemble members: 1971
MPI-ESM1-2-HR: 10 ensemble members: 1961
MPI-ESM1-2-HR: 10 ensemble members: 1971
CanESM5: 20 ensemble members: 1961
CanESM5: 20 ensemble members: 1971
CMCC-CM2-SR5: 10 ensemble members: 1961
CMCC-CM2-SR5: 10 ensemble members: 1971
HadGEM3-GC31-MM: 10 ensemble members: 1961
HadGEM3-GC31-MM: 10 ensemble members: 1971
EC-Earth3: 15 ensemble members: 1961
EC-Earth3: 15 ensemble members: 1971
MPI-ESM1-2-LR: 16 ensemble members: 1961
MPI-ESM1-2-LR: 16 ensemble members: 1971
FGOALS-f3-L: 9 ensemble members: 1961
FGOALS-f3-L: 9 ensemble members: 1971
MIROC6: 10 ensemble members: 1961
MIROC6: 10 ensemble members: 1971
IPSL-CM6A-LR: 10 ensemble members: 1961
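The log reports 11 forecast years for every model here, but the introduction notes that this might differ between models. A check like the one below, run before the per-model arrays are combined, would catch a mismatch early. `check_forecast_years` is a hypothetical helper, and placing the forecast-year axis at position 2, i.e. (start dates, members, forecast years, lat, lon), is an assumption about the layout.

```python
import numpy as np


def check_forecast_years(model_arrays, axis=2):
    """Return the common number of forecast years, or raise if models disagree.

    model_arrays maps model name -> array; `axis` is the (assumed)
    position of the forecast-year dimension.
    """
    if not model_arrays:
        raise ValueError("No model arrays supplied")
    lengths = {name: arr.shape[axis] for name, arr in model_arrays.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Forecast-year lengths differ between models: {lengths}")
    return next(iter(lengths.values()))


# Toy arrays with the member counts from the log above
arrays = {
    "BCC-CSM2-MR": np.zeros((2, 8, 11, 3, 4)),
    "CanESM5": np.zeros((2, 20, 11, 3, 4)),
}
print(check_forecast_years(arrays))  # 11
```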


In [None]:
# Print the shape of the data
print(data.shape)

# Print the data
print(data)

(11, 178, 11, 72, 144)
[[[[[ 5.85475830e+02  5.85475830e+02  5.85475830e+02 ...
      5.85475830e+02  5.85475830e+02  5.85475830e+02]
    [ 4.79863647e+02  4.80430389e+02  4.81120728e+02 ...
      4.78723022e+02  4.78934662e+02  4.79333099e+02]
    [ 5.14544006e+02  5.19569580e+02  5.24133545e+02 ...
      4.96996460e+02  5.03201691e+02  5.09049713e+02]
    ...
    [ 1.88354401e+02  1.92788345e+02  1.96699570e+02 ...
      1.72051849e+02  1.77877136e+02  1.83345886e+02]
    [ 1.75125717e+02  1.77489349e+02  1.79406250e+02 ...
      1.65506393e+02  1.69114349e+02  1.72327408e+02]
    [ 1.15078835e+02  1.15944603e+02  1.16530540e+02 ...
      1.10773438e+02  1.12476562e+02  1.13912643e+02]]

   [[ 3.11264923e+02  3.11264923e+02  3.11264923e+02 ...
      3.11264923e+02  3.11264923e+02  3.11264923e+02]
    [ 3.22238647e+02  3.22196014e+02  3.22237915e+02 ...
      3.22855835e+02  3.22575287e+02  3.22372162e+02]
    [ 3.67434662e+02  3.68327423e+02  3.69219452e+02 ...
      3.59988647e+02  

In [None]:
# Directory in which to save the data
save_dir = "/home/users/benhutch/lagging-NAO-test-suite/saved_arrays"

# Create the directory if it doesn't already exist
os.makedirs(save_dir, exist_ok=True)

# Set the file name for the data
file_name = f"data_{variable}_{model}_{region}_{forecast_range}_{season}_1961-1971"

# Create the file path
file_path = os.path.join(save_dir, file_name)

# Save the data as a .npy file (np.save appends the .npy extension)
np.save(file_path, data)
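When given a file name without a `.npy` extension, `np.save` appends one, which is why the array is reloaded below from `file_path + ".npy"`. The same rule means a name ending in something else, such as `.nc`, gains a second extension (`.nc.npy`). A small round trip in a temporary directory shows this; the file name is a placeholder:

```python
import os
import tempfile

import numpy as np

data = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "data_psl_test")  # no extension given
    np.save(file_path, data)

    # np.save wrote the file with ".npy" appended
    saved_names = os.listdir(tmp)
    print(saved_names)  # ['data_psl_test.npy']

    # Reload using the extension that np.save added
    reloaded = np.load(file_path + ".npy")

print(np.array_equal(reloaded, data))  # True
```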

In [None]:
# Importlib reload
import importlib

# Reload the function
importlib.reload(sys.modules['alternate_lag_functions'])

# Import the function
from alternate_lag_functions import load_data, alternate_lag

In [None]:
# Path to a previously saved copy of the array (not used below)
fixed_file_path = "/home/users/benhutch/lagging-NAO-test-suite/saved_arrays/data_psl_HadGEM3-GC31-MM_global_all_forecast_years_DJFM_1961-1971.nc.npy"

# Load the array saved above; np.save appended ".npy" to file_path
data = np.load(file_path + ".npy")

# Print the shape of the data
print(data.shape)

# Print the data
print(data)

(11, 178, 11, 72, 144)
[[[[[ 5.85475830e+02  5.85475830e+02  5.85475830e+02 ...
      5.85475830e+02  5.85475830e+02  5.85475830e+02]
    [ 4.79863647e+02  4.80430389e+02  4.81120728e+02 ...
      4.78723022e+02  4.78934662e+02  4.79333099e+02]
    [ 5.14544006e+02  5.19569580e+02  5.24133545e+02 ...
      4.96996460e+02  5.03201691e+02  5.09049713e+02]
    ...
    [ 1.88354401e+02  1.92788345e+02  1.96699570e+02 ...
      1.72051849e+02  1.77877136e+02  1.83345886e+02]
    [ 1.75125717e+02  1.77489349e+02  1.79406250e+02 ...
      1.65506393e+02  1.69114349e+02  1.72327408e+02]
    [ 1.15078835e+02  1.15944603e+02  1.16530540e+02 ...
      1.10773438e+02  1.12476562e+02  1.13912643e+02]]

   [[ 3.11264923e+02  3.11264923e+02  3.11264923e+02 ...
      3.11264923e+02  3.11264923e+02  3.11264923e+02]
    [ 3.22238647e+02  3.22196014e+02  3.22237915e+02 ...
      3.22855835e+02  3.22575287e+02  3.22372162e+02]
    [ 3.67434662e+02  3.68327423e+02  3.69219452e+02 ...
      3.59988647e+02  
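The combined array is large, so reading all of it into memory on every reload may be unnecessary. `np.load` accepts `mmap_mode`, which memory-maps the file and only reads the parts that are actually accessed. A sketch on a small synthetic array; the file name and sizes are placeholders:

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")
    rng = np.random.default_rng(0)
    np.save(path, rng.normal(size=(11, 20, 11, 4, 8)).astype(np.float32))

    # Map the file instead of reading it all into memory
    data = np.load(path, mmap_mode="r")
    kind = type(data).__name__
    print(kind, data.shape)  # memmap (11, 20, 11, 4, 8)

    # Only the first start year is read from disk and copied into memory
    first_year = np.array(data[0])
    print(first_year.shape)  # (20, 11, 4, 8)

    # Release the mapping before the temporary directory is removed
    del data
```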

In [None]:
# Process the data with a year 2-5 lag
data_alternate_lag_y2_5 = alternate_lag(data=data,
                                        forecast_range="2-5",
                                        years=np.arange(1961, 1971+1),
                                        lag=4)

no_lagged_years:  8
First lagged year:  1964
Last lagged year:  1971
Processing data for lag year index:  0
Processing data for lag index:  0
Extracting data for year index:  3
Extracting data for year:  1964
For lag index:  0
start year:  2  end year:  5
year index:  0  ensemble member index:  0  lag index:  0
Appending to: year index:  0  ensemble member index:  0
start year:  2  end year:  5
year index:  0  ensemble member index:  1  lag index:  0
Appending to: year index:  0  ensemble member index:  4
start year:  2  end year:  5
year index:  0  ensemble member index:  2  lag index:  0
Appending to: year index:  0  ensemble member index:  8
start year:  2  end year:  5
year index:  0  ensemble member index:  3  lag index:  0
Appending to: year index:  0  ensemble member index:  12
start year:  2  end year:  5
year index:  0  ensemble member index:  4  lag index:  0
Appending to: year index:  0  ensemble member index:  16
start year:  2  end year:  5
year index:  0  ensemble member 
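The log hints at how the lagging is arranged: for the first lagged year (1964) with `lag=4`, the function reads start-date index 3, and member `m` at lag index 0 is written to member index `m * 4` (0, 4, 8, 12, 16, ...). Below is a hypothetical re-implementation consistent with that log and with the output shapes. The treatment of lags `k > 0` (taking the start date `k` years earlier and shifting the forecast-year window by `k` so that all lagged members cover the same target years) is an assumption, not read from `alternate_lag_functions`, as is the convention that forecast-year index 0 is forecast year 1.

```python
import numpy as np


def alternate_lag_sketch(data, start, end, lag):
    """Hypothetical alternate lag, for illustration only.

    data: (start dates, members, forecast years, lat, lon) anomalies,
          with forecast-year index 0 assumed to be forecast year 1.
    start, end: first and last forecast year of the window, e.g. 2 and 5.
    lag: number of consecutive start dates pooled into each lagged year.

    Returns (start dates - lag + 1, members * lag, lat, lon).
    """
    n_years, n_members, n_fcst, n_lat, n_lon = data.shape
    if end + lag - 1 > n_fcst:
        raise ValueError("Not enough forecast years for this window and lag")

    n_lagged = n_years - lag + 1
    out = np.empty((n_lagged, n_members * lag, n_lat, n_lon), dtype=data.dtype)

    for i in range(n_lagged):
        year_idx = i + lag - 1  # most recent start date for this lagged year
        for k in range(lag):
            init_idx = year_idx - k  # start date k years earlier
            # Shift the window by k so it verifies over the same target years
            window = data[init_idx, :, start - 1 + k:end + k]
            # Member m at lag k goes to member index m * lag + k
            out[i, k::lag] = window.mean(axis=1)
    return out


rng = np.random.default_rng(42)
toy = rng.normal(size=(11, 6, 11, 3, 4))  # 11 start dates, 6 members
lagged = alternate_lag_sketch(toy, start=2, end=5, lag=4)
print(lagged.shape)  # (8, 24, 3, 4)
```

Interleaving the lags (`m * lag + k`) rather than appending each lag as a block keeps every lagged copy of a member next to the original, which matches the member indices in the log.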

In [11]:
# Print the shape of the data
print(data_alternate_lag_y2_5.shape)

# Print the data
print(data_alternate_lag_y2_5)

(8, 712, 72, 144)
[[[[ 3.23176376e+02  3.23176376e+02  3.23176376e+02 ...  3.23176376e+02
     3.23176376e+02  3.23176376e+02]
   [ 2.81225616e+02  2.82701233e+02  2.84196253e+02 ...  2.76829783e+02
     2.78254974e+02  2.79741954e+02]
   [ 3.32517985e+02  3.34663348e+02  3.36584048e+02 ...  3.24850624e+02
     3.27753779e+02  3.30252848e+02]
   ...
   [ 5.45593974e+02  5.51582621e+02  5.57316767e+02 ...  5.26228943e+02
     5.32903158e+02  5.39364095e+02]
   [ 5.63818420e+02  5.67984131e+02  5.71895833e+02 ...  5.49964722e+02
     5.54778402e+02  5.59400330e+02]
   [ 5.58966858e+02  5.60553975e+02  5.62001892e+02 ...  5.53486979e+02
     5.55419271e+02  5.57251180e+02]]

  [[-8.65632070e+01 -8.65632070e+01 -8.65632070e+01 ... -8.65632070e+01
    -8.65632070e+01 -8.65632070e+01]
   [-1.32261359e+02 -1.31751897e+02 -1.31209995e+02 ... -1.33165005e+02
    -1.32976797e+02 -1.32653880e+02]
   [-8.07788798e+01 -7.73053945e+01 -7.42284571e+01 ... -9.47379201e+01
    -8.95040283e+01 -8.487737
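The shapes printed so far follow a simple rule: each lagged year needs `lag` consecutive start dates, and each lag contributes another copy of every ensemble member. With `lag=4`, the 1961 to 1968 test file goes from 8 years and 178 members to (5, 712), and the 1961 to 1971 array goes to (8, 712). The helper below is hypothetical and is inferred from these shapes rather than taken from `alternate_lag` itself.

```python
def alternate_lag_shape(n_years, n_members, lag):
    """Expected (years, members) after alternate lagging (inferred rule)."""
    n_lagged_years = n_years - lag + 1
    n_lagged_members = n_members * lag
    return n_lagged_years, n_lagged_members


print(alternate_lag_shape(8, 178, 4))   # (5, 712): the 1961-1968 test file
print(alternate_lag_shape(11, 178, 4))  # (8, 712): the 1961-1971 array
```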

In [4]:
# Form the directory path
dir_path = os.path.join(base_dir, variable, model, region, forecast_range,
                        season, "outputs", "anoms")

print(dir_path)

# List the files ending with *.nc in this directory
file_list = [f for f in os.listdir(dir_path) if f.endswith(".nc")]

# Print the list of files
print(file_list)

# Find the file for the s1961 start date and the r1i1 ensemble member
test_file = [f for f in file_list if "s1961" in f and "r1i1" in f][0]

# Print the test file
print(test_file)

/gws/nopw/j04/canari/users/benhutch/skill-maps-processed-data/psl/HadGEM3-GC31-MM/global/all_forecast_years/DJFM/outputs/anoms
['all-years-DJFM-global-psl_Amon_HadGEM3-GC31-MM_dcppA-hindcast_s1960-r10i1_gn_196011-197103-anoms.nc', 'all-years-DJFM-global-psl_Amon_HadGEM3-GC31-MM_dcppA-hindcast_s1960-r1i1_gn_196011-197103-anoms.nc', 'all-years-DJFM-global-psl_Amon_HadGEM3-GC31-MM_dcppA-hindcast_s1960-r2i1_gn_196011-197103-anoms.nc', 'all-years-DJFM-global-psl_Amon_HadGEM3-GC31-MM_dcppA-hindcast_s1960-r3i1_gn_196011-197103-anoms.nc', 'all-years-DJFM-global-psl_Amon_HadGEM3-GC31-MM_dcppA-hindcast_s1960-r4i1_gn_196011-197103-anoms.nc', 'all-years-DJFM-global-psl_Amon_HadGEM3-GC31-MM_dcppA-hindcast_s1960-r5i1_gn_196011-197103-anoms.nc', 'all-years-DJFM-global-psl_Amon_HadGEM3-GC31-MM_dcppA-hindcast_s1960-r6i1_gn_196011-197103-anoms.nc', 'all-years-DJFM-global-psl_Amon_HadGEM3-GC31-MM_dcppA-hindcast_s1960-r7i1_gn_196011-197103-anoms.nc', 'all-years-DJFM-global-psl_Amon_HadGEM3-GC31-MM_dcppA-h
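Matching substrings such as `"s1961"` and `"r1i1"` works, but it scans the whole list for every lookup and fails with a bare `IndexError` if a file is missing. An alternative is to parse each name once into a `(start year, realisation)` lookup. `index_files` and `PATTERN` are hypothetical names, and the regex assumes every name contains a `_s<year>-r<n>i<m>_` token like the filenames listed above.

```python
import re

# Matches e.g. "_s1960-r10i1_" in the anomaly filenames
PATTERN = re.compile(r"_s(?P<year>\d{4})-r(?P<member>\d+)i\d+_")


def index_files(file_list):
    """Map (start year, realisation number) -> filename."""
    index = {}
    for name in file_list:
        match = PATTERN.search(name)
        if match is None:
            continue  # skip files that don't follow the naming convention
        key = (int(match["year"]), int(match["member"]))
        if key in index:
            raise ValueError(f"Duplicate files for {key}: {index[key]}, {name}")
        index[key] = name
    return index


files = [
    "all-years-DJFM-global-psl_Amon_HadGEM3-GC31-MM_dcppA-hindcast_s1960-r10i1_gn_196011-197103-anoms.nc",
    "all-years-DJFM-global-psl_Amon_HadGEM3-GC31-MM_dcppA-hindcast_s1960-r1i1_gn_196011-197103-anoms.nc",
]
index = index_files(files)
print(sorted(index))  # [(1960, 1), (1960, 10)]
```

Inside the loading loop, `index[(year, j + 1)]` would then raise a `KeyError` naming the missing start year and member.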

In [5]:
# Load in the test file using xarray
test_ds = xr.open_dataset(os.path.join(dir_path, test_file))

# Extract the data for the variable
test_ds_psl = test_ds[variable]

# Print the data
print(test_ds_psl)

# Extract the years from the time dimension
test_ds_psl_years = test_ds_psl.time.dt.year

<xarray.DataArray 'psl' (time: 11, lat: 72, lon: 144)>
[114048 values with dtype=float32]
Coordinates:
  * time     (time) object 1961-11-01 00:00:00 ... 1971-11-01 00:00:00
  * lon      (lon) float64 -180.0 -177.5 -175.0 -172.5 ... 172.5 175.0 177.5
  * lat      (lat) float64 -90.0 -87.5 -85.0 -82.5 -80.0 ... 80.0 82.5 85.0 87.5
Attributes:
    standard_name:  air_pressure_at_mean_sea_level
    long_name:      Sea Level Pressure
    units:          Pa
    cell_methods:   area: time: mean
    comment:        Sea Level Pressure
    original_name:  mo: (stash: m01s16i222, lbproc: 128)
    cell_measures:  area: areacella


In [9]:
# Set up the initialisation (start) years
years = np.arange(1961, 2015) # 1961 to 2014 inclusive

# Set up the number of years
num_years = len(years)

# Set up the number of ensemble members
nens = 10 # Now for HadGEM3-GC31-MM

# Extract the number of forecast years
no_forecast_years = test_ds_psl.shape[0]

# Set up the no of lats
no_lats = test_ds_psl.shape[1]

# Set up the no of lons
no_lons = test_ds_psl.shape[2]

# Set up an empty array to store the data
test_array = np.zeros([num_years, nens, no_forecast_years, no_lats, no_lons])

# Print the shape of the array
print(test_array.shape)

(54, 10, 11, 72, 144)


In [10]:
# Loop over the start years
for i, year in enumerate(years):
    # Loop over the ensemble members
    for j in range(nens):
        # Log progress
        print(" year index: ", i, " ensemble member index: ", j)

        # Find the file for this start year and ensemble member (r1i1, r2i1, ...)
        test_file = [f for f in file_list if f"s{year}" in f and f"r{j+1}i1" in f][0]
        # Open the file with xarray, closing it once the values are copied
        with xr.open_dataset(os.path.join(dir_path, test_file)) as test_ds:
            # Store the variable's values in the array
            test_array[i, j, :, :, :] = test_ds[variable].values

 year index:  0  ensemble member index:  0
 year index:  0  ensemble member index:  1
 year index:  0  ensemble member index:  2
 year index:  0  ensemble member index:  3
 year index:  0  ensemble member index:  4
 year index:  0  ensemble member index:  5
 year index:  0  ensemble member index:  6
 year index:  0  ensemble member index:  7
 year index:  0  ensemble member index:  8
 year index:  0  ensemble member index:  9
 year index:  1  ensemble member index:  0
 year index:  1  ensemble member index:  1
 year index:  1  ensemble member index:  2
 year index:  1  ensemble member index:  3
 year index:  1  ensemble member index:  4
 year index:  1  ensemble member index:  5
 year index:  1  ensemble member index:  6
 year index:  1  ensemble member index:  7
 year index:  1  ensemble member index:  8
 year index:  1  ensemble member index:  9
 year index:  2  ensemble member index:  0
 year index:  2  ensemble member index:  1
 year index:  2  ensemble member index:  2
 year index

In [11]:
print(test_array.shape)

# Print the array
print(test_array)

(54, 10, 11, 72, 144)
[[[[[-2.29538345e+02 -2.29538345e+02 -2.29538345e+02 ...
     -2.29538345e+02 -2.29538345e+02 -2.29538345e+02]
    [-2.44167618e+02 -2.45504974e+02 -2.46731537e+02 ...
     -2.40617188e+02 -2.42375000e+02 -2.43585220e+02]
    [-2.71251434e+02 -2.74858673e+02 -2.77549713e+02 ...
     -2.60718048e+02 -2.65893463e+02 -2.69452423e+02]
    ...
    [-3.77497864e+02 -3.82338074e+02 -3.87235809e+02 ...
     -3.63231537e+02 -3.67918335e+02 -3.72695312e+02]
    [-4.13723724e+02 -4.17587372e+02 -4.21272736e+02 ...
     -4.01080963e+02 -4.05515625e+02 -4.09715210e+02]
    [-4.02450287e+02 -4.04583099e+02 -4.06536926e+02 ...
     -3.95260651e+02 -3.97759949e+02 -4.00160522e+02]]

   [[-4.18053986e+02 -4.18053986e+02 -4.18053986e+02 ...
     -4.18053986e+02 -4.18053986e+02 -4.18053986e+02]
    [-3.79526978e+02 -3.79004974e+02 -3.77387787e+02 ...
     -3.85093750e+02 -3.82593750e+02 -3.79343048e+02]
    [-3.32001434e+02 -3.32069611e+02 -3.28471588e+02 ...
     -3.42444611e+02 -3

Now that the data are loaded into an array, we can test how the alternate lag works.

In [16]:
# Set up the parameters for the alternate lag calculation
forecast_range = "2-5"

In [17]:
# Apply the alternate lag (lag is not passed, so its default value is used)
lagged_data = alternate_lag(test_array, forecast_range, years)

# Print the shape of the lagged data
print(lagged_data.shape)

# Print the lagged data
print(lagged_data)

no_lagged_years:  51
Processing data for lag year index:  0
Processing data for lag index:  0
Extracting data for year index:  3
Extracting data for year:  1964
For lag index:  0
start year:  2  end year:  5
year index:  0  ensemble member index:  0  lag index:  0
Appending to: year index:  0  ensemble member index:  0
start year:  2  end year:  5
year index:  0  ensemble member index:  1  lag index:  0
Appending to: year index:  0  ensemble member index:  4
start year:  2  end year:  5
year index:  0  ensemble member index:  2  lag index:  0
Appending to: year index:  0  ensemble member index:  8
start year:  2  end year:  5
year index:  0  ensemble member index:  3  lag index:  0
Appending to: year index:  0  ensemble member index:  12
start year:  2  end year:  5
year index:  0  ensemble member index:  4  lag index:  0
Appending to: year index:  0  ensemble member index:  16
start year:  2  end year:  5
year index:  0  ensemble member index:  5  lag index:  0
Appending to: year inde