In [1]:
import os

import numpy as np
import xarray as xr

from tqdm import tqdm

In this notebook, we read the ERA5 data stored on `/glade` and save the variables we need in a more accessible form locally. Specifically, we read $u$, $v$, $w$, $T$, and $Z$ (the geopotential), as well as the coordinate variables $\vartheta$ (latitude) and $p$ (pressure). Seviour et al. (2012) suggest that six-hourly resolution is necessary to capture the diurnal variability of the residual upwelling. Fortunately, there is an ERA5 dataset on disk ([633.1](https://rda.ucar.edu/datasets/ds633.1/)) of monthly mean data derived from the original six-hourly resolution. To limit the size of the data we store locally, we keep only the DJF and JJA averages, taken over all available years.

We begin by setting the appropriate directory and enumerating the available years.

In [2]:
era_dir = '/gpfs/fs1/collections/rda/data/ds633.1/e5.moda.an.pl'
years = sorted(os.listdir(era_dir))
n_year = len(years)

print(f'Found {n_year} years of ERA5 data, starting with {years[0]} and ending with {years[-1]}.')

Found 43 years of ERA5 data, starting with 1979 and ending with 2021.
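
Note that `os.listdir` returns every entry in the directory, so a stray file or folder would be counted as a year. Below is a minimal sketch of a stricter listing, assuming the year directories are named with exactly four digits. The helper `list_year_dirs` is hypothetical and is demonstrated on a throwaway directory rather than on the real archive.

```python
import os
import tempfile


def list_year_dirs(root):
    """Return sorted names of subdirectories of root that look like four-digit years."""
    return sorted(
        entry for entry in os.listdir(root)
        if len(entry) == 4
        and entry.isdigit()
        and os.path.isdir(os.path.join(root, entry))
    )


# Demonstration on a temporary directory (not the ERA5 archive)
with tempfile.TemporaryDirectory() as root:
    for name in ['1980', '1979', 'README', 'tmp']:
        os.mkdir(os.path.join(root, name))
    demo_years = list_year_dirs(root)

print(demo_years)
```

In the notebook, `years = list_year_dirs(era_dir)` could replace the plain `sorted(os.listdir(era_dir))` if the directory ever contains anything other than year folders.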


Now, we create arrays of the appropriate shape to hold the seasonal averages for each variable of interest. We also define indices that select the right months for each season.

In [8]:
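# Grid size of the pressure-level fields: levels, latitudes, longitudes
n_lev, n_lat, n_lon = 37, 721, 1440
# Years are averaged together, so the accumulators carry no year axis
shape = (n_lev, n_lat, n_lon)

names = ['u', 'v', 'w', 'T', 'Z']
data_djf = {name: np.zeros(shape) for name in names}
data_jja = {name: np.zeros(shape) for name in names}

# Zero-based positions along each yearly file's monthly time axis
idx_djf = np.array([11, 0, 1])  # December, January, February of the same calendar year
idx_jja = np.array([5, 6, 7])   # June, July, August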
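
Before allocating, it is worth estimating how much memory these accumulators need. The short calculation below is a sketch that assumes the default `float64` dtype of `np.zeros` (8 bytes per value) and counts one array per variable and season.

```python
n_lev, n_lat, n_lon = 37, 721, 1440
n_vars, n_seasons = 5, 2
bytes_per_value = 8  # np.zeros defaults to float64

values_per_array = n_lev * n_lat * n_lon
bytes_per_array = values_per_array * bytes_per_value
total_bytes = bytes_per_array * n_vars * n_seasons

print(f'{values_per_array:,} values per array')
print(f'{bytes_per_array / 1e9:.2f} GB per array, {total_bytes / 1e9:.2f} GB in total')
```

This comes to roughly 0.31 GB per array and about 3.1 GB for all ten, which is also approximately what the uncompressed `.npz` files will occupy on disk.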

We are now ready to step through the years, selecting the appropriate file for each variable and computing the seasonal averages we want.

In [9]:
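for year in tqdm(years, unit='year'):
    year_dir = f'{era_dir}/{year}'
    fnames = [x for x in os.listdir(year_dir) if x.endswith('.nc')]

    for name in names:
        # Pick the file whose name contains this variable's short name, e.g. '_u.'
        fname = [x for x in fnames if f'_{name.lower()}.' in x][0]
        with xr.open_dataset(f'{year_dir}/{fname}') as f:
            # Average the three months of each season and accumulate as plain NumPy arrays
            var = f[name.upper()]
            data_djf[name] += var.isel(time=idx_djf).mean('time').values
            data_jja[name] += var.isel(time=idx_jja).mean('time').values

# Turn the accumulated sums into means over all years
for name in names:
    data_djf[name] = data_djf[name] / n_year
    data_jja[name] = data_jja[name] / n_year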

100%|██████████| 43/43 [26:14<00:00, 36.63s/year]
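
The loop keeps a running sum of per-year seasonal means and divides by the number of years at the end, so only one year of data is in memory at a time. The sketch below uses small synthetic arrays (not ERA5 data) to check that this gives the same result as stacking every year and averaging in one go. The sizes and the random data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_year, n_month, n_point = 4, 12, 3  # tiny stand-in for the real grid
yearly = rng.normal(size=(n_year, n_month, n_point))  # synthetic monthly data

idx_djf = np.array([11, 0, 1])

# Running sum of per-year seasonal means, as in the loop above
acc = np.zeros(n_point)
for year_data in yearly:
    acc += year_data[idx_djf].mean(axis=0)
clim_loop = acc / n_year

# Same quantity computed with every year held in memory at once
clim_stacked = yearly[:, idx_djf].mean(axis=(0, 1))

print(np.allclose(clim_loop, clim_stacked))
```

The two agree because every year contributes the same number of months, so the mean of the yearly means equals the mean over all selected months.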


Let's save these arrays to disk.

In [10]:
os.makedirs('data', exist_ok=True)  # make sure the output directory exists
np.savez('data/era-djf.npz', **data_djf)
np.savez('data/era-jja.npz', **data_jja)
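
For reference, here is a minimal sketch of how such an archive can be read back. Passing a dictionary with `**` stores each array under its key, and `np.load` returns an object that loads arrays by name on access. The small arrays and the file name `era-demo.npz` are stand-ins chosen for illustration.

```python
import os
import tempfile

import numpy as np

# Small stand-in for data_djf; the real arrays have shape (37, 721, 1440)
demo = {
    name: np.full((2, 3), i, dtype=float)
    for i, name in enumerate(['u', 'v', 'w', 'T', 'Z'])
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'era-demo.npz')
    np.savez(path, **demo)

    # Arrays are only read from disk when they are indexed by name
    with np.load(path) as archive:
        keys = sorted(archive.files)
        T = archive['T']

print(keys, T.shape)
```

With the real files, `np.load('data/era-djf.npz')['T']` would return the DJF temperature climatology in the same way.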

Finally, we also want the pressure and latitude grids saved. We read and save them below.

In [11]:
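# year_dir and fname still point at the last file opened in the loop above;
# the accumulation there already assumes that every file shares one grid.
with xr.open_dataset(f'{year_dir}/{fname}') as f:
    p = f['level'].values
    lat = f['latitude'].values

np.savez('data/era-coords.npz', p=p, lat=lat)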
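
Since later analysis depends on these grids, a quick sanity check of their size and ordering can catch surprises early. The sketch below runs on synthetic stand-ins. The north-to-south latitude ordering and the pressure values are assumptions made for the demonstration, and `check_coord` is a hypothetical helper.

```python
import numpy as np

n_lev, n_lat = 37, 721

# Synthetic stand-ins; in the notebook these would be the p and lat read above
lat_demo = np.linspace(90.0, -90.0, n_lat)  # assumed north-to-south ordering
p_demo = np.linspace(1.0, 1000.0, n_lev)    # placeholder values, not the real levels


def check_coord(values, expected_size):
    """Return 'ascending' or 'descending' for a strictly monotonic 1-D coordinate."""
    values = np.asarray(values)
    if values.shape != (expected_size,):
        raise ValueError(f'expected {expected_size} points, got shape {values.shape}')
    steps = np.diff(values)
    if np.all(steps > 0):
        return 'ascending'
    if np.all(steps < 0):
        return 'descending'
    raise ValueError('coordinate is not strictly monotonic')


lat_order = check_coord(lat_demo, n_lat)
p_order = check_coord(p_demo, n_lev)
print(lat_order, p_order)
```

Calling `check_coord(lat, n_lat)` and `check_coord(p, n_lev)` on the real arrays would confirm that they match the shape used for the accumulators and reveal which direction each axis runs.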