# Preprocess

## Merge

This notebook shows how all seasonal forecasts are loaded into one xarray dataset. For the Siberian heatwave, we retrieved 105 files: one for each of the 35 years and each of the three lead times ([see Retrieve](../1.Download/1.Retrieve.ipynb)). For the UK, we can use more lead times because the target period is shorter: one month, compared with three months for the Siberian example. Each forecast covers only a limited number of months, so a shorter target period can be reached from more initialization dates. We retrieved 5 lead times × 35 years = 175 files.
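
As a quick check of the file counts, the sketch below builds lists of hypothetical file names and mimics the `'*' + "%.2i" % init_month + '.nc'` wildcard that `merge_SEAS5` uses to pick out one lead time. The naming scheme `<year><two-digit init month>.nc` and the years 1982 to 2016 are assumptions for illustration only; the real files may be named differently, as long as they end in the two-digit initialization month.

```python
from fnmatch import filter as fnmatch_filter

# Assumptions for illustration: 35 hindcast years and hypothetical file names
# of the form <year><two-digit initialization month>.nc
YEARS = range(1982, 2017)
SIBERIA_INIT_MONTHS = [2, 1, 12]       # three lead times, as printed by merge_SEAS5
UK_INIT_MONTHS = [1, 12, 11, 10, 9]    # five lead times, as printed by merge_SEAS5

def hypothetical_files(init_months):
    """One file per (initialization month, year) pair."""
    return ["%i%.2i.nc" % (year, month) for month in init_months for year in YEARS]

def files_for_init_month(files, init_month):
    """Select one lead time with the same wildcard pattern that merge_SEAS5 globs."""
    return fnmatch_filter(files, "*" + "%.2i" % init_month + ".nc")

siberia_files = hypothetical_files(SIBERIA_INIT_MONTHS)
uk_files = hypothetical_files(UK_INIT_MONTHS)
print(len(siberia_files), len(uk_files))        # 35 x 3 and 35 x 5
print(len(files_for_init_month(uk_files, 12)))  # 35 files for one lead time
```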

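Each netCDF file contains 25 ensemble members and therefore has the dimensions latitude, longitude and number (the 25 ensemble members). Here we create an xarray dataset that additionally contains the dimensions time (the 35 years) and leadtime (the initialization months: three for Siberia, five for the UK). For Siberia, each year contributes its three target months, so the time dimension has 105 steps. To build the dataset, we loop over the lead times, open all 35 years of each lead time, and concatenate the lead times along a new `leadtime` dimension.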
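
The loop-and-concatenate idea can be sketched with small synthetic datasets in place of the netCDF files. This is a minimal sketch, assuming random values, a reduced 3 × 4 grid and a variable named `tprate`; it collects the per-lead-time datasets in a list and calls `xr.concat` once, which stacks them along a new `leadtime` dimension in the same way as concatenating inside the loop.

```python
import numpy as np
import xarray as xr

# Synthetic sizes: 35 years, 25 members, and a tiny 3 x 4 grid (not the real domain)
n_years, n_members, n_lat, n_lon = 35, 25, 3, 4
years = np.arange(1982, 1982 + n_years)  # hypothetical year labels

def fake_leadtime(seed):
    """One lead time: all years stacked along 'time' (random values, not real data)."""
    rng = np.random.default_rng(seed)
    data = rng.normal(size=(n_years, n_members, n_lat, n_lon)).astype("float32")
    return xr.Dataset(
        {"tprate": (("time", "number", "latitude", "longitude"), data)},
        coords={"time": years, "number": np.arange(n_members)},
    )

per_leadtime = [fake_leadtime(seed) for seed in range(5)]
# A new dimension name in xr.concat is added as the first axis
merged = xr.concat(per_leadtime, dim="leadtime")
# Label the lead times 2, 3, ..., as merge_SEAS5 does
merged = merged.assign_coords(leadtime=np.arange(len(per_leadtime)) + 2)
print(merged["tprate"].dims, merged["tprate"].shape)
```

Collecting the pieces first and concatenating once avoids rebuilding the growing dataset on every loop iteration.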

In [1]:
# Display the output of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell 
InteractiveShell.ast_node_interactivity = "all"

In [2]:
import os
import sys
# Make the repository root importable so that src.cdsretrieve can be found
sys.path.insert(0, os.path.abspath('../../../'))
import src.cdsretrieve as retrieve

In [3]:
os.chdir(os.path.abspath('../../../'))  # work from the repository root
os.getcwd()  # print the working directory

'/lustre/soge1/projects/ls/personal/timo/UNSEEN-open'

In [4]:
import xarray as xr
import numpy as np

def merge_SEAS5(folder, target_months):
    """Open all SEAS5 files in `folder` and stack them along a new 'leadtime' dimension."""
    # Initialization months that forecast the target months (e.g. 2, 1, 12 for March-May)
    init_months, leadtimes = retrieve._get_init_months(target_months)
    print('Lead time: ' + "%.2i" % init_months[0])  # prints the first initialization month
    # Open all years of the first initialization month (files ending in e.g. '02.nc')
    SEAS5 = xr.open_mfdataset(
        folder + '*' + "%.2i" % init_months[0] + '.nc',
        combine='by_coords')
    for init_month in init_months[1:]:  # the first one is already loaded
        print(init_month)
        SEAS5_ld = xr.open_mfdataset(
            folder + '*' + "%.2i" % init_month + '.nc',
            combine='by_coords')
        # Stack along a new 'leadtime' dimension
        SEAS5 = xr.concat([SEAS5, SEAS5_ld], dim='leadtime')
    # Label the lead times 2, 3, ... in the order of the initialization months
    SEAS5 = SEAS5.assign_coords(leadtime=np.arange(len(init_months)) + 2)
    return SEAS5

In [5]:
SEAS5_Siberia = merge_SEAS5(folder = '../Siberia_example/SEAS5/', target_months = [3,4,5])

Lead time: 02
1
12


In [6]:
SEAS5_Siberia

| | Array | Chunk |
|---|---|---|
| Bytes | 170.48 MB | 1.62 MB |
| Shape | (3, 105, 25, 41, 132) | (1, 3, 25, 41, 132) |
| Count | 595 Tasks | 105 Chunks |
| Type | float32 | numpy.ndarray |


| | Array | Chunk |
|---|---|---|
| Bytes | 170.48 MB | 1.62 MB |
| Shape | (3, 105, 25, 41, 132) | (1, 3, 25, 41, 132) |
| Count | 595 Tasks | 105 Chunks |
| Type | float32 | numpy.ndarray |



For example, you can select a single latitude, longitude, time, ensemble member and lead time as follows. The data are loaded lazily with dask, so add `.load()` to compute and display the values:

In [8]:
SEAS5_Siberia.sel(latitude = 60, longitude = -10, time = '2000-03', number = 24, leadtime = 3).load()

In [10]:
SEAS5_UK = merge_SEAS5(folder = '../UK_example/SEAS5/', target_months = [2])

Lead time: 01
12
11
10
9
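
The printed initialization months follow a regular pattern: 2, 1 and 12 for March to May (Siberia), and 1, 12, 11, 10 and 9 for February (UK). The hypothetical function below reproduces both lists under one assumption that this notebook does not state: the initialization month itself counts as lead 1 and is skipped, and the last target month must fall within lead month 6. It is only a sketch to make the pattern explicit; `retrieve._get_init_months` may be implemented differently.

```python
def hypothetical_init_months(target_months, max_lead=6):
    """Sketch: initialization months whose forecasts cover all target months.

    Assumes lead 1 is the initialization month itself (skipped) and that the
    last target month must fall within lead `max_lead`.
    """
    n_init = max_lead - len(target_months)
    first = target_months[0]
    # Step back 1, 2, ..., n_init months from the first target month, wrapping at January
    return [(first - k - 1) % 12 + 1 for k in range(1, n_init + 1)]

def lead_month(init_month, target_month):
    """Lead of a target month, counting the initialization month as lead 1."""
    return (target_month - init_month) % 12 + 1

print(hypothetical_init_months([3, 4, 5]))      # [2, 1, 12]
print(hypothetical_init_months([2]))            # [1, 12, 11, 10, 9]
print([lead_month(m, 3) for m in [2, 1, 12]])   # [2, 3, 4]
```

Under this assumption, the `leadtime` labels 2, 3, ... assigned in `merge_SEAS5` equal the lead of the first target month for each initialization month.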


The SEAS5 total precipitation rate (`tprate`) is given in m/s. Below, we convert it to mm/day by multiplying by 1000 (m to mm) and by 3600 × 24 (seconds per day), and update the variable's attributes to match.
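
The conversion factor is 1000 mm per m times 86,400 s per day, so 1 m/s corresponds to 86,400,000 mm/day. The sketch below wraps the conversion in a small helper and checks it on a toy `DataArray`; the helper name `rate_to_mm_per_day` is hypothetical, and the attributes are copied from the cell below. The attributes are assigned after the multiplication so that they describe the new units regardless of how the arithmetic treats the original ones.

```python
import numpy as np
import xarray as xr

MM_PER_M = 1000
SECONDS_PER_DAY = 3600 * 24
FACTOR = MM_PER_M * SECONDS_PER_DAY  # 86,400,000: from m/s to mm/day

def rate_to_mm_per_day(tprate):
    """Hypothetical helper: convert a precipitation rate in m/s to mm/day."""
    converted = tprate * FACTOR
    # Replace the attributes so they describe the converted values
    converted.attrs = {'long_name': 'rainfall',
                       'units': 'mm/day',
                       'standard_name': 'thickness_of_rainfall_amount'}
    return converted

# Toy input: 1e-8 m/s corresponds to 0.864 mm/day
toy = xr.DataArray(np.array([0.0, 1e-8, 5e-8]), dims='x', attrs={'units': 'm s**-1'})
converted = rate_to_mm_per_day(toy)
print(converted.values, converted.attrs['units'])
```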

In [12]:
SEAS5_UK['tprate'] = SEAS5_UK['tprate'] * 1000 * 3600 * 24 ## From m/s to mm/d
SEAS5_UK['tprate'].attrs = {'long_name': 'rainfall',
 'units': 'mm/day',
 'standard_name': 'thickness_of_rainfall_amount'}
SEAS5_UK

| | Array | Chunk |
|---|---|---|
| Bytes | 2.69 MB | 15.40 kB |
| Shape | (5, 35, 25, 11, 14) | (1, 1, 25, 11, 14) |
| Count | 1715 Tasks | 175 Chunks |
| Type | float32 | numpy.ndarray |

