# UNSEEN-open

This project aims to build an open, reproducible, and transferable workflow for UNSEEN.
<!-- -- an increasingly popular method that exploits seasonal prediction systems to assess and anticipate climate extremes beyond the observed record. The approach uses pooled forecasts as plausible alternate realities. Instead of the 'single realization' of reality, pooled forecasts can be exploited to better assess the likelihood of infrequent events.  -->
The workflow consists of four steps, as illustrated below:

![UNSEEN-open workflow](../../graphs/Workflow.png)

In this project, UNSEEN-open is applied to assess two extreme events in 2020: February 2020 UK precipitation and the 2020 Siberian heatwave. 

UK average precipitation in February 2020 was the highest on record. How often can extreme February precipitation events such as the 2020 event be expected?

The Siberian heatwave broke records as well. Could such an event have been anticipated with UNSEEN? And to what extent can we expect changes in the frequency of occurrence and magnitude of this kind of event?

## Overview

Here we provide an overview of the steps taken to apply UNSEEN-open.

### Retrieve

The main functions for retrieving the forecasts (SEAS5) and the reanalysis (ERA5) are `retrieve_SEAS5` and `retrieve_ERA5`. For more detail, see [retrieve](1.Download/1.Retrieve.ipynb).

In [1]:
import os
import sys
sys.path.insert(0, os.path.abspath('../../'))
os.chdir(os.path.abspath('../../'))

import src.cdsretrieve as retrieve
import src.preprocess as preprocess

import numpy as np
import xarray as xr

In [2]:
retrieve.retrieve_SEAS5(
    variables=['2m_temperature', '2m_dewpoint_temperature'],
    target_months=[3, 4, 5],
    area=[70, -11, 30, 120],
    years=np.arange(1981, 2021),
    folder='../Siberia_example/SEAS5/')

In [3]:
retrieve.retrieve_ERA5(variables=['2m_temperature', '2m_dewpoint_temperature'],
                       target_months=[3, 4, 5],
                       area=[70, -11, 30, 120],
                       folder='../Siberia_example/ERA5/')

### Preprocess

In the preprocessing step, we first merge all downloaded files into one netCDF file.
The rest of the preprocessing depends on the definition of the extreme event. For the UK case study, we want to extract the UK average precipitation, while for the Siberian heatwave we simply use the defined area to average over spatially. For the Siberian case we still need to take the seasonal (MAM) average, while for the UK the data are already monthly February averages.

Read the docs on [preprocessing](2.Preprocess/2.Preprocess.ipynb) for more info. 

The merged xarray dataset looks like this:

In [4]:
SEAS5_Siberia = preprocess.merge_SEAS5(
    folder='../Siberia_example/SEAS5/',
    target_months=[3, 4, 5])
SEAS5_Siberia

Lead time: 02
1
12


Each data variable in the dataset has the same layout:

| | Array | Chunk |
|---|---|---|
| Bytes | 387.52 MB | 3.31 MB |
| Shape | (3, 117, 51, 41, 132) | (1, 3, 51, 41, 132) |
| Count | 887 Tasks | 117 Chunks |
| Type | float32 | numpy.ndarray |

And for ERA5:

In [5]:
# Open all yearly ERA5 files as one dataset
ERA5_Siberia = xr.open_mfdataset('../Siberia_example/ERA5/ERA5_????.nc', combine='by_coords')
ERA5_Siberia

Each data variable in the dataset has the same layout:

| | Array | Chunk |
|---|---|---|
| Bytes | 2.73 MB | 64.94 kB |
| Shape | (126, 41, 132) | (3, 41, 132) |
| Count | 126 Tasks | 42 Chunks |
| Type | float32 | numpy.ndarray |


Then we calculate the day-in-month weighted seasonal average: 

In [6]:
SEAS5_Siberia_weighted = preprocess.season_mean(SEAS5_Siberia, years=39)
ERA5_Siberia_weighted = preprocess.season_mean(ERA5_Siberia, years=42)
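
To illustrate what a day-in-month weighting does, here is a minimal NumPy sketch. It is not the implementation of `preprocess.season_mean`; the helper name `day_weighted_season_mean` and its arguments are hypothetical and chosen only for this example. Each monthly value is weighted by the number of days in that month, so March and May (31 days) count slightly more than April (30 days).

```python
import calendar

import numpy as np


def day_weighted_season_mean(monthly, years, months):
    """Day-in-month weighted seasonal mean (hypothetical helper).

    monthly: array-like of shape (n_years, n_months), one value per month.
    years:   calendar years, one per row of `monthly`.
    months:  month numbers (1-12), one per column of `monthly`.
    """
    monthly = np.asarray(monthly, dtype=float)
    # Number of days in every (year, month) pair; monthrange handles leap years
    days = np.array([[calendar.monthrange(y, m)[1] for m in months]
                     for y in years])
    # Weighted mean over the month axis
    return (monthly * days).sum(axis=1) / days.sum(axis=1)


# One MAM season with monthly means of 1, 2 and 4 (arbitrary units)
values = [[1.0, 2.0, 4.0]]
weighted = day_weighted_season_mean(values, years=[2020], months=[3, 4, 5])
simple = np.mean(values, axis=1)
```

For this example the weighted mean is (31·1 + 30·2 + 31·4) / 92, which is slightly higher than the simple mean of 7/3 because the 31-day months carry more weight.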

We then select the 2 m temperature and average it over a smaller domain. This is a simple average; an area-weighted average would be more appropriate, since grid cell area decreases with latitude (see [preprocess](2.Preprocess/2.Preprocess.ipynb)).

In [14]:
SEAS5_Siberia_events_zoomed = (
    SEAS5_Siberia_weighted['t2m'].sel(  # Select 2 metre temperature
        latitude=slice(70, 50),         # Select the latitudes
        longitude=slice(65, 120)).      # Select the longitudes
    mean(['longitude', 'latitude']))

SEAS5_Siberia_events_zoomed_df = SEAS5_Siberia_events_zoomed.to_dataframe()


In [18]:
ERA5_Siberia_events_zoomed = (
    ERA5_Siberia_weighted['t2m'].sel(  # Select 2 metre temperature
        latitude=slice(70, 50),        # Select the latitudes
        longitude=slice(65, 120)).    # Select the longitude
    mean(['longitude', 'latitude']))

ERA5_Siberia_events_zoomed_df = ERA5_Siberia_events_zoomed.to_dataframe()
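
The domain means above are unweighted. As a minimal sketch of the area-weighted alternative, the example below weights each grid row by the cosine of its latitude, a common approximation for grid cell area on a regular latitude-longitude grid. The helper name `area_weighted_mean` and the toy field are assumptions made for illustration and are not part of the UNSEEN-open code.

```python
import numpy as np


def area_weighted_mean(field, latitudes):
    """Mean of a (latitude, longitude) field weighted by cos(latitude).

    Hypothetical helper: assumes a regular lat-lon grid and no missing values.
    """
    field = np.asarray(field, dtype=float)
    latitudes = np.asarray(latitudes, dtype=float)
    # One weight per latitude row, proportional to grid cell area
    weights = np.cos(np.deg2rad(latitudes))
    # Repeat the row weights across all longitudes
    weights_2d = np.broadcast_to(weights[:, None], field.shape)
    return np.average(field, weights=weights_2d)


# Toy field: 3 latitudes x 2 longitudes, values increasing towards the south
lats = np.array([70.0, 60.0, 50.0])
field = np.array([[0.0, 0.0],
                  [1.0, 1.0],
                  [2.0, 2.0]])

simple_mean = field.mean()
weighted_mean = area_weighted_mean(field, lats)
```

In this toy case the weighted mean is larger than the simple mean, because the southern rows, which cover more area, hold the larger values. On labelled xarray data, `DataArray.weighted(weights).mean(...)` expresses the same idea.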

In summary, the preprocessing step merges all downloaded files into one xarray dataset, then takes a temporal average over the MAM season and a spatial average over the domain.

### Read more
Jump into the respective sections for more detail:

* **Download**
    * [1. Retrieve](1.Download/1.Retrieve.ipynb) 
* **Pre-process**
    * [2.1 Merge](2.Preprocess/2.1Merge.ipynb)  
    * [2.2 Mask](2.Preprocess/2.2Mask.ipynb)
    * [2.3 Upscale](2.Preprocess/2.3Upscale.ipynb)
* **Evaluate**
    * [3. Evaluate](3.Evaluate/3.Evaluate.ipynb)
* **Illustrate**
    