(C) Copyright 1996- ECMWF.
This software is licensed under the terms of the Apache Licence Version 2.0
which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
In applying this licence, ECMWF does not waive the privileges and immunities
granted to it by virtue of its status as an intergovernmental organisation
nor does it submit to any jurisdiction.

# Table of Contents
1. [Introduction](#introduction)
2. [Environment](#environment)
    1. [Imports](#imports)
    2. [User-defined inputs](#inputs)
3. [Data Analysis](#analysis)
    1. [Moisture/water-related variables](#moisture)
        1. [Calculate Anomalies](#anomalies)
        2. [EOF Analysis](#eof)
    2. [Observational data - Overlap](#eobs)
        1. [Preprocessing](#preprocessing-overlap)
        2. [Temporal Overlap](#overlap)
    3. [Observational Data - Connecting EPEs to Large-Scale Atmospheric Flow Patterns](#extremes-to-patterns)
        1. [Preprocessing](#preprocessing-connections) 
        2. [Auxiliary Functions](#auxiliary)
        3. [Quantifying the Connections](#quantifying-connections)

# Additional data analysis for the work presented in the paper: <a name="introduction"></a>
### [Extreme precipitation events in the Mediterranean: Spatiotemporal characteristics and connection to large-scale atmospheric flow patterns](https://rmets.onlinelibrary.wiley.com/doi/10.1002/joc.6985)

---
Author: Nikolaos Mastrantonas\
Email: nikolaos.mastrantonas@ecmwf.int; nikolaos.mastrantonas@doktorand.tu-freiberg.de

---
The additional analysis is based on the reviewers' comments about:
1. Why no moisture/water-related variables were used.
2. How the results on the connection of extremes to large-scale patterns would change if observational data were used for precipitation instead of ERA5.

# Environment<a name="environment"></a>
Load the required packages and get the user-defined inputs.

The analysis was done on a Linux machine with 8 CPUs and 32 GB RAM. The total duration was about 1 hour.

## Imports<a name="imports"></a>

Import the required packages (full package or specific functions).

In [1]:
import multiprocessing # parallel processing
import tqdm # timing
import sys
from datetime import datetime # timing
from pathlib import Path # creation of directories
import warnings # for suppressing RuntimeWarning

# basic libraries for data analysis
import numpy as np 
import pandas as pd 
import xarray as xr

# specialized libraries
import metview as mv # the metview package is needed for calculating the equivalent potential temperature 
from eofs.xarray import Eof # EOF analysis
from scipy.stats import binom # binomial distribution for significance testing of extremes and large-scale patterns

## User-defined inputs <a name="inputs"></a>

Define the directory with the input data.

In [2]:
dir_loc = '' # the main folder where the input data are stored

Define the inputs related to EOF analysis of the Atmospheric Variables.

In [3]:
variables_used = ['ThetaE850', 'WVF', 'SLP', 'T850', 'Z500', 'Q850'] # variables used for the EOF analysis

Area_used =  [50, -11, 26, 41] # define extent of area of interest [North, West, South, East] (the Med. domain) (one of the regions in Script2)

Var_ex = 90 # define the minimum total variance that the subset of kept EOFs should explain (same as Script2)

Define the inputs related to comparison of ERA5 and EOBS data.

The EOBS data were downloaded from https://surfobs.climate.copernicus.eu/dataaccess/access_eobs.php#datafiles at 0.1 degree resolution (file name: "rr_ens_mean_0.1deg_reg_v21.0e").

The dataset was initially processed with the cdo toolbox so that it has exactly the same grid cell coordinates and grid resolution as the ERA5 precipitation data used here. The 0.25 degree EOBS product was not selected because its grid is shifted relative to ERA5. Since remapping is needed in any case, starting from the finer resolution gives better results.

The cdo preprocessing was done with a small txt file named "mygrid.txt" with the new grid information in the following lines:\
gridtype = lonlat\
xsize = 185\
ysize = 73\
xfirst = -8\
xinc = 0.25\
yfirst = 29\
yinc = 0.25

The Linux command for remapping the data with cdo is: **cdo remapcon,mygrid.txt rr_ens_mean_0.1deg_reg_v21.0e EOBS_Med.nc**. The command works as written when it is run from the folder containing the input data; otherwise the relative paths of the files should be included as well.

**The *remapcon* operator was selected because first-order conservative regridding is the recommended method for precipitation data.**

More information is available at https://code.mpimet.mpg.de/boards/2/topics/296
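
As a quick sanity check on the grid description above, the following minimal sketch reconstructs the coordinates implied by "mygrid.txt", so that they can be compared against the ERA5 precipitation grid. It uses plain Python only; the helper name `axis_values` is hypothetical and is not part of cdo.

```python
# Grid description copied from mygrid.txt (see above).
grid = {"xsize": 185, "ysize": 73, "xfirst": -8.0, "xinc": 0.25, "yfirst": 29.0, "yinc": 0.25}

def axis_values(first, inc, size):
    """Return `size` evenly spaced coordinates starting at `first` (hypothetical helper)."""
    return [first + i * inc for i in range(size)]

lons = axis_values(grid["xfirst"], grid["xinc"], grid["xsize"])
lats = axis_values(grid["yfirst"], grid["yinc"], grid["ysize"])

print(f"longitude: {lons[0]} to {lons[-1]} ({len(lons)} points)")
print(f"latitude:  {lats[0]} to {lats[-1]} ({len(lats)} points)")
```

With these values the target grid runs from 8°W to 38°E and from 29°N to 47°N in 0.25 degree steps.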

In [4]:
EOBS_file_name = 'Data/EOBS_Med.nc' # the name of the file of the EOBS precipitation data
ERA5_file_name = 'Data/D1_Total_Precipitation.grb' # the name of the grb file of the precipitation data

P_used = [95, 97, 99] # define the percentile(s) of interest (same as Script2)

Define which daily cluster allocation data are used for analysing the connections based on the EOBS dataset.

In [5]:
Combination_used = 'Med_SLP~Z500' # should be one of the available sets saved from Script2
Clusters_used = 9 # should be one of the available sets saved from Script2

# Data Analysis <a name="analysis"></a>

In [6]:
InitializationTime = datetime.now()

## Moisture/Water-related Variables<a name="moisture"></a>

### Calculate Anomalies<a name="anomalies"></a>

In [7]:
def anomalies(variable):
    
    # read actual daily values as xarray dataarray object
    if variable in ['SLP', 'T850', 'Z500', 'Q850']: # directly read data as xarray
        file_path = dir_loc + 'Data/D1_Mean_'+variable+'.grb'
        Daily = xr.open_dataarray(file_path, engine='cfgrib') # read data
    elif variable=='ThetaE850': # calculate eq. pot. temperature with metview package and then convert to xarray
        Q850 = mv.read(dir_loc+ 'Data/D1_Mean_Q850.grb') # specific humidity
        T850 = mv.read(dir_loc+ 'Data/D1_Mean_T850.grb') # temperature
        ThetaE850 = mv.eqpott_p(temperature=T850, humidity=Q850) # use mv package to calculate equiv. pot. temp.
        Daily = ThetaE850.to_dataset() # convert from metview Fieldset to xarray Dataset
        Daily = Daily.to_array()[0] # convert to DataArray
    elif variable=='WVF': # read the east and north components of WVF and calculate the total magnitude per grid cell
        WVFeast = xr.open_dataarray(dir_loc+ 'Data/D1_Mean_WVFeast.grb', engine='cfgrib')
        WVFnorth = xr.open_dataarray(dir_loc+ 'Data/D1_Mean_WVFnorth.grb', engine='cfgrib')
        Daily = np.sqrt(WVFeast**2 + WVFnorth**2) # total magnitude of the WVF
        
    # subset area of interest 
    Daily = Daily.sel(latitude=slice(Area_used[0], Area_used[2]), longitude=slice(Area_used[1], Area_used[3]))    
    
    actual_days = Daily.time.values # get actual timesteps
    dates_grouped = pd.to_datetime(Daily.time.values).strftime('%m%d') # get Month-Day of each timestep
    
    # 5-day smoothed climatology. Rolling can be applied directly because the daily data refer to consecutive days. If
    # days are not consecutive, xr.resample should be applied first, so that missing days are generated with NaN
    Smoothed = Daily.rolling(time=5, center=True, min_periods=1).mean() # 5-day smoothing
    
    Daily = Daily.assign_coords({'time': dates_grouped}) # change the time to Month-Day
    Smoothed = Smoothed.assign_coords({'time': dates_grouped}) # change the time to Month-Day
    
    Climatology = Smoothed.groupby('time').mean() # climatology of the smoothed data
    
    Anomalies = Daily.groupby('time') - Climatology
    Anomalies = Anomalies.assign_coords({'time': actual_days}) # change back to the original timestep information
    
    return Anomalies

In [8]:
pool = multiprocessing.Pool() # object for multiprocessing
Anomalies = list(tqdm.tqdm(pool.imap(anomalies, variables_used), total=len(variables_used), position=0, leave=True))
pool.close()

Anomalies = {variables_used[i_c]: i_anom for i_c, i_anom in enumerate(Anomalies)}

del(pool)

100%|██████████| 6/6 [04:47<00:00, 47.92s/it] 
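
The anomaly definition used in `anomalies` (daily value minus the calendar-day climatology of the 5-day smoothed data) can be illustrated without any GRIB files. The sketch below is a simplified, pure-Python stand-in for the xarray operations, applied to an assumed toy series; the helper name `centered_mean` is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Toy daily series (assumed data): 2001 is uniformly 1.0 and 2002 uniformly 3.0.
start = date(2001, 1, 1)
days = [start + timedelta(days=i) for i in range(730)]
values = [1.0 if d.year == 2001 else 3.0 for d in days]

def centered_mean(series, window=5):
    """Centered rolling mean; shorter windows are allowed at the edges (like min_periods=1)."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

smoothed = centered_mean(values)

# Climatology: mean of the smoothed values for each calendar day (month, day).
groups = defaultdict(list)
for d, s in zip(days, smoothed):
    groups[(d.month, d.day)].append(s)
climatology = {key: sum(v) / len(v) for key, v in groups.items()}

# Anomaly: raw daily value minus the smoothed climatology of its calendar day.
anomalies = {d: v - climatology[(d.month, d.day)] for d, v in zip(days, values)}

print(anomalies[date(2001, 6, 15)], anomalies[date(2002, 6, 15)])
```

For 15 June the climatology is 2.0, so the two years give anomalies of -1.0 and 1.0 respectively.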


### EOF Analysis<a name="eof"></a>

In [9]:
def eof_analysis(variable):
    
    dataset_used = Anomalies[variable] # variable to be used for the analysis
    
    coslats = np.cos(np.deg2rad(dataset_used.latitude.values)).clip(0, 1) # coslat for weights on EOF
    wgts = np.sqrt(coslats)[..., np.newaxis] # calculation of weights
    solver = Eof(dataset_used, weights=wgts) # EOF analysis of the subset
    
    N_eofs = int(np.searchsorted(np.cumsum(solver.varianceFraction().values), Var_ex/100)) # number of EOFs needed
    N_eofs += 1 # add 1 since python does not include the last index of a range
    
    EOFS = solver.eofs(neofs=N_eofs)
    VARS = solver.varianceFraction(neigs=N_eofs).values*100
    
    return {'EOFS': EOFS, 'VARS': VARS}

In [10]:
pool = multiprocessing.Pool() # object for multiprocessing
EOFS = list(tqdm.tqdm(pool.imap(eof_analysis, variables_used), total=len(variables_used), position=0, leave=True))
pool.close()

EOFS = {variables_used[i_c]: i_eof for i_c, i_eof in enumerate(EOFS)}
del(pool)

100%|██████████| 6/6 [00:21<00:00,  3.59s/it]


In [11]:
for i_var in variables_used:
    print('{} EOFs needed for explaining at least {}% of the total variance for the {} daily anomalies.'.\
          format(len(EOFS[i_var]['VARS']), Var_ex, i_var))
del(i_var)

42 EOFs needed for explaining at least 90% of the total variance for the ThetaE850 daily anomalies.
49 EOFs needed for explaining at least 90% of the total variance for the WVF daily anomalies.
6 EOFs needed for explaining at least 90% of the total variance for the SLP daily anomalies.
12 EOFs needed for explaining at least 90% of the total variance for the T850 daily anomalies.
7 EOFs needed for explaining at least 90% of the total variance for the Z500 daily anomalies.
115 EOFs needed for explaining at least 90% of the total variance for the Q850 daily anomalies.


Note that the number of EOFs needed for the water/moisture-related variables (Q850, ThetaE850, WVF) is substantially higher than for the other three variables (SLP, T850, Z500): 42 to 115 EOFs compared with 6 to 12. This is because of the large spatial domain of the analysis. This result suggests that using the water/moisture-related variables in the K-means clustering for defining weather regimes would increase the level of complexity, without necessarily bringing significant improvements in the connection of extreme precipitation events to large-scale patterns. For smaller domains (e.g. country-wide or regional ones), the inclusion of such variables would be useful.
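
To make the link between the variance spectrum and the number of retained EOFs concrete, here is a minimal sketch of the selection rule used in `eof_analysis`, written with the standard library only. The two variance spectra are assumed toy values, not results from this analysis, and `n_eofs_needed` is a hypothetical helper name.

```python
from bisect import bisect_left
from itertools import accumulate

def n_eofs_needed(variance_fractions, var_ex):
    """Number of leading EOFs whose cumulative variance fraction reaches var_ex percent."""
    cumulative = list(accumulate(variance_fractions))
    # bisect_left mirrors np.searchsorted (default side='left'): index of the first
    # cumulative value >= the target; +1 converts that index into a count.
    return bisect_left(cumulative, var_ex / 100) + 1

smooth_field = [0.50, 0.30, 0.15, 0.05]  # assumed toy spectrum: variance concentrated
noisy_field = [0.20] * 5                 # assumed toy spectrum: variance spread out

print(n_eofs_needed(smooth_field, 90))
print(n_eofs_needed(noisy_field, 90))
```

The concentrated spectrum needs 3 EOFs and the flat one needs all 5, which is the same qualitative contrast as between SLP/Z500 and the moisture-related fields.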

## Observational Data - Overlap<a name="eobs"></a>

### Preprocessing<a name="preprocessing-overlap"></a>

In [12]:
P_used = sorted(list(np.array(P_used).flatten()))[::-1] # sort P_used & make list for consistency and avoiding errors

In [13]:
# read ERA5 and EOBS data and do some basic preprocessing for having both sets in same format
ERA5 = xr.open_dataarray(dir_loc + ERA5_file_name, engine='cfgrib') # read data
ERA5 = ERA5.drop(['valid_time', 'step', 'surface', 'number']) # drop not-used coordinates
ERA5 = ERA5.assign_coords({'time': pd.to_datetime(ERA5.time.values).strftime('%Y%m%d')}) # change time to str

EOBS = xr.open_dataarray(dir_loc + EOBS_file_name) # read data
EOBS = EOBS.rename({'lat': 'latitude', 'lon': 'longitude'}) # rename for same name as ERA5
EOBS = EOBS.reindex(latitude=EOBS.latitude[::-1]) # reverse order for same as ERA5
EOBS = EOBS.assign_coords({'time': pd.to_datetime(EOBS.time.values).strftime('%Y%m%d')}) # change time to str
EOBS = EOBS.sel(time=ERA5.time.values) # keep only the dates available in ERA5

# calculate percentage of NaN per grid cell
NANs_EOBS = np.isnan(EOBS).sum(dim='time')
NANs_EOBS = NANs_EOBS/len(EOBS)*100
NANs_EOBS = NANs_EOBS<5 # keep only locations that have less than 5% missing data

In [14]:
def dates_extreme(dataset, quantile):
    
    ' Get the dates over user-defined percentile '
    
    with warnings.catch_warnings(): # if all are NaN then it gives Runtimewarning, which is now suppressed
        warnings.simplefilter('ignore', category=RuntimeWarning)
        Q_thres = dataset.quantile(quantile, interpolation='linear', dim='time', keep_attrs=True) # get the threshold
    
    dataset_df = dataset.values.flatten() # keep only the values as numpy array
    dataset_df = pd.DataFrame(np.reshape(dataset_df, (len(dataset), -1)), index=dataset.time.values) # convert to DF

    QuantExceed = dataset_df > Q_thres.values.flatten() # Boolean: True where the value exceeds the Q threshold
    DaysExceed = QuantExceed.apply(lambda x: list(QuantExceed.index[np.where(x == 1)[0]]), axis=0) # exceedance days
    
    return DaysExceed

In [15]:
def days_extend(actual_dates, days_offset):
    
    '''
    Get a list with dates within a temporal window centered over "actual_dates" and extending "days_offset" before
    and after the "actual_dates"
    '''
    
    if isinstance(actual_dates, str): actual_dates = [actual_dates] # if only 1 value, then convert to list
        
    dates_dt = pd.to_datetime(actual_dates) # convert to datetime objects from string
    all_dates = [pd.date_range(i_date - pd.DateOffset(days=days_offset), i_date + pd.DateOffset(days=days_offset) )
                 for i_date in dates_dt]
    all_dates = [j.strftime('%Y%m%d') for i in all_dates for j in i] # single list and convert to string
    all_dates = set(all_dates)
    
    return all_dates

### Temporal Overlap<a name="overlap"></a>

In [16]:
def calculate_overlap(input_data):
    
    ' Get the temporal overlap of EPEs between the ERA5 and EOBS datasets for user-defined percentile'
    
    P_ERA5 = input_data[0] # percentile for ERA5 for defining the days of EPEs
    P_EOBS = input_data[1] # percentile for EOBS for defining the days of EPEs
    offset = input_data[2] # temporal window for allowing flexibility in overlap between the EPEs 

    ERA5_Q = dates_extreme(dataset=ERA5, quantile=P_ERA5/100) # get the days of EPEs from ERA5
    EOBS_Q = dates_extreme(dataset=EOBS, quantile=P_EOBS/100) # get the days of EPEs from EOBS
    
    Common = ERA5[0] # generate xarray object for storing the overlap results
    
    # Calculate the overlap for each grid cell, with or without considering temporal flexibility at the EOBS results.
    # Check if EOBS have at least 1 day (len(j)!=0) because EOBS has no data over the sea.
    if offset == 0: # if no flexibility window, for time efficiency do not use the "days_extend" function
        common_percent = [len(set(i) & set(j))/len(i)*100 if len(j)!=0 else np.nan 
                          for i, j in zip(ERA5_Q, EOBS_Q)]
    else:
        common_percent = [len(set(i) & days_extend(j, offset))/len(i)*100 if len(j)!=0 else np.nan 
                          for i, j in zip(ERA5_Q, EOBS_Q)]
        
    Common.values = np.reshape(common_percent, Common.shape) # assign the overlap values to the final dataset
    Common = Common.assign_coords({'time': str(input_data)}) # assign the coordinate value based on the input data
    Common = Common.rename({'time': 'Input_comb'}) # rename coordinate

    return Common

Create the list of combinations checked for the temporal overlap between EOBS and ERA5. EOBS data are not provided in UTC; they are based on the "day" as defined by the measuring authorities of the different countries/regions. Thus, the main hours of a precipitation event can be shifted by ±1 day between ERA5 and EOBS. Such temporal shifts in the EOBS dataset have been reported in previous studies, e.g. *Turco et al., 2013*: https://doi.org/10.5194/nhess-13-1457-2013. For this reason, the temporal overlap is analysed with a 1-day offset (a 3-day window centered on each day identified in the EOBS data). Moreover, the overlap is checked for each studied ERA5 percentile against both the same EOBS percentile and one that is 2 percentile points lower, to assess the overlap when some flexibility in intensity is allowed.
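
The effect of the 1-day offset can be seen on a handful of dates. The sketch below is a simplified, standard-library illustration of the overlap measure for a single grid cell; the dates are assumed toy values and the helper names `days_window` and `overlap_percent` are hypothetical.

```python
from datetime import datetime, timedelta

def days_window(dates, offset):
    """All 'YYYYMMDD' dates within +/- offset days of each input date (hypothetical helper)."""
    window = set()
    for text in dates:
        center = datetime.strptime(text, "%Y%m%d")
        for shift in range(-offset, offset + 1):
            window.add((center + timedelta(days=shift)).strftime("%Y%m%d"))
    return window

def overlap_percent(era5_days, eobs_days, offset):
    """Percentage of ERA5 extreme days matched by an EOBS extreme day within the window."""
    return len(set(era5_days) & days_window(eobs_days, offset)) / len(era5_days) * 100

era5_days = ["20000101", "20000215", "20000310", "20000420"]  # assumed toy dates
eobs_days = ["20000102", "20000215", "20000312", "20000601"]  # assumed toy dates

print(overlap_percent(era5_days, eobs_days, offset=0))
print(overlap_percent(era5_days, eobs_days, offset=1))
```

Without a window only one of the four ERA5 days matches (25%); with the 1-day offset the event recorded one day later in EOBS is also matched (50%), while the event that is two days apart is still not counted.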

In [17]:
Combs = [[(i, i, 1), (i, i-2, 1)] for i in P_used] # input data for temporal overlap (P_ERA5, P_EOBS, offset)
Combs = [j for i in Combs for j in i] # drop the internal lists

In [18]:
pool = multiprocessing.Pool() # object for multiprocessing
Overlaps = list(tqdm.tqdm(pool.imap(calculate_overlap, Combs), total=len(Combs), position=0, leave=True))
pool.close()
Overlaps = xr.concat(Overlaps, dim='Input_comb') # concatenate to a single xarray dataarray

del(pool)

Overlaps = Overlaps.where(NANs_EOBS) # mask for keeping locations with less than 5% missing data
Overlaps.to_netcdf(dir_loc+'DataForPlots/Overlap_ERA5_EOBS.nc') # save data

100%|██████████| 6/6 [50:42<00:00, 507.11s/it]  


## Observational Data - Connecting EPEs to Large-Scale Atmospheric Flow Patterns <a name="extremes-to-patterns"></a>

### Preprocessing<a name="preprocessing-connections"></a>

In [19]:
file_name = dir_loc + 'DataForPlots/Clusters_'+Combination_used+'.csv'
Clustering = pd.read_csv(file_name, index_col=0)
Clustering.index = pd.to_datetime(Clustering.index).strftime('%Y%m%d')

In [20]:
# calculate thresholds per location and percentile
with warnings.catch_warnings(): # if all are NaN then it gives Runtimewarning, which is now suppressed
    warnings.simplefilter('ignore', category=RuntimeWarning)
    Quant = EOBS.quantile(np.array(P_used)/100, interpolation='linear', dim='time', keep_attrs=True) # thresholds

Quant = Quant.rename({'quantile': 'percentile'}) # rename coordinate
Quant = Quant.assign_coords({'percentile': P_used}) # assign the dim values based on the percentiles

# boolean xarray for identifying if an event is over the threshold
Exceed_xr = [(EOBS>Quant.sel(percentile=i_p))*1 for i_p in P_used] 
Exceed_xr = xr.concat(Exceed_xr, dim=pd.Index(P_used, name='percentile')) # concatenate data for all percentiles

### Auxiliary functions <a name="auxiliary"></a>

In [21]:
# cumulative distribution of binomial for statistical significance testing
def binom_test(occurrences, propabilities):
    return binom.cdf(k=occurrences-1, n=occurrences.sum(), p=propabilities)

In [22]:
def transition_matrix(data, lead=1):
    
    '''     
    Function for calculating the transition matrix M of an item (list/numpy/pandas Series/pandas single column DF),
    where M[i][j] is the probability of transitioning from state i to state j.
    Basic code taken from stackoverflow:
    https://stackoverflow.com/questions/46657221/generating-markov-transition-matrix-in-python
    
    NOTE!: Data should not have NaN values, otherwise the code crashes!
    
    :param data : input data: one dimensional vector with elements of same type (e.g. all str, or all float, etc)
    :param lead : lead time for checking the transition (default=1)
    :return     : transition matrix as pandas DataFrame
    '''
    
    if type(data) == pd.core.frame.DataFrame:
        data_used = list(data.values.flatten())
    else:
        data_used = data

    unique_states = sorted(set(data_used)) # get the names of the unique states and sort them
    
    dict_sequencial = {val: i for i, val in enumerate(unique_states)} # sequential numbering of states
    
    transitions_numbered = pd.Series(data_used).map(dict_sequencial) # map the data to sequential order
    transitions_numbered = transitions_numbered.values # get only the actual values of the Series
    
    n = len(unique_states) # number of unique states

    M = [[0]*n for _ in range(n)] # transition matrix

    for (i,j) in zip(transitions_numbered,transitions_numbered[lead:]): # the total times of the transition M[i][j]
        M[i][j] += 1

    # now convert to probabilities:
    for row in M:
        s = sum(row)
        if s > 0:
            row[:] = [f/s for f in row]
    
    M = pd.DataFrame(M, columns=unique_states, index=unique_states) # convert to DF and name columns/rows as per data
    
    return M

In [23]:
def statistics_clusters(n_clusters):
    
    ' Calculate statistics of occurrences and limits of climatological frequencies for each cluster '
    
    Data = Clustering['Clusters_'+str(n_clusters)] # cluster pd.Series with cluster label for each day
    
    # days per cluster, and statistics of total occurrences
    Totals = Data.value_counts() # days per cluster (use all the daily data available at the clustering results)
    Totals = Totals.reindex(range(n_clusters)) # sort the data per cluster order
    Totals = Totals.to_frame(name='Occurrences') # name the column explicitly (value_counts naming differs across pandas versions)
    
    # persistence, climatological frequencies, and effective size due to persistence
    transitions = transition_matrix(Data) # next-day transition probs matrix
    Totals['Persistence'] = np.diag(transitions) # self-transition probability
    total_days = len(Data) # total days used for clustering
    Totals['Percent'] = Totals['Occurrences']/total_days # climatological frequencies
    Totals['N_ef'] = total_days*(1-Totals['Persistence'])/(1+Totals['Persistence']) # effective length

    # 95% CI of climatological frequencies: use normal approximation to Binomial distr. considering effective length 
    Totals['Perc_Upper'] = Totals['Percent']+1.96*np.sqrt(Totals['Percent']*(1-Totals['Percent'])/Totals['N_ef'])
    Totals['Perc_Lower'] = Totals['Percent']-1.96*np.sqrt(Totals['Percent']*(1-Totals['Percent'])/Totals['N_ef'])

    # Precipitation data do not include 1st Jan 1979, so use the Precipitation dates for accurate results
    dates_all = EOBS.time.values
    subset_totals = Data.loc[dates_all].value_counts() # days per cluster for the dates available in Precip. data
    Totals['Subset_Occurrences'] = subset_totals.reindex(range(n_clusters)) # sort the data per cluster order
    Totals['Occur_Max'] = np.ceil(Totals['Perc_Upper']*len(dates_all)) # ceiling to get the next integer
    
    return (Totals, Data)
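
The persistence correction used in `statistics_clusters` can be illustrated on a short label sequence. The sketch below is a simplified, standard-library version of the same formulas (self-transition probability, effective sample size, and normal-approximation 95% interval); the label sequence and the numbers passed to `frequency_interval` are assumed toy values, and both helper names are hypothetical.

```python
import math

def persistence(labels, state):
    """Self-transition probability of `state` estimated from consecutive label pairs."""
    pairs = list(zip(labels, labels[1:]))
    from_state = [nxt for cur, nxt in pairs if cur == state]
    return sum(1 for nxt in from_state if nxt == state) / len(from_state)

def frequency_interval(freq, n_days, persist, z=1.96):
    """Normal-approximation 95% CI of a frequency, using the effective sample size."""
    n_ef = n_days * (1 - persist) / (1 + persist)
    half_width = z * math.sqrt(freq * (1 - freq) / n_ef)
    return n_ef, freq - half_width, freq + half_width

labels = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]  # assumed toy sequence of daily cluster labels
print(persistence(labels, 0), persistence(labels, 1))
print(frequency_interval(freq=0.2, n_days=1000, persist=0.5))
```

Higher persistence shrinks the effective sample size and therefore widens the interval around the climatological frequency, which makes the subsequent significance test more conservative.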

### Quantifying the connections <a name="quantifying-connections"></a>

In [24]:
def extremes_to_clusters(n_clusters):
    
    '''
    Calculate connection of extremes to patterns; % of events per cluster, condit. prob. and stat. sign.
    Input data: number of clusters used
    '''
    
    Totals, Data = statistics_clusters(n_clusters) # get statistics of clusters and daily attributions of labels
    
    ExceedCounts = Exceed_xr.copy(deep=True)
    ExceedCounts = ExceedCounts.assign_coords({'time': Data.loc[Exceed_xr.time.values].values}) # change to cluster id
    ExceedCounts = ExceedCounts.rename({'time': 'cluster'}) # rename the coordinate
    ExceedCounts = ExceedCounts.groupby('cluster').sum() # find total extremes at each cell allocated per cluster
    
    RatioCluster = ExceedCounts.transpose(..., 'cluster')/Totals['Subset_Occurrences'].values*100 # conditional prob.
    RatioClusterMax = ExceedCounts.transpose(..., 'cluster')/Totals['Occur_Max'].values*100 # cond. prob. of 95% freq.
    Exceed_Perc = ExceedCounts/ExceedCounts.sum(dim=['cluster'])*100 # percent of extremes per cluster
    
    "check statistical significance of occurrences based on binomial distribution for 95% Confidence Interval"
    # perform the analysis for the Upper tail and use the Upper 95% CI for the cluster probability
    Binom_Cum_Upper = ExceedCounts.copy(deep=True) # new xr (SOS: deep=True otherwise the data are overwritten later)
    Binom_Cum_Upper = Binom_Cum_Upper.astype(float) # convert to float from int
    Binom_Cum_Upper = Binom_Cum_Upper.transpose('cluster', ...)
    Counts_np = Binom_Cum_Upper.values.copy() # numpy of values for applying the function below
    Binom_Cum_Upper_np = np.apply_along_axis(binom_test, propabilities=Totals['Perc_Upper'],  axis=0, arr=Counts_np)
    Binom_Cum_Upper[:] = Binom_Cum_Upper_np # pass the results to the xr

    # perform the analysis for the Lower tail and use the Lower 95% CI for the cluster probability
    Binom_Cum_Lower = Binom_Cum_Upper.copy(deep=True)
    Binom_Cum_Lower_np = np.apply_along_axis(binom_test, propabilities=Totals['Perc_Lower'],  axis=0, arr=Counts_np)
    Binom_Cum_Lower[:] = Binom_Cum_Lower_np

    Sign = (Binom_Cum_Upper > .975)*1 + (Binom_Cum_Lower < .025)*(-1) # assign boolean for statistical significance

    # final object with counts, percentages, and statistical significance
    All_data = [ExceedCounts, Exceed_Perc, RatioCluster, RatioClusterMax, Sign]
    Coord_name = ['Counts', 'PercExtremes', 'CondProb', 'CondProbUpperLimit', 'Significance']
    Coord_name = pd.Index(Coord_name, name='indicator')
    Final = xr.concat(All_data, dim=Coord_name)
    
    return Final

In [25]:
ExtremesClusters = extremes_to_clusters(n_clusters=Clusters_used)
ExtremesClusters = ExtremesClusters.where(NANs_EOBS) # mask for keeping locations with less than 5% missing data
ExtremesClusters.to_netcdf(dir_loc+'DataForPlots/ClusteringStats_'+Combination_used+'_Clusters'+str(Clusters_used)+'_EOBS.nc') # save data
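
The significance flag computed in `extremes_to_clusters` can be reproduced for a single grid cell with a few lines. The sketch below is a simplified illustration under assumed inputs: `binom_cdf` is a standard-library stand-in for `scipy.stats.binom.cdf`, the counts and frequency limits are toy values, and `significance` is a hypothetical helper name.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); stdlib stand-in for scipy.stats.binom.cdf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))

def significance(counts, p_lower, p_upper):
    """+1 if a cluster holds significantly more extremes than expected, -1 if fewer, else 0."""
    n = sum(counts)
    signs = []
    for k, lo, hi in zip(counts, p_lower, p_upper):
        more = binom_cdf(k - 1, n, hi) > 0.975   # tested against the upper frequency limit
        fewer = binom_cdf(k - 1, n, lo) < 0.025  # tested against the lower frequency limit
        signs.append(int(more) - int(fewer))
    return signs

counts = [30, 5, 15]          # assumed extremes per cluster at one grid cell
p_lower = [0.25, 0.25, 0.25]  # assumed lower 95% limits of the cluster frequencies
p_upper = [0.35, 0.35, 0.35]  # assumed upper 95% limits of the cluster frequencies
print(significance(counts, p_lower, p_upper))
```

Here the first cluster is flagged as over-represented (+1), the second as under-represented (-1), and the third as not significant (0). Testing the upper tail against the upper frequency limit and the lower tail against the lower limit keeps the flag conservative in both directions.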

In [26]:
print('Total Analysis completed in:', datetime.now() - InitializationTime, ' HR:MN:SC.')
del(InitializationTime)

Total Analysis completed in: 0:56:27.540382  HR:MN:SC.
