# Example Python workflow for shuffling the climate data with tests to check whether shuffling worked

This notebook shows how to shuffle the climate data (here, for simplicity, on a yearly basis). It also provides test files with the shuffled climate data and with the time series reconverted from the shuffled data for two glaciers, each represented by its nearest gridpoint.

We chose these glaciers for testing:
- **RGI60-11.00897**: Hintereisferner (lon: 10.758, lat: 46.800)
    - nearest gridpoint from isimip3b: (10.75, 46.75)
- **RGI60-16.02207**: Shallap Glacier (lon: -77.334, lat: -9.486)
    - nearest gridpoint from isimip3b: (-77.25, -9.25)

**Please check that the shuffling in your own workflow works: you should get the same shuffled and reconverted climate time series for the two glaciers as in the test files below!**

(We only check temperature shuffling, using the ssp585 scenario and the ipsl-cm6a-lr GCM.)

Test files for the shuffled climate:
- `test_shuffling/test_RGI60-11.00897_ipsl-cm6a-lr_tasAdjust_ssp585_shuffled.csv`
- `test_shuffling/test_RGI60-16.02207_ipsl-cm6a-lr_tasAdjust_ssp585_shuffled.csv`

Test files for the reconverted climate time series:
- `test_shuffling/test_RGI60-11.00897_ipsl-cm6a-lr_tasAdjust_ssp585_reconvert_clim.csv`
- `test_shuffling/test_RGI60-16.02207_ipsl-cm6a-lr_tasAdjust_ssp585_reconvert_clim.csv`
---
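To compare your own output with the test files listed above, something like the following could be used. It is a minimal sketch, assuming your workflow writes a CSV with the same layout as the test files (experiment periods or years as the index, one column per shuffled position or glacier). The helper name `assert_matches_reference` and the tolerance are illustrative choices, not part of the original workflow, and the toy frames only stand in for the real files.

```python
import io

import numpy as np
import pandas as pd


def assert_matches_reference(own, reference, rtol=1e-6):
    """Raise AssertionError if `own` differs from `reference`.

    Hypothetical helper: both arguments are DataFrames as returned by
    pd.read_csv(..., index_col=0).
    """
    # same rows (experiment periods or years) and same columns
    assert list(own.index) == list(reference.index), 'row labels differ'
    assert list(own.columns) == list(reference.columns), 'column labels differ'
    # NaNs at the same positions count as equal (equal_nan=True is the default)
    np.testing.assert_allclose(own.to_numpy(dtype=float),
                               reference.to_numpy(dtype=float), rtol=rtol)


# toy stand-ins for a reference test file and for your own output
reference_csv = io.StringIO(',0,1,2\n1851-1870,270.5,271.4,271.8\n')
own_csv = io.StringIO(',0,1,2\n1851-1870,270.5,271.4,271.8\n')
reference = pd.read_csv(reference_csv, index_col=0)
own = pd.read_csv(own_csv, index_col=0)

assert_matches_reference(own, reference)
```

With the real data, `reference` would be read from one of the files in `test_shuffling/` and `own` from the file your workflow produced.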

In [33]:
# import these packages 
import xarray as xr
import numpy as np
import pandas as pd

Let's take the *ipsl-cm6a-lr* GCM, the *ssp585* scenario and temperature as an example!

In [34]:
gcm = 'ipsl-cm6a-lr'
scenario = 'ssp585'
typ = 'tasAdjust'
glacier = 'RGI60-11.00897' 
# we also ran the workflow for Shallap Glacier;
# to do so, run the notebook with this line instead:
# glacier = 'RGI60-16.02207'

In [42]:
# take the right gridpoint!
if glacier == 'RGI60-11.00897':
    # Hintereisferner
    # nearest gridpoint
    lon, lat = (10.75, 46.75)
elif glacier == 'RGI60-16.02207':
    # Shallap Glacier
    # nearest gridpoint
    lon, lat = (-77.25, -9.25)

In [44]:
# get the shuffled year key:
pd_shuffled_yrs = pd.read_csv('shuffled_years_GlacierMIP3.csv', index_col=0)
pd_shuffled_yrs


| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 4990 | 4991 | 4992 | 4993 | 4994 | 4995 | 4996 | 4997 | 4998 | 4999 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1851-1870 | 1861 | 1858 | 1852 | 1862 | 1868 | 1855 | 1863 | 1860 | 1870 | 1866 | ... | 1860 | 1851 | 1862 | 1854 | 1868 | 1867 | 1852 | 1866 | 1863 | 1870 |
| 1901-1920 | 1909 | 1918 | 1903 | 1906 | 1904 | 1905 | 1917 | 1914 | 1902 | 1920 | ... | 1904 | 1906 | 1911 | 1910 | 1903 | 1905 | 1907 | 1913 | 1902 | 1919 |
| 1951-1970 | 1957 | 1959 | 1969 | 1963 | 1970 | 1964 | 1961 | 1952 | 1960 | 1965 | ... | 1953 | 1961 | 1952 | 1969 | 1959 | 1962 | 1968 | 1967 | 1957 | 1965 |
| 1995-2014 | 2000 | 2013 | 2005 | 1996 | 2010 | 2012 | 1995 | 2001 | 2004 | 2008 | ... | 2007 | 2000 | 2009 | 2003 | 1996 | 2008 | 2005 | 2012 | 1998 | 1995 |
| 2021-2040 | 2034 | 2036 | 2032 | 2039 | 2021 | 2035 | 2031 | 2024 | 2026 | 2029 | ... | 2037 | 2034 | 2036 | 2033 | 2039 | 2024 | 2027 | 2026 | 2040 | 2030 |
| 2041-2060 | 2049 | 2045 | 2054 | 2053 | 2043 | 2059 | 2052 | 2051 | 2041 | 2044 | ... | 2041 | 2045 | 2046 | 2059 | 2047 | 2053 | 2052 | 2049 | 2057 | 2060 |
| 2061-2080 | 2073 | 2072 | 2065 | 2061 | 2063 | 2079 | 2067 | 2068 | 2077 | 2071 | ... | 2074 | 2080 | 2069 | 2075 | 2072 | 2078 | 2073 | 2067 | 2065 | 2064 |
| 2081-2100 | 2081 | 2094 | 2098 | 2084 | 2090 | 2089 | 2100 | 2099 | 2091 | 2083 | ... | 2091 | 2095 | 2092 | 2085 | 2083 | 2088 | 2081 | 2093 | 2090 | 2082 |

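As a quick sanity check on the key, every shuffled year in a row should lie inside the period named by the row label, as the excerpt above suggests. The following is a minimal sketch, assuming the row labels always have the form `start-end`; the helper name `years_outside_period` and the toy key are illustrative only.

```python
import pandas as pd


def years_outside_period(shuffled_yrs):
    """Return, per experiment period, how many shuffled years fall outside it.

    Hypothetical helper; assumes index labels of the form 'start-end'.
    """
    n_outside = {}
    for period, row in shuffled_yrs.iterrows():
        start, end = (int(y) for y in period.split('-'))
        # between() is inclusive on both ends by default
        n_outside[period] = int((~row.between(start, end)).sum())
    return pd.Series(n_outside)


# toy key with two periods and five shuffled positions each
toy_key = pd.DataFrame(
    [[1861, 1858, 1852, 1862, 1868],
     [1909, 1918, 1903, 1906, 1904]],
    index=['1851-1870', '1901-1920'],
)
n_outside = years_outside_period(toy_key)
assert (n_outside == 0).all()
```

Applied to the real key, the call would be `years_outside_period(pd_shuffled_yrs)`.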

In [45]:
# template of shuffled climate values (that will be filled afterwards)
rownames = pd_shuffled_yrs.index
# np.nan instead of np.NaN (the alias was removed in NumPy 2.0)
pd_empty_clim_template = pd.DataFrame(np.nan, index=rownames,
                                      columns=pd_shuffled_yrs.columns)
pd_empty_clim_template

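| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 4990 | 4991 | 4992 | 4993 | 4994 | 4995 | 4996 | 4997 | 4998 | 4999 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1851-1870 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1901-1920 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1951-1970 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1995-2014 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-2040 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2041-2060 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2061-2080 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2081-2100 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |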


In [46]:
# open the right climate file
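if gcm in ['gfdl-esm4', 'ipsl-cm6a-lr', 'mpi-esm1-2-hr', 'mri-esm2-0']:
    ensemble = 'r1i1p1f1'
elif gcm == 'ukesm1-0-ll':
    ensemble = 'r1i1p1f2'
else:
    # fail early instead of hitting an undefined `ensemble` below
    raise ValueError('unknown gcm: {}'.format(gcm))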
    
folder_output = 'isimip3b_{}_monthly'.format(typ)

# historical dataset
path_output_tas_hist = 'isimip3b/{}/{}_{}_w5e5_historical_{}_global_monthly_1850_2014.nc'.format(folder_output, gcm,
                                                                                                 ensemble, typ)
ds_tas_monthly_hist = xr.open_dataset(path_output_tas_hist)
# ssp dataset
path_output_tas_ssp = 'isimip3b/{}/{}_{}_w5e5_{}_{}_global_monthly_2015_2100.nc'.format(folder_output, gcm, ensemble,
                                                                                        scenario, typ)
ds_tas_monthly_ssp = xr.open_dataset(path_output_tas_ssp)

In [47]:
# select the nearest grid point and average the monthly values of each year to get yearly data
ds_yearly_hist = ds_tas_monthly_hist.sel(lon=lon, lat=lat)[typ].groupby('time.year').mean()
ds_yearly_ssp = ds_tas_monthly_ssp.sel(lon=lon, lat=lat)[typ].groupby('time.year').mean()
# concat historical with ssp file
ds_yearly_clim = xr.concat([ds_yearly_hist, ds_yearly_ssp], dim='year')

**We first do the shuffling**

In [48]:
pd_shuffle_clim = pd_empty_clim_template.copy()
# get the shuffled climate data for each experiment (time period)
for c in rownames:
    pd_shuffle_clim.loc[c] = ds_yearly_clim.sel(year=pd_shuffled_yrs.loc[c].values).values

# test file to check for your workflow
pd_shuffle_clim.to_csv('test_shuffling/test_{}_{}_{}_{}_shuffled.csv'.format(glacier, gcm, typ, scenario))
pd_shuffle_clim

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 4990 | 4991 | 4992 | 4993 | 4994 | 4995 | 4996 | 4997 | 4998 | 4999 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1851-1870 | 270.509644 | 271.433563 | 271.823822 | 271.572571 | 271.465179 | 271.444916 | 270.749786 | 270.15094 | 272.592438 | 269.981049 | ... | 270.15094 | 271.651917 | 271.572571 | 271.167328 | 271.465179 | 270.67691 | 271.823822 | 269.981049 | 270.749786 | 272.592438 |
| 1901-1920 | 271.286499 | 270.225555 | 269.993378 | 271.005493 | 270.07077 | 269.957367 | 271.125153 | 270.145782 | 270.742859 | 271.132751 | ... | 270.07077 | 271.005493 | 272.198883 | 270.931 | 269.993378 | 269.957367 | 269.597382 | 270.118652 | 270.742859 | 270.868896 |
| 1951-1970 | 271.965332 | 272.337372 | 271.553131 | 271.319458 | 270.637848 | 271.860596 | 270.454407 | 272.036896 | 272.004303 | 272.335754 | ... | 272.488464 | 270.454407 | 272.036896 | 271.553131 | 272.337372 | 272.018311 | 270.914856 | 272.242859 | 271.965332 | 272.335754 |
| 1995-2014 | 272.543823 | 273.612549 | 273.208374 | 272.49234 | 272.844513 | 273.874298 | 272.216888 | 272.932617 | 272.202698 | 272.47644 | ... | 273.768219 | 272.543823 | 273.397003 | 273.649567 | 272.49234 | 272.47644 | 273.208374 | 273.874298 | 272.74295 | 272.216888 |
| 2021-2040 | 273.897675 | 274.926422 | 274.084625 | 274.388916 | 273.681488 | 274.552307 | 272.973633 | 272.553009 | 273.564636 | 272.558868 | ... | 273.897797 | 273.897675 | 274.926422 | 274.730408 | 274.388916 | 272.553009 | 273.274567 | 273.564636 | 274.395111 | 273.728546 |
| 2041-2060 | 275.207062 | 275.436096 | 275.777252 | 274.980591 | 275.692261 | 276.291931 | 275.753815 | 275.030212 | 274.439209 | 276.010468 | ... | 274.439209 | 275.436096 | 275.044617 | 276.291931 | 275.627991 | 274.980591 | 275.753815 | 275.207062 | 277.058197 | 276.563049 |
| 2061-2080 | 278.033691 | 278.030304 | 277.347626 | 277.015839 | 276.637665 | 279.757721 | 276.350616 | 277.577362 | 278.801636 | 278.010162 | ... | 277.826385 | 279.402985 | 277.982147 | 277.372467 | 278.030304 | 278.677673 | 278.033691 | 276.350616 | 277.347626 | 276.505585 |
| 2081-2100 | 278.575348 | 279.327179 | 281.086548 | 278.625946 | 280.027466 | 280.427795 | 280.62796 | 281.597198 | 280.355682 | 279.765137 | ... | 280.355682 | 280.318756 | 279.106323 | 278.926208 | 279.765137 | 279.783203 | 278.575348 | 280.937592 | 280.027466 | 278.055542 |

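The lookup in the shuffling loop works because selecting by a list of labels returns one value per requested label, repeats included. Here is a toy sketch of the same idea, using a pandas Series as a stand-in for the yearly xarray DataArray. All names and numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# toy yearly climate: one value per year, 1851 to 1855
yearly = pd.Series([270.0, 271.0, 272.0, 273.0, 274.0],
                   index=[1851, 1852, 1853, 1854, 1855])

# toy key: one experiment period, eight shuffled positions (years repeat)
key = pd.DataFrame([[1853, 1851, 1855, 1853, 1852, 1854, 1851, 1855]],
                   index=['1851-1855'])

# same shape as the key, filled row by row with the climate of each listed year
shuffled = pd.DataFrame(np.nan, index=key.index, columns=key.columns)
for period in key.index:
    shuffled.loc[period] = yearly.loc[key.loc[period].to_numpy()].to_numpy()

print(shuffled.loc['1851-1855'].tolist())
```

Positions 0 and 3 both point to 1853, so they carry the same value. This is the property that the reconversion step relies on.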

**Then we reconvert the shuffled time series into a real time series to check whether the shuffling was done correctly**

In [49]:
years = ds_yearly_clim.year.values  # 1850 to 2100

# NaN climate time series that will be filled with the reconverted climate data (from shuffled dataframe)
# np.nan instead of np.NaN (the alias was removed in NumPy 2.0)
time_series_clim = pd.DataFrame(np.nan, index=years, columns=[glacier])

# reconvert to time series
for y in years:
    # all shuffled values that were drawn from year y
    clim_y = pd_shuffle_clim[pd_shuffled_yrs == y].stack()
    clim = clim_y.mean()
    time_series_clim.loc[y] = clim

    # check if values from the same year are equal! -> std = 0
    # not all years are inside of the shuffled time series, so there are some NaNs
    if not np.isnan(clim):
        np.testing.assert_allclose(clim_y.std(), 0, atol=1e-12)

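The reconversion step can be tried on toy data as well. The sketch below uses made-up numbers and a hypothetical helper named `reconvert`. It swaps in a NumPy boolean mask for the `stack()` call and the max-min range for the standard deviation, and it shows that a single corrupted value is caught by the spread check.

```python
import numpy as np
import pandas as pd


def reconvert(shuffled, key, years):
    """Rebuild a yearly series from shuffled values (hypothetical helper).

    Returns the series and the largest max-min range found among values
    that belong to the same year (0 if the shuffling is consistent).
    """
    series = pd.Series(np.nan, index=years)
    max_spread = 0.0
    for y in years:
        # all shuffled values that were drawn from year y
        vals = shuffled.to_numpy()[key.to_numpy() == y]
        if vals.size > 0:
            series.loc[y] = vals.mean()
            max_spread = max(max_spread, float(vals.max() - vals.min()))
    return series, max_spread


key = pd.DataFrame([[1853, 1851, 1853, 1852, 1851]], index=['1851-1854'])
shuffled = pd.DataFrame([[272.0, 270.0, 272.0, 271.0, 270.0]], index=key.index)
years = [1851, 1852, 1853, 1854]

# 1854 never appears in the key, so it stays NaN in the rebuilt series
series, max_spread = reconvert(shuffled, key, years)

corrupted = shuffled.copy()
corrupted.iloc[0, 0] = 999.0  # one wrong value for 1853
_, bad_spread = reconvert(corrupted, key, years)
```

Here `max_spread` is zero for the consistent frame, while `bad_spread` is clearly nonzero, which is what the assertion in the loop above would flag.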

**Check whether the climate time series reconverted from the shuffled data is equal to the original yearly time series**


In [50]:
# original data without NaNs
original = ds_yearly_clim.sel(year=time_series_clim.dropna().index)

# reconverted data from the shuffled time series without NaNs
reconvert_from_shuffled = time_series_clim.dropna().to_xarray()[glacier]

# are they the same?
np.testing.assert_allclose(original, reconvert_from_shuffled, rtol=1e-12)
# did we compare the same years?
np.testing.assert_allclose(original.year, reconvert_from_shuffled.index)

# test file to check for your workflow
time_series_clim.to_csv('test_shuffling/test_{}_{}_{}_{}_reconvert_clim.csv'.format(glacier, gcm, typ, scenario))
time_series_clim

| | RGI60-11.00897 |
|---|---|
| 1850 | NaN |
| 1851 | 271.651917 |
| 1852 | 271.823822 |
| 1853 | 271.265320 |
| 1854 | 271.167328 |
| ... | ... |
| 2096 | 280.742157 |
| 2097 | 280.462738 |
| 2098 | 281.086548 |
| 2099 | 281.597198 |
