# Shuffling a variable in time

Example code that takes CESM2 daily data and shuffles it in time by day of year: for each day of year, the years are randomly reordered. Seasonality is preserved, but temporal coherence on subseasonal scales is removed.

The useful things demonstrated here:
- Reshaping an xarray DataArray from (time, lat, lon) to (lat, lon, year, day-of-year)
- Using NumPy's random number generator to shuffle along one dimension (method 1)
- Generating random indices (method 2)
- Using `take_along_axis` to sample with the random indices (method 2)
- Reconstructing a DataArray by re-stacking year and day-of-year into 'time' and assigning the original time values
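
As a minimal sketch of the reshaping and re-stacking steps in the list above, the following uses plain NumPy on a small synthetic array instead of xarray. It assumes complete 365-day (no-leap) years stored in chronological order; the sizes and the names `x`, `x4d`, and `x_back` are made up for illustration. It places (year, doy) at the front of the array, which is not the dimension order that xarray's `unstack` produces in the cells below.

```python
import numpy as np

nyear, ndoy, nlat, nlon = 3, 365, 2, 4   # hypothetical small sizes
ntime = nyear * ndoy

# Synthetic (time, lat, lon) field whose values are all distinct
x = np.arange(ntime * nlat * nlon, dtype=float).reshape(ntime, nlat, nlon)

# Split the time axis into (year, doy).
# This is only valid for complete 365-day years in chronological order.
x4d = x.reshape(nyear, ndoy, nlat, nlon)

# Re-stack (year, doy) into a single time axis to recover the original layout
x_back = x4d.reshape(ntime, nlat, nlon)
```

Here `x4d[y, d]` is the map for day `d` of year `y`, that is, time step `y * 365 + d` of the original array.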

In [1]:
%%time

# METHOD 1 : use 'shuffle'
# pro: more direct
# con: don't have the indices to sample something else

import numpy as np
import xarray as xr
from pathlib import Path

# Load data (example - piControl, 10 years of daily TS)
dloc = Path("/glade/campaign/collections/cmip/CMIP6/timeseries-cmip6/b.e21.B1850.f09_g17.CMIP6-esm-piControl.001/atm/proc/tseries/day_1")
fil = dloc / "b.e21.B1850.f09_g17.CMIP6-esm-piControl.001.cam.h1.TS.00310101-00401231.nc"
ds = xr.open_dataset(fil, decode_times=True)
x = ds['TS']

# rearrange into lat, lon, year, dayofyear
xreshape = x.copy(deep=True)
year = ds.time.dt.year # define a year coordinate
doy = ds.time.dt.dayofyear  # define a day-of-year coordinate
xreshape = xreshape.assign_coords(year=("time", year.data), doy=("time", doy.data))
xreshape = xreshape.set_index(time=("year","doy")).unstack("time")
# xreshape is now     (lat:192 lon:288 year:10 doy:365)

rng = np.random.default_rng()  # random number generator
xTimeShuffle = xreshape.copy(deep=True)  # make a copy of the reshaped data (not strictly necessary)
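xTimeShuffleArr = xTimeShuffle.values    # ndarray BUT POINTS AT THE DATA IN xTimeShuffle **CAUTION**
# Shuffle the years (axis 2) separately for each day of year.
# Shuffling axis 3 instead would reorder the days of year and destroy the seasonal cycle.
for d in range(xTimeShuffleArr.shape[3]):
    rng.shuffle(xTimeShuffleArr[..., d], axis=2)  # returns None => modifies the array in place

# xTimeShuffle now contains the shuffled data in xTimeShuffleArr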

# go back to original shape:
xTimeShuffle = xTimeShuffle.stack(time=("year","doy")).transpose("time","lat","lon")



CPU times: user 1min 2s, sys: 10.4 s, total: 1min 12s
Wall time: 1min 15s
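
A small check of what `shuffle` does along an axis may help when choosing the axis: it reorders whole slices, so one ordering is applied to everything else in the array, and it works in place. The sketch below uses a tiny synthetic (year, doy) array; the sizes, the seed, and the value encoding are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

nyear, ndoy = 4, 6  # hypothetical small sizes
# value = 100*year + doy, so both indices can be read back from the value
a = 100 * np.arange(nyear)[:, None] + np.arange(ndoy)[None, :]

b = a.copy()
result = rng.shuffle(b, axis=0)  # in place; reorders whole years

year_of = b // 100  # original year of each element
doy_of = b % 100    # original day of year of each element
```

Every row of `year_of` holds a single year, so the same ordering of years was applied to all days of year. Getting a different ordering for each day of year takes one shuffle per day, or explicit indices as in the alternative method.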


## Alternative Method

This second method produces the same kind of shuffled result, but here we generate the shuffled indices and keep them. One advantage is that the sampling is easy to validate against the original data. A second is that multiple fields can be resampled in exactly the same order.

- **pro:** a little faster, keeps indices handy
- **con:** a little more complicated; you need to keep track of which axis is which.
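
To illustrate the point about resampling several fields in the same order, here is a minimal NumPy sketch with two synthetic fields. The names `ts` and `precip`, the sizes, and the seed are hypothetical; the (lat, lon, year, doy) layout is assumed to match the reshaped data.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen only for reproducibility

nlat, nlon, nyear, ndoy = 2, 3, 5, 8  # hypothetical small sizes
ts = rng.normal(size=(nlat, nlon, nyear, ndoy))      # stand-in for one field
precip = rng.normal(size=(nlat, nlon, nyear, ndoy))  # stand-in for a second field

# One random ordering of the years per day of year, shape (year, doy)
idx = np.stack([rng.permutation(nyear) for _ in range(ndoy)], axis=1)

# Add length-1 lat and lon axes so the indices broadcast against the data
idx4d = idx[np.newaxis, np.newaxis, :, :]

# The same indices are applied to both fields along the year axis
ts_shuf = np.take_along_axis(ts, idx4d, axis=2)
precip_shuf = np.take_along_axis(precip, idx4d, axis=2)
```

Because both fields use `idx`, the map placed at `[:, :, y, d]` comes from year `idx[y, d]` in each of them, so relationships between the fields on a given day are kept.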

In [14]:
%%time

# Method 2: generate random indices for each day of year using 'permutation'

## <-- copied from above for timing comparison -->

import numpy as np
import xarray as xr
from pathlib import Path

# Load data (example - piControl, 10 years of daily TS)
dloc = Path("/glade/campaign/collections/cmip/CMIP6/timeseries-cmip6/b.e21.B1850.f09_g17.CMIP6-esm-piControl.001/atm/proc/tseries/day_1")
fil = dloc / "b.e21.B1850.f09_g17.CMIP6-esm-piControl.001.cam.h1.TS.00310101-00401231.nc"
ds = xr.open_dataset(fil, decode_times=True)
x = ds['TS']

# rearrange into lat, lon, year, dayofyear
xreshape = x.copy(deep=True)
year = ds.time.dt.year # define a year coordinate
doy = ds.time.dt.dayofyear  # define a day-of-year coordinate
xreshape = xreshape.assign_coords(year=("time", year.data), doy=("time", doy.data))
xreshape = xreshape.set_index(time=("year","doy")).unstack("time")
# xreshape is now     (lat:192 lon:288 year:10 doy:365)

rng = np.random.default_rng()  # random number generator

## <-- end copied section -->

# Alternative, where we know the shuffled indices for each day-of-year
nyear = len(xreshape.year)
print(f'There are {nyear} years.')

# make an array that will hold the (year, doy) random indices
randomIndices = np.zeros((nyear, len(xreshape.doy)), dtype=int)
for d in range(len(xreshape.doy)):
    randomIndices[:,d] = rng.permutation(nyear)  # a different random order of years for each day

# make the indices be the shape of the data:
randomIndices = xr.DataArray(randomIndices, dims=["year","doy"], coords={"year":xreshape.year, "doy":xreshape.doy})
randomIndicesBcst = randomIndices.broadcast_like(xreshape)

# This works, but seems clumsy
# s = xreshape.copy(deep=True)
# for d in range(365):
#     s[:,:,:,d] = xreshape[:,:,randomIndices[:,d].values,d].values
    
# This produces the same answer
s2 = np.take_along_axis(xreshape.values, randomIndicesBcst.values, axis=2)

# put back into a DataArray:
s2 = xr.DataArray(s2, dims=xreshape.dims, coords=xreshape.coords)

# finalize by transforming back to original shape:
s2 = s2.stack(time=("year","doy")).transpose("time","lat","lon")
# and finally, replace the "MultiIndex" time with the original time:
s2 = s2.assign_coords({"time":x.time})

There are 10 years.
CPU times: user 54.2 s, sys: 10.1 s, total: 1min 4s
Wall time: 1min 4s


In [13]:
# quick validation: difference between time average of the shuffled and original data
# won't be zero because of precision issues, but is very small.

print(s2.coords)
print(x.coords)

(s2 - x).mean(dim='time')

Coordinates:
  * lat      (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 87.17 88.12 89.06 90.0
  * lon      (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
  * time     (time) object 0031-01-01 00:00:00 ... 0040-12-31 00:00:00
Coordinates:
  * lat      (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 87.17 88.12 89.06 90.0
  * lon      (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
  * time     (time) object 0031-01-01 00:00:00 ... 0040-12-31 00:00:00
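
The time mean is unchanged by any reordering in time, so a stricter check is that each day of year still holds the same set of values across years. The sketch below runs that check on synthetic data; it is not run on the CESM2 output. It builds the indices with `Generator.permuted`, a vectorized alternative to the loop over days used above, and all names, sizes, and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed chosen only for reproducibility

nlat, nlon, nyear, ndoy = 2, 3, 5, 365  # hypothetical small sizes
orig = rng.normal(size=(nlat, nlon, nyear, ndoy))

# Independent random ordering of the years for each day of year
base = np.tile(np.arange(nyear)[:, None], (1, ndoy))
idx = rng.permuted(base, axis=0)
shuf = np.take_along_axis(orig, idx[None, None, :, :], axis=2)

# Sorting along the year axis removes the ordering, so equality means
# values only moved between years within the same day of year
same_values_per_doy = np.array_equal(np.sort(orig, axis=2), np.sort(shuf, axis=2))

# Largest change in the multi-year mean for any grid point and day of year
mean_diff = np.abs(shuf.mean(axis=2) - orig.mean(axis=2)).max()

# Fraction of (year, doy) slots that now hold a different year
moved = float((idx != base).mean())
```

`same_values_per_doy` should be `True`, `mean_diff` should be at round-off level, and `moved` shows how much of the record was actually rearranged.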
