# Statistical Downscaling and Bias-Adjustment - Advanced tools

The previous notebook covered the most common utilities of `xclim.sdba` for conventional cases. Here we explore more advanced usage of `xclim.sdba` tools.

## LOESS smoothing and detrending

As described in Cleveland (1979), locally weighted linear regressions are multiple regression methods using a nearest-neighbor approach. Instead of using all datapoints to compute a single linear or polynomial regression, LOESS algorithms compute a local regression for each point in the dataset, using only its k nearest neighbors, each weighted by a function of its distance to that point. This weighting function must fulfill some strict requirements; see the documentation of `xclim.sdba.loess.loess_smoothing` for more details.

In xclim's implementation, the user can choose between local _constancy_ ($d=0$, local estimates are weighted averages) and local _linearity_ ($d=1$, local estimates are taken from linear regressions). Two weighting functions are currently implemented: "tricube" ($w(x) = (1 - x^3)^3$ for scaled distances $0 \le x < 1$, and 0 beyond) and "gaussian" ($w(x) = e^{-x^2 / 2\sigma^2}$). Finally, the number of Cleveland's _robustifying iterations_ is controlled through `niter`. After computing an estimate of $y(x)$, the weights are modulated by a function of the residuals between the estimate and the data points, and the procedure is started over. These iterations weaken the effect of outliers on the estimate.
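To make the procedure above concrete, here is a minimal NumPy sketch of LOESS smoothing with tricube or gaussian weights, local constancy or linearity, and Cleveland's bisquare robustifying step. It is an illustration written for this notebook, not xclim's implementation: the names `tricube`, `gaussian` and `loess_sketch`, the default `sigma`, and the convention that `niter` counts fitting passes are all assumptions of the sketch.

```python
import numpy as np


def tricube(u):
    """Tricube weights for scaled distances u; zero for |u| >= 1."""
    u = np.abs(u)
    return np.where(u < 1, (1 - u**3) ** 3, 0.0)


def gaussian(u, sigma=0.25):
    """Gaussian weights; the default sigma is an illustrative choice."""
    return np.exp(-(u**2) / (2 * sigma**2))


def loess_sketch(x, y, f=0.5, d=1, niter=2, kernel=tricube):
    """Smooth y(x) with local regressions of degree d (0 or 1).

    Assumes distinct x values. niter is the number of fitting passes.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = max(int(round(f * n)), d + 1)  # number of neighbors in each window
    robust = np.ones(n)  # robustness weights, updated after each pass
    yhat = np.empty(n)
    for _ in range(niter):
        for i in range(n):
            dist = np.abs(x - x[i])
            h = np.sort(dist)[r - 1]  # distance to the r-th nearest point
            w = kernel(dist / h) * robust
            if d == 0:
                # Local constancy: weighted average
                yhat[i] = np.sum(w * y) / np.sum(w)
            else:
                # Local linearity: weighted least squares, centered on x[i]
                sw = np.sqrt(w)
                A = np.column_stack([np.ones(n), x - x[i]])
                coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
                yhat[i] = coef[0]
        # Robustifying step: bisquare function of the scaled residuals
        resid = y - yhat
        s = np.median(np.abs(resid))
        if s == 0:
            break
        u = np.clip(resid / (6 * s), -1, 1)
        robust = (1 - u**2) ** 2
    return yhat


# Noisy constant signal with a single outlier in the middle
rng = np.random.default_rng(42)
xs = np.linspace(0, 1, 101)
ys_demo = 1 + 0.1 * rng.standard_normal(xs.size)
ys_demo[50] = 6.0

single_pass = loess_sketch(xs, ys_demo, f=0.3, d=0, niter=1)
robust_pass = loess_sketch(xs, ys_demo, f=0.3, d=0, niter=3)
```

In this toy case the single-pass estimate at the outlier is pulled upward, while the additional passes give the outlier a zero robustness weight and the estimate returns close to the underlying value of 1.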

The next example applies LOESS smoothing to two years of daily temperature data. The red curve is the weighting function centered on early January 2014 (index 366 of the daily series), and the red circles are the nearest neighbors, i.e. the points that receive a non-zero weight. The black line and dot are the estimated $y$, the output of the `sdba.loess.loess_smoothing` function, using local linear regression (passing $d = 1$), a window spanning 20% ($f = 0.2$) of the domain, the "tricube" weighting function and only one iteration.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from xclim.sdba import loess
%matplotlib inline

In [None]:
# Daily temperature data from xarray's tutorials
ds = xr.tutorial.open_dataset('air_temperature').resample(time='D').mean()
tas = ds.isel(lat=0, lon=0).air

## Everything below is there for having a clear explicit mathematical example
## Most functions shouldn't be called directly by the user, see next subsection for a concrete usage example

# LOESS algorithms as implemented here use scaled coordinates.
x = tas.time
x = (x - x[0]) / (x[-1] - x[0])
xi = x[366]
ti = tas.time[366]
# The weighting function takes as input the distance to every point, scaled by h,
# the distance to the r-th nearest neighbor of xi
f = 0.2
r = int(f * tas.time.size)
h = np.sort(np.abs(x - xi))[r]
weights = loess._tricube_weighting(np.abs(x - xi).values / h)

ys = loess.loess_smoothing(tas, d=1, weights='tricube', f=f, niter=1)

fig, ax = plt.subplots()
ax.plot(tas.time, tas, 'o', fillstyle='none')
wax = ax.twinx()
wax.plot(tas.time, weights, color='indianred')
ax.plot(tas.time, tas.where(tas * weights > 0), 'o', color='lightcoral', fillstyle='none')
ax.plot(tas.time, ys, 'k')
ax.plot(ti, ys[366], 'ko')
ax.set_xlabel('Time')
ax.set_ylabel('Temperature [K]')
wax.set_ylabel('Weights')
plt.show()

As can already be seen in this small example, LOESS smoothing suffers from heavy boundary effects. On the other hand, it has the advantage of always staying within the bounds of the data.


### LOESS Detrending

In climate science, LOESS smoothing can be used for detrending. `xclim` provides `sdba.detrending.LoessDetrend` to compute trends with LOESS smoothing and remove them from timeseries.
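Conceptually, additive detrending amounts to estimating a smooth trend and subtracting it from the series. The following NumPy sketch illustrates only that idea: a straight-line fit with `np.polyfit` stands in for the LOESS smoother purely for brevity, and all variable names are illustrative rather than part of xclim.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps = 1000
t = np.linspace(0.0, 1.0, n_steps)
# Linear increase of 1.5 over the period, plus random noise
series = 1.5 * t + rng.normal(scale=0.1, size=n_steps)

# Stand-in trend estimator: least-squares straight line (not LOESS)
slope, intercept = np.polyfit(t, series, deg=1)
trend_est = slope * t + intercept

detrended = series - trend_est  # remove the estimated trend
retrended = detrended + trend_est  # adding it back recovers the series
```

Swapping the straight line for a LOESS estimate lets the trend follow slow, non-linear variations instead of a single slope.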

First we create some toy data with a sinusoidal annual cycle, random noise and a linear temperature increase.

In [None]:
time = xr.cftime_range('1990-01-01', '2049-12-31', calendar='noleap')
tas = xr.DataArray(
    (10 * np.sin(time.dayofyear * 2 * np.pi / 365) +  # Annual variability
    5 * (np.random.random_sample(time.size) - 0.5) +  # Random noise
    np.linspace(0, 1.5, num=time.size)),  # 1.5 degC increase in 60 years
    dims=('time',), coords={'time': time},
    attrs={'units': 'degC'}, name='temperature',
)
tas.plot()

Then we compute the trend on the data. Here, we compute it on the whole timeseries (`group='time'`), using local constancy (`d=0`), two robustifying iterations (`niter=2`) and the same 20% window as above (`f=0.2`).

In [None]:
from xclim.sdba.detrending import LoessDetrend

# Create the detrending object
det = LoessDetrend(group='time', d=0, niter=2, f=0.2)
# Fitting returns a new object
fit = det.fit(tas)
# Get the trend and the detrended series
trend = fit.get_trend(tas)
tas_det = fit.detrend(tas)

In [None]:
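fig, ax = plt.subplots()
trend.plot(ax=ax, label='Computed trend')
ax.plot(time, np.linspace(0, 1.5, num=time.size), label='Expected trend')
# Thick bar spanning the first f/2 * N points, where boundary effects are strongest
ax.plot([time[0], time[int(0.1 * time.size)]], [0.4, 0.4], linewidth=2, color='green')
ax.legend()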

As noted earlier, this example shows that LOESS has strong boundary effects. It is recommended to discard the $\frac{f}{2}\cdot N$ outermost points on each side, as shown by the thick green bar in the graph above.
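As a rough sketch of that recommendation, the number of points to discard on each side can be computed from $f$ and $N$ and applied with a slice. The array below is only a placeholder for a computed trend, and the variable names are illustrative.

```python
import numpy as np

frac = 0.2  # LOESS window fraction (f in the text)
n_points = 60 * 365  # length of the 60-year 'noleap' daily series
n_trim = int(frac / 2 * n_points)  # points to discard on each side

# Placeholder values standing in for a computed trend
trend_values = np.linspace(0, 1.5, n_points)
trimmed = trend_values[n_trim:-n_trim]
```

With an xarray object such as `trend`, the same selection can be written as `trend.isel(time=slice(n_trim, -n_trim))`.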