# Simple timeseries analysis

In this notebook you will learn to perform different kinds of analysis on timeseries data.

Seaborn is a powerful library for drawing informative statistical graphics (https://seaborn.pydata.org/index.html).

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from statsmodels.tsa import seasonal

from datascience.read import Era5, AscatDataH121, read_multiple_ds

%matplotlib widget

We will look at ERA5 and ASCAT data in this notebook. The reading and filtering of valid data is identical to the previous notebooks.

In [None]:
era5 = Era5(read_bulk = False)
ascat = AscatDataH121(read_bulk = False)

In [None]:
lat = 48.198905
lon = 16.367182
gpi = era5.grid.find_nearest_gpi(lon, lat)[0]

In [None]:
ts = read_multiple_ds(loc=(lon, lat), ascat=ascat, era5=era5, ref_ds="ascat")

In [None]:
not_valid = (ts["stl1_era5"] < 0) | (ts["sd"] > 0)
ts.loc[:,"sm_valid"] = ~not_valid
ts_valid = ts.loc[ts["sm_valid"]]
ts_valid = ts_valid.dropna()
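
The filtering step can be tried out on a small table. The following is a minimal sketch with made-up values, assuming (as the column names suggest) that stl1_era5 is the soil temperature in °C and sd is the snow depth; the DataFrame `demo` is hypothetical and not part of the notebook's data.

```python
import numpy as np
import pandas as pd

# Made-up example rows; column meanings are assumptions (see text above)
demo = pd.DataFrame({
    "surface_soil_moisture": [35.0, 42.0, np.nan, 51.0],
    "stl1_era5": [5.2, -1.3, 8.0, 12.4],  # soil temperature, assumed in °C
    "sd": [0.0, 0.0, 0.0, 0.02],          # snow depth
})

# A row is not valid if the soil is frozen or snow is present
not_valid = (demo["stl1_era5"] < 0) | (demo["sd"] > 0)

# Keep the valid rows and drop rows with missing values
demo_valid = demo.loc[~not_valid].dropna()
print(demo_valid)
```

Only the first row survives here: the second row is frozen, the third has a missing soil moisture value, and the fourth has snow cover.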

## Linear regression

To examine the relationship between two variables, you can plot them against each other in a scatter plot and use the linregress and pearsonr functions to get numerical values. The resulting Pearson correlation coefficient measures the strength of the linear relationship between the two variables; a small p-value (< 0.05) indicates that the correlation is statistically significant. Note that the p-value depends on the size of the dataset: for large datasets even a very weak correlation yields a small p-value, so a small p-value alone does not mean that the relationship is strong.

In [None]:
linreg = stats.linregress(ts_valid["surface_soil_moisture"],  ts_valid["swvl1_era5"])
pearson = stats.pearsonr(ts_valid["surface_soil_moisture"],  ts_valid["swvl1_era5"])
pearson
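
The dependence of the p-value on the sample size can be illustrated with synthetic data. This is a minimal sketch that does not use the notebook's data: the random arrays, the seed, and the weak slope of 0.05 are assumptions chosen for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed, arbitrary choice


def weak_correlation(n):
    """Return Pearson r and p-value for n synthetic, weakly related samples."""
    x = rng.normal(size=n)
    y = 0.05 * x + rng.normal(size=n)  # y depends only very weakly on x
    r, p = stats.pearsonr(x, y)
    return r, p


r_small, p_small = weak_correlation(50)
r_large, p_large = weak_correlation(100_000)

print(f"n=50:     r={r_small:.3f}, p={p_small:.3g}")
print(f"n=100000: r={r_large:.3f}, p={p_large:.3g}")
```

With the large sample the p-value becomes very small although the correlation coefficient stays close to 0.05, so the strength of a relationship should be judged by the coefficient (or the slope), not by the p-value alone.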

In [None]:
fig, ax = plt.subplots(figsize=(7,7))

kwargs = {"facecolors": "None", "edgecolor": "C0"}

ax.scatter(ts_valid["surface_soil_moisture"], ts_valid["swvl1_era5"], **kwargs)
ax.plot(ts_valid["surface_soil_moisture"], linreg.intercept+linreg.slope*ts_valid["surface_soil_moisture"], c="r")
ax.set_xlabel("surface soil moisture [%]")
ax.set_ylabel("volumetric soil water layer 1 [m³/m³]")
ax.set_title("Soil Moisture ASCAT vs. swvl1 ERA5")

plt.show()

## Pairplot

You can also create a pairplot to visualise the correlation between more than two variables:

In [None]:
sns.pairplot(ts_valid, vars=["surface_soil_moisture", "swvl1_era5", "stl1_era5"], diag_kind="hist", plot_kws=kwargs)
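
A numerical complement to the pairplot is the pairwise correlation matrix, which pandas computes with DataFrame.corr. The sketch below uses synthetic data; the column names var_a, var_b and var_c are hypothetical stand-ins for the notebook's variables.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
n = 500
base = rng.normal(size=n)

# var_a and var_b share a common signal, var_c is independent noise
df = pd.DataFrame({
    "var_a": base + 0.3 * rng.normal(size=n),
    "var_b": base + 0.3 * rng.normal(size=n),
    "var_c": rng.normal(size=n),
})

# Pearson correlation coefficient for every pair of columns
corr = df.corr(method="pearson")
print(corr.round(2))
```

Each off-diagonal entry corresponds to one scatter panel of the pairplot, so strongly elongated point clouds show up as coefficients close to 1 or -1.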

## Boxplot

You can also create boxplots to see the distribution of the values:

In [None]:
fig, ax = plt.subplots(figsize=(5,5))

# draw a vertical boxplot on the axes created above
sns.boxplot(y=ts["surface_soil_moisture"], fill=False, ax=ax)
ax.set_title("Boxplot surface soil moisture ASCAT")
ax.set_ylabel("surface soil moisture [%]")
ax.set_xlabel(f"gpi: {gpi}")

plt.show()
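
The numbers behind a boxplot can also be computed directly. The following sketch uses a made-up sample and assumes the common whisker rule of 1.5 times the interquartile range (the default in seaborn and matplotlib); the variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample with one unusually large value
values = pd.Series([12.0, 18.0, 25.0, 31.0, 40.0, 47.0, 55.0, 63.0, 120.0])

# The box spans the first to the third quartile, the line marks the median
q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Whiskers reach to the most extreme points within 1.5 * IQR of the box
lower_whisker = values[values >= q1 - 1.5 * iqr].min()
upper_whisker = values[values <= q3 + 1.5 * iqr].max()

# Everything beyond the whiskers is drawn as an individual outlier point
outliers = values[(values < lower_whisker) | (values > upper_whisker)]

print(f"box: {q1} to {q3}, median: {median}")
print(f"whiskers: {lower_whisker} to {upper_whisker}")
print(f"outliers: {outliers.tolist()}")
```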

## Seasonal Trend

You can also perform a seasonal trend decomposition. Beware that NaN values might cause problems, because the decomposition needs temporally consistent (regularly spaced) data. To achieve this, you can resample the data to a regular time step (beware that some variables are no longer meaningful after averaging). The period is the number of entries after which the data repeats itself (e.g. if it repeats every year and you have one value every 5 days, the period is 365/5 = 73).

In [None]:
ts_resampled = ts.resample('5D').mean()
ts_resampled
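
How resampling produces a regular time step, and where NaN values come from, can be seen on a small synthetic series. This is a sketch with made-up daily values and an artificial two-week gap; the dates and names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up daily series with a two-week gap in February
index = pd.date_range("2020-01-01", "2020-03-31", freq="D")
series = pd.Series(np.arange(len(index), dtype=float), index=index)
series = series.drop(series.loc["2020-02-01":"2020-02-14"].index)

# Average all observations that fall into the same 5-day bin
resampled = series.resample("5D").mean()

# Bins without any observation are kept in the index but contain NaN
n_empty = int(resampled.isna().sum())

# Number of 5-day steps per year, used as the period of the decomposition
period = round(365 / 5)

print(resampled.loc["2020-01-26":"2020-02-20"])
print(f"empty bins: {n_empty}, period: {period}")
```

The resampled index is evenly spaced, but the gap shows up as NaN entries, which have to be handled before the decomposition.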

In [None]:
result = seasonal.seasonal_decompose(ts_resampled["surface_soil_moisture"].dropna(), model="additive", period=73)
result.plot()
plt.show()

You can also compare the trend of different variables:

In [None]:
trend_ssm = seasonal.seasonal_decompose(ts_resampled["surface_soil_moisture"].dropna(), model="additive", period=73).trend
trend_t2m = seasonal.seasonal_decompose(ts_resampled["t2m_era5"].dropna(), model="additive", period=73).trend

fig, ax = plt.subplots(1,1, figsize=(10,5))
ax.plot(trend_ssm, label="ssm")
ax1 = ax.twinx()
ax1.plot(trend_t2m, c="r", label="t2m")
ax.set_title("surface soil moisture and temperature 2m trends")
ax.set_ylabel("ssm [%]")
ax1.set_ylabel("t2m [°C]")
ax.set_xlabel("Date")
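# place the two legends in different corners so they do not overlap
ax.legend(loc="upper left")
ax1.legend(loc="upper right")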

plt.show()