## **Load Time Series Dataset**
We'll now explore trend and seasonality removal with examples. We'll use the well-known air passengers dataset, which is available online, because it has both trend and seasonality. It contains monthly counts of US airline passengers from 1949 to 1960. Please download the dataset to follow along.

* Air Passengers Dataset

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# import pandas, numpy, matplotlib


In [None]:
air_passengers= # read the data using pandas include (index_col=0, parse_dates=True)
# preview data using head command
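
As a rough sketch of what `index_col=0, parse_dates=True` does, the snippet below reads a few rows from an in-memory string instead of the downloaded file. The column names `Month` and `#Passengers` and the variable names `csv_text` and `sample` are assumptions made for illustration; the real file's path and layout may differ.

```python
import io

import pandas as pd

# Small illustrative sample in the assumed layout of the dataset
# (a date column followed by a passenger-count column).
csv_text = """Month,#Passengers
1949-01,112
1949-02,118
1949-03,132
1949-04,129
"""

# index_col=0 uses the first column as the index; parse_dates=True converts
# that index to datetimes so that date-based selection works later on.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(sample.head())
print(type(sample.index).__name__)
```

With a datetime index in place, a whole year can be selected with `.loc["1949"]`.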

In [None]:
air_passengers.plot(figsize=(8,4), color="tab:red");

In [None]:
air_passengers.loc["1952"].plot(kind="bar", color="tab:green", legend=False);

From the above plots we can see that our time-series is multiplicative and has both trend and seasonality. The trend shows in the steady increase in passengers over time. The seasonality shows in the same pattern repeating every year, with values peaking around August.

**Decompose Time-Series to See Components (Trend, Seasonality, Noise, etc)**

We can decompose a time-series to see its individual components. The Python module statsmodels provides an easy-to-use utility for extracting the individual components of a time-series, which we can then visualize.

In [None]:
# import seasonal_decompose from statsmodels.tsa.seasonal

In [None]:
decompose_result = # perform seasonal decomposing using multiplicative model type

trend = # get trend
seasonal = #get seasonal
residual = #get residual

#plot the decompose result

We can see the trend and seasonality components separately, along with the residual component. The residual fluctuates more at the beginning of the series and settles later.



**Checking Whether Time-Series is Stationary or Not**

As stated above, a time-series is stationary if its mean, variance, and auto-covariance are independent of time. We can check these using the moving window functions available in pandas. We'll also use the Dickey-Fuller test available in statsmodels to check the stationarity of the time-series. If the time-series is not stationary, we need to make it stationary.

Below we take the average over a moving window of 12 samples, since the above plots show a 12-month seasonality in the time-series. We can try different window sizes for testing purposes.

In [None]:
air_passengers.rolling(window = 12).mean().plot(figsize=(8,4), color="tab:red", title="Rolling Mean over 12 month period");

In [None]:
# same as above, but apply the rolling mean over a 20-month period

We can clearly see that the time-series has an upward trend.

Below we take the variance over a moving window of 12 samples, again matching the 12-month seasonality seen in the plots above.

In [None]:
# same as above, but calculate the rolling variance over a 12-month window

In [None]:
# same as above, but calculate the rolling variance over a 20-month window

From the above two plots, we can see that the time-series has a multiplicative effect that grows over time: the seasonal variation is small at the beginning and amplifies later.
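
To make the rolling checks concrete, here is a minimal sketch on a synthetic multiplicative series (generated for illustration, not the air passengers data). The variable names are assumptions for this example only.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: trend multiplied by a 12-month seasonal pattern.
index = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
series = pd.Series((100 + 2.0 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12)),
                   index=index)

# A window of 12 needs 12 observations, so the first 11 results are NaN.
rolling_mean = series.rolling(window=12).mean()
rolling_var = series.rolling(window=12).var()

# Compare the first and last complete windows.
print(rolling_mean.dropna().iloc[[0, -1]])
print(rolling_var.dropna().iloc[[0, -1]])
```

A rolling mean that keeps rising points to a trend, and a rolling variance that keeps rising points to seasonal swings that grow with the level of the series.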

Below we also plot an auto-correlation plot for the time-series. This plot helps us understand whether present values of the time-series are positively correlated, negatively correlated, or unrelated to past values. The statsmodels library provides the ready-to-use function plot_acf as part of the module statsmodels.graphics.tsaplots.

In [None]:
from statsmodels.graphics.tsaplots import plot_acf


plot_acf(air_passengers);

We can see from the above chart that after 13 lags the line falls inside the confidence interval (light blue area). This can be due to the 12-13 month seasonality in our data.
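
The numbers behind such a plot can be approximated with pandas alone. The sketch below uses `Series.autocorr`, which computes a plain correlation between the series and a shifted copy of itself, so its values can differ slightly from the estimator used by `plot_acf`. The synthetic series and the names `lags` and `autocorrelations` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with a trend and a 12-month seasonal pattern.
index = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
series = pd.Series((100 + 2.0 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12)),
                   index=index)

# Correlation of the series with itself shifted by each lag (in months).
lags = [1, 6, 12, 13]
autocorrelations = {lag: series.autocorr(lag=lag) for lag in lags}

for lag, value in autocorrelations.items():
    print(f"lag {lag:>2}: {value:.3f}")
```

For a series with yearly seasonality, the correlation at lag 12 is typically higher than at lag 6, because values one year apart sit at the same point of the seasonal cycle.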

**Dickey-Fuller Test for Stationarity**
Once we have removed trend and seasonality from the time-series, we can test its stationarity using the Dickey-Fuller test, a statistical test for checking whether a time-series is stationary.


The Dickey-Fuller test is available in the statsmodels library.

Below we test the stationarity of our time-series with it and interpret the results.

In [None]:
from statsmodels.tsa.stattools import adfuller

dftest = adfuller(air_passengers['#Passengers'], autolag = 'AIC')

print("1. ADF : ",dftest[0])
print("2. P-Value : ", dftest[1])
print("3. Num Of Lags : ", dftest[2])
print("4. Num Of Observations Used For ADF Regression and Critical Values Calculation :", dftest[3])
print("5. Critical Values :")
for key, val in dftest[4].items():
    print("\t",key, ": ", val)

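We can interpret the above results based on the p-value. The null hypothesis of the Dickey-Fuller test is that the time-series has a unit root, i.e. that it is non-stationary.

* p-value > 0.05 - We fail to reject the null hypothesis, so we treat the time-series as non-stationary.
* p-value <= 0.05 - We reject the null hypothesis, so we treat the time-series as stationary.

We can see from the above results that the p-value is greater than 0.05, hence our time-series is not stationary. It still has time-dependent components which we need to remove.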

**Remove Trend**
There are various ways to remove trends from data as we have discussed above. We'll try ways like differencing, power transformation, log transformation, etc.

**Log Transformation**

To apply a log transformation, we take the log of each individual value of the time-series.


In [None]:
logged_passengers = np.log(air_passengers["#Passengers"])

ax1 = plt.subplot(121)
logged_passengers.plot(figsize=(12,4) ,color="tab:red", title="Log Transformed Values", ax=ax1);
ax2 = plt.subplot(122)
air_passengers.plot(color="tab:red", title="Original Values", ax=ax2);

From the first chart above, we can see that the log transformation has reduced the variance of the time-series. Comparing the y-axis ranges of the original and log-transformed charts confirms this.
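
One way to see why the log helps with a multiplicative series is that it turns a product into a sum, so seasonal swings stop scaling with the level. The sketch below checks this on a synthetic series; `yearly_swing` is a hypothetical helper and the data is generated for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic multiplicative series: trend multiplied by a seasonal factor.
index = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
trend_part = 100 + 2.0 * t
seasonal_part = 1 + 0.2 * np.sin(2 * np.pi * t / 12)
series = pd.Series(trend_part * seasonal_part, index=index)

# log(trend * seasonal) = log(trend) + log(seasonal)
logged = np.log(series)


def yearly_swing(values, year):
    """Hypothetical helper: max minus min of the values within one year."""
    one_year = values.loc[year]
    return one_year.max() - one_year.min()


# How much larger is the within-year swing in the last year than in the first?
raw_ratio = yearly_swing(series, "1960") / yearly_swing(series, "1949")
log_ratio = yearly_swing(logged, "1960") / yearly_swing(logged, "1949")

print(f"swing growth, original series: {raw_ratio:.2f}")
print(f"swing growth, logged series:   {log_ratio:.2f}")
```

The swing of the logged series grows far less between the first and last year than the swing of the original series does.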

We can check whether this worked by decomposing the transformed time-series into its components, as we did above.

In [None]:
# perform seasonal decompose on the log-transformed series

**Power Transformations**
We can apply a power transformation to the data in the same way as the log transformation to remove the trend.

In [None]:
# perform power transform using (lambda x : x ** 0.5) and store the result as powered_passengers

# plot the curve for the same

From the first chart above, we can see that the power transformation has reduced the variance of the time-series. Comparing the y-axis ranges of the original and power-transformed charts confirms this.

We can check whether this worked by decomposing the transformed time-series into its components, as we did above.

In [None]:
# perform seasonal decompose on the power-transformed series

**Applying Moving Window Functions**
We can calculate the rolling mean over a period of 12 months and subtract it from the original time-series to get a de-trended time-series.

In [None]:
rolling_mean = air_passengers.rolling(window = 12).mean()
passengers_rolled_detrended = air_passengers - rolling_mean

ax1 = plt.subplot(121)
passengers_rolled_detrended.plot(figsize=(12,4),color="tab:red", title="Differenced With Rolling Mean over 12 month", ax=ax1);
ax2 = plt.subplot(122)
air_passengers.plot(figsize=(12,4), color="tab:red", title="Original Values", ax=ax2);

From the first chart above, we can see that we seem to have removed the trend from the time-series.
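
The effect of subtracting a 12-month rolling mean can be checked exactly on a synthetic additive series (a straight-line trend plus a sine-shaped seasonal term, generated for illustration). The names `slope`, `seasonal_part` and `detrended` are assumptions for this sketch.

```python
import numpy as np
import pandas as pd

# Synthetic additive series: linear trend plus a 12-month seasonal term.
index = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
slope = 2.0
seasonal_part = 20 * np.sin(2 * np.pi * t / 12)
series = pd.Series(100 + slope * t + seasonal_part, index=index)

# Trailing 12-month mean; the seasonal term averages out over a full year.
rolling_mean = series.rolling(window=12).mean()

# The first 11 values have no complete window, so drop the resulting NaNs.
detrended = (series - rolling_mean).dropna()

print(len(detrended))
print(detrended.head())
```

Because the window looks backwards, the rolling mean trails a linear trend by 5.5 months, so the de-trended series keeps the seasonal term plus a constant offset of 5.5 times the slope. The trend itself is gone.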

We can check whether this worked by decomposing the de-trended time-series into its components, as we did above.

In [None]:
# perform seasonal decompose on the rolling-mean de-trended series (drop NaN values first)

**Applying Moving Window Function on Log Transformed Time-Series**

We can apply more than one transformation as well. We'll first apply log transformation to time-series, then take a rolling mean over a period of 12 months and then subtract rolled time-series from log-transformed time-series to get final time-series.

In [None]:
# take the rolling mean of the log-transformed series over a 12-month window
logged_rolling_mean = logged_passengers.rolling(window = 12).mean()
passengers_log_rolled_detrended = logged_passengers - logged_rolling_mean


# plot the passengers_log_rolled_detrended

From the chart above, we can see that we are able to remove the trend from the time-series.

We can check whether this worked by decomposing the de-trended time-series into its components, as we did above.

In [None]:
# perform seasonal decompose on the log-transformed, rolling-mean de-trended series (drop NaN values first)

**Applying Moving Window Function on Power Transformed Time-Series**

We can apply more than one transformation as well. We'll first apply power transformation to time-series, then take a rolling mean over a period of 12 months and then subtract rolled time-series from power-transformed time-series to get final time-series.

In [None]:

passengers_pow_rolled_detrended = # take the difference of power transformed and rolling mean on powered transformed


From the chart above, we can see that we are able to remove the trend from the time-series.

We can check whether this worked by decomposing the de-trended time-series into its components, as we did above.

In [None]:
# perform seasonal decompose on the power-transformed, rolling-mean de-trended series (drop NaN values first)

**Applying Linear Regression to Remove Trend**

We can also apply a linear regression model to remove the trend. Below we fit a linear regression model to our time-series, use the fitted model to predict values from beginning to end, and then subtract the predicted values from the original time-series to remove the trend.

In [None]:
from statsmodels.regression.linear_model import OLS

least_squares = OLS(air_passengers["#Passengers"].values, list(range(air_passengers.shape[0])))
result = least_squares.fit()

fit = pd.Series(result.predict(list(range(air_passengers.shape[0]))), index = air_passengers.index)

passengers_ols_detrended = air_passengers["#Passengers"] - fit


# plot the regressed model

From the chart above, we can see that we are able to remove the trend from the time-series.
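
Note that statsmodels `OLS` does not add an intercept on its own, so passing only the time index as the regressor fits a line through the origin. As a hedged alternative, the sketch below swaps in `np.polyfit`, which estimates both a slope and an intercept. It runs on a synthetic series, and all names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic additive series: linear trend plus a 12-month seasonal term.
index = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
series = pd.Series(100 + 2.0 * t + 20 * np.sin(2 * np.pi * t / 12), index=index)

# Fit a straight line (degree-1 polynomial) to the values against time.
slope, intercept = np.polyfit(t, series.values, deg=1)
fit = pd.Series(intercept + slope * t, index=series.index)

# Subtracting the fitted line leaves the seasonal term around zero.
detrended = series - fit

print(f"estimated slope: {slope:.3f}, intercept: {intercept:.3f}")
print(detrended.describe())
```

Unlike the rolling-mean approach, this keeps every observation, since no window has to fill up before the first value is produced.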

We can check whether this worked by decomposing the de-trended time-series into its components, as we did above.

In [None]:
# perform seasonal decompose on the linear-regression de-trended series

After applying the above transformations, linear regression seems to have done a better job of removing the trend than the other methods. We can confirm this by removing the seasonal component and checking the stationarity of the resulting time-series.

**Remove Seasonality**
We can remove seasonality using differencing. We'll apply differencing to the various de-trended time-series calculated above.

**Differencing Over Log Transformed Time-Series**
We apply differencing to the log-transformed time-series by shifting its values by 1 period and subtracting the shifted series from the original log-transformed time-series.
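
As a small sketch of what 1-period differencing does, the snippet below compares the shift-and-subtract form with the pandas shortcut `diff()` on a synthetic logged series. The series and the names `manual`, `builtin` and `cleaned` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic logged series: log of a trend multiplied by a seasonal factor.
index = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
logged = pd.Series(np.log((100 + 2.0 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))),
                   index=index)

# Shift by one period and subtract, as described above.
manual = logged - logged.shift()

# pandas offers the same operation as a single call.
builtin = logged.diff()

# The first value has no predecessor, so it is NaN in both versions.
cleaned = builtin.dropna()

print(manual.head(3))
print(len(logged), len(cleaned))
```

The leading NaN should be dropped before the differenced series is passed to `adfuller`.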

In [None]:
logged_passengers_diff = logged_passengers - logged_passengers.shift()

# plot the logged_passengers_diff;

We can now test whether our time-series is stationary or not by applying the Dickey-Fuller test which we applied above.

In [None]:
# perform the ADF test (drop the NaN created by differencing first)

From our Dickey-Fuller test results, we can conclude that the time-series is NOT STATIONARY because the p-value of 0.07 is greater than 0.05.

**Differencing Over Power Transformed Time-Series**

We apply differencing to the power transformed time-series by shifting its values by 1 period and subtracting the shifted series from the original power transformed time-series.

In [None]:
powered_passengers_diff = powered_passengers - powered_passengers.shift()

# plot

We can now test whether our time-series is stationary or not by applying the Dickey-Fuller test which we applied above.

In [None]:
# perform ADF test

From our Dickey-Fuller test results, we can conclude that the time-series is STATIONARY because the p-value of 0.02 is less than 0.05.

**Differencing Over Time-Series with Rolling Mean taken over 12 Months**

We apply differencing to the mean rolled time-series by shifting its values by 1 period and subtracting the shifted series from the original mean rolled time-series.

In [None]:
# calculate the difference of passengers_rolled_detrended with its 1-period shift, then plot

We can now test whether our time-series is stationary or not by applying the Dickey-Fuller test which we applied above.

In [None]:
# perform ADF test

From our Dickey-Fuller test results, we can conclude that the time-series is STATIONARY because the p-value of 0.02 is less than 0.05.

**Differencing Over Log Transformed & Mean Rolled Time-Series**

In [None]:
# calculate the difference of passengers_log_rolled_detrended with its 1-period shift, then plot

We can now test whether our time-series is stationary or not by applying the Dickey-Fuller test which we applied above.

In [None]:
# perform ADF test

From our Dickey-Fuller test results, we can conclude that the time-series is STATIONARY because the p-value of 0.001 is less than 0.05.

**Differencing Over Power Transformed & Mean Rolled Time-Series**

We apply differencing to the power transformed & mean rolled time-series by shifting its values by 1 period and subtracting the shifted series from the original power transformed & mean rolled time-series.

In [None]:
# calculate the difference of passengers_pow_rolled_detrended with its 1-period shift, then plot

We can now test whether our time-series is stationary or not by applying the Dickey-Fuller test which we applied above.

In [None]:
# perform ADF test

From our Dickey-Fuller test results, we can conclude that the time-series is STATIONARY because the p-value of 0.005 is less than 0.05.

**Differencing Over Linear Regression Transformed Time-Series**

We apply differencing to the linear regression transformed time-series by shifting its values by 1 period and subtracting the shifted series from the original linear regression transformed time-series.

In [None]:
# calculate the difference of passengers_ols_detrended with its 1-period shift, then plot


We can now test whether our time-series is stationary or not by applying the Dickey-Fuller test which we applied above.

In [None]:
# perform ADF test

From our Dickey-Fuller test results, we can conclude that the time-series is NOT STATIONARY because the p-value of 0.054 is greater than 0.05.

This is the end!!!!