<a href="https://colab.research.google.com/github/cagBRT/timeSeries/blob/main/4b_TimeSeriesAnalysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!git clone -l -s https://github.com/cagBRT/timeSeries.git cloned-repo
%cd cloned-repo

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from statsmodels.tsa.seasonal import seasonal_decompose

# **Detrending a time series**

Detrending a time series means removing its trend component.

The key point is that removing trend, seasonality, or noise does not delete information. We separate the series into parts so that each part of its behaviour can be analyzed on its own. <br>
For example, if we are interested in seasonal effects, the seasonal variation we removed is still available, isolated from the other components.

Once seasonality and trend are removed, we can tell what the effect of the trend is, what the effect of the seasons is, and what remains that neither season nor trend accounts for. That remainder may reveal another, hopefully interesting, phenomenon.
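
As a minimal sketch of this idea, the cell below splits a synthetic series into trend, seasonal, and remainder parts and checks that they add back up to the original. The data, the straight-line trend, and the per-phase averaging are all illustrative assumptions, not the method used by `seasonal_decompose`.

```python
import numpy as np

rng = np.random.default_rng(2)                   # fixed seed so the sketch is repeatable
period = 12                                      # assumed season length
t = np.arange(10 * period, dtype=float)          # ten full seasons
series = 0.3 * t + 5.0 * np.sin(2 * np.pi * t / period) + rng.normal(size=t.size)

# Trend: a least-squares straight line (one simple choice among many).
trend = np.polyval(np.polyfit(t, series, deg=1), t)

# Seasonal: average of the detrended values at each position in the cycle.
detrended = series - trend
phase_means = detrended.reshape(-1, period).mean(axis=0)
seasonal = np.tile(phase_means, t.size // period)

# Remainder: whatever trend and season do not explain.
remainder = series - trend - seasonal

# Nothing is lost: the three parts add back up to the original series.
print(np.allclose(trend + seasonal + remainder, series))
```

Each of the three arrays can now be inspected or plotted separately.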

There are several approaches to de-trending a time series:<br>

1. Subtract the line of best fit from the time series. <br>
2. Subtract the trend component obtained from the time series decomposition we saw earlier.<br>

3. Subtract the mean

4. Apply a filter such as the Baxter-King filter (statsmodels.tsa.filters.bk_filter.bkfilter) or the Hodrick-Prescott filter (statsmodels.tsa.filters.hp_filter.hpfilter) to separate the trend from the cyclical component.
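
To see how approaches 1 and 3 differ, here is a small sketch on synthetic data (the series and its slope of 0.5 are assumptions made up for illustration). Subtracting the mean only recentres the series, while subtracting the fitted line also removes the slope.

```python
import numpy as np

rng = np.random.default_rng(0)               # fixed seed so the sketch is repeatable
t = np.arange(100, dtype=float)
series = 0.5 * t + rng.normal(size=t.size)   # synthetic: linear trend plus noise

# Approach 3: subtracting the mean recentres the series but keeps the slope.
mean_removed = series - series.mean()

# Approach 1: fit a straight line by least squares and subtract it.
slope, intercept = np.polyfit(t, series, deg=1)
line_removed = series - (slope * t + intercept)

# Slope remaining in each result: unchanged for the first, near zero for the second.
slope_after_mean = np.polyfit(t, mean_removed, deg=1)[0]
slope_after_line = np.polyfit(t, line_removed, deg=1)[0]
print(slope_after_mean, slope_after_line)
```

Mean subtraction is therefore only sufficient when the series has no slope to begin with.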

**Detrend Simple Example**

Create a noisy signal with a linear trend

In [None]:
t = np.linspace(0,5,100)
x = t + np.random.normal(size=100)

Detrend the signal

In [None]:
x_detrended=signal.detrend(x)

Plot the signal and the detrended signal

In [None]:
plt.figure(figsize=(5, 4))
plt.plot(t, x, label="x")
plt.plot(t, x_detrended, label="x_detrended")
plt.legend(loc='best')
plt.show()

**Subtract the line of best fit from the time series**<br>
The line of best fit may be obtained from a linear regression model with the time steps as the predictor. For more complex trends, you may want to add a quadratic term (the time step squared).
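
A rough sketch of the quadratic case, using `np.polyfit` on a synthetic curved series. The helper name `polynomial_detrend` and the data are hypothetical and serve only to illustrate the idea.

```python
import numpy as np

rng = np.random.default_rng(1)               # fixed seed so the sketch is repeatable
t = np.linspace(0, 5, 100)
y = 2.0 * t**2 + t + rng.normal(scale=0.5, size=t.size)   # synthetic curved trend

def polynomial_detrend(time, values, degree):
    """Subtract the least-squares polynomial of the given degree."""
    coeffs = np.polyfit(time, values, deg=degree)
    return values - np.polyval(coeffs, time)

linear_resid = polynomial_detrend(t, y, degree=1)      # straight line only
quadratic_resid = polynomial_detrend(t, y, degree=2)   # line plus squared term

# The quadratic fit leaves less structure behind on a curved series.
print(linear_resid.std(), quadratic_resid.std())
```

On a series whose trend really is a straight line, the two residuals would be almost identical, so the extra term is only worth adding when the plot shows curvature.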

In [None]:
df = pd.read_csv('timeSeriesExample.csv', parse_dates=['date'], index_col='date')
df.head()

**Plot the series**<br>
The series has a trend, so we can detrend this series

In [None]:
plt.plot(df)
plt.title('Drug Sales', fontsize=16)
plt.show()

**Subtract the line of best fit using signal.detrend**

In [None]:
# Using scipy: Subtract the line of best fit
detrended = signal.detrend(df.value.values)
plt.plot(detrended)
plt.title('Drug Sales detrended by subtracting the least squares fit', fontsize=16)
plt.show()

**Using statsmodels: Subtracting the Trend Component**

In [None]:
# Decompose the series df using the multiplicative method
result_mul = seasonal_decompose(df['value'], model='multiplicative', extrapolate_trend='freq')
# Subtract the estimated trend component from the original values
detrended = df.value.values - result_mul.trend
plt.plot(detrended)
plt.title('Drug Sales detrended by subtracting the trend component', fontsize=16)
plt.show()

**Detrend Assignment**<br>
Detrend the time series in file shampoo.csv

In [None]:
#Detrend Assignment

In [None]:
#@title
#!cat shampoo.csv
shampoo = pd.read_csv('shampoo.csv', parse_dates=['Month'], index_col='Month')
shampoo.head()

plt.plot(shampoo)
plt.title('Shampoo Sales', fontsize=16)
plt.show()

# Decompose the shampoo series using the multiplicative method.
# Newer statsmodels releases use `period` in place of the older `freq` argument.
result_mul = seasonal_decompose(shampoo['Sales'], model='multiplicative', 
                                period=12, extrapolate_trend='freq')
detrended = shampoo.Sales - result_mul.trend
plt.plot(detrended)
plt.title('Shampoo Sales detrended by subtracting the trend component', 
          fontsize=16)
plt.show()

# **Deseasonalize a time series**

1. Take a moving average with a length equal to the seasonal window. This also smooths the series in the process.

2. Take the seasonal difference of the series (subtract the value of the previous season from the current value).

3. Divide the series by the seasonal index obtained from STL decomposition
>If dividing by the seasonal index does not work well, try taking the log of the series before deseasonalizing. You can later restore the original scale by taking the exponential.
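
The first two methods can be sketched with plain NumPy on a synthetic, noise-free series. The period of 12 and the data are assumptions chosen for illustration, and the moving average here is a simple uncentred one.

```python
import numpy as np

period = 12                                          # assumed season length
t = np.arange(8 * period, dtype=float)
seasonal = 10.0 * np.sin(2 * np.pi * t / period)     # pattern repeating every 12 steps
series = 0.5 * t + seasonal                          # noise-free for clarity

# Method 1: moving average over one full season.
# Averaging a whole cycle cancels the seasonal swing and leaves the trend.
window = np.ones(period) / period
smoothed = np.convolve(series, window, mode="valid")   # len(series) - period + 1 values

# Method 2: seasonal difference, x[t] - x[t - period].
# The repeating pattern cancels; only the change over one season remains.
seasonal_diff = series[period:] - series[:-period]

print(smoothed[:3], seasonal_diff[:3])
```

Both results are shorter than the input: the moving average drops `period - 1` points and the seasonal difference drops `period` points.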


**Get the time series data**

In [None]:
df = pd.read_csv('timeSeriesExample.csv', parse_dates=['date'], index_col='date')

**Deseasonalize**<br>
To deseasonalize, divide the series by the seasonal component from seasonal_decompose instead of subtracting the trend

In [None]:
# Time Series Decomposition
result_mul = seasonal_decompose(df['value'], model='multiplicative', extrapolate_trend='freq')
# Divide by the seasonal component (not the trend) to remove seasonality
deseasonalized = df.value.values / result_mul.seasonal

# Plot the original series
plt.plot(df)
plt.title('Drug Sales', fontsize=16)
plt.show()

# Plot the deseasonalized series
plt.plot(deseasonalized)
plt.title('Drug Sales Deseasonalized', fontsize=16)
plt.show()
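
In a multiplicative model the components combine as a product, which is why the seasonal part is removed by division. The sketch below uses a synthetic series whose seasonal index is known by construction (an assumption, since in practice it would be estimated by a decomposition). It also shows that subtracting in log space and then exponentiating gives the same result.

```python
import numpy as np

rng = np.random.default_rng(3)               # fixed seed so the sketch is repeatable
period = 12                                  # assumed season length
t = np.arange(6 * period, dtype=float)

trend = 50.0 + 2.0 * t
seasonal = 1.0 + 0.3 * np.sin(2 * np.pi * t / period)   # seasonal index, centred on 1
noise = rng.normal(loc=1.0, scale=0.02, size=t.size)    # multiplicative noise near 1
series = trend * seasonal * noise

# Dividing by the seasonal index leaves trend * noise.
deseasonalized = series / seasonal

# Same operation in log space: subtract, then exponentiate to restore the scale.
deseasonalized_via_log = np.exp(np.log(series) - np.log(seasonal))

print(np.allclose(deseasonalized, deseasonalized_via_log))
```

The log route requires a strictly positive series, which holds for this synthetic example.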

**Deseasonalize Assignment**<br>
Deseasonalize the time series in file shampoo.csv

In [None]:
#Assignment

In [None]:
#@title 
shampoo = pd.read_csv('shampoo.csv', parse_dates=['Month'], index_col='Month')
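# Newer statsmodels releases use `period` in place of the older `freq` argument.
result_mul = seasonal_decompose(shampoo['Sales'], model='multiplicative', 
                                period=12, extrapolate_trend='freq')
# Divide by the seasonal component (not the trend) to remove seasonality
deseasonalized = shampoo.Sales / result_mul.seasonal

# Plot the original series
plt.plot(shampoo)
plt.title('Shampoo Sales', fontsize=16)
plt.show()

# Plot the deseasonalized series
plt.plot(deseasonalized)
plt.title('Shampoo Sales Deseasonalized', fontsize=16)
plt.show()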