<a href="https://colab.research.google.com/github/LakshyaMalhotra/time-series-analysis/blob/main/non_stationary_time_series.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# library imports
import os
import warnings
import datetime

import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns

warnings.filterwarnings('ignore')
%matplotlib inline

In [2]:
# plot formatting
plt.rcParams['figure.figsize'] = (10,8)
plt.rcParams['xtick.labelsize'] = 13
plt.rcParams['ytick.labelsize'] = 13
plt.rcParams['axes.labelsize'] = 14
sns.set_palette('Set2')
colors = list(sns.color_palette('Set2'))

In [3]:
def convert_to_dt(x):
    return datetime.datetime.strptime(x, '%m/%d/%Y')

In [4]:
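# Load the quarterly figures, then parse 'Quarter' explicitly:
# read_csv's date_parser argument is deprecated in pandas 2.x.
df = pd.read_csv(
    'https://raw.githubusercontent.com/srivatsan88/YouTubeLI/master/dataset/amazon_revenue_profit.csv')
df['Quarter'] = pd.to_datetime(df['Quarter'], format='%m/%d/%Y')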
df.head()

Unnamed: 0,Quarter,Revenue,Net Income
0,2020-03-31,75452,2535
1,2019-12-31,87437,3268
2,2019-09-30,69981,2134
3,2019-06-30,63404,2625
4,2019-03-31,59700,3561
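
The preview lists the newest quarter first. Stationarity tests and ARMA-style models treat the observations as ordered in time, so sorting oldest-first before testing may be a safer default. A minimal sketch on the five rows shown above (the names `sample` and `sample_sorted` are illustrative, not part of the original notebook):

```python
import pandas as pd

# Illustrative frame built from the five preview rows (newest first).
sample = pd.DataFrame({
    'Quarter': pd.to_datetime(['2020-03-31', '2019-12-31', '2019-09-30',
                               '2019-06-30', '2019-03-31']),
    'Revenue': [75452, 87437, 69981, 63404, 59700],
})

# Sort oldest-first and rebuild the index so row position matches time order.
sample_sorted = sample.sort_values('Quarter').reset_index(drop=True)
print(sample_sorted)
```

The same `sort_values('Quarter')` call could be applied to `df` itself; the test outputs shown in this notebook were produced on `df` as loaded.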


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 61 entries, 0 to 60
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Quarter     61 non-null     datetime64[ns]
 1   Revenue     61 non-null     int64         
 2   Net Income  61 non-null     int64         
dtypes: datetime64[ns](1), int64(2)
memory usage: 1.6 KB


There are a few modelling techniques for time series:
- ARMA (Auto Regressive Moving Average): This model expects the time series to be stationary.
- ARIMA (Auto Regressive Integrated Moving Average): Used when the time series is non-stationary because of a trend. It differences the series to remove the trend before fitting an ARMA model.
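
The "Integrated" part of ARIMA refers to differencing. As a rough illustration, the sketch below differences a made-up series with a linear trend (the series and the names `trend_series` and `diffed` are assumptions for illustration only, not the Amazon data):

```python
import numpy as np

# Hypothetical series with a linear trend: 5, 7, 9, ...
t = np.arange(10)
trend_series = 5.0 + 2.0 * t

# First-order differencing, y[t] - y[t-1]; this corresponds to d=1 in ARIMA(p, d, q).
diffed = np.diff(trend_series, n=1)

print(trend_series)
print(diffed)
```

After one round of differencing the trend is gone and only the constant slope remains. Real data such as revenue will usually leave noise around that level, and may need more than one difference.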


In [14]:
fig = px.scatter(df, x='Quarter', y='Revenue', title='Amazon Revenue')
fig.update_traces(mode='lines+markers', marker=dict(color='rgb(102,194,165)'))
fig.update_xaxes(rangeslider_visible=True)
fig.show()

The plot shows a clear upward trend in `Revenue`, which suggests the series is non-stationary. To confirm, we will run some statistical tests. The first is the KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test, available in statsmodels as `kpss`. Its null and alternate hypotheses are:

- Null Hypothesis: Series is stationary
- Alternate Hypothesis: Series is non-stationary

In [16]:
from statsmodels.tsa.stattools import kpss

In [17]:
# regression='c' tests for stationarity around a constant level (no trend term)
test_stat, p_val, lags, crit_vals = kpss(df.Revenue, regression='c')

In [19]:
print(f'Test statistics: {test_stat}')
print(f'p-value: {p_val}')
print(f'Critical values: {crit_vals}')

# The KPSS null is stationarity, so a small p-value rejects it
if p_val < 0.05:
    print('Series is non-stationary')
else:
    print('Series is stationary')

Test statistics: 0.5827432403327967
p-value: 0.024205159969745753
Critical values: {'10%': 0.347, '5%': 0.463, '2.5%': 0.574, '1%': 0.739}
Series is non-stationary
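
The p-value that `kpss` reports is interpolated from a table and is bounded to the range 0.01 to 0.1, so it can also help to compare the test statistic against the critical values directly. A minimal sketch using the numbers printed above (the helper `kpss_rejects_stationarity` and the names `stat` and `cvs` are hypothetical, not part of statsmodels):

```python
def kpss_rejects_stationarity(stat, crit_vals, level='5%'):
    """Return True if the KPSS statistic exceeds the critical value at `level`."""
    return stat > crit_vals[level]

# Values copied from the output above.
stat = 0.5827432403327967
cvs = {'10%': 0.347, '5%': 0.463, '2.5%': 0.574, '1%': 0.739}

for level in cvs:
    print(level, kpss_rejects_stationarity(stat, cvs, level))
```

With these numbers the null of stationarity is rejected at the 10%, 5% and 2.5% levels but not at 1%, which matches a p-value of about 0.024.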


Another stationarity test is the Augmented Dickey-Fuller (ADF) test, available in statsmodels as `adfuller`. Its hypotheses are reversed relative to KPSS:
- Null Hypothesis: Series possesses a unit root and hence is not stationary
- Alternate Hypothesis: Series is stationary

In [20]:
from statsmodels.tsa.stattools import adfuller

In [21]:
# adfuller returns a tuple: (statistic, p-value, lags used, n obs, critical values, icbest)
results = adfuller(df.Revenue)

print(f'Test statistics: {results[0]}')
print(f'p-value: {results[1]}')
print(f'Critical values: {results[4]}')

# The ADF null is a unit root, so a large p-value means we cannot reject non-stationarity
if results[1] > 0.05:
    print('Series is non-stationary')
else:
    print('Series is stationary')

Test statistics: -2.4448360381972445
p-value: 0.12947943121838468
Critical values: {'1%': -3.568485864, '5%': -2.92135992, '10%': -2.5986616}
Series is non-stationary
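
Because the two tests have opposite null hypotheses, their results are easiest to read side by side. The sketch below combines the two p-values printed above into one verdict. The function `combined_verdict` is a hypothetical helper, and labelling disagreement as simply "inconclusive" is a simplification:

```python
def combined_verdict(kpss_p, adf_p, alpha=0.05):
    """Combine KPSS and ADF p-values into a single stationarity verdict."""
    kpss_nonstationary = kpss_p < alpha  # KPSS rejects its null of stationarity
    adf_nonstationary = adf_p > alpha    # ADF fails to reject its null of a unit root
    if kpss_nonstationary and adf_nonstationary:
        return 'non-stationary'
    if not kpss_nonstationary and not adf_nonstationary:
        return 'stationary'
    return 'inconclusive'

# p-values copied from the two outputs above.
print(combined_verdict(kpss_p=0.024205159969745753, adf_p=0.12947943121838468))
```

Here both tests point the same way, so the verdict is `non-stationary`.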
