# Working with Time Series Data in Python

## What is Time Series Data?

In [None]:
import pandas as pd

df = pd.read_csv("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=IBM&apikey=demo&datatype=csv")

df.head() 

In [None]:
df['timestamp'] = pd.to_datetime(df.timestamp)

df.head()

In [None]:
df.index = df.timestamp

print(df.head())

df['close'].plot()

### We can do this all in one step with `pd.read_csv`

In [None]:
one_step = pd.read_csv(
    "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=IBM&apikey=demo&datatype=csv",
    parse_dates=['timestamp'],
    index_col='timestamp',
)

one_step.head()
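
The same pattern can be tried offline. The sketch below uses a few made-up rows shaped like the daily CSV above (the values are hypothetical) and checks that `parse_dates` plus `index_col` gives the same index as converting and setting it by hand.

```python
import io
import pandas as pd

# Hypothetical sample rows shaped like the daily CSV used above (values are made up).
csv_text = """timestamp,open,high,low,close,volume
2024-01-04,101.0,103.0,100.5,102.5,1200
2024-01-03,100.0,102.0,99.5,101.0,1100
2024-01-02,99.0,100.5,98.0,100.0,1000
"""

# Step-by-step version: read, convert the column, then set it as the index.
three_step = pd.read_csv(io.StringIO(csv_text))
three_step["timestamp"] = pd.to_datetime(three_step["timestamp"])
three_step = three_step.set_index("timestamp")

# One-step version: let read_csv parse the dates and set the index.
one_step_demo = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["timestamp"], index_col="timestamp"
)

print(type(one_step_demo.index).__name__)
print(three_step.index.equals(one_step_demo.index))
```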

# Working With Time Series Data

## Time-based indexing and slicing

In [None]:
# We can slice the DataFrame by timestamp through its DatetimeIndex.
# .loc slices by label and includes both endpoints.

start = df['timestamp'].iloc[10]
stop = df['timestamp'].iloc[20]

sample = df.loc[start:stop]

sample.head()

In [None]:
# We can access date time values

df.timestamp.dt.month
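
A minimal sketch of the same ideas on a synthetic daily series (the dates and values are illustrative only): slicing by date strings, selecting a whole month with a partial date string, and reading date components directly from a `DatetimeIndex`.

```python
import pandas as pd

# Synthetic daily series; the dates and values are illustrative only.
idx = pd.date_range("2024-01-01", periods=60, freq="D")
demo = pd.DataFrame({"close": range(60)}, index=idx)

# Label-based slice with date strings: both endpoints are included.
window = demo.loc["2024-01-10":"2024-01-20"]

# Partial-string indexing: select every row in February 2024.
february = demo.loc["2024-02"]

# On a DatetimeIndex the components are attributes of the index itself;
# the .dt accessor is only needed for datetime columns.
months = demo.index.month

print(len(window), len(february), months.unique().tolist())
```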

## What is Stationarity? Why does it matter?

A **stationary** time series is one whose statistical properties, such as mean, variance, and autocorrelation, do not change over time.

- **Constant Mean:** The mean of the series should not be a function of time.

- **Constant Variance:** The variance of the series should not be a function of time. This property is known as homoscedasticity.

- **Constant Covariance:** The covariance of the i-th term and the (i+m)-th term should not be a function of time.
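
To make these properties concrete, here is a small sketch on simulated data (not the datasets used in this notebook). White noise is stationary, while its cumulative sum, a random walk, is not. Comparing the mean and variance of the first and second half of each series is a rough, informal check, not a formal test.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 1000

noise = pd.Series(rng.normal(size=n))  # white noise: stationary
walk = noise.cumsum()                  # random walk: non-stationary

def half_stats(series):
    """Return (mean, variance) for the first and second half of a series."""
    mid = len(series) // 2
    first, second = series.iloc[:mid], series.iloc[mid:]
    return (first.mean(), first.var()), (second.mean(), second.var())

print("white noise:", half_stats(noise))
print("random walk:", half_stats(walk))
```

For the white noise, both halves should show a mean near 0 and a variance near 1. For the random walk, the two halves usually differ noticeably.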

### The tale of two datasets

In [None]:
### Heart Rate data


hr = pd.read_csv("https://raw.githubusercontent.com/alyssaq/python-data-science-intro/master/datasets/heart-rate-time-series.csv", header=None)
hr.columns = ['rate']

# Plot our Heart Rate Data
hr.plot(title='HR Data')

### The tale of two datasets

In [None]:
df['close'].plot(title='Stock Data')

## Tests for Stationarity

### Augmented Dickey-Fuller (ADF) Test

The ADF test is one of the most popular statistical tests to check the stationarity of a time series. The null hypothesis of the ADF test is that the time series is non-stationary due to the presence of a unit root.

$ 
\Delta Y_t = \alpha + \beta t + \gamma Y_{t-1} + \delta_1 \Delta Y_{t-1} + \ldots + \delta_{p-1} \Delta Y_{t-p+1} + \epsilon_t
$

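Where:
- $\Delta Y_t = Y_t - Y_{t-1}$ is the first-differenced series
- $\alpha$ is a constant and $\beta$ is the coefficient on the time trend $t$
- $\gamma$ is the coefficient on the lagged level $Y_{t-1}$; the null hypothesis of a unit root corresponds to $\gamma = 0$
- $\delta_1, \ldots, \delta_{p-1}$ are the coefficients on the lagged differences
- $p$ is the lag order, so $p-1$ lagged differences are included
- $\epsilon_t$ is the error term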
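
As a rough illustration of what the regression measures, the sketch below fits a simplified version of it by ordinary least squares: only the constant and the lagged level, with no trend term and no lagged differences (that simplification is an assumption made here for brevity). On simulated data, the estimate of $\gamma$ should be close to 0 for a random walk and close to -1 for white noise.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
n = 2000
shocks = rng.normal(size=n)

walk = np.cumsum(shocks)  # unit root: Y_t = Y_{t-1} + e_t
noise = shocks            # stationary white noise

def df_gamma(y):
    """Estimate gamma in dY_t = alpha + gamma * Y_{t-1} + e_t by least squares."""
    dy = np.diff(y)   # dY_t
    lagged = y[:-1]   # Y_{t-1}
    X = np.column_stack([np.ones_like(lagged), lagged])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    return coef[1]

print("gamma (random walk):", df_gamma(walk))
print("gamma (white noise):", df_gamma(noise))
```

The estimate alone is not the test. Under the null hypothesis the test statistic follows a non-standard distribution, which is why `adfuller` reports its own critical values.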



### Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test

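The KPSS test is another popular test for stationarity. Its null hypothesis is the reverse of the ADF test's: the data is stationary, either around a constant level or around a deterministic trend. The test decomposes the series into a deterministic trend, a random walk, and a stationary error:

$
Y_t = \beta t + r_t + \epsilon_t, \qquad r_t = r_{t-1} + u_t
$

Where:
- $Y_t$ is the time series
- $\beta$ is the coefficient on the time trend $t$ (zero when testing stationarity around a level)
- $r_t$ is a random walk whose starting value $r_0$ acts as the intercept
- $u_t$ is the random walk's shock, with variance $\sigma_u^2$; the null hypothesis is $\sigma_u^2 = 0$, which makes $r_t$ constant
- $\epsilon_t$ is the stationary error term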
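
The KPSS statistic is built from the partial sums of the residuals left after removing the level (or trend): large partial sums mean the series wanders away from its level, which is evidence against stationarity. Below is a minimal sketch of the level version, assuming a lag-0 variance estimate. `kpss_level_stat` is a hypothetical helper, not the statsmodels implementation, which also adjusts the variance estimate for autocorrelation through its `nlags` argument.

```python
import numpy as np

def kpss_level_stat(y):
    """KPSS level-stationarity statistic with a lag-0 variance estimate."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    resid = y - y.mean()             # residuals from a constant-only fit
    partial_sums = np.cumsum(resid)  # S_t = e_1 + ... + e_t
    variance = np.sum(resid**2) / n  # lag-0 estimate of the error variance
    return np.sum(partial_sums**2) / (n**2 * variance)

rng = np.random.default_rng(2)  # fixed seed so the run is repeatable
noise = rng.normal(size=500)             # stationary
walk = np.cumsum(rng.normal(size=500))   # non-stationary

stat_noise = kpss_level_stat(noise)
stat_walk = kpss_level_stat(walk)
print("white noise:", stat_noise)
print("random walk:", stat_walk)
```

The random walk should produce a much larger statistic than the white noise.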

## The tests in action

### The ADF test

In [None]:
from statsmodels.tsa.stattools import adfuller

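# Sort chronologically so the test sees the series in time order,
# and pass the single heart-rate column as a Series.
result_adf_stock = adfuller(df['close'].sort_index())
result_adf_hr = adfuller(hr['rate'])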

print(f'ADF Statistic Stock: {result_adf_stock[0]}')
print(f'p-value: {result_adf_stock[1]}')
print('Critical Values:')
for key, value in result_adf_stock[4].items():
    print(f'\t{key}: {value}')

print("---------------------------")
    
print(f'ADF Statistic HR: {result_adf_hr[0]}')
print(f'p-value: {result_adf_hr[1]}')
print('Critical Values:')
for key, value in result_adf_hr[4].items():
    print(f'\t{key}: {value}') 

### The KPSS Test

In [None]:
### NOTE: You may not get the same values from the video.
## Data is live which will result in a change in value each time you run. However,
## the conclusion is the same.

from statsmodels.tsa.stattools import kpss

# Perform the KPSS test (stock series sorted chronologically first)
result_kpss_stock = kpss(df['close'].sort_index(), nlags="auto")
result_kpss_hr = kpss(hr['rate'], nlags="auto")


print(f'KPSS Statistic Stock: {result_kpss_stock[0]}')
print(f'p-value: {result_kpss_stock[1]}')
print('Critical Values:')
for key, value in result_kpss_stock[3].items():
    print(f'\t{key}: {value}')

print("---------------------------")

print(f'KPSS Statistic HR: {result_kpss_hr[0]}')
print(f'p-value: {result_kpss_hr[1]}')
print('Critical Values:')
for key, value in result_kpss_hr[3].items():
    print(f'\t{key}: {value}')
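
Because the two tests have opposite null hypotheses, their p-values are read in opposite directions. The sketch below shows one way to combine them. `interpret` is a hypothetical helper and the 0.05 threshold is an assumed significance level. Note that statsmodels reports KPSS p-values only within the range 0.01 to 0.10.

```python
def interpret(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF and KPSS p-values into a plain-language reading.

    Hypothetical helper for illustration; alpha is an assumed significance level.
    """
    adf_rejects_unit_root = adf_pvalue < alpha       # evidence for stationarity
    kpss_rejects_stationarity = kpss_pvalue < alpha  # evidence against stationarity

    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary"
    if adf_rejects_unit_root and kpss_rejects_stationarity:
        return "tests disagree: both nulls rejected"
    return "inconclusive: neither null rejected"

print(interpret(0.001, 0.10))
print(interpret(0.60, 0.01))
```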

## How do we fix TS data that is not Stationary?

## Differencing

Differencing is a technique used to make a non-stationary time series stationary. It involves transforming the series by computing the differences between consecutive observations.

### First Order Differencing

The first difference is given by the following equation:

$ 
\Delta Y_t = Y_t - Y_{t-1}
$
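
A tiny worked example of the formula, using made-up numbers: each output value is the current observation minus the previous one, and the first entry is missing because it has no predecessor.

```python
import pandas as pd

y = pd.Series([10, 12, 15, 14, 18])  # made-up values for illustration

first_diff = y.diff()       # Y_t - Y_{t-1}
print(first_diff.tolist())  # [nan, 2.0, 3.0, -1.0, 4.0]

# Differencing is reversible given the starting value: the cumulative sum
# of the differences, added to the first observation, rebuilds the series.
rebuilt = first_diff.fillna(0).cumsum() + y.iloc[0]
print(rebuilt.tolist())     # [10.0, 12.0, 15.0, 14.0, 18.0]
```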


In [None]:
df = df.sort_index()
new_station_data = df['close'].diff()
new_station_data = new_station_data.dropna()
new_station_data.head()

In [None]:
### NOTE: You may not get the same values from the video.
## Data is live which will result in a change in value each time you run. However,
## the conclusion is the same.

result_adf_stock = adfuller(new_station_data)
result_kpss_stock = kpss(new_station_data, nlags="auto")

print(f'ADF Statistic Stock: {result_adf_stock[0]}')
print(f'p-value: {result_adf_stock[1]}')
print('Critical Values:')
for key, value in result_adf_stock[4].items():
    print(f'\t{key}: {value}')
    
print("---------------------------")

print(f'KPSS Statistic: {result_kpss_stock[0]}')
print(f'p-value: {result_kpss_stock[1]}')
print('Critical Values:')
for key, value in result_kpss_stock[3].items():
    print(f'\t{key}: {value}')
    


In [None]:
new_station_data.plot(title='Differenced Stock Data')