# On Analyzing Real World Time Series for Forecasting
Throughout this notebook, we carry out preliminary investigations of time series datasets drawn from real-world problems. We start with exploratory data analysis and then move on to developing our models.

## What Needs to be Done
- [ ] Map out system structure for performing our computations of interest
- [ ] Flesh out a few models of interest so we can make forecasts on real-world data
- [ ] Develop the `Model` abstraction that captures the interface most forecasting models share
- [ ] Figure out whether the data we're analyzing is stationary or not.
- [ ] Write support for `ARIMA` class of models.
- [ ] Revisit chapter 2 and add support for other quantities of interest to compute.

### Attributes:
1. data: A list or array-like structure storing the time series data points.
2. timestamps: A list or array-like structure that stores the timestamps for each data point. It could be dates, times, or simply indices.
3. frequency: A string or some identifier representing the frequency of data collection (e.g., 'daily', 'monthly').

### Methods:
1. decompose(): To decompose the time series into trend, seasonality, and residuals.
2. train_test_split(split_ratio): To split the data into a training and test set.
3. smoothing(method): Apply various smoothing techniques (e.g., moving average).
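
The attributes and methods listed above could be sketched roughly as follows. This is a minimal illustration, not the `UnivariateTimeSeries` class imported later in this notebook: the class name `SimpleTimeSeries`, the default `split_ratio`, and the `window` parameter are all assumptions made for the example, and `decompose()` is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

import numpy as np


@dataclass
class SimpleTimeSeries:
    """Hypothetical container holding the three attributes described above."""

    data: Sequence[float]
    timestamps: Optional[Sequence] = None
    frequency: Optional[str] = None

    def __post_init__(self):
        self.data = np.asarray(self.data, dtype=float)
        if self.timestamps is None:
            # Fall back to plain integer indices when no timestamps are given.
            self.timestamps = np.arange(len(self.data))
        elif len(self.timestamps) != len(self.data):
            raise ValueError("timestamps and data must have the same length")

    def train_test_split(self, split_ratio: float = 0.8):
        """Chronological split: earlier points train, later points test."""
        split = int(len(self.data) * split_ratio)
        return self.data[:split], self.data[split:]

    def smoothing(self, method: str = "moving_average", window: int = 3):
        """Only a simple moving average is sketched here."""
        if method != "moving_average":
            raise NotImplementedError(f"unsupported method: {method}")
        kernel = np.ones(window) / window
        # mode="valid" keeps only positions where the full window fits.
        return np.convolve(self.data, kernel, mode="valid")
```

Note that the split is not shuffled: for forecasting, the test set should come strictly after the training set in time.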

### Files:
1. time_series: A Python module that contains the time series classes and their methods/functions.
2. data_loader: A Python module that loads data. It provides functions for loading airline passenger data and Yahoo Finance stock data.

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

from collections import namedtuple
from time_series import TimeSeriesFactory, UnivariateTimeSeries
from data_loader import build_airline_passenger_uts, build_stock_uts

## Airline Passenger Data Analysis

In [2]:
airline_passengers = build_airline_passenger_uts()
airline_passengers

UnivariateTimeSeries(passengers_count)

- `object` `.methods()`

In [None]:
print(airline_passengers.time_col)
print(airline_passengers.value_col)

In [None]:
series, col = airline_passengers.get_series(True)
series

In [3]:
ap_series = airline_passengers.get_series(False)
print(ap_series)

[112 118 132 129 121 135 148 148 136 119 104 118 115 126 141 135 125 149
 170 170 158 133 114 140 145 150 178 163 172 178 199 199 184 162 146 166
 171 180 193 181 183 218 230 242 209 191 172 194 196 196 236 235 229 243
 264 272 237 211 180 201 204 188 235 227 234 264 302 293 259 229 203 229
 242 233 267 269 270 315 364 347 312 274 237 278 284 277 317 313 318 374
 413 405 355 306 271 306 315 301 356 348 355 422 465 467 404 347 305 336
 340 318 362 348 363 435 491 505 404 359 310 337 360 342 406 396 420 472
 548 559 463 407 362 405 417 391 419 461 472 535 622 606 508 461 390 432]


In [4]:
airline_passengers.stationarity_test(ap_series)

mean_1=182.902778, mean_2=377.694444
variance_1=2244.087770, variance_2=7367.962191
Mean is not Stationary
Variance is not Stationary
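
The output above is consistent with a split-half check: cut the series in two, then compare the mean and variance of each half. Below is a minimal sketch of that idea. It is an assumption about what `stationarity_test` does internally, not its actual implementation; the function names, the use of the population variance (`ddof=0`), and the `rel_tol` threshold are all hypothetical.

```python
import numpy as np


def split_half_summary(series):
    """Mean and variance of the first and second half of a series."""
    x = np.asarray(series, dtype=float)
    split = len(x) // 2
    first, second = x[:split], x[split:]
    return {
        "mean_1": first.mean(),
        "mean_2": second.mean(),
        # ndarray.var() defaults to ddof=0 (population variance); assumed here.
        "variance_1": first.var(),
        "variance_2": second.var(),
    }


def roughly_equal(a, b, rel_tol=0.1):
    """Hypothetical decision rule: values within rel_tol of each other."""
    return bool(np.isclose(a, b, rtol=rel_tol))


summary = split_half_summary([1, 2, 3, 4, 10, 20, 30, 40])
print(summary)
print("Mean is stationary:", roughly_equal(summary["mean_1"], summary["mean_2"]))
```

A check like this is only a quick heuristic. A large gap between the two halves, as seen for the airline data, points to a trend (shifting mean) and growing seasonal swings (shifting variance).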


In [None]:
train, validation, test = airline_passengers.get_train_validation_test_split(60, 40)
print(train)
print(validation)
print(test)

In [None]:
print(airline_passengers.get_slice(1, 50))

In [None]:
print(airline_passengers.autocovariance(3))
print(airline_passengers.autocorrelation(3))
print(airline_passengers.autocovariance_matrix(2))
print(airline_passengers.autocorrelation_matrix(3))
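
For reference, here is a sketch of how these quantities are commonly computed. It assumes the biased sample estimator $\hat{\gamma}(k) = \frac{1}{n}\sum_{t=1}^{n-k}(x_t - \bar{x})(x_{t+k} - \bar{x})$ with $\hat{\rho}(k) = \hat{\gamma}(k)/\hat{\gamma}(0)$, and a Toeplitz matrix whose $(i, j)$ entry is $\hat{\gamma}(|i - j|)$. The methods in `time_series` may use a different normalization or matrix size; the standalone functions below are illustrative only.

```python
import numpy as np


def autocovariance(series, lag):
    """Biased sample autocovariance at the given lag (divides by n)."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    d = x - x.mean()
    return float(np.dot(d[: n - lag], d[lag:]) / n)


def autocorrelation(series, lag):
    """Autocovariance at `lag` scaled by the lag-0 autocovariance."""
    return autocovariance(series, lag) / autocovariance(series, 0)


def autocovariance_matrix(series, max_lag):
    """(max_lag + 1) x (max_lag + 1) Toeplitz matrix of autocovariances."""
    gammas = np.array([autocovariance(series, k) for k in range(max_lag + 1)])
    idx = np.arange(max_lag + 1)
    # Entry (i, j) looks up the autocovariance at lag |i - j|.
    return gammas[np.abs(idx[:, None] - idx[None, :])]


x = [1, 2, 3, 4, 5]
print(autocovariance(x, 1), autocorrelation(x, 1))
print(autocovariance_matrix(x, 2))
```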

In [None]:
new_uts = airline_passengers.normalize()
new_uts

In [None]:
new_uts.data

In [None]:
print(airline_passengers.get_order_k_diff(2))
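
Differencing is a common way to remove a trend before modeling: the first difference is $x_t - x_{t-1}$, and an order-$k$ difference applies that operation $k$ times. The sketch below shows the idea with `numpy.diff`; it assumes `get_order_k_diff` behaves similarly, which this notebook does not confirm, and the function name `order_k_diff` is hypothetical.

```python
import numpy as np


def order_k_diff(series, k):
    """Apply first differencing k times; the result is k points shorter."""
    x = np.asarray(series, dtype=float)
    if k < 0:
        raise ValueError("k must be non-negative")
    return np.diff(x, n=k)


# A quadratic sequence has a constant second difference.
squares = [1, 4, 9, 16, 25]
print(order_k_diff(squares, 1))
print(order_k_diff(squares, 2))
```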

In [None]:
print(airline_passengers.mean())
print(airline_passengers.std())
print(airline_passengers.variance())
print(airline_passengers.max_min_range())
print(airline_passengers.get_statistics())

In [None]:
airline_passengers.plot(90)

In [None]:
airline_passengers.plot_autocorrelation(2, plot_full=True)

In [None]:
airline_passengers.scatter_plot(1)

In [None]:
print(airline_passengers.get_historical_data(series))

In [None]:
print(airline_passengers.get_true_label_data(series))

In [None]:
stationary_series

## Stock Data Analysis

In [5]:
# Only grab stocks whose data is available for the entire time period
start_date, end_date = "2013-01-01", "2023-08-08"
Stock = namedtuple("Stock", ["symbol", "name"])
stocks = [
    ("^GSPC", "S&P 500"),
    ("AAPL", "Apple"),
    ("INTC", "Intel"),
    ("AMZN", "Amazon"),
    ("TSLA", "Tesla"),
    ("GOOGL", "Google")
]
stocks = [Stock(*s) for s in stocks]
stocks = {s.symbol: build_stock_uts(s.symbol, s.name, start_date=start_date, end_date=end_date) for s in stocks}

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


In [15]:
values_cols = list(stocks.keys())
stock_mvts = TimeSeriesFactory.create_time_series(
    time_col="date",
    time_values=stocks[values_cols[0]].data.index,
    values_cols=values_cols,
    values=[stock.get_series() for stock in stocks.values()]
)

In [18]:
type(stocks['AAPL'])

time_series.UnivariateTimeSeries

In [16]:
aapl_series = stocks['AAPL'].get_series()
aapl_series

array([ 19.77928543,  19.56714249,  19.17749977, ..., 191.57000732,
       185.52000427, 182.13000488])

In [17]:
stocks['AAPL'].stationarity_test(aapl_series)

mean_1=27.644029, mean_2=108.634631
variance_1=65.864152, variance_2=2292.798569
Mean is not Stationary
Variance is not Stationary
