<a href="https://colab.research.google.com/github/Ananya-Joshi/CSTE_TA_Workbooks/blob/main/Forecasting_Colab_Lecture.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [16]:
# Activity 1: Familiarity with packages and setup
import pandas as pd
import numpy as np
import math
import plotly.express as px  # plotting (high-level API)
import plotly.graph_objects as go  # plotting (figure/trace API used below)

For each of the following activities, run the cell, describe what you observe, and then describe a takeaway or a possible remedy for any challenges you observe.

In this intro Colab, we consider the univariate (single-variable) case: we forecast future values of a stream given its past values.

Here's a statistical treatment:

**Data Stream**: $x_{0:T-1}$, an array holding the values observed over the last $T$ timesteps.

**Oracle Forecast**: $\tilde{x}_{T:T+\text{horizon}-1}$, the true future values, where $\text{horizon}$ is the number of future steps to forecast.

**Our Forecast**: $\hat{x}_{T:T+\text{horizon}-1}$, our predictions for the same future steps.

**Evaluation Metric** (total absolute error over the horizon): $\sum_{i=T}^{T+\text{horizon}-1} |\tilde{x}(i) - \hat{x}(i)|$
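As a quick check of the metric, here is a minimal sketch that sums the absolute errors over the horizon with NumPy. The function name `total_absolute_error` is hypothetical, and the three-step arrays are made-up illustrative values, not data from this notebook.

```python
import numpy as np

def total_absolute_error(x_tilde, x_hat):
    """Sum of |x_tilde(i) - x_hat(i)| over the forecast horizon (hypothetical helper)."""
    x_tilde = np.asarray(x_tilde, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    if x_tilde.shape != x_hat.shape:
        raise ValueError("oracle and forecast must cover the same horizon")
    return float(np.sum(np.abs(x_tilde - x_hat)))

# Illustrative values for a horizon of 3 steps (made up)
x_tilde = [1.0, 2.0, 3.0]  # oracle: true future values
x_hat = [1.5, 2.0, 2.0]    # our forecast
print(total_absolute_error(x_tilde, x_hat))  # 0.5 + 0.0 + 1.0 = 1.5
```

The same quantity is what the `Error ...` lines printed below report, computed over the (possibly aggregated) horizon.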

In [74]:
# Activity 2: Introduction to simple forecasters and statistical terms

def plot_forecasts(T=100, horizon=10, aggregation=1):
  # History: x(t) = sin(t/10) for t = 0..T-1, summed into bins of `aggregation` steps
  orig_data = pd.Series([math.sin(t/10) for t in range(T)])
  x_T = orig_data.groupby(orig_data.index // aggregation).sum()
  # Oracle: the true future values for t = T..T+horizon-1
  orig_oracle = pd.Series([math.sin(t/10) for t in range(T, T+horizon)])
  x_horiz = orig_oracle.groupby(orig_oracle.index // aggregation).sum()
  x_tilde = x_horiz  # perfect oracle
  # Updated (rolling) naive forecast: predict x(t) with the observed x(t-1) at each step
  orig_forecast = pd.Series([math.sin((t-1)/10) for t in range(T, T+horizon)])
  x_hat = orig_forecast.groupby(orig_forecast.index // aggregation).sum()
  # One-shot naive forecast: repeat the last observed value x(T-1) for the whole horizon
  orig_horizon = pd.Series([math.sin((T-1)/10)] * horizon)
  x_hat_horizon = orig_horizon.groupby(orig_horizon.index // aggregation).sum()

  fig = go.Figure()
  # One x position per aggregated bin (the bin's first timestep), so x and y lengths match
  fig.add_trace(go.Scatter(x=list(range(0, T, aggregation)), y=x_T, name='history', marker_color='black'))
  fig.add_trace(go.Scatter(x=list(range(T, T+horizon, aggregation)), y=x_horiz, name='oracle', marker_color='red'))
  fig.add_trace(go.Scatter(x=list(range(T, T+horizon, aggregation)), y=x_hat, name='simple forecast (updated)', marker_color='purple'))
  fig.add_trace(go.Scatter(x=list(range(T, T+horizon, aggregation)), y=x_hat_horizon, name='simple forecast (one-shot)', marker_color='mediumpurple'))

  fig.update_layout(title='Data Stream', xaxis_title='Time', yaxis_title='Value').show(renderer='colab')

  # Total absolute error over the (aggregated) horizon
  print(f"Error Updated {sum(abs(x - y) for x, y in zip(x_hat, x_tilde))}")
  print(f"Error One Shot {sum(abs(x - y) for x, y in zip(x_hat_horizon, x_tilde))}")


plot_forecasts()

Error Updated 0.5379003595310561
Error One Shot 3.6245984696993503
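Both forecasters in `plot_forecasts` are naive (persistence) forecasts. The "updated" one predicts each step from the observed value one step earlier, so it receives a fresh true value at every step; the "one-shot" one repeats the last observed value $x_{T-1}$ for the whole horizon. The sketch below recomputes the two errors with NumPy, without plotting, for the defaults `T=100`, `horizon=10`, `aggregation=1`; the helper names `signal` and `naive_errors` are hypothetical.

```python
import numpy as np

def signal(t):
    # Same underlying stream as the notebook: x(t) = sin(t / 10)
    return np.sin(np.asarray(t) / 10)

def naive_errors(T=100, horizon=10):
    future_t = np.arange(T, T + horizon)
    x_tilde = signal(future_t)                        # oracle: true future values
    x_hat_updated = signal(future_t - 1)              # x_hat(t) = x(t-1), re-observed each step
    x_hat_one_shot = np.full(horizon, signal(T - 1))  # x_hat(t) = x(T-1) for every t
    err_updated = np.abs(x_tilde - x_hat_updated).sum()
    err_one_shot = np.abs(x_tilde - x_hat_one_shot).sum()
    return float(err_updated), float(err_one_shot)

err_updated, err_one_shot = naive_errors()
print(f"Error Updated {err_updated}")
print(f"Error One Shot {err_one_shot}")
```

The one-shot error is larger because the sinusoid keeps drifting away from $x_{T-1}$, while the updated forecast is only ever one step stale.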


In [78]:
# Activity 3: Impact of data aggregation on the two types of forecasters

# Aggregation combines consecutive timesteps into one bin (here, by summing), which lowers the data's frequency.
# Try any positive integer up to `horizon` for `aggregation`. What do you observe?
plot_forecasts(horizon=100, aggregation=1)



Error Updated 6.408256259609609
Error One Shot 63.57533291034759
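To see what `aggregation` does, the sketch below applies the same `groupby(index // aggregation).sum()` pattern used in `plot_forecasts` to a small made-up series, then recomputes the "updated" forecast error at several aggregation levels for `T=100`, `horizon=100`. Because summing a bin lets positive and negative errors inside it cancel, the triangle inequality ($|\sum_i (a_i - b_i)| \le \sum_i |a_i - b_i|$) means the summed-aggregation error cannot exceed the unaggregated one. The names `aggregate` and `updated_error` are hypothetical.

```python
import numpy as np
import pandas as pd

def aggregate(series, aggregation):
    # Sum consecutive blocks of `aggregation` timesteps; the last block may be partial
    return series.groupby(series.index // aggregation).sum()

toy = pd.Series([1, 2, 3, 4, 5])
print(aggregate(toy, 2).tolist())  # [3, 7, 5]: (1+2), (3+4), then the partial block (5)

def updated_error(T=100, horizon=100, aggregation=1):
    t = np.arange(T, T + horizon)
    x_tilde = aggregate(pd.Series(np.sin(t / 10)), aggregation)      # aggregated oracle
    x_hat = aggregate(pd.Series(np.sin((t - 1) / 10)), aggregation)  # aggregated updated forecast
    return float((x_tilde - x_hat).abs().sum())

for k in [1, 2, 5, 10]:
    print(k, updated_error(aggregation=k))
```

Keep this in mind when comparing errors across aggregation levels: a lower number can reflect cancellation within bins rather than a better forecast.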


In [80]:
# Activity 5: Historical data vs. forecast horizon
# Change the value of T relative to horizon. What do you observe? How much historical data do you need if you know the model? What if you don't know the model?
plot_forecasts(T=100, horizon=100)

Error Updated 6.408256259609609
Error One Shot 63.57533291034759
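One way to explore the "if you know the model" question: suppose we know (an assumption the forecaster usually cannot make) that the stream has the form $a\sin(t/10) + b\cos(t/10)$ with unknown $a$ and $b$. A least-squares fit on only a few recent points then pins down $a$ and $b$, and the extrapolated forecast is essentially exact over a long horizon. The function name `fit_known_model_forecast` and the two-function basis are illustrative choices, not part of the original notebook.

```python
import numpy as np

def fit_known_model_forecast(history, T, horizon, n_points):
    """Fit x(t) = a*sin(t/10) + b*cos(t/10) on the last n_points of history, then extrapolate."""
    t_fit = np.arange(T - n_points, T)
    # Design matrix: one column per basis function of the assumed model
    A = np.column_stack([np.sin(t_fit / 10), np.cos(t_fit / 10)])
    coef, *_ = np.linalg.lstsq(A, history[-n_points:], rcond=None)
    t_future = np.arange(T, T + horizon)
    return np.column_stack([np.sin(t_future / 10), np.cos(t_future / 10)]) @ coef

T, horizon = 100, 100
history = np.sin(np.arange(T) / 10)                # same stream as the notebook
x_tilde = np.sin(np.arange(T, T + horizon) / 10)   # oracle
x_hat = fit_known_model_forecast(history, T, horizon, n_points=3)
print(np.abs(x_tilde - x_hat).sum())  # close to 0, versus about 6.41 for the updated naive forecast
```

If the model form is not known, this shortcut is unavailable: the history must also be used to discover the structure (for example, the period) before it can be extrapolated.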
