# Intro to Time Series Modeling

- Conceptualize using steps in time as the features for a linear model
- Recognize mathematical notation for time steps/lag
- Perform a validation split on time series data
- List the differences between a one-step-ahead forecast and a dynamic forecast

## Intro

- This unit covers using patterns over time as features for predicting future behavior.
- Previous lessons covered linear regression, in which each feature has a coefficient denoting its influence on the target. In the same way, values at earlier time points can be treated as features and used to predict future values.
    - To use time series data points as features in this way, the time points *MUST* be evenly spaced.
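
The even-spacing requirement can be checked before any modeling. The snippet below is a minimal sketch on a made-up daily series (the dates and values are assumptions for illustration): it looks at the gaps between consecutive timestamps and, as one possible remedy, places the series on a regular daily grid and interpolates the missing day. Interpolation is only one option; whether it is appropriate depends on the data.

```python
import pandas as pd

# Hypothetical daily series with one missing day (2023-01-04)
dates = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-05'])
ts = pd.Series([10.0, 12.0, 11.0, 13.0], index=dates)

# Gaps between consecutive timestamps; an evenly spaced index has one unique gap
gaps = ts.index.to_series().diff().dropna()
is_even = bool(gaps.nunique() == 1)
print(f"Evenly spaced? {is_even}")

# One possible fix (an assumption, not the only choice): put the series on a
# regular daily grid, then fill the new empty slot by linear interpolation
ts_even = ts.asfreq('D').interpolate()
print(ts_even)
```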

**Lags**
- Lags are the previous time points in a time series. Each step backward in time is one more lag.
- If the value at the target time is $Y_t$, the previous value is $Y_{t-1}$, followed by $Y_{t-2}$, etc.
- *To use linear regression on time series data, the data must be restructured so that each feature column holds the target's value at a previous lag.*
- **Dedicated time series models do this for us, but it is demonstrated manually here to explain the concepts.**
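
As a rough illustration of the restructuring described above, the sketch below builds lag columns with pandas `shift`. The series values, the column names, and the choice of two lags are assumptions made for the example. Each row ends up holding the current value alongside the values one and two steps earlier, which is the layout a linear regression needs.

```python
import pandas as pd

# Hypothetical evenly spaced series (values are illustrative only)
ts = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0, 16.0],
               index=pd.date_range('2023-01-01', periods=6, freq='D'))

# Each lag column holds the target's value k steps earlier
n_lags = 2
lagged = pd.DataFrame({'Y_t': ts})
for k in range(1, n_lags + 1):
    lagged[f'Y_t-{k}'] = ts.shift(k)

# The first n_lags rows have no complete history, so they are dropped
lagged = lagged.dropna()

X = lagged.drop(columns='Y_t')   # features: previous values
y = lagged['Y_t']                # target: current value
print(lagged)
```

Dropping the incomplete rows costs one observation per lag, so using more lags leaves fewer rows for training.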

**Imports**

In [1]:
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn import set_config
set_config(transform_output='pandas')
plt.rcParams['figure.figsize'] = (12, 4)
sns.set_context('talk', font_scale=0.9)

**Custom Functions**

In [5]:
def regression_metrics(y_true, y_pred, label='', verbose=True, output_dict=False):
    # Get metrics
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
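    # RMSE is the square root of MSE; computing it directly avoids the
    # `squared=False` argument, which newer scikit-learn releases no longer accept
    rmse = np.sqrt(mse)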
    r_squared = r2_score(y_true, y_pred)
    if verbose:
        # Print Result with Label and Header
        header = "-"*60
        print(header, f"Regression Metrics: {label}", header, sep='\n')
        print(f"- MAE = {mae:,.3f}")
        print(f"- MSE = {mse:,.3f}")
        print(f"- RMSE = {rmse:,.3f}")
        print(f"- R^2 = {r_squared:,.3f}")
    if output_dict:
        metrics = {'Label':label, 'MAE':mae,
                   'MSE':mse, 'RMSE':rmse, 'R^2':r_squared}
        return metrics
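
To make the four metrics reported by `regression_metrics` concrete, here is a small worked example on toy numbers (the values are invented so the arithmetic can be followed by hand). RMSE is taken as the square root of MSE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy values chosen so the arithmetic is easy to follow (not real data)
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])   # absolute errors: 1, 0, 2

mae = mean_absolute_error(y_true, y_pred)   # (1 + 0 + 2) / 3
mse = mean_squared_error(y_true, y_pred)    # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)                         # back in the units of the target
r2 = r2_score(y_true, y_pred)               # 1 - 5/8 (residual SS / total SS)

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
# prints: MAE=1.000  MSE=1.667  RMSE=1.291  R^2=0.375
```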

In [6]:
def evaluate_regression(reg, X_train, y_train, X_test, y_test, verbose=True,
                        output_frame=False):
    # Get predictions for training data
    y_train_pred = reg.predict(X_train)

    # Call the helper function to obtain regression metrics for training data
    results_train = regression_metrics(y_train, y_train_pred, verbose=verbose,
                                       output_dict=output_frame,
                                       label='Training Data')
    # Blank line between the two printed reports (only when printing)
    if verbose:
        print()
    # Get predictions for test data
    y_test_pred = reg.predict(X_test)
    # Call the helper function to obtain regression metrics for test data
    results_test = regression_metrics(y_test, y_test_pred, verbose=verbose,
                                      output_dict=output_frame,
                                      label='Test Data')

    # Store results in a dataframe if output_frame is True
    if output_frame:
        results_df = pd.DataFrame([results_train, results_test])
        # Set the label as the index
        results_df = results_df.set_index('Label')
        # Set index.name to None to get a cleaner looking result
        results_df.index.name = None
        # Return the dataframe
        return results_df.round(3)
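
`evaluate_regression` expects train and test sets that already exist. For time series these should be split chronologically rather than shuffled, so the model is tested only on observations that come after its training data. The sketch below shows one way to do that on a made-up series; the synthetic data, the two lags, and the 80/20 split point are all assumptions for illustration. To stay self-contained it reports only MAE instead of calling the helper.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Hypothetical series: linear trend plus small noise (illustrative only)
rng = np.random.default_rng(42)
idx = pd.date_range('2023-01-01', periods=60, freq='D')
ts = pd.Series(0.5 * np.arange(60) + rng.normal(0, 1, 60), index=idx)

# Lag features: the two previous values predict the current one
df = pd.DataFrame({'y': ts, 'lag_1': ts.shift(1), 'lag_2': ts.shift(2)}).dropna()
X, y = df[['lag_1', 'lag_2']], df['y']

# Chronological split: the earliest 80% trains, the latest 20% tests (no shuffling)
split = int(len(df) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

reg = LinearRegression().fit(X_train, y_train)

# Train-vs-test comparison in the same layout evaluate_regression returns (MAE only)
results = pd.DataFrame(
    {'MAE': [mean_absolute_error(y_train, reg.predict(X_train)),
             mean_absolute_error(y_test, reg.predict(X_test))]},
    index=['Training Data', 'Test Data']).round(3)
print(results)
```

With the helper functions above defined, the last step could instead be `evaluate_regression(reg, X_train, y_train, X_test, y_test, output_frame=True)`, which returns all four metrics for both sets.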