# TIME SERIES REGRESSION MODELS 

## Selecting predictors 

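Practices that are **not recommended** when selecting predictors:

1. Plot the forecast variable against a particular predictor and drop the predictor if there is no noticeable relationship.
2. Fit a multiple regression on all the predictors and drop those with large p-values (for example, greater than 0.05).

Neither approach accounts for the effects of the predictors on each other: a relationship may not be visible in a scatterplot until the other predictors are accounted for, and p-values can be misleading when predictors are correlated with one another.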
    
Instead, we will use a measure of predictive accuracy. 
    
## Predictive Accuracy Measures
Five measures of predictive accuracy are covered here:

1. Adjusted $R^2$
2. Cross-validation (CV)
3. Akaike's Information Criterion (AIC)
4. Corrected Akaike's Information Criterion ($AIC_c$)
5. Schwarz's Bayesian Information Criterion (BIC)


### Adjusted $R^2$

$R^2$: coefficient of determination. 
Rule: The higher $R^2$, the better. 

Adjusted $R^2$ addresses the following weaknesses of $R^2$ as a criterion for selecting predictors:

1. $R^2$ tells us how well the regression model fits the historical data, _not_ how well it will predict future data.
2. $R^2$ does not account for degrees of freedom: adding _any_ variable tends to increase the value of $R^2$, even if that variable is irrelevant.

The adjusted-$R^2$ is calculated by: 

$$\text{Adjusted-}R^2 = 1-(1-R^2)\frac{T-1}{T-k-1}$$

where: 

$T$: the number of observations 

$k$: the number of predictors

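Rule: the higher the adjusted $R^2$, the better. Unlike $R^2$, it no longer increases automatically with every added predictor. Selecting predictors by maximizing adjusted $R^2$ works reasonably well, but it still tends to err on the side of including too many predictors.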
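The effect of the degrees-of-freedom adjustment can be seen in a minimal sketch. The function name `adjusted_r2` and the numbers below are hypothetical, chosen only to illustrate a case where an extra predictor raises $R^2$ slightly but lowers the adjusted value.

```python
def adjusted_r2(r2: float, T: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to T observations."""
    if T - k - 1 <= 0:
        raise ValueError("need more observations than parameters (T > k + 1)")
    return 1.0 - (1.0 - r2) * (T - 1) / (T - k - 1)


# Hypothetical values: a fifth predictor nudges R^2 from 0.750 up to 0.752.
without_extra = adjusted_r2(0.750, T=50, k=4)
with_extra = adjusted_r2(0.752, T=50, k=5)

print(f"k=4: adjusted R^2 = {without_extra:.4f}")
print(f"k=5: adjusted R^2 = {with_extra:.4f}")
```

In this made-up case the small gain in $R^2$ does not offset the lost degree of freedom, so the adjusted value falls and the smaller model would be preferred.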

### Cross-validation: Leave-one-out cross-validation

Procedure:
1. Remove observation $t$ (e.g. $t=1$) from the dataset, and fit the model using the remaining data.
2. Compute the error for the omitted observation: $e_t^* = y_t - \hat{y}_t$. This differs from the ordinary residual because observation $t$ was not used to estimate $\hat{y}_t$.
3. Repeat the above steps for each of the other observations $t=2, \dots, T$.
4. Compute the MSE (mean squared error) from $e_1^*, \dots, e_T^*$:

$$ CV = MSE = \frac{1}{T} \sum_{t=1}^T (e_t^*)^2 $$

Rule: the smaller CV, the better the model. 
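The procedure above can be sketched in plain Python for the simplest case of one predictor. This is an illustrative sketch, not a general implementation: the helper names `fit_line` and `loocv_mse` and the data are hypothetical, and a real analysis would normally rely on a regression library.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def loocv_mse(x, y):
    """Leave-one-out CV: refit without observation t, then predict it."""
    T = len(x)
    squared_errors = []
    for t in range(T):
        x_train = x[:t] + x[t + 1:]          # step 1: drop observation t
        y_train = y[:t] + y[t + 1:]
        a, b = fit_line(x_train, y_train)
        e_star = y[t] - (a + b * x[t])       # step 2: out-of-sample error
        squared_errors.append(e_star ** 2)   # step 3: repeat for every t
    return sum(squared_errors) / T           # step 4: mean squared error


# Hypothetical data: roughly linear with some noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
cv = loocv_mse(x, y)
print(f"CV = {cv:.4f}")
```

Refitting $T$ times is cheap here, but for larger models it is usually avoided; for linear regression the leave-one-out errors can be obtained from a single fit using the leverage values.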

### Akaike's Information Criterion (AIC)

$$ AIC = T \ln \left( \frac{SSE}{T} \right) + 2(k+2)$$ 

where: 

$T$: the number of observations 

$k$: the number of predictors 

$SSE$: the sum of squared errors, $SSE = \sum_{t=1}^T e_t^2$

$k+2$ refers to $k$ coefficients for the predictors, the intercept, and the variance of the residuals. 

Rule: the smaller AIC, the better the model for forecasting
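As a quick illustration of the formula, the sketch below computes AIC for two candidate models fitted to the same data. The function name `aic` and the SSE values are hypothetical; the point is only that a small drop in SSE may not justify an extra parameter.

```python
import math


def aic(sse: float, T: int, k: int) -> float:
    """AIC = T * ln(SSE / T) + 2 * (k + 2) for a regression with k predictors."""
    return T * math.log(sse / T) + 2 * (k + 2)


# Hypothetical fits on the same T = 50 observations.
aic_small = aic(sse=200.0, T=50, k=3)
aic_large = aic(sse=198.0, T=50, k=4)  # one more predictor, slightly lower SSE

print(f"k=3: AIC = {aic_small:.2f}")
print(f"k=4: AIC = {aic_large:.2f}")
```

AIC values are only comparable between models fitted to the same observations of the same forecast variable.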

### Corrected Akaike's Information Criterion ($AIC_c$)

For small values of $T$, the AIC tends to select too many predictors. The corrected version adds a bias-correction term that grows as $T$ gets small relative to $k$:

$$ AIC_c = AIC + \frac{2(k+2)(k+3)}{T-k-3}$$

Rule: same as AIC
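A short sketch shows how the size of the correction depends on $T$. The function names `aic` and `aicc` are hypothetical, and setting $SSE = T$ is just a convenience so that the log term is zero and only the penalties remain.

```python
import math


def aic(sse: float, T: int, k: int) -> float:
    """AIC = T * ln(SSE / T) + 2 * (k + 2)."""
    return T * math.log(sse / T) + 2 * (k + 2)


def aicc(sse: float, T: int, k: int) -> float:
    """AICc = AIC + 2 * (k + 2) * (k + 3) / (T - k - 3)."""
    if T - k - 3 <= 0:
        raise ValueError("AICc requires T > k + 3")
    return aic(sse, T, k) + 2 * (k + 2) * (k + 3) / (T - k - 3)


k = 2
for T in (10, 30, 100, 1000):
    # SSE = T makes ln(SSE / T) = 0, isolating the penalty terms.
    correction = aicc(float(T), T, k) - aic(float(T), T, k)
    print(f"T={T:5d}: correction = {correction:.3f}")
```

The correction is substantial for short series and shrinks toward zero as $T$ grows, so AIC and $AIC_c$ agree for long series.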

### Schwarz's Bayesian Information Criterion (BIC)

$$BIC = T\ln \left( \frac{SSE}{T} \right) + (k+2)\ln(T)$$

Rule: The smaller the BIC value, the better the model. 

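**NOTE:** AIC, $AIC_c$ and BIC all penalize the fit of the model (SSE) by the number of parameters, but BIC penalizes more heavily than AIC, so the model chosen by BIC is either the same as the one chosen by AIC or one with fewer terms. If the value of $T$ is large enough, AIC, $AIC_c$ and CV all lead to the same model.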
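The difference in penalties can be made concrete with a toy comparison. Everything below is a sketch under stated assumptions: the function names and the three candidate models (with their SSE values) are hypothetical, picked so that the two criteria disagree.

```python
import math


def aic(sse: float, T: int, k: int) -> float:
    """AIC = T * ln(SSE / T) + 2 * (k + 2)."""
    return T * math.log(sse / T) + 2 * (k + 2)


def bic(sse: float, T: int, k: int) -> float:
    """BIC = T * ln(SSE / T) + (k + 2) * ln(T)."""
    return T * math.log(sse / T) + (k + 2) * math.log(T)


# Hypothetical candidates fitted to the same T = 40 observations:
# model name -> (number of predictors k, sum of squared errors)
T = 40
candidates = {"A": (1, 120.0), "B": (2, 110.0), "C": (3, 108.0)}

best_aic = min(candidates, key=lambda m: aic(candidates[m][1], T, candidates[m][0]))
best_bic = min(candidates, key=lambda m: bic(candidates[m][1], T, candidates[m][0]))

for name, (k, sse) in candidates.items():
    print(f"{name}: k={k}  AIC={aic(sse, T, k):.2f}  BIC={bic(sse, T, k):.2f}")
print("AIC selects:", best_aic)
print("BIC selects:", best_bic)
```

With these made-up numbers AIC prefers model B while BIC prefers the smaller model A. Each parameter costs 2 under AIC and $\ln(T)$ under BIC, so BIC is the stricter criterion whenever $T \geq 8$.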