# Formulas

## SSE (Sum of Squared Errors)

$$SSE = \sum\limits_{i=1}^{n} (y_i - \hat{y_i})^2$$

$SSE$ is a measure of how much variation is left unexplained by the model, where:

- $y_i$: actual/observed value
- $\hat{y_i}$: predicted value

## SST (Total Sum of Squares)

$$SST = \sum\limits_{i=1}^{n} (y_i - \bar{y})^2$$

$SST$ represents the total amount of variation in the observed values (equivalently, the $SSE$ we would get if we used the mean of the data as our model), where:

- $y_i$: actual/observed value
- $\bar{y}$: mean of the actual/observed values

## $R^2$

$$R^2 = 1 - \frac{SSE}{SST}$$

$R^2$ is the proportion of variance that can be explained by the model, and can be used as an indicator of goodness of fit for simple linear regression (SLR) models.
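
The three quantities above can be computed directly from their definitions. The following is a minimal sketch in plain Python, assuming $SST$ is taken over the observed values $y_i$ about their mean; the function names and the sample data are made up for illustration.

```python
def sse(y, y_hat):
    """Sum of squared errors between observed and predicted values."""
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))


def sst(y):
    """Total sum of squares of the observed values about their mean."""
    y_bar = sum(y) / len(y)
    return sum((yi - y_bar) ** 2 for yi in y)


def r_squared(y, y_hat):
    """Proportion of variation explained: 1 - SSE / SST."""
    return 1 - sse(y, y_hat) / sst(y)


# Hypothetical observed values and model predictions.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(y, y_hat)  # approximately 0.98 for this data
```

Predictions close to the observed values give a small $SSE$ relative to $SST$, so $R^2$ lands near 1.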

## $R^2_a$ (Adjusted $R^2$)

$$R^2_a = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)}$$

$R^2_a$ is a better indicator of goodness of fit than the plain $R^2$ for multiple linear regression (MLR) models, because it penalizes features that do not reduce $SSE$ enough to justify their inclusion, where:

- $n$: number of observations
- $p$: number of features
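
As a quick illustration of the penalty, here is a small sketch that evaluates the formula above. The function name and the numbers are hypothetical.

```python
def adjusted_r_squared(sse, sst, n, p):
    """Adjusted R^2 = 1 - (SSE / (n - p - 1)) / (SST / (n - 1))."""
    return 1 - (sse / (n - p - 1)) / (sst / (n - 1))


# Hypothetical fit: 50 observations, 3 features.
sse_value, sst_value, n, p = 20.0, 100.0, 50, 3

plain_r2 = 1 - sse_value / sst_value                   # ordinary R^2
adj_r2 = adjusted_r_squared(sse_value, sst_value, n, p)  # smaller than plain_r2
```

With the same $SSE$ and $SST$, increasing $p$ lowers $R^2_a$, which is how features that do not pull their weight get penalized.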

## MSPE (Mean Squared Prediction Error)

$$MSPE = \frac{1}{n}\sum^n_{i=1} (y_i - \hat{y_i})^2$$

$MSPE$ quantifies the discrepancy between the predicted values and the observed values, and can help to evaluate the performance of a model.
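
A minimal sketch of the computation, assuming the predictions have already been produced by some model. The function name and the data are illustrative only.

```python
def mspe(y, y_hat):
    """Mean squared prediction error: average of squared residuals."""
    n = len(y)
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / n


# Hypothetical observed values and the model's predictions for them.
y_observed = [3.0, 5.0, 7.0]
y_predicted = [2.5, 5.5, 8.0]

error = mspe(y_observed, y_predicted)  # (0.25 + 0.25 + 1.0) / 3 = 0.5
```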

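## AIC (Akaike Information Criterion)

$$AIC = 2(p+1) - 2\log L(\hat{\beta})$$

where $\log L(\hat{\beta})$ is the maximized log-likelihood. For linear regression with normal errors this equals $n\log\left(\frac{SSE}{n}\right) + 2(p+1)$ up to an additive constant that is the same for every model fit to the same data.

$AIC$ estimates the relative amount of information lost by a given model. When comparing candidate models fit to the same data, the one with the lowest $AIC$ is preferred.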

## BIC (Bayesian Information Criterion)

$$BIC = (p+1)\log(n) - 2\log L(\hat{\beta})$$

$BIC$ is a similar estimate to $AIC$ but replaces the penalty factor $2$ with $\log(n)$, so it penalizes extra features more heavily whenever $n \geq 8$, where

- $\log L(\hat{\beta})$ is the maximized log-likelihood
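
To make the two criteria concrete, the sketch below evaluates them for a linear model. It assumes normally distributed errors, so that the maximized log-likelihood can be written in terms of $SSE$, and it counts $p+1$ parameters to match the formulas in these notes. The function names and numbers are hypothetical.

```python
import math


def gaussian_log_likelihood(sse, n):
    """Maximized log-likelihood of a normal-error linear model.

    Assumes the variance is estimated by SSE / n.
    """
    return -0.5 * n * (math.log(2 * math.pi) + math.log(sse / n) + 1)


def aic(sse, n, p):
    """AIC = 2(p + 1) - 2 log L."""
    return 2 * (p + 1) - 2 * gaussian_log_likelihood(sse, n)


def bic(sse, n, p):
    """BIC = (p + 1) log(n) - 2 log L."""
    return (p + 1) * math.log(n) - 2 * gaussian_log_likelihood(sse, n)


# Hypothetical comparison: a 2-feature model against a 5-feature model
# that lowers SSE only slightly.
n = 100
small = {"aic": aic(40.0, n, 2), "bic": bic(40.0, n, 2)}
large = {"aic": aic(39.5, n, 5), "bic": bic(39.5, n, 5)}
```

Lower values are better for both criteria. Because $\log(100) > 2$, the gap between the two models is wider under $BIC$ than under $AIC$ in this example.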

---

# Hypothesis Tests

## t-test

- $H_0$: $\beta_j = c$
- $H_A$: $\beta_j \neq c$

$c \in \mathbb{R}$

We'll assume $c=0$ for our purposes.

Used to determine whether a feature has an effect on the response, given the other features in the model. If the p-value is small enough, we reject the null hypothesis in favor of the alternative hypothesis, which means there is statistical evidence that the associated feature (variable) has an effect on the response variable.
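
As a rough sketch of where the test statistic comes from, the code below computes the t-statistic for the slope of a simple linear regression with $c = 0$. It assumes the usual least-squares estimates and the residual variance estimate $SSE/(n-2)$. The function name and data are made up, and turning the statistic into a p-value (by comparing it with a t-distribution on $n-2$ degrees of freedom) is left out because it needs a statistics library.

```python
import math


def slope_t_statistic(x, y):
    """Return (slope estimate, its standard error, t-statistic) for H0: beta_1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx              # least-squares slope
    b0 = y_bar - b1 * x_bar     # least-squares intercept

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)  # residual variance estimate
    se_b1 = math.sqrt(sigma2_hat / sxx)

    return b1, se_b1, b1 / se_b1


# Hypothetical data with a strong linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, se_b1, t_stat = slope_t_statistic(x, y)
```

A large $|t|$ relative to the t-distribution corresponds to a small p-value.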

## Chi-squared Test

- $H_0$: the model is not useful (the model with no predictors, null model, is as good as the model with predictors - i.e. the predictors do not improve the model fit)
- $H_A$: the model is useful (the model with predictors provides a better fit)

Compares the null deviance to the residual deviance in generalized linear models. Under the null hypothesis, the difference between the two deviances approximately follows a chi-squared distribution with degrees of freedom equal to the number of predictors. A small enough p-value is strong evidence against the null hypothesis. In other words, the predictors significantly improve the model fit.
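
The deviance comparison can be sketched as follows. To stay within the standard library, this example only handles an even number of degrees of freedom, where the chi-squared upper tail has a closed form; a statistics library would normally be used instead. The function names and deviance values are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with even df.

    Uses the identity P(X > x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports positive, even df")
    half = x / 2
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total


def deviance_test(null_deviance, residual_deviance, n_predictors):
    """Return (test statistic, p-value) for the null-vs-full model comparison."""
    statistic = null_deviance - residual_deviance
    return statistic, chi2_sf_even_df(statistic, n_predictors)


# Hypothetical deviances from a GLM with two predictors.
statistic, p_value = deviance_test(50.0, 40.0, 2)
```

Here the drop in deviance is 10 on 2 degrees of freedom, giving a p-value of $e^{-5}$, which is below common significance levels such as 0.05.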

## Partial F-test

Given a full model ($\Omega$) and a reduced model ($\omega$),

- $H_0$: $\beta_j = 0$ for all $j$ in $\Omega$ but not in $\omega$ (the reduced model is sufficient)
- $H_A$: $\beta_j \neq 0$ for at least one $j$ in $\Omega$ but not in $\omega$ (the reduced model is not sufficient)

Used in testing reduced models against full models. A small enough p-value suggests that the reduced model is not sufficient.
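
A minimal sketch of the test statistic, assuming the standard form that compares the increase in $SSE$ per dropped feature with the full model's residual variance. The function name, argument names, and numbers are hypothetical.

```python
def partial_f_statistic(sse_reduced, sse_full, n, p_full, p_reduced):
    """Return (F, numerator df, denominator df) for reduced vs. full model."""
    df_num = p_full - p_reduced  # number of features dropped
    df_den = n - p_full - 1      # residual df of the full model
    f = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f, df_num, df_den


# Hypothetical fits: the full model has 3 features, the reduced model has 1.
f_stat, df_num, df_den = partial_f_statistic(
    sse_reduced=120.0, sse_full=100.0, n=54, p_full=3, p_reduced=1
)
# f_stat is (20 / 2) / (100 / 50) = 5.0 on (2, 50) degrees of freedom
```

The p-value is the upper-tail probability of this statistic under an F-distribution with the returned degrees of freedom.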

## Full F-test

- $H_0$: $y_i = \beta_0 + \epsilon_i$ (i.e. $\beta_1 = \beta_2 = \dots = \beta_p = 0$)
- $H_A$: $\beta_k \neq 0$ for at least one $k \in \{1, \dots, p\}$

The null hypothesis essentially states that there is no useful linear relationship between the response and any of the predictors. A small enough p-value is strong evidence against the null hypothesis; in other words, the model is better than the most reduced model possible. Useful in multiple linear regression (MLR), where running many individual t-tests increases the chance that at least one of them rejects its null hypothesis by mistake (a type I error).
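
The statistic for this test can be written with the $SSE$ and $SST$ defined earlier. The sketch below assumes the standard form, with $p$ and $n-p-1$ degrees of freedom; the function name and the numbers are illustrative only.

```python
def full_f_statistic(sse, sst, n, p):
    """F = ((SST - SSE) / p) / (SSE / (n - p - 1))."""
    explained_per_feature = (sst - sse) / p
    residual_variance = sse / (n - p - 1)
    return explained_per_feature / residual_variance


# Hypothetical fit: 50 observations, 3 features.
f_stat = full_f_statistic(sse=20.0, sst=100.0, n=50, p=3)
# (80 / 3) / (20 / 46), roughly 61.3
```

This is the partial F-test with the intercept-only model as the reduced model, whose $SSE$ is exactly $SST$.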

---

# Models

## Generalized Linear Model (GLM)

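A GLM has three components:

- Random Component: the probability distribution of the response
- Systematic Component: the linear predictor $\eta = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$
- Link Function: the function that connects the mean of the response to the linear predictor

In our case, we'll be using a GLM for logistic regression, where the response is binary and the link function is the logit.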
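
For logistic regression, the pieces fit together roughly as in the sketch below: a linear predictor is computed from the features and then mapped to a probability through the inverse of the logit link. The coefficient values and function names are hypothetical, and no model fitting is shown.

```python
import math


def linear_predictor(beta, x):
    """Systematic component: beta_0 + beta_1 * x_1 + ... + beta_p * x_p."""
    return beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], x))


def logit(prob):
    """Link function: maps a probability in (0, 1) to the real line."""
    return math.log(prob / (1 - prob))


def inverse_logit(eta):
    """Inverse link: maps the linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))


# Hypothetical coefficients (intercept first) and one observation.
beta = [-1.0, 0.5, 2.0]
x = [2.0, 0.0]

eta = linear_predictor(beta, x)  # -1.0 + 0.5 * 2.0 + 2.0 * 0.0 = 0.0
prob = inverse_logit(eta)        # 0.5
```

The random component is not coded explicitly here: it is the assumption that the binary response is 1 with probability `prob`.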

## Multiple Linear Regression (MLR)

$y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \epsilon$

## Simple Linear Regression (SLR)

$y = \beta_0 + \beta_1 x_1 + \epsilon$
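
The coefficients of an SLR model are usually estimated by least squares, which has a closed form. A minimal sketch, with a hypothetical function name and made-up data that lie exactly on a line:

```python
def fit_slr(x, y):
    """Least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data generated from y = 1 + 2x with no noise.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

b0, b1 = fit_slr(x, y)                 # recovers intercept 1 and slope 2
y_hat = [b0 + b1 * xi for xi in x]     # fitted values
```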