# Linear Regression

This notebook covers the following topics:

- Introduction to Linear Regression
- Simple Linear Regression
- Generating Linear Data with Python
- Representing Data as Vectors
- Estimating the Least Squares Coefficients in the SLR Case
- Bias & Variance and Consistent Estimators
- Regression Model Accuracy Metrics
- Ordinary Least Squares Regression in the SLR Case
- OLS Assumptions
- Singular Value Decomposition & The Pseudoinverse
- Gradient Descent: An Iterative Approach


# Introduction to Linear Regression

Linear regression is a statistical method for modeling the relationship between one or more explanatory variables and a response variable by fitting a linear equation to observed data.

Mathematically, the linear model is an equation of the form:

$$y = \beta_0 + \beta_1x_1 + \beta_2x_2+\ldots+\varepsilon \tag{1}$$

Here, $y$ is the response variable, whose value is determined by the explanatory variables $x_i$ and their scalar coefficients $\beta_i$, together with an error term $\varepsilon$ that accounts for the variability in $y$ that cannot be explained by a linear relationship with the explanatory variables.

In a typical regression problem, we have many observed $(x_1, x_2, \ldots, x_n, y)$ points, but the true $\beta$ coefficients are unknown. Hence, we cannot use Equation (1) to compute the exact value of the response variable $y$. However, if we obtain good approximate coefficients $\hat\beta$, then we can also obtain a good approximation $\hat y$ of the true value of the response variable for any given set of observations $(x_1, x_2, \ldots, x_n)$:

$$\hat y = \hat\beta_0 + \hat\beta_1x_1 + \hat\beta_2x_2+\ldots \tag{2}$$
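To make the distinction between $y$ and $\hat y$ concrete, here is a minimal sketch using NumPy. The "true" coefficients, the approximate coefficients, the noise level, and the sample size are all made-up values chosen for illustration. The response is generated with a random error term, while the prediction is computed from the approximate coefficients alone.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical "true" coefficients (beta_0, beta_1, beta_2); unknown in practice
beta = np.array([2.0, 0.5, -1.0])

# 100 observations of two explanatory variables, plus a column of ones for beta_0
X = np.column_stack([np.ones(100), rng.uniform(0, 10, size=(100, 2))])

# Response: linear combination of the explanatory variables plus a random error term
eps = rng.normal(0.0, 1.0, size=100)
y = X @ beta + eps

# Hypothetical approximate coefficients, close to but not equal to the true ones
beta_hat = np.array([1.9, 0.52, -0.98])

# Prediction: uses only the approximate coefficients, with no error term
y_hat = X @ beta_hat

# Average absolute gap between observed and predicted responses
mean_abs_error = np.mean(np.abs(y - y_hat))
print(mean_abs_error)
```

Even with coefficients this close to the true ones, the gap between $y$ and $\hat y$ does not vanish, because the error term cannot be predicted from the explanatory variables.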

The goal of linear regression is to estimate the $\hat\beta$ coefficients that minimize the discrepancy between the observed values of the response variable, $y$, and the values predicted by our approximate linear model, $\hat y$. In least squares regression, this discrepancy is measured by the sum of the squared differences $(y - \hat y)^2$ over all observations. There are several approaches to calculating $\hat\beta$.

In this notebook, we will discuss the concepts behind several approaches to computing $\hat\beta$ for simple linear regression problems. These solutions generally fall into one of two categories:
- Closed-form, exact solutions
- Iterative, numerical solutions
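
The two categories above can be compared on a small synthetic problem. The following is a rough sketch, not a definitive implementation: the data, the true line $y = 1 + 3x$, the learning rate, and the iteration count are all assumptions picked for illustration. The closed-form route uses `np.linalg.lstsq`, and the iterative route is plain gradient descent on the mean squared error.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

# Synthetic data from a hypothetical true line y = 1 + 3x, plus noise
x = rng.uniform(0, 5, size=200)
y = 1.0 + 3.0 * x + rng.normal(0.0, 0.5, size=200)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Closed-form route: solve the least squares problem directly
beta_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

# Iterative route: gradient descent on the mean squared error
beta_gd = np.zeros(2)
learning_rate = 0.05  # assumed value, small enough to converge on this data
for _ in range(5000):
    gradient = (2.0 / len(y)) * X.T @ (X @ beta_gd - y)
    beta_gd -= learning_rate * gradient

print(beta_closed)
print(beta_gd)
```

On a well-conditioned problem like this one, both routes arrive at essentially the same coefficients. The iterative route needs a suitable learning rate and enough iterations, which is the price paid for avoiding a direct solve.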

Once a solution is obtained, the performance of a model can be evaluated with metrics such as the **Residual Sum of Squares** ($\text{RSS}$) and **R-Squared** ($\mathrm{R^2}$) values.
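
As a small illustration of these two metrics, the sketch below takes $\text{RSS}$ to be the sum of squared residuals, $\sum_i (y_i - \hat y_i)^2$, and computes $\mathrm{R^2}$ as one minus the ratio of $\text{RSS}$ to the total sum of squares around the mean of $y$. The helper names `rss` and `r_squared` and the four data points are made up for this example.

```python
import numpy as np

def rss(y, y_hat):
    """Sum of squared differences between observed and predicted values."""
    return float(np.sum((y - y_hat) ** 2))

def r_squared(y, y_hat):
    """1 - RSS / TSS, where TSS is the total sum of squares around the mean of y."""
    tss = float(np.sum((y - np.mean(y)) ** 2))
    return 1.0 - rss(y, y_hat) / tss

# Hypothetical observed and predicted responses
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])

print(rss(y, y_hat))        # approximately 0.1
print(r_squared(y, y_hat))  # approximately 0.98
```

A smaller $\text{RSS}$ means the predictions sit closer to the observations, and an $\mathrm{R^2}$ near 1 means the model explains most of the variation in $y$.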