# Linear Regression

Just as naive Bayesian models make good early baselines for classification, linear regression models make good early baselines for regression. Whereas a classification task aims to determine the class label of a sample, regression analysis aims to predict a continuous value for a sample.

### How does it work

The classical form of linear regression, often taught in introductory statistics classes, is a straight line fitted to the data:
$$y = mx + b \ ,$$
or, equivalently,
$$y = w_0 + w_1x \ ,$$
where $b$ (or $w_0$) is the y-axis intercept and $m$ (or $w_1$) is the slope, also called the weight coefficient. This form describes the relationship between $x$ (the explanatory variable) and $y$ (the target variable), so that once the model has been trained, $y$ can be predicted for an arbitrary new $x$.

In short, linear regression as described above fits a straight line to a dataset.

<img src="./assets/linear-regression.png" alt="fitting a line to a dataset" style="width: 75%;"/>
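As a minimal sketch of the simple case, the cell below fits a straight line to synthetic data. The data-generating line ($y = 2x - 5$ plus Gaussian noise) and the names `simple_model`, `x_new`, and `y_new` are assumptions chosen for illustration, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): y = 2x - 5 plus Gaussian noise
rng = np.random.RandomState(1)
x = 10 * rng.rand(50)
y = 2 * x - 5 + rng.randn(50)

# scikit-learn expects a 2D feature matrix of shape (n_samples, n_features)
simple_model = LinearRegression(fit_intercept=True)
simple_model.fit(x[:, np.newaxis], y)

w0 = simple_model.intercept_  # estimate of b, the y-axis intercept
w1 = simple_model.coef_[0]    # estimate of m, the slope

# Predict y for new x values using the fitted line
x_new = np.linspace(0, 10, 5)
y_new = simple_model.predict(x_new[:, np.newaxis])
```

Because of the noise, the estimates `w0` and `w1` should land near, but not exactly on, the values used to generate the data.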

Estimators that accept only one explanatory variable $x$ are called simple linear regressors, whereas the generalized versions of the model that accept more than one are called multiple linear regressors. A multiple linear regression model can be expressed as

$$ y = w_0x_0 + w_1x_1 + \dots + w_mx_m \ ,$$
where $x_0 = 1$, so that $w_0$ is the intercept. This can be shortened to a summation of products
$$ y = \sum^m_{i=0} w_ix_i \ ,$$
and finally written in vector notation as
$$ y = w^Tx \ .$$
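To make the equivalence of these forms concrete, here is a small sketch that evaluates the sum of products and the dot product for a single sample. The weight and feature values are hypothetical, and the sketch assumes the common convention $x_0 = 1$ so that $w_0$ acts as the intercept.

```python
import numpy as np

# Hypothetical weights; w[0] plays the role of the intercept w_0
w = np.array([0.5, 1.5, -2.0, 1.0])

# One hypothetical sample with three features
features = np.array([3.0, 1.0, 4.0])

# Prepend x_0 = 1 so the intercept is absorbed into the dot product
x = np.concatenate(([1.0], features))

# Term-by-term sum of products: w_0*x_0 + w_1*x_1 + ... + w_m*x_m
y_sum = sum(w_i * x_i for w_i, x_i in zip(w, x))

# Vectorized form w^T x
y_dot = w @ x

print(y_sum, y_dot)  # both evaluate to 0.5 + 4.5 - 2.0 + 4.0 = 7.0
```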

Now is a good time to mention that the 'linear' in linear regression does not mean the model is limited to straight lines in two dimensions. In fact, we can use the LinearRegression estimator to fit multidimensional linear models of the form $y = a_0 + a_1x_1 + a_2x_2 + ...$. Geometrically, this corresponds to fitting a plane to points in three dimensions, or a hyperplane to points in higher dimensions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

model = LinearRegression(fit_intercept=True)

# Create 100 samples with 3 features each, then build a target with
# 0.5 as the y intercept and coefficients of 1.5, -2, and 1.
rng = np.random.RandomState(1)
X = 10 * rng.rand(100, 3)
y = 0.5 + np.dot(X, [1.5, -2., 1.])

model.fit(X, y)
print(model.intercept_)  # estimated y intercept
print(model.coef_)       # estimated coefficients, one per feature
```

```
0.5000000000000144
[ 1.5 -2.   1. ]
```


The LinearRegression model recovered the y intercept and the three coefficients that were used to generate this dataset.
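As a cross-check, the same weights can be obtained by solving the ordinary least-squares problem directly with NumPy. This is a sketch rather than a description of scikit-learn's internals: the name `X_design` and the choice to prepend a column of ones (so that the first weight acts as the intercept) are assumptions made for illustration.

```python
import numpy as np

# Same synthetic data as in the cell above
rng = np.random.RandomState(1)
X = 10 * rng.rand(100, 3)
y = 0.5 + np.dot(X, [1.5, -2., 1.])

# Prepend a column of ones so the first weight acts as the intercept
X_design = np.hstack([np.ones((X.shape[0], 1)), X])

# Solve the least-squares problem: minimize ||X_design @ w - y||^2
w, residuals, rank, singular_values = np.linalg.lstsq(X_design, y, rcond=None)

print(w)  # first entry is the intercept, the rest are the coefficients
```

Since the target here contains no noise, the solution should match the generating values up to floating-point error.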

If, up to this point, the LinearRegression model still feels limited in capability, that's because we have only looked at strictly linear relationships. There is a trick for adapting this algorithm to nonlinear relationships between variables: using a **basis function** to transform the data.
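One possible sketch of this idea uses polynomial basis functions: each $x$ is expanded into the columns $[x, x^2]$, and an ordinary linear model is fitted on those columns. The quadratic data, the degree, and the names `poly_model` and `linear_step` are assumptions chosen for illustration; polynomial features are only one of many possible basis functions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic nonlinear data (assumed for illustration): y = 1 + 2x - 0.5x^2
rng = np.random.RandomState(1)
x = 10 * rng.rand(50)
y = 1 + 2 * x - 0.5 * x ** 2

# Expand x into [x, x^2], then fit a linear model on the expanded columns
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(fit_intercept=True),
)
poly_model.fit(x[:, np.newaxis], y)

# make_pipeline names each step after its lowercased class name
linear_step = poly_model.named_steps["linearregression"]
print(linear_step.intercept_)  # close to 1
print(linear_step.coef_)       # close to [2, -0.5]
```

The model is still linear in its weights; only the inputs have been transformed, which is why the same estimator can capture a curved relationship.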