## Linear Regression

$$
f(x) = \beta_0 + \sum^p_{j=1} x_j \beta_j
$$
We want to minimize the residual sum of squares (RSS), so we first write it out:
$$
\begin{aligned}
S(\beta) &= \sum^N_{i = 1}(y_i - f(x_i))^2 \\
        &= \sum^N_{i = 1} \big( y_i - \beta_0 - \sum^p_{j=1}x_{ij}\beta_j \big)^2
\end{aligned}
$$
* To absorb $\beta_0$ into the sum, we add a constant column of ones to $X$ (so that $x_{i0} = 1$ for every row). The formula becomes:
$$
\begin{aligned}
S(\beta) &= \sum^N_{i=1} \big(y_i - \sum^p_{\color{red}{j=0}}x_{ij}\beta_j \big) ^2 \\
&= (y - X\beta)^T (y - X\beta) \\
&= y^Ty - y^T X\beta - \beta^T X^T y + \beta^T X^T X \beta
\end{aligned}
$$
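The three forms of $S(\beta)$ above can be checked numerically. The following is a minimal sketch on made-up random data; the function names `rss_elementwise`, `rss_matrix`, and `rss_expanded` and the `*_demo` arrays are illustrative and not part of the notebook.

```python
import numpy as np

def rss_elementwise(X, y, beta):
    """RSS as the explicit sum over samples i and columns j (j = 0 is the ones column)."""
    total = 0.0
    for i in range(X.shape[0]):
        fitted = sum(X[i, j] * beta[j] for j in range(X.shape[1]))
        total += (y[i] - fitted) ** 2
    return total

def rss_matrix(X, y, beta):
    """RSS as (y - X beta)^T (y - X beta)."""
    r = y - X @ beta
    return r @ r

def rss_expanded(X, y, beta):
    """RSS as y^T y - y^T X beta - beta^T X^T y + beta^T X^T X beta."""
    return y @ y - y @ X @ beta - beta @ X.T @ y + beta @ X.T @ X @ beta

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
X_demo = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])  # ones column first
y_demo = rng.normal(size=6)
beta_demo = rng.normal(size=3)

values = [f(X_demo, y_demo, beta_demo)
          for f in (rss_elementwise, rss_matrix, rss_expanded)]
print(values)  # the three numbers should agree up to rounding
```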
* Setting the gradient with respect to $\beta$ to zero gives the estimate $\hat{\beta}$ (assuming $X^T X$ is invertible):
$$
\begin{aligned}
\frac{\partial S}{\partial \beta} &= \frac{\partial (y^Ty - y^T X\beta - \beta^T X^T y + \beta^T X^T X \beta)}{\partial \beta} \\
&= -2X^T y + 2X^T X \beta = 0 \\
\Rightarrow \quad X^T X \beta &= X^T y \\
\hat{\beta} &= (X^T X)^{-1} X^T y
\end{aligned}
$$
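As a sanity check on the derivation, the sketch below simulates data with made-up coefficients (`true_beta` is an assumption for illustration), computes $\hat{\beta}$ with the explicit inverse, and compares it with `np.linalg.solve` applied to the normal equations $X^T X \beta = X^T y$. It also evaluates the gradient $-2X^T y + 2X^T X \hat{\beta}$, which should vanish at the solution.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for illustration only
N, p = 50, 3
X_raw = rng.normal(size=(N, p))
X_design = np.column_stack([np.ones(N), X_raw])  # ones column for beta_0
true_beta = np.array([1.0, 2.0, -3.0, 0.5])      # made-up coefficients
y_sim = X_design @ true_beta + 0.1 * rng.normal(size=N)

XTX = X_design.T @ X_design
XTy = X_design.T @ y_sim

# Closed form from the derivation: (X^T X)^{-1} X^T y
beta_inv = np.linalg.inv(XTX) @ XTy

# Same system solved without forming the inverse explicitly
beta_solve = np.linalg.solve(XTX, XTy)

# Gradient of S at the estimate; should be numerically zero
gradient = -2 * XTy + 2 * XTX @ beta_solve

print(beta_inv)
print(beta_solve)
print(np.abs(gradient).max())
```

Solving the linear system directly is generally preferred over forming the inverse, since it avoids an extra matrix product and tends to be more accurate numerically.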
* To predict the target value for a new vector $x$, prepend a 1 to it, written $(1:x)$, and take the inner product with $\hat{\beta}$:
$$
\hat{y} = \hat{f}(x) = (1:x)^T \hat{\beta}
$$
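The prediction formula can be sketched as a small helper that prepends the 1 and works for a single vector or a batch of rows. The name `predict_with_intercept` and the coefficients in `beta_example` are hypothetical, chosen so the arithmetic is easy to follow.

```python
import numpy as np

def predict_with_intercept(x_new, beta_hat):
    """Compute (1 : x)^T beta_hat for one sample (1-D) or several samples (2-D)."""
    x_new = np.atleast_2d(np.asarray(x_new, dtype=float))  # shape (n_samples, p)
    ones = np.ones((x_new.shape[0], 1))                    # the leading 1 of (1 : x)
    return np.hstack([ones, x_new]) @ beta_hat

beta_example = np.array([1.0, 2.0, 3.0])  # made-up values: beta_0 = 1, beta_1 = 2, beta_2 = 3

single = predict_with_intercept([4.0, 5.0], beta_example)
batch = predict_with_intercept([[4.0, 5.0], [0.0, 0.0]], beta_example)

print(single)  # 1 + 2*4 + 3*5 = 24
print(batch)   # second row is all zeros, so its prediction is beta_0 = 1
```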

In [23]:
import numpy as np

In [169]:
X = np.array([[-1.75,  1.15,  0.98,  0.22, -0.19, -0.46, -0.58,  0.67, -0.53,-0.44],
                 [3.34, 2.75, 3.51, 1.93, 3.26, 3.44, 3.82, 2.9 , 4.03, 1.88]]).T
X

array([[-1.75,  3.34],
       [ 1.15,  2.75],
       [ 0.98,  3.51],
       [ 0.22,  1.93],
       [-0.19,  3.26],
       [-0.46,  3.44],
       [-0.58,  3.82],
       [ 0.67,  2.9 ],
       [-0.53,  4.03],
       [-0.44,  1.88]])

In [170]:
y = np.array([-19.56,  10.54,   5.53,   0.5 ,  -5.24,  -7.54,  -9.71,   5.26, -10.69,  -5.1])
y

array([-19.56,  10.54,   5.53,   0.5 ,  -5.24,  -7.54,  -9.71,   5.26,
       -10.69,  -5.1 ])

In [171]:
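class LinearRegression:
    def __init__(self):
        self.beta = None  # estimated coefficients; beta[0] is the intercept

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Prepend a column of ones (one per row of X) so beta[0] acts as the intercept
        X = np.insert(X, 0, 1.0, axis=1)
        XTX = X.T @ X  # `@` is matrix multiplication, equivalent to np.dot here
        inverse = np.linalg.inv(XTX)  # raises LinAlgError if X^T X is singular
        self.beta = inverse @ X.T @ y  # normal equations: (X^T X)^{-1} X^T y
        return self.beta

    def predict(self, X):
        # Works for a single sample (1-D) or a batch of samples (2-D)
        return self.beta[0] + np.dot(X, self.beta[1:])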

In [172]:
lr = LinearRegression()

In [173]:
lr.fit(X, y)  # same beta values as the professor's result

array([ 2.90978614,  9.91248941, -1.81105788])

In [174]:
lr.predict([-1.75, 3.34])

-20.48600365967664

In [175]:
def mse(y, y_pred):
    return np.mean((y - y_pred)**2) 

In [176]:
y[0]

-19.56

In [177]:
mse(y[0], lr.predict([-1.75, 3.34]))

0.857482777734534
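The cell above scores a single training point. A more informative figure is the MSE over all ten rows, compared with the MSE of always predicting the mean of $y$. The sketch below repeats the data from the earlier cells so it runs on its own, and fits with `np.linalg.lstsq` rather than the class above; the names `X_train`, `y_train`, `train_mse`, and `baseline_mse` are illustrative.

```python
import numpy as np

# Same data as in the cells above, repeated so this block is self-contained
X_train = np.array([[-1.75, 1.15, 0.98, 0.22, -0.19, -0.46, -0.58, 0.67, -0.53, -0.44],
                    [3.34, 2.75, 3.51, 1.93, 3.26, 3.44, 3.82, 2.9, 4.03, 1.88]]).T
y_train = np.array([-19.56, 10.54, 5.53, 0.5, -5.24, -7.54, -9.71, 5.26, -10.69, -5.1])

# Design matrix with the ones column, then a least-squares fit
X_design = np.column_stack([np.ones(len(X_train)), X_train])
beta_hat, *_ = np.linalg.lstsq(X_design, y_train, rcond=None)

y_fit = X_design @ beta_hat
train_mse = np.mean((y_train - y_fit) ** 2)

# Baseline: predict the mean of y for every row
baseline_mse = np.mean((y_train - y_train.mean()) ** 2)

print(beta_hat)
print(train_mse, baseline_mse)
```

Because the model includes an intercept, the training MSE cannot exceed the baseline; a large gap between the two suggests the features explain most of the variation in $y$.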

### Correlated columns

In [178]:
X2 = np.ones((10,2))

In [179]:
X2[:,1] = X2[:,1] * 2

In [180]:
X2

array([[1., 2.],
       [1., 2.],
       [1., 2.],
       [1., 2.],
       [1., 2.],
       [1., 2.],
       [1., 2.],
       [1., 2.],
       [1., 2.],
       [1., 2.]])

In [181]:
y = np.random.random(size=10)
y

array([0.98750119, 0.07096794, 0.50466576, 0.9940942 , 0.39816767,
       0.17969805, 0.45522842, 0.91509241, 0.9932778 , 0.32192881])

In [187]:
lr.fit(X2, y)

LinAlgError: Singular matrix

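* The fit fails because the columns are perfectly collinear: once the intercept column is added, every column of the design matrix is a multiple of the column of ones, so $X^T X$ is singular and cannot be inverted.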
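A singular $X^T X$ means the least-squares solution is not unique, not that none exists. One possible workaround, sketched below, is to check the rank of the design matrix and use `np.linalg.lstsq`, which returns the minimum-norm solution. The names `X_const` and `y_rand` and the seeded generator are assumptions standing in for `X2` and the random `y` above.

```python
import numpy as np

# Rebuild the constant-column example: one column of 1s and one column of 2s
X_const = np.ones((10, 2))
X_const[:, 1] *= 2

rng = np.random.default_rng(0)  # seeded stand-in for np.random.random(size=10)
y_rand = rng.random(10)

# Design matrix with the intercept column: all three columns are multiples of ones
X_design = np.column_stack([np.ones(10), X_const])
rank = np.linalg.matrix_rank(X_design)
print(rank, X_design.shape[1])  # rank is smaller than the number of columns

# lstsq handles rank-deficient systems and returns the minimum-norm coefficients
beta_min_norm, _, lstsq_rank, _ = np.linalg.lstsq(X_design, y_rand, rcond=None)
fitted = X_design @ beta_min_norm

print(beta_min_norm)
print(fitted[0], y_rand.mean())  # every fitted value equals the mean of y
```

Since the features carry no information beyond a constant, the best the model can do is predict the mean of $y$ for every row; the individual coefficients are not identifiable, only their combined effect is.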