# Linear Regression

First, I'm going to implement linear regression with a for loop, using the basic mathematical principles behind this linear model.

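The first step is to take the weighted sum of the features and add the bias (intercept) term: z = X·W + b, where X is the feature matrix (one row per sample), W is the weight vector and b is the bias. Equivalently, we can append a constant 1 to each feature vector and treat b as one more weight.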
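A tiny worked example of the weighted sum. The numbers (2 samples, 2 features) are made up purely for illustration:

```python
import numpy as np

# Hypothetical toy values: 2 samples (rows), 2 features (columns)
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
W = np.array([0.5, -1.0])  # one weight per feature
b = 1.0                    # bias / intercept

# Matrix-vector product gives one weighted sum per sample, then add the bias
z = X @ W + b
print(z)  # [-0.5 -1.5]
```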

Next, we compare the resulting prediction vector z with the actual labels using the mean squared error (MSE). MSE is the conventional loss for linear regression; it isn't perfect (it is sensitive to outliers, for example), but it works well when the relationship between the features and the target is roughly linear.
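The MSE is the average of the squared residuals, $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (z_i - y_i)^2$. A minimal NumPy version, with hypothetical predictions and targets chosen only to make the arithmetic easy to follow:

```python
import numpy as np

def mse(z, y):
    """Mean squared error between predictions z and targets y."""
    return np.mean((z - y) ** 2)

z = np.array([2.0, 3.0, 5.0])  # hypothetical predictions
y = np.array([1.0, 3.0, 7.0])  # hypothetical targets
print(mse(z, y))  # (1 + 0 + 4) / 3, roughly 1.6667
```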

Finally, we update the weights and bias with gradient descent: the gradient of the MSE with respect to W and b tells us in which direction, and by how much, to move each parameter, and we scale that step by the learning rate. If the learning rate is too large, the updates overshoot and fail to converge; if it is too small, convergence is very slow.
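For reference, differentiating the MSE with respect to the weights and the bias gives the gradients used in the update step. Here n is the number of samples and α stands for the learning rate (α is just notation introduced here; it is `learning_rate` in the code):

$$
\frac{\partial\, \text{MSE}}{\partial W} = -\frac{2}{n} X^\top (y - z), \qquad
\frac{\partial\, \text{MSE}}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - z_i)
$$

$$
W \leftarrow W - \alpha \frac{\partial\, \text{MSE}}{\partial W}, \qquad
b \leftarrow b - \alpha \frac{\partial\, \text{MSE}}{\partial b}
$$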



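```python
import numpy as np

# Placeholder: replace with your own array whose last column is the target.
# Here we build hypothetical synthetic data just so the cell runs.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 2))
targets = features @ np.array([3.0, -2.0]) + 0.5 + rng.normal(scale=0.1, size=100)
some_data = np.column_stack([features, targets])

X = some_data[:, :-1]  # features, shape (n_samples, n_features)
y = some_data[:, -1]   # targets, shape (n_samples,)

W = rng.normal(scale=0.01, size=X.shape[1])  # initialise W randomly
b = 1.0
learning_rate = 0.01
epochs = 1000

for epoch in range(epochs):
    # Compute predictions: matrix-vector product, one prediction per sample
    z = X @ W + b

    # Compute loss (printed every 100 epochs to keep the output short)
    if epoch % 100 == 0:
        print(f"MSE at epoch {epoch} = {np.mean((z - y) ** 2):.4f}")

    # Compute gradients of the MSE with respect to W and b
    dW = -(2 / len(y)) * X.T @ (y - z)
    db = -(2 / len(y)) * np.sum(y - z)

    # Update weights
    W = W - learning_rate * dW
    b = b - learning_rate * db
```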


# Pseudo-Inverse

Technically, we don't have to implement regression like this. ***If*** we are working with a small to medium-sized dataset, we can compute the optimal weights directly, in closed form, using the "normal equation" for linear regression or the pseudo-inverse.
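Where the normal equation comes from, assuming the bias has been folded into X as a leading column of ones (as the function at the end of this notebook does): set the MSE gradient with respect to W to zero and solve.

$$
\nabla_W\, \text{MSE} = -\frac{2}{n} X^\top (y - XW) = 0
\;\Longrightarrow\; X^\top X\, W = X^\top y
\;\Longrightarrow\; W^* = (X^\top X)^{-1} X^\top y
$$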



---

What the loop above actually does is approach the optimal weights iteratively, taking many small gradient steps over the whole dataset. Instead, we can solve for them in a single step: compute X.T.dot(X), a square (n_features × n_features) matrix of feature cross-products (closely related to the feature correlations), and invert it.

We then multiply that inverse by X.T.dot(y), which projects the target vector y onto the feature columns, yielding the weight vector that minimises the MSE: $W^* = (X^\top X)^{-1} X^\top y$.
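A small sketch of the normal equation on hypothetical, noise-free synthetic data (the seed, sizes and true coefficients are made up for illustration). It also shows `np.linalg.solve`, a swapped-in alternative that solves the same linear system without forming the inverse explicitly, and checks the projection interpretation: the residual y − XW is orthogonal to every column of X.

```python
import numpy as np

# Hypothetical synthetic data: y = 1 + 2*x1 - 3*x2 exactly (no noise)
rng = np.random.default_rng(42)
X_feat = rng.normal(size=(50, 2))
y = 1.0 + X_feat @ np.array([2.0, -3.0])

# Fold the bias into the design matrix as a leading column of ones
X = np.column_stack([np.ones(len(X_feat)), X_feat])

# Normal equation: W = (X^T X)^{-1} X^T y
W_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Alternative (not the method above): solve (X^T X) W = X^T y directly,
# which is usually more numerically stable than an explicit inverse
W_solve = np.linalg.solve(X.T @ X, X.T @ y)

print(W_inv)  # approximately [1, 2, -3]

# Residual is orthogonal to the columns of X: X^T (y - X W) is close to 0
print(X.T @ (y - X @ W_inv))
```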

Note: this method only works if X.T.dot(X) is invertible, i.e. if the columns of X are linearly independent.

```
np.linalg.inv((X.T.dot(X))).dot(X.T).dot(y)
```
The pseudo-inverse version below works even when X.T.dot(X) is not invertible. Neither method is suitable for large datasets, though: both cost roughly O(n·d² + d³) for n samples and d features, so the cost grows cubically with the number of features.

```
np.linalg.pinv(X).dot(y)
```
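To see the difference, here is a sketch with a hypothetical design matrix whose feature column is duplicated, so X.T.dot(X) is singular and has no inverse. The pseudo-inverse still returns a least-squares fit: among all weight vectors that fit equally well, it picks the one with the smallest norm, which here splits the weight on x evenly between the two identical columns.

```python
import numpy as np

# Hypothetical data: the same feature appears twice, so the columns of X
# are linearly dependent and X^T X is singular
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones(4), x, x])  # bias column, x, and x again
y = 1.0 + 2.0 * x

print(np.linalg.matrix_rank(X.T @ X))  # 2, not 3, so no inverse exists

# Minimum-norm least-squares solution via the Moore-Penrose pseudo-inverse
W = np.linalg.pinv(X) @ y
print(W)  # approximately [1, 1, 1]
```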




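```python
import numpy as np

def pseudo_inverse(X, y):
    # Prepend a column of ones so the first weight acts as the bias/intercept
    X_b = np.concatenate((np.ones((X.shape[0], 1)), X.reshape(X.shape[0], -1)), axis=1)
    # Least-squares solution via the Moore-Penrose pseudo-inverse
    return np.linalg.pinv(X_b).dot(y)

# Should match the gradient-descent result above (weights[0] ≈ b, weights[1:] ≈ W),
# assuming gradient descent converged and the data is roughly linear
weights = pseudo_inverse(X, y)
print(weights)
```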
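As a sanity check of the comment above, a self-contained sketch comparing the two approaches on hypothetical synthetic data. The coefficients, noise level, learning rate and iteration count are all made up; the learning rate is larger than the notebook's 0.01 only so that it converges quickly. Once gradient descent has converged, the two weight vectors should agree closely.

```python
import numpy as np

# Hypothetical synthetic data with a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=200)

# Closed form: prepend a column of ones so weights[0] is the bias
X_b = np.column_stack([np.ones(len(X)), X])
weights = np.linalg.pinv(X_b) @ y

# Batch gradient descent on the same MSE objective
W, b, lr = np.zeros(3), 0.0, 0.05
for _ in range(2000):
    z = X @ W + b
    W -= lr * (-(2 / len(y)) * X.T @ (y - z))
    b -= lr * (-(2 / len(y)) * np.sum(y - z))

print(weights)      # [bias, w1, w2, w3] from the pseudo-inverse
print(np.r_[b, W])  # same layout from gradient descent
```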