# Linear Regression
Linear regression is a statistical method that models the relationship between a dependent variable and one or more independent variables.

**One independent variable**

`y = mx + c`, where `y` could be the house price, `x` the house size, `m` the slope and `c` the intercept.
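As a quick sanity check of `y = mx + c`, NumPy's `np.polyfit` with degree 1 fits a straight line by least squares and returns the slope and the intercept. The house sizes and prices below are made-up numbers that lie exactly on `y = 2x + 1`, so the fit should recover `m = 2` and `c = 1`.

```python
import numpy as np

# Hypothetical house sizes (x) and prices (y) lying exactly on y = 2x + 1
x = np.array([50.0, 80.0, 100.0, 120.0])
y = 2.0 * x + 1.0

# Degree-1 least-squares fit; coefficients come back highest power first, i.e. [m, c]
m, c = np.polyfit(x, y, 1)
print(m, c)  # approximately 2.0 and 1.0
```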

**Multiple independent variables**

`y = w1x1 + w2x2 + ... + wnxn + b`, where `y` is the house price; `x1, x2, ...` are features such as house size, number of bedrooms and distance to the city center; `w1, w2, ...` are weights that determine the influence of each feature; and `b` is the bias, which adjusts the baseline prediction (shifting it up or down without changing the slope).
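With several features, the prediction for every house can be computed at once as a matrix-vector product plus the bias. This is a minimal sketch; the feature values, weights and bias are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical data: each row is a house, columns are
# [size in m^2, number of bedrooms, distance to city centre in km]
X = np.array([
    [120.0, 3.0, 5.0],
    [80.0, 2.0, 12.0],
    [200.0, 4.0, 2.5],
])

# Hypothetical weights (one per feature) and bias
w = np.array([2.0, 10.0, -3.0])
b = 50.0

# y_hat_i = w1*x_i1 + w2*x_i2 + w3*x_i3 + b, evaluated for every row in one step
y_hat = X @ w + b
print(y_hat)  # predictions 305, 194 and 482.5
```

The negative weight on distance means houses further from the centre get a lower predicted price, which is how the weights encode each feature's influence.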

## Cost Function (Mean Squared Error)
We minimise the MSE to find the best weights and bias.

$$
MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
$$

Where,

$N$ is the number of data points,

$y_i$ is the actual value we are trying to predict for data point $i$, i.e. the target variable,

$\hat{y}_i$ is the value predicted by the regression equation for data point $i$.

MSE measures how far the predictions are from actual values. A smaller MSE means a better fit.
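The MSE formula translates directly into NumPy. This sketch uses a hypothetical helper `mse` and made-up targets and predictions so the result can be checked by hand.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: the average of the squared residuals."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

# Hypothetical targets and predictions
y = [3.0, 5.0, 7.0]
y_hat = [2.5, 5.0, 8.0]

# Residuals are 0.5, 0.0 and -1.0, so MSE = (0.25 + 0 + 1) / 3
error = mse(y, y_hat)
print(error)  # about 0.4167
```

Squaring makes every residual positive and penalises the single miss of 1.0 four times as heavily as the miss of 0.5.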

### Side Note
New terms I learnt: loss function, cost function, and objective function.
* **Loss function** is defined on a single data point (its prediction and label) and measures the penalty for that point, e.g. squared loss, hinge loss, 0/1 loss.

* **Cost function** is more general: it is the sum (or average) of the losses over the training set, possibly plus a model-complexity penalty (regularization), e.g. MSE, the SVM cost function.

* **Objective function** is the most general term for any function that is optimized during training.
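To make the loss/cost distinction concrete, the sketch below computes a per-point squared loss and then a cost that averages those losses and optionally adds a regularization term. The L2 penalty (as used in ridge regression) is just one common choice of complexity penalty; the helper names, the weights and the strength `lam` are hypothetical.

```python
import numpy as np

def squared_loss(y_i, y_hat_i):
    """Loss: the penalty for a single data point."""
    return (y_i - y_hat_i) ** 2

def cost(y, y_hat, w, lam=0.0):
    """Cost: mean of the per-point losses plus an optional L2 penalty on the weights."""
    losses = [squared_loss(a, p) for a, p in zip(y, y_hat)]
    return np.mean(losses) + lam * np.sum(np.asarray(w) ** 2)

y = [3.0, 5.0, 7.0]
y_hat = [2.5, 5.0, 8.0]
w = [1.0, -2.0]  # hypothetical weights

plain = cost(y, y_hat, w)           # lam=0: just the MSE, 1.25 / 3
ridge = cost(y, y_hat, w, lam=0.1)  # adds 0.1 * (1^2 + (-2)^2) = 0.5
print(plain, ridge)
```

Minimising `plain` or `ridge` would each be an objective function; the penalty in `ridge` pushes the optimiser towards smaller weights.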


## Gradient Descent Optimisation (Minimising MSE)
This is an iterative algorithm that repeatedly updates each weight $w_j$ to minimise the cost:

$$
w_j = w_j - {\alpha} \frac{\partial MSE}{\partial w_j}
$$

and for the bias,

$$
b = b - {\alpha} \frac{\partial MSE}{\partial b}
$$

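where $\alpha$ is the learning rate (step size) and the partial derivative is the gradient of the cost function with respect to that parameter.

Gradient descent moves downhill on the cost function towards the optimal weights and bias (a minimum). For linear regression the MSE is convex in the weights and bias, so any local minimum is also a global minimum.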
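The update rule can be seen in isolation on a toy one-parameter cost. This sketch assumes the hypothetical cost $f(w) = (w - 3)^2$, whose gradient is $2(w - 3)$ and whose minimum is at $w = 3$; it is not the MSE itself, just the same "step against the gradient" idea.

```python
def gradient_descent_1d(grad, w0, alpha, steps):
    """Repeatedly apply the update w <- w - alpha * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

def grad(w):
    # Gradient of the toy convex cost f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w_good = gradient_descent_1d(grad, w0=0.0, alpha=0.1, steps=100)  # moves steadily towards 3
w_bad = gradient_descent_1d(grad, w0=0.0, alpha=1.1, steps=100)   # overshoots further each step
print(w_good, w_bad)
```

With `alpha=0.1` each step shrinks the distance to the minimum by a factor of 0.8; with `alpha=1.1` the factor is -1.2, so the iterates oscillate and diverge. This is why the learning rate has to be chosen with care.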

## Calculating the Gradients (Partial Derivatives)

For the bias,

$$
\frac{\partial MSE}{\partial b} = \frac{-2}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)
$$

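For each weight $w_j$,

$$
\frac{\partial MSE}{\partial w_j} = \frac{-2}{N} \sum_{i=1}^{N} x_{ij} (y_i - \hat{y}_i)
$$

Here $\hat{y}_i = \mathbf{x}_i \cdot \mathbf{w} + b = \sum_{j} w_j x_{ij} + b$ is the prediction for data point $i$, and $x_{ij}$ is the value of feature $j$ for that point.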
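The two gradient formulas can be written in vectorised form (stacking all $x_{ij}$ into a matrix `X`, so the sum over $i$ becomes `X.T @ error`) and checked numerically against central finite differences of the MSE. This is a sketch with hypothetical helper names and random data; if the analytic formulas were wrong, the comparison would fail.

```python
import numpy as np

def mse(X, y, w, b):
    y_hat = X @ w + b
    return np.mean((y - y_hat) ** 2)

def gradients(X, y, w, b):
    """Analytic gradients of the MSE with respect to w and b."""
    n = len(y)
    error = y - (X @ w + b)
    dw = -2.0 / n * (X.T @ error)  # entry j is -2/N * sum_i x_ij * error_i
    db = -2.0 / n * np.sum(error)
    return dw, db

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
w = rng.normal(size=3)
b = 0.5

dw, db = gradients(X, y, w, b)

# Central finite differences: nudge one parameter at a time by +/- eps
eps = 1e-6
num_dw = np.array([
    (mse(X, y, w + eps * e, b) - mse(X, y, w - eps * e, b)) / (2 * eps)
    for e in np.eye(3)
])
num_db = (mse(X, y, w, b + eps) - mse(X, y, w, b - eps)) / (2 * eps)

print(np.allclose(dw, num_dw, atol=1e-5), np.isclose(db, num_db, atol=1e-5))
```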


## Implementation

Possible methods include `cost_function()`, `compute_grad()` (the partial derivatives), `grad_desc()` (the training loop that fits the weights and bias) and `predict()` (for making predictions with the fitted parameters).

import numpy as np
from sklearn import datasets

# Generate a synthetic regression dataset: 300 samples, 1 feature, Gaussian noise
X, y = datasets.make_regression(n_samples=300, n_features=1, noise=20, random_state=2)

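class LinearRegression:
    def __init__(self, learning_rate=0.01, epochs=100, batch_size=10):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.batch_size = batch_size
        self.W = None  # weight vector, initialised in grad_desc()
        self.b = 0.0   # bias

    def cost_function(self, y, y_pred):
        return np.mean((y - y_pred) ** 2)  # MSE: averages the squared errors over all N points

    def compute_grad(self, X, y, y_pred):
        error = y - y_pred

        db = -2 * np.mean(error)  # np.mean divides by N, the number of samples
        dW = -2 * X.T.dot(error) / len(X)  # X.T (n_features, N) . error (N,) -> one gradient per weight

        return db, dW

    def grad_desc(self, X, y):
        # Mini-batch gradient descent: shuffle every epoch, then update once per batch
        n_samples, n_features = X.shape
        self.W = np.zeros(n_features)
        self.b = 0.0
        for _ in range(self.epochs):
            indices = np.random.permutation(n_samples)
            for start in range(0, n_samples, self.batch_size):
                batch = indices[start:start + self.batch_size]
                db, dW = self.compute_grad(X[batch], y[batch], self.predict(X[batch]))
                self.W -= self.learning_rate * dW
                self.b -= self.learning_rate * db
        return self

    def predict(self, X):
        return X.dot(self.W) + self.b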


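To check that gradient descent on the MSE actually lands on the least-squares solution, the sketch below runs full-batch gradient descent (every sample in one batch, using the gradient formulas above) and compares the result with NumPy's closed-form `np.linalg.lstsq`. It is self-contained: `fit_gd` is a hypothetical helper, and the data is generated with NumPy rather than `make_regression` so the true weights and bias are known. The learning rate and epoch count are assumptions that happen to suit this standardised data.

```python
import numpy as np

def fit_gd(X, y, learning_rate=0.1, epochs=1000):
    """Full-batch gradient descent on the MSE (a sketch, not a tuned solver)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        error = y - (X @ w + b)
        dw = -2.0 / n_samples * (X.T @ error)
        db = -2.0 * np.mean(error)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b

# Synthetic data with known parameters (hypothetical values chosen for the demo)
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
true_w, true_b = np.array([3.0, -1.5]), 4.0
y = X @ true_w + true_b + rng.normal(scale=0.5, size=300)

w, b = fit_gd(X, y)

# Closed-form least squares for comparison: append a column of ones for the bias
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print("gradient descent:", w, b)
print("least squares:   ", coef[:2], coef[2])
```

Both estimates should agree closely and sit near the true values `[3.0, -1.5]` and `4.0`. They won't match them exactly because of the added noise. Because the MSE is convex, gradient descent with a small enough learning rate converges to the same minimum the closed-form solver finds.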