# In this notebook you will implement [linear regression](https://towardsdatascience.com/how-does-linear-regression-actually-work-3297021970dd)!
1. Fill in the function that computes the MSE cost using the formula below
2. Implement the `predict` function
3. Implement gradient descent based on the cost and predict functions
4. Implement an alternative solver, the normal equation, using the prediction function and the formula below
5. Put it all together in the `fit` method
6. Test your model and compare it to the one from sklearn

### How to predict
$pred = X W$, where:  
$X$ - input features (one row per example)  
$W$ - weights  
$pred$ - predicted value
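
As a small illustration of the formula above, the sketch below predicts several examples at once with a single matrix product. The numbers are made up for this example and are not part of the exercise data.

```python
import torch

# Toy data (made up for illustration): 3 examples, 2 features each
X = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

# One weight per feature, stored as a column vector
W = torch.tensor([[0.5],
                  [-1.0]])

# pred = X W gives one prediction per example, shape (3, 1)
pred = torch.mm(X, W)
print(pred)
```

Keeping $W$ as a column vector means the result has one row per example, which matches the shape of $Y$ after `view(-1, 1)`.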

### MSE cost formula
$cost = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^{2}$, where:  
$n$ - number of examples  
$Y_i$ - correct value of example $i$  
$\hat{Y}_i$ - predicted value of example $i$
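
The formula can be checked by hand on a tiny example. This is a standalone sketch on made-up values, not the body of the `cost` method; the variable names `Y_hat` and `n` are chosen here for illustration.

```python
import torch

# Made-up values for illustration
Y = torch.tensor([[3.0], [5.0], [7.0]])      # correct values
Y_hat = torch.tensor([[2.0], [5.0], [9.0]])  # predicted values

n = Y.size(0)  # number of examples

# Squared errors are 1, 0 and 4, so the mean is 5 / 3
cost = torch.sum((Y - Y_hat) ** 2) / n
print(cost.item())
```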

### [Gradient descent](https://builtin.com/data-science/gradient-descent)
$grad = \frac{2}{n}\sum_{i=1}^{n} ((\hat{Y}_i - Y_i) X_i)$  
$W = W - \alpha \cdot grad$, where:  
$\alpha$ - learning rate  
$grad$ - calculated gradient  
$W$ - weights  
$n$ - number of training examples  
$Y_i$ - correct value of example $i$  
$\hat{Y}_i$ - predicted value of example $i$  
$X_i$ - feature vector of example $i$
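
A single update step might look like the sketch below. It assumes the sum over examples is written in matrix form, $\frac{2}{n} X^T (\hat{Y} - Y)$, which is equivalent to the formula above. The data and the learning rate `alpha = 0.1` are made up for illustration.

```python
import torch

# Made-up data: the first column of ones plays the role of the bias feature
X = torch.tensor([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])
Y = torch.tensor([[2.0], [4.0], [6.0]])

W = torch.zeros(2, 1)  # start from zero weights
alpha = 0.1            # learning rate (illustrative value)
n = X.size(0)          # number of training examples

cost_before = torch.sum((Y - torch.mm(X, W)) ** 2) / n

# One gradient descent step
Y_hat = torch.mm(X, W)
grad = (2.0 / n) * torch.mm(X.t(), Y_hat - Y)
W = W - alpha * grad

cost_after = torch.sum((Y - torch.mm(X, W)) ** 2) / n
print(W)
print(cost_before.item(), cost_after.item())
```

On this data the cost drops after one step. With a learning rate that is too large the cost can grow instead, which is why the step size is a parameter of the model.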

### [Normal equation formula](https://medium.com/swlh/understanding-mathematics-behind-normal-equation-in-linear-regression-aa20dc5a0961)
$W = (X^T X)^{-1} (X^T Y)$, where:  
$W$ - weights  
$X$ - input features  
$Y$ - correct values
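
The sketch below applies the formula to noise-free toy data generated from $y = 1 + 2x$, so the recovered weights should be close to 1 and 2. The singularity threshold `1e-8` is an arbitrary choice for this example, not a value prescribed by the exercise.

```python
import torch

# Made-up data from y = 1 + 2x; the first column of ones models the bias
X = torch.tensor([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0]])
Y = torch.tensor([[1.0], [3.0], [5.0]])

XtX = torch.mm(X.t(), X)

# X^T X can only be inverted if its determinant is non-zero
if torch.abs(torch.det(XtX)) < 1e-8:  # illustrative threshold
    raise ValueError("X^T X is singular, the normal equation cannot be used")

W = torch.mm(torch.inverse(XtX), torch.mm(X.t(), Y))
print(W)
```

Unlike gradient descent, this gives the weights in one shot, at the price of inverting a matrix whose size grows with the number of features.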

### Useful [pytorch](https://pytorch.org/get-started) functions
`torch.mm(X, Y)` - matrix multiplication $X Y$  
`torch.sum(X)` - sum of all elements of $X$  
`torch.abs(x)` - absolute value of x, i.e. $|x|$  
`torch.det(X)` - determinant of $X$  
`torch.inverse(X)` - inverse of $X$, i.e. $X^{-1}$  
`torch.zeros(size)` - tensor of shape `size` filled with zeros  
`torch.ones(size)` - tensor of shape `size` filled with ones  
`torch.rand(size)` - tensor of shape `size` filled with random numbers drawn uniformly from $[0, 1)$  
`torch.cat(tensors, dim)` - concatenates the sequence of `tensors` along dimension `dim` (0 by default)  
`X.t()` - transpose of $X$ matrix  
`X.size()` - shape of $X$ tensor

In [None]:
# Import needed packages
import torch
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression

In [None]:
class myLinearRegression:
    def __init__(self,
            solver='gr_d',  # Optimization method. Gradient descent by default, otherwise normal equation
            normal_init=True,  # If True - weights will be initialized with random numbers, otherwise with zeros
            fit_intercept=True,  # If True - bias will be added
            learning_rate=1e-3,  # Step of the gradient descent
            tolerance=1e-3,  # If cost change is less than tolerance - we're done with learning
            n_steps=10000):  # Maximum number of gradient descent steps
        self.solver = solver
        self.fit_intercept = fit_intercept
        self.learning_rate = learning_rate
        self.n_steps = n_steps
        self.normal_init = normal_init
        self.tolerance = tolerance
        self.W = None  # Initialize weights as None
    
    def cost(self, X, Y):
        """
        Computes MSE cost. General formula is provided in the cell above
        X - predicted values
        Y - correct values
        """
        ############### Calculate the MSE score
        
        
        cost = ...
        ###############  ~ 2-4 lines
        return cost
    
    def predict(self, X):
        """
        Computes the prediction on X
        """
        # X must be a Tensor
        if not isinstance(X, torch.Tensor):
            X = torch.from_numpy(X).float()
            
        if self.fit_intercept:
            ############### Transform X into a matrix whose first column is all ones, followed by the columns of X
            X = ...
            ###############  ~ 1 line
            
        ############### Calculate the prediction
        prediction = ...
        ###############  ~ 1 line
        return prediction
    
    def intercept(self):
        """
        Return the model's intercept (bias)
        """
        if self.fit_intercept:
            return ...  # return the intercept
        else:
            return None
  
    def gradient_descent(self, X, Y):
        """
        Gradient descent solver
        X - input features
        Y - correct values
        """
        ############### Set m equal to the number of training examples
        m = ...
        ###############  ~ 1 line

        for i in range(self.n_steps):
            ############### Predict values, calculate cost, find the gradient and then use it to update weights
            
            
            self.W -= ...
            
            change = ...  # Absolute difference between old cost and new one
            if change < self.tolerance:  # Converged: the cost has stopped changing
                break
            ###############  ~ 6-9 lines

    def normal_equation(self, X, Y):
        """
        Normal equation solver
        Computes weights using the normal equation formula (cell above)
        Keep in mind that the determinant of X^T X must not be zero
        """
        ############### Calculate the weights
        
        
        self.W = ...
        ###############  ~ 1-5 lines


    def fit(self, X, Y):
        """
        Main method
        Initialize parameters here
        Then use one of the solvers you've implemented before
        """
        # Convert X, Y to Tensors
        X = torch.from_numpy(X).float()
        Y = torch.from_numpy(Y).float().view(-1, 1)
        
        if self.normal_init:
            ############### Initialize weights with random values
            self.W = ...
            ###############  ~ 1 line
        else:
            ############### Initialize weights with zeros
            self.W = ...
            ###############  ~ 1 line

        if self.solver == 'gr_d':
            ############### Call gradient descent solver
            pass
            ###############  ~ 1 line
        else:
            ############### Call normal equation solver
            pass
            ###############  ~ 1 line
    

## Let's test your model
If everything is done right:  
- You won't get any errors  
- MSE score of your model will be similar to the one from sklearn (about 1653)

In [None]:
X, Y = make_regression(n_samples=10000, n_features=5,
                       n_targets=1, bias=2.5, noise=40, random_state=42)
print(f'X shape: {X.shape}')
print(f'Y shape: {Y.shape}')

grad_model = myLinearRegression(solver='gr_d')
grad_model.fit(X, Y)
grad_preds = grad_model.predict(X)
print(f'Grad descent model MSE: {mean_squared_error(Y, grad_preds)}')
print(f'Intercept: {grad_model.intercept()}')


# sklearn model
sklearn_model = LinearRegression()
sklearn_model.fit(X, Y)
sklearn_preds = sklearn_model.predict(X)
print(f'Sklearn MSE: {mean_squared_error(Y, sklearn_preds)}')
print(f'Intercept: {sklearn_model.intercept_} \nCoefs: {sklearn_model.coef_}')