# Linear Regression Model from Scratch (No scikit-learn)

In this notebook, we build our own linear regression model from scratch. scikit-learn is used only to load the dataset for testing purposes.
<br>
<br>
Definition of linear regression: fit a straight line to a set of data points so that it describes how the output y changes with the input x.<br>
Goal: Find the line (y = mx + b) that best fits the data points by minimizing the prediction error, measured here as the mean squared error (MSE).
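
To make "minimizing prediction errors" concrete, here is a minimal sketch that scores two candidate lines by their mean squared error. The helper name `mse` and the toy points are assumptions made up for illustration; they are not part of the diabetes dataset used later.

```python
import numpy as np

def mse(x, y, m, b):
    """Mean squared error of the line y = m*x + b on the points (x, y)."""
    predictions = m * x + b
    return np.mean((y - predictions) ** 2)

# Toy points that lie exactly on y = 2x + 1 (made up for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

good_fit = mse(x, y, m=2.0, b=1.0)  # the line the points were generated from
poor_fit = mse(x, y, m=0.0, b=0.0)  # a flat line at zero

print(good_fit, poor_fit)
```

The line that generated the points scores an error of zero, while the flat line scores much worse. Training amounts to searching for the `m` and `b` that push this number as low as possible.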


### *** Coding Checklist ***

□ Step 1: Load your data (x and y values) <br>
□ Step 2: Initialize random weights (m and b) <br>
□ Step 3: Create prediction function (y = mx + b) <br>
□ Step 4: Create cost function (MSE calculation) <br>
□ Step 5: Create gradient calculation <br>
□ Step 6: Create weight update function <br>
□ Step 7: Training loop (repeat steps 3-6) <br>
□ Step 8: Plotting to visualize results <br>
□ Step 9: Make predictions on new data <br>
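
As a rough preview, steps 1 to 7 and step 9 of the checklist can be sketched as plain gradient descent on a single feature. Everything here is an assumption for illustration: the synthetic data (generated from y = 3x + 2 plus noise), the function names, the learning rate, and the number of epochs. The class built later in the notebook may be organized differently, and plotting (step 8) is left out to keep the sketch short.

```python
import numpy as np

# Step 1: synthetic data from y = 3x + 2 plus noise (stand-in for real data).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.1, size=200)

# Step 2: random initial weights.
m, b = rng.normal(size=2)

# Step 3: prediction function.
def predict(x, m, b):
    return m * x + b

# Step 4: cost function (mean squared error).
def cost(y, y_pred):
    return np.mean((y - y_pred) ** 2)

# Step 5: gradients of the MSE with respect to m and b.
def gradients(x, y, y_pred):
    n = len(x)
    dm = (-2.0 / n) * np.sum(x * (y - y_pred))
    db = (-2.0 / n) * np.sum(y - y_pred)
    return dm, db

# Step 6: move the weights a small step against the gradient.
def update(m, b, dm, db, learning_rate):
    return m - learning_rate * dm, b - learning_rate * db

# Step 7: training loop (repeat steps 3-6).
learning_rate = 0.1  # assumed value, chosen for this toy data
history = []
for epoch in range(500):
    y_pred = predict(x, m, b)
    history.append(cost(y, y_pred))
    dm, db = gradients(x, y, y_pred)
    m, b = update(m, b, dm, db, learning_rate)

# Step 9: predict on a new input.
new_prediction = predict(np.array([0.5]), m, b)
print(f"m = {m:.2f}, b = {b:.2f}, final cost = {history[-1]:.4f}")
```

On this toy data the learned `m` and `b` should land close to the generating values of 3 and 2, and `history` should fall as training proceeds. Tracking the cost per epoch is what makes the plot in step 8 possible.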

***

### Step 1: Data Setup and Initialization

Import the necessary libraries, plus the dataset loader from scikit-learn for testing purposes.

In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

#### Get to know the dataset "diabetes": What will be X? What will be Y? How many features are there?

In [31]:
diabetes = load_diabetes()

In [32]:
diabetes

{'data': array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
          0.01990749, -0.01764613],
        [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
         -0.06833155, -0.09220405],
        [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
          0.00286131, -0.02593034],
        ...,
        [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
         -0.04688253,  0.01549073],
        [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
          0.04452873, -0.02593034],
        [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
         -0.00422151,  0.00306441]]),
 'target': array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
         69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
         68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
         87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
        259.,  53., 190., 142.,  75., 142., 155., 225.,  59

In [25]:
print(diabetes.DESCR)

.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

:Number of Instances: 442

:Number of Attributes: First 10 columns are numeric predictive values

:Target: Column 11 is a quantitative measure of disease progression one year after baseline

:Attribute Information:
    - age     age in years
    - sex
    - bmi     body mass index
    - bp      average blood pressure
    - s1      tc, total serum cholesterol
    - s2      ldl, low-density lipoproteins
    - s3      hdl, high-density lipoproteins
    - s4      tch, total cholesterol / HDL
    - s5      ltg, possibly log of serum triglycerides level
    - s6      glu, blood sugar level

Note: Each of these 10 feature variables have bee

In [30]:
print("Number of features:", len(diabetes.feature_names))
diabetes.feature_names

Number of features: 10


['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

In [34]:
X = diabetes.data
Y = diabetes.target

In [35]:
X.shape, Y.shape

((442, 10), (442,))
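
The shapes above show ten features per sample, so the single-slope line y = mx + b generalizes to one weight per feature plus an intercept. The sketch below only checks that the shapes line up. It uses zero-filled stand-in arrays rather than the real data, and the names `X_demo`, `Y_demo`, `w`, and `b` are assumptions for illustration.

```python
import numpy as np

n_samples, n_features = 442, 10  # shapes reported by X.shape and Y.shape

# Stand-in arrays with the same shapes as X and Y (zeros, not the real data).
X_demo = np.zeros((n_samples, n_features))
Y_demo = np.zeros(n_samples)

w = np.zeros(n_features)  # one weight per feature, replacing the single slope m
b = 0.0                   # intercept, as in y = mx + b

# Matrix-vector product: (442, 10) @ (10,) gives one prediction per sample.
Y_pred = X_demo @ w + b

print(Y_pred.shape, Y_demo.shape)
```

Because the predictions and the targets share a shape, the error `Y_demo - Y_pred` can be computed elementwise without reshaping.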

#### Create Linear Regression Class