# Linear Regression with gradient descent from scratch

<a href="https://colab.research.google.com/drive/1U8KVjx-XrVy-O8Ryx2YfjSkdsTwtuZQw" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">
</a>

Return to the [castle](https://github.com/Nkluge-correa/teeny-tiny_castle).

`Linear regression` is a commonly used statistical technique to model the relationship between a dependent variable (also known as the response variable) and one or more independent variables (also known as predictors or features).

In linear regression, we assume a linear relationship between the independent and dependent variables, and we try to find the best-fitting line that summarizes it. This line can then be used to predict the dependent variable from the values of the independent variables.

![image](http://cdn-images-1.medium.com/max/640/1*eeIvlwkMNG1wSmj3FR6M2g.gif)

[Source: Linear Regression](https://primo.ai/index.php?title=Linear_Regression).

In this notebook, we will implement a `Linear Regression` model. For this, let us first create an artificial dataset using a noisy linear distribution.

In [1]:
import plotly.graph_objects as go
import numpy as np

x = np.random.randn(400)

noise = np.random.normal(1, 20, 400)*0.05

y = x + noise

fig = go.Figure(data=go.Scatter(
    x=x, y=y, mode='markers', name='Mystery Function'))
fig.update_layout(template='plotly_dark',
                  title='Some data distribution following a linear trend',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.show()

A linear equation is one of the simplest models you can have. It is basically:

$$y = w \times x + b$$

_where:_

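- $y$ is the predicted value (the dependent variable).
- $x$ is the input feature (the independent variable).
- $w$ is the first (and in this case only) coefficient, i.e., the slope of the line.
- $b$ is our bias (the intercept on the $y$-axis).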
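
To make the roles of $w$ and $b$ concrete, here is a minimal sketch that evaluates the line on a few points. The helper name `line` and the toy values are illustrative assumptions, not part of the notebook's own code.

```python
import numpy as np

def line(x, w, b):
    """Evaluate y = w * x + b element-wise (hypothetical helper)."""
    return w * np.asarray(x, dtype=float) + b

xs = np.array([0.0, 1.0, 2.0])
ys = line(xs, w=2.0, b=1.0)   # slope 2, intercept 1
print(ys)  # [1. 3. 5.]
```

Changing `w` tilts the line, while changing `b` shifts it up or down without altering its slope.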

A simple way to find the "_best-fitting line_" is to minimize the _ordinary least squares (OLS) cost function_, i.e., the mean of the squared errors (halved, which simplifies the derivative):

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^2$$

_where_:

- $J(\theta)$ is the cost function.
- $\theta$ is the vector of model parameters.
- $m$ is the number of training examples.
- $x^{(i)}$ and $y^{(i)}$ are the features and labels of the $i$-th training example (respectively).
- $h_{\theta}(x^{(i)})$ is the predicted value of the label for the $i$-th example using the current model parameters $\theta$.  

In linear regression, we want to find the values of $\theta$ that minimize the cost function $J(\theta)$, which can be achieved using various optimization algorithms, such as `gradient descent`.
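
As a rough illustration of the formula above, the sketch below evaluates $J$ for a one-feature model $h_{\theta}(x) = w x + b$. The function name `cost` and the toy arrays are assumptions made for this example only.

```python
import numpy as np

def cost(x, y, w, b):
    """Compute J = 1/(2m) * sum((w*x + b - y)^2) for a single feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = (w * x + b) - y          # h(x_i) - y_i for every example
    return np.sum(residuals ** 2) / (2 * len(x))

x_toy = np.array([0.0, 1.0, 2.0])
y_toy = np.array([1.0, 3.0, 5.0])        # lies exactly on y = 2x + 1

perfect = cost(x_toy, y_toy, w=2.0, b=1.0)   # zero: the line passes through every point
poor = cost(x_toy, y_toy, w=0.0, b=0.0)      # (1 + 9 + 25) / 6
print(perfect, poor)
```

The closer the line is to the data, the smaller $J$ becomes, which is what any optimizer for this problem tries to exploit.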

Let us first try fitting a line without `gradient descent`, using the closed-form ordinary least squares solution.
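
For a single feature, the closed-form least-squares estimates are slope $= \sum_i (x^{(i)} - \bar{x})(y^{(i)} - \bar{y}) / \sum_i (x^{(i)} - \bar{x})^2$ and intercept $= \bar{y} - \text{slope} \cdot \bar{x}$. The following is a minimal sketch of those two formulas on its own synthetic data, cross-checked against `np.polyfit`. The names `ols_fit`, `slope`, `intercept`, and the demo data are assumptions for illustration.

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form least-squares slope and intercept for one feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Synthetic data (assumed for this example): y = 1.5x - 0.5 plus small noise
rng = np.random.default_rng(0)
x_demo = rng.normal(size=200)
y_demo = 1.5 * x_demo - 0.5 + rng.normal(scale=0.1, size=200)

slope, intercept = ols_fit(x_demo, y_demo)

# np.polyfit with deg=1 returns the coefficients as [slope, intercept]
ref_slope, ref_intercept = np.polyfit(x_demo, y_demo, deg=1)
print(slope, intercept)
print(ref_slope, ref_intercept)
```

Both approaches should agree to within floating-point error, since they minimize the same cost.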

In [3]:
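# Compute the means once instead of on every loop iteration
x_mean, y_mean = np.mean(x), np.mean(y)

numerator = np.sum((x - x_mean) * (y - y_mean))
denominator = np.sum((x - x_mean) ** 2)

# numerator / denominator is the least-squares slope; it is stored in `b`
b = numerator / denominator
# mean(y) - slope * mean(x) is the least-squares intercept; it is stored in `w`
w = y_mean - (b * x_mean)

# The line is evaluated as w * x + b, the convention used in the rest of the notebook
model_y = (w * x) + b

# Root mean square error of this line on the data
rmse = np.sqrt(np.mean((y - model_y) ** 2))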

fig = go.Figure(data=go.Scatter(
    x=x, y=y, mode='markers', name='Mystery Function'))
fig.update_layout(template='plotly_dark',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.add_trace(go.Scatter(x=x, y=model_y,
              name=f'model = {round(w, 4)} * x + {round(b, 4)}'))
fig.show()

print(f'model = {w} * x + {b}')
print(f'Root Mean Square Error: {rmse}.')


model = 0.11381782204063073 * x + 1.0751220152559853
Root Mean Square Error: 1.6592256759639759.


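Not so good, right? The reason is in the cell above: the ratio `numerator / denominator` is the least-squares slope and the value assigned to `w` is the intercept, yet the line is evaluated as `w * x + b`, so the two estimates end up in each other's place. Instead of swapping them back, let us treat these values as a rough starting point and implement a simple gradient descent algorithm to do better. 🙃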

Below, we implement:

- a `prediction` function.
- a function to calculate the loss (`mse`, the mean squared error).
- the `gradient_descent` function.
- a `fit` function that ties everything together.

For every epoch, we make a prediction using our current model. Then we compute the gradient of the cost function with respect to $w$ and $b$ with the `gradient_descent` function, and update both parameters by stepping against that gradient, scaled by our `learning_rate`. We repeat this process 50,000 times.
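
The gradients used in each step follow from differentiating the cost function defined earlier. As a short derivation, assuming the single-feature model $h_{\theta}(x) = w x + b$ and writing $\alpha$ for the `learning_rate` (a symbol introduced here for convenience):

$$\frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \left(w x^{(i)} + b - y^{(i)}\right) x^{(i)}$$

$$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left(w x^{(i)} + b - y^{(i)}\right)$$

$$w \leftarrow w - \alpha \frac{\partial J}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}$$

The factor $\frac{1}{2}$ in $J$ cancels the 2 produced by differentiating the square. These two partial derivatives are what the `gradient_descent` function below returns as `dw` and `db`.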

In [5]:
learning_rate = 0.001
epochs = 50_000
cost_list = []
weigths = []
bias = []
grads = []

def prediction(x, w, b):
    """
    Predict the output from the input feature x, weight w, and bias b.

    Parameters:
    -----------
    x: numpy array
        Input feature vector

    w: float or numpy array
        Weight (a single coefficient in this notebook)

    b: float
        Bias value

    Returns:
    --------
    y: float or numpy array
        Predicted values, one per input sample.
    """
    return np.dot(x, w) + b


def mse(y, y_pred):
    """
    Calculate the mean squared error between the actual and predicted values.

    Args:
        y (list or numpy array): Actual values.
        y_pred (list or numpy array): Predicted values.

    Returns:
        float: The mean squared error between the actual and predicted values.
    """
    actual = np.array(y)
    predicted = np.array(y_pred)
    differences = np.subtract(actual, predicted)
    squared_differences = np.square(differences)
    return squared_differences.mean()


def gradient_descent(x, y, y_pred):
    """
    Compute the gradients of the cost with respect to the weight and bias
    of a linear regression model. The gradients correspond to the cost
    J = MSE / 2 (the 1/(2m) form above). As a side effect, the current MSE
    is appended to the global `cost_list`.

    Parameters:
        x (numpy.ndarray): Input features of shape (n_samples,).
        y (numpy.ndarray): Target values of shape (n_samples,).
        y_pred (numpy.ndarray): Predicted target values of shape (n_samples,).

    Returns:
        dw (float): Gradient of the cost function with respect to the weight.
        db (float): Gradient of the cost function with respect to the bias.
    """
    error = y_pred - y
    cost = mse(y, y_pred)
    cost_list.append(cost)
    dw = (1 / len(x)) * np.dot(x.T, error)
    db = (1 / len(x)) * np.sum(error)
    return dw, db


def fit(x, y, w, b):
    """
    Train a linear regression model on the given data using gradient descent.
    The number of iterations and the step size are read from the global
    variables `epochs` and `learning_rate`; every update is also recorded in
    the global lists `grads`, `weigths`, and `bias`.

    Args:
        x (numpy.ndarray): Input features of shape (m,).
        y (numpy.ndarray): Output values of shape (m,).
        w (float): Initial weight.
        b (float): Initial bias.

    Returns:
        w1 (float): Learned weight.
        b1 (float): Learned bias.
    """
    w1 = w
    b1 = b
    for i in range(epochs):
        y_pred = prediction(x, w1, b1)
        dw, db = gradient_descent(x, y, y_pred)
        grads.append((dw, db))
        w1 -= learning_rate * dw
        b1 -= learning_rate * db
        weigths.append(w1)
        bias.append(b1)
    return w1, b1

w0, b0 = fit(x, y, w, b)
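
The training loop above can be condensed into a self-contained sketch without global state, which is convenient for checking that gradient descent reaches the least-squares solution. The function name `gradient_descent_fit`, the zero initialization, the larger learning rate, and the synthetic data are all assumptions of this example, not the notebook's settings.

```python
import numpy as np

def gradient_descent_fit(x, y, learning_rate=0.1, epochs=1_000):
    """Fit y ~ w * x + b by batch gradient descent on J = MSE / 2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w, b = 0.0, 0.0                              # assumed starting point
    for _ in range(epochs):
        error = (w * x + b) - y                  # prediction minus target
        w -= learning_rate * np.mean(error * x)  # step along -dJ/dw
        b -= learning_rate * np.mean(error)      # step along -dJ/db
    return w, b

# Synthetic data (assumed for this example): y = 1.5x - 0.5 plus small noise
rng = np.random.default_rng(0)
x_demo = rng.normal(size=200)
y_demo = 1.5 * x_demo - 0.5 + rng.normal(scale=0.1, size=200)

w_gd, b_gd = gradient_descent_fit(x_demo, y_demo)
w_ref, b_ref = np.polyfit(x_demo, y_demo, deg=1)  # closed-form reference
print(w_gd, b_gd)
print(w_ref, b_ref)
```

Because the cost is convex, the two pairs of values should agree closely once the loop has run long enough for the chosen learning rate.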

And it is done. Gradient descent has found a much better fit for our data distribution. Since we recorded all gradient updates, loss scores, and the evolution of our parameters ($w$ and $b$), we can plot them against the epoch number.

In [6]:
import pandas as pd

grads_df = pd.DataFrame(grads)
model_LR = np.dot(x, w0) + b0

rmse = 0
for i in range(len(x)):
    y_pred = (w0 * x[i]) + b0
    rmse += (y[i] - y_pred) ** 2

rmse = np.sqrt(rmse/len(x))
print(f'\nLR Model RMSE: {rmse}.')

fig = go.Figure(data=go.Scatter(
    x=list(range(1, epochs + 1)), y=cost_list, name='Cost/Loss'))
fig.update_layout(template='plotly_dark',
                  title=f'LR Model - Cost/Loss progression ({epochs} epochs)',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.show()



LR Model RMSE: 1.0079756681105123.


In [7]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=list(range(1, epochs + 1)),
              y=grads_df[0].abs(), name='weight_gradient'))
fig.add_trace(go.Scatter(x=list(range(1, epochs + 1)),
              y=grads_df[1].abs(), name='bias_gradient'))
fig.update_layout(template='plotly_dark',
                  title=f'LR model - Evolution ({epochs} epochs) of the Gradient for Weight and Bias',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.show()


In [8]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=list(range(1, epochs + 1)), y=weigths, name='weight'))
fig.add_trace(go.Scatter(x=list(range(1, epochs + 1)), y=bias, name='bias'))
fig.update_layout(template='plotly_dark',
                  title=f'LR model - Evolution ({epochs} epochs) of the Weight and Bias',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.show()

In [9]:
fig = go.Figure(data=go.Scatter(
    x=x, y=y, mode='markers', name='Mystery Function'))
fig.update_layout(template='plotly_dark',
                  title='Linear Regression Model via Gradient Descent',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.add_trace(go.Scatter(x=x, y=model_LR,
              name=f'model = {round(w0, 4)} * x + {round(b0, 4)}'))
fig.show()

Now that is a good-fitting line! 🙃


