# Gradient Descent

[Mathematics Behind Gradient Descent](https://medium.com/geekculture/mathematics-behind-gradient-descent-f2a49a0b714f)

[Machine Learning: Gradient Descent and Cost Function](https://www.youtube.com/watch?v=vsWrXfO3wWw)

## Setting Up Necessary Things

In [1]:
# Ignore All Warnings
import warnings
warnings.filterwarnings("ignore")

## Necessary Imports

In [2]:
# For numerical computation
import numpy as np

## Equation and Data Generation

In [3]:
# Seed the random generator for reproducibility
np.random.seed(42)

# 25 random integer x values in [1, 100)
x = np.random.randint(1, 100, 25)

`Equation:` $y = 2x + 3$

In [4]:
# Equation and y values
y = 2 * x + 3

In [5]:
# See x and y values
print("x: ", list(x))
print("y: ", list(y))

x:  [52, 93, 15, 72, 61, 21, 83, 87, 75, 75, 88, 24, 3, 22, 53, 2, 88, 30, 38, 2, 64, 60, 21, 33, 76]
y:  [107, 189, 33, 147, 125, 45, 169, 177, 153, 153, 179, 51, 9, 47, 109, 7, 179, 63, 79, 7, 131, 123, 45, 69, 155]


## Cost Function

Mean Squared Error (MSE), where $\hat{y}_i = mx_i + b$ is the prediction for $x_i$:
$$ MSE = \frac{1}{n} \sum_{i = 1}^{n} (y_i - \hat{y}_i)^2 $$
$$ MSE = \frac{1}{n} \sum_{i = 1}^{n} (y_i - (mx_i + b))^2 $$
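
The formula can be sketched directly in NumPy. This is a minimal illustration, not part of the original notebook: the helper name `mse` is hypothetical, and the `x` values are copied from the output printed earlier.

```python
import numpy as np

def mse(x, y, m, b):
    """Mean squared error of the line y = m * x + b on the data (x, y)."""
    residuals = y - (m * x + b)
    return np.mean(residuals ** 2)

# x values copied from the notebook output; y follows y = 2x + 3
x = np.array([52, 93, 15, 72, 61, 21, 83, 87, 75, 75, 88, 24, 3,
              22, 53, 2, 88, 30, 38, 2, 64, 60, 21, 33, 76])
y = 2 * x + 3

cost_at_start = mse(x, y, m=0, b=0)  # parameters before any update
cost_at_truth = mse(x, y, m=2, b=3)  # the generating line fits exactly

print("Cost at m=0, b=0:", cost_at_start)
print("Cost at m=2, b=3:", cost_at_truth)
```

The cost at `m=0, b=0` should agree with the cost reported for the first iteration of the gradient descent run in this notebook (13939.56), since that cost is computed before the first update.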

Partial Derivatives of MSE:
$$ \frac{\partial MSE}{\partial m} = \frac{2}{n} \sum_{i = 1}^{n} - x_i (y_i - (mx_i + b)) $$
$$ \frac{\partial MSE}{\partial b} = \frac{2}{n} \sum_{i = 1}^{n} -(y_i - (mx_i + b)) $$
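
One way to gain confidence in these derivatives is to compare them with central finite differences of the MSE. The sketch below is an illustration only: the helper names `mse` and `mse_gradients`, the test point `(m0, b0)`, and the step `eps` are all assumptions, not part of the original notebook.

```python
import numpy as np

def mse(x, y, m, b):
    """Mean squared error of the line y = m * x + b."""
    return np.mean((y - (m * x + b)) ** 2)

def mse_gradients(x, y, m, b):
    """Analytic partial derivatives of the MSE with respect to m and b."""
    n = len(x)
    residuals = y - (m * x + b)
    grad_m = (2 / n) * np.sum(-x * residuals)
    grad_b = (2 / n) * np.sum(-residuals)
    return grad_m, grad_b

# x values copied from the notebook output; y follows y = 2x + 3
x = np.array([52, 93, 15, 72, 61, 21, 83, 87, 75, 75, 88, 24, 3,
              22, 53, 2, 88, 30, 38, 2, 64, 60, 21, 33, 76], dtype=float)
y = 2 * x + 3

m0, b0 = 0.5, 1.0  # arbitrary test point (assumption)
eps = 1e-6         # finite-difference step (assumption)

grad_m, grad_b = mse_gradients(x, y, m0, b0)

# Central differences approximate each partial derivative numerically
num_grad_m = (mse(x, y, m0 + eps, b0) - mse(x, y, m0 - eps, b0)) / (2 * eps)
num_grad_b = (mse(x, y, m0, b0 + eps) - mse(x, y, m0, b0 - eps)) / (2 * eps)

print("dMSE/dm analytic:", grad_m, "| numeric:", num_grad_m)
print("dMSE/db analytic:", grad_b, "| numeric:", num_grad_b)
```

Because the MSE is quadratic in `m` and `b`, the central difference should match the analytic value up to floating-point rounding.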

## Learning Rate
$ \alpha = 0.0001 $

## Update Parameters
$$ m = m - \alpha \cdot \frac{\partial MSE}{\partial m} $$

$$ b = b - \alpha \cdot \frac{\partial MSE}{\partial b} $$

Here, $\alpha$ is the learning rate.
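
To make the update rule concrete, here is a sketch of a single update step starting from `m = 0, b = 0`. It assumes the learning rate `0.0001` used inside the notebook's `gradient_descent` function and the `x` values printed earlier; the names `grad_m`, `grad_b`, `m_new`, and `b_new` are illustrative.

```python
import numpy as np

# x values copied from the notebook output; y follows y = 2x + 3
x = np.array([52, 93, 15, 72, 61, 21, 83, 87, 75, 75, 88, 24, 3,
              22, 53, 2, 88, 30, 38, 2, 64, 60, 21, 33, 76])
y = 2 * x + 3

m, b = 0.0, 0.0
learning_rate = 0.0001  # assumed: the value used in gradient_descent
n = len(x)

# Partial derivatives of the MSE at the current (m, b)
residuals = y - (m * x + b)
grad_m = (2 / n) * np.sum(-x * residuals)
grad_b = (2 / n) * np.sum(-residuals)

# One step against the gradient
m_new = m - learning_rate * grad_m
b_new = b - learning_rate * grad_b

print(f"m: {m_new} | b: {b_new}")
```

Up to floating-point rounding, this should reproduce the first line of the run's output (`m` about 1.363344, `b` about 0.020408).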


## Gradient Descent

In [6]:
# Gradient Descent Method
def gradient_descent(x, y):
    # Start from the line y = 0 * x + 0
    m, b = 0, 0
    iterations = 1000
    n = len(x)
    learning_rate = 0.0001

    for i in range(iterations):
        # Predictions with the current parameters
        y_prediction = m * x + b
        # MSE cost, computed before this iteration's update
        cost = (1/n) * sum([value ** 2 for value in (y - y_prediction)])
        # Partial derivatives of the MSE with respect to m and b
        m_d = - (2/n) * sum(x * (y - y_prediction))
        b_d = - (2/n) * sum(y - y_prediction)
        # Step against the gradient
        m = m - learning_rate * m_d
        b = b - learning_rate * b_d
        print(f"[{i + 1}] m: {m} | b: {b} | Cost: {cost}")

    return m, b

In [7]:
# Run Gradient Descent
m, b = gradient_descent(x, y)

[1] m: 1.3633440000000001 | b: 0.020408000000000003 | Cost: 13939.56
[2] m: 1.817386286464 | b: 0.027309359424 | Cost: 1548.159906299496
[3] m: 1.968597560573486 | b: 0.029712503770975743 | Cost: 173.78528500925574
[4] m: 2.018954936984769 | b: 0.030617571030301743 | Cost: 21.348272130278744
[5] m: 2.035724245300849 | b: 0.031023717820198533 | Cost: 4.44071979546257
[6] m: 2.041307488045027 | b: 0.031263700151174884 | Cost: 2.5652235772190806
[7] m: 2.043165358410497 | b: 0.0314483380495467 | Cost: 2.35698636656676
[8] m: 2.0437825424366003 | b: 0.03161453867223923 | Cost: 2.333670825095621
[9] m: 2.0439865322281863 | b: 0.03177459346421269 | Cost: 2.3308656290155794
[10] m: 2.0440529131802463 | b: 0.031932595930331896 | Cost: 2.3303353317159408
[11] m: 2.0440734653949533 | b: 0.03208990935900867 | Cost: 2.3300573747391424
[12] m: 2.0440787550318604 | b: 0.03224698777586525 | Cost: 2.3298074289068786
[13] m: 2.0440789617496034 | b: 0.032403982388474535 | Cost: 2.3295606130670548
[14] m ... (output truncated)

In [8]:
# Check the fitted line on an example (exact value: 2 * 30 + 3 = 63)
x_val = 30
result = x_val * m + b
print(f"result of {x_val}: {result}")

result of 30: 61.43848883917572
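
As a cross-check that is not part of the original notebook, the same line can be fitted in closed form with `np.polyfit` (degree 1) and compared with the gradient descent result. The variable names `m_fit`, `b_fit`, and `result_fit` are illustrative, and the `x` values are copied from the output printed earlier.

```python
import numpy as np

# x values copied from the notebook output; y follows y = 2x + 3
x = np.array([52, 93, 15, 72, 61, 21, 83, 87, 75, 75, 88, 24, 3,
              22, 53, 2, 88, 30, 38, 2, 64, 60, 21, 33, 76], dtype=float)
y = 2 * x + 3

# Least-squares fit of a degree-1 polynomial: returns [slope, intercept]
m_fit, b_fit = np.polyfit(x, y, 1)

x_val = 30
result_fit = m_fit * x_val + b_fit

print(f"least-squares m: {m_fit} | b: {b_fit}")
print(f"least-squares result of {x_val}: {result_fit}")
```

The closed-form fit should recover a slope near 2 and an intercept near 3, giving a prediction near 63. The gradient descent result above (about 61.44) is lower, most likely because the intercept `b` moves very slowly at this learning rate and is still far from 3 after 1000 iterations.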
