## Cost Function
##### A cost function measures how far a model's predictions are from the actual outputs.
##### The larger its value, the worse the model is predicting.

## Cost Function for Linear Regression
##### A Linear Regression model fits a straight line to the data, using the equation of a straight line:

$Y = mX + b$

##### where Y is the dependent variable,  
##### X is the independent variable, 
##### m is the slope of the line and 
##### b is the y-intercept, the value of Y at the point where the line crosses the y-axis (X = 0).


##### For our Linear Regression model, the cost function is the Mean Squared Error (MSE): the average of the squared differences between the actual values and the predicted values. Training the model means finding the values of m and b that minimize this cost.

$\text{Cost Function} = \frac{1}{n}\sum_{i=1}^n(y_i - (mx_i + b))^2$
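
##### As a rough illustration of the formula above, the sketch below computes the mean squared error of a candidate line on a small dataset. The helper name `mse_cost` and the sample data are assumptions made for this example only.

```python
import numpy as np

def mse_cost(X, y, m, b):
    """Mean squared error of the line y = m*X + b on the data (X, y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    y_predicted = m * X + b
    return np.mean((y - y_predicted) ** 2)

# Illustrative data generated by y = 2x + 3
X = [1, 2, 3, 4, 5]
y = [5, 7, 9, 11, 13]

print(mse_cost(X, y, 2, 3))  # the true line: every prediction is exact, so the cost is 0.0
print(mse_cost(X, y, 0, 0))  # a poor guess: the cost is 89.0
```

##### A lower cost means the line is closer to the data, which is what the optimization step tries to achieve.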

## Gradient Descent
##### Gradient Descent is an algorithm used to minimize the cost function, i.e. the error of our model. It searches for the parameter values that give the lowest possible error.
##### Gradient Descent can be visualized as a ball rolling down a hill. The ball rolls towards the lowest point on the hill, and this point corresponds to the parameter values where the error is least.

![image.png](attachment:7bbb4de4-bc56-4d11-ac2a-0f25d61b7938.png) ![image.png](attachment:94f5b5e9-3373-4dea-98d9-2dd29ec42344.png)
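
##### The "ball rolling downhill" picture can be sketched on a one-variable toy function. The function f(w) = (w - 3)^2, the helper name `minimize_1d` and the step settings are all assumptions chosen for illustration; they are not part of the regression model.

```python
def minimize_1d(grad, w_start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, like a ball rolling downhill."""
    w = w_start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Toy "hill": f(w) = (w - 3)**2, whose gradient is 2 * (w - 3).
# Its lowest point is at w = 3.
w_min = minimize_1d(lambda w: 2 * (w - 3), w_start=0.0)
print(w_min)  # very close to 3
```

##### Each step moves w a little in the direction where f decreases, so the distance to the minimum shrinks at every iteration.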

##### In Gradient Descent, we compute the error of our model for the current values of the parameters (m and b), then adjust them by a small step in the direction that reduces the error. Repeating this makes the error smaller and smaller, until we arrive at the parameter values where the error is least and the cost function is minimized.
![image.png](attachment:ec2f148d-868a-4010-8a32-789e2a321e01.png)
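
##### The direction of each step comes from the partial derivatives of the Mean Squared Error cost with respect to m and b. Writing the cost as J and the learning rate as α (both symbols are introduced here for convenience), differentiating gives:

$\frac{\partial J}{\partial m} = -\frac{2}{n}\sum_{i=1}^n x_i\,(y_i - (mx_i + b))$

$\frac{\partial J}{\partial b} = -\frac{2}{n}\sum_{i=1}^n (y_i - (mx_i + b))$

##### Each iteration then moves the parameters a small step against these derivatives:

$m \leftarrow m - \alpha\,\frac{\partial J}{\partial m}, \qquad b \leftarrow b - \alpha\,\frac{\partial J}{\partial b}$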

In [1]:
import numpy as np
import math

In [2]:
# Gradient 

In [3]:
def gradient_descent_(X, y):
    m_curr = b_curr = 0  # Start with slope and intercept at zero
    iterations = 1000  # Maximum number of update steps
    learning_rate = 0.08
    cost_old = float('inf')  # Initialize cost_old to a large value
    epsilon = 1e-20  # Stop early once the cost changes by less than this
    X = np.array(X)
    y = np.array(y)

    for i in range(iterations):
        y_predicted = np.dot(X, m_curr) + b_curr
        cost_new = np.mean((y - y_predicted)**2)  # Mean squared error
        # Partial derivatives of the cost with respect to m and b
        md = -2 * np.mean(X * (y - y_predicted))
        bd = -2 * np.mean(y - y_predicted)

        # Step against the gradient
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd

        if np.abs(cost_old - cost_new) < epsilon:
            break
        cost_old = cost_new
    print("m: {}, b: {}, cost: {}, iteration: {}".format(m_curr, b_curr, cost_new, i+1))

In [4]:
# Example usage:
X = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
gradient_descent_(X, y)

m: 2.0000000002599894, b: 2.9999999990613553, cost: 1.694077086867124e-19, iteration: 788
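
##### One way to sanity-check this result is to compare it with a direct least-squares fit. The sketch below uses `np.polyfit` with degree 1, which returns the slope and intercept of the best-fitting line; the variable names `m_exact` and `b_exact` are chosen here for illustration.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])

# With degree 1, np.polyfit returns [slope, intercept] of the least-squares line
m_exact, b_exact = np.polyfit(X, y, 1)
print("m: {}, b: {}".format(m_exact, b_exact))
```

##### The data were generated by y = 2x + 3, so both approaches should give a slope close to 2 and an intercept close to 3.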


In [5]:
def gradient_descent_multi(X, y):
    # X is a matrix: each row is a data point and each column is a feature
    m_curr = np.zeros(X.shape[1])  # Initialize coefficients to zeros
    b_curr = 0  # Initialize y-intercept
    iterations = 1000
    n = len(y)
    learning_rate = 0.001
    cost_old = float('inf')
    epsilon = 1e-20

    for i in range(iterations):
        y_predicted = np.dot(X, m_curr) + b_curr
        cost_new = np.mean((y - y_predicted)**2)
        # Partial derivatives of the cost with respect to the coefficients and the intercept
        gradients = -2/n * np.dot(X.T, (y - y_predicted))
        bd = -2 * np.mean(y - y_predicted)

        m_curr = m_curr - learning_rate * gradients  # Update coefficients
        b_curr = b_curr - learning_rate * bd  # Update y-intercept

        if np.abs(cost_old - cost_new) < epsilon:
            break

        cost_old = cost_new

    print("Coefficients {}, y-intercept {}, cost {}, iteration {}".format(m_curr, b_curr, cost_new, i+1))

# Example usage for multiple linear regression:
X_multi = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y_multi = np.array([5, 7, 9, 11])
gradient_descent_multi(X_multi, y_multi)


In [6]:
np.zeros(X_multi.shape[1])

array([0., 0.])

In [7]:
X_multi.shape

(4, 2)
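
##### With only 4 data points and 2 features, X_multi is small enough to check a gradient-descent fit against a direct least-squares solve. The sketch below uses `np.linalg.lstsq` on the feature matrix with a column of ones appended for the intercept; the names `A`, `solution` and `y_fit` are assumptions made for this example.

```python
import numpy as np

X_multi = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y_multi = np.array([5, 7, 9, 11])

# Append a column of ones so the intercept is fitted as one more coefficient
A = np.column_stack([X_multi, np.ones(len(y_multi))])

# Least-squares solution of A @ solution = y_multi
solution, residuals, rank, singular_values = np.linalg.lstsq(A, y_multi, rcond=None)
y_fit = A @ solution

print("coefficients and intercept:", solution)
print("rank of A:", rank)
print("cost:", np.mean((y_multi - y_fit) ** 2))
```

##### In this dataset the second feature is always the first feature plus one, so the columns of `A` are linearly dependent and many combinations of coefficients and intercept fit the data equally well. `np.linalg.lstsq` returns the minimum-norm solution among them, so a gradient-descent run can end at different coefficients while still producing the same predictions.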