
# 📉 Notebook 3 – Cost Function and Gradient Descent (Linear Regression)

In this notebook, you will:

- Create a simple dataset  
- Define a **linear model**  
- Implement a **cost function**  
- Implement **gradient descent**  
- See how parameters are updated to fit the data  

Minimizing a cost function with gradient descent is the core idea behind training many machine learning models.



## 1. Simple Dataset

We start with a small synthetic dataset that follows the line \( y = 2x \) exactly, so the best possible fit is known in advance: \( w = 2 \), \( b = 0 \).


In [None]:

import numpy as np
import matplotlib.pyplot as plt

# Simple dataset: x (input), y (target)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)  # exactly y = 2x

print("x:", x)
print("y:", y)

plt.figure()
plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Simple Dataset")
plt.grid(True)
plt.show()



## 2. Linear Model

We use a simple linear model:

\[ \hat{y} = wx + b \]

Where:
- \( w \) is the weight (slope)  
- \( b \) is the bias (intercept)  
- \( \hat{y} \) is the prediction  


In [None]:

def predict(x, w, b):
    return w * x + b

# Example with w = 0, b = 0
w_test = 0.0
b_test = 0.0
y_pred_example = predict(x, w_test, b_test)
print("Predictions with w=0, b=0:", y_pred_example)
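
As a quick sanity check on the model, the sketch below compares the all-zero parameters with \( w = 2 \), \( b = 0 \), the line that passes through every point of this dataset. It redefines `x`, `y`, and `predict` so that it runs on its own; the names `y_hat_zero` and `y_hat_exact` are illustrative and not part of the notebook.

```python
import numpy as np

# Same dataset and model as in the cells above, repeated so this runs standalone
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)

def predict(x, w, b):
    # Linear model: y_hat = w * x + b (applied element-wise to the array x)
    return w * x + b

y_hat_zero = predict(x, 0.0, 0.0)   # every prediction is 0
y_hat_exact = predict(x, 2.0, 0.0)  # predictions equal the targets

print("w=0, b=0:", y_hat_zero)
print("w=2, b=0:", y_hat_exact)
print("Matches y exactly:", np.allclose(y_hat_exact, y))
```

With \( w = 0 \) and \( b = 0 \) the model ignores the input entirely, which makes it a convenient starting point for the steps that follow.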



## 3. Cost Function (Mean Squared Error)

To measure how well the model fits the data, we use a **cost function**: the mean squared error, scaled by \( \frac{1}{2} \) to simplify its derivative.

\[ J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})^2 \]

Where:
- \( m \) is the number of training examples  
- \( \hat{y}^{(i)} \) is the prediction for example \( i \)  
- \( y^{(i)} \) is the true value  

The goal is to find the \( w \) and \( b \) that **minimize** this cost. A cost of 0 means every prediction matches its target exactly.


In [None]:

def compute_cost(x, y, w, b):
    m = len(x)
    predictions = predict(x, w, b)
    errors = predictions - y
    cost = (1 / (2 * m)) * np.sum(errors ** 2)
    return cost

# Cost with w=0, b=0
cost_initial = compute_cost(x, y, 0.0, 0.0)
print("Cost with w=0, b=0:", cost_initial)
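
To make the formula concrete, here is a minimal sketch that works through the cost at \( w = 0 \), \( b = 0 \) by hand in the comments and then evaluates it for a few slopes with \( b \) fixed at 0. The list `w_values` is an illustrative choice, and `compute_cost` is restated in compact form so the block runs on its own.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)

def compute_cost(x, y, w, b):
    # J(w, b) = 1/(2m) * sum of squared errors
    m = len(x)
    errors = (w * x + b) - y
    return np.sum(errors ** 2) / (2 * m)

# Hand calculation for w = 0, b = 0:
#   errors         = -2, -4, -6, -8, -10
#   squared errors =  4, 16, 36, 64, 100   (sum = 220)
#   J(0, 0)        = 220 / (2 * 5) = 22
w_values = [0.0, 1.0, 2.0, 3.0, 4.0]  # illustrative slopes, b held at 0
costs = [compute_cost(x, y, w, 0.0) for w in w_values]

for w, c in zip(w_values, costs):
    print(f"w = {w:.1f}: cost = {c:.2f}")
# Prints costs 22.00, 5.50, 0.00, 5.50, 22.00
```

The cost is symmetric around \( w = 2 \) and grows quadratically on either side. This bowl shape is what gradient descent walks down.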



## 4. Gradients of the Cost Function

To minimize the cost, we need its derivatives (gradients) with respect to \( w \) and \( b \).

For the cost function:

\[ J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} (wx^{(i)} + b - y^{(i)})^2 \]

The gradients are:

\[ \frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} (wx^{(i)} + b - y^{(i)}) x^{(i)} \]

\[ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (wx^{(i)} + b - y^{(i)}) \]
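
For readers who want the intermediate step, both gradients follow from the chain rule applied to a single squared term. A sketch in the same notation:

\[ \frac{\partial}{\partial w} (wx^{(i)} + b - y^{(i)})^2 = 2 (wx^{(i)} + b - y^{(i)}) \, x^{(i)} \]

\[ \frac{\partial}{\partial b} (wx^{(i)} + b - y^{(i)})^2 = 2 (wx^{(i)} + b - y^{(i)}) \]

Summing over all \( m \) examples and multiplying by \( \frac{1}{2m} \) cancels the factor of 2. This is the usual reason for defining the cost with \( \frac{1}{2m} \) rather than \( \frac{1}{m} \).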


In [None]:

def compute_gradients(x, y, w, b):
    m = len(x)
    predictions = predict(x, w, b)
    errors = predictions - y
    dw = (1 / m) * np.sum(errors * x)
    db = (1 / m) * np.sum(errors)
    return dw, db

dw_initial, db_initial = compute_gradients(x, y, 0.0, 0.0)
print("Initial dw:", dw_initial)
print("Initial db:", db_initial)
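
A common way to catch mistakes in hand-derived gradients is to compare them with a finite-difference estimate. The sketch below is one way to do that. The helper `numerical_gradients` and the step size `eps` are assumptions introduced for illustration; they are not part of the notebook.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)

def compute_cost(x, y, w, b):
    errors = (w * x + b) - y
    return np.sum(errors ** 2) / (2 * len(x))

def compute_gradients(x, y, w, b):
    # Analytic gradients from the formulas above
    errors = (w * x + b) - y
    return np.mean(errors * x), np.mean(errors)

def numerical_gradients(x, y, w, b, eps=1e-6):
    # Hypothetical helper: central differences, (J(p + eps) - J(p - eps)) / (2 * eps)
    dw = (compute_cost(x, y, w + eps, b) - compute_cost(x, y, w - eps, b)) / (2 * eps)
    db = (compute_cost(x, y, w, b + eps) - compute_cost(x, y, w, b - eps)) / (2 * eps)
    return dw, db

dw, db = compute_gradients(x, y, 0.0, 0.0)
dw_num, db_num = numerical_gradients(x, y, 0.0, 0.0)

# By hand at w = 0, b = 0: dw = -(2 + 8 + 18 + 32 + 50) / 5 = -22, db = -30 / 5 = -6
print("analytic :", dw, db)
print("numerical:", dw_num, db_num)
```

Both gradients are negative at the starting point, so the update rule increases \( w \) and \( b \).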



## 5. Gradient Descent Algorithm

Gradient descent repeatedly moves the parameters a small step in the direction opposite to the gradient, which is the direction in which the cost decreases fastest:

\[ w := w - \alpha \frac{\partial J}{\partial w} \]  
\[ b := b - \alpha \frac{\partial J}{\partial b} \]

Where \( \alpha \) is the **learning rate**, which controls the size of each step.


In [None]:

def gradient_descent(x, y, w_init, b_init, learning_rate, num_iterations):
    w = w_init
    b = b_init
    cost_history = []

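    for i in range(num_iterations):
        # Compute both gradients at the current (w, b) before changing either
        # parameter, so that the two updates are simultaneous
        dw, db = compute_gradients(x, y, w, b)
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record the cost after this update
        cost = compute_cost(x, y, w, b)
        cost_history.append(cost)

        # Log progress every 10 iterations
        if i % 10 == 0:
            print(f"Iteration {i:3d}: w = {w:.4f}, b = {b:.4f}, cost = {cost:.4f}")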

    return w, b, cost_history

w_init = 0.0
b_init = 0.0
learning_rate = 0.01
num_iterations = 201  # iterations 0..200; iteration 200 is the last one logged

w_final, b_final, cost_history = gradient_descent(x, y, w_init, b_init, learning_rate, num_iterations)

print("\nFinal parameters:")
print("w =", w_final)
print("b =", b_final)
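
To see what a single update does, the sketch below performs just the first step from \( w = 0 \), \( b = 0 \) with \( \alpha = 0.01 \) and spells out the arithmetic in comments. It restates the helper functions so it runs standalone; the names `w1`, `b1`, `cost0`, and `cost1` are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)

def compute_cost(x, y, w, b):
    errors = (w * x + b) - y
    return np.sum(errors ** 2) / (2 * len(x))

def compute_gradients(x, y, w, b):
    errors = (w * x + b) - y
    return np.mean(errors * x), np.mean(errors)

learning_rate = 0.01
w0, b0 = 0.0, 0.0

dw, db = compute_gradients(x, y, w0, b0)  # dw = -22, db = -6
w1 = w0 - learning_rate * dw              # 0 - 0.01 * (-22) ≈ 0.22
b1 = b0 - learning_rate * db              # 0 - 0.01 * (-6)  ≈ 0.06

cost0 = compute_cost(x, y, w0, b0)        # 22
cost1 = compute_cost(x, y, w1, b1)        # ≈ 17.1076

print(f"Before: w = {w0:.4f}, b = {b0:.4f}, cost = {cost0:.4f}")
print(f"After : w = {w1:.4f}, b = {b1:.4f}, cost = {cost1:.4f}")
```

These values should correspond to the `Iteration   0` line printed by the loop above, since that line is logged after the first update.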



## 6. Cost Over Iterations

Let's visualize how the cost decreases during gradient descent.


In [None]:

iterations = np.arange(len(cost_history))

plt.figure()
plt.plot(iterations, cost_history)
plt.xlabel("Iteration")
plt.ylabel("Cost J(w, b)")
plt.title("Cost Decrease Over Iterations")
plt.grid(True)
plt.show()
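
How quickly the cost falls, and whether it falls at all, depends on the learning rate. As a rough illustration, the sketch below reruns gradient descent for 50 iterations with several rates and reports the final cost. The helper `run_gradient_descent`, the list `rates`, and the iteration count are assumptions chosen for this example, not values from the notebook.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)

def compute_cost(x, y, w, b):
    errors = (w * x + b) - y
    return np.sum(errors ** 2) / (2 * len(x))

def compute_gradients(x, y, w, b):
    errors = (w * x + b) - y
    return np.mean(errors * x), np.mean(errors)

def run_gradient_descent(x, y, learning_rate, num_iterations):
    # Hypothetical helper: starts at w = 0, b = 0 and returns only the final cost
    w, b = 0.0, 0.0
    for _ in range(num_iterations):
        dw, db = compute_gradients(x, y, w, b)
        w -= learning_rate * dw
        b -= learning_rate * db
    return compute_cost(x, y, w, b)

initial_cost = compute_cost(x, y, 0.0, 0.0)
rates = [0.001, 0.01, 0.1, 0.2]  # illustrative learning rates
final_costs = {lr: run_gradient_descent(x, y, lr, 50) for lr in rates}

print(f"initial cost = {initial_cost:.4f}")
for lr, cost in final_costs.items():
    print(f"learning_rate = {lr}: cost after 50 iterations = {cost:.6g}")
```

On this dataset, the smaller rates should reduce the cost steadily. The largest rate overshoots the minimum on every step, so its cost grows instead of shrinking.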



## 7. Final Fitted Line

Now we plot the data and the line defined by the learned parameters \( w \) and \( b \).


In [None]:

plt.figure()
plt.scatter(x, y, label="Data")
x_line = np.linspace(x.min(), x.max(), 100)
y_line = predict(x_line, w_final, b_final)
plt.plot(x_line, y_line, label="Fitted line")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Linear Regression Fit with Gradient Descent")
plt.grid(True)
plt.legend()
plt.show()
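
One way to judge how close gradient descent got is to compare its result with a direct least-squares fit. The sketch below uses `np.polyfit` as an independent reference. This function is not used elsewhere in the notebook and is swapped in here only for comparison. The loop mirrors the settings above (learning rate 0.01, 201 iterations).

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)

def compute_gradients(x, y, w, b):
    errors = (w * x + b) - y
    return np.mean(errors * x), np.mean(errors)

# Gradient descent with the same settings as the notebook
w, b = 0.0, 0.0
for _ in range(201):
    dw, db = compute_gradients(x, y, w, b)
    w -= 0.01 * dw
    b -= 0.01 * db

# Degree-1 least-squares fit; coefficients come back highest power first
slope, intercept = np.polyfit(x, y, 1)

print(f"Gradient descent: w = {w:.4f}, b = {b:.4f}")
print(f"Least squares   : w = {slope:.4f}, b = {intercept:.4f}")
```

If the two results differ noticeably, gradient descent has most likely not fully converged yet. More iterations, or a somewhat larger learning rate, should narrow the gap.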



## ✅ Summary

In this notebook, you:

- Created a simple linear dataset  
- Defined a linear prediction function  
- Implemented a cost function (half the mean squared error)  
- Computed gradients of the cost  
- Implemented gradient descent  
- Observed the cost decreasing over iterations  
- Plotted the final fitted line  



