
# Week 1 Lab – Linear Regression with One Feature (No Vectorization)

In this lab we will work with a very simple version of supervised learning:

- Our data consists of pairs $(x^{(i)}, y^{(i)})$, where $x^{(i)}$ is one feature and $y^{(i)}$ is the target.
- We will use a **linear regression model** with one feature.
- We will measure how good the model is using a **cost function** based on mean squared error.
- We will train the model with **gradient descent**.
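- In all model code (predictions, cost, gradients), we will avoid NumPy vectorization and use **explicit Python loops**. NumPy is used only to generate the data and to prepare arrays for plotting.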



## 0. Theory Refresher

### 0.1 Linear Regression (One Feature)

We assume there is (approximately) a linear relationship between the input $x$ and the output $y$.  
Our model (or hypothesis) is a function that depends on the parameters $w$ and $b$:

$$
f_{w,b}(x) = wx + b
$$

- $w$ is the **slope**: how much $f_{w,b}(x)$ changes when $x$ increases by 1.
- $b$ is the **intercept**: the value of $f_{w,b}(x)$ when $x = 0$.
- For a dataset with $m$ examples, we write the $i$-th example as $(x^{(i)}, y^{(i)})$.  
  The prediction for that example is:
  $$
  \hat{y}^{(i)} = f_{w,b}(x^{(i)}) = w x^{(i)} + b
  $$
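
As a quick illustration of the model, the sketch below evaluates $f_{w,b}(x)$ for a few inputs using an explicit loop. The values `w_demo = 3.0`, `b_demo = 2.0` and the inputs in `x_examples` are arbitrary choices for illustration, not values learned from data.

```python
# Illustrative parameter values only (assumed, not learned from data)
w_demo = 3.0
b_demo = 2.0
x_examples = [0.0, 1.0, 2.0]

predictions = []
for x_i in x_examples:
    y_hat = w_demo * x_i + b_demo  # f_{w,b}(x) = w x + b
    predictions.append(y_hat)

print(predictions)  # [2.0, 5.0, 8.0]
```

Note that the prediction at $x = 0$ equals the intercept, and each increase of 1 in $x$ raises the prediction by the slope.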



### 0.2 Cost Function (Mean Squared Error)

We need a way to measure how well a particular line (given by $w$ and $b$) fits the data.

We use the **mean squared error (MSE)** cost function:

$$
J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \big( \hat{y}^{(i)} - y^{(i)} \big)^2
       = \frac{1}{2m} \sum_{i=1}^{m} \big( f_{w,b}(x^{(i)}) - y^{(i)} \big)^2
$$

- The term $(\hat{y}^{(i)} - y^{(i)})$ is the **error** for example $i$.
- We square the error so that positive and negative errors do not cancel out, and to penalize large errors more.
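- The factor $\frac{1}{m}$ averages the squared errors over the dataset. The extra $\frac{1}{2}$ is a convenience: it cancels the factor of 2 that appears when we differentiate the square.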

Our goal is to find values of $w$ and $b$ that **minimize** $J(w,b)$.
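
To make the formula concrete, here is a minimal sketch that evaluates $J(w,b)$ on a tiny hand-made dataset. The data (`x_small`, `y_small`) and the helper name `cost_demo` are assumptions made for this example; the points lie exactly on the line $y = 2x$, so the cost should be zero at $w = 2$, $b = 0$.

```python
# Tiny illustrative dataset (assumed): points lie exactly on y = 2x
x_small = [1.0, 2.0, 3.0]
y_small = [2.0, 4.0, 6.0]


def cost_demo(x_list, y_list, w, b):
    """Evaluate J(w,b) = (1 / 2m) * sum of squared errors, with an explicit loop."""
    m = len(x_list)
    total = 0.0
    for i in range(m):
        error = (w * x_list[i] + b) - y_list[i]
        total += error * error
    return total / (2 * m)


# With w=1, b=0 the errors are -1, -2, -3, so J = (1 + 4 + 9) / (2 * 3) = 14 / 6
cost_poor_fit = cost_demo(x_small, y_small, 1.0, 0.0)
# With w=2, b=0 every prediction matches its target, so J = 0
cost_perfect_fit = cost_demo(x_small, y_small, 2.0, 0.0)

print(cost_poor_fit)
print(cost_perfect_fit)
```

The line that fits the data better has the smaller cost, which is the property gradient descent relies on.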



### 0.3 Gradient Descent

To minimize $J(w,b)$, we use **gradient descent**.  
The idea is to start with some initial $(w,b)$ and repeatedly update them in the direction that decreases the cost.

We compute the partial derivatives:

$$
\frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \big( f_{w,b}(x^{(i)}) - y^{(i)} \big) x^{(i)}
$$

$$
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \big( f_{w,b}(x^{(i)}) - y^{(i)} \big)
$$
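
For readers who want the intermediate step, here is a sketch of where the formula for $\frac{\partial J}{\partial w}$ comes from. It applies the chain rule to each squared term, using $f_{w,b}(x^{(i)}) = w x^{(i)} + b$:

$$
\frac{\partial J}{\partial w}
= \frac{1}{2m} \sum_{i=1}^{m} 2 \big( w x^{(i)} + b - y^{(i)} \big) \cdot \frac{\partial}{\partial w} \big( w x^{(i)} + b - y^{(i)} \big)
= \frac{1}{m} \sum_{i=1}^{m} \big( f_{w,b}(x^{(i)}) - y^{(i)} \big) x^{(i)}
$$

The derivation for $\frac{\partial J}{\partial b}$ is the same, except that the inner derivative is $1$ instead of $x^{(i)}$, so no extra factor appears in the sum.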

Given a **learning rate** $\alpha > 0$, we update:

$$
w := w - \alpha \frac{\partial J}{\partial w}, \qquad
b := b - \alpha \frac{\partial J}{\partial b}
$$

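We repeat these updates many times. If $\alpha$ is small enough, each update decreases the cost $J(w,b)$ and $(w,b)$ moves toward the values that minimize it. If $\alpha$ is too large, the updates can overshoot the minimum and the cost can grow instead.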
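
The sketch below carries out a single gradient descent update by hand on a tiny dataset, to show how the formulas translate into loops. The data (`x_small`, `y_small`), the starting point, and `alpha_demo = 0.1` are assumptions chosen for illustration.

```python
# Tiny illustrative dataset (assumed): points lie exactly on y = 2x
x_small = [1.0, 2.0, 3.0]
y_small = [2.0, 4.0, 6.0]
m_small = len(x_small)

w_old, b_old = 0.0, 0.0
alpha_demo = 0.1

# Accumulate the sums that appear in dJ/dw and dJ/db
sum_dw = 0.0
sum_db = 0.0
for i in range(m_small):
    error = (w_old * x_small[i] + b_old) - y_small[i]
    sum_dw += error * x_small[i]
    sum_db += error
dj_dw = sum_dw / m_small  # (-2 - 8 - 18) / 3 = -28/3
dj_db = sum_db / m_small  # (-2 - 4 - 6) / 3 = -4

# One simultaneous update of both parameters
w_new = w_old - alpha_demo * dj_dw
b_new = b_old - alpha_demo * dj_db


def cost_at(w, b):
    """Cost J(w,b) on the tiny dataset."""
    total = 0.0
    for i in range(m_small):
        diff = (w * x_small[i] + b) - y_small[i]
        total += diff * diff
    return total / (2 * m_small)


cost_before = cost_at(w_old, b_old)
cost_after = cost_at(w_new, b_new)
print("w:", w_old, "->", w_new)
print("b:", b_old, "->", b_new)
print("cost:", cost_before, "->", cost_after)
```

Both gradients are negative at the starting point, so the update increases $w$ and $b$, and the cost after the step is lower than before.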


## 1. Setup

In [None]:

# Install required libraries (run this once if needed)
%pip install numpy matplotlib


In [None]:

import numpy as np
import matplotlib.pyplot as plt



## 2. Create a Simple Dataset

We will create a synthetic dataset that roughly follows a linear relationship:

$$
y \approx 3 x + 2 + \text{noise}
$$

Each point is a pair $(x^{(i)}, y^{(i)})$ with **one feature** $x^{(i)}$.


In [None]:

m = 50

# Use numpy only to generate evenly spaced values, then convert to plain Python list
x_array = np.linspace(0, 10, m)
x = [float(v) for v in x_array]

true_w = 3.0
true_b = 2.0

rng = np.random.default_rng(0)
noise_array = rng.normal(loc=0.0, scale=2.0, size=m)
noise = [float(v) for v in noise_array]

# Build y using explicit loops
y = []
for i in range(m):
    y_value = true_w * x[i] + true_b + noise[i]
    y.append(y_value)

print(f"Number of examples m = {m}")
print("First 5 x values:", x[:5])
print("First 5 y values:", y[:5])


### 2.1 Visualize the Data

In [None]:

plt.figure()
plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Dataset: one feature x vs target y")
plt.show()



## 3. Linear Regression Model with One Feature

We use the model (hypothesis function):

$$
f_{w,b}(x^{(i)}) = w x^{(i)} + b
$$

where:
- $w$ is the slope,
- $b$ is the intercept.


In [None]:

def predict(x_list, w, b):
    """Compute predicted y values for a list of x, using f_{w,b}(x) = w x + b."""
    y_hat_list = []
    for i in range(len(x_list)):
        y_hat_list.append(w * x_list[i] + b)
    return y_hat_list

w_test = 0.0
b_test = 0.0
y_hat_test = predict(x, w_test, b_test)
print("First 5 predictions with w=0, b=0:", y_hat_test[:5])



## 4. Cost Function $J(w,b)$

We define the **mean squared error** cost function:

$$
J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \big( f_{w,b}(x^{(i)}) - y^{(i)} \big)^2
$$

This measures how well the model $f_{w,b}(x) = w x + b$ fits the data.


In [None]:

def compute_cost(x_list, y_list, w, b):
    """Compute the cost J(w,b) using explicit loops."""
    m_local = len(x_list)
    total = 0.0
    for i in range(m_local):
        f_wb = w * x_list[i] + b
        diff = f_wb - y_list[i]
        total += diff * diff
    cost = total / (2 * m_local)
    return cost

print("Cost with w=0, b=0:", compute_cost(x, y, w_test, b_test))



### 4.1 Visualize the Cost Function as a Surface

We can visualize how $J(w,b)$ changes as we vary $w$ and $b$.  
Below we plot the **cost surface** $J(w,b)$ in 3D using explicit loops.


In [None]:

from mpl_toolkits.mplot3d import Axes3D  # needed to register the 3D projection
from matplotlib import cm

# Choose reasonable ranges around the expected optimum
w_values = [float(v) for v in np.linspace(-1.0, 7.0, 60)]
b_values = [float(v) for v in np.linspace(-5.0, 10.0, 60)]

# Build W, B, J_vals as lists of lists using explicit loops
W = []
B = []
J_vals = []

for i in range(len(b_values)):  # rows: b
    row_W = []
    row_B = []
    row_J = []
    for j in range(len(w_values)):  # cols: w
        w_curr = w_values[j]
        b_curr = b_values[i]
        row_W.append(w_curr)
        row_B.append(b_curr)
        j_val = compute_cost(x, y, w_curr, b_curr)
        row_J.append(j_val)
    W.append(row_W)
    B.append(row_B)
    J_vals.append(row_J)

W_arr = np.array(W)
B_arr = np.array(B)
J_arr = np.array(J_vals)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(W_arr, B_arr, J_arr, cmap=cm.viridis, linewidth=0, antialiased=True)
ax.set_xlabel("w")
ax.set_ylabel("b")
ax.set_zlabel("J(w,b)")
ax.set_title("Cost surface J(w,b)")
plt.show()



## 5. Gradient Descent

We use **gradient descent** with the gradients from Section 0.3:

$$
\frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \big( f_{w,b}(x^{(i)}) - y^{(i)} \big) x^{(i)}, \quad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \big( f_{w,b}(x^{(i)}) - y^{(i)} \big)
$$

Update:

$$
w := w - \alpha \frac{\partial J}{\partial w}, \quad
b := b - \alpha \frac{\partial J}{\partial b}
$$


In [None]:

def compute_gradients(x_list, y_list, w, b):
    """Compute dJ/dw and dJ/db using explicit loops."""
    m_local = len(x_list)
    sum_dw = 0.0
    sum_db = 0.0

    for i in range(m_local):
        f_wb = w * x_list[i] + b
        error = f_wb - y_list[i]
        sum_dw += error * x_list[i]
        sum_db += error

    dj_dw = sum_dw / m_local
    dj_db = sum_db / m_local
    return dj_dw, dj_db

dj_dw_test, dj_db_test = compute_gradients(x, y, w_test, b_test)
print("Gradients at w=0, b=0:", dj_dw_test, dj_db_test)
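
One optional way to gain confidence in a gradient implementation is to compare it with a finite-difference approximation of the cost. The sketch below does this with central differences on a small dataset. It is self-contained, so it redefines compact copies of the cost and gradient under the hypothetical names `cost_fn` and `grad_fn`; the data, the test point, and the step `eps` are also assumptions.

```python
def cost_fn(x_list, y_list, w, b):
    """J(w,b): sum of squared errors divided by 2m."""
    m = len(x_list)
    total = 0.0
    for i in range(m):
        diff = (w * x_list[i] + b) - y_list[i]
        total += diff * diff
    return total / (2 * m)


def grad_fn(x_list, y_list, w, b):
    """Analytic gradients dJ/dw and dJ/db."""
    m = len(x_list)
    sum_dw = 0.0
    sum_db = 0.0
    for i in range(m):
        error = (w * x_list[i] + b) - y_list[i]
        sum_dw += error * x_list[i]
        sum_db += error
    return sum_dw / m, sum_db / m


# Small illustrative dataset and test point (assumed values)
x_check = [0.0, 1.0, 2.0, 3.0]
y_check = [1.9, 5.2, 7.8, 11.1]
w0, b0 = 0.5, -1.0
eps = 1e-5

# Central differences: (J(p + eps) - J(p - eps)) / (2 * eps)
num_dw = (cost_fn(x_check, y_check, w0 + eps, b0)
          - cost_fn(x_check, y_check, w0 - eps, b0)) / (2 * eps)
num_db = (cost_fn(x_check, y_check, w0, b0 + eps)
          - cost_fn(x_check, y_check, w0, b0 - eps)) / (2 * eps)

ana_dw, ana_db = grad_fn(x_check, y_check, w0, b0)
print("dJ/dw analytic vs numeric:", ana_dw, num_dw)
print("dJ/db analytic vs numeric:", ana_db, num_db)
```

The two pairs of numbers should agree to several decimal places. In the notebook, the same check could be run with `compute_cost` and `compute_gradients` in place of the copies defined here.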


### 5.1 Implement the Gradient Descent Loop

In [None]:

def gradient_descent(x_list, y_list, w_init, b_init, alpha, num_iterations):
    """Run gradient descent using explicit loops for gradients and cost."""
    w = w_init
    b = b_init
    history_iterations = []
    history_costs = []

    for i in range(num_iterations):
        dj_dw, dj_db = compute_gradients(x_list, y_list, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        cost = compute_cost(x_list, y_list, w, b)
        history_iterations.append(i)
        history_costs.append(cost)

        if i % max(1, (num_iterations // 10)) == 0:
            print(f"Iteration {i:4d}: w={w:7.4f}, b={b:7.4f}, cost={cost:8.4f}")

    return w, b, history_iterations, history_costs

alpha = 0.01
num_iterations = 1000

w_init = 1.0
b_init = 1.0

w_learned, b_learned, it_hist, cost_hist = gradient_descent(x, y, w_init, b_init, alpha, num_iterations)
print("\nLearned parameters:")
print("w =", w_learned)
print("b =", b_learned)
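
As an optional sanity check, the parameters found by gradient descent can be compared with the closed-form least-squares solution for one feature. Note that this is a different technique from gradient descent and is not part of the lab's method; the helper name `closed_form_fit` and the small dataset below are assumptions made for this sketch.

```python
def closed_form_fit(x_list, y_list):
    """Least-squares slope and intercept for one feature, using explicit loops.

    w = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    b = y_mean - w * x_mean
    """
    m = len(x_list)
    x_sum = 0.0
    y_sum = 0.0
    for i in range(m):
        x_sum += x_list[i]
        y_sum += y_list[i]
    x_mean = x_sum / m
    y_mean = y_sum / m

    numerator = 0.0
    denominator = 0.0
    for i in range(m):
        dx = x_list[i] - x_mean
        numerator += dx * (y_list[i] - y_mean)
        denominator += dx * dx

    w = numerator / denominator  # assumes the x values are not all identical
    b = y_mean - w * x_mean
    return w, b


# Noise-free illustrative data on the line y = 3x + 2 (assumed)
x_line = [0.0, 1.0, 2.0, 3.0]
y_line = [2.0, 5.0, 8.0, 11.0]
w_exact, b_exact = closed_form_fit(x_line, y_line)
print("closed-form w =", w_exact, " b =", b_exact)
```

In the notebook, calling `closed_form_fit(x, y)` and comparing the result with `w_learned` and `b_learned` gives a reference point. The values may differ somewhat if gradient descent has not fully converged within `num_iterations`.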


### 5.2 Plot the Cost over Iterations

In [None]:

plt.figure()
plt.plot(it_hist[10:], cost_hist[10:])  # skip the first 10 iterations so the later, smaller changes are visible
plt.xlabel("Iteration")
plt.ylabel("Cost J(w,b)")
plt.title("Gradient Descent: Cost vs Iterations")
plt.show()


### 5.3 Visualize the Fitted Line

In [None]:

plt.figure()
plt.scatter(x, y, label="Data")
y_pred = predict(x, w_learned, b_learned)
plt.plot(x, y_pred, label="Fitted line")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Linear Regression Fit (one feature)")
plt.legend()
plt.show()



## 6. Exercises (for You to Try)

1. **Change the learning rate $\alpha$**:
   - Try values like `0.001`, `0.1`, `0.5`.
   - What happens to the speed of convergence? Does the algorithm diverge for some values?

2. **Change the number of iterations**:
   - Try `num_iterations = 100`, `500`, `2000`.
   - How does the final cost change?

3. **Try different initial values** for `w_init` and `b_init`:
   - Does gradient descent still converge to similar values?

4. **Noise level**:
   - Go back to the cell where we define `noise_array` and change `scale` (e.g., `scale=0.5` or `scale=5.0`).
   - How does the fitted line look with less/more noise?

5. **(Optional) Manual check**:
   - Pick a few pairs of values for $w$ and $b$ and compute $J(w,b)$ for each using `compute_cost`.
   - Plot each line over the data and check visually whether a smaller cost corresponds to a better fit.
