# Interactive Machine Learning Concept Visualizer

This notebook provides an interactive visualization of linear regression, a fundamental machine learning algorithm. We will walk through the process of loading data, preparing it, training a model, and visualizing the results in both 2D and 3D.

We will be using the California Housing dataset from `scikit-learn`.

## 1. Imports and Setup

First, let's import the necessary libraries for data manipulation, plotting, and creating animations.

In [2]:
from IPython.display import HTML
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from mpl_toolkits.mplot3d import Axes3D  # Registers the '3d' projection (only required on Matplotlib < 3.2)
from sklearn.datasets import fetch_california_housing
%matplotlib qt

## 2. Load and Prepare the Data

We'll load the California Housing dataset and select one feature, median income (`MedInc`), to predict the median house value, which the dataset expresses in units of $100,000.

In [3]:
housing = fetch_california_housing(as_frame=True)

print(housing.data.shape, housing.target.shape)
print(housing.feature_names)

(20640, 8) (20640,)
['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']


In [7]:
# Select one feature: median income (MedInc), kept 2D with shape (n_samples, 1)
X = housing.data[["MedInc"]].to_numpy()

# Select the target: median house value, shape (n_samples,)
y = housing.target.to_numpy()

print(X.shape, y.shape)

(20640, 1) (20640,)


## 3. Initial Data Visualization

Let's create a scatter plot to visualize the relationship between Median Income and Median House Value.

In [8]:
fig, ax = plt.subplots()
ax.scatter(X, y, s=1)
ax.set_xlabel("Median Income")
ax.set_ylabel("Median House Value")
ax.set_title("California Housing Data")
plt.show()

## 4. Model Training with Animated Visualization (2D)

Now we'll train a simple linear regression model using mini-batch gradient descent. Each animation frame applies one batch update, so we can watch the fitted line improve as training proceeds.
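
For reference, the update rule that the training code in this section implements can be written out as follows. Here $m$ is the number of rows in the current mini-batch and $\eta$ corresponds to `learning_rate`. The symbols $\hat{y}_i$, $L$, and $\eta$ are notation introduced here, not names from the notebook.

```latex
\hat{y}_i = \theta x_i + b, \qquad
L(\theta, b) = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2

\frac{\partial L}{\partial \theta} = \frac{2}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)\, x_i, \qquad
\frac{\partial L}{\partial b} = \frac{2}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)

\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}, \qquad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
```

In the code, both partial derivatives are computed at once as `2 / m * X_batch_b.T.dot(error)`, where `X_batch_b` is the batch with a leading column of ones.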

### 4.1. Data Sampling and Hyperparameter Setup

In [9]:
# Lists for storing loss values by epoch
loss_list = []
epoch_list = []

# Seed NumPy's random generator before sampling so the subset is reproducible
np.random.seed(42)

# Draw a smaller random sample for faster training
sample_size = 1000
indices = np.random.choice(X.shape[0], sample_size, replace=False)
X_sample = X[indices]
y_sample = y[indices]
print(X_sample.shape, y_sample.shape)

# Set up model hyperparameters
batch_size = 200
n_batches = int(np.ceil(X_sample.shape[0] / batch_size))
n_epochs = 10
learning_rate = 0.015

# Initialize model parameters
theta = np.array([[1.2]])
bias = 0.1
print(f"Initial theta: {theta}, Initial bias: {bias}")

(1000, 1) (1000,)
Initial theta: [[1.2]], Initial bias: 0.1
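
The batch bookkeeping above can be checked in isolation. The following is a minimal sketch using only the standard library. The helper name `batch_slices` is hypothetical and does not appear in the notebook. It mirrors the `n_batches` formula and the `start`/`end` arithmetic used later in `update_line`.

```python
import math

def batch_slices(n_samples, batch_size):
    """Return (start, end) index pairs that cover n_samples in order."""
    n_batches = math.ceil(n_samples / batch_size)
    return [
        (i * batch_size, min((i + 1) * batch_size, n_samples))
        for i in range(n_batches)
    ]

# Values used in this notebook: 1000 samples in batches of 200
slices = batch_slices(1000, 200)
print(len(slices), slices[0], slices[-1])   # 5 (0, 200) (800, 1000)

# A size that does not divide evenly leaves a shorter final batch
print(batch_slices(1050, 200)[-1])          # (1000, 1050)

# Each animation frame is one batch update; divmod recovers (epoch, batch)
epoch, batch_index = divmod(23, len(slices))
print(epoch, batch_index)                   # 4 3
```

With 5 batches per epoch and `n_epochs = 10`, the animation therefore runs for 50 frames.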


### 4.2. Animation Setup

In [15]:
# Set up a figure for animation
fig, ax = plt.subplots()
ax.scatter(X_sample, y_sample, s=1)
line, = ax.plot(X_sample, X_sample.dot(theta) + bias, color='red')
ax.set_ylim(0, 5.5)
ax.set_xlabel("Median Income")
ax.set_ylabel("Median House Value")
ax.set_title("California Housing Data with Linear Regression Fit")
plt.close() # Prevent static plot from showing

### 4.3. Training and Animation Function

In [16]:
# Function to update the line for animation
def update_line(frame):
    global theta, bias, X_shuffled, y_shuffled
    
    # At the start of each epoch, record the loss and reshuffle the sample
    if frame % n_batches == 0:
        # Loss over the full sample, measured before this epoch's updates
        y_pred_full = X_sample.dot(theta) + bias
        loss = np.mean((y_pred_full - y_sample.reshape(-1, 1))**2)
        epoch = frame // n_batches
        
        epoch_list.append(epoch)
        loss_list.append(loss)
        
        indices = np.random.permutation(X_sample.shape[0])
        X_shuffled = X_sample[indices]
        y_shuffled = y_sample[indices]
    
    # Get the current batch
    batch_index = frame % n_batches
    start = batch_index * batch_size
    end = min(start + batch_size, X_sample.shape[0])
    X_batch = X_shuffled[start:end]
    y_batch = y_shuffled[start:end]
    
    # Add bias term to the input features
    X_batch_b = np.c_[np.ones((X_batch.shape[0], 1)), X_batch]
    
    # Make predictions
    y_pred = X_batch_b.dot(np.array([bias, theta.item()]).reshape(-1, 1))
    
    # Compute gradients
    error = y_pred - y_batch.reshape(-1, 1)
    gradients = 2 / X_batch_b.shape[0] * X_batch_b.T.dot(error)
    
    # Update parameters
    bias -= learning_rate * gradients[0, 0]
    theta -= learning_rate * gradients[1:, 0].reshape(-1, 1)
    
    # Update the line in the plot
    line.set_ydata(X_sample.dot(theta) + bias)
    return line,
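
Because `update_line` relies on global state, it can be hard to test on its own. As a sketch, the same gradient step can be written as a pure function. The name `sgd_step` and the synthetic data below are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np

def sgd_step(theta, bias, X_batch, y_batch, learning_rate):
    """Apply one mini-batch gradient step for MSE loss; returns new (theta, bias)."""
    y_col = y_batch.reshape(-1, 1)
    error = X_batch.dot(theta) + bias - y_col      # shape (m, 1)
    m = X_batch.shape[0]
    grad_theta = 2.0 / m * X_batch.T.dot(error)    # shape (n_features, 1)
    grad_bias = 2.0 / m * error.sum()
    return theta - learning_rate * grad_theta, bias - learning_rate * grad_bias

# Synthetic, noise-free data (an assumption for illustration): y = 3x + 1
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 2, size=(200, 1))
y_demo = 3.0 * X_demo[:, 0] + 1.0

theta_demo, bias_demo = np.array([[0.0]]), 0.0
for _ in range(2000):
    # Use the whole demo set as a single batch to keep the example short
    theta_demo, bias_demo = sgd_step(theta_demo, bias_demo, X_demo, y_demo, 0.1)

print(theta_demo.item(), bias_demo)
```

On this noise-free data the parameters should approach the generating values of 3 and 1, which gives a quick sanity check on the gradient formula.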

### 4.4. Run the Animation

In [17]:
# Create and display the animation
n_frames = n_epochs * n_batches
ani = FuncAnimation(fig, update_line, frames=n_frames, blit=True, interval=100)
HTML(ani.to_jshtml())

In [18]:
# Print final parameters and the last recorded loss (measured at the start of the final epoch)
print("Final parameters:")
print("Weight (theta):", theta)
print("Bias:", bias)
if loss_list:
    print("Final loss (MSE):", loss_list[-1])

Final parameters:
Weight (theta): [[0.45467172]]
Bias: 0.2932584629922606
Final loss (MSE): 0.6417800533823416


## 5. Loss Visualization

Let's plot the mean squared error (MSE) recorded at the start of each epoch to see how the fit improved during training.

In [19]:
# Create a new figure for the loss plot
fig_loss, ax_loss = plt.subplots()
ax_loss.plot(epoch_list, loss_list)
ax_loss.set_xlabel("Epoch")
ax_loss.set_ylabel("Loss (MSE)")
ax_loss.set_title("Loss per Epoch")
ax_loss.grid(True)
ax_loss.set_xlim(-1, n_epochs)
plt.show()
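
A useful reference point for the loss curve is the lowest MSE that any straight line can reach on the training sample. Linear regression with an MSE loss is a convex problem, so that minimum can be computed directly with ordinary least squares. This is a different technique from the gradient descent used in the notebook, shown here only as a sketch. The synthetic data stands in for `X_sample` and `y_sample` and is an assumption for illustration.

```python
import numpy as np

# Synthetic stand-in for (X_sample, y_sample); an assumption for illustration
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(500, 1))
y_demo = 0.4 * X_demo[:, 0] + 0.5 + rng.normal(0, 0.5, size=500)

# Prepend a column of ones so the first coefficient acts as the bias
X_b = np.c_[np.ones((X_demo.shape[0], 1)), X_demo]

# Solve the least-squares problem directly
coef, *_ = np.linalg.lstsq(X_b, y_demo, rcond=None)
bias_ls, theta_ls = coef

# MSE of the least-squares fit: the floor that gradient descent approaches
mse_ls = np.mean((X_b.dot(coef) - y_demo) ** 2)
print(f"bias={bias_ls:.3f}, theta={theta_ls:.3f}, MSE={mse_ls:.3f}")
```

Applied to the real sample, a horizontal line at `mse_ls` on the loss plot would show how far the gradient descent run still is from the best achievable fit.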

## 6. 3D Visualization with Two Features

Now, let's extend our model to use two features: Median Income (`MedInc`) and Average Rooms (`AveRooms`). This will allow us to visualize the regression as a 3D plane.
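
Written out, the two-feature model is the equation of a plane. The symbols $x_1$ (for `MedInc`) and $x_2$ (for `AveRooms`) are notation introduced here. The second form matches how the training code stacks the parameters as `[bias, theta1, theta2]` and prepends a column of ones to the features.

```latex
\hat{y} = \theta_1 x_1 + \theta_2 x_2 + b
        = \begin{bmatrix} 1 & x_1 & x_2 \end{bmatrix}
          \begin{bmatrix} b \\ \theta_1 \\ \theta_2 \end{bmatrix}
```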

### 6.1. Data Preparation for 3D

In [20]:
# Select two features and the target
X_3d = housing.data[["MedInc", "AveRooms"]].to_numpy()
y_3d = housing.target.to_numpy()

# Create a smaller sample for faster training
sample_size = 1000
np.random.seed(42) # for reproducibility
indices = np.random.choice(X_3d.shape[0], sample_size, replace=False)
X_3d = X_3d[indices]
y_3d = y_3d[indices]

print("---> Data Loaded ---")
print(f"X shape: {X_3d.shape}")
print(f"y shape: {y_3d.shape}")

---> Data Loaded ---
X shape: (1000, 2)
y shape: (1000,)


### 6.2. Model Training for 3D

In [21]:
print("\n---> Starting Model Training ---")

# Hyperparameters
learning_rate = 0.01
n_epochs = 30
batch_size = 100
n_batches = int(np.ceil(X_3d.shape[0] / batch_size))

# Initial parameters (2 weights for 2 features)
np.random.seed(42) # for reproducibility
theta_3d = np.random.randn(2, 1)
bias_3d = np.random.randn(1)

# --- Training Loop ---
for epoch in range(n_epochs):
    # Shuffle data at the start of each epoch
    shuffle_indices = np.random.permutation(X_3d.shape[0])
    X_shuffled = X_3d[shuffle_indices]
    y_shuffled = y_3d[shuffle_indices]

    for i in range(n_batches):
        # Get batch
        start = i * batch_size
        end = min(start + batch_size, X_3d.shape[0])
        X_batch = X_shuffled[start:end]
        y_batch = y_shuffled[start:end]

        # Add bias term to features
        X_batch_b = np.c_[np.ones((X_batch.shape[0], 1)), X_batch]

        # Combine all parameters into one vector [bias, theta1, theta2]
        params = np.vstack([bias_3d, theta_3d])

        # Make predictions and calculate error
        y_pred = X_batch_b.dot(params)
        error = y_pred - y_batch.reshape(-1, 1)

        # Compute gradients
        gradients = 2 / X_batch.shape[0] * X_batch_b.T.dot(error)

        # Update parameters
        bias_3d -= learning_rate * gradients[0, 0]
        theta_3d -= learning_rate * gradients[1:, 0].reshape(-1, 1)

    # Print the loss every 5th epoch and after the final epoch
    if epoch % 5 == 0 or epoch == n_epochs - 1:
        y_pred_full = X_3d.dot(theta_3d) + bias_3d
        loss = np.mean((y_pred_full - y_3d.reshape(-1, 1))**2)
        print(f"Epoch: {epoch}, Loss: {loss:.4f}")

print("\n---> Training Complete ---")
print(f"Final Bias: {bias_3d[0]:.4f}")
print(f"Final Thetas: [{theta_3d[0,0]:.4f}, {theta_3d[1,0]:.4f}]")


---> Starting Model Training ---
Epoch: 0, Loss: 0.7363
Epoch: 5, Loss: 0.7192
Epoch: 10, Loss: 0.6998
Epoch: 15, Loss: 0.7006
Epoch: 20, Loss: 0.7142
Epoch: 25, Loss: 0.6993
Epoch: 29, Loss: 0.7602

---> Training Complete ---
Final Bias: 0.7334
Final Thetas: [0.4549, -0.0373]
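
The printed loss does not fall steadily: it is 0.7363 at epoch 0 and 0.7602 at epoch 29. Mini-batch noise combined with features on different scales is one plausible contributor, although the notebook does not investigate this. A common remedy, not used in the original, is to standardize each feature before training. The sketch below uses a hypothetical helper `standardize` and synthetic data, both assumptions for illustration.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance; returns (X_scaled, mean, std)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)   # guard against constant columns
    return (X - mean) / std, mean, std

# Synthetic two-column data with different scales (an assumption for illustration)
rng = np.random.default_rng(0)
X_demo = np.c_[rng.normal(4, 2, size=1000), rng.normal(5, 20, size=1000)]

X_scaled, mean, std = standardize(X_demo)
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
```

If the model is trained on scaled features, any grid used for plotting the plane needs to pass through the same `mean` and `std` before predictions are made.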


### 6.3. Plot the Final 3D Result

In [23]:
print("\n---> Generating 3D Plot ---")

# Create a meshgrid to plot the regression plane
x_surf = np.linspace(X_3d[:, 0].min(), X_3d[:, 0].max(), 10)
y_surf = np.linspace(X_3d[:, 1].min(), X_3d[:, 1].max(), 10)
x_surf, y_surf = np.meshgrid(x_surf, y_surf)

# Calculate the z-values for the plane using the final trained parameters
z_surf = (x_surf * theta_3d[0, 0] + y_surf * theta_3d[1, 0] + bias_3d).squeeze()

# Create the 3D figure
fig = plt.figure(figsize=(12, 9))
ax = fig.add_subplot(111, projection='3d')

# Plot the original data points
ax.scatter(X_3d[:, 0], X_3d[:, 1], y_3d, c='blue', s=5, label='Data Points', alpha=0.6)

# Plot the regression plane
ax.plot_surface(x_surf, y_surf, z_surf, color='red', alpha=0.5, label='Regression Plane')

# Set labels and title
ax.set_xlabel('Median Income', fontsize=10)
ax.set_ylabel('Average Rooms', fontsize=10)
ax.set_zlabel('Median House Value', fontsize=10)
ax.set_title('3D Linear Regression with Two Features', fontsize=14)

plt.show()


---> Generating 3D Plot ---
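
To see what the meshgrid step produces, here is a minimal sketch on a tiny grid. The plane parameters are hypothetical values chosen only for this illustration, not the trained ones.

```python
import numpy as np

# Hypothetical plane parameters, chosen only for this illustration
theta1, theta2, bias = 0.5, -0.1, 1.0

x1 = np.linspace(0.0, 2.0, 3)   # 3 values along the first feature axis
x2 = np.linspace(0.0, 1.0, 2)   # 2 values along the second feature axis
x1_grid, x2_grid = np.meshgrid(x1, x2)

# Broadcasting evaluates the plane at every (x1, x2) grid point
z_grid = theta1 * x1_grid + theta2 * x2_grid + bias

print(x1_grid.shape, z_grid.shape)   # (2, 3) (2, 3)
print(z_grid)
```

With the default indexing, `np.meshgrid` returns arrays of shape `(len(x2), len(x1))`. `plot_surface` expects three 2D arrays of matching shape, which is what the notebook passes as `x_surf`, `y_surf`, and `z_surf`.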
