# **Radial Basis Function (RBF) Network for 2D Regression**

This notebook implements a Radial Basis Function (RBF) network for regression on a synthetic 2D dataset. It covers generating the training and test data, defining the RBF network and the formulas behind it, training the model with a closed-form least-squares solution, and evaluating its performance with visualizations and error metrics.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
np.random.seed(42)

# **1. 2D Synthetic Data Generation**

In this section, we generate a synthetic 2D regression dataset. The target variable is a **Mixture of Gaussians (MoG)**: a weighted sum of three Gaussian kernels over a 2D input grid, which gives a non-linear surface for the RBF network to approximate. The `mog2D_gen` function returns the input features (X) and the corresponding target values (y), together with the meshgrid arrays, and can optionally plot the generated surface in 3D.
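
Written out with the constants used in `mog2D_gen` (amplitudes $a_k$, centres $c_k$ applied to both coordinates, and $\epsilon = 0.3$), the target surface is:
$$y(x_1, x_2) = \sum_{k=1}^{3} a_k \, e^{-\epsilon^2 \left[(x_1 - c_k)^2 + (x_2 - c_k)^2\right]}, \qquad (a_k) = (3, -5, 9), \quad (c_k) = (-8, -1, 7)$$
Each term is the product of two 1D Gaussians, which is the same as a single 2D Gaussian centred at $(c_k, c_k)$.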

In [ ]:
def mog2D_gen(Npatterns=10000, visualize=True):
    """
    Generates a 2D Mixture of Gaussians synthetic dataset

    Parameters:
    - Npatterns: Number of data points (should be a perfect square; otherwise
      int(sqrt(Npatterns))**2 points are generated)
    - visualize: Whether to plot the generated surface

    Returns:
    - X: Input features, shape (n*n, 2) with n = int(sqrt(Npatterns))
    - y: Target values, shape (n*n,)
    - XX1, XX2: Meshgrid coordinates (for visualization)
    - YY: Meshgrid target values (for visualization)
    """
    n = int(np.sqrt(Npatterns))
    x1 = np.linspace(-10, 10, n)
    x2 = np.linspace(-10, 10, n)
    XX1, XX2 = np.meshgrid(x1, x2)

    # Define parameters for three 2D Gaussian kernels
    centre1, centre2, centre3 = -8, -1, 7
    ampl1, ampl2, ampl3 = 3, -5, 9
    epsilon = 0.3

    # Compute each 2D kernel component
    k1 = ampl1 * np.exp(-((XX1 - centre1) * epsilon)**2) * np.exp(-((XX2 - centre1) * epsilon)**2)
    k2 = ampl2 * np.exp(-((XX1 - centre2) * epsilon)**2) * np.exp(-((XX2 - centre2) * epsilon)**2)
    k3 = ampl3 * np.exp(-((XX1 - centre3) * epsilon)**2) * np.exp(-((XX2 - centre3) * epsilon)**2)

    # Combine kernels to form target output
    YY = k1 + k2 + k3

    # Flatten for training data
    X = np.column_stack([XX1.ravel(), XX2.ravel()])
    y = YY.ravel()

    if visualize:
        fig = plt.figure(figsize=(10, 7))
        ax = fig.add_subplot(111, projection='3d')
        surf = ax.plot_surface(XX1, XX2, YY, cmap='viridis')
        fig.colorbar(surf)
        ax.set_title('2D Mixture of Gaussians Synthetic Dataset', fontsize=14)
        ax.set_xlabel('X1', fontsize=12)
        ax.set_ylabel('X2', fontsize=12)
        ax.set_zlabel('y', fontsize=12)
        plt.show()

    return X, y, XX1, XX2, YY


# **2. RBF Network Core Functions**

This section defines the fundamental building blocks of our RBF network. An RBF network consists of an input layer, a hidden layer of RBF units, and a linear output layer. The training process typically involves initializing the RBF centers (e.g., via clustering) and then finding the optimal output layer weights through a least squares method.

## **2.1 Key Formulas**

The RBF network relies on two primary mathematical components:

### **Gaussian Radial Basis Function (RBF)**
This function is used in the hidden layer to measure the similarity between an input vector $\mathbf{x}$ and a center vector $\mathbf{c}$. The output is highest when $\mathbf{x}$ is close to $\mathbf{c}$ and decreases as the distance increases, controlled by the width parameter $\epsilon$:
$$\phi(\mathbf{x}, \mathbf{c}, \epsilon) = e^{-\epsilon^2 \|\mathbf{x} - \mathbf{c}\|^2}$$
where:
* $\mathbf{x}$ is the input vector (e.g., $[x_1, x_2]$ for a 2D input).
* $\mathbf{c}$ is the center vector of the radial basis function for a specific hidden unit.
* $\epsilon$ is the width parameter, controlling the spread of the Gaussian (a larger $\epsilon$ means a narrower function).
* $\|\cdot\|$ denotes the Euclidean norm (distance).
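
To make the role of $\epsilon$ concrete, here is a minimal sketch that evaluates the formula above for a single point. The helper name `gaussian_rbf` and the sample values are illustrative assumptions, not part of the notebook's own functions.

```python
import numpy as np

def gaussian_rbf(x, c, epsilon):
    """Gaussian RBF: exp(-epsilon^2 * ||x - c||^2). Hypothetical helper for illustration."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    return np.exp(-(epsilon ** 2) * np.sum((x - c) ** 2))

centre = [0.0, 0.0]

# At the centre the distance is zero, so the activation is exactly 1
at_centre = gaussian_rbf([0.0, 0.0], centre, epsilon=0.3)

# The point (3, 4) lies at Euclidean distance 5 from the centre
wide = gaussian_rbf([3.0, 4.0], centre, epsilon=0.1)    # exp(-0.01 * 25) = exp(-0.25)
narrow = gaussian_rbf([3.0, 4.0], centre, epsilon=0.3)  # exp(-0.09 * 25) = exp(-2.25)

print(at_centre, wide, narrow)
```

The same point receives a much smaller activation with the larger $\epsilon$, which is what "narrower" means in practice.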

### **Network Output (Linear Output Layer)**
The hidden layer's outputs, which are the activations of each RBF unit for a given input, are then linearly combined to produce the final network output. A bias term is typically included. For a set of $N$ input patterns, the predictions $\hat{\mathbf{Y}}$ are calculated as:
$$\mathbf{\hat{Y}} = \mathbf{H}_{ext} \mathbf{W}_2$$
where:
* $\mathbf{\hat{Y}}$ is the column vector of predicted outputs (N x 1).
* $\mathbf{H}_{ext}$ is the matrix of extended hidden layer outputs (N x (Number of Hidden Units + 1)). Each row corresponds to an input sample, with the first column being a bias of 1, and subsequent columns being the activations of the RBF units for that input.
* $\mathbf{W}_2$ is the column vector of output weights ((Number of Hidden Units + 1) x 1), including the bias weight.
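
The shapes above can be checked on a toy problem. The following sketch builds $\mathbf{H}_{ext}$ for a handful of random 2D points, generates targets from known weights, and recovers a least-squares solution in two ways. The centres, $\epsilon$, and weights are made-up values for illustration, and `np.linalg.lstsq` is shown only as a cross-check on the pseudo-inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 2))           # N = 20 samples, 2 features
centres = np.array([[-0.5, -0.5], [0.0, 0.0], [0.5, 0.5]])  # 3 hidden units (assumed values)
epsilon = 1.0                                      # assumed width parameter

# Squared Euclidean distance between every sample and every centre: shape (N, 3)
sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
H = np.exp(-(epsilon ** 2) * sq_dist)

# Prepend the bias column of ones: shape (N, 3 + 1)
H_ext = np.column_stack([np.ones(len(X)), H])

# Targets generated from known weights, so an exact fit exists
true_W2 = np.array([[0.5], [2.0], [-1.0], [3.0]])
y = H_ext @ true_W2

# Two ways to obtain the least-squares weights
W2_pinv = np.linalg.pinv(H_ext) @ y
W2_lstsq, *_ = np.linalg.lstsq(H_ext, y, rcond=None)

y_hat = H_ext @ W2_pinv
print(H_ext.shape, W2_pinv.shape, y_hat.shape)
```

Because the targets lie in the column space of $\mathbf{H}_{ext}$, the fitted $\hat{\mathbf{Y}}$ reproduces them up to floating-point error.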

In [ ]:
def RBF_train_offline(X, y, model_params):
    """
    Trains an RBF network offline (batch mode).
    This function performs three main steps:
    1. Initializes the RBF centers using either random selection or k-means clustering.
    2. Computes the hidden-layer (RBF) activations and prepends a bias column.
    3. Computes the output layer weights (W2) by solving a least squares problem.
    """
    n_hidden = model_params['n_hidden']
    n_features = model_params['n_features']

    # Step 1: Initialize RBF centers (W1)
    if model_params['centres_generation_method'] == 'random':
        # Randomly select training points as centers
        idx = np.random.choice(X.shape[0], n_hidden, replace=False)
        W1 = X[idx, :]
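    elif model_params['centres_generation_method'] == 'clustering':
        # Use k-means clustering to find centers
        from sklearn.cluster import KMeans
        print('Clustering training data using K-means to initialize centers...')
        # n_init=10 runs k-means from 10 initializations and keeps the best result
        kmeans = KMeans(n_clusters=n_hidden, random_state=42, n_init=10).fit(X)
        W1 = kmeans.cluster_centers_
        print('K-means clustering completed.')
    else:
        # Fail early instead of hitting an undefined W1 further down
        raise ValueError("centres_generation_method must be 'random' or 'clustering'")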

    # Step 2: Compute RBF activations (H)
    epsilon = model_params['epsilon']
    # Calculate Euclidean distance between each input sample and each center
    distances = np.sqrt(((X[:, np.newaxis, :] - W1[np.newaxis, :, :])**2).sum(axis=2))
    # Apply Gaussian RBF activation function
    H = np.exp(-(distances * epsilon)**2)

    # Add bias term to the hidden layer output matrix (H_ext)
    H = np.column_stack([np.ones(X.shape[0]), H])

    # Step 3: Solve for output weights (W2) using least squares
    # With full column rank this equals W2 = (H_ext^T * H_ext)^-1 * H_ext^T * y;
    # np.linalg.pinv (Moore-Penrose pseudo-inverse) also covers the rank-deficient case
    W2 = np.linalg.pinv(H) @ y.reshape(-1, 1)

    return model_params, W1, W2


In [ ]:
def RBF_predict(X, W1, W2, epsilon):
    """
    Makes predictions using a trained RBF network.
    This function performs the forward pass through the RBF network to compute outputs.
    """
    # Compute RBF activations for the given input X
    # Calculate Euclidean distance between each input sample and each center
    distances = np.sqrt(((X[:, np.newaxis, :] - W1[np.newaxis, :, :])**2).sum(axis=2))
    # Apply Gaussian RBF activation function
    H = np.exp(-(distances * epsilon)**2)

    # Add bias term to the hidden layer output matrix
    H = np.column_stack([np.ones(X.shape[0]), H])

    # Compute predictions by multiplying hidden layer outputs with output weights
    y_pred = H @ W2

    return y_pred

# **3. Main Execution: 2D RBF Network Training and Evaluation**

This section runs the full RBF network workflow: it generates the 2D synthetic data, sets the model's configuration parameters, trains the network, and evaluates its performance with visualizations and a quantitative metric, the Mean Squared Error (MSE).
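
For reference, the MSE computed in the metrics cell is the average squared residual over the $N$ samples of a set, where $y_i$ is the true target and $\hat{y}_i$ the network's prediction:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2$$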

### Data Generation and Visualization

In [ ]:
# Generate training and test data
Ntrain = 150*150  # 22500 points (150x150 grid)
Ntest = 2500      # 2500 points (50x50 grid)

Xtrain, ytrain, XX1_train, XX2_train, YY_train = mog2D_gen(Ntrain, True)
Xtest, ytest, XX1_test, XX2_test, YY_test = mog2D_gen(Ntest, False)

# Shuffle the training data (the least-squares solve itself is order-independent;
# shuffling would matter mainly if mini-batch training were used)
shuffled_ind = np.random.permutation(Xtrain.shape[0])
Xtrain = Xtrain[shuffled_ind, :]
ytrain = ytrain[shuffled_ind]

### Model Configuration

In [ ]:
# Initialize model parameters
model = {
    'n_output': 1,              # Single output regression
    'n_features': 2,            # 2D input (X1, X2)
    'n_hidden': 50,             # Number of RBF centers (hidden units)
    'epsilon': 0.1,             # RBF width parameter: larger values give narrower RBFs
    'centres_generation_method': 'clustering'  # Method to initialize RBF centers: 'random' or 'clustering'
}

### Model Training

In [ ]:
# Train the RBF network using the offline (batch) method
print("\nStarting RBF network training...")
model, W1, W2 = RBF_train_offline(Xtrain, ytrain, model)
print("RBF network training complete.")


### Prediction and Evaluation

In [ ]:
# Make predictions on both the training and test datasets
ytrain_pred = RBF_predict(Xtrain, W1, W2, model['epsilon'])
ytest_pred = RBF_predict(Xtest, W1, W2, model['epsilon'])

# Reshape predictions to match ytrain/ytest shapes for plotting
ytrain_pred = ytrain_pred.ravel()  # Flatten to (N_samples,)
ytest_pred = ytest_pred.ravel()    # Flatten to (N_samples,)

# Visualize results: 3D scatter plots of true vs predicted values
fig = plt.figure(figsize=(14, 6))

# Training set predictions visualization
ax1 = fig.add_subplot(121, projection='3d')
ax1.scatter(Xtrain[:, 0], Xtrain[:, 1], ytrain, c='b', label='True values', alpha=0.6)
ax1.scatter(Xtrain[:, 0], Xtrain[:, 1], ytrain_pred, c='r', alpha=0.3, label='Predictions')
ax1.set_title('Training Set: True vs Predicted', fontsize=12)
ax1.set_xlabel('X1', fontsize=10)
ax1.set_ylabel('X2', fontsize=10)
ax1.set_zlabel('y', fontsize=10)
ax1.legend()

# Test set predictions visualization
ax2 = fig.add_subplot(122, projection='3d')
ax2.scatter(Xtest[:, 0], Xtest[:, 1], ytest, c='b', label='True values', alpha=0.6)
ax2.scatter(Xtest[:, 0], Xtest[:, 1], ytest_pred, c='r', alpha=0.3, label='Predictions')
ax2.set_title('Test Set: True vs Predicted', fontsize=12)
ax2.set_xlabel('X1', fontsize=10)
ax2.set_ylabel('X2', fontsize=10)
ax2.set_zlabel('y', fontsize=10)
ax2.legend()

plt.tight_layout()
plt.show()

# Scatter plots of true vs predicted values (2D view)
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(ytrain, ytrain_pred, '.b', alpha=0.5) # x-axis: True y, y-axis: Predicted y
plt.title('Training Set Predictions (True vs Predicted)', fontsize=12)
plt.xlabel('True y', fontsize=10)
plt.ylabel('Predicted y', fontsize=10)
plt.grid(True)
plt.axis('equal') # Equal scaling on both axes, so a perfect fit lies on the 45-degree line

plt.subplot(1, 2, 2)
plt.plot(ytest, ytest_pred, '.r', alpha=0.5) # x-axis: True y, y-axis: Predicted y
plt.title('Test Set Predictions (True vs Predicted)', fontsize=12)
plt.xlabel('True y', fontsize=10)
plt.ylabel('Predicted y', fontsize=10)
plt.grid(True)
plt.axis('equal')

plt.tight_layout()
plt.show()


### Performance Metrics

In [ ]:
# Calculate Mean Squared Error (MSE) for both training and test sets
mse_train = np.mean((ytrain - ytrain_pred)**2)
mse_test = np.mean((ytest - ytest_pred)**2)

print(f'Training MSE: {mse_train:.4f}')
print(f'Test MSE: {mse_test:.4f}')