Given a dataset X of shape (m, n), with m samples and n features, compute the RBF kernel matrix for all pairs of samples.

In [1]:
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel_matrix(X, gamma):
    """
    Compute the RBF kernel matrix for a given dataset.

    Parameters:
    X (ndarray): Input data matrix of shape (m, n): m samples, n features.
    gamma (float): Kernel coefficient, gamma = 1 / (2 * sigma^2).

    Returns:
    ndarray: RBF kernel matrix of shape (m, m).
    """
    # Compute squared Euclidean distances between all pairs of samples
    sq_dists = cdist(X, X, metric='sqeuclidean')
    
    # Compute the RBF kernel matrix
    return np.exp(-gamma * sq_dists)

# Example usage:
X = np.array([[1, 2], [3, 4], [5, 6]])  # Example dataset (3 samples, 2 features)
gamma = 0.5  # Hyperparameter
K = rbf_kernel_matrix(X, gamma)
print(K)  # Output: RBF kernel matrix


[[1.00000000e+00 1.83156389e-02 1.12535175e-07]
 [1.83156389e-02 1.00000000e+00 1.83156389e-02]
 [1.12535175e-07 1.83156389e-02 1.00000000e+00]]


Explanation:

1. Pairwise Squared Euclidean Distance:

   - We use scipy.spatial.distance.cdist(X, X, metric='sqeuclidean') to compute all pairwise squared Euclidean distances in a single call.

   - This creates an (m, m) matrix whose entry (i, j) is the squared distance between samples i and j.
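
As a rough cross-check of the distance step, the sketch below compares the cdist result with a NumPy-only computation based on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b. The variable names sq_norms and sq_dists_np are introduced here for illustration only and are not part of the original code.

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# Reference result from SciPy: shape (m, m)
sq_dists = cdist(X, X, metric='sqeuclidean')

# NumPy-only alternative: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
sq_norms = np.sum(X ** 2, axis=1)
sq_dists_np = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)

# Floating-point cancellation can leave tiny negative values; clip them to zero
sq_dists_np = np.maximum(sq_dists_np, 0.0)

print(sq_dists.shape)                      # (3, 3)
print(np.allclose(sq_dists, sq_dists_np))  # True
```

The expansion avoids forming every difference vector explicitly, but it can lose precision when samples are very close together, which is one reason to prefer cdist when it is available.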
     
2. Compute the RBF Kernel:

    - Apply the RBF formula to each pair of samples:
  
      K(i, j) = exp(−γ ||X_i − X_j||^2)

    - The resulting matrix is symmetric and positive semi-definite, with ones on the diagonal because every sample is at distance zero from itself.
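
The symmetry and positive semi-definiteness claims can be checked numerically on the example data. This is only a sanity-check sketch: the tolerance of 1e-10 is an arbitrary assumption, and the names is_symmetric, is_psd, and eigenvalues are introduced here for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
gamma = 0.5
K = np.exp(-gamma * cdist(X, X, metric='sqeuclidean'))

# Symmetry: K should equal its transpose
is_symmetric = np.allclose(K, K.T)

# Positive semi-definiteness: all eigenvalues of the symmetric matrix K
# should be non-negative, up to a small floating-point tolerance
eigenvalues = np.linalg.eigvalsh(K)
is_psd = bool(np.all(eigenvalues >= -1e-10))

print(is_symmetric, is_psd)  # True True
```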
  
3. Time Complexity:

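     - O(m^2 n) time for the pairwise distance computation, plus O(m^2) memory to store the resulting (m, m) matrix. This is practical for moderately sized datasets; as m grows, holding the full matrix in memory tends to become the limiting factor.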
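
Because the kernel matrix has m^2 entries, one possible way to limit peak memory is to produce it one block of rows at a time. The sketch below assumes a hypothetical helper, rbf_kernel_row_blocks, and a hypothetical block_size parameter; neither is part of the original code. Total work is unchanged; only the size of each intermediate array is reduced.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel_row_blocks(X, gamma, block_size=1024):
    """Yield (start, K_block), where K_block holds rows start:start+block_size of K."""
    m = X.shape[0]
    for start in range(0, m, block_size):
        stop = min(start + block_size, m)
        # Distances from this block of samples to every sample: shape (stop - start, m)
        sq_dists = cdist(X[start:stop], X, metric='sqeuclidean')
        yield start, np.exp(-gamma * sq_dists)

# Check on small random data that the blocks reassemble into the full matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
gamma = 0.5
K_full = np.exp(-gamma * cdist(X, X, metric='sqeuclidean'))
K_blocks = np.vstack([block for _, block in rbf_kernel_row_blocks(X, gamma, block_size=4)])

print(np.allclose(K_full, K_blocks))  # True
```

In practice each block would be consumed (for example, multiplied by a vector or written to disk) rather than stacked back together, since stacking rebuilds the full matrix.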