In this note, we are interested in calculating the kernel between two feature vectors $x^{(i)}$ and $x^{(j)}$. 

We assume the kernel has the form:

$
k(a, b) = ||a-b||^2
$

Typically, the feature vectors are stacked as the rows of design matrices:
$
A = \left(\begin{matrix} (a^{(1)})^T \\  (a^{(2)})^T \\ \vdots \\ (a^{(N_a)})^T\end{matrix} \right)
$

$
B = \left(\begin{matrix} (b^{(1)})^T \\  (b^{(2)})^T \\ \vdots \\ (b^{(N_b)})^T\end{matrix} \right)
$

where $A$ is an $N_a \times p$ matrix and $B$ is an $N_b \times p$ matrix. To evaluate the kernel for every pair of rows, one from $A$ and one from $B$, we need the $N_a \times N_b$ matrix $K(A, B)$ with entries $K_{i,j} = || a^{(i)}  - b^{(j)} ||^2$.
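
The function `kernel_vectorized` defined later in this note relies on expanding the squared norm, a step worth writing out since it is what turns the double loop into matrix operations:

$
||a - b||^2 = (a-b)^T (a-b) = a^T a - 2 a^T b + b^T b = ||a||^2 + ||b||^2 - 2 a^T b
$

Applied to every pair of rows, this gives

$
K_{i,j} = ||a^{(i)}||^2 + ||b^{(j)}||^2 - 2 (A B^T)_{i,j}
$

so $K(A, B)$ is the sum of a column of squared row norms of $A$, a row of squared row norms of $B$, and $-2 A B^T$.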


In [1]:
import datetime
import numpy as np


In [2]:
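def kernel_naive(A, B):
    """Squared Euclidean distance between every row of A and every row of B."""
    # K has one row per row of A and one column per row of B
    Nrow, Ncol = A.shape[0], B.shape[0]
    K = np.zeros((Nrow, Ncol))
    for i in range(Nrow):
        for j in range(Ncol):
            a = A[i, :]
            b = B[j, :]
            K[i, j] = np.sum((a-b)**2)
    return K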


In [3]:
def kernel_vectorized(A, B):
    A2 = np.sum(A**2, axis=1).reshape(-1, 1)  # squared row norms of A, shape (N_a, 1)
    B2 = np.sum(B**2, axis=1)  # squared row norms of B, shape (N_b,); broadcasts as a row
    AdotB = np.dot(A, B.T)  # inner products, shape (N_a, N_b)
    K = A2 + B2 - 2*AdotB
    return K

# Note that the A2 + B2 operation above relies on NumPy broadcasting to expand the mismatched dimensions.
# A more explicit equivalent is:
#   N_a = A.shape[0]
#   N_b = B.shape[0]
#   np.tile(A2, reps=(1, N_b)) + np.tile(B2, reps=(N_a, 1))
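
The broadcasting note above can be checked on small arrays. The following is a minimal sketch, assuming tiny random inputs whose shapes (3 x 4 and 5 x 4) are chosen only for illustration; it compares the broadcast sum with the explicit `np.tile` version.

```python
import numpy as np

# Small illustrative inputs (shapes are arbitrary choices for this sketch)
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(5, 4))

A2 = np.sum(A**2, axis=1).reshape(-1, 1)  # shape (3, 1)
B2 = np.sum(B**2, axis=1)                 # shape (5,)

# Broadcasting stretches A2 along columns and B2 along rows
broadcast_sum = A2 + B2

# Explicit version: materialize both operands at full size before adding
N_a, N_b = A.shape[0], B.shape[0]
tiled_sum = np.tile(A2, reps=(1, N_b)) + np.tile(B2, reps=(N_a, 1))

print(broadcast_sum.shape)
print(np.allclose(broadcast_sum, tiled_sum))
```

Broadcasting avoids allocating the two tiled intermediates, which matters more as $N_a$ and $N_b$ grow.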

In [4]:
p = 50
Ntrain = 4000
Ntest = 1000

Xtrain = np.random.normal(loc=4, scale=1.5, size=(Ntrain, p))
Xtest = np.random.normal(loc=4, scale=1.5, size=(Ntest, p))


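The demos below show the dramatic performance difference between the naive implementation, which loops over the elements one by one, and the vectorized implementation, which uses matrix operations. In the runs recorded here, the vectorized version is several hundred times faster.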
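
Single-shot wall-clock timings such as the ones in this demo can be noisy. As a hedged alternative, the sketch below uses `time.perf_counter` and keeps the best of a few repeats. The helper name `time_call` and the `repeats` parameter are hypothetical and are not used elsewhere in this note.

```python
import time
import numpy as np

def time_call(func, *args, repeats=3):
    """Call func(*args) several times; return (best elapsed seconds, last result)."""
    best = float('inf')
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result

# Example: time a small matrix product (sizes are arbitrary for this sketch)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
elapsed, G = time_call(np.dot, X, X.T)
print('best of 3: {0:.6f} s'.format(elapsed))
```

Taking the minimum over repeats reduces the influence of one-off delays such as cache warm-up or other processes on the machine.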

In [9]:
DIFF_EPSILON = 1e-10

# K11
print('calculate K11:')
start_time = datetime.datetime.now()
K_naive = kernel_naive(Xtrain, Xtrain)
elapsed_time_naive = datetime.datetime.now() - start_time
print('naive: {0:}'.format(elapsed_time_naive))

start_time = datetime.datetime.now()
K_vector = kernel_vectorized(Xtrain, Xtrain)
elapsed_time_vector = datetime.datetime.now() - start_time
print('vectorized: {0:}'.format(elapsed_time_vector))

print(np.all(np.abs(K_naive - K_vector) < DIFF_EPSILON))
ratio = elapsed_time_naive/elapsed_time_vector
print('boost when calculating K11: {0:.1f}'.format(ratio))
print('')


# K12
print('calculate K12:')
start_time = datetime.datetime.now()
K_naive = kernel_naive(Xtrain, Xtest)
elapsed_time_naive = datetime.datetime.now() - start_time
print('naive: {0:}'.format(elapsed_time_naive))

start_time = datetime.datetime.now()
K_vector = kernel_vectorized(Xtrain, Xtest)
elapsed_time_vector = datetime.datetime.now() - start_time
print('vectorized: {0:}'.format(elapsed_time_vector))

print(np.all(np.abs(K_naive - K_vector) < DIFF_EPSILON))
ratio = elapsed_time_naive/elapsed_time_vector
print('boost when calculating K12: {0:.1f}'.format(ratio))
print('')


# K22
print('calculate K22:')
start_time = datetime.datetime.now()
K_naive = kernel_naive(Xtest, Xtest)
elapsed_time_naive = datetime.datetime.now() - start_time
print('naive: {0:}'.format(elapsed_time_naive))

start_time = datetime.datetime.now()
K_vector = kernel_vectorized(Xtest, Xtest)
elapsed_time_vector = datetime.datetime.now() - start_time
print('vectorized: {0:}'.format(elapsed_time_vector))

print(np.all(np.abs(K_naive - K_vector) < DIFF_EPSILON))
ratio = elapsed_time_naive/elapsed_time_vector
print('boost when calculating K22: {0:.1f}'.format(ratio))
print('')


calculate K11:
naive: 0:01:17.933787
vectorized: 0:00:00.276547
True
boost when calculating K11: 281.8

calculate K12:
naive: 0:00:22.500848
vectorized: 0:00:00.052054
True
boost when calculating K12: 432.3

calculate K22:
naive: 0:00:05.249574
vectorized: 0:00:00.016570
True
boost when calculating K22: 316.8
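
One caveat about the vectorized form: the expansion subtracts numbers of similar size when two rows are close, so floating-point rounding can leave entries that are slightly off from zero, or even slightly negative, where the naive loop gives exactly zero (for example on the diagonal of $K(X, X)$). The check above tolerates this through `DIFF_EPSILON`. If nonnegative values are required downstream, one option is to clip at zero. The following is a minimal sketch; the name `kernel_vectorized_clipped` is hypothetical.

```python
import numpy as np

def kernel_vectorized_clipped(A, B):
    """Squared distances via the norm expansion, clipped to be nonnegative."""
    A2 = np.sum(A**2, axis=1).reshape(-1, 1)  # shape (N_a, 1)
    B2 = np.sum(B**2, axis=1)                 # shape (N_b,)
    K = A2 + B2 - 2 * np.dot(A, B.T)
    # Rounding error can push true zeros slightly below zero; clip them
    return np.maximum(K, 0.0)

# Same distribution as the demo data, smaller size (an arbitrary choice)
rng = np.random.default_rng(0)
X = rng.normal(loc=4, scale=1.5, size=(200, 50))
K = kernel_vectorized_clipped(X, X)

print(K.min() >= 0)
print(np.abs(np.diag(K)).max())  # largest deviation of the diagonal from 0
```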

