<a href="https://colab.research.google.com/github/aasem/cvisionmcs/blob/main/linearclassification_losses.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Linear Classification (SVM and Softmax) for CIFAR10 Dataset
**Preliminary Stuff - Definitions, Data Access and Visualization**


In [8]:
!git clone https://github.com/aasem/cvisionmcs
# Definitions
import numpy as np
import matplotlib.pyplot as plt
import sys
# sys.path.append('/content/cvisionmcs')
from cvisionmcs import data_utils
from cvisionmcs import download
%matplotlib inline

fatal: destination path 'cvisionmcs' already exists and is not an empty directory.


**Downloading the Dataset**

In [9]:
url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
download_dir = "./data"
download.maybe_download_and_extract(url,download_dir)

Data has apparently already been downloaded and unpacked.


**Converting Raw Files into Training and Test Datasets**

In [25]:
cifar10_dir = './data/cifar-10-batches-py'
X_train, y_train, X_test, y_test = data_utils.load_CIFAR10(cifar10_dir)

# Checking the size of the training and testing data
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)


**Reshaping and Appending the Bias**

In [26]:
# reshaping data and placing into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
print(X_train.shape, X_test.shape)

(50000, 3072) (10000, 3072)


In [27]:
# append a column of ones for the bias, then transpose so each sample is a column
X_train = np.append(X_train, np.ones((X_train.shape[0],1)), axis=1)
X_test = np.append(X_test, np.ones((X_test.shape[0],1)), axis=1)
X_train = np.transpose(X_train)
X_test = np.transpose(X_test)
print(X_train.shape, X_test.shape)

(3073, 50000) (3073, 10000)
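
The appended row of ones is what lets the bias live inside the weight matrix. The following is a minimal sketch on synthetic data, not part of the original notebook; the names `W_ext`, `X_ext` and the array sizes are illustrative assumptions. It checks that the extended product reproduces the explicit affine scores.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, N = 10, 3072, 5             # classes, features, samples (illustrative sizes)

W = rng.standard_normal((K, D))   # weights without bias
b = rng.standard_normal((K, 1))   # one bias per class
X = rng.standard_normal((D, N))   # one sample per column

# Explicit affine scores: W x + b
scores_affine = W.dot(X) + b

# Bias trick: append b as an extra column of W and a row of ones to X
W_ext = np.hstack([W, b])                 # (K, D+1)
X_ext = np.vstack([X, np.ones((1, N))])   # (D+1, N)
scores_trick = W_ext.dot(X_ext)

bias_trick_matches = np.allclose(scores_affine, scores_trick)
print(bias_trick_matches)
```

With this layout the last column of the extended weight matrix plays the role of $b$, which matches the `(3073, N)` shapes printed above.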


**Computing SVM Loss**

The loss for the $i$th example is given by:

$$L_i = \sum_{j\neq y_i} \max(0, s_j - s_{y_i} + \Delta)$$

where $s_j$ is the score for the $j$th class and $s_{y_i}$ is the score for the true class $y_i$. For a linear classifier, the score vector is

$$s = \mathbf{W}x_{i}+b$$ which reduces to

$$s = \mathbf{W}x_{i}$$

if we embed the bias $b$ within the weight matrix $\mathbf{W}$ (and append a constant 1 to $x_i$); $s_j$ is the $j$th entry of $s$. We keep $\Delta = 1$ as a fixed parameter. Averaging the losses over all $N$ examples and adding the regularization term gives the total loss:

$$L=\underbrace{\frac{1}{N}\sum_{i}L_{i}}_{\text{data loss}}+\underbrace{\lambda R(\mathbf{W})}_{\text{regularization loss}}$$

where

$$R(\mathbf{W}) = \sum_k\sum_l\mathbf{W}_{k,l}^2$$

and $\lambda$ is a hyperparameter chosen during validation.
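
As a small worked example of the per-example hinge loss $L_i$, here is a sketch with made-up scores for three classes (the numbers and the true class are assumptions chosen for illustration). Only the class whose score comes within $\Delta$ of the true-class score, or exceeds it, contributes.

```python
import numpy as np

scores = np.array([3.2, 5.1, -1.7])  # hypothetical class scores s_j for one sample
y_i = 0                               # hypothetical true class index
delta = 1.0

# max(0, s_j - s_{y_i} + delta) for every class j
margins = np.maximum(0.0, scores - scores[y_i] + delta)
margins[y_i] = 0.0                    # the sum excludes j == y_i
L_i = margins.sum()
print(L_i)  # approximately 2.9
```

Class 1 contributes $5.1 - 3.2 + 1 = 2.9$, while class 2 gives $-1.7 - 3.2 + 1 < 0$ and is clipped to zero.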


In [28]:
def loss_svm(W, X, y, r_lambda):
    """
    Compute the SVM loss.
    
    Input Parameters
    ----------
    W: (K, D+1) array of weights, where K is the number of classes and D is the sample dimension; the extra column holds the bias
    X: (D+1, N) array of training data, each column is a D-dimensional training sample with a 1 appended for the bias
    y: (N, ) 1-dimensional array of N target labels 0, 1, ..., K-1 for K classes
    r_lambda: (float) regularization strength for optimization.

    Returns
    -------
    loss: (float)
    """
    
    # initialization
    loss = 0.0
    delta = 1.0
    N = y.shape[0]

    # compute all scores s_j
    scores = W.dot(X) # [K x N] matrix
 
    # get the true class score 
    true_class_score = scores[y, range(N)] # [1 x N]
    
    margins = scores - true_class_score + delta # [K x N]

    # threshold the margins to max(0, -)
    margins = np.maximum(0, margins)
    margins[y, range(N)] = 0 # neglect the true class scores

    # average the data loss over the N samples
    loss = np.sum(margins) / N

    # add L2 regularization; the 0.5 factor only rescales r_lambda relative to lambda * R(W)
    loss += 0.5 * r_lambda * np.sum(W * W)
   
    return loss
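
A vectorized loss is easy to get subtly wrong, so one option is to cross-check it against a slow loop version on a tiny synthetic problem. The sketch below is not from the original notebook: `loss_svm_naive` is a hypothetical helper, the problem sizes are arbitrary, and the vectorized part simply repeats the same steps as the function above, including its `0.5 * r_lambda` convention.

```python
import numpy as np

def loss_svm_naive(W, X, y, r_lambda, delta=1.0):
    """Loop-based SVM loss; hypothetical helper for cross-checking only."""
    N = y.shape[0]
    K = W.shape[0]
    loss = 0.0
    for i in range(N):
        s = W.dot(X[:, i])            # scores for sample i, shape (K,)
        for j in range(K):
            if j == y[i]:
                continue              # skip the true class
            loss += max(0.0, s[j] - s[y[i]] + delta)
    loss /= N
    loss += 0.5 * r_lambda * np.sum(W * W)
    return loss

# Small synthetic problem (sizes are arbitrary)
rng = np.random.default_rng(0)
K, D1, N = 4, 6, 8
W = rng.standard_normal((K, D1))
X = rng.standard_normal((D1, N))
y = rng.integers(0, K, size=N)

# Vectorized computation following the same steps as the function above
idx = np.arange(N)
scores = W.dot(X)
margins = np.maximum(0, scores - scores[y, idx] + 1.0)
margins[y, idx] = 0
loss_vec = margins.sum() / N + 0.5 * 0.1 * np.sum(W * W)

loss_loop = loss_svm_naive(W, X, y, 0.1)
print(np.isclose(loss_loop, loss_vec))
```

The two values should agree up to floating-point rounding; a mismatch usually points to an indexing or broadcasting mistake in the vectorized path.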

**Computing Softmax Loss**


The cross-entropy (softmax) loss for the $i$th example is given by:

$$L_i = -\log\left(\frac{e^{s_{y_i}}}{ \sum_j e^{s_j} }\right)$$ or equivalently

$$L_i = -s_{y_i} + \log\sum_j e^{s_j}$$

where $s_j$ is the score for the $j$th class and $s_{y_i}$ is the score for the true class $y_i$. For a linear classifier, the score vector is

$$s = \mathbf{W}x_{i}+b$$ which reduces to

$$s = \mathbf{W}x_{i}$$

if we embed the bias $b$ within the weight matrix $\mathbf{W}$ (and append a constant 1 to $x_i$); $s_j$ is the $j$th entry of $s$. Unlike the SVM loss, the softmax loss has no margin parameter $\Delta$. Averaging the losses over all $N$ examples and adding the regularization term gives the total loss:

$$L=\underbrace{\frac{1}{N}\sum_{i}L_{i}}_{\text{data loss}}+\underbrace{\lambda R(\mathbf{W})}_{\text{regularization loss}}$$

where

$$R(\mathbf{W}) = \sum_k\sum_l\mathbf{W}_{k,l}^2$$

and $\lambda$ is a hyperparameter chosen during validation.
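
In practice the exponentials can overflow when scores are large, which is why implementations usually subtract the maximum score before exponentiating; this leaves $L_i$ unchanged because the shift cancels in the ratio. The sketch below uses made-up scores (an assumption for illustration) to contrast the naive form with the shifted one.

```python
import numpy as np

s = np.array([1000.0, 1001.0, 999.0])  # hypothetical large scores for one sample
y_i = 1                                 # hypothetical true class index

# Naive form: np.exp(1000.0) overflows to inf, so the ratio becomes nan
with np.errstate(over="ignore", invalid="ignore"):
    naive = -np.log(np.exp(s[y_i]) / np.sum(np.exp(s)))

# Shifted form: subtract max(s) first, then use -s_y + log(sum_j exp(s_j))
shifted = s - np.max(s)
stable = -shifted[y_i] + np.log(np.sum(np.exp(shifted)))

print(np.isnan(naive), stable)
```

After the shift the scores are $[-1, 0, -2]$, so the stable value equals $\log(1 + e^{-1} + e^{-2})$.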

In [29]:
def loss_softmax(W, X, y, r_lambda):
    """ Compute the softmax loss
    
    Input Parameters
    ----------
    W: (K, D+1) array of weights, where K is the number of classes and D is the sample dimension; the extra column holds the bias
    X: (D+1, N) array of training data, each column is a D-dimensional training sample with a 1 appended for the bias
    y: (N, ) 1-dimensional array of N target labels 0, 1, ..., K-1 for K classes
    r_lambda: (float) regularization strength for optimization.
    
    Returns
    -------
    loss: (float)
    """
    loss = 0
    N = y.shape[0] 
  
    # compute all scores s_j
    scores = W.dot(X) # [K x N] matrix
    # shift each sample's scores so its highest value is 0 (numerical stability; the loss is unchanged)
    scores -= np.max(scores, axis=0, keepdims=True)
    scores_exp = np.exp(scores)
    true_class_scores_exp = scores_exp[y, range(N)] # [N, ]
    scores_exp_sum = np.sum(scores_exp, axis=0) # [N, ]
    loss = -np.sum(np.log(true_class_scores_exp / scores_exp_sum))
    loss /= N
    loss += 0.5 * r_lambda * np.sum(W * W)

    return loss

In [30]:
import time
# generate small random weights W
W = np.random.randn(10, X_train.shape[0]) * 0.001
tic = time.time()
svmloss = loss_svm(W, X_train, y_train, 0.0001)
toc = time.time()
print('SVM loss: %f; Computed in: %fs' % (svmloss, toc - tic))

SVM loss: 33.415491; Computed in: 0.309494s


In [31]:
# generate small random weights W
W = np.random.randn(10, X_train.shape[0]) * 0.001
tic = time.time()
smaxloss = loss_softmax(W, X_train, y_train, 0.0001)
toc = time.time()
print('Softmax loss: %f; Computed in %fs' % (smaxloss, toc - tic))

Softmax loss: 8.508621; Computed in 0.293414s
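
A common sanity check, not part of the original notebook, is to evaluate both data losses with all-zero weights. Every score is then 0, so the SVM data loss should equal $K-1$ (each wrong class contributes $\Delta = 1$) and the softmax data loss should equal $\log K$ (uniform class probabilities). The sketch below uses synthetic data and illustrative sizes, and leaves out regularization, which is zero for zero weights anyway.

```python
import numpy as np

K, D1, N = 10, 3073, 100            # sizes are illustrative
rng = np.random.default_rng(0)
X = rng.standard_normal((D1, N))    # synthetic stand-in for the image data
y = rng.integers(0, K, size=N)
W0 = np.zeros((K, D1))              # all scores become 0

scores = W0.dot(X)
idx = np.arange(N)

# SVM: each of the K-1 wrong classes contributes max(0, 0 - 0 + 1) = 1
margins = np.maximum(0, scores - scores[y, idx] + 1.0)
margins[y, idx] = 0
svm_at_zero = margins.sum() / N

# Softmax: uniform probabilities 1/K, so the loss is log(K)
shifted = scores - scores.max(axis=0, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
softmax_at_zero = -log_probs[y, idx].mean()

print(svm_at_zero, softmax_at_zero, np.log(K))
```

Losses computed with small random weights will generally differ from these reference values, since they depend on the scale of both the weights and the input features.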
