<a href="https://colab.research.google.com/github/AJ112103/ML-implementations/blob/main/cross_entropy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import torch


In [7]:
p = np.array([5, 3, 6, 2, 7, 1])
q1 = p
q2 = np.array([6, 3, 4, 1, 8, 2])
q3 = np.array([4, 4, 5, 3, 6, 2])

In [4]:
-sum(p * np.log(q1))

-37.101248648050245

In [6]:
-sum(p * np.log(q3))

-36.179670406056275
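The two values above are negative because `p` and the `q`s are raw counts rather than probabilities. A minimal sketch, assuming the intent is to compare probability distributions, normalizes each vector first. The helper name `cross_entropy` and the `*_counts` / `*_n` variables are illustrative and not part of the original notebook. After normalization, Gibbs' inequality guarantees that H(p, q) is smallest when q equals p.

```python
import numpy as np

def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum_i p_i * log(q_i) for two probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

# Same counts as above, normalized so that each vector sums to 1
p_counts = np.array([5, 3, 6, 2, 7, 1])
q2_counts = np.array([6, 3, 4, 1, 8, 2])
q3_counts = np.array([4, 4, 5, 3, 6, 2])

p_n = p_counts / p_counts.sum()
q2_n = q2_counts / q2_counts.sum()
q3_n = q3_counts / q3_counts.sum()

H_pp = cross_entropy(p_n, p_n)     # entropy of p, the lower bound
H_pq2 = cross_entropy(p_n, q2_n)
H_pq3 = cross_entropy(p_n, q3_n)
print(H_pp, H_pq2, H_pq3)
```

With normalized inputs every value is positive, and `H_pp` is never larger than the other two.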

In [9]:
p = np.zeros(10, dtype=int); p[4] = 1 # one-hot encoded vector of length 10 with the correct class at index 4
q = np.random.rand(10) # vector of length 10 with values drawn uniformly from [0, 1)
q = q/sum(q) # normalize so the values sum to 1 and q is a probability mass function

In [10]:
p

array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0])

In [11]:
q

array([0.15724562, 0.14498029, 0.06904939, 0.12394221, 0.10045654,
       0.10450193, 0.0860722 , 0.10975971, 0.05089389, 0.05309823])

In [13]:
H = -np.log(q[p > 0]) #Only need the value of q at the index where p != 0
H

array([2.29803013])

In [18]:
q[4] = 20 # raise the weight of the correct class (index 4) far above the others
q = q/sum(q) # renormalize so q sums to 1 again
q

array([0.00752388, 0.00693701, 0.00330387, 0.00593038, 0.9569587 ,
       0.0050002 , 0.00411838, 0.00525178, 0.00243517, 0.00254064])

The correct class now has a probability of roughly 96%.

In [19]:
H = -np.log(q[p>0])
H

array([0.04399504])

The cross-entropy loss drops from about 2.30 to about 0.04 because the predicted probability of the correct class has increased.
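With a one-hot `p`, the cross entropy collapses to minus the log of the probability assigned to the correct class. A small sketch of how that loss shrinks as the probability approaches 1 (the values in `probs` are arbitrary illustrative choices, not taken from the notebook):

```python
import numpy as np

# Illustrative probabilities assigned to the correct class
probs = np.array([0.01, 0.1, 0.5, 0.9, 0.99])
losses = -np.log(probs)

for prob, loss in zip(probs, losses):
    print(f"q[correct] = {prob:.2f} -> H = {loss:.4f}")
```

The loss is large when the correct class gets little probability and approaches 0 as that probability approaches 1.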

# Multiple Classification Problems

For instance, suppose we are given several images, each of which must be classified into one of n classes. Here there are 5 images and 8 classes.

In [21]:
p = np.zeros((5, 8), dtype=int)
p[0][3] = 1 # for the 1st image, the correct class is at index 3
p[1][6] = 1 # for the 2nd image, the correct class is at index 6
p[2][2] = 1 #...
p[3][7] = 1 #...
p[4][0] = 1 #...
p

array([[0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 0]])

Creating the array q of predicted probabilities:

In [22]:
q = np.random.rand(40).reshape(5, 8)
np.sum(q, axis=1).shape # summing along axis=1 adds up the 8 entries in each row, giving a vector of shape (5,)

(5,)

In [23]:
q = q/np.expand_dims(np.sum(q, axis=1), axis=1) # expand_dims reshapes the row sums to (5, 1) so they broadcast against q of shape (5, 8); each row becomes a normalized probability mass function

In [24]:
Hs = -np.log(q[p>0])

In [25]:
Hs

array([1.74518207, 2.69514302, 2.35518434, 1.27760235, 2.7506281 ])

In [29]:
L = sum(Hs)
L

10.82373987359877
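The boolean mask `p > 0` works because each row of `p` has exactly one nonzero entry. An equivalent sketch, assuming the labels are stored as integer class indices instead of one-hot rows, uses integer indexing and also shows the mean reduction that is often used in training. The names `labels`, `rng`, `q_demo`, and `p_demo` are illustrative, and the seed is fixed only for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, for reproducibility only

labels = np.array([3, 6, 2, 7, 0])      # correct class per image, as in p above
n_images, n_classes = 5, 8

# Random row-normalized predictions, built the same way as q above
q_demo = rng.random((n_images, n_classes))
q_demo = q_demo / q_demo.sum(axis=1, keepdims=True)

# Pick q_demo[i, labels[i]] for every image i
Hs_demo = -np.log(q_demo[np.arange(n_images), labels])

L_sum = Hs_demo.sum()      # total loss over the batch
L_mean = Hs_demo.mean()    # average loss per image

# The one-hot mask selects the same entries in the same order
p_demo = np.eye(n_classes, dtype=int)[labels]
print(np.allclose(Hs_demo, -np.log(q_demo[p_demo > 0])))
```

Integer labels avoid storing the mostly-zero one-hot matrix, which matters when the number of classes is large.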

# How to Obtain the *q* Vector in the Context of Machine Learning

In [31]:
yhat = 20*np.random.rand(40).reshape(5, 8) **2 # random scores in [0, 20); rand is already non-negative, and ** binds tighter than *, so this is 20 * (rand ** 2)
yhat

array([[1.17626533e+01, 9.44075096e+00, 1.30804109e+01, 4.18211020e+00,
        1.01891257e+00, 5.03183375e+00, 9.18221738e+00, 2.39974790e+00],
       [7.04282107e+00, 4.64067357e-01, 7.50966112e-01, 6.22899213e+00,
        1.13053012e+01, 7.86261316e+00, 9.58316412e-04, 1.24594130e+01],
       [2.02608073e+00, 1.41412079e+01, 7.14977539e-01, 4.17095124e+00,
        9.47650388e+00, 1.63912206e-02, 5.54904397e+00, 3.38610459e-01],
       [7.81287328e+00, 2.55351133e-01, 1.33195328e+00, 8.53103333e-01,
        3.26752762e+00, 2.42869782e-03, 8.79534527e+00, 2.46200856e-01],
       [1.75799798e+01, 1.32747949e+01, 1.65576967e+00, 5.56776156e-01,
        1.67211378e+01, 1.39583272e-01, 8.05492288e-02, 1.14607688e+01]])

In [32]:
yhat.shape #5 different images and 8 different classes per image

(5, 8)

In [33]:
q = np.exp(yhat)
q = q/np.expand_dims(np.sum(q, axis=1), axis=1) # row-wise softmax: this is how PyTorch's CrossEntropyLoss turns raw scores into predicted probabilities

In [34]:
q

array([[2.03637970e-01, 1.99742003e-02, 7.60595247e-01, 1.03913204e-04,
        4.39452016e-06, 2.43052998e-04, 1.54237399e-02, 1.74823826e-05],
       [3.33540412e-03, 4.63481094e-06, 6.17490912e-06, 1.47811098e-03,
        2.36766383e-01, 7.57146012e-03, 2.91679779e-06, 7.50834916e-01],
       [5.42363726e-06, 9.90429421e-01, 1.46179259e-06, 4.63229344e-05,
        9.33186185e-03, 7.26931642e-07, 1.83778259e-04, 1.00330184e-06],
       [2.71326672e-01, 1.41677969e-04, 4.15781054e-04, 2.57574465e-04,
        2.88054026e-03, 1.10016926e-04, 7.24727350e-01, 1.40387490e-04],
       [6.94757484e-01, 9.37810703e-03, 8.43405939e-08, 2.81028157e-08,
        2.94335661e-01, 1.85167715e-08, 1.74552916e-08, 1.52859935e-03]])

In [35]:
np.sum(q, axis=1) #is normalized now

array([1., 1., 1., 1., 1.])
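`np.exp(yhat)` is safe for scores below 20, but it overflows once scores become large. A common remedy, sketched here with an illustrative helper `softmax` that is not part of the original notebook, subtracts each row's maximum before exponentiating. The result is unchanged because the common factor cancels in the ratio.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax of a 2-D array of scores, shifted for numerical stability."""
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

small = np.array([[1.0, 2.0, 3.0],
                  [0.5, 0.5, 0.5]])
large = small + 1000.0     # np.exp(1000) on its own would overflow to inf

q_small = softmax(small)
q_large = softmax(large)
print(np.allclose(q_small, q_large))
```

Adding the same constant to every score in a row does not change the softmax, so `q_small` and `q_large` agree.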

In [36]:
c_tilda = np.where(p)
c_tilda #gets all the indices where p=1 for the images

(array([0, 1, 2, 3, 4]), array([3, 6, 2, 7, 0]))

In [39]:
q[c_tilda]

array([1.03913204e-04, 2.91679779e-06, 1.46179259e-06, 1.40387490e-04,
       6.94757484e-01])

In [37]:
Hs = -np.log(q[c_tilda])
L = sum(Hs)

In [38]:
L

44.588122460619815

# In PyTorch this is equivalent to:

Creating the loss function

In [41]:
L = torch.nn.CrossEntropyLoss(reduction = 'sum')

Evaluating the Loss with the data above:

In [42]:
L(torch.tensor(yhat), torch.tensor(p, dtype=torch.float))

tensor(44.5881, dtype=torch.float64)

We get the same value as above, up to the printed precision.
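`torch.nn.CrossEntropyLoss` also accepts integer class indices as the target, which avoids building the one-hot matrix. A hedged sketch with made-up scores, checked against the manual softmax computation (`logits`, `targets`, `q_manual`, and `loss_fn` are illustrative names and values, not taken from the notebook):

```python
import numpy as np
import torch

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
targets = np.array([0, 2])              # correct class index for each row

# Manual computation: softmax, then -log of the correct-class probability
q_manual = np.exp(logits)
q_manual = q_manual / q_manual.sum(axis=1, keepdims=True)
manual_loss = -np.log(q_manual[np.arange(len(targets)), targets]).sum()

# PyTorch: raw scores go in directly; the softmax is applied internally
loss_fn = torch.nn.CrossEntropyLoss(reduction='sum')
torch_loss = loss_fn(torch.tensor(logits),
                     torch.tensor(targets, dtype=torch.long))

print(manual_loss, torch_loss.item())
```

Note that the loss takes unnormalized scores, so passing probabilities that have already gone through a softmax would apply the softmax twice.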