# Working with torch tensors and calculating perplexity: Ungraded Lecture Notebook

In [1]:
import numpy
import torch
import torch.nn.functional as F

# Setting random seeds
numpy.random.seed(32)


In [3]:
numpy_array = numpy.random.random((5,10))
print(f"The regular numpy array looks like this:\n\n {numpy_array}\n")
print(f"It is of type: {type(numpy_array)}")


The regular numpy array looks like this:

 [[0.85888927 0.37271115 0.55512878 0.95565655 0.7366696  0.81620514
  0.10108656 0.92848807 0.60910917 0.59655344]
 [0.09178413 0.34518624 0.66275252 0.44171349 0.55148779 0.70371249
  0.58940123 0.04993276 0.56179184 0.76635847]
 [0.91090833 0.09290995 0.90252139 0.46096041 0.45201847 0.99942549
  0.16242374 0.70937058 0.16062408 0.81077677]
 [0.03514717 0.53488673 0.16650012 0.30841038 0.04506241 0.23857613
  0.67483453 0.78238275 0.69520163 0.32895445]
 [0.49403187 0.52412136 0.29854125 0.46310814 0.98478429 0.50113492
  0.39807245 0.72790532 0.86333097 0.02616954]]

It is of type: <class 'numpy.ndarray'>


In [6]:
numpy_array.dtype


dtype('float64')

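You can convert a numpy array into a torch tensor with the `torch.from_numpy()` function (Python lists go through `torch.tensor()` instead). Calling `.float()` afterwards casts the values from `float64` to `float32`, the default floating-point type in PyTorch: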

In [7]:
torch_tensor = torch.from_numpy(numpy_array).float()

print(f"The torch tensor looks like this:\n\n {torch_tensor}\n")
print(f"It is of type: {type(torch_tensor)}")


The torch tensor looks like this:

 tensor([[0.8589, 0.3727, 0.5551, 0.9557, 0.7367, 0.8162, 0.1011, 0.9285, 0.6091,
         0.5966],
        [0.0918, 0.3452, 0.6628, 0.4417, 0.5515, 0.7037, 0.5894, 0.0499, 0.5618,
         0.7664],
        [0.9109, 0.0929, 0.9025, 0.4610, 0.4520, 0.9994, 0.1624, 0.7094, 0.1606,
         0.8108],
        [0.0351, 0.5349, 0.1665, 0.3084, 0.0451, 0.2386, 0.6748, 0.7824, 0.6952,
         0.3290],
        [0.4940, 0.5241, 0.2985, 0.4631, 0.9848, 0.5011, 0.3981, 0.7279, 0.8633,
         0.0262]])

It is of type: <class 'torch.Tensor'>


In [8]:
torch_tensor.dtype


torch.float32
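Two details of the conversion above are easy to miss, and the short sketch below illustrates them: `torch.from_numpy()` shares memory with the source array, while `.float()` on a `float64` tensor returns a `float32` copy. The names `arr`, `shared`, `copied`, and `from_list` are illustrative only and not part of the notebook.

```python
import numpy
import torch

arr = numpy.zeros(3)                      # float64 by default
shared = torch.from_numpy(arr)            # shares memory with arr, keeps float64
copied = torch.from_numpy(arr).float()    # casting to float32 makes a copy

arr[0] = 1.0                              # modify the numpy array in place
print(shared)                             # reflects the change
print(copied)                             # unchanged

from_list = torch.tensor([1.0, 2.0, 3.0])  # lists go through torch.tensor
print(shared.dtype, copied.dtype, from_list.dtype)
```

If you need a tensor that is independent of the numpy array without changing its dtype, calling `.clone()` on the result of `torch.from_numpy()` is one option.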

This notebook also aims to teach you how you can calculate the perplexity of a trained model.


## Calculating Perplexity

The perplexity is a metric that measures how well a probability model predicts a sample and it is commonly used to evaluate language models. It is defined as:

$$P(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i| w_1,...,w_{i-1})}}$$

In practice, you usually work with the log of that formula, because a sum of log probabilities is far less prone to underflow than a product of many small probabilities. You also need to take care of the padding: padded positions are not real observations, so including them would distort the perplexity.
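To make the underflow concern concrete, here is a minimal sketch with made-up numbers (2000 tokens, each assigned probability 0.01). The product of the probabilities is too small for `float64` and collapses to zero, while the sum of the logs stays well within range.

```python
import torch

# 2000 tokens, each predicted with probability 0.01 (made-up numbers)
probs = torch.full((2000,), 0.01, dtype=torch.float64)

product = torch.prod(probs)                 # 1e-4000 is not representable in float64
sum_of_logs = torch.sum(torch.log(probs))   # 2000 * log(0.01), easily representable

print(product)
print(sum_of_logs)
```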

After taking the logarithm of $P(W)$ you have:

$$\log P(W) = {\log\left(\sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i| w_1,...,w_{i-1})}}\right)}$$


$$ = \log\left(\left(\prod_{i=1}^{N} \frac{1}{P(w_i| w_1,...,w_{i-1})}\right)^{\frac{1}{N}}\right)$$

$$ = \log\left(\left({\prod_{i=1}^{N}{P(w_i| w_1,...,w_{i-1})}}\right)^{-\frac{1}{N}}\right)$$

$$ = -\frac{1}{N}{\log\left({\prod_{i=1}^{N}{P(w_i| w_1,...,w_{i-1})}}\right)} $$

$$ = -\frac{1}{N}{{\sum_{i=1}^{N}{\log P(w_i| w_1,...,w_{i-1})}}} $$
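The derivation above can be checked numerically. The sketch below uses four made-up conditional probabilities (`token_probs` is a hypothetical name) and compares the direct definition with the log-space version; it is only a sanity check, not part of the assignment code.

```python
import torch

# Made-up conditional probabilities P(w_i | w_1, ..., w_{i-1}) for N = 4 tokens
token_probs = torch.tensor([0.5, 0.25, 0.1, 0.8], dtype=torch.float64)
N = token_probs.numel()

# Direct definition: N-th root of the product of inverse probabilities
ppx_direct = torch.prod(1.0 / token_probs) ** (1.0 / N)

# Log-space version: exponentiate the negative mean log probability
log_ppx = -torch.sum(torch.log(token_probs)) / N
ppx_from_log = torch.exp(log_ppx)

print(ppx_direct, ppx_from_log)  # the two values agree up to rounding
```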


You will be working with a real example from this week's assignment. The example is made up of:
   - `predictions` : log probabilities for each element in the vocabulary for 32 sequences with 64 elements (after padding).
   - `targets` : 32 observed sequences of 64 elements (after padding).

In [34]:
# Load from .npy files
predictions = numpy.load('predictions.npy')
targets     = numpy.load('targets.npy')

# Cast to torch tensors
predictions = torch.from_numpy(predictions)
targets     = torch.from_numpy(targets)

# Print shapes
print(f'predictions has shape: {predictions.shape}')
print(f'targets has shape: {targets.shape}')



predictions has shape: torch.Size([32, 64, 256])
targets has shape: torch.Size([32, 64])


In [10]:
predictions[0,:,:].shape

torch.Size([64, 256])

In [11]:
predictions[0,:,:]


tensor([[-15.5800, -25.7356, -15.5769,  ..., -15.5747, -15.5715, -15.5694],
        [-24.0108, -35.8008, -23.7436,  ..., -23.8079, -23.7276, -23.8044],
        [-15.7837, -14.4168, -15.5128,  ..., -15.7292, -15.6716, -15.5321],
        ...,
        [-22.3767, -29.0965, -22.2665,  ..., -22.1575, -22.2124, -22.2859],
        [-23.1877, -39.6231, -23.0719,  ..., -23.0587, -22.9287, -23.1310],
        [-21.8435, -26.0352, -21.8776,  ..., -21.5768, -21.7424, -21.6944]])

In [12]:
targets.shape

torch.Size([32, 64])

In [13]:
targets[0,:]


tensor([105, 110,  32, 115, 117,  99, 104,  32, 100, 105, 115, 100,  97, 105,
        110, 102, 117, 108,  32, 109,  97, 110, 110, 101, 114,  32, 109, 101,
         32, 116, 111,  32, 119, 111, 111,  46,   1,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0], dtype=torch.int32)

In [16]:
# import torch
# import torch.nn.functional as F

# Example tensors
targets_     = torch.tensor([0, 2, 1, 3, 0])
predictions_ = torch.randn(5, 4)  # Random predictions tensor with shape [batch_size, num_classes]

# Convert targets to one-hot encoding
reshaped_targets_ = F.one_hot(targets_, num_classes=predictions_.shape[-1])

print(f'reshaped_targets has shape: {reshaped_targets_.shape}')


reshaped_targets has shape: torch.Size([5, 4])


In [17]:
reshaped_targets_


tensor([[1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 0]])

In [18]:
predictions_


tensor([[-0.0048,  0.2567, -0.2345, -0.0077],
        [ 0.9892,  1.3906, -1.2490,  0.2552],
        [-0.9308, -1.9889,  0.4647, -0.2742],
        [-0.4462, -0.7178, -1.8211,  2.3688],
        [ 1.7213, -0.5217, -0.1385,  1.1919]])

Notice that `predictions` has an extra dimension whose length equals the size of the vocabulary.

Because of this, you need a way of reshaping `targets` to match this shape. For this you can use `torch.nn.functional.one_hot()` (imported here as `F.one_hot()`), as in the toy example above. It expects integer class indices of type `int64`, which is why `targets` is cast with `.to(torch.int64)` first.

Notice that `predictions.shape[-1]` returns the size of the last dimension of `predictions`.

In [35]:

targets = targets.to(torch.int64)
# Convert targets to one-hot encoding
reshaped_targets = F.one_hot(targets, num_classes=predictions.shape[-1])
print(f'reshaped_targets has shape: {reshaped_targets.shape}')


reshaped_targets has shape: torch.Size([32, 64, 256])


In [20]:
reshaped_targets.shape


torch.Size([32, 64, 256])

In [21]:
reshaped_targets[0,:,:].shape

torch.Size([64, 256])

In [22]:
print(reshaped_targets[0,:,:])


tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [1, 0, 0,  ..., 0, 0, 0],
        [1, 0, 0,  ..., 0, 0, 0],
        [1, 0, 0,  ..., 0, 0, 0]])


In [52]:
targets.shape

torch.Size([32, 64])

In [54]:
targets[0,:]


tensor([105, 110,  32, 115, 117,  99, 104,  32, 100, 105, 115, 100,  97, 105,
        110, 102, 117, 108,  32, 109,  97, 110, 110, 101, 114,  32, 109, 101,
         32, 116, 111,  32, 119, 111, 111,  46,   1,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0])

In [55]:
F.one_hot(targets[0,:], num_classes=256)


tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [1, 0, 0,  ..., 0, 0, 0],
        [1, 0, 0,  ..., 0, 0, 0],
        [1, 0, 0,  ..., 0, 0, 0]])

In [56]:
F.one_hot(targets[0,:], num_classes=256).shape


torch.Size([64, 256])

In [57]:
F.one_hot(targets[0,:], num_classes=256)[0,:]

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [58]:
F.one_hot(targets[0,:], num_classes=256)[-1,:]

tensor([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [59]:
F.one_hot(targets[0,:], num_classes=256)[-2,:]

tensor([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [60]:
F.one_hot(targets[0,:], num_classes=256)[-28,:]


tensor([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [25]:
import numpy

# Define the 3D arrays for predictions_ and reshaped_targets_
predictions_ = numpy.array([[[0.2, 0.5, 0.3],
                             [0.1, 0.6, 0.3]],

                            [[0.4, 0.4, 0.2],
                             [0.3, 0.3, 0.4]]])

reshaped_targets_ = numpy.array([[[0, 1, 0],
                                  [1, 0, 0]],

                                 [[0, 0, 1],
                                  [1, 0, 0]]])

# Compute the element-wise multiplication
multiplied_ = predictions_ * reshaped_targets_

# Sum along the last axis (axis=-1) to pick out the value of the true class
# (plain probabilities in this toy example; log probabilities in the real data)
log_p_ = numpy.sum(multiplied_, axis=-1)

# Print the results
print("Predictions:")
print(predictions_)
print("\nReshaped Targets:")
print(reshaped_targets_)
print("\nMultiplied (Element-wise):")
print(multiplied_)
print("\nlog_p (Sum along axis=-1):")
print(log_p_)


Predictions:
[[[0.2 0.5 0.3]
  [0.1 0.6 0.3]]

 [[0.4 0.4 0.2]
  [0.3 0.3 0.4]]]

Reshaped Targets:
[[[0 1 0]
  [1 0 0]]

 [[0 0 1]
  [1 0 0]]]

Multiplied (Element-wise):
[[[0.  0.5 0. ]
  [0.1 0.  0. ]]

 [[0.  0.  0.2]
  [0.3 0.  0. ]]]

log_p (Sum along axis=-1):
[[0.5 0.1]
 [0.2 0.3]]


In [26]:
predictions_.shape

(2, 2, 3)

In [32]:
log_p_.shape


(2, 2)

By multiplying the predictions by the one-hot targets and summing across the last dimension, you obtain the log probability of each observed element within the sequences:

In [36]:
log_p = torch.sum(predictions * reshaped_targets, axis= -1)
log_p
# This is the log probability that is used in the cost function


tensor([[ -5.3965,  -1.0311,  -0.6692,  ..., -22.3767, -23.1877, -21.8435],
        [ -4.5858,  -1.1341,  -8.5380,  ..., -20.1569, -26.8371, -23.5750],
        [ -5.2224,  -1.2824,  -0.1731,  ..., -21.3282, -19.8544, -33.8844],
        ...,
        [ -5.3965, -17.2917,  -4.3608,  ..., -20.8258, -21.0658, -22.4431],
        [ -5.9313, -14.2474,  -0.2637,  ..., -26.7432, -18.3843, -22.3553],
        [ -5.6705,  -0.1060,   0.0000,  ..., -23.3325, -28.0874, -23.8788]])

In [30]:
log_p.shape


torch.Size([32, 64])
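The one-hot multiply-and-sum above builds a full `[batch, length, vocab]` tensor just to pick one value per position. As an alternative sketch (not what the notebook uses), `torch.gather` can index the last dimension with the target ids directly. The toy tensors below (`toy_predictions`, `toy_targets`) and their sizes are assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Toy stand-ins: 2 sequences, 3 positions, vocabulary of 5 (hypothetical sizes)
toy_predictions = F.log_softmax(torch.randn(2, 3, 5), dim=-1)
toy_targets = torch.tensor([[1, 4, 0],
                            [2, 2, 3]])

# One-hot route, as in the cell above
one_hot = F.one_hot(toy_targets, num_classes=toy_predictions.shape[-1])
log_p_one_hot = torch.sum(toy_predictions * one_hot, dim=-1)

# Gather route: index the last dimension with the target ids directly
log_p_gather = toy_predictions.gather(-1, toy_targets.unsqueeze(-1)).squeeze(-1)

print(torch.allclose(log_p_one_hot, log_p_gather))
```

Both routes give the same `[batch, length]` tensor; the gather route simply avoids materialising the one-hot tensor.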

In [61]:
log_p[0,:]


tensor([-5.3965e+00, -1.0311e+00, -6.6917e-01, -3.0611e+00, -1.5259e+00,
        -2.9263e-02, -4.7684e-05, -5.9128e-05, -1.8858e+00, -1.3325e-01,
        -7.4234e-02, -6.0565e+00, -3.6373e-03, -3.4332e-05, -1.1921e-03,
        -2.2745e-01, -9.1667e-03,  0.0000e+00, -7.5340e-04, -4.2465e+00,
        -4.9812e-01, -1.8959e-03, -3.0287e+00, -1.9073e-06, -9.1553e-04,
        -2.7161e+00, -4.0007e+00, -2.0192e+00, -2.2533e-01, -1.1322e+00,
        -2.1546e+00, -2.7493e-02, -5.1561e+00, -6.8018e-01, -7.8403e-01,
        -7.3942e+00, -1.2638e-02, -2.9633e+01, -1.6817e+01, -2.1895e+01,
        -1.7741e+01, -1.7966e+01, -2.4173e+01, -2.0454e+01, -2.3524e+01,
        -2.6218e+01, -2.5755e+01, -1.9525e+01, -2.3176e+01, -2.2120e+01,
        -2.4267e+01, -2.2434e+01, -2.4107e+01, -2.3433e+01, -2.3169e+01,
        -2.2986e+01, -2.1350e+01, -2.3652e+01, -2.5622e+01, -2.5208e+01,
        -2.3868e+01, -2.2377e+01, -2.3188e+01, -2.1843e+01])

In [62]:
log_p[-1,:]


tensor([-5.6705e+00, -1.0595e-01,  0.0000e+00, -1.0608e+00, -1.6447e+00,
        -1.2009e+00, -3.8147e-06, -1.4402e-02, -3.8453e+00, -1.6022e-04,
        -1.1399e+00, -4.3221e-03, -4.1688e+00, -6.3017e-01, -6.8665e-04,
        -5.5828e-03,  0.0000e+00, -2.0252e-02, -8.0509e-01, -7.0782e-03,
        -3.8567e-03, -3.0117e-03, -3.8392e+00, -1.2981e+00, -7.8469e+00,
        -2.7008e-03, -2.7377e+00, -2.6719e+00, -2.0596e-02, -5.0461e-02,
         0.0000e+00,  0.0000e+00, -3.2772e-02, -5.2420e+00, -3.9330e-03,
        -5.1044e-02, -3.7328e-01, -3.8471e+00, -2.3570e+00, -2.9188e-02,
        -3.0518e-05, -2.6070e-02, -7.2814e-01, -3.8486e-01, -7.2529e-02,
        -2.2230e+00, -8.4734e+00, -1.9027e+01, -2.2784e+01, -1.8997e+01,
        -2.2537e+01, -2.5025e+01, -2.4043e+01, -2.2277e+01, -2.9224e+01,
        -2.5706e+01, -2.2654e+01, -3.1554e+01, -2.4682e+01, -2.6260e+01,
        -2.7990e+01, -2.3333e+01, -2.8087e+01, -2.3879e+01])

Now you need to account for the padding so that it does not distort the metric (remember that a lower perplexity means a better model). To identify which elements are padding, you can use `torch.eq()` to compare `targets` with the padding id `0`; subtracting the result from `1.0` gives a tensor with `1s` in the positions of actual values and `0s` where there is padding.

In [37]:
import torch

# Define the 2D tensor for targets
targets_ = torch.tensor([[1, 0, 2],
                        [0, 3, 4]])

# Compute the non-pad mask
non_pad_ = 1.0 - torch.eq(targets_, 0).float()

# Print the results
print("Targets:")
print(targets_)
print("\nNon-pad Mask:")
print(non_pad_)


Targets:
tensor([[1, 0, 2],
        [0, 3, 4]])

Non-pad Mask:
tensor([[1., 0., 1.],
        [0., 1., 1.]])


In [42]:
torch.eq(targets_, 0)


tensor([[False,  True, False],
        [ True, False, False]])
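The same mask can also be written without the subtraction. The sketch below compares the `1.0 - torch.eq(...)` form used in this notebook with `torch.ne()`; `toy_targets` and `pad_id` are illustrative names, and the padding id of `0` is assumed from the targets shown earlier.

```python
import torch

toy_targets = torch.tensor([[1, 0, 2],
                            [0, 3, 4]])
pad_id = 0  # assumption: 0 is the padding id, as in the targets above

mask_from_eq = 1.0 - torch.eq(toy_targets, pad_id).float()
mask_from_ne = torch.ne(toy_targets, pad_id).float()  # same as (toy_targets != pad_id).float()

print(torch.equal(mask_from_eq, mask_from_ne))
print(mask_from_ne.sum(dim=1))  # number of real (non-padding) tokens per sequence
```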

In [44]:
non_pad = 1.0 - torch.eq(targets, 0).float()
print(f'non_pad has shape: {non_pad.shape}\n')
print(f'non_pad looks like this: \n\n {non_pad}')


non_pad has shape: torch.Size([32, 64])

non_pad looks like this: 

 tensor([[1., 1., 1.,  ..., 0., 0., 0.],
        [1., 1., 1.,  ..., 0., 0., 0.],
        [1., 1., 1.,  ..., 0., 0., 0.],
        ...,
        [1., 1., 1.,  ..., 0., 0., 0.],
        [1., 1., 1.,  ..., 0., 0., 0.],
        [1., 1., 1.,  ..., 0., 0., 0.]])


In [45]:
targets


tensor([[105, 110,  32,  ...,   0,   0,   0],
        [ 97, 110, 110,  ...,   0,   0,   0],
        [111, 102,  32,  ...,   0,   0,   0],
        ...,
        [105,  32,  97,  ...,   0,   0,   0],
        [101, 100, 103,  ...,   0,   0,   0],
        [121, 111, 117,  ...,   0,   0,   0]])

In [47]:
non_pad.shape


torch.Size([32, 64])

In [49]:
non_pad[0,:]

tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [50]:
log_p.shape


torch.Size([32, 64])

In [51]:
log_p[0,:]


tensor([-5.3965e+00, -1.0311e+00, -6.6917e-01, -3.0611e+00, -1.5259e+00,
        -2.9263e-02, -4.7684e-05, -5.9128e-05, -1.8858e+00, -1.3325e-01,
        -7.4234e-02, -6.0565e+00, -3.6373e-03, -3.4332e-05, -1.1921e-03,
        -2.2745e-01, -9.1667e-03,  0.0000e+00, -7.5340e-04, -4.2465e+00,
        -4.9812e-01, -1.8959e-03, -3.0287e+00, -1.9073e-06, -9.1553e-04,
        -2.7161e+00, -4.0007e+00, -2.0192e+00, -2.2533e-01, -1.1322e+00,
        -2.1546e+00, -2.7493e-02, -5.1561e+00, -6.8018e-01, -7.8403e-01,
        -7.3942e+00, -1.2638e-02, -2.9633e+01, -1.6817e+01, -2.1895e+01,
        -1.7741e+01, -1.7966e+01, -2.4173e+01, -2.0454e+01, -2.3524e+01,
        -2.6218e+01, -2.5755e+01, -1.9525e+01, -2.3176e+01, -2.2120e+01,
        -2.4267e+01, -2.2434e+01, -2.4107e+01, -2.3433e+01, -2.3169e+01,
        -2.2986e+01, -2.1350e+01, -2.3652e+01, -2.5622e+01, -2.5208e+01,
        -2.3868e+01, -2.2377e+01, -2.3188e+01, -2.1843e+01])

By multiplying the log probabilities by the `non_pad` mask, you zero out the padded positions and so remove the effect of padding on the metric:

In [63]:
real_log_p = log_p * non_pad
print(f'real log probabilities still have shape: {real_log_p.shape}')


real log probabilities still have shape: torch.Size([32, 64])


You can check the effect of filtering out the padding by looking at the two log probabilities tensors:

In [64]:
print(f'log probabilities before filtering padding: \n\n {log_p}\n')
print(f'log probabilities after filtering padding: \n\n {real_log_p}')


log probabilities before filtering padding: 

 tensor([[ -5.3965,  -1.0311,  -0.6692,  ..., -22.3767, -23.1877, -21.8435],
        [ -4.5858,  -1.1341,  -8.5380,  ..., -20.1569, -26.8371, -23.5750],
        [ -5.2224,  -1.2824,  -0.1731,  ..., -21.3282, -19.8544, -33.8844],
        ...,
        [ -5.3965, -17.2917,  -4.3608,  ..., -20.8258, -21.0658, -22.4431],
        [ -5.9313, -14.2474,  -0.2637,  ..., -26.7432, -18.3843, -22.3553],
        [ -5.6705,  -0.1060,   0.0000,  ..., -23.3325, -28.0874, -23.8788]])

log probabilities after filtering padding: 

 tensor([[ -5.3965,  -1.0311,  -0.6692,  ...,  -0.0000,  -0.0000,  -0.0000],
        [ -4.5858,  -1.1341,  -8.5380,  ...,  -0.0000,  -0.0000,  -0.0000],
        [ -5.2224,  -1.2824,  -0.1731,  ...,  -0.0000,  -0.0000,  -0.0000],
        ...,
        [ -5.3965, -17.2917,  -4.3608,  ...,  -0.0000,  -0.0000,  -0.0000],
        [ -5.9313, -14.2474,  -0.2637,  ...,  -0.0000,  -0.0000,  -0.0000],
        [ -5.6705,  -0.1060,   0.0000,  ...

Finally, to get the average log perplexity of the model across all sequences in the batch, sum the log probabilities in each sequence and divide by the number of non-padding elements (this gives the negative log perplexity per sequence). Then negate it and take the mean across all sequences in the batch; exponentiating that mean gives the perplexity.

In [66]:
log_ppx = torch.sum(real_log_p, axis=1) / torch.sum(non_pad, axis=1)
log_ppx = torch.mean(-log_ppx)
print(f'The log perplexity and perplexity of the model are respectively: {log_ppx} and {torch.exp(log_ppx)}')


The log perplexity and perplexity of the model are respectively: 2.621185541152954 and 13.752017974853516
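The steps in this notebook can be collected into one function. The sketch below is a hypothetical helper (`masked_perplexity` is not part of the assignment or of PyTorch) that assumes `log_probs` holds log probabilities of shape `[batch, length, vocab]`, that the padding id is `0`, and that every sequence contains at least one non-padding token (otherwise the division would be by zero).

```python
import torch
import torch.nn.functional as F

def masked_perplexity(log_probs, targets, pad_id=0):
    """Hypothetical helper: mean per-sequence log perplexity and its exponential."""
    targets = targets.to(torch.int64)                       # F.one_hot needs int64 indices
    one_hot = F.one_hot(targets, num_classes=log_probs.shape[-1])
    log_p = torch.sum(log_probs * one_hot, dim=-1)          # log prob of each observed token
    non_pad = 1.0 - torch.eq(targets, pad_id).float()       # 1 for real tokens, 0 for padding
    per_seq = torch.sum(log_p * non_pad, dim=1) / torch.sum(non_pad, dim=1)
    log_ppx = torch.mean(-per_seq)
    return log_ppx, torch.exp(log_ppx)

# Toy check: a model that is uniform over a vocabulary of 4 tokens
toy_log_probs = torch.log(torch.full((2, 3, 4), 0.25))
toy_targets = torch.tensor([[1, 2, 0],
                            [3, 0, 0]])
log_ppx, ppx = masked_perplexity(toy_log_probs, toy_targets)
print(log_ppx, ppx)
```

A uniform model over 4 tokens should have a perplexity of 4 no matter how much padding each sequence carries, which makes this a convenient sanity check for the masking.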


**Congratulations on finishing this lecture notebook!** You should now have a clear understanding of how to work with PyTorch tensors and how to compute the perplexity to evaluate your language models. **Keep it up!**