# Reference

Section: 5 \
Lecture: 21 \
Title: Entropy and Cross-Entropy \
TCS Udemy Reference Link: https://tcsglobal.udemy.com/course/deeplearning_x/learn/lecture/27841944 \
Udemy Reference Link: \
Pre-Requisite:

# Entropy and Cross-Entropy

## Formula and Notations

### Entropy

The entropy $H(X)$ of a discrete random variable $X$ is defined as:

$$
H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i)
$$

#### Terms:

- $H(X)$ : Entropy of the random variable $X$.
- $n$ : Number of possible outcomes.
- $P(x_i)$ : Probability of outcome $x_i$.
- $x_i$ : Possible outcomes of the random variable $X$.

### Cross-Entropy

The cross-entropy $L$ between the true distribution $Y$ and the predicted distribution $P$ is defined as:

$$
L = -\sum_{i=1}^{N} y_i \log(p_i)
$$

#### Terms:

- $L$ : Cross-entropy loss.
- $N$ : Number of classes or outcomes.
- $y_i$ : True label for class $i$ (1 if class $i$ is the correct class, 0 otherwise).
- $p_i$ : Predicted probability of class $i$.

**Both formulas carry a leading minus sign because every probability lies in $[0, 1]$, so its logarithm is zero or negative. The minus sign makes the result non-negative.**

High entropy means that the dataset has a lot of variability. Low entropy means that most of the values in the dataset repeat (and are therefore redundant).
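
To make the high versus low entropy contrast concrete, here is a minimal sketch that compares two made-up distributions over four outcomes. The helper name `entropy` and both probability lists are illustrative assumptions, not part of the lecture; the natural logarithm is used, so values are in nats.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical distributions chosen only for illustration.
uniform = [0.25, 0.25, 0.25, 0.25]  # every outcome equally likely: high variability
peaked = [0.97, 0.01, 0.01, 0.01]   # one outcome dominates: values mostly repeat

h_uniform = entropy(uniform)
h_peaked = entropy(peaked)

print("uniform:", h_uniform)
print("peaked: ", h_peaked)
```

The uniform case reaches the maximum possible entropy for four outcomes, $\log 4$, while the peaked case is much closer to zero.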

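**How does entropy differ from variance?**
Both entropy and variance describe how spread out a distribution is. Variance depends on the numerical values of the outcomes (how far they lie from the mean), whereas entropy depends only on the probabilities of the outcomes, so it is also defined for categorical outcomes that have no numeric value.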
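
One way to see the difference between entropy and variance is to keep the probabilities fixed and change only the outcome values. The sketch below does this with two hypothetical two-outcome variables; the helper names and the numbers are assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; uses only the probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def variance(values, probs):
    """Variance of a discrete random variable; uses values and probabilities."""
    mean = sum(v * p for v, p in zip(values, probs))
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probs))

probs = [0.5, 0.5]        # same probabilities for both variables
narrow = [-1.0, 1.0]      # outcomes close to the mean
wide = [-100.0, 100.0]    # outcomes far from the mean

h = entropy(probs)                    # identical for both variables
var_narrow = variance(narrow, probs)
var_wide = variance(wide, probs)

print("entropy:", h)
print("variance (narrow):", var_narrow)
print("variance (wide):  ", var_wide)
```

The entropy is the same in both cases because the probabilities are the same, while the variance grows with the distance between the outcome values.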

## Binary Cross Entropy using numpy

In [None]:
import numpy as np
import torch

In [None]:
# Entropy Calculation
x = [0.20, 0.70, 0.10]  # probabilities of the three possible outcomes of one event
print(np.sum(x))  # the probabilities should sum to 1 (up to floating-point rounding)

H = 0
for p in x:
  H -= p*np.log(p)

print('Entropy: ' + str(H))

In [None]:
# With only two possible outcomes, this is called binary entropy
x = [0.75, 0.25]  # probabilities of the two possible outcomes of one event
H = -( x[0]*np.log(x[0]) + x[1]*np.log(x[1]) )
H
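
The loop and the two-term expression above can be folded into one vectorized helper. The following is a minimal sketch, not the lecture's own code: the function name `entropy` and its `base` parameter are assumptions added here, and zero probabilities are skipped under the usual convention that $0 \log 0 = 0$.

```python
import numpy as np

def entropy(probs, base=np.e):
    """Shannon entropy of a discrete distribution.

    base=np.e gives nats (matching np.log above); base=2 gives bits.
    """
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]  # skip zeros to avoid log(0)
    return float(-np.sum(nonzero * np.log(nonzero)) / np.log(base))

print(entropy([0.20, 0.70, 0.10]))   # three-outcome example from above
print(entropy([0.75, 0.25]))         # binary entropy example from above
print(entropy([0.5, 0.5], base=2))   # a fair coin carries one bit
```

Dividing by `np.log(base)` only changes the unit; the ranking of distributions by entropy is unaffected.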

In deep learning, we use cross-entropy as a loss function because it measures the difference between two probability distributions: the true probabilities derived from the training data and the predicted probabilities generated by the model. Since these probabilities are typically different, cross-entropy is employed to quantify this discrepancy.
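
For the two-class case used in the cells that follow, the general formula can be written out explicitly. This is a worked special case, assuming $y$ denotes the label of the first class and $p$ its predicted probability (symbols introduced here for illustration), so the second class has label $1 - y$ and probability $1 - p$:

$$
L = -\bigl[\, y \log p + (1 - y) \log(1 - p) \,\bigr]
$$

With $y = 1$ and $p = 0.75$, only the first term survives and $L = -\log(0.75) \approx 0.288$ using the natural logarithm.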

In [None]:
labels = [1, 0]
model_output = [0.75, 0.25]

H = -( labels[0]*np.log(model_output[0]) + labels[1]*np.log(model_output[1]) )
H
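
To see how the loss reacts to better or worse predictions, the same calculation can be wrapped in a function and evaluated for several predicted probabilities. This is a sketch under stated assumptions: the function name `cross_entropy`, the `eps` clipping (added to avoid `log(0)`), and the list of probabilities are illustrative and do not come from the lecture.

```python
import numpy as np

def cross_entropy(labels, probs, eps=1e-12):
    """Cross-entropy -sum(y_i * log(p_i)) in nats.

    eps is an assumed safeguard: probabilities are clipped to [eps, 1]
    so that log(0) never occurs.
    """
    labels = np.asarray(labels, dtype=float)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    return float(-np.sum(labels * np.log(probs)))

labels = [1, 0]  # the first class is the correct one
for p_correct in (0.99, 0.75, 0.50, 0.10):
    loss = cross_entropy(labels, [p_correct, 1 - p_correct])
    print(f"p(correct)={p_correct:.2f}  loss={loss:.4f}")
```

The loss is close to zero when the model assigns high probability to the correct class and grows as that probability shrinks.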

## Binary Cross Entropy using torch

In [None]:
import torch

In [None]:
labels = torch.tensor([1, 0], dtype=torch.float32)
model_output = torch.tensor([0.75, 0.25], dtype=torch.float32)

In [None]:
# calculating manually
H = -( labels[0]*torch.log(model_output[0]) + labels[1]*torch.log(model_output[1]) )
H

In [None]:
# Calculating with the built-in torch loss
binary_cross_entropy = torch.nn.BCELoss()
# Argument order matters: predictions (model output) first, then the target labels.
binary_cross_entropy(model_output, labels)
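
The manual sum and `BCELoss` arrive at the same number here by slightly different routes. With its default `reduction='mean'`, `BCELoss` treats every element as a separate binary prediction, computes $-[y \log p + (1 - y) \log(1 - p)]$ for each, and averages. The sketch below reproduces that calculation in numpy so the steps are visible; the function name `bce_mean` is an assumption, and the sketch leaves out the clamping of log values that the real torch implementation applies for predictions of exactly 0 or 1.

```python
import numpy as np

def bce_mean(predictions, targets):
    """Mean of the element-wise binary cross-entropy (natural log).

    Simplified sketch: assumes every prediction is strictly between 0 and 1.
    """
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    per_element = -(targets * np.log(predictions)
                    + (1 - targets) * np.log(1 - predictions))
    return float(per_element.mean())

# Same values as the torch tensors above.
loss = bce_mean([0.75, 0.25], [1, 0])
print(loss)
```

Both elements contribute $-\log(0.75)$ here, so their mean equals the manually computed value. For other inputs the two calculations generally differ.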