In binary classification, where there are two classes (0 and 1), the cross-entropy loss for a single sample can be calculated as follows:

Let y be the true label (0 or 1)
Let p be the predicted probability of the positive class (class 1)
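Then, the cross-entropy loss for that sample is given by: -(y * log(p) + (1-y) * log(1-p)), where log is the natural logarithm. The loss is small when p is close to the true label and grows without bound as p approaches the wrong label.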

In [3]:
import numpy as np

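def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so that log() stays finite
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)

    # Number of samples
    m = y_true.shape[0]

    # Mean cross-entropy loss over all samples (natural log, so the unit is nats)
    loss = -(1/m) * np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

    return loss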
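
As a quick check of the formula above, the sketch below compares confident and hesitant predictions on the same labels. The helper name `binary_cross_entropy` and the clipping constant `eps` are assumptions made for this illustration, not part of the original notebook.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy in nats (hypothetical helper for this example)."""
    y_true = np.asarray(y_true, dtype=float)
    # eps is an assumed clipping constant that keeps log() finite at 0 and 1
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

labels = [1, 0, 1]
confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8])  # predictions close to the labels
hesitant = binary_cross_entropy(labels, [0.6, 0.4, 0.5])   # predictions close to 0.5
print(f"confident: {confident:.4f}, hesitant: {hesitant:.4f}")
```

The confident predictions should give the lower loss, since every predicted probability sits closer to its true label.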


In machine learning, entropy measures the impurity, or uncertainty, of the class labels in a dataset. It is commonly used in decision tree algorithms and other models that involve binary or multi-class classification.

In binary classification, entropy quantifies how mixed a set of data points from two classes is: it is 0 when every point belongs to the same class and reaches its maximum of 1 bit when the two classes are equally represented. The entropy is calculated as:

H = -p * log2(p) - (1 - p) * log2(1 - p)

where p is the proportion of data points that belong to one of the two classes.

In multi-class classification, entropy is calculated similarly, but is based on the proportions of data points that belong to each class. The entropy formula for multi-class classification is:

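H = -sum(p_i * log2(p_i))

where p_i is the proportion of data points that belong to class i, and a class with p_i = 0 contributes 0 to the sum.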

In [4]:
import math

def entropy(p):
    if p == 0 or p == 1:
        return 0
    else:
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Example usage
p = 0.7 # Proportion of positive data points
H = entropy(p)
print(f"Entropy: {H}")


Entropy: 0.8812908992306927


In [5]:
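def entropy(probabilities):
    # Expects the full distribution: one probability per class, summing to 1
    probabilities = np.array(probabilities, dtype=float)

    # Drop zero entries: they contribute 0 to the sum, but log2(0) is -inf
    probabilities = probabilities[probabilities > 0]

    # Calculate entropy in bits
    return -np.sum(probabilities * np.log2(probabilities))

p = 0.7 # Proportion of positive data points
H = entropy([p, 1 - p]) # Pass both class proportions, not just p
print(f"Entropy: {H:.4f}")


Entropy: 0.8813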
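
To illustrate the multi-class formula, the sketch below compares a uniform, a skewed, and a fully certain distribution over four classes. The helper name `entropy_bits` is an assumption for this example; it uses the equivalent form sum(p_i * log2(1 / p_i)) and skips zero-probability classes.

```python
import numpy as np

def entropy_bits(probabilities):
    """Shannon entropy in bits of a full distribution (hypothetical helper)."""
    p = np.asarray(probabilities, dtype=float)
    p = p[p > 0]  # classes with zero probability contribute nothing
    return float(np.sum(p * np.log2(1.0 / p)))

uniform = entropy_bits([0.25, 0.25, 0.25, 0.25])  # maximally uncertain
skewed = entropy_bits([0.7, 0.1, 0.1, 0.1])       # one class dominates
certain = entropy_bits([1.0, 0.0, 0.0, 0.0])      # no uncertainty at all
print(f"uniform: {uniform:.3f}, skewed: {skewed:.3f}, certain: {certain:.3f}")
```

For k equally likely classes the entropy is log2(k), so the uniform case over four classes gives 2 bits, and any other distribution over the same classes gives less.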


Mutual information is a measure of the amount of information that one random variable provides about another random variable. In machine learning, mutual information is often used to select features that are most relevant to a particular task.

To calculate the mutual information between two variables, we calculate the entropy of each variable and the joint entropy of the pair, then combine them as I(X; Y) = H(X) + H(Y) - H(X, Y). Each term uses the same formula as the entropy function that we defined earlier.


In [8]:
def mutual_information(x, y, bins=10):
    x = np.array(x)
    y = np.array(y)

    # Estimate joint probabilities from a 2D histogram (10 bins is NumPy's default)
    joint_prob = np.histogram2d(x, y, bins=bins)[0] / float(x.size)

    # Marginal probabilities are the row and column sums of the joint table
    x_prob = joint_prob.sum(axis=1)
    y_prob = joint_prob.sum(axis=0)

    # Entropy in bits, skipping empty bins (0 * log2(0) is treated as 0)
    def h(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # Calculate mutual information: I(X; Y) = H(X) + H(Y) - H(X, Y)
    mi = h(x_prob) + h(y_prob) - h(joint_prob)

    return mi
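
Two limiting cases help build intuition for mutual information: independent variables share no information, while identical variables share all of it. The sketch below works directly from a joint probability table instead of raw samples; the helper name `mutual_information_from_joint` and the two example tables are assumptions made for illustration.

```python
import numpy as np

def mutual_information_from_joint(joint):
    """I(X; Y) in bits from a joint probability table (hypothetical helper)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()               # normalise in case counts were passed
    p_x = joint.sum(axis=1, keepdims=True)    # marginal of X (rows)
    p_y = joint.sum(axis=0, keepdims=True)    # marginal of Y (columns)
    nonzero = joint > 0                       # empty cells contribute nothing
    ratio = joint[nonzero] / (p_x * p_y)[nonzero]
    return float(np.sum(joint[nonzero] * np.log2(ratio)))

# X and Y are independent fair coins: p(x, y) = p(x) * p(y)
mi_independent = mutual_information_from_joint([[0.25, 0.25], [0.25, 0.25]])
# Y is an exact copy of X: all probability mass lies on the diagonal
mi_identical = mutual_information_from_joint([[0.5, 0.0], [0.0, 0.5]])
print(f"independent: {mi_independent:.3f} bits, identical: {mi_identical:.3f} bits")
```

In the identical case the mutual information equals the entropy of a single fair coin, 1 bit, because knowing one variable removes all uncertainty about the other.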


Conditional entropy is a measure of the uncertainty that remains in a random variable after we have observed the value of another random variable.

For two variables X and Y it is defined as H(X | Y) = -sum(p(x, y) * log2(p(x | y))), where p(x | y) = p(x, y) / p(y). Equivalently, H(X | Y) = H(X, Y) - H(Y): the joint entropy of the two variables minus the entropy of the variable we condition on.

In [9]:
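def conditional_entropy(x, y, bins=10):
    x = np.array(x)
    y = np.array(y)

    # Calculate joint probabilities p(x, y); rows index x bins, columns index y bins
    joint_prob = np.histogram2d(x, y, bins=bins)[0] / float(x.size)

    # Marginal probabilities p(y) are the column sums of the joint table
    y_prob = joint_prob.sum(axis=0)

    # Only non-empty cells contribute (this also avoids dividing by zero)
    mask = joint_prob > 0

    # Conditional probabilities p(x | y) = p(x, y) / p(y) for the non-empty cells
    cond_prob = joint_prob[mask] / np.broadcast_to(y_prob, joint_prob.shape)[mask]

    # Conditional entropy H(X | Y) = -sum p(x, y) * log2 p(x | y)
    cond_entropy = -np.sum(joint_prob[mask] * np.log2(cond_prob))

    return cond_entropy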
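
Conditional entropy can be computed in two equivalent ways: directly from the conditional probabilities, or through the chain rule H(X | Y) = H(X, Y) - H(Y). The sketch below checks that both routes agree on a small joint table; the helper `entropy_bits` and the table values are assumptions chosen for this example.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of any probability table (hypothetical helper)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

# Assumed joint table p(x, y): rows index x, columns index y
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)

# Route 1: chain rule, H(X | Y) = H(X, Y) - H(Y)
h_x_given_y = entropy_bits(joint) - entropy_bits(p_y)

# Route 2: definition, -sum p(x, y) * log2 p(x | y); safe here because no cell is zero
direct = float(-np.sum(joint * np.log2(joint / p_y)))

print(f"H(X|Y) via chain rule: {h_x_given_y:.4f}, via definition: {direct:.4f}")
print(f"H(X): {entropy_bits(p_x):.4f}")
```

Observing Y can only reduce the uncertainty about X, so H(X | Y) should never exceed H(X).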


KL (Kullback-Leibler) divergence is a measure of the difference between two probability distributions P and Q. It measures how much information is lost when Q is used to approximate P, and it is not symmetric: KL(P || Q) generally differs from KL(Q || P). KL divergence is commonly used in machine learning to compare a predicted probability distribution with a true probability distribution.

In [10]:
def kl_divergence(p, q):
    p = np.array(p)
    q = np.array(q)
    
    # Avoid division by zero
    p = np.clip(p, 1e-8, None)
    q = np.clip(q, 1e-8, None)
    
    # Calculate KL divergence
    kl_div = np.sum(p * np.log2(p / q))
    
    return kl_div
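
The sketch below illustrates two properties of KL divergence on a pair of two-outcome distributions: the divergence of a distribution from itself is zero, and swapping the arguments changes the result. The helper name `kl_bits` and the example distributions are assumptions for illustration, and the helper assumes q is non-zero wherever p is non-zero.

```python
import numpy as np

def kl_bits(p, q):
    """KL(p || q) in bits (hypothetical helper; assumes q > 0 wherever p > 0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nonzero = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[nonzero] * np.log2(p[nonzero] / q[nonzero])))

p = [0.5, 0.5]  # assumed "true" distribution
q = [0.9, 0.1]  # assumed approximation

forward = kl_bits(p, q)  # cost of approximating p with q
reverse = kl_bits(q, p)  # cost of approximating q with p
same = kl_bits(p, p)     # a distribution compared with itself
print(f"KL(p||q): {forward:.4f}, KL(q||p): {reverse:.4f}, KL(p||p): {same:.4f}")
```

Because the two directions give different values, the choice of which distribution plays the role of the reference matters when KL divergence is used as a loss.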
