# Information Theory in the world of Machine Learning

## Entropy

The entropy of a random variable is the average level of uncertainty associated with the variable's possible states.\
It measures the expected amount of information needed to describe the state of the variable, given the probability distribution over all possible states.
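Formally, for a discrete random variable $X$ that takes each value $x$ with probability $p(x)$, the entropy in bits is:

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$

By convention $0 \log_2 0 = 0$, so states with zero probability contribute nothing and can be skipped. For four equally likely states, $H = -4 \cdot \tfrac{1}{4}\log_2\tfrac{1}{4} = 2$ bits.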

In [38]:
from typing import List
from math import log2

def entropy(probabilities: List[float]) -> float:
    # H = -sum(p * log2(p)); terms with p = 0 are skipped (0 * log2(0) is taken as 0)
    H = -sum(p * log2(p) for p in probabilities if p > 0)
    return H

In [39]:
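from math import isclose

probabilities: List[float] = [0.25, 0.25, 0.25, 0.25]

# Check that the probabilities form a valid distribution (they must sum to 1)
if not isclose(sum(probabilities), 1.0):
    print("Error: The probabilities are not valid")

print(entropy(probabilities=probabilities))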

2.0
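The uniform distribution gives the maximum entropy for a given number of states ($\log_2 4 = 2$ bits here). The stand-alone sketch below re-defines the same `entropy` helper so it runs on its own and compares the uniform case with a skewed distribution over the same four states. The skewed probabilities are made-up illustrative values, not taken from any dataset.

```python
from math import log2
from typing import List

def entropy(probabilities: List[float]) -> float:
    # Same definition as above: -sum(p * log2(p)), skipping p = 0 terms
    return -sum(p * log2(p) for p in probabilities if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]  # illustrative values only

print(f"Uniform: {entropy(uniform):.3f} bits")  # 2.000 = log2(4), the maximum for 4 states
print(f"Skewed:  {entropy(skewed):.3f} bits")   # about 1.357
```

The more probability mass concentrates on one state, the more predictable the variable is and the lower its entropy.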


## Shannon Entropy

This is the measure of the average amount of information contained in a message.\
It quantifies how unpredictable the message's content is: the harder it is to predict the next symbol, the higher the entropy.
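To make "unpredictability" concrete, here is a small stand-alone sketch that uses only the standard library. `shannon_entropy_bits` is a hypothetical helper name, separate from the notebook's `shannon_entropy` below, and the three messages are illustrative: one repeated symbol, two equally frequent symbols, and eight distinct symbols.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence

def shannon_entropy_bits(message: Sequence[Hashable]) -> float:
    """Entropy in bits per symbol of the message's empirical symbol frequencies."""
    n = len(message)
    # Sum over distinct symbols of p * log2(1 / p), where p = count / n
    return sum((c / n) * log2(n / c) for c in Counter(message).values())

for msg in ["aaaaaaaa", "aabbaabb", "abcdefgh"]:
    print(f"{msg!r}: {shannon_entropy_bits(msg):.2f} bits/symbol")
```

A fully predictable message carries 0 bits per symbol, while eight equally frequent symbols need $\log_2 8 = 3$ bits each.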

In [43]:
import numpy as np
from typing import Union

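def shannon_entropy(data: Union[List[Union[float, str, int]], str]) -> float:
    # Treat a string as a sequence of characters: np.unique on a plain str
    # would see one single element instead of the individual characters
    if isinstance(data, str):
        data = list(data)

    chars, counts = np.unique(data, return_counts=True)

    # Count of the unique characters in the message
    char_counts = list(zip(chars, counts))
    print("Count of the unique characters in the message:")
    for char, count in char_counts:
        print(f"('{char}', {count})")

    # Compute Shannon entropy from the relative frequency of each symbol
    probabilities = counts / len(data)
    return -np.sum(probabilities * np.log2(probabilities))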

# Example: Calculate Shannon entropy for a text message
message1 = "Hello world"
# Example: Calculate Shannon entropy for a binary message
message2 = [1,0,1,1,0,1,1,0]

print(f"Shannon entropy of '{message1}': {shannon_entropy(list(message1)):.2f} bits")
print(f"Shannon entropy of {message2}: {shannon_entropy(list(message2)):.2f} bits")


Count of the unique characters in the message:
(' ', 1)
('H', 1)
('d', 1)
('e', 1)
('l', 3)
('o', 2)
('r', 1)
('w', 1)
Shannon entropy of 'Hello world': 2.85 bits
Count of the unique characters in the message:
('0', 3)
('1', 5)
Shannon entropy of [1, 0, 1, 1, 0, 1, 1, 0]: 0.95 bits


## Entropy in Machine Learning

Entropy measures uncertainty, and a central goal of ML is to reduce uncertainty about predictions, so the two are closely linked.

### Information gain

This is the measure of the reduction in entropy achieved by splitting a dataset according to a particular feature (decision-tree algorithms use it to choose which feature to split on).\
In other words, it is the amount of information a feature provides about the class.
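Written out, for a dataset $S$ that a feature $A$ splits into subsets $S_v$ (one per value $v$ of $A$):

$$IG(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|} H(S_v)$$

The second term is the size-weighted entropy of the child nodes, so the gain is largest when the split produces the purest children.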

Example:\
We have a dataset of cancerous (C) and non-cancerous (NC) cells, each described by four binary mutation features.


In [46]:
import pandas as pd
import numpy as np
from typing import Dict, List, Union

# Data for the DataFrame
data: Dict[str, List[Union[str, int]]] = {
    'Samples': ['C1', 'C2', 'C3', 'C4', 'NC1', 'NC2', 'NC3'],
    'Mutation 1': [1, 1, 1, 0, 0, 0, 1],
    'Mutation 2': [1, 1, 0, 1, 0, 1, 1],
    'Mutation 3': [1, 0, 1, 1, 0, 0, 0],
    'Mutation 4': [0, 1, 1, 0, 0, 0, 0]
}

# Create the DataFrame
df = pd.DataFrame(data, index=None)

# Print the DataFrame
print(df)

  Samples  Mutation 1  Mutation 2  Mutation 3  Mutation 4
0      C1           1           1           1           0
1      C2           1           1           0           1
2      C3           1           0           1           1
3      C4           0           1           1           0
4     NC1           0           0           0           0
5     NC2           0           1           0           0
6     NC3           1           1           0           0


Consider a very simple decision tree: one parent node that contains all the samples and is highly impure, and two child nodes. Ideally, one child holds only the cancerous cells and the other only the non-cancerous cells.\
We then want to know which feature to split the data on so that future samples are classified as well as possible, which means child nodes 1 and 2 must be as pure as possible.

* **Parent Node:** highly impure, containing all 7 samples (4 C + 3 NC)
* **Left Child Node:** pure node with only cancerous cells (4 of 7 samples, P = 4/7)
* **Right Child Node:** pure node with only non-cancerous cells (3 of 7 samples, P = 3/7)
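For this ideal split, the parent entropy comes from the 4:3 class ratio, and both children are pure, so each has entropy 0:

$$H(\text{parent}) = -\left(\tfrac{4}{7}\log_2\tfrac{4}{7} + \tfrac{3}{7}\log_2\tfrac{3}{7}\right) \approx 0.985 \text{ bits}$$

$$IG = 0.985 - \left(\tfrac{4}{7}\cdot 0 + \tfrac{3}{7}\cdot 0\right) = 0.985 \text{ bits}$$

A perfect split removes all of the parent node's uncertainty; real features usually fall short of this upper bound.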

In [29]:
# Definition of the variables to calculate the entropy
sum_elements_mut1: int = df['Mutation 1'].shape[0]
sum_zeros_in_mut1: int = (df['Mutation 1'] == 0).sum()
sum_ones_in_mut1: int = (df['Mutation 1'] == 1).sum()

prob_NC_mut1: float  = sum_zeros_in_mut1 / sum_elements_mut1
prob_C_mut1: float = sum_ones_in_mut1 / sum_elements_mut1

# Display the probabilities for the cancerous and non cancerous cells
print(f"Probabilities of the cancerous cells for {df['Mutation 1'].name}: {prob_C_mut1}")
print(f"Probabilities of the cancerous cells for {df['Mutation 1'].name}: {prob_NC_mut1}")

try:
    prob_C_mut1 + prob_NC_mut1 == 1.0
except:
    print("Error: The probabilities do not add up to 1")
    
# Calculate the entropy of the parent node

feature_nodes_mut1: List[float] = [prob_C_mut1, prob_NC_mut1]
H_parent_node_mut1 = entropy(feature_nodes_mut1)
print(H_parent_node_mut1) 

Probabilities of the cancerous cells for Mutation 1: 0.5714285714285714
Probabilities of the cancerous cells for Mutation 1: 0.42857142857142855
0.9852281360342516
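The cell above stops at the parent-node entropy. A natural next step is to compute the information gain of every mutation and split on the largest one. The sketch below does this in plain Python (no pandas) so it runs on its own. The helper names `entropy_of_labels` and `information_gain` are hypothetical, and the class labels are assumed to come from the `C`/`NC` prefix of the sample names. Here the parent entropy is computed directly from the class labels rather than from a mutation column.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence

def entropy_of_labels(labels: Sequence[str]) -> float:
    """Entropy (bits) of the empirical class distribution of `labels`."""
    n = len(labels)
    # p * log2(1 / p) per class; a pure node gives exactly 0.0
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def information_gain(feature: Sequence[int], labels: Sequence[str]) -> float:
    """H(parent) minus the size-weighted entropy of the child nodes."""
    n = len(labels)
    # Group the class labels by feature value: one child node per value
    children: Dict[int, List[str]] = {}
    for value, label in zip(feature, labels):
        children.setdefault(value, []).append(label)
    weighted = sum(len(child) / n * entropy_of_labels(child) for child in children.values())
    return entropy_of_labels(labels) - weighted

samples = ['C1', 'C2', 'C3', 'C4', 'NC1', 'NC2', 'NC3']
# Assumption: the class is encoded in the sample-name prefix ("C" or "NC")
labels = ['NC' if s.startswith('NC') else 'C' for s in samples]
mutations: Dict[str, List[int]] = {
    'Mutation 1': [1, 1, 1, 0, 0, 0, 1],
    'Mutation 2': [1, 1, 0, 1, 0, 1, 1],
    'Mutation 3': [1, 0, 1, 1, 0, 0, 0],
    'Mutation 4': [0, 1, 1, 0, 0, 0, 0],
}

print(f"Parent entropy: {entropy_of_labels(labels):.3f} bits")
gains = {name: information_gain(values, labels) for name, values in mutations.items()}
for name, gain in gains.items():
    print(f"{name}: IG = {gain:.3f} bits")
print("Best feature to split on:", max(gains, key=gains.get))
```

The parent entropy matches the 0.985 bits computed above. On this data Mutation 3 comes out on top (IG ≈ 0.522 bits): every sample carrying Mutation 3 is cancerous, so that child node is pure.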
