# Information Theory in the world of Machine Learning

## Entropy

The entropy of a random variable is the average level of uncertainty associated with the variable's potential states.\
It measures the expected amount of information needed to describe the state of the variable, given the probability distribution over all of its potential states.
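
Written as a formula, this definition can be sketched as follows. The symbols are assumptions introduced here for illustration: $X$ is a discrete variable with $n$ possible states $x_i$, $p(x_i)$ is the probability of state $x_i$, and the logarithm is taken in base 2 so that the result is in bits.

$$
H(X) = -\sum_{i=1}^{n} p(x_i)\,\log_2 p(x_i)
$$

States with $p(x_i) = 0$ are treated as contributing nothing to the sum.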

In [6]:
from typing import List
from math import log2

def entropy(probabilities: List[float]) -> float:
    # States with zero probability are skipped: they add nothing, and log2(0) is undefined
    H = -sum(p * log2(p) for p in probabilities if p > 0)
    return H

In [7]:
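from math import isclose

probabilities = [0.25, 0.25, 0.25, 0.25]

# A valid distribution must sum to 1 (compared with a tolerance for floating-point error)
if not isclose(sum(probabilities), 1.0):
    raise ValueError("The probabilities are not valid: they must sum to 1")

print(entropy(probabilities=probabilities))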

2.0
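
To see how entropy tracks uncertainty, the short sketch below compares the uniform distribution used above with a skewed one. The skewed distribution and the name `entropy_bits` are hypothetical, chosen only for illustration. The function repeats the same formula so that the block runs on its own.

```python
from math import log2
from typing import Sequence


def entropy_bits(probabilities: Sequence[float]) -> float:
    """Entropy in bits; zero-probability states are skipped."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


# Illustrative distributions over four states (the skewed one is an assumption)
uniform = [0.25, 0.25, 0.25, 0.25]  # every state equally likely
skewed = [0.7, 0.1, 0.1, 0.1]       # one state dominates

for name, dist in [("uniform", uniform), ("skewed", skewed)]:
    print(f"{name}: {entropy_bits(dist):.3f} bits")
```

The skewed distribution yields a lower entropy than the uniform one because its outcome is easier to guess.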


## Shannon Entropy

Shannon entropy is the average amount of information contained in a message, measured in bits when the logarithm is taken in base 2.\
It quantifies how unpredictable the content of the message is.

In [38]:
import numpy as np

def shannon_entropy(data):
    chars, counts = np.unique(data, return_counts=True)

    # Count of the unique characters in the message
    char_counts = list(zip(chars, counts))
    print("Count of the unique characters in the message:")
    for char, count in char_counts:
        print(f"('{char}', {count})")

    # Compute Shannon entropy
    probabilities = counts / len(data)
    return -np.sum(probabilities * np.log2(probabilities))

# Example: Calculate Shannon entropy for a text message
message = "Hello world"
print(f"Shannon entropy of '{message}': {shannon_entropy(list(message)):.2f} bits")

Count of the unique characters in the message:
(' ', 1)
('H', 1)
('d', 1)
('e', 1)
('l', 3)
('o', 2)
('r', 1)
('w', 1)
Shannon entropy of 'Hello world': 2.85 bits


## Entropy in Machine Learning

Entropy measures uncertainty, and a central goal of machine learning is to reduce the uncertainty of a prediction, so the two are closely linked.

### Information gain

Information gain is the reduction in entropy achieved by splitting a dataset according to a particular feature. Tree algorithms use it to select the feature to split on.\
It measures how much information a feature provides about the class.
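
As a formula, this reduction can be sketched as below. The notation is an assumption introduced for illustration: $S$ is the dataset at the parent node, $A$ is the feature used for the split, $S_v$ is the subset of $S$ where $A$ takes the value $v$, and $H$ is the entropy of the class labels.

$$
IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
$$

The second term is the size-weighted average entropy of the child nodes, so the gain is large when the children are much purer than the parent.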

Example:\
We have a dataset of cancerous (C) and non-cancerous (NC) cell samples, each described by four binary mutation features.


In [51]:
import pandas as pd
import numpy as np

# Data for the DataFrame
data = {
    'Samples': ['C1', 'C2', 'C3', 'C4', 'NC1', 'NC2', 'NC3'],
    'Mutation 1': [1, 1, 1, 0, 0, 0, 1],
    'Mutation 2': [1, 1, 0, 1, 0, 1, 1],
    'Mutation 3': [1, 0, 1, 1, 0, 0, 0],
    'Mutation 4': [0, 1, 1, 0, 0, 0, 0]
}

# Create the DataFrame
df = pd.DataFrame(data, index=None)

# Print the DataFrame
print(df)

  Samples  Mutation 1  Mutation 2  Mutation 3  Mutation 4
0      C1           1           1           1           0
1      C2           1           1           0           1
2      C3           1           0           1           1
3      C4           0           1           1           0
4     NC1           0           0           0           0
5     NC2           0           1           0           0
6     NC3           1           1           0           0


We can build a very simple decision tree with one parent node, which is highly impure because it holds all the samples of both classes, and two child nodes that are ideally pure: one with only the cancerous cells and the other with only the non-cancerous cells.\
We then want to know which feature to split on so that future samples are classified as well as possible, which means that child nodes 1 and 2 must be as pure as possible.

* **Parent Node:** The parent node is shown with its high impurity.
* **Child Nodes:** The child nodes are indented using bullet points and indicate the outcome (True or False) of the parent node's decision.
* **Class and Probability:** Each child node includes the predicted class (e.g., Class A) and its associated probability (e.g., p1).
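
The impurity of these nodes can be measured with entropy. The sketch below is a minimal illustration, not a full tree implementation: it takes the class labels from the table above (four C, three NC) and compares the parent node with the two ideal pure children. The function name `node_entropy` is a hypothetical helper.

```python
from collections import Counter
from math import log2
from typing import Sequence


def node_entropy(labels: Sequence[str]) -> float:
    """Entropy (bits) of the class labels held by one tree node."""
    total = len(labels)
    # p * log2(1 / p) equals -p * log2(p); this form returns 0.0 (not -0.0) for a pure node
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


# Class labels of the seven samples: four cancerous (C), three non-cancerous (NC)
parent = ["C", "C", "C", "C", "NC", "NC", "NC"]
# The ideal split: each child holds a single class
child_c = ["C", "C", "C", "C"]
child_nc = ["NC", "NC", "NC"]

print(f"parent: {node_entropy(parent):.3f} bits")
print(f"cancerous child: {node_entropy(child_c):.3f} bits")
print(f"non-cancerous child: {node_entropy(child_nc):.3f} bits")
```

The parent is close to the 1-bit maximum for two classes, while a pure child has zero entropy.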

In [66]:
# First step towards the entropy of the feature Mutation 1:
# the proportion of samples with value 0 and with value 1
prob_zeros = (df['Mutation 1'] == 0).sum()/df['Mutation 1'].shape[0]
prob_ones = (df['Mutation 1'] == 1).sum()/df['Mutation 1'].shape[0]
print(prob_zeros)
print(prob_ones)

0.42857142857142855
0.5714285714285714
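
Going one step further, the sketch below computes the information gain of each mutation. It assumes that the class of a sample is given by the prefix of its name (C or NC), and it copies the values from the DataFrame above into plain lists so that the block runs on its own with the standard library. The helper names `label_entropy` and `information_gain` are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence


def label_entropy(labels: Sequence[str]) -> float:
    """Entropy (bits) of a list of class labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def information_gain(feature: Sequence[int], labels: Sequence[str]) -> float:
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(labels)
    children: Dict[int, List[str]] = {}
    # Group the class labels by the value the feature takes for each sample
    for value, label in zip(feature, labels):
        children.setdefault(value, []).append(label)
    weighted = sum(len(c) / total * label_entropy(c) for c in children.values())
    return label_entropy(labels) - weighted


# Assumption: class derived from the sample-name prefix (C1..C4, NC1..NC3)
labels = ["C", "C", "C", "C", "NC", "NC", "NC"]
mutations = {
    "Mutation 1": [1, 1, 1, 0, 0, 0, 1],
    "Mutation 2": [1, 1, 0, 1, 0, 1, 1],
    "Mutation 3": [1, 0, 1, 1, 0, 0, 0],
    "Mutation 4": [0, 1, 1, 0, 0, 0, 0],
}

gains = {name: information_gain(values, labels) for name, values in mutations.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.3f} bits")

best = max(gains, key=gains.get)
print(f"Best split: {best}")
```

On this data the sketch ranks Mutation 3 highest: every sample carrying it is cancerous, so one of its child nodes is pure.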
