<a href="https://colab.research.google.com/github/bptripp/ai-course/blob/main/degree_of_mixing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Quantifying the Mixing of Outcomes
Decision trees split examples so that the outcomes are as uniform as possible within each group. Different decision tree algorithms minimize different measures of non-uniformity, including Gini impurity and information-theoretic entropy. For this course it is not necessary to understand exactly how these measures work. However, if you are interested, read on for more detail on entropy.

Entropy is the degree of uncertainty in a variable (e.g. the decision to walk or not). This kind of entropy is related to but distinct from thermodynamic “entropy”, which is the degree of disorder in a system. Information-theoretic entropy is highest when different outcomes are equally likely, and zero when the outcome is certain. A unit of entropy is called a “bit”. A fair coin toss that has not yet occurred has one bit of entropy, and its outcome has one bit of information.
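To make the last few sentences concrete, here is a minimal sketch that computes the entropy of a two-outcome variable, such as a coin, for several probabilities. The function name `binary_entropy` is made up for this illustration and is not used elsewhere in the notebook.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a two-outcome variable with probabilities p and 1 - p.

    Hypothetical helper for illustration only.
    """
    total = 0.0
    for q in (p, 1.0 - p):
        if q > 0:  # an outcome with zero probability adds no uncertainty
            total -= q * math.log2(q)
    return total

# Entropy is largest for a fair coin and falls to zero as the outcome becomes certain.
for p in (0.5, 0.7, 0.9, 1.0):
    print(f"P(heads) = {p:.1f}: entropy = {binary_entropy(p):.3f} bits")
```

The printed values should fall from one bit for the fair coin to zero bits for the coin that always lands heads.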

To calculate entropy:
1.	List the probabilities of different outcomes. For example, in a coin toss, the probabilities of heads and tails are both 0.5. We could write this fact as $P_H=P_T=0.5$.
2.	Calculate the base-2 logarithm of each probability, i.e. $\log_2 P$. The base-2 logarithm of a number is the power to which 2 must be raised to produce that number. $2^{-1}=0.5$, so $\log_2 0.5=-1$. A probability of 1 (the highest possible probability) has a logarithm of 0. Lower probabilities have negative logarithms. A probability of 0 has a logarithm of $-\infty$.
3.	The entropy is the negative of the sum of each probability times its base-2 logarithm. For a coin toss, this is $E = -(P_H\log_2 P_H + P_T\log_2 P_T) = -(0.5\times(-1) + 0.5\times(-1)) = 1$.
4.	To calculate the overall entropy when we split the examples into two groups, we must add up the entropy for each group multiplied by the probability that an example falls into the group.
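
Steps 1 to 3 can be sketched in plain Python for the coin toss. This is only an illustration of the arithmetic; the variable names are made up for this example.

```python
import math

# Step 1: list the probabilities of the different outcomes (a fair coin).
probabilities = {"heads": 0.5, "tails": 0.5}

# Step 2: take the base-2 logarithm of each probability.
logs = {outcome: math.log2(p) for outcome, p in probabilities.items()}

# Step 3: the entropy is the negative of the sum of probability times logarithm.
coin_entropy = -sum(p * logs[outcome] for outcome, p in probabilities.items())

print(logs)          # each logarithm is -1.0
print(coin_entropy)  # 1.0 bit
```

Step 4, the weighted sum over groups, is worked through for the walking examples in the text that follows.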

Consider the entropy of the walking examples if they are split according to Time. The entropy of the Yes group is

$$
\begin{aligned}
E_{Yes} &= -(P_{Walk}\log_2P_{Walk} + P_{No\_Walk}\log_2P_{No\_Walk}) \\
  &= -\left(\tfrac{2}{3}\log_2\tfrac{2}{3} + \tfrac{1}{3}\log_2\tfrac{1}{3}\right) \\
  &\approx -\left(\tfrac{2}{3}(-0.585) + \tfrac{1}{3}(-1.585)\right) \\
  &\approx 0.92
\end{aligned}
$$

The entropy of the No group is zero, because all the outcomes in this group are the same.
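
Written out in the same form as above, and using the usual convention that an outcome with zero probability contributes nothing to the sum (because $P\log_2 P$ approaches 0 as $P$ approaches 0), the calculation for the No group is

$$
\begin{aligned}
E_{No} &= -(P_{Walk}\log_2P_{Walk} + P_{No\_Walk}\log_2P_{No\_Walk}) \\
  &= -(0\log_2 0 + 1\log_2 1) \\
  &= -(0 + 0) \\
  &= 0
\end{aligned}
$$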

To calculate the total entropy, note that 6 of the 9 examples are in the Yes group, with entropy $E_{Yes}=0.92$, and 3 of the 9 are in the No group, with $E_{No}=0$. So the total entropy is $E=(6/9)(0.92)+(3/9)(0)\approx 0.61$. This is substantially lower than the one bit of entropy in a fair coin toss, which means that if we split according to Time, the decision is substantially more certain than the outcome of a coin toss.

The code below calculates this result. If you like, you can change the numbers to see how they affect the entropy.  

In [1]:
import numpy as np

def entropy(probabilities):
  """
  :param probabilities: list of probabilities (or counts) of each possible outcome
  :return: entropy of the outcome, in bits
  """
  probabilities = np.array(probabilities) / np.sum(probabilities) # normalize so the values sum to one
  return - np.sum(probabilities * np.log2(np.clip(probabilities, 1e-12, 1))) # clip avoids log2(0); a zero probability then contributes 0

In [None]:
group1_negative = 2 # number of cases with time=yes, walk=no
group1_positive = 4 # number of cases with time=yes, walk=yes
group2_negative = 3 # number of cases with time=no, walk=no
group2_positive = 0 # number of cases with time=no, walk=yes

group1 = group1_negative + group1_positive
group2 = group2_negative + group2_positive
total = group1 + group2

entropy_group1 = entropy([group1_negative, group1_positive])
entropy_group2 = entropy([group2_negative, group2_positive])
print('Entropy of Group 1: {}'.format(entropy_group1))
print('Entropy of Group 2: {}'.format(entropy_group2))

total_entropy = group1/total*entropy_group1 + group2/total*entropy_group2
print('Total entropy: {}'.format(total_entropy))

Entropy of Group 1: 0.9182958340544896
Entropy of Group 2: -0.0
Total entropy: 0.612197222702993
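
A natural point of comparison for the total entropy after the split is the entropy of the same nine examples before any split. The sketch below assumes the counts used above (4 walk and 5 no-walk examples overall) and reports the reduction in entropy, which is commonly called information gain, a term this notebook does not otherwise use. The helper `entropy_from_counts` is a hypothetical name introduced for this example, and it is written without NumPy so that it stands alone.

```python
import math

def entropy_from_counts(counts):
    """Entropy in bits of the outcome proportions given by a list of counts.

    Hypothetical helper for illustration; zero counts are skipped because
    they contribute nothing to the entropy.
    """
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            result -= p * math.log2(p)
    return result

# [walk, no walk] counts, taken from the example above.
yes_group = [4, 2]  # time = yes
no_group = [0, 3]   # time = no
all_examples = [yes_group[0] + no_group[0], yes_group[1] + no_group[1]]

n = sum(all_examples)
entropy_before = entropy_from_counts(all_examples)
entropy_after = (sum(yes_group) / n * entropy_from_counts(yes_group)
                 + sum(no_group) / n * entropy_from_counts(no_group))
gain = entropy_before - entropy_after

print(f"Entropy before split: {entropy_before:.3f}")
print(f"Entropy after split:  {entropy_after:.3f}")
print(f"Reduction in entropy: {gain:.3f}")
```

Before the split the entropy is close to one bit, because the two outcomes are almost equally common, so splitting on Time removes a little over a third of a bit of uncertainty.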
