## Entropy:
Entropy comes from physics. It measures how much freedom a particle has to move around.<br>For example, consider the 3 states of water:
* Solid (Ice)
* Liquid (A drink)
* Gas (Vapour)

Thinking of the particles in ice, liquid water and vapour, we can surmise that:
* Ice is pretty rigid, in that its particles don't have many places to go. They mostly stay where they are.
* Liquid water is a little less rigid, as the particles have a few places to move around.
* Water vapour particles have a lot more freedom to move around the atmosphere.

So Entropy measures precisely this: how much freedom a particle has to move around. Thus the Entropy of ice is low, the Entropy of liquid water is medium and the Entropy of water vapour is high.

In the field of probability, the more rigid or homogeneous a set is, the less Entropy it has, and vice versa.

Another way to think of Entropy is in terms of knowledge. For example, if we were to pick a random ball from each of the buckets below, how much would we know in advance about its color?
* **bucket** 1 = 4 red balls
* **bucket** 2 = 3 red, 1 blue balls
* **bucket** 3 = 2 red, 2 blue balls

In the 1st bucket, we know for sure that the ball will be red, thus we have _high knowledge_. In the 2nd bucket we know that the ball is more likely to be red, so we have _medium knowledge_, while in the 3rd bucket, the ball is as likely to be red as blue, so we have _little knowledge_ about the possible outcome.

Knowledge in this case is the opposite of Entropy. Since we're most certain of the outcome for bucket 1, it has low Entropy. Bucket 2 has medium Entropy, and bucket 3, whose outcome we're least certain of, has high Entropy.

In [1]:
import numpy as np
import pandas as pd

### Calculating Entropy:
If we pick 4 balls from each bucket, one at a time, dropping each ball back before picking the next:
* What's the probability of picking 4 red balls from the 1st bucket? $1 \times 1 \times 1 \times 1 = 1$ (100%)
* What's the probability of picking 3 red balls and then 1 blue ball, in that order, from the 2nd bucket? $0.75 \times 0.75 \times 0.75 \times 0.25 \approx 0.105$ (about 10.5%)
* What's the probability of picking 2 red and then 2 blue balls, in that order, from the 3rd bucket? $0.5 \times 0.5 \times 0.5 \times 0.5 = 0.0625$ (6.25%)

The problem with working with probabilities, which are numbers between 0 and 1, is that if we have many elements in each set or bucket, multiplying them together leads to a very small number that is difficult to work with. Therefore we need a way to avoid the tiny values produced by long chains of multiplications.

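The solution is to take $\log_2(x)$, where $x$ is the computed probability. Because the log of a product equals the sum of the logs, the long multiplication turns into a simple addition.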
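To see why very small products are a practical problem, here is a minimal sketch. The 2000 draws at probability 0.5 are a made-up illustration, not data from the buckets above.

```python
import math

# Hypothetical illustration: 2000 independent draws, each with probability 0.5.
probs = [0.5] * 2000

# Multiplying the probabilities directly underflows to zero in floating point.
product = math.prod(probs)

# Summing the base-2 logs instead keeps the value easy to represent.
log_sum = sum(math.log2(p) for p in probs)

print(product)  # 0.0
print(log_sum)  # -2000.0
```

The product is too small for a double-precision float, while the sum of logs is an ordinary number.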

Thus from our 3 buckets above:<br>
* $\log_2(1) = 0$
* $\log_2(0.105) \approx -3.2515$
* $\log_2(0.0625) = -4$

Since $\log_2(x)$ is negative whenever $x < 1$, most of these numbers will be negative, so we take the negative of the log to get positive values. Our values are now:
* 0
* 3.2515
* 4

Finally, we take the average for each bucket by dividing each value by 4, since we picked 4 balls from each bucket:
* $\frac{0}{4}=0$

* $\frac{3.2515}{4}=0.81$

* $\frac{4}{4}=1$

By doing so, we have computed the Entropy for each bucket above.<br>To compute the Entropy, we take the **average of the negative of the log base 2 of the probabilities** of the individual picks from each bucket.
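The averaging steps above can be sketched in a few lines of Python. The helper name `average_negative_log` is an assumption made for illustration. Exact probabilities are used here rather than the rounded 0.105, so the 2nd bucket comes out at about 0.8113, which still rounds to 0.81.

```python
import math

def average_negative_log(draw_probs):
    """Average of -log2(p) over the probability of each individual pick."""
    return sum(-math.log2(p) for p in draw_probs) / len(draw_probs)

# Probability of each pick, for the three buckets described earlier.
bucket_1 = [1.0, 1.0, 1.0, 1.0]      # 4 red
bucket_2 = [0.75, 0.75, 0.75, 0.25]  # 3 red, 1 blue
bucket_3 = [0.5, 0.5, 0.5, 0.5]      # 2 red, 2 blue

for probs in (bucket_1, bucket_2, bucket_3):
    print(round(average_negative_log(probs), 4))
# prints 0.0, then 0.8113, then 1.0
```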

### Entropy Formula for Multiple Classes:

## $-\sum_{i=1}^{n} p_i \log_2(p_i)$

Where $p_i$ is the probability of the $i$th class (for example, a ball color), for $i=1$ through $n$, and $n$ is the number of classes.
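As a check, the formula can be applied to the 2nd bucket (3 red, 1 blue), where $p_{red} = 0.75$ and $p_{blue} = 0.25$. A worked sketch, using unrounded logs:

$$-\left(0.75\log_2(0.75) + 0.25\log_2(0.25)\right) \approx 0.3113 + 0.5 = 0.8113$$

This agrees with the averaged value of about 0.81 computed earlier. Three of the four picks contribute $-\log_2(0.75)$ and one contributes $-\log_2(0.25)$, so the average weights each term by its probability.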

In [2]:
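def calculate_entropy(*args):
    """Calculate the entropy (in bits) of a set, given
        the number of items in each class, and return it
    """
    total = sum(args)
    entropy = 0
    for val in args:
        if val == 0:
            # An empty class contributes nothing; skipping it avoids log2(0)
            continue
        p = val/total  # probability of this class
        entropy += -p*np.log2(p)
    
    return np.round(entropy, 4)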

**EX**<br>
There are 5 red balls and 3 blue balls in a bucket. If 8 balls are picked one after another with replacement, calculate the Entropy for picking 5 red balls and 3 blue balls.

In [3]:
r = 5
b = 3

calculate_entropy(r,b)

0.9544

In [4]:
# Find the entropy for a bucket that contains 4 red balls and 10 blue balls
r = 4
b = 10
calculate_entropy(r, b)

0.8631

In [5]:
# If we have a bucket with eight red balls, three blue balls, and two yellow balls,
# what is the entropy of the set of balls?

r = 8
b = 3
y = 2

calculate_entropy(r, b, y)

1.3347

## Information Gain:

The formula for information gain is simple: it is the change in entropy.<br>For every node in the decision tree, we calculate the entropy of the data in the parent node and the entropies of the two child nodes. The information gain is the difference between the entropy of the parent node and the weighted average of the entropies of the two child nodes, where each child is weighted by the fraction of the parent's data points it receives.
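A minimal sketch of this calculation follows. The function names `entropy_from_counts` and `information_gain`, and the example split, are assumptions made for illustration and are not part of any library.

```python
import math

def entropy_from_counts(counts):
    """Entropy in bits of a node, given the item count of each class."""
    total = sum(counts)
    # Empty classes are skipped so that log2(0) is never evaluated.
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted average of the child entropies."""
    n_parent = sum(parent_counts)
    weighted_children = sum(
        (sum(child) / n_parent) * entropy_from_counts(child)
        for child in children_counts
    )
    return entropy_from_counts(parent_counts) - weighted_children

# Hypothetical split: a parent with 2 red and 2 blue balls is divided
# into one all-red child and one all-blue child.
gain = information_gain([2, 2], [[2, 0], [0, 2]])
print(gain)  # 1.0
```

A split that leaves both children as mixed as the parent, such as `[[1, 1], [1, 1]]`, gives a gain of 0.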