## Gini Vs Entropy

    The Gini index and entropy are the two common criteria for calculating information gain. Decision tree algorithms use information gain to decide how to split a node.
    Both Gini and entropy are measures of the impurity of a node. A node containing multiple classes is impure, whereas a node containing only one class is pure.
    


    Decision trees recursively split the data on features according to the “purity” of the target variable in the resulting groups. The entire algorithm is designed to choose each split so that purity is maximized. What is purity? Purity can be thought of as how homogeneous the groupings are. For example,
    
        > If we have 4 red gumballs and 0 blue gumballs, that group of 4 is 100% pure, based on color as the target.
        
        > If we have 2 red and 2 blue, that group is 100% impure.
    
        > If we have 3 red and 1 blue, that group is either 75% or 81% impure, if we use Gini (scaled so that its maximum is 1) or entropy respectively.
        
    Why does this matter? Depending on which impurity measurement is used, tree classification results can vary. This can have a small (or sometimes large!) impact on your model!
    
![](ginientropy.jpg)        

### Gini
    
    Let’s start with the Gini index, as it’s a bit easier to understand. According to Wikipedia, the goal is to “measure how often a randomly chosen element from the set would be incorrectly labeled”[1] if it were labeled at random according to the distribution of labels in the set.
    
    To visualize this, let’s go back to the gumball examples. If we labeled each gumball at random, following the color proportions of its group, how often would a gumball be incorrectly labeled?
    
#### 4 red and 0 blue:
    
![](giniindex.png)        

    The impurity measurement is 0 because every gumball is red: a label drawn from the group’s own color distribution is always ‘red’, so we would never incorrectly label any of the 4 gumballs.
    
    The Gini score is the same no matter which class we start from, because the formula above sums over every class and the class probabilities always add to 1. Treating ‘blue’ (probability 0) as the first class instead of ‘red’ (probability 1) still gives an index of 0.
    
    A gini score of 0 is the most pure score possible.
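
For reference, the calculation can be written out explicitly. This assumes the formula pictured above is the standard Gini impurity, with $p_i$ denoting the fraction of items in class $i$ (notation introduced here, not taken from the images):

$$
G = 1 - \sum_{i} p_i^2
$$

For 4 red and 0 blue, $p_{\text{red}} = 1$ and $p_{\text{blue}} = 0$, so

$$
G = 1 - (1^2 + 0^2) = 1 - (0^2 + 1^2) = 0
$$

The order in which the classes are listed does not change the result.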
    
#### 2 red and 2 blue:

![](giniindex2.png)

    The impurity measurement is 0.5 because a randomly drawn label would be wrong about half the time. For a binary target variable (0, 1), a Gini index of 0.5 is the least pure score possible: half is one type and half is the other. Dividing Gini scores by 0.5 can make the score more intuitive. Here 0.5/0.5 = 1, meaning the grouping is as impure as possible (in a group with just 2 outcomes).
    
#### 3 red and 1 blue:

![](giniindex3.png)

    The impurity measurement here is 0.375. If we divide this by 0.5 for a more intuitive reading, we get 0.75, meaning the group is 75% as impure as a two-class group can be.
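
The three Gini calculations above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the standard definition of Gini impurity (1 minus the sum of squared class proportions); the function name `gini_impurity` and the `groups` dictionary are illustrative choices, not part of any library.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0  # convention chosen here: an empty group counts as pure
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# The gumball groups from the text
groups = {
    "4 red, 0 blue": ["red"] * 4,
    "2 red, 2 blue": ["red"] * 2 + ["blue"] * 2,
    "3 red, 1 blue": ["red"] * 3 + ["blue"],
}

for name, labels in groups.items():
    score = gini_impurity(labels)
    # Dividing by 0.5 rescales the two-class score to the range 0..1
    print(f"{name}: gini={score:.3f}, scaled={score / 0.5:.2f}")
# 4 red, 0 blue: gini=0.000, scaled=0.00
# 2 red, 2 blue: gini=0.500, scaled=1.00
# 3 red, 1 blue: gini=0.375, scaled=0.75
```

The division by 0.5 is only meaningful for two classes; with more classes the maximum Gini value is higher than 0.5.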

### Entropy

    Entropy is somewhat more computationally expensive because of the logarithm in the equation. Like Gini, the basic idea is to gauge the disorder of a grouping with respect to the target variable. Instead of using the probabilities alone, this method weights each probability by its base-2 logarithm. Logarithms have several advantageous properties here; the main one is additivity.
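
The additive property can be made concrete with a short derivation. This is a sketch assuming the standard Shannon entropy definition, where $p_i$ is the fraction of items in class $i$ (notation introduced here for illustration):

$$
H = -\sum_{i} p_i \log_2 p_i
$$

For two independent events with probabilities $p$ and $q$, the logarithm turns the product of probabilities into a sum:

$$
-\log_2(p\,q) = -\log_2 p - \log_2 q
$$

so the information of the combined event is the sum of the individual pieces of information. Terms with $p_i = 0$ are taken to contribute 0.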
    
    Let’s visualize how entropy works with the same gumball scenarios:
    
#### 4 red and 0 blue:

![](entropyindex.png)    

    Unsurprisingly, the impurity measurement is 0 for entropy as well. This is the max purity score using information entropy.
    
#### 2 red and 2 blue:

![](entropyindex2.png)    
    
    The impurity measurement is 1 here, as it’s the maximum impurity obtainable.
    
#### 3 red and 1 blue:

![](entropyindex3.png)    

    The impurity measurement is 0.811 here, a bit higher than the scaled Gini score of 0.75.
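
The entropy values for the same three groups can be checked in Python. This is a minimal sketch, assuming base-2 Shannon entropy; the function name `entropy` and the `groups` dictionary are illustrative, not taken from a library.

```python
import math
from collections import Counter


def entropy(labels):
    """Return the base-2 Shannon entropy of the class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0  # convention chosen here: an empty group counts as pure
    counts = Counter(labels)
    # Counter only holds classes that occur, so log2 never sees a zero.
    # p * log2(1 / p) is the same as -p * log2(p).
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# The gumball groups from the text
groups = {
    "4 red, 0 blue": ["red"] * 4,
    "2 red, 2 blue": ["red"] * 2 + ["blue"] * 2,
    "3 red, 1 blue": ["red"] * 3 + ["blue"],
}

for name, labels in groups.items():
    print(f"{name}: entropy={entropy(labels):.3f}")
# 4 red, 0 blue: entropy=0.000
# 2 red, 2 blue: entropy=1.000
# 3 red, 1 blue: entropy=0.811
```

With two classes the entropy already ranges from 0 to 1, so no rescaling is needed to compare it with the scaled Gini score.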

## Information Gain
    
    Information gain is why impurity is so important. Once we derive the impurity of the dataset, we can see how much information is gained as we go down the tree and measure the impurity of the nodes.
    
    IG = impurity before splitting (parent) - weighted average impurity after splitting (children)
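
The subtraction can be sketched in Python as follows. This is an illustration under stated assumptions: each child is weighted by its share of the parent's items (the usual convention), and the parent and child groups below are hypothetical gumball data, not the data shown in the images. The helper names `gini`, `entropy`, and `information_gain` are chosen here for the example.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def entropy(labels):
    """Base-2 Shannon entropy of the class proportions."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(labels).values())


def information_gain(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical split: 4 red + 4 blue divided into two mixed children
parent = ["red"] * 4 + ["blue"] * 4
left = ["red"] * 3 + ["blue"]
right = ["red"] + ["blue"] * 3

print(f"IG (gini):    {information_gain(parent, [left, right], gini):.3f}")
print(f"IG (entropy): {information_gain(parent, [left, right], entropy):.3f}")
# IG (gini):    0.125
# IG (entropy): 0.189
```

Passing the impurity function as an argument keeps the information gain calculation identical for both criteria; only the measure of impurity changes.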
    
![](IGgini.png)    

    The image above shows how information gain was calculated for a decision tree with the Gini criterion.
    
![](IGentropy.png)

    The image above shows how information gain was calculated for a decision tree with the entropy criterion.