# Information Gain Ratio

In [None]:
import pandas as pd
import numpy as np

from typing import Dict, Any

from scipy.stats import entropy

from sklearn.metrics import mutual_info_score, adjusted_mutual_info_score, normalized_mutual_info_score

### Entropy

We define a helper function to compute the entropy $H$ of a random variable from a pandas Series. `scipy.stats.entropy` uses the natural logarithm by default, so the result is in nats:

In [None]:
def series_entropy(s: pd.Series) -> float:
    """Compute the entropy from samples in a Pandas series"""
    return entropy(s.value_counts(normalize=True))
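
As a quick sanity check, the sketch below applies the helper to two small hand-made series. The series `fair_coin` and `constant` are illustrative assumptions, not part of the original notebook, and the helper is repeated so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from scipy.stats import entropy


def series_entropy(s: pd.Series) -> float:
    """Compute the entropy from samples in a Pandas series"""
    return entropy(s.value_counts(normalize=True))


# Illustrative data: two equally likely outcomes, and a single repeated outcome
fair_coin = pd.Series(["H", "T", "H", "T"])
constant = pd.Series(["H", "H", "H", "H"])

h_fair = series_entropy(fair_coin)   # ln(2) nats for two equally likely values
h_const = series_entropy(constant)   # zero: there is no uncertainty

# Passing base=2 to scipy's entropy reports the same quantity in bits
h_fair_bits = entropy(fair_coin.value_counts(normalize=True), base=2)
```

The helper keeps the default natural logarithm, which matches the units used by `mutual_info_score`.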

### Information Gain Ratio

The *Information Gain* (also known as *Mutual Information*) of two random variables is defined as:

$$ IG(C; A) = H(C) - H(C|A). $$
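
The definition can be checked numerically on a small example. The sketch below computes $H(C|A)$ as the weighted average of the class entropy within each value of $A$, and compares the resulting IG with `mutual_info_score`. The helper `conditional_entropy` and the toy series `a` and `c` are hypothetical names introduced only for this illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import entropy
from sklearn.metrics import mutual_info_score


def series_entropy(s: pd.Series) -> float:
    return entropy(s.value_counts(normalize=True))


def conditional_entropy(c: pd.Series, a: pd.Series) -> float:
    """H(C|A): entropy of c within each value of a, weighted by p(a)."""
    weights = a.value_counts(normalize=True)
    return sum(w * series_entropy(c[a == value]) for value, w in weights.items())


# Toy data (assumed for illustration): a feature and a binary class
a = pd.Series(["x", "x", "y", "y", "y", "z"])
c = pd.Series([0, 1, 1, 1, 0, 0])

ig_by_definition = series_entropy(c) - conditional_entropy(c, a)
ig_sklearn = mutual_info_score(a, c)  # empirical mutual information, in nats
```

Both quantities are computed from the same empirical distribution, so they should agree up to floating-point error.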

When constructing a decision tree, IG can serve as the criterion for choosing the variable on which to split the data. In this setting, $C$ is the target class and $A$ is a proposed feature on which to split. The feature that maximises IG is chosen to form the next level of the tree.

One disadvantage of using information gain for this purpose is that features with a large number of distinct values tend to produce a large value of IG, and choosing such features typically leads to overfitting.
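
The extreme case is a feature that takes a different value on every row, such as a row identifier. The sketch below uses made-up data (the series `target`, `row_id` and `useful` are assumptions for illustration) to show that such a feature reaches the maximum possible IG, namely $H(C)$, even though it cannot generalise to unseen rows.

```python
import numpy as np
import pandas as pd
from scipy.stats import entropy
from sklearn.metrics import mutual_info_score

# Toy data (assumed): a balanced binary target
target = pd.Series([0, 1, 0, 1, 1, 0, 1, 0])

# A feature that is unique per row, and a two-valued feature that
# agrees with the target on all but one row
row_id = pd.Series(range(len(target)))
useful = pd.Series(["a", "b", "a", "b", "b", "a", "b", "b"])

h_target = entropy(target.value_counts(normalize=True))

# Knowing row_id determines the class exactly, so H(C|A) = 0 and IG = H(C)
ig_row_id = mutual_info_score(row_id, target)
ig_useful = mutual_info_score(useful, target)
```

Here IG ranks `row_id` above `useful`, which is the behaviour that motivates a normalised criterion.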

The *Information Gain Ratio* [1] is the ratio of the information gain to the entropy of the feature on which to split:

$$ IGR(C; A) = \frac{H(C) - H(C|A)}{H(A)} $$

Compared to IG, this penalises features with a large number of distinct values, since such features tend to have a large $H(A)$ in the denominator.

Below, we compute this from features contained in a dataframe `df_features` and a target class in the series `target`.

In [None]:
def igr(df_features: pd.DataFrame, target: pd.Series) -> Dict[Any, float]:
    """
    Calculate the information gain ratio for each feature in a dataframe

    Parameters
    ----------
    df_features : pd.DataFrame
        The features for which the information gain ratio will be calculated.
        Each column is treated as a discrete variable; a constant column has
        zero entropy, so its ratio is undefined.
    target : pd.Series
        The target class against which each feature is scored

    Returns
    -------
    A dictionary of feature names to information gain ratio, for each feature in df_features.
    """
    
    return {
        col: mutual_info_score(df_features[col], target) / series_entropy(df_features[col])
        for col in df_features.columns
    }
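
A possible usage example follows, on a toy dataframe that is assumed here purely for illustration (the columns `row_id` and `useful` are not from the original notebook). The function is repeated in compact form so that the snippet runs on its own.

```python
from typing import Any, Dict

import numpy as np
import pandas as pd
from scipy.stats import entropy
from sklearn.metrics import mutual_info_score


def series_entropy(s: pd.Series) -> float:
    return entropy(s.value_counts(normalize=True))


def igr(df_features: pd.DataFrame, target: pd.Series) -> Dict[Any, float]:
    return {
        col: mutual_info_score(df_features[col], target) / series_entropy(df_features[col])
        for col in df_features.columns
    }


# Toy data (assumed): a balanced binary target and two candidate features
target = pd.Series([0, 1, 0, 1, 1, 0, 1, 0])
df_features = pd.DataFrame({
    "row_id": range(len(target)),                          # unique per row
    "useful": ["a", "b", "a", "b", "b", "a", "b", "b"],    # two distinct values
})

scores = igr(df_features, target)

# row_id: IG = ln(2) and H(A) = ln(8), so the ratio is 1/3
# useful: smaller IG, but a much smaller H(A), so the ratio is larger
best_feature = max(scores, key=scores.get)
```

On this data the ratio prefers `useful`, although `row_id` has the larger raw information gain.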

### References

- [1] Quinlan, J. Ross. "Induction of decision trees." Machine learning 1.1 (1986): 81-106 [(link)](https://doi.org/10.1007/BF00116251)
- [2] https://en.wikipedia.org/wiki/Information_gain_ratio
- [3] https://en.wikipedia.org/wiki/Information_gain_in_decision_trees
- [4] https://stats.stackexchange.com/questions/319590/what-is-the-range-of-information-gain-ratio/360901#360901