# Mutual Info 

Mutual information (MI) is a statistical measure commonly used in machine learning and information theory to quantify the degree of dependency or correlation between two random variables. Specifically, MI measures the amount of information that one variable provides about the other variable.

In the context of machine learning, MI can be used for feature selection, where it helps to identify which features are most informative for a given task. For example, in a classification problem, features with high MI values are likely to be more useful for distinguishing between different classes.

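MI is always non-negative. It is exactly 0 when the two variables are independent, and its maximum depends on the distributions of the variables (for discrete variables, $I(X;Y) \le \min(H(X), H(Y))$, where $H$ is the entropy). A higher MI value indicates a stronger dependency between the variables.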
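Here is a quick numerical check of these bounds, as a minimal sketch. `mi_from_joint` and `entropy` are hypothetical helpers written only for this illustration (they are not part of the notebook), and the probabilities are made up. Independent variables give an MI of 0, and a variable paired with an exact copy of itself reaches the upper bound $H(X)$.

```python
import numpy as np

def mi_from_joint(joint):
    """Mutual information (nats) of a 2-D joint probability table (hypothetical helper)."""
    joint = np.asarray(joint, dtype=float)
    p_row = joint.sum(axis=1, keepdims=True)  # marginal of the row variable, shape (k, 1)
    p_col = joint.sum(axis=0, keepdims=True)  # marginal of the column variable, shape (1, m)
    mask = joint > 0                          # 0 * log 0 is taken as 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_row @ p_col)[mask])))

def entropy(p):
    """Shannon entropy (nats) of a 1-D probability vector (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

p_x = np.array([0.5, 0.3, 0.2])  # made-up marginal of X
p_y = np.array([0.6, 0.4])       # made-up marginal of Y

independent = np.outer(p_x, p_y)  # P(x, y) = P(x) P(y)
identical = np.diag(p_x)          # Y is an exact copy of X

print(mi_from_joint(independent))              # ~0, up to floating-point rounding
print(mi_from_joint(identical), entropy(p_x))  # equal: MI reaches H(X)
```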

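MI is a useful tool in machine learning because it is model-agnostic: it measures dependency directly from the data, without assuming any particular model. Unlike correlation, it also captures nonlinear relationships. However, when it is computed between one feature and the target at a time, it cannot capture interactions between multiple variables (a feature can look useless on its own but be informative together with another one), and for continuous variables it has to be estimated, for example by binning, which introduces approximation error.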
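To make both points concrete, here is a small sketch with exact (not sampled) distributions; `mi_from_joint` is a hypothetical helper defined here for illustration. With $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$, the correlation is zero but MI is clearly positive. With two fair bits and $Y = X_1 \oplus X_2$ (XOR), $X_1$ on its own has zero MI with $Y$, while the pair $(X_1, X_2)$ carries $\ln 2$ nats.

```python
import numpy as np

def mi_from_joint(joint):
    """Mutual information (nats) of a 2-D joint probability table (hypothetical helper)."""
    joint = np.asarray(joint, dtype=float)
    expected = np.outer(joint.sum(axis=1), joint.sum(axis=0))  # P(x) P(y) for every cell
    mask = joint > 0                                           # 0 * log 0 is taken as 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / expected[mask])))

# Nonlinear: X uniform on {-1, 0, 1}, Y = X^2 (each point below is equally likely)
x = np.array([-1, 0, 1])
y = x ** 2
correlation = np.corrcoef(x, y)[0, 1]
# rows: x in (-1, 0, 1); columns: y in (0, 1)
joint_square = np.array([[0, 1/3],
                         [1/3, 0],
                         [0, 1/3]])
mi_square = mi_from_joint(joint_square)

# Interaction: X1, X2 fair bits, Y = X1 XOR X2 (all four (x1, x2) pairs equally likely)
# rows: x1 in (0, 1); columns: y in (0, 1)
joint_x1_y = np.array([[0.25, 0.25],
                       [0.25, 0.25]])
# rows: (x1, x2) in (00, 01, 10, 11); columns: y in (0, 1)
joint_pair_y = np.array([[0.25, 0],
                         [0, 0.25],
                         [0, 0.25],
                         [0.25, 0]])

print(correlation)                  # ~0: no linear relationship
print(mi_square)                    # ln 3 - (2/3) ln 2, about 0.64 nats
print(mi_from_joint(joint_x1_y))    # 0: X1 alone says nothing about Y
print(mi_from_joint(joint_pair_y))  # ln 2: (X1, X2) together determine Y
```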

$$\begin{aligned}
I(X;Y) &= \sum\limits_{y = 1}^{n}\sum\limits_{x = 1}^{n} P_{(X, Y)}(x , y)\log\left(\frac{P_{(X, Y)}(x , y)}{P_X(x)\, P_Y(y)}\right) \\
&= \sum\limits_{y = 1}^{n}\sum\limits_{x = 1}^{n} P_{X|Y=y}(x)\, P_Y(y)\log\frac{P_{X|Y=y}(x)\, P_Y(y)}{P_X(x)\, P_Y(y)} \\
&= \sum\limits_{y = 1}^{n} P_Y(y)\sum\limits_{x = 1}^{n} P_{X|Y=y}(x)\log\frac{P_{X|Y=y}(x)}{P_X(x)} \\
&= \sum\limits_{y = 1}^{n} P_Y(y)\, D_{KL}\left(P_{X|Y=y} \,\|\, P_X\right) \\
&= \mathbb{E}_Y\left[D_{KL}\left(P_{X|Y} \,\|\, P_X\right)\right] \\
&= \sum\limits_{x = 1}^{n}\sum\limits_{y = 1}^{n} P(x , y)\log\Bigg[\frac{P(x , y)}{P(x)P(y)}\Bigg]
\end{aligned}$$
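The chain of equalities can be checked numerically on any small joint table. The sketch below uses an arbitrary, made-up $2 \times 3$ table for $P_{(X,Y)}$ (rows index $x$, columns index $y$) and computes MI both from the first line (the double sum) and from the weighted KL divergence $\sum_y P_Y(y)\, D_{KL}(P_{X|Y=y} \| P_X)$.

```python
import numpy as np

# Illustrative (made-up) joint distribution P(X = x, Y = y); rows: x, columns: y
joint = np.array([[0.10, 0.20, 0.15],
                  [0.25, 0.05, 0.25]])
p_x = joint.sum(axis=1)  # marginal P_X
p_y = joint.sum(axis=0)  # marginal P_Y

# First line of the derivation: double sum over x and y
mi_double_sum = np.sum(joint * np.log(joint / np.outer(p_x, p_y)))

# Fourth line: KL(P_{X|Y=y} || P_X) for every y, weighted by P_Y(y)
mi_expected_kl = 0.0
for y in range(joint.shape[1]):
    p_x_given_y = joint[:, y] / p_y[y]  # conditional distribution of X given Y = y
    kl = np.sum(p_x_given_y * np.log(p_x_given_y / p_x))
    mi_expected_kl += p_y[y] * kl

print(mi_double_sum, mi_expected_kl)  # the two values agree up to rounding
```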

**Note** 
* This notebook is highly inspired by this [YouTube video](https://www.youtube.com/watch?v=eJIp_mgVLwE); kudos to the author for being a great teacher
* Pandas is only used to view the dataframe in a convenient way

In [None]:
import numpy as np 
import pandas as pd

Let's assume we have data like this:

In [None]:
data = np.array([["Yes" , 1.77 , "..." , "Yes"] , 
                ["Yes" , 1.32 , "..." , "Yes"] , 
                ["Yes" , 1.81 , "..." , "Yes"] , 
                ["No" , 1.56 , "..." , "No"] , 
                ["No" , 1.64 , "..." , "Yes"]] )

Let's put it in a DataFrame so it is easier to read.

In [None]:
data = pd.DataFrame(data , columns = ["Loves Popcorn" , "Height" , "ETC" , "Loves Troll 2"])

In [None]:
data

| | Loves Popcorn | Height | ETC | Loves Troll 2 |
|---|---|---|---|---|
| 0 | Yes | 1.77 | ... | Yes |
| 1 | Yes | 1.32 | ... | Yes |
| 2 | Yes | 1.81 | ... | Yes |
| 3 | No | 1.56 | ... | No |
| 4 | No | 1.64 | ... | Yes |


Let's first change these binary values to numbers, which are more convenient to work with. Mapping `Yes`/`No` to `1`/`0` is a binary (label) encoding; for a column with only two values, it is equivalent to `One Hot Encoding` with one of the two indicator columns dropped. If you want to know more about one-hot encoding, here is a link to a [notebook](https://github.com/AyushSinghal9020/Machine-Learning-From-Scratch/tree/main/Encoding%20Techniques/One%20Hot%20Encoder), where the technique is implemented from scratch and explained step by step.
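To see why the 0/1 mapping and one-hot encoding coincide for a two-valued column, here is a small NumPy sketch (the category order `["No", "Yes"]` is an assumption made for illustration). One-hot encoding produces one indicator column per category, and dropping the `No` column leaves exactly the 0/1 column that `np.where` produces in the next cell.

```python
import numpy as np

answers = np.array(["Yes", "Yes", "Yes", "No", "No"])  # the "Loves Popcorn" column

# Binary (label) encoding, as in the next cell: Yes -> 1, No -> 0
binary = np.where(answers == "Yes", 1, 0)

# One-hot encoding: one indicator column per category (order assumed: No, Yes)
categories = np.array(["No", "Yes"])
one_hot = (answers[:, None] == categories[None, :]).astype(int)

print(one_hot)
print(np.array_equal(one_hot[:, 1], binary))  # True: the "Yes" column is the binary encoding
```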

In [None]:
data["Loves Troll 2"] = np.where(data["Loves Troll 2"] == "Yes" , 1 , 0)
data["Loves Popcorn"] = np.where(data["Loves Popcorn"] == "No" , 0 , 1)

In [None]:
data

| | Loves Popcorn | Height | ETC | Loves Troll 2 |
|---|---|---|---|---|
| 0 | 1 | 1.77 | ... | 1 |
| 1 | 1 | 1.32 | ... | 1 |
| 2 | 1 | 1.81 | ... | 1 |
| 3 | 0 | 1.56 | ... | 0 |
| 4 | 0 | 1.64 | ... | 1 |


Now we need to build the joint and marginal probability matrix for each pair of columns we want to compare (here, `Loves Popcorn` against `Loves Troll 2`).

Wait! What is this `marginal and joint probability matrix`?

<img src = "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRRwfeIoJuGqrJNrIl0LZHb1x4BkMpNClcuUj4t8V7k9jG69etokemnNNDz13atx8YnL_o&usqp=CAU">

If you cannot see the figure clearly, here is the same table in text form:

| | Pass | Fail | Total | |
|---|---|---|---|---|
| Male | 46 | 56 | 102 | $P_{male} = 0.51$ |
| Female | 68 | 30 | 98 | $P_{female} = 0.49$ |
| Total | 114 | 86 | 200 | |
| | $P_{passed} = 0.57$ | $P_{failed} = 0.43$ | | |

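The counts in the middle of the table, divided by the grand total of 200, are the `joint probabilities` (for example, $P(male , pass) = 46/200 = 0.23$), and the probabilities in the margins are the `marginal probabilities`.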

But how can we calculate these?

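One observation saves us a lot of work: `marginal probabilities are just sums of joint probabilities`, or $$P(x) = \sum\limits_{y} P(x , y) \qquad P(y) = \sum\limits_{x} P(x , y)$$ In the table above, you can picture this as $46 + 56 = 102$ (all males) or $46 + 68 = 114$ (everyone who passed); dividing by the total of 200 turns these into the marginal probabilities $0.51$ and $0.57$.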
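Here is the same calculation for the pass/fail table as a short NumPy sketch: divide the counts by the grand total to get the joint probabilities, then sum across a row or down a column to get the marginals.

```python
import numpy as np

# Counts from the table above: rows = (Male, Female), columns = (Pass, Fail)
counts = np.array([[46, 56],
                   [68, 30]])

joint = counts / counts.sum()  # joint probabilities, e.g. P(Male, Pass) = 46 / 200
p_gender = joint.sum(axis=1)   # row sums: P(Male) = 0.51, P(Female) = 0.49
p_result = joint.sum(axis=0)   # column sums: P(Pass) = 0.57, P(Fail) = 0.43

print(joint)
print(p_gender)
print(p_result)
```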

Let's now get back to our original dataset. Our two binary columns form a table just like the one above, so we can apply the same idea directly.

In [None]:
data

| | Loves Popcorn | Height | ETC | Loves Troll 2 |
|---|---|---|---|---|
| 0 | 1 | 1.77 | ... | 1 |
| 1 | 1 | 1.32 | ... | 1 |
| 2 | 1 | 1.81 | ... | 1 |
| 3 | 0 | 1.56 | ... | 0 |
| 4 | 0 | 1.64 | ... | 1 |


We are going to build a $3 \times 3$ matrix: the top-left $2 \times 2$ block will hold the joint counts, and the last row and column will hold the marginals. Let's start with a matrix of zeros.

In [None]:
outcome_matrix = np.zeros(shape = (3,3))

In [None]:
outcome_matrix

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

Now we increment the right cell for every row of the data, so we iterate over the rows and check a few conditions.

In [None]:
for i in range(data.shape[0]):pass

The conditions are:

In [None]:
for i in range(data.shape[0]):
    if data["Loves Troll 2"][i] == 1:
        if data["Loves Popcorn"][i] == 1:pass
        else : pass
    else : 
        if data["Loves Popcorn"][i] == 1:pass
        else :pass

Now we fill in each branch. By "processing" we mean incrementing a cell of `outcome_matrix`, but `outcome_matrix` is a $3 \times 3$ matrix, so which cell should we increment? The answer lies in the conditions themselves: the value of `Loves Troll 2` picks the row (row $0$ for $1$, row $1$ for $0$) and the value of `Loves Popcorn` picks the column (column $0$ for $1$, column $1$ for $0$). The column rule must be the same in both rows, otherwise the column sums will be wrong.

In [None]:
for i in range(data.shape[0]):
    # row index:    0 if "Loves Troll 2" is 1, else 1
    # column index: 0 if "Loves Popcorn" is 1, else 1
    if data["Loves Troll 2"][i] == 1:
        if data["Loves Popcorn"][i] == 1: outcome_matrix[0 , 0] += 1
        else : outcome_matrix[0 , 1] += 1
    else : 
        if data["Loves Popcorn"][i] == 1: outcome_matrix[1 , 0] += 1
        else : outcome_matrix[1 , 1] += 1

In [None]:
outcome_matrix

array([[3., 1., 0.],
       [0., 1., 0.],
       [0., 0., 0.]])

To turn counts into probabilities, we just divide every element of the matrix by the total number of rows.

In [None]:
probability_matrix = outcome_matrix / data.shape[0]

In [None]:
probability_matrix

array([[0.6, 0.2, 0. ],
       [0. , 0.2, 0. ],
       [0. , 0. , 0. ]])

But these are only the joint probabilities; we have not found the marginal probabilities yet. As discussed before, a marginal probability is just a sum of joint probabilities, so $2$ nested loops are enough: the last column collects the row sums and the last row collects the column sums.

In [None]:
for i in range(2):
    for j in range(2):
        probability_matrix[i , 2] += probability_matrix[i , j]   # row sum -> P(Loves Troll 2)
        probability_matrix[2 , i] += probability_matrix[j , i]   # column sum -> P(Loves Popcorn)

In [None]:
probability_matrix

array([[0.6, 0.2, 0.8],
       [0. , 0.2, 0.2],
       [0.6, 0.4, 0. ]])

To calculate the mutual information between two columns we have the formula $$MI = \sum\limits_{x = 1}^{n}\sum\limits_{y = 1}^{n} P(x , y)\log\Bigg[\frac{P(x , y)}{P(x)P(y)}\Bigg]$$ and now we just need to code it.

We iterate over the four joint cells and add each cell's term to `mi`. Two details matter: the logarithm is `np.log` (the natural log, so MI comes out in nats), not `np.log1p`, which computes $\log(1 + z)$; and cells with zero joint probability are skipped, following the convention $0 \log 0 = 0$ (otherwise they would produce `nan`).

In [None]:
mi = 0 
for i in range(2):
    for j in range(2):
        if probability_matrix[i , j] > 0:
            mi += (probability_matrix[i , j] * (np.log(probability_matrix[i , j] / (probability_matrix[2 , j] * probability_matrix[i , 2]))))

In [None]:
round(float(mi), 4)

0.2231

Because we used the natural logarithm, this value is in nats; dividing by $\ln 2$ expresses it in bits.

In [None]:
round(float(mi / np.log(2)), 4)

0.3219
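For comparison, here is a vectorized NumPy sketch of the whole calculation for the same two columns (values copied from the table above). It builds the joint counts with boolean masks instead of an `if`/`else` ladder and uses broadcasting instead of the nested loops. For these five rows the result is exactly $\ln 1.25 \approx 0.2231$ nats.

```python
import numpy as np

# The two 0/1 columns from the table above
loves_popcorn = np.array([1, 1, 1, 0, 0])
loves_troll_2 = np.array([1, 1, 1, 0, 1])

# Joint counts: rows = Loves Troll 2 (1, 0), columns = Loves Popcorn (1, 0)
counts = np.array([[np.sum((loves_troll_2 == t) & (loves_popcorn == p)) for p in (1, 0)]
                   for t in (1, 0)])
joint = counts / counts.sum()

# Marginals kept 2-D (shapes (2, 1) and (1, 2)) so broadcasting forms P(t) * P(p) per cell
p_troll = joint.sum(axis=1, keepdims=True)
p_popcorn = joint.sum(axis=0, keepdims=True)

# 0 * log(0) is taken as 0, so only non-empty cells contribute
nonzero = joint > 0
mi = np.sum(joint[nonzero] * np.log(joint[nonzero] / (p_troll * p_popcorn)[nonzero]))

print(round(float(mi), 4))  # 0.2231, i.e. ln(1.25)
```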

Now we just need to make a proper function for this 

In [None]:
def binary_mi(dataframe , target , features):

    """Mutual information (in nats) between a binary `target` column and each binary column in `features`."""

    target_values = dataframe[target].unique()

    if len(target_values) > 2:

        raise ValueError("Please enter binary values")

    n_rows = dataframe.shape[0]
    mi_list = []

    for feature in features:

        feature_values = dataframe[feature].unique()

        if len(feature_values) > 2:

            raise ValueError("Please enter binary values")

        # rows: target value, columns: feature value; the last row/column will hold the marginals
        outcome_matrix = np.zeros(shape = (3 , 3))

        for target_value , feature_value in zip(dataframe[target] , dataframe[feature]):
            # compare each column with its own categories (not the target's)
            row = 0 if target_value == target_values[0] else 1
            column = 0 if feature_value == feature_values[0] else 1
            outcome_matrix[row , column] += 1

        # divide by the size of `dataframe`, not the global `data`
        probability_matrix = outcome_matrix / n_rows

        for first_index in range(2):
            for second_index in range(2):
                probability_matrix[first_index , 2] += probability_matrix[first_index , second_index]
                probability_matrix[2 , first_index] += probability_matrix[second_index , first_index]

        mi = 0

        for first_index in range(2):
            for second_index in range(2):
                joint = probability_matrix[first_index , second_index]
                if joint > 0:  # 0 * log(0) is taken as 0
                    mi += joint * np.log(joint / (probability_matrix[2 , second_index] * probability_matrix[first_index , 2]))

        # one value per feature, so append inside the loop
        mi_list.append(mi)

    return mi_list               
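The function above is restricted to binary columns. Here is a minimal sketch of a generalization, assuming discrete columns with any number of categories: `contingency_mi` is a hypothetical helper (not from the original notebook) that builds the full joint count table with `np.unique(..., return_inverse=True)` and `np.add.at`, then applies the same formula. If scikit-learn is installed, `sklearn.metrics.mutual_info_score` computes the same quantity (also in nats) and makes a handy cross-check.

```python
import numpy as np

def contingency_mi(x, y):
    """Mutual information (nats) between two discrete 1-D arrays of equal length."""
    x_values, x_codes = np.unique(x, return_inverse=True)  # category index of every row
    y_values, y_codes = np.unique(y, return_inverse=True)

    # Joint counts: one row per category of x, one column per category of y
    counts = np.zeros((len(x_values), len(y_values)))
    np.add.at(counts, (x_codes, y_codes), 1)

    joint = counts / counts.sum()
    expected = np.outer(joint.sum(axis=1), joint.sum(axis=0))  # P(x) P(y)
    nonzero = joint > 0                                         # 0 * log 0 is taken as 0
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / expected[nonzero])))

# The two binary columns from this notebook (Loves Troll 2, Loves Popcorn)
print(round(contingency_mi([1, 1, 1, 0, 1], [1, 1, 1, 0, 0]), 4))  # 0.2231

# Three categories: y is a relabelling of x, so MI equals the entropy of x (ln 3)
x = np.array(["a", "b", "c", "a", "b", "c"])
y = np.array([2, 0, 1, 2, 0, 1])
print(np.isclose(contingency_mi(x, y), np.log(3)))  # True
```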
            

Now we have made `mutual info` for classification (binary columns). Our next task is to make it work for regression. For continuous values we do nothing fancy: we split the values into bins, treat each bin as a category, and then calculate `mutual info` with the same technique. Let's make some changes to the function we wrote and reimplement it.

In [None]:
def mi(dataframe , target , features , bins = 5):
    """Mutual information (in nats) between `target` and each column in `features`.

    Columns with more than two distinct values are treated as continuous and split
    into `bins` equal-width bins (the default of 5 is an arbitrary choice);
    binary columns are used as they are.
    """

    def to_codes(column):
        values = np.asarray(column)
        if len(np.unique(values)) > 2:
            # continuous column: replace each value by the index (0 .. bins - 1) of its equal-width bin
            values = values.astype(float)
            edges = np.histogram_bin_edges(values , bins = bins)
            return np.digitize(values , edges[1:-1])
        # binary column: replace each value by the index of its category
        return np.unique(values , return_inverse = True)[1]

    target_codes = to_codes(dataframe[target])
    mi_list = []

    for feature in features:

        feature_codes = to_codes(dataframe[feature])

        # joint counts: rows = target class / bin, columns = feature class / bin
        outcome_matrix = np.zeros(shape = (target_codes.max() + 1 , feature_codes.max() + 1))

        for row , column in zip(target_codes , feature_codes):
            outcome_matrix[row , column] += 1

        probability_matrix = outcome_matrix / outcome_matrix.sum()

        # marginal probabilities are the row and column sums of the joint probabilities
        target_marginal = probability_matrix.sum(axis = 1)
        feature_marginal = probability_matrix.sum(axis = 0)

        mi = 0

        for first_index in range(probability_matrix.shape[0]):
            for second_index in range(probability_matrix.shape[1]):
                joint = probability_matrix[first_index , second_index]
                if joint > 0:  # 0 * log(0) is taken as 0; empty bins contribute nothing
                    mi += joint * np.log(joint / (target_marginal[first_index] * feature_marginal[second_index]))

        mi_list.append(mi)

    return mi_list               
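Binning is the simplest way to estimate MI for continuous variables, but the estimate depends on how many bins you pick, and it is biased upward for small samples. The sketch below is an illustration under stated assumptions, not the notebook's method: the hypothetical helper `binned_mi` uses `np.histogram2d` with equal-width bins on synthetic data (fixed random seed) and compares a noisy nonlinear relationship $y = x^2 + \varepsilon$ with pure noise for several bin counts. For a less bin-sensitive estimate, scikit-learn's `mutual_info_regression` uses a k-nearest-neighbours estimator instead of bins.

```python
import numpy as np

def binned_mi(x, y, bins=10):
    """Estimate MI (nats) between two continuous arrays via equal-width 2-D binning."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    joint = counts / counts.sum()
    expected = np.outer(joint.sum(axis=1), joint.sum(axis=0))  # P(x bin) P(y bin)
    nonzero = joint > 0                                         # empty bins contribute 0
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / expected[nonzero])))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=2000)
y_related = x ** 2 + rng.normal(scale=0.05, size=x.size)  # nonlinear dependence on x
y_noise = rng.normal(size=x.size)                         # independent of x

for bins in (5, 10, 20):
    print(bins, round(binned_mi(x, y_related, bins), 3), round(binned_mi(x, y_noise, bins), 3))
```

The noise column does not score exactly 0: with finite samples, binned estimates of independent variables stay slightly positive, and that bias grows with the number of bins.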
            