Reference: https://github.com/atulpatelDS/Youtube/blob/main/Feature_Engineering/Feature%20Selection%20using%20Mutual%20Information%20-%20Tutorial%206.ipynb

<span style="color:green"> **Relation between 'Information Gain' and 'Mutual Information'**</span>

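Mutual information and information gain are the same quantity; which name is used depends on the context. "Information gain" is the usual term when scoring decision-tree splits, while "mutual information" is the usual term when measuring the dependence between two variables, as in feature selection.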
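To make the equivalence concrete, the sketch below computes information gain the decision-tree way (parent entropy minus the weighted entropy of the children) and compares it with `sklearn.metrics.mutual_info_score`. The toy arrays and the helper names `entropy` and `information_gain` are illustrative assumptions, not part of the referenced tutorial.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def entropy(labels):
    """Shannon entropy (in nats) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def information_gain(feature, target):
    """Parent entropy minus the weighted entropy of each group split on `feature`."""
    feature = np.asarray(feature)
    target = np.asarray(target)
    gain = entropy(target)
    for value in np.unique(feature):
        mask = feature == value
        gain -= mask.mean() * entropy(target[mask])
    return gain

# Hypothetical toy data: a binary feature and a binary target.
feature = np.array([0, 0, 0, 0, 1, 1, 1, 1])
target = np.array([0, 0, 0, 1, 1, 1, 1, 0])

ig = information_gain(feature, target)
mi = mutual_info_score(feature, target)  # also measured in nats

print(f"information gain:   {ig:.6f}")
print(f"mutual information: {mi:.6f}")
```

Both numbers agree up to floating-point error, because the weighted child entropy is exactly the conditional entropy of the target given the feature.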

- Mutual information (MI) is a measure of the mutual dependence between two random variables X and Y.
- It quantifies the amount of information obtained about one variable by observing the other. In other
  words, it tells us how much we can know about one variable from knowing the other. It is a little like
  correlation, but more general, because it also captures non-linear dependence.
- In machine learning, MI measures how much information a feature (for example, its presence or absence)
  contributes to making the correct prediction on Y.
- MI between two random variables is a non-negative value. It is equal to zero if and only if the two
  variables are independent, and higher values mean stronger dependency.
- The mutual information between two random variables X and Y can be stated formally as follows:

  **I(X ; Y) = H(X) – H(X | Y)**
  
  - Where I(X ; Y) is the mutual information for X and Y, 
  - H(X) is the entropy for X and H(X | Y) is the conditional entropy for X given Y.
  
- Because MI measures the mutual dependence between the two variables, it is symmetric: I(X ; Y) = I(Y ; X).
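The definition above can be checked numerically. The following is a minimal sketch that assumes a made-up 2×2 joint probability table; it evaluates I(X ; Y) = H(X) – H(X | Y), confirms the symmetry, and shows that MI vanishes for an independent pair. The helper names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zero entries are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint):
    """I(X ; Y) = H(X) - H(X | Y), with rows indexing X and columns indexing Y."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    # Chain rule: H(X | Y) = H(X, Y) - H(Y)
    h_x_given_y = entropy(joint) - entropy(p_y)
    return entropy(p_x) - h_x_given_y

# Hypothetical joint distribution P(X, Y) of two dependent binary variables.
dependent = np.array([[0.4, 0.1],
                      [0.1, 0.4]])
# Independent pair: the joint is the outer product of the marginals.
independent = np.outer([0.5, 0.5], [0.3, 0.7])

mi_xy = mutual_information(dependent)
mi_yx = mutual_information(dependent.T)  # swap the roles of X and Y
mi_ind = mutual_information(independent)

print(f"I(X;Y) = {mi_xy:.4f} bits")
print(f"I(Y;X) = {mi_yx:.4f} bits")
print(f"independent pair: {abs(mi_ind):.4f} bits")
```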

In [1]:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA

def MAE(y_true,y_pred):
    return round(mean_absolute_error(y_true,y_pred),2)

In [2]:
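num = 400  # number of trailing rows held out as the test set
target = 'y2'
drop_cols = ['trend_data','season_data','noise_data','actual','y','y1','actual_y1','actual_y2']
df_fulldata = pd.read_csv("./src/generated_data.csv").drop(columns=drop_cols)
# Chronological split (.loc slices are label-based and inclusive; relies on the default RangeIndex)
df_train = df_fulldata.loc[0:df_fulldata.shape[0]-num-1]
df_test = df_fulldata.loc[df_fulldata.shape[0]-num:]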

X_train = df_train.drop(columns=target)
y_train = df_train[target]
X_test = df_test.drop(columns=target)
y_test = df_test[target]

X_train.shape,X_test.shape

((1059, 84), (400, 84))

In [3]:
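scaler = StandardScaler()
pca = PCA(n_components=0.95)  # keep enough components to explain 95% of the variance
mm = MinMaxScaler()
mm_y = MinMaxScaler()  # separate scaler so fitting on y does not overwrite the fit on X

X_mm_train = mm.fit_transform(X_train)
X_mm_test = mm.transform(X_test)

y_mm_train = mm_y.fit_transform(y_train.to_numpy().reshape(-1,1))
y_mm_test = mm_y.transform(y_test.to_numpy().reshape(-1,1))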

X_pca_train = pca.fit_transform(scaler.fit_transform(X_train))
X_pca_test = pca.transform(scaler.transform(X_test))

df_pca_train = pd.DataFrame(X_pca_train)
df_pca_test = pd.DataFrame(X_pca_test)

In [4]:
base_model = LinearRegression().fit(X_train, y_train)
# R^2 and MAE on the training set
base_model_r2 = base_model.score(X_train, y_train)
y_pred = base_model.predict(X_train)
print(f'R^2: {base_model_r2:.6f}')
print(f"MAE train : {MAE(y_train, y_pred)}")

# R^2 and MAE on the held-out test set
base_model_r2 = base_model.score(X_test, y_test)
y_pred = base_model.predict(X_test)
print(f'R^2: {base_model_r2:.6f}')
print(f"MAE test : {MAE(y_test, y_pred)}")

R^2: 0.877497
MAE train : 8.41
R^2: 0.479804
MAE test : 8.77


# Mutual Information (MI)

In [34]:
from sklearn.feature_selection import SelectKBest, mutual_info_regression

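# MI between each feature and the target. The estimator is stochastic;
# pass random_state=... to make the scores reproducible across runs.
mir = mutual_info_regression(X_train,y_train)
mrs_score = pd.Series(mir,index=X_train.columns)
mrs_score.sort_values(ascending=False)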

ma_365        1.027428
sum_365       1.027428
sum_180       1.007886
ma_180        1.007886
ma_6          0.969920
                ...   
diff_ma_8     0.000000
diff_8        0.000000
diff_ma_7     0.000000
diff_7        0.000000
diff_ma_12    0.000000
Length: 84, dtype: float64
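To illustrate what these scores capture, here is a small sketch on synthetic data (not the tutorial's dataset; all names ending in `_demo` are hypothetical). The target depends on one feature only through its square, so the Pearson correlation is close to zero while the MI score is clearly positive. Passing `random_state` to `mutual_info_regression` makes the estimate repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 2000

# Hypothetical synthetic features drawn uniformly from [-1, 1].
X_demo = pd.DataFrame({
    "quadratic": rng.uniform(-1, 1, n),
    "irrelevant": rng.uniform(-1, 1, n),
})
# The target is a non-linear function of one feature plus a little noise.
y_demo = X_demo["quadratic"] ** 2 + rng.normal(0, 0.05, n)

# Fixing random_state makes the MI estimate repeatable across runs.
mi_demo = pd.Series(
    mutual_info_regression(X_demo, y_demo, random_state=0),
    index=X_demo.columns,
)
corr_demo = X_demo.corrwith(y_demo).abs()

print(pd.DataFrame({"mutual_info": mi_demo, "abs_pearson": corr_demo}))
```

A correlation-based filter would be likely to discard the `quadratic` column here, whereas an MI-based filter would keep it.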

In [32]:
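# Keep the 40 features with the highest MI scores (returns NumPy arrays, not DataFrames)
selector = SelectKBest(score_func=mutual_info_regression, k=40)
df_mir_train = selector.fit_transform(X_train,y_train)
df_mir_test = selector.transform(X_test)
df_mir_train.shape,df_mir_test.shape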

((1059, 40), (400, 40))
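Because `fit_transform` returns a plain NumPy array, the names of the selected columns are lost. One possible way to recover them is `get_support()`, sketched below on hypothetical toy data (`X_toy`, `y_toy`) rather than the tutorial's dataset. Wrapping the score function in `functools.partial` to pin `random_state` is an assumption of this sketch, added so that repeated fits select the same columns.

```python
from functools import partial

import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_regression

rng = np.random.default_rng(42)

# Hypothetical stand-in for X_train / y_train: 6 features, 2 of them informative.
X_toy = pd.DataFrame(rng.normal(size=(500, 6)),
                     columns=[f"f{i}" for i in range(6)])
y_toy = 2 * X_toy["f0"] - 2 * X_toy["f3"] + rng.normal(0, 0.1, 500)

# partial() pins random_state so repeated fits produce the same scores.
score_func = partial(mutual_info_regression, random_state=0)
selector = SelectKBest(score_func=score_func, k=2).fit(X_toy, y_toy)

# get_support() returns a boolean mask over the input columns.
selected = X_toy.columns[selector.get_support()]
X_toy_selected = X_toy[selected]  # still a DataFrame, column names preserved

print(list(selected))
print(X_toy_selected.shape)
```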

In [31]:
mi_model = LinearRegression().fit(df_mir_train, y_train)
# R^2 and MAE on the training set
mi_model_r2 = mi_model.score(df_mir_train, y_train)
y_pred = mi_model.predict(df_mir_train)
print(f'R^2: {mi_model_r2:.6f}')
print(f"MAE train : {MAE(y_train, y_pred)}")

# R^2 and MAE on the held-out test set
mi_model_r2 = mi_model.score(df_mir_test, y_test)
y_pred = mi_model.predict(df_mir_test)
print(f'R^2: {mi_model_r2:.6f}')
print(f"MAE test : {MAE(y_test, y_pred)}")

R^2: 0.874163
MAE train : 8.6
R^2: 0.489106
MAE test : 8.65


In [33]:
# Repeat of the previous cell, run after the selector cell (In [32]);
# the scores shift slightly because the MI estimate is not seeded.
mi_model = LinearRegression().fit(df_mir_train, y_train)
mi_model_r2 = mi_model.score(df_mir_train, y_train)
y_pred = mi_model.predict(df_mir_train)
print(f'R^2: {mi_model_r2:.6f}')
print(f"MAE train : {MAE(y_train, y_pred)}")

mi_model_r2 = mi_model.score(df_mir_test, y_test)
y_pred = mi_model.predict(df_mir_test)
print(f'R^2: {mi_model_r2:.6f}')
print(f"MAE test : {MAE(y_test, y_pred)}")

R^2: 0.874237
MAE train : 8.59
R^2: 0.489558
MAE test : 8.67


# Feature list

In [36]:
feature_ls = mrs_score.sort_values(ascending=False).index[:40]
feature_ls

Index(['ma_365', 'sum_365', 'sum_180', 'ma_180', 'ma_6', 'sum_7', 'ma_8',
       'ma_10', 'ma_4', 'max_14', 'sum_60', 'ma_60', 'ma_12', 'max_7',
       'max_30', 'ma_2', 'ma_14', 'sum_14', 'max_60', 'ma_30', 'sum_30',
       'max_180', 'max_365', 'lag_0', 'lag_1', 'lag_2', 'lag_180', 'lag_365',
       'lag_4', 'lag_3', 'lag_9', 'lag_5', 'lag_11', 'lag_7', 'lag_6', 'lag_8',
       'lag_12', 'lag_13', 'lag_10', 'lag_14'],
      dtype='object')

In [37]:
mi_model = LinearRegression().fit(X_train[feature_ls], y_train)
# R^2 and MAE on the training set, using the top-40 MI features by name
mi_model_r2 = mi_model.score(X_train[feature_ls], y_train)
y_pred = mi_model.predict(X_train[feature_ls])
print(f'R^2: {mi_model_r2:.6f}')
print(f"MAE train : {MAE(y_train, y_pred)}")

# R^2 and MAE on the held-out test set
mi_model_r2 = mi_model.score(X_test[feature_ls], y_test)
y_pred = mi_model.predict(X_test[feature_ls])
print(f'R^2: {mi_model_r2:.6f}')
print(f"MAE test : {MAE(y_test, y_pred)}")

R^2: 0.874237
MAE train : 8.59
R^2: 0.489558
MAE test : 8.67
