## SU (Symmetrical Uncertainty)

Symmetrical Uncertainty (SU) is an information-theoretic measure of the dependence between two variables. It is built on two quantities: information entropy, which measures the uncertainty of a random variable, and conditional entropy, which measures the uncertainty that remains in one variable once another variable is known. SU takes the reduction in entropy obtained by knowing the other variable (the information gain) and normalizes it by the sum of the two entropies.

Specifically, for two random variables X and Y, with information entropies denoted as $H(X)$ and $H(Y)$ respectively, and conditional entropy denoted as $H(X|Y)$, the SU is calculated using the following formula:
$$
SU(X, Y) = \frac{2 \times (H(X) - H(X|Y))}{H(X) + H(Y)}
$$
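Here, the numerator $H(X) - H(X|Y)$ is the information gain (mutual information) between X and Y, and dividing by $H(X) + H(Y)$ makes the measure symmetric and normalized. SU values range from 0 to 1, where 0 indicates that the two variables are independent and 1 indicates that knowing either variable completely determines the other.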
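
To make the formula concrete, here is a minimal from-scratch sketch for discrete samples. The helper names (`entropy`, `conditional_entropy`, `symmetrical_uncertainty`) are illustrative and are not part of skfeature; returning 0 when both entropies are zero is a convention assumed here, since the ratio is undefined in that case.

```python
import numpy as np

def entropy(values):
    """Shannon entropy (in bits) of a discrete sample."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(x, y):
    """H(X|Y): entropy of x within each group of equal y, weighted by P(y)."""
    x = np.asarray(x)
    y = np.asarray(y)
    total = 0.0
    for y_value, count in zip(*np.unique(y, return_counts=True)):
        total += (count / len(y)) * entropy(x[y == y_value])
    return total

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * (H(X) - H(X|Y)) / (H(X) + H(Y))."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; assumed convention
    return 2.0 * (h_x - conditional_entropy(x, y)) / (h_x + h_y)

x = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_same = x.copy()                                # identical to x
y_indep = np.array([0, 1, 0, 1, 0, 1, 0, 1])     # carries no information about x
su_same = symmetrical_uncertainty(x, y_same)
su_indep = symmetrical_uncertainty(x, y_indep)
print(su_same, su_indep)  # prints: 1.0 0.0
```

The two toy cases hit the endpoints of the range: a variable paired with an exact copy of itself gives SU = 1, and a variable paired with one that is empirically independent of it gives SU = 0.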

In [36]:
# calculation of SU
import numpy as np
from skfeature.utility.mutual_information import su_calculation

# load data and normalize it
data = np.load("../data/features_train.npy")
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)
data = (data-mean)/std
label = np.load("../data/simu_20000_0.1_90_140_train.npy")[:,1004] #S

# transform the data to fit the SU calculation algorithm
data = data * 10000
data = data.astype(int)

# compute SU between each feature and the label
su = np.array([su_calculation(data[:, i], label) for i in range(data.shape[1])])
# feature indices sorted by descending SU; features 0, 1, 2 are expected to rank first
print(np.argsort(su)[::-1])

[45 18  3 14 29 47 11 48 32 36 20 51 37 19 17 49 35 50 26  4 22 12 46 24
 21  8  5  7 39 15 13  6 34 23 44 38 40 31 42 16 25 10 33  2 43  1 41  0
 52 27 30 28  9]


## FCBF (Fast Correlation-Based Filter)

FCBF is a feature selection algorithm designed to select relevant features efficiently in high-dimensional datasets. It identifies features that are highly correlated with the target variable while minimizing redundancy among the selected features. Its advantages are computational efficiency and explicit handling of redundancy.

### Steps:
1. Symmetrical Uncertainty Calculation: Compute the Symmetrical Uncertainty (SU) of each feature with respect to the target variable. SU is an information-theoretic metric that quantifies the relationship between two variables, normalized by the entropy of both.

2. Sort Features by SU: Rank the features by their SU values in descending order to determine their relevance to the target variable.

3. Initialize Result Set: Create an empty set to store the selected features.

4. Iterate Through Features: Take the feature with the highest SU from the sorted list and add it to the result set.

5. Remove Feature Redundancy: Remove from the sorted list every remaining feature that is highly redundant with a feature already in the result set. Redundancy is measured by the SU between the selected feature and the remaining feature.

6. Repeat Selection and Removal: Repeat steps 4 and 5 until the desired number of features is selected or no features remain in the list.
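
The steps above can be sketched in a few lines of NumPy. This is an illustrative sketch, not skfeature's implementation: the function names and the `delta` relevance threshold are assumptions, and the redundancy rule used here (drop a candidate when its SU with a selected feature is at least its SU with the target) follows the original FCBF formulation. The loop runs until no candidates remain.

```python
import numpy as np

def entropy(*columns):
    """Joint Shannon entropy (bits) of one or more discrete columns."""
    stacked = np.column_stack(columns)
    _, counts = np.unique(stacked, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def su(x, y):
    """Symmetrical uncertainty, using I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # assumed convention when both variables are constant
    return 2.0 * (h_x + h_y - entropy(x, y)) / (h_x + h_y)

def fcbf_sketch(X, y, delta=0.0):
    """Return indices of the selected columns of X, most relevant first."""
    # Steps 1-2: SU of each feature with the target, sorted in descending order
    relevance = np.array([su(X[:, j], y) for j in range(X.shape[1])])
    candidates = [int(j) for j in np.argsort(relevance)[::-1] if relevance[j] > delta]
    selected = []                      # Step 3: empty result set
    while candidates:                  # Step 6: repeat until the list is empty
        best = candidates.pop(0)       # Step 4: most relevant remaining feature
        selected.append(best)
        # Step 5: keep only candidates less correlated with `best` than with y
        candidates = [j for j in candidates
                      if su(X[:, best], X[:, j]) < relevance[j]]
    return selected

y  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
f0 = np.array([0, 0, 0, 0, 1, 1, 1, 0])   # strongly related to y
f1 = np.array([0, 0, 0, 1, 1, 1, 1, 0])   # near copy of f0 (redundant)
f2 = np.array([0, 1, 0, 0, 1, 1, 0, 1])   # weakly related to y, nearly unrelated to f0
X = np.column_stack([f0, f1, f2])
selected = fcbf_sketch(X, y)
print(selected)  # prints: [0, 2]
```

In this toy data, feature 1 is dropped because it shares more information with feature 0 than with the target, while feature 2 survives because it is nearly independent of feature 0.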

In [15]:
# FCBF
import numpy as np
from skfeature.function.information_theoretical_based import FCBF

# load data and normalize it
data = np.load("../data/features_train.npy")
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)
data = (data-mean)/std
label = np.load("../data/simu_20000_0.1_90_140_train.npy")[:,1004] #S

# transform the data to fit the FCBF algorithm
data = data * 10000
data = data.astype(int)

selected_features = FCBF.fcbf(data, label)[0]

print(selected_features)

[45]


## Mutual Information

Mutual information (MI) measures the dependency between two variables, that is, the reduction in the entropy of one variable after observing the other. Unlike Pearson correlation and the F-score, it also captures non-linear relationships.
 
  
For discrete distributions (both X and Y discrete):  
$$
I(X, Y) = H(Y) - H(Y|X) = \sum_{x\in \mathit{X}}  \sum_{y\in \mathit{Y}} \textit{p}_{(X,Y)}(x,y) \textrm{log}\left(\frac{\textit{p}_{(X,Y)}(x,y)}{\textit{p}_{X}(x)\textit{p}_{Y}(y)}\right)
$$
where $\textit{p}_{(X,Y)}(x,y)$ is the joint probability mass function (PMF) of X and Y, and $\textit{p}_{X}(x)$ and $\textit{p}_{Y}(y)$ are the marginal PMFs of X and Y.

For continuous distributions (both X and Y continuous):  
$$
I(X, Y) = H(Y) - H(Y|X) = \int_{\mathit{Y}} \int_{\mathit{X}}  \textit{p}_{(X,Y)}(x,y) \textrm{log}\left(\frac{\textit{p}_{(X,Y)}(x,y)}{\textit{p}_{X}(x)\textit{p}_{Y}(y)}\right) \, dx \, dy
$$
where $\textit{p}_{(X,Y)}(x,y)$ is the joint probability density function (PDF) of X and Y, and $\textit{p}_{X}(x)$ and $\textit{p}_{Y}(y)$ are the marginal PDFs of X and Y. In the continuous case, the data are usually binned first and then treated as discrete.
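
As a rough illustration of the binning approach, the sketch below estimates MI from a 2-D histogram and applies the discrete formula to the bin frequencies. The function name `binned_mutual_information`, the choice of 10 equal-width bins, and the synthetic data are assumptions made for this example; the estimate depends on the bin count and is biased upward for small samples.

```python
import numpy as np

def binned_mutual_information(x, y, bins=10):
    """Estimate I(X; Y) in nats from a 2-D histogram of two continuous samples."""
    joint_counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint_counts / joint_counts.sum()      # joint PMF over the bins
    p_x = p_xy.sum(axis=1, keepdims=True)         # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)         # marginal of y, shape (1, bins)
    nonzero = p_xy > 0                            # empty cells contribute 0
    ratio = p_xy[nonzero] / (p_x @ p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log(ratio)))

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
noise = rng.normal(size=5000)

# y = x^2 + small noise: a non-linear dependence with near-zero Pearson correlation
mi_dependent = binned_mutual_information(x, x**2 + 0.1 * noise)
# independent noise: the estimate should be close to zero
mi_independent = binned_mutual_information(x, noise)
print(mi_dependent, mi_independent)
```

The quadratic relationship is a case where a linear measure such as Pearson correlation reports almost no association, while the binned MI estimate is clearly larger than the one obtained for independent noise.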

In [37]:
# calculation of MI
from sklearn.feature_selection import mutual_info_regression
import numpy as np

# load data and normalize it
data = np.load("../data/features_train.npy")
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)
data = (data-mean)/std
label = np.load("../data/simu_20000_0.1_90_140_train.npy")[:,1004] #S

# transform the data to fit the MI calculation algorithm
data = data * 10000
data = data.astype(int)

# estimate MI between each feature and the label
mi = np.array([
    mutual_info_regression(data[:, i].reshape(-1, 1), label)[0]
    for i in range(data.shape[1])
])
# feature indices sorted by descending MI; features 0, 1, 2 are expected to rank first
print(np.argsort(mi)[::-1])

[10 15  1 41 13 48 34 18 29 32 31  3 25 45 17 50 49 47 36 16  7 28 19 44
 42 26 22  6  8 39  4 14  2 46 33  5 40 12 35 23 11 51 37 21 24 52 27 43
 38  9 20 30  0]


## MRMR (Max-Relevance Min-Redundancy)

The mRMR method looks for a subset of features that have a high association (MI) with the target variable while having a low association with the features already in the subset. It is a step-wise method: at each step, the feature $X_i$ ($X_i \notin S$) with the highest importance score $f^{mRMR}(X_i)$ is added to the subset $S$, until the subset reaches the desired number of features.

Formula:  
$$
f^{mRMR}(X_i) = I(Y, X_i) - \frac{1}{|S|}\sum_{X_s \in S} I(X_s, X_i)
$$
where $I(Y, X_i)$ is the MI between feature $X_i$ and the target variable, and $\frac{1}{|S|}\sum_{X_s \in S} I(X_s, X_i)$ is the average MI between feature $X_i$ and the features already in the subset.
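
The greedy procedure can be sketched as follows for discrete features. This is an illustrative sketch rather than skfeature's `MRMR.mrmr`: the names `mutual_information` and `mrmr_sketch` are hypothetical, and the first feature is chosen by relevance alone, an assumption made because the redundancy term is undefined while $S$ is empty.

```python
import numpy as np

def mutual_information(x, y):
    """I(X; Y) in bits for two discrete samples, via H(X) + H(Y) - H(X, Y)."""
    def entropy(*columns):
        _, counts = np.unique(np.column_stack(columns), axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))
    return float(entropy(x) + entropy(y) - entropy(x, y))

def mrmr_sketch(X, y, n_selected):
    """Greedy mRMR: return indices of the selected columns in selection order."""
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    selected = [int(np.argmax(relevance))]  # S is empty: use relevance only
    while len(selected) < min(n_selected, n_features):
        best_score, best_j = -np.inf, None
        for j in range(n_features):
            if j in selected:
                continue
            # average MI between candidate j and the features already in S
            redundancy = np.mean([mutual_information(X[:, s], X[:, j])
                                  for s in selected])
            score = relevance[j] - redundancy   # f^mRMR(X_j)
            if score > best_score:
                best_score, best_j = score, j
        selected.append(best_j)
    return selected

y  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
f0 = np.array([0, 0, 0, 0, 1, 1, 1, 0])   # most relevant feature
f1 = np.array([0, 0, 0, 1, 1, 1, 1, 0])   # relevant but redundant with f0
f2 = np.array([0, 1, 0, 0, 1, 1, 0, 1])   # equally relevant as f1, little overlap with f0
X = np.column_stack([f0, f1, f2])
top_two = mrmr_sketch(X, y, n_selected=2)
print(top_two)  # prints: [0, 2]
```

Features 1 and 2 have the same relevance to the target in this toy data, so the redundancy term decides: feature 2 is picked second because it shares far less information with feature 0.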

In [29]:
# mRMR
import numpy as np
from skfeature.function.information_theoretical_based import MRMR

# load data and normalize it
data = np.load("../data/features_train.npy")
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)
data = (data-mean)/std
label = np.load("../data/simu_20000_0.1_90_140_train.npy")[:,1004] #S

# transform the data to fit the mRMR algorithm
data = data * 1000
data = data.astype(int)

n_selected_features = 15
selected_features, _, _ = MRMR.mrmr(data, label, n_selected_features=n_selected_features)

print(selected_features)

[ 3  9 28  2 20 27 43 48 52  1 47 30 41 37 33]
