###  Created by Luis Alejandro (alejand@umich.edu)

## MLP Feature Sensitivity to Posterior Probability (FSPP)
Computes a wrapper feature ranking especifically designed for MLP neural networks using the algorithm proposed by:

https://ieeexplore.ieee.org/abstract/document/5282531

and briefly compares to Mutual Information (MI) raking criterion

In [1]:
import numpy as np

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_validate
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline

import sys
sys.path.append('../')
from utils.feature_selection.fspp import get_fspp
from utils.feature_selection.mutual import MutualInfo
from utils.feature_selection.mutualI import MutualInfoI
from utils.feature_selection.reports import report_feature_ranking

In [2]:
# Load dataset
dataset = datasets.load_breast_cancer()
print(dataset.feature_names, end="\n")
print(dataset.target_names)
predictors = dataset.data
responses = dataset.target

['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
['malignant' 'benign']


In [3]:
# Splits into training/test sets
X,X_holdout,y,y_holdout = train_test_split(predictors,responses,
                                           test_size = 0.3,
                                           random_state = 0,
                                           stratify=responses)

In [4]:
# Defines model
sc = StandardScaler()
clf = MLPClassifier(hidden_layer_sizes=(30))
estimators = [('normalizer', sc), ('classifier', clf)]
pipe = Pipeline(estimators)
results = cross_validate(pipe,X,y,cv = 5,scoring = ['accuracy'], n_jobs=-1,
                         return_estimator=True, return_train_score=True)
print('\nTime training (Avg): ', results['fit_time'].mean())
print('\nTraining Metrics: ')
print('Accuracy (Avg): ', '%.2f' % results['train_accuracy'].mean())
print('\nValidation Metrics: ')
print('Accuracy (Avg): ', '%.2f' % results['test_accuracy'].mean())

best_pipe = results['estimator'][results['test_accuracy'].argmin()]
y_pred = best_pipe.predict(X_holdout)
print('\nTest Metrics: ')
print('Accuracy: ', '%.2f' % accuracy_score(y_pred,y_holdout))


Time training (Avg):  0.21363577842712403

Training Metrics: 
Accuracy (Avg):  0.99

Validation Metrics: 
Accuracy (Avg):  0.98

Test Metrics: 
Accuracy:  0.95


In [5]:
rank = get_fspp(best_pipe,X)
report_feature_ranking(rank,dataset.feature_names,10)

Feature ranked 1 is (worst texture) with value 0.066886
Feature ranked 2 is (mean perimeter) with value 0.053913
Feature ranked 3 is (mean radius) with value 0.047033
Feature ranked 4 is (worst symmetry) with value 0.044616
Feature ranked 5 is (worst smoothness) with value 0.042552
.
.
.

Feature ranked 26 is (compactness error) with value 0.011899
Feature ranked 27 is (perimeter error) with value 0.010737
Feature ranked 28 is (mean area) with value 0.009811
Feature ranked 29 is (mean compactness) with value 0.007259
Feature ranked 30 is (worst fractal dimension) with value 0.007106


In [6]:
mi = MutualInfo(X,y,n_jobs=-1)
rank = mi.compute()
report_feature_ranking(rank,dataset.feature_names,10)

Using parallel version
Feature ranked 1 is (worst perimeter) with value 0.708011
Feature ranked 2 is (worst radius) with value 0.685644
Feature ranked 3 is (worst concave points) with value 0.667807
Feature ranked 4 is (mean concave points) with value 0.658822
Feature ranked 5 is (worst area) with value 0.638709
.
.
.

Feature ranked 26 is (mean fractal dimension) with value 0.077595
Feature ranked 27 is (fractal dimension error) with value 0.072675
Feature ranked 28 is (symmetry error) with value 0.048220
Feature ranked 29 is (texture error) with value 0.035799
Feature ranked 30 is (smoothness error) with value 0.032073


In [7]:
mi = MutualInfoI(X,y,n_jobs=-1)
rank = mi.compute()
report_feature_ranking(rank,dataset.feature_names,10)

Using parallel version
Feature ranked 1 is (worst perimeter) with value 0.203446
Feature ranked 2 is (worst radius) with value 0.181079
Feature ranked 3 is (worst concave points) with value 0.163241
Feature ranked 4 is (mean concave points) with value 0.154257
Feature ranked 5 is (worst area) with value 0.134144
.
.
.

Feature ranked 26 is (mean fractal dimension) with value -0.426971
Feature ranked 27 is (fractal dimension error) with value -0.431890
Feature ranked 28 is (symmetry error) with value -0.456345
Feature ranked 29 is (texture error) with value -0.468767
Feature ranked 30 is (smoothness error) with value -0.472493
