# Find the most important features in a model's predictions using influence functions

Pang Wei Koh and Percy Liang show in ["Understanding Black-box Predictions via Influence Functions"](https://arxiv.org/pdf/1703.04730.pdf) (ICML 2017) that influence functions can be used to approximate the effect of individual training points on a model's predictions.
An extension of this is the ability to approximate the effect of a given perturbation of a training point.

The original authors also demonstrate how this enables an adversarial attack on the training set. The attack consists of finding the most influential training input for a given test point prediction, then approximating the perturbation of that input that has the maximally negative effect on learning the test point.

The authors suggest that approximating the effect of perturbing a training point with influence functions can also serve as a way to analyse how individual features affect training.
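For reference, the two quantities used in this notebook can be written as follows, in the notation of the paper: $z = (x, y)$ is a training point, $z_{\text{test}}$ a test point, $L$ the loss, $\hat\theta$ the trained parameters, and $H_{\hat\theta}$ the Hessian of the empirical risk over the $n$ training points.

```latex
\begin{align}
H_{\hat\theta} &= \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat\theta) \\
\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) &= -\nabla_\theta L(z_{\text{test}}, \hat\theta)^\top H_{\hat\theta}^{-1} \nabla_\theta L(z, \hat\theta) \\
\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})^\top &= -\nabla_\theta L(z_{\text{test}}, \hat\theta)^\top H_{\hat\theta}^{-1} \nabla_x \nabla_\theta L(z, \hat\theta)
\end{align}
```

The second line measures how up-weighting $z$ changes the loss at $z_{\text{test}}$. The third is the gradient of that effect with respect to the input $x$, with one entry per feature, which is what the feature analysis below relies on.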


We use the original code from the authors' official repository for the model classes and various utilities. We use the Kaggle [Titanic dataset](https://www.kaggle.com/c/titanic/data) to experiment with the idea of using influence functions to approximate the effect of features on learning in a black-box model.

**Plan**

1. Preprocess data
2. Train model (logistic regression)
3. Test model
4. Engineer adversarial training data to degrade the performance of the model.
5. Analyse the noise added to each feature as a characterization of that feature's influence on what the model learns.

The loop behind steps 4 and 5:

+ For each correctly labelled test point:
    + Find its most influential training points
    + Approximate the perturbation effect on each of those points
+ For each feature:
    + For each training point: get the influence on the loss of perturbing that training point on that feature
+ Get stats:
    + Average, by feature, of the influence on the test point
    + (other stats)

### Preparation of the dataset

In [1]:
import pandas as pd
import numpy as np

fpath_titanic = "/home/eolus/Desktop/Dauphine/datamining/projets/blackBox/data/train.csv"
train_df = pd.read_csv(fpath_titanic)

train_df.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |


In [2]:
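import re

def extract_prefix(name):
    """Return the title (Mr., Mrs. or Miss.) found in a passenger name, or "" if none."""
    try:
        # Raw string so that the escaped dots are passed to the regex engine as written
        return re.search(r'(Mr\.)|(Mrs\.)|(Miss\.)', name).group()
    except (AttributeError, TypeError):
        # No title found, or the name is not a string
        return ""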


train_df['Prefix'] = train_df.Name.apply(extract_prefix)

for cat_col in ['Sex', 'Embarked', 'Prefix' ]:
    train_df[cat_col] = pd.factorize(train_df[cat_col])[0]
    
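# Replace missing ages by the mean age (plain assignment avoids chained in-place modification)
train_df['Age'] = train_df['Age'].fillna(train_df['Age'].mean())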

In [3]:
features = ['Prefix', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']

X = np.array(train_df[features])
y = np.array((train_df.Survived > 0).astype('int32'))

In [4]:
# Scale
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Need to train/test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
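The cell above fits the scaler on the full dataset before splitting, so the mean and variance of the test rows enter the training features. A minimal sketch of the alternative order, written with NumPy on synthetic data (the array `X_demo` and the other names below are hypothetical, not part of the notebook): split first, then standardize with training statistics only. With scikit-learn the equivalent is to call `fit` on `X_train` and `transform` on both splits.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-in for the feature matrix (hypothetical data, not the Titanic set)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 3))

# Shuffle and split before computing any statistics
perm = rng.permutation(len(X_demo))
n_test = int(0.33 * len(X_demo))
test_idx, train_idx = perm[:n_test], perm[n_test:]
X_tr, X_te = X_demo[train_idx], X_demo[test_idx]

# Standardize both splits with the mean and standard deviation of the training split
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)
X_tr_scaled = (X_tr - mu) / sigma
X_te_scaled = (X_te - mu) / sigma
```

The test split is then only approximately centered, which is expected: it is scaled as unseen data would be.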

In [5]:
# Create dataset objects
import sys
sys.path.append("..")

from influence.dataset import DataSet
lr_train = DataSet(X_train, np.array(y_train, dtype=int))
lr_test = DataSet(X_test, np.array(y_test, dtype=int))
lr_validation = None

import tensorflow as tf
from tensorflow.contrib.learn.python.learn.datasets import base
lr_data_sets = base.Datasets(train=lr_train, validation=lr_validation, test=lr_test)

### Train the reference model

In [6]:
from influence.binaryLogisticRegressionWithLBFGS import BinaryLogisticRegressionWithLBFGS

num_classes = 2
input_dim = len(features)

weight_decay = 0.01
batch_size = 100
initial_learning_rate = 0.001 
keep_probs = None
decay_epochs = [1000, 10000]
max_lbfgs_iter = 1000

tf.reset_default_graph()

tf_model = BinaryLogisticRegressionWithLBFGS(
    input_dim=input_dim,
    weight_decay=weight_decay,
    max_lbfgs_iter=max_lbfgs_iter,
    num_classes=num_classes, 
    batch_size=batch_size,
    data_sets=lr_data_sets,
    initial_learning_rate=initial_learning_rate,
    keep_probs=keep_probs,
    decay_epochs=decay_epochs,
    mini_batch=False,
    train_dir='tmp',
    log_dir='tmp',
    model_name='titanic')

tf_model.train()


Using TensorFlow backend.


Instructions for updating:
keep_dims is deprecated, use keepdims instead
Total number of parameters: 8
Using normal model
LBFGS training took [10] iter.
After training with LBFGS: 


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


In [7]:
# Retrieve test predictions and reference labels
preds_p = tf_model.get_preds().tolist()
preds = [1 if el[0] < 0.5 else 0 for el in preds_p]
ref = tf_model.data_sets.test.labels

# True/False - Positives/Negatives    
true_pos = [(i, p) for i, p in enumerate(preds_p) if p[0] < p[1] and ref[i] == 1]
true_neg = [(i, p) for i, p in enumerate(preds_p) if p[0] > p[1] and ref[i] == 0]
false_pos = [(i, p) for i, p in enumerate(preds_p) if p[0] < p[1] and ref[i] == 0]
false_neg = [(i, p) for i, p in enumerate(preds_p) if p[0] > p[1] and ref[i] == 1]

# Confusion matrix data
print("true_positives:", len(true_pos))
print("true_negatives:", len(true_neg))
print("false_positives", len(false_pos))
print("false_negatives", len(false_neg))

# Sort true_positives and true_negatives by how confident the model is
true_pos_top = sorted(true_pos, key=lambda x : x[1][0], reverse=False)
true_neg_top = sorted(true_neg, key=lambda x : x[1][0], reverse=True)

# Sample down (top 30)
true_pos_top = true_pos_top[:30]
true_neg_top = true_neg_top[:30]

true_positives: 100
true_negatives: 137
false_positives 38
false_negatives 20
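The four counts above are enough to derive the usual summary metrics. A short sketch, with the counts copied by hand from the output (the variable names are illustrative):

```python
# Counts copied from the confusion matrix output above
tp, tn, fp, fn = 100, 137, 38, 20

total = tp + tn + fp + fn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)  # share of predicted survivors who actually survived
recall = tp / (tp + fn)     # share of actual survivors the model found

print(f"n={total} accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# prints: n=295 accuracy=0.803 precision=0.725 recall=0.833
```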


## Find most influential train points

For each true positive and true negative that the model predicted with high confidence, we approximate which training points are most responsible for the prediction.

In [8]:
def get_top_train_influence(idx):
    """
    Approximate most influential train points for a test point
    idx : index of test point
    """
    num_train = len(tf_model.data_sets.train.labels)
    influences = tf_model.get_influence_on_test_loss(
        [idx],
        np.arange(num_train),
        force_refresh=True) * num_train
    influences_sorted = sorted(enumerate(influences),
                               key=lambda x:x[1],
                               reverse=True)
    influences_sorted = influences_sorted[:10]
    return influences_sorted
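For a model as small as a logistic regression, the quantity that `get_influence_on_test_loss` approximates can be written out directly. The sketch below is a from-scratch NumPy illustration of the up-weighting influence from the paper, not the library's implementation, and its sign and scaling conventions may differ from the library's. It assumes labels in {0, 1}, no intercept, and an L2 penalty kept separate from the per-point loss; all function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logreg(X, y, weight_decay, n_iter=50):
    """Newton's method for L2-regularized logistic regression (no intercept)."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / n + weight_decay * theta
        H = (X.T * (p * (1 - p))) @ X / n + weight_decay * np.eye(d)
        theta -= np.linalg.solve(H, grad)
    return theta

def influence_on_test_loss(X, y, theta, x_test, y_test, weight_decay):
    """I_up,loss(z_i, z_test) for every training point z_i."""
    n, d = X.shape
    p = sigmoid(X @ theta)
    # Hessian of the regularized empirical risk at theta
    H = (X.T * (p * (1 - p))) @ X / n + weight_decay * np.eye(d)
    # Per-point gradients of the data loss, shape (n, d)
    grad_train = (p - y)[:, None] * X
    # Gradient of the test loss, shape (d,)
    grad_test = (sigmoid(x_test @ theta) - y_test) * x_test
    return -grad_train @ np.linalg.solve(H, grad_test)

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(float)

theta_hat = fit_logreg(X_demo, y_demo, weight_decay=0.01)
# Use training point 0 as the "test" point, so its self-influence can be inspected
infl = influence_on_test_loss(X_demo, y_demo, theta_hat, X_demo[0], y_demo[0], 0.01)
top10 = np.argsort(-np.abs(infl))[:10]  # indices of the ten largest influences in magnitude
```

Because the Hessian is positive definite, the influence of a point on its own loss (`infl[0]` here) is never positive: up-weighting a point can only lower its own loss to first order.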

In [15]:
import warnings
warnings.filterwarnings('ignore')

# Get test points indices
true_pos_top_idx = [top_pos[0] for top_pos in true_pos_top]
true_neg_top_idx = [top_neg[0] for top_neg in true_neg_top]

# Approximate most influential train points for each test point
influence_train_true_pos = [get_top_train_influence(idx) for idx in true_pos_top_idx]
influence_train_true_neg = [get_top_train_influence(idx) for idx in true_neg_top_idx]

         Current function value: -0.026063
         Iterations: 4
         Function evaluations: 89
         Gradient evaluations: 82
         Hessian evaluations: 19
Optimization terminated successfully.
         Current function value: -0.030229
         Iterations: 6
         Function evaluations: 7
         Gradient evaluations: 12
         Hessian evaluations: 21
Optimization terminated successfully.
         Current function value: -0.064258
         Iterations: 6
         Function evaluations: 7
         Gradient evaluations: 12
         Hessian evaluations: 22
         Current function value: -0.030219
         Iterations: 5
         Function evaluations: 134
         Gradient evaluations: 126
         Hessian evaluations: 26
         Current function value: -0.070613
         Iterations: 5
         Function evaluations: 74
         Gradient evaluations: 69
         Hessian evaluations: 28
         Current function value: -0.024056
         Iterations: 4
         Function evalu

## Find most important features for each prediction using the influence perturbation function

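For each influential training point, we compute the gradient of the influence with respect to the input features. This gradient estimates how a small perturbation of each feature of that training point would change the loss on the test point.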
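To make the gradient with respect to the input concrete, here is a NumPy sketch of the perturbation influence from the paper for a logistic regression with labels in {0, 1} and no intercept. It is an illustration under those assumptions, not the code behind `get_grad_of_influence_wrt_input`; the parameters `theta_demo` are random rather than trained, and every name is hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad_theta(theta, x, y):
    """Gradient of the logistic loss at one point with respect to theta."""
    return (sigmoid(x @ theta) - y) * x

def mixed_grad(theta, x, y):
    """Matrix whose entry [i, j] is the derivative of grad_theta[i] with respect to x[j]."""
    p = sigmoid(x @ theta)
    return p * (1 - p) * np.outer(x, theta) + (p - y) * np.eye(len(x))

def pert_influence(theta, H, x_train, y_train, x_test, y_test):
    """I_pert,loss: one value per input feature of the training point."""
    s_test = np.linalg.solve(H, grad_theta(theta, x_test, y_test))
    return -s_test @ mixed_grad(theta, x_train, y_train)

# Synthetic setup for illustration only
rng = np.random.default_rng(1)
d = 4
theta_demo = rng.normal(size=d)  # hypothetical parameters, not a trained model
X_demo = rng.normal(size=(50, d))
p_demo = sigmoid(X_demo @ theta_demo)
H_demo = (X_demo.T * (p_demo * (1 - p_demo))) @ X_demo / len(X_demo) + 0.01 * np.eye(d)

grad_per_feature = pert_influence(theta_demo, H_demo, X_demo[0], 1.0, X_demo[1], 0.0)
most_sensitive = int(np.argmax(np.abs(grad_per_feature)))  # feature with the largest effect
```

The entry of `grad_per_feature` with the largest magnitude marks the feature of that training point whose perturbation would move the test loss the most, which is the sense in which the notebook reads these gradients as feature importance.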

In [10]:
def get_top_pert_influence(train_indices, test_idx):
    """
    Approximate the gradient of the influence with respect to the given
    training points, in order to find the most important features.
    """
    influences_grad = tf_model.get_grad_of_influence_wrt_input(
        train_indices, [test_idx], force_refresh=False)
    return influences_grad

In [11]:
true_pos_influences_grad = [] 
for test_idx, train_top in zip(true_pos_top_idx, influence_train_true_pos):
    train_indices = [r[0] for r in train_top]
    top_pert_influence = get_top_pert_influence(train_indices, test_idx)
    true_pos_influences_grad.append(top_pert_influence)

true_neg_influences_grad = [] 
for test_idx, train_top in zip(true_neg_top_idx, influence_train_true_neg):
    train_indices = [r[0] for r in train_top]
    top_pert_influence = get_top_pert_influence(train_indices, test_idx)
    true_neg_influences_grad.append(top_pert_influence)

### Analyze: mean var

In [18]:
len(true_pos_influences_grad[0][0])

8

### Benchmark feature importance

In [12]:
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFECV

estimator = LogisticRegression()
selector = RFECV(estimator, step=3, cv=3)
selector = selector.fit(X_train, y_train)
selector.score(X_test, y_test)

0.8203389830508474

In [13]:
selector.estimator_.coef_[0]

array([ 0.38991178, -0.71842588,  1.00059846, -0.29151786, -0.37872978,
       -0.0958028 ,  0.1934243 ,  0.16377468])

In [14]:
features_importance = zip(features, selector.estimator_.coef_[0])
sorted_logit_coefs = sorted(features_importance, key=lambda x : abs(x[1]), reverse=True)
print("\nLogistic Regression Coeffs")
print("============================")
for el in sorted_logit_coefs:
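    print(el)

# Assumption: summarize the influence gradients computed above by averaging,
# per feature, over all selected test points and their top training points
all_influences_grad = np.vstack(true_pos_influences_grad + true_neg_influences_grad)
avg_inf_pert = all_influences_grad.mean(axis=0)
sorted_logit_influence = sorted(zip(features, avg_inf_pert), key=lambda x : abs(x[1]), reverse=True)

print("\nInfluence function")
print("============================")
for el in sorted_logit_influence:
    print(el)


Logistic Regression Coeffs
('Sex', 1.0005984599032838)
('Pclass', -0.7184258754192752)
('Prefix', 0.3899117807908663)
('SibSp', -0.3787297807952296)
('Age', -0.2915178644724024)
('Fare', 0.19342430434726862)
('Embarked', 0.16377467883922647)
('Parch', -0.09580280432420257)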

### Average influence on all test points

In [None]:
def get_feature_influence_pert(test_idx, tf_model):
    
    num_train = len(tf_model.data_sets.train.labels)
    
    influences_grad = tf_model.get_grad_of_influence_wrt_input(
        np.arange(num_train),
        test_idx, 
        force_refresh=False)

    # Average over all training points, one value per feature
    # (assumption: no subset of top training points is selected here)
    avg_inf_pert = np.mean(influences_grad, axis=0)
    features_avg_inf = zip(features, avg_inf_pert)

    sorted_logit_influence = sorted(features_avg_inf, key=lambda x : abs(x[1]), reverse=True)
    
    print()
    for el in sorted_logit_influence:
        print(el)

In [None]:
test_idx = list(range(50))
get_feature_influence_pert(test_idx, tf_model)
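One caveat when ranking features by an averaged gradient: positive and negative contributions cancel, so a feature with large gradients of mixed sign can rank low. The sketch below contrasts the mean signed gradient with the mean absolute gradient on a random matrix. The matrix `grads_demo` is hypothetical and stands in for a (training points × features) array of influence gradients; it is not output from the model.

```python
import numpy as np

feature_names = ['Prefix', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']

# Hypothetical gradient matrix: one row per training point, one column per feature
rng = np.random.default_rng(0)
grads_demo = rng.normal(size=(500, len(feature_names)))
grads_demo[:, 2] *= 5.0  # column 2 ('Sex'): large magnitude but centered on zero

mean_signed = grads_demo.mean(axis=0)       # cancellation between signs is possible
mean_abs = np.abs(grads_demo).mean(axis=0)  # magnitude only, no cancellation

rank_signed = [feature_names[i] for i in np.argsort(-np.abs(mean_signed))]
rank_abs = [feature_names[i] for i in np.argsort(-mean_abs)]
```

Comparing `rank_signed` and `rank_abs` on the real gradients would show whether the ordering printed by `get_feature_influence_pert` is driven by consistent direction or merely by magnitude.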

### Measure influence on multiple test points of a similar label

In [None]:
# Find indices of test samples with Survived == 1 and with Survived == 0


sample_ix_pos_y = np.where(y_test == 1)[0].tolist()[:50]
sample_ix_neg_y = np.where(y_test == 0)[0].tolist()[:50]


print("Class : Survived")
print("===================")
get_feature_influence_pert(sample_ix_pos_y, tf_model)

print("\n\nClass : Died miserably")
print("===================")
get_feature_influence_pert(sample_ix_neg_y, tf_model)

In [None]:
### Do the prior analysis

#+ Fare value given Dead / Alive
#+ PClass value given dead / alive

In [None]:
import matplotlib.pyplot as plt

# Standardized fare of each test passenger, by survival label
x = X_test[:, features.index('Fare')]
y = y_test

plt.scatter(y, x)
plt.xlabel('Survived')
plt.ylabel('Fare (standardized)')
plt.show()

In [None]:
len(np.where(X_test[:, features.index('Fare')] <0 )[0])

In [None]:
import matplotlib.pyplot as plt

# Standardized passenger class of each test passenger, by survival label
x = X_test[:, features.index('Pclass')]
y = y_test

plt.scatter(y, x)
plt.xlabel('Survived')
plt.ylabel('Pclass (standardized)')
plt.show()
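The prior analysis listed in the comments above (feature value given survival) can also be summarized numerically alongside the scatter plots. A minimal sketch with a hypothetical helper `mean_by_class` and made-up values:

```python
import numpy as np

def mean_by_class(values, labels):
    """Mean of a feature column for each class label."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    return {int(c): float(values[labels == c].mean()) for c in np.unique(labels)}

# Hypothetical standardized fares and survival labels, for illustration only
fare_demo = np.array([-0.5, -0.4, 1.2, 0.8, -0.3, 2.0])
survived_demo = np.array([0, 0, 1, 1, 0, 1])

summary = mean_by_class(fare_demo, survived_demo)
```

In the notebook this would be called as `mean_by_class(X_test[:, features.index('Fare')], y_test)`, and likewise for `'Pclass'`.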