# Permutation feature importance

### Definition

Permutation feature importance is a model inspection technique that can be used for any fitted estimator when the data is tabular. It is especially useful for non-linear or opaque estimators. The permutation feature importance is defined as the decrease in a model score when the values of a single feature are randomly shuffled. Shuffling breaks the relationship between the feature and the target, so the drop in the model score indicates how much the model depends on that feature. The technique is model agnostic, and it can be repeated with different permutations of the feature to average out the randomness of a single shuffle.
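
The definition can be illustrated with a minimal sketch. It assumes a synthetic data set from `make_classification` rather than the credit data, and the helper name `permutation_importance_manual` and its parameters `n_repeats` and `seed` are hypothetical names chosen for this example. The model is fitted once, and only the evaluation data is shuffled.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def permutation_importance_manual(model, X, y, n_repeats=10, seed=0):
    """Mean drop in model.score when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle only column j; the fitted model is left untouched.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - model.score(X_perm, y))
        importances[j] = np.mean(drops)
    return baseline, importances


# Synthetic data: an assumption for illustration, not the credit data set.
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

baseline, importances = permutation_importance_manual(model, X_te, y_te)
print(f"baseline accuracy: {baseline:.3f}")
for j in np.argsort(importances)[::-1]:
    print(f"feature {j}: {importances[j]:+.3f}")
```

An importance near zero, or slightly negative, suggests the model barely uses that feature on the evaluation data.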

### Realization

Write a function that takes the data and the number of important features and returns the false positives with the original features, the feature importance (score_orig - score_perm), and the false positives with the given number of important features.
1. Read the credit data set (train and test)
2. Predict the test set with a Decision Tree Classifier and return the original model score and the number of false positives
3. Iterate through all the features, shuffle each one with np.random.permutation, and calculate the feature importance as score_orig - score_perm
4. Create a dictionary with the FP without permutation, a list of (feature, importance) tuples sorted by importance in descending order, and the FP using the n most important features
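
Step 3 relies on shuffling one column while leaving the others alone. The sketch below shows that operation on a toy frame; the values are made up for illustration, and only the column names echo features from the credit data.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values (an assumption for illustration).
df = pd.DataFrame({
    "duration": [6, 12, 24, 36, 48],
    "credit_amount": [1000, 2000, 3000, 4000, 5000],
})

rng = np.random.default_rng(0)
shuffled = df.copy()
# Permute a single column; rows of the other column keep their order.
shuffled["duration"] = rng.permutation(df["duration"].to_numpy())

print(shuffled)
# The shuffled column holds the same values, only in a different row order.
print(sorted(shuffled["duration"]) == sorted(df["duration"]))
```

Because the set of values is unchanged, the feature keeps its marginal distribution; only its alignment with the target and the other features is broken.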


In [1]:
import pandas as pd
import numpy as np
import sklearn

In [2]:
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

In [3]:
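# Features are all columns except the last; the target is the last column
X_train = df_train.iloc[:, :-1]
X_test = df_test.iloc[:, :-1]
# Select the target as a 1-D Series rather than a one-column DataFrame
y_train = df_train.iloc[:, -1]
y_test = df_test.iloc[:, -1]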

In [4]:
def Classification(X_train,X_test,y_train,y_test):
    from sklearn.tree import DecisionTreeClassifier
    from sklearn import metrics
    # Seed NumPy's global RNG for reproducibility. np.random.seed() returns None,
    # so its result is not stored; the classifier draws from the global RNG
    # because random_state is left at its default.
    np.random.seed(0)
    # Create Decision Tree classifier object
    clf = DecisionTreeClassifier()
    # Train Decision Tree classifier
    clf = clf.fit(X_train,y_train)
    # Predict the response for the test dataset
    y_pred = clf.predict(X_test)
    # Measure accuracy
    accuracy = metrics.accuracy_score(y_test, y_pred)
    # Count false positives from the (binary) confusion matrix
    tn, fp, fn, tp = metrics.confusion_matrix(y_test, y_pred).ravel()
    return accuracy, fp
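
The unpacking `tn, fp, fn, tp = ...ravel()` depends on the layout of a binary confusion matrix: rows are true labels, columns are predicted labels, both in sorted label order. A small check with hand-made labels (an assumption for illustration, with 1 treated as the positive class) makes the order explicit:

```python
from sklearn.metrics import confusion_matrix

# Hand-made labels: three true negatives (0) and three true positives (1).
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0]

# For binary labels the matrix is [[tn, fp], [fn, tp]], so ravel() yields
# the four counts in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 2 1 1 2
```

If the target uses other label values, the larger label in sorted order plays the role of the positive class here, which is worth verifying before interpreting `fp`.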

In [5]:
def Feature_Importance(X_train,X_test,y_train,y_test, n):
    Output = {}
    # Calculating the original score and original FP
    score_orig, fp_orig=Classification(X_train,X_test,y_train,y_test)   
    
    # Data frame with feature importance
    df_import=pd.DataFrame(columns=['Feature Name', 'Importance'])
    for ind, column in enumerate(X_train.columns):
        # Shuffle one training column at a time, retrain, and re-score
        X_train_temp = X_train.copy()
        np.random.seed(0)
        X_train_temp.iloc[:,ind]=np.random.permutation(X_train.iloc[:,ind])
        score_x, fp_x = Classification(X_train_temp,X_test,y_train,y_test)
        permutation_feature_importance = score_orig - score_x
        df_import.loc[ind]=[column,permutation_feature_importance]
    # Most important features first
    df_import=df_import.sort_values(['Importance'],ascending=False)
    print(df_import.head(n))
    # (feature name, importance) records in descending order of importance
    tup=list(df_import.to_records(index=False))
    # Column positions of the n most important features
    my_list=df_import['Importance'].nlargest(n=n).index
    
    X_train_import=X_train.iloc[:,my_list]
    X_test_import=X_test.iloc[:,my_list] 
    score_import, fp_import=Classification(X_train_import,X_test_import,y_train,y_test)
    # Create the Output Dictionary
    Output['false positive without permutation'] = fp_orig
    Output['accuracy without permutation'] = score_orig
    Output['Feature Importance'] = tup
    Output['false positive using n important features'] = fp_import
    Output['accuracy using n important features'] = score_import
    return Output

In [6]:
Dict = Feature_Importance(X_train,X_test,y_train,y_test, 5)

                    Feature Name  Importance
40  other_installment_plans_A143        0.07
9    checking_account_status_A14        0.07
11            credit_history_A32        0.07
46                telephone_A192        0.06
3              present_residence        0.06
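
The function above shuffles a training column and refits the tree for every feature, whereas the definition describes shuffling data that is passed to an already fitted estimator. As a point of comparison, scikit-learn provides `sklearn.inspection.permutation_importance`, which does the latter with repeated shuffles. The sketch below assumes synthetic data rather than the credit data set, so its numbers are not comparable to the table above.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: an assumption for illustration, not the credit data set.
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit once; the model is not retrained while features are shuffled.
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each held-out feature n_repeats times and record the accuracy drop.
result = permutation_importance(clf, X_te, y_te, scoring="accuracy",
                                n_repeats=10, random_state=0)

for j in result.importances_mean.argsort()[::-1]:
    print(f"feature {j}: {result.importances_mean[j]:+.3f} "
          f"(std {result.importances_std[j]:.3f})")
```

Reporting the standard deviation across repeats helps judge whether small differences between features, such as 0.07 versus 0.06, are meaningful.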


In [7]:
print(Dict)

{'false positive without permutation': 14, 'accuracy without permutation': 0.69, 'Feature Importance': [('other_installment_plans_A143', 0.07), ('checking_account_status_A14', 0.07), ('credit_history_A32', 0.07), ('telephone_A192', 0.06), ('present_residence', 0.06), ('job_A172', 0.06), ('credit_history_A34', 0.05), ('credit_amount', 0.05), ('other_debtors_A102', 0.05), ('savings_A63', 0.04), ('property_A122', 0.04), ('purpose_A43', 0.04), ('present_employment_A75', 0.04), ('present_employment_A74', 0.04), ('credit_history_A33', 0.04), ('credit_history_A31', 0.04), ('purpose_A41', 0.03), ('purpose_A44', 0.03), ('property_A123', 0.03), ('purpose_A48', 0.03), ('personal_A94', 0.03), ('personal_A93', 0.03), ('present_employment_A72', 0.03), ('purpose_A42', 0.02), ('checking_account_status_A12', 0.02), ('existing_credits', 0.02), ('personal_A92', 0.02), ('job_A173', 0.01), ('job_A174', 0.01), ('housing_A152', 0.01), ('duration', 0.01), ('present_employment_A73', 0.01), ('checking_account_s