# Explainability: "mean decrease importance" and feature permutations
There are multiple ways to measure feature importance. Mean decrease importance is one of the simplest and, despite its limitations, it is fast to compute.

## Problem 1

Our first example will use a model that predicts whether a soccer/football team will have the "Man of the Match" winner based on the team's statistics. The "Man of the Match" award is given to the best player in the game.

Let's start by reading a dataset and training a black box classifier.

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier 
from sklearn import metrics

blackboxmethod = RandomForestClassifier

data = pd.read_csv('https://raw.githubusercontent.com/DataScienceUB/ExplainableDataScience/master/FIFA%202018%20Statistics.csv')

y = (data['Man of the Match'] == "Yes")  # Convert from string "Yes"/"No" to binary
feature_names = [i for i in data.columns if data[i].dtype in [np.int64]]
X = data[feature_names]

X

In [None]:
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
blackboxmodel = blackboxmethod(random_state=0).fit(train_X, train_y)

# Evaluate on the held-out validation set; the model is not re-fit on it
y_pred = blackboxmodel.predict(val_X)
print("Accuracy:", metrics.accuracy_score(val_y, y_pred))


**ELI5** is a Python package which helps to debug machine learning classifiers and explain their predictions. 

**ELI5** provides a way to compute feature importances for any black-box estimator by measuring how the score decreases when a feature is not available; the method is also known as “Mean Decrease Accuracy (MDA)”.

To do that, one can remove a feature from the dataset, re-train the estimator and check the score. But this requires re-training the estimator once per feature, which can be computationally intensive.
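The re-training approach can be sketched as follows. This is only an illustration under stated assumptions: the helper `drop_column_importance` and the synthetic arrays `X_demo` and `y_demo` are hypothetical names that are not part of the notebook or of ELI5, and the data is built so that only column 0 carries signal.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def drop_column_importance(model, X_train, y_train, X_val, y_val):
    """Validation-score drop when each column is removed and the model is re-trained."""
    baseline = clone(model).fit(X_train, y_train).score(X_val, y_val)
    importances = []
    for j in range(X_train.shape[1]):
        # Remove column j from both splits, then re-train from scratch
        X_train_j = np.delete(X_train, j, axis=1)
        X_val_j = np.delete(X_val, j, axis=1)
        score_j = clone(model).fit(X_train_j, y_train).score(X_val_j, y_val)
        importances.append(baseline - score_j)
    return np.array(importances)


# Synthetic data (an assumption for illustration): only column 0 carries signal
rng_demo = np.random.RandomState(0)
X_demo = rng_demo.normal(size=(400, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
Xtr, Xva, ytr, yva = train_test_split(X_demo, y_demo, random_state=0)

drop_imp = drop_column_importance(
    RandomForestClassifier(n_estimators=50, random_state=0), Xtr, ytr, Xva, yva
)
print(drop_imp)
```

Note that the loop fits one model per feature plus one baseline model, which is where the computational cost comes from.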

To avoid re-training the estimator, we can remove a feature only from the test part of the dataset and compute the score without it. This doesn’t work as-is, because estimators expect the feature to be present. So instead of removing a feature we can replace it with random noise: the feature column is still there, but it no longer contains useful information. This works if the noise is drawn from the same distribution as the original feature values (otherwise the estimator may fail). The simplest way to get such noise is to shuffle the values of a feature, i.e. use other examples’ feature values. This is how permutation importance is computed.
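The shuffling idea described above can be written by hand in a few lines. The following is a minimal sketch, not ELI5's actual implementation: `manual_permutation_importance` is a hypothetical helper, the data is synthetic (only column 0 carries signal), and the inputs are assumed to be NumPy arrays rather than DataFrames.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def manual_permutation_importance(model, X_val, y_val, n_repeats=5, seed=0):
    """Mean and std of the score drop when each column of X_val is shuffled."""
    rng = np.random.RandomState(seed)
    baseline = model.score(X_val, y_val)
    drops = np.zeros((n_repeats, X_val.shape[1]))
    for r in range(n_repeats):
        for j in range(X_val.shape[1]):
            X_shuffled = X_val.copy()
            # Shuffle one column only; the model itself is never re-trained
            X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
            drops[r, j] = baseline - model.score(X_shuffled, y_val)
    return drops.mean(axis=0), drops.std(axis=0)


# Synthetic data (an assumption for illustration): only column 0 carries signal
rng_demo = np.random.RandomState(0)
X_demo = rng_demo.normal(size=(400, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
Xtr, Xva, ytr, yva = train_test_split(X_demo, y_demo, random_state=0)

demo_model = RandomForestClassifier(n_estimators=50, random_state=0).fit(Xtr, ytr)
perm_mean, perm_std = manual_permutation_importance(demo_model, Xva, yva)
print(perm_mean, perm_std)
```

The model is fitted once on the training split, and every score is computed on the validation split, so the cost is one prediction pass per feature and per repeat rather than one training run per feature.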

The method is most suitable for computing feature importances when the number of columns (features) is not huge; otherwise it can be resource-intensive.

The next cell installs the package.

In [None]:
!pip install eli5

Now we can call the method.

In [None]:
import eli5
from eli5.sklearn import PermutationImportance

perm = PermutationImportance(blackboxmodel, random_state=1).fit(val_X, val_y)
eli5.show_weights(perm, feature_names = val_X.columns.tolist())

The values towards the top are the most important features, and those towards the bottom matter least.

The first number in each row shows how much model performance decreased with a random shuffling (in this case, using "accuracy" as the performance metric).

Like most things in data science, there is some randomness in the exact performance change from shuffling a column. We measure the amount of randomness in our permutation importance calculation by repeating the process with multiple shuffles. The number after the ± measures how performance varied from one reshuffling to the next.
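To see where a mean and a spread come from, it can help to look at the individual reshuffles. The sketch below uses scikit-learn's `sklearn.inspection.permutation_importance` in place of ELI5 (a swapped-in tool, chosen because it exposes the per-shuffle results directly), again on synthetic data that is assumed only for illustration. How ELI5 scales the value it prints after the ± is not shown here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): only column 0 carries signal
rng_demo = np.random.RandomState(0)
X_demo = rng_demo.normal(size=(400, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
Xtr, Xva, ytr, yva = train_test_split(X_demo, y_demo, random_state=0)
demo_model = RandomForestClassifier(n_estimators=50, random_state=0).fit(Xtr, ytr)

# Shuffle every column 10 times and record the score drop of each shuffle
result = permutation_importance(demo_model, Xva, yva, n_repeats=10, random_state=0)

# result.importances has one row per feature and one column per reshuffle
print(result.importances.shape)
for j in range(Xva.shape[1]):
    print(f"feature {j}: {result.importances_mean[j]:.3f} ± {result.importances_std[j]:.3f}")
```

Here `importances_mean` and `importances_std` are the row-wise mean and standard deviation of `importances`, so increasing `n_repeats` gives a more stable estimate at the cost of more prediction passes.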

You'll occasionally see negative values for permutation importances. In those cases, the predictions on the shuffled (or noisy) data happened to be more accurate than those on the real data. This happens when the feature didn't matter (it should have had an importance close to 0), but random chance made the predictions on the shuffled data more accurate.

**Question 1**: Check whether another model, such as an SVM, gets the same scores. The only change you need to make is to set `blackboxmethod` to an SVM classifier:

`from sklearn.svm import SVC`



In [None]:
# Your code here

**Question 2**: What is your interpretation of the differences?

*Your answer here*

## Problem 2

Now we are going to work with a sample of data from the [Taxi Fare Prediction](https://www.kaggle.com/c/new-york-city-taxi-fare-prediction) competition.

The task is to predict the fare amount (inclusive of tolls) for a taxi ride in New York City given the pickup and dropoff locations. We will use two methods: linear regression and Random Forest Regression.

In [None]:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

data = pd.read_csv('https://raw.githubusercontent.com/DataScienceUB/ExplainableDataScience/master/taxi.csv', nrows=50000)

# Remove data with extreme outlier coordinates or negative fares
data = data.query('pickup_latitude > 40.7 and pickup_latitude < 40.8 and ' +
                  'dropoff_latitude > 40.7 and dropoff_latitude < 40.8 and ' +
                  'pickup_longitude > -74 and pickup_longitude < -73.9 and ' +
                  'dropoff_longitude > -74 and dropoff_longitude < -73.9 and ' +
                  'fare_amount > 0'
                  )

y = data.fare_amount

base_features = ['pickup_longitude',
                 'pickup_latitude',
                 'dropoff_longitude',
                 'dropoff_latitude',
                 'passenger_count']

X = data[base_features]


train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
first_model = RandomForestRegressor(n_estimators=30, random_state=1).fit(train_X, train_y)

# Evaluate on the held-out validation set; the model is not re-fit on it
y_pred = first_model.predict(val_X)
print('Variance score: %.2f' % r2_score(val_y, y_pred))
print("Mean squared error: %.2f"
      % mean_squared_error(val_y, y_pred))

perm = PermutationImportance(first_model, random_state=1).fit(val_X, val_y)
eli5.show_weights(perm, feature_names = val_X.columns.tolist())

In [None]:
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=2)
first_model = LinearRegression().fit(train_X, train_y)

# Evaluate on the held-out validation set; the model is not re-fit on it
y_pred = first_model.predict(val_X)
print('Variance score: %.2f' % r2_score(val_y, y_pred))
print("Mean squared error: %.2f"
      % mean_squared_error(val_y, y_pred))

perm = PermutationImportance(first_model, random_state=1).fit(val_X, val_y)
eli5.show_weights(perm, feature_names = val_X.columns.tolist())

**Question:** Could you clearly explain the result of this experiment?

*Your answer here*