## Counterfactual explanations with NICE
In this notebook we use NICE to generate **sparse** counterfactual explanations for instances of the Adult dataset.

In [None]:
from pmlb import fetch_data
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler,OneHotEncoder
from sklearn.pipeline import Pipeline
from nice.explainers import NICE
import pandas as pd

## Load and preprocess dataset
We load the data with `fetch_data` from the `pmlb` package. The columns `education-num`, `fnlwgt` and `native-country` are removed, and the data is split into the features `X` and the labels `y`.

In [None]:
adult = fetch_data('adult')
X = adult.drop(columns=['education-num','fnlwgt','target','native-country'])
y = adult.loc[:,'target']
feature_names = list(X.columns)

`NICE` currently only supports input in the form of a `numpy.array`. We convert the DataFrame to arrays and split the data
into a training set and a test set.

In [None]:
X = X.values  # NICE only supports arrays at the moment
y = y.values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Both `NICE` and our classification pipeline will need the column numbers of both the categorical and numerical features.

In [None]:
print(feature_names)
cat_feat = [1,2,3,4,5,6,7]
num_feat = [0,8,9,10]
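
The hard-coded indices above depend on the column order of `X`. As a minimal sketch, the same lists can be derived from the feature names instead. The column order in `example_names` and the choice of numerical columns in `example_num_names` are assumptions made for illustration; compare them with the printed `feature_names` before reusing this.

```python
# Assumed column order after dropping the four columns above;
# verify it against the printed `feature_names`.
example_names = ['age', 'workclass', 'education', 'marital-status',
                 'occupation', 'relationship', 'race', 'sex',
                 'capital-gain', 'capital-loss', 'hours-per-week']

# Hypothetical list of the columns we treat as numerical.
example_num_names = ['age', 'capital-gain', 'capital-loss', 'hours-per-week']

# Every column is either numerical or categorical, so one list defines both.
example_num_feat = [i for i, name in enumerate(example_names)
                    if name in example_num_names]
example_cat_feat = [i for i, name in enumerate(example_names)
                    if name not in example_num_names]

print(example_num_feat)  # [0, 8, 9, 10]
print(example_cat_feat)  # [1, 2, 3, 4, 5, 6, 7]
```

Deriving the indices from names keeps them correct if a column is added or dropped later.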

## Training a Classification Pipeline

`NICE` requires each column of its input to represent one feature. If one-hot encoding is used, it should be included in
the classification pipeline, as in the example below.

In [None]:
clf = Pipeline([
    ('PP',ColumnTransformer([
            ('num',StandardScaler(),num_feat),
            ('cat',OneHotEncoder(handle_unknown = 'ignore'),cat_feat)])),
    ('RF',RandomForestClassifier())])

clf.fit(X_train,y_train)

## Generating Explanations
The `NICE.fit()` method has a `predict_fn` argument, which expects a function that returns a score for each class.

In [None]:
predict_fn = lambda x: clf.predict_proba(x)
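
To make "a score for each class" concrete, the sketch below checks the output of a `predict_proba`-based function on a small synthetic dataset. The toy data and the names `toy_clf` and `toy_predict_fn` are illustrative assumptions and are not part of the notebook's pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data; splitting at the median guarantees both classes occur.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 3))
y_toy = (X_toy[:, 0] > np.median(X_toy[:, 0])).astype(int)

toy_clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_toy, y_toy)
toy_predict_fn = lambda x: toy_clf.predict_proba(x)

scores = toy_predict_fn(X_toy[:5])
print(scores.shape)        # one row per instance, one column per class
print(scores.sum(axis=1))  # class probabilities in each row sum to 1
```

The same check can be run on `predict_fn(X_test[0:1, :])` once the pipeline above has been fitted.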

When initializing `NICE`, we specify two arguments. The `optimization` argument controls which property of our
counterfactual explanation is optimized. In our example we use the `"sparsity"` optimization.

If `justified_cf` is set to `True`, NICE only searches for nearest neighbours among the instances of `X_train`
whose class is correctly predicted by our classifier.
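
The idea behind `justified_cf=True` can be illustrated with a boolean mask that keeps only the correctly classified training instances. This is a sketch of the concept on toy data, not NICE's internal implementation; `toy_clf`, `X_toy` and `y_toy` are hypothetical names.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Small one-feature dataset that a depth-1 tree cannot fit perfectly.
X_toy = np.arange(8, dtype=float).reshape(-1, 1)
y_toy = np.array([0, 0, 0, 0, 1, 0, 1, 1])

toy_clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_toy, y_toy)

# Keep only the instances whose predicted class equals the true label.
mask = toy_clf.predict(X_toy) == y_toy
X_justified = X_toy[mask]
y_justified = y_toy[mask]

print(f"{mask.sum()} of {len(X_toy)} training instances are candidate neighbours")
```

Restricting the search to these instances means that every candidate neighbour is a training example the classifier itself labels correctly.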

In [None]:
NICE_adult = NICE(optimization='sparsity',
                  justified_cf=True)

The `.fit()` method requires information about our dataset and classifier.

Our training samples (`X_train`) and labels (`y_train`) are required in the form of a `numpy.array`. The `cat_feat` and
`num_feat` arguments each require a `list` with the indices of the categorical and numerical features, respectively.

In [None]:
NICE_adult.fit(X_train=X_train,
               predict_fn=predict_fn,
               y_train=y_train,
               cat_feat=cat_feat,
               num_feat=num_feat)

Once the explainer is fitted, the `.explain()` method quickly generates an explanation for any observation.


In [None]:
to_explain = X_test[0:1,:]
CF = NICE_adult.explain(to_explain)
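
A sparse counterfactual is easiest to read as a list of changed features. The helper below is a minimal sketch that assumes the counterfactual is an array with the same shape as the explained instance. `changed_features` and the toy values are hypothetical and not part of the `nice` package.

```python
import numpy as np

def changed_features(original, counterfactual, names):
    """Return (name, old value, new value) for every feature that differs."""
    original = np.asarray(original).ravel()
    counterfactual = np.asarray(counterfactual).ravel()
    changed = np.flatnonzero(original != counterfactual)
    return [(names[i], original[i], counterfactual[i]) for i in changed]

# Toy instance and counterfactual that differ in two of three features.
toy_names = ['age', 'workclass', 'hours-per-week']
toy_instance = np.array([[39, 4, 40]])
toy_cf = np.array([[39, 2, 50]])

changes = changed_features(toy_instance, toy_cf, toy_names)
for name, old, new in changes:
    print(f"{name}: {old} -> {new}")
print(len(changes), "features changed")
```

Under the same shape assumption, `changed_features(to_explain, CF, feature_names)` would list the changes for the explanation generated above. The number of changed features is the quantity that the `"sparsity"` optimization aims to keep small.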
