# teex

### Generating data with available g.t. feature importance explanations

This section covers the available options for generating data with ground truth (g.t.) feature importance explanations.

1. *SenecaFI*

In [4]:
from teex.featureImportance.data import SenecaFI

We are going to explore `SenecaFI`, a method from [Evaluating local explanation methods on ground truth, Riccardo Guidotti, 2021].

This method was not conceived as a data generation procedure, but as a way to generate transparent classifiers (i.e. classifiers with available ground truth explanations). We use the generated classifier together with some artificially generated data to return a dataset with observations, labels and ground truth explanations. The dataset is for binary classification.

In [33]:
# instantiate the data generator
dataGen = SenecaFI(nSamples=100, nFeatures=4, randomState=0)

# retrieve the generated observations, labels and ground truth explanations
X, y, exps = dataGen[:]

In [34]:
print(f'Observation: {X[0]} \nLabel: {y[0]} \nExplanation: {exps[0]}')

Observation: [ 0.12573022  0.50268285 -0.6635352   1.20325895] 
Label: 1 
Explanation: [-0.1479 -0.1521 -0.1457 -0.1356]


The ground truth FI explanations are scaled to the range [-1, 1] by feature. That is, if a feature has a value of 1 in the explanation of a particular observation, that is the observation where the feature is most important in the dataset. Conversely, a value of -1 means that it is the observation where the feature contributes most negatively in the dataset.
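To make the per-feature scaling concrete, here is a minimal sketch that assumes plain min-max scaling over each column. The toy `raw_importances` matrix and the variable names are hypothetical, and the scaler that teex uses internally may differ.

```python
import numpy as np

# Hypothetical raw importances: 3 observations (rows) x 2 features (columns).
raw_importances = np.array([
    [0.2, -3.0],
    [0.5,  1.0],
    [0.8,  5.0],
])

# Assumed scaling: min-max per feature (column), mapped to [-1, 1].
col_min = raw_importances.min(axis=0)
col_max = raw_importances.max(axis=0)
scaled = 2.0 * (raw_importances - col_min) / (col_max - col_min) - 1.0

# For each feature, find the observation where it is most / least important.
most_important = scaled.argmax(axis=0)
most_negative = scaled.argmin(axis=0)
print(scaled)
print(most_important, most_negative)
```

Under this assumption, each column contains exactly one 1 and one -1 (barring ties), which matches the reading given above.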

One can specify the number of points to be generated (`nSamples`), the number of features (`nFeatures`), the names of the features (`featureNames`) and the random state (`randomState`).

In [52]:
dataGen.featureNames  # automatically generated

['a', 'b', 'c', 'd']

The explanations are generated by first creating a random collection of points, then creating a random linear expression, and finally evaluating its derivative at the points closest to the original observations. The underlying model can be accessed:

In [35]:
model = dataGen.transparentModel
model

<teex.featureImportance.data.TransparentLinearClassifier at 0x12cb89eb0>

This structure follows the sklearn API (`.fit`, `.predict`, `.predict_proba`) and can be used, for example, to test explainer methods. An important method is `.explain`, which, given an observation, explains the model's prediction for it. All observations passed to the object must have shape (nObservations, nFeatures).
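One way to use the ground truth when testing an explainer is to score how closely its output matches the reference explanation. The sketch below uses cosine similarity for this. The `cosine_similarity` helper and the `candidate` vector (standing in for some explainer's output) are hypothetical; only the ground truth values are taken from the `exps[0]` output shown earlier.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors (illustrative helper, not teex API)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return float(np.dot(a, b) / denom)

# Ground truth explanation for the first observation (exps[0] above).
ground_truth = np.array([-0.1479, -0.1521, -0.1457, -0.1356])

# Made-up explanation, standing in for the output of an explainer under test.
candidate = np.array([-0.10, -0.20, -0.15, -0.05])

score = cosine_similarity(ground_truth, candidate)
print(f"cosine similarity: {score:.3f}")
```

A score close to 1 means the candidate ranks and signs the features much like the ground truth; other metrics could be swapped in the same way.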

Compute predictions:

In [41]:
print(f'Single observation: {model.predict(X[0].reshape(1, -1))} \nMultiple observations: {model.predict(X[:10])}')

Single observation: [1] 
Multiple observations: [1 1 0 1 1 1 1 0 1 1]


Compute class probabilities:

In [50]:
print(f'Single observation: \n{model.predict_proba(X[0].reshape(1, -1))} \n\nMultiple observations: \n{model.predict_proba(X[:10])}')

Single observation: 
[[0. 1.]] 

Multiple observations: 
[[0.         1.        ]
 [0.09840876 0.90159124]
 [1.         0.        ]
 [0.         1.        ]
 [0.03176552 0.96823448]
 [0.21362782 0.78637218]
 [0.21274531 0.78725469]
 [0.75386864 0.24613136]
 [0.         1.        ]
 [0.         1.        ]]


Compute explanations:

In [47]:
print(f'Single observation: \n{model.explain(X[0].reshape(1, -1))} \n\nMultiple observations: \n{model.explain(X[:10])}')

Single observation: 
[[-0.4912 -1.     -0.2243  1.    ]] 

Multiple observations: 
[[-0.1479 -0.1521 -0.1457 -0.1356]
 [-0.1479 -0.1521 -0.1457 -0.1356]
 [-0.1528 -0.1346 -0.1476 -0.1413]
 [-0.1398  1.     -0.1493 -0.2083]
 [-0.1601 -0.0701 -0.1483 -0.1761]
 [-0.1479 -0.1521 -0.1457 -0.1356]
 [-0.1456 -0.1456 -0.1469 -0.1498]
 [-0.1466 -0.1417 -0.1483 -0.1465]
 [-0.1316 -0.1265 -0.1476 -0.1621]
 [-0.1456 -1.     -0.1405 -0.1003]]


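Note that the scaler is fit on the observations being explained, so the values returned for an observation depend on the batch it is explained with: `X[0]` explained on its own gives a different result than the first row of the explanations for `X[:10]` above.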
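This batch dependence can be reproduced with a small sketch, assuming (for illustration only) that the scaler is a per-feature min-max scaler fit on whatever batch is passed in. The `scale_batch` helper and the toy values are hypothetical and do not reflect teex internals.

```python
import numpy as np

def scale_batch(raw):
    """Min-max scale each column of `raw` to [-1, 1], using only the rows in `raw`."""
    raw = np.asarray(raw, dtype=float)
    lo, hi = raw.min(axis=0), raw.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns: avoid dividing by zero
    return 2.0 * (raw - lo) / span - 1.0

row = [1.0, 4.0]  # the same raw values, placed in two different batches
batch_a = np.array([row, [0.0, 0.0], [2.0, 8.0]])
batch_b = np.array([row, [1.0, 2.0], [5.0, 4.0]])

scaled_a = scale_batch(batch_a)[0]
scaled_b = scale_batch(batch_b)[0]
print(scaled_a, scaled_b)
```

The same row ends up with different scaled values because the column minima and maxima change with the batch, so explanations are only comparable when computed over the same set of observations.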

<p float="left">
  <img src="https://taiao.ai/img/6825_TAIAO_logo_1000x320.jpg" alt="drawing" style="width:150px;"/>
  <img src="https://www.bourses-etudiants.ma/wp-content/uploads/2018/06/University-of-Waikato-logo.png" alt="drawing" style="width:150px;"/>
  <img src="https://www.upc.edu/comunicacio/ca/identitat/descarrega-arxius-grafics/fitxers-marca-principal/upc-positiu-p3005.png" alt="drawing" style="width:200px;"/>
</p>