<a id="top"></a>
# AntakIA tutorial
***
### Using AntakIA - the example of the German Credit dataset

AntakIA helps you understand and explain your _black-box_ machine-learning models by identifying the most relevant way of splitting your dataset into regions and the best surrogate models to apply to these newly created regions. In this notebook, we will show you how to use the automatic dyadic-clustering algorithm of AntakIA.

> For more complete tutorials, please refer to the [AntakIA __with GUI__ tutorial](antakia_CH_gui.ipynb) or the [AntakIA __without GUI__ tutorial](antakia_CH_no_gui.ipynb).
> 
> For more information about AntakIA, please refer to the [AntakIA documentation](https://ai-vidence.github.io/antakia/) or go to [AI-vidence's website](https://ai-vidence.com/).

## Context

__Let's pretend that we are a bank that needs to explain to its customers why it grants or refuses them a credit.__

We will use the _German Credit dataset_, described [here](https://online.stat.psu.edu/stat857/node/222/). We will train a machine-learning model that predicts whether a person is granted a credit or not.

__The main issue is the following:__ we want to explain our decision to our customers! We can't just show them the machine-learning model, because it is a _black-box_ model. This is where AntakIA comes in handy.

We start by __importing the necessary libraries__.

In [None]:
import pandas as pd 

Then, we __define our `X` and `Y` values__, and we load a dataframe of pre-computed __explanatory values__ (SHAP values).

In [None]:
df = pd.read_csv('../data/german_credit.csv')

X = df.iloc[:,1:] # the dataset
X.columns = [i.replace(' ','_') for i in X.columns]
Y = df.iloc[:,0] # the target variable
SHAP = pd.read_csv('../data/german_credit_shap.csv').drop('Unnamed: 0',axis=1) # the SHAP values

We define and train a __model__, in our case the __RidgeClassifier__ from __scikit-learn__.

In [None]:
from sklearn.linear_model import RidgeClassifier
model = RidgeClassifier(random_state=9)
model.fit(X, Y)
print('model fitted')
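
The SHAP values loaded earlier were pre-computed, and this notebook does not say how. As a rough illustration only, here is one closed form that is often used for linear models such as `RidgeClassifier`: assuming independent features, the contribution of feature `j` is `w_j * (x_j - mean(x_j))`. The data below is a made-up toy set, and the names `X_toy`, `y_toy`, `toy_model` and `shap_toy` are hypothetical; the CSV used above may have been produced differently.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

# Toy stand-in for the credit data (hypothetical values, not the real dataset)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 4))
y_toy = (X_toy[:, 0] - 0.5 * X_toy[:, 2] > 0).astype(int)

toy_model = RidgeClassifier(random_state=9).fit(X_toy, y_toy)

# For a linear score f(x) = w.x + b with independent features,
# the SHAP value of feature j is w_j * (x_j - mean(x_j)).
weights = toy_model.coef_.ravel()
shap_toy = (X_toy - X_toy.mean(axis=0)) * weights

# Additivity check: the contributions of a row sum to its score
# minus the average score over the dataset.
scores = toy_model.decision_function(X_toy)
additive = np.allclose(shap_toy.sum(axis=1), scores - scores.mean())
print(additive)
```

The result has one row per observation and one column per feature, which is the same layout as the explanation dataframe passed to AntakIA below.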

We can then import __antakia__!
We will define all the antakia objects needed to use the user interface. For a more detailed example, see [this notebook](antakia_CH_gui.ipynb); for an overview of the package's objects, see [this one](antakia_utils.ipynb).

In [None]:
import antakia

### Instantiating everything we need!

In [None]:
dataset = antakia.Dataset(X, model = model, y=Y)
atk = antakia.AntakIA(dataset, import_explanation=SHAP)

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessClassifier

my_sub_models = [DecisionTreeClassifier(), RandomForestClassifier(), GaussianProcessClassifier()]

## Launching the __`GUI`__
#### _Follow these instructions!_
* Launch the GUI using the `startGUI` method of the `AntakIA` object, instantiated as `atk` here. Since this is a classification task, the sub-models we offer are all classifiers.
* Let's tweak the projection settings! Open the settings of the Explanatory Space projection, move the `FP ratio` cursor to the right so that `FP ratio = 5`, and click "Validate".
* Four small groups of points appear! With the lasso (already selected), select the group of points on the right. In the bottom part of the GUI, you will see that this cluster contains 207 points, representing 20.7% of the whole dataset.
* Let's find the rules that best describe this region! Go to the second tab, "Selection adjustment", and click on "Skope-Rules". Three rules appear on both spaces. Let's see what they say:
    * `1.0 <= Account_Balance <= 2.5`, i.e. Account Balance in {1, 2}. According to the [description of the dataset](https://online.stat.psu.edu/stat857/node/222/), this means __people with 'no running account'__ or __people with 'no balance or debit'__.
    * `22.5 <= Duration_of_Credit_(month) <= 72.0`, i.e. people who borrow money for a fairly long time.
    * `0.0 <= Payment_Status_of_Previous_Credit <= 2.5`, i.e. Payment Status of Previous Credit in {0, 1, 2}, meaning __hesitant payment of previous credits, problematic running account__ or __no previous credits.__

        _We won't change that for the moment, but you can explore the dataset by changing some of the rules or adding new ones._
        
* Go to the third tab "Choice of the sub-model", and choose one. For example, the `DecisionTreeClassifier` from scikit-learn!
* Go to the fourth tab, "Overview of the regions", and click on "Validate the selection".

    _Here is your first region! It is made of 165 points, i.e. 16.5% of the dataset_.
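
The three rules above are plain interval conditions, so the region they describe can also be reproduced outside the GUI with a boolean mask. The sketch below uses a small made-up dataframe: only the column names and the bounds come from the rules, while the rows and the names `toy`, `rules` and `region` are hypothetical.

```python
import pandas as pd

# Hypothetical rows using the column names that appear in the rules
toy = pd.DataFrame({
    "Account_Balance": [1, 2, 4, 2, 3],
    "Duration_of_Credit_(month)": [24, 12, 36, 48, 30],
    "Payment_Status_of_Previous_Credit": [2, 1, 4, 0, 2],
})

# Each rule is an inclusive interval (low, high) on one column
rules = {
    "Account_Balance": (1.0, 2.5),
    "Duration_of_Credit_(month)": (22.5, 72.0),
    "Payment_Status_of_Previous_Credit": (0.0, 2.5),
}

# A row belongs to the region only if it satisfies every rule
mask = pd.Series(True, index=toy.index)
for column, (low, high) in rules.items():
    mask &= toy[column].between(low, high)

region = toy[mask]
print(f"{len(region)} of {len(toy)} rows ({mask.mean():.1%})")
```

Applying the same mask to `X` should select the rows of the region, assuming the GUI treats the displayed bounds as inclusive.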
    
### __Now, go to the end of the notebook to see what we learned from this!__

In [None]:
atk.startGUI(sub_models = my_sub_models)

__Using AntakIA, we found out that people with:__
__<p style="text-align: center;">Barely used bank accounts, <br> long-term credits, <br> hesitant previous payments </p>__
    
are explained __the same way__ by the global model. From now on, we can train a simpler model on this part of the dataset, such as the sub-model we chose earlier.

With this simpler model, we can now give customers a clearer explanation along with the decision on their credit application. Job done!
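
To make the idea of a simpler model on one region more concrete, here is a minimal sketch on synthetic data. It assumes one common convention, namely fitting the surrogate on the black-box predictions inside the region and measuring how often the two agree; AntakIA may proceed differently. The region definition and the names `black_box`, `in_region`, `surrogate` and `fidelity` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the dataset (hypothetical values)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 3))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(int)

# The "black-box" model trained on the whole dataset
black_box = RidgeClassifier(random_state=9).fit(X_toy, y_toy)

# Hypothetical region: rows whose first feature is positive
in_region = X_toy[:, 0] > 0
X_region = X_toy[in_region]
region_predictions = black_box.predict(X_region)

# The surrogate imitates the black-box predictions, not the true labels
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_region, region_predictions)

# Fidelity: share of region rows where surrogate and black box agree
fidelity = surrogate.score(X_region, region_predictions)
print(f"fidelity on the region: {fidelity:.2f}")
```

A shallow tree is used here because its few splits can be read out as simple rules, which is what makes it usable as an explanation.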

## List of useful links

- [AntakIA documentation](https://ai-vidence.github.io/antakia/) - The official documentation of AntakIA
- [AntakIA GitHub repository](https://github.com/AI-vidence/antakia/tree/main) - The GitHub repository of AntakIA. Do not forget to __star__ it if you like it!
- [AntakIA video tutorials](https://www.youtube.com/@AI-vidence) - The YouTube channel of AI-vidence, with video tutorials on AntakIA!
- [AI-vidence's website](https://ai-vidence.com/) - The website of AI-vidence, the company behind AntakIA

[Top of Page](#top)
<img style="float: right;" src="https://raw.githubusercontent.com/AI-vidence/antakia/main/docs/img/Logo-AI-vidence.png" alt="AI-vidence" width="200px"/> 

 ***