<a id="top"></a>
# AntakIA tutorial
***
### Learning about the utils of AntakIA!

__<p style="color:orange; text-align: center;">WARNING: This notebook is not exhaustive! Please refer to the [AntakIA documentation](https://ai-vidence.github.io/antakia/) for more information.</p>__

AntakIA helps you understand and explain your _black-box_ machine-learning models by finding the most relevant way to split your dataset into regions and the best surrogate model to apply to each of these freshly created regions. Among other things, this notebook shows you how to use AntakIA's automatic dyadic-clustering algorithm.

> This notebook is a tutorial on how to manipulate the different objects and functions of AntakIA. If you want a usage example, please refer to:
> - [this tutorial](antakia_gui.ipynb) if you want to use the GUI,
> - [this one](antakia_no_gui.ipynb) if you don't want to use the GUI.
>
> For more information about AntakIA, please refer to the [AntakIA documentation](https://ai-vidence.github.io/antakia/) or go to [AI-vidence's website](https://ai-vidence.com/).

__In this notebook, you will learn to manipulate different objects and concepts of AntakIA, such as:__
- [The Dataset object](#dataset)
- [The AntakIA object](#antakia)
- [The Potatoes](#potatoes)
- [The GUI object](#gui)
- [The save and load functions](#save_load)

Every data-science project begins with importing the necessary libraries.

In [1]:
import pandas as pd
import antakia # of course!

Then, the dataset. Ours is [this one](https://inria.github.io/scikit-learn-mooc/python_scripts/datasets_california_housing.html); it can be found in the `data` folder of this repository.

In [2]:
df = pd.read_csv('../data/california_housing/california_housing.csv').drop(['Unnamed: 0'], axis=1)
X = df.iloc[:,0:8] # the eight feature columns
Y = df.iloc[:,9] # the target variable
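
Positional indexing such as `iloc[:, 0:8]` and `iloc[:, 9]` depends on the exact column order of the CSV. A slightly more robust alternative is to select columns by name. The sketch below uses a small synthetic frame with the eight California housing feature names; the target column name `MedHouseVal` is an assumption and may differ in the repository's CSV.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV: the eight California housing feature names
feature_names = ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
                 "Population", "AveOccup", "Latitude", "Longitude"]
target_name = "MedHouseVal"  # assumption: the real CSV may use another name

rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.normal(size=(5, 9)), columns=feature_names + [target_name])

# Selecting by name does not depend on column order or on extra index columns
X_demo = df_demo[feature_names]
y_demo = df_demo[target_name]
print(X_demo.shape, y_demo.shape)
```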

We train a gradient-boosting model (scikit-learn's `GradientBoostingRegressor`) that we will use to predict the price of a house.

In [3]:
from sklearn.ensemble import GradientBoostingRegressor
model = GradientBoostingRegressor(random_state = 9)
model.fit(X, Y)
print('model fitted')

model fitted
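
The model above is fitted on every row, which is enough for exploring it but says nothing about how well it generalizes. Before explaining a black box, it can help to check it on held-out data. A minimal sketch on synthetic data (the data-generating function is made up for illustration):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the housing features (assumption)
rng = np.random.default_rng(9)
X_syn = rng.uniform(-1, 1, size=(500, 3))
y_syn = 2 * X_syn[:, 0] - X_syn[:, 1] ** 2 + 0.1 * rng.normal(size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=9
)
gbr = GradientBoostingRegressor(random_state=9).fit(X_train, y_train)

# R² on rows the model never saw: how well the black box generalizes
test_r2 = r2_score(y_test, gbr.predict(X_test))
print(f"held-out R²: {test_r2:.3f}")
```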


***
# The Dataset object <a id="dataset"></a>

Every AntakIA usage starts by instantiating a [`Dataset`](https://ai-vidence.github.io/antakia/documentation/dataset/). It takes as input at least a pandas DataFrame for the X values and a model.

In [4]:
dataset = antakia.Dataset(X, model = model, y=Y)
print(f'Size of the dataset: {len(dataset)}')

Size of the dataset: 20640


### The dataset object contains a bunch of interesting methods:
- `frac`, to get a fraction of the dataset

In [5]:
dataset.frac(0.01)
print(f'Size of the fractionated dataset: {len(dataset)}')

Size of the fractionated dataset: 206
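
Judging by the output, `frac` modifies `dataset` in place (20640 rows become 206). The notebook does not say how the rows are chosen; assuming uniform random sampling without replacement, the effect is comparable to pandas' `DataFrame.sample`:

```python
import numpy as np
import pandas as pd

# A stand-in frame with as many rows as the full dataset (20640)
full = pd.DataFrame({"row_id": np.arange(20640)})

# Keep 1% of the rows, drawn without replacement; random_state makes it reproducible
subset = full.sample(frac=0.01, random_state=0)
print(len(subset))  # 206, i.e. round(0.01 * 20640)
```

Fixing `random_state` keeps the subsample identical between runs.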


- `improve`, which displays a widget to modify the dataset. For each feature, you can change its name, its type, its comment, and whether it is sensitive.
You also have access to general information about the dataset.

In [None]:
dataset.improve()
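
The four editable attributes can be pictured as one small record per feature. The `FeatureInfo` class below is hypothetical, written for illustration only, and is not AntakIA's internal representation:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FeatureInfo:
    """Hypothetical per-feature metadata, mirroring the fields improve() lets you edit."""
    name: str
    kind: str                # e.g. "numeric" or "categorical" (assumed vocabulary)
    comment: str = ""
    sensitive: bool = False  # e.g. a protected attribute that calls for extra care


meta = {
    "MedInc": FeatureInfo("MedInc", "numeric", "median income in block group"),
    "HouseAge": FeatureInfo("HouseAge", "numeric"),
}

# Renaming a feature yields a new record rather than mutating the old one
meta["MedInc"] = replace(meta["MedInc"], name="Median income")
print(meta["MedInc"])
```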

***
# The AntakIA object <a id="antakia"></a>
We then use the [`AntakIA`](https://ai-vidence.github.io/antakia/documentation/antakia/) class to create an AntakIA object. This is the main object of the package: all the other objects are linked through it.

In [7]:
atk = antakia.AntakIA(dataset)

You can access its attributes through the corresponding methods, and set them the same way. Here are a few examples:

In [8]:
atk.getRegions()
atk.getSaves()
atk.getExplanations()

atk.newRegion(antakia.Potato(atk)) # we will see what a Potato is later...
atk.resetRegions()

The AntakIA object contains the explanatory values. You can import them with `import_explanation` when instantiating the object, or with `importExplanation` afterwards.

If you want to compute other explanations (for now, SHAP and LIME values are available), you can use the appropriate methods:

In [None]:
atk.computeSHAP() # or atk.computeLIME()
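
SHAP values attribute a single prediction to the features so that the contributions add up to the prediction minus a baseline (here, the average prediction over a background sample). For intuition only, the sketch below computes exact Shapley values by brute force on a toy linear model; it is not how AntakIA computes them, and practical tools rely on much faster algorithms or approximations.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(f, x, background):
    """Exact Shapley values of model f at point x.

    Features outside a coalition are filled in from each background row and the
    predictions are averaged. Runs in O(2**n_features), so toy sizes only.
    """
    n = x.shape[0]

    def value(coalition):
        data = background.astype(float)  # astype returns a fresh copy
        idx = list(coalition)
        data[:, idx] = x[idx]  # features in the coalition take the values of x
        return f(data).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


# Toy linear "black box" and a synthetic background sample
rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))
w = np.array([2.0, -1.0, 0.5])


def linear(data):
    return data @ w


x = np.array([1.0, 2.0, 3.0])
phi = exact_shapley(linear, x, background)

# Efficiency: the contributions sum to f(x) minus the average prediction
print(phi, phi.sum(), linear(x) - linear(background).mean())
```

For a linear model, each contribution reduces to `w_i * (x_i - mean_i)`, which makes the result easy to check by hand.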

You may start the GUI using the `startGUI` method. See [this tutorial](antakia_gui.ipynb) for more information!

In [None]:
atk.startGUI(explanation='SHAP')

Without the GUI, you can run AntakIA's automatic dyadic clustering directly:

In [11]:
atk.computeDyadicClustering(explanation='SHAP')
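
Assuming, as the `getVSdata` method used later suggests, that AntakIA views each observation both in a value space (VS, the raw features) and in an explanation space (ES, e.g. its SHAP values), "dyadic" clustering groups points according to both spaces. The notebook does not describe the algorithm itself. As a deliberately naive, swapped-in illustration of clustering on two spaces at once, here is plain k-means on the concatenated, standardized matrices; it is not AntakIA's algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
values = rng.normal(size=(200, 4))        # value space: raw features (synthetic)
explanations = rng.normal(size=(200, 4))  # explanation space: e.g. SHAP values (synthetic)

# Standardize each space separately so neither dominates the distances,
# then cluster on both at once by concatenating the columns
joint = np.hstack([
    StandardScaler().fit_transform(values),
    StandardScaler().fit_transform(explanations),
])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(joint)
print(np.bincount(labels))  # number of points per cluster
```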

***
## The Potato object <a id="potatoes"></a>

An AntakIA Potato is a selection of points from the dataset. A Potato is linked to an AntakIA object. You can define or access its different attributes like so:
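
A selection is essentially a set of row indexes, and the rules that `apply_skope` looks for are conditions on the features. A common way to score a rule against a selection is precision (the share of points covered by the rule that belong to the selection) and recall (the share of the selection that the rule covers); the error message in the output of the next cell mentions both. A sketch with pandas on synthetic data; the helper `rule_precision_recall` is hypothetical, and AntakIA's exact scoring is not shown in this notebook:

```python
import numpy as np
import pandas as pd


def rule_precision_recall(rule_mask, selection):
    """Hypothetical helper: score a boolean rule mask against a set of row labels."""
    covered = set(rule_mask.index[rule_mask])
    hits = len(covered & selection)
    precision = hits / len(covered) if covered else 0.0  # selected share of covered points
    recall = hits / len(selection) if selection else 0.0  # covered share of selected points
    return precision, recall


rng = np.random.default_rng(2)
X_toy = pd.DataFrame({
    "MedInc": rng.uniform(0.5, 15.0, 100),
    "HouseAge": rng.integers(1, 52, 100),
})

# A selection is a set of row labels, like the indexes given to setIndexes
selection = set(X_toy.index[X_toy["MedInc"] > 8])

# A candidate rule is a boolean condition on the features
rule = (X_toy["MedInc"] > 7) & (X_toy["HouseAge"] < 40)
precision, recall = rule_precision_recall(rule, selection)
print(f"precision={precision:.2f} recall={recall:.2f}")
```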

In [12]:
potato = antakia.Potato(atk, [])

potato.setIndexes([1,2,3,7,8,55,77,99]) # indexes of the observations to explain

potato.apply_skope(explanation='SHAP') # to find the rules that explain the observations

import sklearn.linear_model as lm
potato.setSubModel(lm.LinearRegression())

potato.getVSdata().head()

AntakIA ERROR : No rules found for this precision and recall


| Unnamed: 0 | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 1 | 5.8541 | 22.0 | 6.448105 | 0.988468 | 1871.0 | 3.082372 | 33.66 | -117.97 |
| 2 | 1.9417 | 32.0 | 7.647059 | 2.240642 | 434.0 | 2.320856 | 40.28 | -124.25 |
| 3 | 1.5933 | 37.0 | 3.998331 | 1.046745 | 2489.0 | 4.155259 | 32.69 | -117.11 |
| 7 | 2.3549 | 18.0 | 5.646209 | 1.288809 | 618.0 | 2.231047 | 38.43 | -120.55 |
| 8 | 4.2292 | 33.0 | 6.39375 | 1.021875 | 1018.0 | 3.18125 | 33.89 | -118.01 |
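
`setSubModel` attaches a simple, interpretable model to the selected region. A common way to judge such a surrogate is its fidelity: how well it reproduces the black box's predictions on the selected points. The sketch below uses synthetic data and a hand-picked region; whether AntakIA fits the sub-model on the black box's predictions or on the true targets is not shown in this notebook.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data whose behaviour changes with the sign of the first feature
rng = np.random.default_rng(3)
X_all = rng.uniform(-2, 2, size=(1000, 2))
y_all = np.where(X_all[:, 0] > 0, 3 * X_all[:, 1], -X_all[:, 1]) + 0.05 * rng.normal(size=1000)
black_box = GradientBoostingRegressor(random_state=0).fit(X_all, y_all)

# Hypothetical region: the points where the first feature is positive
X_region = X_all[X_all[:, 0] > 0]

# The surrogate is fitted on the black box's predictions, not on the true targets
bb_pred = black_box.predict(X_region)
surrogate = LinearRegression().fit(X_region, bb_pred)

# Fidelity: how much of the black box's behaviour the surrogate reproduces
fidelity = r2_score(bb_pred, surrogate.predict(X_region))
print(f"fidelity R²: {fidelity:.3f}, coefficients: {surrogate.coef_.round(2)}")
```

Inside this region the black box is roughly linear in the second feature, so a linear surrogate is faithful there even though it would be a poor fit on the whole dataset.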


You might also want to import a potato from a file:

In [13]:
from antakia.potato import potatoFromJson
#potato = potatoFromJson(atk, 'a json file')
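
The file name above is left as a placeholder. Since a selection boils down to a list of row indexes, storing one as JSON is straightforward. The sketch below round-trips a selection with the standard library, using a hypothetical layout rather than the format `potatoFromJson` expects:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout for illustration; NOT the format potatoFromJson expects
selection = {"name": "example", "indexes": [1, 2, 3, 7, 8, 55, 77, 99]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "selection.json"
    path.write_text(json.dumps(selection, indent=2))  # save
    loaded = json.loads(path.read_text())             # load

print(loaded["indexes"])
```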

That's it!
***

## List of useful links

- [AntakIA documentation](https://ai-vidence.github.io/antakia/) - The official documentation of AntakIA
- [AntakIA GitHub repository](https://github.com/AI-vidence/antakia/tree/main) - The GitHub repository of AntakIA. Do not forget to __star__ it if you like it!
- [AntakIA video tutorials](https://www.youtube.com/@AI-vidence) - The YouTube channel of AI-vidence, with video tutorials on AntakIA!
- [AI-vidence's website](https://ai-vidence.com/) - The website of AI-vidence, the company behind AntakIA

[Top of Page](#top)
<img style="float: right;" src="https://raw.githubusercontent.com/AI-vidence/antakia/main/docs/img/Logo-AI-vidence.png" alt="AI-vidence" width="200px"/> 

 ***