<a id="top"></a>
# AntakIA tutorial
***
### Using AntakIA with the GUI!

AntakIA helps you understand and explain your _black-box_ machine-learning models by identifying the most relevant way to split your dataset into regions and the best surrogate models to fit on each of these regions. In this notebook, we show you how to use AntakIA's automatic dyadic-clustering algorithm.

> This notebook is a tutorial on how to use AntakIA with the GUI. If you want to use AntakIA without the GUI, please refer to the [AntakIA without GUI tutorial](antakia_CH_no_gui.ipynb).
> 
> For more information about AntakIA, please refer to the [AntakIA documentation](https://ai-vidence.github.io/antakia/) or go to [AI-vidence's website](https://ai-vidence.com/).

__In this notebook, you will learn how to:__
- Create a dataset object from a CSV file
- Instantiate an AntakIA object
- Run the GUI to explore the dataset, the model, define regions and apply sub-models
- Visualize the results

## Context

__Let's pretend that we are a real estate agent and that we want to predict the price of a house based on its characteristics.__ We have a dataset of more than 20,000 blocks of houses, each block being described by 8 features (e.g. median income of the owners, number of rooms, etc.). We also have the price of each block of houses. We train a machine-learning model (in our case, a simple gradient-boosting regressor) that predicts the price of a house based on its characteristics. This is very helpful for estimating the price of a house that we want to sell!

__The main issue is the following:__ we want to explain to our customers why their house is worth a certain price. We can't just show them the machine-learning model, because it is a _black-box_ model. We need to find a way to explain the price of a house based on its characteristics. This is where AntakIA comes in handy!

We start by importing the necessary libraries.

In [1]:
import pandas as pd 

Then, our dataset. Ours is [this one](https://inria.github.io/scikit-learn-mooc/python_scripts/datasets_california_housing.html); it can be found in the `data` folder of this repository.

In [3]:
df = pd.read_csv('../data/california_housing.csv').drop(['Unnamed: 0'], axis=1)
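
The `Unnamed: 0` column typically appears when a DataFrame was saved with `to_csv` without `index=False`, so that its index ends up as an unnamed first column. As a minimal sketch on a made-up two-row CSV (not the real housing file), reading with `index_col=0` is an alternative that avoids the extra `drop`:

```python
import io

import pandas as pd

# Toy CSV standing in for a file that was saved together with its index
# (hypothetical values, not taken from the housing dataset).
csv_text = ",MedInc,HouseAge\n0,8.3,41.0\n1,7.2,21.0\n"

# Approach used in this notebook: read everything, then drop the index column.
dropped = pd.read_csv(io.StringIO(csv_text)).drop(["Unnamed: 0"], axis=1)

# Alternative: tell pandas that the first column is the index.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(dropped.columns))
print(list(indexed.columns))
```

Both variants leave the same data columns, so positional selections with `iloc` behave the same afterwards.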

After cleaning our data a bit, we focus specifically on __San Francisco__ and its surroundings.

In [4]:
# Remove outliers:
df = df.loc[df['Population'] < 10000]
df = df.loc[df['AveOccup'] < 6]
df = df.loc[df['AveBedrms'] < 1.5]
df = df.loc[df['HouseAge'] < 50]

# Keep only San Francisco and its surroundings:
df = df.loc[(df['Latitude'] < 38.07) & (df['Latitude'] > 37.2)]
df = df.loc[(df['Longitude'] > -122.5) & (df['Longitude'] < -121.75)]
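
The successive filters above can also be written as one boolean mask, which keeps all thresholds in a single place. The sketch below applies the same thresholds to three made-up rows; the helper name `filter_bay_area` and the toy values are ours and are not part of AntakIA or of the dataset.

```python
import pandas as pd


def filter_bay_area(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the outlier and location filters with a single boolean mask."""
    mask = (
        (df["Population"] < 10000)
        & (df["AveOccup"] < 6)
        & (df["AveBedrms"] < 1.5)
        & (df["HouseAge"] < 50)
        & (df["Latitude"] > 37.2) & (df["Latitude"] < 38.07)
        & (df["Longitude"] > -122.5) & (df["Longitude"] < -121.75)
    )
    return df.loc[mask]


# Three made-up rows: the second fails the population filter,
# the third lies outside the latitude/longitude window.
toy = pd.DataFrame({
    "Population": [1200, 15000, 800],
    "AveOccup": [2.5, 3.0, 2.0],
    "AveBedrms": [1.0, 1.1, 1.0],
    "HouseAge": [30, 20, 25],
    "Latitude": [37.77, 37.80, 34.05],
    "Longitude": [-122.42, -122.30, -118.24],
})

filtered = filter_bay_area(toy)
print(filtered)
```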

Note that we already computed some explanatory values (in our case, SHAP values) and saved them in the CSV file. This is not necessary, as AntakIA can do it, but it will save us some computation time!
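
To give an intuition for what these explanatory values look like: SHAP assigns each row one value per feature, and for a given row these values add up to the model's prediction minus the average prediction. The sketch below illustrates this on a hand-written linear model with made-up data, where (assuming independent features) the SHAP value of feature `i` reduces to `w_i * (x_i - mean(x_i))`. It uses neither the `shap` library nor AntakIA's own computation, and all names in it are ours.

```python
import numpy as np

# Made-up data and a hand-written linear model (illustration only).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(5, 3))
weights = np.array([2.0, -1.0, 0.5])
bias = 4.0


def predict(X):
    """Linear model: weighted sum of the features plus a constant."""
    return X @ weights + bias


# For a linear model with independent features, the SHAP value of a feature
# is its weight times the feature's deviation from its mean.
shap_toy = weights * (X_toy - X_toy.mean(axis=0))

# One explanatory value per feature and per row: same shape as the data.
same_shape = shap_toy.shape == X_toy.shape

# Additivity: each row's values sum to prediction minus average prediction.
additive = np.allclose(
    shap_toy.sum(axis=1), predict(X_toy) - predict(X_toy).mean()
)
print(same_shape, additive)
```

This is why the SHAP table loaded from the CSV is expected to have one column per feature and one row per block.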

In [5]:
X = df.iloc[:, 0:8]  # the dataset (8 features)
y = df.iloc[:, 9]  # the target variable
shapValues = df.iloc[:, 10:18]  # the SHAP values (one column per feature)
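
Because the columns are selected by position, a quick sanity check that the pieces line up can catch an off-by-one slice early. A minimal sketch on a made-up frame (two features instead of eight, hypothetical column names, and a layout chosen only for this example) could look like this; `check_alignment` is our own helper, not an AntakIA function.

```python
import pandas as pd

# Toy frame: 2 feature columns, 1 target column, 2 explanation columns.
toy = pd.DataFrame({
    "feat_a": [1.0, 2.0, 3.0],
    "feat_b": [0.5, 0.1, 0.9],
    "price": [10.0, 20.0, 30.0],
    "shap_feat_a": [-1.0, 0.0, 1.0],
    "shap_feat_b": [0.2, -0.3, 0.1],
})

n_features = 2
X_toy = toy.iloc[:, 0:n_features]
y_toy = toy.iloc[:, n_features]
shap_toy = toy.iloc[:, n_features + 1:2 * n_features + 1]


def check_alignment(X, y, shap_values):
    """Raise if features, target and explanations do not line up."""
    if X.shape != shap_values.shape:
        raise ValueError("X and the explanation values differ in shape")
    if len(X) != len(y):
        raise ValueError("X and y differ in number of rows")
    if not X.index.equals(shap_values.index):
        raise ValueError("X and the explanation values differ in index")
    return True


aligned = check_alignment(X_toy, y_toy, shap_toy)
print(aligned)
```

Running the same kind of check on `X`, `y` and `shapValues` before building the AntakIA object makes a wrong slice fail loudly instead of silently.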

We also need a trained model to predict the price of a house. Here we fit a gradient-boosting regressor (scikit-learn's `GradientBoostingRegressor`).

In [6]:
from sklearn.ensemble import GradientBoostingRegressor
model = GradientBoostingRegressor(random_state = 9)
model.fit(X, y)
print('model fitted')

model fitted


__Let's now import `antakia`!__

In [7]:
from antakia.data import ExplanationMethod

from antakia.antakia import AntakIA
atk = AntakIA([X, shapValues], y, model, [ExplanationMethod.NONE, ExplanationMethod.SHAP])

In [9]:
atk.startGUI()
