<a id="top"></a>
# AntakIA tutorial
***
### Using AntakIA with the GUI!

AntakIA helps you understand and explain your _black-box_ machine-learning models by identifying the most relevant way to split your dataset into regions and the best surrogate models to apply to these freshly created regions. In this notebook, we will show you how to use AntakIA's automatic dyadic-clustering algorithm.

> This notebook is a tutorial on how to use AntakIA with the GUI. If you don't want to use the GUI, please refer to the [AntakIA without GUI tutorial](antakia_CH_no_gui.ipynb).
> 
> For more information about AntakIA, please refer to the [AntakIA documentation](https://ai-vidence.github.io/antakia/) or go to [AI-vidence's website](https://ai-vidence.com/).

__In this notebook, you will learn how to:__
- Create a dataset object from a CSV file
- Instantiate an AntakIA object
- Run the GUI to explore the dataset, the model, define regions and apply sub-models
- Visualize the results

## Context

__Let's pretend that we are a real estate agent and that we want to predict the price of a house based on its characteristics.__ We have a dataset of more than 20,000 blocks of houses, each block being described by 8 features (e.g. median income of the owners, number of rooms, etc.). We also have the price of each block of houses. We have trained a machine-learning model (in our case, a simple gradient-boosting model) that predicts the price of a house based on its characteristics. This is very helpful for estimating the price of a house that we want to sell!

__The main issue is the following:__ we want to explain to our customers why their house is worth a certain price. We can't just show them the machine-learning model, because it is a _black-box_ model. We need to find a way to explain the price of a house based on its characteristics. This is where AntakIA comes in handy!
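
To make the idea of a regional surrogate concrete, here is a minimal sketch that does not use AntakIA itself. It assumes synthetic data, a hand-picked region (`x0 > 0`) and a linear surrogate; every name in it is illustrative, and it does not reproduce AntakIA's own region-finding algorithm.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data (assumption): the target behaves differently on each side of x0 = 0.
X_demo = rng.normal(size=(1000, 2))
y_demo = np.where(X_demo[:, 0] > 0, 3.0 * X_demo[:, 1], -2.0 * X_demo[:, 1])
y_demo = y_demo + rng.normal(scale=0.1, size=1000)

# The "black box" whose behaviour we want to explain.
black_box = GradientBoostingRegressor(random_state=0).fit(X_demo, y_demo)
bb_pred = black_box.predict(X_demo)

# A hand-picked region (assumption); finding good regions is what AntakIA helps with.
region = X_demo[:, 0] > 0

# Global surrogate: one linear model fitted on the whole dataset.
global_fit = LinearRegression().fit(X_demo, bb_pred)
global_r2 = global_fit.score(X_demo[region], bb_pred[region])

# Regional surrogate: a linear model fitted only inside the region.
regional_fit = LinearRegression().fit(X_demo[region], bb_pred[region])
regional_r2 = regional_fit.score(X_demo[region], bb_pred[region])

print(f"global surrogate R^2 inside the region:   {global_r2:.3f}")
print(f"regional surrogate R^2 inside the region: {regional_r2:.3f}")
```

Inside the region, the simple regional model should track the black box much more closely than the global one, which is what makes it usable as an explanation there.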

We start by importing the necessary libraries.

In [2]:
import pandas as pd 

Then, we load our dataset. We use [this California housing dataset](https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset) from scikit-learn.

In the `data` folder of this repository, you'll see `california_housing.csv`: it already has its SHAP values computed, so you won't have to wait for their calculation.
⚠ Check that you have Git Large File Storage (LFS) installed to use our dataset, see: https://git-lfs.com

In [3]:
# Drop the leftover index column if the CSV has one; errors='ignore' tolerates its absence
df = pd.read_csv('../data/california_housing.csv').drop(columns=['Unnamed: 0'], errors='ignore')

# If you prefer to start with the genuine fresh dataset:
# from sklearn.datasets import fetch_california_housing
# df = fetch_california_housing(as_frame=True).frame

After cleaning our data a bit, we want to focus specifically on __San Francisco__ and its surroundings.

In [None]:
# Remove outliers:
df = df.loc[df['Population']<10000] 
df = df.loc[df['AveOccup']<6]
df = df.loc[df['AveBedrms']<1.5]
df = df.loc[df['HouseAge']<50]

# Keep only San Francisco and its surroundings:
df = df.loc[(df['Latitude']<38.07)&(df['Latitude']>37.2)]
df = df.loc[(df['Longitude']>-122.5)&(df['Longitude']<-121.75)]
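
The filters above can also be combined into a single boolean mask, which makes it easy to check how many rows survive. Below is a minimal sketch on a toy DataFrame: the values are made up, and only the column names and thresholds come from the cell above.

```python
import pandas as pd

# Toy rows (made-up values) using the same column names as the housing data.
toy = pd.DataFrame({
    "Population": [1200, 15000, 800, 950],
    "AveOccup": [2.5, 3.0, 7.2, 3.1],
    "AveBedrms": [1.0, 1.1, 1.0, 1.2],
    "HouseAge": [30, 20, 15, 41],
    "Latitude": [37.77, 37.80, 37.50, 34.05],
    "Longitude": [-122.42, -122.30, -122.00, -118.24],
})

# Outlier thresholds taken from the tutorial cell.
no_outliers = (
    (toy["Population"] < 10000)
    & (toy["AveOccup"] < 6)
    & (toy["AveBedrms"] < 1.5)
    & (toy["HouseAge"] < 50)
)

# Bounding box around San Francisco taken from the tutorial cell.
in_sf = (
    (toy["Latitude"] > 37.2) & (toy["Latitude"] < 38.07)
    & (toy["Longitude"] > -122.5) & (toy["Longitude"] < -121.75)
)

kept = toy.loc[no_outliers & in_sf]
print(f"kept {len(kept)} of {len(toy)} rows")
```

Printing the row count after filtering is a cheap way to notice a threshold that removes far more data than intended.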

Note that we already computed some explanatory values (in our case, SHAP values) and saved them in the CSV file. This is not necessary, as AntakIA can compute them itself, but it saves us some computation time!

In [None]:
X = df.iloc[:, 0:8]  # the features
Y = df.iloc[:, 9]  # the target variable
SHAP = df.iloc[:, 10:18]  # the SHAP values (columns 10 to 17)
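
Because these three slices are positional, a quick check that they line up can catch a shifted column early. Here is a minimal sketch on a stand-in DataFrame. Its layout (8 features, 2 other columns, then 8 SHAP columns) mirrors the slices above, but the column names are hypothetical placeholders, so adapt it to your own CSV.

```python
import numpy as np
import pandas as pd

# Stand-in frame with 18 columns; all names are hypothetical placeholders.
features = [f"feat_{i}" for i in range(8)]
others = ["col_8", "col_9"]
shap_cols = [f"shap_{i}" for i in range(8)]
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.normal(size=(5, 18)), columns=features + others + shap_cols)

# Same positional slices as in the tutorial cell.
X_demo = df_demo.iloc[:, 0:8]
Y_demo = df_demo.iloc[:, 9]
SHAP_demo = df_demo.iloc[:, 10:18]

# One SHAP column per feature, and identical row indexes everywhere.
assert X_demo.shape == SHAP_demo.shape
assert X_demo.index.equals(Y_demo.index)
assert X_demo.index.equals(SHAP_demo.index)

# Print the selected column names to confirm they are the ones you expect.
print(list(X_demo.columns), Y_demo.name, list(SHAP_demo.columns), sep="\n")
```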

We also train a gradient-boosting model (scikit-learn's `GradientBoostingRegressor`) that we will use to predict the price of a house.

In [None]:
from sklearn.ensemble import GradientBoostingRegressor
model = GradientBoostingRegressor(random_state = 9)
model.fit(X, Y)
print('model fitted')
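
The cell above fits the model on all rows, so it gives no indication of predictive quality. One common way to check this is a hold-out split. The sketch below uses synthetic data from `make_regression` as a stand-in for `X` and `Y`; the split ratio and seeds are arbitrary choices and are not part of the tutorial.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (assumption): 8 features, like X.
X_demo, y_demo = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

# Keep 25% of the rows aside for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

demo_model = GradientBoostingRegressor(random_state=9)
demo_model.fit(X_train, y_train)

# score() returns the coefficient of determination R^2 on the given data.
r2_test = demo_model.score(X_test, y_test)
print(f"hold-out R^2: {r2_test:.3f}")
```

For the tutorial itself, fitting on all rows is fine, since the goal is to explain the model rather than to benchmark it.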

__Let's now import `antakia`!__

In [None]:
import antakia

## 1. Creating the dataset object

We first use the [`Dataset`](https://ai-vidence.github.io/antakia/documentation/dataset/) class to create a dataset object. This object will be used to store the data and the machine-learning model.

In [None]:
dataset = antakia.Dataset(X, model = model, y=Y)
print(f'Size of the data we want to explore: {len(dataset)} lines')

## 2. Creating the AntakIA object
We then use the [`AntakIA`](https://ai-vidence.github.io/antakia/documentation/antakia/) class to create an AntakIA object. This is the main object of the package!
This is where we import our explanatory values (in our case, SHAP values).

In [None]:
atk = antakia.AntakIA(dataset, import_explanation = SHAP)

## 3. Launching the GUI!
We can now launch the GUI! We will use it to manually define regions, explore our data and model, or run the automatic dyadic-clustering algorithm.

Before using the GUI, we recommend that you:
- Take a look at [its documentation](https://ai-vidence.github.io/antakia/documentation/gui/). `GUI` is a specific AntakIA object!
- Read the [User guide](https://ai-vidence.github.io/antakia/usage/) section of the documentation.
- Watch the video tutorials on [AI-vidence's website](https://ai-vidence.com/) or on [YouTube](https://www.youtube.com/@AI-vidence).

In [None]:
atk.startGUI()

__You may retrieve your results by using the following commands:__

In [None]:
print(atk.gui.getSelection()) # the selection is an attribute of the GUI. It is a Potato object!

In [None]:
atk.getRegions() # get the regions created using the GUI. A region is a list of antakia Potatoes.

In [None]:
atk.getSaves() # get the saves created using the GUI. A save is a list of regions.

In [None]:
atk.getExplanations() # get the explanation values (Imported, SHAP, LIME), so as to save them locally and use them later. Their computation can be long!
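
To reuse explanation values later, you can write them to disk. The return type of `getExplanations()` is not specified in this tutorial, so the sketch below assumes that you hold the values in a pandas DataFrame (here a made-up stand-in named `shap_demo`, with hypothetical column names) and round-trips it through CSV.

```python
import io

import pandas as pd

# Stand-in for explanation values (made-up numbers, hypothetical column names).
shap_demo = pd.DataFrame(
    {"MedInc_shap": [0.12, -0.40, 0.05], "HouseAge_shap": [0.01, 0.03, -0.02]}
)

# Write to an in-memory buffer; replace it with a file path to persist on disk.
# index=False avoids writing the row index, which would otherwise come back
# as an "Unnamed: 0" column when the file is read again.
buffer = io.StringIO()
shap_demo.to_csv(buffer, index=False)

# Read the values back with the round-trip float parser and compare.
buffer.seek(0)
restored = pd.read_csv(buffer, float_precision="round_trip")
print(restored.equals(shap_demo))
```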

That's it! You now know how to use AntakIA with the GUI. If you don't want to use the GUI, please refer to the [AntakIA without GUI tutorial](antakia_no_gui.ipynb).
***

## List of useful links

- [AntakIA documentation](https://ai-vidence.github.io/antakia/) - The official documentation of AntakIA
- [AntakIA GitHub repository](https://github.com/AI-vidence/antakia/tree/main) - The GitHub repository of AntakIA. Do not forget to __star__ it if you like it!
- [AntakIA video tutorials](https://www.youtube.com/@AI-vidence) - The YouTube channel of AI-vidence, with video tutorials on AntakIA!
- [AI-vidence's website](https://ai-vidence.com/) - The website of AI-vidence, the company behind AntakIA

[Top of Page](#top)
<img style="float: right;" src="https://raw.githubusercontent.com/AI-vidence/antakia/main/docs/img/Logo-AI-vidence.png" alt="AI-vidence" width="200px"/> 

 ***