# XAISUITE INSTALLATION AND USE TUTORIAL

## Installing XAISuite

In [2]:
!pip install XAISuite

Collecting XAISuite
  Downloading XAISuite-0.6-py3-none-any.whl (9.3 kB)


Installing collected packages: XAISuite
Successfully installed XAISuite-0.6


## Importing and Using XAISuite

In [3]:
from xaisuite import *

Let's look at the documentation for key XAISuite functions. Alternatively, you can check our [documentation webpage](https://11301858.github.io/XAISuite/v0.6.0-beta/index.html).

In [4]:
help(train_and_explainModel)

Help on function train_and_explainModel in module xaisuite.xaichooser:

train_and_explainModel(model: str, tabular_data: omnixai.data.tabular.Tabular, x_ai: list, indexList: list = [], scale: bool = True, scaleType: str = 'StandardScaler', addendum: str = '', verbose: bool = False, **modelSpecificArgs)
    A function that attempts to train and explain a particular sklearn model.
    Parameters:
    model:str | Name of Model
    tabular_data:Tabular | Tabular object representing data set to be used in training
    x_ai:list | List of explanatory models to be used
    indexList:list = [] | Specific test data instance to be explained, by default empty (indicating all instances should be explained)
    scale:bool = True | Whether data should be scaled before training
    scaleType:str = "StandardScaler" | Default Scaler type. Example: Use "MinMaxScaler" for MultinomialNB model.
    addendum:str = "" | Added string to explanation files in case multiple models are being trained and explained

In [6]:
help(load_data_CSV)

Help on function load_data_CSV in module xaisuite.dataLoader:

load_data_CSV(data: str, target: str, cut: Union[str, list] = None) -> omnixai.data.tabular.Tabular
    A function that creates a omnixai.data.tabular.Tabular object instance representing a particular dataset.
    Parameters:
    data:str | Pathname for the CSV file where the dataset is found.
    target:str | Target variable used for training data
    cut: Union[str, list] = None | Variables that should be ignored in training
    
    Returns:
    tabular_data: Tabular | Tabular object instance representing 'data'
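For ``load_data_CSV``, the dataset has to live in a CSV file. Here is a minimal sketch of preparing one from an sklearn dataset. It assumes, without confirmation from the docstring, that the file has a header row and that ``target`` and ``cut`` refer to column names in that header. The file name ``iris.csv`` is arbitrary, and the ``load_data_CSV`` call is left as a hypothetical comment.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Build a DataFrame holding the four feature columns plus a "target" column (requires pandas).
iris = load_iris(as_frame=True)
df = iris.frame
df.to_csv("iris.csv", index=False)

# Reading the file back shows the column names that `target` and `cut` would refer to.
print(list(pd.read_csv("iris.csv").columns))

# Hypothetical usage, assuming XAISuite is imported as above:
# tabular_data = load_data_CSV("iris.csv", "target", cut="sepal width (cm)")
```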



In [7]:
help(load_data_sklearn)

Help on function load_data_sklearn in module xaisuite.dataLoader:

load_data_sklearn(datastore: dict, target: str, cut: Union[str, list] = None) -> omnixai.data.tabular.Tabular
    A function that creates a omnixai.data.tabular.Tabular object instance representing a particular sklearn dataset for demoing.
    Parameters:
    datastore:dict | A dictionary object containing the data
    target:str | Target variable used for training data
    cut: Union[str, list] = None | Variables that should be ignored in training
    
    Returns:
    tabular_data: Tabular | Tabular object instance representing 'data'



In [8]:
help(compare_explanations)

Help on function compare_explanations in module xaisuite.analyzer:

compare_explanations(filenames: list, verbose=False)
    A function that analyzes and compares the explanations generated by train_and_explainModel.
    Parameters:
    filenames:list | File names with explanations (of the form "Explainer ImportanceScores - Model Target.csv")
    
    Returns:
    Nothing



Any supervised machine learning model from sklearn is supported, as long as it matches the task: classifiers on classification datasets and regressors on regression datasets. Models outside of sklearn may or may not be supported. If a model needs specific arguments, pass them as extra keyword arguments to ``train_and_explainModel`` after the regular arguments; they are forwarded to the model itself through ``**modelSpecificArgs``. If you are using sklearn datasets, make sure to run ``from sklearn.datasets import *``. The following are examples of correct uses of XAISuite functions.

In [9]:
from sklearn.datasets import *

In [11]:
models = ["LogisticRegression", "SVC", "GaussianNB", "MultinomialNB", "SGDClassifier", "KNeighborsClassifier", "DecisionTreeClassifier", "RandomForestClassifier", "GradientBoostingClassifier", "LinearRegression", "SGDRegressor", "KernelRidge", "ElasticNet", "BayesianRidge", "GradientBoostingRegressor", "SVR"]

In [12]:
data = ["load_diabetes()", "load_iris()", "fetch_california_housing()", "load_digits()"] # Need to add 
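The strings in ``data`` are turned into datasets later with ``eval``. If you would rather avoid ``eval``, one option, sketched below under the assumption that only these four loaders are needed, is a dictionary that maps each name to its loader function. The sketch also shows what ``load_data_sklearn`` receives: sklearn loaders return a ``Bunch``, a ``dict`` subclass with ``"data"`` and ``"target"`` keys, which is presumably why ``'target'`` is passed as the target name.

```python
from sklearn.datasets import fetch_california_housing, load_diabetes, load_digits, load_iris

# Map each name used in `data` to its loader function instead of calling eval() on a string.
loaders = {
    "load_diabetes()": load_diabetes,
    "load_iris()": load_iris,
    "fetch_california_housing()": fetch_california_housing,  # downloads the data on first use
    "load_digits()": load_digits,
}

iris = loaders["load_iris()"]()
print(isinstance(iris, dict))                    # True: a Bunch is a dict subclass
print(iris["data"].shape, iris["target"].shape)  # (150, 4) (150,)

# In the notebook, load_data_sklearn(loaders[name](), 'target') would replace
# load_data_sklearn(eval(name), 'target').
```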

For most models, we can just plug in the model and dataset. We choose the explainers ``lime`` and ``shap``. The ``addendum`` argument adds a string to the explanation file names, so that running the same model and explainer on different datasets does not overwrite earlier results.
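The sketch below illustrates why the addendum prevents overwrites. The helper ``explanation_filename`` is hypothetical and not part of XAISuite. Its pattern, ``"<explainer> ImportanceScores - <model> target<addendum>.csv"``, is an assumption inferred from the ``compare_explanations`` docstring and the file names used later in this tutorial.

```python
# Hypothetical helper (not part of XAISuite): build an explanation file name
# using the assumed pattern "<explainer> ImportanceScores - <model> <target><addendum>.csv".
def explanation_filename(explainer, model, target="target", addendum=""):
    return f"{explainer} ImportanceScores - {model} {target}{addendum}.csv"


datasets = ["load_iris()", "load_digits()"]

# Without an addendum, every dataset maps to the same file, so later runs overwrite earlier ones
same = {explanation_filename("lime", "LogisticRegression") for _ in datasets}
# With addendum = " " + dataset name, each dataset gets its own file
distinct = {explanation_filename("lime", "LogisticRegression", addendum=" " + d) for d in datasets}
print(len(same), len(distinct))  # 1 2
```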

In [None]:
import time

# Train all the models on all the datasets. Some combinations will fail; we skip them here and address them in the following cells.
for i in range(len(data)):
    for j in range(len(models)):
        try:
            train_and_explainModel(models[j], load_data_sklearn(eval(data[i]), 'target'), ["lime", "shap"], addendum = " " + data[i])
        except Exception:
            continue  # skip combinations that fail, e.g. a model that needs extra arguments
    print(data[i] + " is finished.")
    time.sleep(10)  # short pause between datasets

Some models fail in the loop above because they need different or additional arguments.
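Because the training loop silently skips every failure, it does not tell you which combinations failed or why. Here is a minimal sketch of recording the reasons instead. ``run_all`` and ``fake_train`` are hypothetical helpers for illustration only; ``fake_train`` stands in for the real training call.

```python
# Hypothetical helper (not part of XAISuite): run every model/dataset pair and
# record why each failing combination failed, instead of silently skipping it.
def run_all(train_fn, models, datasets):
    failures = {}
    for dataset in datasets:
        for model in models:
            try:
                train_fn(model, dataset)
            except Exception as exc:
                failures[(model, dataset)] = f"{type(exc).__name__}: {exc}"
    return failures


# Stand-in training function for illustration only; it fails for one model.
def fake_train(model, dataset):
    if model == "SVC":
        raise ValueError("stand-in failure")


report = run_all(fake_train, ["SVC", "GaussianNB"], ["load_iris()"])
print(report)  # {('SVC', 'load_iris()'): 'ValueError: stand-in failure'}
```

In the notebook, ``train_fn`` could be ``lambda m, d: train_and_explainModel(m, load_data_sklearn(eval(d), 'target'), ["lime", "shap"], addendum = " " + d)``.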

For ``SVC``, we set ``probability = True``. Without it, sklearn's ``SVC`` does not provide ``predict_proba`` (class probabilities), and no results are produced.

In [None]:
train_and_explainModel("SVC", load_data_sklearn(eval(data[0]), 'target'), ["lime", "shap"], addendum = " " + data[0], probability = True) #Note model argument probability = True
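You can check the effect of ``probability = True`` with plain sklearn, independent of XAISuite. This sketch tests whether each ``SVC`` exposes ``predict_proba``. That XAISuite's LIME and SHAP explainers rely on this method is our assumption, based on the fix above.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

default_svc = SVC()                           # probability=False by default
prob_svc = SVC(probability=True).fit(X, y)    # enables probability estimates

print(hasattr(default_svc, "predict_proba"))  # False: the method is unavailable
print(hasattr(prob_svc, "predict_proba"))     # True
print(prob_svc.predict_proba(X[:2]).shape)    # (2, 3): one probability per iris class
```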

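For ``SGDClassifier``, we set the loss function to ``loss = "modified_huber"``. With the default ``hinge`` loss, ``SGDClassifier`` does not provide ``predict_proba``; ``modified_huber`` is one of the losses that does.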

In [None]:
train_and_explainModel("SGDClassifier", load_data_sklearn(eval(data[0]), 'target'), ["lime", "shap"], addendum = " " + data[0], loss = "modified_huber") # Note the model argument loss = "modified_huber"
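The same check works for ``SGDClassifier`` in plain sklearn. This sketch shows that ``predict_proba`` exists only when the loss supports probability estimates. As with ``SVC``, the link to the XAISuite explainers is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

X, y = load_iris(return_X_y=True)

hinge = SGDClassifier(random_state=0)                                    # default loss="hinge"
huber = SGDClassifier(loss="modified_huber", random_state=0).fit(X, y)

print(hasattr(hinge, "predict_proba"))   # False: hinge loss gives no probabilities
print(hasattr(huber, "predict_proba"))   # True
print(huber.predict_proba(X[:2]).shape)  # (2, 3)
```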

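``MultinomialNB`` only accepts non-negative feature values, but the default ``StandardScaler`` centers each feature at zero and so produces negative values. Instead, we use ``MinMaxScaler``, which rescales each feature to the range [0, 1].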

In [None]:
train_and_explainModel("MultinomialNB", load_data_sklearn(eval(data[0]), 'target'), ["lime", "shap"], scaleType = "MinMaxScaler", addendum = " " + data[0])
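This plain-sklearn sketch, independent of XAISuite, shows why the scaler matters. ``MultinomialNB`` rejects ``StandardScaler`` output because it contains negative values, and it accepts ``MinMaxScaler`` output.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = load_iris(return_X_y=True)

X_std = StandardScaler().fit_transform(X)  # each feature centered at 0, so some values are negative
X_mm = MinMaxScaler().fit_transform(X)     # each feature rescaled to [0, 1]

try:
    MultinomialNB().fit(X_std, y)
    std_rejected = False
except ValueError:  # MultinomialNB refuses negative input
    std_rejected = True

MultinomialNB().fit(X_mm, y)  # non-negative input is accepted
print(std_rejected, X_std.min() < 0, X_mm.min() >= 0)  # True True True
```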

Now, it's time to compare the explanations we have generated. 

In [None]:
for i in range(len(data)):
    for j in range(len(models)):
        try:
            print("Let's compare SHAP and LIME explanations using " + models[j] + " trained on the " + data[i] + " dataset: ")
            # Both file names carry the same addendum (" " + data[i]) that was passed to train_and_explainModel
            compare_explanations(["shap ImportanceScores - " + models[j] + " target " + data[i] + ".csv",
                                  "lime ImportanceScores - " + models[j] + " target " + data[i] + ".csv"])
        except Exception:
            print(models[j] + " on " + data[i] + " failed.")
            continue
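The ``try``/``except`` in the cell above also catches pairs whose files were never written, for example combinations that failed during training. An alternative is to list the files first and compare only the SHAP/LIME pairs that both exist. ``explanation_pairs`` is a hypothetical helper, not part of XAISuite, and it assumes the file-name pattern used in the cell above.

```python
import os


# Hypothetical helper (not part of XAISuite): from a list of file names, return
# [shap_file, lime_file] pairs whose names differ only in the explainer prefix.
def explanation_pairs(filenames):
    names = set(filenames)
    pairs = []
    for name in sorted(names):
        if name.startswith("shap ImportanceScores - ") and name.endswith(".csv"):
            lime_name = "lime" + name[len("shap"):]
            if lime_name in names:
                pairs.append([name, lime_name])
    return pairs


for pair in explanation_pairs(os.listdir(".")):
    print("Comparing:", pair)
    # compare_explanations(pair)  # XAISuite call, as in the cell above
```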

We hope this tutorial is helpful. If you have any questions, please open an issue on GitHub, and we'll try to reply as soon as possible.