# Classification example with Thetis

Thetis can evaluate AI systems that perform classification tasks. This notebook shows how to evaluate and rate an AI model on a basic classification task built with [scikit-learn](https://scikit-learn.org/). The steps below should be easy to adapt to your own use case.

## Set up the environment

If you haven't done so already, install Thetis using pip:

```shell
$ pip install thetis
```

For this example, you can use the demo license located in the same directory as this notebook.
This license only works with our demonstration dataset and the exact configuration provided in this notebook.
You can also download the license file directly: [demo_license_classification.dat](https://raw.githubusercontent.com/EFS-OpenSource/Thetis/main/examples/demo_license_classification.dat).

Place the license file either in the working directory of your application or at:

- Windows: `<User>/AppData/Local/Thetis/license.dat`
- Unix: `~/.local/thetis/license.dat`
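If you prefer the user-level location, a small Python helper can copy the downloaded license there. This is a minimal sketch and is not part of Thetis. `default_license_path` and `install_license` are hypothetical names. The sketch assumes that `<User>` in the Windows path refers to your home directory.

```python
import shutil
import sys
from pathlib import Path


def default_license_path(platform: str = sys.platform) -> Path:
    """Return the user-level license location listed above (hypothetical helper).

    Assumption: "<User>" on Windows is the user's home directory.
    """
    if platform.startswith("win"):
        return Path.home() / "AppData" / "Local" / "Thetis" / "license.dat"
    # Linux, macOS and other Unix-like systems
    return Path.home() / ".local" / "thetis" / "license.dat"


def install_license(source: str = "demo_license_classification.dat") -> Path:
    """Copy a downloaded license file to the user-level location (hypothetical helper)."""
    target = default_license_path()
    target.parent.mkdir(parents=True, exist_ok=True)  # create the Thetis folder if needed
    shutil.copyfile(source, target)
    return target
```

Keeping the license in the working directory needs no helper at all. The copy is only useful if you run notebooks from several directories.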

## Increase logging verbosity

To obtain detailed runtime information from Thetis, run the following cell. It adds a logging handler to the `Thetis` logger, which makes the application more verbose.

```python
import logging
import sys

# Attach a stream handler to the "Thetis" logger so that its INFO messages are printed
logger = logging.getLogger("Thetis")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)
```

## Prepare example data and model

To start, we need some data. This tutorial uses the
["Adult" dataset](https://www.openml.org/search?type=data&sort=runs&id=179&status=active).
Its prediction task is to determine whether a person earns more than 50K a year.
Let's load the dataset with scikit-learn.

*Note:* If your machine is behind a proxy server, downloading the example data as shown here may not work.
If that is the case for you, check out our [detection example](detection.ipynb) instead.

```python
import pandas as pd
from sklearn.datasets import fetch_openml

testset_size = 10000

# use "fetch_openml" by scikit-learn to load the "Adult" dataset from OpenML
dataset, target = fetch_openml(data_id=1590, return_X_y=True, parser="auto")

# hold out the last `testset_size` samples as the test set
df_train, df_test = dataset.iloc[:-testset_size], dataset.iloc[-testset_size:]
target_train, target_test = target.iloc[:-testset_size], target.iloc[-testset_size:]

# drop sensitive attributes and several other categorical columns from the classifier input
dropped_columns = ["education", "race", "sex", "native-country", "relationship", "marital-status"]
df_train_cleared = df_train.drop(columns=dropped_columns)
df_test_cleared = df_test.drop(columns=dropped_columns)

# convert the remaining categorical columns to integer class codes
categorical_columns = ["workclass", "occupation"]
df_train_cleared[categorical_columns] = df_train_cleared[categorical_columns].apply(lambda col: pd.Categorical(col).codes)
df_test_cleared[categorical_columns] = df_test_cleared[categorical_columns].apply(lambda col: pd.Categorical(col).codes)
```
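The `.codes` conversion assigns integers in the order of each column's categories. With `parser="auto"`, the nominal Adult columns most likely arrive with a categorical dtype that both splits share, so their codes agree. If you adapt the example to data with plain string columns, however, each split derives its own category list. The toy sketch below shows the effect using made-up values. It also shows one way to keep codes consistent by reusing the training categories.

```python
import pandas as pd

# Hypothetical toy values standing in for one categorical column
train_col = pd.Series(["Private", "Self-emp", "State-gov"])
test_col = pd.Series(["Self-emp", "State-gov", "Without-pay"])

# Encoding each split on its own: categories come from that split alone
naive_train = pd.Categorical(train_col).codes
naive_test = pd.Categorical(test_col).codes

# Reusing the training categories keeps codes consistent; unseen values become -1
train_categories = pd.Categorical(train_col).categories
aligned_test = pd.Categorical(test_col, categories=train_categories).codes

print(naive_train.tolist())   # [0, 1, 2]
print(naive_test.tolist())    # [0, 1, 2]  ("Self-emp" is 1 in train but 0 here)
print(aligned_test.tolist())  # [1, 2, -1]
```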

This yields two [Pandas](https://pandas.pydata.org/) data frames, one for training and one for testing, each with a reduced set of features.

In the next step, we train a simple Random Forest classifier on the training data using scikit-learn.
We then use the trained model to make predictions on the test data:

```python
from sklearn.ensemble import RandomForestClassifier

# encode the features once (pd.get_dummies one-hot encodes any remaining categorical columns)
x_train = pd.get_dummies(df_train_cleared)
x_test = pd.get_dummies(df_test_cleared)

# initialize a Random Forest classifier and fit it to the training data
classifier = RandomForestClassifier(verbose=True)
classifier.fit(x_train, target_train)

# finally, make predictions on the test data
confidence = classifier.predict_proba(x_test)
labels = classifier.predict(x_test)
```
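In this notebook, the remaining categorical columns are already integer codes, so `pd.get_dummies` likely has little or nothing left to encode. If you adapt the example and keep string columns, calling `pd.get_dummies` separately on each split can produce different column sets whenever a category is missing from one split. The minimal sketch below uses hypothetical toy frames. It aligns the test features to the training columns with `DataFrame.reindex`.

```python
import pandas as pd

# Hypothetical toy frames: the test split never contains kind == "c"
train = pd.DataFrame({"age": [25, 40, 31], "kind": ["a", "b", "c"]})
test = pd.DataFrame({"age": [52, 29], "kind": ["b", "a"]})

x_train = pd.get_dummies(train)
# Reindex the test features to the training columns; absent dummy columns are filled with 0
x_test = pd.get_dummies(test).reindex(columns=x_train.columns, fill_value=0)

print(list(x_train.columns))  # ['age', 'kind_a', 'kind_b', 'kind_c']
print(list(x_test.columns))   # ['age', 'kind_a', 'kind_b', 'kind_c']
```

Without the `reindex` step, the test frame would lack `kind_c`, and scikit-learn would reject it at prediction time because the feature names differ from those seen during fitting.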

## Represent data as pandas DataFrames

Thetis expects two Pandas data frames to run an evaluation:

* **Annotations**: A `pd.DataFrame` with ground truth information about the dataset. The column `target` is required and should hold the ground truth target information. Additionally, columns for sensitive attributes are expected if they have been configured for fairness evaluation.
* **Predictions**: A `pd.DataFrame` with the AI predictions for each sample in the dataset. The columns `labels` and `confidence` are required and should hold information about the predicted label and the respective prediction probability (model uncertainty or confidence).

Note that the indices of the annotation and prediction data frames must match, because Thetis pairs predictions with annotations by their index.
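Before running an evaluation, you may want to check these requirements yourself. The following is a minimal sketch of such a pre-flight check. `check_thetis_inputs` is a hypothetical helper, not part of the Thetis API. It only tests the requirements listed above, plus the assumption that `confidence` holds probabilities in [0, 1].

```python
import pandas as pd


def check_thetis_inputs(annotations: pd.DataFrame, predictions: pd.DataFrame) -> None:
    """Hypothetical pre-flight check mirroring the requirements listed above."""
    if "target" not in annotations.columns:
        raise ValueError("annotations must contain a 'target' column")
    missing = {"labels", "confidence"} - set(predictions.columns)
    if missing:
        raise ValueError(f"predictions are missing columns: {sorted(missing)}")
    # Rows are paired by index, so both frames must contain the same index labels
    # (this check ignores row order)
    if not annotations.index.sort_values().equals(predictions.index.sort_values()):
        raise ValueError("annotation and prediction indices do not match")
    # Assumption: confidence values are probabilities
    if not predictions["confidence"].between(0.0, 1.0).all():
        raise ValueError("confidence values must lie in [0, 1]")


# Usage with hypothetical toy frames: passes silently
ann = pd.DataFrame({"target": ["a", "b"]}, index=[10, 11])
pred = pd.DataFrame({"labels": ["a", "a"], "confidence": [0.9, 0.4]}, index=[10, 11])
check_thetis_inputs(ann, pred)
```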

```python
# include the sensitive attributes "race" and "sex" in the annotations for fairness evaluation
annotations = pd.DataFrame({"target": target_test, "race": df_test["race"], "sex": df_test["sex"]})
# use the probability of the second class, classifier.classes_[1], as confidence
predictions = pd.DataFrame({"labels": labels, "confidence": confidence[:, 1]}, index=annotations.index)
```
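The `confidence` column holds `confidence[:, 1]`. scikit-learn orders the columns of `predict_proba` by the sorted class labels in `classifier.classes_`, so this is the probability of `classifier.classes_[1]`. To make that choice explicit, you can look the column up by class name. The sketch below uses hypothetical toy data with the string labels `"no"` and `"yes"`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data with string class labels
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array(["yes", "no", "yes", "no"])

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
proba = clf.predict_proba(X)

# predict_proba columns follow clf.classes_ (sorted labels), not the order labels appear in y
positive_idx = list(clf.classes_).index("yes")
positive_confidence = proba[:, positive_idx]

print(clf.classes_)  # ['no' 'yes']
```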

Optionally, you can read/write the `pd.DataFrame` instances in CSV format:

```python
# optional: store prediction and ground truth data on disk
annotations.to_csv("adult_annotations.csv")
predictions.to_csv("adult_predictions.csv")

# optional: load prediction and ground truth data from disk
# important: specify "index_col" since Thetis matches the predictions/annotations by their indices
loaded_annotations = pd.read_csv("adult_annotations.csv", index_col=0)
loaded_predictions = pd.read_csv("adult_predictions.csv", index_col=0)
```
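The `index_col` argument matters because `to_csv` writes the index as an unnamed first column. The small in-memory round trip below uses hypothetical toy values. It shows that omitting `index_col` turns the index into an extra `Unnamed: 0` column and replaces it with a fresh `0, 1, ...` index. Passing `index_col=0` restores the original index.

```python
import io

import pandas as pd

# Hypothetical toy predictions with a non-default index
frame = pd.DataFrame({"labels": ["a", "b"], "confidence": [0.2, 0.7]}, index=[100, 101])

buffer = io.StringIO()
frame.to_csv(buffer)

buffer.seek(0)
without_index = pd.read_csv(buffer)            # index is read back as a regular column
buffer.seek(0)
with_index = pd.read_csv(buffer, index_col=0)  # original index restored

print(without_index.columns.tolist())  # ['Unnamed: 0', 'labels', 'confidence']
print(with_index.index.tolist())       # [100, 101]
```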

## Run Thetis to analyze and evaluate the AI system

You can download the [demo configuration file](https://raw.githubusercontent.com/EFS-OpenSource/Thetis/main/examples/demo_config_classification.yaml) for this example from our repository. For detailed information on Thetis configuration, refer to the [Configuration](https://efs-opensource.github.io/Thetis/configuration.html) section.

In addition to generating the report in PDF format, which we display below, Thetis also returns its findings, final rating, and recommendations for mitigation strategies as a JSON-like dictionary. We capture this dictionary as `result` and access it as follows:

* `result[<aspect>]` contains a sub-dictionary with results for each aspect of the analysis, e.g. 'fairness' or 'uncertainty'.
* `result[<aspect>]['rating_score']` contains the rating as a score from 0 to 10.
* `result[<aspect>]['rating_enum']` contains the rating as a grade, which can be `'GOOD'`, `'MEDIUM'`, or `'BAD'`, depending on the rating score.
* `result[<aspect>]['recommendations']` contains findings regarding possible issues and recommendations for mitigation.
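To get a quick overview of the ratings, you can walk over `result`. The following is a minimal sketch. `summarize_ratings` is a hypothetical helper that relies only on the `rating_score` and `rating_enum` keys described above. Because the full structure of `result` is not specified here, it skips any top-level entry that is not an aspect sub-dictionary with a rating.

```python
def summarize_ratings(result: dict) -> list:
    """Return (aspect, rating_score, rating_enum) tuples, lowest score first (hypothetical helper)."""
    rows = []
    for aspect, details in result.items():
        # Only aspect sub-dictionaries that carry a rating are collected
        if isinstance(details, dict) and "rating_score" in details:
            rows.append((aspect, details["rating_score"], details.get("rating_enum")))
    return sorted(rows, key=lambda row: row[1])
```

Once the next cell has produced `result`, calling `summarize_ratings(result)` lists the evaluated aspects from the weakest to the strongest rating. That ordering can help you decide which `recommendations` to read first.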

```python
from thetis import thetis


result = thetis(
    config="demo_config_classification.yaml",
    annotations=annotations,
    predictions=predictions,
    output_dir="./output",
    license_file_path="demo_license_classification.dat"
)
```

```python
from IPython.display import IFrame

# display the generated PDF report inside the notebook
IFrame("./output/report.pdf", width=800, height=1024)
```