## Welcome to Aequitas

The Aequitas toolkit is a flexible bias-audit utility for algorithmic decision-making models, accessible via Python API, command line interface (CLI), and through our [web application](http://aequitas.dssg.io/). 

Use Aequitas to evaluate model performance across several bias and fairness metrics, and use the [most relevant metrics](https://dsapp.uchicago.edu/wp-content/uploads/2018/05/metrictree-1200x750.png) for your process in model selection.

Aequitas will help you:

- Understand where any error-related biases exist in your model(s)
- Compare the level of bias between groups in your sample population (bias disparity)
- Visualize absolute bias metrics and their related disparities for rapid comprehension and decision-making

Our goal is to support informed and equitable action for both machine learning practitioners and the decision-makers who rely on them.

Read the [documentation](https://dssg.github.io/aequitas/).

Aequitas is compatible with: **Python 3.6+**

## Aequitas Priorities
- **Flexibility:** Use Aequitas in the way that makes the most sense for your existing processes. Pass model results as a pandas DataFrame or a CSV file via the Python API; upload a CSV of model results to the Aequitas [web application](http://aequitas.dssg.io/) or pass it to the [CLI](https://dssg.github.io/aequitas/CLI.html); or read from database tables defined in a [configuration YAML](https://dssg.github.io/aequitas/CLI.html) via the CLI.

- **Customization:** The relevant bias and fairness metrics change depending on the type of project or intervention you're planning for. Configure Aequitas to generate the metrics and visualizations most important to your analyses, or choose to view all calculated bias metrics.

- **Clarity:** Aequitas provides a full picture of commonly used bias metrics in your models, how they differ across groups (disparities), and contextual information on the statistical significance of each calculated disparity. Visualization methods allow for rapid comparison between groups and models.

<a id='getting_started'></a>

# Getting started with *aequitas-report* 

With aequitas-report, uncovering bias is as simple as running a single command on a CSV file.


## Input machine learning predictions

After [installing Aequitas on your computer](./installation.html), run `aequitas-report` on [COMPAS data](https://github.com/dssg/aequitas/tree/master/examples):
```
aequitas-report --input compas_for_aequitas.csv
```


The input file contains one row per observation, for example:

| score     | label_value| race | sex | age_cat |
| --------- |------------| -----| --- | ------- |
|   0       | 1          | African-American | Male | 25 - 45 |
|   1       | 1          | Native American | Female | Less than 25 |


Input data has slightly different requirements depending on whether you are using Aequitas via the webapp, CLI or Python package. In general, input data is a single table with the following columns:

- `score`
- `label_value` (for error-based metrics only)
- at least one attribute e.g. `race`, `sex` and `age_cat` (attribute categories defined by user)
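
As a rough illustration of this layout, the sketch below builds a two-row table with pandas (using the rows from the sample table above) and checks that the expected columns are present. The helper `check_input_columns` is hypothetical and is not part of Aequitas.

```python
import pandas as pd

# Two sample rows in the single-table layout described above.
df = pd.DataFrame(
    {
        "score": [0, 1],
        "label_value": [1, 1],
        "race": ["African-American", "Native American"],
        "sex": ["Male", "Female"],
        "age_cat": ["25 - 45", "Less than 25"],
    }
)


def check_input_columns(frame, need_labels=True):
    """Hypothetical helper: return the attribute columns, or raise if the layout is wrong."""
    required = ["score"] + (["label_value"] if need_labels else [])
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    attributes = [col for col in frame.columns if col not in ("score", "label_value")]
    if not attributes:
        raise ValueError("at least one attribute column is required")
    return attributes


attribute_columns = check_input_columns(df)
print(attribute_columns)
```

Passing `need_labels=False` mirrors the case where only non-error-based metrics are needed and `label_value` is absent.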

Find specific input data requirements for [Python API](#python_input), [web app](#webapp_input), and [CLI](#cli_input) below.

Additionally, disparity is always defined in relation to a reference group. By default, Aequitas uses the majority group as the reference. See [Defining a reference group](./config.html) to choose a different one.
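
To make the idea of a reference group concrete, here is a minimal pandas sketch on made-up data. It assumes that the majority group is the one with the most rows and that disparity is the ratio of a group's metric to the reference group's metric. It illustrates the concept only and is not the Aequitas implementation.

```python
import pandas as pd

# Toy audit data: binary decisions (score), ground truth (label_value), one attribute.
df = pd.DataFrame(
    {
        "score":       [1, 0, 1, 0, 0, 1, 1, 1, 1, 0],
        "label_value": [0, 0, 1, 0, 1, 0, 0, 0, 1, 1],
        "sex":         ["Male"] * 6 + ["Female"] * 4,
    }
)

# False positive rate per group: share of true negatives that received a positive decision.
negatives = df[df["label_value"] == 0]
fpr = negatives.groupby("sex")["score"].mean()

# Assumption: the "majority" reference is the group with the most rows.
reference_group = df["sex"].value_counts().idxmax()

# Assumption: disparity is the ratio of a group's metric to the reference group's metric.
fpr_disparity = fpr / fpr[reference_group]
print(reference_group)
print(fpr_disparity)
```

Under these assumptions the reference group always has a disparity of 1, and other groups are read relative to it.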

## Get bias measures tailored to your problem

### The Bias Report output

The Bias Report produces a PDF with a descriptive interpretation of the results along with three sets of tables:

* Fairness Measures Results
* Bias Metrics Results
* Group Metrics Results

Additionally, a CSV is produced that contains the relevant data. See [output data](./output_data.html) for more information.

### Command line output

In the command line you will see the Bias Report, which returns counts for each attribute by group and then computes various fairness metrics. This is the same information that is captured in the CSV output.


```

                    
                    ___                    _ __            
                   /   | ___  ____ ___  __(_) /_____ ______
                  / /| |/ _ \/ __ `/ / / / / __/ __ `/ ___/
                 / ___ /  __/ /_/ / /_/ / / /_/ /_/ (__  ) 
                /_/  |_\___/\__, /\__,_/_/\__/\__,_/____/  
                              /_/                          



____________________________________________________________________________

                      Bias and Fairness Audit Tool
____________________________________________________________________________





Welcome to Aequitas-Audit
Fairness measures requested: Statistical Parity,Impact Parity,FDR Parity,FPR Parity,FNR Parity,FOR Parity
model_id, score_thresholds 1 {'rank_abs': [3317]}
COUNTS::: race
African-American    3696
Asian                 32
Caucasian           2454
Hispanic             637
Native American       18
Other                377
dtype: int64
COUNTS::: sex
Female    1395
Male      5819
dtype: int64
COUNTS::: age_cat
25 - 45            4109
Greater than 45    1576
Less than 25       1529
dtype: int64
audit: df shape from the crosstabs: (11, 26)
get_disparity_major_group()
number of rows after bias majority ref group: 11
Any NaN?:  False
bias_df shape: (11, 38)
Fairness Threshold: 0.8
Fairness Measures: ['Statistical Parity', 'Impact Parity', 'FDR Parity', 'FPR Parity', 'FNR Parity', 'FOR Parity']

... 
```
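
The `COUNTS:::` sections of the log are per-group row counts for each attribute. A similar summary can be sketched with pandas as below. The rows are made up and the code is not taken from Aequitas itself.

```python
import pandas as pd

# Toy data; the attribute names mirror the COMPAS example, the rows are made up.
df = pd.DataFrame(
    {
        "score": [0, 1, 1, 0, 1],
        "label_value": [1, 1, 0, 0, 1],
        "race": ["Caucasian", "African-American", "African-American", "Hispanic", "Caucasian"],
        "sex": ["Male", "Female", "Male", "Male", "Male"],
    }
)

# Count rows per group for every attribute; groupby sorts groups by name.
counts = {attr: df.groupby(attr).size() for attr in ["race", "sex"]}
for attr, series in counts.items():
    print(f"COUNTS::: {attr}")
    print(series.to_string())
```

Checking these counts before reading the metrics helps spot very small groups, such as the 18 Native American rows in the log above.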

## Getting Started with Input Data

<a id='cli_input'></a>

### Input data for CLI

The CLI accepts CSV files and also accommodates database calls defined in configuration files.

![](_static/CLI_input.jpg)

#### `score`
By default, Aequitas CLI assumes the `score` column is a binary decision (0 or 1). Alternatively, the `score` column can contain the score (e.g. the output from a logistic regression applied to the data). In this case, the user sets a threshold to determine the binary decision. See [configurations](./config.html) for more on thresholds.
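
The effect of a threshold on a continuous score can be sketched in a few lines of pandas. The cutoff of 0.5 and the choice to treat scores at or above it as positive are assumptions for illustration. In the CLI the threshold itself is set through the configuration file.

```python
import pandas as pd

# Continuous scores, e.g. predicted probabilities from a logistic regression.
scores = pd.Series([0.12, 0.47, 0.50, 0.81, 0.93], name="score")

# Hypothetical cutoff; pick one that matches your intervention.
threshold = 0.5

# Scores at or above the cutoff become a positive decision (1), the rest 0.
binary_score = (scores >= threshold).astype(int)
print(binary_score.tolist())
```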

#### `label_value`

As with the webapp, this is the ground truth value of a binary decision. The data must be binary 0 or 1.

#### attributes e.g. `race`, `sex`, `age`,`income`

Group columns can be categorical or continuous. If categorical, Aequitas will produce crosstabs with bias metrics for each group_level. If continuous, Aequitas will first bin the data into quartiles.
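
To preview what quartile binning does to a continuous attribute, the following sketch uses `pandas.qcut` on a hypothetical `income` column. The labels `Q1` to `Q4` are chosen here for readability, and the group names Aequitas generates may differ.

```python
import pandas as pd

# Hypothetical continuous attribute.
df = pd.DataFrame(
    {"income": [12_000, 25_000, 31_000, 48_000, 52_000, 67_000, 90_000, 150_000]}
)

# Split into four equal-sized bins (quartiles) and store the result as plain strings.
df["income_quartile"] = pd.qcut(
    df["income"], q=4, labels=["Q1", "Q2", "Q3", "Q4"]
).astype(str)

quartile_counts = df["income_quartile"].value_counts().sort_index()
print(quartile_counts)
```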

#### `model_id`

`model_id` is an identifier tied to the output of a specific model. With a `model_id` column you can test the bias of multiple models at once. This feature is available using the CLI or the Python package.
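
As an illustration of the long format this implies, the sketch below stacks the outputs of two hypothetical models in one table and compares a simple per-group quantity (the share of positive decisions) across them. It uses pandas only and does not call Aequitas.

```python
import pandas as pd

# Predictions from two hypothetical models, stacked in one table.
df = pd.DataFrame(
    {
        "model_id":    [1, 1, 1, 2, 2, 2],
        "score":       [1, 0, 1, 0, 0, 1],
        "label_value": [1, 0, 0, 1, 0, 1],
        "sex":         ["Male", "Female", "Female", "Male", "Female", "Female"],
    }
)

# Share of positive decisions per group, with one column per model.
positive_rate = df.groupby(["model_id", "sex"])["score"].mean().unstack("model_id")
print(positive_rate)
```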


#### Reserved column names:

* `id`
* `model_id`
* `entity_id`
* `rank_abs`
* `rank_pct`
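
One practical consequence, sketched below under the assumption that reserved names (together with `score` and `label_value`) are never treated as group attributes: it can be useful to check which columns of a file would be left over as attributes. The helper `attribute_columns` is hypothetical.

```python
RESERVED_COLUMNS = {"id", "model_id", "entity_id", "rank_abs", "rank_pct"}
NON_ATTRIBUTE_COLUMNS = RESERVED_COLUMNS | {"score", "label_value"}


def attribute_columns(columns):
    """Hypothetical helper: columns that would be left over as group attributes."""
    return [col for col in columns if col not in NON_ATTRIBUTE_COLUMNS]


header = ["entity_id", "score", "label_value", "race", "sex", "age_cat"]
leftover = attribute_columns(header)
print(leftover)
```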

<a id='python_input'></a>

### Input data for Python package

Python input data can be handled identically to CLI by using `preprocess_input_df()`. Otherwise, you must discretize continuous attribute columns prior to passing the data to `Group().get_crosstabs()`.

```{python}
import pandas as pd
from aequitas.preprocessing import preprocess_input_df

# input_data is a DataFrame that matches CLI input data norms.
input_data = pd.read_csv("compas_for_aequitas.csv")
df, _ = preprocess_input_df(input_data)
```
![](_static/python_input.jpg)


#### `score`
By default, Aequitas assumes the `score` column is a binary decision (0 or 1). Alternatively, the `score` column can contain a continuous score (e.g. the output from a logistic regression applied to the data). In this case, the user sets a threshold to determine the binary decision. See [configurations](./config.html) for more on thresholds. In the Python package, thresholds are set in a dictionary passed to `get_crosstabs()`.
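
The command line log earlier on this page shows a threshold dictionary of the form `{'rank_abs': [3317]}`. The sketch below assumes that a `rank_abs` value of `k` means "the `k` highest-scoring rows receive a positive decision" and mimics that with pandas. The helper `decisions_for_top_k` is hypothetical and is not how Aequitas implements thresholds.

```python
import pandas as pd

# Continuous scores for five hypothetical entities.
df = pd.DataFrame(
    {"entity_id": [1, 2, 3, 4, 5], "score": [0.91, 0.15, 0.64, 0.40, 0.77]}
)

# Threshold dictionary in the shape shown in the command line log.
score_thresholds = {"rank_abs": [2]}


def decisions_for_top_k(frame, k):
    """Hypothetical helper: 1 for the k highest-scoring rows, 0 otherwise."""
    rank = frame["score"].rank(method="first", ascending=False)
    return (rank <= k).astype(int)


# One decision column per requested cutoff.
for k in score_thresholds["rank_abs"]:
    df[f"decision_top_{k}"] = decisions_for_top_k(df, k)

print(df)
```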

#### `label_value`
As with the webapp, this is the ground truth value of a binary decision. The data must be binary 0 or 1.

#### attributes e.g. `race`, `sex`, `age`,`income` 

Group columns can be categorical or continuous. If categorical, Aequitas will produce crosstabs with bias metrics for each group_level. If continuous, Aequitas will first bin the data into quartiles.

If you plan to bin or discretize continuous features manually, note that `get_crosstabs()` expects attribute columns to be of type string. This excludes the pandas 'categorical' data type, which is the default output of certain pandas discretizing functions. You can recast 'categorical' columns to strings as follows:

```
df['categorical_type'] = df['categorical_type'].astype(str)
```
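
For a fuller picture, this sketch bins a hypothetical `age` column with `pandas.cut`, shows that the result is a 'category' column, and then recasts it to strings. The bin edges are assumptions chosen to match the `age_cat` labels used in the COMPAS example.

```python
import pandas as pd

df = pd.DataFrame({"age": [19, 23, 31, 44, 46, 70]})

# pd.cut returns a pandas 'category' column.
df["age_cat"] = pd.cut(
    df["age"],
    bins=[0, 24, 45, 200],  # hypothetical bin edges
    labels=["Less than 25", "25 - 45", "Greater than 45"],
)
dtype_before = str(df["age_cat"].dtype)

# Recast to plain strings before handing the frame to get_crosstabs().
df["age_cat"] = df["age_cat"].astype(str)
print(dtype_before, "->", df["age_cat"].dtype)
```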

#### `model_id`

`model_id` is an identifier tied to the output of a specific model. With a `model_id` column you can test the bias of multiple models at once. This feature is available using the CLI or the Python package.

#### Reserved column names:

* `id`
* `model_id`
* `entity_id`
* `rank_abs`
* `rank_pct`

<a id='webapp_input'></a>

### Input data for Webapp

The webapp requires a single CSV with columns for a binary `score`, a binary `label_value` and an arbitrary number of attribute columns. Each row is associated with a single observation.

![](_static/webapp_input.jpg)

#### `score`

Aequitas webapp assumes the `score` column is a binary decision (0 or 1).

#### `label_value`

This is the ground truth value of a binary decision. The data again must be binary 0 or 1.
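
Before uploading a file, a quick check that both columns are strictly binary can save a failed run. The following sketch uses a hypothetical helper, `non_binary_columns`, written with pandas. The webapp's own validation may behave differently.

```python
import pandas as pd


def non_binary_columns(frame, columns=("score", "label_value")):
    """Hypothetical helper: names of the given columns holding values other than 0 or 1."""
    return [col for col in columns if not frame[col].isin([0, 1]).all()]


ok = pd.DataFrame({"score": [0, 1, 1], "label_value": [1, 0, 1], "race": ["A", "B", "A"]})
bad = pd.DataFrame({"score": [0.2, 0.9, 0.4], "label_value": [1, 0, 1], "race": ["A", "B", "A"]})

ok_problems = non_binary_columns(ok)
bad_problems = non_binary_columns(bad)
print(ok_problems)
print(bad_problems)
```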

#### attributes e.g. `race`, `sex`, `age`,`income`

Group columns can be categorical or continuous. If categorical, Aequitas will produce crosstabs with bias metrics for each group_level. If continuous, Aequitas will first bin the data into quartiles and then create crosstabs with the newly defined categories.

[Return to Getting Started](#getting_started)