## The Data

In this example, we use the [German Credit Dataset](https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)) and examine the protected attribute age, treating applicants younger than 25 (`age < 25`) as the protected group, to check whether the model assigns a higher credit risk to young people. The data has been processed so that the features are human-readable. You can download the processed data [here](https://github.com/AitoDotAI/aitoai-ai-fairness/blob/master/data/german_credit_rating.ndjson).

## Detect the data bias and the model bias

Aito offers the [Relate API](https://aito.ai/docs/api/#post-api-v1-relate) to find relationships between features. In this case, we want to examine the relationship between being younger than 25 and having a bad credit rating, using the following query:
```json
{
    "from": "german_credit_rating",
    "where": {
        "age": {"$lt": 25}
    },
    "relate": {"credit_rating": "bad"},
    "select": ["related", "condition", "lift", "ps", "fs"]
}
```


We can run the query with the [AitoClient](https://aitodotai.github.io/aito-python-tools/api/aito_client.html?highlight=aitoclient#) from the [Aito Python SDK](https://aitodotai.github.io/aito-python-tools/index.html).

```python
!pip install aitoai
```

```python
from aito.sdk.aito_client import AitoClient
import json
```

```python
# Connect to the Aito instance that hosts the german_credit_rating table
client = AitoClient('https://public-1.api.aito.ai', 'bvss2i2dIkaWUfBCdzEO89LpxUkwO3A24hYg8MBq')

# Same Relate query as above: how does age < 25 relate to a bad credit rating?
relate_query = {
    "from": "german_credit_rating",
    "where": {
        "age": {"$lt": 25}
    },
    "relate": {"credit_rating": "bad"},
    "select": ["related", "condition", "lift", "ps", "fs"]
}
resp = client.request('POST', '/api/v1/_relate', relate_query)
print(json.dumps(resp, indent=4))
```

```json
{
    "offset": 0,
    "total": 1,
    "hits": [
        {
            "related": {
                "credit_rating": {
                    "$has": "bad"
                }
            },
            "condition": {
                "age": {
                    "$lt": 25
                }
            },
            "lift": 1.344021803525694,
            "ps": {
                "p": 0.3003992015968064,
                "pOnCondition": 0.4037430767078182,
                "pOnNotCondition": 0.2818548581580362,
                "pCondition": 0.1490361738672945
            },
            "fs": {
                "f": 300.0,
                "fOnCondition": 61.0,
                "fOnNotCondition": 239.0,
                "fCondition": 149.0,
                "n": 1000.0
            }
        }
    ]
}
```
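
If you want to work with the result programmatically instead of reading the printed JSON, every field sits under `resp["hits"][0]`. The sketch below assumes the response has the shape printed above; to stay self-contained it uses a copy of that response as a dict literal rather than calling Aito, and `summarize_relate_hit` is a hypothetical helper name, not part of the SDK.

```python
# A copy of the Relate response printed above, so this sketch runs without network
# access. In the notebook you would pass the `resp` returned by client.request instead.
resp = {
    "offset": 0,
    "total": 1,
    "hits": [
        {
            "related": {"credit_rating": {"$has": "bad"}},
            "condition": {"age": {"$lt": 25}},
            "lift": 1.344021803525694,
            "ps": {
                "p": 0.3003992015968064,
                "pOnCondition": 0.4037430767078182,
                "pOnNotCondition": 0.2818548581580362,
                "pCondition": 0.1490361738672945,
            },
            "fs": {
                "f": 300.0,
                "fOnCondition": 61.0,
                "fOnNotCondition": 239.0,
                "fCondition": 149.0,
                "n": 1000.0,
            },
        }
    ],
}


def summarize_relate_hit(resp: dict, index: int = 0) -> dict:
    """Flatten one hit of a Relate response into a single-level dict (hypothetical helper)."""
    hit = resp["hits"][index]
    summary = {"lift": hit["lift"]}
    summary.update(hit["ps"])  # estimated probabilities
    summary.update(hit["fs"])  # observed frequencies
    return summary


summary = summarize_relate_hit(resp)
for key, value in summary.items():
    print(f"{key:>16}: {value}")
```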


We will go through each field of the query result to see what Aito has discovered:

- `"lift": 1.344021803525694`:
    - You can read more about lift [here](https://aito.ai/docs/api/#p-vs-lift)
    - Lift is `pOnCondition / p`, so this means that people younger than 25 are approximately 34% more likely to get a bad credit rating than the average applicant in the data
    
- `"fs"` - frequencies:
    - `"f": 300.0`: 300 samples with bad credit rating in the data
    - `"fOnCondition": 61.0`: 61 samples with a bad credit rating that are younger than 25
    - `"fOnNotCondition": 239.0`: 239 samples with a bad credit rating that are 25 or older
    - `"fCondition": 149.0`: 149 samples younger than 25
    - `"n": 1000.0`: 1000 samples in total
    - From the frequencies, we can observe that young people are not misrepresented in the training data: they account for about 20% (61/300) of the samples with a bad credit rating, and roughly 41% (61/149) of the young people have a bad credit rating.
- `"ps"` - estimated probabilities:
    - `"p": 0.3003992015968064`: base probability of getting a bad credit rating
    - `"pOnCondition": 0.4037430767078182`: P(bad credit rating | age < 25)
    - `"pOnNotCondition": 0.2818548581580362`: P(bad credit rating | age >= 25)
    - `"pCondition": 0.1490361738672945`: P(age < 25)
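
The lift and the frequencies are connected by simple arithmetic. The sketch below reuses the numbers from the response above, checks that the lift equals `pOnCondition / p`, and computes the raw ratios directly from `fs` so they can be compared with Aito's estimated `ps`. The raw ratios are close to, but not identical to, the estimates; we assume the small gap comes from how Aito estimates probabilities, not from a different definition.

```python
import math

# Values copied from the Relate response above.
ps = {
    "p": 0.3003992015968064,
    "pOnCondition": 0.4037430767078182,
    "pOnNotCondition": 0.2818548581580362,
    "pCondition": 0.1490361738672945,
}
fs = {"f": 300.0, "fOnCondition": 61.0, "fOnNotCondition": 239.0, "fCondition": 149.0, "n": 1000.0}
lift = 1.344021803525694

# Lift compares the probability under the condition with the base probability.
assert math.isclose(lift, ps["pOnCondition"] / ps["p"], rel_tol=1e-6)

# Raw (empirical) ratios computed directly from the counts.
raw = {
    "p": fs["f"] / fs["n"],                                                   # 300 / 1000
    "pOnCondition": fs["fOnCondition"] / fs["fCondition"],                    # 61 / 149
    "pOnNotCondition": fs["fOnNotCondition"] / (fs["n"] - fs["fCondition"]),  # 239 / 851
    "pCondition": fs["fCondition"] / fs["n"],                                 # 149 / 1000
}

for key in ps:
    print(f"{key:>16}: raw={raw[key]:.4f}  aito={ps[key]:.4f}")

# Share of bad ratings that come from the age < 25 group (the 61/300 figure above).
print(f"share of bad ratings from age < 25: {fs['fOnCondition'] / fs['f']:.1%}")
```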

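One common metric for measuring the fairness of an AI system is the [statistical parity difference](https://dl.acm.org/doi/10.1145/2090236.2090255).
It is the difference between the rate of favorable outcomes received by the protected group and the rate received by the non-protected group. Here we measure the unfavorable outcome (a bad credit rating) instead, which flips the sign but leaves the absolute value unchanged, so in this case it can be written as: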
```
P(bad credit rating | age < 25) - P(bad credit rating | age >= 25) = pOnCondition - pOnNotCondition
```
The ideal value of the statistical parity difference is 0, and a value in the range [-0.1, 0.1] is considered fair. In this example, we get a value of about 0.12 (0.4037 - 0.2819), which shows that Aito's estimate is slightly outside the fair range but not far from it.
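
The calculation and the fairness check can be sketched in a few lines of Python. The probabilities are copied from the Relate response above; `statistical_parity_difference` and `is_within_fair_range` are hypothetical helper names, and the 0.1 tolerance is the rule of thumb quoted above, not a value provided by Aito. The sketch also computes the metric for the favorable outcome (a good credit rating), assuming the two ratings are complementary, to show that only the sign changes.

```python
# Estimated probabilities from the Relate response above.
p_bad_young = 0.4037430767078182      # pOnCondition: P(bad | age < 25)
p_bad_not_young = 0.2818548581580362  # pOnNotCondition: P(bad | age >= 25)


def statistical_parity_difference(p_protected: float, p_unprotected: float) -> float:
    """Difference in outcome rates between the protected and the non-protected group."""
    return p_protected - p_unprotected


def is_within_fair_range(spd: float, tolerance: float = 0.1) -> bool:
    """Apply the [-0.1, 0.1] rule of thumb quoted in the text."""
    return -tolerance <= spd <= tolerance


spd_bad = statistical_parity_difference(p_bad_young, p_bad_not_young)
# Same metric for the favorable outcome (good rating), assuming P(good) = 1 - P(bad).
spd_good = statistical_parity_difference(1 - p_bad_young, 1 - p_bad_not_young)

print(f"SPD (bad rating):  {spd_bad:+.4f}")
print(f"SPD (good rating): {spd_good:+.4f}")
print("within [-0.1, 0.1]:", is_within_fair_range(spd_bad))
```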