# Fairness datasets

This notebook showcases the datasets available in the module. We will look at the task each dataset proposes and at the format of its data.

The datasets were downloaded from the GitHub repository https://github.com/i-gallegos/Fair-LLM-Benchmark.

Reference: Gallegos, I. O., Rossi, R. A., Barrow, J., Tanjim, M. M., Kim, S., Dernoncourt, F., ... & Ahmed, N. K. (2024). Bias and fairness in large language models: A survey. Computational Linguistics, 1-79.

Preprint: https://arxiv.org/abs/2309.00770.


Without further ado, let us make the necessary imports:

In [2]:
import sys
import os

# Make the parent directory importable so the FairnessDatasets package is found
sys.path.append(os.path.join(os.getcwd(), '..'))
from FairnessDatasets.FairnessDatasets import BiasDataLoader

For starters, we can list the available datasets by calling `BiasDataLoader` with no arguments:

In [2]:
BiasDataLoader()

Available datasets:

BBQ
BEC-Pro
BOLD
BUG
CrowS-Pairs
GAP
HolisticBias
StereoSet
UnQover
WinoBias+
WinoBias
Winogender


The arguments of `BiasDataLoader` are `dataset` (the name of the dataset, as listed above), `config` (which further specifies the dataset) and `format` (`raw` for the raw pandas/text format, `pt` for a PyTorch dataset, `hf` for a Hugging Face dataset). If we omit `config` when several options are available, the function lists the available options:

In [4]:
BiasDataLoader(dataset = 'BBQ')

Available configurations:

Age
Disability_Status
Gender_indentity
Nationality
Physical_appearance
Race_ethnicity
Race_x_gender
Race_x_SES
Religion
SES
Sexual_orientation
all
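
Since the loader takes its dataset name and format as plain strings, a typo is easy to make. The following is a minimal sketch of a pre-check, not part of the module: `check_loader_args`, `KNOWN_DATASETS` and `KNOWN_FORMATS` are hypothetical names, and the accepted values are simply copied from the listing and the argument description above.

```python
# Hypothetical pre-check; the values below are copied from this notebook,
# not read from the FairnessDatasets module itself.
KNOWN_DATASETS = {
    "BBQ", "BEC-Pro", "BOLD", "BUG", "CrowS-Pairs", "GAP",
    "HolisticBias", "StereoSet", "UnQover", "WinoBias+", "WinoBias",
    "Winogender",
}
KNOWN_FORMATS = {"raw", "pt", "hf"}


def check_loader_args(dataset, fmt):
    """Return keyword arguments for the loader, or raise on an unknown value."""
    if dataset not in KNOWN_DATASETS:
        raise ValueError(f"Unknown dataset: {dataset!r}")
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"Unknown format: {fmt!r}")
    # The loader's own parameter is called `format`, so use that key here.
    return {"dataset": dataset, "format": fmt}


kwargs = check_loader_args("BBQ", "raw")
print(kwargs)
```

The returned dictionary could then be unpacked into the call, for example `BiasDataLoader(config='Age', **kwargs)`.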


We can now pass any of these configurations to extract the corresponding dataset:

In [None]:
ageBBQ = BiasDataLoader(dataset = 'BBQ', config = 'Age')
print(ageBBQ['data'][0])

We can also change the format of the outputs with the `format` parameter:

In [None]:
ageBBQraw = BiasDataLoader(dataset = 'BBQ', config = 'Age', format = 'raw')
print(ageBBQraw.head())

   example_id  question_index question_polarity context_condition category  \
0           0               1               neg             ambig      Age   
1           1               1               neg          disambig      Age   
2           2               1            nonneg             ambig      Age   
3           3               1            nonneg          disambig      Age   
4           4               1               neg             ambig      Age   

                                         answer_info  \
0  {'ans0': ['grandfather', 'old'], 'ans1': ['gra...   
1  {'ans0': ['grandfather', 'old'], 'ans1': ['gra...   
2  {'ans0': ['grandfather', 'old'], 'ans1': ['gra...   
3  {'ans0': ['grandfather', 'old'], 'ans1': ['gra...   
4  {'ans0': ['grandmother', 'old'], 'ans1': ['gra...   

                                 additional_metadata  \
0  {'subcategory': 'None', 'stereotyped_groups': ...   
1  {'subcategory': 'None', 'stereotyped_groups': ...   
2  {'subcategory': 'None',

The two calls return different Python objects, which we can confirm by checking their types:

In [18]:
print(type(ageBBQraw))
print(type(ageBBQ))

<class 'pandas.core.frame.DataFrame'>
<class 'dict'>
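
Because the `raw` format is a pandas DataFrame, the usual pandas tools apply to it. As a minimal sketch, the toy frame below copies three fully visible columns from the five rows printed above; it is a stand-in for `ageBBQraw`, not the loader's actual output, and the name `toy` is only illustrative.

```python
import pandas as pd

# Toy stand-in for the raw frame: three columns from the five rows shown above.
toy = pd.DataFrame({
    "example_id": [0, 1, 2, 3, 4],
    "question_polarity": ["neg", "neg", "nonneg", "nonneg", "neg"],
    "context_condition": ["ambig", "disambig", "ambig", "disambig", "ambig"],
})

# Count how many rows fall into each (polarity, context) combination.
counts = pd.crosstab(toy["question_polarity"], toy["context_condition"])
print(counts)
```

The same `pd.crosstab` call should work on the full frame, assuming it has the `question_polarity` and `context_condition` columns shown in the printed output.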
