
Pre-processing for adult, german and compas dataset #149

Closed
vsahil opened this issue Feb 22, 2020 · 5 comments

Comments

@vsahil

vsahil commented Feb 22, 2020

I am trying to use the AIF360 tool in one of my projects. I am having trouble understanding the purpose of pre-processing, say, the German credit dataset as described in the file AIF360/aif360/algorithms/preprocessing/optim_preproc_helpers/data_preproc_functions.py.
Why is the custom processing described there needed for several algorithms, such as optimized pre-processing, the meta fair classifier, and reject option classification, while several other algorithms do not require it, e.g. adversarial debiasing, disparate impact remover, and reweighing? Can you please help me understand the purpose?

Thank you.

@bonejay

bonejay commented Mar 3, 2020

It is even worse. The Disparate Impact Remover does not even work with a pre-processed dataset, so you cannot compare the different algorithms on the same data.

@vsahil
Author

vsahil commented Mar 3, 2020

Well, I got the disparate impact remover working with the German credit dataset (no pre-processing). My only concern is that if some techniques require, or are given, pre-processed data, then we can't compare across them, which is bad.
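For anyone else running this on raw data: AIF360's DisparateImpactRemover implements the rank-preserving "repair" of Feldman et al. The snippet below is not the library's code, just a minimal NumPy sketch of that repair on a single numeric feature, moving each group's values toward the median of the group-wise quantile functions:

```python
import numpy as np

def repair_feature(values, groups, repair_level=1.0):
    """Rank-preserving repair sketch (after Feldman et al., 2015):
    map each value to its within-group quantile, then to the median
    of the group-wise quantile functions at that quantile."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    repaired = values.copy()
    qs = np.linspace(0.0, 1.0, 101)  # shared quantile grid
    # Per-group quantile functions and their pointwise median.
    group_qfs = [np.quantile(values[groups == g], qs) for g in np.unique(groups)]
    median_qf = np.median(np.stack(group_qfs), axis=0)
    for g in np.unique(groups):
        mask = groups == g
        # Rank of each value within its group, scaled to [0, 1].
        ranks = np.argsort(np.argsort(values[mask])) / max(mask.sum() - 1, 1)
        target = np.interp(ranks, qs, median_qf)
        # repair_level interpolates between the original and fully repaired value.
        repaired[mask] = (1.0 - repair_level) * values[mask] + repair_level * target
    return repaired

vals = np.array([1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0])
grp = np.array([0, 0, 0, 0, 1, 1, 1, 1])
fixed = repair_feature(vals, grp, repair_level=1.0)
# After full repair, the two groups' value distributions coincide,
# while the within-group ordering is preserved.
```

At `repair_level=0.0` the data is returned unchanged; intermediate values trade off repair against fidelity, which mirrors the `repair_level` knob the library exposes.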

@bonejay

bonejay commented Mar 3, 2020

Literally in the same boat, haha. My issue is that some other algorithms do not like the unprocessed, whole datasets (likely because they are one-hot encoded and too large). I had issues with Learning Fair Representations when I tried the complete unprocessed datasets: instead of giving me actual results, it either made the dataset consist solely of 0s or made every row the same value.

@nrkarthikeyan
Collaborator

I believe the first comment by @vsahil is based on the notebooks in the examples folder. The reason we chose this custom pre-processing for optimized pre-processing is that we want a small number of features and a small number of categories per feature. It simply gives more compact datasets (in terms of features). This is mainly for computational/statistical reasons, since the probability estimates used in optimized pre-processing will not be good otherwise. For the other notebooks, we sometimes choose smaller datasets to limit runtime; otherwise Travis will not be able to execute them fast enough during testing.

Also, generally speaking, the notebooks are just one example of how to implement various bias mitigation methods, and are by no means the only way to do things. Other approaches to loading the data can definitely be tried out. However, as in all ML/data science work, choosing the right kind of data pre-processing for the algorithm in question may take a few rounds of experimentation.

Hope this helps. If you have specific problems, please provide concrete reproducible examples and raise issues.
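To make the "compact datasets" point above concrete: optimized pre-processing estimates probabilities over the discrete joint feature space, and the number of joint cells grows multiplicatively with the categories per feature. A rough illustration with hypothetical cardinalities (the numbers are made up, not the actual German-credit ones):

```python
import numpy as np

# Hypothetical per-feature category counts (illustrative, not the real values).
raw_cardinalities = [10, 5, 4, 3, 2]    # before coarsening categories
coarse_cardinalities = [3, 3, 2, 2, 2]  # after grouping categories per feature

n_samples = 1000  # the German credit dataset has 1,000 rows

cells_raw = int(np.prod(raw_cardinalities))      # joint cells to estimate
cells_coarse = int(np.prod(coarse_cardinalities))

print(cells_raw, n_samples / cells_raw)        # 1200 cells -> <1 sample/cell
print(cells_coarse, n_samples / cells_coarse)  # 72 cells -> ~14 samples/cell
```

With fewer than one sample per cell on average, the empirical joint distribution is mostly zeros and the probability estimates are useless, which is the statistical reason for coarsening.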

@frios2020

frios2020 commented Jun 1, 2020

Hi, I have a simple question. I am using GermanDataset.
Is GermanDataset a class? Where can I see the parameters it receives?
Can I get the German dataset as a dataframe using this package and extract the values of the 'age' column?
I am trying it, but it doesn't return anything.
Thank you.
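In case it helps: `GermanDataset` is a class (`aif360.datasets.GermanDataset`), and its constructor parameters should be listed in its docstring. Note that it needs the raw UCI `german.data` files to be downloaded into AIF360's data directory first, which may explain getting nothing back. AIF360's structured datasets expose `convert_to_dataframe()`, which returns a pandas DataFrame plus a metadata dict. A minimal sketch of the access pattern, using a stand-in frame since the real load needs those files:

```python
import pandas as pd

# With AIF360 installed and the UCI files in place, the pattern would be:
#   from aif360.datasets import GermanDataset
#   dataset = GermanDataset()                  # see the class docstring for parameters
#   df, meta = dataset.convert_to_dataframe()  # (DataFrame, metadata dict)
#   ages = df['age']
# The same column access works on any ordinary DataFrame (stand-in data here):
df = pd.DataFrame({'age': [25, 47, 33], 'credit': [1, 0, 1]})
ages = df['age'].tolist()
print(ages)  # [25, 47, 33]
```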

Labels: none yet
Projects: none yet
Development: no branches or pull requests
4 participants