
Pre-processing for adult, german and compas dataset #149

Closed
vsahil opened this issue Feb 22, 2020 · 5 comments

Comments

@vsahil

vsahil commented Feb 22, 2020

I am trying to use the AIF360 tool in one of my projects. I am having trouble understanding the purpose of pre-processing, say, the German credit dataset as described in the file AIF360/aif360/algorithms/preprocessing/optim_preproc_helpers/data_preproc_functions.py.
Why is the custom processing described there needed for several algorithms, such as optimized pre-processing, the meta fair classifier, and reject option classification, while several other algorithms do not require it, e.g. adversarial debiasing, disparate impact remover, and reweighing? Can you please help me understand the purpose?

Thank you.

@bonejay

bonejay commented Mar 3, 2020

It is even worse. The Disparate Impact Remover does not even work with a pre-processed dataset, so you cannot compare the different algorithms on the same data.

@vsahil
Author

vsahil commented Mar 3, 2020

Well, I got the disparate impact remover working with the German credit dataset (no pre-processing). My only concern is that if some techniques require, or are given, pre-processed data, then we can't compare across them, which is bad.
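For anyone else running this on raw data: AIF360's DisparateImpactRemover implements the rank-preserving "repair" of Feldman et al. The snippet below is not the library's code, just a minimal NumPy sketch of that repair on a single numeric feature, moving each group's values toward the median of the group-wise quantile functions:

```python
import numpy as np

def repair_feature(values, groups, repair_level=1.0):
    """Rank-preserving repair sketch (after Feldman et al., 2015):
    map each value to its within-group quantile, then to the median
    of the group-wise quantile functions at that quantile."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    repaired = values.copy()
    qs = np.linspace(0.0, 1.0, 101)  # shared quantile grid
    # Per-group quantile functions and their pointwise median.
    group_qfs = [np.quantile(values[groups == g], qs) for g in np.unique(groups)]
    median_qf = np.median(np.stack(group_qfs), axis=0)
    for g in np.unique(groups):
        mask = groups == g
        # Rank of each value within its group, scaled to [0, 1].
        ranks = np.argsort(np.argsort(values[mask])) / max(mask.sum() - 1, 1)
        target = np.interp(ranks, qs, median_qf)
        # repair_level interpolates between the original and fully repaired value.
        repaired[mask] = (1.0 - repair_level) * values[mask] + repair_level * target
    return repaired

vals = np.array([1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0])
grp = np.array([0, 0, 0, 0, 1, 1, 1, 1])
fixed = repair_feature(vals, grp, repair_level=1.0)
# After full repair, the two groups' value distributions coincide,
# while the within-group ordering is preserved.
```

At `repair_level=0.0` the data is returned unchanged; intermediate values trade off repair against fidelity, which mirrors the `repair_level` knob the library exposes.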

@bonejay

bonejay commented Mar 3, 2020

Literally in the same boat, haha. My issue is that some other algorithms do not like the unprocessed, whole datasets (likely because they are one-hot encoded and too large). I had issues with Learning Fair Representations when I tried the complete unprocessed datasets: instead of giving me actual results, it either made the dataset consist solely of 0s or made every row the same value.

@nrkarthikeyan
Collaborator

I believe the first comment by @vsahil is based on the notebooks in the examples folder. The reason we chose this custom pre-processing for optimized pre-processing is that we want a small number of features and a small number of categories per feature. It simply gives more compact datasets (in terms of features). This is mainly for computational/statistical reasons, since the probability estimates used in optimized pre-processing will not be good otherwise. For the other notebooks, we sometimes choose smaller datasets to limit runtime; otherwise Travis will not be able to execute them fast enough during testing.

Also, generally speaking, the notebooks are just one example of how to implement various bias mitigation methods, and are by no means the only way to do things. Other approaches to loading the data can definitely be tried out. However, as in all ML/data science work, choosing the right kind of data pre-processing for the algorithm in question may take a few rounds of experimentation.

Hope this helps. If you have specific problems, please provide concrete reproducible examples and raise issues.
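To make the "compact datasets" point above concrete: optimized pre-processing estimates probabilities over the discrete joint feature space, and the number of joint cells grows multiplicatively with the categories per feature. A rough illustration with hypothetical cardinalities (the numbers are made up, not the actual German-credit ones):

```python
import numpy as np

# Hypothetical per-feature category counts (illustrative, not the real values).
raw_cardinalities = [10, 5, 4, 3, 2]    # before coarsening categories
coarse_cardinalities = [3, 3, 2, 2, 2]  # after grouping categories per feature

n_samples = 1000  # the German credit dataset has 1,000 rows

cells_raw = int(np.prod(raw_cardinalities))      # joint cells to estimate
cells_coarse = int(np.prod(coarse_cardinalities))

print(cells_raw, n_samples / cells_raw)        # 1200 cells -> <1 sample/cell
print(cells_coarse, n_samples / cells_coarse)  # 72 cells -> ~14 samples/cell
```

With fewer than one sample per cell on average, the empirical joint distribution is mostly zeros and the probability estimates are useless, which is the statistical reason for coarsening.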

@frios2020

frios2020 commented Jun 1, 2020

Hi, I have a simple question. I am using GermanDataset.
Is GermanDataset a class? Where can I see the parameters it receives?
Can I get the German dataset as a dataframe using this package and extract the values of the 'age' column?
I am trying it, but it doesn't return anything.
Thank you.
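In case it helps: `GermanDataset` is a class (`aif360.datasets.GermanDataset`), and its constructor parameters should be listed in its docstring. Note that it needs the raw UCI `german.data` files to be downloaded into AIF360's data directory first, which may explain getting nothing back. AIF360's structured datasets expose `convert_to_dataframe()`, which returns a pandas DataFrame plus a metadata dict. A minimal sketch of the access pattern, using a stand-in frame since the real load needs those files:

```python
import pandas as pd

# With AIF360 installed and the UCI files in place, the pattern would be:
#   from aif360.datasets import GermanDataset
#   dataset = GermanDataset()                  # see the class docstring for parameters
#   df, meta = dataset.convert_to_dataframe()  # (DataFrame, metadata dict)
#   ages = df['age']
# The same column access works on any ordinary DataFrame (stand-in data here):
df = pd.DataFrame({'age': [25, 47, 33], 'credit': [1, 0, 1]})
ages = df['age'].tolist()
print(ages)  # [25, 47, 33]
```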

Labels: none yet
Projects: none yet
Development: no branches or pull requests
4 participants