Pre-processing for adult, german and compas dataset #149
Comments
It is even worse. Disparate Impact Remover does not even work with a preprocessed dataset, so you cannot compare the different algorithms on the same data.
Well, I got Disparate Impact Remover working with the German credit dataset (no pre-processing). My only concern is that if some techniques require, or are given, pre-processed data, then we can't compare across them, which is bad.
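As background for the comparison being discussed: the metric that Disparate Impact Remover targets is the ratio of favorable-outcome rates between the unprivileged and privileged groups. A minimal pure-Python sketch on toy data (this is a hypothetical illustration, not AIF360 code; the function name and toy values are made up):

```python
def disparate_impact(outcomes, groups, favorable=1, privileged=1):
    """Return P(favorable | unprivileged) / P(favorable | privileged)."""
    fav_unpriv = sum(1 for y, g in zip(outcomes, groups)
                     if g != privileged and y == favorable)
    n_unpriv = sum(1 for g in groups if g != privileged)
    fav_priv = sum(1 for y, g in zip(outcomes, groups)
                   if g == privileged and y == favorable)
    n_priv = sum(1 for g in groups if g == privileged)
    return (fav_unpriv / n_unpriv) / (fav_priv / n_priv)

# Toy data: 1 = credit granted; group 1 = privileged.
outcomes = [1, 1, 0, 1, 0, 0, 1, 0]
groups   = [1, 1, 1, 1, 0, 0, 0, 0]
print(disparate_impact(outcomes, groups))  # 0.25 / 0.75 -> 0.333...
```

A ratio near 1.0 indicates parity; the repair aims to push this ratio toward 1.0, which is why running it on differently pre-processed data makes cross-algorithm comparison hard.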
Literally in the same boat, haha. My issue is that some other algos do not like the unprocessed, whole datasets (likely because they are one-hot encoded and too large). I had issues with Learning Fair Representations when I tried the complete unprocessed datasets: instead of giving me actual results, it either made the dataset consist solely of 0s or made every row the same value.
I believe the first comment by @vsahil is based on the notebooks in the examples folder. The reason we chose this custom preprocessing for optimized preprocessing is that we want a small number of features and a small number of categories per feature. It just gives more compact datasets (in terms of features). This is mainly for computational/statistical reasons, since the probability estimates used in optimized pre-processing will not be good otherwise. For the other notebooks, we sometimes choose smaller datasets to limit runtime, or else Travis will not be able to execute them fast enough during testing. Also, generally speaking, the notebooks are just examples of how to implement various bias mitigation methods, and are by no means the only way to do things. Other approaches to loading data can definitely be tried out. However, as with all ML/data science approaches, choosing the right kind of data pre-processing for the algorithm in question may take a few trials of experimentation. Hope this helps. If you have specific problems, please provide concrete reproducible examples and raise issues.
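To illustrate the "small number of categories per feature" point above: collapsing a fine-grained feature into a few coarse buckets shrinks the number of cells whose joint probabilities must be estimated, so each cell gets many more samples. A hypothetical sketch (the helper name and bucket boundaries below are made up for illustration, not taken from data_preproc_functions.py):

```python
def bucket_age(age):
    """Map a raw age to one of four coarse categories."""
    if age < 26:
        return "<26"
    elif age < 46:
        return "26-45"
    elif age < 66:
        return "46-65"
    return ">=65"

ages = [19, 23, 31, 44, 52, 70]
print([bucket_age(a) for a in ages])
# ['<26', '<26', '26-45', '26-45', '46-65', '>=65']
```

With 4 buckets instead of dozens of distinct ages, a method that estimates per-cell probabilities (like optimized pre-processing) works from far fewer, better-populated cells.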
Hi, I have a simple question. I am using GermanDataset.
I am trying to use the AIF360 tool in one of my projects. I am facing a problem understanding the purpose of pre-processing, say, the German credit dataset as described in the file AIF360/aif360/algorithms/preprocessing/optim_preproc_helpers/data_preproc_functions.py.
Why is the custom processing described there needed for several algorithms, such as optimized pre-processing, the meta classifier, and reject option classification, while several other algorithms do not require it, e.g. adversarial debiasing, disparate impact remover, and reweighing? Can you please help me understand the purpose?
Thank you.
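One reason some of the methods named above tolerate raw, one-hot encoded data: reweighing only needs counts over (group, label) pairs, not per-feature probability estimates. A minimal sketch of the standard reweighing formula W(g, y) = N_g * N_y / (N * N_{g,y}) on toy data (hypothetical illustration, not the AIF360 implementation):

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight for each (group, label) cell so group and label
    become statistically independent under the weighted counts."""
    n = len(groups)
    count_g = Counter(groups)
    count_y = Counter(labels)
    count_gy = Counter(zip(groups, labels))
    return {gy: count_g[gy[0]] * count_y[gy[1]] / (n * count_gy[gy])
            for gy in count_gy}

# Toy data: group 1 = privileged; label 1 = favorable outcome.
groups = [1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(reweighing_weights(groups, labels))
# e.g. the underrepresented (unprivileged, favorable) cell gets weight 2.0
```

Because the weights depend only on these four counts, the encoding or dimensionality of the remaining features is irrelevant, which is consistent with reweighing not needing the custom preprocessing.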