Add a bias detector based on optimal transport #434

jmarecek · 2023-01-22T11:07:49Z

There is increasing interest in estimating bias in terms of Wasserstein-2 distances (or related distances between measures, often computable using optimal-transport algorithms); cf.:
http://proceedings.mlr.press/v97/gordaliza19a.html (ICML 2019)
https://academic.oup.com/imaiai/article-abstract/8/4/817/5586771 (Information and Inference, 2019)
https://openreview.net/forum?id=-welFirjMss (NeurIPS 2022)
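To make the quantity concrete, here is a minimal sketch (not the code contributed in this PR) of a Wasserstein-2 distance between the score distributions of two groups. It assumes one-dimensional scores and equally sized samples, in which case the optimal coupling simply pairs the sorted values. The function name and the example scores are made up for illustration.

```python
import math


def wasserstein2_equal_size(a, b):
    """W2 between two 1-D samples of equal size (uniform weights).

    Illustrative helper, not part of AIF360. In one dimension the optimal
    coupling for equally sized samples matches the i-th smallest value of
    `a` with the i-th smallest value of `b`.
    """
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    squared = sum((x - y) ** 2 for x, y in zip(sorted(a), sorted(b)))
    return math.sqrt(squared / len(a))


# Hypothetical model scores for two groups defined by a protected attribute.
scores_group_a = [0.2, 0.4, 0.6, 0.8]
scores_group_b = [0.3, 0.5, 0.7, 0.9]

gap = wasserstein2_equal_size(scores_group_a, scores_group_b)
```

A value of zero would mean the two groups receive identically distributed scores; larger values indicate that more "mass" has to be moved to align them.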
Following some discussions with Rahul Nair (rahul.nair@ie.ibm.com), we would like to contribute a new detector, with the same signature as the original mdss detector of @Adebayo-Oshingbesan:
https://github.com/Trusted-AI/AIF360/blob/master/aif360/detectors/mdss_detector.py

Could we ask @hoffmansc for a code review, please?

Closes #433

hoffmansc · 2023-02-13T15:39:58Z

Hi, @jmarecek. Thanks for your work!

I have a couple requests:

1. Can you include an ot_bias_scan function in aif360/sklearn/detectors/ as well? The functionality should be the same but the signature is a bit different. This is just for consistency. See aif360/sklearn/detectors/detectors.py for an example.
2. Can you import your function in the __init__.py files for aif360/detectors/ and aif360/sklearn/detectors/? Again, for consistency. This will allow us to import the function directly from the subpackage instead of the source file.
3. I think you'll need to add ot to the dependency list. You can do this by adding an entry to the extras dict in setup.py (you can name it 'OptimalTransport') as well as requirements.txt.
4. We also need a unit test file for this PR. Please see the tests section of the contribution guide for more information.
5. For the demo notebook, could you add some text explaining what the ot_matrix means, how it should be used, etc.? Also, can you make it clear what the difference between this and MDSS bias scan is in the text? The description just seems copy-pasted from the other notebook.
6. Finally, this isn't a big deal but we use Google-style docstrings for this project. I believe the formatting you used causes some minor issues when built with Sphinx.
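The dependency request could look roughly like the sketch below. This is a hypothetical excerpt in the style of a setup.py, not the project's actual file: only the key 'OptimalTransport' comes from the request above, and the requirement name 'pot' rests on the assumption that the ot module is installed from the POT distribution on PyPI. The 'all' target is likewise illustrative.

```python
# Hypothetical setup.py excerpt; the real file may be organised differently.
extras = {
    # `import ot` is provided by the POT distribution on PyPI (assumption
    # about which package the PR depends on).
    'OptimalTransport': ['pot'],
}

# Illustrative convenience target that pulls in every optional dependency.
extras['all'] = sorted({req for reqs in extras.values() for req in reqs})

# The dict would then be handed to setuptools, e.g.
# setup(..., extras_require=extras)
```

With an entry like this, the optional dependency would be installable through pip's extras syntax using the 'OptimalTransport' name.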

At a high level, though, there seems to be a lot of repeated code here. Is there a way we can reuse the MDSS bias scan code for this? For example, how does this differ from adding another "scoring function"? That seems much cleaner to me and would cut down considerably on the checklist above.

jmarecek · 2023-04-15T19:10:29Z

Hi @hoffmansc,

I hope all is well. Illia has pushed the unit tests, docstrings, etc. It would be great to revisit this.

I don't see how the MDSS scan could be extended to work with OT (or many other types of bias).

Illia and I would be happy to jump on a call if that would be faster.

Thanks!
Jakub

rahulnair23 · 2023-05-15T10:31:24Z

Hi @jmarecek @Illia-Kryvoviaz - I've reviewed this as well and am happy to have a call to discuss some details here. In addition to the points raised above by @hoffmansc (FYI - point 1 is still open, and points 2 and 3 have only been partially completed), please also consider:

- The output of the bias detector here appears to only return the cost of the transport. This in itself lacks context. Could you include additional context, for example, similar to the output of MDSS scan?
- The demo notebook here contains the references to the MDSS paper. Is there a reference paper for the use of optimal transport for bias detection that you can cite here?
- The demo notebook on the COMPAS dataset ends with the cost of the transport, which isn't illustrative as a bias scan. Are there additional steps you can include to show its use?

Thank you.

Illia-Kryvoviaz · 2023-07-11T12:49:21Z

Hello @hoffmansc!
We updated the code based on your feedback. Could you please check out the changes and the argument ordering question?

As for detector vs. metric, we argue that this should be an independent detector rather than a metric, because it is too time-consuming to serve as a metric or to run a subgroup analysis similar to MDSS.

hoffmansc · 2023-07-11T15:44:28Z

Thanks for your updates. I'm not sure I understand your point about detector vs. metric. It's fine if the metric is slow. The question is which classification better fits the algorithm and I think if it doesn't identify a subgroup like MDSS it shouldn't be called a detector. Metrics are simply anything which takes in data (predictions, ground truth, features, groups, etc.) and returns a numerical score (either per-group or "reduced" by taking the difference between groups, for example). Am I misinterpreting your algorithm?
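As a toy illustration of that definition (not AIF360 code), the sketch below takes predictions and group labels and returns either one score per group or a single "reduced" score, the difference between groups. All function names and the sample data are hypothetical.

```python
def selection_rate(y_pred, pos_label=1):
    """Fraction of predictions equal to the favorable label."""
    return sum(1 for y in y_pred if y == pos_label) / len(y_pred)


def per_group(metric, y_pred, groups):
    """Apply `metric` separately to the predictions of each group."""
    by_group = {}
    for y, g in zip(y_pred, groups):
        by_group.setdefault(g, []).append(y)
    return {g: metric(ys) for g, ys in by_group.items()}


def group_difference(metric, y_pred, groups, priv_group):
    """Reduce per-group scores to one number: unprivileged minus privileged.

    Toy version that assumes exactly two groups.
    """
    scores = per_group(metric, y_pred, groups)
    (unpriv_score,) = [s for g, s in scores.items() if g != priv_group]
    return unpriv_score - scores[priv_group]


# Made-up predictions for eight individuals in two groups.
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']

rates = per_group(selection_rate, y_pred, groups)
gap = group_difference(selection_rate, y_pred, groups, priv_group='a')
```

Either way the result is a number (or one number per group); nothing in the output names a subgroup, which is the distinction drawn above between a metric and a detector such as MDSS.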

jmarecek · 2023-07-11T15:51:25Z

As I mentioned above, there are a variety of scoring functions within the "optimal transport perspective", including:
- Wasserstein distances W_p for integers p (https://en.wikipedia.org/wiki/Wasserstein_metric),
- Gromov-Wasserstein distances for various integers and metrics on the underlying spaces (https://pythonot.github.io/auto_examples/gromov/plot_gromov.html),
- fused Gromov-Wasserstein variants, etc.

So I don't really think it is a single scoring function.
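A small sketch of how the choice of p alone already yields different scores: the function below computes W_p between two one-dimensional empirical distributions with uniform weights, allowing unequal sample sizes. It is an illustration written for this thread, not the PR's implementation and not a POT call; it relies on the fact that in one dimension the optimal plan moves mass in sorted order.

```python
from fractions import Fraction


def wasserstein_p(xs, ys, p=1):
    """W_p between two 1-D empirical distributions with uniform weights.

    Illustrative helper. Mass is moved greedily between the sorted samples,
    which is optimal in one dimension for the cost |x - y| ** p with p >= 1.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    # Exact fractions avoid rounding issues when checking for emptied points.
    mass_x, mass_y = Fraction(1, n), Fraction(1, m)
    total = 0.0
    while i < n and j < m:
        moved = min(mass_x, mass_y)
        total += float(moved) * abs(xs[i] - ys[j]) ** p
        mass_x -= moved
        mass_y -= moved
        if mass_x == 0:
            i += 1
            mass_x = Fraction(1, n)
        if mass_y == 0:
            j += 1
            mass_y = Fraction(1, m)
    return total ** (1.0 / p)


w1 = wasserstein_p([0.0, 1.0], [0.0, 3.0], p=1)
w2 = wasserstein_p([0.0, 1.0], [0.0, 3.0], p=2)
```

On this made-up data the two choices of p disagree, so the score reported depends on which member of the family is selected.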

hoffmansc · 2023-07-11T20:22:06Z

I understand there can be multiple distance/scoring functions. I'm not saying there need to be any changes to the function, just that it would be better suited under aif360.metrics instead of aif360.detectors. Even though it has options with respect to the scoring function, the output is still a numerical value, not a subgroup.

krvarshney · 2023-07-11T20:26:13Z

Sam, your logic for it to be under metrics makes sense to me.

jmarecek · 2023-07-12T06:18:16Z

Hi Kush -- so what about this: we contribute the OT-based metrics to aif360.metrics, and we add some "SimpleDetector" to detectors, which would not try to run the subgroup scan. That way, the current functionality of the code would be easy to access. Would that work? Jakub

rahulnair23 · 2023-07-14T12:46:42Z

Thanks for the call and discussion today, @hoffmansc and @jmarecek. As concluded, from an end-user point of view, the OT-based measures are better suited under aif360.metrics (as opposed to aif360.detectors).

hoffmansc · 2023-07-14T12:54:18Z

Actually, to be clear, it's only strictly necessary to put it under aif360.sklearn.metrics. This is the most straightforward thing to implement. It is optional to put it under aif360.metrics (i.e., the class-based Metric interface). The scikit-learn style functions are preferred going forward and the Metric class is only supported for backwards compatibility.

Illia-Kryvoviaz · 2023-07-14T16:10:33Z

Hello @hoffmansc!
We've put the ot_bias_scan under the aif360.sklearn.metrics. Additionally, we've renamed the ot_detector to ot_metric and moved the code from the aif360.detectors folder to the aif360.metrics folder - it would seem more logical to have it there now. Could you please review the changes? Thank you!

hoffmansc

This looks good for now, thanks. I made one suggestion -- renaming the function to ot_distance. If this is acceptable, I can merge it.

Down the line, I would like this to integrate with aif360.sklearn.metrics.intersection()/difference() instead of how it handles prot_attr currently but I can make a separate PR with those changes and ask for your review.

aif360/metrics/ot_metric.py

aif360/sklearn/detectors/detectors.py

aif360/sklearn/metrics/metrics.py

Illia-Kryvoviaz · 2023-07-21T20:20:32Z

Hello @hoffmansc!
All changes were implemented as you suggested.

hoffmansc · 2023-07-21T21:11:33Z

@Illia-Kryvoviaz It looks like one of the tests is failing. The call below should pass pos_label=fav instead of favorable_value=fav.

AIF360/tests/test_ot_metric.py, line 136 in bb2ec68:

ot_distance(p, q, favorable_value=fav)
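The failure mode can be reproduced with a stand-in function. The stub below is hypothetical and only mimics a signature with a pos_label keyword, as the comment above implies; it is not the AIF360 implementation and its return value is a placeholder.

```python
def ot_distance_stub(y_true, y_pred, pos_label=1):
    """Stand-in exposing a `pos_label` keyword; returns a placeholder."""
    return 0.0


fav = 1

# Passing a keyword that the function does not define raises TypeError.
try:
    ot_distance_stub([1, 0], [1, 1], favorable_value=fav)
except TypeError as err:
    message = str(err)
else:
    message = None

# Using the keyword the function actually defines works.
fixed = ot_distance_stub([1, 0], [1, 1], pos_label=fav)
```

A test that still uses the old keyword name after a rename would fail in this way, which matches the one-line fix requested above.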

Illia-Kryvoviaz · 2023-07-21T21:35:54Z

Yes, you are right @hoffmansc. Now it looks right.

Illia-Kryvoviaz force-pushed the master branch from 6909467 to ac9784a on January 22, 2023 11:25

hoffmansc self-requested a review February 13, 2023 14:58

Illia-Kryvoviaz force-pushed the master branch 2 times, most recently from 8397ee4 to 7271964 on March 24, 2023 13:30

Illia-Kryvoviaz force-pushed the master branch 12 times, most recently from c6c90fd to fef1bc9 on June 7, 2023 14:38

Updated all required files

8e42e67

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

Illia-Kryvoviaz force-pushed the master branch from 105e811 to 8e42e67 on June 7, 2023 14:53

Some minor changes in files and updated tests

7a85647

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

Illia-Kryvoviaz force-pushed the master branch from db7c6db to 7a85647 on June 7, 2023 15:04

Deleted extra prints

0d276aa

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

Illia-Kryvoviaz force-pushed the master branch from 29c53a0 to 0d276aa on June 7, 2023 15:13

Simplifying the ot notebook and correcting some mistypes

a6faacd

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

Illia-Kryvoviaz force-pushed the master branch from ba9e25c to a6faacd on June 10, 2023 17:49

Added more examples to ot notebook

2db5f83

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

Illia-Kryvoviaz force-pushed the master branch from 809b7d4 to 2db5f83 on June 12, 2023 12:10

Update __init__.py

6d77b72

Signed-off-by: Illia-Kryvoviaz <113794017+Illia-Kryvoviaz@users.noreply.github.com>

updated comments, demo_ot_detector.ipynb

9c46f9a

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

Illia-Kryvoviaz added 5 commits July 11, 2023 19:14

ot_detector: removed str arguments

872f112

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

ot_detector: added cost_matrix as a named parameter, minor changes

d280b4e

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

ot_detector: minor changes

aa24084

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

added outputs to demo_ot_detector

f587762

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

ot_detector: changed default scoring to Wasserstein1

2094b47

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

moved OT from detector to metric

47a2678

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

Illia-Kryvoviaz added 2 commits July 14, 2023 17:27

renamed ot_detector to ot_metric

6acd270

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

reworked demo_ot_metric to use aif360.sklearn definition

6155f99

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

hoffmansc requested changes Jul 21, 2023

Illia-Kryvoviaz added 2 commits July 21, 2023 22:05

renamed ot_bias_scan to ot_distance, minor changes

3d3f039

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

detectors.py: reset changes

bb2ec68

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

test_ot_metric: minor changes

10c7ec8

Signed-off-by: Illia-Kryvoviaz <illiakryvoviaz@gmail.com>

hoffmansc approved these changes Jul 23, 2023

hoffmansc merged commit 502ff47 into Trusted-AI:master Jul 23, 2023
9 checks passed

andrewklayk pushed a commit to andrewklayk/AIF360 that referenced this pull request Sep 8, 2023

Add a bias detector based on optimal transport (Trusted-AI#434)

66a2634

divyagaddipati pushed a commit to divyagaddipati/AIF360 that referenced this pull request Sep 22, 2023

Add a bias detector based on optimal transport (Trusted-AI#434)

12a9da6

Signed-off-by: Divya <divyajyothig96@gmail.com>

meghana009 pushed a commit to meghana009/AIF360 that referenced this pull request Sep 22, 2023

Add a bias detector based on optimal transport (Trusted-AI#434)

23dd19d

Signed-off-by: Venkata Meghana Achanta <vachanta@usc.edu> Signed-off-by: meghana009 <meghanaachanta09@gmail.com>
