
Fairness Aware Counterfactuals for Subgroups #457

Merged: 53 commits into Trusted-AI:main on Mar 11, 2024

Conversation

@phantom-duck (Contributor) commented Jun 29, 2023

@hoffmansc

Hello! We would like to contribute the implementation for the paper "Fairness Aware Counterfactuals for Subgroups".

Fairness Aware Counterfactuals for Subgroups (FACTS) is a framework for auditing subgroup fairness through counterfactual explanations. We aim to (a) formulate different aspects of the difficulty individuals in certain subgroups face in achieving recourse, i.e., receiving the desired outcome, either at the micro level (considering members of the subgroup individually) or at the macro level (considering the subgroup as a whole), and (b) introduce notions of subgroup fairness that are robust, if not entirely oblivious, to the cost of achieving recourse.

For the implementation, we expose to the user a set of functions that perform the main steps of our algorithm; these are grouped in the wrapper class FairnessAwareSubgroupCounterfactuals in the facts submodule's __init__.
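As a rough illustration of the intended usage (the module path, constructor arguments, and toy data below are indicative only, not the final API):

```python
# Indicative usage sketch; module path, constructor arguments, and data
# are illustrative, not the final API.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from aif360.sklearn.detectors.facts import FairnessAwareSubgroupCounterfactuals

# Toy data: two features plus a binary protected attribute "sex".
X = pd.DataFrame({"age": [25, 40, 33, 51, 29, 62],
                  "income": [20, 60, 35, 80, 25, 90],
                  "sex": [0, 1, 0, 1, 0, 1]})
y = [0, 1, 0, 1, 0, 1]

clf = LogisticRegression().fit(X, y)

detector = FairnessAwareSubgroupCounterfactuals(clf=clf, prot_attr="sex")
detector.fit(X)                                   # mine subgroups and candidate actions
detector.bias_scan(metric="equal-effectiveness",  # score subgroups with a chosen metric
                   viewpoint="macro")
detector.print_recourse_report()                  # textual explanation of the scores
```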

Closes #456

phantom-duck and others added 18 commits June 29, 2023 15:55
@phantom-duck (Contributor, Author)

Amended the last commit; I had forgotten to sign off.

We are looking forward to your feedback!

@phantom-duck (Contributor, Author) commented Aug 29, 2023

@hoffmansc (code review request)

Hello again! I would just like to remind you, when you get a chance, to take a look at the pull request and let us know of any corrections or improvements that may be needed, or to merge it if you believe it is adequate. In any case, we look forward to your feedback!

@hoffmansc (Collaborator)

Hi. First of all, thanks for contributing; this is great stuff! I have a couple of comments before we get into the specifics of the code.

I see you’ve placed the code under algorithms.postprocessing. This subpackage is for bias mitigation algorithms which transform predictions and return fairer predictions. To me, this seems different from what FACTS is doing. I think there are three places where the contributions from this work should get exposed: sklearn.detectors, sklearn.metrics, and sklearn.explainers (as a side note, we can focus on the aif360.sklearn subpackage, which will be much more straightforward for this work).

explainers is the clearest match in my mind. Currently, there is one other PR under review that will also go here (#450). Basically, your whole reporting function can go here. The idea here is to offer textual or visual explanations for bias measurements/metrics.

Correspondingly, I think there should be a function under metrics which just outputs the unfairness score as a number, i.e., contribution (b) you mention. This could be either the highest score overall or the score for a given subgroup. This way, we can easily compare classifiers directly.

This brings me to the final location, detectors. We currently have one detector, MDSS, and the idea behind this class is to discover subgroups which are maximally advantaged/disadvantaged. From my understanding of FACTS, you similarly generate subgroups and further divide them according to a protected attribute. The outputs here would simply be the subgroup and score, not the entire explanation.

Essentially, I’m trying to think about the various ways a user might want to utilize this work as well as how they would discover it according to the taxonomy we have in AIF360.

Feel free to disagree if I’m misunderstanding anything. Once we settle these high-level questions, I’ll take a closer look at the code and make specific comments.
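To make that split concrete, the kind of layout I'm imagining (purely illustrative; none of these names exist yet):

```python
# Purely illustrative: one hypothetical entry point per subpackage.
from aif360.sklearn.detectors import facts_bias_scan         # subgroup + score
from aif360.sklearn.metrics import facts_unfairness          # single comparable number
from aif360.sklearn.explainers import facts_recourse_report  # textual explanation

subgroup, score = facts_bias_scan(X, clf, prot_attr="sex")   # who is most disadvantaged
worst_score = facts_unfairness(X, clf, prot_attr="sex")      # one number per classifier
print(facts_recourse_report(X, clf, prot_attr="sex"))        # why that score
```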

@phantom-duck (Contributor, Author) commented Sep 4, 2023

Note: we have edited the original comment; we thought that some phrases were a poor choice of words for conveying our thoughts.

Hello again, and thank you very much for the reply!

Firstly, we definitely agree that FACTS is not a good fit for algorithms.postprocessing.

That being said, on a higher level, we think of FACTS primarily as a detector. Although it is tightly connected to the metrics and the respective explanations, the primary goal of our work is to discover subgroups of the population where the model exhibits bias. Our framework implements six detectors:

  • Equal Effectiveness
  • Equal Choice for Recourse
  • Equal Effectiveness within Budget
  • Equal Cost of Effectiveness
  • Fair Effectiveness-Cost Trade-Off
  • Equal (Conditional) Mean Recourse

When someone chooses a detector, the metric of that detector is used to score the unfairness of all available subgroups, which are then ranked in decreasing order of unfairness (more unfair subgroups appear higher in the ranking). On the explanation side, the report explains why each subgroup received its specific unfairness score under the detector's metric; it is there for the interpretability and transparency of our algorithm's results. Thus, each detector has its own metric that scores subgroups and its own explainer that explains the unfairness based on those scores.

Looking forward to your view on the matter! In the meantime, we will make sure to rework all of this before the more detailed review you mentioned.

Credits to @dsachar.
@rahulnair23 left a comment:

Hi @phantom-duck - thanks! This is looking good. Please see some general comments.

Review threads (resolved) on aif360/sklearn/detectors/facts/formatting.py, examples/demo_FACTS.ipynb, and aif360/sklearn/detectors/facts/__init__.py.
On the bias_scan signature:

```python
def bias_scan(
    self,
    metric: str = "equal-effectiveness",
    viewpoint: str = "macro",
```

What does viewpoint mean?

@phantom-duck (Contributor, Author) replied:
It refers to the notions of "macro viewpoint" and "micro viewpoint" defined in Section 2.2 of the paper. As a short explanation, consider a set of actions A and a subgroup (cohort / set of individuals) G. Metrics with the macro-viewpoint interpretation are constrained to always apply one action from A to the entire G, while metrics with the micro interpretation are allowed to give each individual in G the minimum-cost action from A that changes that individual's class.
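As a toy sketch of the distinction (all names here are hypothetical, not from our implementation):

```python
# Toy sketch only, not the actual implementation. A is a list of candidate
# actions; G is the subgroup; flips(a, x) says whether applying action a to
# individual x changes the model's prediction; cost(a) is the action's cost.

def macro_effectiveness(A, G, flips):
    # macro: a single action from A must be applied to the entire subgroup,
    # so we take the best single action's flip rate over G
    return max(sum(flips(a, x) for x in G) / len(G) for a in A)

def micro_effectiveness(A, G, flips, cost, budget):
    # micro: each individual may pick the cheapest action in A that works
    # for them; we count how many achieve recourse within the budget
    def min_recourse_cost(x):
        costs = [cost(a) for a in A if flips(a, x)]
        return min(costs) if costs else float("inf")
    return sum(min_recourse_cost(x) <= budget for x in G) / len(G)
```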

I should mention, though, that not all combinations of metric and viewpoint are valid; e.g., "Equal Choice for Recourse" only has a macro interpretation. Do you think this should be reflected in the code somehow?

Collaborator reply:

Could you just add a sentence to this effect in the docstring?

@hoffmansc (Collaborator)

Thanks for reviewing, @rahulnair23.

One other thing, @phantom-duck: would it be possible to combine the fit(), bias_scan(), and print_recourse_report() methods into a single function, without the need for a class? This would better match the detectors API. Right now, I don't really see the point of making three separate calls when only one has outputs. Is it likely that users would run multiple metrics after fitting? The function could have a verbose flag to show the progress bars and a print_recourse_report flag to show the report; otherwise it would just output the highest-scoring subgroup and score, like MDSS. If it's easier, you could also keep the class and implement a wrapper function which does this.
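Roughly, something like this (a sketch only; all names and signatures are indicative, assuming the wrapper class is importable as FACTS):

```python
def FACTS_bias_scan(X, clf, prot_attr, metric="equal-effectiveness",
                    viewpoint="macro", verbose=False,
                    print_recourse_report=False, **facts_params):
    # Sketch: one call that fits, scans, and optionally prints the report.
    detector = FACTS(clf=clf, prot_attr=prot_attr, **facts_params)
    detector.fit(X, verbose=verbose)           # verbose toggles the progress bars
    detector.bias_scan(metric=metric, viewpoint=viewpoint)
    if print_recourse_report:
        detector.print_recourse_report()
    return detector.most_biased_subgroups()    # hypothetical accessor: subgroup(s) + score, like MDSS
```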

@phantom-duck (Contributor, Author) commented Nov 17, 2023

Hello to both! Thank you for the comments, and sorry for the delayed response.

@rahulnair23 Mainly, thank you very much for the detailed review! I have replied to the comments individually, with the exception of the demo notebook, where the points you raise are interesting but seem trickier to me (especially the second), so I will get back to you on that soon.

@hoffmansc Ok, thanks for this! However, it is not perfectly clear to me what the structure of the detectors API is exactly. For example, I do not see how the signature of the MDSS function sklearn.detectors.detectors.bias_scan() could be applied in our case. The way I see it, the function you describe would mainly need to take as arguments a dataset X and the trained model clf, plus some other more specific parameters. Would something along these lines be okay for what you are thinking?

More importantly, yes, exactly: in our mind, it is entirely possible that the user would run multiple metrics after fitting! Firstly, because fitting takes much longer to run, while the metrics run almost instantaneously. Secondly, because there are several metrics, each with a few extra parameters; we would expect (at least some) users to choose a few different overall parameterizations, based on the metric descriptions we give in the demo (or in the paper) and on domain intuition, and inspect their results. These could very well yield different unfair subgroups.
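For example, reusing one fitted detector across metrics (the metric-name strings below are only indicative, patterned after the "equal-effectiveness" default):

```python
detector.fit(X)  # the expensive step, run once
for metric in ("equal-effectiveness",
               "equal-cost-of-effectiveness",
               "fair-effectiveness-cost-trade-off"):
    # scoring is near-instantaneous compared to fit()
    detector.bias_scan(metric=metric, viewpoint="macro")
    detector.print_recourse_report()
```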

Because of this, I think it best to keep the class and, of course, implement an additional wrapper function as you suggest.

Thank you both again very much!

I will get back to you soon with the modifications and/or further comments. Meanwhile, it goes without saying that we are always available for further discussion.

@hoffmansc (Collaborator)

> Ok, thanks for this! However, it is not perfectly clear to me what the structure of the detectors API is exactly. For example, I do not see how the signature of the MDSS function sklearn.detectors.detectors.bias_scan() could be applied in our case. The way I see it, the function you describe would mainly need to take as arguments a dataset X and the trained model clf, plus some other more specific parameters. Would something along these lines be okay for what you are thinking?

Whatever you need as inputs is fine. I was more referring to the output and function vs. class method(s) paradigm matching MDSS.

As a separate note, is there a more specific/descriptive name than bias_scan? Just to differentiate it from the existing detector. Or we could rename that one to MDSS_bias_scan and this one to FACTS_bias_scan or similar, I suppose.

> More importantly, yes, exactly: in our mind, it is entirely possible that the user would run multiple metrics after fitting! Firstly, because fitting takes much longer to run, while the metrics run almost instantaneously. Secondly, because there are several metrics, each with a few extra parameters; we would expect (at least some) users to choose a few different overall parameterizations, based on the metric descriptions we give in the demo (or in the paper) and on domain intuition, and inspect their results. These could very well yield different unfair subgroups.
>
> Because of this, I think it best to keep the class and, of course, implement an additional wrapper function as you suggest.

This makes sense, thanks.

@phantom-duck (Contributor, Author) commented Nov 22, 2023

@hoffmansc Hello again!

Ok, that is great, then I will add the wrapper function and get back to you!

As for the name of bias_scan, I do not have anything much better in mind. The original intention was to match the name of the MDSS function, but in a separate class. Perhaps find_most_biased_subgroups or something along those lines would be more informative? I will think about it a bit more. Nonetheless, FACTS_bias_scan also seems perfectly fine to me; I leave it up to you.

They define at one point a global warning filter, which then influences all DeprecationWarnings.

easy2hard_name_map was a dictionary of aliases for metric names, used in the wrapper class of the method, FACTS. Being unnecessary and slightly confusing, it was removed, and the names were changed accordingly.

Added the function FACTS_bias_scan as an alternative API, closer to the API of the other detectors.
@phantom-duck (Contributor, Author)

@hoffmansc The wrapper function is now implemented! I have also added a small example invocation in the FACTS_demo notebook. Check it out when you get the chance, and let me know of any other changes that need to be made.

PS: for now I have named it FACTS_bias_scan, but, as I said, other suggestions are more than welcome.
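For reference, an invocation looks roughly like this (the import path and exact argument names are indicative; see the notebook for the actual snippet):

```python
from aif360.sklearn.detectors import FACTS_bias_scan  # import path indicative

# Indicative call; argument names may differ from the merged API.
FACTS_bias_scan(X=X, clf=clf, prot_attr="sex",
                metric="equal-effectiveness", viewpoint="macro",
                verbose=True, print_recourse_report=True)
```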

Several points of the final output of the method were not clear, e.g. why the unfairness score equals a certain value.
@phantom-duck (Contributor, Author) commented Feb 8, 2024

@hoffmansc @rahulnair23 Hello again to both! Just a quick reminder, whenever you get a chance, to review the latest updates to the pull request and let me know of any necessary changes. We remain at your disposal for anything further.

PS: albeit a little late, have a healthy, happy, and fruitful 2024! :)

@rahulnair23
Looks good to me.

@rahulnair23
Hi @hoffmansc - is this good to merge?

@hoffmansc (Collaborator) left a comment:

Other than these tiny things, it looks good to me too.

Two review threads (resolved) on aif360/sklearn/detectors/__init__.py.
Specifically, re-exported the function from sklearn.detectors.__init__.
@phantom-duck (Contributor, Author)

@hoffmansc

Hello! That is great, thank you!

I have addressed your small comments. I believe everything should be okay, but, as I mentioned above for the third one, perhaps you should take a look to make sure it is sufficient.

Other than that, I have one last remark I would like to ask you about: the new wrapper function FACTS_bias_scan currently lacks any documentation. Until now I assumed this was OK, because it simply wraps the FACTS class and its methods. Is this true, or should we add a little something?

@hoffmansc (Collaborator)

> Other than that, I have one last remark I would like to ask you about: the new wrapper function FACTS_bias_scan currently lacks any documentation. Until now I assumed this was OK, because it simply wraps the FACTS class and its methods. Is this true, or should we add a little something?

Yes, that's a good point. It would be great if you can add that. It should basically be copy-paste. Thanks!

@phantom-duck (Contributor, Author) commented Mar 10, 2024

@hoffmansc Hello again!

This is also done now, so I believe everything should be in order. We look forward to seeing our contribution added to the aif360 code base! Of course, we remain at your disposal for anything further in the meantime.

@hoffmansc merged commit a4d18f2 into Trusted-AI:main on Mar 11, 2024. 9 checks passed.
@phantom-duck (Contributor, Author) commented Mar 15, 2024

@hoffmansc @rahulnair23

Hello again. Firstly, thank you both very much for your guidance and cooperation in bringing our contribution to the toolkit!

We also have one question: we would like to be able to link to the FACTS section from the official aif360 documentation. It will be added there eventually, right? Could you give an estimate of when this will happen?

Thank you again very much for everything! :)

@phantom-duck deleted the master branch April 1, 2024.