Multidimensional subset scanning (MDSS) for bias in classifiers #238

Merged
merged 8 commits into from
Jun 4, 2021

aknvictor

This PR includes an implementation of bias scan, from "Identifying Significant Predictive Bias in Classifiers" (https://arxiv.org/abs/1611.08292), as a classification metric, along with example notebooks.

The underlying optimization method, MDSS, can be used with both parametric (e.g., Bernoulli and Poisson) and non-parametric (e.g., Berk-Jones) scoring functions. All three are implemented in this PR.
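As a rough illustration of the parametric vs. non-parametric distinction, here is a minimal sketch of a Poisson score and a Berk-Jones statistic. It assumes the textbook forms from the subset-scanning literature, not necessarily the exact code in this PR, and the function names `poisson_q_mle`, `poisson_score`, and `berk_jones_score` are hypothetical. The Poisson score compares an observed count C with an expected count B (here the sum of predicted probabilities) through a multiplicative factor q, with the closed-form maximizer q = C/B. The Berk-Jones statistic makes no distributional assumption: it compares the fraction of p-values at or below a threshold alpha with the fraction alpha expected under the null.

```python
import math

import numpy as np


def poisson_q_mle(observed_sum: float, probs: np.ndarray) -> float:
    # Closed-form MLE of the multiplicative factor q: observed / expected.
    return observed_sum / probs.sum()


def poisson_score(observed_sum: float, probs: np.ndarray, q: float) -> float:
    # Log-likelihood ratio of H1 (expected counts scaled by q) vs. H0 (q = 1):
    # C * log(q) + B * (1 - q), where B is the expected count.
    expected = probs.sum()
    return observed_sum * math.log(q) + expected * (1.0 - q)


def _kl_term(x: float, y: float) -> float:
    # x * log(x / y), using the convention 0 * log 0 = 0.
    return 0.0 if x == 0 else x * math.log(x / y)


def berk_jones_score(p_values: np.ndarray, alpha: float) -> float:
    # Non-parametric: N * KL(N_alpha / N || alpha), where N_alpha is the number
    # of p-values <= alpha among the N records in the subset.
    n = len(p_values)
    frac = np.count_nonzero(p_values <= alpha) / n
    if frac <= alpha:
        return 0.0  # no excess of small p-values, so no evidence for H1
    return n * (_kl_term(frac, alpha) + _kl_term(1.0 - frac, 1.0 - alpha))
```

This sketch is one-sided: it only rewards an excess (q > 1, or more small p-values than expected). A scan for under-prediction would score the opposite tail.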

Victor Akinwande added 2 commits April 20, 2021 10:52
…ias in classifiers. Includes Bernoulli, Poisson and Berk Jones scoring functions, and example notebooks.

Signed-off-by: Victor Akinwande <victor.akinwande1@ibm.com>
…ias in classifiers. Includes Bernoulli, Poisson and Berk Jones scoring functions, and example notebooks.

Signed-off-by: Victor Akinwande <victor.akinwande1@ibm.com>
@lgtm-com

lgtm-com bot commented Apr 20, 2021

This pull request introduces 2 alerts when merging d4dd16c into 746e763 - view on LGTM.com

new alerts:

  • 1 for First parameter of a method is not named 'self'
  • 1 for Unused local variable

Victor Akinwande added 2 commits April 20, 2021 13:17
Signed-off-by: Victor Akinwande <victor.akinwande1@ibm.com>
Signed-off-by: Victor Akinwande <victor.akinwande1@ibm.com>
Signed-off-by: Victor Akinwande <victor.akinwande1@ibm.com>
Collaborator

@nrkarthikeyan nrkarthikeyan left a comment

Looks great. Could you please look at the minor comments and address them? They are mostly documentation-related.


def bisection_q_mle(score_function: ScoringFunction, observed_sum: float, probs: np.array, **kwargs):
    """
    Computes the q which maximizes score (q_mle).
Collaborator

What is q?

Author

The alternative hypothesis for which the MDSS optimization finds evidence is that the odds for some subgroup of the data are scaled by a multiplicative factor q. For any subset of records, we can compute the maximum-likelihood estimate of q (q_mle), whose form depends on the scoring function, and use it to score that subset.
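To make q concrete for the Bernoulli case, here is a minimal standalone sketch. It is an assumption-based illustration, not the PR's `bisection_q_mle`; the names `bernoulli_score` and `bernoulli_q_mle` and the default search bracket are hypothetical. Under the alternative, each record's odds p/(1 - p) are multiplied by q, so the log-likelihood ratio of a subset is observed_sum * log(q) - sum(log(1 - p + q*p)). Setting its derivative to zero gives sum(q*p / (1 - p + q*p)) = observed_sum. The left side increases with q, so bisection finds the root.

```python
import numpy as np


def bernoulli_score(observed_sum: float, probs: np.ndarray, q: float) -> float:
    # Log-likelihood ratio of H1 (odds multiplied by q) vs. H0 (q = 1).
    return observed_sum * np.log(q) - np.log(1.0 - probs + q * probs).sum()


def _adjusted_sum(probs: np.ndarray, q: float) -> float:
    # Sum of the q-adjusted probabilities q*p / (1 - p + q*p); increasing in q.
    return (q * probs / (1.0 - probs + q * probs)).sum()


def bernoulli_q_mle(observed_sum: float, probs: np.ndarray,
                    q_lo: float = 1e-6, q_hi: float = 1e6,
                    rel_tol: float = 1e-10, max_iter: int = 200) -> float:
    # Solve _adjusted_sum(probs, q) == observed_sum by bisection.
    # A finite, positive MLE exists only if 0 < observed_sum < len(probs);
    # the bracket [q_lo, q_hi] is a hypothetical default that must contain it.
    if not 0 < observed_sum < len(probs):
        raise ValueError("q_mle is not finite and positive for this observed_sum")
    for _ in range(max_iter):
        q_mid = np.sqrt(q_lo * q_hi)  # bisect in log space: q spans many orders of magnitude
        if _adjusted_sum(probs, q_mid) < observed_sum:
            q_lo = q_mid  # too few expected positives at q_mid, so q must grow
        else:
            q_hi = q_mid
        if q_hi / q_lo < 1.0 + rel_tol:
            break
    return float(np.sqrt(q_lo * q_hi))
```

For example, with four records predicted at p = 0.5 and three observed positives, `bernoulli_q_mle(3, np.full(4, 0.5))` is approximately 3. The subgroup's observed odds (3:1) are three times the predicted odds (1:1), and the score at that q is positive, whereas it is exactly 0 at q = 1.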

aif360/metrics/mdss_classification_metric.py (2 resolved review threads)
from sklearn.metrics import make_scorer as _make_scorer, recall_score
from sklearn.metrics import multilabel_confusion_matrix
from sklearn.neighbors import NearestNeighbors
from sklearn.utils import check_X_y
from sklearn.exceptions import UndefinedMetricWarning

from aif360.sklearn.utils import check_groups

from aif360.metrics.mdss.ScoringFunctions import Bernoulli
Collaborator

Why is only Bernoulli imported? What about Berk-Jones and Poisson? Are they not incorporated yet?

Author

The original bias scan paper for classifiers used the Bernoulli scoring function, so for a straightforward application of MDSS as a classification metric, Bernoulli makes the most sense. Advanced users can still use the other scoring functions as they see fit.

Signed-off-by: Victor Akinwande <victor.akinwande1@ibm.com>
Victor Akinwande added 2 commits June 3, 2021 17:15
Signed-off-by: Victor Akinwande <victor.akinwande1@ibm.com>
Signed-off-by: Victor Akinwande <victor.akinwande1@ibm.com>
@nrkarthikeyan nrkarthikeyan merged commit a6c3942 into Trusted-AI:master Jun 4, 2021
Illia-Kryvoviaz pushed a commit to Illia-Kryvoviaz/AIF360 that referenced this pull request Jun 7, 2023
Multidimensional subset scanning (MDSS) for bias in classifiers
2 participants