Multidimensional subset scanning (MDSS) for bias in classifiers #238
Conversation
…ias in classifiers. Includes Bernoulli, Poisson and Berk Jones scoring functions, and example notebooks. Signed-off-by: Victor Akinwande <victor.akinwande1@ibm.com>
This pull request introduces 2 alerts when merging d4dd16c into 746e763 - view on LGTM.com
Looks great. Could you please look at the minor comments (mostly documentation related) and address them?
def bisection_q_mle(score_function: ScoringFunction, observed_sum: float, probs: np.array, **kwargs):
    """
    Computes the q which maximizes score (q_mle).
    """
What is q?
The alternative hypothesis that the MDSS optimization finds evidence for is that there is a multiplicative factor q in the odds for a subgroup of the data. For a subset of records, we can obtain the MLE of q (whose definition depends on the scoring function) and use it to score that subset.
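To make the q-MLE idea concrete, here is a minimal sketch of a bisection-based MLE for the Bernoulli case only. The function name `bernoulli_q_mle` and the bounds/tolerance defaults are my own for illustration; this is not the AIF360 implementation, just the underlying root-finding idea, assuming the standard parameterization where multiplying the odds by q turns a predicted probability p into q*p / (1 - p + q*p).

```python
import numpy as np

def bernoulli_q_mle(observed_sum, probs, tol=1e-6, q_max=1e3):
    """Sketch: find the q whose adjusted probabilities sum to the observed count.

    Under the Bernoulli alternative, multiplying the odds by q turns each
    predicted probability p into q*p / (1 - p + q*p).  The MLE of q is the
    root of  sum_i q*p_i / (1 - p_i + q*p_i) - observed_sum = 0, which is
    monotone increasing in q, so plain bisection suffices.
    """
    def expectation_gap(q):
        return np.sum(q * probs / (1 - probs + q * probs)) - observed_sum

    lo, hi = 1e-6, q_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expectation_gap(mid) < 0:
            lo = mid  # adjusted expectation still below the observed count
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

When the observed count equals the sum of the predicted probabilities, the root is q = 1 (no bias); a subgroup with more positives than predicted yields q > 1.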
from sklearn.metrics import make_scorer as _make_scorer, recall_score
from sklearn.metrics import multilabel_confusion_matrix
from sklearn.neighbors import NearestNeighbors
from sklearn.utils import check_X_y
from sklearn.exceptions import UndefinedMetricWarning

from aif360.sklearn.utils import check_groups

from aif360.metrics.mdss.ScoringFunctions import Bernoulli
Why is only Bernoulli imported? What about Berk Jones and Poisson? Are they not incorporated yet?
The original paper on bias scan of classifiers used Bernoulli. For a straightforward application of MDSS as a classification metric, Bernoulli makes the most sense. Advanced users will be able to use other scoring functions as they please.
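For readers unfamiliar with the Bernoulli statistic, here is a sketch of the likelihood-ratio score it is based on. The name `bernoulli_score` is hypothetical (not the AIF360 API); the formula assumes the same odds-multiplier parameterization described above, where the score compares the alternative (odds scaled by q) against the null (q = 1).

```python
import numpy as np

def bernoulli_score(observed_sum, probs, q):
    """Sketch of the Bernoulli log-likelihood-ratio score for a subset.

    score = observed_sum * log(q) - sum_i log(1 - p_i + q * p_i)

    At q = 1 the alternative coincides with the null, so the score is 0;
    it is positive when the odds multiplier q explains the subset's
    outcomes better than the classifier's own predicted probabilities.
    """
    return observed_sum * np.log(q) - np.sum(np.log(1 - probs + q * probs))
```

Evaluating this score at the MLE of q (from the bisection above, or any root finder) gives the subset's anomalousness; the scan then searches for the subgroup maximizing it.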
Multidimensional subset scanning (MDSS) for bias in classifiers
This PR includes an implementation of bias scan ("Identifying Significant Predictive Bias in Classifiers", https://arxiv.org/abs/1611.08292) as a classification metric, along with example notebooks.
The underlying optimization method, MDSS, can be used with both parametric (e.g., Bernoulli and Poisson) and non-parametric (e.g., Berk Jones) scoring functions. All three are implemented in this PR.
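To illustrate end to end how a Bernoulli-scored scan flags a biased subgroup, here is a toy sketch on invented data. Everything here (the data, the `subgroup_score` helper, the crude grid search standing in for a proper MLE) is illustrative only, not the PR's implementation or the search strategy MDSS actually uses.

```python
import numpy as np

# Hypothetical toy data: classifier probabilities, observed labels, and one
# binary feature defining two candidate subgroups.  Group 1 receives low
# predicted probabilities but mostly positive outcomes (under-prediction).
probs = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
labels = np.array([1, 1, 1, 1, 1, 0])
group = np.array([0, 0, 0, 1, 1, 1])

def subgroup_score(mask):
    """Score one subgroup with the Bernoulli likelihood-ratio statistic,
    maximizing over q by a crude grid search (for brevity only)."""
    p = probs[mask]
    observed = labels[mask].sum()
    qs = np.linspace(0.1, 20, 2000)
    scores = observed * np.log(qs) - np.array(
        [np.log(1 - p + q * p).sum() for q in qs])
    return scores.max()

for g in (0, 1):
    print(f"group {g}: score = {subgroup_score(group == g):.3f}")
```

The under-predicted subgroup scores markedly higher, which is exactly the signal the scan surfaces; the real MDSS search avoids brute force over subgroups by exploiting the score functions' structure.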