DOC Changing the dataset from Adult to Diabetes Hospital in Getting Started (#1310)

Co-authored-by: Hilde Weerts <24417440+hildeweerts@users.noreply.github.com>
Co-authored-by: Roman Lutz <romanlutz13@gmail.com>
Co-authored-by: MiroDudik <mdudik@gmail.com>
Co-authored-by: Richard Edgar (Microsoft) <riedgar@microsoft.com>
5 people committed Apr 4, 2024
1 parent ee71adc commit 23789b2
Showing 5 changed files with 151 additions and 94 deletions.
167 changes: 107 additions & 60 deletions docs/quickstart.rst
@@ -68,7 +68,8 @@ is about binary classification, but we similarly support regression.
Prerequisites
^^^^^^^^^^^^^

In order to run the code samples in the Quickstart tutorial, you need to
install the following dependencies:

.. code-block:: bash
@@ -77,35 +78,52 @@ In order to run the code samples in the Quickstart tutorial, you need to install
Loading the dataset
^^^^^^^^^^^^^^^^^^^

For this example, we use a `clinical dataset <https://archive.ics.uci.edu/dataset/296/diabetes+130-us+hospitals+for+years+1999-2008>`_
of hospital readmissions over a ten-year period (1999-2008) for
diabetic patients across 130 different hospitals in the U.S. This scenario
builds upon prior research on how racial disparities impact health care
resource allocation in the U.S. For an in-depth analysis of this dataset,
review the `SciPy tutorial <https://github.com/fairlearn/talks/tree/main/2021_scipy_tutorial>`_
that the Fairlearn team presented in 2021.

We will use machine learning to predict whether an individual in the dataset
is readmitted to the hospital within 30 days of hospital release.
A hospital readmission within 30 days can be viewed as a proxy for the
patient having needed more assistance at release time.
In the next section, we build a classification model to accomplish the
prediction task.

.. doctest:: quickstart

>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from fairlearn.datasets import fetch_diabetes_hospital
>>> data = fetch_diabetes_hospital(as_frame=True)
>>> X = data.data
>>> X.drop(columns=["readmitted", "readmit_binary"], inplace=True)
>>> y = data.target
>>> X_ohe = pd.get_dummies(X)
>>> race = X['race']
>>> race.value_counts()
race
Caucasian 76099
AfricanAmerican 19210
Unknown 2273
Hispanic 2037
Other 1506
Asian 641
Name: count, dtype: int64
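
As a quick, illustrative check (not part of the doctest above), the
prevalence of the positive label provides useful context for the decision
threshold chosen later:

.. code-block:: python

    # Share of patients readmitted within 30 days (positive label);
    # exact values depend on the fetched data.
    print(y.value_counts(normalize=True))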

.. figure:: _images/sphx_glr_plot_quickstart_counts_001.png
:target: auto_examples/plot_quickstart_counts.html
:align: center


Evaluating fairness-related metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

First, Fairlearn provides fairness-related metrics that can be compared
between groups and for the overall population. Using existing metric
definitions from
`scikit-learn <https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics>`_
@@ -115,38 +133,57 @@ we can evaluate metrics for subgroups within the data as below:
.. doctest:: quickstart
    :options: +NORMALIZE_WHITESPACE

>>> from fairlearn.metrics import MetricFrame
>>> from sklearn.metrics import accuracy_score, balanced_accuracy_score
>>> from sklearn.tree import DecisionTreeClassifier
>>>
>>> from sklearn.model_selection import train_test_split
>>> np.random.seed(42) # set seed for consistent results
>>> X_train, X_test, y_train, y_test, A_train, A_test = train_test_split(X_ohe, y, race, random_state=123)
>>> classifier = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4)
>>> classifier.fit(X_train, y_train)
DecisionTreeClassifier(...)
>>> y_pred = (classifier.predict_proba(X_test)[:,1] >= 0.1)
>>> mf = MetricFrame(metrics=accuracy_score, y_true=y_test, y_pred=y_pred, sensitive_features=A_test)
>>> mf.overall
0.514...
>>> mf.by_group
race
AfricanAmerican 0.530935
Asian 0.658683
Caucasian 0.503535
Hispanic 0.612524
Other 0.591549
Unknown 0.574576
Name: accuracy_score, dtype: float64

Note that our decision threshold for positive predictions is 0.1.
In practice, this threshold would be driven by risk or capacity
considerations. For this example, the threshold of 0.1 corresponds to
saying that a 10% risk of readmission is viewed as sufficient for
referral to a post-discharge care program.
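
As an illustrative sketch of how the threshold interacts with capacity
(reusing the ``classifier`` and ``X_test`` defined above; the printed rates
depend on the trained model), we can sweep a few candidate thresholds:

.. code-block:: python

    # Higher thresholds refer fewer patients to the post-discharge program.
    proba = classifier.predict_proba(X_test)[:, 1]  # predicted readmission risk
    for threshold in (0.1, 0.25, 0.5):
        print(f"threshold {threshold:.2f}: referral rate {(proba >= threshold).mean():.1%}")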
Fairlearn has many standard metrics built in, such as the false negative
rate, i.e., the rate at which the model produces negative classifications
when the true value of the outcome label is positive.
In the context of this dataset, the false negative rate captures the
individuals who in reality would be readmitted to the hospital, but for
whom the model does not predict that outcome.
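
For intuition, the false negative rate can also be computed directly from
the confusion matrix; this sketch should agree with the ``MetricFrame``
result below:

.. code-block:: python

    # FNR = FN / (FN + TP), computed from the raw confusion matrix.
    from sklearn.metrics import confusion_matrix

    tn, fp, fn, tp = confusion_matrix(y_test, y_pred.astype(int)).ravel()
    print(fn / (fn + tp))  # matches fairlearn.metrics.false_negative_rate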

.. doctest:: quickstart
:options: +NORMALIZE_WHITESPACE

>>> from fairlearn.metrics import false_negative_rate
>>> mf = MetricFrame(metrics=false_negative_rate, y_true=y_test, y_pred=y_pred, sensitive_features=A_test)
>>> mf.overall
0.309...
>>> mf.by_group
race
AfricanAmerican 0.296089
Asian 0.500000
Caucasian 0.308555
Hispanic 0.307692
Other 0.333333
Unknown 0.420000
Name: false_negative_rate, dtype: float64
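
A :class:`fairlearn.metrics.MetricFrame` can also summarize the disparity
as a single number, for example the largest between-group gap (shown here
as a sketch; the value depends on the data and model):

.. code-block:: python

    # Largest between-group difference in false negative rate, plus the
    # highest and lowest group values themselves.
    print(mf.difference(method='between_groups'))
    print(mf.group_max(), mf.group_min())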

Fairlearn also allows us to quickly plot these metrics from the
:class:`fairlearn.metrics.MetricFrame`
@@ -156,7 +193,7 @@ Fairlearn also allows us to quickly plot these metrics from the
:start-after: # Analyze metrics using MetricFrame
:end-before: # Customize plots with ylim

.. figure:: _images/sphx_glr_plot_quickstart_001.png
:target: auto_examples/plot_quickstart.html
:align: center

@@ -167,35 +204,45 @@ Mitigating disparity
If we observe disparities between groups we may want to create a new model
while specifying an appropriate fairness constraint. Note that the choice of
fairness constraints is crucial for the resulting model, and varies based on
application context. Since both false positives and false negatives are
relevant for fairness in this hypothetical example, we can attempt to
mitigate the observed disparity using the fairness constraint called
Equalized Odds, which bounds disparities in both types of error. In real-world
applications we need to be mindful of the sociotechnical context when making
such decisions. The Exponentiated Gradient mitigation technique fits the
provided classifier using Equalized Odds as the constraint and a suitably
weighted Error Rate as the objective, leading to a vastly reduced
difference in accuracy:

.. doctest:: quickstart
:options: +NORMALIZE_WHITESPACE

>>> np.random.seed(0) # set seed for consistent results with ExponentiatedGradient
>>>
>>> from fairlearn.reductions import ErrorRate, EqualizedOdds, ExponentiatedGradient
>>> objective = ErrorRate(costs={'fp': 0.1, 'fn': 0.9})
>>> constraint = EqualizedOdds(difference_bound=0.01)
>>> classifier = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4)
>>> mitigator = ExponentiatedGradient(classifier, constraint, objective=objective)
>>> mitigator.fit(X_train, y_train, sensitive_features=A_train)
ExponentiatedGradient(...)
>>> y_pred_mitigated = mitigator.predict(X_test)
>>> mf_mitigated = MetricFrame(metrics=accuracy_score, y_true=y_test, y_pred=y_pred_mitigated, sensitive_features=A_test)
>>> mf_mitigated.overall
0.5251...
>>> mf_mitigated.by_group
race
AfricanAmerican 0.524358
Asian 0.562874
Caucasian 0.525588
Hispanic 0.549902
Other 0.478873
Unknown 0.511864
Name: accuracy_score, dtype: float64
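
Because Equalized Odds constrains both error rates, it is also natural to
re-check the false negative rate after mitigation. A sketch reusing the
objects defined above (actual numbers depend on the run):

.. code-block:: python

    # False negative rate by group for the mitigated predictor.
    mf_fnr_mitigated = MetricFrame(
        metrics=false_negative_rate,
        y_true=y_test,
        y_pred=y_pred_mitigated,
        sensitive_features=A_test,
    )
    print(mf_fnr_mitigated.by_group)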

Note that :class:`ExponentiatedGradient` does not have a `predict_proba`
method, but we can adjust the target decision threshold by specifying
(possibly unequal) costs for false positives and false negatives.
In our example, we use a cost of 0.1 for false positives and 0.9 for
false negatives. Without fairness constraints, this would exactly
correspond to referring patients with a readmission risk of 10% or higher
(as we used earlier).
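
The correspondence between these costs and the 10% threshold follows from
minimizing expected cost: predicting positive is preferable when
``c_fn * p >= c_fp * (1 - p)``, i.e., when ``p >= c_fp / (c_fp + c_fn)``.
A one-line check of this arithmetic:

.. code-block:: python

    # With c_fp = 0.1 and c_fn = 0.9, the implied decision threshold is 0.1.
    c_fp, c_fn = 0.1, 0.9
    print(c_fp / (c_fp + c_fn))  # 0.1, the 10% risk threshold used earlier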


What's next?
2 changes: 1 addition & 1 deletion docs/user_guide/installation_and_version_guide/v0.10.0.rst
@@ -4,4 +4,4 @@ v0.10.0
* Added bootstrapping to :class:`MetricFrame`, along with a new section
in the user guide
* Added intersectionality in mental health care example notebook.
* Added user guide for :class:`fairlearn.postprocessing.ThresholdOptimizer`
22 changes: 14 additions & 8 deletions examples/plot_quickstart.py
@@ -8,8 +8,9 @@
"""

import pandas as pd
from fairlearn.datasets import fetch_diabetes_hospital
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# %%
@@ -21,14 +22,19 @@
selection_rate,
)

data = fetch_diabetes_hospital(as_frame=True)
X = data.data
X.drop(columns=["readmitted", "readmit_binary"], inplace=True)
y_true = data.target
X_ohe = pd.get_dummies(X)
race = X['race']

X_train, X_test, y_train, y_test, A_train, A_test = train_test_split(
    X_ohe, y_true, race, random_state=123
)

classifier = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Analyze metrics using MetricFrame
metrics = {
@@ -40,7 +46,7 @@
"count": count,
}
metric_frame = MetricFrame(
metrics=metrics, y_true=y_test, y_pred=y_pred, sensitive_features=A_test
)
metric_frame.by_group.plot.bar(
subplots=True,
29 changes: 29 additions & 0 deletions examples/plot_quickstart_counts.py
@@ -0,0 +1,29 @@
# Copyright (c) Microsoft Corporation and Fairlearn contributors.
# Licensed under the MIT License.

"""
=============
Value counts
=============
"""
from fairlearn.datasets import fetch_diabetes_hospital
import matplotlib.pyplot as plt


# %%
fig, ax = plt.subplots()

data = fetch_diabetes_hospital(as_frame=True)
X = data.data
X.drop(columns=["readmitted", "readmit_binary"], inplace=True)
y_true = data.target
race = X['race']

df = race.value_counts().reset_index()

ax.bar(df["race"], df["count"])
ax.set_title('Counts by race')
ax.tick_params(axis='x', labelrotation=45)

plt.tight_layout()
plt.show()
25 changes: 0 additions & 25 deletions examples/plot_quickstart_selection_rate.py

This file was deleted.
