---
title: "Limits of the Quantitative Approach to Bias and Fairness"
bibliography: refs.bib
---

## Introduction

In a 2022 speech at Princeton University, Arvind Narayanan asserted that "currently quantitative methods are primarily used to justify the status quo. I would argue that they do more harm than good" [@narayanan2022limits, 25]. This provocative claim challenges foundational assumptions about how researchers study and address discrimination in algorithmic systems. As data-driven systems increasingly influence critical decisions in hiring, lending, and criminal justice, the methods we use to evaluate their fairness have profound consequences. This essay examines Narayanan's position in conversation with other scholarly perspectives to assess whether quantitative approaches to discrimination truly cause more harm than good.

By analyzing Narayanan's critique alongside key scholarly works including _Fairness and Machine Learning_ [@barocasFairnessMachineLearning2023], _Data Feminism_ [@dignazioDataFeminism2023], and other contributions to the field, I will evaluate the strengths and limitations of quantitative methods in addressing algorithmic bias. Through discussion of both successful and disappointing case studies, I will argue that while Narayanan's critique identifies crucial shortcomings in current practice, quantitative methods remain essential tools for fairness work when deployed with appropriate critical awareness and integrated into more holistic frameworks for understanding discrimination.

## Narayanan's Position

Narayanan's critique of quantitative methods centers on how they can obscure rather than reveal discrimination. He outlines seven key limitations in current quantitative approaches to discrimination:

First, the null hypothesis in quantitative research typically assumes no discrimination exists, placing the burden of proof on those claiming discrimination. This framing is "not a logical inevitability... [but] a choice" [@narayanan2022limits, 7] that inherently favors the status quo. When researchers assume no discrimination exists until proven otherwise, they create a structural barrier to detecting bias.

Second, most quantitative methods rely on snapshot datasets that fail to capture how discrimination compounds over time. Using a mathematical model, Narayanan demonstrates how "a 2.5% difference in quarterly performance reviews" can compound over 20 years to create "a 7-fold difference" in CEO demographics [@narayanan2022limits, 9-10]. Such subtle discrimination falls "far below the threshold that's detectable by quantitative methods" in a single timeframe.

Third, data about minoritized groups are often collected by the very institutions suspected of discrimination. This creates conflicts of interest where organizations control what data are collected and released, potentially hiding their own biases.

Fourth, quantitative research tends to advance by "explaining away discrimination" [@narayanan2022limits, 12]. Academic incentives reward researchers who find omitted variables that can account for disparities, effectively controlling for the attributes that constitute discrimination itself.

Fifth, quantitative approaches often identify the wrong locus of intervention, suggesting that minoritized groups—rather than discriminatory systems—need fixing. Narayanan examines a study of gender pay gaps among Uber drivers that framed the 7% earnings gap as stemming from women's choices rather than systemic factors, while ignoring that female drivers were "2.7 times as likely to drop off the platform" [@narayanan2022limits, 13-14].

Sixth, researchers cling to an objectivity illusion, despite making "at least 10-20 subjective choices" in a typical paper [@narayanan2022limits, 16]. This illusion of neutrality masks how value judgments shape research design, variable selection, and interpretation.

Finally, through performativity, narrow statistical definitions of discrimination become operationalized as the only recognized forms of harm. When these metrics become the basis for policy, they limit what counts as actionable discrimination.

These limitations lead Narayanan to conclude that quantitative methods often "justify racism and excuse inaction" [@narayanan2022limits, 3] by offering technical smokescreens that obscure structural discrimination.
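The compounding effect behind the second limitation can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that the 2.5% difference acts as a multiplicative advantage applied once per quarter over 20 years (80 quarters). Narayanan's own model may be specified differently, and the function name `compounded_ratio` is hypothetical.

```python
def compounded_ratio(per_period_gap: float, periods: int) -> float:
    """Ratio between two groups after a small multiplicative gap compounds.

    Assumption (not taken from the lecture): the gap multiplies once per period.
    """
    return (1.0 + per_period_gap) ** periods


if __name__ == "__main__":
    quarters = 20 * 4  # 20 years of quarterly performance reviews
    ratio = compounded_ratio(0.025, quarters)
    print(f"Ratio after {quarters} quarters: {ratio:.2f}")
```

Under this assumption the ratio comes out slightly above 7, which is consistent with the "7-fold difference" quoted above, even though a 2.5% gap in any single quarter would be hard to detect.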

## The Benefits of Quantitative Methods

Despite these limitations, quantitative methods have demonstrated unique value in addressing discrimination when thoughtfully applied. In chapter 3 of _Fairness and Machine Learning_, Barocas, Hardt, and Narayanan detail how formal statistical criteria can detect and mitigate bias in algorithmic systems. They outline several technical fairness notions, including independence (demographic parity), separation (equal error rates), and sufficiency (calibration by group) [@barocasFairnessMachineLearning2023].
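To make the three criteria concrete, here is a minimal sketch of the per-group quantities each one compares, assuming binary decisions and binary outcomes. The function name `group_rates`, the record layout, and the toy data are hypothetical, and positive predictive value is used only as a simple binary-decision proxy for sufficiency.

```python
from collections import defaultdict


def group_rates(records):
    """Compute per-group rates tied to independence, separation, sufficiency.

    Each record is (group, y_true, y_pred) with binary values, where
    y_pred = 1 is the positive decision and y_true = 1 the positive outcome.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        # "t"/"f" = decision matches outcome or not; "p"/"n" = decision made
        key = ("t" if y_true == y_pred else "f") + ("p" if y_pred == 1 else "n")
        counts[group][key] += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    rates = {}
    for group, c in counts.items():
        total = sum(c.values())
        rates[group] = {
            # Independence: share of the group receiving a positive decision
            "positive_rate": ratio(c["tp"] + c["fp"], total),
            # Separation: error rates conditional on the true outcome
            "false_positive_rate": ratio(c["fp"], c["fp"] + c["tn"]),
            "false_negative_rate": ratio(c["fn"], c["fn"] + c["tp"]),
            # Sufficiency (proxy): outcome rate among positive decisions
            "positive_predictive_value": ratio(c["tp"], c["tp"] + c["fp"]),
        }
    return rates


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from any study discussed here
    toy = [
        ("a", 1, 1), ("a", 0, 1), ("a", 0, 0), ("a", 1, 0),
        ("b", 1, 1), ("b", 0, 0), ("b", 0, 0), ("b", 1, 0),
    ]
    for group, r in group_rates(toy).items():
        print(group, r)
```

Each criterion then asks whether the corresponding rates are (approximately) equal across groups. The sketch only computes the rates; deciding which gaps matter is exactly the kind of value judgment the rest of this essay discusses.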

One particularly beneficial application referenced by Narayanan himself is Fishbane, Ouss, and Shah's [-@fishbane2020behavioral] study of failure-to-appear rates in the New York City court system. This research exemplifies how quantitative methods can identify concrete intervention points that reduce disparities. In this study, researchers examined why defendants frequently missed court dates for low-level offenses, with particular attention to racial disparities in these rates.

Viewed through the technical lens of Chapter 3 of _Fairness and Machine Learning_, the study used error rate analysis to measure disparities in court appearance rates across demographic groups. The researchers analyzed patterns in failures to appear and identified differential impacts on marginalized communities. Their approach relates to what Barocas, Hardt, and Narayanan call separation, which includes "equal false negative rates" across groups, because it asks whether different groups bear unequal burdens from the same system [@barocasFairnessMachineLearning2023].

Their findings were revelatory: many defendants missed court dates not due to willful evasion but because of confusing summons forms and lack of reminders. The researchers then tested interventions by redesigning the forms and implementing text message reminders, which "drastically reduced the rate at which people failed to appear in court" [@narayanan2022limits, 24].

Viewed through the moral lens of Chapter 4 of the same book, this study embodied what Barocas, Hardt, and Narayanan describe as procedural fairness: ensuring that all defendants had equal practical ability to comply with court requirements regardless of their resources or background. The intervention also addressed representational fairness by redesigning forms to be understandable regardless of education level or prior system knowledge [@barocasFairnessMachineLearning2023].

Rather than treating failure to appear as evidence of individual moral failing (which might justify harsher penalties), the researchers recognized it as a system design problem—reflecting the moral principle that individuals should not be penalized for navigating poorly designed bureaucratic processes. This shifts the moral framing from individual culpability to systemic responsibility.

This case exemplifies how quantitative methods can productively expose discrimination when they: (1) identify concrete intervention points, (2) focus on system-level rather than individual-level problems, and (3) lead to practical solutions that reduce disparities.

## The Limitations of Quantitative Methods

While some quantitative approaches succeed, others reinforce the problems Narayanan identifies. A particularly disappointing study Narayanan highlights is "The Gender Earnings Gap in the Gig Economy: Evidence from over a Million Rideshare Drivers" [@cook2018gender], which examined the pay gap between male and female Uber drivers.

In the technical terms of Chapter 3 of _Fairness and Machine Learning_, the study focused on disparate impact (the 7% earnings difference) while neglecting the potentially more significant disparate treatment that might explain why female drivers were 2.7 times as likely to leave the platform. The researchers attributed the earnings gap solely to three factors: where drivers chose to drive, men's greater experience on the platform, and men's tendency to drive faster.

The authors claimed no gender discrimination existed in the platform's algorithm or rider behavior. But as Narayanan notes, their analysis "explains away discrimination" by treating these factors as neutral preferences rather than potential responses to discriminatory conditions: "part of that is because some neighborhoods aren't safe for women" and "some women face harassment" [@narayanan2022limits, 14].

From a moral perspective (Chapter 4), the study reflects what Barocas, Hardt, and Narayanan call the "difference principle" view of discrimination—treating statistical disparities as morally neutral if they can be explained by apparent preferences or choices. This neglects the broader moral context in which those choices occur. The study treats driver decisions as freely made preferences rather than constrained responses to structural inequities [@barocasFairnessMachineLearning2023].

This exemplifies what @selbst2019fairness call the "framing trap," in which researchers define a problem so narrowly that a technical analysis seems sufficient. By focusing on measurable earnings rather than broader experiences of discrimination, the study treats a statistically explained gap as evidence that no discrimination occurred.

D'Ignazio and Klein would identify this as a failure to "consider context" and "examine power," two key principles of _Data Feminism_ [@dignazioDataFeminism2023]. The researchers' choice to focus on the smaller pay disparity while ignoring the larger retention disparity reveals how quantitative approaches can selectively measure what supports existing power structures while overlooking more significant indicators of systemic problems.

This case demonstrates how quantitative methods, when divorced from context and power analysis, can produce misleading conclusions that "justify the status quo rather than challenge it" [@corbettDaviesMeasureFairness2018]. The technical transparency of numbers creates an illusion of objectivity that masks underlying value judgments about what constitutes discrimination and what counts as evidence.

## Additional Scholarly Perspectives

Three additional scholarly sources provide important perspectives on the tension between quantitative methods and substantive fairness. In "The Measure and Mismeasure of Fairness," @corbettDaviesMeasureFairness2018 show how statistical fairness metrics can produce paradoxical outcomes, in which satisfying a formal fairness criterion may actually harm the very groups it aims to protect. For example, requiring equal false positive rates across racial groups in pretrial risk assessment might lead to higher overall detention rates for marginalized groups if baseline risk levels differ.
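The tension behind this example can be illustrated with a small sketch. It assumes, hypothetically, that each person's risk score is a calibrated probability of the adverse outcome and that everyone at or above a single threshold is detained. The function name `expected_fpr` and the two toy risk lists are invented for illustration and do not reproduce the authors' analysis.

```python
def expected_fpr(risks, threshold):
    """Expected false positive rate when detaining everyone with risk >= threshold.

    Assumes each risk score p is a calibrated probability, so a person with
    risk p would not have had the adverse outcome with probability 1 - p.
    """
    negatives = sum(1 - p for p in risks)
    detained_negatives = sum(1 - p for p in risks if p >= threshold)
    return detained_negatives / negatives


if __name__ == "__main__":
    # Hypothetical groups whose baseline risk distributions differ
    group_low = [0.1, 0.2, 0.3, 0.6]
    group_high = [0.3, 0.5, 0.7, 0.8]
    for name, risks in [("low", group_low), ("high", group_high)]:
        print(name, round(expected_fpr(risks, threshold=0.5), 3))
```

With the same threshold applied to everyone, the two toy groups end up with different expected false positive rates. Forcing those rates to be equal would therefore require group-specific thresholds, which is the kind of trade-off that can leave some people worse off under a formally "fair" rule.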

@selbst2019fairness identify five "abstraction traps" in fair ML research where technical fixes fail to account for nuanced social realities. Their insight that technical tools often divorce data from their social origins aligns with Narayanan's critiques of snapshot datasets and decontextualized analysis. They argue that "fairness" becomes dangerous when reduced to a purely technical problem, divorced from the social contexts that give it meaning.

@green2022substantive articulates a crucial distinction between "formal" and "substantive" algorithmic fairness. While formal fairness focuses on satisfying statistical criteria, substantive fairness requires examining whether an algorithm contributes to genuine justice outcomes. Green argues that even mathematically "fair" algorithms can encode and perpetuate injustice if they naturalize existing inequalities. This distinction parallels Narayanan's concern that quantitative methods often identify the wrong locus of intervention.

_Data Feminism_ [@dignazioDataFeminism2023] offers the most comprehensive framework for addressing Narayanan's concerns. D'Ignazio and Klein's principles of "examine power," "challenge power," and "consider context" directly respond to the limitations Narayanan identifies. Through examples like the Anti-Eviction Mapping Project (Chapter 5) and their critique of "Big Dick Data" projects that "ignore context, fetishize size, and inflate their technical and scientific capabilities" (Chapter 6, p. 4), they demonstrate how attention to power dynamics can transform quantitative methods from tools of oppression into instruments of liberation.

## Taking a Position

In light of these analyses, I find myself in qualified agreement with Narayanan's assertion that "quantitative methods are primarily used to justify the status quo" and often "do more harm than good" [@narayanan2022limits, 25]. The evidence from case studies and scholarly analyses shows how traditional applications of quantitative methods can indeed mask discrimination rather than expose it.

However, my agreement comes with important qualifications. First, the problem lies not with quantitative methods per se, but with how they are deployed and by whom. D'Ignazio and Klein make this point about the people who build data systems: "if the 79 percent of engineers at Google who are male were specifically trained in structural oppression before building their data systems... then their overrepresentation might be very slightly less of a problem" [@dignazioDataFeminism2023]. This suggests that quantitative methods, when embedded in a framework that addresses power and context, could help advance justice.

Second, cases like the court appearance study demonstrate that quantitative methods can identify concrete intervention points for reducing disparities. When paired with a critical understanding of systemic injustice, numbers can provide actionable evidence for meaningful reform.

The path forward, I believe, requires integrating quantitative precision with what @green2022substantive calls "substantive algorithmic fairness"—an approach that goes beyond statistical metrics to consider whether algorithms contribute to substantive justice. This means reversing the null hypothesis to assume discrimination exists unless proven otherwise, collecting longitudinal data that captures compounding effects, focusing on systemic rather than individual-level interventions, engaging directly with affected communities, and treating quantitative evidence as one component in a broader justice framework. By shifting these fundamental assumptions and practices, quantitative methods can become tools for exposing rather than obscuring discrimination.

As D'Ignazio and Klein assert, "the data never, ever 'speak for themselves'" [@dignazioDataFeminism2023, Chapter 6]. Acknowledging this truth allows us to use quantitative methods more responsibly—not as neutral arbiters of truth, but as tools that must be wielded with care, context, and a commitment to challenging rather than reinforcing existing power structures.

## Conclusion

Narayanan's critique of quantitative methods provides a vital caution against simplistic faith in data-driven solutions to discrimination. The limitations he identifies—from problematic null hypotheses to the illusion of objectivity—reveal how seemingly neutral methods can entrench injustice.

Yet certain quantitative approaches, like the court appearance study, demonstrate potential for meaningful reform. The key difference lies not in whether numbers are used, but in how they are contextualized, what questions they seek to answer, and whether they serve to challenge or legitimize existing power structures.

As D'Ignazio and Klein remind us in _Data Feminism_, data work is always political. The question is not whether to use quantitative methods, but how to use them in ways that acknowledge power, consider context, and work toward substantive justice. By approaching quantitative methods with a critical eye and integrating them within a broader commitment to equity, we can transform them from tools that primarily justify the status quo into instruments of lasting change.