# Abolish Everything!: Fairness in Ratings

## Introduction

[**Abolish Everything!**](https://nebula.tv/abolish) is a comedy show streaming on Nebula. The platform describes the show thusly: *On Abolish Everything!, comedian-abolitionists roast a pet peeve they want banned from society, while a panel of improvisers dubbed “The Political Establishment” argue on behalf of the status quo. The audience chooses their champion each episode, each of whom will come back to compete in the season finale.*

The formula of each episode is the following:
* The host, Chandler Dean, introduces the show and presents eight invited comedians, four of them as "abolitionists" and four of them as "political establishment members". (For the finale, there are instead eight "abolitionists".)
* Chandler Dean, standing in front of a lectern, next to a screen, gives a 5-minute comedic presentation about something he would like to "abolish".
* The "political establishment" and Chandler Dean banter for 5 minutes about his presentation’s subject.
* Chandler Dean introduces a guest "abolitionist", who gives their own 5-minute presentation followed by 5 minutes of banter with the "political establishment". This is repeated until every "abolitionist" has had their turn.
* All "abolitionists" appear in front of the audience and Chandler Dean asks the audience to select their champion.
* The audience’s two top choices are determined by the volume of their clapping and cheering for each "abolitionist".
* The audience is asked to clap and cheer again for their two top choices. The winner is determined by the volume of their clapping and cheering. On every episode except the finale, the winner is invited to the finale.

While watching the show, I wondered whether this way of selecting a winner was fair. My main concern was that the order of presentations would affect the audience’s selection. Would the first presentation have a bigger impact on them? With the most recent presentations fresher in their memories, would audience members be more likely to select one of the last two "abolitionists" as their champion? Now that the finale has been released, I have decided to check what the evidence says.

My **hypothesis** is the following: **the odds of a given abolitionist being declared the winner of an episode of Abolish Everything! depend on the order in which the presentations were given**.

You may read the *Data* and *Testing the Null Hypothesis* sections if you are interested in the details or go directly to the [*Results*](#results) section where I discuss my findings.

## Data

Here is a list of the episodes, subjects, abolitionists, runners-up and winners:

* **Episode 1** speakerphones (Alice Morales), people not holding doors open (Liz Hynes), varmints (Ben Doyle, runner-up), answering "what’s up?" with "not much" (Ikechukwu Ufomadu, winner)
* **Episode 2** beeps (Annie Rauwerda, winner), stretching (Kyle Gordon), not wishing people happy birthday at their birthday party (Rima Parikh), dentistry (Adam Chase, runner-up)
* **Episode 3** national anthems at sporting events (Amy Muller, winner), Cybertrucks (Randall Otis, runner-up), her boyfriend wanting to name her kid Jazz (Chan Bennett), harmful porn (Dan Toomey)
* **Episode 4** confidence (Josh Gondelman), planes (Ena Da, runner-up), Liquid Death (Jeremy Kaplowitz), "getting involved in hoopla" (Augusta Chapman, winner)
* **Episode 5** English (Lucas Arnold), slow walkers (Maggie Mae Fish), Tiktok couples (Vannessa Jackson, winner), his girlfriend telling him to turn off the big light (Jonathan van Halem, runner-up)
* **Episode 6** hot medical doctors (Sila Pulh), sunscreen (Dorian Debose, runner-up), lecterns (Matt Krol), mascots that are people (Maeve Dunigan, winner)
* **Episode 7** her twin sister also having a stroke (Kenice Mobley), reaction creators (Foreign Man in a Foreign Land), Meryl Streep naming her daughter Mamie (Henry Block, winner), Uber drivers slamming his penis in car doors for playing the harmonica (Ryan Ciecwisz, runner-up)
* **Episode 8** sports betting except for himself (Michael Kandel), men not looking for a serious relationship kissing them on the forehead (Carson Olshansky), not letting him buy unnecessary school supplies from Staples (Graham Techler, runner-up), corporeal forms (Charu Sinha, winner)
* **Finale** being too cool (Augusta Chapman), only taking a little piece of cake at a party (Amy Muller), young adults sections in bookstore (Vannessa Jackson), Earth’s moon (Ikechukwu Ufomadu), songs containing instructions (Henri Block), acronyms and initialisms (Annie Rauwerda), cars (Charu Sinha, runner-up), winging it (Maeve Dunigan, winner)

I will create a DataFrame to store the data relevant to this analysis.

In [84]:
# Import the necessary libraries
import scipy.stats as stats
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [86]:
data = {
    "episode": [1, 2, 3, 4, 5, 6, 7, 8, 9],  # episode 9 is listed as "finale" on Nebula
    "abolitionists": [4, 4, 4, 4, 4, 4, 4, 4, 8],  # number of presenters in the episode
    "runner-up": [3, 4, 2, 2, 4, 2, 4, 3, 7],  # presentation position of the runner-up, starting from 1
    "winner": [4, 1, 1, 4, 3, 4, 3, 4, 8],  # presentation position of the winner, starting from 1
}
df = pd.DataFrame(data)
df

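|   | episode | abolitionists | runner-up | winner |
|---|---|---|---|---|
| 0 | 1 | 4 | 3 | 4 |
| 1 | 2 | 4 | 4 | 1 |
| 2 | 3 | 4 | 2 | 1 |
| 3 | 4 | 4 | 2 | 4 |
| 4 | 5 | 4 | 4 | 3 |
| 5 | 6 | 4 | 2 | 4 |
| 6 | 7 | 4 | 4 | 3 |
| 7 | 8 | 4 | 3 | 4 |
| 8 | 9 | 8 | 7 | 8 |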


## Testing the Null Hypothesis

### Establishing the Null Hypothesis, Significance Level and Test

In order to determine whether my hypothesis that **the odds of a given abolitionist being declared the winner of an episode of Abolish Everything! depend on the order in which the presentations were given** is supported by the data, I will state a [null hypothesis](https://researchbasics.education.uconn.edu/null-and-alternative-hypotheses/) and attempt to reject it.

Null hypothesis: **the odds of a given abolitionist being declared the winner of an episode are not affected by the order in which the presentations were given**.

The **significance level** (α, alpha) of a hypothesis test is the acceptable risk of rejecting a correct null hypothesis, and it is set depending on the field of study. For instance, medical studies typically set very small significance levels.

Considering the low risk posed by incorrectly accusing a comedy show’s manner of declaring a winner of being unfair and the difficulty of attaining a low p-value with a very small sample, I am setting the significance level at **0.15** (15%), which corresponds to a confidence level of 0.85 (85%). In other words, I will reject my null hypothesis if the **p-value** from the test is **less than or equal to 0.15**.

Many statistical tests exist. Since my null hypothesis implies that every single position in the presentation order should correspond to the same odds of winning, I will test it the same way I would test the hypothesis that a die is perfectly balanced: with a **chi-squared test**.
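For reference, the statistic behind a chi-squared goodness of fit test can be written as follows. The notation is my own shorthand and is not used elsewhere in this notebook: $O_i$ is the observed number of winners at position $i$, $E_i$ is the number expected under the null hypothesis, and $k$ is the number of positions.

$$
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
$$

Under the null hypothesis, this statistic approximately follows a chi-squared distribution with $k - 1$ degrees of freedom. The approximation is generally considered less reliable when the expected counts are small.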

### Adjusting for the Finale

The season finale featured 8 presentations instead of 4. To match the rest of the dataset, I have decided to divide the finale’s presentations into pairs. The first two presentations are pair 1, the next two are pair 2, and so on. Therefore the winner and runner-up values of that particular episode can be integer values from 1 to 4.

In [143]:
def pair(pres_order):
    return (pres_order + 1) // 2

df['runner-up'] = df.apply(lambda row: pair(row['runner-up']) if row['abolitionists'] == 8 else row['runner-up'], axis=1)
df['winner'] = df.apply(lambda row: pair(row['winner']) if row['abolitionists'] == 8 else row['winner'], axis=1)
df

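|   | episode | abolitionists | runner-up | winner |
|---|---|---|---|---|
| 0 | 1 | 4 | 3 | 4 |
| 1 | 2 | 4 | 4 | 1 |
| 2 | 3 | 4 | 2 | 1 |
| 3 | 4 | 4 | 2 | 4 |
| 4 | 5 | 4 | 4 | 3 |
| 5 | 6 | 4 | 2 | 4 |
| 6 | 7 | 4 | 4 | 3 |
| 7 | 8 | 4 | 3 | 4 |
| 8 | 9 | 8 | 4 | 4 |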
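As a quick sanity check of the pairing rule, the small sketch below applies the same `pair` formula to all eight finale positions. The `positions` and `pairs` names are only there for illustration.

```python
def pair(pres_order):
    # Same formula as above: positions 1-2 -> pair 1, 3-4 -> pair 2, and so on
    return (pres_order + 1) // 2

positions = list(range(1, 9))          # the eight finale positions
pairs = [pair(p) for p in positions]   # the pair each position falls into
print(dict(zip(positions, pairs)))
# {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}
```

This confirms that positions 7 and 8, the finale's runner-up and winner, both land in pair 4.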


### Test

I will start by creating a NumPy array listing the number of winners per position in the presentation order.

In [161]:
num_pos = 4 # number of possible positions

In [171]:
winner_counts = df['winner'].value_counts().to_numpy()
winner_counts

array([5, 2, 2], dtype=int64)

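Note that `value_counts()` sorts by frequency rather than by position: here, the 5 is the count for position 4 and the two 2s are the counts for positions 1 and 3. I’ll add a zero to the array for any position that never won a show (position 2 in this case). Because the expected counts are the same for every position, the order of the array does not affect the chi-squared statistic.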

In [173]:
winner_counts = np.pad(winner_counts, (0, max(0, num_pos - len(winner_counts))), mode='constant', constant_values=0)
winner_counts

array([5, 2, 2, 0], dtype=int64)
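If the counts need to line up with the positions (for a plot or a table, for example), one option is to reindex the result of `value_counts()` over every possible position. This is a minimal sketch, not the code used for the test: the `winners` series simply repeats the winner column after the finale adjustment, and `counts_by_position` is a name introduced here.

```python
import pandas as pd

num_pos = 4  # number of possible positions
# Winner column after the finale adjustment (copied from the table above)
winners = pd.Series([4, 1, 1, 4, 3, 4, 3, 4, 4], name="winner")

counts_by_position = (
    winners.value_counts()
    .reindex(range(1, num_pos + 1), fill_value=0)  # positions 1 to 4, 0 if absent
    .to_numpy()
)
print(counts_by_position)  # [2 0 2 5]
```

The same numbers appear as in the padded array, only ordered from position 1 to position 4.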

I will need to compare these counts to the expected counts (the counts I would see if the data fit the null hypothesis perfectly), so I’ll create a list for them.

In [186]:
expected_winner_counts = ([winner_counts.sum() * (1 / num_pos)]) * num_pos
expected_winner_counts

[2.25, 2.25, 2.25, 2.25]

Now, I can perform a **chi-squared goodness of fit test**.

In [195]:
winner_chi2, winner_p = stats.chisquare(winner_counts, expected_winner_counts)

print(f"Chi-square statistic: {winner_chi2:.4f}")
print(f"P-value: {winner_p:.4f}")

Chi-square statistic: 5.6667
P-value: 0.1290


Since my p-value of 0.1290 is below the 0.15 threshold I set, I can **reject my null hypothesis**. This supports, without proving, my initial hypothesis: "**the odds of a given abolitionist being declared the winner of an episode of Abolish Everything! depend on the order in which the presentations were given**".
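The statistic and p-value reported above can also be reproduced step by step, which makes it easier to see where they come from. The sketch below is an illustrative recomputation, assuming the winner counts are listed by position (2, 0, 2 and 5 wins for positions 1 to 4). The variable names are my own.

```python
import numpy as np
from scipy import stats

observed = np.array([2, 0, 2, 5])                # wins for positions 1 to 4
expected = np.full(4, observed.sum() / 4)        # 2.25 wins per position if fair

# Sum of squared deviations, each scaled by the expected count
chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = len(observed) - 1                          # 4 positions -> 3 degrees of freedom
p_value = stats.chi2.sf(chi2_stat, dof)          # upper tail of the chi-squared distribution

print(f"Chi-square statistic: {chi2_stat:.4f}")  # 5.6667
print(f"P-value: {p_value:.4f}")                 # 0.1290
```

Position 4 (five wins against 2.25 expected) and position 2 (no wins) account for almost all of the statistic.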

### Runner-Up Test

Let’s try the same thing for runners-up.

My hypothesis: "the odds of a given abolitionist being the runner-up to the winner in an episode of Abolish Everything! depend on the order in which the presentations were given"

The null hypothesis: "the odds of a given abolitionist being the runner-up in an episode are not affected by the order in which the presentations were given"

In [208]:
ru_counts = df['runner-up'].value_counts().to_numpy()
ru_counts = np.pad(ru_counts, (0, max(0, num_pos - len(ru_counts))), mode='constant', constant_values=0)
ru_counts

array([4, 3, 2, 0], dtype=int64)

In [212]:
expected_ru_counts = ([ru_counts.sum() * (1 / num_pos)]) * num_pos

ru_chi2, ru_p = stats.chisquare(ru_counts, expected_ru_counts)

print(f"Chi-square statistic: {ru_chi2:.4f}")
print(f"P-value: {ru_p:.4f}")

Chi-square statistic: 3.8889
P-value: 0.2737


The p-value of 0.2737 is much higher than 0.15, which means that I cannot reject the null hypothesis.
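Since the winner and runner-up tests follow the same steps, they could be wrapped in one small helper. The sketch below is one possible way to do it. The function name `position_chi2` is hypothetical, and it counts positions with `np.bincount` instead of `value_counts()`, which keeps the counts in position order.

```python
import numpy as np
from scipy import stats

def position_chi2(positions, num_pos=4):
    """Chi-squared goodness of fit test against equal odds for every position."""
    # bincount counts the values 0..num_pos; drop the unused slot for 0
    counts = np.bincount(positions, minlength=num_pos + 1)[1:]
    # With no expected counts given, chisquare assumes equal expected frequencies
    return stats.chisquare(counts)

# Runner-up column after the finale adjustment (copied from the table above)
runner_ups = [3, 4, 2, 2, 4, 2, 4, 3, 4]
ru_stat, ru_pvalue = position_chi2(runner_ups)
print(f"Chi-square statistic: {ru_stat:.4f}")  # 3.8889
print(f"P-value: {ru_pvalue:.4f}")             # 0.2737
```

Calling the same helper on the winner column should give the statistic and p-value of the previous section.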

## Results

### Test Results

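The test on winners rejects the null hypothesis at my chosen threshold, with a p-value of 0.1290, which supports my hypothesis: **the odds of winning Abolish Everything! appear to depend on the order of presentations.** The test on runners-up, with a p-value of 0.2737, does not show such an effect.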

### Possible Biases

1. I picked a relatively **lenient rejection threshold** (0.15 rather than the more common 0.05). I felt like I had to, because of the **small sample size**. Still, a p-value of 0.1290 means that, if the game were completely fair, there would be about a 12.90% chance of a season whose winners are at least as unevenly spread across positions as this one, in which abolitionists who presented last won most often.
2. It is possible that, rather than the odds of winning depending on the order of presentations, **both factors** I based my analysis on, the odds of winning and the order of presentations, are **dependent on a third factor I did not take into account**.
3. My **personal opinion** may have affected my perception of the results of my statistical analysis. In fact, this entire report may or may not be my way of explaining why my favorite presentations did not always win their respective shows when they CLEARLY SHOULD HAVE. :P
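Regarding the small sample size mentioned in point 1: with only 9 episodes, the expected count per position is 2.25, below the value of 5 often quoted as a rule of thumb for the chi-squared approximation. One way to check how much this matters is to compute the p-value exactly, by enumerating every possible split of 9 wins across 4 positions under the null hypothesis. The sketch below is an illustrative check under that assumption, not part of the original analysis, and all names in it are my own.

```python
from itertools import product
from math import factorial

n_episodes, num_pos = 9, 4
observed = (2, 0, 2, 5)            # wins for positions 1 to 4
expected = n_episodes / num_pos    # 2.25 wins per position if the game is fair

def chi2_stat(counts):
    # Same statistic as the chi-squared goodness of fit test
    return sum((c - expected) ** 2 / expected for c in counts)

def multinomial_prob(counts):
    # Probability of this exact split when every position is equally likely
    ways = factorial(n_episodes)
    for c in counts:
        ways //= factorial(c)
    return ways / num_pos ** n_episodes

observed_stat = chi2_stat(observed)
# Add up the probability of every split at least as uneven as the observed one
# (the small tolerance guards against floating-point ties)
exact_p = sum(
    multinomial_prob(counts)
    for counts in product(range(n_episodes + 1), repeat=num_pos)
    if sum(counts) == n_episodes and chi2_stat(counts) >= observed_stat - 1e-9
)
print(f"Exact p-value: {exact_p:.4f}")  # 0.1553
```

With these counts, the exact p-value comes out slightly above 0.15, so the decision to reject the null hypothesis appears to be sensitive to whether the chi-squared approximation or an exact calculation is used.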

### Suggestion for a Different and Hopefully Fairer System

I believe that evaluating each candidate right after their presentation and banter with the establishment could make the show fairer. One way of doing this would be to provide a link and/or project a QR code and ask the audience to use it to rate the abolitionist. This could happen on a dedicated app or website or even on an existing online polling or form tool, like Google Forms.
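To make the idea concrete, here is a minimal sketch of how such ratings could be tallied once collected. Everything in it is made up for illustration: the 1-to-5 scale, the placeholder names and the scores themselves.

```python
from statistics import mean

# Hypothetical ratings (1 to 5), collected right after each presentation
ratings = {
    "Abolitionist A": [4, 5, 3, 4],
    "Abolitionist B": [5, 5, 4, 4],
    "Abolitionist C": [3, 4, 4, 2],
    "Abolitionist D": [4, 3, 4, 4],
}

# Average score per abolitionist, then rank from highest to lowest
average = {name: mean(scores) for name, scores in ratings.items()}
ranking = sorted(average, key=average.get, reverse=True)
winner, runner_up = ranking[:2]
print(f"Winner: {winner}, runner-up: {runner_up}")
# Winner: Abolitionist B, runner-up: Abolitionist A
```

A real system would also need a rule for ties and for audience members who skip a vote, which this sketch leaves out.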

## Conclusion

The team behind Abolish Everything! should consider changing their way of determining winners, as the evidence from the first season suggests that their current method may favor some positions in the presentation order. That said, whether they do or not, I encourage fans of stand-up comedy who are subscribed or are thinking of subscribing to Nebula to [check out the show](https://nebula.tv/abolish).

I am thankful for the moments of laughter it gave me and the comedians it introduced me to. I intend to watch season 2.

## References

* [Abolish Everything! on Nebula](https://nebula.tv/abolish)
* [Goodness of fit on Wikipedia](https://en.wikipedia.org/wiki/Goodness_of_fit)
* [Null and Alternative Hypotheses on Educational Research Basics by Del Siegle](https://researchbasics.education.uconn.edu/null-and-alternative-hypotheses/)