# "Crushing White Advantage" games

## Time formats

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency

data = pd.read_csv('data.csv')


# Filter for "Crushing White Advantage" games
cw_data = data[data["Elo_Dif_Range"] == "Crushing White Advantage"].copy()

# Create a binary outcome column: Loss vs Non-Loss
cw_data["Outcome"] = np.where(cw_data["Result"] == "0-1", "Loss", "Non-Loss")

# Build the contingency table: Time Format x Outcome
contingency_table = pd.crosstab(cw_data["Time_format"], cw_data["Outcome"])

# Display the contingency table
print("Contingency Table (Crushing White Advantage):\n")
print(contingency_table)

# Perform the chi-square test and get expected frequencies
chi2, p_value, dof, expected = chi2_contingency(contingency_table)

# Check the chi-square assumption: all expected counts >= 5
if (expected >= 5).all():
    print("\nAll expected frequencies are >= 5. Proceeding with chi-square test.")
    print(f"Chi-square = {chi2:.4f}, p-value = {p_value:.4f}, degrees of freedom = {dof}")
else:
    print("\nWarning: Some expected frequencies are < 5. Chi-square test assumption violated.")

Contingency Table (Crushing White Advantage):

Outcome      Loss  Non-Loss
Time_format                
 blitz         67       222
 bullet        23       135
 classical     12        47
 rapid         11       232

All expected frequencies are >= 5. Proceeding with chi-square test.
Chi-square = 37.2465, p-value = 0.0000, degrees of freedom = 3


The result tells us that, under a “Crushing White Advantage”, the chance of White losing (vs. not losing) depends on the time format (χ²(3) = 37.25, p < 0.0001). In other words, losses after a crushing edge are not equally common in Bullet, Blitz, Rapid and Classical: at least one of those formats has a significantly different loss rate.
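The omnibus test does not say which formats differ. One common follow-up is to look at each format's loss rate together with the Pearson residuals, (observed − expected) / √expected, of the Loss column. The sketch below rebuilds the table from the printed counts instead of reading data.csv; the hard-coded `counts` frame is an assumption standing in for the real data.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Counts copied from the printed "Crushing White Advantage" table (Time_format x Outcome)
counts = pd.DataFrame(
    {"Loss": [67, 23, 12, 11], "Non-Loss": [222, 135, 47, 232]},
    index=["blitz", "bullet", "classical", "rapid"],
)

chi2, p_value, dof, expected = chi2_contingency(counts)

# Loss rate per format: losses / games played in that format
loss_rate = counts["Loss"] / counts.sum(axis=1)

# Pearson residuals: (observed - expected) / sqrt(expected); their squares sum to chi2
residuals = (counts - expected) / np.sqrt(expected)

summary = pd.DataFrame({
    "games": counts.sum(axis=1),
    "loss_rate": loss_rate.round(3),
    "loss_residual": residuals["Loss"].round(2),
})
print(summary)
print(f"sum of squared residuals = {(residuals ** 2).to_numpy().sum():.4f} (chi2 = {chi2:.4f})")
```

With these counts the Blitz row has a large positive Loss residual and the Rapid row a large negative one (an absolute residual above about 2 is a common rule of thumb for a notable cell), so both formats contribute substantially to the overall χ².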

In [2]:
# Filter for "Crushing White Advantage" games
cw = data[data["Elo_Dif_Range"] == "Crushing White Advantage"].copy()

# Clean up Time_format labels (raw values carry a leading space, e.g. " blitz")
cw["Time_format_clean"] = cw["Time_format"].str.strip().str.lower()

# Binary time-format: Blitz vs Other
cw["FormatGroup"] = np.where(cw["Time_format_clean"] == "blitz", "Blitz", "Other")

# Binary outcome: Loss vs Non-Loss
cw["Outcome"] = np.where(cw["Result"] == "0-1", "Loss", "Non-Loss")

# Build the 2×2 contingency table
table = pd.crosstab(cw["FormatGroup"], cw["Outcome"])
print("Contingency table (Crushing White Advantage):\n", table, "\n")

# Chi-square test + check expected counts
chi2, p, dof, expected = chi2_contingency(table)

exp_df = pd.DataFrame(expected, index=table.index, columns=table.columns)
print("Expected counts:\n", exp_df, "\n")

if (expected >= 5).all():
    print(f"χ²({dof}) = {chi2:.4f},  p = {p:.4f}")
else:
    print("Some expected counts are < 5; χ² may not be valid.")

Contingency table (Crushing White Advantage):
 Outcome      Loss  Non-Loss
FormatGroup                
Blitz          67       222
Other          46       414 

Expected counts:
 Outcome           Loss    Non-Loss
FormatGroup                       
Blitz        43.600801  245.399199
Other        69.399199  390.600801 

χ²(1) = 23.0619,  p = 0.0000


Loss rates differ dramatically. When White has a crushing edge, White still loses about 23.2 % (67/289) of Blitz games versus only 10.0 % (46/460) of games in all other time formats.
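The expected counts printed above follow the usual independence rule: row total times column total, divided by the grand total ($N = 749$ games). For the Blitz/Loss cell:

$$
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
E_{\text{Blitz},\,\text{Loss}} = \frac{289 \times 113}{749} \approx 43.60
$$

where $R_i$ is the row total and $C_j$ the column total.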

Highly significant association. The χ² statistic of 23.06 with 1 degree of freedom yields p < 0.0001, so we can reject the null hypothesis that the loss probability is the same in Blitz and “Other” formats.
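One detail worth knowing when reading this χ²: for 2×2 tables, `scipy.stats.chi2_contingency` applies Yates' continuity correction by default (`correction=True`, used when dof = 1), so the 23.06 above is the corrected statistic. The sketch below rebuilds the table from the printed counts (the only assumption) and compares the corrected and uncorrected versions.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: Blitz, Other; columns: Loss, Non-Loss (counts from the table above)
table = np.array([[67, 222],
                  [46, 414]])

# Default call: Yates-corrected for a 2x2 table
chi2_yates, p_yates, dof, expected = chi2_contingency(table)
# Plain Pearson chi-square without the continuity correction
chi2_raw, p_raw, _, _ = chi2_contingency(table, correction=False)

# Loss rate per row: losses / games in that group
loss_rate = table[:, 0] / table.sum(axis=1)

print(f"loss rates: Blitz {loss_rate[0]:.3f}, Other {loss_rate[1]:.3f}")
print(f"Yates-corrected chi2({dof}) = {chi2_yates:.4f}, p = {p_yates:.2e}")
print(f"uncorrected     chi2({dof}) = {chi2_raw:.4f}, p = {p_raw:.2e}")
```

The correction makes the statistic slightly more conservative; here both versions give p well below 0.0001, so the conclusion does not hinge on it.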

## Openings

In [3]:
# Clean up Opening names and binarize: Sicilian Defense vs Other
cw["Opening_clean"] = cw["Opening_name"].str.strip().str.lower()
cw["OpeningGroup"] = np.where(
    cw["Opening_clean"] == "sicilian defense", 
    "Sicilian Defense", 
    "Other"
)

# Binary outcome: Loss vs Non-Loss (White loses when Result == "0-1")
cw["Outcome"] = np.where(cw["Result"] == "0-1", "Loss", "Non-Loss")

# Build the 2×2 contingency table
table_opening = pd.crosstab(cw["OpeningGroup"], cw["Outcome"])
print("Contingency table (Crushing White Advantage):\n", table_opening, "\n")

# Run chi-square test and check expected counts
chi2, p_chi2, dof, expected = chi2_contingency(table_opening)

exp_df = pd.DataFrame(expected, index=table_opening.index, columns=table_opening.columns)
print("Expected counts:\n", exp_df, "\n")

if (expected >= 5).all():
    print(f"χ²({dof}) = {chi2:.4f},  p = {p_chi2:.4f}")
else:
    print("Some expected counts are < 5; χ² may not be valid.")


Contingency table (Crushing White Advantage):
 Outcome           Loss  Non-Loss
OpeningGroup                    
Other               91       531
Sicilian Defense    22       105 

Expected counts:
 Outcome                Loss    Non-Loss
OpeningGroup                           
Other             93.839786  528.160214
Sicilian Defense  19.160214  107.839786 

χ²(1) = 0.4052,  p = 0.5244


Loss rates: under a crushing White advantage, White still loses about 17.3 % (22/127) of Sicilian Defense games versus 14.6 % (91/622) of games with all other openings.

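No significant difference: a χ² of 0.405 with p = 0.524 means we cannot reject the null hypothesis of equal loss rates. In plain terms, under a crushing edge the data show no statistically detectable difference in White's loss probability between the Sicilian Defense and the other openings (failing to reject does not show that the rates are identical).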
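Because the Sicilian row is fairly small (127 games), a cross-check with Fisher's exact test can be reassuring. This is an added check, not part of the original notebook: the sketch applies `scipy.stats.fisher_exact` to the printed counts, which returns the sample odds ratio (ad)/(bc) and an exact two-sided p-value.

```python
from scipy.stats import fisher_exact

# Rows: Other, Sicilian Defense; columns: Loss, Non-Loss (White loses = "0-1")
table_opening = [[91, 531],
                 [22, 105]]

# Two-sided test by default; the first value is the sample odds ratio
odds_ratio, p_fisher = fisher_exact(table_opening)

# Same odds ratio by hand: (Other losses * Sicilian non-losses) / (Other non-losses * Sicilian losses)
manual_or = (91 * 105) / (531 * 22)

print(f"odds ratio = {odds_ratio:.3f} (manual {manual_or:.3f}), Fisher p = {p_fisher:.4f}")
```

An odds ratio below 1 means White's odds of losing are somewhat lower in "Other" openings than in the Sicilian, matching the 14.6 % vs 17.3 % rates; a p-value well above 0.05 is consistent with the χ² result.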

# "Crushing Black Advantage" games

## Time formats

In [4]:
# Filter for "Crushing Black Advantage" games
cb = data[data["Elo_Dif_Range"] == "Crushing Black Advantage"].copy()

# Clean up Time_format labels
cb["Time_format_clean"] = cb["Time_format"].str.strip().str.lower()

# Binary time-format: Blitz vs Other
cb["FormatGroup"] = np.where(cb["Time_format_clean"] == "blitz", "Blitz", "Other")

# Binary outcome: Loss vs Non-Loss (Black loses when Result == "1-0")
cb["Outcome"] = np.where(cb["Result"] == "1-0", "Loss", "Non-Loss")

# Build the 2×2 contingency table
table_cb = pd.crosstab(cb["FormatGroup"], cb["Outcome"])
print("Contingency table (Crushing Black Advantage):\n")
print(table_cb)

# Chi-square test + check expected counts
chi2, p, dof, expected = chi2_contingency(table_cb)
exp_df = pd.DataFrame(expected, index=table_cb.index, columns=table_cb.columns)

print("\nExpected counts:\n")
print(exp_df)

if (expected >= 5).all():
    print(f"\nχ²({dof}) = {chi2:.4f},  p = {p:.4f}")
else:
    print("\nSome expected counts are < 5; χ² may not be valid.")


Contingency table (Crushing Black Advantage):

Outcome      Loss  Non-Loss
FormatGroup                
Blitz          62       220
Other          58       405

Expected counts:

Outcome           Loss    Non-Loss
FormatGroup                       
Blitz        45.422819  236.577181
Other        74.577181  388.422819

χ²(1) = 10.9143,  p = 0.0010


Interpretation

Loss rates differ significantly. When Black has a crushing edge, Black still loses about 22.0 % (62/282) of Blitz games versus only 12.5 % (58/463) in all other formats.

Highly significant association. With p = 0.0010 (< 0.05), we can reject the null of equal loss‐probabilities: under a crushing Black advantage, the format (Blitz vs Other) significantly affects the chance of Black blundering into a loss.
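The White and Black cells repeat the same filter, binarize, cross-tabulate and test steps. A minimal sketch of a reusable helper follows; `binary_loss_test` and its parameters are hypothetical names, and the column names (`Elo_Dif_Range`, `Time_format`, `Result`) are assumed to match data.csv. Since the real file is not available here, the demo builds a synthetic frame that reproduces the Black Blitz-vs-Other counts.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def binary_loss_test(df, elo_range, group_col, group_value, loss_result):
    """Hypothetical helper: `group_value` vs "Other" on `group_col`, Loss vs Non-Loss on Result.

    loss_result is the Result string meaning the favoured side lost
    ("0-1" for a crushing White advantage, "1-0" for a crushing Black advantage).
    """
    sub = df[df["Elo_Dif_Range"] == elo_range].copy()
    clean = sub[group_col].str.strip().str.lower()
    sub["Group"] = np.where(clean == group_value.lower(), group_value, "Other")
    sub["Outcome"] = np.where(sub["Result"] == loss_result, "Loss", "Non-Loss")

    table = pd.crosstab(sub["Group"], sub["Outcome"])
    chi2, p, dof, expected = chi2_contingency(table)
    loss_rate = table["Loss"] / table.sum(axis=1)
    return table, loss_rate, chi2, p, bool((expected >= 5).all())


# Synthetic demo rows reproducing the Black Blitz-vs-Other counts (not the real data.csv)
def synthetic_rows(fmt, n_loss, n_nonloss, elo="Crushing Black Advantage"):
    loss = {"Elo_Dif_Range": elo, "Time_format": fmt, "Result": "1-0"}
    non_loss = {"Elo_Dif_Range": elo, "Time_format": fmt, "Result": "0-1"}
    return [loss] * n_loss + [non_loss] * n_nonloss


demo = pd.DataFrame(synthetic_rows(" blitz", 62, 220) + synthetic_rows(" rapid", 58, 405))

table, loss_rate, chi2, p, ok = binary_loss_test(
    demo, "Crushing Black Advantage", "Time_format", "Blitz", loss_result="1-0"
)
print(table)
print(loss_rate.round(3))
print(f"chi2 = {chi2:.4f}, p = {p:.4f}, expected counts all >= 5: {ok}")
```

On the real data, the same call with "Crushing White Advantage" and `loss_result="0-1"` would correspond to cell 2, and `group_col="Opening_name"` with `group_value="Sicilian Defense"` to the opening cells.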

## Openings

In [5]:
# Clean Opening names
cb["Opening_clean"] = cb["Opening_name"].str.strip().str.lower()

# Binarize OpeningGroup: Sicilian Defense vs Other
cb["OpeningGroup"] = np.where(cb["Opening_clean"] == "sicilian defense",
                              "Sicilian Defense", "Other")

# Outcome: Loss vs Non-Loss (Black loses when Result == "1-0")
cb["Outcome"] = np.where(cb["Result"] == "1-0", "Loss", "Non-Loss")

# Build the 2×2 contingency table
table_opening_cb = pd.crosstab(cb["OpeningGroup"], cb["Outcome"])
print("Contingency table (Crushing Black Advantage - Sicilian vs Other):\n")
print(table_opening_cb)

# Chi-square test + expected counts check (Fisher's exact test as fallback)
from scipy.stats import fisher_exact

chi2, p, dof, expected = chi2_contingency(table_opening_cb)
exp_df = pd.DataFrame(expected, index=table_opening_cb.index, columns=table_opening_cb.columns)

print("\nExpected counts:\n")
print(exp_df)

if (expected >= 5).all():
    print(f"\nχ²({dof}) = {chi2:.4f},  p = {p:.4f}")
else:
    print("\nSome expected counts are < 5; χ² may not be valid. Falling back to Fisher's exact:")
    odds_ratio, p_fisher = fisher_exact(table_opening_cb)
    print(f"Odds ratio = {odds_ratio:.4f},  p = {p_fisher:.4f}")


Contingency table (Crushing Black Advantage - Sicilian vs Other):

Outcome           Loss  Non-Loss
OpeningGroup                    
Other               87       481
Sicilian Defense    33       144

Expected counts:

Outcome                Loss    Non-Loss
OpeningGroup                           
Other             91.489933  476.510067
Sicilian Defense  28.510067  148.489933

χ²(1) = 0.8730,  p = 0.3501


Loss rates are similar. Under a crushing Black advantage, Black still loses about 18.6 % (33/177) of Sicilian Defense games versus 15.3 % (87/568) of games with all other openings.

No significant association. With p = 0.3501 (> 0.05), we cannot reject the null hypothesis of equal loss‐probabilities. In plain terms, Sicilian Defense does not show a statistically different loss‐rate compared to other openings when Black is crushingly better.
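For a compact overview, the four 2×2 comparisons can be recomputed from the printed counts in one pass. The sketch hard-codes those counts (an assumption standing in for the real data) and uses scipy's default Yates-corrected statistic, which is what the cells above report.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# (favoured side, comparison) -> [[group losses, group non-losses], [other losses, other non-losses]]
tables = {
    ("White", "Blitz vs Other"):    [[67, 222], [46, 414]],
    ("White", "Sicilian vs Other"): [[22, 105], [91, 531]],
    ("Black", "Blitz vs Other"):    [[62, 220], [58, 405]],
    ("Black", "Sicilian vs Other"): [[33, 144], [87, 481]],
}

rows = []
for (side, comparison), t in tables.items():
    chi2, p, dof, _ = chi2_contingency(t)  # Yates-corrected for 2x2 tables
    (a, b), (c, d) = t
    rows.append({
        "side": side,
        "comparison": comparison,
        "group_loss_rate": round(a / (a + b), 3),
        "other_loss_rate": round(c / (c + d), 3),
        "chi2": round(chi2, 4),
        "p": round(p, 4),
    })

summary = pd.DataFrame(rows)
print(summary.to_string(index=False))
```

Only the two Blitz comparisons fall below the conventional 0.05 threshold; both Sicilian comparisons do not.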