# 2024: Week 28 - Wimbledon Special

July 10, 2024

Challenge by: Jenny Martin

Carl's Tour de France challenge last week has inspired a sports-themed Preppin' Data month! This week the focus turns to Wimbledon. I thought it would be interesting to understand the crossover between Singles and Doubles Champions. Carl thought there wouldn't be many players who had become champions as both Singles players and Doubles players, but I thought the data could prove otherwise. Who's right? You'll have to complete the challenge to find out!

### Inputs

The data for the Wimbledon Champions is found in 3 separate tables:

1. Singles Champions 

![1](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhaP-6nLtLHwYxt4ADWhNNWdNx3nBumY_5vmEUZzn1G2bTCK6-75WACc1MWaUQ6KjLegILy4xSdSq5Xye9jYrOxhXovJZjYk6136I5UVxEnnvbMXXq6GokO6_aROoD4tCUJdooKXWgvqxkUVzG6hyphenhyphenZuhAUfYRoWBYB8CDM94P5JmA5CdjPiv6XrjS2g_HUe/s612/Screenshot%202024-07-08%20143623.png)

Taken from Britannica

2. Doubles Champions 

![2](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaQ8u4yzz_W3Hv-c8iEhkoUZiE2_dGLw17Wbmt1FQ4UkanKNFVs33eouv22rNkUMAkjCY2R75GHACZnBUxnkbyDRtifVO0Q9u1HvQFoAg4kJhex1d1uryq4sIsvxqRQagOhlULbN4JYPEQAkWbHvtG0SrSp7TWQi9kwV32jZ3IIgNJezZyQUVeManE44Ni/s920/Screenshot%202024-07-08%20143802.png)

Taken from Britannica

3. Mixed Doubles Champions 

![3](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOCPHGry-u5KXWoB619fOz61h827Hh9JrKrXZl6tTOziAsPfLP1pNs-JKwms99DoA5hi018EFHgy-2Oe5Lg6q7kON0n8BMQA8mlFibQ0Vsbggj0CVCCJGioeTfdmzzcv3k6eMPps8Nro6SbYGLs1KOPGeioxI37QrknwBDAUuC5OnlJ4-6loXm9A5wbzog/s1120/Screenshot%202024-07-08%20143919.png)

Taken from Wikipedia

### Requirements
- Input the data
- Filter out the years where the championship did not take place
- Ensure the Year field is numeric
- Reshape the data so there is a row for each Champion, for each Year, even where there are 2 winners in the Doubles
- Make sure it's clear which tournament they were the Champion of:
  - Either Men's Singles, Women's Singles, Men's Doubles, Women's Doubles or Mixed Doubles
- For each Champion, calculate the most recent win across tournaments
- Bring the data together so it's clear for each Champion how many of each tournament they've won
- Filter the data to only include Champions who have won both Singles and Doubles tournaments
- Create a calculation for the Total Championships each Champion has won
- Rank the Champions in descending order of their Total Championships 
- Create a field to indicate the Gender of each Champion
- Output the data
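
The later requirements (counting titles per tournament, keeping only Champions with both Singles and Doubles wins, totalling and ranking) can be sketched on a small made-up table. This is only an illustration under stated assumptions: the player names and years are invented, Mixed Doubles is assumed to count as a Doubles title, ties are assumed to share the lowest rank, and the `Gender` labels ("Woman"/"Man") are guesses at the expected output values.

```python
import pandas as pd

# Toy long-format data: one row per champion, per year, per tournament (invented values)
wins = pd.DataFrame({
    "year": [2001, 2002, 2002, 2003, 2003, 2003, 2004, 2005],
    "tournament": ["Women's Singles", "Women's Singles", "Women's Doubles", "Women's Doubles",
                   "Men's Singles", "Men's Doubles", "Mixed Doubles", "Men's Singles"],
    "champion": ["Player A", "Player A", "Player A", "Player A",
                 "Player B", "Player B", "Player B", "Player C"],
})

# Most recent win per champion, across all tournaments
most_recent = wins.groupby("champion")["year"].max().rename("Most Recent Win")

# One column per tournament holding the number of titles won
counts = pd.crosstab(wins["champion"], wins["tournament"])
tournament_cols = list(counts.columns)

# Keep champions with at least one Singles AND at least one Doubles title
# (assumption: Mixed Doubles counts as a Doubles title)
singles_cols = [c for c in tournament_cols if "Singles" in c]
doubles_cols = [c for c in tournament_cols if "Doubles" in c]
has_singles = counts[singles_cols].sum(axis=1) > 0
has_doubles = counts[doubles_cols].sum(axis=1) > 0
both = counts[has_singles & has_doubles].copy()

# Total titles and rank (assumption: ties share the lowest rank)
both["Total Championships"] = both[tournament_cols].sum(axis=1)
both["Rank"] = both["Total Championships"].rank(method="min", ascending=False).astype(int)

# Everyone left has a Singles title, so gender can be read from it (labels are assumed)
both["Gender"] = both["Women's Singles"].gt(0).map({True: "Woman", False: "Man"})

result = both.join(most_recent).reset_index().sort_values("Rank")
print(result)
```

Player C is dropped because they only hold a Singles title. In the real workflow `wins` would be the three cleaned tables stacked together.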

### Output

![4](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMmX5obL4sB5F28Mq5NZee_-PzWNPdfoGADCVy-KFmBrgiPLf2GMG5Ym3RqPjI7g4nfCVT94c7S5eRn1rErgnIPI9jwrgaklRvRVAYzRG5q1-wu141necV3OUbpiNEJ-9c0Bzv5juUITYzWcQ_r81m2rwefXYqxljJ3hmNb6svVYw_pKk1oAVBuQiHOS1j/s1310/Screenshot%202024-07-08%20143149.png)

- 10 fields
  - Rank
  - Champion
  - Gender
  - Total Championships
  - Women's Singles
  - Men's Singles
  - Women's Doubles
  - Mixed Doubles
  - Men's Doubles
  - Most Recent Win
- 58 rows (59 including headers)

In [140]:
import pandas as pd

# Read the Excel file
xls = pd.ExcelFile('Wimbledon Champions.xlsx')

# List all sheet names
sheet_names = xls.sheet_names
print(sheet_names)

['Singles Champions', 'Doubles Champions', 'Mixed Doubles Champions']


In [141]:
# Read the Singles Champions sheet into a DataFrame
single_df = pd.read_excel(xls, sheet_name='Singles Champions')
single_df

Unnamed: 0,year,men,women
0,1877,Spencer W. Gore (U.K.),
1,1878,Frank Hadow (U.K.),
2,1879,John T. Hartley (U.K.),
3,1880,John T. Hartley (U.K.),
4,1881,Willie Renshaw (U.K.),
...,...,...,...
138,2023,Carlos Alcaraz (Spain),Marketa Vondrousova (Czech.)
139,1Tournament canceled because of World War I.,,
140,2Tournament canceled because of World War II.,,
141,3Open since 1968.,,


In [142]:
# Filter out records where 'men' or 'women' columns have 'not held' as value
single_df = single_df[(single_df['men'] != 'not held') & (single_df['women'] != 'not held')]

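# Filter out records where both 'men' and 'women' columns are null
# (.copy() gives an independent DataFrame, so the column assignments below don't raise SettingWithCopyWarning)
single_df = single_df.dropna(subset=['men', 'women'], how='all').copy()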

# Convert the 'year' column to numeric, forcing errors to NaN
single_df['year'] = pd.to_numeric(single_df['year'], errors='coerce')

# Drop rows with NaN values in 'year' column
single_df = single_df.dropna(subset=['year'])

# Convert 'year' column to integer type
single_df['year'] = single_df['year'].astype(int)
single_df

Unnamed: 0,year,men,women
0,1877,Spencer W. Gore (U.K.),
1,1878,Frank Hadow (U.K.),
2,1879,John T. Hartley (U.K.),
3,1880,John T. Hartley (U.K.),
4,1881,Willie Renshaw (U.K.),
...,...,...,...
133,2018,Novak Djokovic (Serbia),Angelique Kerber (Ger.)
134,2019,Novak Djokovic (Serbia),Simona Halep (Rom.)
136,2021,Novak Djokovic (Serbia),Ashleigh Barty (Austl.)
137,2022,Novak Djokovic (Serbia),Elena Rybakina (Kazakh.)


In [143]:
# Reshape the single_df DataFrame
single_df_melted = single_df.melt(id_vars=['year'], value_vars=['men', 'women'], var_name='tournament', value_name='champion')

# Drop rows with NaN values in the 'champion' column
single_df_melted = single_df_melted.dropna(subset=['champion'])

# Change the values in the 'tournament' column
single_df_melted['tournament'] = single_df_melted['tournament'].replace({'men': "Men's Singles", 'women': "Women's Singles"})

single_df_melted

Unnamed: 0,year,tournament,champion
0,1877,Men's Singles,Spencer W. Gore (U.K.)
1,1878,Men's Singles,Frank Hadow (U.K.)
2,1879,Men's Singles,John T. Hartley (U.K.)
3,1880,Men's Singles,John T. Hartley (U.K.)
4,1881,Men's Singles,Willie Renshaw (U.K.)
...,...,...,...
267,2018,Women's Singles,Angelique Kerber (Ger.)
268,2019,Women's Singles,Simona Halep (Rom.)
269,2021,Women's Singles,Ashleigh Barty (Austl.)
270,2022,Women's Singles,Elena Rybakina (Kazakh.)
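
The Singles names carry a country suffix such as `(U.K.)`, while the Doubles names shown further down do not, so the names would not line up when the tables are brought together. One possible way to handle this, sketched on a few rows copied from the output above, is to strip a trailing parenthesised group with a regular expression. The variable name `singles` and the pattern are illustrative choices, not part of the original solution.

```python
import pandas as pd

# A few rows in the same shape as single_df_melted
singles = pd.DataFrame({
    "year": [1879, 2023, 2023],
    "tournament": ["Men's Singles", "Men's Singles", "Women's Singles"],
    "champion": [
        "John T. Hartley (U.K.)",
        "Carlos Alcaraz (Spain)",
        "Marketa Vondrousova (Czech.)",
    ],
})

# Remove a trailing parenthesised group, e.g. " (U.K.)", then trim whitespace
singles["champion"] = (
    singles["champion"]
    .str.replace(r"\s*\([^)]*\)\s*$", "", regex=True)
    .str.strip()
)
print(singles["champion"].tolist())
```

This only removes the suffix. Spelling differences between sheets, such as "John T. Hartley" in Singles versus "John Hartley" in Doubles, would still need separate handling if they refer to the same player.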


In [144]:
# Read the Doubles Champions sheet into a DataFrame
double_df = pd.read_excel(xls, sheet_name='Doubles Champions')
double_df

Unnamed: 0,year,men,women
0,1879,"Robert Erskine, Herbert Lawford",
1,1880,"Willie Renshaw, Ernest Renshaw",
2,1881,"Willie Renshaw, Ernest Renshaw",
3,1882,"John Hartley, R.T. Richardson",
4,1883,"C.W. Grinstead, C.E. Welldon",
...,...,...,...
135,2022,"Matthew Ebden, Max Purcell","Barbora Krejcikova, Katerina Siniakova"
136,2023,"Wesley Koolhof, Neal Skupski","Hsieh Su-wei, Barbora Strycova"
137,1Tournament canceled because of World War I.,,
138,2Tournament canceled because of World War II.,,


In [145]:
# Filter out records where 'men' or 'women' columns have 'not held' as value
double_df = double_df[(double_df['men'] != 'not held') & (double_df['women'] != 'not held')]

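# Filter out records where both 'men' and 'women' columns are null
# (.copy() gives an independent DataFrame, so the column assignments below don't raise SettingWithCopyWarning)
double_df = double_df.dropna(subset=['men', 'women'], how='all').copy()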

# Convert the 'year' column to numeric, forcing errors to NaN
double_df['year'] = pd.to_numeric(double_df['year'], errors='coerce')

# Drop rows with NaN values in 'year' column
double_df = double_df.dropna(subset=['year'])

# Convert 'year' column to integer type
double_df['year'] = double_df['year'].astype(int)
double_df

Unnamed: 0,year,men,women
0,1879,"Robert Erskine, Herbert Lawford",
1,1880,"Willie Renshaw, Ernest Renshaw",
2,1881,"Willie Renshaw, Ernest Renshaw",
3,1882,"John Hartley, R.T. Richardson",
4,1883,"C.W. Grinstead, C.E. Welldon",
...,...,...,...
131,2018,"Mike Bryan, Jack Sock","Barbora Krejcikova, Katerina Siniakova"
132,2019,"Juan Sebastian Cabal, Robert Farah","Hsieh Su-wei, Barbora Strycova"
134,2021,"Nikola Mektic, Mate Pavic","Hsieh Su-wei, Elise Mertens"
135,2022,"Matthew Ebden, Max Purcell","Barbora Krejcikova, Katerina Siniakova"


In [146]:
# Reshape the double_df DataFrame
double_df_melted = double_df.melt(id_vars=['year'], value_vars=['men', 'women'], var_name='tournament', value_name='champion')

# Drop rows with NaN values in the 'champion' column
double_df_melted = double_df_melted.dropna(subset=['champion'])

# Split the 'champion' column by ',' and explode the DataFrame
double_df_melted['champion'] = double_df_melted['champion'].str.split(',')
double_df_melted = double_df_melted.explode('champion')

# Remove leading and trailing spaces from the 'champion' column
double_df_melted['champion'] = double_df_melted['champion'].str.strip()

# Change the values in the 'tournament' column
double_df_melted['tournament'] = double_df_melted['tournament'].replace({'men': "Men's Doubles", 'women': "Women's Doubles"})


double_df_melted

Unnamed: 0,year,tournament,champion
0,1879,Men's Doubles,Robert Erskine
0,1879,Men's Doubles,Herbert Lawford
1,1880,Men's Doubles,Willie Renshaw
1,1880,Men's Doubles,Ernest Renshaw
2,1881,Men's Doubles,Willie Renshaw
...,...,...,...
265,2021,Women's Doubles,Elise Mertens
266,2022,Women's Doubles,Barbora Krejcikova
266,2022,Women's Doubles,Katerina Siniakova
267,2023,Women's Doubles,Hsieh Su-wei


In [147]:
# Read the Mixed Doubles Champions sheet into a DataFrame
mixed_double_df = pd.read_excel(xls, sheet_name='Mixed Doubles Champions')
mixed_double_df

Unnamed: 0,Year,Champions,Runners-up,Score[2]
0,1913,Hope Crisp\nAgnes Tuckey,James Cecil Parke\nEthel Larcombe,"3–6, 5–3 retired"
1,1914,James Cecil Parke\nEthel Larcombe,Anthony Wilding\nMarguerite Broquedis,"4–6, 6–4, 6–2"
2,1915,No competition (due to World War I),,
3,1916,,,
4,1917,,,
...,...,...,...,...
106,2019,Ivan Dodig\nLatisha Chan,Robert Lindstedt\nJeļena Ostapenko,"6–2, 6–3"
107,2020,No competition (due to COVID-19 pandemic)[3],,
108,2021,Neal Skupski\nDesirae Krawczyk,Joe Salisbury\nHarriet Dart,"6–2, 7–6(7–1)"
109,2022,Neal Skupski(2)\nDesirae Krawczyk(2),Matthew Ebden\nSamantha Stosur,"6–4, 6–3"


In [148]:
# Keep only rows with a Score[2] value (years with no competition have no score)
mixed_double_df = mixed_double_df[mixed_double_df['Score[2]'].notna()]
mixed_double_df

Unnamed: 0,Year,Champions,Runners-up,Score[2]
0,1913,Hope Crisp\nAgnes Tuckey,James Cecil Parke\nEthel Larcombe,"3–6, 5–3 retired"
1,1914,James Cecil Parke\nEthel Larcombe,Anthony Wilding\nMarguerite Broquedis,"4–6, 6–4, 6–2"
6,1919,Randolph Lycett\nElizabeth Ryan,Albert Prebble\nDorothea Lambert Chambers,"6–0, 6–0"
7,1920,Gerald Patterson\nSuzanne Lenglen,Randolph Lycett\nElizabeth Ryan,"7–5, 6–3"
8,1921,Randolph Lycett(2)\nElizabeth Ryan(2),Max Woosnam\nPhyllis Howkins,"6–3, 6–1"
...,...,...,...,...
105,2018,Alexander Peya\nNicole Melichar,Jamie Murray\nVictoria Azarenka,"7–6(7–1), 6–3"
106,2019,Ivan Dodig\nLatisha Chan,Robert Lindstedt\nJeļena Ostapenko,"6–2, 6–3"
108,2021,Neal Skupski\nDesirae Krawczyk,Joe Salisbury\nHarriet Dart,"6–2, 7–6(7–1)"
109,2022,Neal Skupski(2)\nDesirae Krawczyk(2),Matthew Ebden\nSamantha Stosur,"6–4, 6–3"
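
To match the shape of the Singles and Doubles tables, the Mixed Doubles champions still need one row per player. A minimal sketch of that step follows, using a few rows copied from the output above. It assumes the two names in `Champions` are separated by a real newline character and that suffixes like `(2)` are title counters that can be dropped; the names `mixed` and `mixed_long` are illustrative.

```python
import pandas as pd

# A few rows in the same shape as the filtered mixed_double_df
mixed = pd.DataFrame({
    "Year": [1919, 1921, 2022],
    "Champions": [
        "Randolph Lycett\nElizabeth Ryan",
        "Randolph Lycett(2)\nElizabeth Ryan(2)",
        "Neal Skupski(2)\nDesirae Krawczyk(2)",
    ],
})

# Split the pair on the newline and give each player their own row
mixed_long = (
    mixed.assign(champion=mixed["Champions"].str.split("\n"))
    .explode("champion")
    .rename(columns={"Year": "year"})
)

# Drop a trailing counter such as "(2)" and trim whitespace
mixed_long["champion"] = (
    mixed_long["champion"]
    .str.replace(r"\(\d+\)\s*$", "", regex=True)
    .str.strip()
)

# Label the tournament and keep the same columns as the other melted tables
mixed_long["tournament"] = "Mixed Doubles"
mixed_long = mixed_long[["year", "tournament", "champion"]].reset_index(drop=True)
print(mixed_long)
```

With the columns aligned this way, the three tables could be stacked with `pd.concat` before counting titles per Champion.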
