# World Marathon Majors - Winners

The World Marathon Majors consist of six major city marathons (https://en.wikipedia.org/wiki/World_Marathon_Majors).

Lists of all historic winners can be found on the individual Wikipedia pages:

    - Tokyo (https://en.wikipedia.org/wiki/Tokyo_Marathon)
    - Boston (https://en.wikipedia.org/wiki/List_of_winners_of_the_Boston_Marathon)
    - London (https://en.wikipedia.org/wiki/List_of_winners_of_the_London_Marathon)
    - Berlin (https://en.wikipedia.org/wiki/Berlin_Marathon)
    - Chicago (https://en.wikipedia.org/wiki/List_of_winners_of_the_Chicago_Marathon)
    - New York (https://en.wikipedia.org/wiki/List_of_winners_of_the_New_York_City_Marathon)
    
Using the Wikipedia API, let's see if we can collect and compile a list of winners for each race, covering the men's and women's elite races and the men's and women's wheelchair races, and process it into an easy-to-read form.

In [1]:
import pandas as pd
import wikipedia as wp
import re
import numpy as np

### Find each page and make sure each exists

In [2]:
Tokyo_page=wp.page('Tokyo Marathon')
Boston_page=wp.page('List of winners of the Boston Marathon')
London_page=wp.page('List of winners of the London Marathon')
Berlin_page=wp.page('Berlin Marathon')
Chicago_page=wp.page('List of winners of the Chicago Marathon')
New_York_page=wp.page('List of winners of the New York City Marathon')

pages=[Tokyo_page,Boston_page,London_page,Berlin_page,Chicago_page,New_York_page]

for page in pages:
    print(page.title)

Tokyo Marathon
List of winners of the Boston Marathon
List of winners of the London Marathon
Berlin Marathon
List of winners of the Chicago Marathon
List of winners of the New York City Marathon


For each page, we'll use the pandas read_html() function to get all relevant tables.

We first need to identify which tables we need by manually inspecting the Wikipedia page for each race.
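As a complement to manual inspection, the header cells of each table can be listed with their position on the page. The following is only a minimal sketch using the standard library's `html.parser`; the class name `TableHeaderLister` is hypothetical, the toy HTML stands in for a real page, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableHeaderLister(HTMLParser):
    """Collect the header-cell text of every <table>, in page order."""

    def __init__(self):
        super().__init__()
        self.headers = []     # one list of header strings per table
        self._in_th = False   # True while inside a <th> element

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.headers.append([])
        elif tag == 'th' and self.headers:
            self._in_th = True
            self.headers[-1].append('')

    def handle_endtag(self, tag):
        if tag == 'th':
            self._in_th = False

    def handle_data(self, data):
        if self._in_th:
            self.headers[-1][-1] += data.strip()


# Toy HTML standing in for a wiki page; the second table is made up.
sample_html = """
<table><tr><th>Year</th><th>Athlete</th><th>Time</th></tr>
<tr><td>1981</td><td>Joyce Smith</td><td>2:29:57</td></tr></table>
<table><tr><th>Country</th><th>Wins</th></tr></table>
"""

lister = TableHeaderLister()
lister.feed(sample_html)
for position, columns in enumerate(lister.headers):
    print(position, columns)
```

On a real page, the string returned by `wp.page(...).html()` would be fed to the parser instead of `sample_html`, and the printed positions would correspond to the order of the tables in the HTML.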

## The London Marathon

https://en.wikipedia.org/wiki/List_of_winners_of_the_London_Marathon

In [3]:
London_page.summary

"The London Marathon, one of the six World Marathon Majors, has been contested by men and women annually since 29 March 1981. Set over a largely flat course around the River Thames, the marathon is 26.2 miles (42.2 km) in length and generally regarded as a competitive and unpredictable event, and conducive to fast times.The inaugural marathon had 7,741 entrants, 6,255 of whom completed the race. The first Men's Elite Race was tied between American Dick Beardsley and Norwegian Inge Simonsen, who crossed the finish line holding hands in 2 hours, 11 minutes, 48 seconds. The first Women's Elite Race was won by Briton Joyce Smith in 2:29:57. In 1983, the first wheelchair races took place. Organized by the British Sports Association for the Disabled (BASD), 19 people competed and 17 finished. Gordon Perry of the United Kingdom won the Men's Wheelchair Race, coming in at 3:20:07, and Denise Smith, also of the UK, won the Women's Wheelchair Race in 4:29:03.Twenty athletes representing the Unit

By manually inspecting this page, we see that the results we're looking for are the first to fourth tables on the page.

In [4]:
html = wp.page("List of winners of the London Marathon").html()
male_elite_London = pd.read_html(html)[0]
female_elite_London = pd.read_html(html)[1]
male_wheel_London = pd.read_html(html)[2]
female_wheel_London = pd.read_html(html)[3]

In [5]:
male_elite_London.head()

Unnamed: 0,0,1,2,3,4
0,Year,Athlete,Nationality,Time(h:m:s),Notes
1,1981,Dick Beardsley (Tie),United States,2:11:48,Course record
2,Inge Simonsen (Tie),Norway,,,
3,1982,Hugh Jones,United Kingdom,2:09:24,Course record
4,1983,Mike Gratton,United Kingdom,2:09:43,


### Cleaning up

London is famous for the tie in the men's race at its first edition in 1981. The tie shows up as two rows in the first table taken from the wiki page. It's only one instance, so let's clean it up manually.

In [6]:
male_elite_London.iloc[1, 1]='Dick Beardsley and Inge Simonsen (Tie)'
male_elite_London.iloc[1, 2]='United States and Norway'
male_elite_London=male_elite_London.reindex(male_elite_London.index.drop(2)).reset_index(drop=True)
male_elite_London.head()

Unnamed: 0,0,1,2,3,4
0,Year,Athlete,Nationality,Time(h:m:s),Notes
1,1981,Dick Beardsley and Inge Simonsen (Tie),United States and Norway,2:11:48,Course record
2,1982,Hugh Jones,United Kingdom,2:09:24,Course record
3,1983,Mike Gratton,United Kingdom,2:09:43,
4,1984,Charlie Spedding,United Kingdom,2:09:57,


Now that the initial cleaning is done, let's standardise these tables by promoting the first row to column headings and making the year the index.

Let's also convert that time to datetime format.

In [7]:
def standardise_table(df):
    df.columns = df.iloc[0]
    df=df.reindex(df.index.drop(0))
    df=df.set_index('Year')
    df['Time(h:m:s)']=pd.to_datetime(df['Time(h:m:s)'],format='%H:%M:%S').dt.time
    return(df)
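The format string '%H:%M:%S' passed to pd.to_datetime uses the same directives as the standard library's datetime.strptime. As a rough illustration of what the conversion does to a single value, here is a stand-alone sketch; the helper name `parse_finish_time` is hypothetical, and returning None for unparseable input is an assumption chosen to resemble the effect of errors='coerce'.

```python
from datetime import datetime


def parse_finish_time(text):
    """Parse an 'h:mm:ss' string into a datetime.time, or return None."""
    try:
        return datetime.strptime(text, '%H:%M:%S').time()
    except (ValueError, TypeError):
        # malformed strings (or non-strings) cannot be parsed
        return None


print(parse_finish_time('2:11:48'))     # a valid finishing time
print(parse_finish_time('1:46:26[6]'))  # trailing citation marker
print(parse_finish_time('1:37.17'))     # full stop instead of a colon
```

Only the first value parses; the other two show why citation markers and formatting slips have to be cleaned before the conversion.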

In [8]:
male_elite_London=standardise_table(male_elite_London)
female_elite_London=standardise_table(female_elite_London)
male_wheel_London=standardise_table(male_wheel_London)
female_wheel_London=standardise_table(female_wheel_London)

male_elite_London.head()

Unnamed: 0_level_0,Athlete,Nationality,Time(h:m:s),Notes
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1981,Dick Beardsley and Inge Simonsen (Tie),United States and Norway,02:11:48,Course record
1982,Hugh Jones,United Kingdom,02:09:24,Course record
1983,Mike Gratton,United Kingdom,02:09:43,
1984,Charlie Spedding,United Kingdom,02:09:57,
1985,Steve Jones,United Kingdom,02:08:16,Course record


## Boston Marathon

https://en.wikipedia.org/wiki/List_of_winners_of_the_Boston_Marathon

In [9]:
Boston_page.summary

'The Boston Marathon is an annual marathon held in the Greater Boston area in Massachusetts. The event is held on Patriots Day, the third Monday of April. The Boston Marathon has been held annually since 1897 and is the oldest annual marathon in the world.'

In [126]:
html = wp.page("List of winners of the Boston Marathon").html()
male_elite_Boston = pd.read_html(html)[1]
female_elite_Boston = pd.read_html(html)[2]
male_wheel_Boston = pd.read_html(html)[3]
female_wheel_Boston = pd.read_html(html)[4]

In [127]:
male_wheel_Boston.head()

Unnamed: 0,0,1,2,3,4
0,Year,Athlete,Country/State,Time,Notes
1,1975,"Hall, RobertRobert Hall",United States United States (MA),2:58:00,
2,1976,zzzNone,,,
3,1977,"Hall, RobertRobert Hall",United States United States (MA),2:40:10,2nd victory
4,1978,"Murray, GeorgeGeorge Murray",United States United States (FL),2:26:57,


There was no men's wheelchair race in 1976, so we'll remove that row from this table.

In [128]:
male_wheel_Boston=male_wheel_Boston.reindex(male_wheel_Boston.index.drop(2)).reset_index(drop=True)
male_wheel_Boston.head()

Unnamed: 0,0,1,2,3,4
0,Year,Athlete,Country/State,Time,Notes
1,1975,"Hall, RobertRobert Hall",United States United States (MA),2:58:00,
2,1977,"Hall, RobertRobert Hall",United States United States (MA),2:40:10,2nd victory
3,1978,"Murray, GeorgeGeorge Murray",United States United States (FL),2:26:57,
4,1979,"Archer, KenKen Archer",United States United States (OH),2:38:59,


In both the men's and women's wheelchair tables, some of the times also contain citation markers. We'll need to remove these as well so we can convert the time to datetime format. We can do that with a simple regex.

In [129]:
male_wheel_Boston.tail()

Unnamed: 0,0,1,2,3,4
39,2014,Ernst van Dyk,South Africa,1:20:36,10th victory
40,2015,Marcel Hug,Switzerland,1:29:53,
41,2016,Marcel Hug,Switzerland,1:24:01,2nd victory
42,2017,Marcel Hug,Switzerland,1:18:03,3rd victory
43,2018,Marcel Hug,Switzerland,1:46:26[6],4th victory


In [130]:
def remove_citation(time):
    #raw string so that the backslashes reach the regex engine unchanged
    return(re.sub(r'\[.*?\]','',str(time)))
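The pattern matches an opening square bracket, then as few characters as possible, then a closing square bracket, so each citation marker is removed on its own. A small stand-alone sketch of the same idea follows; the helper name `strip_citations` is hypothetical, and the sample strings are taken from tables shown in this notebook.

```python
import re


def strip_citations(text):
    # remove every bracketed citation marker such as [6] or [11]
    return re.sub(r'\[.*?\]', '', str(text)).strip()


for sample in ['1:46:26[6]', '1987[11]', 'Noriko Higuchi [9]', '2:58:00']:
    print(repr(strip_citations(sample)))
```

The extra `.strip()` is a design choice of this sketch: it removes the space that is left behind when the marker is separated from the text, as in the last-but-one sample.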

In [131]:
male_wheel_Boston[3]=male_wheel_Boston[3].apply(remove_citation)
female_wheel_Boston[3]=female_wheel_Boston[3].apply(remove_citation)
male_wheel_Boston.tail()

Unnamed: 0,0,1,2,3,4
39,2014,Ernst van Dyk,South Africa,1:20:36,10th victory
40,2015,Marcel Hug,Switzerland,1:29:53,
41,2016,Marcel Hug,Switzerland,1:24:01,2nd victory
42,2017,Marcel Hug,Switzerland,1:18:03,3rd victory
43,2018,Marcel Hug,Switzerland,1:46:26,4th victory


### Standardise tables

As with the London Marathon tables, we can now promote the first row to column headings and make the year the index. We'll also convert the time to datetime format.

In [132]:
def standardise_table(df):
    df.columns = df.iloc[0]
    df=df.reindex(df.index.drop(0))
    df=df.set_index('Year')
    df['Time']=pd.to_datetime(df['Time'],format='%H:%M:%S',errors='coerce').dt.time
    return(df)

In [133]:
male_elite_Boston=standardise_table(male_elite_Boston)
female_elite_Boston=standardise_table(female_elite_Boston)
male_wheel_Boston=standardise_table(male_wheel_Boston)
female_wheel_Boston=standardise_table(female_wheel_Boston)

male_elite_Boston.head()

Unnamed: 0_level_0,Athlete,Country/State or Province,Time,Notes
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1897,"McDermott, John J.John J. McDermott",United States United States(NY),02:55:10,
1898,"MacDonald, Ronald J.Ronald J. MacDonald",Canada Canada (NS),02:42:00,
1899,"Brignolia, LawrenceLawrence Brignolia",United States United States (MA),02:54:38,
1900,"Caffery, JohnJohn ""Jack"" Caffery",Canada Canada (ON),02:39:44,
1901,"Caffery, JohnJohn ""Jack"" Caffery",Canada Canada (ON),02:29:23,2nd victory


### Cleaning up

For the Boston results, we have an odd situation: many of the names in the tables are hyperlinked, and each cell in the HTML contains the name twice, once in a span and once in the anchor text of the link. The extracted text therefore repeats the name. We'll use a regex to clean this up.

In [134]:
male_elite_Boston['Athlete'].iloc[1]

'MacDonald, Ronald J.Ronald J. MacDonald'
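To see why the cleaning function has to re-join prefixes such as 'Mac', it helps to look at what a "capital letter followed by lower-case letters" pattern extracts from this sample. This is only an illustrative sketch on the single string shown above.

```python
import re

raw = 'MacDonald, Ronald J.Ronald J. MacDonald'

# each match is one capital letter followed by one or more lower-case letters
tokens = re.findall(r'[A-Z][a-z]+', raw)
print(tokens)
```

The surname is split into 'Mac' and 'Donald' because the second capital letter starts a new match, and the initial 'J.' is dropped because no lower-case letter follows it.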

In [135]:
def clean_names(athlete):
    #find all words in name starting with a capital letter followed by lower case
    names=re.findall(r'[A-Z][a-z]+',athlete)
    #join 'Mc' and 'Mac' to second part of the name and take care of double barrels
    exceptions=['Mc','Mac','De','Van','Cable']
    names_amended=[]
    i=0
    while i<len(names):
        #walk the list by position instead of removing items while iterating over it
        if names[i] in exceptions and i+1<len(names):
            names_amended.append(names[i]+names[i+1])
            i+=2
        else:
            names_amended.append(names[i])
            i+=1
    #the full name should now be the last two words in the list
    name=' '.join(names_amended[-2:])
    return(name)

In [136]:
male_elite_Boston['Athlete']=male_elite_Boston['Athlete'].apply(clean_names)
female_elite_Boston['Athlete']=female_elite_Boston['Athlete'].apply(clean_names)
male_wheel_Boston['Athlete']=male_wheel_Boston['Athlete'].apply(clean_names)
female_wheel_Boston['Athlete']=female_wheel_Boston['Athlete'].apply(clean_names)
male_elite_Boston.head()

Unnamed: 0_level_0,Athlete,Country/State or Province,Time,Notes
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1897,John McDermott,United States United States(NY),02:55:10,
1898,Ronald MacDonald,Canada Canada (NS),02:42:00,
1899,Lawrence Brignolia,United States United States (MA),02:54:38,
1900,Jack Caffery,Canada Canada (ON),02:39:44,
1901,Jack Caffery,Canada Canada (ON),02:29:23,2nd victory


The final cleaning step for the Boston results is to separate the country from the state or province for US and Canadian athletes.

In [137]:
male_elite_Boston['Country/State or Province'].iloc[0]

'United States United States(NY)'

In [138]:
test=male_elite_Boston['Country/State or Province'].iloc[0]

In [139]:
def find_state(entry):
    #capture the text between the first pair of brackets, e.g. 'NY'
    state=re.search(r'\((.*?)\)',entry)
    if state:
        state=state.group(1)
    else:
        state=np.nan
    return(state)

In [140]:
male_elite_Boston['State']=male_elite_Boston['Country/State or Province'].apply(find_state)
female_elite_Boston['State']=female_elite_Boston['Country/State'].apply(find_state)
male_wheel_Boston['State']=male_wheel_Boston['Country/State'].apply(find_state)
female_wheel_Boston['State']=female_wheel_Boston['Country/State'].apply(find_state)
male_elite_Boston.head()

Unnamed: 0_level_0,Athlete,Country/State or Province,Time,Notes,State
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1897,John McDermott,United States United States(NY),02:55:10,,NY
1898,Ronald MacDonald,Canada Canada (NS),02:42:00,,NS
1899,Lawrence Brignolia,United States United States (MA),02:54:38,,MA
1900,Jack Caffery,Canada Canada (ON),02:39:44,,ON
1901,Jack Caffery,Canada Canada (ON),02:29:23,2nd victory,ON


In [141]:
def clean_country(country):
    country=re.sub(r'\(.*?\)','',country)
    #drop repeated words while keeping their original order (a set does not guarantee order)
    country=' '.join(dict.fromkeys(country.split()))
    return(country)

In [142]:
male_elite_Boston['Country/State or Province']=male_elite_Boston['Country/State or Province'].apply(clean_country)
female_elite_Boston['Country/State']=female_elite_Boston['Country/State'].apply(clean_country)
male_wheel_Boston['Country/State']=male_wheel_Boston['Country/State'].apply(clean_country)
female_wheel_Boston['Country/State']=female_wheel_Boston['Country/State'].apply(clean_country)
male_elite_Boston.head()

Unnamed: 0_level_0,Athlete,Country/State or Province,Time,Notes,State
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1897,John McDermott,United States,02:55:10,,NY
1898,Ronald MacDonald,Canada,02:42:00,,NS
1899,Lawrence Brignolia,United States,02:54:38,,MA
1900,Jack Caffery,Canada,02:39:44,,ON
1901,Jack Caffery,Canada,02:29:23,2nd victory,ON
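The two steps above, extracting the state and de-duplicating the country, can also be combined into one helper. The sketch below is an alternative, not the approach used in this notebook: the name `split_country_state` is hypothetical, and it assumes the duplication is always the whole country name repeated once, so it compares the two halves of the word list rather than removing repeated words.

```python
import re


def split_country_state(entry):
    """Return (country, state) from a cell such as 'Canada Canada (NS)'."""
    # the state or province is whatever sits inside the first brackets
    match = re.search(r'\((.*?)\)', entry)
    state = match.group(1) if match else None

    # remove the bracketed part, then check whether the name is repeated
    words = re.sub(r'\(.*?\)', '', entry).split()
    half = len(words) // 2
    if len(words) % 2 == 0 and words[:half] == words[half:]:
        words = words[:half]
    return ' '.join(words), state


print(split_country_state('United States United States(NY)'))
print(split_country_state('Canada Canada (NS)'))
print(split_country_state('Kenya'))
```

Comparing halves leaves a country name untouched when it is not duplicated, even if a word happens to occur twice within it.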


## Chicago Marathon

https://en.wikipedia.org/wiki/List_of_winners_of_the_Chicago_Marathon

In [110]:
Chicago_page.summary

"The Chicago Marathon, one of the six World Marathon Majors, has been contested by men and women annually since 1977.  Since 1983, it has been held annually in October.  The United States had been represented by the most Chicago Marathon winners (nine men and twelve women).  After a seventh consecutive win by a Kenyan man in 2009, Kenyan men have won more times (ten) than men representing any other country. The United Kingdom is in third place in total victories (eight), victories by men (five) and victories by women (three).  All four of Brazil's victors have been men, and all three of Portugal's winners have been women."

In [111]:
html = wp.page("List of winners of the Chicago Marathon").html()
all_elite_Chicago = pd.read_html(html)[1]
all_wheel_Chicago = pd.read_html(html)[2]

In [124]:
all_elite_Chicago.tail(20)

Unnamed: 0_level_0,Male athlete,Male Country,Male Time,Female athlete,Female Country,Female Time
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1999,Khalid Khannouchi,Morocco,,Joyce Chepchumba,Kenya,02:25:59
2000,Khalid Khannouchi,United States,02:07:01,Catherine Ndereba,Kenya,02:21:33
2001,Ben Kimondiu,Kenya,02:08:52,Catherine Ndereba,Kenya,
2002,Khalid Khannouchi,United States,02:05:56,Paula Radcliffe,United Kingdom,
2003,Evans Rutto,Kenya,02:05:50,Svetlana Zakharova,Russia,02:23:07
2004,Evans Rutto,Kenya,02:06:16,Constantina Diță,Romania,02:23:45
2005,Felix Limo,Kenya,02:07:02,Deena Kastor,United States,02:21:25
2006,Robert Cheruiyot,Kenya,02:07:35,Berhane Adere,Ethiopia,02:20:42
2007,Patrick Ivuti,Kenya,02:11:11,Berhane Adere,Ethiopia,02:33:49
2008,Evans Cheruiyot,Kenya,02:06:25,Lidiya Grigoryeva,Russia,02:27:17


In [113]:
all_wheel_Chicago

Unnamed: 0,0,1,2,3,4,5,6
0,Date,Male athlete,Country,Time,Female athlete,Country,Time
1,1984,Robert Fitch,United States,2:35:06,Jonnie Baylark,United States,3:29:10
2,1985,Robert Fitch,United States,2:23:41,Jayne Fortson,United States,2:52:22
3,1986,Bart Bardwell,United States,2:10:19,Jonnie Baylark,United States,3:23:32
4,1987[11],—,—,—,—,—,—
5,1988,Ken Luckenbaugh,United States,2:12:17,—,—,—


The Chicago Marathon results are formatted so that the male and female athletes appear in the same table. The elite and wheelchair tables also need different cleaning and standardising steps, so we'll treat each one separately.

### Cleaning up - Elites table

Let's start by standardising the table, making the top row the column headings and changing the time to datetime format. We'll also need to change the 'Time' and 'Country' column names to avoid duplicate column names.

In [114]:
def standardise_table(df):
    #rename the duplicated headings in the first row before promoting it to the header
    df.iloc[0,3]='Male Time'
    df.iloc[0,2]='Male Country'
    df.iloc[0,6]='Female Time'
    df.iloc[0,5]='Female Country'
    df.columns = df.iloc[0]
    df=df.reindex(df.index.drop(0))
    df['Male Time']=pd.to_datetime(df['Male Time'],format='%H:%M:%S',errors='coerce').dt.time
    df['Female Time']=pd.to_datetime(df['Female Time'],format='%H:%M:%S',errors='coerce').dt.time
    return(df)
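An alternative to editing the header row in place is to build the list of column names first and then assign it. The sketch below shows this on a toy frame with the same layout as the wheelchair table (two rows copied from the output above); the variable names are illustrative, and the times are left as strings to keep the example short.

```python
import pandas as pd

raw = pd.DataFrame([
    ['Date', 'Male athlete', 'Country', 'Time', 'Female athlete', 'Country', 'Time'],
    ['1984', 'Robert Fitch', 'United States', '2:35:06', 'Jonnie Baylark', 'United States', '3:29:10'],
    ['1985', 'Robert Fitch', 'United States', '2:23:41', 'Jayne Fortson', 'United States', '2:52:22'],
])

# take the first row as the header and disambiguate the repeated names by position
header = raw.iloc[0].tolist()
header[2], header[3] = 'Male Country', 'Male Time'
header[5], header[6] = 'Female Country', 'Female Time'

table = raw.iloc[1:].copy()
table.columns = header
table = table.set_index('Date')

# one frame per race, with shared column names
male = table[['Male athlete', 'Male Country', 'Male Time']].rename(
    columns={'Male athlete': 'Athlete', 'Male Country': 'Country', 'Male Time': 'Time'})
print(male)
```

Working on a plain Python list avoids writing into the frame before the header is promoted, at the cost of one extra intermediate variable.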

In [115]:
all_elite_Chicago=standardise_table(all_elite_Chicago)
all_elite_Chicago.head()

Unnamed: 0,Date,Male athlete,Male Country,Male Time,Female athlete,Female Country,Female Time
1,"September 25, 1977",Dan Cloeter,United States,02:17:52,Dorothy Doolittle,United States,02:50:47
2,"September 24, 1978",Mark Stanforth,United States,02:19:20,Lynae Larson,United States,02:59:25
3,"October 21, 1979",Dan Cloeter,United States,02:23:20,Laura Michalek,United States,03:15:45
4,"September 28, 1980",Frank Richardson,United States,02:14:04,Sue Peterson,United States,02:45:03
5,"September 27, 1981",Phil Coppess,United States,02:16:13,Tina Gandy,United States,02:49:39


The year format is slightly different in the case of the elite athletes table, as it gives the full date of the race. We only want the year, so we'll extract that first.

In [116]:
def get_year(date):
    return(date[len(date)-4:])
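Taking the last four characters works as long as the date ends with the year. When a citation marker trails the year, a search for four consecutive digits is more forgiving. The sketch below is an alternative, with the hypothetical helper name `extract_year`; the sample strings are a date from the elites table and the '1987[11]' entry from the wheelchair table.

```python
import re


def extract_year(date_text):
    # return the first run of four digits, or None if there is none
    match = re.search(r'\d{4}', date_text)
    return match.group(0) if match else None


for sample in ['September 25, 1977', '1987[11]']:
    # left: last four characters, right: first four-digit run
    print(sample[-4:], extract_year(sample))
```

For the second sample the slice returns the citation marker while the search returns the year.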

In [117]:
all_elite_Chicago['Year']=all_elite_Chicago['Date'].apply(get_year)
all_elite_Chicago.head()

Unnamed: 0,Date,Male athlete,Male Country,Male Time,Female athlete,Female Country,Female Time,Year
1,"September 25, 1977",Dan Cloeter,United States,02:17:52,Dorothy Doolittle,United States,02:50:47,1977
2,"September 24, 1978",Mark Stanforth,United States,02:19:20,Lynae Larson,United States,02:59:25,1978
3,"October 21, 1979",Dan Cloeter,United States,02:23:20,Laura Michalek,United States,03:15:45,1979
4,"September 28, 1980",Frank Richardson,United States,02:14:04,Sue Peterson,United States,02:45:03,1980
5,"September 27, 1981",Phil Coppess,United States,02:16:13,Tina Gandy,United States,02:49:39,1981


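We can then set the year as the index of the elites table and remove the date column. The function below also drops the row labelled '[11]': get_year keeps the last four characters of the date, so a date that ends in the citation marker [11] gets that label instead of a year.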

In [118]:
def set_index(df):
    df=df.set_index('Year')
    df=df.drop(['Date'],axis=1)
    df=df.drop(['[11]'],axis=0)
    return(df)

In [119]:
all_elite_Chicago=set_index(all_elite_Chicago)
all_elite_Chicago.head(10)

Unnamed: 0_level_0,Male athlete,Male Country,Male Time,Female athlete,Female Country,Female Time
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1977,Dan Cloeter,United States,02:17:52,Dorothy Doolittle,United States,02:50:47
1978,Mark Stanforth,United States,02:19:20,Lynae Larson,United States,02:59:25
1979,Dan Cloeter,United States,02:23:20,Laura Michalek,United States,03:15:45
1980,Frank Richardson,United States,02:14:04,Sue Peterson,United States,02:45:03
1981,Phil Coppess,United States,02:16:13,Tina Gandy,United States,02:49:39
1982,Greg Meyer,United States,02:10:59,Nancy Conz,United States,02:33:23
1983,Joseph Nzau,Kenya,02:09:44,Rosa Mota,Portugal,02:31:12
1984,Steve Jones,United Kingdom,,Rosa Mota,Portugal,02:26:01
1985,Steve Jones,United Kingdom,02:07:13,Joan Benoit,United States,02:21:21
1986,Toshihiko Seko,Japan,02:08:27,Ingrid Kristiansen,Norway,02:27:08


We can now finally split the table into male and female.

In [121]:
male_elite_Chicago=all_elite_Chicago[['Male athlete','Male Country','Male Time']]
male_elite_Chicago=male_elite_Chicago.rename(index=str, columns={"Male athlete": "Athlete", "Male Country": "Country", "Male Time":"Time"})
male_elite_Chicago.head(10)

Unnamed: 0_level_0,Athlete,Country,Time
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1977,Dan Cloeter,United States,02:17:52
1978,Mark Stanforth,United States,02:19:20
1979,Dan Cloeter,United States,02:23:20
1980,Frank Richardson,United States,02:14:04
1981,Phil Coppess,United States,02:16:13
1982,Greg Meyer,United States,02:10:59
1983,Joseph Nzau,Kenya,02:09:44
1984,Steve Jones,United Kingdom,
1985,Steve Jones,United Kingdom,02:07:13
1986,Toshihiko Seko,Japan,02:08:27


In [122]:
female_elite_Chicago=all_elite_Chicago[['Female athlete','Female Country','Female Time']]
female_elite_Chicago=female_elite_Chicago.rename(index=str, columns={"Female athlete": "Athlete", "Female Country": "Country", "Female Time":"Time"})
female_elite_Chicago.head(10)

Unnamed: 0_level_0,Athlete,Country,Time
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1977,Dorothy Doolittle,United States,02:50:47
1978,Lynae Larson,United States,02:59:25
1979,Laura Michalek,United States,03:15:45
1980,Sue Peterson,United States,02:45:03
1981,Tina Gandy,United States,02:49:39
1982,Nancy Conz,United States,02:33:23
1983,Rosa Mota,Portugal,02:31:12
1984,Rosa Mota,Portugal,02:26:01
1985,Joan Benoit,United States,02:21:21
1986,Ingrid Kristiansen,Norway,02:27:08


### Cleaning up - Wheelchair table

In [39]:
all_wheel_Chicago

Unnamed: 0,0,1,2,3,4,5,6
0,Date,Male athlete,Country,Time,Female athlete,Country,Time
1,1984,Robert Fitch,United States,2:35:06,Jonnie Baylark,United States,3:29:10
2,1985,Robert Fitch,United States,2:23:41,Jayne Fortson,United States,2:52:22
3,1986,Bart Bardwell,United States,2:10:19,Jonnie Baylark,United States,3:23:32
4,1987[11],—,—,—,—,—,—
5,1988,Ken Luckenbaugh,United States,2:12:17,—,—,—


The wheelchair race results for Chicago are admittedly incomplete on Wikipedia, but we'll tidy up what we have here. We'll first remove the blank 1987 row, as that year's race was contested as a half marathon. We'll also replace the dash placeholders for the 1988 women's race with NaN so that the row can be dropped later.

In [40]:
all_wheel_Chicago=all_wheel_Chicago.drop([4]).reset_index(drop=True)
all_wheel_Chicago=all_wheel_Chicago.replace('—', np.nan)
all_wheel_Chicago

Unnamed: 0,0,1,2,3,4,5,6
0,Date,Male athlete,Country,Time,Female athlete,Country,Time
1,1984,Robert Fitch,United States,2:35:06,Jonnie Baylark,United States,3:29:10
2,1985,Robert Fitch,United States,2:23:41,Jayne Fortson,United States,2:52:22
3,1986,Bart Bardwell,United States,2:10:19,Jonnie Baylark,United States,3:23:32
4,1988,Ken Luckenbaugh,United States,2:12:17,,,
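Once the placeholders are NaN, rows with no result at all can be removed without naming the year explicitly. The sketch below illustrates this with `dropna(how='all')` on a toy frame holding two rows of the women's results; it is an alternative to dropping the row by label and is not what the cells below do.

```python
import numpy as np
import pandas as pd

female = pd.DataFrame(
    {
        'Athlete': ['Jonnie Baylark', '—'],
        'Country': ['United States', '—'],
        'Time': ['3:23:32', '—'],
    },
    index=['1986', '1988'],
)

# turn the dash placeholders into NaN, then drop rows where every value is missing
cleaned = female.replace('—', np.nan).dropna(how='all')
print(cleaned)
```

Using how='all' keeps rows that are only partly filled in, which may or may not be what is wanted for a given table.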


We can now standardise the table in the same way as with the elites table and set the date column as the index.

In [41]:
all_wheel_Chicago=standardise_table(all_wheel_Chicago)
all_wheel_Chicago=all_wheel_Chicago.set_index('Date')
all_wheel_Chicago

Unnamed: 0_level_0,Male athlete,Male Country,Male Time,Female athlete,Female Country,Female Time
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1984,Robert Fitch,United States,02:35:06,Jonnie Baylark,United States,03:29:10
1985,Robert Fitch,United States,02:23:41,Jayne Fortson,United States,02:52:22
1986,Bart Bardwell,United States,02:10:19,Jonnie Baylark,United States,03:23:32
1988,Ken Luckenbaugh,United States,02:12:17,,,


And now split between male and female.

In [42]:
male_wheel_Chicago=all_wheel_Chicago[['Male athlete','Male Country','Male Time']]
male_wheel_Chicago=male_wheel_Chicago.rename(index=str, columns={"Male athlete": "Athlete", "Male Country": "Country", "Male Time":"Time"})
male_wheel_Chicago

Unnamed: 0_level_0,Athlete,Country,Time
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1984,Robert Fitch,United States,02:35:06
1985,Robert Fitch,United States,02:23:41
1986,Bart Bardwell,United States,02:10:19
1988,Ken Luckenbaugh,United States,02:12:17


In [43]:
female_wheel_Chicago=all_wheel_Chicago[['Female athlete','Female Country','Female Time']]
female_wheel_Chicago=female_wheel_Chicago.drop(['1988'])
female_wheel_Chicago=female_wheel_Chicago.rename(index=str, columns={"Female athlete": "Athlete", "Female Country": "Country", "Female Time":"Time"})
female_wheel_Chicago

Unnamed: 0_level_0,Athlete,Country,Time
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1984,Jonnie Baylark,United States,03:29:10
1985,Jayne Fortson,United States,02:52:22
1986,Jonnie Baylark,United States,03:23:32


## New York City Marathon

https://en.wikipedia.org/wiki/List_of_winners_of_the_New_York_City_Marathon

In [44]:
New_York_page.summary

"The New York City Marathon is a 42,195-meter (26.2 mile) race through all five boroughs of New York City, and ranks as the largest marathon in the world, with a record 51,392 finishers (29,927 men/21,465 women) in 2016. Since its inaugural race in 1970, 34 men and 25 women have won the open division of the New York City Marathon, representing 22 different countries. From 1970 through 1975, the race was held entirely in Central Park. It has started in Staten Island and gone through New York City's other four boroughs since 1976. Grete Waitz of Norway has won the race more than any other athlete, having completed her 9th victory in 1988, setting three course records in the process. Current course records were set by Geoffrey Mutai of Kenya in 2011 in the men's division, and by Margaret Okayo of Kenya in 2003 in the women's division.\nA wheelchair race has been held since 2000. Among the wheelchair racers, Edith Hunkeler of Switzerland and Tatyana McFadden of USA has the most victories, 

In [145]:
html = wp.page("List of winners of the New York City Marathon").html()
male_elite_New_York = pd.read_html(html)[0]
female_elite_New_York = pd.read_html(html)[1]
male_wheel_New_York = pd.read_html(html)[2]
female_wheel_New_York = pd.read_html(html)[3]

In [146]:
male_elite_New_York.head()

Unnamed: 0,0,1,2,3,4
0,Year,Winner,Country,Time,Notes
1,1970,Gary Muhrcke,United States,2:31:38,Course record
2,1971,Norman Higgins,United States,2:22:54,Course record
3,1972,Sheldon Karlin,United States,2:27:52,
4,1973,Tom Fleming,United States,2:21:54,Course record


The results for the NYC Marathon elite races look pretty straightforward, so we can standardise the tables in a similar way to the London Marathon results.

In [147]:
def standardise_table(df):
    df.columns = df.iloc[0]
    df=df.reindex(df.index.drop(0))
    df=df.set_index('Year')
    df['Time']=pd.to_datetime(df['Time'],format='%H:%M:%S',errors='coerce').dt.time
    return(df)

In [148]:
male_elite_New_York=standardise_table(male_elite_New_York)
female_elite_New_York=standardise_table(female_elite_New_York)

male_elite_New_York.tail()

Unnamed: 0_level_0,Winner,Country,Time,Notes
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2013,Geoffrey Mutai,Kenya,02:08:24,2nd victory
2014,Wilson Kipsang,Kenya,02:10:59,
2015,Stanley Biwott,Kenya,02:10:34,
2016,Ghirmay Ghebreslassie,Eritrea,02:07:51,
2017,Geoffrey Kamworor,Kenya,02:10:53,


The results for the wheelchair races need a little extra attention, as there is a small formatting mistake in the time for the 2017 men's wheelchair race: a full stop where a colon should be. We'll correct this manually.

In [149]:
male_wheel_New_York.tail()

Unnamed: 0,0,1,2,3,4
14,2013,Marcel Hug,Switzerland,1:40:14,
15,2014,Kurt Fearnley,Australia,1:30:55,5th victory (Note that the course was shortene...
16,2015,Ernst van Dyk,South Africa,1:30:54,2nd victory
17,2016,Marcel Hug,Switzerland,1:35:44,2nd victory
18,2017,Marcel Hug,Switzerland,1:37.17,3rd victory


In [150]:
male_wheel_New_York.iloc[18,3]=male_wheel_New_York.iloc[18,3].replace('.',':')
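Slips like this one are easy to miss by eye, and with errors='coerce' they silently become missing values. A quick check against the expected h:mm:ss shape can flag them before conversion. The sketch below uses the five times shown in the table above; the names `TIME_PATTERN` and `malformed_times` are hypothetical.

```python
import re

# one or two digits for the hours, then two-digit minutes and seconds
TIME_PATTERN = re.compile(r'\d{1,2}:\d{2}:\d{2}')


def malformed_times(times):
    # return the entries that are not in h:mm:ss form
    return [t for t in times if not TIME_PATTERN.fullmatch(t)]


times = ['1:40:14', '1:30:55', '1:30:54', '1:35:44', '1:37.17']
print(malformed_times(times))
```

Only the 2017 entry is reported, which matches the manual fix applied in the cell above.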

In [151]:
male_wheel_New_York.tail()

Unnamed: 0,0,1,2,3,4
14,2013,Marcel Hug,Switzerland,1:40:14,
15,2014,Kurt Fearnley,Australia,1:30:55,5th victory (Note that the course was shortene...
16,2015,Ernst van Dyk,South Africa,1:30:54,2nd victory
17,2016,Marcel Hug,Switzerland,1:35:44,2nd victory
18,2017,Marcel Hug,Switzerland,1:37:17,3rd victory


We can now standardise the wheelchair results as usual.

In [152]:
male_wheel_New_York=standardise_table(male_wheel_New_York)
female_wheel_New_York=standardise_table(female_wheel_New_York)

male_wheel_New_York.tail()

Unnamed: 0_level_0,Winner,Country,Time,Notes
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2013,Marcel Hug,Switzerland,01:40:14,
2014,Kurt Fearnley,Australia,01:30:55,5th victory (Note that the course was shortene...
2015,Ernst van Dyk,South Africa,01:30:54,2nd victory
2016,Marcel Hug,Switzerland,01:35:44,2nd victory
2017,Marcel Hug,Switzerland,01:37:17,3rd victory


The final step with the NYC results is to remove the row for 2012, when the race was cancelled due to Hurricane Sandy.

In [153]:
male_elite_New_York=male_elite_New_York.drop(['2012'])
female_elite_New_York=female_elite_New_York.drop(['2012'])
male_wheel_New_York=male_wheel_New_York.drop(['2012'])
female_wheel_New_York=female_wheel_New_York.drop(['2012'])

male_elite_New_York.tail(10)

Unnamed: 0_level_0,Winner,Country,Time,Notes
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2007,Martin Lel,Kenya,02:09:04,2nd victory
2008,Marílson Gomes dos Santos,Brazil,02:08:43,2nd victory
2009,Meb Keflezighi,United States,02:09:15,
2010,Gebregziabher Gebremariam,Ethiopia,02:08:14,
2011,Geoffrey Mutai,Kenya,02:05:05,Course record
2013,Geoffrey Mutai,Kenya,02:08:24,2nd victory
2014,Wilson Kipsang,Kenya,02:10:59,
2015,Stanley Biwott,Kenya,02:10:34,
2016,Ghirmay Ghebreslassie,Eritrea,02:07:51,
2017,Geoffrey Kamworor,Kenya,02:10:53,


## Tokyo Marathon

https://en.wikipedia.org/wiki/Tokyo_Marathon

In [54]:
Tokyo_page.summary

'The Tokyo Marathon (東京マラソン, Tokyo Marason) is an annual marathon sporting event in Tokyo, the capital of Japan. It is an IAAF Gold Label marathon and one of the six World Marathon Majors. The latest edition of the race took place on 25 February 2018. It is sponsored by Tokyo Metro.'

In [65]:
html = wp.page("Tokyo Marathon").html()
Table = pd.read_html(html)[1]

In [66]:
Table

Unnamed: 0,0,1,2,3,4,5,6
0,Year,Men's winner,Country,Time (m:s),Women's winner,Country,Time (m:s)
1,2018,Dickson Chumba,Kenya,2:05:30,Birhane Dibaba,Ethiopia,2:19:51
2,2017[8],Wilson Kipsang,Kenya,2:03:58,Sarah Chepchirchir,Kenya,2:19:47
3,2016,Feyisa Lilesa,Ethiopia,2:06:56,Helah Kiprop,Kenya,2:21:27
4,2015,Endeshaw Negesse,Ethiopia,2:06:00,Birhane Dibaba,Ethiopia,2:23:15
5,2014,Dickson Chumba,Kenya,2:05:42,Tirfi Tsegaye,Ethiopia,2:22:23
6,2013,Dennis Kimetto,Kenya,2:06:50,Aberu Kebede,Ethiopia,2:25:34
7,2012,Michael Kipyego,Kenya,2:07:37,Atsede Habtamu,Ethiopia,2:25:28
8,2011,Hailu Mekonnen,Ethiopia,2:07:35,Noriko Higuchi [9],Japan,2:28:49
9,2010,Masakazu Fujiwara,Japan,2:12:19,Alevtina Biktimirova,Russia,2:34:39


Let's start by removing that citation in 2017 and then standardising the table.

In [67]:
Table.iloc[2,0]='2017'

In [68]:
Table

Unnamed: 0,0,1,2,3,4,5,6
0,Year,Men's winner,Country,Time (m:s),Women's winner,Country,Time (m:s)
1,2018,Dickson Chumba,Kenya,2:05:30,Birhane Dibaba,Ethiopia,2:19:51
2,2017,Wilson Kipsang,Kenya,2:03:58,Sarah Chepchirchir,Kenya,2:19:47
3,2016,Feyisa Lilesa,Ethiopia,2:06:56,Helah Kiprop,Kenya,2:21:27
4,2015,Endeshaw Negesse,Ethiopia,2:06:00,Birhane Dibaba,Ethiopia,2:23:15
5,2014,Dickson Chumba,Kenya,2:05:42,Tirfi Tsegaye,Ethiopia,2:22:23
6,2013,Dennis Kimetto,Kenya,2:06:50,Aberu Kebede,Ethiopia,2:25:34
7,2012,Michael Kipyego,Kenya,2:07:37,Atsede Habtamu,Ethiopia,2:25:28
8,2011,Hailu Mekonnen,Ethiopia,2:07:35,Noriko Higuchi [9],Japan,2:28:49
9,2010,Masakazu Fujiwara,Japan,2:12:19,Alevtina Biktimirova,Russia,2:34:39


In [69]:
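def standardise_table(df):
    # Rename the duplicated header cells in row 0 so male and female columns are distinct.
    # (.iloc[row, col] avoids the chained assignment df[col].iloc[0] = ...)
    df.iloc[0, 2] = 'Male Country'
    df.iloc[0, 3] = 'Male Time'
    df.iloc[0, 5] = 'Female Country'
    df.iloc[0, 6] = 'Female Time'
    # Promote row 0 to the column headers, then drop it from the data.
    df.columns = df.iloc[0]
    df = df.drop(index=0)
    df = df.set_index('Year')
    # Parse finish times; values that do not match H:M:S become missing.
    df['Male Time'] = pd.to_datetime(df['Male Time'], format='%H:%M:%S', errors='coerce').dt.time
    df['Female Time'] = pd.to_datetime(df['Female Time'], format='%H:%M:%S', errors='coerce').dt.time
    return df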
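Because the time columns are parsed with `errors='coerce'`, any value that does not match `%H:%M:%S` exactly is replaced by a missing value instead of raising an error. A small illustration on made-up input, where the second value mimics a time with a trailing annotation:

```python
import pandas as pd

# One clean time and one with a trailing annotation.
raw = pd.Series(["2:05:30", "2:01:39 WR"])

# Same call as in standardise_table: unparsable strings are coerced to NaT.
parsed = pd.to_datetime(raw, format="%H:%M:%S", errors="coerce")
missing = parsed.isna()

print(parsed.dt.time)
print(missing.tolist())
```

Checking `isna()` on the parsed column after standardising is a cheap way to spot rows whose times were silently dropped.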

In [70]:
all_elite_Tokyo=standardise_table(Table)
all_elite_Tokyo

Unnamed: 0_level_0,Men's winner,Male Country,Male Time,Women's winner,Female Country,Female Time
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2018,Dickson Chumba,Kenya,02:05:30,Birhane Dibaba,Ethiopia,02:19:51
2017,Wilson Kipsang,Kenya,02:03:58,Sarah Chepchirchir,Kenya,02:19:47
2016,Feyisa Lilesa,Ethiopia,02:06:56,Helah Kiprop,Kenya,02:21:27
2015,Endeshaw Negesse,Ethiopia,02:06:00,Birhane Dibaba,Ethiopia,02:23:15
2014,Dickson Chumba,Kenya,02:05:42,Tirfi Tsegaye,Ethiopia,02:22:23
2013,Dennis Kimetto,Kenya,02:06:50,Aberu Kebede,Ethiopia,02:25:34
2012,Michael Kipyego,Kenya,02:07:37,Atsede Habtamu,Ethiopia,02:25:28
2011,Hailu Mekonnen,Ethiopia,02:07:35,Noriko Higuchi [9],Japan,02:28:49
2010,Masakazu Fujiwara,Japan,02:12:19,Alevtina Biktimirova,Russia,02:34:39
2009,Salim Kipsang,Kenya,02:10:27,Mizuho Nasukawa,Japan,02:25:38


We can now split the table into male and female winners.

In [71]:
male_elite_Tokyo=all_elite_Tokyo[['Men\'s winner','Male Country','Male Time']]
male_elite_Tokyo=male_elite_Tokyo.rename(index=str, columns={"Men\'s winner": "Athlete", "Male Country": "Country", "Male Time":"Time"})
male_elite_Tokyo

Unnamed: 0_level_0,Athlete,Country,Time
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2018,Dickson Chumba,Kenya,02:05:30
2017,Wilson Kipsang,Kenya,02:03:58
2016,Feyisa Lilesa,Ethiopia,02:06:56
2015,Endeshaw Negesse,Ethiopia,02:06:00
2014,Dickson Chumba,Kenya,02:05:42
2013,Dennis Kimetto,Kenya,02:06:50
2012,Michael Kipyego,Kenya,02:07:37
2011,Hailu Mekonnen,Ethiopia,02:07:35
2010,Masakazu Fujiwara,Japan,02:12:19
2009,Salim Kipsang,Kenya,02:10:27


In [72]:
female_elite_Tokyo=all_elite_Tokyo[['Women\'s winner','Female Country','Female Time']]
female_elite_Tokyo=female_elite_Tokyo.rename(index=str, columns={"Women\'s winner": "Athlete", "Female Country": "Country", "Female Time":"Time"})
female_elite_Tokyo

Unnamed: 0_level_0,Athlete,Country,Time
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2018,Birhane Dibaba,Ethiopia,02:19:51
2017,Sarah Chepchirchir,Kenya,02:19:47
2016,Helah Kiprop,Kenya,02:21:27
2015,Birhane Dibaba,Ethiopia,02:23:15
2014,Tirfi Tsegaye,Ethiopia,02:22:23
2013,Aberu Kebede,Ethiopia,02:25:34
2012,Atsede Habtamu,Ethiopia,02:25:28
2011,Noriko Higuchi [9],Japan,02:28:49
2010,Alevtina Biktimirova,Russia,02:34:39
2009,Mizuho Nasukawa,Japan,02:25:38
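The select-and-rename step is repeated for every race and category, so it could be wrapped in one function. A sketch under the assumption that the combined table uses the `Male ...` / `Female ...` column names produced above. The name `split_winners` is hypothetical, the demo frame holds a single row copied from the Tokyo table, and unlike the cells above it does not convert the index with `index=str`:

```python
import pandas as pd


def split_winners(df, winner_col, prefix):
    """Return the Athlete/Country/Time columns for one category."""
    cols = {
        winner_col: "Athlete",
        f"{prefix} Country": "Country",
        f"{prefix} Time": "Time",
    }
    # Select the three source columns, then give them category-neutral names.
    return df[list(cols)].rename(columns=cols)


# Stand-in for all_elite_Tokyo with one row.
demo = pd.DataFrame(
    {
        "Men's winner": ["Dickson Chumba"],
        "Male Country": ["Kenya"],
        "Male Time": ["02:05:30"],
        "Women's winner": ["Birhane Dibaba"],
        "Female Country": ["Ethiopia"],
        "Female Time": ["02:19:51"],
    },
    index=pd.Index(["2018"], name="Year"),
)

male = split_winners(demo, "Men's winner", "Male")
female = split_winners(demo, "Women's winner", "Female")
print(male)
print(female)
```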


## Berlin Marathon

https://en.wikipedia.org/wiki/Berlin_Marathon

In [86]:
Berlin_page.summary

"The Berlin Marathon (branded BMW Berlin Marathon for sponsorship reasons) is a major running and sporting event held annually in Berlin, Germany. The official marathon distance of 42.195 kilometers (26 miles 385 yards) is set up as a citywide road race where professional athletes and amateur runners jointly participate. Initiated in 1974, the event traditionally takes place on the last weekend in September, with the exceptions of 2000, because of a conflict with the Olympic Marathon date, and 2018, held two weeks earlier due to Day of German Unity preparations.There have been several title sponsors in the race's history. From 1974 until 1989 it was just the Berlin Marathon. In 1990 it was the Yanase Berlin Marathon. In 1991 and 1992 it was the Canon Berlin Marathon. It reverted to simply the Berlin Marathon from 1993 until 1997. It then became the Alberto Berlin Marathon in 1998 and 1999. A new title sponsor changed the name to real,- Berlin Marathon from 2000-2010. Finally this has b

In [87]:
html = wp.page("Berlin Marathon").html()
Table = pd.read_html(html)[3]

In [88]:
Table.head()

Unnamed: 0,0,1,2,3,4,5,6,7
0,Edition,Date,Male winner,Country,Time (h:m:s),Female winner,Country,Time (h:m:s)
1,45,16 September 2018,Eliud Kipchoge,Kenya,2:01:39 WR,Gladys Cherono,Kenya,2:18:11
2,44,24 September 2017,Eliud Kipchoge,Kenya,2:03:32,Gladys Cherono,Kenya,2:20:23
3,43,25 September 2016,Kenenisa Bekele,Ethiopia,2:03:03,Aberu Kebede,Ethiopia,2:20:45
4,42,27 September 2015,Eliud Kipchoge,Kenya,2:04:00,Gladys Cherono,Kenya,2:19:25


Berlin is a historically fast course where athletes often attempt world records. The results table therefore marks each world-record run with a 'WR' suffix in the time column. Let's remove those first.

In [89]:
def remove_wr(string):
    # Strip a trailing 'WR' (world record) marker, e.g. '2:01:39 WR' -> '2:01:39'
    if string.endswith('WR'):
        return string[:-2].strip()
    return string
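The same clean-up can be done on the whole column at once with the pandas string accessor. This is an alternative sketch rather than the notebook's method. One practical difference is that `.str` methods pass missing values through, whereas calling `endswith` on a non-string cell would raise an error. The sample values are taken from the Berlin table above:

```python
import pandas as pd

# Header cell plus two times, one carrying the world-record marker.
times = pd.Series(["Time (h:m:s)", "2:01:39 WR", "2:03:32"])

# Remove a trailing 'WR' and any whitespace in front of it; other cells are unchanged.
cleaned = times.str.replace(r"\s*WR$", "", regex=True)
print(cleaned.tolist())
```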

In [91]:
Table[4]=Table[4].apply(remove_wr)
Table[7]=Table[7].apply(remove_wr)
Table.head()

Unnamed: 0,0,1,2,3,4,5,6,7
0,Edition,Date,Male winner,Country,Time (h:m:s),Female winner,Country,Time (h:m:s)
1,45,16 September 2018,Eliud Kipchoge,Kenya,2:01:39,Gladys Cherono,Kenya,2:18:11
2,44,24 September 2017,Eliud Kipchoge,Kenya,2:03:32,Gladys Cherono,Kenya,2:20:23
3,43,25 September 2016,Kenenisa Bekele,Ethiopia,2:03:03,Aberu Kebede,Ethiopia,2:20:45
4,42,27 September 2015,Eliud Kipchoge,Kenya,2:04:00,Gladys Cherono,Kenya,2:19:25


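Now let's standardise the table. Berlin's table has one more leading column than Tokyo's (Edition and Date instead of Year), so the helper is redefined with the column positions shifted by one.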

In [93]:
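def standardise_table(df):
    # Rename the duplicated header cells in row 0 so male and female columns are distinct.
    df.iloc[0, 3] = 'Male Country'
    df.iloc[0, 4] = 'Male Time'
    df.iloc[0, 6] = 'Female Country'
    df.iloc[0, 7] = 'Female Time'
    # Promote row 0 to the column headers, then drop it from the data.
    df.columns = df.iloc[0]
    df = df.drop(index=0)
    # Parse finish times; values that do not match H:M:S become missing.
    df['Male Time'] = pd.to_datetime(df['Male Time'], format='%H:%M:%S', errors='coerce').dt.time
    df['Female Time'] = pd.to_datetime(df['Female Time'], format='%H:%M:%S', errors='coerce').dt.time
    return df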
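The helper stores finish times as time-of-day objects, which display nicely but do not support arithmetic. If the times are later compared or subtracted, parsing them as durations may be more convenient. A sketch of that alternative using `pd.to_timedelta`, not what the notebook does, with the two 2018 Berlin times from the table above as input:

```python
import pandas as pd

# 2018 Berlin winning times (male, female) as they appear after WR removal.
raw = pd.Series(["2:01:39", "2:18:11"])

# Parse as durations; errors="coerce" turns unparsable strings into NaT.
durations = pd.to_timedelta(raw, errors="coerce")

# Durations can be subtracted directly.
gap_seconds = (durations.iloc[1] - durations.iloc[0]).total_seconds()
print(durations)
print(gap_seconds)
```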

In [94]:
all_elite_Berlin=standardise_table(Table)
all_elite_Berlin.head()

Unnamed: 0,Edition,Date,Male winner,Male Country,Male Time,Female winner,Female Country,Female Time
1,45,16 September 2018,Eliud Kipchoge,Kenya,02:01:39,Gladys Cherono,Kenya,02:18:11
2,44,24 September 2017,Eliud Kipchoge,Kenya,02:03:32,Gladys Cherono,Kenya,02:20:23
3,43,25 September 2016,Kenenisa Bekele,Ethiopia,02:03:03,Aberu Kebede,Ethiopia,02:20:45
4,42,27 September 2015,Eliud Kipchoge,Kenya,02:04:00,Gladys Cherono,Kenya,02:19:25
5,41,28 September 2014,Dennis Kimetto,Kenya,02:02:57,Tirfi Tsegaye,Ethiopia,02:20:18


As with the Chicago marathon results, we don't need the full date, just the year so let's retrieve that now.

In [92]:
def get_year(date):
    # The year is the last four characters, e.g. '16 September 2018' -> '2018'
    return date[-4:]
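Slicing the last four characters assumes every date ends with a four-digit year. Two vectorised variants are sketched below on dates from the Berlin table: a plain slice equivalent to `get_year`, and a regular-expression version that yields a missing value when the cell does not end in four digits. Neither is part of the original notebook:

```python
import pandas as pd

dates = pd.Series(["16 September 2018", "24 September 2017", "n/a"])

# Equivalent to applying get_year to each row.
years_sliced = dates.str[-4:]

# Stricter: only accept four digits at the end of the string, else NaN.
years_checked = dates.str.extract(r"(\d{4})$", expand=False)

print(years_sliced.tolist())
print(years_checked.tolist())
```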

In [95]:
all_elite_Berlin['Year']=all_elite_Berlin['Date'].apply(get_year)
all_elite_Berlin.head()

Unnamed: 0,Edition,Date,Male winner,Male Country,Male Time,Female winner,Female Country,Female Time,Year
1,45,16 September 2018,Eliud Kipchoge,Kenya,02:01:39,Gladys Cherono,Kenya,02:18:11,2018
2,44,24 September 2017,Eliud Kipchoge,Kenya,02:03:32,Gladys Cherono,Kenya,02:20:23,2017
3,43,25 September 2016,Kenenisa Bekele,Ethiopia,02:03:03,Aberu Kebede,Ethiopia,02:20:45,2016
4,42,27 September 2015,Eliud Kipchoge,Kenya,02:04:00,Gladys Cherono,Kenya,02:19:25,2015
5,41,28 September 2014,Dennis Kimetto,Kenya,02:02:57,Tirfi Tsegaye,Ethiopia,02:20:18,2014


Now let's set the index to the year and drop the Date column, which is no longer needed.

In [96]:
def set_index(df):
    df=df.set_index('Year')
    df=df.drop(['Date'],axis=1)
    return(df)

In [97]:
all_elite_Berlin=set_index(all_elite_Berlin)
all_elite_Berlin.head()

Unnamed: 0_level_0,Edition,Male winner,Male Country,Male Time,Female winner,Female Country,Female Time
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2018,45,Eliud Kipchoge,Kenya,02:01:39,Gladys Cherono,Kenya,02:18:11
2017,44,Eliud Kipchoge,Kenya,02:03:32,Gladys Cherono,Kenya,02:20:23
2016,43,Kenenisa Bekele,Ethiopia,02:03:03,Aberu Kebede,Ethiopia,02:20:45
2015,42,Eliud Kipchoge,Kenya,02:04:00,Gladys Cherono,Kenya,02:19:25
2014,41,Dennis Kimetto,Kenya,02:02:57,Tirfi Tsegaye,Ethiopia,02:20:18


Next, let's split the table into male and female winners.

In [99]:
male_elite_Berlin=all_elite_Berlin[['Male winner','Male Country','Male Time']]
male_elite_Berlin=male_elite_Berlin.rename(index=str, columns={"Male winner": "Athlete", "Male Country": "Country", "Male Time":"Time"})
male_elite_Berlin.head()

Unnamed: 0_level_0,Athlete,Country,Time
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2018,Eliud Kipchoge,Kenya,02:01:39
2017,Eliud Kipchoge,Kenya,02:03:32
2016,Kenenisa Bekele,Ethiopia,02:03:03
2015,Eliud Kipchoge,Kenya,02:04:00
2014,Dennis Kimetto,Kenya,02:02:57


In [100]:
female_elite_Berlin=all_elite_Berlin[['Female winner','Female Country','Female Time']]
female_elite_Berlin=female_elite_Berlin.rename(index=str, columns={"Female winner": "Athlete", "Female Country": "Country", "Female Time":"Time"})
female_elite_Berlin.head()

Unnamed: 0_level_0,Athlete,Country,Time
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2018,Gladys Cherono,Kenya,02:18:11
2017,Gladys Cherono,Kenya,02:20:23
2016,Aberu Kebede,Ethiopia,02:20:45
2015,Gladys Cherono,Kenya,02:19:25
2014,Tirfi Tsegaye,Ethiopia,02:20:18


Finally, let's write these processed results tables to file.

In [101]:
elite=['London','Boston','Chicago','New_York','Tokyo','Berlin']
wheel=['London','Boston','Chicago','New_York']

In [143]:
male_elite_London.to_csv("../data/Male_Elite_London.csv")
female_elite_London.to_csv("../data/Female_Elite_London.csv")

male_elite_Boston.to_csv("../data/Male_Elite_Boston.csv")
female_elite_Boston.to_csv("../data/Female_Elite_Boston.csv")

male_elite_Chicago.to_csv("../data/Male_Elite_Chicago.csv")
female_elite_Chicago.to_csv("../data/Female_Elite_Chicago.csv")

male_elite_New_York.to_csv("../data/Male_Elite_New_York.csv")
female_elite_New_York.to_csv("../data/Female_Elite_New_York.csv")

male_elite_Tokyo.to_csv("../data/Male_Elite_Tokyo.csv")
female_elite_Tokyo.to_csv("../data/Female_Elite_Tokyo.csv")

male_elite_Berlin.to_csv("../data/Male_Elite_Berlin.csv")
female_elite_Berlin.to_csv("../data/Female_Elite_Berlin.csv")

In [154]:
male_wheel_London.to_csv("../data/Male_Wheelchair_London.csv")
female_wheel_London.to_csv("../data/Female_Wheelchair_London.csv")

male_wheel_Boston.to_csv("../data/Male_Wheelchair_Boston.csv")
female_wheel_Boston.to_csv("../data/Female_Wheelchair_Boston.csv")

male_wheel_Chicago.to_csv("../data/Male_Wheelchair_Chicago.csv")
female_wheel_Chicago.to_csv("../data/Female_Wheelchair_Chicago.csv")

male_wheel_New_York.to_csv("../data/Male_Wheelchair_New_York.csv")
female_wheel_New_York.to_csv("../data/Female_Wheelchair_New_York.csv")
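The repeated `to_csv` calls could be driven by a loop over a dictionary that maps file names to tables. A minimal sketch, assuming the tables are gathered into a dict first. The function name `write_tables` is hypothetical, the demo frame stands in for the real results, and a temporary directory is used here in place of `../data`:

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_tables(tables, out_dir):
    """Write each DataFrame to '<out_dir>/<name>.csv' and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, df in tables.items():
        path = out_dir / f"{name}.csv"
        df.to_csv(path)
        paths.append(path)
    return paths


# Stand-in data; in the notebook the dict would hold male_elite_London and so on.
demo = pd.DataFrame(
    {"Athlete": ["Eliud Kipchoge"], "Country": ["Kenya"]},
    index=pd.Index(["2018"], name="Year"),
)

with tempfile.TemporaryDirectory() as tmp:
    written = write_tables({"Male_Elite_Berlin": demo}, tmp)
    names = [p.name for p in written]
    # Read the file back to confirm the round trip.
    round_trip = pd.read_csv(written[0], index_col="Year")

print(names)
print(round_trip)
```

In the notebook the call might look like `write_tables({"Male_Elite_London": male_elite_London, ...}, "../data")`, with one entry per table.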

To Do 
- Fix WR times not pulling through for Chicago
- Fix Marcel Hug's 2017 NY winning time typo