# World Marathon Majors - Winners

The World Marathon Majors consist of six major city marathons (https://en.wikipedia.org/wiki/World_Marathon_Majors).

Lists of all historical winners can be found on each race's Wikipedia page:

    - Tokyo (https://en.wikipedia.org/wiki/Tokyo_Marathon)
    - Boston (https://en.wikipedia.org/wiki/List_of_winners_of_the_Boston_Marathon)
    - London (https://en.wikipedia.org/wiki/List_of_winners_of_the_London_Marathon)
    - Berlin (https://en.wikipedia.org/wiki/Berlin_Marathon)
    - Chicago (https://en.wikipedia.org/wiki/List_of_winners_of_the_Chicago_Marathon)
    - New York (https://en.wikipedia.org/wiki/List_of_winners_of_the_New_York_City_Marathon)
    
Using the Wikipedia API, let's see if we can collect and compile a list of winners for each race, covering the men's and women's elite races as well as the wheelchair races, and process them into an easy-to-read form.

In [32]:
import pandas as pd
import wikipedia as wp

### Find each page and make sure it exists

In [35]:
# Fetch each race's page by its exact Wikipedia title
Tokyo_page=wp.page('Tokyo Marathon')
Boston_page=wp.page('List of winners of the Boston Marathon')
London_page=wp.page('List of winners of the London Marathon')
Berlin_page=wp.page('Berlin Marathon')
Chicago_page=wp.page('List of winners of the Chicago Marathon')
New_York_page=wp.page('List of winners of the New York City Marathon')

# Collect the pages so they can be checked in a loop
pages=[Tokyo_page,Boston_page,London_page,Berlin_page,Chicago_page,New_York_page]

for page in pages:
    print(page.title)

Tokyo Marathon
List of winners of the Boston Marathon
List of winners of the London Marathon
Berlin Marathon
List of winners of the Chicago Marathon
List of winners of the New York City Marathon
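Printing the titles gives a quick visual check. A slightly stricter test compares each requested title with the title of the page that actually came back, since wp.page() follows redirects and auto-suggests titles by default and can therefore land on a different article than intended. The sketch below assumes only that each page object has a .title attribute; find_title_mismatches is a hypothetical helper, and the SimpleNamespace objects are offline stand-ins for real wp.page() results (the second one deliberately simulates a wrong match).

```python
from types import SimpleNamespace


def find_title_mismatches(requested_titles, pages):
    """Return (requested, returned) pairs where the fetched title differs."""
    if len(requested_titles) != len(pages):
        raise ValueError('expected one page per requested title')
    mismatches = []
    for requested, page in zip(requested_titles, pages):
        # Case-insensitive comparison tolerates capitalisation differences
        if requested.casefold() != page.title.casefold():
            mismatches.append((requested, page.title))
    return mismatches


requested = [
    'Tokyo Marathon',
    'List of winners of the Boston Marathon',
]
# Hypothetical stand-ins for wp.page(...) results
pages = [
    SimpleNamespace(title='Tokyo Marathon'),
    SimpleNamespace(title='Boston Marathon'),
]

print(find_title_mismatches(requested, pages))
# → [('List of winners of the Boston Marathon', 'Boston Marathon')]
```

In the notebook, requested would be the six titles passed to wp.page() and pages the list built above; an empty result means every page resolved to the article that was asked for.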


For each page, we'll use the pandas read_html() function, which returns every table on a page as a list of DataFrames.

First, we need to identify which of those tables we want by manually inspecting the Wikipedia page for each race.

## The London Marathon

https://en.wikipedia.org/wiki/List_of_winners_of_the_London_Marathon

In [36]:
London_page.summary

"The London Marathon, one of the six World Marathon Majors, has been contested by men and women annually since 29 March 1981. Set over a largely flat course around the River Thames, the marathon is 26.2 miles (42.2 km) in length and generally regarded as a competitive and unpredictable event, and conducive to fast times.The inaugural marathon had 7,741 entrants, 6,255 of whom completed the race. The first Men's Elite Race was tied between American Dick Beardsley and Norwegian Inge Simonsen, who crossed the finish line holding hands in 2 hours, 11 minutes, 48 seconds. The first Women's Elite Race was won by Briton Joyce Smith in 2:29:57. In 1983, the first wheelchair races took place. Organized by the British Sports Association for the Disabled (BASD), 19 people competed and 17 finished. Gordon Perry of the United Kingdom won the Men's Wheelchair Race, coming in at 3:20:07, and Denise Smith, also of the UK, won the Women's Wheelchair Race in 4:29:03.Twenty athletes representing the Unit

By manually inspecting this page, we see that the results we're looking for are the first four tables on the page: the men's elite, women's elite, men's wheelchair and women's wheelchair races.
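Table positions can shift whenever the Wikipedia page is edited, so hard-coding indices 0 to 3 is somewhat fragile. One alternative, sketched here as an assumption rather than the approach used in this notebook, is to keep only the tables whose headings include both Year and Athlete. The hypothetical find_results_tables helper checks the column labels and the first row, because these tables can come back with their headings in row 0; the small hand-built DataFrames stand in for the list that pd.read_html() returns.

```python
import pandas as pd


def find_results_tables(tables, required=('Year', 'Athlete')):
    """Return the positions of tables whose headings include every required label."""
    matches = []
    for i, table in enumerate(tables):
        # Headings may be the column labels or, if not detected, the first data row
        labels = {str(c) for c in table.columns}
        if len(table) > 0:
            labels |= {str(v) for v in table.iloc[0]}
        if all(label in labels for label in required):
            matches.append(i)
    return matches


tables = [
    # Results table with its headings stuck in row 0
    pd.DataFrame([['Year', 'Athlete', 'Nationality'],
                  ['1981', 'Joyce Smith', 'United Kingdom']]),
    # A non-results table, e.g. general race information
    pd.DataFrame({'Edition': [1], 'Date': ['29 March 1981']}),
    # Results table whose headings were detected as column labels
    pd.DataFrame({'Year': ['1983'], 'Athlete': ['Gordon Perry']}),
]
print(find_results_tables(tables))
# → [0, 2]
```

The trade-off is that a header-based search relies on Wikipedia keeping the same column names, so it is worth checking the matched positions against a manual inspection at least once.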

In [101]:
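from io import StringIO

# Reuse the page fetched above and parse all of its tables in a single pass
London_tables = pd.read_html(StringIO(London_page.html()))
# Tables 0-3: men's elite, women's elite, men's wheelchair, women's wheelchair
male_elite_London = London_tables[0]
female_elite_London = London_tables[1]
male_wheel_London = London_tables[2]
female_wheel_London = London_tables[3]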

In [102]:
male_elite_London.head()

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | Year | Athlete | Nationality | Time(h:m:s) | Notes |
| 1 | 1981 | Dick Beardsley (Tie) | United States | 2:11:48 | Course record |
| 2 | Inge Simonsen (Tie) | Norway | | | |
| 3 | 1982 | Hugh Jones | United Kingdom | 2:09:24 | Course record |
| 4 | 1983 | Mike Gratton | United Kingdom | 2:09:43 | |


### Cleaning up

The London Marathon is famous for a tie in the men's race at its first edition in 1981. In the first table taken from the Wikipedia page, the tie spills over two rows (the second runner's row is shifted one column to the left), so let's clean that up first. Since it is the only instance, we'll fix it manually.
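Fixing the tie by hand is reasonable for a single row. If other tables had the same layout, a more general rule could fold any row whose first column is not a four-digit year into the winner row above it. A minimal sketch of that idea, assuming the raw layout shown in the output above (headings in row 0, columns labelled 0 to 4, and the tied runner's row shifted left so the athlete sits in column 0 and the nationality in column 1); merge_tie_rows is a hypothetical helper, not part of pandas.

```python
import pandas as pd

TIE = ' (Tie)'


def merge_tie_rows(df):
    """Fold tie continuation rows into the winner row above them.

    Assumes columns 0..4 hold Year, Athlete, Nationality, Time and Notes,
    row 0 holds those headings, and a tied runner's row is shifted left
    (athlete in column 0, nationality in column 1).
    """
    df = df.copy()  # leave the caller's DataFrame untouched
    is_year = df[0].astype(str).str.fullmatch(r'\d{4}')
    keep = []
    for idx in df.index:
        if is_year.loc[idx] or not keep:
            keep.append(idx)  # heading row or a normal winner row
            continue
        prev = keep[-1]  # winner row this continuation belongs to
        first = str(df.at[prev, 1]).removesuffix(TIE)
        second = str(df.at[idx, 0]).removesuffix(TIE)
        df.at[prev, 1] = f'{first} and {second}{TIE}'
        df.at[prev, 2] = f'{df.at[prev, 2]} and {df.at[idx, 1]}'
    return df.loc[keep].reset_index(drop=True)


raw = pd.DataFrame([
    ['Year', 'Athlete', 'Nationality', 'Time(h:m:s)', 'Notes'],
    ['1981', 'Dick Beardsley (Tie)', 'United States', '2:11:48', 'Course record'],
    ['Inge Simonsen (Tie)', 'Norway', None, None, None],
    ['1982', 'Hugh Jones', 'United Kingdom', '2:09:24', 'Course record'],
])
print(merge_tie_rows(raw).iloc[1].tolist())
# → ['1981', 'Dick Beardsley and Inge Simonsen (Tie)', 'United States and Norway', '2:11:48', 'Course record']
```

The rule only holds while Wikipedia keeps this shifted-row layout; a table that repeats the year on the second row would need a different check.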

In [103]:
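# Merge the two tied athletes into row 1; iloc[row, col] writes directly
# to the frame instead of relying on chained indexing
male_elite_London.iloc[1, 1] = 'Dick Beardsley and Inge Simonsen (Tie)'
male_elite_London.iloc[1, 2] = 'United States and Norway'
# Drop the now-redundant continuation row and renumber the index
male_elite_London = male_elite_London.drop(index=2).reset_index(drop=True)
male_elite_London.head()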

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | Year | Athlete | Nationality | Time(h:m:s) | Notes |
| 1 | 1981 | Dick Beardsley and Inge Simonsen (Tie) | United States and Norway | 2:11:48 | Course record |
| 2 | 1982 | Hugh Jones | United Kingdom | 2:09:24 | Course record |
| 3 | 1983 | Mike Gratton | United Kingdom | 2:09:43 | |
| 4 | 1984 | Charlie Spedding | United Kingdom | 2:09:57 | |


Now that the initial cleaning is done, let's standardise these tables: use the first row as the column headings and make the year the index.

In [90]:
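def standardise_table(df):
    """Promote the first row to column headings and index the table by year."""
    df = df.copy()                    # leave the caller's DataFrame untouched
    df.columns = df.iloc[0].tolist()  # row 0 holds the real headings
    df = df.iloc[1:]                  # drop the heading row from the data
    df = df.set_index('Year')         # set_index returns a new frame, so keep the result
    return df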

In [109]:
male_elite_London=standardise_table(male_elite_London)
female_elite_London=standardise_table(female_elite_London)
male_wheel_London=standardise_table(male_wheel_London)
female_wheel_London=standardise_table(female_wheel_London)

male_elite_London.head()

| Year | Athlete | Nationality | Time(h:m:s) | Notes |
|---|---|---|---|---|
| 1981 | Dick Beardsley and Inge Simonsen (Tie) | United States and Norway | 2:11:48 | Course record |
| 1982 | Hugh Jones | United Kingdom | 2:09:24 | Course record |
| 1983 | Mike Gratton | United Kingdom | 2:09:43 | |
| 1984 | Charlie Spedding | United Kingdom | 2:09:57 | |
| 1985 | Steve Jones | United Kingdom | 2:08:16 | Course record |
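After standardising, the years and winning times are still stored as text. To sort or compare results across races, it may help to convert them to proper types. A sketch assuming the heading names shown above (Year as the index, Time(h:m:s) as a column) and that every time is written as h:mm:ss; add_typed_columns is a hypothetical helper, and the small frame reuses London rows from the output above.

```python
import pandas as pd


def add_typed_columns(df, time_col='Time(h:m:s)'):
    """Return a copy with an integer year index and a Timedelta 'Time' column."""
    df = df.copy()
    df.index = df.index.astype(int)             # '1982' -> 1982
    df['Time'] = pd.to_timedelta(df[time_col])  # '2:09:24' -> 0 days 02:09:24
    return df


london = pd.DataFrame(
    {
        'Athlete': ['Hugh Jones', 'Mike Gratton', 'Steve Jones'],
        'Time(h:m:s)': ['2:09:24', '2:09:43', '2:08:16'],
    },
    index=pd.Index(['1982', '1983', '1985'], name='Year'),
)
typed = add_typed_columns(london)
fastest = typed['Time'].idxmin()  # year of the quickest winning time
print(fastest, typed.loc[fastest, 'Athlete'])
# → 1985 Steve Jones
```

Keeping the original text column alongside the new Time column preserves the values exactly as Wikipedia shows them, while the Timedelta column supports sorting and arithmetic.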
