## Exercise 1: Exceptional Olympians

Scrape data about exceptional Olympic medalists from [this Wikipedia page](https://en.wikipedia.org/wiki/List_of_multiple_Olympic_medalists).

1. Download the HTML using urllib.
2. Parse the HTML with BeautifulSoup.
3. Extract the HTML that corresponds to the big table from the soup.
4. Parse the table into a pandas DataFrame. Hint: both the "No." and the "Total" columns use row spans, which are tricky to parse, both with a pandas reader and manually. For the purpose of this exercise, exclude all rows that are not easy to parse (the first one is Bjørn Dæhlie).
5. Create a table that shows, for each country, how many gold, silver, bronze, and total medals it won in that list.

In [3]:
from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

Download the HTML using urllib and parse it with BeautifulSoup.

In [4]:
url = "https://en.wikipedia.org/wiki/List_of_multiple_Olympic_medalists"

req = urllib.request.Request(url)
with urllib.request.urlopen(req) as response:
    html = response.read()

print(html)

class_soup = BeautifulSoup(html, 'html.parser')

b'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>List of multiple Olympic medalists - Wikipedia</title>\n<script>document.documentElement.className = document.documentElement.className.replace( /(^|\\s)client-nojs(\\s|$)/, "$1client-js$2" );</script>\n<script>(window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"List_of_multiple_Olympic_medalists","wgTitle":"List of multiple Olympic medalists","wgCurRevisionId":825763339,"wgRevisionId":825763339,"wgArticleId":18855244,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Pages using citations with format and no URL","Articles with hCards","Incomplete lists from May 2012","Lists of Olympic medalists","Olympic Games medal tables"],"wgBreakFrames":false,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSepara
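Some servers may reject requests that do not carry a descriptive User-Agent header, and `response.read()` returns bytes rather than text. The following is a minimal sketch of one way to handle both, not the notebook's own method. The helper names `build_request` and `fetch_html` and the `USER_AGENT` string are hypothetical, and the UTF-8 decoding relies on the `<meta charset="UTF-8"/>` tag visible in the output above.

```python
import urllib.request

URL = "https://en.wikipedia.org/wiki/List_of_multiple_Olympic_medalists"
# Hypothetical identifier; replace it with one that describes your own script.
USER_AGENT = "olympic-medalists-exercise/0.1 (course exercise)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download a page and decode the bytes to str (UTF-8 assumed)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network, so it is safe to inspect.
request = build_request(URL)
print(request.get_header("User-agent"))  # urllib stores the key capitalized this way
```

Calling `fetch_html(URL)` would perform the actual download; the returned string can be passed to `BeautifulSoup` exactly like the bytes object used in the cell above.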

Extract the HTML that corresponds to the big table from the soup.

In [5]:
# we can retrieve all tables; our desired table is the first one:
table_html = class_soup("table")[0]
table_html

<table class="wikitable sortable">
<tr>
<th>No.</th>
<th style="width:7.8em;">Athlete</th>
<th style="width:8.2em;">Nation</th>
<th style="width:5.6em;">Sport</th>
<th>Years</th>
<th>Games</th>
<th>Gender</th>
<th style="background-color:gold; width:3.5em; font-weight:bold;">Gold</th>
<th style="background-color:silver; width:3.5em; font-weight:bold;">Silver</th>
<th style="background-color:#cc9966; width:3.5em; font-weight:bold;">Bronze</th>
<th style="width:3.5em;">Total</th>
</tr>
<tr>
<td>1</td>
<td align="left"><span class="sortkey">Phelps, Michael</span><span class="vcard"><span class="fn"><a href="/wiki/Michael_Phelps" title="Michael Phelps">Michael Phelps</a></span></span></td>
<td align="left"><img alt="" class="thumbborder" data-file-height="650" data-file-width="1235" height="12" src="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/22px-Flag_of_the_United_States.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_St
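Picking the table by position works here, but it breaks if another table is ever added above it. As a hedged alternative, the table could be selected by its CSS class, which the output above shows to be `wikitable sortable`. The sketch below runs on a small hand-written HTML snippet (the `infobox` table and the variable names are made up for illustration), not on the live page.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup; only the class names mirror the real page.
sample_html = """
<table class="infobox"><tr><td>not the one we want</td></tr></table>
<table class="wikitable sortable">
<tr><th>No.</th><th>Athlete</th></tr>
<tr><td>1</td><td>Michael Phelps</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# class_ matches any single class of a multi-valued class attribute,
# so "wikitable" also finds class="wikitable sortable".
medal_table = sample_soup.find("table", class_="wikitable")

header_cells = [th.get_text(strip=True) for th in medal_table.find_all("th")]
print(header_cells)
```

On the real soup, the equivalent call would be `class_soup.find("table", class_="wikitable")`, assuming the medalist table is still the first table with that class.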

Parse the table into a pandas dataframe. 

In [8]:
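from io import StringIO

# read_html returns a list of DataFrames; wrapping the markup in StringIO
# avoids the deprecation warning newer pandas versions raise for raw HTML strings
athletes = pd.read_html(StringIO(str(table_html)), header=0)[0]
athletes.head(15)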

| | No. | Athlete | Nation | Sport | Years | Games | Gender | Gold | Silver | Bronze | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Phelps, MichaelMichael Phelps | United States | Swimming | 2004–2016 | Summer | M | 23 | 3 | 2.0 | 28.0 |
| 1 | 2 | Latynina, LarisaLarisa Latynina | Soviet Union | Gymnastics | 1956–1964 | Summer | F | 9 | 5 | 4.0 | 18.0 |
| 2 | 3 | Andrianov, NikolaiNikolai Andrianov | Soviet Union | Gymnastics | 1972–1980 | Summer | M | 7 | 5 | 3.0 | 15.0 |
| 3 | 4 | Bjorndalen, Ole EinarOle Einar Bjørndalen | Norway | Biathlon | 1998–2014 | Winter | M | 8 | 4 | 1.0 | 13.0 |
| 4 | 5 | Shakhlin, BorisBoris Shakhlin | Soviet Union | Gymnastics | 1956–1964 | Summer | M | 7 | 4 | 2.0 | NaN |
| 5 | 6 | Mangiarotti, EdoardoEdoardo Mangiarotti | Italy | Fencing | 1936–1960 | Summer | M | 6 | 5 | 2.0 | NaN |
| 6 | 7 | Ono, TakashiTakashi Ono | Japan | Gymnastics | 1952–1964 | Summer | M | 5 | 4 | 4.0 | NaN |
| 7 | 8 | Nurmi, PaavoPaavo Nurmi | Finland | Athletics | 1920–1928 | Summer | M | 9 | 3 | 0.0 | 12.0 |
| 8 | 9 | Fischer, BirgitBirgit Fischer | East Germany Germany | Canoeing | 1980–2004 | Summer | F | 8 | 4 | 0.0 | NaN |
| 9 | Dahlie, BjornBjørn Dæhlie | Norway | Cross-country skiing | 1992–1998 | Winter | M | 8 | 4 | 0 | NaN | NaN |


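In the misparsed rows every value is shifted one column to the left, so the Bronze column ends up as NaN, which we can use as a filter. Rows where only Total is missing are kept, since the total is recomputed later: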

In [9]:
athletes = athletes[pd.notnull(athletes["Bronze"])]
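Dropping the shifted rows loses athletes such as Bjørn Dæhlie. As an alternative to filtering, the sketch below shows one way row spans could be expanded by hand, so that every row ends up with the same number of cells. It is not part of the exercise solution: it uses the standard library's `html.parser` instead of BeautifulSoup, the class name `RowspanTableParser` is hypothetical, the sample markup is hand-written to imitate the pattern seen above (shared "No." and "Total" cells), and `colspan` is not handled.

```python
from html.parser import HTMLParser


class RowspanTableParser(HTMLParser):
    """Collect table rows, repeating cells that carry a rowspan attribute."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read
        self._span = 1      # rowspan of the cell currently being read
        self._pending = {}  # column index -> (text, rows still to fill)

    def _fill_pending(self):
        # Copy carried-over cells into the current row at their column.
        while len(self._row) in self._pending:
            col = len(self._row)
            text, left = self._pending.pop(col)
            self._row.append(text)
            if left > 1:
                self._pending[col] = (text, left - 1)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._fill_pending()
            self._cell = []
            self._span = int(dict(attrs).get("rowspan", 1))

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell).strip()
            if self._span > 1:
                # Remember the cell for the following rows.
                self._pending[len(self._row)] = (text, self._span - 1)
            self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._fill_pending()  # carried-over cells at the end of the row
            self.rows.append(self._row)
            self._row = None


# Hand-written sample; the rowspan attributes are an assumption about the page.
sample_html = """
<table>
<tr><th>No.</th><th>Athlete</th><th>Gold</th><th>Silver</th><th>Bronze</th><th>Total</th></tr>
<tr><td>8</td><td>Paavo Nurmi</td><td>9</td><td>3</td><td>0</td><td rowspan="3">12</td></tr>
<tr><td rowspan="2">9</td><td>Birgit Fischer</td><td>8</td><td>4</td><td>0</td></tr>
<tr><td>Bjørn Dæhlie</td><td>8</td><td>4</td><td>0</td></tr>
</table>
"""

parser = RowspanTableParser()
parser.feed(sample_html)
for row in parser.rows:
    print(row)
```

With the spans expanded, `parser.rows[0]` can serve as the header and the remaining rows as data, for example via `pd.DataFrame(parser.rows[1:], columns=parser.rows[0])`.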

Subset to the relevant columns: 

In [10]:
athletes = athletes[["Nation", "Gold", "Silver", "Bronze"]]
athletes.head(10)

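| | Nation | Gold | Silver | Bronze |
|---|---|---|---|---|
| 0 | United States | 23 | 3 | 2.0 |
| 1 | Soviet Union | 9 | 5 | 4.0 |
| 2 | Soviet Union | 7 | 5 | 3.0 |
| 3 | Norway | 8 | 4 | 1.0 |
| 4 | Soviet Union | 7 | 4 | 2.0 |
| 5 | Italy | 6 | 5 | 2.0 |
| 6 | Japan | 5 | 4 | 4.0 |
| 7 | Finland | 9 | 3 | 0.0 |
| 8 | East Germany Germany | 8 | 4 | 0.0 |
| 10 | Japan | 8 | 3 | 1.0 |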


Grouping, summing, calculating the total, and sorting: 

In [11]:
countries = athletes.groupby("Nation").sum()

In [12]:
countries["Total"] = countries["Gold"] + countries["Silver"] + countries["Bronze"]
countries.sort_values("Total", ascending=False)

| Nation | Gold | Silver | Bronze | Total |
|---|---|---|---|---|
| United States | 90 | 36 | 25.0 | 151.0 |
| Soviet Union | 41 | 36 | 18.0 | 95.0 |
| Italy | 23 | 19 | 14.0 | 56.0 |
| Japan | 29 | 12 | 10.0 | 51.0 |
| Finland | 25 | 12 | 13.0 | 50.0 |
| Hungary | 25 | 14 | 11.0 | 50.0 |
| Sweden | 20 | 15 | 6.0 | 41.0 |
| Norway | 22 | 13 | 5.0 | 40.0 |
| Germany | 11 | 10 | 8.0 | 29.0 |
| Netherlands | 8 | 9 | 2.0 | 19.0 |
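The grouping step can be checked by hand on a tiny frame. The sketch below uses the first four rows of the athlete table as sample data and adds one optional touch that the cells above do not have: it casts the sums back to integers, since Bronze only became a float column because of the NaN values that were filtered out. The variable names are illustrative.

```python
import pandas as pd

# First four rows of the athlete table, typed in by hand.
sample = pd.DataFrame({
    "Nation": ["United States", "Soviet Union", "Soviet Union", "Norway"],
    "Gold": [23, 9, 7, 8],
    "Silver": [3, 5, 5, 4],
    "Bronze": [2.0, 4.0, 3.0, 1.0],
})

medal_cols = ["Gold", "Silver", "Bronze"]

# Sum the medals per nation and cast back to int (no NaN values remain).
totals = sample.groupby("Nation")[medal_cols].sum().astype(int)

# Row-wise sum over the three medal columns.
totals["Total"] = totals[medal_cols].sum(axis=1)

totals = totals.sort_values("Total", ascending=False)
print(totals)
```

Here the Soviet Union comes first with 16 + 10 + 7 = 33 medals, ahead of the United States with 28 and Norway with 13.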


## Exercise 2: APIs

Use the [Open Notify API](http://open-notify.org/Open-Notify-API/People-In-Space/) to find out how many people are in space right now.

In [2]:
import requests 

url = "http://api.open-notify.org/astros.json"

r = requests.get(url)
data = r.json()
data

{'message': 'success',
 'number': 6,
 'people': [{'craft': 'ISS', 'name': 'Alexander Misurkin'},
  {'craft': 'ISS', 'name': 'Mark Vande Hei'},
  {'craft': 'ISS', 'name': 'Joe Acaba'},
  {'craft': 'ISS', 'name': 'Anton Shkaplerov'},
  {'craft': 'ISS', 'name': 'Scott Tingle'},
  {'craft': 'ISS', 'name': 'Norishige Kanai'}]}
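The answer to the question is stored under the `number` key of the response. As a small sketch, the snippet below extracts it from a copy of the response shown above (so it runs without a network connection) and also counts people per spacecraft. The helper name `count_people` is hypothetical, and a live call would of course return different names and numbers.

```python
from collections import Counter

# Response copied from the output above; a live request returns current data.
data = {
    "message": "success",
    "number": 6,
    "people": [
        {"craft": "ISS", "name": "Alexander Misurkin"},
        {"craft": "ISS", "name": "Mark Vande Hei"},
        {"craft": "ISS", "name": "Joe Acaba"},
        {"craft": "ISS", "name": "Anton Shkaplerov"},
        {"craft": "ISS", "name": "Scott Tingle"},
        {"craft": "ISS", "name": "Norishige Kanai"},
    ],
}


def count_people(payload):
    """Return the number of people in space reported by the payload."""
    if payload.get("message") != "success":
        raise ValueError("unexpected API response")
    return payload["number"]


people_in_space = count_people(data)
per_craft = Counter(person["craft"] for person in data["people"])

print(people_in_space)
print(dict(per_craft))
```

With a live response, `count_people(r.json())` would give the current count; calling `r.raise_for_status()` first would surface HTTP errors before the JSON is parsed.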