# BW \#77 Paris Olympics
## Data and six questions
We'll look at some data from the 2024 Olympic Games in Paris, France.
- The main data will come from Codante (https://codante.io/), a Brazilian company that provides a free API with Olympic information. Because it's free, it's limited to 100 requests per minute, which should be more than enough. The API is documented at https://docs.apis.codante.io/olympic-games-english .
- Geographic data about the Olympic venues come from Clockwork Micro (https://www.clockworkmicro.com/), which made shapefiles available in a GitHub repository at https://github.com/clockworkmicro/parisolympics2024 .

We'll also use the pycountry package on PyPI (https://pypi.org/project/pycountry/).

## Challenges
The learning goals include retrieving data from APIs, working with data from different sources, grouping, applying functions to a data frame, and some work with `GeoPandas`.
- Using the API from apis.codante.io, download all of the per-country medal information. As of this writing, the country API has a total of five pages to download; you'll want to combine them into a single data frame. Set the index to be the 3-letter country ID.
- What countries don't seem to have any continent? What's the deal with them?


```python
import pandas as pd
import requests  # To perform API calls
import pycountry
```

An API (application programming interface) is a way for two or more computer programs or components to communicate with each other.

The easiest way to retrieve Olympics data via the API is with `requests`, since I can just call `requests.get(URL)` for a given URL. For this API, though, we'll need to retrieve five pages of data; according to the API documentation, each page is requested by passing a `page` parameter with an integer value.

We can do that with `requests` by passing not only the URL, but also `{'page': 1}`, a dict containing the name-value pairs we want to add to the request's query string. The integer associated with `'page'` will have to change, taking the values 1 through 5, as we retrieve each page of results.
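To see what such a dict turns into, here is a small sketch using the standard library's `urllib.parse.urlencode`, which builds the same kind of query string that `requests` appends to the URL when we pass `{'page': ...}`. It's only an illustration of the URLs involved and doesn't contact the API; the name `page_urls` is hypothetical.

```python
from urllib.parse import urlencode

url_base = 'https://apis.codante.io/olympic-games'

# {'page': n} is encoded as "page=n" and appended after a "?"
page_urls = [f"{url_base}/countries?{urlencode({'page': n})}" for n in range(1, 6)]

for url in page_urls:
    print(url)
```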

The results themselves come back as JSON. Fortunately, we can turn most JSON data into a data frame by passing it to `pd.DataFrame`. We'll thus end up with one data frame for each page. If we collect those data frames in a list, we can then combine them into a single one with `pd.concat`.
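Here's a sketch of that JSON-to-data-frame step on two made-up pages of records (the ids and values are hypothetical; only the "list of dicts" shape is meant to mirror the API). It also shows why the combined frame needs a new index: each page's frame numbers its rows starting at 0.

```python
import pandas as pd

# Two hypothetical pages of JSON records, each a list of dicts
page_1 = [
    {'id': 'AAA', 'name': 'Country A', 'gold_medals': 3},
    {'id': 'BBB', 'name': 'Country B', 'gold_medals': 1},
]
page_2 = [
    {'id': 'CCC', 'name': 'Country C', 'gold_medals': 0},
]

# Each list of dicts becomes a data frame: one row per dict, one column per key
frames = [pd.DataFrame(page) for page in [page_1, page_2]]

# concat stacks the frames vertically; each brings its own 0-based index,
# so the combined index contains duplicates
combined = pd.concat(frames)
print(combined.index.tolist())   # [0, 1, 0]

# Replacing the index with the 'id' column gives each row a unique label
combined = combined.set_index('id')
print(combined)
```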

Let's start by setting up a base URL and an empty list, all_data, where we'll collect the data frames:

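```python
url_base = 'https://apis.codante.io/olympic-games'
all_data = []   # one data frame per page of results

for page_number in range(1, 6):
    print(f'Getting page {page_number}')
    # The params dict is sent as the query string, e.g. .../countries?page=1
    r = requests.get(f'{url_base}/countries', params={'page': page_number})
    # The records we want sit under the 'data' key of the JSON response
    all_data.append(pd.DataFrame(r.json()['data']))
```

```
Getting page 1
Getting page 2
Getting page 3
Getting page 4
Getting page 5
```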


Notice that when we get a response back from `requests`, we can invoke its `json` method to get Python data structures (lists and dicts). I originally tried to invoke `pd.DataFrame` directly on the result of `r.json()`, but saw that the actual data was under the `'data'` key. So I ran `pd.DataFrame(r.json()['data'])`, giving me a data frame, which I then appended to `all_data`.
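Since the records sit under `'data'`, the loop could also avoid hardcoding five pages and simply stop when a page comes back empty. This is a sketch under an assumption I haven't verified against the Codante docs: that requesting a page past the last one returns an empty `'data'` list. The function `fetch_all_pages` and its `get_json` argument are hypothetical; passing in a callable keeps the logic testable offline.

```python
import pandas as pd

def fetch_all_pages(get_json, max_pages=100):
    """Combine the 'data' records from successive pages into one data frame.

    get_json is a hypothetical callable that takes a page number and returns
    that page's parsed JSON (a dict). Assumes the records sit under 'data' and
    that a page past the last one comes back with an empty 'data' list.
    """
    frames = []
    for page_number in range(1, max_pages + 1):   # max_pages guards against looping forever
        records = get_json(page_number)['data']
        if not records:              # assumed end of results
            break
        frames.append(pd.DataFrame(records))
    if not frames:                   # nothing retrieved at all
        return pd.DataFrame()
    return pd.concat(frames)

# Usage with a fake, offline stand-in for the API: two pages, then empty ones
fake_pages = {
    1: {'data': [{'id': 'AAA', 'gold_medals': 2}]},
    2: {'data': [{'id': 'BBB', 'gold_medals': 0}]},
}
result = fetch_all_pages(lambda n: fake_pages.get(n, {'data': []}))
print(result.set_index('id'))
```

Against the real API, `get_json` could be something like `lambda n: requests.get(f'{url_base}/countries', params={'page': n}).json()`, provided the empty-page assumption holds.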

```python
df = pd.concat(all_data).set_index('id')
df
```

| id | name | continent | flag_url | gold_medals | silver_medals | bronze_medals | total_medals | rank | rank_total_medals |
|---|---|---|---|---|---|---|---|---|---|
| CHN | China | ASI | https://codante.s3.amazonaws.com/codante-apis/... | 12 | 7 | 7 | 26 | 1 | 3 |
| USA | EUA | AME | https://codante.s3.amazonaws.com/codante-apis/... | 9 | 16 | 14 | 39 | 2 | 1 |
| FRA | França | EUR | https://codante.s3.amazonaws.com/codante-apis/... | 8 | 11 | 9 | 28 | 3 | 2 |
| GBR | Grã-Bretanha | EUR | https://codante.s3.amazonaws.com/codante-apis/... | 8 | 8 | 8 | 24 | 4 | 4 |
| AUS | Austrália | OCE | https://codante.s3.amazonaws.com/codante-apis/... | 8 | 6 | 5 | 19 | 5 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ZAM | Zâmbia | AFR | https://codante.s3.amazonaws.com/codante-apis/... | 0 | 0 | 0 | 0 | 0 | 0 |
| ZIM | Zimbábue | AFR | https://codante.s3.amazonaws.com/codante-apis/... | 0 | 0 | 0 | 0 | 0 | 0 |
| EOR | EOR | | https://codante.s3.amazonaws.com/codante-apis/... | 0 | 0 | 0 | 0 | 0 | 0 |
| AIN | AIN | | https://codante.s3.amazonaws.com/codante-apis/... | 0 | 0 | 0 | 0 | 0 | 0 |


### What countries don't seem to have any continent? What's the deal with them?

```python
df['continent'].unique()
```

```
array(['ASI', 'AME', 'EUR', 'OCE', 'AFR', '', '-'], dtype=object)
```

```python
df[(df['continent'] == '') | (df['continent'] == '-')]
```

| id | name | continent | flag_url | gold_medals | silver_medals | bronze_medals | total_medals | rank | rank_total_medals |
|---|---|---|---|---|---|---|---|---|---|
| EOR | EOR | | https://codante.s3.amazonaws.com/codante-apis/... | 0 | 0 | 0 | 0 | 0 | 0 |
| AIN | AIN | | https://codante.s3.amazonaws.com/codante-apis/... | 0 | 0 | 0 | 0 | 0 | 0 |
| SAM | Samoa | - | https://codante.s3.amazonaws.com/codante-apis/... | 0 | 0 | 0 | 0 | 0 | 0 |
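Two details of that filter are worth spelling out, sketched here on a small hypothetical frame named `sample`: each comparison needs its own parentheses, because `|` binds more tightly than `==` in Python, and `isin` expresses the same "either of these values" condition more compactly.

```python
import pandas as pd

# A small hypothetical frame with the same kinds of placeholder values
sample = pd.DataFrame(
    {'name': ['China', 'EOR', 'AIN', 'Samoa'],
     'continent': ['ASI', '', '', '-']},
    index=['CHN', 'EOR', 'AIN', 'SAM'],
)

# Parentheses are required: without them, Python would group
# '' | sample['continent'] first, which isn't what we want
mask_or = (sample['continent'] == '') | (sample['continent'] == '-')

# isin builds the same boolean series from a list of values
mask_isin = sample['continent'].isin(['', '-'])

print(mask_or.equals(mask_isin))   # True
print(sample[mask_isin])
```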


Correction

While exploring this data set, I decided to see how many countries are on each continent:



```python
df['continent'].value_counts()
```

```
AFR    53
EUR    47
ASI    44
AME    41
OCE    15
        2
-       1
Name: continent, dtype: int64
```
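The blank row (2) and the `-` row (1) in that output are easy to miss. One option, sketched on a small hypothetical series, is to turn both placeholders into `NaN` so pandas treats them as missing; `value_counts` then skips them by default, and `dropna=False` brings them back as a single `NaN` row.

```python
import pandas as pd

# Hypothetical continent codes, including the two placeholder values
sample = pd.Series(['AFR', 'EUR', 'AFR', '', '', '-'], name='continent')

# mask() replaces values where the condition is True with NaN
cleaned = sample.mask(sample.isin(['', '-']))

print(cleaned.value_counts())               # placeholders no longer counted
print(cleaned.value_counts(dropna=False))   # missing values shown as one NaN row
```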

```python
df.loc[lambda df_: df_['continent'].isin(['', '-']), ['continent', 'name']]
```

| id | continent | name |
|---|---|---|
| EOR | | EOR |
| AIN | | AIN |
| SAM | - | Samoa |


In the above code, I use `loc` to retrieve a subset of rows and columns.

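For the row selector, I used `lambda` to create an anonymous function that takes a single argument, a data frame. Inside the function, we run `isin(['', '-'])` on that data frame's `continent` column, getting back a boolean series that is `True` wherever the continent is either an empty string or `-`; `loc` then keeps only those rows. Passing a function in this way is often more natural and flexible than building the boolean series separately, because the function is applied to whatever data frame `loc` is called on, even an unnamed intermediate result in a method chain.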
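The advantage shows up in a method chain. In this sketch (hypothetical data; `missing_continent` is a made-up column name), the column the lambda filters on exists only in the intermediate frame produced by `assign`, which has no variable name we could refer to.

```python
import pandas as pd

sample = pd.DataFrame(
    {'name': ['China', 'EOR', 'Samoa'],
     'continent': ['ASI', '', '-'],
     'gold_medals': [12, 0, 0]},
    index=['CHN', 'EOR', 'SAM'],
)

result = (
    sample
    # Add a temporary boolean column to the intermediate frame
    .assign(missing_continent=lambda df_: df_['continent'].isin(['', '-']))
    # The lambda receives that intermediate frame, so it can see the new column
    .loc[lambda df_: df_['missing_continent'], ['continent', 'name']]
)
print(result)
```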

For the column selector, I pass a list of strings, the names of the columns we want to see.
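Passing a list matters even for a single column: a list of labels gives back a data frame, while a bare string gives back a series. A quick check on a hypothetical frame:

```python
import pandas as pd

sample = pd.DataFrame(
    {'name': ['EOR', 'Samoa'], 'continent': ['', '-']},
    index=['EOR', 'SAM'],
)

as_frame = sample.loc[:, ['name']]   # list of labels -> DataFrame
as_series = sample.loc[:, 'name']    # single label -> Series

print(type(as_frame).__name__, type(as_series).__name__)   # DataFrame Series
```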

We thus see that three of the teams competing in the Olympics have no continent. Which are they?

- Samoa, a country that I think would classify itself as being in Oceania, much like Australia and New Zealand. I'm guessing (hoping) that the data I got was just a glitch.
- EOR, a team of Olympic athletes who are refugees. This team has participated in Olympic games since 2016, and was previously known as ROT, for "Refugee Olympic Team." France insisted that the team be known by its French initials this time around, and the acronym for "Équipe olympique des réfugiés" is EOR.
- AIN, a team of Russian and Belarusian athletes, since their national teams were banned in the wake of Russia's invasion of Ukraine. In English, they were known as INA ("Individual Neutral Athletes"), but this year they are known by AIN, the French initials of "Athlètes Individuels Neutres."
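If we accept the guess that Samoa's `-` is a glitch, patching it is a single label-based assignment with `loc`. This sketch uses a small hypothetical excerpt named `sample`; in the notebook, the same assignment would apply to `df`. Treating Samoa as Oceania is an assumption, and EOR and AIN are left alone because they genuinely have no continent.

```python
import pandas as pd

# Hypothetical excerpt mirroring the rows discussed above
sample = pd.DataFrame(
    {'name': ['Australia', 'Samoa', 'EOR', 'AIN'],
     'continent': ['OCE', '-', '', '']},
    index=['AUS', 'SAM', 'EOR', 'AIN'],
)

# Assumption: Samoa's '-' is a data glitch, so assign it to Oceania by label
sample.loc['SAM', 'continent'] = 'OCE'

print(sample['continent'].value_counts())
```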