![](https://upload.wikimedia.org/wikipedia/commons/thumb/7/77/IOC_Logo.svg/2341px-IOC_Logo.svg.png)

# Introduction

Welcome to your first day as an analyst working for the IOC! The IOC is at the very heart of world sport, supporting every Olympic Movement stakeholder, promoting Olympism worldwide, and overseeing the regular celebration of the Olympic Games.

For a moment of glory on the medalist podium, elite athletes dedicate *everything* to their sport. Olympics medalists from 1896 through 2016 comprise the dataset you'll be working with. Who are the youngest and oldest medalists of all time? Are there physical differences between Summer Olympics medalists and Winter Olympics medalists? You're about to use your data coding chops to find out!

You'll start this Milestone assignment by cleaning and filtering the data. Many of the Python skills you've learned so far will come into play. Are you up for it? Let's go!

### Dataset Description

The dataset is stored in a .csv file named `olympics.csv`. It contains the following columns:

* **ID**: A unique identifying number of each athlete
* **Name**: The name of each athlete
* **Sex**: M or F
* **Age**: The age of an athlete, in years, at the time they competed.
* **Height**: The height of an athlete, in centimeters
* **Weight**: The weight of an athlete, in kilograms
* **Team**: The name of the athlete’s team. Not always the name of a country.
* **NOC**: National Olympic Committee’s three-letter code
* **Games**: Year and season, e.g. 1900 Summer
* **Year**: The year the Games took place
* **Season**: Summer or Winter
* **City**: Host city
* **Sport**: The sport or category of an Olympic event/activity
* **Event**: Specific event within a sport, e.g. Men’s 400 meters breaststroke.
* **Medal**: Gold, Silver, Bronze
* **region**: Name of the athlete’s country (the column name is lowercase in the file)



# Task 1: Data Inspection

![](https://media.giphy.com/media/42wQXwITfQbDGKqUP7/giphy.gif)

In [1]:
# import the pandas library
import pandas as pd

In [2]:
# Load in the data
olympics = pd.read_csv('datasets/olympics.csv')

In [3]:
# Preview DataFrame
olympics.head()

Unnamed: 0,ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal,region
0,4,Edgar Lindenau Aabye,M,34.0,,,Denmark/Sweden,DEN,1900 Summer,1900,Summer,Paris,Tug-Of-War,Tug-Of-War Men's Tug-Of-War,Gold,Denmark
1,15,Arvo Ossian Aaltonen,M,30.0,,,Finland,FIN,1920 Summer,1920,Summer,Antwerpen,Swimming,Swimming Men's 200 metres Breaststroke,Bronze,Finland
2,15,Arvo Ossian Aaltonen,M,30.0,,,Finland,FIN,1920 Summer,1920,Summer,Antwerpen,Swimming,Swimming Men's 400 metres Breaststroke,Bronze,Finland
3,16,Juhamatti Tapio Aaltonen,M,28.0,184.0,85.0,Finland,FIN,2014 Winter,2014,Winter,Sochi,Ice Hockey,Ice Hockey Men's Ice Hockey,Bronze,Finland
4,17,Paavo Johannes Aaltonen,M,28.0,175.0,64.0,Finland,FIN,1948 Summer,1948,Summer,London,Gymnastics,Gymnastics Men's Individual All-Around,Bronze,Finland


In [4]:
# Inspect the numbers of rows and columns
olympics.shape

(39783, 16)

In [16]:
# Inspect column names
olympics.columns

Index(['ID', 'Name', 'Sex', 'Age', 'Height', 'Weight', 'Team', 'NOC', 'Games',
       'Year', 'Season', 'City', 'Sport', 'Event', 'Medal', 'region'],
      dtype='object')

In [17]:
# Inspect column data types, memory usage, etc.
olympics.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39783 entries, 0 to 39782
Data columns (total 16 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   ID      39783 non-null  int64  
 1   Name    39783 non-null  object 
 2   Sex     39783 non-null  object 
 3   Age     39051 non-null  float64
 4   Height  31072 non-null  float64
 5   Weight  30456 non-null  float64
 6   Team    39783 non-null  object 
 7   NOC     39783 non-null  object 
 8   Games   39783 non-null  object 
 9   Year    39783 non-null  int64  
 10  Season  39783 non-null  object 
 11  City    39783 non-null  object 
 12  Sport   39783 non-null  object 
 13  Event   39783 non-null  object 
 14  Medal   39783 non-null  object 
 15  region  39774 non-null  object 
dtypes: float64(3), int64(2), object(11)
memory usage: 4.9+ MB
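
The `Non-Null Count` column above implies missing values: for `Age`, 39783 − 39051 = 732 rows have no value, and `Height`, `Weight`, and `region` have gaps too. A minimal sketch of counting missing values directly with `isna().sum()`, using the five preview rows from `olympics.head()` as a stand-in for the full file (the `preview` name is an assumption made for illustration):

```python
import pandas as pd

nan = float("nan")

# Five rows copied from the olympics.head() preview; a stand-in for the full file
preview = pd.DataFrame({
    "Name": [
        "Edgar Lindenau Aabye",
        "Arvo Ossian Aaltonen",
        "Arvo Ossian Aaltonen",
        "Juhamatti Tapio Aaltonen",
        "Paavo Johannes Aaltonen",
    ],
    "Age": [34.0, 30.0, 30.0, 28.0, 28.0],
    "Height": [nan, nan, nan, 184.0, 175.0],
    "Weight": [nan, nan, nan, 85.0, 64.0],
})

# isna() marks each missing cell as True; sum() counts the Trues per column
missing_counts = preview.isna().sum()
print(missing_counts)
```

On the real DataFrame the same call would be `olympics.isna().sum()`.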


In [18]:
# Display a statistical summary of the data
olympics.describe()

Unnamed: 0,ID,Age,Height,Weight,Year
count,39783.0,39051.0,31072.0,30456.0,39783.0
mean,69407.051806,25.925175,177.554197,73.77068,1973.943845
std,38849.980737,5.914026,10.893723,15.016025,33.822857
min,4.0,10.0,136.0,28.0,1896.0
25%,36494.0,22.0,170.0,63.0,1952.0
50%,68990.0,25.0,178.0,73.0,1984.0
75%,103461.5,29.0,185.0,83.0,2002.0
max,135563.0,73.0,223.0,182.0,2016.0


In [22]:
# What types of medals are there?
olympics['Medal'].unique()
# There are three: Gold, Bronze, and Silver

array(['Gold', 'Bronze', 'Silver'], dtype=object)

# Task 2: Data Cleaning

![](https://media.giphy.com/media/10zsjaH4g0GgmY/giphy.gif)

In [25]:
# Rename 'NOC' column to 'Country Code'
# Rename 'region' column to 'Country'
cols_rename = {
    'NOC' : 'Country Code',
    'region' : 'Country'
}
renamed = olympics.rename(columns=cols_rename)
renamed.head()

Unnamed: 0,ID,Name,Sex,Age,Height,Weight,Team,Country Code,Games,Year,Season,City,Sport,Event,Medal,Country
0,4,Edgar Lindenau Aabye,M,34.0,,,Denmark/Sweden,DEN,1900 Summer,1900,Summer,Paris,Tug-Of-War,Tug-Of-War Men's Tug-Of-War,Gold,Denmark
1,15,Arvo Ossian Aaltonen,M,30.0,,,Finland,FIN,1920 Summer,1920,Summer,Antwerpen,Swimming,Swimming Men's 200 metres Breaststroke,Bronze,Finland
2,15,Arvo Ossian Aaltonen,M,30.0,,,Finland,FIN,1920 Summer,1920,Summer,Antwerpen,Swimming,Swimming Men's 400 metres Breaststroke,Bronze,Finland
3,16,Juhamatti Tapio Aaltonen,M,28.0,184.0,85.0,Finland,FIN,2014 Winter,2014,Winter,Sochi,Ice Hockey,Ice Hockey Men's Ice Hockey,Bronze,Finland
4,17,Paavo Johannes Aaltonen,M,28.0,175.0,64.0,Finland,FIN,1948 Summer,1948,Summer,London,Gymnastics,Gymnastics Men's Individual All-Around,Bronze,Finland


In [46]:
# Remove the 'Team' column
no_team = renamed.drop(columns=['Team'])
no_team.head()

Unnamed: 0,ID,Name,Sex,Age,Height,Weight,Country Code,Games,Year,Season,City,Sport,Event,Medal,Country
0,4,Edgar Lindenau Aabye,M,34.0,,,DEN,1900 Summer,1900,Summer,Paris,Tug-Of-War,Tug-Of-War Men's Tug-Of-War,Gold,Denmark
1,15,Arvo Ossian Aaltonen,M,30.0,,,FIN,1920 Summer,1920,Summer,Antwerpen,Swimming,Swimming Men's 200 metres Breaststroke,Bronze,Finland
2,15,Arvo Ossian Aaltonen,M,30.0,,,FIN,1920 Summer,1920,Summer,Antwerpen,Swimming,Swimming Men's 400 metres Breaststroke,Bronze,Finland
3,16,Juhamatti Tapio Aaltonen,M,28.0,184.0,85.0,FIN,2014 Winter,2014,Winter,Sochi,Ice Hockey,Ice Hockey Men's Ice Hockey,Bronze,Finland
4,17,Paavo Johannes Aaltonen,M,28.0,175.0,64.0,FIN,1948 Summer,1948,Summer,London,Gymnastics,Gymnastics Men's Individual All-Around,Bronze,Finland
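
The two cleaning steps above can also be written as one chained expression, which avoids keeping an intermediate variable around. A minimal sketch, assuming a two-row stand-in built from the preview (the `preview` and `cleaned` names are illustrative, not part of the assignment):

```python
import pandas as pd

# Two rows copied from the preview, reduced to the columns that matter here
preview = pd.DataFrame({
    "Name": ["Edgar Lindenau Aabye", "Arvo Ossian Aaltonen"],
    "Team": ["Denmark/Sweden", "Finland"],
    "NOC": ["DEN", "FIN"],
    "region": ["Denmark", "Finland"],
})

# rename() and drop() each return a new DataFrame, so they can be chained
cleaned = (
    preview
    .rename(columns={"NOC": "Country Code", "region": "Country"})
    .drop(columns=["Team"])
)

print(cleaned.columns.tolist())
```

Neither method modifies `preview` in place, so the original frame still has its `Team`, `NOC`, and `region` columns afterwards.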


# Task 3: Data Analysis

![](https://media.giphy.com/media/MT5UUV1d4CXE2A37Dg/giphy.gif)

In [33]:
# What is the youngest age of an Olympics medalist?
olympics.sort_values(by='Age', ascending=True).head(10)
# The youngest Olympic medalist was 10 years old. Wow!

Unnamed: 0,ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal,region
20753,71691,Dimitrios Loundras,M,10.0,,,Ethnikos Gymnastikos Syllogos,GRE,1896 Summer,1896,Summer,Athina,Gymnastics,"Gymnastics Men's Parallel Bars, Teams",Bronze,Greece
11021,40129,Luigina Giavotti,F,11.0,,,Italy,ITA,1928 Summer,1928,Summer,Amsterdam,Gymnastics,Gymnastics Women's Team All-Around,Silver,Italy
36402,125092,tienne Nol Henri Vandernotte,M,12.0,,37.0,France,FRA,1936 Summer,1936,Summer,Berlin,Rowing,Rowing Men's Coxed Pairs,Bronze,France
36403,125092,tienne Nol Henri Vandernotte,M,12.0,,37.0,France,FRA,1936 Summer,1936,Summer,Berlin,Rowing,Rowing Men's Coxed Fours,Bronze,France
36617,125944,Ines Vercesi,F,12.0,,,Italy,ITA,1928 Summer,1928,Summer,Amsterdam,Gymnastics,Gymnastics Women's Team All-Around,Silver,Italy
27872,96664,Dorothy Poynton-Hill (-Teuber),F,12.0,,,United States,USA,1928 Summer,1928,Summer,Amsterdam,Diving,Diving Women's Springboard,Silver,USA
32803,113580,Inge Srensen (-Tabur),F,12.0,,,Denmark,DEN,1936 Summer,1936,Summer,Berlin,Swimming,Swimming Women's 200 metres Breaststroke,Bronze,Denmark
21656,74712,Carla Marangoni,F,12.0,,,Italy,ITA,1928 Summer,1928,Summer,Amsterdam,Gymnastics,Gymnastics Women's Team All-Around,Silver,Italy
29027,100797,Aileen Muriel Riggin (-Soule),F,13.0,142.0,,United States,USA,1920 Summer,1920,Summer,Antwerpen,Diving,Diving Women's Springboard,Gold,USA
11303,41040,Gina Elena Gogean (-Groza),F,13.0,150.0,40.0,Romania,ROU,1992 Summer,1992,Summer,Barcelona,Gymnastics,Gymnastics Women's Team All-Around,Silver,Romania


In [35]:
# What is the oldest age of an Olympics medalist?
olympics.sort_values(by='Age', ascending=False).head(10)
# The oldest Olympic medalist was 73 years old.

Unnamed: 0,ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal,region
6072,22984,John (Herbert Crawford-) Copley (Williamson-),M,73.0,,,Great Britain,GBR,1948 Summer,1948,Summer,London,Art Competitions,"Art Competitions Mixed Painting, Graphic Arts",Silver,UK
8279,30731,Jozu Dupon,M,72.0,,,Belgium,BEL,1936 Summer,1936,Summer,Berlin,Art Competitions,"Art Competitions Mixed Sculpturing, Medals",Bronze,Belgium
33952,117046,Oscar Gomer Swahn,M,72.0,,,Sweden,SWE,1920 Summer,1920,Summer,Antwerpen,Shooting,"Shooting Men's Running Target, Double Shot, Team",Silver,Sweden
21901,75648,Charles William Martin,M,71.0,,,Crabe II-1,FRA,1900 Summer,1900,Summer,Paris,Sailing,Sailing Mixed 0.5-1 Ton,Silver,France
21902,75648,Charles William Martin,M,71.0,,,Crabe II-4,FRA,1900 Summer,1900,Summer,Paris,Sailing,Sailing Mixed 0.5-1 Ton,Bronze,France
12614,45286,Letitia Marion Hamilton,F,69.0,,,Ireland,IRL,1948 Summer,1948,Summer,London,Art Competitions,"Art Competitions Mixed Painting, Paintings",Bronze,Ireland
34707,119650,Oskar Thiede,M,69.0,,,Austria,AUT,1948 Summer,1948,Summer,London,Art Competitions,"Art Competitions Mixed Sculpturing, Medals And...",Silver,Austria
21198,73120,Frederick William MacMonnies,M,68.0,,,United States,USA,1932 Summer,1932,Summer,Los Angeles,Art Competitions,"Art Competitions Mixed Sculpturing, Medals And...",Silver,USA
8340,30932,Samuel Harding Duvall,M,68.0,,,Cincinnati Archers,USA,1904 Summer,1904,Summer,St. Louis,Archery,Archery Men's Team Round,Silver,USA
25165,87135,Louis Noverraz,M,66.0,179.0,78.0,Switzerland,SUI,1968 Summer,1968,Summer,Mexico City,Sailing,Sailing Mixed 5.5 metres,Silver,Switzerland
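
Sorting the whole frame works, but the extreme ages can also be read off directly. A minimal sketch using `min`, `max`, and `idxmin` on a handful of rows copied from the outputs in this notebook (the `ages` frame is a stand-in, and one row has no recorded age to show that missing values are skipped):

```python
import pandas as pd

# Rows copied from outputs in this notebook; Dimitrios Drivas has no recorded age
ages = pd.DataFrame({
    "Name": [
        "Dimitrios Loundras",
        "Luigina Giavotti",
        "John (Herbert Crawford-) Copley (Williamson-)",
        "Oscar Gomer Swahn",
        "Dimitrios Drivas",
    ],
    "Age": [10.0, 11.0, 73.0, 72.0, float("nan")],
})

# min() and max() ignore missing values by default
youngest_age = ages["Age"].min()
oldest_age = ages["Age"].max()

# idxmin() returns the index label of the smallest age; loc fetches that row
youngest_row = ages.loc[ages["Age"].idxmin()]

print(youngest_age, oldest_age, youngest_row["Name"])
```

On the full data, `olympics['Age'].min()` and `olympics['Age'].max()` should agree with the `min` and `max` rows of the `describe()` summary.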


In [49]:
# How many of each medal were awarded?
n_medals = olympics["Medal"].value_counts()
n_medals

Gold      13372
Bronze    13295
Silver    13116
Name: Medal, dtype: int64

In [39]:
# How many events are there?
olympics["Event"].nunique()

756

In [40]:
# How many sports are there?
olympics["Sport"].nunique()

66

In [41]:
# What is the average age of an Olympics medalist?
olympics["Age"].mean()

25.925174771452717

In [44]:
# Among the 10 oldest medalists, what are the most common sports?
oldest = olympics.sort_values(by='Age', ascending=False).head(10)
oldest["Sport"].value_counts()

Art Competitions    5
Sailing             3
Shooting            1
Archery             1
Name: Sport, dtype: int64

In [107]:
# What are the 10 winningest countries in total medal count?
winningest = renamed['Country'].value_counts().head(10)
winningest

USA          5637
Russia       3947
Germany      3756
UK           2068
France       1777
Italy        1637
Sweden       1536
Canada       1352
Australia    1349
Hungary      1135
Name: Country, dtype: int64

In [108]:
# How many medals have been awarded in the sport of trampolining?
renamed[(renamed['Sport'] == 'Trampolining')].value_counts('Medal')

Medal
Bronze    10
Gold      10
Silver    10
dtype: int64
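
The question asks for a single number, so the per-medal counts still need to be added up. A small sketch that rebuilds the counts shown above as a Series and sums them (the `trampolining_counts` name is an assumption):

```python
import pandas as pd

# Per-medal counts copied from the output above
trampolining_counts = pd.Series({"Bronze": 10, "Gold": 10, "Silver": 10})

# Summing the counts gives the total number of trampolining medals
total = int(trampolining_counts.sum())
print(total)  # 30
```

On the real DataFrame, `(renamed['Sport'] == 'Trampolining').sum()` counts the matching rows in one step.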

# Level Up

![](https://media.giphy.com/media/YYaapBJ7UAZp9DJS7o/giphy.gif)

Want to Level Up your practice? We love to see it! Take a crack at some of these extra challenges, including visualizing some of this here data.

In [117]:
# How many gold medals were awarded to the United States?
golds = renamed[(renamed['Country'] == 'USA') & (renamed['Medal'] == 'Gold')].shape
golds
# The first value is the row count: 2,638 gold medals

(2638, 16)

In [128]:
# List the Olympics in the dataset, starting with the most recent
# (this sorts every medal row by Year, so each Games appears many times)
renamed1 = olympics.sort_values(by='Year', ascending=False)
renamed1

Unnamed: 0,ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal,region
36669,126096,Oleh Yuriyovych Verniaiev,M,22.0,161.0,55.0,Ukraine,UKR,2016 Summer,2016,Summer,Rio de Janeiro,Gymnastics,Gymnastics Men's Parallel Bars,Gold,Ukraine
6450,24313,Rsul unayev,M,25.0,171.0,66.0,Azerbaijan,AZE,2016 Summer,2016,Summer,Rio de Janeiro,Wrestling,"Wrestling Men's Welterweight, Greco-Roman",Bronze,Azerbaijan
26155,90631,Mariana Pajn Londoo,F,24.0,158.0,50.0,Colombia,COL,2016 Summer,2016,Summer,Rio de Janeiro,Cycling,Cycling Women's BMX,Gold,Colombia
23612,81588,Domenico Montrone,M,30.0,189.0,97.0,Italy,ITA,2016 Summer,2016,Summer,Rio de Janeiro,Rowing,Rowing Men's Coxless Fours,Bronze,Italy
24396,84527,Daniel Narcisse,M,36.0,189.0,93.0,France,FRA,2016 Summer,2016,Summer,Rio de Janeiro,Handball,Handball Men's Handball,Silver,France
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8083,30049,Dimitrios Drivas,M,,,,Greece,GRE,1896 Summer,1896,Summer,Athina,Swimming,Swimming Men's 100 metres Freestyle For Sailors,Bronze,Greece
16872,59429,Stefanos Khristopoulos,M,,,,Greece,GRE,1896 Summer,1896,Summer,Athina,Wrestling,"Wrestling Men's Unlimited Class, Greco-Roman",Bronze,Greece
3329,12929,John Mary Pius Boland,M,25.0,,,Great Britain,GBR,1896 Summer,1896,Summer,Athina,Tennis,Tennis Men's Singles,Gold,UK
3330,12929,John Mary Pius Boland,M,25.0,,,Great Britain/Germany,GBR,1896 Summer,1896,Summer,Athina,Tennis,Tennis Men's Doubles,Gold,UK
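
The sorted frame above still has one row per medal. To list each Games only once, the duplicates can be dropped before sorting. A minimal sketch, assuming a small stand-in built from `Games` and `Year` values that appear in this notebook (the `games` and `games_list` names are illustrative):

```python
import pandas as pd

# Games/Year pairs copied from rows shown in this notebook (with one repeat)
games = pd.DataFrame({
    "Games": ["1900 Summer", "1920 Summer", "1920 Summer", "2014 Winter",
              "1948 Summer", "2016 Summer", "1896 Summer"],
    "Year": [1900, 1920, 1920, 2014, 1948, 2016, 1896],
})

# Keep one row per Games, then sort from most recent to oldest
games_list = (
    games[["Year", "Games"]]
    .drop_duplicates()
    .sort_values(by=["Year", "Games"], ascending=False)["Games"]
    .tolist()
)

print(games_list)
```

Sorting on `Games` as a second key keeps the order stable for years in which both a Summer and a Winter Games were held.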


In [133]:
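# Average medalist height in the most recent Winter Olympics
# Keep Winter rows, then only those from the latest Winter year
winter = renamed1[renamed1['Season'] == 'Winter']
latest_winter = winter[winter['Year'] == winter['Year'].max()]
heights = latest_winter['Height'].mean()
heights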

In [None]:
# Average medalist weight in the most recent Winter Olympics


In [None]:
# Average medalist height in the most recent Summer Olympics


In [None]:
# Average medalist weight in the most recent Summer Olympics
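
The four height and weight questions in this group share one pattern: filter to a season, keep only the most recent year for that season, then average a column. A minimal sketch of that pattern as a helper function; `latest_games_mean` and the `sample` frame are hypothetical names, and the sample holds only a few rows copied from outputs in this notebook, so its averages are not the answers for the full dataset:

```python
import pandas as pd

# A few rows with recorded height and weight, copied from outputs in this notebook
sample = pd.DataFrame({
    "Year": [2014, 1948, 2016, 2016, 2016, 2016, 2016, 1968],
    "Season": ["Winter", "Summer", "Summer", "Summer",
               "Summer", "Summer", "Summer", "Summer"],
    "Height": [184.0, 175.0, 161.0, 171.0, 158.0, 189.0, 189.0, 179.0],
    "Weight": [85.0, 64.0, 55.0, 66.0, 50.0, 97.0, 93.0, 78.0],
})


def latest_games_mean(df, season, column):
    """Mean of `column` for rows from the most recent Games of `season`."""
    in_season = df[df["Season"] == season]
    latest_year = in_season["Year"].max()
    # mean() skips missing values by default
    return in_season.loc[in_season["Year"] == latest_year, column].mean()


for season in ["Winter", "Summer"]:
    for column in ["Height", "Weight"]:
        print(season, column, latest_games_mean(sample, season, column))
```

With the real data, the call would look like `latest_games_mean(olympics, 'Summer', 'Weight')`.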


In [None]:
# Import plotly express library


In [None]:
# Assign top 10 winningest countries table to a variable
# You did this in task 3


In [None]:
# Visualize the table as a bar chart
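
One possible way to approach the last three cells, sketched under some assumptions: the counts are copied from the Task 3 output rather than recomputed, `top10_table` and `make_bar_chart` are hypothetical names, and plotly is imported inside the function so the table code runs even where plotly is not installed.

```python
import pandas as pd

# Top 10 medal counts copied from the Task 3 output
winningest = pd.Series({
    "USA": 5637, "Russia": 3947, "Germany": 3756, "UK": 2068, "France": 1777,
    "Italy": 1637, "Sweden": 1536, "Canada": 1352, "Australia": 1349,
    "Hungary": 1135,
})

# Turn the Series into a two-column table that a plotting library can use
top10_table = winningest.rename_axis("Country").reset_index(name="Medals")


def make_bar_chart(table):
    """Build a bar chart of medals per country with plotly express."""
    import plotly.express as px  # requires the plotly package

    return px.bar(table, x="Country", y="Medals",
                  title="Top 10 countries by total medal count")


print(top10_table)
```

In a notebook, `make_bar_chart(top10_table).show()` would render the figure.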
