# Olympic medalists data

In [1]:
import pandas as pd

In [4]:
medals = pd.read_csv("all_medalists.csv")

In [9]:
medals.sample(5)

Unnamed: 0,City,Edition,Sport,Discipline,Athlete,NOC,Gender,Event,Event_gender,Medal
20098,Barcelona,1992,Baseball,Baseball,"HUANG, Wen-Po",TPE,Men,baseball,M,Silver
15305,Moscow,1980,Aquatics,Swimming,"KRUGLOVA, Elena",URS,Women,4x100m medley relay,W,Bronze
19066,Seoul,1988,Hockey,Hockey,"DE BEUS, Bernadette",NED,Women,hockey,W,Bronze
1386,London,1908,Football,Football,"LINDGREEN, August Ludwig",DEN,Men,football,M,Silver
15609,Moscow,1980,Boxing,Boxing,"RYBAKOV, Viktor",URS,Men,54 - 57kg (featherweight),M,Bronze


## Using .value_counts() for ranking
For this exercise, you will use the pandas Series method .value_counts() to determine the top 15 countries ranked by total number of medals.

By default, .value_counts() sorts its result by count in descending order. It returns a Series of counts indexed by the unique entries of the original Series, so the most frequent entry comes first.
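To make the default ordering concrete, here is a minimal sketch on a small made-up Series. The values in `noc` are hypothetical and only stand in for the real 'NOC' column.

```python
import pandas as pd

# Hypothetical toy data standing in for medals["NOC"]
noc = pd.Series(["USA", "URS", "USA", "GBR", "USA", "URS"])

# Counts are indexed by the unique entries and sorted in descending order
counts = noc.value_counts()
print(counts)

# The most frequent entry is therefore the first row
top_label, top_count = counts.index[0], counts.iloc[0]
print(top_label, top_count)

# Passing ascending=True reverses the ordering
least_common = noc.value_counts(ascending=True).index[0]
print(least_common)
```

Because the result is already ranked, `.head(n)` on it is enough to get a top-n list.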

The DataFrame medals was loaded from all_medalists.csv in the cells above.

- Extract the 'NOC' column from the DataFrame medals and assign the result to country_names. Notice that this Series has repeated entries for every medal (of any type) a country has won in any Edition of the Olympics.
- Create a Series medal_counts by applying .value_counts() to the Series country_names.
- Print the top 15 countries ranked by total number of medals won.

In [10]:
# Select the 'NOC' column of medals: country_names
country_names = medals["NOC"]

In [12]:
# Count the number of medals won by each country: medal_counts
medal_counts = country_names.value_counts()

In [13]:
print(medal_counts.head(15))

USA    4335
URS    2049
GBR    1594
FRA    1314
ITA    1228
GER    1211
AUS    1075
HUN    1053
SWE    1021
GDR     825
NED     782
JPN     704
CHN     679
RUS     638
ROU     624
Name: NOC, dtype: int64


## Using .pivot_table() to count medals by type
Rather than ranking countries by total medals won and showing that list, you may want to see a bit more detail. You can use a pivot table to compute how many separate bronze, silver and gold medals each country won. That pivot table can then be used to repeat the previous computation to rank by total medals won.

In this exercise, you will first use .pivot_table() to count the medals of each type per country. Then, you can use .sum() along the columns of the pivot table to produce a new column of per-country totals. Sorting the modified pivot table by that totals column reproduces the ranking from the last exercise with a bit more detail.

- Construct a pivot table counted from the DataFrame medals, aggregating by 'count'. Use 'NOC' as the index, 'Athlete' for the values, and 'Medal' for the columns.
- Modify the DataFrame counted by adding a column counted['totals']. The new column 'totals' should contain the result of taking the sum along the columns (i.e., use .sum(axis='columns')).
- Overwrite the DataFrame counted by sorting it by the 'totals' column with the .sort_values() method. Specify the keyword argument ascending=False.
- Print the first 15 rows of counted using .head(15).

In [14]:
# Construct the pivot table: counted
counted = medals.pivot_table(index="NOC", values="Athlete", columns="Medal", aggfunc="count")

In [15]:
counted

Medal  Bronze   Gold  Silver
NOC                         
AFG       1.0    NaN     NaN
AHO       NaN    NaN     1.0
ALG       8.0    4.0     2.0
ANZ       5.0   20.0     4.0
ARG      88.0   68.0    83.0
...       ...    ...     ...
VIE       NaN    NaN     2.0
YUG     118.0  143.0   174.0
ZAM       1.0    NaN     1.0
ZIM       1.0   18.0     4.0
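The missing entries and floating-point counts in the output above come from country and medal combinations that never occur in the data. The sketch below reproduces this on a tiny made-up DataFrame; the names `toy`, `toy_counted`, `toy_totals`, and `filled` are hypothetical, and the data is invented for illustration.

```python
import pandas as pd

# Hypothetical toy data with the same three columns used by the pivot table
toy = pd.DataFrame({
    "NOC": ["AAA", "AAA", "AAA", "BBB"],
    "Athlete": ["a1", "a2", "a3", "b1"],
    "Medal": ["Gold", "Gold", "Bronze", "Silver"],
})

# Combinations with no rows (e.g. BBB / Gold) show up as NaN
toy_counted = toy.pivot_table(index="NOC", values="Athlete",
                              columns="Medal", aggfunc="count")
print(toy_counted)

# .sum() skips NaN by default, so row totals are still well defined
toy_totals = toy_counted.sum(axis="columns")
print(toy_totals)

# fill_value=0 replaces the missing combinations with zeros instead
filled = toy.pivot_table(index="NOC", values="Athlete",
                         columns="Medal", aggfunc="count", fill_value=0)
print(filled)
```

Whether to keep NaN or fill with 0 is a presentation choice; the row totals are the same either way.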


In [17]:
# Create the new column: counted['totals']
counted['totals'] = counted.sum(axis="columns")

In [21]:
# Sort counted by the 'totals' column
counted = counted.sort_values("totals", ascending=False)

In [22]:
#counted.to_excel("pivot_medals.xlsx")

In [23]:
# Print the top 15 rows of counted
print(counted.head(15))

Medal  Bronze    Gold  Silver  totals
NOC                                  
USA    1052.0  2088.0  1195.0  4335.0
URS     584.0   838.0   627.0  2049.0
GBR     505.0   498.0   591.0  1594.0
FRA     475.0   378.0   461.0  1314.0
ITA     374.0   460.0   394.0  1228.0
GER     454.0   407.0   350.0  1211.0
AUS     413.0   293.0   369.0  1075.0
HUN     345.0   400.0   308.0  1053.0
SWE     325.0   347.0   349.0  1021.0
GDR     225.0   329.0   271.0   825.0
NED     320.0   212.0   250.0   782.0
JPN     270.0   206.0   228.0   704.0
CHN     193.0   234.0   252.0   679.0
RUS     240.0   192.0   206.0   638.0
ROU     282.0   155.0   187.0   624.0
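The 'totals' column above agrees with the earlier .value_counts() ranking (for example, USA has 4335 in both). A minimal sketch of how such a cross-check could be written is shown below on made-up data; it assumes no missing values in the 'NOC', 'Athlete', or 'Medal' columns, since 'count' only counts non-missing entries. All names and values in the block are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; assumed to contain no missing values
toy = pd.DataFrame({
    "NOC": ["AAA", "AAA", "AAA", "BBB", "BBB", "CCC"],
    "Athlete": ["a1", "a2", "a3", "b1", "b2", "c1"],
    "Medal": ["Gold", "Gold", "Bronze", "Silver", "Gold", "Bronze"],
})

# Route 1: count rows per country directly
by_value_counts = toy["NOC"].value_counts().sort_index()

# Route 2: count per medal type, then sum across the medal columns
pivot = toy.pivot_table(index="NOC", values="Athlete",
                        columns="Medal", aggfunc="count")
by_pivot = pivot.sum(axis="columns").sort_index()

# Both routes should give the same total for every country
same_labels = list(by_value_counts.index) == list(by_pivot.index)
same_totals = bool((by_value_counts.values == by_pivot.values).all())
print(same_labels, same_totals)
```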


## Understanding the column labels