## pivot_table

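Pandas provides the `pivot_table` function for reshaping data while aggregating it. Rows that share the same index and column values are combined with an aggregation function (`aggfunc`, which defaults to the mean), so it also works with counts of non-numeric columns.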
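To see what the aggregation step adds, here is a minimal sketch on made-up data (the `sales` frame and its columns are hypothetical). `pivot` refuses this input because the ("North", "Q1") pair appears twice, while `pivot_table` combines the duplicates using `aggfunc`.

```python
import pandas as pd

# Hypothetical toy data: the (region, quarter) pair ("North", "Q1") occurs twice
sales = pd.DataFrame({
    "region": ["North", "North", "North", "South"],
    "quarter": ["Q1", "Q1", "Q2", "Q1"],
    "amount": [10, 5, 7, 3],
})

# pivot_table aggregates duplicate pairs; here they are summed.
# Combinations with no data (South, Q2) become NaN.
table = pd.pivot_table(sales, index="region", columns="quarter",
                       values="amount", aggfunc="sum")
print(table)

# pivot, by contrast, raises on duplicate index/column pairs
try:
    sales.pivot(index="region", columns="quarter", values="amount")
except ValueError as err:
    print("pivot failed:", err)
```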

In [10]:
import pandas as pd
import numpy as np

In [13]:
# Load a dataset of Summer Olympics medal winners
oly = pd.read_csv('./data/summerOlympics/data.csv')
oly.head()

|   | City | Edition | Sport | Discipline | Athlete | NOC | Gender | Event | Event_gender | Medal |
|---|------|---------|-------|------------|---------|-----|--------|-------|--------------|-------|
| 0 | Athens | 1896 | Aquatics | Swimming | HAJOS, Alfred | HUN | Men | 100m freestyle | M | Gold |
| 1 | Athens | 1896 | Aquatics | Swimming | HERSCHMANN, Otto | AUT | Men | 100m freestyle | M | Silver |
| 2 | Athens | 1896 | Aquatics | Swimming | DRIVAS, Dimitrios | GRE | Men | 100m freestyle for sailors | M | Bronze |
| 3 | Athens | 1896 | Aquatics | Swimming | MALOKINIS, Ioannis | GRE | Men | 100m freestyle for sailors | M | Gold |
| 4 | Athens | 1896 | Aquatics | Swimming | CHASAPIS, Spiridon | GRE | Men | 100m freestyle for sailors | M | Silver |


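We can use `pivot_table` to build a medal table: one row per country (`NOC`), one column per medal type, and each cell counting the matching rows (`aggfunc='count'`).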

In [18]:
# Count athletes per country (rows) and medal type (columns); missing combinations become 0
medals = pd.pivot_table(oly, index=['NOC'], values='Athlete', columns=['Medal'], aggfunc='count').fillna(0)
medals.head()

| NOC | Bronze | Gold | Silver |
|-----|-------:|-----:|-------:|
| AFG | 1 | 0 | 0 |
| AHO | 0 | 0 | 1 |
| ALG | 8 | 4 | 2 |
| ANZ | 5 | 20 | 4 |
| ARG | 88 | 68 | 83 |
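Counting rows per (NOC, Medal) pair is a frequency table, which is what `pd.crosstab` computes, so the two calls below should agree. This is a sketch on a hypothetical five-row `sample`, not the Olympics file. It also uses `pivot_table`'s `fill_value=0` to fill missing combinations during the pivot instead of calling `fillna(0)` afterwards.

```python
import pandas as pd

# Hypothetical miniature version of the medal data
sample = pd.DataFrame({
    "NOC": ["GRE", "GRE", "HUN", "AUT", "GRE"],
    "Athlete": ["A", "B", "C", "D", "E"],
    "Medal": ["Bronze", "Gold", "Gold", "Silver", "Silver"],
})

# Count athletes per country and medal; missing pairs become 0
via_pivot = pd.pivot_table(sample, index="NOC", columns="Medal",
                           values="Athlete", aggfunc="count", fill_value=0)

# crosstab builds the same frequency table directly from two columns
via_crosstab = pd.crosstab(sample["NOC"], sample["Medal"])

print(via_pivot)
# Compare values only, since the dtypes may differ between the two approaches
print((via_pivot.to_numpy() == via_crosstab.to_numpy()).all())
```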


In [25]:
# Add a Total column: sum each country's medal counts across the columns of its row
medals['Total'] = medals.sum(axis='columns')
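One caveat: `sum(axis='columns')` adds up every column that is present. Re-running the cell, or running it after another total column has been added, folds the old total into the new one and doubles the tally. A minimal sketch of a re-runnable variant that sums only the three medal columns, using the AFG and ALG rows from the table above:

```python
import pandas as pd

# Two rows copied from the medal table above
medals = pd.DataFrame(
    {"Bronze": [1, 8], "Gold": [0, 4], "Silver": [0, 2]},
    index=pd.Index(["AFG", "ALG"], name="NOC"),
)

medal_cols = ["Bronze", "Gold", "Silver"]

# Summing only the medal columns gives the same result however often it runs
for _ in range(2):
    medals["Total"] = medals[medal_cols].sum(axis="columns")

# An unrestricted sum would count the existing Total column a second time
naive = medals.sum(axis="columns")

print(medals)
print(naive)
```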

In [26]:
# Next, sort by the total medal tally and show the top 10 countries
medals.sort_values(by=['Total'], ascending=False).head(10)

| NOC | Bronze | Gold | Silver | Total |
|-----|-------:|-----:|-------:|------:|
| USA | 1052 | 2088 | 1195 | 4335 |
| URS | 584 | 838 | 627 | 2049 |
| GBR | 505 | 498 | 591 | 1594 |
| FRA | 475 | 378 | 461 | 1314 |
| ITA | 374 | 460 | 394 | 1228 |
| GER | 454 | 407 | 350 | 1211 |
| AUS | 413 | 293 | 369 | 1075 |
| HUN | 345 | 400 | 308 | 1053 |
| SWE | 325 | 347 | 349 | 1021 |
| GDR | 225 | 329 | 271 | 825 |
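`DataFrame.nlargest(n, columns)` returns the same top-n rows as sorting in descending order and taking `head(n)`, without the explicit sort step. A sketch on three rows whose Bronze, Gold and Silver values are copied from the table above, with Total computed as their sum:

```python
import pandas as pd

# Three rows from the medal table above (Total = Bronze + Gold + Silver)
medals = pd.DataFrame(
    {"Bronze": [475, 1052, 505], "Gold": [378, 2088, 498],
     "Silver": [461, 1195, 591], "Total": [1314, 4335, 1594]},
    index=pd.Index(["FRA", "USA", "GBR"], name="NOC"),
)

# Same rows, in the same order, as medals.sort_values(by="Total", ascending=False).head(2)
top = medals.nlargest(2, "Total")
print(top)
```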


In [27]:
# Sorting on a different column gives the top 10 countries by gold medals
medals.sort_values(by=['Gold'], ascending=False).head(10)

| NOC | Bronze | Gold | Silver | Total |
|-----|-------:|-----:|-------:|------:|
| USA | 1052 | 2088 | 1195 | 4335 |
| URS | 584 | 838 | 627 | 2049 |
| GBR | 505 | 498 | 591 | 1594 |
| ITA | 374 | 460 | 394 | 1228 |
| GER | 454 | 407 | 350 | 1211 |
| HUN | 345 | 400 | 308 | 1053 |
| FRA | 475 | 378 | 461 | 1314 |
| SWE | 325 | 347 | 349 | 1021 |
| GDR | 225 | 329 | 271 | 825 |
| AUS | 413 | 293 | 369 | 1075 |
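Sorting on `Gold` alone leaves ties in an arbitrary-looking order. Medal tables are commonly ranked by gold, then silver, then bronze, and `sort_values` accepts a list of keys for that. A sketch with hypothetical counts that include a deliberate tie on gold:

```python
import pandas as pd

# Hypothetical counts: A and B tie on gold, so silver decides between them
medals = pd.DataFrame(
    {"Gold": [5, 5, 7], "Silver": [2, 4, 1], "Bronze": [9, 0, 3]},
    index=["A", "B", "C"],
)

# Rank by gold, then silver, then bronze (all descending)
ranked = medals.sort_values(by=["Gold", "Silver", "Bronze"], ascending=False)
print(ranked)
```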


Next, let's find which of three countries (USA, URS and CHN) won the most medals at each Olympics held between 1975 and 1995.

In [32]:
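# Count medals per edition (rows) for every country (columns)
medals_won_by_country = pd.pivot_table(oly, index=['Edition'], values='Athlete', columns=['NOC'], aggfunc='count')

# Keep the editions from 1975 to 1995 and the three countries of interest
medals_filtered_by_country = medals_won_by_country.loc[1975:1995, ['USA', 'URS', 'CHN']]

# For each edition, the country (column) with the highest medal count
most_medals = medals_filtered_by_country.idxmax(axis='columns')

print("Most medals per Olympics")
most_medals

Most medals per Olympics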


Edition
1976    URS
1980    URS
1984    USA
1988    URS
1992    USA
dtype: object
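`idxmax` only names the winning column. Pairing it with `max` also reports the count, and both skip missing values by default (`skipna=True`). That matters here, because a country that did not compete in an edition has NaN in the pivot table. A sketch with hypothetical numbers (`counts` stands in for the filtered pivot table):

```python
import numpy as np
import pandas as pd

# Hypothetical medal counts per edition; NaN marks a country that did not take part
counts = pd.DataFrame(
    {"USA": [np.nan, 10.0], "URS": [12.0, np.nan], "CHN": [np.nan, 4.0]},
    index=pd.Index([1980, 1984], name="Edition"),
)

summary = pd.DataFrame({
    "country": counts.idxmax(axis="columns"),  # column label of each row's maximum
    "medals": counts.max(axis="columns"),      # the maximum value itself
})
print(summary)
```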