# History of the Olympics
_(credits to Prof. Rafael Moreira)_

After a one-year delay caused by the Covid-19 pandemic, the whole world turned its attention to Tokyo, Japan, to follow another edition of the Olympic Games.

Brazil was no different: many people came together to cheer for our athletes in a range of competitions, both in sports where Brazil already has a tradition and in newer ones.

Let's take advantage of the mood to study the Olympics a little! We will use a _dataset_ with 120 years of historical Olympic data, covering the games from Athens 1896 to Rio 2016.

Download the _dataset_ from https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results and load the file ```athlete_events.csv``` into a DataFrame using Pandas. Take the opportunity to explore your DataFrame and get familiar with its structure.

NOTE: Feel free to add more Python cells as needed at any stage of the exercise.

##  <span style='color:LightSeaGreen'>Contents</span>
0. [Setting Up the Environment](#env)
1. [Brazil at the Olympics](#pt1)  
    - [Brazilian athletes](#pt1.1)  
    - [Medalists](#pt1.2)  
    - [Summer vs. Winter](#pt1.3)  
    - [Brazil's athletes](#pt1.4)  
2. [The world at the Summer Games](#pt2)
3. [Brazil vs. the World](#pt3)

## 0. Setting Up the Environment <span id='env'></span>

In [1]:
import pandas as pd
import re
#import warnings
#warnings.filterwarnings('ignore')

df = pd.read_csv('./dados/athlete_events.csv')
df.head()

Unnamed: 0,ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal
0,1,A Dijiang,M,24.0,180.0,80.0,China,CHN,1992 Summer,1992,Summer,Barcelona,Basketball,Basketball Men's Basketball,
1,2,A Lamusi,M,23.0,170.0,60.0,China,CHN,2012 Summer,2012,Summer,London,Judo,Judo Men's Extra-Lightweight,
2,3,Gunnar Nielsen Aaby,M,24.0,,,Denmark,DEN,1920 Summer,1920,Summer,Antwerpen,Football,Football Men's Football,
3,4,Edgar Lindenau Aabye,M,34.0,,,Denmark/Sweden,DEN,1900 Summer,1900,Summer,Paris,Tug-Of-War,Tug-Of-War Men's Tug-Of-War,Gold
4,5,Christine Jacoba Aaftink,F,21.0,185.0,82.0,Netherlands,NED,1988 Winter,1988,Winter,Calgary,Speed Skating,Speed Skating Women's 500 metres,
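
A first look at the structure can go a bit beyond `head()`. The sketch below is only an illustration: it rebuilds the five rows shown above as a small in-memory sample (the name `sample` is an assumption, not part of the exercise) and checks the shape, the column types and the number of missing values per column.

```python
import io
import pandas as pd

# Toy sample copied from the first rows shown above; the real file is far larger.
csv_text = """ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal
1,A Dijiang,M,24.0,180.0,80.0,China,CHN,1992 Summer,1992,Summer,Barcelona,Basketball,Basketball Men's Basketball,
2,A Lamusi,M,23.0,170.0,60.0,China,CHN,2012 Summer,2012,Summer,London,Judo,Judo Men's Extra-Lightweight,
3,Gunnar Nielsen Aaby,M,24.0,,,Denmark,DEN,1920 Summer,1920,Summer,Antwerpen,Football,Football Men's Football,
4,Edgar Lindenau Aabye,M,34.0,,,Denmark/Sweden,DEN,1900 Summer,1900,Summer,Paris,Tug-Of-War,Tug-Of-War Men's Tug-Of-War,Gold
5,Christine Jacoba Aaftink,F,21.0,185.0,82.0,Netherlands,NED,1988 Winter,1988,Winter,Calgary,Speed Skating,Speed Skating Women's 500 metres,
"""
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                      # (rows, columns)
print(sample.dtypes)                     # type of each column
missing_per_column = sample.isna().sum() # empty fields are read as NaN
print(missing_per_column)
seasons = sample['Season'].value_counts()
print(seasons)
```

Running the same calls on the full `df` should show which columns (for instance `Height`, `Weight` and `Medal`) contain missing values before any filtering is done.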


## 1. Brazil at the Olympics <span id='pt1'></span>

Let's start by studying the performance of our own country. Create a new DataFrame containing only the information about Brazilian athletes.

### <span id='pt1.1' style='color:Gold'> Brazilian athletes </span>

In [3]:
# exploratory checks: some Brazilian teams carry a numeric suffix, e.g. 'Brazil-1'
df[df['Team']=='Brazil-1']
df[df['Name'].str.contains(r'^Ana Paula.*')]
year_2004_df = df[df['Year']==2004].copy()
print(year_2004_df[year_2004_df['Team'].str.contains(r'^Brazil.*')]['Team'].value_counts())
year_2004_df[year_2004_df['Team'].str.contains(r'^Brazil-.*')].sort_values('Medal')

Brazil      310
Brazil-2      4
Brazil-1      4
Name: Team, dtype: int64


Unnamed: 0,ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal
63345,32468,Emanuel Fernando Scheffler Rego,M,31.0,190.0,80.0,Brazil-1,BRA,2004 Summer,2004,Summer,Athina,Beach Volleyball,Beach Volleyball Men's Beach Volleyball,Gold
199962,100430,Ricardo Alex Costa Santos,M,29.0,200.0,102.0,Brazil-1,BRA,2004 Summer,2004,Summer,Athina,Beach Volleyball,Beach Volleyball Men's Beach Volleyball,Gold
17628,9394,Adriana Brando Behar,F,35.0,180.0,64.0,Brazil-1,BRA,2004 Summer,2004,Summer,Athina,Beach Volleyball,Beach Volleyball Women's Beach Volleyball,Silver
217980,109492,Shelda Kelly Bruno Bede,F,31.0,165.0,59.0,Brazil-1,BRA,2004 Summer,2004,Summer,Athina,Beach Volleyball,Beach Volleyball Women's Beach Volleyball,Silver
6448,3615,Ana Paula Rodrigues Connelly (-Rodrigues Henkel),F,32.0,183.0,68.0,Brazil-2,BRA,2004 Summer,2004,Summer,Athina,Beach Volleyball,Beach Volleyball Women's Beach Volleyball,
19120,10125,Benjamin Insfran,M,32.0,196.0,97.0,Brazil-2,BRA,2004 Summer,2004,Summer,Athina,Beach Volleyball,Beach Volleyball Men's Beach Volleyball,
149213,74846,Mrcio Henrique Barroso Arajo,M,30.0,192.0,89.0,Brazil-2,BRA,2004 Summer,2004,Summer,Athina,Beach Volleyball,Beach Volleyball Men's Beach Volleyball,
209880,105385,Sandra Tavares Pires Nascimento,F,31.0,174.0,64.0,Brazil-2,BRA,2004 Summer,2004,Summer,Athina,Beach Volleyball,Beach Volleyball Women's Beach Volleyball,


In [4]:
df_brazil = df[df['Team'].str.contains(r'^Brazil.*')].copy()
df_brazil.reset_index(drop=True, inplace=True)
df_brazil.head()

Unnamed: 0,ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal
0,386,Alexandre Abeid,M,22.0,194.0,92.0,Brazil,BRA,1972 Summer,1972,Summer,Munich,Volleyball,Volleyball Men's Volleyball,
1,386,Alexandre Abeid,M,26.0,194.0,92.0,Brazil,BRA,1976 Summer,1976,Summer,Montreal,Volleyball,Volleyball Men's Volleyball,
2,388,Abel Carlos da Silva Braga,M,19.0,190.0,73.0,Brazil,BRA,1972 Summer,1972,Summer,Munich,Football,Football Men's Football,
3,451,Diana Monteiro Abla,F,21.0,175.0,75.0,Brazil,BRA,2016 Summer,2016,Summer,Rio de Janeiro,Water Polo,Water Polo Women's Water Polo,
4,565,Glauclio Serro Abreu,M,26.0,185.0,75.0,Brazil,BRA,2004 Summer,2004,Summer,Athina,Boxing,Boxing Men's Middleweight,
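
The rows above all share the `NOC` code `BRA`, including the `Brazil-1` and `Brazil-2` teams. As a sketch of an alternative to the regular expression on `Team`, the filter could be written against `NOC` instead. The data below is a toy frame with placeholder names, and it is an assumption (not checked here) that both filters select the same rows on the full dataset.

```python
import pandas as pd

# Toy rows mirroring the Team/NOC values seen above (placeholder athlete names).
toy = pd.DataFrame({
    'Name': ['Athlete A', 'Athlete B', 'Athlete C', 'Athlete D'],
    'Team': ['Brazil', 'Brazil-1', 'Brazil-2', 'China'],
    'NOC':  ['BRA', 'BRA', 'BRA', 'CHN'],
})

by_team = toy[toy['Team'].str.contains(r'^Brazil')]  # pattern match on the team label
by_noc = toy[toy['NOC'] == 'BRA']                     # exact match on the country code

print(by_team.equals(by_noc))
```

On the real data it may be worth comparing the two results, since `Team` is a free-form label while `NOC` is a fixed three-letter code.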


### <span id='pt1.2' style='color:Gold'> Medalists </span>

Let's focus on Brazil's success stories. Use your previous DataFrame to keep only the information about Brazilian **medalists**.

**HINT:** look at how the ```Medal``` column is represented when the athlete did not win a medal.

In [5]:
df_brazil_medals = df_brazil[df_brazil['Medal'].notna()].sort_values('Medal').reset_index(drop=True)
df_brazil_medals.head()

Unnamed: 0,ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal
0,55088,Andr Bier Johannpeter,M,37.0,178.0,78.0,Brazil,BRA,2000 Summer,2000,Summer,Sydney,Equestrianism,"Equestrianism Mixed Jumping, Team",Bronze
1,97270,Nlson Prudncio,M,28.0,182.0,71.0,Brazil,BRA,1972 Summer,1972,Summer,Munich,Athletics,Athletics Men's Triple Jump,Bronze
2,32468,Emanuel Fernando Scheffler Rego,M,35.0,190.0,80.0,Brazil-2,BRA,2008 Summer,2008,Summer,Beijing,Beach Volleyball,Beach Volleyball Men's Beach Volleyball,Bronze
3,96701,Bruno Prada,M,40.0,185.0,110.0,Brazil,BRA,2012 Summer,2012,Summer,London,Sailing,Sailing Men's Two Person Keelboat,Bronze
4,93746,Rodrigo de Paula Pessoa,M,27.0,177.0,67.0,Brazil,BRA,2000 Summer,2000,Summer,Sydney,Equestrianism,"Equestrianism Mixed Jumping, Team",Bronze


In [6]:
df_brazil_medals['Medal'].value_counts()

Bronze    191
Silver    175
Gold      109
Name: Medal, dtype: int64

### <span id='pt1.3' style='color:Gold'> Summer vs. Winter </span>

You may have noticed that there are two distinct categories of Olympic Games, identified by season: the Summer Games and the Winter Games, which alternate with each other.

Now that we know the Brazilian medalists, answer this: how many Brazilian athletes received medals at the Summer Games, and how many at the Winter Games?

In [7]:
df_brazil_medals['Season'].value_counts()

Summer    475
Name: Season, dtype: int64

In [8]:
df_brazil[df_brazil['Season']=='Winter']['Medal'].value_counts()

Series([], Name: Medal, dtype: int64)

<span style='color:LightGreen'>All <b>475</b> medals come from the Summer Games; there are no medals from the Winter Games.</span>

The Summer Games are far more popular in Brazil than the Winter Games. So, from this point on, we will focus only on the Summer Games. Drop the Winter Games data from your DataFrame.



In [9]:
df_brazil_medals_summer = df_brazil_medals[df_brazil_medals['Season']=='Summer']

### <span id='pt1.4' style='color:Gold'> Brazil's athletes </span>

In [10]:
brazilian_mean_height = df_brazil_medals_summer['Height'].mean()
brazilian_mean_weight = df_brazil_medals_summer['Weight'].mean()
print(f'Mean athlete height: {round(brazilian_mean_height,2)} cm')
print(f'Mean athlete weight: {round(brazilian_mean_weight,2)} kg')

Mean athlete height: 182.49 cm
Mean athlete weight: 76.71 kg


#### <span id='pt1.4.2'></span>Different sports probably favor different body types, right? So redo the previous analysis, this time computing the mean values **per sport**.

In [34]:
print('Mean athlete height by sport:')
medalist_height_by_sport = df_brazil_medals_summer.groupby(by='Sport')['Height'].mean().round(2)
medalist_height_by_sport

Mean athlete height by sport:


Sport
Athletics            181.00
Basketball           185.61
Beach Volleyball     184.88
Boxing               170.00
Canoeing             175.00
Equestrianism        179.67
Football             175.80
Gymnastics           162.75
Judo                 176.67
Modern Pentathlon    166.00
Sailing              181.59
Shooting             175.00
Swimming             189.11
Taekwondo            184.00
Volleyball           190.59
Name: Height, dtype: float64

In [35]:
print('Mean athlete weight by sport:')
medalist_weight_by_sport = df_brazil_medals_summer.groupby(by='Sport')['Weight'].mean().round(2)
medalist_weight_by_sport

Mean athlete weight by sport:


Sport
Athletics            74.58
Basketball           78.48
Beach Volleyball     78.16
Boxing               64.00
Canoeing             83.25
Equestrianism        75.00
Football             69.96
Gymnastics           63.75
Judo                 86.29
Modern Pentathlon    55.00
Sailing              80.41
Shooting             69.00
Swimming             81.56
Taekwondo            79.50
Volleyball           81.17
Name: Weight, dtype: float64

#### <span id='pt1.4.3'></span>Did the figures above influence the athletes' general interest in each sport, or did they really affect their performance? We can try to find out whether there is any kind of correlation.

Do you still have the original DataFrame containing all Brazilian athletes, including those without medals? Compute the mean weight and height per sport from that DataFrame and compare them with the medalists' values. Is there a significant difference in any sport?

In [36]:
# mean height and weight per sport for all Brazilian athletes (medalists or not)
all_height_by_sport = df_brazil.groupby(by='Sport')['Height'].mean().round(2)
all_weight_by_sport = df_brazil.groupby(by='Sport')['Weight'].mean().round(2)

In [74]:
# merge height data
medal_vs_all_height = pd.merge(medalist_height_by_sport, all_height_by_sport,
            left_index=True, right_index=True,
            suffixes=('_Medalist', '_All')).sort_index()
# add a column with the height difference (%) between medalists and all athletes
medal_vs_all_height['Height_diff_percent'] = (medal_vs_all_height['Height_Medalist'] - medal_vs_all_height['Height_All'])\
                                    /medal_vs_all_height['Height_Medalist']*100
# merge weight data
medal_vs_all_weight = pd.merge(medalist_weight_by_sport, all_weight_by_sport,
            left_index=True, right_index=True,
            suffixes=('_Medalist', '_All')).sort_index()
# add a column with the weight difference (%) between medalists and all athletes
medal_vs_all_weight['Weight_diff_percent'] = (medal_vs_all_weight['Weight_Medalist'] - medal_vs_all_weight['Weight_All'])\
                                    /medal_vs_all_weight['Weight_Medalist']*100

# view both tables side by side
pd.concat([medal_vs_all_height,medal_vs_all_weight], axis=1)

Sport,Height_Medalist,Height_All,Height_diff_percent,Weight_Medalist,Weight_All,Weight_diff_percent
Athletics,181.0,176.2,2.651934,74.58,67.8,9.090909
Basketball,185.61,190.91,-2.85545,78.48,85.9,-9.454638
Beach Volleyball,184.88,185.7,-0.443531,78.16,78.62,-0.588536
Boxing,170.0,171.99,-1.170588,64.0,64.11,-0.171875
Canoeing,175.0,177.79,-1.594286,83.25,77.4,7.027027
Equestrianism,179.67,177.43,1.24673,75.0,72.31,3.586667
Football,175.8,173.37,1.382253,69.96,67.76,3.144654
Gymnastics,162.75,157.27,3.367127,63.75,52.46,17.709804
Judo,176.67,173.93,1.550914,86.29,80.29,6.953297
Modern Pentathlon,166.0,168.6,-1.566265,55.0,60.67,-10.309091


In [75]:
# for height data, Taekwondo, Swimming, Gymnastics, Basketball have the most difference

heigh_diff = medal_vs_all_height['Height_diff_percent'].apply(lambda value: abs(value)).sort_values(ascending=False)
heigh_diff[heigh_diff > heigh_diff.describe()['75%']]

Sport
Taekwondo     5.559783
Swimming      4.193327
Gymnastics    3.367127
Basketball    2.855450
Name: Height_diff_percent, dtype: float64

In [77]:
# for weight data, Taekwondo, Gymnastics, Shooting and Modern Pentathlon have the most difference

weigh_diff = medal_vs_all_weight['Weight_diff_percent'].apply(lambda value: abs(value)).sort_values(ascending=False)
weigh_diff[weigh_diff > weigh_diff.describe()['75%']]

Sport
Taekwondo            19.974843
Gymnastics           17.709804
Shooting             11.478261
Modern Pentathlon    10.309091
Name: Weight_diff_percent, dtype: float64
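
The medalist-versus-all comparison can also be sketched in a single pass over both columns. The example below uses made-up numbers in a toy frame (`toy` and its values are assumptions for illustration only) and relies on `DataFrame.join` with `lsuffix`/`rsuffix` instead of two separate merges.

```python
import pandas as pd

# Toy data (hypothetical values) standing in for the Brazilian athletes DataFrame.
toy = pd.DataFrame({
    'Sport':  ['Judo', 'Judo', 'Judo', 'Sailing', 'Sailing'],
    'Height': [170.0, 180.0, 175.0, 180.0, 190.0],
    'Weight': [70.0, 90.0, 80.0, 75.0, 85.0],
    'Medal':  ['Gold', None, None, 'Bronze', None],
})

cols = ['Height', 'Weight']
all_means = toy.groupby('Sport')[cols].mean()
medalist_means = toy[toy['Medal'].notna()].groupby('Sport')[cols].mean()

# Overlapping column names receive the suffixes given here.
comparison = medalist_means.join(all_means, lsuffix='_Medalist', rsuffix='_All')
for col in cols:
    # Difference expressed as a percentage of the medalist mean, as in the cells above.
    comparison[f'{col}_diff_percent'] = (
        (comparison[f'{col}_Medalist'] - comparison[f'{col}_All'])
        / comparison[f'{col}_Medalist'] * 100
    )
print(comparison.round(2))
```

Note that this sketch rounds only for display, so the percentages are computed from unrounded means and may differ slightly in the last decimal from values derived from pre-rounded means.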

#### <span id='pt1.4.4'></span>There is an important detail we have overlooked so far: sporting categories are usually split by gender precisely because of physical differences between men and women that could affect performance. Compare the mean height and weight of Brazilian athletes per sport, segmented by sex.

In [88]:
# verifying how many gender types
df_brazil['Sex'].value_counts()

M    2680
F    1148
Name: Sex, dtype: int64

In [106]:
# data frame with only male
df_brazil_male = df_brazil[df_brazil['Sex']=='M']
# data frame with only female
df_brazil_female = df_brazil[df_brazil['Sex']=='F']
# get height mean values by sport for males
brazil_male_height_by_sport = df_brazil_male.groupby(by='Sport')['Height'].mean().apply(lambda value: round(value,2))
# get height mean values by sport for females
brazil_female_height_by_sport = df_brazil_female.groupby(by='Sport')['Height'].mean().apply(lambda value: round(value,2))
# merge both series
male_vs_female_height = pd.concat([brazil_male_height_by_sport,brazil_female_height_by_sport], axis=1).dropna()
male_vs_female_height.columns = ['Male_Height','Female_Height']
# add a column with the height difference between male and female
male_vs_female_height['Height_diff_percent'] = round(abs((male_vs_female_height['Male_Height'] - male_vs_female_height['Female_Height'])\
                                    /male_vs_female_height['Male_Height']*100),2)

# same procedure for weight
brazil_male_weight_by_sport = df_brazil_male.groupby(by='Sport')['Weight'].mean().apply(lambda value: round(value,2))
brazil_female_weight_by_sport = df_brazil_female.groupby(by='Sport')['Weight'].mean().apply(lambda value: round(value,2))
male_vs_female_weight = pd.concat([brazil_male_weight_by_sport,brazil_female_weight_by_sport], axis=1).dropna()
male_vs_female_weight.columns = ['Male_Weight','Female_Weight']
male_vs_female_weight['Weight_diff_percent'] = round(abs((male_vs_female_weight['Male_Weight'] - male_vs_female_weight['Female_Weight'])\
                                    /male_vs_female_weight['Male_Weight']*100),2)

# visualize data
pd.concat([male_vs_female_height,male_vs_female_weight],axis=1)

Sport,Male_Height,Female_Height,Height_diff_percent,Male_Weight,Female_Weight,Weight_diff_percent
Alpine Skiing,184.15,160.33,12.94,85.15,52.33,38.54
Archery,177.85,162.86,8.43,78.15,59.71,23.6
Athletics,180.07,167.49,6.99,71.55,59.81,16.41
Badminton,183.0,168.0,8.2,78.0,70.0,10.26
Basketball,195.46,182.87,6.44,92.95,73.38,21.05
Beach Volleyball,194.54,176.48,9.28,91.75,64.91,29.25
Bobsleigh,185.0,168.5,8.92,89.75,74.5,16.99
Boxing,172.34,167.6,2.75,64.03,65.0,1.51
Canoeing,178.55,169.5,5.07,78.57,64.5,17.91
Cross Country Skiing,178.75,167.25,6.43,74.5,53.75,27.85


In [112]:
heigh_diff = male_vs_female_height['Height_diff_percent'].apply(lambda value: abs(value)).sort_values(ascending=False)
heigh_diff.describe()

count    33.000000
mean      6.559394
std       2.342812
min       2.290000
25%       5.170000
50%       6.110000
75%       8.200000
max      12.940000
Name: Height_diff_percent, dtype: float64

In [115]:
heigh_diff[heigh_diff > heigh_diff.describe()['75%']]

Sport
Alpine Skiing       12.94
Gymnastics          10.88
Triathlon            9.90
Tennis               9.69
Beach Volleyball     9.28
Bobsleigh            8.92
Archery              8.43
Swimming             8.41
Name: Height_diff_percent, dtype: float64

In [118]:
weigh_diff = male_vs_female_weight['Weight_diff_percent'].apply(lambda value: abs(value)).sort_values(ascending=False)
weigh_diff.describe()

count    33.000000
mean     22.182424
std       7.258357
min       1.510000
25%      17.910000
50%      23.400000
75%      26.000000
max      38.540000
Name: Weight_diff_percent, dtype: float64

In [119]:
# since std is high, the difference is better shown using the mean
weigh_diff[weigh_diff > weigh_diff.describe()['mean']]

Sport
Alpine Skiing           38.54
Wrestling               31.47
Gymnastics              31.08
Weightlifting           30.28
Beach Volleyball        29.25
Rowing                  28.98
Cross Country Skiing    27.85
Triathlon               26.90
Golf                    26.00
Swimming                25.69
Modern Pentathlon       25.34
Rugby Sevens            25.30
Handball                25.01
Shooting                23.73
Archery                 23.60
Cycling                 23.54
Judo                    23.40
Equestrianism           22.50
Name: Weight_diff_percent, dtype: float64
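
The split by sex can also be sketched without building separate male and female DataFrames, by grouping on both `Sport` and `Sex` and then unstacking `Sex` into columns. The frame below is toy data with invented values; only the column names come from the dataset.

```python
import pandas as pd

# Toy data (invented values); Boxing has no female rows on purpose.
toy = pd.DataFrame({
    'Sport':  ['Judo', 'Judo', 'Judo', 'Judo', 'Boxing', 'Boxing'],
    'Sex':    ['M', 'M', 'F', 'F', 'M', 'M'],
    'Height': [180.0, 176.0, 165.0, 161.0, 172.0, 174.0],
})

# One row per sport, one column per sex.
height_by_sex = toy.groupby(['Sport', 'Sex'])['Height'].mean().unstack('Sex')
# Keep only sports that have both male and female athletes.
height_by_sex = height_by_sex.dropna()
height_by_sex['Height_diff_percent'] = (
    (height_by_sex['M'] - height_by_sex['F']).abs() / height_by_sex['M'] * 100
).round(2)
print(height_by_sex)
```

The same pattern applies to `Weight`; sports contested by only one sex drop out at the `dropna()` step, as they do in the cells above.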

#### <span id='pt1.4.5'></span>Who was (or were) Brazil's top medalist(s) by total number of medals?

In [137]:
', '.join(df_brazil_medals['Name'].value_counts().index[:2])

'Torben Schmidt Grael, Robert Scheidt'
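
Taking the first two index entries works when exactly two athletes are tied at the top. A more general sketch, shown here on toy data with placeholder names, keeps every athlete whose count equals the maximum, however many there are.

```python
import pandas as pd

# Hypothetical medal rows: two athletes tied at the top with two medals each.
toy_medals = pd.DataFrame({
    'Name':  ['Ana', 'Ana', 'Bruno', 'Bruno', 'Carla'],
    'Medal': ['Gold', 'Bronze', 'Silver', 'Silver', 'Gold'],
})

counts = toy_medals['Name'].value_counts()
# Keep all names whose medal count equals the highest count.
top_medalists = counts[counts == counts.max()]
print(', '.join(sorted(top_medalists.index)))
```

The same filter can be applied to the gold-only counts in the next question.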

#### <span id='pt1.4.6'></span>And the top one(s) by number of gold medals?

In [139]:
df_brazil_medals[df_brazil_medals['Medal']=='Gold']['Name'].value_counts()

Paula Renata Marques Pequeno                          2
Thasa Daher de Menezes                                2
Jaqueline Maria "Jaque" Pereira de Carvalho Endres    2
Fabiana "Fabi" Alvim de Oliveira                      2
Giovane Farinazzo Gvio                                2
                                                     ..
Janelson dos Santos Carvalho                          1
Aurlio Fernndez Miguel                                1
Marcos "Marquinhos" Aos Corra                         1
Marianne "Mari" Steinbrecher                          1
Luiz Felipe Marques Fonteles                          1
Name: Name, Length: 96, dtype: int64

#### <span id='pt1.4.7'></span>Which sport has earned Brazil the most gold medals? And which has earned the most medals overall?

**HINT:** be very careful with this analysis: each **sporting event** yields 1 medal. For example, when the football team wins, that counts as 1 medal, even though the team has about 20 medal-winning athletes.

In [180]:
df_brazil.columns

Index(['ID', 'Name', 'Sex', 'Age', 'Height', 'Weight', 'Team', 'NOC', 'Games',
       'Year', 'Season', 'City', 'Sport', 'Event', 'Medal'],
      dtype='object')

In [192]:
# keep only the year, sport, event and medal columns
df_unique_by_sport = df_brazil[['Year','Sport','Event','Medal']].copy()
df_unique_by_sport.shape

(3828, 4)

In [193]:
# remove duplicates, as if there were one athlete per event
df_unique_by_sport.drop_duplicates(inplace=True)
df_unique_by_sport.shape

(1647, 4)

In [194]:
df_brazil_medals_by_sport = df_unique_by_sport.groupby(by='Sport')['Medal'].value_counts()
df_brazil_medals_by_sport.head()

Sport       Medal 
Athletics   Bronze    8
            Gold      5
            Silver    3
Basketball  Bronze    4
            Silver    1
Name: Medal, dtype: int64

In [220]:
# accessing the MultiIndex Series
print('Number of gold medals by sport:')
df_brazil_medals_by_sport.loc[:,'Gold'].sort_values(ascending=False)

Number of gold medals by sport:


Sport
Sailing             7
Athletics           5
Volleyball          5
Judo                4
Beach Volleyball    3
Boxing              1
Equestrianism       1
Football            1
Gymnastics          1
Shooting            1
Swimming            1
Name: Medal, dtype: int64

In [221]:
print('Number of total medals by sport:')
df_brazil_medals_by_sport.groupby('Sport').sum().sort_values(ascending=False)

Number of total medals by sport:


Sport
Judo                 22
Sailing              18
Athletics            16
Swimming             14
Beach Volleyball     13
Volleyball           10
Football              8
Basketball            5
Boxing                5
Gymnastics            4
Shooting              4
Canoeing              3
Equestrianism         3
Taekwondo             2
Modern Pentathlon     1
Name: Medal, dtype: int64
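
As a sketch of another way to present the per-sport counts, `pd.crosstab` can build a sport-by-medal table with totals in one call once team medals have been collapsed to one row per event. The rows below are hypothetical, and de-duplicating on `Year`, `Event` and `Medal` is an assumption that would merge two same-colour medals in one event (for example a tie).

```python
import pandas as pd

# Hypothetical rows: a 3-athlete team gold in one event plus an individual bronze.
toy = pd.DataFrame({
    'Year':  [2004, 2004, 2004, 2004, 2008],
    'Sport': ['Volleyball', 'Volleyball', 'Volleyball', 'Judo', 'Judo'],
    'Event': ["Volleyball Men's Volleyball"] * 3 + ["Judo Men's Lightweight"] * 2,
    'Medal': ['Gold', 'Gold', 'Gold', 'Bronze', None],
})

# One row per (year, event, medal): the team gold counts once.
per_event = toy.dropna(subset=['Medal']).drop_duplicates(['Year', 'Event', 'Medal'])
medal_table = pd.crosstab(per_event['Sport'], per_event['Medal'],
                          margins=True, margins_name='Total')
print(medal_table)
```

Sorting `medal_table` by the `Gold` or `Total` column then answers both parts of the question from a single table.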

#### <span id='pt1.4.8'></span>Each "category" within a sport is considered an event. For example, within athletics there is a men's 100 m race, a women's 100 m, a men's 4 x 100 m relay, a women's 4 x 100 m relay, a men's 400 m, a women's 400 m, a men's marathon, a women's marathon, and so on.

With that in mind, which sporting event has earned Brazil the most gold medals? And the most medals overall?

In [226]:
df_unique_by_event = df_brazil[['Year','Event','Medal']].copy()
df_unique_by_event.drop_duplicates(inplace=True)
df_brazil_medals_by_event = df_unique_by_event.groupby(by='Event')['Medal'].value_counts()
df_brazil_medals_by_event

Event                                 Medal 
Athletics Men's 200 metres            Bronze    1
Athletics Men's 4 x 100 metres Relay  Bronze    2
                                      Silver    1
Athletics Men's 800 metres            Gold      1
                                      Silver    1
                                               ..
Taekwondo Women's Heavyweight         Bronze    1
Volleyball Men's Volleyball           Gold      3
                                      Silver    3
Volleyball Women's Volleyball         Bronze    2
                                      Gold      2
Name: Medal, Length: 90, dtype: int64

In [236]:
print('Event with most gold medals: ',end='')
print(df_brazil_medals_by_event.loc[:,'Gold'].sort_values(ascending=False).index[0])

print('Event with most medals in total: ',end='')
print(df_brazil_medals_by_event.groupby('Event').sum().sort_values(ascending=False).index[0])

Event with most gold medals: Volleyball Men's Volleyball
Event with most medals in total: Beach Volleyball Women's Beach Volleyball


#### <span id='pt1.4.9'></span>To wrap up the Brazil section: compute the number of gold, silver and bronze medals, plus the total, per year.

In [291]:
medals_by_year = df_brazil_medals.groupby('Year')['Medal'].value_counts()
medals_by_year.head()

Year  Medal 
1920  Bronze     5
      Gold       1
      Silver     1
1948  Bronze    10
1952  Bronze     2
Name: Medal, dtype: int64

In [296]:
total_medals_by_year = medals_by_year.unstack()
total_medals_by_year.assign(Total=total_medals_by_year.sum(axis=1)).stack().to_frame('#')

Year,Medal,#
1920,Bronze,5.0
1920,Gold,1.0
1920,Silver,1.0
1920,Total,7.0
1948,Bronze,10.0
1948,Total,10.0
1952,Bronze,2.0
1952,Gold,1.0
1952,Total,3.0
1956,Gold,1.0
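
The counts above appear as floats because `unstack()` fills missing year/medal combinations with `NaN`. A minimal sketch of a wide layout that keeps integer counts, using toy rows rather than the real medal data, passes `fill_value=0` to `unstack`.

```python
import pandas as pd

# Toy medal rows: 1948 has only bronze medals.
toy_medals = pd.DataFrame({
    'Year':  [1920, 1920, 1920, 1948, 1948],
    'Medal': ['Gold', 'Silver', 'Bronze', 'Bronze', 'Bronze'],
})

# Missing (year, medal) pairs become 0 instead of NaN.
by_year = toy_medals.groupby('Year')['Medal'].value_counts().unstack(fill_value=0)
by_year['Total'] = by_year.sum(axis=1)
print(by_year)
```

This gives one row per year with `Bronze`, `Gold`, `Silver` and `Total` columns, which may be easier to read than the stacked form.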


## 2. The world at the Summer Games <span id='pt2'></span>

Now let's look at a little of what happened at the Summer Olympics around the world.



#### <span id='pt2.1'></span>Go back to the original DataFrame and drop the information about the Winter Games.

In [27]:
df_summer = df[df['Season']=='Summer']
df_summer.head()

Unnamed: 0,ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal
0,1,A Dijiang,M,24.0,180.0,80.0,China,CHN,1992 Summer,1992,Summer,Barcelona,Basketball,Basketball Men's Basketball,
1,2,A Lamusi,M,23.0,170.0,60.0,China,CHN,2012 Summer,2012,Summer,London,Judo,Judo Men's Extra-Lightweight,
2,3,Gunnar Nielsen Aaby,M,24.0,,,Denmark,DEN,1920 Summer,1920,Summer,Antwerpen,Football,Football Men's Football,
3,4,Edgar Lindenau Aabye,M,34.0,,,Denmark/Sweden,DEN,1900 Summer,1900,Summer,Paris,Tug-Of-War,Tug-Of-War Men's Tug-Of-War,Gold
26,8,"Cornelia ""Cor"" Aalten (-Strannood)",F,18.0,168.0,,Netherlands,NED,1932 Summer,1932,Summer,Los Angeles,Athletics,Athletics Women's 100 metres,


#### <span id='pt2.2'></span>Get the list of all sports ever contested at the Summer Olympics.

In [28]:
df_summer['Sport'].unique()

array(['Basketball', 'Judo', 'Football', 'Tug-Of-War', 'Athletics',
       'Swimming', 'Badminton', 'Sailing', 'Gymnastics',
       'Art Competitions', 'Handball', 'Weightlifting', 'Wrestling',
       'Water Polo', 'Hockey', 'Rowing', 'Fencing', 'Equestrianism',
       'Shooting', 'Boxing', 'Taekwondo', 'Cycling', 'Diving', 'Canoeing',
       'Tennis', 'Modern Pentathlon', 'Golf', 'Softball', 'Archery',
       'Volleyball', 'Synchronized Swimming', 'Table Tennis', 'Baseball',
       'Rhythmic Gymnastics', 'Rugby Sevens', 'Trampolining',
       'Beach Volleyball', 'Triathlon', 'Rugby', 'Lacrosse', 'Polo',
       'Cricket', 'Ice Hockey', 'Racquets', 'Motorboating', 'Croquet',
       'Figure Skating', 'Jeu De Paume', 'Roque', 'Basque Pelota',
       'Alpinism', 'Aeronautics'], dtype=object)

#### <span id='pt2.3'></span>Get the list of all events ever contested at the Summer Olympics.

In [29]:
df_summer['Event'].unique()

array(["Basketball Men's Basketball", "Judo Men's Extra-Lightweight",
       "Football Men's Football", "Tug-Of-War Men's Tug-Of-War",
       "Athletics Women's 100 metres",
       "Athletics Women's 4 x 100 metres Relay",
       "Swimming Men's 400 metres Freestyle", "Badminton Men's Singles",
       "Sailing Women's Windsurfer",
       "Swimming Men's 200 metres Breaststroke",
       "Swimming Men's 400 metres Breaststroke",
       "Gymnastics Men's Individual All-Around",
       "Gymnastics Men's Team All-Around",
       "Gymnastics Men's Floor Exercise", "Gymnastics Men's Horse Vault",
       "Gymnastics Men's Parallel Bars",
       "Gymnastics Men's Horizontal Bar", "Gymnastics Men's Rings",
       "Gymnastics Men's Pommelled Horse", "Athletics Men's Shot Put",
       'Art Competitions Mixed Sculpturing, Unknown Event',
       "Handball Women's Handball",
       "Weightlifting Women's Super-Heavyweight",
       "Wrestling Men's Light-Heavyweight, Greco-Roman",
       "Gymnastics M

#### <span id='pt2.4'></span>Get the list of all countries that have ever competed in the Olympics.

In [30]:
print(df_summer.shape)
df_summer['Medal'].value_counts()

(222552, 15)


Gold      11459
Bronze    11409
Silver    11220
Name: Medal, dtype: int64
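
The cell above reports the shape and the medal counts rather than a list of countries. One possible way to list them, sketched here on toy rows, is to take the unique `NOC` codes; treating `NOC` as the country identifier is an assumption, chosen because `Team` can hold several labels for the same country.

```python
import pandas as pd

# Toy rows: several team labels can map to the same NOC code.
toy = pd.DataFrame({
    'Team': ['Brazil', 'Brazil-1', 'China', 'Denmark/Sweden', 'Denmark'],
    'NOC':  ['BRA', 'BRA', 'CHN', 'DEN', 'DEN'],
})

countries = sorted(toy['NOC'].unique())
print(countries)
print(toy['Team'].nunique(), 'team labels vs', len(countries), 'NOC codes')
```

On the notebook's data the equivalent call would be `df_summer['NOC'].unique()`.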

#### <span id='pt2.5'></span>Which athlete was the top medalist (by total medals) in the history of the Summer Olympics?

In [31]:
df_summer_medals = df_summer.dropna(subset=['Medal'])
df_summer_medals['Name'].value_counts().index[0]


'Michael Fred Phelps, II'

#### <span id='pt2.6'></span>Which athlete was the top gold medalist in the history of the Summer Olympics?

In [32]:
df_summer_medals[df_summer_medals['Medal']=='Gold']['Name'].value_counts().index[0]

'Michael Fred Phelps, II'

#### <span id='pt2.7'></span>Which country was the top gold medalist in the history of the Summer Olympics? Remember the sporting-event issue, so that multiple medals are not counted for the same event (e.g. a football team making it look as if more than 20 medals were awarded).

In [37]:
df_summer_medals_by_event = df_summer_medals[['Team','Year','Event','Medal']].copy()
df_summer_medals_by_event.drop_duplicates(inplace=True)
df_summer_medals_by_event.groupby(by='Team')['Medal'].value_counts().loc[:,'Gold'].sort_values(ascending=False).index[0]

'United States'

#### <span id='pt2.8'></span>Which country was the top medalist by total medals in the history of the Summer Olympics?

In [38]:
df_summer_medals_by_event.groupby(by='Team')['Medal'].value_counts().groupby(by='Team').sum().sort_values(ascending=False).index[0]

'United States'

#### <span id='pt2.9'></span>Get the number of gold and silver medals, plus the total, per edition of the Summer Olympics. Remember the sporting-event issue.

In [43]:
summer_medals = df_summer_medals_by_event.groupby(by='Year')['Medal'].value_counts().unstack()
summer_medals.assign(Total=summer_medals.sum(axis=1)).stack().to_frame('#')

Year,Medal,#
1896,Bronze,34
1896,Gold,43
1896,Silver,41
1896,Total,118
1900,Bronze,92
...,...,...
2012,Total,962
2016,Bronze,360
2016,Gold,307
2016,Silver,306


## 3. Brazil vs. the World <span id='pt3'></span>

#### <span id='pt3.1'></span>To finish, let's make some comparisons between Brazil and the world. What was Brazil's ranking in each edition of the Olympics? Remember that the ranking is ordered by gold medals.
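
One possible approach to the ranking, sketched on hypothetical per-event rows, is to count golds per year and country and then rank within each year. This is a simplification: it ranks on golds alone, with no silver or bronze tie-breaker, and a country with no gold in a given year does not appear for that year.

```python
import pandas as pd

# Hypothetical per-event medal rows (assumed already de-duplicated by event).
toy = pd.DataFrame({
    'Year':  [2012, 2012, 2012, 2012, 2016, 2016, 2016],
    'NOC':   ['USA', 'USA', 'BRA', 'CHN', 'BRA', 'BRA', 'USA'],
    'Medal': ['Gold', 'Gold', 'Gold', 'Silver', 'Gold', 'Gold', 'Gold'],
})

golds = (
    toy[toy['Medal'] == 'Gold']
    .groupby(['Year', 'NOC'])
    .size()
    .rename('Golds')
    .reset_index()
)
# Rank 1 = most golds in that year; ties share the best (lowest) rank.
golds['Rank'] = golds.groupby('Year')['Golds'].rank(method='min', ascending=False).astype(int)
brazil_rank = golds[golds['NOC'] == 'BRA'].set_index('Year')['Rank']
print(brazil_rank)
```

In this toy data Brazil ranks second in 2012 and first in 2016.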

#### <span id='pt3.2'></span>Compare Brazil's top gold medalist with the world's top gold medalist.

#### <span id='pt3.4'></span>Compare Brazil's top medalist by total medals with the world's top medalist by total medals.

#### <span id='pt3.5'></span>Compare Brazil's top gold medalist with the world's top medalist in the same sport.

#### <span id='pt3.6'></span>Compare Brazil's top medalist by total medals with the world's top medalist in the same sport.

#### <span id='pt3.7'></span>Calculate the percentage of gold, silver and bronze medals that Brazil won at each Olympics.
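
A minimal sketch of the percentage calculation, assuming per-event medal rows are already available: build a year-by-medal table for the whole world and another for Brazil, align them, and divide. The rows below are hypothetical.

```python
import pandas as pd

# Hypothetical per-event medal rows for two editions.
toy = pd.DataFrame({
    'Year':  [2012, 2012, 2012, 2012, 2016, 2016, 2016, 2016],
    'NOC':   ['BRA', 'USA', 'USA', 'USA', 'BRA', 'USA', 'USA', 'USA'],
    'Medal': ['Gold', 'Gold', 'Silver', 'Bronze', 'Bronze', 'Bronze', 'Gold', 'Silver'],
})

world = pd.crosstab(toy['Year'], toy['Medal'])
is_brazil = toy['NOC'] == 'BRA'
brazil = pd.crosstab(toy.loc[is_brazil, 'Year'], toy.loc[is_brazil, 'Medal'])
# Align Brazil's table to the world table; medal types Brazil did not win become 0.
brazil = brazil.reindex(index=world.index, columns=world.columns, fill_value=0)

brazil_share = (brazil / world * 100).round(2)
print(brazil_share)
```

Here Brazil holds 50% of the golds in 2012 and 50% of the bronzes in 2016, and 0% elsewhere.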