# All Time Olympic Games Medals

### Business Understanding

The Olympic Games is an important international event held every four years. This analysis covers all Olympic Games from 1896 to 2018 and tries to answer the following questions:

1) Which country has won the most gold medals in summer games?

2) Which country had the biggest difference between their summer and winter gold medal counts?

3) Which country has the biggest difference between their summer gold medal counts and winter gold medal counts relative to their total gold medal count?

4) Which are the top 5 countries if we weight each gold medal as 3 points, each silver medal as 2 points, and each bronze medal as 1 point?

### Data Understanding and Data Preparation



The following code loads the olympics dataset (olympics.csv), which was derived from the Wikipedia entry on All Time Olympic Games Medals, and performs some basic data cleaning.

The columns are organized as # of Summer games, Summer medals, # of Winter games, Winter medals, total # of games, total # of medals.
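Because the summer, winter, and combined blocks repeat the same medal headers, pandas disambiguates the repeats with numeric suffixes when it reads the file. The sketch below illustrates this on a small in-memory CSV. The country labels and counts are made up, and the header row is simplified compared with the real file.

```python
import io
import pandas as pd

# Hypothetical two-row extract with repeated medal headers (made-up values).
raw = io.StringIO(
    "Country,Gold,Silver,Bronze,Gold,Silver,Bronze,Gold,Silver,Bronze\n"
    "Freedonia (FRE),2,1,0,1,0,0,3,1,0\n"
    "Sylvania (SYL),0,4,1,2,2,2,2,6,3\n"
)

toy = pd.read_csv(raw, index_col=0)

# Repeated names get the suffixes .1, .2, ... in order of appearance.
print(list(toy.columns))
```

Under this naming, the unsuffixed columns hold the summer counts, the `.1` columns the winter counts, and the `.2` columns the combined counts.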

In [None]:
#import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib notebook

In [None]:
# read the dataset, clean it, and store the result in 'df'
df = pd.read_csv('olympics.csv', index_col=0, skiprows=1)

# renaming columns to make it more readable
for col in df.columns:
    if col[:2]=='01':
        df.rename(columns={col:'Gold'+col[4:]}, inplace=True)
    if col[:2]=='02':
        df.rename(columns={col:'Silver'+col[4:]}, inplace=True)
    if col[:2]=='03':
        df.rename(columns={col:'Bronze'+col[4:]}, inplace=True)
    if col[:1]=='№':
        df.rename(columns={col:'#'+col[1:]}, inplace=True)

names_ids = df.index.str.split(r'\s\(') # split the index on ' (' (raw string avoids an invalid-escape warning)

df.index = names_ids.str[0] # the [0] element is the country name (new index) 
df['ID'] = names_ids.str[1].str[:3] # the [1] element is the abbreviation or ID (take first 3 characters from that)

df = df.drop('Totals')
#display first five rows of the dataset
df.head()
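To make the index clean-up step easier to follow, here is a minimal sketch of the same split applied to two made-up labels. The country names and codes are hypothetical, and the second label carries a trailing footnote marker to show why only the first three characters of the second piece are kept.

```python
import pandas as pd

# Hypothetical index labels in the "Name (CODE)" style; not taken from the dataset.
labels = pd.Index(["Freedonia (FRE)", "Sylvania (SYL) [A]"])

# Split on whitespace followed by an opening parenthesis.
parts = labels.str.split(r"\s\(")

names = parts.str[0]          # text before " ("
codes = parts.str[1].str[:3]  # first three characters after "("

print(list(names))  # ['Freedonia', 'Sylvania']
print(list(codes))  # ['FRE', 'SYL']
```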

In [19]:
# note: there are no missing values
df.isnull().sum()

# Summer          0
Gold              0
Silver            0
Bronze            0
Total             0
# Winter          0
Gold.1            0
Silver.1          0
Bronze.1          0
Total.1           0
# Games           0
Gold.2            0
Silver.2          0
Bronze.2          0
Combined total    0
ID                0
dtype: int64

In [22]:
# df shape
df.shape

(146, 16)

### Answer Questions


## Question 1

Which country has won the most gold medals in summer games?

In [None]:
def most_gold():
    '''
    This function has no inputs and returns a country name as a string.
    '''
    
    return df['Gold'].idxmax()

most_gold()

## Question 2

Which country had the biggest difference between their summer and winter gold medal counts?

In [None]:
def biggest_diff():
    '''
    This function has no inputs and returns a country name as a string.
    '''
    
    return (df['Gold'] - df['Gold.1']).abs().idxmax()

biggest_diff()

## Question 3

Which country has the biggest difference between their summer gold medal counts and winter gold medal counts relative to their total gold medal count?

(This includes countries that have won at least 1 gold in both summer and winter)

In [None]:
def relative_biggest_diff():
    '''
    This function has no inputs and returns a country name as a string.
    '''
    
    new_df = df[(df['Gold']>0) & (df['Gold.1']>0)]
    
    return ((new_df['Gold'] - new_df['Gold.1']).abs()/new_df['Gold.2']).idxmax() 
    
relative_biggest_diff()
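As a small worked example of the relative difference, the sketch below applies the same filter and ratio to a toy table. The three rows and their counts are invented for illustration. Row `C` would have a ratio of 1.0, but it is excluded because it has no winter gold.

```python
import pandas as pd

# Toy data (made-up): summer gold, winter gold, combined gold.
toy = pd.DataFrame(
    {"Gold": [10, 1, 5], "Gold.1": [2, 3, 0], "Gold.2": [12, 4, 5]},
    index=["A", "B", "C"],
)

# Keep only rows with at least one gold in both summer and winter.
both = toy[(toy["Gold"] > 0) & (toy["Gold.1"] > 0)]

# |summer - winter| / total: A -> 8/12, B -> 2/4
ratio = (both["Gold"] - both["Gold.1"]).abs() / both["Gold.2"]

winner = ratio.idxmax()
print(winner)  # A
```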

## Question 4

Which are the top 5 countries if we weight each gold medal as 3 points, each silver medal as 2 points, and each bronze medal as 1 point?

In [None]:
def top_five():
    '''
    This function has no inputs and returns a pandas Series with the top 5 countries as index and their scores as values.  
    '''
    
    return (df['Gold.2']*3 + df['Silver.2']*2 + df['Bronze.2']).sort_values(ascending=False)[:5]

top5 = top_five()
    
# plot the weighted scores of the top five countries as a bar chart
fig, ax = plt.subplots(figsize=(10,6))
bars = ax.bar(top5.index, top5)
ax.set_title('Top five countries in Olympic Games (weighted medals)\n Gold = 3pts, Silver = 2pts and Bronze = 1pt')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.yaxis.set_ticks([])
ax.tick_params(bottom=False)

for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,'%d' % int(height), ha='center', va='bottom')
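The weighting can be checked by hand on a toy table. The sketch below uses three invented rows to show that a larger number of lower-value medals can outscore a single gold. It uses `nlargest` as one possible alternative to sorting and slicing, which is an illustrative choice rather than the notebook's method.

```python
import pandas as pd

# Toy data (made-up): combined gold, silver, and bronze counts.
toy = pd.DataFrame(
    {"Gold.2": [1, 0, 0], "Silver.2": [0, 2, 0], "Bronze.2": [0, 0, 5]},
    index=["X", "Y", "Z"],
)

# Gold = 3 points, silver = 2 points, bronze = 1 point.
points = toy["Gold.2"] * 3 + toy["Silver.2"] * 2 + toy["Bronze.2"]

# Top two rows by score: Z has 5 points, Y has 4, X has 3.
ranking = points.nlargest(2)
print(ranking)
```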