# NOC Medal Rating System

# Purpose 
We take the Games-700 dataset from 700-Olympic_Medals_and_Athlete_Numbers_DF, calculate a medal rating for each country (NOC) at every Olympic Games, and then rank the countries within each Games and across all Games.

# Datasets
Uses: <br>
**Games-700.csv** from 700-Olympic_Medals_and_Athlete_Numbers_DF <br>
Creates: <br>
**Games-800.csv** The medal DataFrame joined with the medal totals, rating and rank of each athlete's NOC at that Games. <br>
**Games-NOCRank-800.csv** Each NOC's medal totals, rating and rank for every individual Games. <br>
**Games-TotalRank-800.csv** Each NOC's medal totals, rating and rank across all Games, ranked separately for the Summer and Winter Games.

In [12]:
import os.path
import pandas as pd

In [13]:
# Stop early if the input file is missing
if not os.path.exists( r"..\..\data\prep\Games\Games-700.csv" ):
    raise FileNotFoundError("Missing dataset file: Games-700.csv")

In [14]:
# read the medal csv into a dataframe
df = pd.read_csv( r"..\..\data\prep\Games\Games-700.csv", encoding = "ISO-8859-1")

In [15]:
# this is the medal dataframe 
df.head(3)

| | Year | Host_City | Host_Country | Total_Males | Total_Females | Total_Athletes | Summer | Winter | Discipline | Sport | Ath_Name | Gender | NOC | Home_Adv | Gold | Silver | Bronze | Total_Medals |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1960 | Rome | ITA | 4727 | 611 | 5338 | True | False | Athletics | Athletics | BROWN, Earlene | Women | USA | False | 0 | 0 | 1 | 1 |
| 1 | 1960 | Rome | ITA | 4727 | 611 | 5338 | True | False | Athletics | Athletics | LÜTTGE, Johanna | Women | EUA | False | 0 | 1 | 0 | 1 |
| 2 | 1960 | Rome | ITA | 4727 | 611 | 5338 | True | False | Athletics | Athletics | CLAUS, Hildrun | Women | EUA | False | 0 | 0 | 1 | 1 |



# Getting the total medals for each country at every Olympics
To do this we use the groupby function. We want the total number of each medal that every country won at each Olympics, so we group by the year of the Games, the host city, the host country and the NOC. We also group by the Summer and Winter fields; they are not strictly needed, but they are nice to keep. Summing the Gold, Silver, Bronze and Total_Medals columns over each group then gives the result.

In [16]:
dfRate = df.groupby(['Year', 'Host_City', 'Host_Country', 'NOC', 'Summer', 'Winter'])[['Gold', 'Silver', 'Bronze', 'Total_Medals']].sum().reset_index()

In [17]:
dfRate.head(3)

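| | Year | Host_City | Host_Country | NOC | Summer | Winter | Gold | Silver | Bronze | Total_Medals |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1960 | Rome | ITA | ARG | True | False | 0 | 1 | 1 | 2 |
| 1 | 1960 | Rome | ITA | AUS | True | False | 9 | 12 | 8 | 29 |
| 2 | 1960 | Rome | ITA | AUT | True | False | 1 | 1 | 0 | 2 |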
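The groupby-and-sum step can be seen on a handful of rows. The sketch below uses hypothetical, made-up athlete rows with only the columns it needs; it is not taken from Games-700.csv.

```python
import pandas as pd

# Hypothetical athlete-level rows (made-up values): one row per medal-winning athlete
athletes = pd.DataFrame({
    "Year": [1960, 1960, 1960, 1960],
    "Host_City": ["Rome", "Rome", "Rome", "Rome"],
    "NOC": ["AUS", "AUS", "AUT", "AUT"],
    "Gold": [1, 0, 1, 0],
    "Silver": [0, 1, 0, 0],
    "Bronze": [0, 0, 0, 1],
})

# One row per (Games, NOC): the medal columns are summed within each group
per_noc = (
    athletes.groupby(["Year", "Host_City", "NOC"])[["Gold", "Silver", "Bronze"]]
    .sum()
    .reset_index()
)
print(per_noc)
```

The four athlete rows collapse into one row for AUS and one for AUT, exactly as the Games-700 rows collapse into one row per NOC per Games.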


# Rating Field 
Now we can add the rating field. A country's rating is based on the weighting system we created for the Olympic medals: 3 points for a gold, 2 for a silver and 1 for a bronze. The rating is therefore the country's medal count weighted by these points, i.e. Rating = 3 × Gold + 2 × Silver + Bronze.

In [18]:
# Weighted medal score: 3 points per gold, 2 per silver, 1 per bronze
dfRate['Rating'] = dfRate['Gold'] * 3 + dfRate['Silver'] * 2 + dfRate['Bronze']

In [19]:
dfRate.head(3)

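| | Year | Host_City | Host_Country | NOC | Summer | Winter | Gold | Silver | Bronze | Total_Medals | Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1960 | Rome | ITA | ARG | True | False | 0 | 1 | 1 | 2 | 3 |
| 1 | 1960 | Rome | ITA | AUS | True | False | 9 | 12 | 8 | 29 | 59 |
| 2 | 1960 | Rome | ITA | AUT | True | False | 1 | 1 | 0 | 2 | 5 |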
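As a quick check of the weighting, the formula can be applied by hand to the three 1960 Rome rows above. The helper `rating()` is a hypothetical function written for illustration; the notebook itself computes the column with vectorised pandas arithmetic.

```python
# Medal weights used for the Rating field: gold 3, silver 2, bronze 1
WEIGHTS = {"gold": 3, "silver": 2, "bronze": 1}


def rating(gold: int, silver: int, bronze: int) -> int:
    """Weighted medal score for one NOC at one Games (hypothetical helper)."""
    return gold * WEIGHTS["gold"] + silver * WEIGHTS["silver"] + bronze * WEIGHTS["bronze"]


# Gold, silver, bronze counts for the three rows shown above
for noc, medals in {"ARG": (0, 1, 1), "AUS": (9, 12, 8), "AUT": (1, 1, 0)}.items():
    print(noc, rating(*medals))
```

This prints 3, 59 and 5, matching the Rating column. Note that different medal mixes can produce the same rating: one gold scores the same as one silver plus one bronze, so ties in the ranking below are possible.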


# Creating a ranking field 
Now that the rating is in place we want to rank the countries within each Olympics. First we sort the rows by Games and then by rating, and then we assign ranks in that sorted order.

In [20]:
# Sorting the df by the year of the games, host city, host country and then the rating of each country (all descending)
# reset_index(drop=True) renumbers the rows 0..n-1 and discards the old index
dfRate = dfRate.sort_values(by=['Year', 'Host_City', 'Host_Country', 'Rating'], ascending=False).reset_index(drop=True)

In [21]:
dfRate.head(3)

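| | Year | Host_City | Host_Country | NOC | Summer | Winter | Gold | Silver | Bronze | Total_Medals | Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018 | PyeongChang | KOR | NOR | False | True | 22 | 17 | 12 | 51 | 112 |
| 1 | 2018 | PyeongChang | KOR | GER | False | True | 16 | 11 | 10 | 37 | 80 |
| 2 | 2018 | PyeongChang | KOR | CAN | False | True | 13 | 7 | 10 | 30 | 63 |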
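Because `ascending=False` applies to every sort key, host cities within a year also come out in reverse alphabetical order (Squaw Valley before Rome in 1960). This does not affect the ranks, since the rank loop resets whenever the year or host city changes. If another order is preferred, `sort_values` also accepts one ascending flag per key. A sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows for two Games held in the same year
games = pd.DataFrame({
    "Year": [1960, 1960, 1960, 1960],
    "Host_City": ["Rome", "Squaw Valley", "Rome", "Squaw Valley"],
    "NOC": ["AAA", "BBB", "CCC", "DDD"],
    "Rating": [10, 39, 30, 17],
})

# Newest Games first, host cities A to Z, highest Rating first within each Games
ordered = games.sort_values(
    by=["Year", "Host_City", "Rating"], ascending=[False, True, False]
).reset_index(drop=True)
print(ordered)
```

Here the Rome rows (30, then 10) come before the Squaw Valley rows (39, then 17).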


# Populating the Rank field
The rating DataFrame is now sorted by year, host city, host country and, importantly, rating. To populate the rank field correctly we have to consider that a Winter and a Summer Games can be held in the same year (as in 1960), so the year alone does not identify a Games. Every time the iteration reaches a new Games, i.e. the year or the host city changes, the rank must be reset to 1.

In [22]:
# For loop for populating the rank field 
dfRate['Rank'] = None 

# lastYear and lastHost track when the iteration moves on to a different Games
lastYear = dfRate['Year'].iloc[0]
lastHost = dfRate['Host_City'].iloc[0]
rank = 1


for x, row in dfRate.iterrows():
    
    # year and host of the current row, to compare with the previous row
    currYear = row['Year']
    currHost = row['Host_City']
    
    # as long as the year and host are unchanged we are in the same Games, so the running rank is assigned
    if currYear == lastYear and currHost == lastHost:
        dfRate.loc[x, 'Rank'] = rank
    
    # a new Games has started, so the rank is reset to 1
    else:
        rank = 1
        dfRate.loc[x, 'Rank'] = rank
    
    # remember this row's year and host for the next comparison
    lastYear = currYear
    lastHost = currHost
    # the next country in this Games gets the next rank
    rank = rank + 1
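The same ranks can be produced without an explicit loop. Assuming the rows are already sorted as above, `groupby(...).cumcount()` numbers the rows 0, 1, 2, ... within each Games in their current order, so adding 1 gives the rank the loop assigns. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy rows, already sorted by Games and then by descending Rating
rated = pd.DataFrame({
    "Year": [1960, 1960, 1960, 1960, 1960],
    "Host_City": ["Squaw Valley", "Squaw Valley", "Squaw Valley", "Rome", "Rome"],
    "NOC": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "Rating": [39, 19, 18, 50, 40],
})

# cumcount() is 0 for the first row of each (Year, Host_City) group, 1 for the next, ...
rated["Rank"] = rated.groupby(["Year", "Host_City"]).cumcount() + 1
print(rated)
```

Grouping on both Year and Host_City plays the role of the lastYear/lastHost comparison, so the count restarts for each Games even when two Games share a year.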

In [23]:
# Looking at an example of the case discussed above (two Games in 1960) to check that the loop worked correctly
dfRate[dfRate['Year'] == 1960].head(20)

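| | Year | Host_City | Host_Country | NOC | Summer | Winter | Gold | Silver | Bronze | Total_Medals | Rating | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1178 | 1960 | Squaw Valley | USA | URS | False | True | 5 | 7 | 10 | 22 | 39 | 1 |
| 1179 | 1960 | Squaw Valley | USA | EUA | False | True | 4 | 3 | 1 | 8 | 19 | 2 |
| 1180 | 1960 | Squaw Valley | USA | FIN | False | True | 3 | 3 | 3 | 9 | 18 | 3 |
| 1181 | 1960 | Squaw Valley | USA | NOR | False | True | 3 | 4 | 0 | 7 | 17 | 4 |
| 1182 | 1960 | Squaw Valley | USA | USA | False | True | 2 | 4 | 3 | 9 | 17 | 5 |
| 1183 | 1960 | Squaw Valley | USA | SWE | False | True | 3 | 2 | 2 | 7 | 15 | 6 |
| 1184 | 1960 | Squaw Valley | USA | AUT | False | True | 1 | 2 | 3 | 6 | 10 | 7 |
| 1185 | 1960 | Squaw Valley | USA | CAN | False | True | 2 | 1 | 1 | 4 | 9 | 8 |
| 1186 | 1960 | Squaw Valley | USA | FRA | False | True | 1 | 0 | 2 | 3 | 5 | 9 |
| 1187 | 1960 | Squaw Valley | USA | NED | False | True | 0 | 1 | 1 | 2 | 3 | 10 |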
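In the 1960 Winter output above, NOR and USA both have a Rating of 17 but receive ranks 4 and 5, because the loop numbers rows purely by sort position. If tied countries should share a rank instead, a grouped `rank(method="min")` is one option. This is a different tie rule from the one the notebook uses, shown here as a sketch on the first six Squaw Valley ratings from the table above.

```python
import pandas as pd

# Ratings of the first six NOCs at Squaw Valley 1960, copied from the output above
squaw = pd.DataFrame({
    "Year": [1960] * 6,
    "Host_City": ["Squaw Valley"] * 6,
    "NOC": ["URS", "EUA", "FIN", "NOR", "USA", "SWE"],
    "Rating": [39, 19, 18, 17, 17, 15],
})

games = squaw.groupby(["Year", "Host_City"])

# Position-based rank, equivalent to the loop: ties are split by row order
squaw["Rank_position"] = games.cumcount() + 1
# Competition ranking: tied ratings share the best rank and the next rank is skipped
squaw["Rank_min"] = games["Rating"].rank(method="min", ascending=False).astype(int)
print(squaw)
```

With method="min", NOR and USA both get rank 4 and SWE gets rank 6.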


In [24]:
dfRate.to_csv( r"..\..\data\prep\Games\Games-NOCRank-800.csv", index=False)

# Overall across all games 
Now we can get the rating and rank of each NOC across all the Olympic Games from 1960 to 2018, split into the Winter and Summer Games.

In [25]:
dfTRate = dfRate.groupby(['NOC', 'Summer', 'Winter'])[['Gold', 'Silver', 'Bronze', 'Total_Medals', 'Rating']].sum().reset_index()

In [26]:
# Summer Games rows first, then Winter; highest total rating first within each
# reset_index(drop=True) renumbers the rows 0..n-1 and discards the old index
dfTRate = dfTRate.sort_values(by=['Summer', 'Winter', 'Rating'], ascending=False).reset_index(drop=True)

# Rank field for total ratings 
Now we'll create a rank field for the new DataFrame, which contains each NOC's total rating across all Olympic Games. The loop below works like the one above, except that the rank only restarts when the rows move from the Summer Games to the Winter Games; it keeps a separate counter for each.

In [27]:
# For loop for populating the rank field 
dfTRate['Rank'] = None 

# Separate rank counters for the Summer and Winter Games, both starting at 1
rankS = 1
rankW = 1

for x, row in dfTRate.iterrows():
        
    # True for a Summer Games row, False for a Winter Games row
    isSummer = row['Summer']
    
    # Summer row: assign the Summer rank and advance its counter
    if isSummer:
        dfTRate.loc[x, 'Rank'] = rankS
        rankS = rankS + 1
    
    # Winter row: assign the Winter rank and advance its counter
    else:
        dfTRate.loc[x, 'Rank'] = rankW
        rankW = rankW + 1
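The two-counter loop amounts to numbering the rows within the Summer group and the Winter group separately. Assuming the DataFrame is sorted as above, the sketch below does the same with `cumcount()` grouped on the Summer flag; the totals are hypothetical toy values.

```python
import pandas as pd

# Hypothetical overall totals, sorted Summer first, then by descending Rating
totals = pd.DataFrame({
    "NOC": ["AAA", "BBB", "CCC", "AAA", "DDD"],
    "Summer": [True, True, True, False, False],
    "Winter": [False, False, False, True, True],
    "Rating": [900, 700, 300, 400, 350],
})

# The count restarts for each value of Summer, like rankS and rankW in the loop
totals["Rank"] = totals.groupby("Summer").cumcount() + 1
print(totals)
```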

In [28]:
dfTRate.to_csv( r"..\..\data\prep\Games\Games-TotalRank-800.csv", index=False)

# Joining the Rank and Rating to the Medal DataFrame 
Another addition we can make to the original medal-winner DataFrame is to attach the medal totals, rating and rank of each athlete's NOC to every row of the medal table.
I'll set the index of both tables to Year, Host_Country, Host_City and, most importantly, NOC, so that each athlete's row picks up the information on their NOC. The NOC's gold, silver, bronze, total medal, rating and rank columns need new names so that, in the combined DataFrame, they can be distinguished from the athlete's own medals.

In [29]:
# Changing the field names for the NOCs' medals and ratings so they can be added to the original medal dataframe
dfRate = dfRate.rename(columns={
    'Gold': 'NOC_Gold',
    'Silver': 'NOC_Silver',
    'Bronze': 'NOC_Bronze',
    'Total_Medals': 'NOC_Total_Medals',
    'Rating': 'NOC_Rating',
    'Rank': 'NOC_Rank',
})

In [30]:
# Setting the indexes of both tables so they are joinable 
df = df.set_index(['Year', 'Host_Country', 'Host_City', 'NOC'])
dfRate = dfRate.set_index(['Year', 'Host_Country', 'Host_City', 'NOC'])

In [31]:
# The medal df can be joined to the rating df because both are indexed on Year, Host_Country, Host_City and NOC
# All of the medal df's fields are kept except Summer and Winter, which dfRate already provides (join() rejects duplicate column names unless suffixes are given)
df = df[['Total_Males', 'Total_Females', 'Total_Athletes', 'Discipline','Gender', 'Sport', 'Ath_Name', 'Home_Adv', 'Gold', 'Silver', 'Bronze', 'Total_Medals']].join(dfRate).reset_index()
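Why Summer and Winter are left out of the medal df's selection can be seen on a toy example with hypothetical values. `DataFrame.join` aligns two frames on their shared index, and it raises a `ValueError` when both frames also contain a column with the same name and no `lsuffix`/`rsuffix` is supplied.

```python
import pandas as pd

keys = ["Year", "NOC"]

# Hypothetical athlete-level and NOC-level tables indexed on the same levels
athletes = pd.DataFrame({
    "Year": [1960, 1960, 1960],
    "NOC": ["AUS", "AUS", "AUT"],
    "Ath_Name": ["Athlete A", "Athlete B", "Athlete C"],
    "Summer": [True, True, True],
}).set_index(keys)
noc = pd.DataFrame({
    "Year": [1960, 1960],
    "NOC": ["AUS", "AUT"],
    "NOC_Rating": [59, 5],
    "Summer": [True, True],
}).set_index(keys)

# Both frames carry a 'Summer' column, so a plain join refuses to proceed
try:
    athletes.join(noc)
    overlap_error = False
except ValueError:
    overlap_error = True

# Leaving the shared column out of the left side lets the join go through;
# it is a left join on the index, so every athlete row gets its NOC's values
joined = athletes[["Ath_Name"]].join(noc).reset_index()
print(joined)
```

Both AUS athletes receive AUS's rating, which is the same many-to-one pattern the notebook relies on when it attaches NOC totals to each athlete.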

In [32]:
# Changing the order of the columns 
df = df[['Year', 'Host_City', 'Host_Country', 'Total_Males', 'Total_Females', 'Total_Athletes', 'Summer', 'Winter', 'Discipline','Sport',
              'Ath_Name', 'Gender', 'NOC', 'Home_Adv', 'Gold', 'Silver', 'Bronze', 'Total_Medals', 'NOC_Gold', 'NOC_Silver', 'NOC_Bronze', 'NOC_Total_Medals', 'NOC_Rating', 'NOC_Rank']]

In [33]:
df.to_csv( r"..\..\data\prep\Games\Games-800.csv", index=False)

In [35]:
df.columns

Index(['Year', 'Host_City', 'Host_Country', 'Total_Males', 'Total_Females',
       'Total_Athletes', 'Summer', 'Winter', 'Discipline', 'Sport', 'Ath_Name',
       'Gender', 'NOC', 'Home_Adv', 'Gold', 'Silver', 'Bronze', 'Total_Medals',
       'NOC_Gold', 'NOC_Silver', 'NOC_Bronze', 'NOC_Total_Medals',
       'NOC_Rating', 'NOC_Rank'],
      dtype='object')