# Analysis of Olympic Performance of Countries based on Government Type and Change in Government Throughout the Years

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5c/Olympic_rings_without_rims.svg/2880px-Olympic_rings_without_rims.svg.png" width="250" height="250">

<img src="https://assets.telegraphindia.com/telegraph/bb5aaa2f-4a8a-4ae9-81c3-24816f1ea88c.jpg" width="250" height="250">

The Olympics is a global sporting event in which more than 200 nations participate. The Games are held every four years and give countries a stage to showcase their athletic strength.

During the 20th century, many countries changed their type of government, and some divided or emerged as new countries. With this in mind, we examine whether the Olympic performance of a country changed when its government type changed.

We analyze countries whose government type changed for a short period before reverting to the original type. We examine the data for that period to see whether the change affected later Games, and whether it had long-term effects on performance and representation. In other words, we look for a lagged effect of a short-term change in government type.
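
One way to make "a short-term change followed by a reversion" concrete is sketched below. It is only an illustration under assumptions: the function name `find_temporary_shifts`, the `min_change` threshold and the `max_years` window are hypothetical, and it works on a plain list of `(year, polity2)` pairs rather than on the project's DataFrames.

```python
def find_temporary_shifts(series, min_change=3, max_years=10):
    """Return (start_year, end_year) spans where the score moves away from
    its previous level by at least `min_change` and then returns to exactly
    that level within `max_years` years.

    `series` is a list of (year, polity2) pairs sorted by year.
    """
    spans = []
    i = 1
    while i < len(series):
        _, base = series[i - 1]
        year, score = series[i]
        if abs(score - base) >= min_change:
            # Look ahead for the first year where the score is back at the baseline.
            for j in range(i + 1, len(series)):
                later_year, later_score = series[j]
                if later_year - year > max_years:
                    break
                if later_score == base:
                    spans.append((year, later_year))
                    i = j  # resume scanning after the reversion
                    break
        i += 1
    return spans


# Made-up scores for illustration only (not taken from the Polity data).
example = [(1955, 8), (1956, 8), (1957, 4), (1958, 4), (1959, 8), (1960, 8)]
print(find_temporary_shifts(example))
```

Requiring an exact return to the baseline is a deliberately strict choice; a tolerance could be used instead if near-reversions should also count.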

## Datasets Used:

- 120 Years Olympics Data: https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results
     
     The data has the following columns   
     - ID - Unique number for each athlete
     - Name - Athlete's name
     - Sex - M/F
     - Age - Integer
     - Height - In centimeters
     - Weight - In kilograms
     - Team - Team name
     - NOC - National Olympic Committee 3-letter code
     - Games - Year and season
     - Year - Integer
     - Season - Summer or Winter
     - City - Host city
     - Sport - Sport
     - Event - Event
     - Medal - Gold, Silver, Bronze, or NA  


- Polity5 Dataset (Center for Systemic Peace): https://www.systemicpeace.org/polityproject.html


    We plan to use the following columns:
    - Year - Integer
    - Country - Unique countries
    - polity2 - -10 to +10 (completely autocratic to completely democratic)
    - scode - country identifier

## Final Analysis
We assess a country's success in the Olympics using the following metrics:
- Number of Participants
- Medals won (Gold, Silver, Bronze)
- Medals to Participants Ratio
- Male to Female Representation
- Average Age
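
To show how these metrics relate to each other, here is a minimal sketch in plain Python. The keys follow the column names of the prepared Olympic data in this notebook (`Name` holds the participant count for an event row), but the function `summarize_games` and the sample rows are hypothetical, and the helper functions used later may compute these values differently. Average age is left out for brevity.

```python
def summarize_games(rows):
    """Aggregate per-event rows for one country at one Games into metrics."""
    # Note: summing per-event counts can count an athlete more than once
    # if that athlete competed in several events.
    participants = sum(r["Name"] for r in rows)
    medals = sum(r["Medal_Gold"] + r["Medal_Silver"] + r["Medal_Bronze"] for r in rows)
    women = sum(r["Sex_F"] for r in rows)
    men = sum(r["Sex_M"] for r in rows)
    return {
        "participants": participants,
        "medals": medals,
        # Guard against division by zero for empty input.
        "medals_per_participant": medals / participants if participants else 0.0,
        "male_to_female": men / women if women else None,
    }


# Made-up rows for illustration only.
rows = [
    {"Name": 4, "Sex_F": 1, "Sex_M": 3, "Medal_Gold": 1, "Medal_Silver": 0, "Medal_Bronze": 0},
    {"Name": 6, "Sex_F": 3, "Sex_M": 3, "Medal_Gold": 0, "Medal_Silver": 1, "Medal_Bronze": 1},
]
print(summarize_games(rows))
```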

In [20]:
import pandas as pd
import numpy as np
import helper_function as hf
import ipywidgets
import warnings
warnings.filterwarnings('ignore')
%run helper_function.py 

pd.set_option('display.max_rows', 10000)
pd.set_option('display.max_columns', 10000)
pd.set_option('display.width', 10000)

In [21]:
final_df, noc = prepare_olympic_dataset("athlete_events.csv", "noc_regions.csv")


In [6]:
final_df.head()

| | region | Year | NOC | City | Sport | Event | Age | Name | Sex_F | Sex_M | Medal_Bronze | Medal_Silver | Medal_Gold | Season_Summer | Season_Winter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFGHANISTAN | 1936 | AFG | Berlin | Athletics | Athletics Men's 100 metres | 25.0 | 1 | 0 | 1 | 0 | 0 | 0 | 1.0 | 0 |
| 1 | AFGHANISTAN | 1936 | AFG | Berlin | Athletics | Athletics Men's Long Jump | 25.0 | 1 | 0 | 1 | 0 | 0 | 0 | 1.0 | 0 |
| 2 | AFGHANISTAN | 1936 | AFG | Berlin | Athletics | Athletics Men's Shot Put | 23.0 | 1 | 0 | 1 | 0 | 0 | 0 | 1.0 | 0 |
| 3 | AFGHANISTAN | 1936 | AFG | Berlin | Hockey | Hockey Men's Hockey | 24.25 | 13 | 0 | 13 | 0 | 0 | 0 | 13.0 | 0 |
| 4 | AFGHANISTAN | 1948 | AFG | London | Football | Football Men's Football | NaN | 11 | 0 | 11 | 0 | 0 | 0 | 11.0 | 0 |


In [7]:
countries = {"GDR":"GERMANY EAST"}
final_df = handle_countries_that_split(countries, final_df, noc)  

In [8]:
polity_dff2 = prepare_polity_dataset("p5v2018.xls", noc)


In [9]:
polity_dff2.head()

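| | scode | country | year | polity | polity2 | durable | alternate_region | alternate_noc |
|---|---|---|---|---|---|---|---|---|
| 0 | AFG | AFGHANISTAN | 1890 | -6 | -6.0 | NaN | AFGHANISTAN | AFG |
| 1 | AFG | AFGHANISTAN | 1891 | -6 | -6.0 | NaN | AFGHANISTAN | AFG |
| 2 | AFG | AFGHANISTAN | 1892 | -6 | -6.0 | NaN | AFGHANISTAN | AFG |
| 3 | AFG | AFGHANISTAN | 1893 | -6 | -6.0 | NaN | AFGHANISTAN | AFG |
| 4 | AFG | AFGHANISTAN | 1894 | -6 | -6.0 | NaN | AFGHANISTAN | AFG |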


In [10]:
polity_dff3 = get_polityshift_column(polity_dff2)
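
The helper `get_polityshift_column` lives in `helper_function.py` and is not shown in this notebook. As a rough idea of what a "polity shift" column could contain, the sketch below computes the year-over-year change in `polity2` for a single country. This is an assumption about the helper's purpose, not its actual implementation, and `polity_shift` is a hypothetical name.

```python
def polity_shift(records):
    """Given (year, polity2) pairs for one country, return (year, shift) pairs.

    shift is the change from the most recent earlier year that has a score.
    It is None for the first scored year and for years whose score is missing.
    """
    shifts = []
    previous = None
    for year, score in sorted(records, key=lambda r: r[0]):
        if score is None or previous is None:
            shifts.append((year, None))
        else:
            shifts.append((year, score - previous))
        if score is not None:
            previous = score
    return shifts


# Illustrative values only, not read from p5v2018.xls.
sample = [(1974, 9), (1975, 7), (1976, None), (1977, 8)]
print(polity_shift(sample))
```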

In [22]:
mapper = dict(zip(final_df.NOC, final_df.region))

In [12]:
country_dict = {'BOSNIA': 'BIH', 'CONGO-BRAZZAVILLE': 'CGO', 'CONGO BRAZZAVILLE': 'CGO',
                    'VIETNAM NORTH': 'VIE', 'TIMOR LESTE': 'TLS', 'GERMANY WEST': 'FRG', "COTE D'IVOIRE": 'CIV',
                    'KOREA SOUTH': 'KOR', 'SOUTH VIETNAM': 'VIE', 'SUDAN-NORTH': 'SUD',
                    'TRINIDAD AND TOBAGO': 'TTO', 'UNITED KINGDOM': 'GBR', 'USSR': 'URS', 'YUGOSLAVIA': 'YUG',
                    'SERBIA AND MONTENEGRO': 'YUG', 'YEMEN SOUTH': 'YMD', 'CONGO KINSHASA': 'COD'}
polity_dff3 = map_polity_region_dataset(country_dict, polity_dff2, mapper)
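
The mapping step above has to reconcile country names that differ between the two datasets. A minimal sketch of that idea follows, assuming a lookup that first tries the manual overrides in `country_dict` and then falls back to an exact name match against the Olympic regions. `resolve_noc` is a hypothetical function, and the real `map_polity_region_dataset` helper may work differently.

```python
def resolve_noc(country, overrides, noc_to_region):
    """Return the NOC code for a Polity country name, or None if unknown."""
    name = country.strip().upper()
    if name in overrides:
        return overrides[name]
    # Invert the NOC -> region mapping to look up by region name.
    # If several NOCs share one region, the last one seen wins.
    region_to_noc = {region: noc for noc, region in noc_to_region.items()}
    return region_to_noc.get(name)


# Small illustrative mappings, not the full dictionaries used in the notebook.
overrides = {"KOREA SOUTH": "KOR", "UNITED KINGDOM": "GBR"}
noc_to_region = {"AFG": "AFGHANISTAN", "KOR": "SOUTH KOREA", "GBR": "UK"}

print(resolve_noc("Korea South", overrides, noc_to_region))
print(resolve_noc("Afghanistan", overrides, noc_to_region))
print(resolve_noc("Atlantis", overrides, noc_to_region))
```

Returning `None` for unmatched names makes it easy to list the countries that still need a manual entry in the override dictionary.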


In [13]:
sport_dict = {'Rugby': True,
       'Alpinism':False, 'Speed Skating':False, 'Ice Hockey':True, 'Nordic Combined':False,
       'Rhythmic Gymnastics':False, 'Short Track Speed Skating':False, 'Baseball':True,
       'Softball':True, 'Tug-Of-War':True, 'Ski Jumping':False, 'Lacrosse':True, 'Curling':True,
       'Military Ski Patrol':True, 'Cricket':True, 'Croquet':False, 'Motorboating':True,
       'Basque Pelota':False, 'Aeronautics':False, 'Jeu De Paume':False, 'Racquets':False,
       'Roque':False, 'Athletics':False, 'Hockey':True, 'Football':True, 'Wrestling': False, 'Boxing':False, 'Judo':False,
       'Taekwondo':False, 'Shooting':False, 'Weightlifting':False, 'Swimming':True, 'Cycling':False,
       'Alpine Skiing':False, 'Gymnastics':False, 'Fencing':False, 'Handball':True, 'Tennis':True,
       'Volleyball':True, 'Rowing':True, 'Table Tennis':True, 'Trampolining':False,
       'Cross Country Skiing':False, 'Badminton':False, 'Sailing':True, 'Bobsleigh':True,
       'Archery':False, 'Canoeing':False, 'Snowboarding':False, 'Biathlon':False, 'Basketball':True,
       'Beach Volleyball':True, 'Figure Skating':False, 'Polo':True, 'Equestrianism':True,
       'Water Polo':True, 'Art Competitions':False, 'Modern Pentathlon':False, 'Diving':False,
       'Luge':False, 'Freestyle Skiing':False, 'Triathlon':False, 'Skeleton':False,
       'Synchronized Swimming':True, 'Golf':False, 'Rugby Sevens':True
    
}
final_df = correct_team_medals_won(final_df, sport_dict)
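
In the athlete-level data, every member of a medal-winning team has a medal row, so a naive sum overstates a country's medal count in team sports. The sketch below shows one way to count such a medal once per event. It assumes that `sport_dict` marks team sports with `True`; `count_medals` and the sample rows are hypothetical, and the real `correct_team_medals_won` helper may apply a different rule.

```python
MEDAL_COLUMNS = ("Medal_Gold", "Medal_Silver", "Medal_Bronze")


def count_medals(event_rows, is_team_sport):
    """Total medals over per-event rows, counting a team medal once per event."""
    total = 0
    for row in event_rows:
        for column in MEDAL_COLUMNS:
            won = row[column]
            if is_team_sport.get(row["Sport"], False):
                # A squad of 13 players with 13 gold rows is still one gold medal.
                won = min(won, 1)
            total += won
    return total


# Made-up rows for illustration only.
is_team_sport = {"Hockey": True, "Athletics": False}
rows = [
    {"Sport": "Hockey", "Medal_Gold": 13, "Medal_Silver": 0, "Medal_Bronze": 0},
    {"Sport": "Athletics", "Medal_Gold": 0, "Medal_Silver": 1, "Medal_Bronze": 1},
]
print(count_medals(rows, is_team_sport))
```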


## Countries available for the graphs

In [14]:
final_df.region.unique()

array(['AFGHANISTAN', 'ALBANIA', 'ALGERIA', 'AMERICAN SAMOA', 'ANDORRA',
       'ANGOLA', 'ANTIGUA', 'ARGENTINA', 'ARMENIA', 'ARUBA', 'AUSTRALIA',
       'AUSTRIA', 'AZERBAIJAN', 'BAHAMAS', 'BAHRAIN', 'BANGLADESH',
       'BARBADOS', 'BELARUS', 'BELGIUM', 'BELIZE', 'BENIN', 'BERMUDA',
       'BHUTAN', 'BOLIVA', 'BOSNIA AND HERZEGOVINA', 'BOTSWANA', 'BRAZIL',
       'BRUNEI', 'BULGARIA', 'BURKINA FASO', 'BURUNDI', 'CAMBODIA',
       'CAMEROON', 'CANADA', 'CAPE VERDE', 'CAYMAN ISLANDS',
       'CENTRAL AFRICAN REPUBLIC', 'CHAD', 'CHILE', 'CHINA', 'COLOMBIA',
       'COMOROS', 'COOK ISLANDS', 'COSTA RICA', 'CROATIA', 'CUBA',
       'CURACAO', 'CYPRUS', 'CZECH REPUBLIC',
       'DEMOCRATIC REPUBLIC OF THE CONGO', 'DENMARK', 'DJIBOUTI',
       'DOMINICA', 'DOMINICAN REPUBLIC', 'ECUADOR', 'EGYPT',
       'EL SALVADOR', 'EQUATORIAL GUINEA', 'ERITREA', 'ESTONIA',
       'ETHIOPIA', 'FIJI', 'FINLAND', 'FRANCE', 'GABON', 'GAMBIA',
       'GEORGIA', 'GERMANY', 'GERMANY EAST', 'GHANA', 'GREECE', '

In [15]:
def plot_graphs_for_country(olympic_df, polity_df, country, start_year, end_year):
    """Draw all per-country plots (medals, medals-to-participants ratio, age,
    male/female ratio, season-wise participants) for the given year range."""
    plot_country_medal_polity(olympic_df, polity_df, country, start_year, end_year)
    plot_country_medal_to_participants_ratio(olympic_df, polity_df, country, start_year, end_year)
    plot_country_age_polity(olympic_df, polity_df, country, start_year, end_year)
    country_male_female_ratio(olympic_df, polity_df, country, start_year, end_year)
    plot_country_season_wise_participants(olympic_df, polity_df, country, start_year, end_year)

## Results of Analysis

### France:

Around 1960, France was still recovering and rebuilding after World War II. The polity score dipped to 4 in 1960 and the number of medals fell in the Games that followed, but as the polity score rose again, performance gradually improved and the medal count increased. In that year, France also had its lowest participation ratio, 3.59. The average age of athletes was in the early 30s before the war, while after the war younger athletes participated in greater numbers. Female participation increased gradually once the polity score began to rise after 1960.

### Korea:

In 1945, Korea was divided into North Korea (autocratic) and South Korea (democratic). In 1960, the April Revolution forced President Syngman Rhee to resign, which may explain the dip in the polity score. From 1972 onward, South Korea's polity score increased as the country became more democratic, which may have contributed to its improved performance over the following years. North Korea, by contrast, became increasingly autocratic and its Olympic performance declined. From 1960 onward, South Korea moved from a low-income to a middle-income country, which may explain its higher participation ratio compared with North Korea. Male participation was higher in South Korea, while the ratio of female to male participation was higher in North Korea.


In [98]:
plot_graphs_for_country(final_df, polity_dff3, 'FRANCE', 1890, 2020)

In [83]:
plot_graphs_for_country(final_df, polity_dff3, 'SOUTH KOREA', 1945, 2014)

In [84]:
plot_graphs_for_country(final_df, polity_dff3, 'NORTH KOREA', 1945, 2014)

### Germany:

The Olympics were cancelled in 1916, 1940 and 1944 because of World War I and World War II. From 1945 to 1990, Germany was split into East Germany, which was under Soviet influence, and West Germany, which was under US influence and was democratic. Surprisingly, East Germany performed well around 1980, whereas West Germany showed a dip in its medal count. The medals-to-participants ratio was also higher in East Germany. East Germany's results in this period are also linked to a state-run doping program, which was documented after reunification. Female participation was higher in East Germany, whereas male participation was higher in West Germany.




In [81]:
plot_graphs_for_country(final_df, polity_dff3, 'GERMANY EAST', 1929, 2014)

In [82]:
plot_graphs_for_country(final_df, polity_dff3, 'GERMANY', 1968, 1988)

### India:

India gained independence in 1947 and its democratic constitution came into force in 1950. The polity graph indicates that India has been consistently democratic, except for a dip in the polity score around 1976. This dip coincides with the Emergency declared from 1975 to 1977, a few years after the Indo-Pakistani War of 1971. India won no medals at the 1976 Games, which fell within this period. It also went without a medal at the 1988 and 1992 Games; the Kargil War came later, in 1999.

The Medals Won vs. Polity graph shows that before 2000, India typically won about one medal per Games (mostly in field hockey, widely regarded as the national sport). As the democracy score increases, the number of medals won improves, but not significantly. The same holds for the medals-to-participants ratio and the number of participants. One possible reason is that India, as a developing nation, has not prioritized athlete development. An interesting observation is that female participation increased as the democracy score improved. In 2016, the male-to-female ratio was close to 1.

In [85]:
plot_graphs_for_country(final_df, polity_dff3, 'INDIA', 1900, 2016)

### USA:

The USA has consistently been a democratic country and has performed very well at the Olympics. In 1980, the USA boycotted the Summer Olympic Games in Moscow to protest the Soviet invasion of Afghanistan in late 1979. As a result, the graph shows poor Olympic performance for that year despite the increase in the democracy score.

Whenever the president is a Republican, the polity score dips slightly, but this has no significant impact on Olympic performance. Consistent with the previous graphs, female participation from the USA has been increasing in recent years, and the average age has been around 24-26 years.


In [51]:
plot_graphs_for_country(final_df, polity_dff3, 'USA', 1900, 2020)

In [19]:
# Dropdown of all regions; the year range is fixed in the plotting call below.
@ipywidgets.interact(CountryName=final_df.region.unique().tolist())
def widget_plotter(CountryName):
    plot_graphs_for_country(final_df, polity_dff3, CountryName, 1929, 2014)
