## Explore, Clean & Draft
This notebook is for exploring and cleaning the dataset.

It is also used for drafting the desired plots before moving them into `functions.py`.


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
# I will use iso3166 for proper country data (again)
# Plotly for plotting
!pip install plotly
import plotly.express as px
import functions



In [2]:
df = pd.read_csv("olympic_games.csv")

### Using Wikipedia to fill in the blanks
[Wikipedia](https://en.wikipedia.org/wiki/Independent_Olympians_at_the_Olympic_Games)

- row 186 is `Kuwait`
- row 816 is Serbia and Montenegro, known until 2003 as the Federal Republic of Yugoslavia; the two later became separate states.
    - According to Wikipedia, each athlete in the 1992 row is credited as being from `Serbia`. However, Wikipedia is openly editable, and such a sensitive topic should be investigated further for a proper analysis. For this assignment, `Serbia` will suffice.

In [3]:
df.loc[df['country'] == "Independent Olympic Athletes"]

| | year | games_type | host_country | host_city | athletes | teams | competitions | country | gold | silver | bronze |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 186 | 2016 | Summer | Brazil | Rio | 11238 | 207 | 306 | Independent Olympic Athletes | 1 | 0 | 1 |
| 816 | 1992 | Summer | Spain | Barcelona | 9356 | 169 | 257 | Independent Olympic Athletes | 0 | 1 | 2 |
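
The replacements from the notes above could be applied by row label. This is only a minimal sketch on a two-row copy of the output; the `replacements` dict is a name chosen for illustration, and the real cleaning may well live in `functions.py` in a different form.

```python
import pandas as pd

# Two rows copied from the output above; the index labels match the original rows.
sample = pd.DataFrame(
    {
        "year": [2016, 1992],
        "games_type": ["Summer", "Summer"],
        "country": ["Independent Olympic Athletes"] * 2,
        "gold": [1, 0],
        "silver": [0, 1],
        "bronze": [1, 2],
    },
    index=[186, 816],
)

# Hypothetical mapping: row label -> country chosen in the notes above.
replacements = {186: "Kuwait", 816: "Serbia"}

for row_label, country in replacements.items():
    # .loc addresses the row by its index label, not by its position.
    sample.loc[row_label, "country"] = country

print(sample[["year", "country"]])
```

Assigning by label only stays correct as long as the index is not reset or re-sorted before this step.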


## Mixed Teams
The dataset also contains `Mixed team` rows. Rather than investigating and making too many assumptions, I will simply drop them, as there are only 4 rows.

In [4]:
# Show the mixed teams
print(df.loc[df['country'] == "Mixed team"])

      year games_type   host_country  host_city  athletes  teams  \
1680  1924     Winter         France   Chamonix       260     16   
1752  1904     Summer  United States  St. Louis       651     12   
1763  1900     Summer         France      Paris      1226     26   
1777  1896     Summer         Greece     Athens       241     14   

      competitions     country  gold  silver  bronze  
1680            16  Mixed team     1       0       0  
1752            95  Mixed team     2       1       1  
1763            95  Mixed team     8       5       6  
1777            43  Mixed team     1       0       1  
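
Dropping these rows can be done with a boolean mask. The sketch below assumes the label is exactly `Mixed team`, as in the filter above; the `Example Country` row and its medal count are placeholders, not real data.

```python
import pandas as pd

# Small stand-in frame: two Mixed team rows from the output above
# plus one placeholder row that should survive the filter.
sample = pd.DataFrame(
    {
        "year": [1896, 1900, 1900],
        "country": ["Mixed team", "Mixed team", "Example Country"],
        "gold": [1, 8, 0],
    }
)

# Keep every row whose country is not "Mixed team".
cleaned = sample[sample["country"] != "Mixed team"].reset_index(drop=True)

print(cleaned)
```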


### Clean and set proper names

In [5]:
# Change names to iso names
df = functions.set_country_names(df)
# Time to grab the iso codes
df = functions.clean_olympics(df)
df = functions.set_countries_alpha(df, "country")
# Make a new column with total amount of medals per row
df = functions.set_olympic_medals(df)
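
The helper functions are defined in `functions.py` and are not shown here. As a rough idea of what the last step might look like, the sketch below adds a per-row medal total; the column name `total_medals` is an assumption, not necessarily what `set_olympic_medals` produces.

```python
import pandas as pd

# Medal counts taken from the two rows shown earlier (after renaming).
sample = pd.DataFrame(
    {
        "country": ["Kuwait", "Serbia"],
        "gold": [1, 0],
        "silver": [0, 1],
        "bronze": [1, 2],
    }
)

# Row-wise sum of the three medal columns (axis=1 sums across columns).
sample["total_medals"] = sample[["gold", "silver", "bronze"]].sum(axis=1)

print(sample)
```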

# Plots

## Athletes, Teams or Competitions
Line chart to show *changing over time*

In [6]:
selected_years = [df['year'].min(), df['year'].max()]
fig = functions.bar_distribution_maker(df, selected_years, game_type="Summer")
fig.show()

## Summer/Winter games
- A simple pie chart

In [7]:
fig = functions.summer_winter_games(df)
fig.show()

## Host by country
- Single bar chart by country

In [8]:
fig = functions.host_by_country(df)
fig.show()
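
Both the Summer/Winter chart and the host chart count games rather than rows. Assuming the dataset holds one row per participating country per games (as the outputs earlier suggest), the rows need to be de-duplicated first. A minimal sketch of that idea follows; it is not the code in `functions.py`, and the `Example Country` rows are placeholders.

```python
import pandas as pd

# Host details copied from rows shown earlier; 2016 appears twice
# to mimic several countries taking part in the same games.
sample = pd.DataFrame(
    {
        "year": [2016, 2016, 1992, 1924, 1900],
        "games_type": ["Summer", "Summer", "Summer", "Winter", "Summer"],
        "host_country": ["Brazil", "Brazil", "Spain", "France", "France"],
        "country": ["Kuwait", "Example Country", "Serbia",
                    "Example Country", "Example Country"],
    }
)

# One row per games: a (year, games_type) pair identifies a single event.
games = sample.drop_duplicates(subset=["year", "games_type"])

games_per_type = games["games_type"].value_counts()    # input for the pie chart
games_per_host = games["host_country"].value_counts()  # input for the bar chart

print(games_per_type)
print(games_per_host)
```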

# Medals
- Medal type per country, *share of a total* composition with a pie chart

In [10]:
fig = functions.pie_plot_medals(df, year=1896)
#fig = functions.pie_plot_medals(df, country="Greece")
#fig = functions.pie_plot_medals(df, year=1896, country="Greece")
fig.show()
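
The three call variants above suggest that `year` and `country` are optional filters. The sketch below shows one way the medal totals behind such a pie chart could be computed; `medal_totals` is a hypothetical helper, and the sample values are placeholders rather than real results.

```python
import pandas as pd


def medal_totals(df, year=None, country=None):
    """Sum gold, silver and bronze over the rows matching the optional filters."""
    subset = df
    if year is not None:
        subset = subset[subset["year"] == year]
    if country is not None:
        subset = subset[subset["country"] == country]
    return subset[["gold", "silver", "bronze"]].sum()


# Placeholder data: countries "A" and "B" are not real entries.
sample = pd.DataFrame(
    {
        "year": [1896, 1896, 1900],
        "country": ["A", "B", "A"],
        "gold": [2, 1, 3],
        "silver": [1, 0, 2],
        "bronze": [0, 4, 1],
    }
)

totals_1896 = medal_totals(sample, year=1896)
totals_a = medal_totals(sample, country="A")
totals_a_1896 = medal_totals(sample, year=1896, country="A")

print(totals_1896)
```

The resulting Series could then be handed to Plotly Express, for example `px.pie(values=totals_1896.values, names=totals_1896.index)`.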

## Geo Maps
- Distribution of countries on the world map, together with the medals they earned
    - I will create a `mean` score of medals earned and color the bubbles accordingly
- Users can change the type of game and adjust the year range with sliders

In [14]:
fig = functions.make_geo_map(df, selected_years, game_type=None)
fig.show()
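
The `mean` score is not defined in detail here. One possible reading is the average number of medals a country earned per games within the selected year range; the sketch below follows that assumption. `mean_medals` is a hypothetical helper, and the sample values are placeholders.

```python
import pandas as pd


def mean_medals(df, selected_years, game_type=None):
    """Average medals per games for each country within an inclusive year range."""
    start, end = selected_years
    subset = df[df["year"].between(start, end)]
    if game_type is not None:
        subset = subset[subset["games_type"] == game_type]
    # Total medals per row, then the mean of those totals per country.
    totals = subset[["gold", "silver", "bronze"]].sum(axis=1)
    return totals.groupby(subset["country"]).mean()


# Placeholder data: countries "A" and "B" are not real entries.
sample = pd.DataFrame(
    {
        "year": [2000, 2004, 2002, 2004],
        "games_type": ["Summer", "Summer", "Winter", "Summer"],
        "country": ["A", "A", "A", "B"],
        "gold": [1, 3, 0, 2],
        "silver": [1, 1, 2, 0],
        "bronze": [0, 0, 0, 2],
    }
)

summer_scores = mean_medals(sample, [2000, 2004], game_type="Summer")
all_scores = mean_medals(sample, [2004, 2004])

print(summer_scores)
```

A Series like this could drive the bubble color on the map, for instance through the `color` argument of `px.scatter_geo`, with `game_type=None` meaning that both Summer and Winter games are included.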