## Explore, Clean & Draft
This notebook is for exploring and cleaning the dataset.

It is also used for drafting the desired plots before moving them into `functions.py`.


In [None]:
import pandas as pd
import functions

In [None]:
df = pd.read_csv("olympic_games.csv")

In [None]:
df.head()

In [None]:
df.info()

## Using Wikipedia to fill in the blanks
[Wikipedia](https://en.wikipedia.org/wiki/Independent_Olympians_at_the_Olympic_Games)

- 2016, row 186, is `Kuwait` according to Wikipedia.
- 1992, row 816, is Serbia and Montenegro, known until 2003 as the Federal Republic of Yugoslavia, which later split into two separate states.
    - According to Wikipedia, each of these athletes from 1992 is credited as being from *Serbia*. However, Wikipedia is openly editable, and such a sensitive topic should be investigated further for a proper analysis. For this assignment, `Serbia` will suffice.
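
A minimal sketch of how the two manual fixes above could be applied with `DataFrame.loc`. It assumes that the row numbers 186 and 816 are index labels of `df`; the `manual_fixes` dictionary and the toy frame are hypothetical and only stand in for the real data.

```python
import pandas as pd

# Toy frame; the index labels 186 and 816 mirror the row numbers quoted above.
df = pd.DataFrame(
    {
        "year": [2016, 1992],
        "country": ["Independent Olympic Athletes", "Independent Olympic Athletes"],
    },
    index=[186, 816],
)

# Hypothetical mapping: row label -> country chosen after the Wikipedia lookup.
manual_fixes = {186: "Kuwait", 816: "Serbia"}

# Overwrite the placeholder country one row at a time.
for row_label, country in manual_fixes.items():
    df.loc[row_label, "country"] = country

print(df)
```

Keeping the fixes in one dictionary makes it easy to review or revise them later if the 1992 attribution is investigated further.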

In [None]:
df.loc[df['country'] == "Independent Olympic Athletes"]

#### Mixed Teams
We also have `Mixed team` rows. Rather than investigating and making too many assumptions, I will simply drop these, as there are only 4 rows.

In [None]:
# Show the mixed teams
print(df.loc[df['country'] == "Mixed team"])
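
The drop itself could look like the sketch below. The toy frame is made up for illustration; only the `country` column and the `"Mixed team"` label come from this notebook.

```python
import pandas as pd

# Made-up rows, just enough to show the filter.
df = pd.DataFrame(
    {
        "year": [1896, 1896, 1900, 1900],
        "country": ["Greece", "Mixed team", "France", "Mixed team"],
    }
)

# Keep every row whose country is not the "Mixed team" placeholder.
df = df.loc[df["country"] != "Mixed team"]

print(df)
```

The index is left untouched here so that any row labels referenced elsewhere stay valid.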

## Clean and set proper names
Country names are inconsistent because of name changes and other historical events. By going over the list, I was able to map most of them to ISO standard names using a custom switch-case function.

I will then add columns for ISO codes, which makes the geographical plot easier later, and for the total number of medals.
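
As a rough illustration of the renaming step, the sketch below uses a plain dictionary lookup rather than a switch-case function. The function names `to_iso_name` and `to_iso_alpha3` and the few entries shown are hypothetical examples, not the actual list in `functions.py`.

```python
from typing import Optional

# Illustrative entries only; the real rules live in functions.py and are not shown here.
ISO_NAMES = {
    "United States": "United States of America",
    "Russia": "Russian Federation",
}

# ISO 3166-1 alpha-3 codes for the ISO-style names used in this sketch.
ISO_ALPHA3 = {
    "United States of America": "USA",
    "Russian Federation": "RUS",
    "Greece": "GRC",
}


def to_iso_name(raw_name: str) -> str:
    """Return the ISO-style name, or the stripped input when no rule applies."""
    cleaned = raw_name.strip()
    return ISO_NAMES.get(cleaned, cleaned)


def to_iso_alpha3(iso_name: str) -> Optional[str]:
    """Return the alpha-3 code, or None when the name is unknown."""
    return ISO_ALPHA3.get(iso_name)


print(to_iso_name(" Russia "), to_iso_alpha3(to_iso_name("United States")))
```

On a DataFrame, the same helpers could be applied with `df["country"].map(to_iso_name)`. Names that return `None` from the code lookup are the ones that still need a manual rule.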

In [None]:
uncleaned_countries = df['country'].unique()
print(uncleaned_countries)

In [None]:
# Change names to iso names
df = functions.set_country_names(df)
# Time to grab the iso codes
df = functions.clean_olympics(df)
df = functions.set_countries_alpha(df, "country")
# Make a new column with total amount of medals per row
df = functions.set_olympic_medals(df)
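
For reference, a total-medals column can be built with a row-wise sum. This is only a sketch: the column names `gold`, `silver`, `bronze` and `total_medals` are assumptions, and the numbers are made up.

```python
import pandas as pd

# Made-up medal counts; the column names are assumed, not taken from the dataset.
df = pd.DataFrame(
    {
        "country": ["Greece", "France"],
        "gold": [1, 2],
        "silver": [0, 3],
        "bronze": [4, 1],
    }
)

medal_columns = ["gold", "silver", "bronze"]

# axis=1 sums across the medal columns within each row.
df["total_medals"] = df[medal_columns].sum(axis=1)

print(df)
```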

# Plots
Using [plotly](https://plotly.com/)
- All plots have been moved into custom functions in the `functions.py` file for easy [dash](https://dash.plotly.com/) usage in the dashboard, which can be found over in `app.py` file.

## Athletes, Teams or Competitions
Line chart to show *change over time*

In [None]:
selected_years = [df['year'].min(), df['year'].max()]
fig = functions.bar_distribution_maker(df, selected_years, game_type="Summer")
fig.show()

## Summer/Winter games
- A simple *pie chart* showing distribution of medals.

In [None]:
fig = functions.summer_winter_games(df)
fig.show()

## Host by country
- Single *bar chart comparison* by country

In [None]:
fig = functions.host_by_country(df)
fig.show()

# Medals
- Medal type per country, *share of a total* composition with a pie chart

In [None]:
fig = functions.pie_plot_medals(df, year=1896)
# fig = functions.pie_plot_medals(df, country="Greece")
# fig = functions.pie_plot_medals(df, year=1896, country="Greece")
fig.show()
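
The share-of-total idea behind the pie chart can be sketched without plotting. The helper `medal_shares` below is hypothetical, the medal column names are assumed, and the data is made up; it only mirrors the optional `year` and `country` filters used in the calls above.

```python
import pandas as pd


def medal_shares(df, year=None, country=None):
    """Return each medal type's share of all medals in the filtered rows."""
    subset = df
    if year is not None:
        subset = subset.loc[subset["year"] == year]
    if country is not None:
        subset = subset.loc[subset["country"] == country]
    totals = subset[["gold", "silver", "bronze"]].sum()
    # If no rows match, the total is 0 and the shares come out as NaN.
    return totals / totals.sum()


# Made-up rows for illustration.
df = pd.DataFrame(
    {
        "year": [1896, 1896, 1900],
        "country": ["Greece", "France", "Greece"],
        "gold": [1, 1, 2],
        "silver": [2, 0, 0],
        "bronze": [1, 3, 2],
    }
)

shares = medal_shares(df, year=1896)
print(shares)
```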

## Geo Maps
- Distribution of medals earned per country, shown on a world map
    - The total number of medals is used for color. The country `iso_alpha` code is used for location.
- Users can switch the type of game (*Summer*, *Winter*) and adjust the year range with sliders in the final dashboard
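
The filtering behind the map could be sketched as follows. The helper `medals_per_country` and the column names `game_type` and `total_medals` are assumptions for illustration, and the data is made up. The resulting frame has one row per `iso_alpha` code, which is the shape that `plotly.express.choropleth` accepts through its `locations` and `color` arguments.

```python
import pandas as pd


def medals_per_country(df, selected_years, game_type=None):
    """Sum medals per ISO code within a year range and optional game type."""
    # between() is inclusive on both ends by default.
    in_range = df["year"].between(selected_years[0], selected_years[1])
    filtered = df.loc[in_range]
    if game_type is not None:
        filtered = filtered.loc[filtered["game_type"] == game_type]
    return filtered.groupby("iso_alpha", as_index=False)["total_medals"].sum()


# Made-up rows for illustration.
df = pd.DataFrame(
    {
        "year": [1896, 1924, 2016, 2016],
        "game_type": ["Summer", "Winter", "Summer", "Summer"],
        "iso_alpha": ["GRC", "FRA", "GRC", "FRA"],
        "total_medals": [3, 2, 1, 4],
    }
)

summer = medals_per_country(df, [1900, 2016], game_type="Summer")
all_games = medals_per_country(df, [1896, 2016])
print(summer)
print(all_games)
```

Passing `game_type=None` keeps both Summer and Winter rows, which matches the call in the cell below.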

In [None]:
fig = functions.make_geo_map(df, selected_years, game_type=None)
fig.show()