**In this notebook we will explore the "120 years of Olympic history: athletes and results" dataset using pandas and perform various data analysis tasks.**

To work with this notebook, you need to download the file `athlete_events.csv` from the [Kaggle page](https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results). The dataset has the following features (copied from Kaggle):

- __ID__ - Unique number for each athlete
- __Name__ - Athlete's name
- __Sex__ - M or F
- __Age__ - Integer
- __Height__ - In centimeters
- __Weight__ - In kilograms
- __Team__ - Team name
- __NOC__ - National Olympic Committee 3-letter code
- __Games__ - Year and season
- __Year__ - Integer
- __Season__ - Summer or Winter
- __City__ - Host city
- __Sport__ - Sport
- __Event__ - Event
- __Medal__ - Gold, Silver, Bronze, or NA
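The `Medal` column stores the literal string `NA` for athletes without a medal. By default, `pd.read_csv` treats `NA` as a missing value, so it is loaded as `NaN`, and `count()` skips it. A minimal sketch using a tiny CSV excerpt; the rows below are hypothetical and not taken from the real dataset:

```python
import io

import pandas as pd

# Hypothetical three-row excerpt with a subset of the columns listed above.
csv_text = """ID,Name,Sex,Age,Year,Season,Sport,Medal
1,Athlete A,M,24,2000,Summer,Gymnastics,Gold
2,Athlete B,F,NA,2000,Summer,Basketball,NA
3,Athlete C,M,31,2002,Winter,Bobsleigh,NA
"""

sample = pd.read_csv(io.StringIO(csv_text))

# "NA" is one of pandas' default missing-value markers, so both the
# missing Age and the missing medals are parsed as NaN.
print(sample['Medal'].isna().sum())  # rows without a medal
print(sample['Medal'].count())       # count() skips NaN: rows with a medal
print(sample['Age'].dtype)           # float64, because NaN forces a float column
```

This is why counting the non-missing values of `Medal` gives the number of medal rows.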

In [None]:
import pandas as pd

In [None]:
# Change the path to the dataset file if needed. 
PATH = 'athlete_events.csv'

In [None]:
data = pd.read_csv(PATH)
data.head()

**With the following code we will find out how old the youngest participants of the 1996 Olympics were.**

In [None]:
# Sort all 1996 participants by age; rows with a missing age are placed last.
df = data[data['Year'] == 1996]
df.sort_values(by='Age', ascending=True).head()
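`DataFrame.nsmallest` is a compact alternative to sorting and taking the head: it returns the rows with the n smallest values of a column. A minimal sketch on a small hypothetical frame (the names, years, and ages are made up):

```python
import pandas as pd

# Hypothetical participants; only Year and Age matter here.
athletes = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Year': [1996, 1996, 1996, 2000],
    'Age': [14, 17, 22, 12],
})

games_1996 = athletes[athletes['Year'] == 1996]

# The two youngest 1996 participants (D is excluded by the year filter).
youngest = games_1996.nsmallest(2, 'Age')
print(youngest)

# The youngest age on its own is simply the column minimum.
print(games_1996['Age'].min())
```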

**With the following code we will find the percentage of male gymnasts among all male participants of the 2000 Olympics.**

In [None]:
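# Keep one row per athlete (ID), since an athlete appears once per event entered.
de = data[(data['Year'] == 2000) & (data['Sex'] == 'M')].drop_duplicates(subset='ID')
# The mean of a boolean mask is a fraction; multiply by 100 for a percentage.
gymnast_pct = (de['Sport'] == 'Gymnastics').mean() * 100
round(gymnast_pct, 1)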
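Each row of `athlete_events.csv` is one athlete in one event, so an athlete entered in several events appears several times and is over-weighted in a row-based share. A minimal sketch on hypothetical rows showing how `drop_duplicates(subset='ID')` changes the result:

```python
import pandas as pd

# Hypothetical rows: athlete 1 is a gymnast entered in three events.
rows = pd.DataFrame({
    'ID': [1, 1, 1, 2, 3, 4],
    'Sport': ['Gymnastics', 'Gymnastics', 'Gymnastics', 'Rowing', 'Judo', 'Sailing'],
})

# Share of rows: the gymnast is counted three times.
row_share = (rows['Sport'] == 'Gymnastics').mean() * 100

# Share of unique athletes: each ID is counted once.
unique_athletes = rows.drop_duplicates(subset='ID')
athlete_share = (unique_athletes['Sport'] == 'Gymnastics').mean() * 100

print(f'row-based: {row_share:.1f}%, athlete-based: {athlete_share:.1f}%')
```

Here the row-based share is 50.0% while the athlete-based share is 25.0%, which is why the question "among all male participants" calls for unique athletes.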

**With the following code we will find out the mean and standard deviation of height for female basketball players who participated in the 2000 Olympics.**

In [None]:
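db = data[(data['Year'] == 2000) & (data['Sex'] == 'F') & (data['Sport'] == 'Basketball')]
# Mean and (sample) standard deviation of height, counting each athlete once.
db.drop_duplicates(subset='ID')['Height'].agg(['mean', 'std'])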
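Note that `Series.std` in pandas returns the sample standard deviation (`ddof=1`) by default, whereas NumPy's `np.std` defaults to the population version (`ddof=0`). A minimal sketch with made-up heights showing `agg` for both statistics at once and the two conventions side by side:

```python
import numpy as np
import pandas as pd

# Hypothetical heights in centimeters.
heights = pd.Series([180.0, 185.0, 190.0, 195.0])

summary = heights.agg(['mean', 'std'])  # std here is the sample std (ddof=1)
population_std = heights.std(ddof=0)    # population std, like np.std's default

print(summary)
print(population_std, np.std(heights.to_numpy()))
```

If a result is compared against a value computed elsewhere, it is worth checking which `ddof` convention was used.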

**With the following code we will find the athlete with the highest weight among the participants of the 2002 Olympics, along with their sport.**

In [None]:
dw = data[data['Year'] == 2002]
dw.sort_values(by='Weight', ascending=False)[['Name', 'Sport', 'Weight']].head()
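If only the single heaviest athlete is needed, `Series.idxmax` returns the index label of the largest value (skipping `NaN` by default), and `.loc` can then pull that row's name and sport. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical 2002 participants; one weight is unknown.
games_2002 = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Sport': ['Bobsleigh', 'Ice Hockey', 'Curling'],
    'Weight': [110.0, None, 95.0],
})

# idxmax skips the missing weight and returns the label of the maximum.
heaviest = games_2002.loc[games_2002['Weight'].idxmax(), ['Name', 'Sport', 'Weight']]
print(heaviest)
```

Unlike `sort_values(...).head()`, this returns exactly one row even when several athletes share the maximum weight (the first one wins).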

**With the following code we will find out how many Olympic Games, held in different years, Pawe Abratkiewicz took part in.**

In [None]:
dt = data[data['Name'] == 'Pawe Abratkiewicz']
# There is one row per event, so count distinct years rather than rows.
dt['Year'].nunique()
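Counting rows, distinct years, and distinct Games can give three different answers. Before 1994 the Summer and Winter Games were held in the same year, so `Year` alone can merge two Games, while the `Games` column (year plus season) keeps them apart. A minimal sketch with a hypothetical athlete:

```python
import pandas as pd

# Hypothetical athlete who competed at the 1988 Winter Games and at both
# the 1992 Summer and Winter Games, with two events at the Summer Games.
rows = pd.DataFrame({
    'Name': ['X', 'X', 'X', 'X'],
    'Games': ['1988 Winter', '1992 Summer', '1992 Summer', '1992 Winter'],
    'Year': [1988, 1992, 1992, 1992],
})

print(len(rows))                # one row per athlete-event
print(rows['Year'].nunique())   # distinct years
print(rows['Games'].nunique())  # distinct Games (year + season)
```

Here there are 4 rows, 2 distinct years, and 3 distinct Games; the question above asks about different years, which matches `Year.nunique()`.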

**With the following code we will find out how many silver medals Australia won in tennis at the 2000 Olympics.**

In [None]:
dm = data[(data['Year'] == 2000) & (data['Team'] == 'Australia') & (data['Medal'] == 'Silver') & (data['Sport'] == 'Tennis')]
# Number of matching rows (one per medal-winning athlete and event).
len(dm)
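Because every medal-winning athlete has their own row, a doubles medal appears once per player. If the goal is the number of medals awarded rather than the number of medalling athletes, one option is to drop duplicate `(Event, Medal)` pairs first. A minimal sketch on hypothetical rows, assuming a team wins at most one medal of each color per event:

```python
import pandas as pd

# Hypothetical silver-medal rows: a doubles pair shares one medal,
# plus one singles silver.
medals = pd.DataFrame({
    'Name': ['P1', 'P2', 'P3'],
    'Event': ["Tennis Men's Doubles", "Tennis Men's Doubles", "Tennis Women's Singles"],
    'Medal': ['Silver', 'Silver', 'Silver'],
})

athlete_medals = len(medals)  # one per medalling athlete
# Assumption: at most one medal of each color per event for this team.
event_medals = len(medals.drop_duplicates(subset=['Event', 'Medal']))

print(athlete_medals, event_medals)
```

Which count is "right" depends on the question; the row count treats the doubles silver as two medals, the deduplicated count as one.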

**With the following code we will find out whether it is true that Switzerland won fewer medals than Serbia at the 2016 Olympics.**

In [None]:
dsrb = data[(data['Year'] == 2016) & (data['Team'] == 'Serbia')]
# count() skips NaN, so this counts only the rows where a medal was won.
dsrb['Medal'].count()

In [None]:
dswt = data[(data['Year'] == 2016) & (data['Team'] == 'Switzerland')]
# count() skips NaN, so this counts only the rows where a medal was won.
dswt['Medal'].count()

In [None]:
print('Serbia won 54 medals, whereas Switzerland won only 11, so the statement is true.')
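Both counts can also come from a single `groupby`, where `count()` on the `Medal` column again skips the missing values of non-medallists. A minimal sketch on hypothetical rows (the team names match the question, but the rows are made up):

```python
import pandas as pd

# Hypothetical 2016 rows; None means the athlete won no medal.
rows = pd.DataFrame({
    'Year': [2016, 2016, 2016, 2016, 2016],
    'Team': ['Serbia', 'Serbia', 'Serbia', 'Switzerland', 'Switzerland'],
    'Medal': ['Gold', 'Gold', None, 'Silver', None],
})

subset = rows[(rows['Year'] == 2016) & rows['Team'].isin(['Serbia', 'Switzerland'])]
# Non-missing Medal values per team.
medal_counts = subset.groupby('Team')['Medal'].count()

print(medal_counts)
print(medal_counts['Switzerland'] < medal_counts['Serbia'])
```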

**With the following code we will find which age categories the fewest and the most participants of the 2014 Olympics belonged to.**

In [None]:
# Participants aged 25-35 (inclusive), counting each athlete once.
dage = data[(data['Year'] == 2014) & data['Age'].between(25, 35)]
dage['ID'].nunique()

In [None]:
# Participants aged 45-55 (inclusive), counting each athlete once.
dage = data[(data['Year'] == 2014) & data['Age'].between(45, 55)]
dage['ID'].nunique()

In [None]:
print('The fewest participants were in the [45-55] age category and the most in [25-35].')
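Checking two ranges one at a time leaves the other age groups unexamined. `pd.cut` can assign every athlete to a bin so all groups are counted at once. A minimal sketch with hypothetical ages; the half-open bin edges are an assumption chosen to resemble the ranges above:

```python
import pandas as pd

# Hypothetical 2014 participants, one row per unique athlete.
athletes = pd.DataFrame({
    'ID': list(range(1, 11)),
    'Age': [19, 22, 26, 27, 30, 33, 38, 39, 40, 47],
})

# Assumed half-open bins: [15, 25), [25, 35), [35, 45), [45, 55).
bins = [15, 25, 35, 45, 55]
age_groups = pd.cut(athletes['Age'], bins=bins, right=False)
counts = age_groups.value_counts().sort_index()

print(counts)
print('fewest:', counts.idxmin(), 'most:', counts.idxmax())
```

With these made-up ages the smallest group is [45, 55) and the largest is [25, 35); on real data the result depends on where the bin edges are placed.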

**With the following code we will find out whether it is true that Summer Olympics were held in Lake Placid and that Winter Olympics were held in Sankt Moritz.**

In [None]:
dlocs = data[(data['City'] == 'Lake Placid') & (data['Season'] == 'Summer')]
# True if at least one Summer Olympics row exists for Lake Placid.
not dlocs.empty

In [None]:
dlocw = data[(data['City'] == 'Sankt Moritz') & (data['Season'] == 'Winter')]
# True if at least one Winter Olympics row exists for Sankt Moritz.
not dlocw.empty

In [None]:
print('The answer is: No (no Summer Olympics in Lake Placid), Yes (Winter Olympics in Sankt Moritz)')
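Instead of testing one city at a time, `pd.crosstab` can tabulate how many rows each host city has per season; a zero cell means no Games of that season were held there. A minimal sketch on hypothetical rows (the row counts are made up, only the City/Season pairs matter):

```python
import pandas as pd

# Hypothetical rows; only the City/Season pairs matter here.
rows = pd.DataFrame({
    'City': ['Lake Placid', 'Lake Placid', 'Sankt Moritz', 'Atlanta'],
    'Season': ['Winter', 'Winter', 'Winter', 'Summer'],
})

# Rows per city and season; zero means no Games of that season in that city.
table = pd.crosstab(rows['City'], rows['Season'])
print(table)

print(table.loc['Lake Placid', 'Summer'] > 0)   # Summer Games in Lake Placid?
print(table.loc['Sankt Moritz', 'Winter'] > 0)  # Winter Games in Sankt Moritz?
```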