## Task 1: Load Data & Import Libraries

**Goal:** Set up your environment and download the dataset for analysis.

### Instructions:

1. Download the dataset using `opendatasets`.  
   - Dataset URL: https://www.kaggle.com/datasets/heesoo37/120-years-of-olympic-history-athletes-and-results  
   - You may need to provide your Kaggle API credentials.


2. Set file paths for:
   - `athlete_data_filename` → `athlete_events.csv`
   - `regions_data_filename` → `noc_regions.csv`


3. Install and import the following libraries:
   - `pandas`, `numpy`
   - `matplotlib.pyplot`, `seaborn`
   - `plotly.express`
   - `ListedColormap` from `matplotlib.colors`


**Expected Output:**

- Dataset downloaded

- All libraries successfully imported

- File paths assigned

In [1]:
# 1. Download the dataset using `opendatasets`:
import opendatasets as od
od.download('https://www.kaggle.com/datasets/heesoo37/120-years-of-olympic-history-athletes-and-results')

In [2]:
# 2. Set file paths:
athlete_data_filename = './120-years-of-olympic-history-athletes-and-results/athlete_events.csv'
regions_data_filename = './120-years-of-olympic-history-athletes-and-results/noc_regions.csv'
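Before loading anything, it can help to confirm that the download actually produced both files. The snippet below is a minimal sketch of such a check; `missing_files` is a hypothetical helper (not part of `opendatasets` or pandas), and the path in the example is deliberately one that does not exist.

```python
from pathlib import Path

def missing_files(paths):
    """Return the subset of `paths` that does not exist on disk (hypothetical helper)."""
    return [p for p in paths if not Path(p).is_file()]

# In the notebook this would be called with the two filenames defined above:
#   missing_files([athlete_data_filename, regions_data_filename])
# Here we use a path that is assumed not to exist, just to show the result shape.
example = missing_files(["./definitely-not-here/athlete_events.csv"])
print(example)
```

An empty list means both files are in place; otherwise the list names the files that still need to be downloaded.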

In [3]:
# 3. Install and import the libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from matplotlib.colors import ListedColormap

## Task 2: Set Plot Style and Color Palette

**Goal:** Customize the appearance of your plots for consistent and clean visuals.

### Instructions:

1. Set the global style for all plots using `plt.style.use()`.
   - Use `'ggplot'` for a simple, clean base style.

2. Define a custom color palette:
   - Use a list of hex color codes.
   - Example: `["#0a2e36", "#27FB6B", "#14cc60", "#036d19", "#09a129"]`

3. Apply the color palette using Seaborn:
   - Use `sns.set_palette()`.

In [4]:
# 1. Set the global style for all plots:
plt.style.use('ggplot')

In [5]:
# 2. Define a custom color palette:
custom_colors = ["#0d0887", "#5302a3", "#8b0aa5", "#b83289", "#db5c68", "#f48849", "#febd2a", "#f0f921"]

In [6]:
# 3. Apply the color palette:
sns.set_palette(custom_colors)
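`ListedColormap` is imported in Task 1, so one possible use is to wrap the same hex codes in a colormap that can be passed to the `cmap=` argument of heatmaps or scatter plots. This is only a sketch; the colormap name `custom_palette` is an assumption made for illustration.

```python
from matplotlib.colors import ListedColormap

# Same hex codes as the palette defined above
custom_colors = ["#0d0887", "#5302a3", "#8b0aa5", "#b83289",
                 "#db5c68", "#f48849", "#febd2a", "#f0f921"]

# Build a discrete colormap with one entry per hex code (the name is hypothetical)
custom_cmap = ListedColormap(custom_colors, name="custom_palette")

print(custom_cmap.N)    # number of discrete colors in the map
print(custom_cmap(0))   # RGBA tuple of the first color
```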

## Task 3: Data Preparation

**Goal:** Prepare the data for analysis.

### Steps:

1. Load the files using pandas.
2. Inspect the data and its columns.
3. Fix any missing or incorrect values.
4. Determine which data types the dataset contains and how many columns use each.
5. List the minimum age in the competition.

This EDA project focuses solely on the Summer Olympics, so filter all Winter Olympic Games out of the dataset.

In [7]:
# 1. Load the file using pandas.
athletes_df = pd.read_csv(athlete_data_filename)
regions_df = pd.read_csv(regions_data_filename)

In [8]:
# 2. Inspect the data and its columns.
athletes_df.head(10)
# regions_df.head(10)

In [9]:
# 3. Fix any missing or incorrect values.

# Athletes dataset
# Count the NULL values per column
athletes_df.isnull().sum()
# Replace the NULL values in the "Medal" column with "None".
# The 'Age', 'Height' and 'Weight' columns are left untouched here; they are handled in Task 5.
athletes_df.fillna({'Medal':'None'}, inplace=True)

# Regions dataset
# Count the NULL values per column
regions_df.isnull().sum()
# Replace the NULL values in the "notes" column with "None"
regions_df.fillna({'notes':'None'}, inplace=True)
# The 'region' column has 3 NULL values whose names appear in the 'notes' column.
# Replace the NULL values in "region" with the corresponding values from "notes".
mask = regions_df['region'].isnull()
regions_df.loc[mask, 'region'] = regions_df.loc[mask, 'notes']

# Note: The athletes table contains the NOC value "SGP", which is the ISO 3166 code for Singapore.
# Insert the NOC value "SGP" into the regions table.
new_row = {
    'NOC':   'SGP',
    'region':'Singapore',
    'notes': 'None'
}
regions_df.loc[len(regions_df)] = new_row
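The region fallback above can also be written as a chain of `fillna` calls. The sketch below uses a made-up three-row table (the NOC codes and names are invented) and fills `region` before `notes`, so that a row with neither value ends up labelled `"Unknown"` rather than the string `"None"`.

```python
import pandas as pd

# Toy regions table; all values are invented for illustration
toy_regions = pd.DataFrame({
    "NOC":    ["AAA", "BBB", "CCC"],
    "region": ["Alpha", None, None],
    "notes":  [None, "Beta Team", None],
})

# 1) Fall back to 'notes' where 'region' is missing,
# 2) label whatever is still missing as "Unknown"
toy_regions["region"] = (
    toy_regions["region"].fillna(toy_regions["notes"]).fillna("Unknown")
)
# Only afterwards replace the missing notes with the "None" placeholder
toy_regions["notes"] = toy_regions["notes"].fillna("None")

print(toy_regions)
```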

In [10]:
# 4. Which data types are in the dataset, and how many columns use each.
athletes_df.info()

In [11]:
# 5. List the minimum age in the competition.
minimum_age = athletes_df['Age'].min()
print(minimum_age)

In [12]:
# 6. Filter all the "Winter Olympics Games" out of the dataset
summer_athletes_df = (athletes_df.loc[athletes_df['Season'] == "Summer"])
summer_athletes_df.info()

## Task 4: Merging The Two Datasets Into One

**Goal:** Merge datasets.

Before we can begin analyzing the data, we need to combine the two datasets:  
- `athlete_events.csv` (athlete information)
- `noc_regions.csv` (region/country information)

Use the `pandas.merge()` function to do this.

###  Steps:

1. **Call `pd.merge()`**  
   This function merges two DataFrames based on one or more common columns (known as keys).

2. **Set merge type and key**  
   We'll perform a **left join** on the `NOC` column:
   - This keeps **all records** from `athlete_events` (left DataFrame).
   - It adds matching `region` data from `noc_regions` (right DataFrame).
   - Rows with no match in the right DataFrame will have `NaN` values in those columns.
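To see which keys fail to match in a left join, `pd.merge()` accepts `indicator=True`, which adds a `_merge` column marking each row as `both` or `left_only`. The following is a small sketch on invented data (the NOC codes are made up), not the notebook's actual merge.

```python
import pandas as pd

# Toy stand-ins for the athletes and regions tables
athletes = pd.DataFrame({"ID": [1, 2, 3], "NOC": ["AAA", "BBB", "ZZZ"]})
regions = pd.DataFrame({"NOC": ["AAA", "BBB"], "region": ["Alpha", "Beta"]})

# Left join that also records where each row came from
merged = pd.merge(athletes, regions, how="left", on="NOC", indicator=True)

# NOC codes present in the left table but absent from the right one
unmatched_nocs = merged.loc[merged["_merge"] == "left_only", "NOC"].unique().tolist()
print(unmatched_nocs)
```

Running the same check on the real tables would list any NOC codes that still need an entry in the regions table.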


In [13]:
# 1. Call `pd.merge()` and 2. Set merge type and key
summer_athletes_regions_df = pd.merge(summer_athletes_df, regions_df, how='left', on='NOC')
summer_athletes_regions_df.info()

In [14]:
# filas_region_nan = summer_athletes_regions_df[summer_athletes_regions_df['Name'].isnull()]
# filas_region_nan.tail(10)

summer_athletes_regions_df.tail()

## Task 5: Finding and Replacing The Null Values In Our Dataset

**Goal:** Data cleaning and exploratory analysis.

### Cleaning Tasks:

- Visualize the distribution of missing values using pie charts or bar plots.

- Calculate and list the percentage of null values for each column. Replace missing values with the mean of the respective column when appropriate.

- Remove duplicate entries from the dataset to ensure accuracy.

### Exploratory Questions:

1. Which country has sent the most athletes to the Summer Olympics?

2. How has the number of athletes, countries, and events changed over time?

3. Which nations have won the most Olympic medals?

4. How has participation by male and female athletes evolved over time?

5. What is the correlation between the height and weight of Olympic participants?

6. In which sports has India won Olympic medals?

7. Which sports have contributed the most medals overall?

In [15]:
# Remove duplicate entries from the dataset to ensure accuracy.
summer_athletes_regions_df = summer_athletes_regions_df.drop_duplicates()
summer_athletes_regions_df.tail()

In [16]:
# Visualize the distribution of missing values using pie charts or bar plots.

# Calculate the percentage of null values per column
missing_values = summer_athletes_regions_df.isnull().sum()
missing_values = missing_values[missing_values>0]
percentage = (missing_values / len(summer_athletes_regions_df)) * 100

# Plot one pie chart per column that has missing values
fig, ax = plt.subplots(ncols=len(missing_values), figsize=(5*len(missing_values), 6))

# Take the titles from the index so they always match the plotted columns
titles = percentage.index.tolist()
for i in range(len(percentage)):
    ax[i].pie([percentage.iloc[i], 100 - percentage.iloc[i]], labels=[f'NaN ({percentage.iloc[i]:.2f}%)', f'No NaN ({100-percentage.iloc[i]:.2f}%)'])
    ax[i].set_title(titles[i])

In [17]:
# Print the percentage of null values next to its column name
titles = percentage.index.tolist()
for i in range(len(percentage)):
    print(percentage.iloc[i])
    print(titles[i])
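The percentage of missing values per column can also be obtained in one step, because the mean of a boolean mask is the fraction of `True` values. A minimal sketch on a made-up four-row frame:

```python
import numpy as np
import pandas as pd

# Toy data with some missing values (invented for illustration)
toy = pd.DataFrame({
    "Age":    [20.0, np.nan, 30.0, 25.0],
    "Height": [170.0, 180.0, np.nan, np.nan],
    "Sex":    ["M", "F", "M", "F"],
})

# Fraction of NaN per column, expressed as a percentage
pct_missing = toy.isna().mean().mul(100)
# Keep only the columns that actually have missing values, largest first
pct_missing = pct_missing[pct_missing > 0].sort_values(ascending=False)
print(pct_missing)
```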

In [18]:
# Replace missing values with the mean of the respective column when appropriate.

# Our first approach was to fill with the mean per Sport and Sex; however, some sports don't have enough data to compute a mean.
# We therefore only use the mean per sex for each column:
 
mean_ages = round(summer_athletes_regions_df.groupby('Sex')['Age'].mean())
print(mean_ages)
mean_heights = round(summer_athletes_regions_df.groupby('Sex')['Height'].mean(), 1)
print(mean_heights)
mean_weights = round(summer_athletes_regions_df.groupby('Sex')['Weight'].mean(), 1)
print(mean_weights)

# Replace the NaN cells with the corresponding mean values.
summer_athletes_regions_df['Age'] = summer_athletes_regions_df['Age'].fillna(summer_athletes_regions_df['Sex'].map(mean_ages))
summer_athletes_regions_df['Height'] = summer_athletes_regions_df['Height'].fillna(summer_athletes_regions_df['Sex'].map(mean_heights))
summer_athletes_regions_df['Weight'] = summer_athletes_regions_df['Weight'].fillna(summer_athletes_regions_df['Sex'].map(mean_weights))
print(summer_athletes_regions_df.isnull().sum())
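The Sport-and-Sex approach mentioned in the comments can still be used if it falls back to a coarser mean whenever a group has no data. The sketch below shows one way to chain such fallbacks with `groupby(...).transform('mean')`; the data is invented, and this is an alternative to the per-sex imputation used in the cell above, not a replacement for it.

```python
import numpy as np
import pandas as pd

# Toy data: some (Sport, Sex) groups have no observed height at all
toy = pd.DataFrame({
    "Sport":  ["Judo", "Judo", "Judo", "Golf", "Golf", "Polo", "Polo"],
    "Sex":    ["M", "M", "M", "M", "M", "M", "F"],
    "Height": [170.0, 180.0, np.nan, 190.0, np.nan, np.nan, np.nan],
})

# Group means aligned to the original rows (NaN where a group has no data)
by_sport_sex = toy.groupby(["Sport", "Sex"])["Height"].transform("mean")
by_sex = toy.groupby("Sex")["Height"].transform("mean")
overall = toy["Height"].mean()

# Try the most specific mean first, then progressively coarser ones
toy["Height_filled"] = (
    toy["Height"].fillna(by_sport_sex).fillna(by_sex).fillna(overall)
)
print(toy)
```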

In [19]:
# 1. Which country has sent the most athletes to the Summer Olympics?
athlete_counts = summer_athletes_regions_df.groupby('region')['ID'].nunique()
# print(athlete_counts)
top_country = athlete_counts.idxmax()
top_count   = athlete_counts.max()
print(f"{top_country} has sent the most athletes with {top_count} participants")

In [20]:
# 2. How has the number of athletes, countries, and events changed over time?
yearly = (summer_athletes_regions_df.groupby('Year').agg({
              'ID':     'nunique',   # unique athletes
              'region': 'nunique',   # unique countries
              'Event':  'nunique'    # unique events
          })
          .rename(columns={
              'ID':     'n_athletes',
              'region': 'n_countries',
              'Event':  'n_events'
          })
          .reset_index()
         )
yearly.head()

In [21]:
# Athletes over time
plt.figure()
plt.plot(yearly['Year'], yearly['n_athletes'])
plt.title('Number of athletes per edition')
plt.xlabel('Year')
plt.ylabel('Athletes')
plt.show()

In [22]:
# Countries over time
plt.figure()
plt.plot(yearly['Year'], yearly['n_countries'])
plt.title('Number of countries per edition')
plt.xlabel('Year')
plt.ylabel('Countries')
plt.show()

In [23]:
# Events over time
plt.figure()
plt.plot(yearly['Year'], yearly['n_events'])
plt.title('Number of events per edition')
plt.xlabel('Year')
plt.ylabel('Events')
plt.show()

Answer: The number of athletes, countries, and events has increased over time.

In [24]:
# 3. Which nations have won the most Olympic medals?
# There are 3 types of medals: 'Gold', 'Silver' and 'Bronze'
gold_medals_df = (summer_athletes_regions_df.loc[summer_athletes_regions_df['Medal'] == "Gold"])
gold_medal_counts = (gold_medals_df.groupby('region')['Medal'].count().sort_values(ascending=False))
# print(gold_medal_counts)
print(f"{gold_medal_counts.idxmax()} has the most Gold medals with {gold_medal_counts.max()}")

silver_medals_df = (summer_athletes_regions_df.loc[summer_athletes_regions_df['Medal'] == "Silver"])
silver_medals_counts = (silver_medals_df.groupby('region')['Medal'].count().sort_values(ascending=False))
# print(silver_medals_counts)
print(f"{silver_medals_counts.idxmax()} has the most Silver medals with {silver_medals_counts.max()}")

bronze_medals_df = (summer_athletes_regions_df.loc[summer_athletes_regions_df['Medal'] == "Bronze"])
bronze_medals_counts = (bronze_medals_df.groupby('region')['Medal'].count().sort_values(ascending=False))
# print(bronze_medals_counts)
print(f"{bronze_medals_counts.idxmax()} has the most Bronze medals with {bronze_medals_counts.max()}")
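Note that the dataset has one row per athlete and event, so counting rows credits a team medal once per team member. If one medal per event is wanted instead, the rows can be de-duplicated first. The sketch below assumes that `Year`, `Event`, `region` and `Medal` together identify a single medal, which ignores rare cases such as ties; the data is invented.

```python
import pandas as pd

# Toy data: a two-person relay team wins gold, so the medal appears twice
toy = pd.DataFrame({
    "region": ["Alpha", "Alpha", "Alpha", "Beta"],
    "Year":   [2000, 2000, 2000, 2000],
    "Event":  ["Relay", "Relay", "100m", "100m"],
    "Medal":  ["Gold", "Gold", "Gold", "Silver"],
})

# Row-based count: every team member adds one medal
rows_per_region = toy.groupby("region")["Medal"].count()

# Event-based count: keep one row per (Year, Event, region, Medal)
event_level = toy.drop_duplicates(subset=["Year", "Event", "region", "Medal"])
medals_per_region = event_level.groupby("region")["Medal"].count()

print(rows_per_region)
print(medals_per_region)
```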

In [25]:
# 4. How has participation by male and female athletes evolved over time?
gender_yearly = summer_athletes_regions_df.groupby(['Year', 'Sex'])['ID'].nunique().unstack(fill_value=0)
gender_yearly.head()

In [26]:
plt.figure()
plt.plot(gender_yearly.index, gender_yearly['M'], label='Male')
plt.plot(gender_yearly.index, gender_yearly['F'], label='Female')
plt.title('Genders over time')
plt.xlabel('Year')
plt.ylabel('Athletes')
plt.legend()
plt.show()

Answer: The number of female participants has increased over time.

In [27]:
# 5. What is the correlation between the height and weight of Olympic participants?
correlation = summer_athletes_regions_df[['Height', 'Weight']].corr()
print(correlation)

Answer: The correlation between the height and weight of Summer Olympic participants is 0.793861, a strong positive correlation.
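For reference, `DataFrame.corr()` computes the Pearson coefficient by default. The sketch below reproduces it by hand on invented numbers and also shows that filling a missing value with the column mean can change the result, which is worth keeping in mind because the notebook imputes `Height` and `Weight` before this step.

```python
import numpy as np
import pandas as pd

# Toy data with one missing height (values are invented)
toy = pd.DataFrame({
    "Height": [160.0, 170.0, 180.0, 190.0, np.nan],
    "Weight": [55.0, 68.0, 74.0, 90.0, 70.0],
})

# Pearson r on the fully observed rows, written out explicitly
observed = toy.dropna(subset=["Height", "Weight"])
h = observed["Height"] - observed["Height"].mean()
w = observed["Weight"] - observed["Weight"].mean()
r_manual = (h * w).sum() / np.sqrt((h ** 2).sum() * (w ** 2).sum())

# The same coefficient from pandas
r_pandas = observed["Height"].corr(observed["Weight"])

# Coefficient after mean imputation of the missing height
imputed = toy.fillna(toy.mean())
r_imputed = imputed["Height"].corr(imputed["Weight"])

print(r_manual, r_pandas, r_imputed)
```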

In [28]:
# 6. In which sports has India won Olympic medals?
india_medals = summer_athletes_regions_df[
    (summer_athletes_regions_df['region'] == 'India') & ((summer_athletes_regions_df['Medal'] != 'None'))
]
# india_medals
india_medals_by_sport = india_medals['Sport'].unique()
print("India has won medals in the following sports:")
for sport in india_medals_by_sport:
    print(f"- {sport}")


In [29]:
# 7. Which sports have contributed the most medals overall?
medals_only = summer_athletes_regions_df[summer_athletes_regions_df['Medal'] != 'None']
# medals_only

# Medals per sport
medals_by_sport = (medals_only.groupby('Sport')['Medal'].count().sort_values(ascending=False))
medals_by_sport.head()

Answer: The sports with the most medals are Athletics (3969), Swimming (3048), Rowing (2945), Gymnastics (2256) and Fencing (1743).


##  Task 6: Exploratory Analysis and Visualisations

**Goal:** Data analysis and visualization.

### 1. Create a word-cloud that graphically shows the nations that have sent the maximum number of athletes over the years.

In [30]:
from wordcloud import WordCloud
athlete_counts = summer_athletes_regions_df.groupby('region')['ID'].nunique()
athlete_counts_dict = athlete_counts.to_dict()
# athlete_counts_dict

wordcloud = WordCloud(width=500, height=300, background_color='white').generate_from_frequencies(athlete_counts_dict)
plt.figure(figsize=(15, 8))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

### 2. Show the relation between various features and labels in the Olympics dataset and infer/discuss any trends and correlations.

In [31]:
# The features and labels in the Olympics dataset are:
# Sex, Age, Height, Weight, Team, NOC, Games, Year, Season, City, Sport, Event, Medal, region
# We can compute correlations between the numeric values (Age, Height and Weight),
# but those values may also differ by sex.

print('################################################################')
# Correlation between Age, Height and Weight, grouped by Sex
correlation_SHW = summer_athletes_regions_df.groupby('Sex')[['Age', 'Height', 'Weight']].corr()
print(correlation_SHW)

print('################################################################')
# Correlation between Age, Height and Weight, grouped by Medal
correlation_MHW = summer_athletes_regions_df.groupby('Medal')[['Age', 'Height', 'Weight']].corr()
print(correlation_MHW)
print('################################################################')

Answer: 

Sex Correlations:
1. Height and Weight are strongly correlated for both sexes (F: 0.74 and M: 0.72), which is reasonable since taller athletes tend to be heavier.
2. In females, Age has a moderate correlation with Height and Weight.
3. In males, Age is weakly correlated with Height and Weight, which may reflect a wider age distribution.

Medal Correlations:
1. In all cases, the correlation between Height and Weight is strong.
2. Athletes who did not win a medal show a higher correlation between age and physical characteristics than the medalists. This may be because non-medal athletes have greater age variability or participate in events where age is a more determining factor.

### 3. Make a plot of the overall spread of the age of athletes in the Summer Olympics and discuss your findings.

In [32]:
plt.figure(figsize=(9, 5))
plt.boxplot(summer_athletes_regions_df['Age'], vert=False, showmeans=True, whis=10)
plt.title('Age Distribution of Athletes in the Summer Olympics')

edad_min = summer_athletes_regions_df['Age'].min()
edad_max = summer_athletes_regions_df['Age'].max()
edad_media = summer_athletes_regions_df['Age'].mean()
edad_mediana = summer_athletes_regions_df['Age'].median()
edad_std = summer_athletes_regions_df['Age'].std()

stats_text = (
    f'Statistics:\n'
    f'Min: {edad_min} years\n'
    f'Max: {edad_max} years\n'
    f'Mean: {edad_media:.1f} years\n'
    f'Median: {edad_mediana:.1f} years\n'
    f'Std. dev.: {edad_std:.1f} years'
)

plt.figtext(0.6, 0.62, stats_text, bbox=dict(facecolor='white'), fontsize=12)
plt.show()

Answer:

The ages span a wide range, from 10 to 97, which shows that the Games have included participants of very different ages.

The most common age of participants is between 20 and 30 years old. This indicates that these are the age ranges in which athletes are most competitive. In fact, the median and mean are 25 and 25.6, respectively.

There are also older participants because there are competitions where age is neither an impediment nor a determining factor.
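The spread can also be summarized numerically with quartiles and the interquartile range (IQR). The box plot above uses `whis=10`, so its whiskers reach far beyond the usual 1.5 × IQR and very few points are drawn as outliers. The sketch below applies the conventional 1.5 × IQR rule to a made-up list of ages.

```python
import pandas as pd

# Invented ages, for illustration only
ages = pd.Series([14, 20, 22, 24, 25, 26, 28, 30, 35, 72], name="Age")

# Quartiles and interquartile range
q1, median, q3 = ages.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Conventional outlier fences at 1.5 * IQR beyond the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = ages[(ages < lower) | (ages > upper)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(outliers.tolist())
```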


### 4. Make a plot of the number of participants in the Summer Olympics over the years and discuss the overall trends.

In [33]:
# summer_athletes_regions_df.info()
# Note: Several athletes participate in different sports in the same year.
#       We decided to count each athlete only once per year.
participants_by_year = summer_athletes_regions_df.groupby('Year')['ID'].nunique()
# participants_by_year

plt.figure(figsize=(10, 5))
plt.plot(participants_by_year.index, participants_by_year.values, marker='o')
plt.title('Number of Participants in the Summer Olympics over the Years')
plt.xlabel('Year')
plt.ylabel('Athletes')
plt.grid(True)
plt.xticks(rotation=-45)
plt.tight_layout()
plt.show()

Answer: There has been sustained growth from the first Games until the 21st century. Furthermore, the trend is interrupted around the World Wars, when the 1916, 1940 and 1944 Games were not held. Starting in 1980, more countries joined the Olympic Games, and women's participation became increasingly common.

### 5. Describe the variation in the number of female participants over the years in the Summer Olympics.

In [34]:
plt.figure()
plt.plot(gender_yearly.index, gender_yearly['F'].values, label='Female')
plt.title('Number of Female Participants Over the Years')
plt.xlabel('Year')
plt.ylabel('Athletes')
plt.legend()
plt.show()

Answer:

Overall, female participation in the Summer Olympics has grown rapidly.

In the beginning (1900 - 1920), female participation was very low, and it increased slowly between 1920 and 1960. After that, the increase became much more pronounced, driven by the struggle for gender equality.
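Absolute counts grow for both sexes, so the share of female athletes per edition can be a clearer measure of the trend. The sketch below assumes a table shaped like `gender_yearly` (one row per year, columns `F` and `M`); the numbers are invented and do not come from the dataset.

```python
import pandas as pd

# Toy table with the same layout as gender_yearly; counts are made up
toy_gender_yearly = pd.DataFrame(
    {"F": [0, 50, 400], "M": [200, 450, 600]},
    index=pd.Index([1900, 1960, 2016], name="Year"),
)

# Female athletes as a percentage of all athletes in each edition
female_share = toy_gender_yearly["F"] / toy_gender_yearly.sum(axis=1) * 100
print(female_share.round(1))
```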

### 6. Show graphically the variation of the number of female participants in comparison to male participants over the years.

In [35]:
plt.figure()
plt.plot(gender_yearly.index, gender_yearly['M'], label='Male')
plt.plot(gender_yearly.index, gender_yearly['F'], label='Female')
plt.title('Number of Participants Over the Years')
plt.xlabel('Year')
plt.ylabel('Athletes')
plt.legend()
plt.show()

### 7. Create a scatter plot of the relationship between Height Vs Weight Vs Age of participants across sports. Any conclusions?

In [62]:
# Simple scatter plot: Height vs Sport

# Select the most represented sports
top_sports = summer_athletes_regions_df['Sport'].value_counts().head(40).index.tolist()
sports_df = summer_athletes_regions_df[summer_athletes_regions_df['Sport'].isin(top_sports)]

plt.figure(figsize=(12, 8))

for i, sport in enumerate(top_sports):
    # Filter the DataFrame (not the list of sport names) to the rows of this sport
    sport_data = sports_df[sports_df['Sport'] == sport]
    plt.scatter([i] * len(sport_data), sport_data['Height'], alpha=0.5, s=5)

# Configure the plot
plt.xlabel('Sports')
plt.ylabel('Height (cm)')
plt.title('Height vs Sports')
plt.xticks(range(len(top_sports)), top_sports, rotation=90, ha='right')
plt.grid(True, linestyle='--', alpha=0.7, axis='y')

plt.tight_layout()
plt.show()

In [49]:
len(sport_data)

In [43]:
# Mean age, weight and height per sport
sport_stats = summer_athletes_regions_df.groupby('Sport').agg({
    'Age': 'mean',
    'Weight': 'mean',
    'Height': 'mean'
}).reset_index()

x = range(len(sport_stats))  # one x position per sport
width = 0.2                  # horizontal offset so the three series do not overlap

fig, ax1 = plt.subplots(figsize=(14, 8))

# Age
scat_age = ax1.scatter([i - width for i in x], sport_stats['Age'], color='blue', label="Age (years)", s=10)
ax1.set_ylabel('Age (years)', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')

# Height
ax2 = ax1.twinx()
scat_height = ax2.scatter(x, sport_stats['Height'], color='green', label="Height (cm)", s=10, alpha=0.7)
ax2.set_ylabel('Height (cm)', color='green')
ax2.tick_params(axis='y', labelcolor='green')

# Weight
ax3 = ax1.twinx()
ax3.spines['right'].set_position(('outward', 60))
scat_weight = ax3.scatter([i + width for i in x], sport_stats['Weight'], color='red', label="Weight (kg)", s=10)
ax3.set_ylabel('Weight (kg)', color='red')
ax3.tick_params(axis='y', labelcolor='red')

# X (Sport): set ticks and labels on the base axes, which owns the visible x-axis
ax1.set_xticks(list(x))
ax1.set_xticklabels(sport_stats['Sport'], rotation=90)
ax1.set_xlabel('Sport')
ax1.set_title('Mean Height Vs Weight Vs Age across sports')

# Add a combined legend
lines = [scat_age, scat_height, scat_weight]
labels = ["Age (years)", "Height (cm)", "Weight (kg)"]
plt.legend(lines, labels, loc='upper right')

plt.tight_layout()
plt.show()

### 8. Find and list the top 10 nations that have won the most Gold, Silver, and Bronze Medals, respectively, in the history of the Summer Olympics.

In [37]:
medals_df = summer_athletes_regions_df[summer_athletes_regions_df['Medal'] != 'None']
medals_by_type = (medals_df.groupby(['region', 'Medal'])['ID'].count().unstack(fill_value=0))

top_10_gold = medals_by_type.sort_values(by='Gold', ascending=False).head(10)
top_10_silver = medals_by_type.sort_values(by='Silver', ascending=False).head(10)
top_10_bronze = medals_by_type.sort_values(by='Bronze', ascending=False).head(10)

print('################################################################')
print(top_10_gold['Gold'])
print('################################################################')
print(top_10_silver['Silver'])
print('################################################################')
print(top_10_bronze['Bronze'])
print('################################################################')

### 9. Create a word-cloud showing sports in which India has won medals over the years.

In [38]:
india_medals_df = summer_athletes_regions_df[
    (summer_athletes_regions_df['region'] == 'India') &
    (summer_athletes_regions_df['Medal'] != 'None')
]

sport_counts = india_medals_df['Sport'].value_counts()
sport_freq = sport_counts.to_dict()
# sport_freq

wordcloud = WordCloud(width=600, height=400, background_color='white').generate_from_frequencies(sport_freq)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
# plt.title('Sports in which India has Won Medals Over the Years')
plt.show()


### 10. Look up and list the top 3 female athletes by the number of awarded medals across all sports.

In [None]:
female_medalists = summer_athletes_regions_df[
    (summer_athletes_regions_df['Sex'] == 'F') &
    (summer_athletes_regions_df['Medal'] != 'None')
]

medals_by_female_athlete = (
    female_medalists
    .groupby('Name')['Medal']
    .count()
    .sort_values(ascending=False)
)

# the top 3 female athletes by the number of awarded medals across all sports are:
medals_by_female_athlete.head(3)
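Grouping by `Name` alone merges different athletes who happen to share a name. Since the dataset also has an `ID` column, one option is to group by both. The sketch below illustrates the difference on invented rows.

```python
import pandas as pd

# Toy medal rows: two different athletes (ID 1 and ID 2) share the same name
toy = pd.DataFrame({
    "ID":    [1, 1, 2, 3, 3, 3],
    "Name":  ["A. Smith", "A. Smith", "A. Smith", "B. Jones", "B. Jones", "B. Jones"],
    "Medal": ["Gold", "Silver", "Gold", "Bronze", "Gold", "Gold"],
})

# Grouping by name only adds up the medals of both "A. Smith" athletes
by_name = toy.groupby("Name")["Medal"].count()

# Grouping by ID and name keeps them apart
by_athlete = (
    toy.groupby(["ID", "Name"])["Medal"].count().sort_values(ascending=False)
)

print(by_name)
print(by_athlete.head(3))
```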

- Does wealth (GDP) have any effect on a country's performance in the Olympics?



### GDP Dataset

In [None]:
import folium
import json
import requests

country_participation = summer_athletes_regions_df.groupby('region')['ID'].nunique().reset_index()
country_participation.columns = ['country', 'athletes']

# Download the geojson of countries
geo_url = 'https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/world-countries.json'
geo_json = requests.get(geo_url).json()

# Base map
choropleth_map = folium.Map(location=[20, 0], zoom_start=2)

folium.Choropleth(
    geo_data=geo_json,
    name='choropleth',
    data=country_participation,
    columns=['country', 'athletes'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Total number of unique athletes per country (Summer Olympics)'
).add_to(choropleth_map)

folium.LayerControl().add_to(choropleth_map)

# choropleth_map.save('olympics_participation_map.html')
choropleth_map

In [9]:
url = "https://raw.githubusercontent.com/bhushanrane29/Summer-Olympics-EDA/master/gdp_data.csv"
gdp_df = pd.read_csv(url)
gdp_df

| Unnamed: 0 | Country | Code | Year | GDP-Growth | GDP-Per-Capita | GDP |
|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | 1960 | | | |
| 1 | Afghanistan | AFG | 1960 | | | |
| 2 | Angola | AGO | 1960 | | | |
| 3 | Albania | ALB | 1960 | | | |
| 4 | Andorra | AND | 1960 | | | |
| ... | ... | ... | ... | ... | ... | ... |
| 15043 | Kosovo | XKX | 2016 | 4.145372 | 4193.631327 | 7.738508e+09 |
| 15044 | Yemen, Rep. | YEM | 2016 | -2.701475 | 667.945437 | 1.903557e+10 |
| 15045 | South Africa | ZAF | 2016 | 0.787056 | 7439.919412 | 4.298757e+11 |
| 15046 | Zambia | ZMB | 2016 | 3.794901 | 1672.345428 | 2.901824e+10 |
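One way to approach the GDP question is to join medal counts to the GDP table on country code and year, then compare the two columns. The sketch below uses invented codes and values, and it assumes the codes in the two tables match. In practice they often differ, because NOC codes and ISO codes are not identical (for example, Germany is `GER` as an NOC and `DEU` in ISO 3166), so a mapping step would be needed first. The rank-based correlation is computed as Pearson on ranks, which is the definition of Spearman's coefficient.

```python
import pandas as pd

# Toy medal counts per NOC and year (invented values)
medals = pd.DataFrame({
    "NOC": ["AAA", "BBB", "CCC", "DDD"],
    "Year": [2016, 2016, 2016, 2016],
    "n_medals": [40, 10, 2, 5],
})

# Toy GDP table with the same column names as gdp_df (invented values)
gdp = pd.DataFrame({
    "Code": ["AAA", "BBB", "CCC"],
    "Year": [2016, 2016, 2016],
    "GDP": [2.0e12, 5.0e11, 3.0e10],
})

# Inner join keeps only the country-years present in both tables
merged = medals.merge(gdp, left_on=["NOC", "Year"], right_on=["Code", "Year"], how="inner")

# Spearman rank correlation, computed as Pearson correlation of the ranks
rank_corr = merged["n_medals"].rank().corr(merged["GDP"].rank())
print(len(merged), rank_corr)
```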


1. What is the relation between a country's climate and its Olympic medal tally?

2. Does home advantage give countries an edge in their medal tally? (Linear curve)

3. Does an athlete's height play any role in winning an Olympic medal? (Heatmap)

4. You can add the Paralympics dataset to this linked data too

5. Replace the pie charts with sunburst charts wherever possible