<a href="https://www.kaggle.com/code/zerol0l/olympic-sports-medals-and-discipline?scriptVersionId=291320904" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<div style="background: linear-gradient(135deg, #1a1a2e 0%, #16213e 50%, #0f3460 100%); padding: 40px; border-radius: 15px; margin-bottom: 20px; position: relative; overflow: hidden;">
    <div style="position: absolute; top: 20px; right: 30px; font-size: 60px; opacity: 0.3;">🏅</div>
    <div style="position: absolute; bottom: 20px; left: 30px; font-size: 40px; opacity: 0.2;">🥇🥈🥉</div>
    <h1 style="color: #FFD700; font-size: 2.8em; font-weight: bold; text-align: center; margin: 0; text-shadow: 2px 2px 4px rgba(0,0,0,0.5);">
        🏆 Olympic Games Deep Dive
    </h1>
    <h2 style="color: #C0C0C0; font-size: 1.5em; text-align: center; margin-top: 10px; font-weight: 300;">
        120 Years of Athletic Excellence | 135,571 Athletes | 66 Sports
    </h2>
    <p style="color: #CD7F32; text-align: center; font-size: 1.1em; margin-top: 15px;">
        An Interactive Exploration of Medals, Nations, and Athletic Trends (1896-2016)
    </p>
</div>

<div style="background: linear-gradient(90deg, #FFD700 0%, #FFA500 100%); padding: 3px; border-radius: 10px;">
    <div style="background: #1a1a2e; padding: 25px; border-radius: 8px;">
        <h2 style="color: #FFD700; margin-top: 0;">📋 Table of Contents</h2>
        <div style="display: flex; flex-wrap: wrap; gap: 15px;">
            <a href="#intro" style="flex: 1; min-width: 200px; background: rgba(255,215,0,0.1); padding: 15px; border-radius: 8px; text-decoration: none; border-left: 4px solid #FFD700;">
                <span style="color: #FFD700; font-weight: bold;">1. Introduction</span><br>
                <span style="color: #888; font-size: 0.9em;">Context & Dataset Overview</span>
            </a>
            <a href="#data-prep" style="flex: 1; min-width: 200px; background: rgba(192,192,192,0.1); padding: 15px; border-radius: 8px; text-decoration: none; border-left: 4px solid #C0C0C0;">
                <span style="color: #C0C0C0; font-weight: bold;">2. Data Preparation</span><br>
                <span style="color: #888; font-size: 0.9em;">Cleaning & Transformation</span>
            </a>
            <a href="#eda" style="flex: 1; min-width: 200px; background: rgba(205,127,50,0.1); padding: 15px; border-radius: 8px; text-decoration: none; border-left: 4px solid #CD7F32;">
                <span style="color: #CD7F32; font-weight: bold;">3. Exploratory Analysis</span><br>
                <span style="color: #888; font-size: 0.9em;">Interactive Visualizations</span>
            </a>
            <a href="#insights" style="flex: 1; min-width: 200px; background: rgba(255,215,0,0.1); padding: 15px; border-radius: 8px; text-decoration: none; border-left: 4px solid #FFD700;">
                <span style="color: #FFD700; font-weight: bold;">4. Key Insights</span><br>
                <span style="color: #888; font-size: 0.9em;">Questions & Answers</span>
            </a>
            <a href="#conclusion" style="flex: 1; min-width: 200px; background: rgba(192,192,192,0.1); padding: 15px; border-radius: 8px; text-decoration: none; border-left: 4px solid #C0C0C0;">
                <span style="color: #C0C0C0; font-weight: bold;">5. Conclusion</span><br>
                <span style="color: #888; font-size: 0.9em;">Summary & Takeaways</span>
            </a>
        </div>
    </div>
</div>

<a id="intro"></a>
<div style="background: linear-gradient(90deg, #FFD700 0%, #FFA500 100%); padding: 3px; border-radius: 10px; margin-top: 30px;">
    <div style="background: #1a1a2e; padding: 20px; border-radius: 8px;">
        <h1 style="color: #FFD700; margin: 0;">🎯 1. Introduction</h1>
    </div>
</div>

### The Olympic Games: A Global Celebration of Athletic Excellence

The **Modern Olympic Games**, revived in Athens in **1896**, represent humanity's greatest sporting spectacle. Over 120 years, the Olympics have evolved from a small gathering of 241 athletes from 14 nations to a global phenomenon featuring over **11,000 athletes** from **200+ countries**.

This analysis explores **271,116 athlete-event records** (135,571 unique athletes) spanning the first modern Olympics to Rio 2016, examining:

- 🌍 **Which nations dominate** specific sports, and why?
- 📈 **How have participation trends** evolved over time?
- 👤 **What role do age, gender, and physical attributes** play in athletic success?
- 🏅 **What patterns emerge** in medal distributions across disciplines?

---

### 📊 Dataset Information

**Source:** [Olympics Overall Dataset](https://www.kaggle.com/datasets) on Kaggle

| Feature | Description |
|---------|-------------|
| **ID** | Unique athlete identifier |
| **Name** | Athlete's full name |
| **Sex** | Male (M) or Female (F) |
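| **Age** | Athlete's age (years) at the Games |
| **Height/Weight** | Physical measurements in cm and kg |
| **Team/NOC** | Team name and three-letter National Olympic Committee code |
| **Games/Year/Season/City** | Olympic Games details (e.g. `1992 Summer`, 1992, Summer, Barcelona) |
| **Sport/Event** | Sport and specific event (e.g. Basketball, `Basketball Men's Basketball`) |
| **Medal** | Gold, Silver, Bronze, or missing (NaN) if no medal was won |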
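
Each row is one athlete's entry in one event at one Games, so an athlete who competes in several events or editions appears several times; that is why the row count is roughly twice the number of unique athletes. The minimal sketch below uses a hypothetical toy frame with a subset of the dataset's column names (all values are invented) to show the difference between rows, unique athletes, and medal entries:

```python
import pandas as pd

# Hypothetical toy rows using a subset of the dataset's columns (values invented)
toy = pd.DataFrame({
    'ID':    [1, 1, 1, 2, 3],
    'Name':  ['Athlete A', 'Athlete A', 'Athlete A', 'Athlete B', 'Athlete C'],
    'Games': ['2012 Summer', '2012 Summer', '2016 Summer', '2016 Summer', '2016 Summer'],
    'Event': ['Event 1', 'Event 2', 'Event 1', 'Event 1', 'Event 3'],
    'Medal': ['Gold', None, 'Silver', None, None],
})

n_rows = len(toy)                                # athlete-event entries
n_athletes = toy['ID'].nunique()                 # distinct people
n_medal_rows = int(toy['Medal'].notna().sum())   # entries with a medal (missing = no medal)

print(f"{n_rows} entries, {n_athletes} athletes, {n_medal_rows} medal entries")
```

On the real data, the same three expressions correspond to the 271,116 rows, 135,571 athletes, and 39,783 medal entries reported in the cells below.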

### 🛠️ Libraries & Configuration

In [1]:
# Core Data Analysis
import numpy as np
import pandas as pd

# Visualization Libraries
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Configure display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: '%.2f' % x)

# Set color themes
OLYMPIC_COLORS = {
    'gold': '#FFD700',
    'silver': '#C0C0C0', 
    'bronze': '#CD7F32',
    'dark': '#1a1a2e',
    'blue': '#0081C8',
    'yellow': '#FCB131',
    'black': '#000000',
    'green': '#00A651',
    'red': '#EE334E'
}

# Olympic ring colors for visualizations
RING_COLORS = ['#0081C8', '#FCB131', '#000000', '#00A651', '#EE334E']

print("✅ Libraries loaded successfully!")

✅ Libraries loaded successfully!


<a id="data-prep"></a>
<div style="background: linear-gradient(90deg, #C0C0C0 0%, #A8A8A8 100%); padding: 3px; border-radius: 10px; margin-top: 30px;">
    <div style="background: #1a1a2e; padding: 20px; border-radius: 8px;">
        <h1 style="color: #C0C0C0; margin: 0;">🔧 2. Data Preparation & Cleaning</h1>
    </div>
</div>

### 📥 Loading the Dataset

In [2]:
# Load the Olympic athlete events dataset
df = pd.read_csv('/kaggle/input/olympics-overall/athlete_events.csv')

# Display basic info
print(f"🏟️ Dataset Shape: {df.shape[0]:,} rows × {df.shape[1]} columns")
print(f"📅 Time Period: {df['Year'].min()} - {df['Year'].max()}")
print(f"🌍 Countries: {df['NOC'].nunique()}")
print(f"🏃 Unique Athletes: {df['ID'].nunique():,}")
print(f"⚽ Sports: {df['Sport'].nunique()}")

df.head()

🏟️ Dataset Shape: 271,116 rows × 15 columns
📅 Time Period: 1896 - 2016
🌍 Countries: 230
🏃 Unique Athletes: 135,571
⚽ Sports: 66


|   | ID | Name | Sex | Age | Height | Weight | Team | NOC | Games | Year | Season | City | Sport | Event | Medal |
|---|----|------|-----|-----|--------|--------|------|-----|-------|------|--------|------|-------|-------|-------|
| 0 | 1 | A Dijiang | M | 24.0 | 180.0 | 80.0 | China | CHN | 1992 Summer | 1992 | Summer | Barcelona | Basketball | Basketball Men's Basketball |  |
| 1 | 2 | A Lamusi | M | 23.0 | 170.0 | 60.0 | China | CHN | 2012 Summer | 2012 | Summer | London | Judo | Judo Men's Extra-Lightweight |  |
| 2 | 3 | Gunnar Nielsen Aaby | M | 24.0 |  |  | Denmark | DEN | 1920 Summer | 1920 | Summer | Antwerpen | Football | Football Men's Football |  |
| 3 | 4 | Edgar Lindenau Aabye | M | 34.0 |  |  | Denmark/Sweden | DEN | 1900 Summer | 1900 | Summer | Paris | Tug-Of-War | Tug-Of-War Men's Tug-Of-War | Gold |
| 4 | 5 | Christine Jacoba Aaftink | F | 21.0 | 185.0 | 82.0 | Netherlands | NED | 1988 Winter | 1988 | Winter | Calgary | Speed Skating | Speed Skating Women's 500 metres |  |


### 🔍 Data Quality Assessment

In [3]:
# Calculate missing values
missing_data = pd.DataFrame({
    'Missing Values': df.isnull().sum(),
    'Missing %': (df.isnull().sum() / len(df) * 100).round(2),
    'Data Type': df.dtypes
}).sort_values('Missing %', ascending=False)

# Visualize missing data
fig = px.bar(
    missing_data[missing_data['Missing %'] > 0].reset_index(),
    x='index', 
    y='Missing %',
    title='<b>Missing Data Distribution</b>',
    labels={'index': 'Column', 'Missing %': 'Missing Percentage (%)'},
    color='Missing %',
    color_continuous_scale=['#00A651', '#FCB131', '#EE334E']
)
fig.update_layout(
    template='plotly_dark',
    paper_bgcolor='#1a1a2e',
    plot_bgcolor='#1a1a2e',
    font=dict(color='white')
)
fig.show()

print("\n📋 Missing Data Summary:")
print(missing_data[missing_data['Missing %'] > 0])


📋 Missing Data Summary:
        Missing Values  Missing % Data Type
Medal           231333      85.33    object
Weight           62875      23.19   float64
Height           60171      22.19   float64
Age               9474       3.49   float64
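
Because `Medal` is missing simply when no medal was won, its 85% rate is structural rather than a data-quality gap. For `Height` and `Weight`, one way to check whether the gaps cluster in particular eras is to average the missing-value mask by decade. A sketch on hypothetical toy data (values invented; on the real data the input would be `df` with a decade key such as `(df['Year'] // 10) * 10`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy records (values invented for illustration)
toy = pd.DataFrame({
    'Year':   [1900, 1900, 1904, 2000, 2008, 2016],
    'Height': [np.nan, np.nan, 175.0, 180.0, np.nan, 168.0],
    'Weight': [np.nan, 70.0, np.nan, 75.0, 60.0, 58.0],
})
decade = (toy['Year'] // 10) * 10

# Percentage of missing values per column within each decade (True counts as 1)
missing_by_decade = toy[['Height', 'Weight']].isna().groupby(decade).mean() * 100
print(missing_by_decade.round(1))
```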


In [4]:
# Create cleaned dataframe for analysis
df_clean = df.copy()

# Handle Medal column - convert NaN to 'No Medal' for certain analyses
df_clean['Medal_Status'] = df_clean['Medal'].fillna('No Medal')

# Create age groups for analysis
# pd.cut intervals are right-closed: (0, 18], (18, 25], (25, 30], (30, 40], (40, 100]
df_clean['Age_Group'] = pd.cut(
    df_clean['Age'], 
    bins=[0, 18, 25, 30, 40, 100],
    labels=['18 and under', '19-25', '26-30', '31-40', '41+']
)

# Create decade column for trend analysis
df_clean['Decade'] = (df_clean['Year'] // 10) * 10

# Create medalists-only dataframe
medalists = df_clean[df_clean['Medal'].notna()].copy()

print("✅ Data cleaning complete!")
print(f"   - Total records: {len(df_clean):,}")
print(f"   - Medal winners: {len(medalists):,}")
print(f"   - Medal rate: {len(medalists)/len(df_clean)*100:.1f}%")

✅ Data cleaning complete!
   - Total records: 271,116
   - Medal winners: 39,783
   - Medal rate: 14.7%
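
`pd.cut` uses right-closed intervals by default (`right=True`), so with `bins=[0, 18, 25, 30, 40, 100]` an age of exactly 18 lands in the first bin and an age of exactly 25 in the second. A quick check on a few hand-picked ages, with labels chosen to describe each interval exactly:

```python
import pandas as pd

ages = pd.Series([17, 18, 19, 25, 26, 30, 31, 40, 41])

# Same bin edges as the cleaning step; intervals are (0, 18], (18, 25], (25, 30], (30, 40], (40, 100]
groups = pd.cut(
    ages,
    bins=[0, 18, 25, 30, 40, 100],
    labels=['18 and under', '19-25', '26-30', '31-40', '41+'],
)
print(pd.DataFrame({'Age': ages, 'Age_Group': groups}))
```

Passing `right=False` would flip the convention to left-closed intervals, in which case the labels would need to change accordingly.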


<a id="eda"></a>
<div style="background: linear-gradient(90deg, #CD7F32 0%, #B8860B 100%); padding: 3px; border-radius: 10px; margin-top: 30px;">
    <div style="background: #1a1a2e; padding: 20px; border-radius: 8px;">
        <h1 style="color: #CD7F32; margin: 0;">📊 3. Exploratory Data Analysis & Visualization</h1>
    </div>
</div>

### 🌍 Olympic Growth Over Time
Let's visualize how the Olympic Games have evolved over 120 years.

In [5]:
# Calculate participation statistics by year
yearly_stats = df_clean.groupby(['Year', 'Season']).agg({
    'ID': 'nunique',
    'NOC': 'nunique',
    'Event': 'nunique',
    'Sport': 'nunique'
}).reset_index()
yearly_stats.columns = ['Year', 'Season', 'Athletes', 'Countries', 'Events', 'Sports']

# 2x2 grid of line charts, one Summer and one Winter trace per panel
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        '<b>Athletes Over Time</b>', 
        '<b>Participating Countries</b>',
        '<b>Number of Events</b>',
        '<b>Sports Disciplines</b>'
    ),
    vertical_spacing=0.12,
    horizontal_spacing=0.08
)

colors = {'Summer': '#EE334E', 'Winter': '#0081C8'}

for season in ['Summer', 'Winter']:
    data = yearly_stats[yearly_stats['Season'] == season]
    fig.add_trace(go.Scatter(x=data['Year'], y=data['Athletes'], name=f'{season}', 
                             line=dict(color=colors[season], width=3), mode='lines+markers'), row=1, col=1)
    fig.add_trace(go.Scatter(x=data['Year'], y=data['Countries'], name=f'{season}', 
                             line=dict(color=colors[season], width=3), mode='lines+markers', showlegend=False), row=1, col=2)
    fig.add_trace(go.Scatter(x=data['Year'], y=data['Events'], name=f'{season}', 
                             line=dict(color=colors[season], width=3), mode='lines+markers', showlegend=False), row=2, col=1)
    fig.add_trace(go.Scatter(x=data['Year'], y=data['Sports'], name=f'{season}', 
                             line=dict(color=colors[season], width=3), mode='lines+markers', showlegend=False), row=2, col=2)

fig.update_layout(
    height=700,
    title_text='<b>üèüÔ∏è The Evolution of the Olympic Games (1896-2016)</b>',
    title_x=0.5,
    template='plotly_dark',
    paper_bgcolor='#1a1a2e',
    plot_bgcolor='#16213e',
    font=dict(color='white'),
    legend=dict(orientation='h', yanchor='bottom', y=1.02, xanchor='center', x=0.5)
)
fig.show()

### 🏅 Top Medal-Winning Nations
Which countries have dominated the Olympic Games throughout history?

In [6]:
# Calculate medal counts by country
medal_by_country = medalists.groupby(['NOC', 'Medal']).size().unstack(fill_value=0)
medal_by_country['Total'] = medal_by_country.sum(axis=1)
medal_by_country = medal_by_country.sort_values('Total', ascending=False).head(15)

# Reorder columns
medal_by_country = medal_by_country[['Gold', 'Silver', 'Bronze', 'Total']]

# Create horizontal stacked bar chart
fig = go.Figure()

fig.add_trace(go.Bar(
    y=medal_by_country.index[::-1],
    x=medal_by_country['Gold'][::-1],
    name='ü•á Gold',
    orientation='h',
    marker_color='#FFD700',
    text=medal_by_country['Gold'][::-1],
    textposition='inside'
))

fig.add_trace(go.Bar(
    y=medal_by_country.index[::-1],
    x=medal_by_country['Silver'][::-1],
    name='ü•à Silver',
    orientation='h',
    marker_color='#C0C0C0',
    text=medal_by_country['Silver'][::-1],
    textposition='inside'
))

fig.add_trace(go.Bar(
    y=medal_by_country.index[::-1],
    x=medal_by_country['Bronze'][::-1],
    name='ü•â Bronze',
    orientation='h',
    marker_color='#CD7F32',
    text=medal_by_country['Bronze'][::-1],
    textposition='inside'
))

fig.update_layout(
    barmode='stack',
    title='<b>üèÜ Top 15 Medal-Winning Nations (All Time)</b>',
    title_x=0.5,
    xaxis_title='Total Medals',
    yaxis_title='',
    template='plotly_dark',
    paper_bgcolor='#1a1a2e',
    plot_bgcolor='#16213e',
    font=dict(color='white', size=12),
    height=600,
    legend=dict(orientation='h', yanchor='bottom', y=1.02, xanchor='center', x=0.5)
)
fig.show()

# Display summary table
print("\n📊 Medal Tally Summary:")
medal_by_country


📊 Medal Tally Summary:


| NOC | Gold | Silver | Bronze | Total |
|-----|------|--------|--------|-------|
| USA | 2638 | 1641 | 1358 | 5637 |
| URS | 1082 | 732 | 689 | 2503 |
| GER | 745 | 674 | 746 | 2165 |
| GBR | 678 | 739 | 651 | 2068 |
| FRA | 501 | 610 | 666 | 1777 |
| ITA | 575 | 531 | 531 | 1637 |
| SWE | 479 | 522 | 535 | 1536 |
| CAN | 463 | 438 | 451 | 1352 |
| AUS | 348 | 455 | 517 | 1320 |
| RUS | 390 | 367 | 408 | 1165 |
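
The tally above counts rows in `medalists`, that is, one medal per athlete, so a team gold in basketball or rowing adds one medal for every team member, whereas official medal tables count a team medal once. A rough sketch of an event-level count follows. It assumes that dropping duplicates on `(Games, Event, NOC, Medal)` collapses each team's medal into one; that key is an approximation and would also merge two same-colour medals won by one nation in a single event (possible in events that award two bronzes). The toy rows and NOC codes are hypothetical:

```python
import pandas as pd

# Hypothetical medal rows: one 3-person team gold plus one individual bronze
toy_medalists = pd.DataFrame({
    'ID':    [10, 11, 12, 13],
    'NOC':   ['AAA'] * 4,
    'Games': ['2016 Summer'] * 4,
    'Event': ['Team Event'] * 3 + ['Individual Event'],
    'Medal': ['Gold', 'Gold', 'Gold', 'Bronze'],
})

# Athlete-level count (as in the tally above): every team member adds a medal
athlete_level = toy_medalists.groupby(['NOC', 'Medal']).size().unstack(fill_value=0)

# Event-level approximation: keep one row per (Games, Event, NOC, Medal)
event_level = (
    toy_medalists
    .drop_duplicates(subset=['Games', 'Event', 'NOC', 'Medal'])
    .groupby(['NOC', 'Medal']).size()
    .unstack(fill_value=0)
)

print(athlete_level, event_level, sep='\n\n')
```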


### 🎬 Animated Medal Race: How Nations Rose and Fell Over Time

In [7]:
# Calculate cumulative medals over time for top nations
top_nations = medal_by_country.head(10).index.tolist()

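# Cumulative medals on a complete Year x NOC grid: a nation with no medals in a
# given year (e.g. URS after 1988) keeps its running total instead of
# disappearing from that animation frame
cumulative_medals = (
    medalists[medalists['NOC'].isin(top_nations)]
    .groupby(['Year', 'NOC']).size()
    .unstack(fill_value=0)   # rows: Year, columns: NOC, 0 where no medals
    .cumsum()                # running total down the years
    .reset_index()
    .melt(id_vars='Year', var_name='NOC', value_name='Cumulative')
    .sort_values(['Year', 'NOC'])
)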

# Create animated bar chart race
fig = px.bar(
    cumulative_medals,
    x='Cumulative',
    y='NOC',
    animation_frame='Year',
    orientation='h',
    color='NOC',
    color_discrete_sequence=px.colors.qualitative.Bold,
    title='<b>üé¨ The Medal Race: Top 10 Nations Through History</b>',
    labels={'Cumulative': 'Total Medals', 'NOC': 'Country'},
    range_x=[0, cumulative_medals['Cumulative'].max() * 1.1]
)

fig.update_layout(
    template='plotly_dark',
    paper_bgcolor='#1a1a2e',
    plot_bgcolor='#16213e',
    font=dict(color='white'),
    height=600,
    showlegend=False,
    yaxis={'categoryorder': 'total ascending'}
)

fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 300
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 150

fig.show()

### 👫 Gender Participation Evolution
How has female participation changed throughout Olympic history?

In [8]:
# Gender participation over time
gender_by_year = df_clean.groupby(['Year', 'Sex']).agg({'ID': 'nunique'}).reset_index()
gender_by_year.columns = ['Year', 'Sex', 'Athletes']

# Calculate percentage
total_by_year = gender_by_year.groupby('Year')['Athletes'].sum().reset_index()
total_by_year.columns = ['Year', 'Total']
gender_by_year = gender_by_year.merge(total_by_year, on='Year')
gender_by_year['Percentage'] = (gender_by_year['Athletes'] / gender_by_year['Total'] * 100).round(1)

# Create dual-axis chart
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Area chart for absolute numbers
for sex, color in [('M', '#0081C8'), ('F', '#EE334E')]:
    data = gender_by_year[gender_by_year['Sex'] == sex]
    r, g, b = (int(color[i:i + 2], 16) for i in (1, 3, 5))  # hex -> RGB for a translucent fill
    fig.add_trace(
        go.Scatter(
            x=data['Year'], 
            y=data['Athletes'],
            name=f"{'Male' if sex == 'M' else 'Female'} Athletes",
            fill='tozeroy',
            line=dict(color=color, width=2),
            fillcolor=f'rgba({r}, {g}, {b}, 0.3)'
        ),
        secondary_y=False
    )

# Line for female percentage
female_pct = gender_by_year[gender_by_year['Sex'] == 'F']
fig.add_trace(
    go.Scatter(
        x=female_pct['Year'],
        y=female_pct['Percentage'],
        name='Female %',
        line=dict(color='#FFD700', width=3, dash='dash'),
        mode='lines+markers'
    ),
    secondary_y=True
)

fig.update_layout(
    title='<b>üë´ Gender Participation in Olympic Games (1896-2016)</b>',
    title_x=0.5,
    template='plotly_dark',
    paper_bgcolor='#1a1a2e',
    plot_bgcolor='#16213e',
    font=dict(color='white'),
    height=500,
    legend=dict(orientation='h', yanchor='bottom', y=1.02, xanchor='center', x=0.5),
    hovermode='x unified'
)

fig.update_yaxes(title_text='Number of Athletes', secondary_y=False)
fig.update_yaxes(title_text='Female Participation (%)', secondary_y=True, range=[0, 50])

fig.show()

print(f"\n📈 Key Insight: Female participation grew from {female_pct['Percentage'].iloc[0]:.1f}% in {female_pct['Year'].iloc[0]} to {female_pct['Percentage'].iloc[-1]:.1f}% in {female_pct['Year'].iloc[-1]}")


📈 Key Insight: Female participation grew from 1.9% in 1900 to 45.0% in 2016
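
Grouping by `Year` alone pools the Summer and Winter Games that shared a calendar year (every edition from 1924 through 1992), so the percentages for those years mix both Games. A sketch that computes the female share per Games instead, on hypothetical toy rows (IDs and values invented):

```python
import pandas as pd

# Hypothetical athlete rows (IDs and values invented)
toy = pd.DataFrame({
    'ID':     [1, 2, 3, 4, 5, 6, 7],
    'Sex':    ['M', 'F', 'M', 'M', 'F', 'F', 'M'],
    'Year':   [1988] * 7,
    'Season': ['Summer'] * 4 + ['Winter'] * 3,
})

# Unique athletes per (Year, Season, Sex), then each sex's share within that Games
counts = toy.groupby(['Year', 'Season', 'Sex'])['ID'].nunique()
share = counts / counts.groupby(level=['Year', 'Season']).transform('sum') * 100
female_share = share.xs('F', level='Sex').round(1)
print(female_share)
```

Pooled by year, these toy rows would give a single 42.9% figure that hides the gap between the two Games.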


### ⚽ Sport-by-Sport Dominance
Which countries dominate specific sports?

In [9]:
# Find dominant country for each sport
sport_country = medalists.groupby(['Sport', 'NOC']).size().reset_index(name='Medals')
dominant_by_sport = sport_country.loc[sport_country.groupby('Sport')['Medals'].idxmax()]
dominant_by_sport = dominant_by_sport.sort_values('Medals', ascending=False).head(20)

# Create sunburst chart
fig = px.sunburst(
    dominant_by_sport,
    path=['NOC', 'Sport'],
    values='Medals',
    color='Medals',
    color_continuous_scale=['#16213e', '#0081C8', '#FFD700'],
    title='<b>üéØ Sport Dominance by Country (Top 20 Sports)</b>'
)

fig.update_layout(
    template='plotly_dark',
    paper_bgcolor='#1a1a2e',
    font=dict(color='white'),
    height=600
)
fig.show()
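
`groupby(...).idxmax()` returns the index label of the first maximum in each group, so when two nations tie on medals in a sport only one is kept (the alphabetically first NOC in the cell above, because `groupby` sorts its keys). A small sketch on hypothetical counts (invented sports and NOC codes) that surfaces ties instead of silently dropping them:

```python
import pandas as pd

# Hypothetical (Sport, NOC) medal counts with a tie in 'Sport A'
sport_country = pd.DataFrame({
    'Sport':  ['Sport A', 'Sport A', 'Sport A', 'Sport B', 'Sport B'],
    'NOC':    ['AAA', 'BBB', 'CCC', 'AAA', 'DDD'],
    'Medals': [40, 40, 12, 84, 49],
})

# idxmax keeps only the first row holding each sport's maximum
first_max = sport_country.loc[sport_country.groupby('Sport')['Medals'].idxmax()]

# Keeping every row equal to its sport's maximum surfaces ties
sport_max = sport_country.groupby('Sport')['Medals'].transform('max')
all_leaders = sport_country[sport_country['Medals'] == sport_max]

print(first_max, all_leaders, sep='\n\n')
```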

In [10]:
# Create heatmap of top countries vs top sports
top_sports = medalists['Sport'].value_counts().head(15).index.tolist()
top_countries_list = medal_by_country.head(12).index.tolist()

heatmap_data = medalists[
    (medalists['NOC'].isin(top_countries_list)) & 
    (medalists['Sport'].isin(top_sports))
].groupby(['NOC', 'Sport']).size().unstack(fill_value=0)

# Normalize by sport: share (%) of ALL medals awarded in that sport won by each country
sport_totals = medalists['Sport'].value_counts()
heatmap_pct = heatmap_data.div(sport_totals.loc[heatmap_data.columns], axis=1) * 100

fig = px.imshow(
    heatmap_pct,
    labels=dict(x='Sport', y='Country', color='% of Medals'),
    title='<b>üî• Medal Dominance Heatmap: Country vs Sport</b>',
    color_continuous_scale=['#16213e', '#0f3460', '#0081C8', '#FFD700', '#EE334E'],
    aspect='auto'
)

fig.update_layout(
    template='plotly_dark',
    paper_bgcolor='#1a1a2e',
    font=dict(color='white'),
    height=600,
    xaxis_tickangle=-45
)
fig.show()

### 👤 Age Analysis: When Do Athletes Peak?

In [11]:
# Age distribution of medalists by sport
age_by_sport = medalists.groupby('Sport')['Age'].agg(['mean', 'min', 'max', 'std']).reset_index()
age_by_sport.columns = ['Sport', 'Avg_Age', 'Min_Age', 'Max_Age', 'Std_Age']
age_by_sport = age_by_sport.dropna().sort_values('Avg_Age')

# Side-by-side bar charts: 10 sports with the youngest vs the oldest average medalist age
youngest = age_by_sport.head(10)
oldest = age_by_sport.tail(10)

fig = make_subplots(rows=1, cols=2, subplot_titles=(
    '<b>üßí Youngest Medalists</b>', 
    '<b>üë¥ Oldest Medalists</b>'
))

fig.add_trace(
    go.Bar(
        y=youngest['Sport'],
        x=youngest['Avg_Age'],
        orientation='h',
        marker_color='#00A651',
        text=youngest['Avg_Age'].round(1),
        textposition='outside',
        name='Youngest'
    ),
    row=1, col=1
)

fig.add_trace(
    go.Bar(
        y=oldest['Sport'],
        x=oldest['Avg_Age'],
        orientation='h',
        marker_color='#EE334E',
        text=oldest['Avg_Age'].round(1),
        textposition='outside',
        name='Oldest'
    ),
    row=1, col=2
)

fig.update_layout(
    title='<b>üéÇ Average Age of Olympic Medalists by Sport</b>',
    title_x=0.5,
    template='plotly_dark',
    paper_bgcolor='#1a1a2e',
    plot_bgcolor='#16213e',
    font=dict(color='white'),
    height=500,
    showlegend=False
)

fig.update_xaxes(title_text='Average Age', range=[15, 30], row=1, col=1)
fig.update_xaxes(title_text='Average Age', range=[25, 55], row=1, col=2)

fig.show()

In [12]:
# Violin plot of age distribution by medal type
fig = px.violin(
    medalists.dropna(subset=['Age']),
    x='Medal',
    y='Age',
    color='Medal',
    color_discrete_map={'Gold': '#FFD700', 'Silver': '#C0C0C0', 'Bronze': '#CD7F32'},
    box=True,
    title='<b>üìä Age Distribution by Medal Type</b>',
    category_orders={'Medal': ['Gold', 'Silver', 'Bronze']}
)

fig.update_layout(
    template='plotly_dark',
    paper_bgcolor='#1a1a2e',
    plot_bgcolor='#16213e',
    font=dict(color='white'),
    height=500,
    showlegend=False
)
fig.show()

### 📏 Physical Attributes: Height & Weight Across Sports

In [13]:
# Physical attributes by sport
physical_by_sport = medalists.groupby('Sport').agg({
    'Height': 'mean',
    'Weight': 'mean',
    'Age': 'mean',
    'ID': 'count'
}).reset_index()
physical_by_sport.columns = ['Sport', 'Avg_Height', 'Avg_Weight', 'Avg_Age', 'Medal_Count']
physical_by_sport = physical_by_sport.dropna()

# Filter to sports with significant medal counts
physical_by_sport = physical_by_sport[physical_by_sport['Medal_Count'] >= 50]

fig = px.scatter(
    physical_by_sport,
    x='Avg_Height',
    y='Avg_Weight',
    size='Medal_Count',
    color='Avg_Age',
    hover_name='Sport',
    color_continuous_scale=['#00A651', '#FFD700', '#EE334E'],
    title='<b>üìè Physical Profile of Olympic Sports</b>',
    labels={
        'Avg_Height': 'Average Height (cm)',
        'Avg_Weight': 'Average Weight (kg)',
        'Avg_Age': 'Avg Age',
        'Medal_Count': 'Medals'
    }
)

fig.update_layout(
    template='plotly_dark',
    paper_bgcolor='#1a1a2e',
    plot_bgcolor='#16213e',
    font=dict(color='white'),
    height=600
)
fig.show()

<a id="insights"></a>
<div style="background: linear-gradient(90deg, #FFD700 0%, #FFA500 100%); padding: 3px; border-radius: 10px; margin-top: 30px;">
    <div style="background: #1a1a2e; padding: 20px; border-radius: 8px;">
        <h1 style="color: #FFD700; margin: 0;">💡 4. Key Insights & Questions Answered</h1>
    </div>
</div>

### ❓ Q1: Which Countries Have the Most Olympic Medals?

<div style="background: rgba(255,215,0,0.1); padding: 20px; border-radius: 10px; border-left: 5px solid #FFD700; margin: 15px 0;">
    <h4 style="color: #FFD700; margin-top: 0;">🏆 Answer: The United States Dominates</h4>
    <table style="width: 100%; color: white;">
        <tr style="background: rgba(255,215,0,0.2);">
            <th>Country (NOC)</th><th>🥇 Gold</th><th>🥈 Silver</th><th>🥉 Bronze</th><th>🏅 Total</th>
        </tr>
        <tr><td>United States (USA)</td><td>2,638</td><td>1,641</td><td>1,358</td><td>5,637</td></tr>
        <tr><td>Soviet Union (URS)</td><td>1,082</td><td>732</td><td>689</td><td>2,503</td></tr>
        <tr><td>Germany (GER)</td><td>745</td><td>674</td><td>746</td><td>2,165</td></tr>
        <tr><td>Great Britain (GBR)</td><td>678</td><td>739</td><td>651</td><td>2,068</td></tr>
        <tr><td>France (FRA)</td><td>501</td><td>610</td><td>666</td><td>1,777</td></tr>
        <tr><td>Italy (ITA)</td><td>575</td><td>531</td><td>531</td><td>1,637</td></tr>
    </table>
</div>

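The **USA** leads with **5,637 medal entries**, more than double its nearest competitor, the Soviet Union (URS, 2,503). These figures come from the medal tally computed above, which counts one medal per athlete, so each member of a medal-winning team adds a medal; official medal tables count a team medal once. The USA's dominance spans multiple disciplines, including Athletics, Swimming, Basketball, and Rowing.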

### ❓ Q2: Is There a Correlation Between Country and Sport Discipline?

<div style="background: rgba(192,192,192,0.1); padding: 20px; border-radius: 10px; border-left: 5px solid #C0C0C0; margin: 15px 0;">
    <h4 style="color: #C0C0C0; margin-top: 0;">✅ Answer: Yes, Strong Regional Specialization Exists</h4>
</div>

Our analysis reveals clear **sport-country correlations**:

| Region/Country | Dominant Sports | Likely Factors |
|----------------|-----------------|----------------|
| üá∫üá∏ **USA** | Athletics, Swimming, Basketball | Cultural investment, collegiate sports system |
| üá®üá≥ **China** | Table Tennis, Diving, Badminton | National sports programs, cultural heritage |
| üá∑üá∫ **Russia** | Wrestling, Gymnastics, Weightlifting | Soviet-era training infrastructure |
| üá∞üá™ **Kenya/Ethiopia** | Long-distance Running | High-altitude training, genetic factors |
| üá∞üá∑ **South Korea** | Taekwondo, Archery | Sport originated in Korea, heavy investment |
| üáØüáµ **Japan** | Judo | Sport originated in Japan |

### ❓ Q3: Is There a Correlation Between Age and Sport Discipline?

<div style="background: rgba(205,127,50,0.1); padding: 20px; border-radius: 10px; border-left: 5px solid #CD7F32; margin: 15px 0;">
    <h4 style="color: #CD7F32; margin-top: 0;">✅ Answer: Absolutely! Age Varies Dramatically by Sport</h4>
</div>

**🧒 Sports with Youngest Medalists (Avg < 23 years):**
- Rhythmic Gymnastics (~19 years) - Requires extreme flexibility
- Swimming (~21 years) - Peak physical performance
- Diving (~22 years) - Requires agility and fearlessness

**👴 Sports with Oldest Medalists (Avg > 30 years):**
- Equestrianism (~35 years) - Experience over athleticism
- Shooting (~33 years) - Steadiness improves with age
- Sailing (~32 years) - Strategic thinking dominates

**Key Insight:** Sports requiring flexibility and explosive power favor youth, while sports requiring precision, strategy, and experience favor older athletes.
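
Sport is categorical and age is numeric, so one way to put a single number on this relationship is the correlation ratio η², the share of age variance explained by sport: η² = Σ_g n_g (ȳ_g - ȳ)² / Σ_i (y_i - ȳ)², where n_g and ȳ_g are the number of medalists and mean age in sport g and ȳ is the overall mean age. It is 0 when every sport has the same average age and 1 when all variation lies between sports. The sketch below is not part of the original analysis; the helper name and toy ages are hypothetical (on the real data the inputs would be `medalists['Sport']` and `medalists['Age']`):

```python
import pandas as pd


def correlation_ratio_sq(categories: pd.Series, values: pd.Series) -> float:
    """Eta squared: between-group sum of squares / total sum of squares."""
    frame = pd.DataFrame({'cat': categories, 'val': values}).dropna()
    grand_mean = frame['val'].mean()
    groups = frame.groupby('cat')['val']
    ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    ss_total = ((frame['val'] - grand_mean) ** 2).sum()
    return float(ss_between / ss_total)


# Hypothetical medalist ages for two invented sports
sport = pd.Series(['Sport Y'] * 3 + ['Sport O'] * 3)
age = pd.Series([18, 20, 22, 34, 36, 38])
print(round(correlation_ratio_sq(sport, age), 3))  # 0.96
```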

In [14]:
# Create correlation visualization between physical attributes
corr_data = medalists[['Age', 'Height', 'Weight']].dropna()
correlation_matrix = corr_data.corr()

fig = px.imshow(
    correlation_matrix,
    text_auto='.2f',
    color_continuous_scale=['#16213e', '#0081C8', '#FFD700'],
    title='<b>üìä Correlation Matrix: Physical Attributes of Medalists</b>'
)

fig.update_layout(
    template='plotly_dark',
    paper_bgcolor='#1a1a2e',
    font=dict(color='white'),
    height=400
)
fig.show()

<a id="conclusion"></a>
<div style="background: linear-gradient(90deg, #C0C0C0 0%, #A8A8A8 100%); padding: 3px; border-radius: 10px; margin-top: 30px;">
    <div style="background: #1a1a2e; padding: 20px; border-radius: 8px;">
        <h1 style="color: #C0C0C0; margin: 0;">🎯 5. Conclusion</h1>
    </div>
</div>

<div style="background: linear-gradient(135deg, #1a1a2e 0%, #16213e 100%); padding: 30px; border-radius: 15px; margin-top: 20px;">

## 🏆 Key Findings from 120 Years of Olympic Data

### 🥇 1. The United States is the Undisputed Olympic Powerhouse
With **5,637 medal entries** (2,638 gold), counting each member of a medal-winning team separately, the USA leads in Athletics, Swimming, Basketball, and Rowing. This dominance likely reflects decades of investment in collegiate athletics and sports infrastructure.

---

### 🌍 2. Regional Specialization is Real and Measurable
Countries don't just win medals randomly; they **specialize** based on:
- **Cultural heritage** (Korea → Taekwondo, Japan → Judo)
- **Government investment** (China → Table Tennis, Diving)
- **Geographic advantages** (Kenya/Ethiopia → Distance Running)
- **Historical infrastructure** (Russia → Gymnastics, Wrestling)

---

### 🎂 3. Age is Sport-Dependent, Not Universal
- **Youth-dominated** (18-22): Gymnastics, Swimming, Diving
- **Prime-age** (25-30): Team sports, Combat sports
- **Experience-valued** (30+): Equestrian, Shooting, Sailing

---

### 👫 4. Gender Parity is Improving But Incomplete
Female participation grew from **<2%** in 1900 to **~45%** in 2016. However, some sports still show significant gender imbalances in participation and medal opportunities.

---

### 📈 5. The Olympics Continue to Grow
From **241 athletes** in Athens 1896 to over **11,000** in Rio 2016, the Games have become truly global, with **200+ nations** now participating.

</div>

<div style="background: linear-gradient(90deg, #FFD700 0%, #C0C0C0 50%, #CD7F32 100%); padding: 3px; border-radius: 10px; margin-top: 30px;">
    <div style="background: #1a1a2e; padding: 30px; border-radius: 8px; text-align: center;">
        <h2 style="color: #FFD700; margin: 0;">Thank you for reading! 🏅</h2>
        <p style="color: #C0C0C0; margin-top: 10px;">If you found this analysis insightful, please consider giving it an <b>upvote</b>! 👍</p>
        <p style="color: #888; font-size: 0.9em;">Questions or suggestions? Drop a comment below!</p>
        <div style="margin-top: 20px;">
            <span style="font-size: 2em;">🥇🥈🥉</span>
        </div>
    </div>
</div>