<a href="https://colab.research.google.com/github/Syeda-Eman/Machine_Learning/blob/main/Summer_Olympics_1976_2008.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.
import kagglehub
divyansh22_summer_olympics_medals_path = kagglehub.dataset_download('divyansh22/summer-olympics-medals')

print('Data source import complete.')


![rio-de-janeiro-2016-summer-olympics-e1467812135773.png](attachment:89d504fd-6767-4dd4-b0cf-aa93c9b6596e.png)

# **Brief Analysis of Summer Olympics (1976-2008)**


**From 1976 to 2008, the Summer Olympics experienced significant transformations in both sports and global dynamics. Initially dominated by the Soviet Union and Eastern Bloc countries, this era saw remarkable athletic achievements, such as Nadia Comăneci's perfect 10 in gymnastics in 1976. The inclusion of more women's events marked progress toward gender equality. However, the Games also faced challenges, including financial strains like those in Montreal 1976, political tensions leading to boycotts in 1980 and 1984, and doping controversies. Despite these issues, the Olympics grew into a highly commercialized and professionally organized event, with host cities investing heavily in infrastructure and marketing. This evolution underscored the Olympics' role as a platform for showcasing athletic excellence and fostering international unity amidst complex geopolitical landscapes.**




# **Importing libraries and loading data**

In [None]:
#importing libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import kagglehub

In [None]:
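#loading data
import os

# download the Kaggle dataset (or reuse the cached copy) and get its local folder
data_dir = kagglehub.dataset_download('divyansh22/summer-olympics-medals')

# load the CSV into a pandas DataFrame named "df", decoding it as Latin-1
df = pd.read_csv(os.path.join(data_dir, 'Summer-Olympic-medals-1976-to-2008.csv'), encoding='latin-1')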

# **Exploring Data**

In [None]:
#printing first few rows
df.head()

In [None]:
#number of rows and columns
df.shape

In [None]:
#data type of each column
df.dtypes

# **Cleaning Data**

In [None]:
#column types and non-null counts, to check for missing values
df.info()

In [None]:
#display the rows that contain at least one null value
df[df.isnull().any(axis=1)]

In [None]:
#dropping null values and then checking the shape of dataframe
df.dropna(inplace=True)
df.shape
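Before calling `dropna()`, it can help to see how many values are missing in each column, because `dropna()` with its default arguments removes every row that has at least one missing value, whichever column it is in. A minimal sketch on a small hand-made DataFrame (the column names mirror the dataset, but the rows are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy data with dataset-like column names; the values are made up
toy = pd.DataFrame({
    "Year": [1976, 1976, 1980, np.nan],
    "Athlete": ["A", "B", None, "D"],
    "Medal": ["Gold", "Silver", "Bronze", None],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Rows that contain at least one missing value (same filter as the cell above)
rows_with_missing = toy[toy.isnull().any(axis=1)]
print(rows_with_missing)

# dropna() with defaults drops every row that contains any NaN/None
cleaned = toy.dropna()
print(cleaned.shape)
```

If only some columns matter for the analysis, `dropna(subset=[...])` restricts the check to those columns.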

# **Questions to Answer:**

* What has been the increase in the number of athletes over time?
* What has been the increase in participating athletes over time, by gender?
* Which countries have been awarded the most medals?
* Which sports have had the highest number of medals awarded?

**1. What has been the increase in the number of athletes over time?**

In [None]:
#the number of unique athletes per year
athlete_counts = df.groupby('Year')['Athlete'].nunique().reset_index()

#line plot
sns.lineplot(x="Year", y="Athlete", data=athlete_counts)

#adding title and labels
plt.title("Number of medal-winning athletes by year")
plt.xlabel("Year")
plt.ylabel("Number of Medal-Winning Athletes")

#printing plot
plt.show()
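The cell above uses `nunique()` rather than `count()`. Each row of the dataset is one medal won by one athlete, so an athlete who wins several medals at the same Games appears in several rows: `count()` counts those rows, while `nunique()` counts each distinct name once per year. A small sketch on made-up rows; note that counting by name assumes two different athletes never share exactly the same name string:

```python
import pandas as pd

# Made-up rows: one row per medal, as in the medals dataset
toy = pd.DataFrame({
    "Year": [1976, 1976, 1976, 1980, 1980],
    "Athlete": ["Athlete A", "Athlete A", "Athlete B", "Athlete C", "Athlete C"],
    "Medal": ["Gold", "Silver", "Gold", "Bronze", "Gold"],
})

# Rows (medals) per year: multi-medal athletes are counted repeatedly
medal_rows = toy.groupby("Year")["Athlete"].count()

# Distinct athletes per year: each name counted once per year
unique_athletes = toy.groupby("Year")["Athlete"].nunique()

print(pd.DataFrame({"medal_rows": medal_rows, "unique_athletes": unique_athletes}))
```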

**2. What has been the increase in participating athletes over time, by gender?**

In [None]:
# Group by Year and Gender, count distinct athletes, and pivot genders into columns
gender_counts = df.groupby(['Year', 'Gender'])['Athlete'].nunique().unstack(fill_value=0)

# Rename columns for clarity (assumes exactly two gender labels, sorted with men first)
gender_counts.columns = ['Men', 'Women']

# Reset index to make Year a column
gender_counts = gender_counts.reset_index()

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Create the line plot
plt.figure(figsize=(12, 6))
sns.lineplot(data=gender_counts, x='Year', y='Men', label='Men')
sns.lineplot(data=gender_counts, x='Year', y='Women', label='Women')

# Customize the plot
plt.title('Number of Medal-Winning Athletes by Gender Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Medal-Winning Athletes')
plt.legend(title='Gender')

# Show the plot
plt.show()
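The plot shows the trend; to put a single number on the "increase" this question asks about, the per-gender counts can be compared between the first and last Games in the table. A minimal sketch on placeholder rows (not values from the dataset), using a `groupby`/`unstack` pivot similar to the cell above; a gender with zero athletes in the first year would give an infinite percentage:

```python
import pandas as pd

# Placeholder long-format rows (one row per medal); the values are made up
toy = pd.DataFrame({
    "Year":    [1976, 1976, 1976, 2008, 2008, 2008, 2008, 2008],
    "Gender":  ["Men", "Men", "Women", "Men", "Men", "Women", "Women", "Women"],
    "Athlete": ["M1", "M2", "W1", "M3", "M4", "W2", "W3", "W4"],
})

# Distinct athletes per Year and Gender, with one column per gender
by_gender = toy.groupby(["Year", "Gender"])["Athlete"].nunique().unstack(fill_value=0)

# Percentage change between the first and last year in the table, per gender
first_year, last_year = by_gender.index.min(), by_gender.index.max()
pct_change = (by_gender.loc[last_year] - by_gender.loc[first_year]) / by_gender.loc[first_year] * 100

print(by_gender)
print(pct_change.round(1))
```

On the notebook's data, `gender_counts.set_index('Year')` gives a table of the same shape as `by_gender`.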

**3. Which countries have been awarded the most medals?**

In [None]:
#group by country and count the medals
medal_counts = df.groupby('Country')['Medal'].count().reset_index()

#sort by medal count in descending order
medal_counts = medal_counts.sort_values('Medal', ascending=False)

#bar plot (wide figure so that every country label has room)
plt.figure(figsize=(24, 8))
plt.bar(medal_counts['Country'], medal_counts['Medal'], color='skyblue')

#customize the plot
plt.title('Total Olympic Medals by Country')
plt.xlabel('Country')
plt.ylabel('Total Medals')
plt.xticks(rotation=90, fontsize=7)

#add value labels on top of each bar
for i, v in enumerate(medal_counts['Medal']):
    plt.text(i, v, str(v), ha='center', va='bottom', fontsize=6)

#adjust layout and display the plot
plt.tight_layout()
plt.show()

In [None]:
#group by country and count the medals
medal_counts = df.groupby('Country')['Medal'].count().reset_index()

#sort by medal count in descending order and keep the top 20
top_20_countries = medal_counts.sort_values('Medal', ascending=False).head(20)

#bar plot
plt.figure(figsize=(12, 6))
plt.bar(top_20_countries['Country'], top_20_countries['Medal'], color='skyblue')

#customize the plot
plt.title('Top 20 Countries with the Most Olympic Medals')
plt.xlabel('Country')
plt.ylabel('Total Medals')
plt.xticks(rotation=45, ha='right')

#add value labels on top of each bar
for i, v in enumerate(top_20_countries['Medal']):
    plt.text(i, v, str(v), ha='center', va='bottom')

#adjust layout and display the plot
plt.tight_layout()
plt.show()
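One caveat when reading these totals: the dataset has one row per medal-winning athlete, so a team event (a relay, for example) contributes one row per team member. Counting rows therefore measures athlete-medals rather than medals awarded to a country. A sketch of counting each podium finish once, assuming the dataset also has `Discipline` and `Event_gender` columns (check with `df.columns`) and assuming that `MEDAL_KEY`, a hypothetical name, identifies one medal; with this key, two athletes from the same country tying for the same medal would also be collapsed into one. The rows are made up:

```python
import pandas as pd

# Made-up rows: a 4-person relay gold plus two individual medals
toy = pd.DataFrame({
    "Year": [2008] * 6,
    "Sport": ["Aquatics"] * 6,
    "Discipline": ["Swimming"] * 6,
    "Event": ["4X100M Freestyle"] * 4 + ["100M Freestyle", "200M Freestyle"],
    "Event_gender": ["M"] * 6,
    "Athlete": ["R1", "R2", "R3", "R4", "S1", "S2"],
    "Country": ["Country X"] * 4 + ["Country X", "Country Y"],
    "Medal": ["Gold"] * 4 + ["Silver", "Bronze"],
})

# Assumed (hypothetical) key identifying one medal awarded to one country
MEDAL_KEY = ["Year", "Sport", "Discipline", "Event", "Event_gender", "Country", "Medal"]

# Athlete-medal rows per country (what the cells above count)
athlete_medals = toy.groupby("Country")["Medal"].count()

# Team medals collapsed to a single medal per country and event
distinct_medals = toy.drop_duplicates(subset=MEDAL_KEY).groupby("Country")["Medal"].count()

print(pd.DataFrame({"athlete_medals": athlete_medals, "distinct_medals": distinct_medals}))
```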

**4. Which sports have had the highest number of medals awarded?**

In [None]:
#group by Sport and count the medals
sport_medal_counts = df.groupby('Sport')['Medal'].count().sort_values(ascending=False).reset_index()

#top 10 sports
top_10_sports = sport_medal_counts.head(10)

#bar plot
plt.figure(figsize=(12, 6))
sns.barplot(x='Sport', y='Medal', data=top_10_sports)

#customize the plot
plt.title('Top 10 Sports with the Highest Number of Medals Awarded')
plt.xlabel('Sport')
plt.ylabel('Number of Medals')
plt.xticks(rotation=45, ha='right')

#add value labels on top of each bar
for i, v in enumerate(top_10_sports['Medal']):
    plt.text(i, v, str(v), ha='center', va='bottom')

#adjust layout and display the plot
plt.tight_layout()
plt.show()
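For a single-column tally like this one, `value_counts()` gives the same counts as `groupby(...).count()` followed by a descending sort, in one call, once missing values have been removed as in the `dropna()` cleaning step. A small sketch on made-up rows (chosen without ties, since tied counts may be ordered differently by the two approaches):

```python
import pandas as pd

# Made-up rows: one medal per row
toy = pd.DataFrame({
    "Sport": ["Aquatics", "Athletics", "Aquatics", "Rowing", "Aquatics", "Athletics"],
    "Medal": ["Gold", "Silver", "Bronze", "Gold", "Silver", "Gold"],
})

# Approach used in the cell above
via_groupby = toy.groupby("Sport")["Medal"].count().sort_values(ascending=False)

# Equivalent single call, already sorted in descending order
via_value_counts = toy["Sport"].value_counts()

# Top-N works the same way, e.g. the top 2 sports
print(via_value_counts.head(2))
```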