# Report

## Exploring 120 Years of Olympic History: Trends in Athletes and Results

## Objective

```Analyzing Olympic Trends: A Data-Driven Exploration of Athlete Demographics, Sport Evolution, and Participation Over 120 Years```

The objective of this project is to examine trends in athlete ages, sex, and physical attributes (e.g., height and weight) across 120 years of Olympic history. This includes investigating how the number of participating countries has increased over time and analyzing the rise of female participation, particularly as female categories of sports were introduced—something that was not prevalent in the early 1920s.

Additionally, the project aims to explore the representation of smaller countries, examining their ability to win medals compared to historically dominant nations. Lastly, the analysis seeks to understand age diversity by studying how participants span across different age groups.

## Introduction

This is a historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016. The data contains 271116 rows and 15 columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events). 
The data can be interpreted in different ways; my aim is to understand how the Olympics have changed over these 120 years. I started by going through the columns to see which ones could be used to study those trends. A substantial share of values was missing in columns such as Age, Weight, Height, and Medals. These columns are essential for understanding demographic trends and for identifying the teams with the most wins, so the data had to be cleaned before visualization and analysis.

## Method

```Handling Missing Values```:
* Age, Height, and Weight: Missing values for these columns were replaced with median values to preserve the underlying trends and ensure meaningful analysis.
* Medals: Missing medals were assigned as "None" to maintain the integrity of the dataset and prevent the exclusion of any athlete records.
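
The two rules above can be sketched as follows. This is a minimal illustration, not necessarily the exact code used in the project: it assumes pandas, assumes the columns are named `Age`, `Height`, `Weight`, and `Medal`, and uses a small made-up sample instead of the real table. The helper name `fill_missing` is hypothetical.

```python
import numpy as np
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric gaps set to the column median
    and missing medals labelled "None"."""
    out = df.copy()
    for col in ["Age", "Height", "Weight"]:
        # Median of the observed values in this column.
        out[col] = out[col].fillna(out[col].median())
    out["Medal"] = out["Medal"].fillna("None")
    return out


# Made-up rows standing in for the athlete-events data.
sample = pd.DataFrame({
    "Age": [24.0, np.nan, 30.0],
    "Height": [180.0, 170.0, np.nan],
    "Weight": [np.nan, 60.0, 80.0],
    "Medal": ["Gold", np.nan, np.nan],
})
filled = fill_missing(sample)
print(filled)
```

A single overall median is the simplest choice; a median per sex or per sport would be an alternative if one global value distorts the distributions too much.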

```Data Cleaning```:
* Removing Duplicate Rows: Ensured data quality by removing duplicate records that could skew the analysis.
* Consistency Checks: Uniform capitalization for country names and renaming columns (e.g., "Team" to "Country") to maintain consistency.
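
A rough sketch of these cleaning steps, assuming pandas and a `Team` column as in the text. The function name `clean` and the sample rows are made up, and title case is only one possible reading of "uniform capitalization".

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, rename Team to Country,
    and normalise the capitalization of country names."""
    out = df.drop_duplicates().copy()
    out = out.rename(columns={"Team": "Country"})
    out["Country"] = out["Country"].str.strip().str.title()
    return out


# Made-up rows: the first two are exact duplicates.
sample = pd.DataFrame({
    "Name": ["A", "A", "B"],
    "Team": ["united states", "united states", "FINLAND"],
})
cleaned = clean(sample)
print(cleaned)
```

Rows that differ only in capitalization are not treated as duplicates here, because duplicates are dropped before the names are normalised; swapping the two steps would catch them.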


```Exploratory Data Analysis```:
* Grouping and summarizing data to identify patterns over time: For example, to understand how the popularity of different sports changed, I grouped the data by decade. Because sport trends do not change much within a few years, decade-level grouping makes it clearer how recent decades compare with earlier ones.
* Identifying outliers and anomalies: In a dataset this large, it is essential to identify outliers, some of which can then be disregarded.
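
Both ideas can be illustrated with a short pandas sketch. The column names `Year`, `Sport`, and `Age` are assumed, the rows are made up, and the 1.5 × IQR rule is just one common way to flag outliers, not necessarily the one used in this project. The helper names are hypothetical.

```python
import pandas as pd


def add_decade(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a Decade column, e.g. 1996 -> 1990."""
    out = df.copy()
    out["Decade"] = (out["Year"] // 10) * 10
    return out


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values more than k * IQR outside the quartiles."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    spread = q3 - q1
    return (values < q1 - k * spread) | (values > q3 + k * spread)


# Made-up rows standing in for the athlete-events data.
sample = pd.DataFrame({
    "Year": [1896, 1900, 1908, 2012, 2016],
    "Sport": ["Athletics", "Athletics", "Swimming", "Swimming", "Athletics"],
    "Age": [23, 25, 24, 26, 70],
})

# Number of athlete-event rows per decade and sport.
by_decade = add_decade(sample).groupby(["Decade", "Sport"]).size()
flags = iqr_outliers(sample["Age"])
print(by_decade)
print(sample[flags])
```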

### Understand dataset

```Data Wrangling```:
* Loading and Inspecting: Loaded the dataset and inspected its structure, including columns, data types, and missing value percentages.
* Copying the DataFrame: Created a working copy to ensure that any modifications did not affect the original dataset.
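
A minimal sketch of this step, assuming pandas. To keep it runnable on its own, it reads a tiny made-up CSV from memory; in the project the same call would point at the downloaded dataset file instead.

```python
import io

import pandas as pd

# Made-up CSV text standing in for the real dataset file.
csv_text = """ID,Name,Sex,Age,Height,Weight,Medal
1,A,M,24,180,80,Gold
2,B,F,,170,,
3,C,M,30,,75,
4,D,F,22,165,55,Silver
"""
raw = pd.read_csv(io.StringIO(csv_text))

# Work on a copy so the loaded data stays untouched.
df = raw.copy()

print(df.shape)
print(df.dtypes)

# Percentage of missing values per column.
missing_pct = df.isna().mean().mul(100).round(1)
print(missing_pct)
```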

### Validate Data

```Data Validation```: Checked for duplicates and inconsistencies, and ensured that all numeric columns were in a valid format, free of outliers, and properly standardized.
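
One way such checks could look in pandas is sketched below. The function name `validation_report` and the sample rows are made up for illustration, and the checks actually run in the project may have differed.

```python
import pandas as pd


def validation_report(df: pd.DataFrame, numeric_cols: list) -> dict:
    """Count duplicate rows and values that cannot be parsed as numbers."""
    report = {"duplicate_rows": int(df.duplicated().sum())}
    for col in numeric_cols:
        converted = pd.to_numeric(df[col], errors="coerce")
        # Present in the data, but not convertible to a number.
        bad = converted.isna() & df[col].notna()
        report[f"{col}_non_numeric"] = int(bad.sum())
    return report


# Made-up rows: one exact duplicate and one unparseable age.
sample = pd.DataFrame({
    "Name": ["A", "A", "B"],
    "Age": ["24", "24", "unknown"],
})
report = validation_report(sample, ["Age"])
print(report)
```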

## Story Telling

![Bar Graph for Gender Participation](bar_graph_sex.png)


```Participants Trends```: This plot breaks down participants by sex across the decades. Women first competed in the Olympics in 1900, but female participation only increased significantly from the 1960s onward, and the largest numbers of female athletes appear in the 2000s. The **1916 Olympics** were cancelled because of World War I, and the **1940 & 1944 Olympics** were cancelled because of World War II.

![Bar Plot](medal_gender.png)

```Bar Plot for Gender Participation```: There has been a steady increase in the number of participants in the Olympics over the years. Small bumps in the trend are noticeable due to higher participation in the Summer Olympics compared to the Winter Olympics, which naturally features fewer sports and athletes. Grouping the data by Year, Season, and Sex reveals distinct trends in gender participation. There has been a notable rise in female participation since the 1980s, especially for the Summer Olympics, where the gap between male and female athletes began to narrow significantly. A similar increase in female participation is also evident in the Winter Olympics starting in 1984.
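
The grouping by Year, Season, and Sex can be sketched as below. It assumes pandas and an `ID` column that identifies each athlete, so that someone entered in several events is counted once; the rows are made up.

```python
import pandas as pd

# Made-up rows: athletes 1 and 4 each appear in two events.
sample = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4, 4, 5],
    "Sex": ["M", "M", "F", "F", "M", "M", "M"],
    "Year": [1984] * 7,
    "Season": ["Summer", "Summer", "Summer", "Winter",
               "Winter", "Winter", "Summer"],
})

# Unique athletes per Games and sex, with one column per sex.
counts = (
    sample.groupby(["Year", "Season", "Sex"])["ID"]
    .nunique()
    .unstack("Sex", fill_value=0)
)
counts["Female share"] = counts["F"] / (counts["F"] + counts["M"])
print(counts)
```

Counting unique IDs rather than rows matters because each row in the dataset is an athlete-event pair, not an athlete.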

![Bar Plot](bar_plot_gender_output_expected.png)

```Bar Plot for Gender Representation Over Time:``` The same trend is visible in the plot above, which again shows a significant increase in the number of female athletes in recent years. The Summer Olympics also have significantly more participants, possibly because they include more sports events than the Winter Olympics.

---

![Heatmap](heatmap.png)

```Heatmap of Medals by Sport and Decade```: The heatmap shows how medal distribution has evolved across sports over time. Athletics and Swimming consistently dominated the medal count across decades, reflecting their popularity and wide range of events, while other sports saw fluctuating engagement.
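
The table behind a heatmap like this could be built roughly as follows. This is a sketch assuming pandas and the columns `Year`, `Sport`, and `Medal`, with non-winners marked either as missing or as the string "None"; the rows are made up.

```python
import pandas as pd

# Made-up rows standing in for the athlete-events data.
sample = pd.DataFrame({
    "Year": [1924, 1928, 1928, 2012, 2016, 2016],
    "Sport": ["Athletics", "Athletics", "Swimming",
              "Athletics", "Athletics", "Swimming"],
    "Medal": ["Gold", "None", "Silver", "Bronze", "Gold", "None"],
})

# Keep only rows where a medal was actually won.
won = sample["Medal"].notna() & (sample["Medal"] != "None")
medals = sample[won].copy()
medals["Decade"] = (medals["Year"] // 10) * 10

# Rows are sports, columns are decades, cells are medal counts.
heat = pd.crosstab(medals["Sport"], medals["Decade"])
print(heat)
```

The resulting sport-by-decade table can then be handed to whichever plotting library draws the heatmap.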

---

![Scatter Plot](scatterplot.png)

```Countries with higher Participation have higher Medal Counts```: In the Olympics, countries that send larger athlete delegations tend to win more medals. This positive correlation makes intuitive sense: more athletes mean more opportunities to compete in different events, which raises the probability of winning medals. By fielding athletes in a wide variety of sports, these countries are more likely to secure medals in at least a few disciplines.
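
The relationship can be quantified with a correlation coefficient. The sketch below assumes pandas, the renamed `Country` column, an `ID` column per athlete, and a `Medal` column where "None" marks no medal; the rows are made up, so the resulting value says nothing about the real data.

```python
import pandas as pd

# Made-up rows standing in for the athlete-events data.
sample = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Country": ["A", "A", "A", "B", "B", "C"],
    "Medal": ["Gold", "Silver", "Bronze", "Gold", "None", "None"],
})

# Delegation size and medal rows per country.
per_country = sample.groupby("Country").agg(
    athletes=("ID", "nunique"),
    medals=("Medal", lambda s: int((s.notna() & (s != "None")).sum())),
)

# Pearson correlation between delegation size and medal count.
r = per_country["athletes"].corr(per_country["medals"])
print(per_country)
print(round(r, 3))
```

Note that counting medal rows gives every member of a medal-winning team their own medal, which inflates totals for countries strong in team sports.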

---

![Medal Season](imgs/medal_season.png)

```Seasonal Trends for Medals```: The Summer Olympics have consistently attracted a larger number of participants and awarded more medals compared to the Winter Olympics, mainly due to a broader range of sports and events.

---

![BoxPlot](imgs/box_plot.png)

```BMI Distribution by Sport```: A box plot highlighted the diversity of physical requirements across sports, demonstrating how different body types excel in different disciplines. Sports like gymnastics favored leaner body types, whereas weightlifting showed a preference for higher BMI.
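
BMI is weight in kilograms divided by the square of height in metres. A minimal sketch of the calculation, assuming pandas and that `Height` is in centimetres and `Weight` in kilograms; the helper name `add_bmi` and the rows are made up.

```python
import pandas as pd


def add_bmi(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with BMI = weight (kg) / height (m) squared."""
    out = df.copy()
    out["BMI"] = out["Weight"] / (out["Height"] / 100) ** 2
    return out


# Made-up athletes; heights in cm, weights in kg.
sample = pd.DataFrame({
    "Sport": ["Gymnastics", "Gymnastics", "Weightlifting", "Weightlifting"],
    "Height": [160.0, 150.0, 170.0, 180.0],
    "Weight": [51.2, 45.0, 86.7, 97.2],
})
with_bmi = add_bmi(sample)

# Median BMI per sport, the centre line of each box in a box plot.
median_bmi = with_bmi.groupby("Sport")["BMI"].median()
print(median_bmi)
```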

![Bar Plot](imgs/top10winners.png)

```Top 10 Countries with most medal wins```:
* The United States and the Soviet Union have been dominant forces in the Olympics, with large delegation sizes and investments in sports programs leading to significant medal counts.
* Germany, Italy, France, and other European nations have had balanced success across all medal categories, indicating sustained performance in a diverse range of sports.
* Smaller nations like Finland and Australia have managed to compete among the top 10 by specializing in particular disciplines, highlighting that focused training programs can lead to significant success.
* These are mostly economically prosperous countries that can invest heavily in their athletes, which may explain why they have the greatest number of wins.
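
A ranking like the one plotted above could be produced along these lines. The sketch assumes pandas, the renamed `Country` column, and a `Medal` column where "None" marks no medal; the function name `top_countries` and the rows are made up.

```python
import pandas as pd


def top_countries(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Medal counts by type for the n countries with the most medals."""
    won = df[df["Medal"].notna() & (df["Medal"] != "None")]
    table = won.groupby(["Country", "Medal"]).size().unstack(fill_value=0)
    table["Total"] = table.sum(axis=1)
    return table.sort_values("Total", ascending=False).head(n)


# Made-up rows standing in for the athlete-events data.
sample = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "C"],
    "Medal": ["Gold", "Gold", "Silver", "Bronze", "None"],
})
top = top_countries(sample, n=2)
print(top)
```

Keeping one column per medal type makes it easy to draw the stacked or grouped bars that show how balanced each country's medals are.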

---

![Bar Plot](imgs/height_sex.png)

Male athletes tend to be taller, with the average height peaking around 180 cm, while female athletes peak around 165 cm. These differences reflect both biological factors and the specific physical requirements of different sports.
The analysis also highlights overlap in height for both genders, particularly in sports where height plays a less critical role, such as gymnastics or diving.

---

## Conclusion

The Olympics have grown not only in size and scope but also in terms of inclusivity and diversity. The increasing representation of female athletes and the rise of new sports have contributed to making the games more equitable and engaging.
Medal success is often a reflection of a nation's ability to invest in and nurture its athletes. Countries with larger athlete delegations generally perform better, but focused training and specialization allow even smaller nations to achieve greatness.
The analysis of athlete physical attributes across sports shows that different body types are favored in different disciplines, emphasizing the unique demands of each sport and the diversity of talents required to excel.

The Olympic Games serve as a reflection of global progress in sports, inclusivity, and competitive excellence. The trends observed in participation, medal success, and the evolution of sports underscore the dynamic nature of the Olympics—continually adapting to reflect societal changes and the pursuit of human excellence.



### References

* https://www.kaggle.com/datasets/heesoo37/120-years-of-olympic-history-athletes-and-results/data
* https://www.sports-reference.com/
* https://plotly.com/python/getting-started/

## Acknowledgment

I would like to thank Dr. Nerolu for her great support and guidance in exploring these techniques, which I believe will be helpful in the field I will be working in.