# 🏅 Tidy Data Project: Olympics 2008 Medalists
## 📌 Objective
This project applies **tidy data principles** to clean, reshape, and visualize a dataset of **2008 Olympics medalists**.

## 📌 Why Tidy Data?
A dataset is tidy when:
1. **Each variable** has its own column.
2. **Each observation** has its own row.
3. **Each type of observational unit** forms its own table.

With the data in a **tidy format**, filtering, grouping, and plotting can operate directly on columns, with no further reshaping needed before each analysis step.
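
To make the wide-to-long idea concrete, here is a minimal sketch on a toy table. The athlete names and the two column names are hypothetical and chosen only for illustration; they are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical wide table: one column per sport/gender combination (illustrative only).
wide = pd.DataFrame({
    "medalist_name": ["Athlete A", "Athlete B"],
    "male_archery": ["gold", None],
    "female_archery": [None, "silver"],
})

# melt() turns every non-id column into a (variable, value) pair, one pair per row.
long = wide.melt(id_vars=["medalist_name"], var_name="sport_gender", value_name="medal")

# Keep only the rows that record an actual medal.
tidy = long.dropna(subset=["medal"]).reset_index(drop=True)
print(tidy)
```

In the result, each row is one observation (an athlete winning a medal in one event column), which is the shape the three rules above describe.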

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Loading data
df = pd.read_csv("TidyData-Project/data/olympics_08_medalists.csv")

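# Display basic info (df.info() prints its summary directly and returns None)
print("Dataset Overview:\n")
df.info()

# Display first few rows (print is needed because this is not the last expression in the cell)
print(df.head())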

# Convert wide format to long format
df_long = df.melt(id_vars=["medalist_name"], var_name="sport_gender", value_name="medal")

# Drop rows with missing medal values
df_long = df_long.dropna(subset=["medal"])

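# Split "sport_gender" (e.g. "<gender>_<sport>") into separate gender and sport columns;
# values that do not start with "male_" or "female_" become NaN in both columns
df_long[["gender", "sport"]] = df_long["sport_gender"].str.extract(r'^(male|female)_(.*)')

# Clean sport names: underscores to spaces (literal match, not a regex), then title case
df_long["sport"] = df_long["sport"].str.replace("_", " ", regex=False).str.title()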

# Keep only necessary columns
df_long = df_long[["medalist_name", "sport", "gender", "medal"]].reset_index(drop=True)

# Display transformed data
df_long.head()
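
The gender/sport split above relies on `str.extract` with two capture groups. As a rough sketch of how that behaves, the snippet below runs the same pattern on a few made-up column names (these names are assumptions for illustration, not confirmed columns of the dataset) and shows one way to spot values the pattern does not match.

```python
import pandas as pd

# Hypothetical column names; the last one deliberately does not fit the pattern.
cols = pd.Series(["male_archery", "female_beach_volleyball", "mixed_doubles"])

# Two capture groups yield two columns: gender, then everything after the first underscore.
parts = cols.str.extract(r"^(male|female)_(.*)")
parts.columns = ["gender", "sport"]

# Same cleanup as in the cell above: underscores to spaces, then title case.
parts["sport"] = parts["sport"].str.replace("_", " ", regex=False).str.title()
print(parts)

# Non-matching names come back as NaN, so they can be listed for inspection.
unmatched = cols[parts["gender"].isna()]
print(unmatched.tolist())
```

A check like `unmatched` is optional, but it can help confirm that no column was silently dropped by the pattern before the `sport_gender` column is discarded.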