import pandas as pd

# Load one men's 1500m results file per Olympic Games, 2006 to 2022
files = [f"mens1500_{year}.csv" for year in range(2006, 2023, 4)]
dfs = [pd.read_csv(file) for file in files]

# Concatenate all result files
results_df = pd.concat(dfs, ignore_index=True)

# Read athlete details such as height and weight
skaters_df = pd.read_csv("speed_skaters.csv")

# Left merge on the athlete's name: every result row is kept, and rows whose name has no match get NaN for the skater columns
merged_df = results_df.merge(skaters_df, left_on="Athlete", right_on="Name", how="left").drop(columns=["Name"])

# Fill missing weights with the mean of the known weights across all merged rows
merged_df["Weight"] = merged_df["Weight"].fillna(merged_df["Weight"].mean())

# Boolean indexing: keep skaters taller than 1.80 m and heavier than 75 kg
filtered_df = merged_df[(merged_df["Height"] > 1.80) & (merged_df["Weight"] > 75)]

# Perform analysis: Calculate mean height and weight per country
analysis_df = filtered_df.groupby("Country")[["Height", "Weight"]].mean()

print(analysis_df.head())  # First few rows of the analysis


           Height     Weight
Country                     
CAN      1.851429  81.895731
CHN      1.880000  81.816092
FIN      1.880000  88.000000
FRA      1.830000  78.000000
GBR      1.870000  84.000000


### Explanation
I combined Olympic men's 1500m speed skating results from 2006 to 2022 with athlete details such as height and weight. After merging the files, I handled missing weight data by replacing the NaNs with the mean weight of the merged rows.
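
A minimal sketch of the merge-and-impute step on hypothetical toy data (the names, times, and measurements below are invented for illustration, not taken from the real CSV files). It also passes `indicator=True`, which the notebook code does not, to show which result rows found no matching skater; those rows end up with NaN weights that the mean imputation then fills.

```python
import pandas as pd

# Hypothetical toy data standing in for the real results and skater CSV files
results = pd.DataFrame({
    "Athlete": ["Skater A", "Skater B", "Skater C", "Skater D"],
    "Time": [105.1, 105.8, 106.4, 107.0],  # hypothetical column
})
skaters = pd.DataFrame({
    "Name": ["Skater A", "Skater B", "Skater D"],
    "Height": [1.85, 1.78, 1.90],
    "Weight": [82.0, None, 88.0],
})

# Left merge keeps every result row; indicator=True adds a "_merge" column
# recording whether each athlete name found a match in the skater table
merged = results.merge(
    skaters, left_on="Athlete", right_on="Name", how="left", indicator=True
).drop(columns=["Name"])
unmatched = merged.loc[merged["_merge"] == "left_only", "Athlete"].tolist()

# Mean imputation: every NaN weight (missing in the skater table, or caused by
# an unmatched name) is replaced by the mean of the observed weights
merged["Weight"] = merged["Weight"].fillna(merged["Weight"].mean())

print(unmatched)
print(merged[["Athlete", "Height", "Weight"]])
```

Here the observed weights are 82 and 88, so both missing values become 85.0. Skater C still has no height, because only the weight column is imputed.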

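Then, I filtered the data to focus only on skaters taller than 1.80 m and heavier than 75 kg, i.e. the bigger athletes. To analyze this group, I used the split-apply-combine approach (`groupby("Country")` followed by `mean()`) to compute the mean height and weight of the athletes from each country. Because each row is a race result, an athlete who competed at several Games counts once per appearance in these means.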
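
The filter and the grouping can be illustrated the same way. This sketch uses hypothetical rows (the values are invented, with height assumed in metres and weight in kilograms) and writes the split-apply-combine steps out by hand next to the `groupby` call, purely to show what `groupby` does; it is not how the notebook computed its table.

```python
import pandas as pd

# Hypothetical toy rows (assumed units: height in m, weight in kg)
df = pd.DataFrame({
    "Country": ["NED", "NED", "NOR", "NOR", "CAN"],
    "Height": [1.86, 1.78, 1.82, 1.90, 1.84],
    "Weight": [84.0, 80.0, 76.0, 90.0, 74.0],
})

# Boolean indexing: each comparison yields a boolean Series and & combines them
# element-wise; the parentheses are required because & binds tighter than >
mask = (df["Height"] > 1.80) & (df["Weight"] > 75)
big = df[mask]

# Split-apply-combine with groupby: split by Country, apply mean, combine
by_country = big.groupby("Country")[["Height", "Weight"]].mean()

# The same three steps written out by hand, to show what groupby does
manual = pd.DataFrame({
    country: group[["Height", "Weight"]].mean()  # apply: mean of each group
    for country, group in big.groupby("Country")  # split: one group per country
}).T  # combine: one row per country

print(by_country)
```

Only NED and NOR survive the filter here (the CAN row fails the weight condition), and `manual` holds the same values as `by_country`.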