# Athlete Filtering

*IMPORTANT: This notebook reads the **..\data\processed\athletes_overview.csv** file, which is created by the **..\src\data\athletes-overview.py** script. Run that script first so that the file exists before running this notebook.*
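
Checking for the input file before loading it makes a missing file fail early with a clear message. The sketch below is only an illustration: the helper name `missing_inputs` is hypothetical and not part of the project, and it assumes the notebook is run from its own directory so that the relative path above resolves as written.

```python
from pathlib import Path

# Path taken from the note above; pathlib joins the parts with the
# separator of the current operating system.
OVERVIEW_CSV = Path("..") / "data" / "processed" / "athletes_overview.csv"


def missing_inputs(paths):
    """Return the entries of `paths` that do not exist as files (hypothetical helper)."""
    return [Path(p) for p in paths if not Path(p).is_file()]


missing = missing_inputs([OVERVIEW_CSV])
if missing:
    names = ", ".join(str(p) for p in missing)
    print(f"Missing input file(s): {names}. Run athletes-overview.py first.")
```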

This notebook filters the full GoldenCheetah athlete overview to select athletes for further analysis. We keep athletes who have been active over a long period and who ride frequently.

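Our current filtering criteria, applied to the `duration` and `rideFrequency` columns with all bounds inclusive, are:
1. $\text{duration} \geq 183 \text{ days}$ (roughly half a year)
2. $0.5 \text{ rides/day} \leq \text{frequency} \leq 1.1 \text{ rides/day}$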
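
The two criteria can be sketched as a small reusable function. This is a minimal illustration rather than the notebook's own code: the function name `filter_athletes`, the constant names, and the toy rows are hypothetical. Only the column names `duration` and `rideFrequency` and the thresholds come from this notebook.

```python
import pandas as pd

# Thresholds from the criteria above (all bounds inclusive).
MIN_DURATION_DAYS = 183
MIN_RIDES_PER_DAY = 0.5
MAX_RIDES_PER_DAY = 1.1


def filter_athletes(overview: pd.DataFrame) -> pd.DataFrame:
    """Keep rows that satisfy both the duration and the frequency criterion."""
    long_enough = overview["duration"] >= MIN_DURATION_DAYS
    # Series.between is inclusive on both ends by default.
    frequent = overview["rideFrequency"].between(MIN_RIDES_PER_DAY, MAX_RIDES_PER_DAY)
    return overview.loc[long_enough & frequent].reset_index(drop=True)


# Made-up rows for illustration only (not GoldenCheetah data).
toy = pd.DataFrame(
    {
        "id": ["a", "b", "c", "d"],
        "duration": [400, 100, 365, 183],
        "rideFrequency": [0.8, 0.9, 1.5, 0.5],
    }
)
kept = filter_athletes(toy)
print(kept["id"].tolist())  # ['a', 'd']
```

Athlete `b` is dropped by the duration criterion and `c` by the upper frequency bound, while `d` sits exactly on both lower bounds and is kept.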

In [68]:
# Importing the required libraries
import pandas as pd

In [69]:
# Reading the data from the file
df = pd.read_csv(r"..\data\processed\athletes_overview.csv")

In [70]:
# Displaying dataframe information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6576 entries, 0 to 6575
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             6576 non-null   object 
 1   gender         6576 non-null   object 
 2   yob            6576 non-null   int64  
 3   numberOfRides  6576 non-null   int64  
 4   duration       6576 non-null   int64  
 5   rideFrequency  6576 non-null   float64
dtypes: float64(1), int64(3), object(2)
memory usage: 308.4+ KB


In [71]:
# Keeping athletes with at least 183 days of activity and
# between 0.5 and 1.1 rides/day (bounds inclusive)
df = df.loc[
    (df["duration"] >= 183) & df["rideFrequency"].between(0.5, 1.1)
].reset_index(drop=True)

In [72]:
# Displaying filtered dataframe information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1256 entries, 0 to 1255
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             1256 non-null   object 
 1   gender         1256 non-null   object 
 2   yob            1256 non-null   int64  
 3   numberOfRides  1256 non-null   int64  
 4   duration       1256 non-null   int64  
 5   rideFrequency  1256 non-null   float64
dtypes: float64(1), int64(3), object(2)
memory usage: 59.0+ KB


In [73]:
# Saving the filtered dataframe to a csv file
df.to_csv(r"..\data\interim\athletes_overview_filtered.csv", index=False)

# Female Focus

Here we count how many female athletes remain in the filtered dataframe.

In [74]:
# Counting the number of females in the filtered dataframe
num_females = int((df["gender"] == "F").sum())
print(f"There are {num_females} females in the filtered dataframe.")

There are 35 females in the filtered dataframe.
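
The raw count is easier to interpret as a share of the filtered set. A minimal sketch follows. The helper name `gender_shares` is hypothetical and the toy frame is made-up illustration data. The final calculation uses only the counts printed in this notebook (35 female athletes among 1256 filtered athletes).

```python
import pandas as pd


def gender_shares(frame: pd.DataFrame) -> pd.Series:
    """Return the fraction of rows per value of the `gender` column."""
    return frame["gender"].value_counts(normalize=True)


# Made-up rows for illustration only (not GoldenCheetah data).
toy = pd.DataFrame({"gender": ["M", "M", "M", "F"]})
shares = gender_shares(toy)
print(shares["F"])  # 0.25

# Share of female athletes using the counts reported in this notebook.
female_share = 35 / 1256
print(f"{female_share:.1%}")  # 2.8%
```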
