Description
- Day: 04
- Exercise: 02
In this exercise, there is a special hint:
Hint: here and further, if needed, drop duplicated sportspeople to count only unique ones.
Beware to call the dropping function at the right moment and with the right parameters, in order not to omit any individuals.
I eventually found that the result in the given example is obtained by removing rows that share the same name.
However, I think duplicated people should be identified by their IDs rather than by their names, since two different athletes can share a name.
Examples
import pandas as pd
from FileLoader import FileLoader


def proportionBySport_ID(df: pd.DataFrame, yr: int, sport: str, gdr: str) -> float:
    df = df[(df["Year"]==yr) & (df["Sex"]==gdr)]
    df = df[~df.duplicated(subset=["ID"])] # <-- By ID
    df_res = df[df["Sport"]==sport]
    return (df_res.shape[0] / df.shape[0])


def proportionBySport_Name(df: pd.DataFrame, yr: int, sport: str, gdr: str) -> float:
    df = df[(df["Year"]==yr) & (df["Sex"]==gdr)]
    df = df[~df.duplicated(subset=["Name"])] # <-- By Name
    df_res = df[df["Sport"]==sport]
    return (df_res.shape[0] / df.shape[0])


if __name__ == "__main__":
    loader = FileLoader()
    data = loader.load('./resources/athlete_events.csv')
    print(proportionBySport_ID(data, 2004, 'Tennis', 'F')) # 0.019302325581395347
    print(proportionBySport_Name(data, 2004, 'Tennis', 'F')) # 0.01935634328358209
    print(0.01935634328358209, "-> Example's result")

==> prints:
Loading dataset of dimensions 271116 x 15
0.019302325581395347
0.01935634328358209
0.01935634328358209 -> Example's result
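To illustrate why the two approaches can disagree, here is a minimal sketch on made-up data. The toy rows and the helper name names_with_multiple_ids are assumptions for illustration only; they are not taken from athlete_events.csv or from the exercise. It shows that dropping duplicates by Name merges two distinct athletes who share a name, while dropping by ID keeps them apart.

```python
import pandas as pd

# Hypothetical toy data (not from athlete_events.csv):
# ID 1 appears twice because the same athlete entered two events;
# IDs 2 and 3 are two different athletes who happen to share a name.
toy = pd.DataFrame({
    "ID":    [1, 1, 2, 3],
    "Name":  ["Alex Kim", "Alex Kim", "Sam Lee", "Sam Lee"],
    "Sex":   ["F", "F", "F", "F"],
    "Year":  [2004, 2004, 2004, 2004],
    "Sport": ["Tennis", "Tennis", "Tennis", "Judo"],
})

# Deduplicate by ID: one row per athlete.
unique_by_id = toy.drop_duplicates(subset=["ID"])
# Deduplicate by Name: athletes sharing a name collapse into one row.
unique_by_name = toy.drop_duplicates(subset=["Name"])

print(len(unique_by_id))    # 3
print(len(unique_by_name))  # 2


def names_with_multiple_ids(df: pd.DataFrame) -> pd.Series:
    """Return, for each name used by more than one ID, the number of IDs."""
    ids_per_name = df.groupby("Name")["ID"].nunique()
    return ids_per_name[ids_per_name > 1]


shared = names_with_multiple_ids(toy)
print(shared)
```

Running the same helper on the filtered dataset (same Year and Sex) should show which names are responsible for the gap between the two results above, assuming the ID column really is unique per athlete.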
MedAymenF