ProportionBySport by ID (and not by Name)

* Day: 04
* Exercise: 02

In this exercise, there is a special hint:
```
Hint: here and further, if needed, drop duplicated sportspeople to count only unique ones.
Beware to call the dropping function at the right moment and with the right parameters, in order not to omit any individuals.
```
I eventually found that the result in the given example is obtained by dropping people who have the **same name**.
However, I think duplicated people should be identified by their **IDs** rather than their names: the two approaches give different results on this dataset.

**Examples**

```python
import pandas as pd
from FileLoader import FileLoader

def proportionBySport_ID(df: pd.DataFrame, yr: int, sport: str, gdr: str) -> float:
    df = df[(df["Year"]==yr) & (df["Sex"]==gdr)]
    df = df[~df.duplicated(subset=["ID"])] # <-- By ID
    df_res = df[df["Sport"]==sport]
    return (df_res.shape[0] / df.shape[0])

def proportionBySport_Name(df: pd.DataFrame, yr: int, sport: str, gdr: str) -> float:
    df = df[(df["Year"]==yr) & (df["Sex"]==gdr)]
    df = df[~df.duplicated(subset=["Name"])] # <-- By Name
    df_res = df[df["Sport"]==sport]
    return (df_res.shape[0] / df.shape[0])

if __name__ == "__main__":
    loader = FileLoader()
    data = loader.load('./resources/athlete_events.csv')
    print(proportionBySport_ID(data, 2004, 'Tennis', 'F')) # 0.019302325581395347
    print(proportionBySport_Name(data, 2004, 'Tennis', 'F')) # 0.01935634328358209
    print(0.01935634328358209, "-> Example's result")
```
==> prints:
```
Loading dataset of dimensions 271116 x 15
0.019302325581395347
0.01935634328358209
0.01935634328358209 -> Example's result
```
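To illustrate why the two results can differ, here is a minimal sketch on a toy DataFrame. The rows are made up for illustration and are not taken from `athlete_events.csv`, and the helpers `names_with_several_ids` and `proportion` are hypothetical names, not part of the exercise. The sketch assumes that `ID` uniquely identifies an athlete, so two rows with the same `Name` but different `ID`s are two different people.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from athlete_events.csv.
# IDs 1 and 2 are assumed to be two different athletes who share a name.
toy = pd.DataFrame({
    "ID":    [1, 1, 2, 3, 4],
    "Name":  ["Ana Silva", "Ana Silva", "Ana Silva", "Li Wei", "Mia Roth"],
    "Sport": ["Tennis", "Tennis", "Judo", "Judo", "Tennis"],
})

def names_with_several_ids(df: pd.DataFrame) -> pd.Series:
    """Count distinct IDs per name, keeping only names shared by several IDs."""
    ids_per_name = df.groupby("Name")["ID"].nunique()
    return ids_per_name[ids_per_name > 1]

def proportion(df: pd.DataFrame, sport: str, key: str) -> float:
    """Share of unique individuals (deduplicated on `key`) practising `sport`."""
    unique = df.drop_duplicates(subset=[key])
    return (unique["Sport"] == sport).sum() / len(unique)

shared = names_with_several_ids(toy)
by_id = proportion(toy, "Tennis", "ID")      # 4 unique people, 2 in Tennis
by_name = proportion(toy, "Tennis", "Name")  # 3 unique names, 2 in Tennis

print(shared.to_dict())
print(by_id, by_name)
```

In this toy case, deduplicating by name silently drops the athlete with ID 2, which shrinks the denominator and changes the proportion. Running something like `names_with_several_ids` on the filtered real data would be one way to check whether the same thing happens there.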
