Minutes estimation dataset for Hackweek #573

Merged (5 commits) on Aug 7, 2023

@rchan26 rchan26 commented Jun 5, 2023

Creates a dataset for the minutes estimation task for hack week.

  • Fixed an error in the TransferMarkt scraper where it failed to recognise that a player was in the youth or reserve team of a Premier League club (e.g. it didn't recognise that playing for Arsenal U18 means you'd be able to play for Arsenal)
  • Re-ran the TransferMarkt scraper to update the absence CSVs
  • Created a dataset with minutes played for each player and any absence reason (according to TransferMarkt) in airsenal/data/minutes_estimation_challenge.csv
    • The notebook that creates this dataset is notebooks/minutes_estimation_data.ipynb
  • Changed the get_player_history_df function in airsenal/framework/prediction_utils.py so that it can return the player history for all players (if all_players=True) and skip creating blank entries (if fill_blank=False). The defaults, all_players=False and fill_blank=True, keep the original behaviour
    • The dataframe now queries the absence table to check whether there was an absence for a game (according to data scraped from TransferMarkt)
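
To illustrate the scraper fix in the first bullet, here is a minimal sketch of mapping a youth or reserve side back to its parent club. This is not the code from this PR: the function names, the suffix patterns, and the club set are all hypothetical, and the real scraper may match teams differently.

```python
import re

# Hypothetical list of youth/reserve markers; the actual scraper may
# recognise a different set or use another matching strategy entirely.
_YOUTH_RESERVE_SUFFIX = re.compile(
    r"\s+(?:U\d{2}|Under[- ]\d{2}|Reserves)$", re.IGNORECASE
)


def parent_club(team_name):
    """Strip a trailing youth/reserve marker, e.g. 'Arsenal U18' -> 'Arsenal'."""
    return _YOUTH_RESERVE_SUFFIX.sub("", team_name.strip())


def plays_for_premier_league_club(team_name, premier_league_clubs):
    """True if the team, or the senior side it belongs to, is in the given set."""
    return parent_club(team_name) in premier_league_clubs


# Illustrative club set only, not the list used by the scraper.
clubs = {"Arsenal", "Chelsea"}
print(plays_for_premier_league_club("Arsenal U18", clubs))
```

Normalising the name before the lookup keeps the membership test itself unchanged, so senior-team names pass through untouched.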

Warning: airsenal/data/minutes_estimation_challenge.csv may contain some errors where the TransferMarkt data was incorrect, or where the way we collect that data was incorrect. I have found and deleted a few, but there are probably more...
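
One way to hunt for the remaining errors might be to flag rows that record an absence reason but also show minutes played, since those two facts may contradict each other. The sketch below assumes column names `player`, `minutes` and `absence_reason`, which are guesses and not taken from the actual CSV, and it runs on made-up sample rows instead of the real file.

```python
import csv
import io


def find_suspicious_rows(rows, minutes_col="minutes", reason_col="absence_reason"):
    """Return rows that have an absence reason and a positive minutes value."""
    suspicious = []
    for row in rows:
        reason = (row.get(reason_col) or "").strip()
        try:
            minutes = float(row.get(minutes_col) or 0)
        except ValueError:
            continue  # skip rows whose minutes value cannot be parsed
        if reason and minutes > 0:
            suspicious.append(row)
    return suspicious


# Made-up rows for illustration; the column names are assumptions.
sample = io.StringIO(
    "player,gameweek,minutes,absence_reason\n"
    "Player A,1,90,\n"
    "Player B,1,0,Knee injury\n"
    "Player C,1,75,Suspended\n"
)
flagged = find_suspicious_rows(csv.DictReader(sample))
print([row["player"] for row in flagged])
```

A flagged row is only a candidate for review: an absence that ends on the day of a match could legitimately overlap with minutes played.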

@rchan26 rchan26 commented Aug 7, 2023

Can we merge this, @jack89roberts? It didn't get used in the hackweek, but it could be useful for future work.

@jack89roberts

Sure 👍

@jack89roberts jack89roberts merged commit 3e60987 into develop Aug 7, 2023
2 checks passed
@jack89roberts jack89roberts deleted the minutes_estimation branch August 7, 2023 15:08