
Python library for scraping & analyzing sports statistics


chickenandstats/chickenstats


chickenstats



About

chickenstats is a Python package for scraping & analyzing sports data. With just a few lines of code:

  • Scrape & manipulate data from various NHL endpoints, leveraging chickenstats.chicken_nhl, which includes a proprietary xG model for shot quality metrics
  • Augment play-by-play data & generate custom aggregations from raw csv files downloaded from Evolving-Hockey (subscription required) with chickenstats.evolving_hockey

For more in-depth explanations, tutorials, & detailed reference materials, consult the Documentation.


Compatibility

chickenstats requires Python 3.10 or greater & runs on the latest stable versions of Linux, macOS, & Windows operating systems.
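A script that depends on chickenstats can check the interpreter before importing anything. The sketch below is an illustrative guard. The helper name `meets_minimum_python` is hypothetical and not part of chickenstats. It compares the running interpreter against the 3.10 minimum stated above.

```python
import sys

# Minimum interpreter version from the compatibility note above
MINIMUM_PYTHON = (3, 10)


def meets_minimum_python(minimum=MINIMUM_PYTHON) -> bool:
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    # Slicing sys.version_info gives a plain (major, minor) tuple,
    # and tuples compare element by element
    return sys.version_info[:2] >= tuple(minimum)


if not meets_minimum_python():
    print(f"Warning: chickenstats requires Python 3.10+, found {sys.version.split()[0]}")
```

In a real script you might raise `SystemExit` instead of printing a warning, so the failure happens before any chickenstats import.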


Installation

Very simple - install using PyPI. Best practice is to develop in an isolated virtual environment (conda or otherwise), but who's a chicken to judge?

pip install chickenstats

To confirm installation & check that you have the latest version (1.7.8):

pip show chickenstats
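The same check can be done from inside Python with the standard library's `importlib.metadata`, which reads the installed package metadata that `pip show` also reports. The sketch below uses `installed_version`, an illustrative helper name that is not part of chickenstats.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        # Raised when no distribution with that name is installed
        return None


# Prints the installed version string, or None if chickenstats is not installed
print(installed_version("chickenstats"))
```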

Usage

chickenstats is structured as two underlying modules, each used with a different data source:

  • chickenstats.chicken_nhl
  • chickenstats.evolving_hockey

The package is under active development - features will be added or modified over time.

chicken_nhl

The chickenstats.chicken_nhl module scrapes & manipulates data directly from various NHL endpoints, with outputs including schedule & game results, rosters, & play-by-play data.

The below example scrapes the schedule for the Nashville Predators, keeps the games that have been completed, extracts their game IDs, then scrapes play-by-play data for the first ten.

from chickenstats.chicken_nhl import Season, Scraper

# Create a Season object for the 2023 season
season = Season(2023)

# Download the Nashville schedule & keep only completed games (game_state "OFF")
nsh_schedule = season.schedule('NSH')
nsh_schedule_reg = nsh_schedule.loc[nsh_schedule.game_state == "OFF"].reset_index(drop=True)

# Extract the first ten game IDs
game_ids = nsh_schedule_reg.game_id.tolist()[:10]

# Create a scraper object using the game IDs
scraper = Scraper(game_ids)

# Scrape play-by-play data
play_by_play = scraper.play_by_play
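Filtering on `game_state` selects finished games rather than a particular game type. If you need to exclude pre-season games explicitly, one option relies on the NHL's game-ID convention. In that convention, the two digits after the four-digit season year encode the game type: 01 for pre-season, 02 for regular season, and 03 for playoffs. The helpers below are a hypothetical sketch built on that convention. They are not chickenstats functions.

```python
from typing import List, Optional

# NHL game IDs look like 2023020001: season start year (2023),
# game type (02), then a four-digit game number (0001)
REGULAR_SEASON = 2


def game_type(game_id: int) -> int:
    """Extract the two-digit game-type code from an NHL game ID."""
    # Drop the four-digit game number, then keep the last two remaining digits
    return (game_id // 10_000) % 100


def regular_season_ids(game_ids: List[int], limit: Optional[int] = None) -> List[int]:
    """Keep regular-season game IDs in their original order, optionally truncated."""
    kept = [gid for gid in game_ids if game_type(gid) == REGULAR_SEASON]
    return kept if limit is None else kept[:limit]
```

With the schedule from the example, `regular_season_ids(nsh_schedule_reg.game_id.tolist(), limit=10)` would return up to ten completed regular-season game IDs to pass to `Scraper`.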

evolving_hockey

The chickenstats.evolving_hockey module manipulates raw csv files downloaded from Evolving-Hockey. Starting from their original shifts & play-by-play data, users can add extra information & aggregate individual & on-ice statistics, including high-danger shooting events, xG & adjusted xG, faceoffs, & changes.

import pandas as pd
from chickenstats.evolving_hockey import prep_pbp, prep_stats, prep_lines

# The prep_pbp function takes the raw event and shifts dataframes
raw_shifts = pd.read_csv('./raw_shifts.csv')
raw_pbp = pd.read_csv('./raw_pbp.csv')

play_by_play = prep_pbp(raw_pbp, raw_shifts)

# You can use the play_by_play dataframe in various aggregations
# These are individual game statistics, including on-ice & usage,
# accounting for teammates & opposition on-ice
individual_game = prep_stats(play_by_play, level='game', teammates=True, opposition=True)

# These are game statistics for forward-line combinations, accounting for opponents on-ice
forward_lines = prep_lines(play_by_play, level='game', position='f', opposition=True)
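The toy sketch below makes on-ice aggregation concrete. It credits each event's xG to every skater on the ice, as xG for (xgf) if the skater is on the shooting team and as xG against (xga) if the skater is on the defending team. It is not how chickenstats implements `prep_stats`. The columns `xg`, `team_on`, & `opp_on` are hypothetical, simplified stand-ins rather than the actual Evolving-Hockey or `prep_pbp` schema.

```python
import pandas as pd


def on_ice_xg(events: pd.DataFrame) -> pd.DataFrame:
    """Sum on-ice xG for (xgf) & against (xga) per player from a toy event table."""
    # explode() yields one row per (event, player); skaters on the shooting team get xG for
    xgf = events[["team_on", "xg"]].explode("team_on").groupby("team_on")["xg"].sum().rename("xgf")
    # Skaters on the defending team get the same xG as xG against
    xga = events[["opp_on", "xg"]].explode("opp_on").groupby("opp_on")["xg"].sum().rename("xga")
    # Outer-join on player so anyone who appears on only one side still gets a row
    stats = pd.concat([xgf, xga], axis=1).fillna(0.0)
    stats.index.name = "player"
    return stats.reset_index()


# Hypothetical events: xG of each shot plus the skaters on the ice for each side
toy = pd.DataFrame(
    {
        "xg": [0.25, 0.5, 0.125],
        "team_on": [["A", "B"], ["A", "C"], ["X"]],
        "opp_on": [["X", "Y"], ["X"], ["A", "B"]],
    }
)
print(on_ice_xg(toy))
```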

Acknowledgements

chickenstats wouldn't be possible without the support & efforts of countless others. I am obviously extremely grateful, even if there are too many of you to thank individually. However, this chicken will do his best.

First & foremost is my wife - the lovely Mrs. Chicken has been patient, understanding, & supportive throughout the countless hours of development, sometimes to her detriment.

Sincere apologies to the friends & family who have put up with me since my entry into Python, programming, & data analysis in January 2021. Thank you for being excited for me & with me throughout all of this, especially when you've had to fake it...

Thank you to the hockey analytics community on (the artist formerly known as) Twitter. You're producing & reacting to cutting-edge statistical analyses, while providing a supportive, welcoming environment for newcomers. Thank y'all for everything that you do. This is by no means exhaustive, but there are a few people worth calling out specifically:

I'm also grateful to the thriving community of Python educators & open-source contributors on Twitter. Thank y'all for your knowledge & practical advice. Matt Harrison (@mharrison) deserves a special mention for his books on pandas and XGBoost, both of which are available at his online store. Again, not exhaustive, but others worth thanking individually:

Finally, this library depends on a host of other open-source packages. chickenstats is possible because of the efforts of thousands of individuals, represented below:
