This is a python module to scrape basketball-reference.com and convert various stats into usable data structures for analysis


adding TODO.md

for now, listing some advanced stats to add. can't do all of this
because we don't have the points scored in the game while the player was in,
so all the % stats (assist percentage, etc.) can't really be calculated.
we can, however, calculate PER, TS%, eFG%, etc.
latest commit 87e51d7e3f
andrew giessel authored
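As a rough illustration of the box-score stats that commit message says are within reach, here is a minimal sketch of TS% and eFG% using their commonly cited definitions. The function names are hypothetical and are not part of basketballcrawler; the 0.44 free-throw weight is the usual convention, assumed here.

```python
def true_shooting_pct(points, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)).

    Hypothetical helper, not part of basketballcrawler.
    Returns 0.0 when the player took no shots, to avoid dividing by zero.
    """
    attempts = 2 * (fga + 0.44 * fta)
    return points / attempts if attempts else 0.0


def effective_fg_pct(fg, fg3, fga):
    """eFG% = (FG + 0.5 * 3P) / FGA.

    Hypothetical helper, not part of basketballcrawler.
    Returns 0.0 when there were no field goal attempts.
    """
    return (fg + 0.5 * fg3) / fga if fga else 0.0


if __name__ == "__main__":
    # Made-up single-game line, for illustration only.
    print(true_shooting_pct(points=30, fga=20, fta=10))
    print(effective_fg_pct(fg=10, fg3=2, fga=20))
```

Both need only a player's own box-score line, which is why they are feasible where the percentage stats are not.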
README.md

basketballcrawler

This is a python module to scrape basketball-reference.com and convert various stats into usable data structures for analysis.

Here is a link to a sample IPython Notebook file demonstrating the library.

Requirements

Beautiful Soup (version 4 or above)

Pandas (0.11 or above)

Usage

Still developing the API. Right now you can get a list of all player overview URLs, generate a list of game log URLs for a given player, and convert that list into a pandas DataFrame.
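To make the middle step concrete, here is a minimal sketch of turning one player overview URL into per-season game log URLs. This is not the module's API: `gamelog_urls` is a hypothetical helper, and the URL layout (`.../players/j/jamesle01.html` for the overview, `.../gamelog/<season>/` for the logs) is an assumption about the site that may have changed.

```python
def gamelog_urls(overview_url, seasons):
    """Build one game log URL per season from a player overview URL.

    Hypothetical helper. Assumes the overview URL ends in ".html" and that
    game logs live under "<overview stem>/gamelog/<season>/".
    """
    suffix = ".html"
    if not overview_url.endswith(suffix):
        raise ValueError("expected an overview URL ending in .html")
    stem = overview_url[: -len(suffix)]
    return ["{}/gamelog/{}/".format(stem, season) for season in seasons]


if __name__ == "__main__":
    # Example player ID; the URL pattern itself is the assumption here.
    overview = "https://www.basketball-reference.com/players/j/jamesle01.html"
    for url in gamelog_urls(overview, [2012, 2013]):
        print(url)
```

Each resulting page could then be parsed into one DataFrame per season and the pieces combined with `pandas.concat`.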

Notes

players.json was generated on 03/09/2013 by buildPlayerDictionary() and savePlayerDictionary(). It is a good way to jump-start your analysis and can be loaded with ... loadPlayerDictionary(). Note that it's a pretty large file (13M). I'd recommend building your own fresh copy. Building one takes about 10 minutes because the web requests are spaced out.
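For a sense of what building and caching a fresh copy involves, here is a minimal sketch of the two ideas in this note: pausing between requests and round-tripping the result through JSON. These are hypothetical stand-ins, not the bodies of buildPlayerDictionary(), savePlayerDictionary(), or loadPlayerDictionary(); the one-second default delay and the shape of the dictionary are assumptions.

```python
import json
import time


def fetch_all(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests.

    Hypothetical helper. `fetch` is any callable that takes a URL and returns
    data; `sleep` is injectable so the pause can be skipped in tests.
    """
    results = {}
    for index, url in enumerate(urls):
        if index:  # no pause before the first request
            sleep(delay_seconds)
        results[url] = fetch(url)
    return results


def save_players(players, path="players.json"):
    """Write the player dictionary to disk as JSON (hypothetical helper)."""
    with open(path, "w") as handle:
        json.dump(players, handle)


def load_players(path="players.json"):
    """Read a player dictionary back from a JSON file (hypothetical helper)."""
    with open(path) as handle:
        return json.load(handle)
```

The delay is what makes a full build slow, so caching the result in a JSON file and reloading it avoids repeating the crawl.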

TODO

I'm considering turning this into a class, instead of using a dictionary, so one doesn't have to pass around this dictionary all the time. Hesitant.

Local Database construction.

League-wide statistics.

Extract other key information from the player overview page. Position might be an especially useful variable for supervised learning, and height and weight are clearly important variables as well.
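One way the class idea and the extra player fields from the list above might fit together is sketched below. Everything here is hypothetical: `Player`, `PlayerRegistry`, and the field names are illustrations rather than a planned design, and `dataclasses` needs Python 3.7 or later.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Player:
    """One player's URLs plus optional overview-page details (hypothetical)."""
    name: str
    overview_url: str
    gamelog_urls: List[str] = field(default_factory=list)
    position: Optional[str] = None
    height: Optional[str] = None   # kept as text; the page format is not assumed
    weight: Optional[int] = None


class PlayerRegistry:
    """Holds players so callers do not pass a raw dictionary around."""

    def __init__(self):
        self._players: Dict[str, Player] = {}

    def add(self, player):
        self._players[player.name] = player

    def get(self, name):
        return self._players[name]

    def by_position(self, position):
        """Return every stored player whose position matches exactly."""
        return [p for p in self._players.values() if p.position == position]

    def to_dict(self):
        """Plain-dict view, e.g. for dumping back to JSON."""
        return {name: asdict(p) for name, p in self._players.items()}
```

The `to_dict` method keeps the JSON workflow available, so a class like this would not force a change in how the player data is stored.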
