Added functionality for live scraping. Also fixed warning messages.
HarryShomer committed Nov 18, 2018
1 parent a04fddc commit fe6939c
Showing 53 changed files with 2,280 additions and 418 deletions.
Binary file modified .DS_Store
Binary file not shown.
8 changes: 7 additions & 1 deletion CHANGELOG.rst
@@ -8,4 +8,10 @@ v1.2.6
using 'docs_dir' will make us check if a file was already scraped and saved before getting it from the source. It will
also provide a location for us to save it if we don't have it yet. 'rescrape' only applies when a valid directory
is provided with 'docs_dir'. Setting 'rescrape' equal to True will have us scrape the file from the source even if
it's saved and save this new one.
it's saved and save this new one.

v1.2.7
------

* Added functionality to make it easier to scrape live games
* Fixed user warnings
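
The 'docs_dir' and 'rescrape' behaviour described in the v1.2.6 entry above can be sketched roughly as follows. This is only an illustration of the described logic, not the library's actual code: ``get_page``, ``fetch``, the ``.txt`` suffix, and the example file name are all hypothetical names chosen for this sketch.

```python
import tempfile
from pathlib import Path


def get_page(name, fetch, docs_dir=None, rescrape=False):
    """Return the text for `name`, using docs_dir as an optional file cache.

    Hypothetical helper for illustration only; not the library's code.
    `fetch` is any zero-argument callable that retrieves the page from the source.
    """
    # Without a valid existing directory, always go to the source and save nothing.
    # 'rescrape' has no effect in this case.
    if docs_dir is None or not Path(docs_dir).is_dir():
        return fetch()

    saved = Path(docs_dir) / f"{name}.txt"

    # Reuse the saved copy unless the caller asked for a fresh scrape
    if saved.exists() and not rescrape:
        return saved.read_text()

    # Otherwise fetch from the source and save (or overwrite) the copy
    text = fetch()
    saved.write_text(text)
    return text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docs_dir:
        fetch = lambda: "<html>scraped page</html>"  # stand-in for a real request
        get_page("example_pbp", fetch, docs_dir=docs_dir)                 # fetched, then saved
        get_page("example_pbp", fetch, docs_dir=docs_dir)                 # read from docs_dir
        get_page("example_pbp", fetch, docs_dir=docs_dir, rescrape=True)  # fetched again, overwritten
```

The point of the sketch is the ordering of the checks: the directory must exist before either caching or 'rescrape' means anything, which matches the note that 'rescrape' only applies when a valid 'docs_dir' is given.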
24 changes: 0 additions & 24 deletions LICENSE.rst

This file was deleted.

18 changes: 18 additions & 0 deletions LICENSE.txt
@@ -0,0 +1,18 @@
License
=======

The MIT License (MIT)

Copyright (c) 2018 Harry Shomer

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom
the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
60 changes: 59 additions & 1 deletion README.rst
@@ -41,6 +41,9 @@ To install all you need to do is open up your terminal and type in:
Usage
-----

Standard Scrape Functions
~~~~~~~~~~~~~~~~~~~~~~~~~

Scrape data on a season by season level:

::
@@ -101,7 +104,7 @@ files deposited in (it must exist beforehand).
import hockey_scraper

# Path to the given directory
USER_PATH = /....
USER_PATH = "/...."

# Scrapes the 2015 & 2016 season with shifts and stores the data in a CSV file
# Also includes a path for an existing directory for the scraped files to be placed in or retrieved from.
@@ -111,6 +114,61 @@ files deposited in (it must exist beforehand).
hockey_scraper.scrape_seasons([2015, 2016], True, docs_dir=USER_PATH, rescrape=True)


Live Scraping
~~~~~~~~~~~~~

Here is a simple example of a way to set up live scraping. I strongly suggest checking out
`this section <https://hockey-scraper.readthedocs.io/en/latest/live_scrape.html>`_ of the docs if you plan on using this.
::

import hockey_scraper as hs


def to_csv(game):
    """
    Store each game DataFrame in a file

    :param game: LiveGame object

    :return: None
    """

    # If the game:
    #   1. Started - We recorded at least one event
    #   2. Not in Intermission
    #   3. Not Over
    if game.is_ongoing():
        # Get both DataFrames
        pbp_df = game.get_pbp()
        shifts_df = game.get_shifts()

        # Print the description of the last event
        print(game.game_id, "->", pbp_df.iloc[-1]['Description'])

        # Store in CSV files
        pbp_df.to_csv(f"../hockey_scraper_data/{game.game_id}_pbp.csv", sep=',')
        shifts_df.to_csv(f"../hockey_scraper_data/{game.game_id}_shifts.csv", sep=',')

if __name__ == "__main__":
    # Before we start, set the directory to store the files
    # You don't have to do this but I recommend it
    hs.live_scrape.set_docs_dir("../hockey_scraper_data")

    # Scrape the info for all the games on 2018-11-15
    games = hs.ScrapeLiveGames("2018-11-15", if_scrape_shifts=True, pause=20)

    # While all the games aren't finished
    while not games.finished():
        # Update for all the games currently being played
        games.update_live_games(sleep_next=True)

        # Go through every LiveGame object and apply some function
        # You can of course do whatever you want here.
        for game in games.live_games:
            to_csv(game)
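
A callback like ``to_csv`` can be tried out without waiting for real games by running the same loop against stand-in objects. The sketch below assumes only the attribute and method names used in the example above (``finished``, ``update_live_games``, ``live_games``, ``is_ongoing``, ``game_id``). ``StubGame``, ``StubLiveGames``, ``advance``, ``is_over``, and ``run`` are hypothetical and are not part of hockey_scraper; the stubs do no scraping or sleeping, and they simplify "ongoing" to "started and not over" (no intermissions).

```python
class StubGame:
    """Hypothetical stand-in for a LiveGame; only mimics the calls used above."""

    def __init__(self, game_id, events):
        self.game_id = game_id
        self._events = list(events)  # events that have not "happened" yet
        self.seen = []               # events recorded so far

    def advance(self):
        # Record the next pending event, if there is one
        if self._events:
            self.seen.append(self._events.pop(0))

    def is_over(self):
        return not self._events

    def is_ongoing(self):
        # Simplified: started (at least one event recorded) and not yet over
        return bool(self.seen) and not self.is_over()


class StubLiveGames:
    """Hypothetical stand-in for ScrapeLiveGames, with no network access."""

    def __init__(self, games):
        self.live_games = list(games)

    def finished(self):
        return all(game.is_over() for game in self.live_games)

    def update_live_games(self, sleep_next=False):
        # sleep_next is accepted to match the call above but is ignored here
        for game in self.live_games:
            game.advance()


def run(games, callback):
    """Same loop shape as the example: update, then apply callback to every game."""
    while not games.finished():
        games.update_live_games(sleep_next=True)
        for game in games.live_games:
            callback(game)


if __name__ == "__main__":
    stub = StubLiveGames([StubGame(1, ["faceoff", "shot", "goal"]), StubGame(2, ["faceoff"])])
    run(stub, lambda g: print(g.game_id, g.is_ongoing(), g.seen))
```

Note that the callback is invoked for every game on every pass, including games that are already over, which is why ``to_csv`` guards its work with ``is_ongoing()``.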



The full documentation can be found `here <http://hockey-scraper.readthedocs.io/en/latest/>`_.


Binary file modified docs/build/doctrees/environment.pickle
Binary file not shown.
Binary file removed docs/build/doctrees/hockey_scraper.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/index.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/license_link.doctree
Binary file not shown.
Binary file added docs/build/doctrees/live_scrape.doctree
Binary file not shown.
Binary file added docs/build/doctrees/scrape_functions.doctree
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/build/html/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 09036852b3471979817369ae25cb8a8e
config: a7febc38e5eac04579b5fb5253806860
tags: 645f666f9bcd5a90fca523b33c5a78b7
3 changes: 2 additions & 1 deletion docs/build/html/_sources/index.rst.txt
@@ -6,7 +6,8 @@ Contents
.. toctree::
:maxdepth: 1

hockey_scraper
scrape_functions
live_scrape
license_link


2 changes: 1 addition & 1 deletion docs/build/html/_sources/license_link.rst.txt
@@ -1 +1 @@
.. include:: ../../LICENSE.rst
.. include:: ../../LICENSE.txt
