Deprecated ability to return scraped data as json string. Added way to return as Pandas DataFrames
HarryShomer committed Mar 4, 2018
1 parent 39181b6 commit bb318de
Showing 14 changed files with 131 additions and 245 deletions.
26 changes: 19 additions & 7 deletions README.rst
@@ -50,9 +50,8 @@ Scrape data on a season by season level:
# Scrapes the 2015 & 2016 season with shifts and stores the data in a Csv file
hockey_scraper.scrape_seasons([2015, 2016], True)

# Scrapes the 2008 season without shifts and returns a json string of the data
scraped_data = hockey_scraper.scrape_seasons([2008], False, data_format='Json')

# Scrapes the 2008 season without shifts and returns a dictionary containing the pbp Pandas DataFrame
scraped_data = hockey_scraper.scrape_seasons([2008], False, data_format='Pandas')

Scrape a list of games:

@@ -63,8 +62,8 @@ Scrape a list of games:
# Scrapes the first game of 2014, 2015, and 2016 seasons with shifts and stores the data in a Csv file
hockey_scraper.scrape_games([2014020001, 2015020001, 2016020001], True)

# Scrapes the first game of 2007, 2008, and 2009 seasons with shifts and returns a Json string of the data
scraped_data = hockey_scraper.scrape_games([2007020001, 2008020001, 2009020001], True, data_format='Json')
# Scrapes the first game of 2007, 2008, and 2009 seasons with shifts and returns a Dictionary with the Pandas DataFrames
scraped_data = hockey_scraper.scrape_games([2007020001, 2008020001, 2009020001], True, data_format='Pandas')
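The game IDs in the examples above (2014020001, 2015020001, ...) look like they pack several fields into one integer. The following is a minimal sketch that splits such an ID, assuming a layout of a four-digit season, a two-digit game-type code, and a four-digit game number. That layout is inferred from the example IDs only, and ``split_game_id`` is a hypothetical helper, not part of hockey_scraper.

```python
def split_game_id(game_id):
    """Split an integer game ID into (season, game_type, game_number).

    Assumed layout (inferred from the example IDs, not confirmed by the
    library): SSSSTTNNNN, where SSSS is the season's starting year, TT is
    a game-type code, and NNNN is the game number within that season.
    """
    season = game_id // 1_000_000          # leading four digits
    game_type = (game_id // 10_000) % 100  # middle two digits
    game_number = game_id % 10_000         # trailing four digits
    return season, game_type, game_number


for gid in [2014020001, 2015020001, 2016020001]:
    print(gid, split_game_id(gid))
```

Under this assumption, the three IDs above differ only in the season field, which matches the comment describing them as the first game of each season.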

Scrape all games in a given date range:

@@ -75,8 +74,21 @@ Scrape all games in a given date range:
# Scrapes all games between 2016-10-10 and 2016-10-20 without shifts and stores the data in a Csv file
hockey_scraper.scrape_date_range('2016-10-10', '2016-10-20', False)

# Scrapes all games between 2015-1-1 and 2015-1-15 without shifts and returns a Json string of the data
scraped_data = hockey_scraper.scrape_date_range('2015-1-1', '2015-1-15', False, data_format='Json')
# Scrapes all games between 2015-1-1 and 2015-1-15 without shifts and returns a Dictionary with the pbp Pandas DataFrame
scraped_data = hockey_scraper.scrape_date_range('2015-1-1', '2015-1-15', False, data_format='Pandas')


The dictionary returned by setting the keyword argument "data_format" to "Pandas" is structured like:

::

{
# This is always included
'pbp': pbp_df,

# This is only included when the argument 'if_scrape_shifts' is set equal to True
'shifts': shifts_df
}
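
Because the 'shifts' key is only present when shifts were scraped, calling code may want to read it defensively. The following is a minimal sketch of that pattern; ``unpack_scraped`` is a hypothetical helper, and plain strings stand in for the DataFrames so the example runs without the library or network access.

```python
def unpack_scraped(scraped_data):
    """Return (pbp, shifts) from the dictionary described above.

    'pbp' is always present. 'shifts' exists only when shifts were
    scraped, so it is read with .get() and may come back as None.
    """
    pbp = scraped_data['pbp']
    shifts = scraped_data.get('shifts')
    return pbp, shifts


# Stand-in values (assumption for illustration). With the real library the
# dictionary would come from a call such as
# hockey_scraper.scrape_games([2014020001], True, data_format='Pandas')
# and the values would be Pandas DataFrames.
with_shifts = {'pbp': 'pbp_df', 'shifts': 'shifts_df'}
without_shifts = {'pbp': 'pbp_df'}

print(unpack_scraped(with_shifts))
print(unpack_scraped(without_shifts))
```

Checking for ``None`` rather than indexing 'shifts' directly avoids a ``KeyError`` when the scrape was run without shifts.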


The full documentation can be found `here <http://hockey-scraper.readthedocs.io/en/latest/>`_.
Binary file modified docs/build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/build/doctrees/hockey_scraper.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/index.doctree
Binary file not shown.
65 changes: 13 additions & 52 deletions docs/build/html/_sources/hockey_scraper.rst.txt
@@ -20,8 +20,8 @@ So you would refer to the 2016-2017 season as 2016).
hockey_scraper.scrape_seasons([2015, 2016], True)
hockey_scraper.scrape_seasons([2015, 2016], True, data_format='Csv')

# Scrapes the 2008 season without shifts and returns a json string of the data
scraped_data = hockey_scraper.scrape_seasons([2008], False, data_format='Json')
# Scrapes the 2008 season without shifts and returns a dictionary with the DataFrame
scraped_data = hockey_scraper.scrape_seasons([2008], False, data_format='Pandas')

# Scrapes 2014 season without shifts including preseason games
hockey_scraper.scrape_seasons([2014], False, preseason=True)
@@ -38,8 +38,8 @@ Scrape a list of games provided. All game ID's can be found using `this link
# Scrapes the first game of 2014, 2015, and 2016 seasons with shifts and stores the data in a Csv file
hockey_scraper.scrape_games([2014020001, 2015020001, 2016020001], True)

# Scrapes the first game of 2007, 2008, and 2009 seasons with shifts and returns a Json string of the data
scraped_data = hockey_scraper.scrape_games([2007020001, 2008020001, 2009020001], True, data_format='Json')
# Scrapes the first game of 2007, 2008, and 2009 seasons with shifts and returns a dictionary with the DataFrames
scraped_data = hockey_scraper.scrape_games([2007020001, 2008020001, 2009020001], True, data_format='Pandas')

\3. *Scrape by Date Range*:

@@ -52,8 +52,8 @@ Scrape all games between a specified date range. All dates must be written in a
hockey_scraper.scrape_date_range('2016-10-10', '2016-10-20', False)
hockey_scraper.scrape_date_range('2016-10-10', '2016-10-20', False, preseason=False)

# Scrapes all games between 2015-1-1 and 2015-1-15 without shifts and returns a Json string of the data
scraped_data = hockey_scraper.scrape_date_range('2015-1-1', '2015-1-15', False, data_format='Json')
# Scrapes all games between 2015-1-1 and 2015-1-15 without shifts and returns a dictionary with the DataFrame
scraped_data = hockey_scraper.scrape_date_range('2015-1-1', '2015-1-15', False, data_format='Pandas')

# Scrapes all games from 2014-09-15 to 2014-11-01 with shifts including preseason games
hockey_scraper.scrape_date_range('2014-09-15', '2014-11-01', True, preseason=True)
@@ -66,57 +66,18 @@ Play is automatically scraped.

\2. When scraping by date range or by season, preseason games aren't scraped unless otherwise specified.

\3. For all three functions the scraped data is deposited into a Csv file unless it's specified to return it as a Json string.
\3. For all three functions the scraped data is deposited into a Csv file unless data_format='Pandas' is specified, in which case the DataFrames are returned.

\4. The Json string returned is structured like so:
\4. The Dictionary with the DataFrames returned by setting data_format='Pandas' is structured like:
::


# When scraping by game or date range
{
'pbp': [
Plays
],
'shifts': [
Shifts
]
}

# When scraping by season
{
'pbp': {
'Seasons': [
Plays
]
},
'shifts': {
'Seasons': [
Plays
]
}
}


# For example, if you scraped the 2008 and 2009 seasons the Json will look like this:
{
'pbp': {
'2008': [
Plays
],
'2009': [
Plays
]
},
'shifts': {
'2008': [
Shifts
],
'2009': [
Shifts
]
}
}
# This is always included
'pbp': pbp_df,

# This is only included when you specify that you also want to scrape shifts
'shifts': shifts_df
}


Functions