# More Web Scraping Practice

I built this notebook for more web scraping practice. In this one, we scrape the scores of NHL games from ESPN.com. I initially tried loading these scores from nhl.com, but I ran into problems I did not know how to fix until later in the project, after I had done more research into scraping. As a result, the data below comes from a site I don't use very often. Either way, it was good practice.

The gist of the problem is that the requests module only downloads the HTML the server sends. It does not run the page's JavaScript, so elements that are filled in dynamically never appear in the response. It was frustrating that I could not scrape a simple time, e.g. 6:00 PM, from a 'div' element when it was clearly there while I was inspecting the page in Chrome. The solution was to use the selenium module, which drives a real browser instance and therefore captures dynamically loaded elements.
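To make the problem concrete, here is a minimal sketch of what a plain HTTP client ends up parsing. The HTML snippet, the `game-time` class name, and the `DivTextCollector` helper are all made up for illustration (they are not ESPN's or nhl.com's real markup), and the sketch uses the standard library's `html.parser` instead of BeautifulSoup only to stay dependency-free.

```python
from html.parser import HTMLParser

# Hypothetical page source as a plain HTTP client would receive it:
# the <div> is empty, and a script fills it in only when a browser runs it.
RAW_HTML = """
<div class="game-time"></div>
<script>
  document.querySelector('.game-time').textContent = '6:00 PM';
</script>
"""


class DivTextCollector(HTMLParser):
    """Collect the text found inside <div class="game-time">."""

    def __init__(self):
        super().__init__()
        self.inside_target = False
        self.text = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and dict(attrs).get('class') == 'game-time':
            self.inside_target = True

    def handle_endtag(self, tag):
        if tag == 'div':
            self.inside_target = False

    def handle_data(self, data):
        # Script contents also arrive here, but outside the target div.
        if self.inside_target:
            self.text += data


collector = DivTextCollector()
collector.feed(RAW_HTML)
static_text = collector.text.strip()
print(repr(static_text))  # the div is empty because no JavaScript was executed
```

The time is present in the source only as part of the script, so a parser that never executes the script sees an empty element. A browser driven by selenium runs the script first, which is why `driver.page_source` contains the value.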

This would obviously be more useful as a script you can schedule to run every now and then, but again, this was just for practice and was done in an afternoon.
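As a rough sketch of the "run every now and then" idea, the scraping code could be wrapped in a function and called on an interval. Both `run_periodically` and `scrape_scoreboard` are hypothetical names introduced here, and the placeholder body stands in for the Selenium and BeautifulSoup cells of this notebook.

```python
import time


def run_periodically(task, interval_seconds, max_runs):
    """Call task() max_runs times, sleeping interval_seconds between calls."""
    results = []
    for run in range(max_runs):
        results.append(task())
        # No need to sleep after the final run.
        if run < max_runs - 1:
            time.sleep(interval_seconds)
    return results


def scrape_scoreboard():
    # Hypothetical placeholder: a real version would load the page with
    # selenium, parse it, and return or print the scores.
    return 'scraped'


# An interval of 0 keeps the demo instant; a real script might use 300 or more.
print(run_periodically(scrape_scoreboard, interval_seconds=0, max_runs=3))
```

For anything long-running, an operating system scheduler such as cron or Windows Task Scheduler is probably a better fit than a loop that sleeps inside Python.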

In [222]:
from bs4 import BeautifulSoup
#import requests
import re
from selenium import webdriver #for dynamically loaded page elements that requests won't read in
from selenium.webdriver.chrome.service import Service

In [226]:
service = Service('C:/Users/cbarg/Downloads/chromedriver_win32/chromedriver.exe')
driver = webdriver.Chrome(service=service)
driver.get('https://www.espn.com/nhl/scoreboard')
html = driver.page_source
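soup = BeautifulSoup(html, 'html.parser') #name the parser explicitly so bs4 does not have to guess one
driver.quit() #quit() ends the whole browser session; close() only closes the current window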

In [227]:
#If this header is found on the page, there are no games that day
there_are_games = soup.find('h4', class_ = 'n5 tc pv6 clr-gray-05') is None

In [228]:
if there_are_games:
    games = soup.find_all('section', class_ = 'Scoreboard bg-clr-white flex flex-auto justify-between')
    for game in games:
        #get hyperlink to display for more info
        more_info = game.find('a', class_ = 'AnchorLink Button Button--sm Button--anchorLink Button--alt mb4 w-100')['href']
        more_info = 'https://www.espn.com' + more_info
        
        #get away team name, record
        away_team = game.find('li', class_ = re.compile('ScoreboardScoreCell__Item flex items-center relative pb2 ScoreboardScoreCell__Item--away'))
        away_name = away_team.find('div', class_ = 'ScoreCell__TeamName ScoreCell__TeamName--shortDisplayName truncate db').text
        away_record = away_team.find('span', class_ = 'ScoreboardScoreCell__Record').text
        
        #home team name, record
        home_team = game.find('li', class_ = re.compile('ScoreboardScoreCell__Item flex items-center relative pb2 ScoreboardScoreCell__Item--home'))
        home_name = home_team.find('div', class_ = 'ScoreCell__TeamName ScoreCell__TeamName--shortDisplayName truncate db').text
        home_record = home_team.find('span', class_ = 'ScoreboardScoreCell__Record').text

        #if this element is present, the game hasn't started yet, so we get the starting time and display it
        time_start = game.find('div', class_ = 'ScoreCell__Time ScoreboardScoreCell__Time h9 clr-gray-03')
        if time_start:
            print(f'{away_name} ({away_record}) @ {home_name} ({home_record})')
            print(f'This game starts at {time_start.text}')
            print('More info:', more_info)
            print('\n')

        #games that have started
        else:
            #game status is either going to be when the game ended (Final, Final/OT, etc...) or where we are currently at in the game
            game_status = game.find('div', class_ = re.compile('ScoreCell__Time ScoreboardScoreCell__Time h9')).text

            periods = game.find_all('div', class_ = 'ScoreboardScoreCell__Heading flex justify-center pl2 h9')
            periods.append(game.find('div', class_ = 'ScoreboardScoreCell__Heading ScoreboardScoreCell__Heading--total h9 flex justify-end pl2'))
            periods = [period.text for period in periods]
            
            #get the scores for the away team during each period
            away_scores_periods = away_team.find_all('div', class_ = 'ScoreboardScoreCell__Value flex justify-center pl2 n9')
            away_scores_periods.append(away_team.find('div', class_ = 'ScoreCell__Score h4 clr-gray-01 fw-heavy tar ScoreCell_Score--scoreboard pl2'))
            away_scores_periods = [period.text for period in away_scores_periods]
            
            #scores for the home team during each period
            home_scores_periods = home_team.find_all('div', class_ = 'ScoreboardScoreCell__Value flex justify-center pl2 n9')
            home_scores_periods.append(home_team.find('div', class_ = 'ScoreCell__Score h4 clr-gray-01 fw-heavy tar ScoreCell_Score--scoreboard pl2'))
            home_scores_periods = [period.text for period in home_scores_periods]

            #print however many columns the page provides (3 periods + total, or an extra OT column)
            #joining with tabs avoids indexing a fixed 4 or 5 columns, which raises IndexError when fewer are present
            print(f'{away_name} ({away_record}) @ {home_name} ({home_record})\t{game_status}')
            print(f"{'Period':15}" + '\t'.join(periods))
            print(f'{away_name:15}' + '\t'.join(away_scores_periods))
            print(f'{home_name:15}' + '\t'.join(home_scores_periods))
            print(f'More info: {more_info}\n')

else:
    print('There are no games today.')

Rangers (4-1-1) @ Senators (2-3-0)	Final
Period         1	2	3	T
Rangers        0	0	3	3
Senators       1	0	1	2
More info: https://www.espn.com/nhl/game/_/gameId/401349200

Flames (2-1-1) @ Capitals (3-0-2)	Final/OT
Period         1	2	3	OT	T
Flames         3	0	0	1	4
Capitals       0	3	0	0	3
More info: https://www.espn.com/nhl/game/_/gameId/401349205

Ducks (2-3-0) @ Wild (3-0-0)
This game starts at 6:00 PM
More info: https://www.espn.com/nhl/game/_/gameId/401349208


Red Wings (2-1-1) @ Canadiens (0-5-0)
This game starts at 7:00 PM
More info: https://www.espn.com/nhl/game/_/gameId/401349199


Avalanche (1-3-0) @ Lightning (2-2-0)
This game starts at 7:00 PM
More info: https://www.espn.com/nhl/game/_/gameId/401349201


Sabres (3-1-0) @ Devils (2-1-0)
This game starts at 7:00 PM
More info: https://www.espn.com/nhl/game/_/gameId/401349202


Panthers (4-0-0) @ Flyers (2-0-1)
This game starts at 7:00 PM
More info: https://www.espn.com/nhl/game/_/gameId/401349203


Maple Leafs (2-2-1) @ Pengui