# PWHL Attendance

## Overview

As women’s sports gain popularity, I am interested in how game attendance evolves over time, particularly in the Professional Women’s Hockey League (PWHL).

## Introduction

I had the chance to witness the Montreal Victoire’s first home game on January 13, 2024. The team did not have a name yet and was simply referred to as "PWHL Montreal". The game was held at the Verdun Auditorium, the 4,114-seat venue where the Victoire trains. The ambiance was amazing.

While the Victoire still trains in Verdun, the team now holds all of its regular home matches in a bigger venue, Place Bell, which can accommodate 10,062 spectators.

As women’s hockey gains popularity and athletes are paid full-time salaries to train together, we are seeing a higher level of play than ever before. I hope attendance will hold steady or even grow over time, securing the PWHL’s future and setting an example that leads to better funding for women’s sports and lets high-level female athletes flourish.

## Dataset

### Data source

The PWHL website’s [schedule page](https://www.thepwhl.com/en/stats/schedule/all-teams/5/all-months) lists all games for the current season in a table.

![the top of a table titled Schedule and listing the date, time, teams, scores, venues and broadcasters for PWHL games](./img/2025-04-07_ScheduleTable.png)

The R button at the end of each row leads to a webpage containing the official game report, which specifies, among other things, the venue, date, teams and attendance.

The current season is selected by default but it is possible to obtain the same table for past seasons.

It is therefore possible to use the webpage to get a list of URLs corresponding to game reports for all of a season’s games. Then, information can be collected from each of the reports, which appear to all be formatted in the same way.

### Data scraping

I scraped a list of game report links from each season’s webpage. Then, I scraped the following information from each report, storing it in a DataFrame: visiting team, home team, venue, date, game start and attendance.

In [1]:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

In [41]:
def scrape_game_report_urls(url):
    """Return the official game report links found on one schedule page."""
    report_urls = []
    # set up the WebDriver (swap in webdriver.Chrome() or webdriver.Firefox() if not using Edge)
    driver = webdriver.Edge()
    try:
        # open the webpage; doing this inside the try block ensures the browser is closed on failure
        driver.get(url)
        # wait for the schedule table to load
        WebDriverWait(driver, 2).until(
            EC.presence_of_element_located((By.XPATH, '//table'))
        )
        # keep only the links that point to an official game report
        for link in driver.find_elements(By.XPATH, "//a[@href]"):
            link_url = link.get_attribute("href")
            if link_url and "official-game-report" in link_url:
                report_urls.append(link_url)
    finally:
        driver.quit()
    return report_urls

game_report_urls = []
for season_id in range(1, 6): # currently, there are 5 seasons
    game_report_urls.extend(scrape_game_report_urls(f"https://www.thepwhl.com/en/stats/schedule/all-teams/{season_id}/all-months"))

In [42]:
game_report_urls[0:5]

['https://lscluster.hockeytech.com/game_reports/official-game-report.php?client_code=pwhl&game_id=2&lang_id=1',
 'https://lscluster.hockeytech.com/game_reports/official-game-report.php?client_code=pwhl&game_id=3&lang_id=1',
 'https://lscluster.hockeytech.com/game_reports/official-game-report.php?client_code=pwhl&game_id=4&lang_id=1',
 'https://lscluster.hockeytech.com/game_reports/official-game-report.php?client_code=pwhl&game_id=5&lang_id=1',
 'https://lscluster.hockeytech.com/game_reports/official-game-report.php?client_code=pwhl&game_id=6&lang_id=1']
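Since the links are collected from several schedule pages, it may be worth checking that each game appears only once before scraping the reports. The sketch below is one way to do that with the standard library, keyed on the `game_id` query parameter visible in the URLs above. The helper names `game_id_from_url` and `dedupe_by_game_id` are hypothetical, and whether duplicates actually occur in the scraped list is an assumption I have not verified.

```python
from urllib.parse import urlparse, parse_qs

def game_id_from_url(url):
    """Return the integer game_id query parameter of a report URL, or None if absent."""
    values = parse_qs(urlparse(url).query).get("game_id")
    return int(values[0]) if values and values[0].isdigit() else None

def dedupe_by_game_id(urls):
    """Keep the first URL seen for each game_id, preserving the original order."""
    seen = set()
    unique = []
    for url in urls:
        game_id = game_id_from_url(url)
        if game_id is None or game_id in seen:
            continue
        seen.add(game_id)
        unique.append(url)
    return unique

# Two of the URLs listed above, with the first one repeated to simulate a duplicate
sample_urls = [
    "https://lscluster.hockeytech.com/game_reports/official-game-report.php?client_code=pwhl&game_id=2&lang_id=1",
    "https://lscluster.hockeytech.com/game_reports/official-game-report.php?client_code=pwhl&game_id=3&lang_id=1",
    "https://lscluster.hockeytech.com/game_reports/official-game-report.php?client_code=pwhl&game_id=2&lang_id=1",
]
print([game_id_from_url(u) for u in dedupe_by_game_id(sample_urls)])  # prints [2, 3]
```

Keeping the `game_id` alongside each record would also give a stable key for joining with other data later.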

Now that I have the URL for each report, I want to extract specific information: visiting team, home team, venue, date, game start and attendance.

In [52]:
import requests 
from bs4 import BeautifulSoup as bs 
r = requests.get('https://lscluster.hockeytech.com/game_reports/official-game-report.php?client_code=pwhl&game_id=6&lang_id=1') 
# convert to beautiful soup 
soup = bs(r.content)

In [54]:
game_info = {}

In [72]:
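from io import StringIO

table = soup.select('table')[0]
# wrap the HTML in StringIO: passing a literal HTML string to read_html is deprecated
game_data = pd.read_html(StringIO(str(table)))[0]
game_data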


|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | NaN | PWHL Game #2 Ottawa 3 at Montréal 4 Place Bel... | Referee:Beatrice Fortin (26) Jack Hennigan (44... | Three Stars: 1. MTL - M. Poulin 2. OTT - E. ... | Game Start: 5:18 PM EST Game End: 7:47 PM Game... |
| 1 | Referee: | Beatrice Fortin (26) Jack Hennigan (44) | NaN | NaN | NaN |
| 2 | linespersons: | Ali Beres (67) Laura Gutauskas (68) | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN |
| 4 | Game Start: | 5:18 PM EST | NaN | NaN | NaN |
| 5 | Game End: | 7:47 PM | NaN | NaN | NaN |
| 6 | Game Length: | 2:29 | NaN | NaN | NaN |
| 7 | Attendance: | 10033 | NaN | NaN | NaN |


In [86]:
from io import StringIO

r = requests.get('https://lscluster.hockeytech.com/game_reports/official-game-report.php?client_code=pwhl&game_id=105&lang_id=1')
soup = bs(r.content)
table = soup.select('table')[0]
game_data = pd.read_html(StringIO(str(table)))[0]
# placeholder values until the header cell is parsed
game_info["visiting team"] = "testOtt"
game_info["home team"] = "testMtl"
game_info["venue"] = "testPBell"
game_info["date"] = "testDate"
# rows 4 and 7 of column 1 hold the game start and attendance values
game_info["game start"] = game_data.iloc[4, 1]
game_info["attendance"] = game_data.iloc[7, 1]
game_info


{'visiting team': 'testOtt',
 'home team': 'testMtl',
 'venue': 'testPBell',
 'date': 'testDate',
 'game start': '2:15 PM EST',
 'attendance': '8089'}
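The values above are read by row position and come back as strings. As a possible refinement, the sketch below looks values up by their label instead and converts them to typed values. It uses only the standard library and works on plain (label, value) pairs copied from the table output shown earlier. The function names `rows_to_record` and `clean_record` are hypothetical. It assumes the game start always ends with a time zone abbreviation and that attendance is a plain integer string, which holds for the two reports inspected here but is unverified for the rest.

```python
from datetime import datetime

def rows_to_record(rows):
    """Build a {label: value} dict from (label, value) pairs, skipping unlabeled rows."""
    record = {}
    for label, value in rows:
        # NaN or None labels are not strings, so they are skipped here
        if isinstance(label, str) and label.strip():
            record[label.strip().rstrip(":").lower()] = value
    return record

def clean_record(record):
    """Convert attendance to int and game start to a time, keeping the time zone text."""
    start_text, _, time_zone = record["game start"].rpartition(" ")
    return {
        "game start": datetime.strptime(start_text, "%I:%M %p").time(),
        "time zone": time_zone,
        "attendance": int(record["attendance"]),
    }

# (label, value) pairs taken from columns 0 and 1, rows 1 to 7, of the table shown earlier
rows = [
    ("Referee:", "Beatrice Fortin (26) Jack Hennigan (44)"),
    ("linespersons:", "Ali Beres (67) Laura Gutauskas (68)"),
    (None, None),
    ("Game Start:", "5:18 PM EST"),
    ("Game End:", "7:47 PM"),
    ("Game Length:", "2:29"),
    ("Attendance:", "10033"),
]
print(clean_record(rows_to_record(rows)))
```

With the DataFrame from the cell above, the same pairs could be produced with `zip(game_data[0], game_data[1])`. Looking values up by label should keep working if a report has an extra or missing row, which positional indexing would not.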

In [87]:
game_data.iloc[0, 1]

'PWHL Game #1 Boston 1 at Toronto 3  Coca-Cola Coliseum Nov 30, 2024'
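This header string contains the four fields that are still placeholders: visiting team, home team, venue and date. A regular expression is one way to split it. The sketch below assumes every report header follows the same "visitor score at home score venue date" layout as the example above, with an abbreviated English month name. I have only checked it against this one string, so `HEADER_PATTERN` and `parse_header` are hypothetical names for a first attempt, not a tested parser.

```python
import re
from datetime import datetime

# Assumed layout: "PWHL Game #<n> <visitor> <score> at <home> <score> <venue> <Mon D, YYYY>"
HEADER_PATTERN = re.compile(
    r"PWHL Game #(?P<game_number>\d+)\s+"
    r"(?P<visiting_team>.+?)\s+(?P<visiting_score>\d+)\s+at\s+"
    r"(?P<home_team>.+?)\s+(?P<home_score>\d+)\s+"
    r"(?P<venue>.+?)\s+"
    r"(?P<date>[A-Z][a-z]{2} \d{1,2}, \d{4})$"
)

def parse_header(header):
    """Return visiting team, home team, venue and date, or None if the layout differs."""
    match = HEADER_PATTERN.match(header.strip())
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "visiting team": fields["visiting_team"],
        "home team": fields["home_team"],
        "venue": fields["venue"],
        "date": datetime.strptime(fields["date"], "%b %d, %Y").date(),
    }

header = "PWHL Game #1 Boston 1 at Toronto 3  Coca-Cola Coliseum Nov 30, 2024"
print(parse_header(header))
```

Returning `None` on a mismatch makes it easy to list the reports whose header does not fit the assumed layout, instead of silently storing wrong values.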

TODO:

- Add day of the week and seat capacity to the DataFrame
- Add a column to the DataFrame specifying whether the local NHL team is playing that day
- Check if I can find attendance records for the CWHL (a previous women’s hockey league) for comparison
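For the first item, a minimal sketch of how the day of the week and seat capacity could be derived for one game record, assuming the date has already been parsed into a `datetime.date` and the attendance into an integer. The capacity table only holds the two figures quoted in the introduction. Other venues would need to be looked up, and the keys would have to match the venue names exactly as they appear in the reports. `VENUE_CAPACITY` and `add_derived_fields` are hypothetical names.

```python
from datetime import date

# Seat capacities quoted in the introduction; other venues still need to be added
VENUE_CAPACITY = {
    "Verdun Auditorium": 4114,
    "Place Bell": 10062,
}

def add_derived_fields(game):
    """Return a copy of a game record with day of week, capacity and fill rate added."""
    enriched = dict(game)
    enriched["day of week"] = game["date"].strftime("%A")
    capacity = VENUE_CAPACITY.get(game["venue"])
    enriched["capacity"] = capacity
    # the fill rate is only computed when the venue capacity is known
    enriched["fill rate"] = game["attendance"] / capacity if capacity else None
    return enriched

# Values from the Boston at Toronto report inspected above
game = {"venue": "Coca-Cola Coliseum", "date": date(2024, 11, 30), "attendance": 8089}
print(add_derived_fields(game))
```

The capacity comes back as `None` for this game because Coca-Cola Coliseum is not in the table yet, which makes missing venues easy to spot.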