### NHL Travel-Across-Timezones Analysis

This was inspired by a tweet that observed (or close to it) that for the rest of the 2019-2020 season, the Hurricanes travel to a different time zone only three times.

That got me wondering whether traveling to a different time zone has any effect on in-game performance, and whether that effect becomes more pronounced as the number of zones crossed increases. This is a work in progress that I'll extend as I find time to explore further.

In [6]:
import re
import pandas as pd
from scipy import stats

In [7]:
# Some constants and mappings
EST, CST, MST, PST = 0, 1, 2, 3
HOME_TIMEZONES = {
    'Boston Bruins': EST,
    'Washington Capitals': EST,
    'Montreal Canadiens': EST,
    'Toronto Maple Leafs': EST,
    'Anaheim Ducks': PST,
    'San Jose Sharks': PST,
    'Calgary Flames': MST,
    'Vancouver Canucks': PST,
    'Chicago Blackhawks': CST,
    'Ottawa Senators': EST,
    'Columbus Blue Jackets': EST,
    'Detroit Red Wings': EST,
    'Buffalo Sabres': EST,
    'Pittsburgh Penguins': EST,
    'Arizona Coyotes': MST,
    'Dallas Stars': CST,
    'Philadelphia Flyers': EST,
    'Vegas Golden Knights': PST,
    'Carolina Hurricanes': EST,
    'New York Islanders': EST,
    'St Louis Blues': CST,
    'Winnipeg Jets': CST,
    'Nashville Predators': CST,
    'New York Rangers': EST,
    'Colorado Avalanche': MST,
    'Minnesota Wild': CST,
    'Los Angeles Kings': PST,
    'Edmonton Oilers': MST,
    'New Jersey Devils': EST,
    'Florida Panthers': EST,
    'Tampa Bay Lightning': EST
}

TEAM_TO_FULL = {
    'Bruins': 'Boston Bruins',
    'Capitals': 'Washington Capitals',
    'Canadiens': 'Montreal Canadiens',
    'Maple Leafs': 'Toronto Maple Leafs',
    'Ducks': 'Anaheim Ducks',
    'Sharks': 'San Jose Sharks',
    'Flames': 'Calgary Flames',
    'Canucks': 'Vancouver Canucks',
    'Blackhawks': 'Chicago Blackhawks',
    'Senators': 'Ottawa Senators',
    'Blue Jackets': 'Columbus Blue Jackets',
    'Red Wings': 'Detroit Red Wings',
    'Sabres': 'Buffalo Sabres',
    'Penguins': 'Pittsburgh Penguins',
    'Coyotes': 'Arizona Coyotes',
    'Stars': 'Dallas Stars',
    'Flyers': 'Philadelphia Flyers',
    'Golden Knights': 'Vegas Golden Knights',
    'Hurricanes': 'Carolina Hurricanes',
    'Islanders': 'New York Islanders',
    'Blues': 'St Louis Blues',
    'Jets': 'Winnipeg Jets',
    'Predators': 'Nashville Predators',
    'Rangers': 'New York Rangers',
    'Avalanche': 'Colorado Avalanche',
    'Wild': 'Minnesota Wild',
    'Kings': 'Los Angeles Kings',
    'Oilers': 'Edmonton Oilers',
    'Devils': 'New Jersey Devils',
    'Panthers': 'Florida Panthers',
    'Lightning': 'Tampa Bay Lightning'
}

The data consists of team stats from every game in the 2018-2019 season.
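The loading cell that follows pulls the home team out of the `Game` column with a regular expression. As a minimal sketch of what that pattern expects, the snippet below runs the same regex on a few made-up strings. The sample strings and the helper name `parse_home_nickname` are assumptions for illustration, and the "away team first, home team second" reading is inferred from the notebook taking group 2 as `home_team`.

```python
import re

# Same pattern as the notebook cell. Assumed layout of a Game string:
# "<date> - <away nickname> <goals>, <home nickname> <goals>"
GAME_PATTERN = re.compile(r'\d+-\d+-\d+ - \w+( \w+)? \d+, (\w+( \w+)?) \d+')


def parse_home_nickname(game):
    """Hypothetical helper: return the second (home) nickname in a Game string."""
    match = GAME_PATTERN.match(game)
    if match is None:
        raise ValueError('Unrecognized Game string: {!r}'.format(game))
    return match.group(2)


# Illustrative strings only, not rows copied from the dataset
samples = [
    '2018-10-03 - Canadiens 2, Maple Leafs 3',
    '2018-10-04 - Blue Jackets 3, Red Wings 2',
    '2018-10-03 - Bruins 0, Capitals 7',
]
for sample in samples:
    print(parse_home_nickname(sample))
```

The pattern allows nicknames of at most two words, which covers every key in `TEAM_TO_FULL`. Raising on a non-match is a choice made for this sketch; the notebook cell would instead fail with an `AttributeError` when `re.match` returns `None`.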

In [12]:
DATA_PATH = 'nhl_data.csv'
df = pd.read_csv(DATA_PATH).drop(columns='Unnamed: 2')

# A little bit of feature engineering
df['date'] = pd.to_datetime(df['Game'].apply(lambda v: v.split(' - ')[0].strip()))
df['win'] = (df['GF'] > df['GA']).astype(int)
df['home_team'] = df['Game'].apply(
    lambda v: TEAM_TO_FULL[re.match(r'\d+-\d+-\d+ - \w+( \w+)? \d+, (\w+( \w+)?) \d+', v).group(2)]
)
df['is_home'] = (df['home_team'] == df['Team']).astype(int)
df['game_in_tz'] = df['home_team'].apply(lambda v: HOME_TIMEZONES[v])
df.drop(columns='Game', inplace=True)

# Calculate the Effective Timezone Difference
# If last game was <= 2 days ago, calculate TZ difference from last game
# If last game was > 2 days ago, calculate TZ difference from home TZ
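def calc_effective_tz_diff(df):
    """
    Expects a dataframe of the schedule for a single team, in chronological order.
    Returns a copy with an added integer 'tz_travel' column.
    """
    df = df.copy()  # work on a copy so the groupby slice is not modified in place
    team = df.iloc[0].Team
    home_tz = HOME_TIMEZONES[team]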
    df.loc[:, 'prev_tz'] = df['game_in_tz'].shift(1).fillna(home_tz).astype(int)
    df.loc[:, 'prev_date'] = df['date'].shift(1)

    df.loc[:, 'game_days_diff'] = (df['prev_date'] - df['date']).fillna(pd.Timedelta(days=0)).apply(
        lambda d: abs(d.days)
    )
    df.loc[:, 'tz_travel'] = df.apply(
        lambda row: int(abs(
            row['prev_tz'] - row['game_in_tz']
            if row['game_days_diff'] <= 2
            else row['game_in_tz'] - home_tz
        )),
        axis=1
    )
    return df.drop(columns=['prev_tz', 'prev_date', 'game_days_diff'])


df = pd.concat([calc_effective_tz_diff(df_) for _, df_ in df.groupby('Team')])
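To make the effective-time-zone rule concrete, here is a small sketch that applies it to a made-up five-game schedule for an EST-based team. The helper `effective_tz_travel`, its `max_gap_days` parameter, and the toy dates are assumptions for illustration. Unlike the cell above, the sketch sorts by date explicitly and uses vectorized operations, but it is meant to encode the same rule.

```python
import numpy as np
import pandas as pd

EST, CST, MST, PST = 0, 1, 2, 3  # same encoding as the constants cell


def effective_tz_travel(schedule, home_tz, max_gap_days=2):
    """Hypothetical helper: zones crossed before each game of one team's schedule."""
    schedule = schedule.sort_values('date')
    prev_tz = schedule['game_in_tz'].shift(1).fillna(home_tz)
    gap_days = schedule['date'].diff().dt.days.fillna(0)
    # Short gap: measure from the previous game's zone; long gap: measure from home.
    reference_tz = np.where(gap_days <= max_gap_days, prev_tz, home_tz)
    return (schedule['game_in_tz'] - reference_tz).abs().astype(int)


# Made-up schedule, not taken from the dataset
toy = pd.DataFrame({
    'date': pd.to_datetime(
        ['2019-01-01', '2019-01-03', '2019-01-04', '2019-01-10', '2019-01-11']
    ),
    'game_in_tz': [EST, PST, MST, CST, EST],
})
toy['tz_travel'] = effective_tz_travel(toy, home_tz=EST)
print(toy)
```

In this toy schedule, the PST game two days after a home game counts as three zones, while the CST game after a six-day gap is measured from home and counts as one. In other words, the rule treats any gap longer than two days as if the team had returned home in between.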

In [14]:
team_stats = df.columns[3:-7]
team_stats

Index(['CA', 'CF%', 'FF', 'FA', 'FF%', 'SF', 'SA', 'SF%', 'GF', 'GA', 'GF%',
       'xGF', 'xGA', 'xGF%', 'SCF', 'SCA', 'SCF%', 'HDCF', 'HDCA', 'HDCF%',
       'HDSF', 'HDSA', 'HDSF%', 'HDGF', 'HDGA', 'HDGF%', 'HDSH%', 'HDSV%',
       'MDCF', 'MDCA', 'MDCF%', 'MDSF', 'MDSA', 'MDSF%', 'MDGF', 'MDGA',
       'MDGF%', 'MDSH%', 'MDSV%', 'LDCF', 'LDCA', 'LDCF%', 'LDSF', 'LDSA',
       'LDSF%', 'LDGF', 'LDGA', 'LDGF%', 'LDSH%', 'LDSV%', 'SH%', 'SV%',
       'PDO'],
      dtype='object')

In [15]:
no_tz_change_df = df[df['tz_travel'] == 0]
tz_change_df = df[df['tz_travel'] != 0]

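for t_stat in team_stats:
    try:
        # Welch's t-test (unequal variances) between the two groups of team-games
        _, pval = stats.ttest_ind(no_tz_change_df[t_stat], tz_change_df[t_stat], equal_var=False)
        if pval <= 0.05:
            print('{}: {}'.format(t_stat, pval))
    except Exception:
        pass  # Skip columns the test cannot handle (e.g. non-numeric values)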

FF: 0.024608180339480062
xGF: 0.0020866398679000144
xGF%: 0.028578364195480554
HDCF: 0.0016888490330271323
HDCF%: 0.01830904573933463
HDSF: 7.439305303227772e-05
HDSF%: 0.005776433455912921

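Interestingly, at the league-wide level every stat that clears the 0.05 threshold is an offensive ("for") measure or its share (FF, xGF, xGF%, HDCF, HDCF%, HDSF, HDSF%); none of the defensive ("against") stats differs significantly.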
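One caveat is that the loop runs a separate test for each stat, so a few p-values below 0.05 are expected by chance alone. As a rough sketch, the snippet below applies a Bonferroni correction to the p-values printed above. The count of 53 tests is an assumption (one per column in `team_stats`; the true number is lower if any column was skipped), and `bonferroni_survivors` is a hypothetical helper name.

```python
# p-values copied from the output of the t-test loop
P_VALUES = {
    'FF': 0.024608180339480062,
    'xGF': 0.0020866398679000144,
    'xGF%': 0.028578364195480554,
    'HDCF': 0.0016888490330271323,
    'HDCF%': 0.01830904573933463,
    'HDSF': 7.439305303227772e-05,
    'HDSF%': 0.005776433455912921,
}

N_TESTS = 53  # assumption: one test per column in team_stats
ALPHA = 0.05


def bonferroni_survivors(p_values, n_tests, alpha=ALPHA):
    """Hypothetical helper: keep stats whose p-value is at most alpha / n_tests."""
    threshold = alpha / n_tests
    return {name: p for name, p in p_values.items() if p <= threshold}


survivors = bonferroni_survivors(P_VALUES, N_TESTS)
print('Per-test threshold: {:.6f}'.format(ALPHA / N_TESTS))
print('Still significant:', sorted(survivors))
```

Under these assumptions only HDSF stays below the corrected threshold. Bonferroni is conservative, and several of these stats are likely correlated with one another, so this reads better as a sanity check than as a final verdict.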