# Game Score Notebook

## References and Links

### All of these metrics were created and scaled to approximate a correlation to points in a game.
This notebook was inspired by an article about Game Score, a metric developed in 2016 by Dom Luszczyszyn that is intended to provide a single number approximating a player's performance in a given game. In that original article he included the formula and explained how he arrived at it, which has inspired various adaptations since.
https://hockey-graphs.com/2016/07/13/measuring-single-game-productivity-an-introduction-to-game-score/

The stats Dom used are goals, primary assists, secondary assists, shots on goal, blocked shots, penalty differential (penalties drawn and taken), faceoffs (won and lost), 5-on-5 Corsi differential, and 5-on-5 goal differential.

Player Game Score = (0.75 * G) + (0.7 * A1) + (0.55 * A2) + (0.075 * SOG) + (0.05 * BLK) + (0.15 * PD) - (0.15 * PT) + (0.01 * FOW) - (0.01 * FOL) + (0.05 * CF) - (0.05 * CA) + (0.15 * GF) - (0.15 * GA)
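As a quick check on the weights, the formula can be written as a plain Python function. This is a minimal sketch: the parameter names simply mirror the abbreviations above, and the sample stat line is made up for illustration.

```python
def dom_game_score(G, A1, A2, SOG, BLK, PD, PT, FOW, FOL, CF, CA, GF, GA):
    """Dom Luszczyszyn's single-game Game Score, using the weights above.

    CF/CA are 5-on-5 Corsi for/against and GF/GA are 5-on-5 goals for/against.
    """
    return (
        0.75 * G + 0.7 * A1 + 0.55 * A2     # scoring
        + 0.075 * SOG + 0.05 * BLK          # shots on goal and blocked shots
        + 0.15 * PD - 0.15 * PT             # penalties drawn minus penalties taken
        + 0.01 * FOW - 0.01 * FOL           # faceoffs won minus faceoffs lost
        + 0.05 * CF - 0.05 * CA             # 5-on-5 shot-attempt differential
        + 0.15 * GF - 0.15 * GA             # 5-on-5 goal differential
    )


# Hypothetical stat line: 1 goal, 1 primary assist, 4 shots, 1 block,
# 1 penalty taken, 12 CF vs 8 CA, on ice for 1 even-strength goal for.
score = dom_game_score(G=1, A1=1, A2=0, SOG=4, BLK=1, PD=0, PT=1,
                       FOW=0, FOL=0, CF=12, CA=8, GF=1, GA=0)
print(round(score, 2))  # 2.0
```

Here a one-goal, one-assist game scores about 2.0, in line with the aim of keeping game scores on roughly the same scale as points.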



## Adjusted / Simplified
- found on the blog frshice.substack.com
- Bailey Johnson created her simplified version to work on college hockey data because not all of the factors (specifically the defensive portion of the Corsi metric) are tracked or available for NCAA games

### Original
For clarity’s sake, this was my original formula after removing stats I didn’t have access to from Dom’s: Player Game Score = (0.75*G)+(0.7*A1)+(0.55*A2)+(0.075*SOG)+(0.05*BLK)+(0.01*FOW)-(0.01*FOL)+(0.15*GF)-(0.15*GA)

### Bailey Final
- I also followed Shawn’s method from his NWHL game score work and used league-wide power-play percentage to weight the impact of taking a penalty.

- Dom scaled his formula down 75% to make game scores roughly equivalent to points, so people would be familiar with what a game score represented, and I kept to that methodology here because I used some of the same weights he did. Also keep in mind that goals for and goals against count only goals scored at even strength; they do not include special-teams, empty-net, or extra-attacker goals.

Player Game Score = (G*0.75)+(A1*0.715)+(A2*0.555)+(SOG*0.075)+(BLK*0.05)+(FOW*0.01)-(FOL*0.01)+(GF*0.15)-(GA*0.15)-(PNT*0.138)
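The same pattern works for Bailey's final formula. The sketch below also exposes the penalty weight as a parameter (`pnt_weight`, a name chosen here, not from the source) so that a different power-play percentage, such as one computed from this database, can be swapped in for 0.138. The stat line is hypothetical.

```python
def bailey_game_score(G, A1, A2, SOG, BLK, FOW, FOL, GF, GA, PNT, pnt_weight=0.138):
    """Bailey Johnson's NCAA Game Score (weights from the formula above).

    GF/GA should be even-strength goals only; PNT is the number of penalties
    taken (incidents, not minutes); pnt_weight is the league power-play %.
    """
    return (
        G * 0.75 + A1 * 0.715 + A2 * 0.555
        + SOG * 0.075 + BLK * 0.05
        + FOW * 0.01 - FOL * 0.01
        + GF * 0.15 - GA * 0.15
        - PNT * pnt_weight
    )


# Hypothetical forward: two assists, 3 shots, 2 blocks, 10-6 on faceoffs, 1 minor.
line = dict(G=0, A1=1, A2=1, SOG=3, BLK=2, FOW=10, FOL=6, GF=1, GA=1, PNT=1)
print(round(bailey_game_score(**line), 3))                  # 1.497
print(round(bailey_game_score(**line, pnt_weight=0.2), 3))  # 1.435
```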

### PNT*.138
- FROM SOURCE: I took the frequency of powerplay goals to penalties, otherwise known as PP%. (https://hockey-graphs.com/2018/03/22/an-introduction-to-nwhl-game-score/)
- this is her attempt to account for penalties - CHN NCAA stats don't include penalties drawn 
- PNT is penalties taken so I will want to grab the penalty incidents, not the minutes. 
- The 0.138 factor comes from the league's power-play percentage
    - Want to use a static figure like overall average PP % for entire NCAA 
        - could update it to be dynamic and create a new average every time the data is called

        - IDEA: Could create a custom factor for each team in each game 
            - Teams' power-play effectiveness can vary greatly, as can a team's PK%
            - Take each team's previous success on the PP, or on both the PP and PK
            - Compare it to the NCAA-wide average
            - Get a factor that scales the danger of taking a penalty based on how good the opponent is on the PP or how poor your own team is on the PK
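The team-specific penalty factor in the IDEA above could be sketched as follows. Everything here is an assumption for illustration: the function name, the choice to average an opponent-PP ratio with an own-PK ratio, and the sample percentages. It relies on the fact that, in aggregate across the league, the share of power plays that end in a goal is approximately the share of penalty kills that fail, so a team's (1 - PK%) can be compared against the league PP%.

```python
def team_penalty_weight(base_weight, opp_pp_pct, own_pk_pct, league_pp_pct):
    """Scale the league-wide penalty weight for one team in one game (hypothetical).

    opp_pp_pct    opponent's power-play % before this game (0-1)
    own_pk_pct    penalized team's penalty-kill % before this game (0-1)
    league_pp_pct NCAA-wide power-play % (0-1)
    """
    if league_pp_pct <= 0:
        raise ValueError("league_pp_pct must be positive")
    # How dangerous the opponent's power play is relative to the league.
    opp_factor = opp_pp_pct / league_pp_pct
    # How often this team's PK is scored on relative to the league; league-wide,
    # the PK failure rate (1 - PK%) is approximately the league PP%.
    pk_factor = (1 - own_pk_pct) / league_pp_pct
    # Simple average of the two ratios (an arbitrary choice for this sketch).
    return base_weight * (opp_factor + pk_factor) / 2


# A league-average matchup keeps the base weight; a stronger opposing PP raises it.
print(round(team_penalty_weight(0.138, 0.20, 0.80, 0.20), 4))  # 0.138
print(round(team_penalty_weight(0.138, 0.30, 0.80, 0.20), 4))  # 0.1725
```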


In [None]:
## Blocking out the Game Score formula

#### METRIC Formula
# 
# METRIC = SCORE [ (Goals*0.75) + (Assist1*0.715) + (Assist2*0.555) ] 
#               + SHOTS [ (Shots_On_Net*0.075) + (Shots_Off_Net*0.075) - (Shots_Blocked*0.075) ]
#               + FACEOFFS [ (Faceoff_Wins*0.01) - (Faceoff_Losses*0.01) ]
#               + TEAM [ (Goals_For_Team*0.15) - (Goals_Against_Team*0.15) ]
#               - PENALTIES [ (Penalties_Taken * Overall_PP_Success_Rate) ]

# Map to each factor in the Game_Stats Database

## Calculating a Game Score for each player on each team for each game
Each Game has a unique Game_ID column in every relevant table

scoring_summary table
    Goals - in scoring_summary - count of player's name in Player Column
    Assist1 - scoring_summary - Count in Assist1
    Assist2 - scoring_summary - Count in Assist2


advanced_metrics_combined table
    Shots_On_Net = EVEN_Saved + EVEN_Goals
    Shots_Off_Net = EVEN_Miss
    Shots_Blocked = EVEN_Block

    Defensive_Blocks = D_Blocks

player_stats
    Faceoff_Wins = FOW
    Faceoff_Losses = FOL

### This takes into account team goals for and goals against, but only counts even-strength goals in close games (+/- 1)
- can get it from the advanced metrics combined

advanced_metrics_combined
    Goals_For_Team = SUM of CLOSE_Goals grouped by Team -NOTE
#### NOTE- need to filter out any rows in the advanced_metrics_combined with player = 'TOTAL'
    Goals_Against_Team - do the same but for the opposing team - if Game_ID matches and Team =/= the player's team

penalty_summary
    Penalties_Taken = Count of Player's name in 
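Once each mapped factor is pulled into one table with a row per player per game, the METRIC itself is a weighted sum. The sketch below assumes a pandas DataFrame whose column names match the factor names used above (that combined DataFrame is hypothetical; it is not built anywhere in this notebook yet), treats missing values such as NULL faceoff counts as 0, and subtracts goals against, as Dom's and Bailey's formulas do.

```python
import pandas as pd

# Weights from the METRIC block; negative weights are subtracted terms.
METRIC_WEIGHTS = {
    "Goals": 0.75, "Assist1": 0.715, "Assist2": 0.555,
    "Shots_On_Net": 0.075, "Shots_Off_Net": 0.075, "Shots_Blocked": -0.075,
    "Faceoff_Wins": 0.01, "Faceoff_Losses": -0.01,
    "Goals_For_Team": 0.15, "Goals_Against_Team": -0.15,
}


def add_metric(df: pd.DataFrame, pp_success_rate: float) -> pd.DataFrame:
    """Return a copy of df with a METRIC column (one row per player per game)."""
    out = df.copy()
    stat_cols = list(METRIC_WEIGHTS) + ["Penalties_Taken"]
    out[stat_cols] = out[stat_cols].fillna(0)  # missing stats count as zero
    weighted = sum(out[col] * w for col, w in METRIC_WEIGHTS.items())
    out["METRIC"] = weighted - out["Penalties_Taken"] * pp_success_rate
    return out


# One hypothetical row; faceoff counts are missing, as they are for many players.
demo = pd.DataFrame([{
    "Player": "Player A", "Game_ID": "2023-10-07-Team X-Team Y",
    "Goals": 1, "Assist1": 0, "Assist2": 1,
    "Shots_On_Net": 3, "Shots_Off_Net": 2, "Shots_Blocked": 1,
    "Faceoff_Wins": float("nan"), "Faceoff_Losses": float("nan"),
    "Goals_For_Team": 2, "Goals_Against_Team": 1, "Penalties_Taken": 1,
}])
print(add_metric(demo, pp_success_rate=0.2)[["Player", "METRIC"]])
```

In the notebook, `pp_success_rate` would be the value computed in the Overall PP Success cell.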



In [2]:

import sqlite3
import pandas as pd
import numpy as np

db_path = '../data/2023_YTD_Game_Stats_NEW.db'

# Connect to the provided database
conn = sqlite3.connect(db_path)
cursor = conn.cursor()

# Retrieve the list of tables in the database
tables = cursor.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
tables = [table[0] for table in tables]

# Retrieve the columns from each table to get a better understanding of the data structure
table_columns = {}
for table in tables:
    columns = cursor.execute(f"PRAGMA table_info({table});").fetchall()
    table_columns[table] = [column[1] for column in columns]

table_columns


# Load Data

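# Calculate Goals, Assist1, and Assist2 for each player in each game.
# Caution: rows are grouped by the goal scorer (Player), so "Assist1 = Player"
# and "Assist2 = Player" effectively never match and both assist sums come out 0.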

query_goals_assists = """
SELECT 
    Player,
    Game_ID,
    SUM(CASE WHEN Player IS NOT NULL THEN 1 ELSE 0 END) AS Goals,
    SUM(CASE WHEN Assist1 = Player THEN 1 ELSE 0 END) AS Assist1,
    SUM(CASE WHEN Assist2 = Player THEN 1 ELSE 0 END) AS Assist2
FROM 
    scoring_summary
GROUP BY 
    Player, Game_ID;
"""

goals_assists_data = cursor.execute(query_goals_assists).fetchall()

# Let's preview the first few rows of the result
goals_assists_data[:5]



[('A.J. Hodges', '2023-10-07-Boston University-Bentley', 1, 0, 0),
 ('A.J. Hodges', '2023-10-28-Bentley-Robert Morris', 1, 0, 0),
 ('Aaron Bohlinger', '2023-10-13-Michigan-Massachusetts', 1, 0, 0),
 ('Aaron Grounds', '2023-10-30-Long Island-Stonehill', 1, 0, 0),
 ('Aaron Huglen', '2023-10-13-St. Thomas-Minnesota', 1, 0, 0)]
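Every assist column in the preview above is 0. Each scoring_summary row is one goal, and the query groups by the scorer in `Player`, so it only credits an assist when a player assisted on their own goal. One way to credit assists is to unpivot the three name columns into one row per credited player before grouping. This sketch assumes scoring_summary has `Game_ID`, `Player`, `Assist1` and `Assist2` columns and that unassisted goals store NULL or an empty string; it runs against a small in-memory table with made-up names, not the real database.

```python
import sqlite3

# One row per credited player: the scorer (Player), the first assist and the
# second assist each become their own row before grouping.
QUERY_POINTS = """
WITH credits AS (
    SELECT Player AS Name, Game_ID, 1 AS G, 0 AS A1, 0 AS A2
    FROM scoring_summary WHERE Player IS NOT NULL AND Player != ''
    UNION ALL
    SELECT Assist1, Game_ID, 0, 1, 0
    FROM scoring_summary WHERE Assist1 IS NOT NULL AND Assist1 != ''
    UNION ALL
    SELECT Assist2, Game_ID, 0, 0, 1
    FROM scoring_summary WHERE Assist2 IS NOT NULL AND Assist2 != ''
)
SELECT Name AS Player, Game_ID,
       SUM(G) AS Goals, SUM(A1) AS Assist1, SUM(A2) AS Assist2
FROM credits
GROUP BY Name, Game_ID
ORDER BY Name, Game_ID;
"""

# Tiny in-memory stand-in for scoring_summary (hypothetical players).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE scoring_summary (Game_ID TEXT, Player TEXT, Assist1 TEXT, Assist2 TEXT)")
demo.executemany(
    "INSERT INTO scoring_summary VALUES (?, ?, ?, ?)",
    [("g1", "Skater A", "Skater B", "Skater C"),
     ("g1", "Skater B", "Skater A", ""),
     ("g1", "Skater A", None, None)],
)
rows = demo.execute(QUERY_POINTS).fetchall()
print(rows)
# [('Skater A', 'g1', 2, 1, 0), ('Skater B', 'g1', 1, 1, 0), ('Skater C', 'g1', 0, 0, 1)]
```

Against the real database, `cursor.execute(QUERY_POINTS).fetchall()` would take the place of the original query. It is worth checking that names are formatted identically in all three columns, given the non-breaking spaces that appear in other tables.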


In [5]:
# Calculate Faceoff Metrics for each player in each game

query_faceoff_metrics = """
SELECT 
    Player,
    Game_ID,
    FOW AS Faceoff_Wins,
    FOL AS Faceoff_Losses
FROM 
    player_stats;
"""

faceoff_metrics_data = cursor.execute(query_faceoff_metrics).fetchall()

# Preview the first few rows of the result
faceoff_metrics_data[:5]


[('Michigan State', '2023-10-07-Lake Superior-Michigan State', None, None),
 ("Gavin O'Connell", '2023-10-07-Lake Superior-Michigan State', None, None),
 ('Tommi Männistö', '2023-10-07-Lake Superior-Michigan State', None, None),
 ('Maxim Štrbák', '2023-10-07-Lake Superior-Michigan State', None, None),
 ('Artyom Levshunov', '2023-10-07-Lake Superior-Michigan State', None, None)]

In [6]:
# Calculate Team Goals Metrics for each player in each game

query_team_goals = """
WITH TeamGoals AS (
    SELECT 
        Game_ID,
        Team,
        SUM(CLOSE_Goals) AS Goals_For_Team
    FROM 
        advanced_metrics_combined
    WHERE 
        Player != 'TOTAL'
    GROUP BY 
        Game_ID, Team
),
OpponentGoals AS (
    SELECT 
        a.Game_ID,
        a.Team AS Player_Team,
        b.Team AS Opponent_Team,
        b.Goals_For_Team AS Goals_Against_Team
    FROM 
        TeamGoals a
    JOIN 
        TeamGoals b ON a.Game_ID = b.Game_ID AND a.Team != b.Team
)
SELECT 
    amc.Player,
    amc.Game_ID,
    amc.Team,
    tg.Goals_For_Team,
    og.Goals_Against_Team
FROM 
    advanced_metrics_combined amc
JOIN 
    TeamGoals tg ON amc.Game_ID = tg.Game_ID AND amc.Team = tg.Team
JOIN 
    OpponentGoals og ON amc.Game_ID = og.Game_ID AND amc.Team = og.Player_Team
WHERE 
    amc.Player != 'TOTAL';
"""

team_goals_data = cursor.execute(query_team_goals).fetchall()

# Preview the first few rows of the result
team_goals_data[:5]


[('Alexander\xa0Malinowski',
  "2023-10-07-American Int'l-Massachusetts",
  "American Int'l",
  0.0,
  0.0),
 ('Alfred\xa0Lindberg',
  "2023-10-07-American Int'l-Massachusetts",
  "American Int'l",
  0.0,
  0.0),
 ('Austen\xa0Long',
  "2023-10-07-American Int'l-Massachusetts",
  "American Int'l",
  0.0,
  0.0),
 ('Blake\xa0Wells',
  "2023-10-07-American Int'l-Massachusetts",
  "American Int'l",
  0.0,
  0.0),
 ('Brett\xa0Callahan',
  "2023-10-07-American Int'l-Massachusetts",
  "American Int'l",
  0.0,
  0.0)]

### Overall PP Success

In [8]:
### Find the Overall Power Play Success rate for the entire database

# Count the total number of Power Play (PP) goals from the scoring_summary table.
pp_goals_count = cursor.execute("SELECT COUNT(*) FROM scoring_summary WHERE PP != '';").fetchone()[0]

# Count every penalty in penalty_summary; following Shawn's method, penalties stand in for power-play opportunities.
total_pp_count = cursor.execute("SELECT COUNT(*) FROM penalty_summary;").fetchone()[0]

# Calculate the Power Play success rate.
pp_success_rate = pp_goals_count / total_pp_count

pp_success_rate


0.24649176327028677

In [11]:
# Adjusting the TEAM component query
team_query = """
    SELECT 
        ss.Game_ID,
        ss.Player,
        SUM(CASE WHEN ss.Team = amc.Team THEN ss.Player_Goals ELSE 0 END) * 0.15 AS Goals_For_Team,
        SUM(CASE WHEN ss.Team != amc.Team THEN ss.Player_Goals ELSE 0 END) * 0.15 AS Goals_Against_Team
    FROM scoring_summary ss
    JOIN advanced_metrics_combined amc ON ss.Game_ID = amc.Game_ID AND REPLACE(ss.Player, '\xa0', ' ') = REPLACE(amc.Player, '\xa0', ' ')
    GROUP BY ss.Game_ID, ss.Player
"""

# Checking the output for the TEAM component
team_data = cursor.execute(team_query).fetchall()
team_data[:5]


[("2023-10-07-American Int'l-Massachusetts", 'Brett Callahan', 0.0, 0.3),
 ("2023-10-07-American Int'l-Massachusetts", 'Brian Kramer', 0.0, 0.3),
 ("2023-10-07-American Int'l-Massachusetts", 'Logan Jenuwine', 0.0, 0.3),
 ('2023-10-07-Bowling Green-Robert Morris', 'Ben Wozney', 0.0, 0.3),
 ('2023-10-07-Bowling Green-Robert Morris', 'Dalton Norris', 0.0, 0.3)]

In [None]:
## Penalty Component



## Where to find these values in my DB
scoring_summary
    - Goals
    - First Assist
    - Second Assist

player_stats
    - Shots on Goal
    - FOW
    - FOL
    - PIM 
        - (Maybe worth weighting differently based on the period and time it was taken)
        - A penalty late in a close game hurts a team more than one taken in the first period
        - A penalty taken when already shorthanded hurts much more than one taken 5-on-5
            - I should be able to separate out these types of occurrences in the data from penalty_summary
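A situational weight like the one described above could multiply the base penalty cost. This is only a sketch: the function, its inputs (period, whether the team was already shorthanded, whether the game was close) and both multipliers are hypothetical placeholders, not values derived from data. Pulling those inputs out of penalty_summary depends on columns not shown here.

```python
def situational_penalty_cost(base_weight, period, already_shorthanded, close_game,
                             late_close_multiplier=1.5, shorthanded_multiplier=2.0):
    """Cost of one penalty taken, scaled by game situation (hypothetical weights).

    The default multipliers are placeholder values for illustration only.
    """
    cost = base_weight
    if close_game and period >= 3:   # third period or overtime in a close game
        cost *= late_close_multiplier
    if already_shorthanded:          # e.g. the penalty creates a 5-on-3
        cost *= shorthanded_multiplier
    return cost


print(situational_penalty_cost(0.138, period=1, already_shorthanded=False, close_game=True))
print(situational_penalty_cost(0.138, period=3, already_shorthanded=True, close_game=True))
```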

CAN'T Get Penalties Drawn from current data

### Formulating the final 3 factors
GA & GF should only use even strength goals - need to figure out how to filter those

Shots blocked (overall - defensive) can be found in advanced metrics as well as SOG, Offensive shots blocked and shots missed net for each of these situations (total, close, even and PP)
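One possible way to get even-strength GF and GA: the advanced metrics already split results by situation, so summing a per-player even-strength goals column by team, then pairing each team with its opponent, gives team-level even-strength goals for and against. This has the same shape as the CLOSE_Goals team query earlier, with a different column. The sketch assumes an `EVEN_Goals` column in advanced_metrics_combined (it appears in the shots-on-net mapping) and uses pandas on a toy frame; whether EVEN excludes empty-net goals in this data has not been verified.

```python
import pandas as pd


def team_even_strength_goals(amc: pd.DataFrame) -> pd.DataFrame:
    """Even-strength goals for/against per team per game from player rows."""
    players = amc[amc["Player"] != "TOTAL"]  # drop the team total rows
    gf = (players.groupby(["Game_ID", "Team"], as_index=False)["EVEN_Goals"].sum()
                 .rename(columns={"EVEN_Goals": "Goals_For_Team"}))
    # Pair each team with the other team in the same game; the opponent's
    # goals for are this team's goals against.
    pairs = gf.merge(gf, on="Game_ID", suffixes=("", "_opp"))
    pairs = pairs[pairs["Team"] != pairs["Team_opp"]]
    pairs = pairs.rename(columns={"Goals_For_Team_opp": "Goals_Against_Team"})
    return pairs[["Game_ID", "Team", "Goals_For_Team", "Goals_Against_Team"]].reset_index(drop=True)


toy = pd.DataFrame({
    "Game_ID": ["g1"] * 5,
    "Team": ["X", "X", "X", "Y", "Y"],
    "Player": ["p1", "p2", "TOTAL", "q1", "q2"],
    "EVEN_Goals": [1, 1, 2, 1, 0],
})
print(team_even_strength_goals(toy))
```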





## Corsi Differential = Corsi For - Corsi Against
Corsi is an advanced statistic used in the game of ice hockey to measure shot attempt differential while at even strength play. This includes shots on goal, missed shots, and blocked shot attempts towards the opposition's net minus the same shot attempts directed at your own team's net.

### History
The Corsi number was named by Tim Barnes, a financial analyst from Chicago working under the pseudonym Vic Ferrari. He had heard former Buffalo Sabres general manager Darcy Regier talking about shot differential on the radio, and then proceeded to develop a formula to accurately display shot differential. Ferrari originally wanted to name it the Regier number, but he didn't think it sounded right. He then considered calling it the Ruff number after former Buffalo Sabres head coach Lindy Ruff but he didn't think that was appropriate either. Ferrari ended up searching Buffalo Sabres staff, found a picture of Jim Corsi, and chose his name because he liked Corsi's mustache.[1]

### Formulae
- Corsi For (CF) = Shot attempts for at even strength: Shots + Blocks + Misses[2]
- Corsi Against (CA) = Shot attempts against at even strength: Shots + Blocks + Misses
- Corsi (C) = CF - CA
- Corsi For % (CF%) = CF / (CF + CA)
- Corsi For % Relative (CF% Rel) = CF% - CFOff%
- Corsi Per 60 Minutes at Even Strength (C/60) = (CF - CA) * 60 / TOI
- Relative Corsi per 60 Minutes at Even Strength (Crel/60) = CF/60 - CFoff/60 = On-Ice Corsi For / 60 Minutes - Off-Ice Corsi For / 60 Minutes
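The Corsi formulas above translate directly into small helper functions. This is a minimal sketch with made-up counts. As noted earlier, NCAA data lacks the defensive (against) side of Corsi, so CA would have to come from another source.

```python
import math


def shot_attempts(shots, blocks, misses):
    """Even-strength shot attempts: shots on goal + blocked + missed."""
    return shots + blocks + misses


def corsi(cf, ca):
    """Corsi differential, C = CF - CA."""
    return cf - ca


def corsi_for_pct(cf, ca):
    """CF% = CF / (CF + CA); NaN when there were no attempts either way."""
    total = cf + ca
    return cf / total if total else math.nan


def corsi_for_pct_rel(cf_on, ca_on, cf_off, ca_off):
    """CF% Rel = on-ice CF% minus off-ice CF%."""
    return corsi_for_pct(cf_on, ca_on) - corsi_for_pct(cf_off, ca_off)


def corsi_per_60(cf, ca, toi_minutes):
    """C/60 = (CF - CA) * 60 / TOI, with TOI in minutes."""
    return (cf - ca) * 60 / toi_minutes


cf = shot_attempts(shots=7, blocks=2, misses=3)  # 12
ca = shot_attempts(shots=5, blocks=1, misses=2)  # 8
print(corsi(cf, ca))                             # 4
print(corsi_for_pct(cf, ca))                     # 0.6
print(corsi_per_60(cf, ca, toi_minutes=15))      # 16.0
```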