# Pitcher Data Demo

Inspired by: Thomas Nestico [(@TJStats)](https://x.com/TJStats)

This notebook produces a per-pitch-type summary of one pitcher's outing in a single Spring Training game, with a set of values attributed to each pitch in the pitcher's mix. It explains each attribute and how it is calculated. I wanted to try and replicate the incredible work that people like TJ put out for baseball fans so that I could better understand pitchers and the game we all love.

The end result will look as such:


In [1]:
import pandas as pd
pd.set_option("display.max_columns", None)  # Ensure all columns are displayed

def dropUnnamed(df):
    # Keep only the columns whose name does not start with "Unnamed"
    df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
    return df

df = pd.read_csv("pitch_type_counts_example.csv")

# Drop the unnamed columns by reassigning the result
df = dropUnnamed(df)

df

Unnamed: 0,Pitcher,Pitch,Count,Usage %,Velocity,Spin Rate,iVB,HB,VAA,HAA,vRel,hRel,Swings,Whiffs,Whiff%,CS,CSW%,Zone%,Chase%,Extension,Max Exit Velo,Batter
0,Jon Gray,Four-Seam Fastball,20,50.0,94.4,1938.4,15.1,8.5,-4.7,1.0,5.4,-1.6,11,0,0.0,4,20.0,60.0,37.5,6.3,98.2,Lourdes Gurriel Jr.
1,Jon Gray,Slider,15,37.5,86.7,2473.7,1.9,-3.4,-7.3,2.4,5.6,-1.6,9,3,33.3,3,40.0,66.7,40.0,6.1,109.4,Alek Thomas
2,Jon Gray,Changeup,3,7.5,87.3,1502.7,8.1,14.2,-6.7,0.6,5.5,-1.5,2,1,50.0,0,33.3,66.7,0.0,6.3,,
3,Jon Gray,Curveball,2,5.0,77.3,2622.5,-8.3,-10.6,-10.7,3.7,5.6,-1.5,0,0,,0,0.0,0.0,0.0,6.2,,


As we can see above, there are a lot of different attributes describing the pitches for Jon Gray.

We can see each pitch type that Jon has, as well as how many times he threw each pitch respectively (20 fastballs, 15 sliders, 3 changeups, 2 curveballs).

From that information we can calculate the next column, usage rate. Summing the counts for each pitch type gives the total number of pitches thrown (40 here), and dividing each pitch type's count by that total gives its usage rate.
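
As a quick check of that arithmetic, here is a minimal sketch that recomputes the usage rate from the four counts in the example table. The variable names (`counts`, `total`, `usage`) are only for illustration and are not used later in the notebook.

```python
import pandas as pd

# Pitch counts taken from the example table above (Jon Gray's outing).
counts = pd.Series(
    {"Four-Seam Fastball": 20, "Slider": 15, "Changeup": 3, "Curveball": 2},
    name="Count",
)

total = counts.sum()                     # total pitches thrown
usage = (counts / total * 100).round(1)  # share of each pitch type, in percent

print(total)
print(usage)
```

The result should line up with the `Usage %` column in the example table.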

I will go through this final output column by column and showcase the code used as well as an explanation for the code.

## Import Packages

In [2]:
# MLB Scraper Pitcher Data
import pandas as pd
import pybaseball as pyb
import numpy as np
from api_scraper import MLB_Scrape



## Display Options to ensure that all of the output is displayed


In [3]:
# Set display options to print all columns without truncation
pd.set_option("display.max_columns", None)  # Ensure all columns are displayed
pd.set_option("display.max_rows", None)  # Display all rows, be cautious with large DataFrames
pd.set_option("display.width", None)  # Remove column width limit

## Retrieving game data with MLB Scraper model by Tnestico

Specific game IDs can be found in Baseball Savant URLs.

For example: `https://baseballsavant.mlb.com/gamefeed?gamePk=778935`

The six digits `778935` at the end of the URL are the game ID for a Rangers/Diamondbacks Spring Training game.
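
If you would rather not copy the digits by hand, the ID can be read from the URL's query string. This is a small sketch using only the Python standard library; `game_pk_from_url` is a hypothetical helper name and is not part of the scraper.

```python
from urllib.parse import urlparse, parse_qs


def game_pk_from_url(url: str) -> int:
    """Return the gamePk query parameter of a gamefeed URL as an integer."""
    query = parse_qs(urlparse(url).query)  # e.g. {"gamePk": ["778935"]}
    return int(query["gamePk"][0])


game_id = game_pk_from_url("https://baseballsavant.mlb.com/gamefeed?gamePk=778935")
print(game_id)
```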

The cell below uses game ID `778930`, a Rangers/Royals Spring Training game played on 2025-03-08. We retrieve its data with `scraper.get_data(game_list_input=[778930])` and assign the result to the variable `game_data`.

The following line converts the retrieved game data (stored in `game_data`) into a Polars DataFrame, which is stored in the variable `data_df`.

The last line converts the Polars DataFrame (`data_df`) to a Pandas DataFrame (`pandas_df`). This is necessary so that we can use Pandas' features later on.

In [4]:
# Initialize the scraper
scraper = MLB_Scrape()

# Retrieve game data for the specific game ID
game_data = scraper.get_data(game_list_input=[778930])

# Convert the game data to a Polars DataFrame
data_df = scraper.get_data_df(data_list=game_data)

# Convert the Polars DataFrame to a Pandas DataFrame
pandas_df = data_df.to_pandas()

This May Take a While. Progress Bar shows Completion of Data Retrieval.


Processing: 100%|██████████| 1/1 [00:00<00:00,  2.90iteration/s]

Converting Data to Dataframe.





We can now rename `pandas_df` to `df_pyb` for convenience's sake.

We will also print out the first few lines of the dataframe so that we can see all the data we have to work with.

In [5]:
df_pyb = pandas_df
df_pyb.head(5)

Unnamed: 0,game_id,game_date,batter_id,batter_name,batter_hand,batter_team,batter_team_id,pitcher_id,pitcher_name,pitcher_hand,pitcher_team,pitcher_team_id,ab_number,play_description,play_code,in_play,is_strike,is_swing,is_whiff,is_out,is_ball,is_review,pitch_type,pitch_description,strikes,balls,outs,strikes_after,balls_after,outs_after,start_speed,end_speed,sz_top,sz_bot,x,y,ax,ay,az,pfxx,pfxz,px,pz,vx0,vy0,vz0,x0,y0,z0,zone,type_confidence,plate_time,extension,spin_rate,spin_direction,vb,ivb,hb,launch_speed,launch_angle,launch_distance,launch_location,trajectory,hardness,hit_x,hit_y,index_play,play_id,start_time,end_time,is_pitch,type_type,type_ab,event,event_type,rbi,away_score,home_score
0,778930,2025-03-08,664728,Kyle Isbel,L,KC,118,594798,Jacob deGrom,R,TEX,140,0,Called Strike,C,False,True,,,,False,False,FF,Four-Seam Fastball,0,0,0,1,0,0,96.7,86.9,3.112,1.571,138.67,158.03,-9.311432,36.991917,-13.463718,-4.729828,9.506652,-0.568426,2.99064,3.684679,-140.620785,-3.753477,-1.292576,50.005609,5.238767,1,0.87,0.392106,6.617522,2488,218,-13.2,16.4,7.7,,,,,,,,,3,69d4bc86-9b99-470f-b28d-2891237b596d,2025-03-08T20:04:52.994Z,2025-03-08T20:04:56.709Z,True,pitch,,,,,,
1,778930,2025-03-08,664728,Kyle Isbel,L,KC,118,594798,Jacob deGrom,R,TEX,140,0,Ball,B,False,False,,,,False,False,SL,Slider,1,0,0,1,1,0,90.4,83.3,3.112,1.571,84.51,189.87,4.841059,27.310885,-27.581958,2.750653,2.611083,0.852268,1.811565,4.716463,-131.667965,-3.864862,-1.318032,50.005851,5.333908,14,0.9,0.415606,6.527363,2707,135,-29.3,4.1,-6.0,,,,,,,,,4,79ea53cb-c8ee-443b-8794-69aa77f51ddd,2025-03-08T20:05:05.582Z,2025-03-08T20:05:09.545Z,True,pitch,,,,,,
2,778930,2025-03-08,664728,Kyle Isbel,L,KC,118,594798,Jacob deGrom,R,TEX,140,0,Swinging Strike,S,False,True,True,True,,False,False,FF,Four-Seam Fastball,1,1,0,2,1,0,97.4,87.1,3.112,1.571,123.68,146.95,-12.150851,39.224385,-14.41611,-6.122682,8.950397,-0.175286,3.401107,5.341128,-141.604586,-2.502514,-1.311988,50.00431,5.245148,11,0.86,0.390236,6.574951,2613,221,-13.7,15.7,9.7,,,,,,,,,5,a74593b2-018c-4f7b-bf83-749c0ce519f1,2025-03-08T20:05:21.721Z,2025-03-08T20:05:24.747Z,True,pitch,,,,,,
3,778930,2025-03-08,664728,Kyle Isbel,L,KC,118,594798,Jacob deGrom,R,TEX,140,0,Ball,B,False,False,,,,False,False,FF,Four-Seam Fastball,2,1,0,2,2,0,97.2,87.1,3.112,1.571,126.26,127.69,-8.183513,38.188363,-13.695679,-4.126333,9.319007,-0.242937,4.11439,4.204736,-141.341535,-0.883479,-1.228199,50.004074,5.327976,11,0.88,0.390507,6.576693,2619,219,-12.6,16.8,6.4,,,,,,,,,7,407c9722-96a7-47fa-aad9-b4d0d349b2d9,2025-03-08T20:05:42.219Z,2025-03-08T20:05:45.660Z,True,pitch,,,,,,
4,778930,2025-03-08,664728,Kyle Isbel,L,KC,118,594798,Jacob deGrom,R,TEX,140,0,Called Strike,C,False,True,,,True,True,False,SL,Slider,2,2,0,3,2,0,90.7,83.4,3.112,1.571,128.11,183.4,6.132826,27.946679,-27.213983,3.466522,2.804646,-0.291437,2.051256,1.329798,-132.117789,-3.487453,-1.251604,50.004464,5.386846,4,0.9,0.414456,6.33956,2695,168,-28.6,4.6,-6.6,,,,,,,,,8,a213d696-f781-4536-b964-8fc11c0dce1c,2025-03-08T20:05:57.406Z,2025-03-08T20:06:00.641Z,True,pitch,atBat,Strikeout,strikeout,0.0,0.0,0.0


If we print out the shape of `df_pyb` we can see how much data there is in this game.

In [6]:
df_pyb.shape

(243, 78)

If each row is thought of as one pitch, then there were 243 pitches thrown.

Since we are only looking to gather data for one specific pitcher, we can filter out the dataframe to only return rows (or pitches) where `pitcher_name` is equal to the pitcher's name.

For this notebook, we will be looking at the data from Jacob deGrom. Since deGrom is a starting pitcher, we can expect him to have one of the higher pitch counts in this game, which gives us a larger sample to calculate from.

In [7]:
df_pyb = df_pyb[(df_pyb["pitcher_name"] == "Jacob deGrom")]
print(df_pyb.shape)
df_pyb.head(5)

(31, 78)


Unnamed: 0,game_id,game_date,batter_id,batter_name,batter_hand,batter_team,batter_team_id,pitcher_id,pitcher_name,pitcher_hand,pitcher_team,pitcher_team_id,ab_number,play_description,play_code,in_play,is_strike,is_swing,is_whiff,is_out,is_ball,is_review,pitch_type,pitch_description,strikes,balls,outs,strikes_after,balls_after,outs_after,start_speed,end_speed,sz_top,sz_bot,x,y,ax,ay,az,pfxx,pfxz,px,pz,vx0,vy0,vz0,x0,y0,z0,zone,type_confidence,plate_time,extension,spin_rate,spin_direction,vb,ivb,hb,launch_speed,launch_angle,launch_distance,launch_location,trajectory,hardness,hit_x,hit_y,index_play,play_id,start_time,end_time,is_pitch,type_type,type_ab,event,event_type,rbi,away_score,home_score
0,778930,2025-03-08,664728,Kyle Isbel,L,KC,118,594798,Jacob deGrom,R,TEX,140,0,Called Strike,C,False,True,,,,False,False,FF,Four-Seam Fastball,0,0,0,1,0,0,96.7,86.9,3.112,1.571,138.67,158.03,-9.311432,36.991917,-13.463718,-4.729828,9.506652,-0.568426,2.99064,3.684679,-140.620785,-3.753477,-1.292576,50.005609,5.238767,1,0.87,0.392106,6.617522,2488,218,-13.2,16.4,7.7,,,,,,,,,3,69d4bc86-9b99-470f-b28d-2891237b596d,2025-03-08T20:04:52.994Z,2025-03-08T20:04:56.709Z,True,pitch,,,,,,
1,778930,2025-03-08,664728,Kyle Isbel,L,KC,118,594798,Jacob deGrom,R,TEX,140,0,Ball,B,False,False,,,,False,False,SL,Slider,1,0,0,1,1,0,90.4,83.3,3.112,1.571,84.51,189.87,4.841059,27.310885,-27.581958,2.750653,2.611083,0.852268,1.811565,4.716463,-131.667965,-3.864862,-1.318032,50.005851,5.333908,14,0.9,0.415606,6.527363,2707,135,-29.3,4.1,-6.0,,,,,,,,,4,79ea53cb-c8ee-443b-8794-69aa77f51ddd,2025-03-08T20:05:05.582Z,2025-03-08T20:05:09.545Z,True,pitch,,,,,,
2,778930,2025-03-08,664728,Kyle Isbel,L,KC,118,594798,Jacob deGrom,R,TEX,140,0,Swinging Strike,S,False,True,True,True,,False,False,FF,Four-Seam Fastball,1,1,0,2,1,0,97.4,87.1,3.112,1.571,123.68,146.95,-12.150851,39.224385,-14.41611,-6.122682,8.950397,-0.175286,3.401107,5.341128,-141.604586,-2.502514,-1.311988,50.00431,5.245148,11,0.86,0.390236,6.574951,2613,221,-13.7,15.7,9.7,,,,,,,,,5,a74593b2-018c-4f7b-bf83-749c0ce519f1,2025-03-08T20:05:21.721Z,2025-03-08T20:05:24.747Z,True,pitch,,,,,,
3,778930,2025-03-08,664728,Kyle Isbel,L,KC,118,594798,Jacob deGrom,R,TEX,140,0,Ball,B,False,False,,,,False,False,FF,Four-Seam Fastball,2,1,0,2,2,0,97.2,87.1,3.112,1.571,126.26,127.69,-8.183513,38.188363,-13.695679,-4.126333,9.319007,-0.242937,4.11439,4.204736,-141.341535,-0.883479,-1.228199,50.004074,5.327976,11,0.88,0.390507,6.576693,2619,219,-12.6,16.8,6.4,,,,,,,,,7,407c9722-96a7-47fa-aad9-b4d0d349b2d9,2025-03-08T20:05:42.219Z,2025-03-08T20:05:45.660Z,True,pitch,,,,,,
4,778930,2025-03-08,664728,Kyle Isbel,L,KC,118,594798,Jacob deGrom,R,TEX,140,0,Called Strike,C,False,True,,,True,True,False,SL,Slider,2,2,0,3,2,0,90.7,83.4,3.112,1.571,128.11,183.4,6.132826,27.946679,-27.213983,3.466522,2.804646,-0.291437,2.051256,1.329798,-132.117789,-3.487453,-1.251604,50.004464,5.386846,4,0.9,0.414456,6.33956,2695,168,-28.6,4.6,-6.6,,,,,,,,,8,a213d696-f781-4536-b964-8fc11c0dce1c,2025-03-08T20:05:57.406Z,2025-03-08T20:06:00.641Z,True,pitch,atBat,Strikeout,strikeout,0.0,0.0,0.0
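
The filter above is a boolean mask on `pitcher_name`. The sketch below shows the same idea on a made-up toy DataFrame, and also uses `value_counts()` to list how many rows each pitcher has, which is a quick way to check the assumption about who threw the most. The pitcher names `A` and `B` are placeholders.

```python
import pandas as pd

# Toy pitch-level data; names and values are illustrative only.
toy = pd.DataFrame(
    {
        "pitcher_name": ["A", "A", "B", "A", "B"],
        "pitch_description": ["Slider", "Slider", "Curveball", "Changeup", "Slider"],
    }
)

# Rows per pitcher, largest first.
rows_per_pitcher = toy["pitcher_name"].value_counts()

# Boolean mask: True where the row belongs to the chosen pitcher.
mask = toy["pitcher_name"] == "A"
filtered = toy[mask]

print(rows_per_pitcher)
print(filtered.shape)
```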


We can check the shape of the newly filtered DataFrame and see that there are 31 rows in which `pitcher_name` is equal to Jacob deGrom.

We can also print out the first 5 lines of the DataFrame to see that we still have all the data we had originally, only that it now pertains specifically to Jacob deGrom.

There are a lot of columns that we do not need. If we want to find out deGrom's pitch mix and how many times he threw each pitch, we can create a new DataFrame that contains only the columns we want.

In [8]:
pitcher_pyb = df_pyb[
    [
        "game_id",
        "game_date",
        "pitcher_name",
        "pitch_description",
    ]
]
pitcher_pyb.head(5)

Unnamed: 0,game_id,game_date,pitcher_name,pitch_description
0,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball
1,778930,2025-03-08,Jacob deGrom,Slider
2,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball
3,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball
4,778930,2025-03-08,Jacob deGrom,Slider


Now we have a DataFrame that is still associated with Jacob deGrom, but is more concise and contains everything we need to calculate his total pitches thrown and find his usage rate for each pitch type.

One way to quickly sum the total pitches thrown is by creating a new column on the DataFrame named `PitchesThrown`.

In [9]:
pitcher_pyb.loc[:, "PitchesThrown"] = 1

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pitcher_pyb.loc[:, "PitchesThrown"] = 1
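
The warning above appears because `pitcher_pyb` was derived by selecting from another DataFrame, so pandas cannot tell whether the assignment is meant to reach the original data. The column is still added, so the rest of the notebook works. One common way to avoid the warning is to take an explicit copy before adding the column. A minimal sketch on made-up toy data (`toy` and `subset` are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "pitcher_name": ["A", "A", "B"],
        "pitch_description": ["Slider", "Curveball", "Slider"],
        "start_speed": [88.1, 79.4, 86.0],
    }
)

# .copy() makes the selection an independent DataFrame, so adding a column
# is unambiguous and does not touch `toy`.
subset = toy[["pitcher_name", "pitch_description"]].copy()
subset["PitchesThrown"] = 1

print(subset)
```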


Now with the `PitchesThrown` column created, we can create a DataFrame for pitch counts, total pitches, and a usage rate.

In [10]:
pitch_type_counts = pitcher_pyb.groupby("pitch_description", as_index=False)["PitchesThrown"].sum()
pitch_type_counts

Unnamed: 0,pitch_description,PitchesThrown
0,Curveball,2
1,Four-Seam Fastball,16
2,Slider,13


We now have the number of pitches thrown for each pitch type.

We can sort the rows so that the most-thrown pitch type comes first.

In [11]:
pitch_type_counts = pitch_type_counts.sort_values(by="PitchesThrown", ascending=False)
pitch_type_counts

Unnamed: 0,pitch_description,PitchesThrown
1,Four-Seam Fastball,16
2,Slider,13
0,Curveball,2


Now we can calculate the total number of pitches thrown.

In [12]:
total_pitches = pitch_type_counts['PitchesThrown'].sum()
total_pitches

31

For the usage rate, we can create the column `Usage %` by dividing `PitchesThrown` by `total_pitches`.

We also multiply the result by 100 and round it to three decimal places.

In [13]:
pitch_type_counts['Usage %'] = ((pitch_type_counts['PitchesThrown']/total_pitches)*100).round(3)
pitch_type_counts

Unnamed: 0,pitch_description,PitchesThrown,Usage %
1,Four-Seam Fastball,16,51.613
2,Slider,13,41.935
0,Curveball,2,6.452
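
The count, sort, and usage steps above can also be done without the helper column. The sketch below is an alternative, not the approach used in this notebook: it relies on `Series.value_counts`, which sorts by count and can return proportions with `normalize=True`. The toy data is made up.

```python
import pandas as pd

# Toy pitch-level data: one entry per pitch (illustrative values only).
pitches = pd.Series(
    ["Four-Seam Fastball"] * 5 + ["Slider"] * 4 + ["Curveball"],
    name="pitch_description",
)

counts = pitches.value_counts()  # most-thrown pitch type first
usage = (pitches.value_counts(normalize=True) * 100).round(3)

summary = pd.DataFrame({"PitchesThrown": counts, "Usage %": usage})
print(summary)
```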


The next attribute to calculate is the average velocity for each pitch.

The first thing to do is update how we filtered the data originally. We need to add the column `start_speed`.

In [14]:
pitcher_pyb = df_pyb[
    [
        "game_id",
        "game_date",
        "pitcher_name",
        "pitch_description",
        "start_speed",
    ]
]

Now we can do what we previously did for `pitch_type_counts`.

Since `start_speed` is already a column in the raw data, we can simply use `.mean()` to find the average for each `pitch_description`. Additionally, we can round the average velocity to 1 decimal place.

In [15]:
pitch_type_velo = pitcher_pyb.groupby(['pitcher_name','pitch_description'],as_index=False)['start_speed'].mean().round(1)
pitch_type_velo

Unnamed: 0,pitcher_name,pitch_description,start_speed
0,Jacob deGrom,Curveball,82.9
1,Jacob deGrom,Four-Seam Fastball,97.0
2,Jacob deGrom,Slider,90.3


The next thing we must do is merge our `pitch_type_velo` DataFrame into our `pitch_type_counts` DataFrame so that the average velocity sits alongside the counts and usage rate.

In [16]:
pitch_type_counts = (pitch_type_counts.merge(pitch_type_velo, on="pitch_description", how="left"))

Now, when we print `pitch_type_counts` we have `start_speed` as a column for each `pitch_description`.

In [17]:
pitch_type_counts

Unnamed: 0,pitch_description,PitchesThrown,Usage %,pitcher_name,start_speed
0,Four-Seam Fastball,16,51.613,Jacob deGrom,97.0
1,Slider,13,41.935,Jacob deGrom,90.3
2,Curveball,2,6.452,Jacob deGrom,82.9


Next, we can calculate the average spin rate for each pitch type following the same pattern as before.

First step is to update `pitcher_pyb` and include `spin_rate` from the scraped data.

In [18]:
pitcher_pyb = df_pyb[
    [
        "game_id",
        "game_date",
        "pitcher_name",
        "pitch_description",
        "start_speed",
        "spin_rate"
    ]
]

After that, it is the same process as before when we calculated average velocity.

In [19]:
pitch_type_spin_rate = (pitcher_pyb.groupby("pitch_description", as_index=False)["spin_rate"].mean()).round(1)
pitch_type_spin_rate

Unnamed: 0,pitch_description,spin_rate
0,Curveball,2691.5
1,Four-Seam Fastball,2529.2
2,Slider,2612.9


We then merge `pitch_type_spin_rate` into `pitch_type_counts`.

We can also quickly re-order the columns in the output.

In [20]:
pitch_type_counts = (pitch_type_counts.merge(pitch_type_spin_rate, on="pitch_description", how="left"))

pitch_type_counts = pitch_type_counts[
    [
        "pitcher_name",
        "pitch_description",
        "PitchesThrown",
        "Usage %",
        "start_speed",
        "spin_rate",
    ]
]

pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5


## Induced Vertical Break (iVB)

The next value to calculate is the Induced Vertical Break (iVB) for each pitch type.

Per [Fangraphs](https://blogs.fangraphs.com/a-visual-scouting-primer-pitching-part-two/), Induced Vertical Break aims to quantify a pitcher’s ability to fight gravity.

iVB does not require a calculation from our end as it is already listed in the raw data as `ivb`. We can simply update the `pitcher_pyb` to include this.

In [21]:
pitcher_pyb = df_pyb[
    [
        "game_id",
        "game_date",
        "pitcher_name",
        "pitch_description",
        "start_speed",
        "spin_rate",
        "ivb"
    ]
]

pitch_type_ivb = (pitcher_pyb.groupby("pitch_description", as_index=False)["ivb"].mean()).round(1)
pitch_type_ivb

Unnamed: 0,pitch_description,ivb
0,Curveball,-1.5
1,Four-Seam Fastball,16.6
2,Slider,4.1


Horizontal break is the amount of lateral (side-to-side) movement a pitch experiences due to spin, measured in inches.

We can get the horizontal break the same way as Induced Vertical Break.

In [22]:
pitcher_pyb = df_pyb[
    [
        "game_id",
        "game_date",
        "pitcher_name",
        "pitch_description",
        "start_speed",
        "spin_rate",
        "ivb",
        "hb"
    ]
]

pitch_type_hb = (pitcher_pyb.groupby("pitch_description", as_index=False)["hb"].mean()).round(1)
pitch_type_hb

Unnamed: 0,pitch_description,hb
0,Curveball,-6.4
1,Four-Seam Fastball,8.8
2,Slider,-5.3


Now, we can merge these two new DataFrames to `pitch_type_counts` and then see how our output is looking.

In [23]:
pitch_type_counts = (pitch_type_counts.merge(pitch_type_ivb, on="pitch_description", how="left"))
pitch_type_counts = (pitch_type_counts.merge(pitch_type_hb, on="pitch_description", how="left"))
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4
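
So far each metric has followed the same pattern: group by pitch type, take the mean, then merge. As an alternative sketch (not what this notebook does), pandas' named aggregation can compute several of these columns in one `groupby`. The toy values below are made up; only the column names mirror the raw data.

```python
import pandas as pd

# Toy pitch-level data; column names mirror the raw data, values are made up.
toy = pd.DataFrame(
    {
        "pitch_description": ["Slider", "Slider", "Curveball", "Curveball"],
        "start_speed": [90.0, 91.0, 82.0, 84.0],
        "spin_rate": [2600, 2620, 2700, 2680],
        "ivb": [4.0, 4.4, -1.0, -2.0],
        "hb": [-5.0, -5.6, -6.0, -6.8],
    }
)

# Named aggregation: each keyword becomes an output column, built from
# (input column, aggregation function).
summary = (
    toy.groupby("pitch_description", as_index=False)
    .agg(
        PitchesThrown=("start_speed", "size"),
        start_speed=("start_speed", "mean"),
        spin_rate=("spin_rate", "mean"),
        ivb=("ivb", "mean"),
        hb=("hb", "mean"),
    )
    .round(1)
)
print(summary)
```

The step-by-step merges in this notebook are easier to follow one metric at a time, while the single `agg` call keeps everything in one place once the metrics are understood.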


## vRel and hRel

The next piece of information to gather for our pitcher is their `vRel` and `hRel` values.

`vRel` is the vertical release position of the ball, measured in feet, from the catcher's perspective.

Similarly, `hRel` is the horizontal release position of the ball, measured in feet, from the catcher's perspective.

These values are listed as `z0` and `x0` in the raw data, where `z0` is `vRel` and `x0` is `hRel`.

We can update `pitcher_pyb` to include these values, create a new DataFrame for each value, calculate the mean of each value for each pitch type, check the output, and merge the DataFrames into `pitch_type_counts`. Additionally, we can rename `z0` and `x0` to `vRel` and `hRel` respectively.

In [24]:
pitcher_pyb = df_pyb[
    [
        "game_id",
        "game_date",
        "pitcher_name",
        "pitch_description",
        "start_speed",
        "spin_rate",
        "ivb",
        "hb",
        "z0",
        "x0",
        "pitcher_hand"
    ]
]

In [25]:
pitch_type_vrel = (pitcher_pyb.groupby("pitch_description", as_index=False)["z0"].mean()).round(1)
pitch_type_vrel.rename(columns={"z0": "vRel"}, inplace=True)
pitch_type_vrel

Unnamed: 0,pitch_description,vRel
0,Curveball,5.5
1,Four-Seam Fastball,5.2
2,Slider,5.3


In [26]:
pitch_type_hrel = (pitcher_pyb.groupby("pitch_description", as_index=False)["x0"].mean()).round(1)
pitch_type_hrel.rename(columns={"x0": "hRel"}, inplace=True)

# Adjust hRel if pitcher_hand is 'L'
pitch_type_hrel["hRel"] = pitch_type_hrel.apply(
    lambda row: -row["hRel"] if pitcher_pyb[pitcher_pyb["pitch_description"] == row["pitch_description"]]["pitcher_hand"].iloc[0] == 'L' else row["hRel"],
    axis=1
)

pitch_type_hrel

Unnamed: 0,pitch_description,hRel
0,Curveball,-1.2
1,Four-Seam Fastball,-1.2
2,Slider,-1.2
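
The handedness adjustment in the cell above looks up `pitcher_hand` once per pitch type inside `apply`. Here is a vectorized sketch of the same sign flip using `Series.where`, applied before averaging. The toy data is a made-up left-handed pitcher, and `signed_x0` and `hrel` are illustrative names.

```python
import pandas as pd

# Toy release data for a left-handed pitcher (values are made up).
toy = pd.DataFrame(
    {
        "pitch_description": ["Slider", "Slider", "Curveball"],
        "x0": [1.8, 2.0, 1.7],
        "pitcher_hand": ["L", "L", "L"],
    }
)

# Series.where keeps values where the condition is True and uses the
# replacement elsewhere, so only left-handers get their sign flipped.
signed_x0 = toy["x0"].where(toy["pitcher_hand"] != "L", -toy["x0"])

hrel = (
    signed_x0.groupby(toy["pitch_description"])
    .mean()
    .round(1)
    .rename("hRel")
    .reset_index()
)
print(hrel)
```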


In [27]:
pitch_type_counts = (pitch_type_counts.merge(pitch_type_vrel, on="pitch_description", how="left"))
pitch_type_counts = (pitch_type_counts.merge(pitch_type_hrel, on="pitch_description", how="left"))
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2


## Whiffs

One thing that is nice to know about a pitcher's performance is how many whiffs they generated.

A whiff refers to a swing and miss (when a batter swings at a pitch but fails to make contact.)

Pitchers with a lot of whiffs (or a high Whiff%, a metric we will calculate later) tend to be effective strikeout pitchers, as more swings and misses (whiffs) lead to more strikeouts.

Fortunately, the raw data has a column called `is_whiff`, which has the value `True` when a swing and miss occurs and `NaN` when it does not.

We can add `is_whiff` to `pitcher_pyb` and create a new DataFrame `whiff_pitches` that returns the rows where a whiff has occurred.

In [28]:
pitcher_pyb = df_pyb[
    [
        "game_id",
        "game_date",
        "pitcher_name",
        "pitch_description",
        "start_speed",
        "spin_rate",
        "ivb",
        "hb",
        "z0",
        "x0",
        "pitcher_hand",
        "is_whiff"
    ]
]

whiff_pitches = pitcher_pyb[pitcher_pyb["is_whiff"] == True]
whiff_pitches

Unnamed: 0,game_id,game_date,pitcher_name,pitch_description,start_speed,spin_rate,ivb,hb,z0,x0,pitcher_hand,is_whiff
2,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,97.4,2613,15.7,9.7,5.245148,-1.311988,R,True
8,778930,2025-03-08,Jacob deGrom,Slider,91.5,2504,3.7,-3.8,5.291237,-1.132274,R,True
44,778930,2025-03-08,Jacob deGrom,Slider,89.3,2614,3.0,-4.8,5.274827,-1.254165,R,True
46,778930,2025-03-08,Jacob deGrom,Slider,90.5,2495,4.2,-3.6,5.240181,-1.236075,R,True


From here we can create a `pitch_type_whiff` DataFrame that counts the whiffs for each pitch type, and then merge this DataFrame into `pitch_type_counts`.

In [29]:
pitch_type_whiff = (
    whiff_pitches.groupby("pitch_description").size().reset_index(name="Whiffs")
)
pitch_type_whiff

Unnamed: 0,pitch_description,Whiffs
0,Four-Seam Fastball,1
1,Slider,3


We know that deGrom threw 3 pitch types in this outing, but only 2 of the 3 generated whiffs. The curveball is missing from `pitch_type_whiff`, which means it did not generate any whiffs.

During the merge into `pitch_type_counts` we can replace the resulting `NaN` with 0 and cast `Whiffs` to an integer, since a pitcher can only have a whole number of whiffs.

In [30]:
pitch_type_counts = (
    pitch_type_counts.merge(pitch_type_whiff, on="pitch_description", how="left")
    .fillna({"Whiffs": 0})
    .astype({"Whiffs": int})
)
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0
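
An alternative sketch that produces the zero rows directly, without the `fillna` step: compare `is_whiff` with `True` (which turns `NaN` into `False`) and sum per pitch type. This assumes, as described above, that the column only ever holds `True` or `NaN`. The toy data is made up.

```python
import numpy as np
import pandas as pd

# Toy data: is_whiff is True for a swing and miss and NaN otherwise.
toy = pd.DataFrame(
    {
        "pitch_description": ["Slider", "Slider", "Curveball", "Four-Seam Fastball"],
        "is_whiff": [True, np.nan, np.nan, True],
    }
)

# Summing a True/False column counts the True values, so pitch types
# without any whiff come out as 0 rather than going missing.
whiffs = (
    toy["is_whiff"].eq(True)
    .groupby(toy["pitch_description"])
    .sum()
    .rename("Whiffs")
    .reset_index()
)
print(whiffs)
```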


## Whiff%

Knowing the number of whiffs generated is important, but another useful metric is a pitcher's Whiff%.

Whiff% measures how often a batter swings and misses at a pitch, relative to their total swings. It is a key stat for evaluating a pitcher's ability to generate swings and misses.

Since we already know how many whiffs there are for each pitch type, we simply need to calculate the total amount of swings for each pitch type.

Once we have the total amount of swings, calculating Whiff% is quite simple.

Similar to `is_whiff`, `is_swing` exists within our data and also has `True` or `NaN` to indicate when a batter swung.

We can add `is_swing` to `pitcher_pyb` and use the same logic behind calculating `Whiffs` to calculate `Whiff%`.

In [31]:
pitcher_pyb = df_pyb[
    [
        "game_id",
        "game_date",
        "pitcher_name",
        "pitch_description",
        "start_speed",
        "spin_rate",
        "ivb",
        "hb",
        "z0",
        "x0",
        "pitcher_hand",
        "is_whiff",
        "is_swing"
    ]
]

Here we are creating a DataFrame `swings` that will return every row in which `is_swing` is True

In [32]:
swings = pitcher_pyb[pitcher_pyb["is_swing"] == True]

Now we are grouping each pitch type with the total amount of swings each pitch type generated

In [33]:
pitch_type_swing = (
    swings.groupby("pitch_description").size().reset_index(name="Swings")
)
pitch_type_swing

Unnamed: 0,pitch_description,Swings
0,Four-Seam Fastball,7
1,Slider,7


The curveball did not generate any swings, so it is missing from `pitch_type_swing`. After the left merge its `Swings` value would be `NaN`, so we replace it with 0 and make sure swings are represented as integers.

In [34]:
pitch_type_counts = (
    pitch_type_counts.merge(pitch_type_swing, on="pitch_description", how="left")
    .fillna({"Swings": 0})
    .astype({"Swings": int})
)

pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0


Now that we have both `Whiffs` and `Swings`, we can calculate `Whiff%` as the number of `Whiffs` divided by the number of `Swings`, multiplied by 100.

If a pitch type did not have any swings, the result will be `NaN`, since the calculation is 0 divided by 0.

In [35]:
pitch_type_counts["Whiff%"] = (
    (pitch_type_counts["Whiffs"] / pitch_type_counts["Swings"]) * 100
).round(1)

pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings,Whiff%
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7,14.3
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7,42.9
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0,
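
To see where the empty `Whiff%` value for the curveball comes from, here is the same division on just the `Whiffs` and `Swings` values from the table above. The DataFrame name `table` is illustrative.

```python
import pandas as pd

# Whiffs and swings per pitch type, copied from the table above.
table = pd.DataFrame(
    {
        "pitch_description": ["Four-Seam Fastball", "Slider", "Curveball"],
        "Whiffs": [1, 3, 0],
        "Swings": [7, 7, 0],
    }
)

# In pandas, 0 / 0 yields NaN rather than raising an error.
table["Whiff%"] = (table["Whiffs"] / table["Swings"] * 100).round(1)
print(table)
```

Leaving the value as `NaN` keeps "no swings" distinct from "swings but no whiffs", which would be 0.0.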


## Called Strikes

In [36]:
pitcher_pyb = df_pyb[
    [
        "game_id",
        "game_date",
        "pitcher_name",
        "pitch_description",
        "start_speed",
        "spin_rate",
        "ivb",
        "hb",
        "z0",
        "x0",
        "pitcher_hand",
        "is_whiff",
        "is_swing",
        "play_code"
    ]
]

In [37]:
strike_pitches = pitcher_pyb[pitcher_pyb["play_code"] == "C"]

In [38]:
pitch_type_cs = (
    strike_pitches.groupby("pitch_description").size().reset_index(name="CS")
)
pitch_type_cs

Unnamed: 0,pitch_description,CS
0,Curveball,2
1,Four-Seam Fastball,3
2,Slider,2


In [39]:
pitch_type_counts = (
    pitch_type_counts.merge(pitch_type_cs, on="pitch_description", how="left")
    .fillna({"CS": 0})
    .astype({"CS": int})
)

pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings,Whiff%,CS
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7,14.3,3
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7,42.9,2
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0,,2


In [40]:
pitch_type_counts["CS+Whiffs"] = (pitch_type_counts["CS"] + pitch_type_counts["Whiffs"])
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings,Whiff%,CS,CS+Whiffs
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7,14.3,3,4
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7,42.9,2,5
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0,,2,2


## CSW%

In [41]:
pitch_type_counts["CSW%"] = ((pitch_type_counts["CS+Whiffs"] / pitch_type_counts["PitchesThrown"])*100).round(1)
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings,Whiff%,CS,CS+Whiffs,CSW%
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7,14.3,3,4,25.0
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7,42.9,2,5,38.5
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0,,2,2,100.0
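
The cells above build CSW% as called strikes plus whiffs, divided by total pitches thrown, times 100. Here is a small sketch that repeats the arithmetic on the values from the table, to make the formula explicit. The DataFrame name `table` is illustrative.

```python
import pandas as pd

# Values copied from the table above.
table = pd.DataFrame(
    {
        "pitch_description": ["Four-Seam Fastball", "Slider", "Curveball"],
        "PitchesThrown": [16, 13, 2],
        "CS": [3, 2, 2],
        "Whiffs": [1, 3, 0],
    }
)

# CSW% = (called strikes + whiffs) / pitches thrown * 100
table["CSW%"] = (
    (table["CS"] + table["Whiffs"]) / table["PitchesThrown"] * 100
).round(1)
print(table)
```

The curveball's 100.0 comes from only two pitches, so it says little on its own.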


## Chase%

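Chase% is the number of swings at pitches outside the strike zone divided by the total number of pitches outside the zone, expressed as a percentage. In the cells below, a pitch counts as outside the zone when its `zone` value is greater than 10.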

In [42]:
pitcher_pyb = df_pyb[
    [
        "game_id",
        "game_date",
        "pitcher_name",
        "pitch_description",
        "start_speed",
        "spin_rate",
        "ivb",
        "hb",
        "z0",
        "x0",
        "pitcher_hand",
        "is_whiff",
        "is_swing",
        "play_code",
        "zone"
    ]
]

In [43]:
pitches_out_of_zone = pitcher_pyb[pitcher_pyb["zone"] > 10]
pitches_out_of_zone

Unnamed: 0,game_id,game_date,pitcher_name,pitch_description,start_speed,spin_rate,ivb,hb,z0,x0,pitcher_hand,is_whiff,is_swing,play_code,zone
1,778930,2025-03-08,Jacob deGrom,Slider,90.4,2707,4.1,-6.0,5.333908,-1.318032,R,,,B,14
2,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,97.4,2613,15.7,9.7,5.245148,-1.311988,R,True,True,S,11
3,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,97.2,2619,16.8,6.4,5.327976,-1.228199,R,,,B,11
6,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,97.0,2619,16.6,7.1,5.311797,-1.296258,R,,,B,11
8,778930,2025-03-08,Jacob deGrom,Slider,91.5,2504,3.7,-3.8,5.291237,-1.132274,R,True,True,S,14
9,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,97.8,2480,16.0,9.7,5.341977,-1.304947,R,,,B,11
10,778930,2025-03-08,Jacob deGrom,Slider,91.4,2544,4.0,-4.6,5.2653,-1.081131,R,,,B,14
12,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,96.8,2566,16.9,11.0,5.288647,-1.334337,R,,,B,11
36,778930,2025-03-08,Jacob deGrom,Slider,89.7,2586,5.6,-3.8,5.263818,-1.076325,R,,,B,14
38,778930,2025-03-08,Jacob deGrom,Slider,89.1,2640,2.4,-9.1,5.345405,-1.22588,R,,True,F,14


In [44]:
pitch_type_ball = (
    pitches_out_of_zone.groupby("pitch_description").size().reset_index(name="Ball")
)
pitch_type_ball

Unnamed: 0,pitch_description,Ball
0,Curveball,1
1,Four-Seam Fastball,9
2,Slider,8


In [45]:
pitch_type_counts = (
    pitch_type_counts.merge(pitch_type_ball, on="pitch_description", how="left")
    .fillna({"Ball": 0})
    .astype({"Ball": int})
)
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings,Whiff%,CS,CS+Whiffs,CSW%,Ball
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7,14.3,3,4,25.0,9
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7,42.9,2,5,38.5,8
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0,,2,2,100.0,1


In [46]:
swings_out_of_zone = pitcher_pyb[(pitcher_pyb["is_swing"] == True) & (pitcher_pyb["zone"] > 10)]
swings_out_of_zone

Unnamed: 0,game_id,game_date,pitcher_name,pitch_description,start_speed,spin_rate,ivb,hb,z0,x0,pitcher_hand,is_whiff,is_swing,play_code,zone
2,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,97.4,2613,15.7,9.7,5.245148,-1.311988,R,True,True,S,11
8,778930,2025-03-08,Jacob deGrom,Slider,91.5,2504,3.7,-3.8,5.291237,-1.132274,R,True,True,S,14
38,778930,2025-03-08,Jacob deGrom,Slider,89.1,2640,2.4,-9.1,5.345405,-1.22588,R,,True,F,14
39,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,97.3,2457,16.8,7.6,5.260808,-1.096468,R,,True,F,12
41,778930,2025-03-08,Jacob deGrom,Slider,90.8,2666,4.7,-5.2,5.247133,-1.313631,R,,True,X,14
46,778930,2025-03-08,Jacob deGrom,Slider,90.5,2495,4.2,-3.6,5.240181,-1.236075,R,True,True,S,14


In [47]:
pitch_type_chase = (
    swings_out_of_zone.groupby("pitch_description").size().reset_index(name="Chase")
)
pitch_type_chase

Unnamed: 0,pitch_description,Chase
0,Four-Seam Fastball,2
1,Slider,4


In [48]:
pitch_type_counts = (
    pitch_type_counts.merge(pitch_type_chase, on="pitch_description", how="left")
    .fillna({"Chase": 0})
    .astype({"Chase": int})
)
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings,Whiff%,CS,CS+Whiffs,CSW%,Ball,Chase
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7,14.3,3,4,25.0,9,2
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7,42.9,2,5,38.5,8,4
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0,,2,2,100.0,1,0


In [49]:
pitch_type_counts["Chase%"] = (pitch_type_counts["Chase"] / pitch_type_counts["Ball"]*100).round(1)
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings,Whiff%,CS,CS+Whiffs,CSW%,Ball,Chase,Chase%
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7,14.3,3,4,25.0,9,2,22.2
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7,42.9,2,5,38.5,8,4,50.0
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0,,2,2,100.0,1,0,0.0
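
One thing to watch in the Chase% step: a pitch type with no out-of-zone pitches has `Ball` equal to 0, and the division then yields `NaN` or `inf` rather than a percentage. Below is a minimal sketch of one way to guard against that. It uses a small hand-made frame instead of the real `pitch_type_counts`, and the `Cutter` row is hypothetical, added only to show the zero case.

```python
import pandas as pd

# Toy stand-in for pitch_type_counts; the Cutter row is hypothetical
counts = pd.DataFrame(
    {
        "pitch_description": ["Four-Seam Fastball", "Slider", "Cutter"],
        "Ball": [9, 8, 0],
        "Chase": [2, 4, 0],
    }
)

# Mask zero denominators so those rows become NaN instead of inf
out_of_zone = counts["Ball"].where(counts["Ball"] > 0)
counts["Chase%"] = (counts["Chase"] / out_of_zone * 100).round(1)

print(counts)
```

A `NaN` here reads as "no out-of-zone pitches to chase", which is a different statement from a Chase% of 0.0.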


## Zone%

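Zone% is the share of a pitch type thrown in the strike zone: pitches in the zone divided by total pitches of that type, times 100. In this data a pitch counts as in the zone when its `zone` value is below 10.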

In [50]:
pitches_in_the_zone = pitcher_pyb[pitcher_pyb["zone"] < 10]
pitches_in_the_zone

Unnamed: 0,game_id,game_date,pitcher_name,pitch_description,start_speed,spin_rate,ivb,hb,z0,x0,pitcher_hand,is_whiff,is_swing,play_code,zone
0,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,96.7,2488,16.4,7.7,5.238767,-1.292576,R,,,C,1
4,778930,2025-03-08,Jacob deGrom,Slider,90.7,2695,4.6,-6.6,5.386846,-1.251604,R,,,C,4
5,778930,2025-03-08,Jacob deGrom,Slider,90.4,2666,5.2,-6.8,5.357511,-1.296493,R,,,C,6
7,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,96.3,2574,16.6,9.4,5.239546,-1.061915,R,,True,F,2
11,778930,2025-03-08,Jacob deGrom,Slider,89.2,2606,3.4,-5.9,5.36936,-1.248093,R,,True,F,9
13,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,97.0,2524,15.4,10.4,5.241751,-1.157758,R,,True,F,4
14,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,98.0,2501,17.5,9.1,5.255657,-1.18879,R,,True,F,1
15,778930,2025-03-08,Jacob deGrom,Slider,90.8,2605,5.3,-5.4,5.380291,-1.256994,R,,True,X,5
35,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,95.9,2456,16.6,10.1,5.200358,-1.287032,R,,,C,6
37,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,95.8,2514,15.7,8.5,5.239418,-1.255339,R,,True,F,2


In [51]:
pitch_type_strike = (
    pitches_in_the_zone.groupby("pitch_description").size().reset_index(name="Strike")
)
pitch_type_strike

Unnamed: 0,pitch_description,Strike
0,Curveball,1
1,Four-Seam Fastball,7
2,Slider,5


In [52]:
pitch_type_counts = (
    pitch_type_counts.merge(pitch_type_strike, on="pitch_description", how="left")
    .fillna({"Strike": 0})
    .astype({"Strike": int})
)
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings,Whiff%,CS,CS+Whiffs,CSW%,Ball,Chase,Chase%,Strike
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7,14.3,3,4,25.0,9,2,22.2,7
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7,42.9,2,5,38.5,8,4,50.0,5
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0,,2,2,100.0,1,0,0.0,1


In [53]:
pitch_type_counts["Zone%"] = (pitch_type_counts["Strike"] / pitch_type_counts["PitchesThrown"]*100).round(1)
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings,Whiff%,CS,CS+Whiffs,CSW%,Ball,Chase,Chase%,Strike,Zone%
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7,14.3,3,4,25.0,9,2,22.2,7,43.8
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7,42.9,2,5,38.5,8,4,50.0,5,38.5
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0,,2,2,100.0,1,0,0.0,1,50.0
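
The Zone% steps above (filter, count, merge, divide) can also be collapsed into a single grouped mean, since the mean of a True/False column is the fraction of True values. The sketch below relies on the same assumption the notebook makes, that `zone` values below 10 are inside the strike zone. The rows are made up for illustration.

```python
import pandas as pd

# Hypothetical pitch-level rows; only the two columns the calculation needs
pitches = pd.DataFrame(
    {
        "pitch_description": ["Slider", "Slider", "Slider", "Slider", "Curveball", "Curveball"],
        "zone": [4, 14, 6, 9, 5, 13],
    }
)

zone_pct = (
    pitches.assign(in_zone=pitches["zone"] < 10)  # True when the pitch is in the zone
    .groupby("pitch_description")["in_zone"]
    .mean()  # fraction of in-zone pitches per pitch type
    .mul(100)
    .round(1)
    .rename("Zone%")
    .reset_index()
)

print(zone_pct)
```

The step-by-step version in the notebook keeps the intermediate `Strike` count visible, which is handy when checking the numbers by hand.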


## Vertical Approach Angle (VAA)
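
VAA is the angle, in degrees, at which the pitch is descending as it reaches the front of home plate. The cells below appear to follow the usual constant-acceleration model: given the velocity components reported at $y_0$ and the accelerations, the velocity at the plate position $y_f$ follows from basic kinematics. Written out with the notebook's own quantities (the subscripted symbols are shorthand for the `vy0`, `vz0`, `ay` and `az` columns, not names used in the notebook):

```latex
\begin{aligned}
v_{y,f} &= -\sqrt{v_{y0}^{2} - 2\,a_y\,(y_0 - y_f)} \\
t &= \frac{v_{y,f} - v_{y0}}{a_y} \\
v_{z,f} &= v_{z0} + a_z\,t \\
\mathrm{VAA} &= -\arctan\!\left(\frac{v_{z,f}}{v_{y,f}}\right)\cdot\frac{180}{\pi}
\end{aligned}
```

The leading minus sign on $v_{y,f}$ reflects that the ball travels toward the plate, in the direction of decreasing $y$. With this convention a descending pitch has a negative VAA.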


In [54]:
pitcher_pyb = df_pyb[
    [
        "game_id",
        "game_date",
        "pitcher_name",
        "pitch_description",
        "start_speed",
        "spin_rate",
        "ivb",
        "hb",
        "z0",
        "x0",
        "pitcher_hand",
        "is_whiff",
        "is_swing",
        "play_code",
        "zone",
        "ax",
        "ay",
        "az",
        "vx0",
        "vy0",
        "vz0",
    ]
].copy()  # explicit copy so the new columns below are not written to a slice of df_pyb

In [55]:
y0 = 50  # y-position (feet) at which the initial velocities vx0, vy0, vz0 are measured
yf = 17 / 12  # y-position of the front of home plate (17 inches, in feet)

In [56]:
pitcher_pyb.loc[:, "vy_f"] = -np.sqrt(
    pitcher_pyb["vy0"] ** 2 - (2 * pitcher_pyb["ay"] * (y0 - yf))
)



In [57]:
# Compute time (t)
pitcher_pyb.loc[:, "t"] = (pitcher_pyb["vy_f"] - pitcher_pyb["vy0"]) / pitcher_pyb["ay"]

# Compute final z-velocity (vz_f)
pitcher_pyb.loc[:, "vz_f"] = pitcher_pyb["vz0"] + (pitcher_pyb["az"] * pitcher_pyb["t"])

# Compute final x-velocity (vx_f)
pitcher_pyb.loc[:, "vx_f"] = pitcher_pyb["vx0"] + (pitcher_pyb["ax"] * pitcher_pyb["t"])


In [58]:
# Compute VAA
pitcher_pyb.loc[:, "VAA"] = -np.arctan(pitcher_pyb["vz_f"] / pitcher_pyb["vy_f"]) * (
    180 / np.pi
)

pitcher_pyb.head(5)


Unnamed: 0,game_id,game_date,pitcher_name,pitch_description,start_speed,spin_rate,ivb,hb,z0,x0,pitcher_hand,is_whiff,is_swing,play_code,zone,ax,ay,az,vx0,vy0,vz0,vy_f,t,vz_f,vx_f,VAA
0,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,96.7,2488,16.4,7.7,5.238767,-1.292576,R,,,C,1,-9.311432,36.991917,-13.463718,3.684679,-140.620785,-3.753477,-127.199936,0.362805,-8.638179,0.306446,-3.885006
1,778930,2025-03-08,Jacob deGrom,Slider,90.4,2707,4.1,-6.0,5.333908,-1.318032,R,,,B,14,4.841059,27.310885,-27.581958,4.716463,-131.667965,-3.864862,-121.172379,0.3843,-14.464622,6.576884,-6.807315
2,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,97.4,2613,15.7,9.7,5.245148,-1.311988,R,True,True,S,11,-12.150851,39.224385,-14.41611,5.341128,-141.604586,-2.502514,-127.43844,0.361157,-7.708987,0.952767,-3.461709
3,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,97.2,2619,16.8,6.4,5.327976,-1.228199,R,,,B,11,-8.183513,38.188363,-13.695679,4.204736,-141.341535,-0.883479,-127.54134,0.361372,-5.83271,1.247446,-2.618421
4,778930,2025-03-08,Jacob deGrom,Slider,90.7,2695,4.6,-6.6,5.386846,-1.251604,R,,,C,4,6.132826,27.946679,-27.213983,1.329798,-132.117789,-3.487453,-121.406855,0.383263,-13.917571,3.680285,-6.5396
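
To check the VAA cells on a single pitch, the same steps can be wrapped in a plain function. This is only a sketch: `vertical_approach_angle` is a hypothetical helper, not part of the notebook, and it assumes the constant-acceleration model and the `y0 = 50`, `yf = 17 / 12` values used above. The inputs are the first row of the table.

```python
import math

Y0 = 50.0     # y-position (ft) where vy0 and vz0 are reported, as in the notebook
YF = 17 / 12  # front of home plate (ft), as in the notebook


def vertical_approach_angle(vy0, vz0, ay, az, y0=Y0, yf=YF):
    """Return VAA in degrees for one pitch (hypothetical helper)."""
    # Velocity along y when the ball reaches yf (negative: moving toward the plate)
    vy_f = -math.sqrt(vy0**2 - 2 * ay * (y0 - yf))
    # Flight time between y0 and yf
    t = (vy_f - vy0) / ay
    # Vertical velocity at the plate
    vz_f = vz0 + az * t
    return -math.degrees(math.atan(vz_f / vy_f))


# First row of the table above (Four-Seam Fastball)
vaa = vertical_approach_angle(
    vy0=-140.620785, vz0=-3.753477, ay=36.991917, az=-13.463718
)
print(vaa)
```

The printed value should land very close to the `VAA` shown for that row in the table.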


In [59]:
vaa_means = (
    pitcher_pyb.groupby("pitch_description", as_index=False)["VAA"].mean()
).round(1)

vaa_means

Unnamed: 0,pitch_description,VAA
0,Curveball,-7.2
1,Four-Seam Fastball,-3.7
2,Slider,-6.9


In [60]:
pitch_type_counts = (
    pitch_type_counts.merge(vaa_means, on="pitch_description", how="left")
)
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings,Whiff%,CS,CS+Whiffs,CSW%,Ball,Chase,Chase%,Strike,Zone%,VAA
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7,14.3,3,4,25.0,9,2,22.2,7,43.8,-3.7
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7,42.9,2,5,38.5,8,4,50.0,5,38.5,-6.9
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0,,2,2,100.0,1,0,0.0,1,50.0,-7.2


## Horizontal Approach Angle (HAA)

In [61]:
# Compute HAA
pitcher_pyb.loc[:, "HAA"] = -np.arctan(pitcher_pyb["vx_f"] / pitcher_pyb["vy_f"]) * (
    180 / np.pi
)

# Adjust HAA for left-handed pitchers (pitcher_hand = 'L')
pitcher_pyb.loc[pitcher_pyb["pitcher_hand"] == 'L', "HAA"] *= -1

pitcher_pyb.head(5)


Unnamed: 0,game_id,game_date,pitcher_name,pitch_description,start_speed,spin_rate,ivb,hb,z0,x0,pitcher_hand,is_whiff,is_swing,play_code,zone,ax,ay,az,vx0,vy0,vz0,vy_f,t,vz_f,vx_f,VAA,HAA
0,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,96.7,2488,16.4,7.7,5.238767,-1.292576,R,,,C,1,-9.311432,36.991917,-13.463718,3.684679,-140.620785,-3.753477,-127.199936,0.362805,-8.638179,0.306446,-3.885006,0.138035
1,778930,2025-03-08,Jacob deGrom,Slider,90.4,2707,4.1,-6.0,5.333908,-1.318032,R,,,B,14,4.841059,27.310885,-27.581958,4.716463,-131.667965,-3.864862,-121.172379,0.3843,-14.464622,6.576884,-6.807315,3.1068
2,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,97.4,2613,15.7,9.7,5.245148,-1.311988,R,True,True,S,11,-12.150851,39.224385,-14.41611,5.341128,-141.604586,-2.502514,-127.43844,0.361157,-7.708987,0.952767,-3.461709,0.428352
3,778930,2025-03-08,Jacob deGrom,Four-Seam Fastball,97.2,2619,16.8,6.4,5.327976,-1.228199,R,,,B,11,-8.183513,38.188363,-13.695679,4.204736,-141.341535,-0.883479,-127.54134,0.361372,-5.83271,1.247446,-2.618421,0.560376
4,778930,2025-03-08,Jacob deGrom,Slider,90.7,2695,4.6,-6.6,5.386846,-1.251604,R,,,C,4,6.132826,27.946679,-27.213983,1.329798,-132.117789,-3.487453,-121.406855,0.383263,-13.917571,3.680285,-6.5396,1.736312
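
The handedness adjustment is easiest to see on two pitches with identical plate velocities, one from each side. The sketch below assumes the same sign convention as the cell above. The velocities come from row 1 of the table, and the left-handed row is hypothetical, included only to show the flip.

```python
import numpy as np
import pandas as pd

# Two pitches with the same final velocities; the "L" row is made up
demo = pd.DataFrame(
    {
        "pitcher_hand": ["R", "L"],
        "vx_f": [6.576884, 6.576884],
        "vy_f": [-121.172379, -121.172379],
    }
)

demo["HAA"] = -np.degrees(np.arctan(demo["vx_f"] / demo["vy_f"]))

# Mirror left-handers so the sign means the same thing for both hands
demo.loc[demo["pitcher_hand"] == "L", "HAA"] *= -1

print(demo)
```

After the flip the two rows have equal magnitude and opposite sign, so averages by pitch type can be compared across pitchers of either hand.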


In [62]:
haa_means = (
    pitcher_pyb.groupby("pitch_description", as_index=False)["HAA"].mean()
).round(1)

haa_means

Unnamed: 0,pitch_description,HAA
0,Curveball,2.8
1,Four-Seam Fastball,0.7
2,Slider,3.0


In [63]:
pitch_type_counts = (
    pitch_type_counts.merge(haa_means, on="pitch_description", how="left")
)
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings,Whiff%,CS,CS+Whiffs,CSW%,Ball,Chase,Chase%,Strike,Zone%,VAA,HAA
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7,14.3,3,4,25.0,9,2,22.2,7,43.8,-3.7,0.7
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7,42.9,2,5,38.5,8,4,50.0,5,38.5,-6.9,3.0
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0,,2,2,100.0,1,0,0.0,1,50.0,-7.2,2.8


## Extension

In [64]:
pitcher_pyb = df_pyb[
    [
        "game_id",
        "game_date",
        "pitcher_name",
        "pitch_description",
        "start_speed",
        "spin_rate",
        "ivb",
        "hb",
        "z0",
        "x0",
        "pitcher_hand",
        "is_whiff",
        "is_swing",
        "play_code",
        "zone",
        "ax",
        "ay",
        "az",
        "vx0",
        "vy0",
        "vz0",
        "extension"
    ]
]

In [65]:
pitch_avg_exten = (
    pitcher_pyb.groupby("pitch_description", as_index=False)["extension"].mean()
).round(1)

pitch_avg_exten.rename(columns={"extension": "Extension"}, inplace=True)

pitch_avg_exten

Unnamed: 0,pitch_description,Extension
0,Curveball,6.7
1,Four-Seam Fastball,6.6
2,Slider,6.5


In [66]:
pitch_type_counts = (
    pitch_type_counts.merge(pitch_avg_exten, on="pitch_description", how="left")
)
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings,Whiff%,CS,CS+Whiffs,CSW%,Ball,Chase,Chase%,Strike,Zone%,VAA,HAA,Extension
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7,14.3,3,4,25.0,9,2,22.2,7,43.8,-3.7,0.7,6.6
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7,42.9,2,5,38.5,8,4,50.0,5,38.5,-6.9,3.0,6.5
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0,,2,2,100.0,1,0,0.0,1,50.0,-7.2,2.8,6.7


## Max Exit Velocity

In [67]:
pitcher_pyb = df_pyb[
    [
        "game_id",
        "game_date",
        "pitcher_name",
        "pitch_description",
        "start_speed",
        "spin_rate",
        "ivb",
        "hb",
        "z0",
        "x0",
        "pitcher_hand",
        "is_whiff",
        "is_swing",
        "play_code",
        "zone",
        "ax",
        "ay",
        "az",
        "vx0",
        "vy0",
        "vz0",
        "extension",
        "launch_speed",
        "batter_name"
    ]
]

In [68]:
pitch_max_exitvelo = (
    pitcher_pyb.groupby(["pitch_description", "batter_name"], as_index=False)["launch_speed"].max()
)
pitch_max_exitvelo.rename(columns={"launch_speed": "Max Exit Velo"}, inplace=True)

pitch_max_exitvelo = pitch_max_exitvelo.dropna(subset=["Max Exit Velo"])

# Get the batter's name with the max exit velocity for each pitch type
pitch_max_exitvelo = pitch_max_exitvelo.loc[
    pitch_max_exitvelo.groupby("pitch_description")["Max Exit Velo"].idxmax()
]

pitch_max_exitvelo

Unnamed: 0,pitch_description,batter_name,Max Exit Velo
4,Four-Seam Fastball,Nick Pratto,92.9
13,Slider,Vinnie Pasquantino,92.9
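
The same table can be reached without the intermediate two-key groupby: drop rows with no `launch_speed`, then take the row at each pitch type's `idxmax`. Here is a minimal sketch on hypothetical rows (the batter names and speeds are placeholders, not data from this game).

```python
import pandas as pd

# Hypothetical pitch-level rows; NaN launch_speed means no batted ball was tracked
pitches = pd.DataFrame(
    {
        "pitch_description": [
            "Slider",
            "Slider",
            "Four-Seam Fastball",
            "Four-Seam Fastball",
            "Curveball",
        ],
        "batter_name": ["Batter A", "Batter B", "Batter A", "Batter C", "Batter B"],
        "launch_speed": [88.1, 101.3, float("nan"), 95.0, float("nan")],
    }
)

batted = pitches.dropna(subset=["launch_speed"])

# One row per pitch type: the batted ball with the highest launch speed
hardest = (
    batted.loc[batted.groupby("pitch_description")["launch_speed"].idxmax()]
    .rename(columns={"launch_speed": "Max Exit Velo"})
    .reset_index(drop=True)
)

print(hardest)
```

Pitch types with no batted balls (the Curveball here) drop out of `hardest`, and a left merge onto the summary table fills them with `NaN`, matching the notebook's output.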


In [69]:
pitch_type_counts = (
    pitch_type_counts.merge(pitch_max_exitvelo, on="pitch_description", how="left")
)
pitch_type_counts

Unnamed: 0,pitcher_name,pitch_description,PitchesThrown,Usage %,start_speed,spin_rate,ivb,hb,vRel,hRel,Whiffs,Swings,Whiff%,CS,CS+Whiffs,CSW%,Ball,Chase,Chase%,Strike,Zone%,VAA,HAA,Extension,batter_name,Max Exit Velo
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,5.2,-1.2,1,7,14.3,3,4,25.0,9,2,22.2,7,43.8,-3.7,0.7,6.6,Nick Pratto,92.9
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,5.3,-1.2,3,7,42.9,2,5,38.5,8,4,50.0,5,38.5,-6.9,3.0,6.5,Vinnie Pasquantino,92.9
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,5.5,-1.2,0,0,,2,2,100.0,1,0,0.0,1,50.0,-7.2,2.8,6.7,,


## Tidy up and rename columns for output

In [70]:
pitch_type_counts = pitch_type_counts[
    [
        "pitcher_name",
        "pitch_description",
        "PitchesThrown",
        "Usage %",
        "start_speed",
        "spin_rate",
        "ivb",
        "hb",
        "VAA",
        "HAA",
        "vRel",
        "hRel",
        "Swings",
        "Whiffs",
        "Whiff%",
        "CS",
        "CSW%",
        "Zone%",
        "Chase%",
        "Extension",
        "Max Exit Velo",
        "batter_name",
    ]
]

In [71]:
pitch_type_counts = pitch_type_counts.copy()

pitch_type_counts.rename(
    columns={
        "pitcher_name": "Pitcher",
        "pitch_description": "Pitch",
        "PitchesThrown": "Count",
        "start_speed": "Velocity",
        "spin_rate": "Spin Rate",
        "ivb": "iVB",
        "hb": "HB",
        "batter_name": "Batter",
    },
    inplace=True,
)

In [72]:
pitch_type_counts

Unnamed: 0,Pitcher,Pitch,Count,Usage %,Velocity,Spin Rate,iVB,HB,VAA,HAA,vRel,hRel,Swings,Whiffs,Whiff%,CS,CSW%,Zone%,Chase%,Extension,Max Exit Velo,Batter
0,Jacob deGrom,Four-Seam Fastball,16,51.613,97.0,2529.2,16.6,8.8,-3.7,0.7,5.2,-1.2,7,1,14.3,3,25.0,43.8,22.2,6.6,92.9,Nick Pratto
1,Jacob deGrom,Slider,13,41.935,90.3,2612.9,4.1,-5.3,-6.9,3.0,5.3,-1.2,7,3,42.9,2,38.5,38.5,50.0,6.5,92.9,Vinnie Pasquantino
2,Jacob deGrom,Curveball,2,6.452,82.9,2691.5,-1.5,-6.4,-7.2,2.8,5.5,-1.2,0,0,,2,100.0,50.0,0.0,6.7,,
