# Bridget Murphy: Framing Index

## Introduction
One of my favorite positions in baseball has always been catcher, and my favorite player since I was very young has been Yadier Molina. It is well documented that there is no great way to measure the value of catchers. Stats like wins above replacement (WAR) simply do not paint an accurate picture of a catcher's value compared to other players on a team.

To measure the value of a catcher, we need to prioritize what is most important for the position. In my opinion, one part of a catcher's game that is underrepresented in sabermetrics is the ability to frame a pitch. Framing a pitch is when the catcher moves their glove while catching the pitch in a way that makes it look like a strike. A good catcher can get strike calls for his pitcher that a less experienced catcher may not. Similarly, a less experienced catcher may cost his pitcher strikes if he does not make a convincing catch of a pitch that is in the strike zone.

To quantify a catcher's ability to frame a pitch, I will go through Statcast data and count the number of called strikes that were outside the zone and the number of called balls that were inside the zone. While we have no absolute proof that framing was the reason for any given call, I think it is the best indicator we have in the data without knowing much about an umpire's strike zone.
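As a rough sketch of that counting rule, the function below labels a single pitch. It assumes Statcast's `zone` convention, in which zones 1 through 9 are inside the strike zone and zones 11 through 14 are outside it. The name `classify_call` and the two labels are made up for illustration and are not part of pybaseball.

```python
IN_ZONE = set(range(1, 10))      # Statcast zones 1-9: inside the strike zone
OUT_OF_ZONE = {11, 12, 13, 14}   # Statcast zones 11-14: outside the strike zone

def classify_call(description, zone):
    """Label one pitch as a gained strike, a lost strike, or neither (hypothetical helper)."""
    if description == 'called_strike' and zone in OUT_OF_ZONE:
        return 'gained_strike'   # a ball that was called a strike
    if description == 'ball' and zone in IN_ZONE:
        return 'lost_strike'     # a strike that was called a ball
    return None                  # swings, balls in play, and correct calls

print(classify_call('called_strike', 12.0))
print(classify_call('ball', 5.0))
```

Pitches with a swing, and pitches that were called correctly, fall through to `None` and do not affect either count.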

I am going to use a month's worth of Statcast data to test my statistic. I picked the period from late July to late August 2018 because it falls in the middle to late part of the season, so players and umpires should have had time to settle into a rhythm. I only used a month of data because, in my experience, Statcast queries tend to crash if more than a few months of data are requested. I also wanted to limit the dataset because I had to use the player ID lookup function, which takes time to run.


In [19]:
import pandas as pd
import numpy as np

In [20]:
from pybaseball import statcast
data = statcast(start_dt='2018-07-25', end_dt='2018-08-25')
data.head(2)

This is a large query, it may take a moment to complete
Completed sub-query from 2018-07-25 to 2018-07-30
Completed sub-query from 2018-07-31 to 2018-08-05
Completed sub-query from 2018-08-06 to 2018-08-11
Completed sub-query from 2018-08-12 to 2018-08-17
Completed sub-query from 2018-08-18 to 2018-08-23
Completed sub-query from 2018-08-24 to 2018-08-25


Unnamed: 0,index,pitch_type,game_date,release_speed,release_pos_x,release_pos_z,player_name,batter,pitcher,events,...,home_score,away_score,bat_score,fld_score,post_away_score,post_home_score,post_bat_score,post_fld_score,if_fielding_alignment,of_fielding_alignment
0,560,FF,2018-08-25,89.3,2.6216,5.521,Jerry Blevins,594694.0,460283.0,field_out,...,3.0,0.0,0.0,3.0,0.0,3.0,0.0,3.0,Standard,Standard
1,576,CU,2018-08-25,74.1,2.8926,5.4154,Jerry Blevins,594694.0,460283.0,,...,3.0,0.0,0.0,3.0,0.0,3.0,0.0,3.0,Standard,Standard


## Calculation

In [21]:
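# Keep only the columns needed; fielder_2 is the catcher's MLBAM ID
cols = ['description','zone','bb_type','pfx_x','pfx_z','fielder_2']
avg = data[['description','zone']].copy()  # every pitch, used later for the total called strikes
data2 = data[cols].copy()                  # will be filtered down to called balls
data = data[cols].copy()                   # will be filtered down to called strikes
data2.head(5)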

Unnamed: 0,description,zone,bb_type,pfx_x,pfx_z,fielder_2
0,hit_into_play,9.0,fly_ball,0.6774,1.284,608700.0
1,called_strike,4.0,,-1.2201,-0.4813,608700.0
2,ball,13.0,,-1.336,-0.5348,608700.0
3,swinging_strike,4.0,,-1.2633,-0.5442,608700.0
4,ball,12.0,,-1.2505,-0.5129,608700.0


In [22]:
# data keeps only called strikes
data = data.loc[data['description'] == 'called_strike']

# data2 keeps only called balls
data2 = data2.loc[data2['description'] == 'ball']
data2.head()

Unnamed: 0,description,zone,bb_type,pfx_x,pfx_z,fielder_2
2,ball,13.0,,-1.336,-0.5348,608700.0
4,ball,12.0,,-1.2505,-0.5129,608700.0
6,ball,12.0,,0.8913,1.3962,608700.0
8,ball,12.0,,0.8057,1.2577,608700.0
10,ball,14.0,,0.9947,-0.6263,608700.0


In [23]:
# Zones 11-14 are outside the strike zone, so these are balls that were called strikes
data = data.loc[data['zone'].isin([11, 12, 13, 14])]
data.head(15)

Unnamed: 0,description,zone,bb_type,pfx_x,pfx_z,fielder_2
99,called_strike,14.0,,-0.7573,1.1488,608700.0
112,called_strike,12.0,,-0.7679,1.2971,608700.0
163,called_strike,12.0,,-0.7129,1.2785,608700.0
169,called_strike,12.0,,-0.643,1.4877,446308.0
221,called_strike,14.0,,-0.8716,1.3408,608700.0
238,called_strike,11.0,,1.2265,-1.1519,446308.0
265,called_strike,11.0,,0.5793,0.6764,460026.0
290,called_strike,11.0,,0.5853,1.3282,460026.0
294,called_strike,13.0,,-1.202,-0.3906,460026.0
326,called_strike,13.0,,-1.1085,0.9375,460026.0


In [24]:

# Zones 1-9 are inside the strike zone, so these are strikes that were called balls
data2 = data2.loc[data2['zone'].isin([1, 2, 3, 4, 5, 6, 7, 8, 9])]
# Keep only the five catchers being compared (MLBAM IDs)
data2 = data2.loc[data2['fielder_2'].isin([519222, 521692, 592663, 506702, 518735])]
data2.head(15)

Unnamed: 0,description,zone,bb_type,pfx_x,pfx_z,fielder_2
1235,ball,7.0,,0.6629,1.3105,518735.0
1258,ball,1.0,,0.6484,1.4293,518735.0
2557,ball,1.0,,-0.0206,1.297,519222.0
2617,ball,3.0,,-0.0903,1.4134,519222.0
3030,ball,8.0,,0.7312,-1.22,521692.0
3219,ball,1.0,,-0.6133,1.3676,521692.0
4067,ball,9.0,,-1.3102,1.0197,506702.0
4154,ball,3.0,,-0.8941,1.7582,592663.0
4281,ball,2.0,,0.2323,1.6374,592663.0
6229,ball,3.0,,-1.5938,-1.091,518735.0


In [25]:
data2.groupby('fielder_2').count()


fielder_2,description,zone,bb_type,pfx_x,pfx_z
506702.0,37,37,0,37,37
518735.0,51,51,0,51,51
519222.0,49,49,0,49,49
521692.0,76,76,0,76,76
592663.0,70,70,0,70,70


In [26]:
# Keep only the same five catchers (MLBAM IDs) for the out-of-zone called strikes
data = data.loc[data['fielder_2'].isin([519222, 521692, 592663, 506702, 518735])]
data.groupby('fielder_2').count()

fielder_2,description,zone,bb_type,pfx_x,pfx_z
506702.0,89,89,0,89,89
518735.0,89,89,0,89,89
519222.0,113,113,0,113,113
521692.0,101,101,0,101,101
592663.0,108,108,0,108,108
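The two counts can also be produced in one pass, which avoids keeping separate filtered copies of the data. The snippet below is a minimal sketch on a toy frame with invented values and placeholder catcher IDs (1001, 1002). Only the column names `description`, `zone` and `fielder_2` come from the Statcast data used here.

```python
import pandas as pd

# Toy stand-in for the Statcast frame (values invented for illustration only)
pitches = pd.DataFrame({
    'description': ['called_strike', 'ball', 'called_strike',
                    'ball', 'swinging_strike', 'called_strike'],
    'zone':        [12.0, 5.0, 4.0, 13.0, 4.0, 14.0],
    'fielder_2':   [1001, 1001, 1001, 1002, 1002, 1002],
})

called_strike = pitches['description'] == 'called_strike'
called_ball = pitches['description'] == 'ball'
outside = pitches['zone'].isin([11, 12, 13, 14])   # outside the strike zone
inside = pitches['zone'].between(1, 9)             # inside the strike zone

# Summing a boolean mask per catcher counts the pitches that satisfy it
counts = pd.DataFrame({
    'gained': (called_strike & outside).groupby(pitches['fielder_2']).sum(),
    'lost': (called_ball & inside).groupby(pitches['fielder_2']).sum(),
})
print(counts)
```

Because both columns are indexed by `fielder_2`, each catcher's gained and lost counts stay on the same row and never need to be copied by hand.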


In [27]:
avg = avg.loc[(avg['description']=='called_strike')]
avg_called = len(avg)
#avg.head()
print(avg_called)

21188


In [28]:
from pybaseball import playerid_reverse_lookup

# MLBAM IDs of the catchers to look up.
# NOTE: the filters in the cells above use 506702 where this list has 572287; the two lists should match.
player_ids = [519222, 592663, 521692, 572287, 518735]

# Find the names of the players in player_ids, along with their IDs from other data sources
catcher = playerid_reverse_lookup(player_ids, key_type='mlbam')

Gathering player lookup table. This may take a moment.


In [29]:
catcher.head(5)

Unnamed: 0,name_last,name_first,key_mlbam,key_retro,key_bbref,key_fangraphs,mlb_played_first,mlb_played_last
0,grandal,yasmani,518735,grany001,grandya01,11368,2012.0,2019.0
1,perez,salvador,521692,peres002,perezsa02,7304,2011.0,2018.0
2,realmuto,j. t.,592663,realj001,realmjt01,11739,2014.0,2019.0
3,romine,austin,519222,romia002,rominau01,5491,2011.0,2019.0
4,zunino,mike,572287,zunim001,zuninmi01,13265,2013.0,2019.0


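My statistic, the Framing Index, is calculated as follows:

Framing Index = [(number of balls called strikes) - (number of strikes called balls)] / (total called strikes) × 100

Here, the total called strikes is the count of all called strikes in the sample across every catcher (21,188), not a per-catcher total.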
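The formula translates directly into a small function. This is only a sketch with made-up parameter names. The worked call uses one pair of counts from the tables above (113 balls called strikes, 49 strikes called balls) and the 21,188 total called strikes.

```python
def framing_index(gained, lost, total_called_strikes):
    """Net strikes gained per 100 called strikes in the sample."""
    return (gained - lost) / total_called_strikes * 100

print(round(framing_index(113, 49, 21188), 3))  # 0.302
```

A catcher who loses more strikes than he gains ends up with a negative index.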

In [30]:
# Counts in fielder_2 order from the groupby outputs: 506702, 518735, 519222, 521692, 592663
balls = [37, 51, 49, 76, 70]        # strikes called balls (inside the zone)
strikes = [89, 89, 113, 101, 108]   # balls called strikes (outside the zone)
tot = 21188                         # all called strikes in the sample

a, b, c, d, e = [(s - k) / tot * 100 for s, k in zip(strikes, balls)]

# Reorder to match the rows of `catcher`: Grandal, Perez, Realmuto, Romine, Zunino
indexs = [b, d, e, c, a]

In [31]:
catcher['Framing Index'] = indexs
catcher.head()

Unnamed: 0,name_last,name_first,key_mlbam,key_retro,key_bbref,key_fangraphs,mlb_played_first,mlb_played_last,Framing Index
0,grandal,yasmani,518735,grany001,grandya01,11368,2012.0,2019.0,0.179347
1,perez,salvador,521692,peres002,perezsa02,7304,2011.0,2018.0,0.117991
2,realmuto,j. t.,592663,realj001,realmjt01,11739,2014.0,2019.0,0.179347
3,romine,austin,519222,romia002,rominau01,5491,2011.0,2019.0,0.302058
4,zunino,mike,572287,zunim001,zuninmi01,13265,2013.0,2019.0,0.245422
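Ordering the values by hand is easy to get wrong, so one alternative is to key every count by catcher ID and let the ID do the matching. The snippet below is a sketch of that idea using the counts from the two groupby outputs above. The dictionary names are made up for illustration.

```python
# Counts keyed by MLBAM catcher ID, copied from the two groupby outputs
lost = {506702: 37, 518735: 51, 519222: 49, 521692: 76, 592663: 70}
gained = {506702: 89, 518735: 89, 519222: 113, 521692: 101, 592663: 108}
total_called_strikes = 21188

# Framing Index for each catcher ID that has counts
index_by_id = {
    pid: (gained[pid] - lost[pid]) / total_called_strikes * 100
    for pid in gained
}

for pid, value in sorted(index_by_id.items()):
    print(pid, round(value, 3))
```

With this mapping, `catcher['Framing Index'] = catcher['key_mlbam'].map(index_by_id)` would align the values by ID. Any ID in the lookup table that has no counts would come out as NaN rather than silently receiving another player's value.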


## Conclusion
After running the Statcast data, I came up with the framing index for five different catchers who had the most balls called strikes for the period from late July to late August 2018. The catchers were as follows:

- Yasmani Grandal, Los Angeles Dodgers - 0.179
- Salvador Perez, Kansas City Royals - 0.118
- J.T. Realmuto, Miami Marlins - 0.179
- Austin Romine, New York Yankees - 0.302
- Mike Zunino, Seattle Mariners - 0.245

Austin Romine leads the group with a framing index of 0.302. According to the Yankees blog Pinstripe Alley, the Yankees catcher had a notably impressive 2018 campaign as he fought to retain his position on the starting roster.

Yasmani Grandal and J.T. Realmuto are two names you would expect to find on this list. The two are frequently found on lists of the best catchers in baseball for their ability to "steal strikes". I think if I were able to run a full season of Statcast data, these two names would remain on the list.

There are limitations to the Framing Index, however. On its own, it is not an all-encompassing metric for judging catchers. Things like blocked pitches, runners thrown out, and some offensive stats should also be taken into account. This is illustrated by Yasmani Grandal's now infamous meltdown during the Dodgers' 2018 postseason campaign, where he allowed a slew of passed balls that directly resulted in runs for the other team.

There is also a lot of debate as to how much framing a catcher can actually get away with in today's MLB. Umpires are expected to make the right call and are very aware of catchers trying to frame the ball. This makes it difficult for catchers to influence the call the umpire makes. Still, the fact that a couple of the best catchers in baseball show up on this list could mean that pitch framing still plays a part in today's MLB.

One thing that could improve the accuracy of this statistic is if we had some sort of metric about each umpire's strike zone. Each home plate umpire has a slightly different size and shape to their zone, and it would be helpful to normalize this data before running the framing index calculation. This would serve the same purpose as a park factor in other calculations.
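One possible shape for such an adjustment is sketched below. It is purely an illustration. It assumes that each taken pitch outside the zone can be tagged with the home plate umpire, which would have to be joined in from another source, and all names and values in the toy data are invented. The idea is to credit a catcher only for the strikes he gains beyond what that umpire usually gives on out-of-zone pitches.

```python
from collections import defaultdict

# Each tuple is one taken pitch outside the zone: (umpire, catcher, was_called_strike)
# Toy data with invented names, for illustration only.
pitches = [
    ('ump_A', 'catcher_1', True),
    ('ump_A', 'catcher_1', True),
    ('ump_A', 'catcher_2', False),
    ('ump_A', 'catcher_2', False),
    ('ump_B', 'catcher_1', False),
    ('ump_B', 'catcher_2', True),
    ('ump_B', 'catcher_2', False),
    ('ump_B', 'catcher_2', False),
]

# Baseline: how often each umpire calls an out-of-zone pitch a strike
calls = defaultdict(lambda: [0, 0])   # umpire -> [strikes, pitches]
for ump, _, strike in pitches:
    calls[ump][0] += strike
    calls[ump][1] += 1
rate = {ump: s / n for ump, (s, n) in calls.items()}

# Strikes gained above what the umpire behind the plate usually gives
above_expected = defaultdict(float)
for ump, catcher, strike in pitches:
    above_expected[catcher] += strike - rate[ump]

print({c: round(v, 2) for c, v in above_expected.items()})
```

In this simplified version each umpire's baseline includes the catcher's own pitches, so a larger sample with many catchers per umpire would be needed before the adjustment means much.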

Overall, I think the Framing Index is a good metric to use in combination with other statistics to measure the effectiveness of a catcher. This statistic relies on the human element in umpires' calls and would be rendered useless should the league ever implement an electronic strike zone. However, for the time being, it remains a good indicator of a catcher's ability to frame a pitch.
