# Finding the Will Sennet - Timothee Chalamet At Bat

On 10/10/21, Will Sennet (@Senn_Spud) tweeted the following:

> I would strike timothee chalamet out on 3 pitches. Slider outside. He feels like hes drowning. Curveball in the dirt of course he's chasing it. Ladybird sucked kill yourself. And Bang 97 inside corner. Youre out. You were was always out. Youve been out since the day you were born

This code is a toy project to find out whether that at bat has ever occurred.

In [1]:
import pandas as pd
import glob
import time
from pybaseball import statcast

## Gather Data

Start by downloading the data with pybaseball's Statcast scraper.

This takes quite a while, so save the datasets to disk. Skip this section if you have already downloaded the data.

In [None]:
seasons = range(2008, 2025)  # assumed bounds: the outputs below span 2008 through 2024
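all_data = {}  # the loop stores each season here; leaving this undefined makes every iteration fail before the CSV is written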

for season in seasons:
    try:
        print(f"Fetching data for {season}...")
        start_date = f"{season}-01-01"
        end_date = f"{season}-12-31"

        data = statcast(start_dt=start_date, end_dt=end_date)

        # Store in dictionary
        all_data[season] = data

        # Save to CSV so the later cells can reload it
        data.to_csv(f"statcast_{season}.csv", index=False)

        # Pause to be nice to the server
        time.sleep(2)
    
    except Exception as e:
        print(f"Failed to get data for {season}: {e}")
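
Since the download is slow, one way to make the loop resumable is to skip seasons that already have a CSV on disk. The helper below is a minimal sketch: `seasons_to_fetch` and its `data_dir` parameter are hypothetical names, and it assumes the `statcast_{season}.csv` naming used in the loop above.

```python
from pathlib import Path


def seasons_to_fetch(seasons, data_dir="."):
    """Return the seasons that have no statcast_<season>.csv in data_dir yet."""
    data_dir = Path(data_dir)
    return [
        season
        for season in seasons
        if not (data_dir / f"statcast_{season}.csv").exists()
    ]
```

Iterating over `seasons_to_fetch(seasons)` instead of `seasons` would then only request the seasons that are still missing.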

In [4]:
csv_files = glob.glob("statcast_*.csv")

In [None]:
full_df = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index = True)

In [None]:
full_df.to_parquet("statcast_full.parquet", index=False)

## Import Data

In [2]:
full_df = pd.read_parquet("statcast_full.parquet")

## Explore Data

### Get Dataset Details

In [3]:
list(full_df.columns)

['pitch_type',
 'game_date',
 'release_speed',
 'release_pos_x',
 'release_pos_z',
 'player_name',
 'batter',
 'pitcher',
 'events',
 'description',
 'spin_dir',
 'spin_rate_deprecated',
 'break_angle_deprecated',
 'break_length_deprecated',
 'zone',
 'des',
 'game_type',
 'stand',
 'p_throws',
 'home_team',
 'away_team',
 'type',
 'hit_location',
 'bb_type',
 'balls',
 'strikes',
 'game_year',
 'pfx_x',
 'pfx_z',
 'plate_x',
 'plate_z',
 'on_3b',
 'on_2b',
 'on_1b',
 'outs_when_up',
 'inning',
 'inning_topbot',
 'hc_x',
 'hc_y',
 'tfs_deprecated',
 'tfs_zulu_deprecated',
 'umpire',
 'sv_id',
 'vx0',
 'vy0',
 'vz0',
 'ax',
 'ay',
 'az',
 'sz_top',
 'sz_bot',
 'hit_distance_sc',
 'launch_speed',
 'launch_angle',
 'effective_speed',
 'release_spin_rate',
 'release_extension',
 'game_pk',
 'fielder_2',
 'fielder_3',
 'fielder_4',
 'fielder_5',
 'fielder_6',
 'fielder_7',
 'fielder_8',
 'fielder_9',
 'release_pos_y',
 'estimated_ba_using_speedangle',
 'estimated_woba_using_speedangle',
 'w

In [4]:
full_df.head()

Unnamed: 0,pitch_type,game_date,release_speed,release_pos_x,release_pos_z,player_name,batter,pitcher,events,description,...,n_thruorder_pitcher,n_priorpa_thisgame_player_at_bat,pitcher_days_since_prev_game,batter_days_since_prev_game,pitcher_days_until_next_game,batter_days_until_next_game,api_break_z_with_gravity,api_break_x_arm,api_break_x_batter_in,arm_angle
0,SL,2008-10-27,85.0,-1.24,6.85,"Lidge, Brad",400134,400058,strikeout,swinging_strike,...,1,0,5.0,1.0,,,3.46,-0.49,0.49,
1,SL,2008-10-27,85.2,-1.18,6.82,"Lidge, Brad",400134,400058,,swinging_strike,...,1,0,5.0,1.0,,,3.19,-0.04,0.04,
2,SL,2008-10-27,85.4,-1.45,6.99,"Lidge, Brad",400134,400058,,foul,...,1,0,5.0,1.0,,,2.91,0.21,-0.21,
3,SL,2008-10-27,84.9,-1.36,7.03,"Lidge, Brad",450314,400058,field_out,hit_into_play,...,1,0,5.0,1.0,,,2.91,0.25,-0.25,
4,SL,2008-10-27,82.9,-1.17,7.04,"Lidge, Brad",450314,400058,,called_strike,...,1,0,5.0,1.0,,,3.42,-0.08,0.08,


In [5]:
full_df.pitch_type.value_counts()

pitch_type
FF    4102583
SI    2393636
SL    1822820
CH    1239179
CU     923352
FC     737634
KC     235380
FS     200294
ST     158699
KN      37228
IN      32098
SV      19333
FA       5535
FO       4621
PO       4278
EP       4169
SC       1142
CS        709
AB         30
Name: count, dtype: int64

## Start Filtering

### Create At Bat Key

In [6]:
full_df['at_bat'] = full_df.groupby(['game_date', 'inning', 'batter', 'pitcher']).ngroup()
full_df.at_bat.max()

3161421
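
To see what `ngroup()` does here, the toy frame below applies the same key to five made-up pitches. It also illustrates one caveat: a batter who faces the same pitcher twice in one inning (or in both games of a doubleheader) would share a key. Grouping on `game_pk` and `at_bat_number` would avoid that, assuming the dataset carries an `at_bat_number` column; the column listing above is truncated, so treat that as an assumption. All values in the toy frame are invented.

```python
import pandas as pd

# Five invented pitches: batter 101 comes up twice in the same inning
toy = pd.DataFrame({
    "game_pk":       [1, 1, 1, 1, 1],
    "game_date":     ["2024-04-03"] * 5,
    "inning":        [2, 2, 2, 2, 2],
    "batter":        [101, 101, 102, 101, 101],
    "pitcher":       [900] * 5,
    "at_bat_number": [10, 10, 11, 19, 19],
})

# Key used in this notebook: both plate appearances of batter 101 collapse into one group
toy["at_bat"] = toy.groupby(["game_date", "inning", "batter", "pitcher"]).ngroup()

# Stricter key (assumed columns): every plate appearance gets its own id
toy["at_bat_strict"] = toy.groupby(["game_pk", "at_bat_number"]).ngroup()

print(toy[["batter", "at_bat_number", "at_bat", "at_bat_strict"]])
```

The notebook keeps the simpler key, so the counts below are based on it.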

### Let's Generate MLB Video Searches for Each of the At Bats

In [8]:
full_df['at_bat_url'] = (
    "https://www.mlb.com/video/?q=Date+%3D+%5B%22" + full_df['game_date'].astype('str') + 
    "%22%5D+AND+Inning+%3D+%5B" + full_df['inning'].astype('str') +
    "%5D+AND+PlayerId+%3D%3D+%5B" + full_df['pitcher'].astype('str') + "%2C" + full_df['batter'].astype('str') +
    "%5D+Order+By+Timestamp+ASC&of=1"
)
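
The percent-encoded pieces above are hard to read. For a single at bat, the same URL can be produced by writing the query in plain text and letting the standard library encode it. This is only a sketch: `at_bat_search_url` is a hypothetical helper, and the query syntax is copied from the concatenation above rather than from any documented MLB API.

```python
from urllib.parse import quote_plus


def at_bat_search_url(game_date, inning, pitcher, batter):
    """Build an MLB video search URL for one at bat (same format as at_bat_url)."""
    query = (
        f'Date = ["{game_date}"] AND Inning = [{inning}] '
        f"AND PlayerId == [{pitcher},{batter}] Order By Timestamp ASC"
    )
    # quote_plus turns spaces into "+" and escapes = [ ] " , as %XX sequences
    return "https://www.mlb.com/video/?q=" + quote_plus(query) + "&of=1"


url = at_bat_search_url("2022-06-04", 3, 670912, 621550)
print(url)
```

The vectorized string concatenation is still the faster choice for millions of rows; the helper is mainly useful for checking one URL by hand.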

### Finding an Exact Match to the Tweet

1. Slider, outside the zone
2. Curveball, below the zone, swinging strike
3. Fastball, 97 mph, inside corner, called strike

In [9]:
three_pitch_strikeouts =  full_df.at_bat[(full_df.events == 'strikeout') & (full_df.pitch_number == 3)]
len(three_pitch_strikeouts)


125264

In [10]:
first_pitch_sliders = full_df.at_bat[(full_df.pitch_number == 1) & (full_df.pitch_type == "SL") & (full_df.zone.isin([11, 12, 13, 14])) & (full_df.at_bat.isin(three_pitch_strikeouts))]
len(first_pitch_sliders)

4091

In [11]:
second_pitch_curveball = full_df.at_bat[(full_df.pitch_number == 2) & (full_df.pitch_type == "CU") & (full_df.zone.isin([13,14])) & (full_df.description == 'swinging_strike')  & (full_df.at_bat.isin(first_pitch_sliders))]
len(second_pitch_curveball)

25

In [12]:
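# Zones are numbered from the catcher's view, so 1/7 are the corners inside
# to a right-handed batter and 3/9 are inside to a left-handed batter.
# Each comparison needs its own parentheses because & binds tighter than ==.
inside_corner = (
    (full_df.zone.isin([1, 7]) & (full_df.stand == 'R'))
    | (full_df.zone.isin([3, 9]) & (full_df.stand == 'L'))
)
third_pitch_fastball = full_df.at_bat[
    (full_df.pitch_number == 3)
    & (full_df.pitch_type == "FF")
    & ((full_df.release_speed >= 96) | (full_df.effective_speed >= 96))
    & inside_corner
    & (full_df.description == 'called_strike')
    & (full_df.at_bat.isin(second_pitch_curveball))
]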
len(third_pitch_fastball)

0
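
The cells in this section share one pattern: pick the rows for a given pitch number, apply that pitch's conditions, and keep only at bats that survived the previous step. The sketch below wraps that pattern in a hypothetical helper, `narrow`, and runs it on an invented three-at-bat frame; it is not part of the notebook and the data is made up for illustration.

```python
import pandas as pd


def narrow(df, pitch_number, mask, candidates=None):
    """Return at_bat ids whose pitch `pitch_number` satisfies `mask`,
    optionally restricted to ids already in `candidates`."""
    keep = (df["pitch_number"] == pitch_number) & mask
    if candidates is not None:
        keep = keep & df["at_bat"].isin(candidates)
    return df.loc[keep, "at_bat"]


# Three invented three-pitch strikeouts; only at bat 0 goes SL, CU, FF
toy = pd.DataFrame({
    "at_bat":       [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "pitch_number": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "pitch_type":   ["SL", "CU", "FF", "SL", "FF", "FF", "FF", "CU", "FF"],
    "description":  ["called_strike", "swinging_strike", "called_strike",
                     "foul", "foul", "swinging_strike",
                     "called_strike", "swinging_strike", "called_strike"],
    "events":       [None, None, "strikeout"] * 3,
})

strikeouts = narrow(toy, 3, toy["events"] == "strikeout")
sliders = narrow(toy, 1, toy["pitch_type"] == "SL", strikeouts)
curves = narrow(
    toy, 2,
    (toy["pitch_type"] == "CU") & (toy["description"] == "swinging_strike"),
    sliders,
)
fastballs = narrow(
    toy, 3,
    (toy["pitch_type"] == "FF") & (toy["description"] == "called_strike"),
    curves,
)
print(fastballs.tolist())
```

Writing each condition as its own parenthesized comparison before combining with `&` keeps the masks readable and avoids operator-precedence surprises.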

### Looking for Looser Matches

#### Keep Pitch Sequence and Outcomes, drop locations

1. Slider
2. Curveball, swinging strike
3. Fastball, 97 mph, called strike

In [13]:
first_pitch_sliders = full_df.at_bat[(full_df.pitch_number == 1) & (full_df.pitch_type == "SL") & (full_df.at_bat.isin(three_pitch_strikeouts))]
len(first_pitch_sliders)

17286

In [14]:
second_pitch_curveball = full_df.at_bat[(full_df.pitch_number == 2) & (full_df.pitch_type == "CU") & (full_df.description == 'swinging_strike')  & (full_df.at_bat.isin(first_pitch_sliders))]
len(second_pitch_curveball)

222

In [15]:
third_pitch_fastball = full_df.at_bat[(full_df.pitch_number == 3) & (full_df.pitch_type == "FF") & ((full_df.release_speed >= 96) | (full_df.effective_speed >= 96)) & (full_df.description == 'called_strike') & (full_df.at_bat.isin(second_pitch_curveball))]
len(third_pitch_fastball)

4

There are four matching results. Let's look at them more closely.

In [16]:
full_df[full_df.at_bat.isin(third_pitch_fastball)]

Unnamed: 0,pitch_type,game_date,release_speed,release_pos_x,release_pos_z,player_name,batter,pitcher,events,description,...,pitcher_days_since_prev_game,batter_days_since_prev_game,pitcher_days_until_next_game,batter_days_until_next_game,api_break_z_with_gravity,api_break_x_arm,api_break_x_batter_in,arm_angle,at_bat,at_bat_url
10283174,FF,2022-06-04,96.8,-2.59,5.3,"Oviedo, Johan",621550,670912,strikeout,called_strike,...,,1.0,4.0,0.0,1.5,0.63,0.63,26.6,2635342,https://www.mlb.com/video/?q=Date+%3D+%5B%2220...
10283175,CU,2022-06-04,77.0,-2.53,5.65,"Oviedo, Johan",621550,670912,,swinging_strike,...,,1.0,4.0,0.0,4.15,-0.97,-0.97,33.1,2635342,https://www.mlb.com/video/?q=Date+%3D+%5B%2220...
10283176,SL,2022-06-04,85.5,-2.66,5.54,"Oviedo, Johan",621550,670912,,called_strike,...,,1.0,4.0,0.0,2.49,-0.12,-0.12,27.5,2635342,https://www.mlb.com/video/?q=Date+%3D+%5B%2220...
11079378,FF,2023-05-24,98.0,-2.64,5.55,"Oviedo, Johan",669701,670912,strikeout,called_strike,...,5.0,2.0,6.0,2.0,1.58,0.84,-0.84,29.9,2832230,https://www.mlb.com/video/?q=Date+%3D+%5B%2220...
11079379,CU,2023-05-24,81.8,-2.48,5.87,"Oviedo, Johan",669701,670912,,swinging_strike,...,5.0,2.0,6.0,2.0,3.86,-0.85,0.85,35.3,2832230,https://www.mlb.com/video/?q=Date+%3D+%5B%2220...
11079380,SL,2023-05-24,87.9,-2.68,5.73,"Oviedo, Johan",669701,670912,,called_strike,...,5.0,2.0,6.0,2.0,3.05,-0.54,0.54,31.8,2832230,https://www.mlb.com/video/?q=Date+%3D+%5B%2220...
11768018,FF,2024-06-12,99.1,-1.25,6.13,"Nicolas, Kyle",680977,693312,strikeout,called_strike,...,3.0,1.0,3.0,1.0,0.89,0.82,-0.82,42.8,3052081,https://www.mlb.com/video/?q=Date+%3D+%5B%2220...
11768019,CU,2024-06-12,86.7,-1.38,5.99,"Nicolas, Kyle",680977,693312,,swinging_strike,...,3.0,1.0,3.0,1.0,4.03,-0.72,0.72,44.5,3052081,https://www.mlb.com/video/?q=Date+%3D+%5B%2220...
11768020,SL,2024-06-12,89.8,-1.42,6.02,"Nicolas, Kyle",680977,693312,,called_strike,...,3.0,1.0,3.0,1.0,2.77,-0.45,0.45,41.2,3052081,https://www.mlb.com/video/?q=Date+%3D+%5B%2220...
12034632,FF,2024-04-03,97.9,-1.27,6.06,"Glasnow, Tyler",642731,607192,strikeout,called_strike,...,6.0,1.0,6.0,2.0,1.06,0.09,0.09,62.7,2982098,https://www.mlb.com/video/?q=Date+%3D+%5B%2220...


In [19]:
list(full_df.at_bat_url[full_df.at_bat.isin(third_pitch_fastball)].drop_duplicates())

['https://www.mlb.com/video/?q=Date+%3D+%5B%222022-06-04%22%5D+AND+Inning+%3D+%5B3%5D+AND+PlayerId+%3D%3D+%5B670912%2C621550%5D+Order+By+Timestamp+ASC&of=1',
 'https://www.mlb.com/video/?q=Date+%3D+%5B%222023-05-24%22%5D+AND+Inning+%3D+%5B4%5D+AND+PlayerId+%3D%3D+%5B670912%2C669701%5D+Order+By+Timestamp+ASC&of=1',
 'https://www.mlb.com/video/?q=Date+%3D+%5B%222024-06-12%22%5D+AND+Inning+%3D+%5B8%5D+AND+PlayerId+%3D%3D+%5B693312%2C680977%5D+Order+By+Timestamp+ASC&of=1',
 'https://www.mlb.com/video/?q=Date+%3D+%5B%222024-04-03%22%5D+AND+Inning+%3D+%5B2%5D+AND+PlayerId+%3D%3D+%5B607192%2C642731%5D+Order+By+Timestamp+ASC&of=1']

None are perfect matches. There is still the opportunity to be the first to pitch a Timothee Chalamet Strikeout!