# 2023 MLB Pitch Data: Month-by-Month Queries Using Pybaseball

In this project, we use publicly available data from Baseball Savant ([baseballsavant.mlb.com](https://baseballsavant.mlb.com)), accessed through the `pybaseball` package ([pybaseball GitHub](https://github.com/jldbc/pybaseball/blob/master/docs/statcast.md)).

For simplicity, we query the data once and save it locally as CSV files. The data was accessed and saved on 21 May 2024.
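
Because the queries below are slow, one way to reuse the saved files on later runs is to read a CSV when it already exists and only call Statcast when it does not. This is a minimal sketch, not part of the original notebook: `load_or_fetch` is a hypothetical helper, and `fetch` is a placeholder for a zero-argument call such as `lambda: statcast('2023-03-30', '2023-04-30')`, passed in so the helper can be tried without network access.

```python
from pathlib import Path
from typing import Callable

import pandas as pd


def load_or_fetch(path: str, fetch: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical helper: read a cached CSV if present, otherwise fetch and save it."""
    csv_path = Path(path)
    if csv_path.exists():
        # index_col=0 restores the index column that DataFrame.to_csv writes by default
        return pd.read_csv(csv_path, index_col=0)
    # No local copy yet: run the (slow) query and save the result for next time
    df = fetch()
    df.to_csv(csv_path)
    return df
```

For example, `month1 = load_or_fetch("month1.csv", lambda: statcast('2023-03-30', '2023-04-30', verbose=True))` would only hit Baseball Savant the first time it runs.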

## Installing Pybaseball

It is only necessary to install `pybaseball` once, so the following line is left commented out.

```python
# pip install pybaseball
```
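
If you are unsure whether the package is already present, the standard library's `importlib.util.find_spec` returns `None` when no importable module of that name exists, which gives a quick check before reinstalling. `is_installed` is a hypothetical helper name used only for this sketch.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be found in the current environment."""
    return importlib.util.find_spec(package) is not None


if not is_installed("pybaseball"):
    print("pybaseball not found; install it with `pip install pybaseball` first.")
```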

## Importing Necessary Packages

Once `pybaseball` is installed, we import the necessary packages.

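```python
import pandas as pd                # DataFrame handling and CSV export
import pybaseball as pyb           # pybaseball package namespace
from pybaseball import statcast    # pitch-level Statcast query function
```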

## Getting Statcast Data

The 2023 MLB regular season ran from March 30 to October 1. Because of the volume of pitch-level data involved, we split the season into six chunks (each roughly one month long) and save each resulting DataFrame as a CSV file.
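
Since the six cells below differ only in their dates and file names, the same work can also be written as a loop. This is a sketch, not the notebook's own code: `CHUNKS_2023` and `save_chunks` are hypothetical names, and `fetch` is a placeholder for a function such as `lambda s, e: statcast(s, e, verbose=True)`. Passing the query function in keeps the loop testable without network access.

```python
from pathlib import Path
from typing import Callable, List, Tuple

import pandas as pd

# The six date ranges used in this notebook (YYYY-MM-DD, both ends inclusive).
CHUNKS_2023: List[Tuple[str, str]] = [
    ("2023-03-30", "2023-04-30"),
    ("2023-05-01", "2023-05-31"),
    ("2023-06-01", "2023-06-30"),
    ("2023-07-01", "2023-07-31"),
    ("2023-08-01", "2023-08-31"),
    ("2023-09-01", "2023-10-01"),
]


def save_chunks(
    fetch: Callable[[str, str], pd.DataFrame],
    chunks: List[Tuple[str, str]],
    out_dir: str = ".",
    prefix: str = "month",
) -> List[Path]:
    """Fetch each date range and save it as <prefix><n>.csv, numbered from 1."""
    paths = []
    for n, (start, end) in enumerate(chunks, start=1):
        df = fetch(start, end)
        path = Path(out_dir) / f"{prefix}{n}.csv"
        df.to_csv(path)
        paths.append(path)
    return paths
```

Writing it this way makes the chunk boundaries a single list to review, rather than six separate cells to keep in sync.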

Calling `statcast(date1, date2)` returns all pitch-level Statcast data between `date1` and `date2`. Setting `verbose=True` reports progress as the data is pulled.
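
The chunks in this notebook treat both dates as inclusive (each one ends the day before the next begins), so a quick sanity check on any pull is that its `game_date` values fall inside the requested window. A minimal sketch, assuming the returned DataFrame carries the `game_date` column that Statcast results include; `dates_within` is a hypothetical helper name.

```python
import pandas as pd


def dates_within(df: pd.DataFrame, start: str, end: str) -> bool:
    """Check that every game_date in a Statcast pull lies in [start, end]."""
    dates = pd.to_datetime(df["game_date"])
    # An empty frame gives NaT for min/max, and the comparisons then return False
    return bool(dates.min() >= pd.Timestamp(start) and dates.max() <= pd.Timestamp(end))
```

For example, `dates_within(month1, '2023-03-30', '2023-04-30')` should return `True` once the first query below has run.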

## Month1 - March 30, 2023 through April 30, 2023

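```python
# Pull every pitch from March 30 through April 30, 2023
month1 = statcast('2023-03-30', '2023-04-30', verbose=True)

# Save a local copy so the query does not need to be repeated
month1.to_csv("month1.csv")
```

```text
This is a large query, it may take a moment to complete
100%|██████████| 32/32 [00:14<00:00,  2.24it/s]
```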
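
The totals in this notebook's progress bars (32, 31, 30, 31, 31 and 31) line up with the number of calendar days in each range when both endpoints are counted, which suggests the bar advances once per day queried. That reading is an inference from the numbers, not something stated in the pybaseball docs; the arithmetic itself is easy to confirm with the standard library:

```python
from datetime import date

# Date ranges used in this notebook (both endpoints inclusive).
ranges = [
    (date(2023, 3, 30), date(2023, 4, 30)),
    (date(2023, 5, 1), date(2023, 5, 31)),
    (date(2023, 6, 1), date(2023, 6, 30)),
    (date(2023, 7, 1), date(2023, 7, 31)),
    (date(2023, 8, 1), date(2023, 8, 31)),
    (date(2023, 9, 1), date(2023, 10, 1)),
]

# Inclusive day count: the difference in days, plus one for the start date itself.
day_counts = [(end - start).days + 1 for start, end in ranges]
print(day_counts)       # [32, 31, 30, 31, 31, 31]
print(sum(day_counts))  # 186, i.e. every day from March 30 to October 1
```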


## Month2 - May 1, 2023 through May 31, 2023

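```python
# Pull every pitch from May 1 through May 31, 2023
month2 = statcast('2023-05-01', '2023-05-31', verbose=True)

# Save a local copy so the query does not need to be repeated
month2.to_csv("month2.csv")
```

```text
This is a large query, it may take a moment to complete
100%|██████████| 31/31 [00:21<00:00,  1.44it/s]
```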


## Month3 - June 1, 2023 through June 30, 2023

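```python
# Pull every pitch from June 1 through June 30, 2023
month3 = statcast('2023-06-01', '2023-06-30', verbose=True)

# Save a local copy so the query does not need to be repeated
month3.to_csv("month3.csv")
```

```text
This is a large query, it may take a moment to complete
100%|██████████| 30/30 [00:20<00:00,  1.48it/s]
```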


## Month4 - July 1, 2023 through July 31, 2023

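```python
# Pull every pitch from July 1 through July 31, 2023
month4 = statcast('2023-07-01', '2023-07-31', verbose=True)

# Save a local copy so the query does not need to be repeated
month4.to_csv("month4.csv")
```

```text
This is a large query, it may take a moment to complete
100%|██████████| 31/31 [00:20<00:00,  1.53it/s]
```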


## Month5 - August 1, 2023 through August 31, 2023

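```python
# Pull every pitch from August 1 through August 31, 2023
month5 = statcast('2023-08-01', '2023-08-31', verbose=True)

# Save a local copy so the query does not need to be repeated
month5.to_csv("month5.csv")
```

```text
This is a large query, it may take a moment to complete
100%|██████████| 31/31 [00:21<00:00,  1.45it/s]
```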


## Month6 - September 1, 2023 through October 1, 2023

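```python
# Pull every pitch from September 1 through October 1, 2023 (the final day of the regular season)
month6 = statcast('2023-09-01', '2023-10-01', verbose=True)

# Save a local copy so the query does not need to be repeated
month6.to_csv("month6.csv")
```

```text
This is a large query, it may take a moment to complete
100%|██████████| 31/31 [00:23<00:00,  1.34it/s]
```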
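
With all six files on disk, a later analysis will likely want the full season in a single DataFrame. A minimal sketch, assuming the CSVs were written with `to_csv`'s default index column as above; `load_season` is a hypothetical helper name.

```python
from typing import Iterable

import pandas as pd


def load_season(paths: Iterable[str]) -> pd.DataFrame:
    """Read the monthly CSVs and stack them into one DataFrame."""
    # index_col=0 reads back the index column written by to_csv
    frames = [pd.read_csv(p, index_col=0) for p in paths]
    # ignore_index=True builds a fresh 0..n-1 index instead of keeping per-file indexes
    return pd.concat(frames, ignore_index=True)
```

For example, `season = load_season([f"month{i}.csv" for i in range(1, 7)])` would stack the six files in date order.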
