![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Baseball Analytics

Welcome to another Jupyter notebook on baseball analytics. This notebook is a free resource and is part of the Callysto project, which brings data science skills to grades 5 to 12 classrooms. 

In this notebook, we'll start by looking at some baseball statistical data, specifically on batters and where they hit the ball out into the field.

In Major League Baseball, computing statistics is key to understanding how players are valued by their teams. The movie Moneyball, with Brad Pitt and Jonah Hill, is all about baseball analytics.

The visualizations are coded in Python, a programming language widely used by data scientists. Programming languages are how people communicate with computers, and Python borrows many of its words from English. Our graphics are made with Plotly, which makes it easy to create line charts, scatter plots and more. All of this helps us understand the baseball statistics.

# Spray Charts

Knowing the hitting tendencies of a batter can be incredibly helpful when establishing defensive alignment. In today's data-driven baseball world, shifting the fielders around to increase the chances of making an out is commonplace, and becomes more popular each season. With [MLB set to ban shifting](https://www.si.com/mlb/2022/04/26/shifts-increasing-the-opener) in 2023, it remains to be seen how teams will adapt, but we can explore the data behind the strategy.

![](https://ftw.usatoday.com/wp-content/uploads/sites/90/2022/03/Screen-Shot-2022-03-07-at-10.26.27-AM.png?w=1000&h=600&crop=1)

Spray charts are a common way of showing where a ball that's been batted into play lands.

Due to the asymmetric nature of the game (i.e., there's always a force play at first base), left-handed power hitters are shifted on more than right-handed hitters. To illustrate the point, we'll single out a notorious lefty pull hitter, Joey Gallo.

Run the following code cells to complete this notebook.

In [None]:
!pip install pybaseball

In [None]:
from pybaseball import playerid_lookup, statcast_batter
from scipy.interpolate import CubicSpline
import plotly.graph_objects as go
import numpy as np

In [None]:
playerID = playerid_lookup('Gallo', 'Joey')
playerID

In [None]:
## Use the player ID 608336 from the table above, for player Joey Gallo.
data = statcast_batter('2021-04-01', '2021-10-03', player_id = 608336)
data

In [None]:
# Keep only the Statcast rows for barreled balls that were hit into play
data['barreled'] = (
    (data['launch_angle'] <= 50)
    & (data['launch_speed'] >= 97)
    & (data['launch_speed'] * 1.5 - data['launch_angle'] >= 117)
    & (data['launch_speed'] + data['launch_angle'] >= 123)
)

# .copy() makes an independent DataFrame, so adding columns later does not raise a SettingWithCopyWarning
data = data[data['barreled'] & (data['description'] == 'hit_into_play')].copy()
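
The filter above can also be read as a rule applied to one batted ball at a time. The sketch below restates the same four conditions as a plain function so they can be tried on individual values. `is_barreled` is a hypothetical helper name, not part of pybaseball, and the thresholds are copied from this notebook's cell rather than taken from an official Statcast definition.

```python
def is_barreled(launch_speed, launch_angle):
    """Apply the same four conditions used in the notebook's filter.

    launch_speed is in miles per hour, launch_angle in degrees.
    Hypothetical helper for illustration only.
    """
    return (
        launch_angle <= 50
        and launch_speed >= 97
        and launch_speed * 1.5 - launch_angle >= 117
        and launch_speed + launch_angle >= 123
    )


print(is_barreled(105, 28))  # hit hard at a steep upward angle
print(is_barreled(90, 28))   # same angle, but below the speed threshold
print(is_barreled(105, 5))   # hit hard, but speed plus angle is under 123
```

Trying a few speed and angle pairs like this is a quick way to see which condition rules a ball out.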


In [None]:
# Manipulate the data to align coordinate systems 
# Conversion constants from https://jaysfromthecouch.com/2018/12/31/using-statcast-data-to-estimate-minor-league-home-run-distance/

data.loc[:,'location_x']=2.29*(data['hc_x']-126)
data.loc[:,'location_y']=2.29*(204-data['hc_y'])
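
As a quick check of the conversion above, the sketch below applies the same constants to single points. `to_field_feet` is a hypothetical helper, and the input values are made up. The constants 2.29, 126 and 204 are the ones used in the cell above; treating (126, 204) as home plate is an inference from the formula, which maps that point to (0, 0).

```python
def to_field_feet(hc_x, hc_y):
    """Convert Statcast hit coordinates to feet measured from home plate.

    Uses the same constants as the notebook cell. Hypothetical helper.
    """
    location_x = 2.29 * (hc_x - 126)   # positive values are toward right field
    location_y = 2.29 * (204 - hc_y)   # flipped so larger values are farther from home
    return location_x, location_y


print(to_field_feet(126, 204))  # the point the formula treats as home plate
print(to_field_feet(226, 104))  # a made-up ball hit toward right field
```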

In [None]:
## Finally we plot the data points

fig = go.Figure()

# first we draw in the baseball diamond
dmd = 90/1.414  # 90-foot baselines rotated 45 degrees: 90 / sqrt(2), about 63.6 feet
fig.add_trace(
    go.Scatter(
        x=[0,-dmd,0,dmd,0],
        y=[0, dmd,2*dmd,dmd,0],
        mode='lines',
        name='Diamond'
    ))

fig.add_trace(
    go.Scatter(
        x=data['location_x'],
        y=data['location_y'],
        mode='markers',
        name='Ball position'
    ))
fig.update_layout(
    title = "Joey Gallo, 2021 - Ball Landing Position",
    height = 600
)
fig.update_xaxes(
    constrain = "domain",
    title="Cross-field position (feet)"
)
fig.update_yaxes(
    range = [-50,450],
    scaleanchor = "x",
    scaleratio = 1,
    title="Down-field position (feet)"
)

fig.show()

## Observation

As expected for a left-handed batter, the barreled balls landed mostly on the right-hand side of the field.

If the defending team knew this in advance, they would want to move their fielders to the right side as that is where they expect the ball to land. 
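
One way to put a number on this observation is to count the share of balls that land to the right of the line through home plate and second base. This is a minimal sketch using made-up coordinates rather than Gallo's real data. `right_side_share` is a hypothetical helper; in the notebook you could pass it `data['location_x']` instead of the example list.

```python
import numpy as np


def right_side_share(location_x):
    """Return the fraction of balls with a positive cross-field position.

    Missing values (NaN) are ignored. Hypothetical helper for illustration.
    """
    x = np.asarray(location_x, dtype=float)
    x = x[~np.isnan(x)]          # drop balls with no recorded landing spot
    if x.size == 0:
        return float("nan")      # nothing to summarize
    return float(np.mean(x > 0))


# Made-up cross-field positions in feet, not real Statcast data
example_x = [120.0, 85.5, -40.0, 200.0, float("nan"), 15.0]
print(right_side_share(example_x))
```

A share well above one half would support shifting fielders toward the right side.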

## Spray charts

Just for fun, let's plot the trajectory of these balls in three dimensions. This gives us a spray chart.

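The code below looks at each row in the data frame. It takes the endpoint of the ball's trajectory and connects it to home plate with a parabola. The parabola comes from a CubicSpline, which is a convenient way of interpolating points into a smooth curve. The height of each arc is not measured data: the code simply sets the peak in proportion to the distance travelled, so the curves are for illustration only.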

We add a few update_layout commands to make the result look pretty.
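
To see the interpolation step on its own, the sketch below builds the arc for a single made-up landing distance. It relies on a documented property of SciPy's `CubicSpline`: with the default boundary conditions and only three points, the result is the parabola through those points. The 300-foot distance is an invented example, and the closed-form expression is included only as a cross-check.

```python
import numpy as np
from scipy.interpolate import CubicSpline

rmax = 300.0  # made-up landing distance from home plate, in feet

# Three control points: start at ground level, peak at the midpoint, land at rmax
spl = CubicSpline([0, rmax / 2, rmax], [0, rmax / 8, 0])

r = np.linspace(0, rmax, 20)   # distances along the ground
z = spl(r)                     # interpolated heights

# The parabola through the same three points, written out directly
z_parabola = r * (rmax - r) / (2 * rmax)

print(np.allclose(z, z_parabola))  # do the two curves agree?
print(float(spl(rmax / 2)))        # height at the midpoint of the arc
```

The notebook's cell repeats this for every ball and doubles the height when plotting, which exaggerates the arcs so they are easier to see.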

In [None]:
fig = go.Figure()

dmd = 90/1.414  # 90-foot baselines rotated 45 degrees: 90 / sqrt(2), about 63.6 feet

fig.add_trace(
    go.Scatter3d(
        x=[0,-dmd,0,dmd,0],
        y=[0, dmd,2*dmd,dmd,0],
        z=[0,0,0,0,0],
        mode='lines',
        name='Diamond'
    )
)


for index, row in data.iterrows():
    if not (np.isnan(row['location_x'])):
        xmax = row['location_x']
        ymax = row['location_y']
        rmax = np.sqrt(xmax**2 + ymax**2)
        spl = CubicSpline([0,rmax/2,rmax], [0,rmax/8,0])
        x = np.linspace(0,xmax,20)
        y = np.linspace(0,ymax,20)
        r = np.linspace(0,rmax,20)
        z = spl(r)
        fig.add_trace(
            go.Scatter3d(
                x=x,
                y=y,
                z=2*z,
                mode='lines',
                name='Ball position'
            )
        )
fig.update_layout(
    title = "Joey Gallo, 2021 - Spray chart",
    showlegend=False,
    height = 1000
)
fig.update_layout(
    scene=dict(
        xaxis=dict(showticklabels=False,title="Cross field"),
        yaxis=dict(showticklabels=False,title="Down field"),
        zaxis=dict(showticklabels=False,title="Vertical"),
    )
)

camera = dict(
    eye=dict(x=0, y=-1.5, z=.75)
)
fig.update_layout(scene_camera=camera)

fig.update_layout(
        scene = dict(
            aspectmode='manual',
            aspectratio=dict(x=1, y=1, z=.25)
        )
    )
 
fig.show() 

## Observations

The three-dimensional chart looks very appealing, and you can move it around with your mouse. Does this 3D chart contain more information than the one before? Does it help you see or understand more? Which one do you feel is more useful: the 2D plot of ball positions, or the 3D plot of the trajectories?

## Going further

Can you repeat this analysis for a different player? In a different season? Who are your favourite players and how do they perform?

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)