![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Baseball - Challenges

Now that you've gone through the introduction notebook and learned how to navigate Jupyter Notebooks, Python, and some useful libraries such as pandas and Plotly, we can get a bit more creative with our questions. This notebook builds on what you've learned and lets you modify the code as needed. Don't be afraid to refer back to the previous notebook if you have any questions.

## Prep work

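```python
# Import/install libraries
import pandas as pd
import plotly.express as px
try:
    import pybaseball as pbb
except ImportError:
    # pybaseball isn't installed yet: install it (Jupyter shell command), then import it
    !pip install pybaseball --user
    import pybaseball as pbb
```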

```python
# Import data
pitch_data = pd.read_csv('data/ale_pitch_data_june.csv')
pitch_data.head()
```


## Grouping

Returning to our original dataset, let's calculate some summary statistics.

The pandas `groupby` method groups together rows that share a value in a column. Below, we group the pitches by pitcher and then calculate each pitcher's mean pitch speed (`release_speed`):

```python
# Group by pitcher, keep only the `release_speed` column, and average it within each group
pitcher_grp_mean = pitch_data.groupby(by='player_name')['release_speed'].mean()
pitcher_grp_mean
```
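To see what `groupby` is doing, here is a minimal sketch on a tiny, made-up DataFrame. The pitcher names and speeds are hypothetical, not from the June dataset; only the column names `player_name` and `release_speed` match `pitch_data`. Rows that share a name form one group, and each group is averaged separately:

```python
import pandas as pd

# Hypothetical toy data using the same column names as pitch_data
toy = pd.DataFrame({
    'player_name': ['Smith, Al', 'Smith, Al', 'Jones, Bo', 'Jones, Bo', 'Jones, Bo'],
    'release_speed': [94.0, 96.0, 88.0, 90.0, 92.0],
})

# Split the rows by pitcher, then average release_speed within each group
toy_mean = toy.groupby(by='player_name')['release_speed'].mean()
print(toy_mean)
```

The result is a Series indexed by `player_name`, sorted alphabetically by default, with one average per pitcher.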

We can do the same with `max` to find each pitcher's fastest pitch:

```python
pitcher_grp_max = pitch_data.groupby(by='player_name')['release_speed'].max()
pitcher_grp_max
```
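Instead of calling `mean` and `max` separately, pandas' `agg` method can compute several statistics in one pass. This sketch uses the same hypothetical toy pitchers as before (made-up values, not the real data) and returns one row per pitcher with one column per statistic:

```python
import pandas as pd

# Hypothetical toy data using the same column names as pitch_data
toy = pd.DataFrame({
    'player_name': ['Smith, Al', 'Smith, Al', 'Jones, Bo', 'Jones, Bo', 'Jones, Bo'],
    'release_speed': [94.0, 96.0, 88.0, 90.0, 92.0],
})

# Apply three aggregation functions to release_speed within each pitcher's group
stats = toy.groupby(by='player_name')['release_speed'].agg(['mean', 'min', 'max'])
print(stats)
```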

## Challenges

See if you can use the methods here, and what you've learned in the previous notebook, to tackle these challenges ([hint](https://www.geeksforgeeks.org/pandas-groupby-one-column-and-get-mean-min-and-max-values/)):
1. Which pitcher throws, on average, the fastest?
1. Which pitcher threw the hardest pitch in the dataset?
1. What is the highest average velocity for each pitch type?

We can also group by more than one column and return several columns at once:

```python
pitch_data.groupby(by=['player_name', 'pitch_name'])[['release_speed', 'release_spin_rate']].mean()
```
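Grouping by two columns produces one row per (pitcher, pitch type) pair, labelled by a two-level index (a pandas `MultiIndex`). The sketch below uses hypothetical toy values with the same column names as `pitch_data`, and then calls `reset_index()` to turn the two index levels back into ordinary columns, which is often easier to filter or pass to a plotting function:

```python
import pandas as pd

# Hypothetical toy data using the same column names as pitch_data
toy = pd.DataFrame({
    'player_name': ['Smith, Al', 'Smith, Al', 'Smith, Al', 'Jones, Bo', 'Jones, Bo'],
    'pitch_name': ['4-Seam Fastball', '4-Seam Fastball', 'Slider', '4-Seam Fastball', 'Slider'],
    'release_speed': [95.0, 97.0, 86.0, 92.0, 84.0],
    'release_spin_rate': [2300.0, 2400.0, 2500.0, 2200.0, 2600.0],
})

# One row per (pitcher, pitch type) pair, indexed by a MultiIndex
by_pitch = toy.groupby(by=['player_name', 'pitch_name'])[['release_speed', 'release_spin_rate']].mean()
print(by_pitch)

# Move player_name and pitch_name back out of the index into regular columns
flat = by_pitch.reset_index()
print(flat)
```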

## Batting

Up until now we've only looked at data about the pitches, which (unsurprisingly) mostly relates to the pitcher. Although we can also access data specifically about hitters (see the end of this notebook for details on how), some hitter data is already available in what we have. But first, we need to do some data cleaning.

In our original dataset, the `batter` column contains a number that uniquely identifies each batter. That's helpful for telling batters apart, but not for knowing *who* each batter is. To find out, we'll use the `pybaseball` library.

First, we're going to take the entire `batter` column, pass it to the `playerid_reverse_lookup` function, and extract just the names that are returned:

```python
batter_names = pbb.playerid_reverse_lookup(pitch_data['batter'])[['name_last', 'name_first']]
batter_names
```

For consistency, we can take these two columns, merge them into one, and format the names so they match the style of the pitcher names ('Lastname, Firstname').

Let's capitalize the names in each column individually:

```python
batter_names['name_last'] = batter_names['name_last'].str.title()
batter_names['name_first'] = batter_names['name_first'].str.title()
batter_names
```
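`str.title` is simple but not name-aware: it capitalizes the first letter of every run of letters and lowercases the rest. A quick sketch on made-up lowercase strings (hypothetical examples, not taken from the lookup) shows where the result can differ from how a player actually writes their name, for example `Mcdonald` rather than `McDonald`:

```python
import pandas as pd

# Hypothetical lowercase surnames
last = pd.Series(["o'neill", 'de la cruz', 'mcdonald'])

# str.title() capitalizes the first letter of every run of letters
titled = last.str.title()
print(titled.tolist())  # ["O'Neill", 'De La Cruz', 'Mcdonald']
```

For this notebook the approximation is fine, since we only need consistent, readable labels.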

Now we can join ('con**cat**enate') the two names with a comma (and space), and create a new, single column:

```python
batter_names_comb = batter_names['name_last'].str.cat(batter_names['name_first'], sep=', ')
batter_names_comb
```
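One detail of `str.cat` worth knowing: by default, if either piece of a row is missing, the combined value is missing (`NaN`) too. The `na_rep` parameter substitutes a placeholder instead. This sketch uses hypothetical names in which the second player has no first name:

```python
import pandas as pd

# Hypothetical names; the second player is missing a first name
last = pd.Series(['Smith', 'Jones'])
first = pd.Series(['Al', None])

# Default: a missing piece makes the whole combined value missing
combined = last.str.cat(first, sep=', ')
print(combined.tolist())

# na_rep fills in the missing piece instead
combined_filled = last.str.cat(first, sep=', ', na_rep='?')
print(combined_filled.tolist())  # ['Smith, Al', 'Jones, ?']
```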

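The lookup function returns one row per ID it finds, but the rows come back in the order of pybaseball's own lookup table rather than the order of our `batter` column. Pairing IDs and names by position can therefore attach the wrong name to a player. Instead, we build the mapping from the lookup result itself: its `key_mlbam` column holds the ID that each row's name belongs to.

```python
# Look up each unique batter ID once, keeping the ID column this time
ids = pitch_data['batter'].drop_duplicates().to_list()
lookup = pbb.playerid_reverse_lookup(ids)

# Format the names as before: 'Lastname, Firstname'
names = lookup['name_last'].str.title().str.cat(lookup['name_first'].str.title(), sep=', ')

# Dictionary of ID -> name, paired row by row within the same lookup result
mapper = dict(zip(lookup['key_mlbam'], names))
```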
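Why does pairing by position go wrong? If a lookup filters a table with `isin`, the matching rows keep the table's own order, whatever order the IDs were requested in. This sketch uses a small hypothetical lookup table (made-up IDs and names) to mimic that behaviour and compares the two ways of building the dictionary:

```python
import pandas as pd

# Hypothetical lookup table, stored in its own (alphabetical) order
lookup_table = pd.DataFrame({
    'key_mlbam': [300, 100, 200],
    'name_last': ['adams', 'baker', 'clark'],
})

# Our IDs, in the order they first appear in the pitch data
ids = [100, 200, 300]

# Filtering with isin keeps the table's row order, not the order of ids
found = lookup_table[lookup_table['key_mlbam'].isin(ids)].reset_index(drop=True)
print(found['key_mlbam'].tolist())  # [300, 100, 200]

# Pairing by position mismatches IDs and names...
by_position = {ids[i]: found['name_last'][i] for i in range(len(ids))}
# ...while pairing each name with the ID from the same row stays correct
by_key = dict(zip(found['key_mlbam'], found['name_last']))
print(by_position[100], by_key[100])
```

Here ID 100 belongs to `baker`, but the position-based dictionary maps it to `adams`.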

Finally, we use this dictionary to overwrite the batter IDs with names. The `map` method looks up each value in the column among the dictionary's **keys** and replaces it with the corresponding **value**; any ID that isn't a key becomes `NaN`:

```python
pitch_data['batter'] = pitch_data['batter'].map(mapper)
pitch_data
```
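With names in place, the `batter` column can be grouped or counted just like `player_name`. This sketch on hypothetical pitch rows (made-up IDs and names) shows `map` leaving `NaN` for an ID missing from the dictionary, which is worth checking for after a real lookup, and then counts how many pitches each batter saw:

```python
import pandas as pd

# Hypothetical pitch-level data: one row per pitch, batter stored as an ID
pitches = pd.DataFrame({'batter': [100, 200, 100, 999]})
mapper = {100: 'Baker, Ann', 200: 'Clark, Cy'}  # 999 is deliberately missing

# Replace IDs with names; unmatched IDs become NaN
pitches['batter'] = pitches['batter'].map(mapper)
print(pitches['batter'].isna().sum())  # 1

# Count pitches seen per batter (value_counts skips NaN by default)
counts = pitches['batter'].value_counts()
print(counts)
```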

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)