### Pitching Analysis

Import the necessary libraries and configure pandas to display every column of a dataframe.

In [1]:
import pandas as pd
import pickle
pd.set_option('display.max_columns', None)

Uncomment the line below to rerun `Conference_Statistics.ipynb` and pull the most recent statistics from the DakStats website.

In [2]:
#%run ./Conference_Statistics.ipynb

We import our dataframes for batting, pitching, and fielding statistics as well as our list of teams from the pickle file titled `Stats.pkl`.

In [3]:
with open('Stats.pkl', 'rb') as f:
    dfb = pickle.load(f)
    dfp = pickle.load(f)
    dff = pickle.load(f)
    teams = pickle.load(f)

To gather team totals, we keep only the `Total:` row from each team's pitching dataframe and store the results in a list named `pitch_totals`. We also delete the `GS` column, since games started applies only to individual pitchers, not to team totals.

In [4]:
#copy each filtered slice so that deleting a column below acts on an independent dataframe
pitch_totals = [df[df.Pitching.str.contains("Total:", regex = False)].copy() for df in dfp]
for df in pitch_totals:
    del df["GS"]
pitch_totals[2]

Unnamed: 0,Pitching,ERA,W,L,GP,CG,SHO,CBO,SV,IP,H,R,ER,BB,SO,2B,3B,HR,TBF,B_AVG,WP,HBP,BK,SFA,SHA
15,Total:,6.08,9,23,32,3,0,0,2,245.2,304,220,166,102,197,52,8,22,1152,0.303,26,25,7,18,4


We combine these total rows into a single dataframe named `merged_pitch_totals`.

In [5]:
merged_pitch_totals = pd.concat(pitch_totals)
merged_pitch_totals[:4]

Unnamed: 0,Pitching,ERA,W,L,GP,CG,SHO,CBO,SV,IP,H,R,ER,BB,SO,2B,3B,HR,TBF,B_AVG,WP,HBP,BK,SFA,SHA
15,Total:,5.32,12,24,36,11,2,3,3,277.2,318,225,164,104,201,60,2,28,1289,0.285,29,36,0,18,17
17,Total:,6.68,2,26,28,3,0,0,2,207.1,258,224,154,117,177,55,9,20,1067,0.3,28,56,4,16,18
15,Total:,6.08,9,23,32,3,0,0,2,245.2,304,220,166,102,197,52,8,22,1152,0.303,26,25,7,18,4
13,Total:,5.18,23,13,36,8,1,0,7,281.2,311,178,162,111,213,45,5,29,1284,0.277,20,27,4,8,12


For those familiar with baseball, the `IP` (innings pitched) column is straightforward to read. However, the digit after the decimal point counts outs rather than tenths of an inning. Because each inning consists of three outs, a pitcher who records only one out in an inning is credited with `0.1` innings even though, mathematically, they threw 1/3 (0.333...) of an inning. We must correct this before summing the total innings pitched. First, we collect the innings pitched in a list named `ips`.

In [6]:
ips = merged_pitch_totals["IP"].to_list()

We then test each number in the list. We multiply the innings by 10 to move the outs digit into the ones place and take the result modulo 10. If the remainder is 0, the innings pitched is a whole number and we do nothing. If the remainder is 1, one extra out was recorded, so we add 0.233 to turn the 0.1 into 0.333. Similarly, if the remainder is 2, we add 0.467 to achieve a decimal of 0.667. Once every value has been corrected, we copy the dataframe so the original stays untouched and set the `IP` column to the values in the list `ips`.

In [7]:
for i in range(len(ips)):
    if ((ips[i]*10) % 10) == 0:
        pass
    elif ((ips[i]*10) % 10) == 1:
        ips[i] = ips[i] + 0.233
    elif ((ips[i]*10) % 10) == 2:
        ips[i] = ips[i] + 0.467
fixed_ip_totals = merged_pitch_totals.copy()
fixed_ip_totals["IP"] = ips
fixed_ip_totals[:4]

Unnamed: 0,Pitching,ERA,W,L,GP,CG,SHO,CBO,SV,IP,H,R,ER,BB,SO,2B,3B,HR,TBF,B_AVG,WP,HBP,BK,SFA,SHA
15,Total:,5.32,12,24,36,11,2,3,3,277.667,318,225,164,104,201,60,2,28,1289,0.285,29,36,0,18,17
17,Total:,6.68,2,26,28,3,0,0,2,207.333,258,224,154,117,177,55,9,20,1067,0.3,28,56,4,16,18
15,Total:,6.08,9,23,32,3,0,0,2,245.667,304,220,166,102,197,52,8,22,1152,0.303,26,25,7,18,4
13,Total:,5.18,23,13,36,8,1,0,7,281.667,311,178,162,111,213,45,5,29,1284,0.277,20,27,4,8,12
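
The same conversion can also be written by splitting the box-score value into whole innings and extra outs. The sketch below is one possible formulation, not the code used in this notebook: the function name `ip_to_innings` is hypothetical, and rounding to three decimals is an assumption chosen to match the `0.333`/`0.667` convention above. It also rejects values such as `5.3`, which are not valid in box-score notation.

```python
def ip_to_innings(ip):
    """Convert box-score innings (e.g. 245.2) to true innings (e.g. 245.667).

    Hypothetical helper: the digit after the decimal point is read as a
    count of outs (0, 1, or 2), not as tenths of an inning.
    """
    tenths = round(ip * 10)               # 245.2 -> 2452
    whole, outs = divmod(tenths, 10)      # 245 full innings, 2 extra outs
    if outs > 2:
        raise ValueError(f"{ip} is not valid innings-pitched notation")
    return round(whole + outs / 3, 3)     # assumption: keep three decimals


# Team totals shown in the table above
for ip in (277.2, 207.1, 245.2, 281.2):
    print(ip, "->", ip_to_innings(ip))
```

Under these assumptions, a whole column could be converted with `merged_pitch_totals["IP"].map(ip_to_innings)`.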


We add a total row at the bottom with `.sum()` and delete the `Pitching` column, as it holds no meaningful information. Finally, we select only the last row, which contains the conference totals, with `.iloc[-1, :]`.

In [8]:
fixed_ip_totals.loc["CL_Total"] = fixed_ip_totals.sum()
del fixed_ip_totals["Pitching"]
CL_pitch_totals = fixed_ip_totals.iloc[-1,:]

Rate statistics such as ERA and opponent batting average cannot simply be summed across teams, so we recompute them from the summed counting statistics. A few of the conference totals are printed.

In [9]:
CL_tot_p = CL_pitch_totals.copy()
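#fix opponent batting average: hits / at-bats, where at-bats = TBF - BB - HBP - SFA - SHA
CL_tot_p["B_AVG"] = round(CL_tot_p["H"] / (CL_tot_p["TBF"] - CL_tot_p["BB"] - CL_tot_p["HBP"] - CL_tot_p["SFA"] - CL_tot_p["SHA"]), 3)
#fix ERA: 9 * ER / IP, written as 27 * ER / outs, where outs = round(IP * 3)
CL_tot_p["ERA"] = round((CL_tot_p["ER"]*27) / round(CL_tot_p["IP"]*3), 3)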

In [10]:
CL_tot_p[:5]

ERA      5.278
W      166.000
L      167.000
GP     336.000
CG      51.000
Name: CL_Total, dtype: float64
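
As a sanity check on the two formulas, we can apply them to a single team's totals and compare against the values already reported in the source data. The sketch below uses the totals row printed earlier for `pitch_totals[2]`; the dictionary `row` and the variable names are illustrative only, and the rounding (two decimals for ERA, three for batting average) is chosen to match that table.

```python
# Totals row printed earlier for pitch_totals[2] (values copied from that output)
row = {"IP": 245.2, "H": 304, "ER": 166, "BB": 102,
       "TBF": 1152, "HBP": 25, "SFA": 18, "SHA": 4}

# Convert box-score innings to outs: 245 full innings and 2 extra outs
whole, extra = divmod(round(row["IP"] * 10), 10)
outs = 3 * whole + extra

# ERA = 9 * ER / IP, written in terms of outs as 27 * ER / outs
era = round(27 * row["ER"] / outs, 2)

# Opponent batting average = hits / at-bats against
at_bats = row["TBF"] - row["BB"] - row["HBP"] - row["SFA"] - row["SHA"]
b_avg = round(row["H"] / at_bats, 3)

print(outs, era, at_bats, b_avg)
```

Both results agree with the `ERA` (6.08) and `B_AVG` (0.303) values in that row, which suggests the formulas match the ones behind the published statistics.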

Next, to rank the Crossroads League pitchers, we calculate Fielding Independent Pitching (FIP), a statistic that measures a pitcher's effectiveness using only home runs, walks, hit batters, and strikeouts, so it does not depend on the defense behind the pitcher or on luck on balls in play. Our first step is to calculate the league constant, which puts FIP on the same scale as the league ERA; its value is shown below.

In [11]:
CL_FIP_c = CL_tot_p["ERA"] - (((13*CL_tot_p["HR"] + 3*(CL_tot_p["BB"] + CL_tot_p["HBP"]) - 2*CL_tot_p["SO"])*3) / round(CL_tot_p["IP"]*3))
CL_FIP_c

4.161279535183989
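
Written out, the calculation above corresponds to the formulas below, where the subscript $\text{CL}$ marks conference totals and $C$ is the constant stored in `CL_FIP_c`. This is a transcription of the notebook's code rather than an official definition; the symbols are chosen here for illustration.

$$
C = \text{ERA}_{\text{CL}} - \frac{13\,HR_{\text{CL}} + 3\,(BB_{\text{CL}} + HBP_{\text{CL}}) - 2\,SO_{\text{CL}}}{IP_{\text{CL}}}
$$

$$
\text{FIP} = \frac{13\,HR + 3\,(BB + HBP) - 2\,SO}{IP} + C
$$

In the code, the numerator is multiplied by 3 and divided by `round(IP*3)`, the number of outs recorded, which is equivalent to dividing by innings pitched.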

Just like we did with the totals dataframe, we must correct for the notation used in the `IP` column for each team's individual statistical table.

In [12]:
for df in dfp:
    ips = df["IP"].to_list()
    for i in range(len(df.index)):
        if ((ips[i]*10) % 10) == 0:
            pass
        elif ((ips[i]*10) % 10) == 1:
            ips[i] = ips[i] + 0.233
        elif ((ips[i]*10) % 10) == 2:
            ips[i] = ips[i] + 0.467
    df["IP"] = ips

Now that we have workable data, we can calculate FIP for each pitcher. We also compute a `WHIP+` column, defined here as walks, hit batters, and hits allowed per inning pitched, and add a column for each player's team to improve readability in our final rankings. After creating the `Team` and `FIP` columns, we move them to the front of the table so they appear right after the pitcher's name.

In [13]:
for i in range(len(teams)): #add column for team
    dfp[i]["Team"] = teams[i]
for df in dfp:
    df["FIP"] = round(((13*df["HR"] + 3*(df["BB"] + df["HBP"]) - 2*df["SO"])*3) / round(df["IP"]*3) + CL_FIP_c, 3)
    df["WHIP+"] = round((df["BB"]+df["HBP"]+df["H"])/df["IP"], 2)
    team = df.pop("Team")
    df.insert(1, team.name, team) #move team column to second
    fip = df.pop("FIP")
    df.insert(2, fip.name, fip) #move FIP column to third

We tidy up the data by removing the totals and the opponents rows and removing the unnecessary decimal place in the `GS` column.

In [14]:
for df in dfp:
    df.drop(df.tail(2).index, inplace=True) #drop the last 2 rows (run this cell only once, otherwise player rows will be lost)
    df['GS'] = df['GS'].astype(int) #remove decimal place on GS column
dfp[2][-2:]

Unnamed: 0,Pitching,Team,FIP,ERA,W,L,GP,GS,CG,SHO,CBO,SV,IP,H,R,ER,BB,SO,2B,3B,HR,TBF,B_AVG,WP,HBP,BK,SFA,SHA,WHIP+
13,"Platt, Zach",Grace,11.828,15.0,0,0,2,0,0,0,0,0,3.0,8,13,5,4,4,1,0,1,25,0.421,1,2,0,0,0,4.67
14,"Bertke, Ryan",Grace,6.889,19.64,0,0,2,1,0,0,0,0,3.667,8,8,8,4,4,0,0,0,24,0.444,2,2,1,0,0,3.82
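
To confirm that the `FIP` and `WHIP+` columns behave as intended, the two rows above can be recomputed by hand. The sketch below assumes the league constant printed earlier and mirrors the formulas from the previous cell; the function names `fip` and `whip_plus` are hypothetical and exist only for this check.

```python
CL_FIP_c = 4.161279535183989  # league constant printed earlier in the notebook


def fip(hr, bb, hbp, so, ip, constant=CL_FIP_c):
    """FIP from counting stats; `ip` is true innings (e.g. 3.667, not 3.2)."""
    outs = round(ip * 3)
    return round(3 * (13 * hr + 3 * (bb + hbp) - 2 * so) / outs + constant, 3)


def whip_plus(h, bb, hbp, ip):
    """Walks, hit batters, and hits allowed per inning pitched."""
    return round((bb + hbp + h) / ip, 2)


# Values copied from the two rows displayed above
platt = dict(hr=1, bb=4, hbp=2, so=4, ip=3.0)
bertke = dict(hr=0, bb=4, hbp=2, so=4, ip=3.667)

print(fip(**platt), whip_plus(h=8, bb=4, hbp=2, ip=3.0))
print(fip(**bertke), whip_plus(h=8, bb=4, hbp=2, ip=3.667))
```

The printed values reproduce the `FIP` and `WHIP+` entries shown in the table for both pitchers.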


We subset the data to include only pitchers with at least 15 conference innings, so that results from very small samples do not distort the rankings.

In [23]:
#make sure everyone has at least 15 IP
for i in range(len(teams)):
    dfp[i] = dfp[i][dfp[i]['IP'] >= 15]

Finally, we combine all of the dataframes with the code `pd.concat`. Then, we sort the rows by FIP beginning with the lowest. To maintain the standard notation of the `IP` column, we run a test to check for pitchers with innings pitched ending in 0.333 or 0.667 and change them back to 0.1 and 0.2 respectively. The top ten Crossroads League pitchers in terms of FIP are shown below.

In [24]:
all_pitchers = pd.concat(dfp)
top_pitchers = all_pitchers.sort_values(by=['FIP']) #sort by FIP in ascending order
ips = top_pitchers["IP"].to_list() #change IP back to standard notation
for i in range(len(top_pitchers.index)):
    if ((ips[i]*3) % 3) == 0:
        pass
    elif (round(ips[i]*3) % 3) == 1:
        ips[i] = ips[i] - 0.233
    elif (round(ips[i]*3) % 3) == 2:
        ips[i] = ips[i] - 0.467
top_pitchers["IP"] = ips
display(top_pitchers[:10])

Unnamed: 0,Pitching,Team,FIP,ERA,W,L,GP,GS,CG,SHO,CBO,SV,IP,H,R,ER,BB,SO,2B,3B,HR,TBF,B_AVG,WP,HBP,BK,SFA,SHA,WHIP+
4,"Ubelhor, Mitch",Taylor,2.777,2.08,1,2,11,1,0,0,2,2,21.2,12,10,5,9,38,3,0,1,90,0.152,5,2,0,0,0,1.06
1,"Noelker, Andrew",Marian,3.113,2.62,2,2,8,6,3,1,0,0,34.1,24,15,10,12,44,9,0,1,141,0.189,2,1,0,0,1,1.08
3,"Hoffman, Hunter",IWU,3.201,2.16,3,1,8,8,1,1,0,0,50.0,33,17,12,17,64,5,1,2,200,0.181,1,1,0,0,0,1.02
6,"Huseman, Noah",Taylor,3.224,2.63,6,3,9,9,1,1,2,0,48.0,42,20,14,16,62,5,1,1,206,0.231,3,6,1,0,2,1.33
9,"Gongwer, Drake",Taylor,3.374,4.87,1,0,9,0,0,0,0,1,20.1,18,14,11,7,31,7,0,1,90,0.231,4,4,2,0,0,1.43
5,"Norman, Braedon",IWU,3.59,2.57,3,1,9,1,0,0,1,1,21.0,18,7,6,4,20,6,0,1,85,0.231,2,1,1,1,1,1.1
2,"Sullivan, Kaden",SFU,3.779,2.45,1,0,16,0,0,0,1,11,18.1,14,8,5,12,28,1,0,1,81,0.209,1,0,0,0,2,1.42
7,"Moran, Joe",Taylor,3.874,3.33,5,1,9,9,0,0,1,0,48.2,45,23,18,12,67,9,0,6,208,0.237,3,2,0,1,3,1.21
6,"Engelkes, Jake",IWU,4.065,3.67,4,0,8,6,2,0,0,0,41.2,45,18,17,8,41,5,1,3,181,0.268,0,5,0,0,0,1.39
2,"Young, Jon",IWU,4.067,1.71,6,0,6,5,1,0,1,0,31.2,26,8,6,5,35,4,0,4,127,0.213,0,0,0,0,0,0.98
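
The conversion back to standard notation can also be expressed in terms of outs. The following is a minimal sketch under the assumption that every value is a whole number of innings or ends in roughly 0.333 or 0.667; the function name `innings_to_ip` is hypothetical and is not part of the notebook.

```python
def innings_to_ip(innings):
    """Convert true innings (e.g. 21.667) back to box-score notation (21.2).

    Hypothetical helper: counts the total outs, then writes the leftover
    outs (0, 1, or 2) as the digit after the decimal point.
    """
    outs = round(innings * 3)          # 21.667 -> 65 outs
    whole, extra = divmod(outs, 3)     # 21 full innings, 2 extra outs
    return round(whole + extra / 10, 1)


# A few true-innings values of the kind produced earlier in the notebook
for innings in (21.667, 34.333, 50.0, 48.667):
    print(innings, "->", innings_to_ip(innings))
```

Applied to a column, this could look like `top_pitchers["IP"].map(innings_to_ip)`. Rounding the result to one decimal keeps the stored value clean rather than relying on display formatting.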
