## Requirements
The product that you produce for this assignment is a notebook that describes your approach, as well as the code that implements the approach. Your notebook needs to include the following sections and information:

1. Introduction - A summary of your general approach
2. Data and statistics - Describe the data and statistics that you used, and how they were used.
3. Results of your approach - This section needs to include the top 5 players that your approach identifies. You also need to select one player from your top 5 and provide an analysis of his rating and monetary value.
4. Limitations - What difficulties do you see with implementing your strategy in the draft?

The draft will occur in class on March 20. Your assignment needs to be submitted before class on that day. 

To make grading easier, please name the notebook *Assignment4_LastName1_LastName2*, where *LastName1* and *LastName2* are the last names of the team members. Yes, you will be working in teams of two for this assignment.

## Introduction

Our approach will be to exploit the high points given for singles, doubles, triples, and homeruns. In particular, we will look for players who can get us these points. 

| Batting Stat | Points | 
|--------------|--------|
| H            | 5.6    |
| 2B           | 2.9    |
| 3B           | 5.7    |
| HR           | 9.4    |

<br>


In order to rank our fielding players, we will take inspiration from wOBA by using linear weights. Since we have limited funds to build our team, we are particularly interested in which players can give us the most amount of points for least amount of money. We will use the following multipliers (which correspond to the pointing system for the fantasy scoreing) for H, 2B, 3B, and HR and divide by salary. Our fomula will use multipliers and follow the formula below: 

| Event | Multiplier | 
|-------|--------|
| H     | 5.6    |
| 2B    | 2.9    |
| 3B    | 5.7    |
| HR    | 9.4    |


<br>

$$
\frac{(5.6 * H) + (2.9 * 2B) + (5.7 * 3B) + (9.4 * HR) }{salary}
$$
***
In order to rank our pitchers, we will take inspiration from FIP. FIP uses multipliers for different statistics which we will use in our formula. We will exploit the points below & calculate how many points per dollar each pitcher will bring us. See the formula below.

| Pitching Stat | Points |
|---------------|--------|
| IP            | 5      |
| K             | 2.0    |
| BB            | -3.0   |
| HBP           | -3.0   |
| HR            | -13    |
<br>
$$
\frac{(5.0 * IP) + (2.0 * K) + (-3.0 * (BB+HBP)) + (-13 * HR) }{salary}
$$

We will apply this formula for every player that has played in at least 100 games & sort them by their repected positions. 


## Data
### Pitching & Batting for 2018

We used the 2018 statcast hitter & 2016 Lahman pitcher data and Fan Graph salary projections for fantasy drafting data. From statcast we grabbed the statistics needed for our formula (H, 2B, 3B, HR). From Lahman, we grabbed the statistics needed for out formula (IP, K, BB, HBP, HR). From Fan Graphs, we grabbed data for positions of the players and their respective salaries. 

In [1]:
from pybaseball import statcast
from pybaseball import batting_stats_range
import matplotlib.pylab as plt
import pandas as pd
import numpy as np

pitching2018 = pd.read_csv('statcast_2018.csv')
batting2018 = pd.read_csv('statcast_batting_2018.csv')
hittersSalaries = pd.read_csv('batters_fan_graph.csv')
pitchersSalaries = pd.read_csv('pitchers_fan_graph.csv')

pd.set_option('display.max_columns', None)
batting2018 = batting2018.loc[batting2018['G']>=50] # more than 100 games played

### Salary Data

In [2]:
# convert 'salary' string column to floats by removing ($,())
hittersSalaries['Dollars'] = [x.strip('$') for x in hittersSalaries.Dollars]
hittersSalaries['Dollars'] = [x.strip('($') for x in hittersSalaries.Dollars]
hittersSalaries['Dollars'] = [x.strip(')') for x in hittersSalaries.Dollars]
hittersSalaries['Dollars'] = pd.to_numeric(hittersSalaries['Dollars'])
hittersSalaries

Unnamed: 0,PlayerName,Team,POS,ADP,PA,mAVG,mRBI,mR,mSB,mHR,PTS,aPOS,Dollars
0,Mookie Betts,Red Sox,OF,1.9,666,$8.1,$4.2,$8.2,$6.7,$3.5,$30.6,$7.9,39.5
1,Mike Trout,Angels,OF,1.1,595,$6.2,$4.7,$7.0,$4.0,$6.4,$28.2,$7.9,37.1
2,Jose Ramirez,Indians,3B,3.9,657,$4.2,$5.0,$5.0,$6.0,$2.9,$23.1,$8.0,32.2
3,J.D. Martinez,Red Sox,OF/DH,5.5,592,$6.3,$7.1,$4.0,($1.8),$6.3,$21.8,$7.9,30.7
4,Nolan Arenado,Rockies,3B,7.2,653,$4.6,$7.1,$5.0,($2.1),$6.7,$21.4,$8.0,30.4
5,Francisco Lindor,Indians,SS,15.2,652,$4.6,$3.1,$5.6,$4.5,$3.5,$21.4,$8.0,30.3
6,Giancarlo Stanton,Yankees,OF/DH,21.5,602,$0.6,$7.9,$4.5,($2.2),$10.1,$20.8,$7.9,29.7
7,Trea Turner,Nationals,SS,7.3,658,$4.9,($0.8),$4.7,$12.9,($1.7),$19.9,$8.0,28.9
8,Christian Yelich,Brewers,OF,7.1,641,$6.7,$2.9,$4.7,$2.6,$2.5,$19.4,$7.9,28.2
9,Ronald Acuna Jr.,Braves,OF,8.8,652,$3.2,$1.9,$4.5,$6.2,$3.2,$19.0,$7.9,27.8


In [3]:
# convert 'salary' string column to floats by removing ($,())
pitchersSalaries['Dollars'] = [x.strip('$') for x in pitchersSalaries.Dollars]
pitchersSalaries['Dollars'] = [x.strip('($') for x in pitchersSalaries.Dollars]
pitchersSalaries['Dollars'] = [x.strip(')') for x in pitchersSalaries.Dollars]
pitchersSalaries['Dollars'] = pd.to_numeric(pitchersSalaries['Dollars'])
pitchersSalaries.head()

Unnamed: 0,PlayerName,Team,POS,ADP,IP,mW,mSV,mERA,mWHIP,mSO,PTS,aPOS,Dollars
0,Chris Sale,Red Sox,SP,13.1,184,$5.9,($2.0),$11.1,$14.1,$7.0,$36.0,$10.3,47.3
1,Max Scherzer,Nationals,SP,4.7,208,$5.8,($2.0),$7.7,$11.6,$8.3,$31.3,$10.3,42.6
2,Jacob deGrom,Mets,SP,10.6,208,$4.9,($2.0),$10.4,$10.1,$7.1,$30.5,$10.3,41.8
3,Justin Verlander,Astros,SP,20.9,202,$5.6,($2.0),$4.4,$9.4,$7.0,$24.4,$10.3,35.7
4,Corey Kluber,Indians,SP,24.4,209,$5.2,($2.0),$3.3,$6.6,$5.1,$18.2,$10.3,29.5


# Applying the Formulas
***

## Fielding

In [4]:
#batting2018['fantasy'] = ((batting2018['H']*0.88)+(batting2018['2B']*1.247)+(batting2018['3B']*1.578)+(batting2018['HR']*2.031))/hittersSalaries['Dollars']
result = pd.merge(batting2018, hittersSalaries, how='inner', left_on='Name', right_on='PlayerName')
result['fantasy'] = ((result['H']*5.6)+(result['2B']*2.9)+(result['3B']*5.7)+(result['HR']*9.4))/result['Dollars']

print(batting2018.shape)
#result = result.dropna()
result = result[['Name', 'POS', 'fantasy', 'Dollars']].copy()
result.sort_values(by=['fantasy'], ascending=False)

(398, 28)


Unnamed: 0,Name,POS,fantasy,Dollars
290,Luke Voit,1B,inf,0.0
152,Kevin Kiermaier,OF,7896.000000,0.1
171,Nick Markakis,OF,5534.000000,0.2
119,Billy Hamilton,OF,4780.000000,0.2
46,Jason Castro,C,3197.000000,0.2
72,Corey Dickerson,OF,2603.800000,0.5
15,Austin Barnes,C,2414.500000,0.2
234,Hunter Renfroe,OF,2248.500000,0.4
35,Byron Buxton,OF,1760.800000,0.5
252,Domingo Santana,OF,1479.625000,0.8


## Catchers

In [5]:
resultC = result.loc[result['POS']=='C'].copy().sort_values(by='fantasy', ascending=False)
resultC

Unnamed: 0,Name,POS,fantasy,Dollars
46,Jason Castro,C,3197.0,0.2
15,Austin Barnes,C,2414.5,0.2
164,Jonathan Lucroy,C,952.0,0.8
105,Yan Gomes,C,686.111111,0.9
51,Robinson Chirinos,C,642.666667,0.9
288,Christian Vazquez,C,579.090909,1.1
127,Austin Hedges,C,525.615385,1.3
273,Kurt Suzuki,C,408.1875,1.6
167,Martin Maldonado,C,344.952381,2.1
215,Josh Phegley,C,285.125,0.8


## 1B

In [6]:
result1B = result.loc[result['POS']=='1B'].copy().sort_values(by='fantasy', ascending=False)
result1B

Unnamed: 0,Name,POS,fantasy,Dollars
290,Luke Voit,1B,inf,0.0
6,Yonder Alonso,1B,999.0,1.0
300,Ryan Zimmerman,1B,529.8,2.5
21,Brandon Belt,1B,259.933333,3.0
264,Justin Smoak,1B,243.886792,5.3
190,Mitch Moreland,1B,164.819672,6.1
63,Chris Davis,1B,135.870968,6.2
139,Eric Hosmer,1B,130.166667,10.8
19,Josh Bell,1B,119.789474,9.5
86,Wilmer Flores,1B,114.65625,6.4


## 2B

In [7]:
result2B = result.loc[result['POS']=='2B'].copy().sort_values(by='fantasy', ascending=False)
result2B

Unnamed: 0,Name,POS,fantasy,Dollars
129,Cesar Hernandez,2B,1292.75,0.8
47,Starlin Castro,2B,733.153846,1.3
154,Jason Kipnis,2B,622.1,1.0
76,Brian Dozier,2B,593.304348,2.3
254,Jonathan Schoop,2B,430.757576,3.3
159,DJ LeMahieu,2B,343.777778,3.6
163,Jed Lowrie,2B,238.77551,4.9
103,Scooter Gennett,2B,233.255319,4.7
204,Joe Panik,2B,168.344262,6.1
188,Yoan Moncada,2B,153.083333,2.4


## 3B

In [8]:
result3B = result.loc[result['POS']=='3B'].copy().sort_values(by='fantasy', ascending=False)
result3B

Unnamed: 0,Name,POS,fantasy,Dollars
57,Zack Cozart,3B,590.611111,1.8
162,Evan Longoria,3B,342.8,3.5
257,Kyle Seager,3B,313.945946,3.7
250,Miguel Sano,3B,295.40625,3.2
158,Jake Lamb,3B,291.435897,3.9
90,Maikel Franco,3B,131.825,8.0
75,Josh Donaldson,3B,101.867347,9.8
193,Mike Moustakas,3B,92.202899,13.8
272,Eugenio Suarez,3B,80.195652,13.8
284,Justin Turner,3B,78.964539,14.1


## SS

In [9]:
resultSS = result.loc[result['POS']=='SS'].copy().sort_values(by='fantasy', ascending=False)
resultSS

Unnamed: 0,Name,POS,fantasy,Dollars
10,Elvis Andrus,SS,586.666667,2.4
58,Brandon Crawford,SS,484.75,2.0
261,Andrelton Simmons,SS,225.307692,5.2
65,Paul DeJong,SS,223.295455,4.4
11,Orlando Arcia,SS,209.723404,4.7
96,Freddy Galvis,SS,142.74026,7.7
9,Tim Anderson,SS,141.487179,7.8
274,Dansby Swanson,SS,108.070423,7.1
259,Marcus Semien,SS,98.5625,6.4
56,Carlos Correa,SS,95.327273,11.0


## OF

In [10]:
resultOF = result.loc[result['POS']=='OF'].copy().sort_values(by='fantasy', ascending=False)
resultOF

Unnamed: 0,Name,POS,fantasy,Dollars
152,Kevin Kiermaier,OF,7896.000000,0.1
171,Nick Markakis,OF,5534.000000,0.2
119,Billy Hamilton,OF,4780.000000,0.2
72,Corey Dickerson,OF,2603.800000,0.5
234,Hunter Renfroe,OF,2248.500000,0.4
35,Byron Buxton,OF,1760.800000,0.5
252,Domingo Santana,OF,1479.625000,0.8
217,Kevin Pillar,OF,1379.250000,0.8
197,Brandon Nimmo,OF,855.500000,0.4
131,Odubel Herrera,OF,610.722222,1.8


# The Line-Up
- C  - Jason Castro
- 1B - Yonder Alonso
- 2B - Cesar Hernandez
- 3B - Zack Cozart
- SS - Elvis Andrus	
- OF - Kevin Kiermaier, Nick Markakis, Billy Hamilton
***




# Pitchers

In [11]:
from pybaseball.lahman import *
import pandas as pd

# data
download_lahman()
# only for Eric's machine
#pitching = pd.read_csv("C:/Users/Eric/Documents/CU/CU/sabermetrics/baseballdatabank-2017.1/core/Pitching.csv")

pitching = pitching()
pitching2018_L = pitching.loc[(pitching['yearID']==2016)].copy()
pitching2018_L

Unnamed: 0,playerID,yearID,stint,teamID,lgID,W,L,G,GS,CG,SHO,SV,IPouts,H,ER,HR,BB,SO,BAOpp,ERA,IBB,WP,HBP,BK,BFP,GF,R,SH,SF,GIDP
44139,abadfe01,2016,1,MIN,AL,1,4,39,0,0,0,1,102,27,10,2,14,29,0.220,2.65,2.0,0.0,0.0,1,138.0,8.0,11,0.0,1.0,6.0
44140,abadfe01,2016,2,BOS,AL,0,2,18,0,0,0,0,38,13,9,2,8,12,0.255,6.39,0.0,1.0,1.0,0,60.0,7.0,9,0.0,0.0,1.0
44141,achteaj01,2016,1,LAA,AL,1,0,27,0,0,0,0,113,43,13,7,12,14,0.295,3.11,1.0,0.0,1.0,0,160.0,17.0,13,0.0,1.0,7.0
44142,adamsau01,2016,1,CLE,AL,0,0,19,0,0,0,0,55,27,20,5,7,17,0.333,9.82,1.0,0.0,0.0,0,88.0,11.0,22,0.0,0.0,2.0
44143,adlemti01,2016,1,CIN,NL,4,4,13,13,0,0,0,209,64,31,13,20,47,0.251,4.00,1.0,0.0,5.0,0,287.0,0.0,32,6.0,1.0,8.0
44144,alberan01,2016,1,MIN,AL,0,0,6,2,0,0,0,51,27,11,5,6,16,0.342,5.82,0.0,1.0,0.0,0,85.0,3.0,16,0.0,0.0,3.0
44145,alberma01,2016,1,CHA,AL,2,6,58,1,0,0,0,154,67,36,10,19,30,0.321,6.31,1.0,4.0,3.0,0,237.0,11.0,44,3.0,2.0,4.0
44146,albural01,2016,1,LAA,AL,0,0,2,0,0,0,0,6,2,1,1,2,1,0.200,4.50,0.0,0.0,0.0,0,12.0,1.0,3,0.0,0.0,0.0
44147,alcanra01,2016,1,OAK,AL,1,3,5,5,0,0,0,67,31,18,9,4,14,0.333,7.25,0.0,1.0,4.0,1,103.0,0.0,18,0.0,2.0,2.0
44148,alexasc01,2016,1,KCA,AL,0,0,17,0,0,0,0,57,24,7,1,7,16,0.316,3.32,0.0,0.0,0.0,0,84.0,4.0,7,0.0,1.0,4.0


In [12]:
pitching2018_L['IP'] = pitching2018_L['IPouts'] / 3
#master = pd.read_csv("C:/Users/Eric/Documents/CU/CU/sabermetrics/baseballdatabank-2017.1/core/Master.csv")
master = master()

In [13]:
pitching2018_L['fantasy'] = ((5.0*pitching2018_L['IP'])+(2.0*pitching2018_L['SO'])-(3.0*(pitching2018_L['BB']+pitching2018_L['HBP'])-(13.0*pitching2018_L['HR'])))


resultNames = pd.merge(pitching2018_L[['playerID', 'fantasy']], 
                      master[['playerID', 'nameFirst', 'nameLast']],
                      on='playerID', how='outer')
resultNames['Name'] = resultNames['nameFirst']+" "+resultNames['nameLast']


resultP = pd.merge(resultNames, pitchersSalaries, how='inner', left_on='Name', right_on='PlayerName')

resultP = resultP.dropna()
resultP = resultP[['Name', 'POS', 'fantasy', 'Dollars']].copy()
resultP.sort_values(by=['fantasy'], ascending=False)

Unnamed: 0,Name,POS,fantasy,Dollars
284,Max Scherzer,SP,1926.666667,42.6
325,Justin Verlander,SP,1841.333333,35.7
246,David Price,SP,1825.000000,11.5
36,Madison Bumgarner,SP,1787.333333,10.0
277,Chris Sale,SP,1764.333333,47.3
244,Rick Porcello,SP,1657.000000,6.6
9,Chris Archer,SP,1652.666667,16.7
165,Corey Kluber,SP,1623.000000,29.5
160,Ian Kennedy,SP,1538.333333,11.8
247,Jose Quintana,SP,1526.000000,9.1


## SP

In [14]:
resultSP = resultP.loc[resultP['POS']=='SP']
resultSP.sort_values(by=['fantasy'], ascending=False)

Unnamed: 0,Name,POS,fantasy,Dollars
284,Max Scherzer,SP,1926.666667,42.6
325,Justin Verlander,SP,1841.333333,35.7
246,David Price,SP,1825.000000,11.5
36,Madison Bumgarner,SP,1787.333333,10.0
277,Chris Sale,SP,1764.333333,47.3
244,Rick Porcello,SP,1657.000000,6.6
9,Chris Archer,SP,1652.666667,16.7
165,Corey Kluber,SP,1623.000000,29.5
160,Ian Kennedy,SP,1538.333333,11.8
247,Jose Quintana,SP,1526.000000,9.1


## RP

In [15]:
resultRP = resultP.loc[resultP['POS']=='RP']
resultRP.sort_values(by=['fantasy'], ascending=False)

Unnamed: 0,Name,POS,fantasy,Dollars
24,Chad Bettis,RP,1294.000000,10.9
276,Danny Salazar,RP,1021.666667,0.1
340,Steven Wright,RP,998.333333,9.7
31,Archie Bradley,RP,989.333333,1.0
206,Adam Morgan,RP,956.666667,6.9
236,Wily Peralta,RP,933.333333,7.9
215,Juan Nicasio,RP,905.000000,5.6
62,Adam Conley,RP,864.666667,7.9
127,Junior Guerra,RP,800.333333,6.0
73,Chris Devenski,RP,732.666667,4.7


## The Line-Up
- SP: Max Scherzer	
- RP: Chad Bettis

# Following One Player
Looking at one of our outfielders, **Nick Markakis** -  He earned a 'fantasy' score of 5534 and has a very low salary as compared to other cathcers. He had 623 AB with a BA of .297 for the 2018 season. He also hit 185 R, 43 doubles, 2 triples, and 14 HR during this season as well. This is the type of player who will allow us to rack up a lot of points for our fantasy league while still being a reasonably priced player.


## Limitations

Our main limitation to our approach will be our narrowed view on only assessing our position players based on salary, H, Doubles, Triples and HR. The same goes for our pitchers where we are only assessing on salary, IP, K, BB, HBP, and HR. There are other ways to earn points in Fantasy but we have decided these areas will give us the biggest bang for our buck. However, this may come to hurt us in the long run as we will most likely be under our budget. 