## Requirements
The product that you produce for this assignment is a notebook that describes your approach, as well as the code that implements the approach. Your notebook needs to include the following sections and information:

1. Introduction - A summary of your general approach
2. Data and statistics - Describe the data and statistics that you used, and how they were used.
3. Results of your approach - This section needs to include the top 5 players that your approach identifies. You also need to select one player from your top 5 and provide an analysis of his rating and monetary value.
4. Limitations - What difficulties do you see with implementing your strategy in the draft?

The draft will occur in class on March 20. Your assignment needs to be submitted before class on that day. 

To make grading easier, please name the notebook *Assignment4_LastName1_LastName2*, where *LastName1* and *LastName2* are the last names of the team members. Yes, you will be working in teams of two for this assignment.

## Introduction

Our approach is to exploit the high point values awarded for hits, doubles, triples, and home runs, listed in the table below. In particular, we will look for players who accumulate these events.

| Batting Stat | Points | 
|--------------|--------|
| H            | 5.6    |
| 2B           | 2.9    |
| 3B           | 5.7    |
| HR           | 9.4    |
| BB           | 3.0    |
<br>


To rank our fielding (non-pitching) players, we take inspiration from wOBA (weighted on-base average). Because we have limited funds to build our team, we are particularly interested in which players give us the most points for the least money. Our modified wOBA formula applies wOBA's multipliers to H, 2B, 3B, and HR and divides the result by salary:

| Event | Weight |
|-------|--------|
| H     | 0.88   |
| 2B    | 1.247  |
| 3B    | 1.578  |
| HR    | 2.031  |


<br>

$$
\frac{(0.88 \times H) + (1.247 \times 2B) + (1.578 \times 3B) + (2.031 \times HR)}{\text{salary}}
$$
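
As a rough illustration of the formula above, the sketch below scores a single stat line. The helper name `batting_value` and the sample numbers are assumptions made for this example; they do not come from our data.

```python
# wOBA-style weights taken from the table above
WEIGHTS = {"H": 0.88, "2B": 1.247, "3B": 1.578, "HR": 2.031}

def batting_value(h, doubles, triples, hr, salary):
    """Weighted hitting production per dollar of salary (hypothetical helper)."""
    weighted = (WEIGHTS["H"] * h + WEIGHTS["2B"] * doubles
                + WEIGHTS["3B"] * triples + WEIGHTS["HR"] * hr)
    return weighted / salary

# Made-up stat line: 150 H, 30 2B, 5 3B, 20 HR at a salary of 10 dollars
score = batting_value(150, 30, 5, 20, 10.0)
print(round(score, 3))
```

Because the score is a ratio, a very small salary inflates it sharply, and a salary of zero makes it undefined.
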
***
To rank our pitchers, we take inspiration from FIP (fielding independent pitching), which applies multipliers to home runs, walks, hit batters, and strikeouts. We use the point values below and calculate how many points per dollar each pitcher brings us:

| Pitching Stat | Points |
|---------------|--------|
| IP            | 5      |
| K             | 2.0    |
| BB            | -3.0   |
| HBP           | -3.0   |
| HR            | -13    |
<br>
$$
\frac{(5.0 \times IP) + (2.0 \times K) - (3.0 \times (BB + HBP)) - (13 \times HR)}{\text{salary}}
$$
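
The pitching formula can be sketched the same way. The helper name `pitching_value` and the stat line are made up for illustration only.

```python
def pitching_value(ip, k, bb, hbp, hr, salary):
    """Fantasy pitching points per dollar of salary (hypothetical helper)."""
    points = 5.0 * ip + 2.0 * k - 3.0 * (bb + hbp) - 13.0 * hr
    return points / salary

# Made-up stat line: 180 IP, 200 K, 40 BB, 5 HBP, 15 HR at a salary of 15 dollars
score = pitching_value(180, 200, 40, 5, 15, 15.0)
print(round(score, 2))
```
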

We will apply this approach to every player who has played in at least 100 games and sort the players by their respective positions.


## Data
### Pitching & Batting for 2018

We used 2018 Statcast hitter and pitcher data, along with FanGraphs salary projections for fantasy drafting. From Statcast we took the statistics needed for our modified wOBA and FIP formulas, and from the FanGraphs data we took each player's position and salary.

In [15]:
from pybaseball import statcast
from pybaseball import batting_stats_range
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

pitching2018 = pd.read_csv('statcast_2018.csv')
batting2018 = pd.read_csv('statcast_batting_2018.csv')
hittersSalaries = pd.read_csv('batters_fan_graph.csv')
pitchersSalaries = pd.read_csv('pitchers_fan_graph.csv')

pd.set_option('display.max_columns', None)
batting2018 = batting2018.loc[batting2018['G']>=100] # at least 100 games played

### Salary Data

In [17]:
# Convert the 'Dollars' string column to floats by removing '$', '(' and ')'.
# Note: a parenthesized value such as '($1.8)' becomes 1.8; the parentheses are dropped, not negated.
hittersSalaries['Dollars'] = pd.to_numeric(
    hittersSalaries['Dollars'].str.replace(r'[$()]', '', regex=True)
)
hittersSalaries.head()

Unnamed: 0,PlayerName,Team,POS,ADP,PA,mAVG,mRBI,mR,mSB,mHR,PTS,aPOS,Dollars
0,Mookie Betts,Red Sox,OF,1.9,666,$8.1,$4.2,$8.2,$6.7,$3.5,$30.6,$7.9,39.5
1,Mike Trout,Angels,OF,1.1,595,$6.2,$4.7,$7.0,$4.0,$6.4,$28.2,$7.9,37.1
2,Jose Ramirez,Indians,3B,3.9,657,$4.2,$5.0,$5.0,$6.0,$2.9,$23.1,$8.0,32.2
3,J.D. Martinez,Red Sox,OF/DH,5.5,592,$6.3,$7.1,$4.0,($1.8),$6.3,$21.8,$7.9,30.7
4,Nolan Arenado,Rockies,3B,7.2,653,$4.6,$7.1,$5.0,($2.1),$6.7,$21.4,$8.0,30.4
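
The cleaning step above removes parentheses without changing the sign. If the parentheses in the FanGraphs export follow the accounting convention for negative amounts (an assumption on our part, not something the file states), a parser along these lines would keep the sign. The helper name `parse_dollars` is hypothetical.

```python
def parse_dollars(text):
    """Parse '$8.1' -> 8.1 and '($1.8)' -> -1.8, assuming parentheses mean negative."""
    cleaned = text.strip()
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    number = float(cleaned.strip("()").replace("$", "").replace(",", ""))
    return -number if negative else number

samples = ["$8.1", "($1.8)", "39.5"]
parsed = [parse_dollars(s) for s in samples]
print(parsed)
```

With pandas, the same function could be applied to a string column through `Series.map`.
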


In [48]:
# Convert the 'Dollars' string column to floats by removing '$', '(' and ')'.
# Note: a parenthesized value such as '($2.0)' becomes 2.0; the parentheses are dropped, not negated.
pitchersSalaries['Dollars'] = pd.to_numeric(
    pitchersSalaries['Dollars'].str.replace(r'[$()]', '', regex=True)
)
pitchersSalaries.head()

Unnamed: 0,PlayerName,Team,POS,ADP,IP,mW,mSV,mERA,mWHIP,mSO,PTS,aPOS,Dollars
0,Chris Sale,Red Sox,SP,13.1,184,$5.9,($2.0),$11.1,$14.1,$7.0,$36.0,$10.3,47.3
1,Max Scherzer,Nationals,SP,4.7,208,$5.8,($2.0),$7.7,$11.6,$8.3,$31.3,$10.3,42.6
2,Jacob deGrom,Mets,SP,10.6,208,$4.9,($2.0),$10.4,$10.1,$7.1,$30.5,$10.3,41.8
3,Justin Verlander,Astros,SP,20.9,202,$5.6,($2.0),$4.4,$9.4,$7.0,$24.4,$10.3,35.7
4,Corey Kluber,Indians,SP,24.4,209,$5.2,($2.0),$3.3,$6.6,$5.1,$18.2,$10.3,29.5


# Applying the Formulas
***

## Fielding

In [18]:
batting2018['fantasy'] = ((batting2018['H']*0.88)+(batting2018['2B']*1.247)+(batting2018['3B']*1.578)+(batting2018['HR']*2.031))/hittersSalaries['Dollars']

result = pd.merge(batting2018, hittersSalaries, how='inner', left_on='Name', right_on='PlayerName')
result = result.dropna()
result = result[['Name', 'POS', 'fantasy', 'Dollars']].copy()
result.sort_values(by=['fantasy'], ascending=False)

Unnamed: 0,Name,POS,fantasy,Dollars
42,Willson Contreras,C,inf,15.4
41,Michael Conforto,OF,864.375000,7.9
43,Carlos Correa,SS,496.342500,11.0
46,Nelson Cruz,DH,283.250000,22.9
44,Zack Cozart,3B,255.147500,1.8
45,Brandon Crawford,SS,234.612500,2.0
40,Shin-Soo Choo,OF/DH,196.160000,8.9
50,Khris Davis,DH,95.010385,16.5
47,Travis d'Arnaud,C,88.378000,3.3
48,Matt Davidson,1B/DH,85.786667,27.1
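
One thing to check in the cell above: the division by `hittersSalaries['Dollars']` happens before the merge, so pandas pairs each batting row with the salary row that shares its index label rather than with the same player. That is consistent with scores such as `inf` for a player listed at 15.4 dollars. A minimal sketch with made-up players, computing the score only after merging on name:

```python
import pandas as pd

# Toy frames (made-up numbers) whose rows are in different orders
batting = pd.DataFrame({"Name": ["A", "B"], "H": [100, 50], "2B": [20, 10],
                        "3B": [2, 1], "HR": [10, 5]})
salaries = pd.DataFrame({"PlayerName": ["B", "A"], "Dollars": [5.0, 20.0]})

# Merge first so each stat line sits next to that player's own salary
merged = pd.merge(batting, salaries, how="inner",
                  left_on="Name", right_on="PlayerName")
weighted = (merged["H"] * 0.88 + merged["2B"] * 1.247
            + merged["3B"] * 1.578 + merged["HR"] * 2.031)
merged["fantasy"] = weighted / merged["Dollars"]

# For contrast: dividing before the merge aligns on index,
# so player A's hits are divided by player B's salary
misaligned = (batting["H"] * 0.88) / salaries["Dollars"]

print(merged[["Name", "fantasy", "Dollars"]])
```

Even with the merge done first, a salary of zero would still produce an infinite score, so those rows may need separate handling.
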


## Catchers

In [23]:
resultC = result.loc[result['POS']=='C'].copy().sort_values(by='fantasy', ascending=False)
resultC

Unnamed: 0,Name,POS,fantasy,Dollars
42,Willson Contreras,C,inf,15.4
47,Travis d'Arnaud,C,88.378,3.3
38,Jason Castro,C,51.426667,0.2
88,Yasmani Grandal,C,9.908057,18.8
12,Tucker Barnhart,C,9.875985,7.1
10,Alex Avila,C,7.687208,3.4
82,Yan Gomes,C,7.624771,0.9
101,Austin Hedges,C,5.589615,1.3
124,Jonathan Lucroy,C,4.468991,0.8
110,Nick Hundley,C,4.257255,9.9


## 1B

In [24]:
result1B = result.loc[result['POS']=='1B'].copy().sort_values(by='fantasy', ascending=False)
result1B

Unnamed: 0,Name,POS,fantasy,Dollars
49,Chris Davis,1B,66.387083,6.2
33,Miguel Cabrera,1B,38.783333,11.0
72,Freddie Freeman,1B,21.598932,23.0
23,Justin Bour,1B,21.410253,11.2
15,Josh Bell,1B,18.947478,9.5
81,Paul Goldschmidt,1B,17.407059,21.8
66,Wilmer Flores,1B,14.99086,6.4
17,Brandon Belt,1B,14.174636,3.0
109,Eric Hosmer,1B,10.235039,10.8
100,Ryon Healy,1B,9.581966,11.5


## 2B

In [25]:
result2B = result.loc[result['POS']=='2B'].copy().sort_values(by='fantasy', ascending=False)
result2B

Unnamed: 0,Name,POS,fantasy,Dollars
36,Robinson Cano,2B,75.465806,9.9
39,Starlin Castro,2B,75.46087,1.3
58,Brian Dozier,2B,47.224727,2.3
80,Scooter Gennett,2B,14.555282,4.7
5,Jose Altuve,2B,12.313435,23.2
67,Logan Forsythe,2B,11.162396,29.3
99,Josh Harrison,2B,8.045153,9.0
103,Cesar Hernandez,2B,8.040544,0.8
120,DJ LeMahieu,2B,7.484816,3.6
123,Jed Lowrie,2B,7.355411,4.9


## 3B

In [26]:
result3B = result.loc[result['POS']=='3B'].copy().sort_values(by='fantasy', ascending=False)
result3B

Unnamed: 0,Name,POS,fantasy,Dollars
44,Zack Cozart,3B,255.1475,1.8
57,Josh Donaldson,3B,34.867273,9.8
69,Maikel Franco,3B,20.469192,8.0
71,Todd Frazier,3B,16.731485,15.0
9,Nolan Arenado,3B,16.364194,30.4
94,Jedd Gyorko,3B,8.271298,13.4
119,Jake Lamb,3B,7.514983,3.9
122,Evan Longoria,3B,7.378392,3.5


## SS

In [27]:
resultSS = result.loc[result['POS']=='SS'].copy().sort_values(by='fantasy', ascending=False)
resultSS

Unnamed: 0,Name,POS,fantasy,Dollars
43,Carlos Correa,SS,496.3425,11.0
45,Brandon Crawford,SS,234.6125,2.0
51,Paul DeJong,SS,52.6375,4.4
21,Xander Bogaerts,SS,23.518409,17.6
75,Freddy Galvis,SS,17.948087,7.7
7,Elvis Andrus,SS,13.360396,2.4
90,Didi Gregorius,SS,11.653404,13.7
6,Tim Anderson,SS,10.055659,7.8
8,Orlando Arcia,SS,9.469149,4.7
121,Francisco Lindor,SS,9.463223,30.3


## OF

In [28]:
resultOF = result.loc[result['POS']=='OF'].copy().sort_values(by='fantasy', ascending=False)
resultOF

Unnamed: 0,Name,POS,fantasy,Dollars
41,Michael Conforto,OF,864.375,7.9
34,Lorenzo Cain,OF,57.954872,14.7
32,Melky Cabrera,OF,54.965,23.0
35,Kole Calhoun,OF,52.093514,2.8
54,Corey Dickerson,OF,50.689796,0.5
60,Adam Duvall,OF,39.723607,18.4
30,Byron Buxton,OF,36.905,0.5
20,Charlie Blackmon,OF,33.16404,25.2
53,Delino DeShields,OF,27.317111,10.7
19,Mookie Betts,OF,24.318286,39.5
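
The position filters above use exact matches such as `result['POS']=='OF'`, so players listed with several positions (for example `OF/DH` or `1B/DH`) do not appear in any single-position table. A possible alternative, sketched on a toy table with made-up scores, treats `POS` as a slash-separated list. The helper name `top_at_position` is hypothetical.

```python
import pandas as pd

# Toy results table (made-up names and scores)
result = pd.DataFrame({
    "Name": ["P1", "P2", "P3", "P4"],
    "POS": ["OF", "OF/DH", "1B/DH", "C"],
    "fantasy": [10.0, 30.0, 20.0, 5.0],
})

def top_at_position(df, pos, n=10):
    """Rows whose slash-separated POS list contains `pos`, best score first."""
    eligible = df["POS"].str.split("/").apply(lambda parts: pos in parts)
    return df.loc[eligible].sort_values(by="fantasy", ascending=False).head(n)

outfielders = top_at_position(result, "OF")
print(outfielders)
```
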


# The Line-Up
- C - Willson Contreras
- 1B - Chris Davis
- 2B - Robinson Cano
- 3B - Zack Cozart
- SS - Carlos Correa
- OF - Michael Conforto, Lorenzo Cain, Melky Cabrera
***

# Pitchers

In [38]:
from pybaseball.lahman import *
import pandas as pd

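# Download the Lahman database and load the pitching table
download_lahman()

pitching = pitching()
# Note: this filter selects the 2016 season, even though the variable name says 2018
pitching2018_L = pitching.loc[(pitching['yearID']==2016)].copy()
pitching2018_L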

Unnamed: 0,playerID,yearID,stint,teamID,lgID,W,L,G,GS,CG,SHO,SV,IPouts,H,ER,HR,BB,SO,BAOpp,ERA,IBB,WP,HBP,BK,BFP,GF,R,SH,SF,GIDP
44139,abadfe01,2016,1,MIN,AL,1,4,39,0,0,0,1,102,27,10,2,14,29,0.220,2.65,2.0,0.0,0.0,1,138.0,8.0,11,0.0,1.0,6.0
44140,abadfe01,2016,2,BOS,AL,0,2,18,0,0,0,0,38,13,9,2,8,12,0.255,6.39,0.0,1.0,1.0,0,60.0,7.0,9,0.0,0.0,1.0
44141,achteaj01,2016,1,LAA,AL,1,0,27,0,0,0,0,113,43,13,7,12,14,0.295,3.11,1.0,0.0,1.0,0,160.0,17.0,13,0.0,1.0,7.0
44142,adamsau01,2016,1,CLE,AL,0,0,19,0,0,0,0,55,27,20,5,7,17,0.333,9.82,1.0,0.0,0.0,0,88.0,11.0,22,0.0,0.0,2.0
44143,adlemti01,2016,1,CIN,NL,4,4,13,13,0,0,0,209,64,31,13,20,47,0.251,4.00,1.0,0.0,5.0,0,287.0,0.0,32,6.0,1.0,8.0
44144,alberan01,2016,1,MIN,AL,0,0,6,2,0,0,0,51,27,11,5,6,16,0.342,5.82,0.0,1.0,0.0,0,85.0,3.0,16,0.0,0.0,3.0
44145,alberma01,2016,1,CHA,AL,2,6,58,1,0,0,0,154,67,36,10,19,30,0.321,6.31,1.0,4.0,3.0,0,237.0,11.0,44,3.0,2.0,4.0
44146,albural01,2016,1,LAA,AL,0,0,2,0,0,0,0,6,2,1,1,2,1,0.200,4.50,0.0,0.0,0.0,0,12.0,1.0,3,0.0,0.0,0.0
44147,alcanra01,2016,1,OAK,AL,1,3,5,5,0,0,0,67,31,18,9,4,14,0.333,7.25,0.0,1.0,4.0,1,103.0,0.0,18,0.0,2.0,2.0
44148,alexasc01,2016,1,KCA,AL,0,0,17,0,0,0,0,57,24,7,1,7,16,0.316,3.32,0.0,0.0,0.0,0,84.0,4.0,7,0.0,1.0,4.0


In [42]:
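# Lahman records innings as outs (IPouts), so divide by 3 to get innings pitched
pitching2018_L['IP'] = pitching2018_L['IPouts'] / 3
# Player table, used later to look up names by playerID
master = master()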
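
The Lahman pitching table has one row per stint, so a pitcher who changed teams mid-season (such as `abadfe01` in the output above) appears more than once. If full-season totals are wanted, the counting stats could be summed per `playerID` first. A minimal sketch using three of the rows shown above:

```python
import pandas as pd

# Two stints for one pitcher and one stint for another, copied from the output above
stints = pd.DataFrame({
    "playerID": ["abadfe01", "abadfe01", "achteaj01"],
    "IPouts": [102, 38, 113],
    "SO": [29, 12, 14],
    "BB": [14, 8, 12],
    "HBP": [0.0, 1.0, 1.0],
    "HR": [2, 2, 7],
})

# Sum the counting stats across stints, then convert outs to innings
cols = ["IPouts", "SO", "BB", "HBP", "HR"]
season = stints.groupby("playerID", as_index=False)[cols].sum()
season["IP"] = season["IPouts"] / 3
print(season)
```
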

In [60]:
pitching2018_L['fantasy'] = ((5.0*pitching2018_L['IP'])+(2.0*pitching2018_L['SO'])-(3.0*(pitching2018_L['BB']+pitching2018_L['HBP'])-(13.0*pitching2018_L['HR'])))


resultNames = pd.merge(pitching2018_L[['playerID', 'fantasy']], 
                      master[['playerID', 'nameFirst', 'nameLast']],
                      on='playerID', how='outer')
resultNames['Name'] = resultNames['nameFirst']+" "+resultNames['nameLast']


resultP = pd.merge(resultNames, pitchersSalaries, how='inner', left_on='Name', right_on='PlayerName')

resultP = resultP.dropna()
resultP = resultP[['Name', 'POS', 'fantasy', 'Dollars']].copy()
resultP.sort_values(by=['fantasy'], ascending=False)

Unnamed: 0,Name,POS,fantasy,Dollars
284,Max Scherzer,SP,1926.666667,42.6
325,Justin Verlander,SP,1841.333333,35.7
246,David Price,SP,1825.000000,11.5
36,Madison Bumgarner,SP,1787.333333,10.0
277,Chris Sale,SP,1764.333333,47.3
244,Rick Porcello,SP,1657.000000,6.6
9,Chris Archer,SP,1652.666667,16.7
165,Corey Kluber,SP,1623.000000,29.5
160,Ian Kennedy,SP,1538.333333,11.8
247,Jose Quintana,SP,1526.000000,9.1
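
Two details of the `fantasy` expression above differ from the formula in the Introduction. The home-run term sits inside the subtracted parentheses, so it evaluates to `-(3.0*(BB+HBP) - 13.0*HR)` and adds 13 points per home run. The total is also not divided by salary. The sketch below contrasts the two groupings on a made-up stat line.

```python
# Made-up stat line for illustration
ip, so, bb, hbp, hr, dollars = 200.0, 220, 50, 5, 20, 20.0

# Grouping used in the cell above: the HR term ends up added
as_coded = (5.0 * ip) + (2.0 * so) - (3.0 * (bb + hbp) - (13.0 * hr))

# Formula from the Introduction: HR subtracted, then divided by salary
points = (5.0 * ip) + (2.0 * so) - 3.0 * (bb + hbp) - 13.0 * hr
per_dollar = points / dollars

print(as_coded, points, per_dollar)
```

The two totals differ by `2 * 13 * HR`, so the gap grows with the number of home runs a pitcher allows.
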


## SP

In [62]:
resultSP = resultP.loc[resultP['POS']=='SP']
resultSP.sort_values(by=['fantasy'], ascending=False)

Unnamed: 0,Name,POS,fantasy,Dollars
284,Max Scherzer,SP,1926.666667,42.6
325,Justin Verlander,SP,1841.333333,35.7
246,David Price,SP,1825.000000,11.5
36,Madison Bumgarner,SP,1787.333333,10.0
277,Chris Sale,SP,1764.333333,47.3
244,Rick Porcello,SP,1657.000000,6.6
9,Chris Archer,SP,1652.666667,16.7
165,Corey Kluber,SP,1623.000000,29.5
160,Ian Kennedy,SP,1538.333333,11.8
247,Jose Quintana,SP,1526.000000,9.1


## RP

In [63]:
resultRP = resultP.loc[resultP['POS']=='RP']
resultRP.sort_values(by=['fantasy'], ascending=False)

Unnamed: 0,Name,POS,fantasy,Dollars
24,Chad Bettis,RP,1294.000000,10.9
276,Danny Salazar,RP,1021.666667,0.1
340,Steven Wright,RP,998.333333,9.7
31,Archie Bradley,RP,989.333333,1.0
206,Adam Morgan,RP,956.666667,6.9
236,Wily Peralta,RP,933.333333,7.9
215,Juan Nicasio,RP,905.000000,5.6
62,Adam Conley,RP,864.666667,7.9
127,Junior Guerra,RP,800.333333,6.0
73,Chris Devenski,RP,732.666667,4.7


Let's take a look at one of the top 10 outfielders our calculation found, Mookie Betts.

## Limitations