# POTD (Pick of the Day)
#### There is no Pick of the Day today, so we will use the break to recap how the models have performed so far.

That's right. No pick today... Boo! But yesterday's POTD got a hit, so hopefully your streak is still
active.
Today we want to highlight the success of our models and provide some insight into their results so far.

### Random Forest
Our Random Forest is currently 13/19 when looking at its top pick each day. A 68% success rate
isn't encouraging: for reference, selecting a random player each day gives you a ~65% chance
of getting a hit. However, until recently, our models were trained on only two seasons' worth of
at-bats (~325,000 samples). We acquired every at-bat from 2018 (~151,000 samples), prepared it,
retrained both of our models with it, and re-tuned their parameters. Since then, the random forest
is 7/8 (87.5%). We are a lot more confident in it after the improvement, even though the model itself
is not. Prior to adding more data, it was assigning players fairly high probabilities (mean: 37%).
The new data made it more accurate but less confident (mean: 28.6%). That trade-off is interesting,
and we like that it's being more conservative. Our random forest's highest streak so far is its
current one: 7.
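
The drop in average confidence can be checked directly against the probabilities listed in the Random Forest table further down this post. The snippet below is a minimal sketch: the values are copied from that table, and the split point (`RETRAIN_INDEX`, treating 4/8 as the first day of the retrained model) is our assumption, inferred from where the probabilities step down rather than from a recorded retraining date.

```python
from statistics import mean

# Top-pick probabilities copied from the Random Forest table in this post.
rf_probabilities = [
    # 3/28 through 4/7
    0.34, 0.344, 0.38, 0.36, 0.351, 0.377, 0.404, 0.326, 0.417, 0.399, 0.378,
    # 4/8 through 4/16
    0.291, 0.287, 0.281, 0.284, 0.295, 0.285, 0.28, 0.292, 0.283,
]

# ASSUMPTION: the retrained model's first pick is the 4/8 row (index 11).
# This is inferred from the visible drop in the values, not from a logged date.
RETRAIN_INDEX = 11

mean_before = mean(rf_probabilities[:RETRAIN_INDEX])
mean_after = mean(rf_probabilities[RETRAIN_INDEX:])

print(f"Mean probability before retraining: {mean_before:.1%}")
print(f"Mean probability after retraining:  {mean_after:.1%}")
```

Under that assumed split, the two averages line up with the rounded 37% and 28.6% figures quoted above.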

### Logistic Regression
Our Logistic Regression model has been more impressive thus far. It's currently 16/20 (80%), which
makes it a better bet than selecting one of the top four most popular picks each day (~70%). This
model has certainly been safer than our Random Forest so far, although it should be reiterated that
the forest has been on a tear since we fed it more at-bats and re-tuned it. The logistic regression
model's highest streak was 8.
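
For anyone who wants to reproduce the record and streak numbers, here is a small sketch that derives them from a list of daily results (1 = hit, 0 = no hit). The `summarize` helper is a hypothetical name introduced for this illustration; the results are copied from the Logistic Regression table later in this post.

```python
def summarize(results):
    """Return (hits, picks, longest_streak, current_streak) for a list of 0/1 results."""
    longest = current = 0
    for hit in results:
        # A hit extends the running streak; a miss resets it to zero.
        current = current + 1 if hit else 0
        longest = max(longest, current)
    return sum(results), len(results), longest, current


# Daily results for the logistic regression's top pick, 3/28 through 4/16.
lr_results = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1]

hits, picks, longest, current = summarize(lr_results)
print(f"{hits}/{picks} ({hits / picks:.0%}), longest streak {longest}, current streak {current}")
```

The same helper works on any of the result columns in the tables below, as long as each entry is a single 0 or 1.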

### Our Strategy
Here is what matters most: how has our strategy done so far? Have all those boring "no pick days"
been worth it? Our strategy is currently 11/13 (84.6%), which is pretty darn good. So far, it suggests
that combining our two models through our pick strategy is better than relying on either one
alone. Using our strategy, our highest (and first) streak got to 10. We are currently back at 5,
but there is a lot of baseball left to be played!

### Today's Output
Here is the output of our program for today:

- Random Forest: Cesar Hernandez (0.282), Ender Inciarte (0.282)
- Logistic Regression: Zack Cozart (0.365), Dustin Pedroia (0.311)

If you really can't wait until tomorrow (when there may or may not be a pick), logistic regression is
very confident in Zack Cozart today (a 36.5% chance of a hit per at-bat!). However, since the random
forest disagrees, so do we.
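
Because the model outputs are per-at-bat probabilities, it can help to translate them into a rough chance of at least one hit in a game. The sketch below is only back-of-the-envelope arithmetic, not part of our models: it assumes every at-bat is independent with the same probability, and the at-bat counts (3, 4, 5) are illustrative guesses. `prob_at_least_one_hit` is a hypothetical helper name.

```python
def prob_at_least_one_hit(p_per_at_bat, at_bats):
    """Chance of one or more hits, ASSUMING independent at-bats with equal probability."""
    # P(at least one hit) = 1 - P(no hit in any at-bat)
    return 1 - (1 - p_per_at_bat) ** at_bats


# Today's top logistic regression probability (Zack Cozart, per at-bat).
p = 0.365

# ASSUMPTION: a hitter typically sees somewhere between 3 and 5 at-bats per game.
for at_bats in (3, 4, 5):
    chance = prob_at_least_one_hit(p, at_bats)
    print(f"{at_bats} at-bats: {chance:.1%} chance of at least one hit")
```

Under these simplifying assumptions, a 36.5% per-at-bat probability works out to roughly an 84% chance of a hit over four at-bats. Real at-bats are not independent (same pitcher, same conditions), so treat the number as a rough guide only.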


In [4]:
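import pandas as pd
# In plotly 4+, the online plotting module lives in the separate chart-studio package.
# On plotly 3.x this line was `import plotly.plotly as py`.
import chart_studio.plotly as py
import plotly.graph_objs as go

# Random Forest: each row is the date, the model's top pick, its predicted per-at-bat
# hit probability, the result (1 = hit, 0 = no hit), and the running streak.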
df = pd.DataFrame([
['3/28/2019','Odubel Herrera',0.34,1,1],
['3/29/2019','Albert Pujols',0.344,0,0],
['3/30/2019','Nelson Cruz',0.38,0,0],
['3/31/2019','Eddie Rosario',0.36,0,0],
['4/1/2019','Rougned Odor',0.351,1,1],
['4/2/2019','Shin-Soo Choo',0.377,1,2],
['4/3/2019','Jose Abreu',0.404,1,3],
['4/4/2019','Anthony Rendon',0.326,1,4],
['4/5/2019','Enrique Hernandez',0.417,0,0],
['4/6/2019','Charlie Blackmon',0.399,0,0],
['4/7/2019','Justin Turner',0.378,1,1],
['4/8/2019','Ozzie Albies',0.291,0,0],
['4/9/2019','Avisail Garcia',0.287,1,1],
['4/10/2019','Carlos Correa',0.281,1,2],
['4/11/2019','David Freese',0.284,1,3],
['4/12/2019','Jonathan Lucroy',0.295,1,4],
['4/13/2019','Miguel Cabrera',0.285,1,5],
['4/14/2019','Paul Goldschmidt',0.28,1,6],
['4/15/2019','Justin Smoak',0.292,1,7],
['4/16/2019','Marcus Semien',0.283,0,0]], 
columns=['Date', 'Random Forest', 'Probability', 'Result', 'Streak'])

df['Date'] = pd.to_datetime(df['Date'])

In [18]:
trace1 = go.Scatter(
    x = df.Date,
    y = df.Streak,
    mode = 'lines+markers',
    name = 'Streak',
    text = df['Random Forest']
)

trace2 = go.Scatter(
    x = df.Date,
    y = df.Probability,
    mode = 'lines+markers',
    name = 'Probability',
    text = df['Random Forest'],
    yaxis='y2'
)

layout = go.Layout(
    showlegend=False,
    title='Pick of the Day Random Forest Performance (4/17)',
    yaxis=dict(
        title='Streak'
    ),
    yaxis2=dict(
        title='Probability',
        overlaying='y',
        side='right',
        showgrid=False
    )
)

data = [trace1, trace2]

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='potd-rf-performance-04-17-19')

In [20]:
# Logistic Regression

df = pd.DataFrame([
['3/28/2019','Odubel Herrera',0.338,1,1],
['3/29/2019','Eric Hosmer',0.317,0,0],
['3/30/2019','Hanley Ramirez',0.315,1,1],
['3/31/2019','Anthony Rizzo',0.32,1,2],
['4/1/2019','Joey Votto',0.333,1,3],
['4/2/2019','Justin Turner',0.43,0,0],
['4/3/2019','Andrew Benintendi',0.311,1,1],
['4/4/2019','Anthony Rendon',0.302,1,2],
['4/5/2019','Adam Jones',0.388,1,3],
['4/6/2019','Ryan Braun',0.365,1,4],
['4/7/2019','Edwin Encarnacion',0.349,1,5],
['4/8/2019','Yasmani Grandal',0.31,1,6],
['4/9/2019','Miguel Cabrera',0.328,1,7],
['4/10/2019','Miguel Cabrera',0.326,1,8],
['4/11/2019','Nolan Arenado',0.313,0,0],
['4/12/2019','Albert Pujols',0.377,1,1],
['4/13/2019','Joey Votto',0.337,1,2],
['4/14/2019','Kevin Kiermaier',0.318,1,3],
['4/15/2019','Whit Merrifield',0.317,0,0],
['4/16/2019','Whit Merrifield',0.304,1,1]], 
columns=['Date', 'Logistic Regression', 'Probability', 'Result', 'Streak'])

df['Date'] = pd.to_datetime(df['Date'])

trace1 = go.Scatter(
    x = df.Date,
    y = df.Streak,
    mode = 'lines+markers',
    name = 'Streak',
    text = df['Logistic Regression']
)

trace2 = go.Scatter(
    x = df.Date,
    y = df.Probability,
    mode = 'lines+markers',
    name = 'Probability',
    text = df['Logistic Regression'],
    yaxis='y2'
)

layout = go.Layout(
    showlegend=False,
    title='Pick of the Day Logistic Regression Performance (4/17)',
    yaxis=dict(
        title='Streak'
    ),
    yaxis2=dict(
        title='Probability',
        overlaying='y',
        side='right',
        showgrid=False
    )
)

data = [trace1, trace2]

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='potd-lr-performance-04-17-19')

In [26]:
# Overall Strategy

df = pd.DataFrame([
['3/28/2019','Odubel Herrera',1,1],
['3/31/2019','Joey Votto, Trea Turner',1,3],
['4/1/2019','Jose Peraza, Joey Votto',1,5],
['4/3/2019','Andrew Benintendi, Carlos Santana',1,7],
['4/4/2019','Anthony Rendon, Starling Marte',1,9],
['4/5/2019','Enrique Hernandez, Justin Turner',0,0],
['4/6/2019','Christian Yelich, Joey Votto',0,0],
['4/7/2019','Edwin Encarnacion, Justin Turner',1,2],
['4/10/2019','Miguel Cabrera',1,3],
['4/14/2019','Jorge Polanco',1,4],
['4/16/2019','Whit Merrifield',1,5]], 
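columns=['Date','Strategy','Result','Streak'])

# Parse the dates so the x-axis is a true time axis, as in the two model cells above.
df['Date'] = pd.to_datetime(df['Date'])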

trace = go.Scatter(
    x = df.Date,
    y = df.Streak,
    mode = 'lines+markers',
    name = 'Streak',
    text = df.Strategy
)

layout = go.Layout(
    showlegend=False,
    title='Pick of the Day Strategy Performance (4/17)',
    yaxis=dict(
        title='Streak'
    )
)


data = [trace]

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='potd-strategy-performance-04-17-19')