<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Linear-Weights-and-wOBA" data-toc-modified-id="Linear-Weights-and-wOBA-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Linear Weights and wOBA</a></span><ul class="toc-item"><li><span><a href="#Linear-Weights-(LWTS)" data-toc-modified-id="Linear-Weights-(LWTS)-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Linear Weights (LWTS)</a></span></li><li><span><a href="#wOBA" data-toc-modified-id="wOBA-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>wOBA</a></span></li><li><span><a href="#Barry-Bonds-Example" data-toc-modified-id="Barry-Bonds-Example-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Barry Bonds Example</a></span><ul class="toc-item"><li><span><a href="#Bonds-Batting-Runs/wRAA" data-toc-modified-id="Bonds-Batting-Runs/wRAA-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Bonds Batting Runs/wRAA</a></span></li><li><span><a href="#Bonds-wOBA" data-toc-modified-id="Bonds-wOBA-1.3.2"><span class="toc-item-num">1.3.2&nbsp;&nbsp;</span>Bonds wOBA</a></span></li></ul></li></ul></li></ul></div>

# Demo - Linear Weights

In this demo, we use the RE24 values computed in the demo on RE24 to empirically derive weights, or expected values, for hitting events. These are known as _Linear Weights_ or _LWTS_.  We then show how LWTS forms the basis for the advanced statistic _wOBA_ (Weighted On-Base Average) that we saw in the offensive metrics demo.

In [None]:
%run ../../utils/notebook_setup.py

In [None]:
from datascience import Table
from datascience.util import table_apply

import numpy as np

# custom functions that will help do some simple tasks
from datascience_utils import *
from datascience_stats import *
from datascience_topic import fast_run_expectancy, most_common_lineup_position

##### Load RE24 Data

In [None]:
re24 = Table.read_table('re24_vals_2001.csv', sep=',')
re24.show(10)

## Linear Weights and wOBA

In the demo on RE24, we directly attributed RE24 to a player.  Instead, we can try to even out their performance by ignoring the context of each event and giving every event of the same type the same weight.  This lets us avoid the pitfall we saw previously, where it seemed nearly impossible to tell whether environment or ability was driving production.  By doing this, we remove the environment in which the hitter was hitting and get a measure of performance relative to the overall average.

We'll use the RE24 values to construct the Linear Weights values as well as the offensive metric wOBA (Weighted On Base Average).

For each `Event_Type`, we can compute the average RE24 value over all occurrences of that event.

A single can have different RE24 outcomes in different out/baserunner situations, so the average weights these outcome values according to the frequency with which the situations occur.  This way, we do not undervalue a single that occurs with no runners on and doesn't score a run, or overvalue a single that occurs with the bases loaded.  We want them to be valued equally, so we average to smooth everything out.
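
As a minimal sketch of this averaging step, the plain-Python snippet below groups a handful of made-up `(event, RE24)` observations and takes the mean per event. The `plays` data and the helper `average_by_event` are hypothetical and only illustrate the idea; the notebook itself does this with `Table.group`.

```python
from collections import defaultdict

# Hypothetical (event type, RE24) observations; the values are made up for
# illustration and are not taken from re24_vals_2001.csv.
plays = [
    ("Single", 0.25),      # e.g. bases empty
    ("Single", 0.25),      # e.g. bases empty again (the more common situation)
    ("Single", 1.00),      # e.g. runners in scoring position
    ("Strikeout", -0.20),
    ("Strikeout", -0.40),
]


def average_by_event(observations):
    """Return {event_type: mean RE24}, a plain-Python analogue of group + mean."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, value in observations:
        totals[event] += value
        counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}


toy_lwts = average_by_event(plays)
print(toy_lwts)
```

Because the mean is taken over every occurrence, situations that arise more often (here, the two bases-empty singles) automatically carry more weight in the average.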

### Linear Weights (LWTS)

Group and average to get the net expected runs above average for each event.  Notice how fielder's choice, generic outs, and strikeouts all have negative values.  None of these three events is likely to improve baserunner position, and all three increase the number of outs.

The value of a home run is more than 1 run because, in addition to the one run guaranteed to score (the batter), there is the possibility of runners on base.

Note also that a strikeout is basically the same as a generic out.  This seems counter-intuitive since the strikeout means the ball isn't put in play.  What about the breakdown of event types suggests that this isn't as strange as it might seem?  For example, for a batter with 2 strikes, what could potentially happen if the ball is put into play versus a strikeout?

In [None]:
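def to_weights_table(wts):
    """Pivot a two-column (event, value) table into a one-row table with one column per event."""
    data = []
    for row in wts.rows:
        # Build an alternating [label, value, label, value, ...] list
        data.extend([row[0], row[1]])
    return Table().with_columns(data)

# Average RE24 over all occurrences of each event type
lwts = re24.group('Event_Type', np.mean)
lwts.relabel('RE24 mean', 'RE24')
lwts = to_weights_table(lwts)
lwts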

_Question_
+ Why is an intentional walk worth less than a regular walk?  Think about who is receiving an intentional walk.

### wOBA

wOBA weights are obtained by first subtracting out the value of a generic out and then scaling the weights by what is known as the "wOBA Scale".  The scaling is for interpretive purposes to make wOBA appear in a similar range to OBP.

The wOBA scale from FanGraphs for 2001 is 1.182.  Using that value and the linear weight of a generic out, we can compute the wOBA weights.
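
The shift-and-scale step can be sketched on its own as follows. The linear weights in `toy_lwts` are hypothetical numbers chosen for illustration (they are not the values derived from the 2001 data), and `to_woba_weight` is an illustrative helper name.

```python
WOBA_SCALE = 1.182  # FanGraphs wOBA scale for 2001, as quoted in the text

# Hypothetical linear weights (runs above average per event), illustration only.
toy_lwts = {"Generic out": -0.30, "BB": 0.30, "1B": 0.45, "HR": 1.40}


def to_woba_weight(lwts_value, out_value, scale=WOBA_SCALE):
    """Shift so that an out is worth 0, then rescale toward the OBP range."""
    return scale * (lwts_value - out_value)


out_value = toy_lwts["Generic out"]
toy_woba_weights = {
    event: to_woba_weight(value, out_value) for event, value in toy_lwts.items()
}
print(toy_woba_weights)
```

After the shift, a generic out is worth exactly 0, which is why outs do not need to appear in the wOBA numerator; the ordering of the remaining events is unchanged.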

In [None]:
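woba_scale = 1.182  # FanGraphs wOBA scale for 2001
woba_cols = ['BB', 'HBP', '1B', '2B', '3B', 'HR']
out_val = lwts['Generic out'].item()  # linear weight of a generic out

def woba_wt(x):
    # Shift so an out is worth 0, then rescale to an OBP-like range
    return woba_scale * (x - out_val)

wOBA_weights = table_apply(lwts.select(*woba_cols), woba_wt)
wOBA_weights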

### Barry Bonds Example

Let's return to Barry Bonds' monumental 2001 season and use his batting line to estimate his Batting Runs/wRAA and wOBA values.

In [None]:
barry_bonds = Table().with_columns({
    '1B': 49,
    '2B': 32,
    '3B': 2,
    'HR': 73,
    'BB': 177,
    'IBB': 35,
    'HBP': 9,
    'Generic out': 320,  # generic outs and strikeouts combined, since their weights are nearly identical
    'PA': 664
})
barry_bonds

#### Bonds Batting Runs/wRAA
Using LWTS, we compute Bonds' Batting Runs value by multiplying each linear weight by his count for that event and summing.  Batting Runs is sometimes also called wRAA (Weighted Runs Above Average).  Bonds produced an obscene 121 runs above an average player.

Average players are nothing to sneeze at.  We are not talking about some Triple-A call-up.  We are talking about a perfectly fine and capable MLB player.  And Bonds produced 121 more runs than that player.

We previously estimated that 10 runs are worth 1 win in expectation.  This suggests Bonds was worth around 12 wins above the average player.  Baseball Reference, using its own methodology, estimates Bonds was worth about 9.9 wins above the average player.  Either way, as we should suspect, Bonds' offensive contributions were otherworldly.
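
Written out, the calculation described here looks roughly like the following, where $n_e$ is the number of times Bonds recorded event $e$ and $\text{LWTS}_e$ is the linear weight of that event (the symbols are introduced here for illustration):

$$
\text{Batting Runs} = \sum_{e} \text{LWTS}_e \cdot n_e \approx 121,
\qquad
\text{Wins above average} \approx \frac{\text{Batting Runs}}{10} = \frac{121}{10} \approx 12.1
$$

The divisor of 10 is the runs-per-win estimate from the earlier demo, so the win figure is only as good as that approximation.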

In [None]:
bonds_wraa_vals = Table().with_columns({
    col: lwts[col] * barry_bonds[col]
    for col in set(lwts.labels).intersection(barry_bonds.labels)
})
bonds_wraa = np.sum([bonds_wraa_vals[col] for col in bonds_wraa_vals.labels])
print(f"Barry Bonds 2001 Batting Runs/wRAA: {bonds_wraa:.3f}")
bonds_wraa_vals

_Questions_

+ For a hitter, what is the difference between Total RE24 and Batting Runs/wRAA?  Which one uses the raw, observed values of RE24 and which one replaces those observed, situation-dependent values with average values?  
+ What does it mean for a hitter to have a higher RE24 value than BR/wRAA?  What about lower, as in Bonds' case?


#### Bonds wOBA
Bonds' wOBA value is estimated to be 0.545.  Since the wOBA scale places wOBA on a similar scale to OBP, we can interpret this value by comparing it with historical OBP figures.

Ignoring the yearly variation in OBP, the highest single-season OBP was Bonds in 2004 at 0.609.  Number 5 all time was Babe Ruth at 0.545.  In terms of wOBA, Bonds in 2001 is situated relative to the rest of the league as a hitter with a 0.545 OBP would be.  And a 0.545 OBP would rank among the top five single seasons of all time.
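
As a sketch of the calculation used in this demo, with $s$ the wOBA scale, $\text{LWTS}_e$ the linear weight of event $e$, and $n_e$ the hitter's count of that event (notation introduced here for illustration):

$$
w_e = s \cdot \left(\text{LWTS}_e - \text{LWTS}_{\text{out}}\right),
\qquad
\text{wOBA} = \frac{1}{\text{PA}} \sum_{e \in \{\text{BB},\, \text{HBP},\, \text{1B},\, \text{2B},\, \text{3B},\, \text{HR}\}} w_e \cdot n_e
$$

This mirrors the simplified version computed in this notebook; published wOBA figures may use slightly different inputs, so small differences from other sources are to be expected.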

In [None]:
labels = set(wOBA_weights.labels).intersection(barry_bonds.labels)
bonds_woba_vals = Table().with_columns({
    label: wOBA_weights[label] * barry_bonds[label]
    for label in labels
})
bonds_woba = np.sum(
    [bonds_woba_vals[label] for label in bonds_woba_vals.labels]) / barry_bonds['PA'][0]
print(f"Barry Bonds 2001 wOBA: {bonds_woba:.3f}")
bonds_woba_vals