# 6C: Think Like a Bookie

In [None]:
# This code will load the R packages we will use
suppressPackageStartupMessages({
    library(coursekata)
})

#load the data 
nbagames2024 <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQLEAzUYdQUGYzniQ7XzPV1uzors-yUTGiZS5hP6cKfXNBdhCDa6JyrzGfbSln2PhUsichJXzlDhErj/pub?gid=617264864&single=true&output=csv")

# Filter for Los Angeles Lakers games. To use another team, replace the team abbreviation
# (the questions would then need to be modified to make sense with that team's stats).
Lakers <- filter(nbagames2024, team_abbreviation == "LAL")

We’re going to look at a dataframe called `Lakers`. These are the game log data for the Los Angeles Lakers basketball team in the 2023-2024 season. 

- `season` year of the season
- `game_date` date game was played
- `team_id` ID for team 
- `team_location` team's home location 
- `team_name` team name
- `team_abbreviation` team abbreviation
- `team_home_away` whether the game was played on the team's home court or away
- `team_score` score
- `team_winner` whether the team won or lost
- `assists` team assists
- `blocks` team blocks
- `defensive_rebounds` team defensive rebounds 
- `fast_break_points` points scored from a fast break, which is a quick offensive play to score before the other team's defense is ready
- `field_goal_pct` the percentage of field goals (2- and 3-point shots) made
- `field_goals_made` the total number of field goals (2 and 3 point shots) made
- `field_goals_attempted` the total number of field goals (2 and 3 point shots) attempted
- `flagrant_fouls` a personal foul that involves excessive or unnecessary contact with another player
- `fouls` total fouls 
- `free_throw_pct` percentage of free throws made
- `free_throws_made` number of free throws made
- `free_throws_attempted` number of free throws attempted
- `largest_lead` the largest lead the team had during any part of the game
- `offensive_rebounds` number of offensive rebounds
- `points_in_paint` points scored from within the key, or free-throw lane, on the basketball court
- `steals` number of steals (taking the ball away from the other team)
- `team_turnovers` number of turnovers counted against the team (as opposed to a particular player)
- `technical_fouls` unsportsmanlike conduct fouls 
- `three_point_field_goal_pct` percentage of 3-point shots made
- `three_point_field_goals_made` number of 3-point shots made
- `three_point_field_goals_attempted` number of 3-point shots attempted
- `total_rebounds` total rebounds
- `turnover_points` points scored on turnovers
- `turnovers` turnovers committed by players
- `opponent_team_id` id for opponent
- `opponent_team_location` home court location for opponent
- `opponent_team_name` opponent team name
- `opponent_team_abbreviation` opponent team abbreviation
- `opponent_team_score` opponent team score


Take a look at the `Lakers` data frame below. 

## 1.0 - The Data

1.1 - Could you look up LeBron James's number of field goals made using this data set? If you can, go ahead and look that up in this data frame. If you can't, explain why.

1.2 - If you had to predict whether the Lakers were going to win their next game based on this data set, would it be helpful to use the empty model of `team_score` (points scored)? Why or why not?

1.3 - If you were to predict the outcome of the next game that the Lakers play based on this data, what would you predict? 

## 2.0 - Exploring Variation in Point Spread

2.1 - Now let’s say you (and unfortunately everyone else) think that the Lakers will win the next game. That's not good from a bookie's point of view. Why?

2.2 - So instead of just betting win-lose, let's use a rudimentary point spread: how many more points will the Lakers score if they win? In a good point spread (good from the bookie’s point of view), bets on points should fall roughly equally above and below the point spread. Does that idea sound more like the mean or the median?

2.3 - What would be the reason for using the mean instead of the median for the point spread?

2.4 - As baby bookies (we’re developing!), let's take a look at how many *more* points the Lakers score *when they win.* Explore the distribution of this with a visualization. What does it look like? Notice any weird things? Why does it look that way? 

Hint: You may wish to save just the winning games in a data frame called `WinningGames` and create a new variable called `MorePoints`.
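
One possible way to follow the hint, sketched in base R on a small made-up data frame. The name `toy_games` and its scores are hypothetical, not real Lakers results. The sketch assumes a win can be identified by comparing `team_score` with `opponent_team_score`; with the real data you would start from `Lakers` instead.

```r
# Toy stand-in for the Lakers data frame (made-up scores, for illustration only)
toy_games <- data.frame(
  team_score          = c(110, 98, 121, 105, 130),
  opponent_team_score = c(102, 104, 119, 111, 108)
)

# Keep only the games the team won (its score beat the opponent's)
WinningGames <- subset(toy_games, team_score > opponent_team_score)

# Margin of victory: how many more points the team scored than its opponent
WinningGames$MorePoints <- WinningGames$team_score - WinningGames$opponent_team_score

WinningGames$MorePoints
```

Because only wins are kept, every value of `MorePoints` is at least 1, which is worth keeping in mind when you look at the shape of the distribution.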

## 3.0 - Creating a Model based on the Mean

3.1 - A word equation for a model based on the mean might be **MorePoints = Mean + Other Stuff**. Modify the GLM equation below to put in the mean for $b_0$:

$$Y_i = b_0 + e_i$$

3.2 - Add the mean, as a model of `MorePoints`, to a histogram of `MorePoints` (color the mean blue). Gesture: Where would the residuals from the model be in the visualization you made?

## 4.0 - Sum of Squares 

4.1 - This is the equation for the sum of squares (from the mean). 

$$\sum (Y_i - \bar{Y})^2$$

Using your hands/fingers, gesture on your visualization above each of the following components: $\bar{Y}$, $Y_i$ (pick one), $Y_i - \bar{Y}$ (pick one)--then make a square out of that distance. 
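
To connect the gestures to numbers, here is a minimal sketch of the same calculation in base R. The vector `y` holds five made-up values (hypothetical, not taken from the Lakers data), chosen so the arithmetic is easy to follow by hand.

```r
# Made-up values standing in for Y_i (for illustration only)
y <- c(8, 2, 22, 12, 6)

# Y-bar: the mean, which is the prediction of the empty model
y_bar <- mean(y)

# Y_i - Y-bar: each value's distance from the mean (the residuals)
residuals <- y - y_bar

# Square each distance, then add the squares up
squared_residuals <- residuals^2
SS <- sum(squared_residuals)

y_bar              # 10
residuals          # -2 -8 12 2 -4
squared_residuals  # 4 64 144 4 16
SS                 # 232
```

Each squared residual is the area of the square you made with your hands, and the sum of squares is the total area of all of those squares.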

## 5.0 - Standard Deviation: A Measure of Error from the Mean

5.1 - Looking at the visualization, use your hands/fingers to gesture the length of the longest residual. Also with your hands/fingers, gesture the shortest (absolute) residual. Then gesture an "average" (or medium-sized) residual.

5.2 - Based on your gesture and the visualization alone, how big would the average residual be? 

5.3 - Now run `favstats()`. Would any of those statistics correspond roughly to the idea of an "average residual"? Does the number seem reasonable given your estimate in the previous question?
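
As a rough check on the "average residual" idea, the sketch below builds the standard deviation by hand in base R and compares it with the built-in `sd()`, which computes the same quantity that `favstats()` reports as `sd`. The values in `y` are made up for illustration and are not from the Lakers data.

```r
# Made-up values (for illustration only)
y <- c(8, 2, 22, 12, 6)

# Sum of squared residuals from the mean
SS <- sum((y - mean(y))^2)

# Variance: sum of squares divided by n - 1
variance <- SS / (length(y) - 1)

# Standard deviation: square root of the variance
sd_by_hand <- sqrt(variance)

# A more literal "average residual": the mean of the absolute residuals
mean_abs_residual <- mean(abs(y - mean(y)))

sd_by_hand         # about 7.62
sd(y)              # same value from the built-in function
mean_abs_residual  # 5.6
```

The two numbers are not identical, but they are of similar size, which is why the standard deviation can be read roughly as the size of a typical residual.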

5.4 - Try to come up with a strategy for estimating standard deviation just by looking at a visualization. Then try it on the visualizations below. What do you think is the standard deviation of `team_score` just from looking at the histogram? What about `field_goals_made`?

In [None]:
gf_histogram(~ team_score, data = Lakers, color = "turquoise2", fill = "turquoise3")

In [None]:
gf_histogram(~ field_goals_made, data = Lakers, color = "tomato", fill = "tomato2")