# [Sports Connector] The Hot Hand

# Due: [MM/DD/YY]
This notebook explores the "hot hand" phenomenon and its relationship to probability. Using play-by-play data from the Minnesota Timberwolves versus Denver Nuggets game on December 10, 2008, the notebook emphasizes manipulating data to look for patterns in athletes who allegedly have a "hot hand."

Author: Catherine Han

# Topics Covered
0 - The Data

1 - Section 1: Exploring the Phenomenon

2 - Section 2: Modeling the "Hot Hand"

3 - Section 3: Visualizing Results

In [21]:
# Import Statements
from datascience import *
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

<img src="https://pixel.nymag.com/imgs/daily/science/2016/08/12/12-hot-hands.w710.h473.2x.gif" style="height:400px;">

# The Data

We have provided two `.csv` (comma-separated values) files: `PlayByPlayDec102008.csv`, which contains the play-by-play data, and `Player_Map.csv`, which maps each player ID to a player name.

In the cell below, the two tables are loaded and stored in the names `play_by_play` and `player_map`, respectively.

In [20]:
# Load in the play-by-play data using the Table class's read_table method
play_by_play = Table.read_table('PlayByPlayDec102008.csv')

# Load in the player identification data using the Table class's read_table method
player_map = Table.read_table('Player_Map.csv')

# Preview what the resulting tables look like
play_by_play.show(5)
player_map.show(5)

Game_id,Season,Season_Type,Game_No,Playoff_Rd,Playoff_Rd_Game_No,Date,Home_Team_id,Home_Tm,Visitor_Team_id,Away_Tm,Period,Event_Num,Wall_Clock_Time,Play_Clock_Time,Team_Committing_Action,Person1,Person2,Person3,Home_PTS,Visitor_PTS,X_Location,Y_Location,Description,Rebound_Designation,Shot_Value,Shot_Outcome,Shot_Side_of_Ct,Shot_Distance,General_Description,Player1,Player2,Player3,Player4,Player5,Player6,Player7,Player8,Player9,Player10
20800314,2008,Regular,314,0,0,12/10/2008,1610612761,TOR,1610612754,IND,1,1,19.2373,720,0,2547,1725,2551,0,0,0,0,(12:00) Jump Ball Bosh vs Nesterovic,,0,,,,Jump Ball,979,1725,2211,2547,2551,2574,2605,101122,101181,200081
20800314,2008,Regular,314,0,0,12/10/2008,1610612761,TOR,1610612754,IND,1,2,19.2461,696,1610612754,2211,979,2532,0,0,0,0,(11:36)[IND] Murphy Foul:Offensive (1 PF),,0,,,,Foul: Offensive,979,1725,2211,2547,2551,2574,2605,101122,101181,200081
20800314,2008,Regular,314,0,0,12/10/2008,1610612761,TOR,1610612754,IND,1,3,19.2461,696,1610612754,2211,0,2532,0,0,0,0,(11:36)[IND] Murphy Turnover:Foul (1 TO),,0,,,,Turnover: Foul,979,1725,2211,2547,2551,2574,2605,101122,101181,200081
20800314,2008,Regular,314,0,0,12/10/2008,1610612761,TOR,1610612754,IND,1,4,19.2553,677,1610612761,979,0,0,0,0,-109,15,(11:17)[TOR] O'Neal Turnaround Jump Shot: Missed,,2,0.0,L,08_16,Turnaround Jump: Missed,979,1725,2211,2547,2551,2574,2605,101122,101181,200081
20800314,2008,Regular,314,0,0,12/10/2008,1610612761,TOR,1610612754,IND,1,5,19.2557,676,1610612761,2211,0,0,0,0,0,0,(11:16)[IND] Murphy Rebound (Off:0 Def:1),Defensive,0,,,,Rebound,979,1725,2211,2547,2551,2574,2605,101122,101181,200081


Player_id,SV_Player_id,Name
920,,A.C. Green
2062,,A.J. Guyton
201166,4304.0,Aaron Brooks
201189,4326.0,Aaron Gray
243,,Aaron McKie
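The preview rows above come from a Raptors vs. Pacers game (`TOR` vs `IND`), not the Timberwolves vs. Nuggets game named in the introduction, so `PlayByPlayDec102008.csv` may hold every game played on that date. Below is a minimal sketch, assuming that is the case and that the two teams appear as `MIN` and `DEN` in the `Home_Tm`/`Away_Tm` columns, of keeping only one game's rows. It uses Python's built-in `csv` module on a trimmed copy of the preview so it runs without the data file; the helper name `rows_for_matchup` is hypothetical.

```python
import csv
import io

# A trimmed copy of the first play-by-play preview rows above
# (only four of the original columns are kept, for brevity).
SAMPLE_CSV = """Game_id,Home_Tm,Away_Tm,Description
20800314,TOR,IND,(12:00) Jump Ball Bosh vs Nesterovic
20800314,TOR,IND,(11:36)[IND] Murphy Foul:Offensive (1 PF)
20800314,TOR,IND,(11:17)[TOR] O'Neal Turnaround Jump Shot: Missed
"""


def rows_for_matchup(rows, team_a, team_b):
    """Hypothetical helper: keep rows whose home and away teams are
    team_a and team_b, in either order."""
    wanted = {team_a, team_b}
    return [row for row in rows if {row["Home_Tm"], row["Away_Tm"]} == wanted]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(len(rows_for_matchup(rows, "TOR", "IND")))  # 3
print(len(rows_for_matchup(rows, "MIN", "DEN")))  # 0: the sample holds only TOR vs IND
```

On the full table, once you know the game's `Game_id`, the `datascience` method `play_by_play.where('Game_id', game_id)` (with `game_id` a value you look up) should perform the same filtering.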


Let's keep an eye on the players we suspect might have the hot hand. Our (admittedly arbitrary) metric: a player who scores a nontrivial number of consecutive points is a candidate for the hot hand.
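One way to make "consecutive points" concrete, sketched here under our own assumptions: take one player's shot attempts in game order and add up the points from each unbroken run of made shots, resetting to zero at every miss. The function `max_consecutive_points` and the toy shot list are hypothetical; whether free throws count, and what threshold counts as "nontrivial," are still your choices.

```python
def max_consecutive_points(shots):
    """Return the most points scored in one unbroken run of made shots.

    `shots` is a list of (point_value, made) pairs in chronological order,
    where `made` is True for a make and False for a miss.
    """
    best = 0
    current = 0
    for point_value, made in shots:
        if made:
            current += point_value  # extend the current run
            best = max(best, current)
        else:
            current = 0  # a miss ends the run
    return best


# Toy data (hypothetical): make a 2, make a 3, miss a 2, make a 2, make a 2, make a 3
toy_shots = [(2, True), (3, True), (2, False), (2, True), (2, True), (3, True)]
print(max_consecutive_points(toy_shots))  # 7
```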

**Question 1:** Using the tables above, assign `players_of_interest` to a table that contains only the prospectively "hot-handed" players, with each player's name, ID, and total consecutive points.

In [6]:
# Your code here
players_of_interest = ...
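Building `players_of_interest` requires turning the numeric IDs in the play-by-play columns (such as `Person1`) into names. Here is a minimal sketch of that lookup with a plain dictionary, using the five `player_map` rows previewed above; the helper `names_for_ids` is hypothetical. In the notebook itself, the `datascience` method `Table.join` performs the same kind of matching between two tables.

```python
# The five player_map rows previewed above: (Player_id, Name)
PLAYER_MAP_PREVIEW = [
    (920, "A.C. Green"),
    (2062, "A.J. Guyton"),
    (201166, "Aaron Brooks"),
    (201189, "Aaron Gray"),
    (243, "Aaron McKie"),
]

# Build an ID -> name dictionary once so each lookup is fast.
id_to_name = {player_id: name for player_id, name in PLAYER_MAP_PREVIEW}


def names_for_ids(ids, lookup):
    """Hypothetical helper: map each ID to a name, or None if it is unknown."""
    return [lookup.get(player_id) for player_id in ids]


# 2547 appears in the play-by-play preview but not in the player_map preview
print(names_for_ids([201166, 243, 2547], id_to_name))  # ['Aaron Brooks', 'Aaron McKie', None]
```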

**Question 2:** With the `players_of_interest` table you created, manipulate the data to determine which player had the most consecutive baskets for this game, and assign it to the name `most_consecutive`!

In [2]:
# Your code here
most_consecutive = ...

# Section 1: Exploring the Phenomenon
There has been much debate over whether the hot hand really exists. The "hot hand" is the idea that a person who has just had a run of successes at a random event has a greater chance of succeeding on further attempts at the same event.

Gilovich, Vallone, and Tversky argued in their 1985 <a>paper</a> that the "hot hand" is a fallacy. The phenomenon was later revisited by Bocskocsky, Ezekowitz, and Stein, who claim their findings show a statistically significant hot hand effect in basketball.

In this notebook, we'll explore both sides of the argument using data analysis and basic probability.

## Probability Recap

**Question 3:** Using $B$ to represent making a basket and $M$ to represent missing one, what is the sample space (the set of all possible outcomes) of 4 attempts at shooting a basketball?

In [19]:
# Your answer here
sample_set = [...]
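Listing outcomes by hand gets error-prone as the number of attempts grows. A short sketch using the standard library's `itertools.product`, shown for 2 attempts so the exercise is still yours; since each attempt has 2 outcomes, $n$ attempts give $2^n$ sequences. The function name `sample_space` is our own.

```python
from itertools import product


def sample_space(n_attempts):
    """All sequences of B (basket) and M (miss) of length n_attempts."""
    return ["".join(outcome) for outcome in product("BM", repeat=n_attempts)]


print(sample_space(2))  # ['BB', 'BM', 'MB', 'MM']
```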

**Question 4:** Suppose Carmelo Anthony makes each shot with probability 45%, independently of his other shots. What is the probability that he makes 5 consecutive baskets?

In [18]:
# Your code here
five_consec = ...
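If each shot is made with probability $p$ independently of the others, the probability of $k$ makes in a row is $p^k$. A minimal sketch, with hypothetical function names, that computes this and checks it with a quick simulation using Python's `random` module; plug in Carmelo Anthony's numbers yourself.

```python
import random


def prob_consecutive(p, k):
    """Exact probability of k makes in a row, assuming independent shots."""
    return p ** k


def simulate_consecutive(p, k, n_trials=100_000, seed=0):
    """Estimate the same probability by simulating n_trials sequences of k shots."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    successes = 0
    for _ in range(n_trials):
        # all() stops at the first miss, just as a streak would
        if all(rng.random() < p for _ in range(k)):
            successes += 1
    return successes / n_trials


print(prob_consecutive(0.5, 3))  # 0.125
print(simulate_consecutive(0.5, 3))  # close to 0.125
```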

**Question 5:** What type of probability distribution does this model of Carmelo Anthony's performance follow?

[Replace this block with your answer]

**Question 6:** Is it realistic to model athletes' shooting with the probability distribution mentioned above? Justify your answer.

[Replace this block with your answer]

# Section 2: Modeling the "Hot Hand"

## Randomly Generating Data

Let's model an individual who makes each shot with probability 50% and misses with probability 50%, with each shot independent of the others.

Working with the variables $B$ for making a basket and $M$ for missing a basket, let's analyze the probability distribution of "streaks" of consecutive baskets, otherwise known as "runs" of baskets. We define a streak as follows: $M$ is a run of zero consecutive baskets, $BM$ is a run of one consecutive basket, $BBM$ is a run of two consecutive baskets, and so on. In other words, the length of a run is the <em>number of baskets we make before our first miss</em>.
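Let $K$ be the length of a run as defined above. Under the 50/50 model, and assuming each shot is independent of the ones before it, $K$ follows a geometric distribution: a run of exactly $k$ baskets means $k$ makes followed by one miss, so

$$P(K = k) = \left(\tfrac{1}{2}\right)^{k} \cdot \tfrac{1}{2} = \left(\tfrac{1}{2}\right)^{k+1}, \qquad k = 0, 1, 2, \ldots$$

For example, $P(K = 0) = \tfrac{1}{2}$ (the sequence $M$) and $P(K = 2) = \tfrac{1}{8}$ (the sequence $BBM$).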

Suppose we run 500 trials, where each trial records the number of baskets made before the first miss. Create a table with two columns, labeled `Length of Run of Baskets` and `Expected Number`: the length of a run, and the expected number of trials (out of 500) that produce a run of that length.

In [9]:
# Your code here
longest_run_table = ...
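A minimal sketch of one way to get both the expected counts and a simulated comparison, assuming (our interpretation) that each of the 500 trials means shooting until the first miss and recording the number of baskets made before it. With probability $p$ of a make, a run of length $k$ has probability $(1 - p)p^k$, so the expected number of trials with that run length is $500(1 - p)p^k$. The function names are hypothetical, and only the standard library is used.

```python
import random
from collections import Counter


def expected_run_counts(n_trials, p, max_len):
    """Expected number of trials whose run length is k, for k = 0..max_len.

    A run of exactly k baskets is k makes followed by one miss,
    so P(run = k) = (1 - p) * p**k when shots are independent.
    """
    return [(k, n_trials * (1 - p) * p ** k) for k in range(max_len + 1)]


def simulate_run_counts(n_trials, p, seed=0):
    """Simulate n_trials streaks and count how often each run length occurs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):
        run = 0
        while rng.random() < p:  # keep shooting until the first miss
            run += 1
        counts[run] += 1
    return counts


expected = expected_run_counts(500, 0.5, 3)
print(expected)  # [(0, 250.0), (1, 125.0), (2, 62.5), (3, 31.25)]
simulated = simulate_run_counts(500, 0.5)
print(sorted(simulated.items()))  # exact counts depend on the seed
```

To build `longest_run_table`, the run lengths and expected values in `expected` can be passed to `datascience`'s `Table().with_columns(...)` under the two required labels; the simulated counts give a sense of how much a real 500-trial experiment would wobble around those expectations.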

**Question 7:** Using `longest_run_table`, create a line graph of `Expected Number` against `Length of Run of Baskets` to visualize how the expected number of runs changes with run length over the 500 trials.

In [10]:
# Your code here
# Should be a line graph

**Question 8:** Based on the graph we generated above, what does the data imply about an athlete and the existence of the hot hand?

[Replace this block with your answer]

# Section 3: Visualizing Results

**How do the coordinates in the play-by-play data map onto the court?**

According to the NBA, the X and Y coordinates in the play-by-play data are positions on the court measured in tenths of a foot. Because the court is 50 feet (500 tenths of a foot) wide and 94 feet long, the sidelines sit at X = -250 and X = 250. Here's an example image of how the coordinate system works for a basketball court:


<img src="http://www.trbimg.com/img-57167be7/turbine/la-1461091353-snap-photo/767">
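Since the coordinates are in tenths of a foot, dividing by 10 converts them to feet, and the sideline rule above gives a quick sanity check on X values. A minimal sketch with hypothetical helper names; the sample point is the `X_Location`/`Y_Location` pair (-109, 15) from O'Neal's missed jump shot in the preview. Note that the non-shot events in that preview (the jump ball, foul, turnover, and rebound) have both coordinates set to 0, so you will likely want to keep only shot attempts before plotting.

```python
SIDELINE = 250  # sidelines at X = -250 and X = 250 (tenths of a foot)


def to_feet(x_tenths, y_tenths):
    """Convert play-by-play coordinates from tenths of a foot to feet."""
    return x_tenths / 10, y_tenths / 10


def within_sidelines(x_tenths):
    """True if an X coordinate lies between the two sidelines."""
    return -SIDELINE <= x_tenths <= SIDELINE


print(to_feet(-109, 15))  # (-10.9, 1.5)
print(within_sidelines(-109))  # True
```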

Given this coordinate mapping, the `play_by_play` data from earlier, and the player we determined to have the most potential for the "hot hand," create a helper table and a corresponding scatterplot for all of the attempted shots for our prospective player!

Use the following hex colors to assign colors to each shot's corresponding point value!
<ul>
    <li><strong>0:</strong> #ffffff</li>
    <li><strong>1:</strong> #cceeff</li>
    <li><strong>2:</strong> #80d4ff</li>
    <li><strong>3:</strong> #33bbff</li>
</ul>
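One way to apply these colors, sketched with hypothetical names: store the mapping in a dictionary and translate each shot's point value into a hex string.

```python
# Point value -> hex color, as specified above
SHOT_COLORS = {
    0: "#ffffff",
    1: "#cceeff",
    2: "#80d4ff",
    3: "#33bbff",
}


def colors_for(point_values):
    """Return one hex color per shot; raises KeyError on an unexpected value."""
    # int() lets values read in as floats (e.g. 2.0) map correctly
    return [SHOT_COLORS[int(value)] for value in point_values]


print(colors_for([2, 3, 0, 1]))  # ['#80d4ff', '#33bbff', '#ffffff', '#cceeff']
```

The resulting list can be passed as the `c` argument of Matplotlib's `plt.scatter`, e.g. `plt.scatter(x_feet, y_feet, c=colors_for(values), edgecolors='gray')` with your own `x_feet`, `y_feet`, and `values`. Because `#ffffff` is white, the gray outline keeps 0-point markers visible on a white background.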

**Question 9:** Create a scatterplot of your player's attempted shots, coloring each point by the shot's point value according to the color specifications above.

In [15]:
# Your code here
scatter_table = ...

**Question 10:** Look at the scatterplot you generated. Do you notice any association between certain areas of the court and the number of attempted shots? What about the success or the point value of those shots?

[Replace this block with your answer]

**Question 11:** Given our probabilistic modeling and the results we observed for a single player, is the "hot hand" truly a fallacy? Use specific evidence from this notebook to support your answer.

[Replace this block with your answer]