## Exploring the 'Deep Breath' Effect

##### Are players more likely to sink the second free-throw after missing the first? 

### Start by loading and peeking at the data

In [3]:
import pandas as pd #importing pandas. Will be very helpful in cleaning up the data
import numpy as np

In [4]:
df=pd.read_csv("free_throws.csv") #Loading the data set

In [5]:
print(df.shape) #Dimensions of the dataframe
df.head(2) #Let's peek at the data. What exactly are we working with?

(618019, 11)


Unnamed: 0,end_result,game,game_id,period,play,player,playoffs,score,season,shot_made,time
0,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum makes free throw 1 of 2,Andrew Bynum,regular,0 - 1,2006 - 2007,1,11:45
1,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum makes free throw 2 of 2,Andrew Bynum,regular,0 - 2,2006 - 2007,1,11:45


### Thankfully, all the columns seem pretty self-explanatory. The shape attribute tells us we're dealing with 618,019 free throws. That's a lot of data! Let's keep exploring.

In [6]:
df.isna().sum().sort_values(ascending=False) #see if there's missing data

time          0
shot_made     0
season        0
score         0
playoffs      0
player        0
play          0
period        0
game_id       0
game          0
end_result    0
dtype: int64

### Wow! No missing data. A true blessing. Let's see how many free throws come from and-ones, technicals, fouls on threes, or flagrants. Removing these and paring the dataframe down to just standard sets of two free-throws will make the analysis easier.

In [7]:
print(sum(df.play.str.contains("of 1")))# sum counts the 'True' booleans. The output is the number of and-ones
print(sum(df.play.str.contains("of 3")))
print(sum(df.play.str.contains("technical")))
print(sum(df.play.str.contains("flagrant")))

56949
13953
17183
886


### Let's get rid of those. 

In [8]:
pairs=df.loc[df.play.str.contains("of 2")] #removing odd free-throw types
pairs=pairs.reset_index(drop=True) #Resetting the index will make things easier


In [9]:
####This runtime was bad, so this first attempt is left commented out.

#counter=0
#odd_indeces=[]
#i=0
#while i<pairs.shape[0]-1:
#    if ((pairs.loc[i].player != pairs.loc[i+1].player)& ("1 of" in pairs.loc[i].play)):
#        counter=counter+1
#        odd_indeces.append(i)
#    i=i+1 #advance on every pass; otherwise the loop never ends

In [10]:
#loop through and make column for first or second free throw. Then loop through that column and make sure it alternates

In [11]:
pairs["First_Second"] = np.nan 
#we've added a column to tell us if it's shot 1 or 2, but the values are empty for now. Let's fill those up

In [12]:
pairs.head()

Unnamed: 0,end_result,game,game_id,period,play,player,playoffs,score,season,shot_made,time,First_Second
0,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum makes free throw 1 of 2,Andrew Bynum,regular,0 - 1,2006 - 2007,1,11:45,
1,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum makes free throw 2 of 2,Andrew Bynum,regular,0 - 2,2006 - 2007,1,11:45,
2,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum makes free throw 1 of 2,Andrew Bynum,regular,18 - 12,2006 - 2007,1,7:26,
3,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum misses free throw 2 of 2,Andrew Bynum,regular,18 - 12,2006 - 2007,0,7:26,
4,106 - 114,PHX - LAL,261031013.0,1.0,Amare Stoudemire makes free throw 1 of 2,Amare Stoudemire,regular,33 - 20,2006 - 2007,1,3:15,


In [13]:
pairs.at[3,'First_Second']
pairs.at[1,'play'][-6]

'2'

In [14]:
for x in range(0, pairs.shape[0]):
    if pairs.at[x,'play'][-6]=='2':
        pairs.at[x,'First_Second']=2
    elif pairs.at[x,'play'][-6]=='1':
        pairs.at[x,'First_Second']=1
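
The row-by-row loop above could also be written as a single column operation. This is a minimal sketch on a toy frame (the `toy` name is hypothetical and the play strings are copied from rows shown in this notebook), not the method the notebook actually ran:

```python
import pandas as pd

# Toy stand-in for the `pairs` frame; only the `play` column is needed here.
toy = pd.DataFrame({"play": [
    "Andrew Bynum makes free throw 1 of 2",
    "Andrew Bynum misses free throw 2 of 2",
    "Shane Battier makes 2 of 2 free throws",
]})

# Capture the digit in "free throw N of 2"; rows that do not match become NaN.
shot_number = toy["play"].str.extract(r"free throw (\d) of 2", expand=False)
toy["First_Second"] = pd.to_numeric(shot_number, errors="coerce")
print(toy)
```

Unlike indexing with `[-6]`, the pattern looks anywhere in the string, so a play with extra trailing text would still get a shot number, while a combined "2 of 2 free throws" row stays NaN.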

In [15]:
pairs.head()

Unnamed: 0,end_result,game,game_id,period,play,player,playoffs,score,season,shot_made,time,First_Second
0,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum makes free throw 1 of 2,Andrew Bynum,regular,0 - 1,2006 - 2007,1,11:45,1.0
1,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum makes free throw 2 of 2,Andrew Bynum,regular,0 - 2,2006 - 2007,1,11:45,2.0
2,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum makes free throw 1 of 2,Andrew Bynum,regular,18 - 12,2006 - 2007,1,7:26,1.0
3,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum misses free throw 2 of 2,Andrew Bynum,regular,18 - 12,2006 - 2007,0,7:26,2.0
4,106 - 114,PHX - LAL,261031013.0,1.0,Amare Stoudemire makes free throw 1 of 2,Amare Stoudemire,regular,33 - 20,2006 - 2007,1,3:15,1.0


In [16]:
#Redo later with one counter 
First_counter=0
Second_counter=0
bad_indeces=[]
print(pairs.shape[0])
for x in range(0,pairs.shape[0]):
    if pairs.at[x,'First_Second']==1:
        First_counter=First_counter+1
    elif pairs.at[x,'First_Second']==2:
        Second_counter=Second_counter+1
    else:bad_indeces.append(x)   
print(First_counter+Second_counter)
bad_indeces

528568
528566


[322620, 368621]
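
The same tally can be read off without a loop. A small sketch on toy values (the `toy` series is illustrative, not taken from the data set):

```python
import numpy as np
import pandas as pd

# Toy First_Second column with one unlabeled row.
toy = pd.Series([1.0, 2.0, 1.0, np.nan, 2.0], name="First_Second")

# Count each label, including the missing ones.
print(toy.value_counts(dropna=False))

# Index labels of the rows that never received a 1 or a 2.
bad_positions = toy[toy.isna()].index.tolist()
print(bad_positions)
```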

In [17]:
print(pairs.loc[322619:322621].play.to_frame())
print('\n')
print(pairs.loc[368620:368622].play.to_frame())


                                                     play
322619                   Ed Davis makes free throw 1 of 2
322620  Ed Davis makes free throw 2 of 2. Kris Humphri...
322621                Brook Lopez makes free throw 1 of 2


                                            play
368620  Brandon Jennings makes free throw 2 of 2
368621    Shane Battier makes 2 of 2 free throws
368622       Monta Ellis makes free throw 1 of 2


### Looks like row 322620 has a substitution appended to the play text. We can trim that. Row 368621 records both free throws in one row. We'll just get rid of that.

In [18]:
pairs.at[322620,'play']=pairs.at[322620,'play'][0:32] #Only keep the part of the play that has to do with the free throw
pairs.at[322620,'First_Second']=2 #Manually reformat the First_Second Column so we know it was a second shot
pairs.loc[322620].to_frame().transpose()  #looks good!

Unnamed: 0,end_result,game,game_id,period,play,player,playoffs,score,season,shot_made,time,First_Second
322620,100 - 107,TOR - BKN,400278000.0,4,Ed Davis makes free throw 2 of 2,Ed Davis,regular,81 - 84,2012 - 2013,1,10:27,2


In [19]:
print(pairs.loc[368620:368622].play)
pairs=pairs.drop(368621)
pairs=pairs.reset_index(drop=True)
print(pairs.loc[368620:368622].play) #All gone

368620    Brandon Jennings makes free throw 2 of 2
368621      Shane Battier makes 2 of 2 free throws
368622         Monta Ellis makes free throw 1 of 2
Name: play, dtype: object
368620    Brandon Jennings makes free throw 2 of 2
368621         Monta Ellis makes free throw 1 of 2
368622        Monta Ellis misses free throw 2 of 2
Name: play, dtype: object


In [20]:
First_counter=0
Second_counter=0
bad_indeces=[]
print(pairs.shape[0])
for x in range(0,pairs.shape[0]):
    if pairs.at[x,'First_Second']==1:
        First_counter=First_counter+1
    elif pairs.at[x,'First_Second']==2:
        Second_counter=Second_counter+1
    else:bad_indeces.append(x)   
print(First_counter+Second_counter)

528567
528567


### Now let's make sure the rest of the data is clean. We want to check if there are any other standalone free-throws

In [21]:
counter=0
bad_indeces=[]
for x in range(0,pairs.shape[0]-1):
    if(pairs.at[x,'First_Second']==pairs.at[x+1,'First_Second']):
        counter=counter+1
        bad_indeces.append(x)
        

In [22]:
bad_indeces[0:7]

[337, 660, 721, 774, 1098, 1367, 1385]

In [23]:
counter

1495
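
For reference, the neighbor comparison can be sketched with `shift`, which lines each row up against the one after it. The toy column below is illustrative only:

```python
import pandas as pd

# Toy column: a clean pair, then an orphaned first shot, then another clean pair.
toy = pd.DataFrame({"First_Second": [1.0, 2.0, 1.0, 1.0, 2.0]})

# A row is flagged when the next row carries the same shot number.
# The last row is compared against NaN, which never counts as equal.
same_as_next = toy["First_Second"].eq(toy["First_Second"].shift(-1))
bad_positions = same_as_next[same_as_next].index.tolist()
print(len(bad_positions), bad_positions)
```

This flags the first row of each back-to-back repeat, the same rows the loop collects.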

In [24]:
pairs.loc[1366:1369]

Unnamed: 0,end_result,game,game_id,period,play,player,playoffs,score,season,shot_made,time,First_Second
1366,88 - 81,CLE - SA,261103024.0,1.0,Tim Duncan misses free throw 2 of 2,Tim Duncan,regular,16 - 14,2006 - 2007,0,3:18,2.0
1367,88 - 81,CLE - SA,261103024.0,1.0,LeBron James makes free throw 1 of 2,LeBron James,regular,22 - 18,2006 - 2007,1,0:44,1.0
1368,88 - 81,CLE - SA,261103024.0,1.0,LeBron James makes free throw 1 of 2,LeBron James,regular,23 - 18,2006 - 2007,1,0:24,1.0
1369,88 - 81,CLE - SA,261103024.0,1.0,LeBron James misses free throw 2 of 2,LeBron James,regular,23 - 18,2006 - 2007,0,0:24,2.0


In [25]:
print(bad_indeces[50:55])
pairs.loc[13546:13549]

[12795, 13547, 13641, 13930, 14315]


Unnamed: 0,end_result,game,game_id,period,play,player,playoffs,score,season,shot_made,time,First_Second
13546,79 - 101,DAL - UTAH,261211026.0,4.0,Maurice Ager misses free throw 2 of 2,Maurice Ager,regular,75 - 97,2006 - 2007,0,2:17,2.0
13547,79 - 101,DAL - UTAH,261211026.0,4.0,Devean George makes free throw 1 of 2,Devean George,regular,76 - 97,2006 - 2007,1,2:01,1.0
13548,79 - 101,DAL - UTAH,261211026.0,4.0,Maurice Ager misses free throw 1 of 2,Maurice Ager,regular,78 - 99,2006 - 2007,0,0:38,1.0
13549,79 - 101,DAL - UTAH,261211026.0,4.0,Maurice Ager makes free throw 2 of 2,Maurice Ager,regular,79 - 99,2006 - 2007,1,0:38,2.0


In [26]:
pairs.loc[bad_indeces[0:10]]

Unnamed: 0,end_result,game,game_id,period,play,player,playoffs,score,season,shot_made,time,First_Second
337,92 - 102,TOR - NJ,261101017.0,1.0,Bostjan Nachbar makes free throw 2 of 2,Bostjan Nachbar,regular,16 - 18,2006 - 2007,1,4:16,2.0
660,118 - 117,NY - MEM,261101029.0,4.0,Jamal Crawford makes free throw 2 of 2,Jamal Crawford,regular,78 - 63,2006 - 2007,1,10:30,2.0
721,106 - 99,IND - CHA,261101030.0,2.0,Othella Harrington misses free throw 2 of 2,Othella Harrington,regular,37 - 42,2006 - 2007,0,4:24,2.0
774,97 - 91,SA - DAL,261102006.0,2.0,Dirk Nowitzki misses free throw 2 of 2,Dirk Nowitzki,regular,34 - 41,2006 - 2007,0,7:00,2.0
1098,89 - 102,POR - GS,261103009.0,3.0,Andris Biedrins misses free throw 1 of 2,Andris Biedrins,regular,60 - 70,2006 - 2007,0,1:07,1.0
1367,88 - 81,CLE - SA,261103024.0,1.0,LeBron James makes free throw 1 of 2,LeBron James,regular,22 - 18,2006 - 2007,1,0:44,1.0
1385,88 - 81,CLE - SA,261103024.0,3.0,Tony Parker makes free throw 2 of 2,Tony Parker,regular,49 - 41,2006 - 2007,1,8:22,2.0
1696,109 - 95,IND - NY,261104018.0,4.0,David Harrison makes free throw 2 of 2,David Harrison,regular,109 - 95,2006 - 2007,1,0:32,2.0
1836,88 - 92,CLE - CHA,261104030.0,1.0,LeBron James makes free throw 1 of 2,LeBron James,regular,28 - 18,2006 - 2007,1,0:13,1.0
2210,107 - 104,GS - DAL,261106006.0,4.0,Jason Richardson makes free throw 2 of 2,Jason Richardson,regular,101 - 97,2006 - 2007,1,4:01,2.0


In [27]:
#Testing removing flagrants

test=pairs
test=test[~test.play.str.contains("flagrant")]
print(sum(test.play.str.contains("flagrant")))
test=test.reset_index(drop=True)
counter=0
bad_indeces=[]
for x in range(0,test.shape[0]-1):
    if(test.at[x,'First_Second']==test.at[x+1,'First_Second']):
        counter=counter+1
        bad_indeces.append(x)
counter  



0


1068

In [28]:
#Most look like this
bad_indeces[0:5]
test[1362:1365]
#drop the first row of every back-to-back repeat, then renumber the rows
test=test.drop(test.index[bad_indeces])
test=test.reset_index(drop=True)

In [29]:
counter=0
bad_indeces=[]
for x in range(0,test.shape[0]-1):
    if(test.at[x,'First_Second']==test.at[x+1,'First_Second']):
        counter=counter+1
        

In [30]:
counter

0

In [31]:
test.head()

Unnamed: 0,end_result,game,game_id,period,play,player,playoffs,score,season,shot_made,time,First_Second
0,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum makes free throw 1 of 2,Andrew Bynum,regular,0 - 1,2006 - 2007,1,11:45,1.0
1,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum makes free throw 2 of 2,Andrew Bynum,regular,0 - 2,2006 - 2007,1,11:45,2.0
2,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum makes free throw 1 of 2,Andrew Bynum,regular,18 - 12,2006 - 2007,1,7:26,1.0
3,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum misses free throw 2 of 2,Andrew Bynum,regular,18 - 12,2006 - 2007,0,7:26,2.0
4,106 - 114,PHX - LAL,261031013.0,1.0,Amare Stoudemire makes free throw 1 of 2,Amare Stoudemire,regular,33 - 20,2006 - 2007,1,3:15,1.0


In [32]:
#Slicing by row position: even rows are first shots, odd rows are second shots
shot1=test.iloc[::2]
shot1.shape
shot2=test.iloc[1::2]
shot2.shape

(263307, 12)

In [33]:
# make sure there are no second shots in the first shot dataframe
# check every row rather than a hard-coded range of index labels
counter=int((shot1['First_Second']==2).sum())


In [34]:
counter #it's good!

0

In [35]:
#reset indices so shot1 and shot 2 line up with the same index
shot1=shot1.reset_index(drop=True) 
shot2=shot2.reset_index(drop=True)

In [47]:
#sample to see if things line up
print(shot1.loc[67890].play)
print(shot2.loc[67890].play)
print(shot1.loc[56000].play)
print(shot2.loc[56000].play)
#nice they do

Anthony Carter makes free throw 1 of 2
Anthony Carter makes free throw 2 of 2
Joe Johnson makes free throw 1 of 2
Joe Johnson makes free throw 2 of 2
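
Spot checks only cover a few rows. Alternating shot numbers alone do not guarantee that both rows of a pair come from the same player and the same trip to the line, so a check over every pair may be worth running. A minimal sketch, assuming the two frames share a reset index; the toy frames and the mismatched second pair are hypothetical:

```python
import pandas as pd

# Toy stand-ins for shot1 and shot2 after reset_index.
shot1_toy = pd.DataFrame({"player": ["Andrew Bynum", "Devean George"],
                          "time": ["11:45", "2:01"]})
shot2_toy = pd.DataFrame({"player": ["Andrew Bynum", "Maurice Ager"],
                          "time": ["11:45", "0:38"]})

# True where both rows name the same player at the same clock time.
aligned = shot1_toy["player"].eq(shot2_toy["player"]) & shot1_toy["time"].eq(shot2_toy["time"])
print(int(aligned.sum()), "of", len(aligned), "pairs line up")
```

Any row where `aligned` is False would pair shots from two different trips and could be dropped before computing percentages.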


In [37]:
#count how many times people missed 1st and made 2nd
miss1make2=0
for x in range(0,shot1.shape[0]):
    if ((shot1.at[x,'shot_made']==0) & (shot2.at[x,'shot_made']==1)):
        miss1make2=miss1make2+1

#count how many times people missed 1st       
miss1=0
for x in range(0,shot1.shape[0]):
    if(shot1.at[x,'shot_made']==0):
        miss1=miss1+1

In [38]:
miss1make2/miss1 #missed1 made 2nd percentage

0.7309641037384783
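
The two counting loops amount to a conditional mean: the average second-shot outcome over trips where the first shot missed. A small sketch with made-up outcomes (1 = made, 0 = missed):

```python
import pandas as pd

# Toy paired outcomes; values are illustrative, not from the data set.
first = pd.Series([1, 0, 0, 1, 0])
second = pd.Series([1, 1, 0, 0, 1])

# Keep only trips where the first shot missed, then average the second shot.
missed_first = first.eq(0)
rate = second[missed_first].mean()
print(rate)
```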

### Next: find the 'expected' free-throw percentage for second shots, based on each player's own percentage

In [96]:

names=list(set(shot1.player)) #get a list of unique names
names.sort()
names[0:5]

['A.J. Price', 'Aaron Brooks', 'Aaron Gordon', 'Aaron Gray', 'Aaron Harrison']

In [97]:
names=pd.Series(names).astype(str) #convert list of names into a dataframe
names=names.to_frame()
names.columns=['player']

In [98]:
names.head()

Unnamed: 0,player
0,A.J. Price
1,Aaron Brooks
2,Aaron Gordon
3,Aaron Gray
4,Aaron Harrison


In [99]:
#let's add a free throw column
names["percentage"] = np.nan
names.head()

Unnamed: 0,player,percentage
0,A.J. Price,
1,Aaron Brooks,
2,Aaron Gordon,
3,Aaron Gray,
4,Aaron Harrison,


### We need to find everyone's shot percentage so we can get an expected free throw percentage

In [101]:
for x in range(0,len(names)):
    #parentheses make the grouping explicit: (play mentions the player) AND (shot was made)
    names.at[x,'percentage']=sum(test.play.str.contains(names.at[x,'player'])&(test.shot_made==1))/sum(test.play.str.contains(names.at[x,'player']))
        

In [102]:
names[1:6]

Unnamed: 0,player,percentage
1,Aaron Brooks,0.82807
2,Aaron Gordon,0.687783
3,Aaron Gray,0.559211
4,Aaron Harrison,0.416667
5,Aaron Williams,0.805556
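
An alternative worth considering is grouping on the player column directly. `str.contains` treats its pattern as a regular expression by default and matches substrings, so one player's name can also match plays by a player whose name contains it. The sketch below uses hypothetical player names and is not what the notebook ran, so its numbers could differ slightly from the table above:

```python
import pandas as pd

# Toy frame; "Sam Lee" and "Sam Lee Jr." are hypothetical players.
toy = pd.DataFrame({
    "player": ["Sam Lee", "Sam Lee", "Sam Lee Jr.", "Sam Lee Jr."],
    "shot_made": [1, 1, 0, 1],
})

# Mean of a 0/1 column per player is that player's make percentage.
pct = toy.groupby("player")["shot_made"].mean().rename("percentage").reset_index()
print(pct)
```

Here each player's shots are counted exactly once, whereas a substring match on "Sam Lee" would also pick up the "Sam Lee Jr." rows.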


### Great! Now that we have everyone's free-throw percentage, we can go ahead and find the 'expected' percentage for second shots after missing the first. 

In [106]:
#This dataframe has their first and second shot in one row
bothshots=pd.concat([shot1,shot2.shot_made],axis=1)
bothshots.columns=['end_result','game','game_id','period','play','player','playoffs','score','season','shot_made','time','First_Second','shot_made_2']
bothshots.head()

Unnamed: 0,end_result,game,game_id,period,play,player,playoffs,score,season,shot_made,time,First_Second,shot_made_2
0,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum makes free throw 1 of 2,Andrew Bynum,regular,0 - 1,2006 - 2007,1,11:45,1.0,1
1,106 - 114,PHX - LAL,261031013.0,1.0,Andrew Bynum makes free throw 1 of 2,Andrew Bynum,regular,18 - 12,2006 - 2007,1,7:26,1.0,0
2,106 - 114,PHX - LAL,261031013.0,1.0,Amare Stoudemire makes free throw 1 of 2,Amare Stoudemire,regular,33 - 20,2006 - 2007,1,3:15,1.0,1
3,106 - 114,PHX - LAL,261031013.0,2.0,Leandro Barbosa misses free throw 1 of 2,Leandro Barbosa,regular,43 - 29,2006 - 2007,0,10:52,1.0,1
4,106 - 114,PHX - LAL,261031013.0,2.0,Lamar Odom makes free throw 1 of 2,Lamar Odom,regular,44 - 30,2006 - 2007,1,10:37,1.0,1


In [108]:
#Let's add a column for shot percentage to the bothshots dataframe by merging with the 'names' dataframe
bothshots=pd.merge(names,bothshots,on='player')
bothshots.head()

Unnamed: 0,player,percentage,end_result,game,game_id,period,play,playoffs,score,season,shot_made,time,First_Second,shot_made_2
0,A.J. Price,0.761905,94 - 108,GS - IND,291111011.0,4.0,A.J. Price makes free throw 1 of 2,regular,75 - 92,2009 - 2010,1,9:14,1.0,0
1,A.J. Price,0.761905,110 - 98,ATL - IND,291226011.0,4.0,A.J. Price makes free throw 1 of 2,regular,108 - 96,2009 - 2010,1,0:40,1.0,1
2,A.J. Price,0.761905,80 - 114,IND - MIA,291227014.0,4.0,A.J. Price makes free throw 1 of 2,regular,64 - 103,2009 - 2010,1,9:38,1.0,1
3,A.J. Price,0.761905,90 - 97,ORL - IND,300105011.0,1.0,A.J. Price makes free throw 1 of 2,regular,19 - 21,2009 - 2010,1,1:25,1.0,1
4,A.J. Price,0.761905,102 - 108,IND - OKC,300109025.0,4.0,A.J. Price makes free throw 1 of 2,regular,93 - 96,2009 - 2010,1,3:11,1.0,1


In [109]:
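counter=0
percentageAdder=0
for x in range(0,bothshots.shape[0]):
    if bothshots.at[x,'shot_made']==0: #only trips where the first shot was missed
        counter=counter+1
        percentageAdder=percentageAdder+bothshots.at[x,'percentage']
Expected_Percentage=percentageAdder/counter        
print(Expected_Percentage)
print(counter) #missed first shots in the merged frame
print(miss1) #missed first shots counted earlier; the two should match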


0.7226165347951603
69868
69868
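
The expected and actual figures can be computed side by side from the merged frame. A minimal sketch on toy rows, assuming the same column names as `bothshots`; the values are illustrative only:

```python
import pandas as pd

# Toy rows shaped like the merged frame: career percentage plus both shot outcomes.
toy = pd.DataFrame({
    "percentage": [0.80, 0.60, 0.70, 0.90],
    "shot_made": [0, 1, 0, 0],
    "shot_made_2": [1, 1, 0, 1],
})

missed_first = toy["shot_made"].eq(0)

# Expected: average career percentage of the shooters who missed the first shot.
expected = toy.loc[missed_first, "percentage"].mean()
# Actual: how often those same shooters made the second shot.
actual = toy.loc[missed_first, "shot_made_2"].mean()
print(round(expected, 4), round(actual, 4))
```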


### Here it is! The expected free-throw percentage for second shots after missing the first is 72.26%. The actual percentage is 73.10%, about 0.8 percentage points higher.