<img src='john_isner.png' width='600'/>

# John Isner: Pride of America or One-Trick Pony?

There's nothing that we American tennis fans love more than cheering on USA's finest during the US Open tournament. But while Serena Williams always seems to find her way to the top of the women's bracket, the talent on the men's side of the game rarely makes the headlines.

The highest-ranked American men's singles player on the ATP tour right now is a fella named John Isner, currently ranked 10th in the world. John, or "Big John" as he is affectionately known in the tennis community, is a monster on the tennis court. Standing at 6'10", John is best known for his incredibly powerful serve. He has recorded serve speeds of up to 149.9 miles per hour, which visibly intimidates opposing players, even at the professional level.

https://youtu.be/cgdTzXL86XM <--- Have a look at John smacking a few at this character

Even with John's success cracking into the elite top 10, many consider him to be a one-dimensional player, with his massive serve being his only true weapon. His most obvious weaknesses on the court are his movement and his backhand. John's matches tend to be boring and predictable: he wins his own service games and loses his opponent's service games, without many long exchanges mixed in. Certain friends of mine claim that he would be nothing without his serve, and I myself have to admit to changing the channel during one (or a few) of Big John's matches, saying, "This goofy bastard will never win a major." But how is it that this allegedly one-dimensional player remains at the top end of PROFESSIONAL tennis and hasn't been "figured out" by these other top players?

This question has prompted me to dig a little deeper into John Isner's game, and more specifically his serve, to try and figure out what he's really made of.

The dataset used in this analysis was put together by Jeff Sackmann / Tennis Abstract and is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. It contains ATP match-level data dating back to 1968. The full dataset can be found on GitHub at https://github.com/JeffSackmann/tennis_atp.

## Question #1: How good is this guy's serve?

We started by focusing in on some more recent data from 2017 to get a feel for how effective his serve actually is. This CSV file that we import contains data from each recorded ATP match in 2017.

Below is a snapshot of the original dataset.

In [1]:
import pandas as pd


#Opening the 2017 csv file 
directory = "C://Users//Cheney//Desktop//tennis_atp-master//atp_matches_"
df_2017 = pd.read_csv(directory+'2017.csv')

df_2017.head()

| | tourney_id | tourney_name | surface | draw_size | tourney_level | tourney_date | match_num | winner_id | winner_seed | winner_entry | ... | w_bpFaced | l_ace | l_df | l_svpt | l_1stIn | l_1stWon | l_2ndWon | l_SvGms | l_bpSaved | l_bpFaced |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-M020 | Brisbane | Hard | 32 | A | 20170102 | 300 | 105777 | 7.0 | | ... | 7.0 | 4.0 | 0.0 | 69.0 | 49.0 | 36.0 | 9.0 | 12.0 | 2.0 | 5.0 |
| 1 | 2017-M020 | Brisbane | Hard | 32 | A | 20170102 | 299 | 105777 | 7.0 | | ... | 0.0 | 4.0 | 3.0 | 61.0 | 28.0 | 24.0 | 16.0 | 10.0 | 2.0 | 4.0 |
| 2 | 2017-M020 | Brisbane | Hard | 32 | A | 20170102 | 298 | 105453 | 3.0 | | ... | 5.0 | 9.0 | 2.0 | 61.0 | 37.0 | 27.0 | 10.0 | 10.0 | 0.0 | 2.0 |
| 3 | 2017-M020 | Brisbane | Hard | 32 | A | 20170102 | 297 | 105683 | 1.0 | | ... | 7.0 | 4.0 | 0.0 | 84.0 | 61.0 | 39.0 | 14.0 | 14.0 | 2.0 | 4.0 |
| 4 | 2017-M020 | Brisbane | Hard | 32 | A | 20170102 | 296 | 105777 | 7.0 | | ... | 14.0 | 6.0 | 5.0 | 82.0 | 37.0 | 29.0 | 24.0 | 14.0 | 4.0 | 7.0 |


We selected a view of the original dataset that includes only the matches in which John Isner played. After that, some reformatting was needed before we could work with his individual stats for each match.

In [2]:
#Creating object for Isner's unique ID
isner_id = 104545

#Locating all matches for which John Isner was listed as either the winner or loser. Combining these matches into a new dataframe
isner_df_2017 = pd.concat([df_2017[df_2017.winner_id == isner_id] , df_2017[df_2017.loser_id == isner_id]])

#Splitting the data into winners and losers so that we can find John Isner in each match and isolate his stats
winners = isner_df_2017[['tourney_id', 'tourney_name', 'surface', 'draw_size', 'tourney_level',
       'tourney_date', 'match_num', 'winner_id', 'winner_seed', 'winner_entry',
       'winner_name', 'winner_hand', 'winner_ht', 'winner_ioc', 'winner_age',
       'winner_rank', 'winner_rank_points', 'score', 'best_of',
       'round', 'minutes', 'w_ace', 'w_df', 'w_svpt', 'w_1stIn', 'w_1stWon',
       'w_2ndWon', 'w_SvGms', 'w_bpSaved', 'w_bpFaced']]

#assign returns a new dataframe, so the 'wl' flag is added without triggering pandas' SettingWithCopyWarning
winners = winners.assign(wl='w')

losers = isner_df_2017[['tourney_id', 'tourney_name', 'surface', 'draw_size', 'tourney_level',
       'tourney_date', 'match_num', 'loser_id', 'loser_seed',
       'loser_entry', 'loser_name', 'loser_hand', 'loser_ht', 'loser_ioc',
       'loser_age', 'loser_rank', 'loser_rank_points', 'score', 'best_of',
       'round', 'minutes', 'l_ace', 'l_df',
       'l_svpt', 'l_1stIn', 'l_1stWon', 'l_2ndWon', 'l_SvGms', 'l_bpSaved',
       'l_bpFaced']]

losers = losers.assign(wl='l')

#Creating consistent column names for both the "winners" and "losers" dataframes so that they can be combined
columns = ['tourney_id', 'tourney_name', 'surface', 'draw_size', 'tourney_level',
       'tourney_date', 'match_num', 'player_id', 'player_seed',
       'player_entry', 'player_name', 'player_hand', 'player_ht', 'player_ioc',
       'player_age', 'player_rank', 'player_rank_points', 'score', 'best_of',
       'round', 'minutes', 'aces', 'double_faults',
       'service_pts', '1st_serve_in', '1st_won', '2nd_won', 'service_gms', 'bp_saved',
       'bp_faced', 'wl']
winners.columns = columns
losers.columns = columns

#Combining the winners and losers
new_df = pd.concat([winners,losers])

#Keeping only the rows that describe John Isner's own performance (dropping his opponents' rows)
new_new_df = new_df[new_df.player_id == isner_id]

#Resetting index
match_data_2017 = new_new_df.reset_index(drop=True)
match_data_2017.head()


| | tourney_id | tourney_name | surface | draw_size | tourney_level | tourney_date | match_num | player_id | player_seed | player_entry | ... | aces | double_faults | service_pts | 1st_serve_in | 1st_won | 2nd_won | service_gms | bp_saved | bp_faced | wl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-0301 | Auckland | Hard | 32 | A | 20170109 | 286 | 104545 | 2.0 | | ... | 22.0 | 0.0 | 87.0 | 68.0 | 55.0 | 13.0 | 15.0 | 4.0 | 5.0 | w |
| 1 | 2017-580 | Australian Open | Hard | 128 | G | 20170116 | 104 | 104545 | 19.0 | | ... | 33.0 | 6.0 | 111.0 | 79.0 | 67.0 | 20.0 | 19.0 | 2.0 | 2.0 | w |
| 2 | 2017-0402 | Memphis | Hard | 32 | A | 20170213 | 286 | 104545 | 2.0 | | ... | 26.0 | 1.0 | 80.0 | 56.0 | 47.0 | 14.0 | 15.0 | 3.0 | 4.0 | w |
| 3 | 2017-M006 | Indian Wells Masters | Hard | 128 | M | 20170306 | 256 | 104545 | 20.0 | | ... | 23.0 | 0.0 | 70.0 | 54.0 | 47.0 | 10.0 | 12.0 | 0.0 | 0.0 | w |
| 4 | 2017-M007 | Miami Masters | Hard | 128 | M | 20170320 | 267 | 104545 | 18.0 | | ... | 13.0 | 3.0 | 69.0 | 38.0 | 33.0 | 19.0 | 12.0 | 0.0 | 1.0 | w |
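
For readers who want to reuse the reshaping step for another player, here is a minimal sketch of the same idea written as a function. It is not the code used in this analysis: the helper name `player_view` and the toy dataframe are made up for illustration, and it strips the `winner_`/`loser_` and `w_`/`l_` prefixes instead of using the hand-written column list, so the stat columns come out as `ace`, `df`, and so on rather than `aces` and `double_faults`.

```python
import pandas as pd

def player_view(matches, player_id):
    """One row per match, holding only the given player's side of the stats.

    Hypothetical helper: assumes side-specific columns are prefixed with
    winner_/loser_ (player info) and w_/l_ (serve stats).
    """
    sides = [
        # (own long prefix, own short prefix, opponent prefixes, result label)
        ("winner_", "w_", ("loser_", "l_"), "w"),
        ("loser_", "l_", ("winner_", "w_"), "l"),
    ]
    frames = []
    for long_prefix, short_prefix, opponent_prefixes, label in sides:
        # Matches where the player appears on this side
        rows = matches[matches[long_prefix + "id"] == player_id]
        # Keep shared columns plus the player's own columns
        keep = [c for c in rows.columns if not c.startswith(opponent_prefixes)]
        # Map winner_x / loser_x to player_x, and w_x / l_x to x
        renamed = {}
        for col in keep:
            if col.startswith(long_prefix):
                renamed[col] = "player_" + col[len(long_prefix):]
            elif col.startswith(short_prefix):
                renamed[col] = col[len(short_prefix):]
        frames.append(rows[keep].rename(columns=renamed).assign(wl=label))
    return pd.concat(frames, ignore_index=True)

# Toy data (made-up values): player 1 wins match A and loses match B
toy = pd.DataFrame({
    "tourney_name": ["A", "B"],
    "winner_id": [1, 2],
    "loser_id": [2, 1],
    "w_ace": [10, 3],
    "l_ace": [4, 7],
})
view = player_view(toy, player_id=1)
print(view)
```

Because each side is filtered on the player's ID before renaming, the opponent's rows never enter the result, so no second filtering pass is needed.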


At this point, we had isolated his match statistics and were ready to take a look at his serving, but there were some missing values in the "service_gms" field. We did some online research and filled these missing data points with the correct values.

In [3]:
#Filling some missing data points
match_data_2017.loc[27,'service_gms']=18
match_data_2017.loc[10,'service_gms']=21            
match_data_2017.loc[47,'service_gms']=22
match_data_2017.loc[50,'service_gms']=28
match_data_2017.loc[54,'service_gms']=16
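
The row numbers above were found by inspecting the data. As a rough sketch of how such rows could be located and patched programmatically, the example below uses a toy dataframe with made-up values in place of `match_data_2017`; the `corrections` dictionary is hypothetical and stands in for values looked up by hand.

```python
import numpy as np
import pandas as pd

# Toy stand-in for match_data_2017 (values are made up for illustration)
toy = pd.DataFrame({
    "tourney_name": ["A", "B", "C"],
    "service_gms": [15.0, np.nan, 12.0],
})

# Index labels of the rows whose service_gms is missing
missing_idx = toy.index[toy["service_gms"].isna()].tolist()
print(missing_idx)

# Hypothetical corrections, keyed by index label
corrections = {1: 18}
for idx, games in corrections.items():
    toy.loc[idx, "service_gms"] = games

# Check that nothing is left unfilled
all_filled = bool(toy["service_gms"].notna().all())
print(all_filled)
```

Keeping the corrections in one dictionary makes it easier to see at a glance which matches were patched and with what values.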

Below we show our calculations for John's average hold percentage. 

In [4]:
#Creating new column showing serve games lost
match_data_2017['service_gms_lost'] = match_data_2017['bp_faced'] - match_data_2017['bp_saved']

#Creating column for hold percentage
match_data_2017['hold_pct'] = 1-(match_data_2017.service_gms_lost/match_data_2017.service_gms)

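#NaN shows up when a match is missing serve stats (or lists zero service games); zero service games lost already gives a hold percentage of 1.
#Filling with 0 counts those matches as a 0% hold rate, which pulls the average down.
match_data_2017['hold_pct'] = match_data_2017['hold_pct'].fillna(0)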

#Averaged hold percentage for all 2017 matches
print('2017 Average Hold Percentage: ', (sum(match_data_2017.hold_pct)/len(match_data_2017))*100,'%')


2017 Average Hold Percentage:  87.6291363345 %
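
To make the arithmetic concrete, here is a small worked example on made-up numbers (not Isner's data). It assumes, as the calculation above does, that every lost service game corresponds to exactly one break point that was not saved, so games lost equals `bp_faced - bp_saved`. It also shows how the treatment of matches with missing stats changes the average.

```python
import numpy as np
import pandas as pd

# Three toy matches (made-up values); the third has no serve stats recorded
toy = pd.DataFrame({
    "bp_faced":    [5.0, 0.0, np.nan],
    "bp_saved":    [4.0, 0.0, np.nan],
    "service_gms": [15.0, 12.0, np.nan],
})

# Service games lost = break points faced minus break points saved
lost = toy["bp_faced"] - toy["bp_saved"]

# Per-match hold percentage: 14/15 for match 1, 1.0 for match 2, NaN for match 3
hold_pct = 1 - lost / toy["service_gms"]

# Option A: treat the match with missing stats as a 0% hold rate
mean_fill0 = hold_pct.fillna(0).mean()

# Option B: leave it out (pandas' mean skips NaN by default)
mean_skip = hold_pct.mean()

print(round(mean_fill0, 3))  # 0.644
print(round(mean_skip, 3))   # 0.967
```

In this toy case a single match without stats moves the average from about 97% to about 64%, so it is worth checking how many matches are affected before settling on one option.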


#### An average hold percentage of 87.63%. But what does this mean? It means that over the course of the 2017 season, Big John won an average of 87.63% of his service games per match. Sounds pretty impressive, but it raises the question, "How does this compare to his competition?"

### (See part II for more answers)