Ottawa 67's Project Rebound Classifier

Description: 

We first import the required dependencies.

In [None]:
from Rebounds import classify_rebounds as cr
import pandas as pd
pd.set_option('display.max_columns', 500)


# Data Exploration

We are working with a dataset provided by the Ottawa 67's Hockey Club. We will apply some data manipulation and visualization techniques to it to get a better understanding of what we have. First, we import it into a pandas DataFrame and run some basic descriptive functions. Additional information about the dataset is shown below.

<img src="Rebounds/67sDataInfo/all.png" alt="Drawing" style="width: 800px;"/>

In [None]:
shot_df = pd.read_csv('Ottawa67sShotData.csv')
shot_df.head()

Above we get a look at the first 5 rows of the dataset. We will delve deeper into some of these features later on in the notebook.

In [None]:
shot_df.describe(include='all')

Right away we notice some interesting things. There are a total of 6621 shot attempts in our DataFrame. There are 58 unique game dates, implying that our dataset contains shots from 58 games. Something we will explore later in this notebook is that 2315 shot attempts created a rebound, while only 283 shots were taken after a rebound. Another useful row is "top", which gives the most frequent value in each column. As expected, the Ottawa 67's have the most shot attempts (the data comes only from their games). Most shot attempts also occur at 5-5, come from forwards, follow O-zone pressure, and are taken from the perimeter of the ice. We will use these insights later in our analysis.
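The headline numbers above can also be computed directly instead of being read off the `describe` table. The sketch below uses a small hypothetical frame. It assumes that the rebound labels are either `'yes'` or missing, and that the game date sits in a column called `game date`. That column name is a placeholder, so adjust it to match the real CSV.

```python
import pandas as pd

# Hypothetical toy data; the real columns come from Ottawa67sShotData.csv
toy_df = pd.DataFrame({
    'game date': ['2019-10-01', '2019-10-01', '2019-10-04', '2019-10-05'],
    'created rebound': ['yes', None, 'yes', None],
    'after rebound': [None, 'yes', None, None],
})

# Number of games = number of distinct game dates (assumes one game per date)
n_games = toy_df['game date'].nunique()

# Labels are assumed to be 'yes' or missing, so counting 'yes' gives the totals
n_created = (toy_df['created rebound'] == 'yes').sum()
n_after = (toy_df['after rebound'] == 'yes').sum()

print('Shot attempts:', len(toy_df))
print('Games:', n_games)
print('Created a rebound:', n_created)
print('After a rebound:', n_after)
```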

## Rebounds
We will now look at the rebound attempts, that is, the shots labelled "after rebound" in the original dataset.

In [None]:
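# Keep only the shots labelled as taken after a rebound
rebound_df = shot_df[shot_df['after rebound'] == 'yes']
# Renumber the rows 0..n-1 for plotting; rebound_df keeps the original shot_df index
rebounds_df = rebound_df.reset_index(drop=True)
print('Number of rebounds:', len(rebounds_df))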

In [None]:
rebounds_plot = rebounds_df.plot(y='xG', use_index=True, style='o')
rebounds_plot.set_xlabel("Individual Rebound Shots");
rebounds_plot.set_ylabel("Expected Goal");

The plot shows that most rebound shots have an expected goal (xG) value between 0% and 30%. It may be interesting to inspect the specific shots with an xG of 30% or more.

In [None]:
rebounds_df_over30 = rebound_df[rebound_df['xG'] >= 0.3]
rebounds_plot = rebounds_df_over30.plot(y='xG', use_index=True, style='o')
rebounds_plot.set_xlabel("Individual Rebound Shots");
rebounds_plot.set_ylabel("Expected Goal");


In [None]:
rebounds_df_over30.describe(include='all')

The table above describes the rebound attempts with the highest expected goal (xG) values. Unsurprisingly, the most common shot location is HIGH, and the shots are generated off extended O-zone pressure. It is also interesting to note that, although strength does not affect the number of rebound attempts, it does affect the expected goal percentage: most of the higher-xG rebounds come on the powerplay.
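To check the powerplay effect more directly than through the `describe` table, you can compare the mean xG of rebound shots on and off the powerplay. The sketch below uses made-up data. It assumes that the `PP` column holds `'yes'` for powerplay shots and is missing otherwise, which is also what the later `count`-based cells rely on.

```python
import pandas as pd

# Hypothetical rebound shots: 'PP' is 'yes' on the powerplay, missing otherwise
toy_rebounds = pd.DataFrame({
    'xG': [0.10, 0.35, 0.20, 0.40, 0.05],
    'PP': [None, 'yes', None, 'yes', None],
})

# Label each shot as powerplay or not, then compare the mean xG of each group
on_pp = toy_rebounds['PP'] == 'yes'
group_labels = on_pp.map({True: 'PP', False: 'not PP'})
mean_xg = toy_rebounds.groupby(group_labels)['xG'].mean()
print(mean_xg)
```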

Below we can see the areas of the ice from which rebound attempts are most often generated. For reference, here is a breakdown of the ice into the specific location categories:
<img src="Rebounds/67sDataInfo/iceBreakdown.png" alt="Drawing" style="width: 200px;"/>

In [None]:
# Convert the 'yes' labels to 1 so the flag columns become numeric
num_df = rebounds_df.replace('yes', 1)
num_df['location'].value_counts()
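Raw counts are easier to compare with the overall shot distribution once they are converted to shares. The sketch below applies `value_counts(normalize=True)` to hypothetical location labels. Apart from HIGH, the labels are placeholders and may not match the categories in the ice breakdown.

```python
import pandas as pd

# Hypothetical location labels standing in for shot_df / num_df['location']
all_locations = pd.Series(['HIGH', 'PERIMETER', 'PERIMETER', 'HIGH', 'PERIMETER', 'LOW'])
rebound_locations = pd.Series(['HIGH', 'HIGH', 'PERIMETER'])

# normalize=True returns proportions instead of raw counts;
# locations with no rebounds show up as NaN and are filled with 0
comparison = pd.DataFrame({
    'all shots': all_locations.value_counts(normalize=True),
    'rebounds': rebound_locations.value_counts(normalize=True),
}).fillna(0)
print(comparison)
```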

Comparing the rebound data to all shot data, we notice that the mean expected goal value for rebounds is more than double the mean for all shots.

In [None]:
# Mean xG, expressed as a percentage
rebound_mean = num_df['xG'].mean() * 100
all_mean = shot_df['xG'].mean() * 100
print(f'Rebound data: {rebound_mean:.2f} %\nAll shots: {all_mean:.2f} %')

We also notice that the percentages of rebound attempts on the powerplay and penalty kill are similar to the percentages of all shot attempts on the powerplay and penalty kill. This suggests that rebounds do not become more frequent at different strengths.

In [None]:
# count() gives the number of non-missing PP / PK values, i.e. shots at that strength
counts_reb = num_df[['PP', 'PK']].count()

print('Rebound Attempts\nTotal:', len(num_df), '\nPP:', counts_reb['PP'], '\nPK:', counts_reb['PK'],
      '\nPercentages\nPP:', (counts_reb['PP']/len(num_df))*100, '%\nPK:',
      (counts_reb['PK']/len(num_df))*100, '%')


In [None]:
# Same calculation over all shot attempts
counts_all = shot_df[['PP', 'PK']].count()

print('All Shot Attempts\nTotal:', len(shot_df), '\nPP:', counts_all['PP'], '\nPK:', counts_all['PK'],
      '\nPercentages\nPP:', (counts_all['PP']/len(shot_df))*100, '%\nPK:',
      (counts_all['PK']/len(shot_df))*100, '%')
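The two cells above repeat the same calculation on different frames. A small helper can put the two sets of percentages side by side. The name `strength_shares` is hypothetical. The helper relies on the same assumption as the cells above: `PP` and `PK` hold a value only for shots taken at that strength.

```python
import pandas as pd

def strength_shares(df):
    """Return the percentage of rows with a non-missing PP / PK value."""
    # count() ignores missing values, like describe().loc['count']
    return df[['PP', 'PK']].count() / len(df) * 100

# Hypothetical frames standing in for shot_df and num_df
toy_all = pd.DataFrame({'PP': ['yes', None, None, None], 'PK': [None, 'yes', None, None]})
toy_reb = pd.DataFrame({'PP': [1, None], 'PK': [None, None]})

shares = pd.DataFrame({'all shots': strength_shares(toy_all),
                       'rebounds': strength_shares(toy_reb)})
print(shares.round(1))
```

With the real data you would call `strength_shares(shot_df)` and `strength_shares(num_df)`.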

## Creating Rebounds
We will now explore the "created rebound" label included in the original dataset.

In [None]:
created_rebound_df = shot_df[shot_df['created rebound'] == 'yes']
# Keep the original shot_df index: it is used below to drop overlapping rows
len(created_rebound_df)

In [None]:
both_df = created_rebound_df[created_rebound_df['after rebound'] == 'yes']
len(both_df)

We have a total of 2315 shot attempts that led to a rebound. We wish to remove the shots that both came from a rebound and created one, as these only occur in broken-down plays and cannot easily be reproduced.

In [None]:
created_rebound_df = created_rebound_df.drop(both_df.index)
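Dropping rows by index label only works because `created_rebound_df` still carries the original `shot_df` index. The sketch below shows an equivalent boolean-mask filter that does not depend on the index. The data is hypothetical, and the labels are assumed to be `'yes'` or missing.

```python
import pandas as pd

# Hypothetical labels: 'yes' or missing, as assumed for the shot data
toy = pd.DataFrame({
    'created rebound': ['yes', 'yes', None, 'yes'],
    'after rebound':   [None, 'yes', 'yes', None],
})

created = toy[toy['created rebound'] == 'yes']
both = created[created['after rebound'] == 'yes']

# Index-based removal, as in the cell above
by_index = created.drop(both.index)

# Equivalent mask: created a rebound but was not itself taken after one
by_mask = toy[(toy['created rebound'] == 'yes') & (toy['after rebound'] != 'yes')]

print(by_mask.index.tolist())
```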

We now wish to inspect the rebounds created to determine which are capitalized on and where the initial shots come from. We will create two datasets: one of the successful rebound shots taken and one of the initial shots.
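One possible way to build those two datasets is to pair each shot that created a rebound with the next shot in the same game. The sketch below is hypothetical and rests on two assumptions: rows are ordered by time within each game, and the follow-up shot is the very next row. It keeps a pair only when that next row is labelled "after rebound". The `game date` column and the outcome values are placeholders.

```python
import pandas as pd

# Hypothetical shots, assumed to be in chronological order within each game
toy = pd.DataFrame({
    'game date': ['d1', 'd1', 'd1', 'd2', 'd2'],
    'created rebound': ['yes', None, None, 'yes', None],
    'after rebound':   [None, 'yes', None, None, 'yes'],
    'outcome': ['save', 'goal', 'miss', 'save', 'save'],
})

# For every row, the values of the next shot in the same game
next_shot = toy.groupby('game date').shift(-1)

initial = toy[toy['created rebound'] == 'yes']
follow_up = next_shot.loc[initial.index]

# Keep only pairs where the follow-up is actually labelled as a rebound shot
paired = follow_up['after rebound'] == 'yes'
initial_shots = initial[paired]
rebound_shots = follow_up[paired]
print(rebound_shots['outcome'].tolist())
```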

In [None]:
created_rebound_df.describe(include='all')

In [None]:
created_rebound_df['outcome'].value_counts()

In [None]:
# Work on a copy so the original shot_df is left unchanged
shot_df_integer_time = shot_df.copy()

def video_time_to_seconds(video_time):
    """Convert a string such as '0 days 01:23:45.000000000' to whole seconds.

    Only the HH:MM:SS part is used; the day count is ignored.
    """
    time = video_time.replace('.000000000', '')
    hours, minutes, seconds = time.split('days ')[1].split(':')
    return int(hours) * 60 * 60 + int(minutes) * 60 + int(seconds)

# Despite its name, 'video_time_min' holds the time in seconds (every row is converted)
shot_df_integer_time['video_time_min'] = shot_df_integer_time['video_time'].apply(video_time_to_seconds)

rebound_alg = cr.rebound_type(shot_df_integer_time, 'video_time_min')
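The time conversion above can also be vectorized with `pd.to_timedelta`. This assumes the `video_time` strings follow the pandas Timedelta format `'0 days HH:MM:SS.000000000'`, which is inferred from the `'days '` split and the stripped `'.000000000'`. One difference: `to_timedelta` includes the day count, whereas the string parsing ignores it, so the two agree only when the day count is 0.

```python
import pandas as pd

# Hypothetical values in the format the conversion above expects
video_time = pd.Series(['0 days 00:01:05.000000000', '0 days 01:00:00.000000000'])

# Parse to Timedelta, then convert to whole seconds
seconds = pd.to_timedelta(video_time).dt.total_seconds().astype(int)
print(seconds.tolist())
```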

In [None]:
print(len(rebound_alg[rebound_alg['reb'] == 1]), 'rebounds')
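`cr.rebound_type` comes from the project's own `Rebounds` package, and its rule is not shown in this notebook. A common way to flag rebounds from timestamps is a time-gap rule: a shot counts as a rebound if it follows the previous shot by the same team in the same game within a few seconds. The sketch below implements that rule. The 3-second threshold, the `game date` and `team` columns, and the helper name `flag_rebounds` are all assumptions for illustration. This is not necessarily what `cr.rebound_type` does.

```python
import pandas as pd

def flag_rebounds(df, time_col, max_gap=3):
    """Hypothetical time-gap rule: flag a shot (reb = 1) if it comes within
    max_gap seconds of the previous shot by the same team in the same game."""
    df = df.sort_values(['game date', time_col])
    # Seconds since the previous shot by the same team in the same game (NaN for the first)
    gap = df.groupby(['game date', 'team'])[time_col].diff()
    # NaN <= max_gap is False, so first shots are never flagged
    df['reb'] = (gap <= max_gap).astype(int)
    return df.sort_index()

# Hypothetical shots; times are in seconds, like 'video_time_min' above
toy = pd.DataFrame({
    'game date': ['d1', 'd1', 'd1', 'd1', 'd2'],
    'team': ['OTT', 'OTT', 'OTT', 'KIN', 'OTT'],
    'video_time_min': [100, 102, 200, 201, 101],
})

flagged = flag_rebounds(toy, 'video_time_min')
print(flagged['reb'].tolist())
```

Comparing the count from a rule like this with the count from `rebound_alg` is one way to check how sensitive the rebound total is to the chosen time gap.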