# Table of Contents

- <a href='#intro'><b>1 Introduction</b></a>
- <a href='#importing'><b>2 Importing and installing dependencies</b></a>
- <a href='#game_data'><b>3 Game Data</b></a>
    - <a href='#null'><b>3.1 Checking for Null values</b></a>
        - <a href='#null_needs'>3.1.1 Checking whether or not we need the columns with missing values</a>
        - <a href='#null_drop'>3.1.2 Dropping unnecessary columns</a>
    - <a href='#types'><b>3.2 Checking column types</b></a>
        - <a href='#memory_usage'> 3.2.1 Changing column types for less memory usage</a>
    - <a href='#scatter_plot'><b>3.3 Plotting the games of the season</b></a>
- <a href='#video_footage_injury'><b>4 Video Footage Injury</b></a>
    - <a href='#video_footage_injury_null'><b>4.1 Checking for Null values</b></a>
    - <a href='#video_footage_injury_types'><b>4.2 Checking column types</b></a>
    - <a href='#video_footage_injury_season'><b>4.3 Plotting concussions by season and year</b></a>
        - <a href='#video_footage_injury_season_season'>4.3.1 Concussions by season and year</a>
        - <a href='#video_footage_injury_season_week'>4.3.2 Concussions by week and quarter</a>
- <a href='#video_review'><b>5 Video Review</b></a>
    - <a href='#video_review_null'><b>5.1 Checking for Null values</b></a>
        - <a href='#video_review_null_needs'>5.1.1 Checking whether or not we need the columns with missing values</a>
    - <a href='#video_review_types'><b>5.2 Checking column types</b></a>
        - <a href='#video_review_usage'> 5.2.1 Changing column types for less memory usage</a>
    - <a href='#video_review_plot'><b>5.3 Plotting concussions by category</b></a>

# <a id='intro'><b>1 Introduction:</b></a>

The National Football League is America's most popular sports league, comprised of 32 franchises that compete each year to win the Super Bowl, the world's biggest annual sporting event. Founded in 1920, the NFL developed the model for the successful modern sports league, including national and international distribution, extensive revenue sharing, competitive excellence, and strong franchises across the country.

The NFL is committed to advancing progress in the diagnosis, prevention and treatment of sports-related injuries. The NFL's ongoing health and safety efforts include support for independent medical research and engineering advancements and a commitment to look at anything and everything to protect players and make the game safer, including enhancements to medical protocols and improvements to how our game is taught and played.

As more is learned, the league evaluates and changes rules to evolve the game and try to improve protections for players. Since 2002 alone, the NFL has made 50 rules changes intended to eliminate potentially dangerous tactics and reduce the risk of injuries.

For more information about the NFL's health and safety efforts, please visit www.PlaySmartPlaySafe.com.

# <a id='importing'><b>2 Importing and installing dependencies:</b></a>

In [None]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import gc
import warnings

warnings.simplefilter(action='ignore', category=FutureWarning)
sns.set()

print('All dependencies installed')

# <a id='game_data'><b>3 Game Data:</b></a>

In [None]:
game_data = pd.read_csv('../input/game_data.csv', parse_dates=True)
print(game_data.shape)
game_data.head()

## <a id='null'><b>3.1 Checking for Null values</b></a>

In [None]:
np.sum(game_data.isnull())
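
Raw null counts are hard to compare across columns, so it can help to express them as a share of all rows. The following is a minimal sketch on a made-up stand-in for game_data (the rows and values are assumptions, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy stand-in for game_data; the values are made up for illustration.
toy = pd.DataFrame({
    'StadiumType': ['Outdoor', None, 'Indoors', 'Outdoors'],
    'Temperature': [70.0, np.nan, np.nan, 55.0],
    'Week': [1, 2, 3, 4],
})

# isnull().mean() gives the fraction of missing values per column;
# multiplying by 100 turns it into a percentage.
missing_pct = (toy.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct.round(1))
```

Applied to the real frame, the same expression would give the per-column percentages directly.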

### <a id='null_needs'>3.1.1 Checking whether or not we need the columns with missing values</a>

In [None]:
stadium_type = game_data['StadiumType'].value_counts()
turf = game_data['Turf'].value_counts()
game_weather = game_data['GameWeather'].value_counts()
temperature = game_data['Temperature'].value_counts()
outdoor_weather = game_data['OutdoorWeather'].value_counts()

print(stadium_type, '\n', '-'*50, '\n', turf, '\n', '-'*50, '\n', game_weather, '\n', '-'*50, '\n', temperature,  '\n', '-'*50, '\n', outdoor_weather)

- stadium_type: most of the values are duplicates that are labelled differently but mean the same thing. Most stadiums are outdoors, and a few have retractable roofs. The few missing values can easily be looked up.
- turf: as with stadium_type, most values mean the same thing but are labelled differently, and the missing ones can easily be looked up.
- game_weather: around 15% of the values are missing. These can be looked up, but doing so will be more time-consuming.
- temperature: around 10% of the values are missing, which seems a little odd since game_weather has more missing values.
- outdoor_weather: around 38% of the values are missing, so dropping this column may not have a very big impact on the analysis.
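
Labels that mean the same thing but are spelled differently can be collapsed with a small lookup table. This is only a sketch: the example labels and the canonical mapping below are hypothetical, and a real mapping would have to be built from the actual value_counts() output.

```python
import pandas as pd

# Made-up labels illustrating the kind of inconsistency described above.
stadium = pd.Series(['Outdoor', 'outdoors ', 'Outdoors', 'Indoor', 'indoors', 'Retr. Roof'])

# Hypothetical mapping from cleaned-up spellings to one canonical label.
canonical = {
    'outdoor': 'Outdoor',
    'outdoors': 'Outdoor',
    'indoor': 'Indoor',
    'indoors': 'Indoor',
}

# Strip whitespace and lowercase before looking up the canonical label.
cleaned = stadium.str.strip().str.lower()

# Labels without a mapping entry keep their original (stripped) spelling.
normalized = cleaned.map(canonical).fillna(stadium.str.strip())
print(normalized.value_counts())
```

Keeping unmapped labels unchanged makes it easy to spot which spellings still need an entry.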

### <a id='null_drop'>3.1.2 Dropping unnecessary columns</a>

In [None]:
game_data = game_data.drop(columns=['OutdoorWeather', 'GameWeather'])
game_data.info()

After dropping the 2 columns and changing the column types (see 3.2.1), memory usage went down by more than 21 KB.

## <a id='types'><b>3.2 Checking column types</b></a>

In [None]:
game_data.info()

### <a id='memory_usage'> 3.2.1 Changing column types for less memory usage</a>

In [None]:
category_columns = ['Season_Type', 'StadiumType', 'Turf']
float_columns = ['Temperature']

game_data[category_columns] = game_data[category_columns].astype('category')
game_data[float_columns] = game_data[float_columns].astype(float)
# Keep only the date part of 'Game_Date' and store it back as a datetime column
game_data['Game_Date'] = pd.to_datetime(game_data['Game_Date'].str.split(expand=True)[0], format='%Y-%m-%d')

game_data.info()
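
The saving from a category conversion can be measured directly instead of being read off two info() outputs. A minimal sketch, assuming a made-up column with a few labels repeated many times:

```python
import pandas as pd

# Toy column with few distinct values repeated many times (made-up data).
toy = pd.DataFrame({'Turf': ['Grass', 'Artificial', 'FieldTurf'] * 200})

# deep=True counts the actual string payload, not just the pointers.
before = toy.memory_usage(deep=True).sum()

toy['Turf'] = toy['Turf'].astype('category')
after = toy.memory_usage(deep=True).sum()

print(f'before: {before} bytes, after: {after} bytes')
```

The category dtype stores each distinct label once plus a small integer code per row, which is why it pays off for columns with few unique values.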

## <a id='scatter_plot'><b>3.3 Plotting the games of the season</b></a>

In [None]:
plt.figure(figsize=(20, 9))

_ = sns.scatterplot(x='Home_Team', y='Visit_Team', hue='Week',data=game_data)
plt.xticks(rotation=90, fontsize=14)
plt.yticks(fontsize=15)
plt.xlabel('Home Team')
plt.ylabel('Visiting Team')

plt.show()

It seems that the dataset includes entries for NFC and AFC, which are conferences and not teams.
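
One way to inspect or set aside those rows is a boolean mask on the two team columns. The sketch below uses made-up rows; only the column names Home_Team and Visit_Team come from the plot above, and the team codes are assumptions.

```python
import pandas as pd

# Made-up rows; the team codes are illustrative only.
toy = pd.DataFrame({
    'Home_Team': ['BUF', 'NFC', 'DAL'],
    'Visit_Team': ['MIA', 'AFC', 'NYG'],
})

conferences = ['AFC', 'NFC']

# True for any game where either side is a conference rather than a club.
is_conference_game = toy['Home_Team'].isin(conferences) | toy['Visit_Team'].isin(conferences)

print(toy[is_conference_game])          # rows worth inspecting
club_games = toy[~is_conference_game]   # everything else
```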

# <a id='video_footage_injury'><b>4 Video Footage Injury:</b></a>

In [None]:
video_footage_injury = pd.read_csv('../input/video_footage-injury.csv', parse_dates=True)
print(video_footage_injury.shape)
video_footage_injury.head()

## <a id='video_footage_injury_null'><b>4.1 Checking for Null values</b></a>

In [None]:
np.sum(video_footage_injury.isnull())

## <a id='video_footage_injury_types'><b>4.2 Checking column types</b></a>

In [None]:
video_footage_injury.info()

## <a id='video_footage_injury_season'><b>4.3 Plotting concussions by season and year</b></a>
### <a id='video_footage_injury_season_season'>4.3.1 Concussions by season and year</a>

In [None]:
plt.figure(figsize=(20, 7.5))

plt.subplot(1, 2, 1)
_ = sns.countplot(x='Type', data=video_footage_injury)
plt.title('Concussions per season type:', fontsize=20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel('Type of season', fontsize=15)
plt.ylabel('Total number of concussions', fontsize=15)

plt.subplot(1, 2, 2)
_ = sns.countplot(x='season', data=video_footage_injury)
plt.title('Concussions per year:', fontsize=20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel('Year', fontsize=15)
plt.ylabel('Total number of concussions', fontsize=15)

plt.tight_layout()
plt.show()

Clearly, more concussions occur during the regular season than during the pre-season, and 2016 shows a slight increase in concussions.
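
The numbers behind the two count plots can also be read from a cross-tabulation, which shows both breakdowns in one table. A minimal sketch with made-up rows (the column names season and Type match the plots above; the values are assumptions):

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'season': [2016, 2016, 2016, 2017, 2017],
    'Type': ['Pre', 'Reg', 'Reg', 'Reg', 'Pre'],
})

# Rows are seasons, columns are season types, cells are counts.
counts = pd.crosstab(toy['season'], toy['Type'])
print(counts)
```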

### <a id='video_footage_injury_season_week'>4.3.2 Concussions by week and quarter</a>

In [None]:
plt.figure(figsize=(20, 10))

plt.subplot(2, 1, 1)
_ = sns.stripplot(x='Week', y='Type', hue='Qtr', data=video_footage_injury)
plt.title('Concussions per week:', fontsize=20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel('Week', fontsize=15)
plt.ylabel('Season Type', fontsize=15)

plt.subplot(2, 1, 2)
# Note: barplot shows the mean of 'Week' for each quarter, not a count of concussions
_ = sns.barplot(x='Qtr', y='Week', data=video_footage_injury)
plt.title('Mean week of concussions per quarter:', fontsize=20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel('Quarter', fontsize=15)
plt.ylabel('Mean week', fontsize=15)

plt.tight_layout()
plt.show()

Concussions appear to become more prevalent as the season reaches its final stages. This may be related to teams pushing to make the playoffs rather than be eliminated. Also, most concussions occur during the 4th quarter, followed by the 2nd quarter. The 1st, 2nd, and 3rd quarters show an almost identical distribution.
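
Because seaborn's barplot aggregates the y variable with the mean by default, a bar chart of Week against Qtr shows the average week per quarter rather than how many concussions fall in each quarter. A minimal sketch of how the two quantities differ, using made-up rows with the same column names:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Qtr': [1, 2, 2, 4, 4, 4],
    'Week': [3, 5, 7, 12, 14, 16],
})

# Number of concussions recorded in each quarter.
per_quarter = toy['Qtr'].value_counts().sort_index()

# Average week in which those concussions happened (what barplot would draw).
mean_week = toy.groupby('Qtr')['Week'].mean()

print(per_quarter)
print(mean_week)
```

For a chart of the counts themselves, a count plot on Qtr would be the closer match.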

# <a id='video_review'><b>5 Video Review:</b></a>

In [None]:
video_review = pd.read_csv('../input/video_review.csv')
print(video_review.shape)
video_review.head()

## <a id='video_review_null'><b>5.1 Checking for Null values</b></a>

In [None]:
print(np.sum(video_review.isnull()))
print('-'*60)
video_review.info()

### <a id='video_review_null_needs'>5.1.1 Checking whether or not we need the columns with missing values</a>

In [None]:
video_review = video_review.drop(columns=['Primary_Partner_GSISID'])
video_review = video_review.dropna()
video_review.info()

The column 'Primary_Partner_GSISID' was not relevant and was dropped, along with the 2 rows that had missing values.
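
Before calling dropna(), it can be worth looking at the rows that would be lost. The sketch below uses a made-up frame; PlayID and the cell values are hypothetical stand-ins.

```python
import pandas as pd

# Made-up frame; column values are illustrative only.
toy = pd.DataFrame({
    'PlayID': [1, 2, 3],
    'Primary_Impact_Type': ['Helmet-to-helmet', None, 'Helmet-to-body'],
    'Friendly_Fire': ['No', 'Yes', None],
})

# Rows that contain at least one missing value.
rows_with_missing = toy[toy.isnull().any(axis=1)]
print(rows_with_missing)

cleaned = toy.dropna()
print(f'dropped {len(toy) - len(cleaned)} rows')
```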

## <a id='video_review_types'><b>5.2 Checking column types</b></a>

In [None]:
video_review.info()

### <a id='video_review_usage'> 5.2.1 Changing column types for less memory usage</a>

In [None]:
category_columns = ['Player_Activity_Derived', 'Turnover_Related', 'Primary_Impact_Type', 'Primary_Partner_Activity_Derived', 'Friendly_Fire']

video_review[category_columns] = video_review[category_columns].astype('category')
video_review.info()

By dropping one column and changing the type of 5 columns, memory usage was brought down by 1.5 KB.

## <a id='video_review_plot'><b>5.3 Plotting concussions by category</b></a>

In [None]:
plt.figure(figsize=(20, 7.5))

plt.subplot(1, 2, 1)
_ = sns.countplot(x='Player_Activity_Derived', data=video_review)
plt.title('Concussion-related incidents:', fontsize=20)
plt.xlabel('')
plt.ylabel('')
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

plt.subplot(1, 2, 2)
_ = sns.countplot(x='Friendly_Fire', data=video_review)
plt.title('Incidents from same team or opposing:', fontsize=20)
plt.xlabel('')
plt.ylabel('')
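# Relabel the three bars; this assumes they are drawn in the order the labels are listed here
plt.xticks(ticks=[0, 1, 2], labels=['Opposing Team', 'Unclear', 'Same Team'], fontsize=15)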
plt.yticks(fontsize=15)

plt.tight_layout()
plt.show()

Most concussions occur while tackling a player from the opposing team. New rules for tackling players might help lower the incident rate.
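
The two count plots show each column separately; a cross-tabulation would show how the activities and the same-team flag combine. A minimal sketch on made-up rows (the column names are from the cells above, the values are assumptions):

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Player_Activity_Derived': ['Tackling', 'Tackling', 'Blocked', 'Tackled', 'Tackling'],
    'Friendly_Fire': ['No', 'No', 'Yes', 'No', 'Unclear'],
})

# Counts for every combination of activity and friendly-fire flag.
table = pd.crosstab(toy['Player_Activity_Derived'], toy['Friendly_Fire'])
print(table)

# Share of incidents per activity, as a fraction of all rows.
share = toy['Player_Activity_Derived'].value_counts(normalize=True)
print(share)
```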