# How has the NBA changed since 1950?

In this workshop we'll explore NBA data going all the way back to 1950 and try to answer two questions: has the NBA changed since 1950, and does the college a player attended affect their success in the NBA?

First, we'll compare shooting percentages from 1950 and 2017. We'll structure the data we need, then visualize it to see what it tells us.

After looking at shooting percentages, we'll try to determine whether the college a player attended affects their performance in the NBA.

##### Resources:
- [Basketball Data Definitions](https://www.basketball-reference.com/about/glossary.html)

#### Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#### Import Data

In [None]:
nba = pd.read_csv("../../data/nba_1950/season_stats.csv")

#### Quick feel for the data

In [None]:
line_break = '='*60

# Look at head
print('{}\n{}\n{}\n'.format('HEAD', nba.head(5), line_break))

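# Look at tail
print('{}\n{}\n{}\n'.format('TAIL', nba.tail(5), line_break))

# Look at info (nba.info() prints its report directly and returns None,
# so call it on its own instead of formatting its return value)
print('INFORMATION')
nba.info()
print('{}\n'.format(line_break))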

# Look at describe
print('{}\n{}\n{}\n'.format('DESCRIBE', nba.describe(), line_break))

# Shooting Percentage

In [None]:
# Use pandas .value_counts() to look at the distribution of TS%
shoot_percentage = nba['TS%'].value_counts()
shoot_percentage.head()
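
For context on what the `TS%` column measures, here is a minimal sketch of the true shooting formula from the Basketball Reference glossary linked above: `PTS / (2 * (FGA + 0.44 * FTA))`. The rows are made up for illustration, and the column names `PTS`, `FGA` and `FTA` are assumed to match the ones in `season_stats.csv`.

```python
import pandas as pd

# Hypothetical season totals for three made-up players (not real data).
# Column names PTS, FGA and FTA are assumed to match season_stats.csv.
demo = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "PTS": [1000, 500, 200],
    "FGA": [800, 450, 150],
    "FTA": [250, 100, 100],
})

# True shooting percentage per the Basketball Reference glossary:
# TS% = PTS / (2 * (FGA + 0.44 * FTA))
demo["TS%"] = demo["PTS"] / (2 * (demo["FGA"] + 0.44 * demo["FTA"]))

print(demo[["Player", "TS%"]].round(3))
```

Because free throws are part of the denominator, `TS%` rewards players who score efficiently at the line as well as from the field.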

In [None]:
# Use pandas .hist() method to visualize distribution of TS%
nba['TS%'].hist(bins=20)

# Make it pretty
ax = plt.gca()
ax.set_title("True Shooting % Across all NBA Seasons")
ax.set_xlabel("True Shooting Percentage")
ax.set_ylabel("# of Players")

# "print" the plot
plt.show()

In [None]:
# Prep our data by grabbing the first and last seasons in the dataset (1950 & 2017)
first_year = nba['Year'].min()
last_year = nba['Year'].max()

# Print years to confirm
print('First year: {}\nLast year: {}'.format(first_year, last_year))

# Create new dataframes with records from specific years
nba_1950 = nba[nba['Year'] == first_year]
nba_2017 = nba[nba['Year'] == last_year]


In [None]:
# Histogram of Shooting Percentage in 1950
nba_1950['TS%'].hist(bins=20)

# Make it pretty
ax = plt.gca()
ax.set_title("True Shooting % in 1950")
ax.set_xlabel("True Shooting Percentage")
ax.set_ylabel("# of Players")

# "print" the plot
plt.show()

In [None]:
# Histogram of Shooting Percentage in 2017
nba_2017['TS%'].hist(bins=20)

# Make it pretty
ax = plt.gca()
ax.set_title("True Shooting % in 2017")
ax.set_xlabel("True Shooting Percentage")
ax.set_ylabel("# of Players")

# "print" the plot
plt.show()

### Using only the data from the Histograms for both years... 

### What has happened to the True Shooting Percentage between 1950 and 2017?

In [None]:
# Provide your short answer here
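
Once you have an answer from the histograms, a numeric summary can serve as a cross-check, especially since the two seasons may contain very different numbers of players. The sketch below uses made-up `TS%` values, not the real data; with the workshop data, the same `groupby` call could be applied to `nba`.

```python
import pandas as pd

# Made-up TS% values for two seasons (illustrative only, not real NBA data).
demo = pd.DataFrame({
    "Year": [1950, 1950, 1950, 2017, 2017, 2017, 2017],
    "TS%": [0.40, 0.42, 0.44, 0.50, 0.54, 0.56, 0.60],
})

# Keep only the first and last seasons, then summarize TS% per season.
first_year, last_year = demo["Year"].min(), demo["Year"].max()
subset = demo[demo["Year"].isin([first_year, last_year])]
summary = subset.groupby("Year")["TS%"].agg(["count", "median", "mean"])

print(summary)
```

The `count` column shows how many rows feed each summary, which helps when judging how much weight to give the comparison.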



# Does the college a player attended affect performance in the NBA?

#### Import player data

In [None]:
players = pd.read_csv('../../data/nba_1950/player_data.csv')

In [None]:
top_5_colleges = ['University of Kentucky',
                  'Duke University',
                  'University of Kansas',
                  'Syracuse University',
                  'University of California, Los Angeles']

In [None]:
# Let's take a look at the college column of players
players['college'].value_counts().head(10)

In [None]:
# We'll use Pandas .isin() method to check the college column against top_5_colleges
in_top_5_colleges = players['college'].isin(top_5_colleges)

In [None]:
# Index the players df by the "in_top_5_colleges" mask and keep the name column
players_in_top_5 = players[in_top_5_colleges]['name']
players_in_top_5.head(20)

In [None]:
# We'll use Pandas .isin() method to check players column in nba 
# against the players_in_top_5 series we just constructed
in_top_colleges = nba['Player'].isin(players_in_top_5)

In [None]:
# We will divide all of the rows of nba into 2 dataframes, 
# those players who attended the top colleges, and those who did not
top_college = nba[in_top_colleges]
bottom_college = nba[~in_top_colleges]
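
The two-step `.isin()` lookup above can also be written as a single merge that attaches each player's college to their season rows. The sketch below uses made-up rows and hypothetical names (`nba_demo`, `players_demo`, `top_colleges_demo`); it assumes the same join keys as above, `Player` and `name`. Like the lookup above, it matches on name only, so it cannot tell apart two players who share a name.

```python
import pandas as pd

# Made-up rows standing in for nba and players (illustrative only).
nba_demo = pd.DataFrame({
    "Player": ["Player A", "Player A", "Player B", "Player C"],
    "Year": [2016, 2017, 2017, 2017],
    "PER": [15.0, 17.0, 12.0, 20.0],
})
players_demo = pd.DataFrame({
    "name": ["Player A", "Player B"],
    "college": ["Duke University", "Some Other College"],
})
top_colleges_demo = ["Duke University"]

# Left merge keeps every season row; players with no match get NaN for college.
merged = nba_demo.merge(players_demo, how="left", left_on="Player", right_on="name")

# Flag rows whose college is in the list (NaN counts as not in the list).
merged["top_college"] = merged["college"].isin(top_colleges_demo)

print(merged[["Player", "Year", "college", "top_college"]])
```

If `name` is not unique in the player table, a merge like this duplicates season rows, so a check such as `players['name'].duplicated().any()` beforehand is worthwhile.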

In [None]:
# Plot the Players NOT FROM Top 5
bottom_college['PER'].hist(bins=50) 

# Make it pretty
ax = plt.gca()
ax.set_title("PER for players NOT FROM Top 5 Colleges")
ax.set_xlabel("Player Efficiency Rating")
ax.set_ylabel("# of Players")

# "print" the plot
plt.show()

In [None]:
# Plot the Players FROM Top 5
top_college['PER'].hist(bins=50) 

# Make it pretty
ax = plt.gca()
ax.set_title("PER for players FROM Top 5 Colleges")
ax.set_xlabel("Player Efficiency Rating")
ax.set_ylabel("# of Players")

# "print" the plot
plt.show()

### What conclusions can we draw about these 2 groups?

In [None]:
# Provide your short answer here
# What could we do better? ...
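
One possible improvement, sketched here with made-up numbers: the two groups are likely to differ a lot in size, so raw-count histograms are hard to compare by eye. Summarizing `PER` per group puts both on the same footing. The `group` labels and values below are hypothetical.

```python
import pandas as pd

# Made-up PER values and made-up group labels (illustrative only).
demo = pd.DataFrame({
    "PER": [10.0, 12.0, 14.0, 16.0, 11.0, 13.0, 15.0, 30.0, None],
    "group": ["top5"] * 4 + ["other"] * 5,
})

# Summarize PER per group; count excludes the missing value.
summary = demo.groupby("group")["PER"].agg(["count", "median", "mean"])

print(summary)
```

For a visual version of the same idea, passing `density=True` to `.hist()` normalizes each histogram so that the shapes can be compared regardless of group size.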