## NBA Statistics Friday Project

##### This project explores whether statistical methods can identify the players who best fit a few NBA teams' offensive styles. It could later extend to finding diamond-in-the-rough players.

##### Teams for consideration: San Antonio Spurs, Houston Rockets, Golden State Warriors

In [4]:
# !pip install scikit-learn

In [5]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

### Dataset Selection

In [11]:
import pandas as pd

playergamedata = pd.read_csv('/Users/dereklee/ml-projects/nba_player_selection/201819_nbaplayergamedata.csv')

playergamedata.columns.values

array(['Rk', 'Player', 'Player ID', 'Age', 'Tm', 'G', 'GS', 'MP', 'FG',
       'FGA', 'FG%', '3P', '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT',
       'FTA', 'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV',
       'PF', 'PTS'], dtype=object)

In [7]:
playergamedata

Unnamed: 0,Rk,Player,Player ID,Age,Tm,G,GS,MP,FG,FGA,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,1,Álex Abrines,abrinal01,25,OKC,31,2,19.0,1.8,5.1,...,0.923,0.2,1.4,1.5,0.6,0.5,0.2,0.5,1.7,5.3
1,2,Quincy Acy,acyqu01,28,PHO,10,0,12.3,0.4,1.8,...,0.700,0.3,2.2,2.5,0.8,0.1,0.4,0.4,2.4,1.7
2,3,Jaylen Adams,adamsja01,22,ATL,34,1,12.6,1.1,3.2,...,0.778,0.3,1.4,1.8,1.9,0.4,0.1,0.8,1.3,3.2
3,4,Steven Adams,adamsst01,25,OKC,80,80,33.4,6.0,10.1,...,0.500,4.9,4.6,9.5,1.6,1.5,1.0,1.7,2.6,13.9
4,5,Bam Adebayo,adebaba01,21,MIA,82,28,23.3,3.4,5.9,...,0.735,2.0,5.3,7.3,2.2,0.9,0.8,1.5,2.5,8.9
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
703,528,Tyler Zeller,zellety01,29,MEM,4,1,20.5,4.0,7.0,...,0.778,2.3,2.3,4.5,0.8,0.3,0.8,1.0,4.0,11.5
704,529,Ante Žižić,zizican01,22,CLE,59,25,18.3,3.1,5.6,...,0.705,1.8,3.6,5.4,0.9,0.2,0.4,1.0,1.9,7.8
705,530,Ivica Zubac,zubaciv01,21,TOT,59,37,17.6,3.6,6.4,...,0.802,1.9,4.2,6.1,1.1,0.2,0.9,1.2,2.3,8.9
706,530,Ivica Zubac,zubaciv01,21,LAL,33,12,15.6,3.4,5.8,...,0.864,1.6,3.3,4.9,0.8,0.1,0.8,1.0,2.2,8.5
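Ivica Zubac shows up twice in the output above, once with `Tm` set to `TOT` and once with `LAL`. My understanding is that `TOT` is the combined line for a player who changed teams mid-season, so these rows can double count a player if left alone. Here is a minimal sketch of two ways to handle that, using a few rows copied from the table (only a subset of columns). The names `per_team` and `one_per_player` are just mine for illustration.

```python
import pandas as pd

# Sample rows copied from the output above (subset of columns).
sample = pd.DataFrame(
    {
        "Player": ["Steven Adams", "Bam Adebayo", "Ivica Zubac", "Ivica Zubac"],
        "Player ID": ["adamsst01", "adebaba01", "zubaciv01", "zubaciv01"],
        "Tm": ["OKC", "MIA", "TOT", "LAL"],
        "G": [80, 82, 59, 33],
        "PTS": [13.9, 8.9, 8.9, 8.5],
    }
)

# Team-level view: drop the combined "TOT" rows so every row is one player-team stint.
per_team = sample[sample["Tm"] != "TOT"]

# League-level view: one row per player, preferring the "TOT" row when it exists.
is_tot = sample["Tm"] == "TOT"
traded_ids = set(sample.loc[is_tot, "Player ID"])
one_per_player = sample[is_tot | ~sample["Player ID"].isin(traded_ids)]

print(per_team)
print(one_per_player)
```

The team-level view is probably the one to use when filtering down to the Spurs, Rockets, or Warriors, since a `TOT` row never matches a real team code.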


### Web scraping dataset #2 (Spurs season-by-season data)

In [16]:
import requests
import json
import pickle

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://www.basketball-reference.com/teams/SAS/stats_basic_totals.html"

html = urlopen(url)

# name the parser explicitly so results don't depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")


In [17]:
# gets column headers
soup.findAll('tr', limit=2)

[<tr>
 <th aria-label="If listed as single number, the year the season ended.★ - Indicates All-Star for league.Only on regular season tables." class="poptip sort_default_asc center" data-stat="season" data-tip="If listed as single number, the year the season ended.&lt;br&gt;★ - Indicates All-Star for league.&lt;br&gt;Only on regular season tables." scope="col">Season</th>
 <th aria-label="League" class="poptip sort_default_asc left" data-stat="lg_id" data-tip="League" scope="col">Lg</th>
 <th aria-label="Team" class="poptip sort_default_asc left" data-stat="team_id" data-tip="Team" scope="col">Tm</th>
 <th aria-label="Wins" class="poptip right" data-stat="wins" data-tip="Wins" scope="col">W</th>
 <th aria-label="Losses" class="poptip right" data-stat="losses" data-tip="Losses" scope="col">L</th>
 <th aria-label="Regular season finish (within division, if applicable)" class="poptip sort_default_asc right" data-stat="rank_team" data-tip="Regular season finish (within division, if applica

In [20]:
# extract text we need into a list
headers = [th.getText() for th in soup.findAll('tr', limit = 2)[0].findAll('th')]

In [21]:
# exclude first column from analysis
headers = headers[1:]
headers

['Lg',
 'Tm',
 'W',
 'L',
 'Finish',
 '\xa0',
 'Age',
 'Ht.',
 'Wt.',
 '\xa0',
 'G',
 'MP',
 'FG',
 'FGA',
 'FG%',
 '3P',
 '3PA',
 '3P%',
 '2P',
 '2PA',
 '2P%',
 'FT',
 'FTA',
 'FT%',
 'ORB',
 'DRB',
 'TRB',
 'AST',
 'STL',
 'BLK',
 'TOV',
 'PF',
 'PTS']

In [32]:
# use [1:] to exclude the first header row (2 to exclude 1st 2 rows)

rows = soup.findAll('tr')[1:]

In [34]:
# one list of cell text per table row
season_stats = [[td.getText() for td in row.findAll('td')]
                for row in rows]

In [35]:
spurs_season_data = pd.DataFrame(season_stats, columns = headers)
spurs_season_data.head(10)

Unnamed: 0,Lg,Tm,W,L,Finish,Unnamed: 6,Age,Ht.,Wt.,Unnamed: 10,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,NBA,SAS,24,31,4,,28.1,6-6,212,,...,0.805,501,1981,2482,1346,383,301,679,1044,6226
1,NBA,SAS,48,34,2,,28.8,6-6,218,,...,0.819,757,2910,3667,2013,501,386,992,1487,9156
2,NBA,SAS,47,35,3,,29.3,6-6,214,,...,0.772,849,2777,3626,1868,628,460,1078,1408,8424
3,NBA,SAS,61,21,1,,29.6,6-7,222,,...,0.797,821,2777,3598,1954,655,484,1101,1498,8637
4,NBA,SAS,67,15,1,,30.3,6-7,223,,...,0.803,770,2831,3601,2010,677,485,1071,1433,8490
5,NBA,SAS,55,27,3,,29.8,6-6,215,,...,0.78,806,2772,3578,2000,657,444,1146,1564,8461
6,NBA,SAS,62,20,1,,28.9,6-6,213,,...,0.785,762,2786,3548,2064,604,420,1180,1495,8639
7,NBA,SAS,58,24,1,,28.6,6-6,217,,...,0.791,666,2721,3387,2058,695,446,1206,1427,8448
8,NBA,SAS,50,16,1,,27.5,6-6,221,,...,0.748,683,2153,2836,1528,490,293,895,1143,6841
9,NBA,SAS,61,21,1,,28.8,6-6,217,,...,0.767,829,2603,3432,1836,602,372,1101,1556,8502
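Every value scraped this way is a string, and the two `'\xa0'` headers are blank spacer columns, so the table needs some cleanup before any math. Here is a minimal sketch of that cleanup on a small stand-in frame built from the first two rows above (subset of columns). The all-`None` row is a hypothetical stand-in for any row that has no `<td>` cells; I have not confirmed the page contains one.

```python
import pandas as pd

# Stand-in for the scraped table: all values are strings, and the spacer
# column is named '\xa0' (a non-breaking space), as in the headers list.
headers = ["Lg", "Tm", "W", "L", "Finish", "\xa0", "Age", "PTS"]
raw = pd.DataFrame(
    [
        ["NBA", "SAS", "24", "31", "4", "", "28.1", "6226"],
        ["NBA", "SAS", "48", "34", "2", "", "28.8", "9156"],
        [None] * len(headers),  # hypothetical row with no <td> cells
    ],
    columns=headers,
)

# Drop spacer columns whose header is only whitespace.
keep = [name.strip() != "" for name in raw.columns]
clean = raw.loc[:, keep]

# Drop rows that are entirely empty.
clean = clean.dropna(how="all").copy()

# Convert the stat columns from strings to numbers.
for col in ["W", "L", "Finish", "Age", "PTS"]:
    clean[col] = pd.to_numeric(clean[col], errors="coerce")

print(clean)
```

`errors="coerce"` turns anything unparseable into `NaN` instead of raising, which seems like the safer default for scraped text.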


##### The game data gives us scoring totals, shooting percentages, and a few defensive metrics (defensive rebounds, steals, blocks).

Glossary: https://www.basketball-reference.com/about/glossary.html#site_menu_link

Found a few articles about some offensive strategies:
https://www.goldenstateofmind.com/2018/3/13/17108248/nba-2018-golden-state-warriors-visualizing-offense-the-top-five-offenses-statistics-houston-rockets (Talks about top 5 offenses of 2017-18 season and how they're all different)

https://www.youtube.com/watch?v=bRxMdABA1qQ (YouTube video of Spurs offensive strategy)

Since my favorite team is the San Antonio Spurs, we can first try to understand their offensive strategy. Historically, they've looked for selfless players and hinged their offense on constant ball movement, which collapses the defense and gets the ball to the player with the best shot opportunity. It looks like we'll have to find another dataset with the number of passes per game and append it to this one. The Spurs offense especially relies on a Big (Center or Power Forward) who passes well, because a Big who can pass from the perimeter draws their defender out of the 'paint.'
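Until a passing dataset turns up, one rough stand-in is to use the assist columns already in the player data. This is my assumption, not something from the articles: assists only count passes that lead directly to a made basket, so they undercount passing. The sketch below computes assists per 36 minutes and assist-to-turnover ratio for three centers copied from the table above. The column names `AST_per_36` and `AST_TOV` are made up for illustration, and since the dataset has no position column, picking out Bigs would need another source.

```python
import pandas as pd

# Per-game values copied from the player table above (three centers).
players = pd.DataFrame(
    {
        "Player": ["Steven Adams", "Bam Adebayo", "Tyler Zeller"],
        "MP": [33.4, 23.3, 20.5],
        "AST": [1.6, 2.2, 0.8],
        "TOV": [1.7, 1.5, 1.0],
    }
)

# Normalize assists to 36 minutes so low-minute players are comparable.
players["AST_per_36"] = players["AST"] / players["MP"] * 36

# Assists per turnover as a rough measure of careful passing.
players["AST_TOV"] = players["AST"] / players["TOV"]

passers = players.sort_values("AST_per_36", ascending=False)
print(passers)
```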

Article about current Spurs offense: https://www.nba.com/article/2018/09/29/one-team-one-stat-san-antonio-spurs-shooting

Video on current Spurs offense: https://www.youtube.com/watch?v=QHhva5XY_9c

### Research:

It seems like the Spurs still stick to their core principles: 1) passing a lot and 2) spacing the offense well.

For the purposes of this project, I'll be focusing on offensive improvements.

The articles above identify some issues; I'll look for some of my own too.

Home/Road splits: it may be up to the Spurs to track how well their players sleep under the rigors of travel.

#### Research Question 1: What have been the best Spurs teams over the years from our dataset?

Explore the data to identify where the Spurs' offense lags compared to previous years.  
This will need team-level data over the years.
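One simple way to start on this question is to rank seasons by win percentage. Treating win percentage as the measure of "best" is my assumption; it says nothing about offense specifically. The sketch below uses the `W` and `L` values from the ten rows of `spurs_season_data` shown earlier. The season label was excluded from that table, so the rows are identified only by their position in the scraped output.

```python
import pandas as pd

# W and L copied from the first ten rows of spurs_season_data (as strings,
# which is how the scraper returns them).
records = pd.DataFrame(
    {
        "W": ["24", "48", "47", "61", "67", "55", "62", "58", "50", "61"],
        "L": ["31", "34", "35", "21", "15", "27", "20", "24", "16", "21"],
    }
).astype(int)

# Win percentage handles seasons with different numbers of games.
records["win_pct"] = records["W"] / (records["W"] + records["L"])

# Best seasons first; the index is the row position in the scraped table.
ranked = records.sort_values("win_pct", ascending=False)
print(ranked.head())
```

Using a percentage rather than raw wins matters here because not every row adds up to 82 games.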

## Data Analysis

In [10]:
playergamedata.columns.values

array(['Rk', 'Player', 'Player ID', 'Age', 'Tm', 'G', 'GS', 'MP', 'FG',
       'FGA', 'FG%', '3P', '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT',
       'FTA', 'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV',
       'PF', 'PTS'], dtype=object)

Based on the data analysis, it seems like the Spurs are 

In [None]:
# Position-by-position analysis based on offensive holes
