# Machine Learning Predictions for the 2019-2020 All-NBA Team (continuation of 2018-2019 predictor)

## Introduction
Every NBA season a panel of sportswriters and broadcasters throughout the United States and Canada vote on the All-NBA teams. The All-NBA teams acknowledge the best players in the NBA by position. There are 3 ranked teams (1st, 2nd, 3rd) each consisting of 2 guards, 2 forwards, and 1 center, combining for a total of 15 All-NBA players. Not only is it a huge honor to be selected to an All-NBA team, but it is also important for contract negotiations as players can only be paid certain amounts if they achieve All-NBA accolades for a certain number of years.

Due to COVID-19, the 2019-20 season was suspended in March before resuming in late July. The uncertain future of the NBA season at the time led the league to vote for the All-NBA teams at the time the NBA was suspended. Due to the restart, the All-NBA team won't be released until after the season ends, which will subsequently be in September instead of June. 

**The goal of this notebook is to build a neural network to predict what the 2020 All-NBA team will be. Statistics of all NBA players over the last 20 years are  scraped alongside All-NBA team info. A neural net is then trained on this data. Stats from the NBA season up until COVID-19 (approx. 60-63 games depending on the team) will be scaled to a full 82 game season and used as test data for the neural network.**

The result... a prediction of what the 1st, 2nd, and 3rd All-NBA teams will look like when released at the end of the season.

## TLDR See link for full NBA ML project presentation https://docs.google.com/presentation/d/1GAcQrv--O6522p817_GfwxO8wyjRE0g2agpECrs6I0A/edit?usp=sharing 

Collaborators: Ari Hirsch (Columbia University), Jordan Ramos (Columbia University, github: jordanpramos)

**The above presentation reports the most accurate and relevant version of results. If you run the notebook right now the web scraper will include data from the NBA bubble, which the All-NBA team voting did not take into account**

### To Note:

**You must use pip to install basketball_reference_web_scraper otherwise none of this will work. Install it using the following command: pip install basketball_reference_web_scraper** 

Note: I originally had an issue installing it which was solved by running the command like this:
pip install basketball_reference_web_scraper --ignore-installed

**Please run each cell in order!! The first cell takes a while to run since it is retrieving a lot of data. Just wait for it to finish.**

## Table of Contents

[Data Acquisition and Structuring](#data_retrieval)
<a href='#data_retrieval'></a>

[Neural Network Training](#neural_network)
<a href='#neural_network'></a>

[Performance of Model](#results)
<a href='#results'></a>

## Data Acquisition and Structuring
<a id='data_retrieval'></a>

I added in some calculated stats like field goal percentage and points. Also, if a player was moved midseason, I combined their totals from each team.

In [1]:
from basketball_reference_web_scraper import client
import collections
import copy
import pandas as pd

statsToAdd = ['games_played', 'games_started', 'minutes_played', 'made_field_goals', 'attempted_field_goals', 'made_three_point_field_goals', 'attempted_three_point_field_goals', 'made_free_throws', 'attempted_free_throws', 'offensive_rebounds', 'defensive_rebounds', 'assists', 'steals', 'blocks', 'turnovers', 'personal_fouls']
statDict = {}
for year in range(2001, 2021):
    yearPlayers = client.players_season_totals(season_end_year=year)
    playerIdList = [player['slug'] for player in yearPlayers]
    tradedPlayersIdList = [playerId for playerId, count in collections.Counter(playerIdList).items() if count > 1]
    for tradedPlayerId in tradedPlayersIdList:
        playerStatsList = []
        toDelete = []
        totalPlayerStats = {'slug': '', 'name': '', 'positions': [], 'age': 0, 'team': 0, 'games_played': 0, 'games_started': 0, 'minutes_played': 0, 'made_field_goals': 0, 'attempted_field_goals': 0, 'made_three_point_field_goals': 0, 'attempted_three_point_field_goals': 0, 'made_free_throws': 0, 'attempted_free_throws': 0, 'offensive_rebounds': 0, 'defensive_rebounds': 0, 'assists': 0, 'steals': 0, 'blocks': 0, 'turnovers': 0, 'personal_fouls': 0}
        for idx, player in enumerate(yearPlayers):
            if tradedPlayerId == player['slug']:
                playerStatsList.append(player)
                toDelete.append(idx)
        for i in toDelete:
            yearPlayers.pop(toDelete[0])
        for playerTeamStats in playerStatsList:
            if totalPlayerStats['slug'] == '':
                totalPlayerStats['slug'] = playerTeamStats['slug']
            if totalPlayerStats['name'] == '':
                totalPlayerStats['name'] = playerTeamStats['name']
            if totalPlayerStats['positions'] == []:
                totalPlayerStats['positions'] = playerTeamStats['positions']
            if totalPlayerStats['age'] == 0:
                totalPlayerStats['age'] = playerTeamStats['age']
            if totalPlayerStats['team'] == 0:
                totalPlayerStats['team'] = playerTeamStats['team']
            for statName in statsToAdd:
                totalPlayerStats[statName] = totalPlayerStats[statName] + playerTeamStats[statName]
        yearPlayers.append(totalPlayerStats)
        
        for player in yearPlayers:
            #center = 1, forward = 2, guard = 3
            positionstring = str(player['positions'])
            if 'CENTER' in positionstring: # 8, -1
                 player['positions'] = 1
            elif 'FORWARD' in positionstring:
                player['positions'] = 2
            elif 'GUARD' in positionstring:
                player['positions'] = 3
                
            if(player['attempted_field_goals'] != 0):
                player['effective_field_goal_percentage'] = (player['made_field_goals'] + (.5 * player['made_three_point_field_goals'])) / player['attempted_field_goals']
            else:
                player['effective_field_goal_percentage'] = 0
            player['total_points'] = (player['made_three_point_field_goals'] * 3) + ((player['made_field_goals'] - player['made_three_point_field_goals']) * 2) + (player['made_free_throws']) #not necessary?? 
            if(player['attempted_free_throws'] != 0):
                player['free_throw_percentage'] = player['made_free_throws'] / player['attempted_free_throws']
            else:
                player['free_throw_percentage'] = 0
    statDict[year] = yearPlayers

In [4]:
for year in range(2001, 2021):
    for game in client.season_schedule(season_end_year=year):
        for player in statDict[year]:
            if "team_games_played" in player and player["team_games_played"] >= 82:
                break
            if player["team"] == game["home_team"] or player["team"] == game["away_team"]:
                if "team_games_played" in player:
                    player["team_games_played"] += 1
                else:
                    player["team_games_played"] = 1
            if game["home_team_score"] > game["away_team_score"]:
                if player["team"] == game["home_team"]:
                    if "wins" in player:
                        player["wins"] += 1
                    else:
                        player["wins"] = 1
            else:
                if player["team"] == game["away_team"]:
                    if "wins" in player:
                        player["wins"] += 1
                    else:
                        player["wins"] = 1

A dictionary containing all player stats from 2001 is returned and the most recent season (2020) is removed. This will serve as our training data. Our testing data will be the dictionary of players and their stats for just the 2020 season.

Function to scale stats to an 82 game season.

In [5]:
currentStatDict = copy.deepcopy(statDict[2020])

totalgames = 82
for player in currentStatDict:
    rescalefact = totalgames/player['team_games_played']
    
    #print(rescalefact)
    for key, value in player.items():
        if key != 'age' and key != 'positions' and isinstance(value, int) == True:                
            value = round(value * rescalefact)
            player[key] = value

statDict.pop(2020) 
;

''

### Using beautiful soup to get a nice clean list of the all-nba players since 2000.

In [6]:
from bs4 import BeautifulSoup
import urllib.request
import collections
import re
import bs4
import lxml

url = 'https://www.basketball-reference.com/awards/all_league.html'

soup = BeautifulSoup()

textList = []
allNbaTeamDict = {}
with urllib.request.urlopen(url) as ef:
    soup = BeautifulSoup(ef)
    textList = soup.find('table').get_text().splitlines()[15:]

    for line in textList:
        year = line[:7]
        formattedYear = year[:2] + year[5:]
        if formattedYear == '1900':
            formattedYear = '2001'
        if year == '1999-00':
            break
        if formattedYear not in allNbaTeamDict:
            allNbaTeamDict[formattedYear] = []
        wordList = line[13:].split()
        if len(wordList) == 0:
            continue
        playerTuple = (wordList[2][0],"%s %s" % (wordList[0], wordList[1]))
        allNbaTeamDict[formattedYear].append(playerTuple)
        playerTuple = (wordList[4][0],"%s %s" % (wordList[2][1:], wordList[3]))
        allNbaTeamDict[formattedYear].append(playerTuple)
        playerTuple = (wordList[6][0],"%s %s" % (wordList[4][1:], wordList[5]))
        allNbaTeamDict[formattedYear].append(playerTuple)
        playerTuple = (wordList[8][0],"%s %s" % (wordList[6][1:], wordList[7]))
        allNbaTeamDict[formattedYear].append(playerTuple)
        playerTuple = (wordList[10][0],"%s %s" % (wordList[8][1:], wordList[9]))
        allNbaTeamDict[formattedYear].append(playerTuple)
    allNbaTeamDict.pop('')


Adding the all_nba_type back into our stat dictionary:

In [8]:
relevantCenterData = []
relevantForwardData = []
relevantGuardData = []

for year in allNbaTeamDict:
    for position, playerName in allNbaTeamDict[str(year)]:
        if position == 'C':
            for player in statDict[int(year)]:
                if player['name'] == playerName:
                    relevantCenterData.append(player)
        elif position == 'F':
            for player in statDict[int(year)]:
                if player['name'] == playerName:
                    relevantForwardData.append(player)
        elif position == 'G':
            for player in statDict[int(year)]:
                if player['name'] == playerName:
                    relevantGuardData.append(player)


#### Create 2 dataframes: one for historical data (2000-2019) and one for the current season to be predicted (2020). 

In [11]:
# 1 = All-NBA center, 2 = All-NBA forward, 3 = All-NBA guard, 0 = regular player
for playerSeason in relevantCenterData:
    for year in statDict:
        for player in statDict[int(year)]:
            if player == playerSeason:
                player['all_nba_type'] = 1 #if player in statdict is in allNbaTeamDict, update all_nba_type value
for playerSeason in relevantForwardData:
    for year in statDict:
        for player in statDict[int(year)]:
            if player == playerSeason:
                player['all_nba_type'] = 2
for playerSeason in relevantGuardData:
    for year in statDict:
        for player in statDict[int(year)]:
            if player == playerSeason:
                player['all_nba_type'] = 3
flattenedStats = []
for year in statDict:
        for player in statDict[int(year)]:
            if 'all_nba_type' not in player: #iterate through all players in stat dict, if they have no allnbatype make 0 
                player['all_nba_type'] = 0
            flattenedStats.append(player) #add all players to flattenedstats

historicalDf = pd.DataFrame.from_dict(flattenedStats) #df of all players with All NBA team type
currentDf = pd.DataFrame.from_dict(currentStatDict) #df of current year to be looked at 
historicalDf.describe() 

Unnamed: 0,positions,age,games_played,games_started,minutes_played,made_field_goals,attempted_field_goals,made_three_point_field_goals,attempted_three_point_field_goals,made_free_throws,...,blocks,turnovers,personal_fouls,points,effective_field_goal_percentage,total_points,free_throw_percentage,team_games_played,wins,all_nba_type
count,8874.0,8874.0,8874.0,8874.0,8874.0,8874.0,8874.0,8874.0,8874.0,8874.0,...,8874.0,8874.0,8874.0,7806.0,8874.0,8874.0,8874.0,8874.0,8874.0,8874.0
mean,2.194388,26.638833,53.266734,25.878972,1251.586996,192.703403,425.635903,36.345842,102.00462,94.231801,...,25.366464,71.631846,108.820261,532.802075,0.473686,515.984449,0.700573,80.338517,39.922696,0.050034
std,0.755586,4.322034,25.006668,29.1426,903.247901,173.202731,372.633601,48.948826,129.303639,106.098018,...,33.870831,62.680901,73.057498,483.166013,0.09967,473.007812,0.194316,2.635981,12.392311,0.34331
min,1.0,18.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,66.0,7.0,0.0
25%,2.0,23.0,33.0,1.0,412.0,47.25,112.0,0.0,2.0,18.0,...,4.0,20.0,44.0,130.0,0.442446,123.0,0.645161,81.0,31.0,0.0
50%,2.0,26.0,61.0,11.0,1183.5,151.0,339.5,12.0,39.5,59.0,...,13.0,57.0,107.0,421.0,0.483312,399.0,0.75,81.0,41.0,0.0
75%,3.0,30.0,75.0,52.0,1984.0,294.0,647.0,60.0,171.0,133.0,...,32.0,106.0,164.0,813.0,0.519927,783.0,0.81536,81.0,49.0,0.0
max,3.0,44.0,85.0,83.0,3485.0,978.0,2173.0,402.0,1028.0,756.0,...,307.0,464.0,344.0,2832.0,1.5,2832.0,1.0,82.0,72.0,3.0


In [12]:
currentDf.describe()

Unnamed: 0,positions,age,games_played,games_started,minutes_played,made_field_goals,attempted_field_goals,made_three_point_field_goals,attempted_three_point_field_goals,made_free_throws,...,steals,blocks,turnovers,personal_fouls,points,effective_field_goal_percentage,total_points,free_throw_percentage,team_games_played,wins
count,529.0,529.0,529.0,529.0,529.0,529.0,529.0,529.0,529.0,529.0,...,529.0,529.0,529.0,529.0,469.0,529.0,529.0,529.0,529.0,529.0
mean,2.26276,25.561437,47.005671,22.204159,1075.383743,181.482042,394.99811,54.124764,151.502836,79.162571,...,34.075614,21.680529,62.028355,92.400756,505.614072,0.50713,496.328922,0.708333,82.0,39.922495
std,0.749339,4.119487,25.199093,26.788027,802.789282,167.366546,358.713439,60.581741,159.383009,96.400443,...,29.879524,26.862127,60.409032,66.193258,476.186528,0.107781,467.831451,0.211754,0.0,12.233664
min,1.0,19.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,82.0,19.0
25%,2.0,22.0,24.0,0.0,315.0,38.0,83.0,4.0,17.0,11.0,...,9.0,4.0,15.0,34.0,98.0,0.475514,100.0,0.647619,82.0,28.0
50%,2.0,25.0,54.0,7.0,1020.0,137.0,303.0,34.0,101.0,46.0,...,28.0,12.0,48.0,92.0,381.0,0.518057,366.0,0.759036,82.0,38.0
75%,3.0,28.0,69.0,45.0,1786.0,288.0,621.0,86.0,244.0,106.0,...,53.0,29.0,91.0,146.0,803.0,0.557196,775.0,0.833333,82.0,50.0
max,3.0,43.0,90.0,87.0,2827.0,704.0,1552.0,309.0,864.0,709.0,...,147.0,201.0,354.0,308.0,2393.0,1.0,2393.0,1.0,82.0,61.0


Now I have all of the All-NBA players since year 1999-2000 season and their stats, along with whether they made an All-NBA team or not, and which team they made. Now to select relevant statistic:

In [13]:
#extract relevant stats of historical data
relevantStatsHistoricalDf = historicalDf[['wins', 'positions','free_throw_percentage', 'turnovers', 'games_played', 'games_started', 'minutes_played', 'made_field_goals','attempted_field_goals', 'made_three_point_field_goals', 'attempted_three_point_field_goals', 'made_free_throws', 'attempted_free_throws', 'assists', 'blocks', 'steals', 'total_points', 'offensive_rebounds', 'defensive_rebounds', 'effective_field_goal_percentage', 'all_nba_type']] 
relevantStatsHistoricalDf.head()
    

Unnamed: 0,wins,positions,free_throw_percentage,turnovers,games_played,games_started,minutes_played,made_field_goals,attempted_field_goals,made_three_point_field_goals,...,made_free_throws,attempted_free_throws,assists,blocks,steals,total_points,offensive_rebounds,defensive_rebounds,effective_field_goal_percentage,all_nba_type
0,23,3,0.758621,26,41,0,486,120,246,4,...,22,29,76,1,9,266,5,20,0.495935,0
1,40,3,0.583333,34,29,12,420,43,111,4,...,21,36,22,13,14,111,14,45,0.405405,0
2,22,2,0.834275,231,81,81,3241,604,1280,12,...,443,531,250,77,90,1663,175,560,0.476562,0
3,43,3,0.666667,25,26,0,227,18,56,4,...,12,18,36,0,16,52,0,25,0.357143,0
4,52,3,0.887755,204,82,82,3129,628,1309,202,...,348,392,374,20,124,1806,101,327,0.556914,0


In [15]:
#extract relevant stats of current data
relevantStatsCurrentDf = currentDf[['wins', 'positions','free_throw_percentage', 'turnovers', 'games_played', 'games_started', 'minutes_played', 'made_field_goals','attempted_field_goals', 'made_three_point_field_goals', 'attempted_three_point_field_goals', 'made_free_throws', 'attempted_free_throws', 'assists', 'blocks', 'steals', 'total_points', 'offensive_rebounds', 'defensive_rebounds', 'effective_field_goal_percentage']] 
relevantStatsCurrentDf.head()

Unnamed: 0,wins,positions,free_throw_percentage,turnovers,games_played,games_started,minutes_played,made_field_goals,attempted_field_goals,made_three_point_field_goals,attempted_three_point_field_goals,made_free_throws,attempted_free_throws,assists,blocks,steals,total_points,offensive_rebounds,defensive_rebounds,effective_field_goal_percentage
0,48,1,0.58209,94,63,63,1680,283,478,1,3,117,201,146,67,51,684,207,376,0.593096
1,52,2,0.691099,204,72,72,2417,440,790,2,14,264,382,368,93,82,1146,176,559,0.558228
2,37,1,0.827225,85,61,61,2026,452,916,70,181,182,221,149,100,42,1156,119,334,0.531526
3,52,2,0.0,1,2,0,13,1,2,0,0,0,0,0,0,0,2,2,1,0.5
4,34,3,0.675676,62,54,1,673,112,303,52,151,28,42,101,9,19,304,10,85,0.454887


## Neural Network Training
<a id='neural_network'></a>

Begin training the neural network to predict the All-NBA teams.

Split the data into train and test to evaluate performance of the neural net 

In [16]:
from sklearn.model_selection import train_test_split

y_col = 'all_nba_type' #to be removed from original Df and stored as targets
x_cols = list(relevantStatsHistoricalDf.columns.values)
x_cols.remove(y_col)

x = relevantStatsHistoricalDf[x_cols].values #all column values of df excluding allnbatype
y = relevantStatsHistoricalDf[y_col].values #corresponding targets

X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.2)


Normalize and preprocess the data!

In [17]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

Set up a gridsearch for the MLPClassifier neural network. Then build and train the actual neural net. 

In [19]:
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(max_iter=800)
parameter_space = {
    'hidden_layer_sizes': [(50,50,50)],
    'activation': ['tanh', 'relu'],
    'solver': [ 'sgd', 'adam'],
    'alpha': [0.0001],
    'learning_rate_init': [.001],
    'learning_rate': ['constant','adaptive'],
}

from sklearn.model_selection import GridSearchCV
clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=3)
clf.fit(X_train, Y_train)

print('Best parameters found:\n', clf.best_params_)

Best parameters found:
 {'activation': 'tanh', 'alpha': 0.0001, 'hidden_layer_sizes': (50, 50, 50), 'learning_rate': 'constant', 'learning_rate_init': 0.001, 'solver': 'adam'}


## Performance of Model
<a id='results'></a>

In [20]:
#prepare 2020 data
predictions = clf.predict(X_test)

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(Y_test,predictions))
print(classification_report(Y_test,predictions))

train_score = clf.score(X_train, Y_train)
test_score = clf.score(X_test, Y_test)
print("train_score: ", train_score)
print("test_score: ", test_score)


[[1730    3    3    1]
 [   4    3    1    0]
 [   3    0    8    2]
 [   7    0    0   10]]
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      1737
           1       0.50      0.38      0.43         8
           2       0.67      0.62      0.64        13
           3       0.77      0.59      0.67        17

   micro avg       0.99      0.99      0.99      1775
   macro avg       0.73      0.64      0.68      1775
weighted avg       0.99      0.99      0.99      1775

train_score:  0.9984504859839414
test_score:  0.9864788732394366


In [21]:
curr_cols = list(relevantStatsCurrentDf.columns.values)
curr_test = relevantStatsCurrentDf[x_cols].values
curr_test = scaler.transform(curr_test) #scale testing data
allnbateam = clf.predict(curr_test)


Raw predictions of All-NBA teams unordered (can be more or less than 15)

In [30]:
print("Predicted team")
for player in range(len(allnbateam)):
    if allnbateam[player] == 1:
        print("center: ", currentDf.iloc[player]['name'])
    elif allnbateam[player] == 2:
        print("forward: ", currentDf.iloc[player]['name'])
    elif allnbateam[player] == 3: 
        print("guard: ", currentDf.iloc[player]['name'])
        


Predicted team
forward:  Giannis Antetokounmpo
guard:  Bradley Beal
forward:  Jimmy Butler
forward:  Anthony Davis
guard:  Luka Dončić
center:  Rudy Gobert
guard:  James Harden
guard:  LeBron James
center:  Nikola Jokić
forward:  Kawhi Leonard
guard:  Damian Lillard
guard:  Kyle Lowry
center:  Domantas Sabonis
forward:  Pascal Siakam
forward:  Jayson Tatum
guard:  Russell Westbrook
guard:  Trae Young


### Results 
Since the neural network is retrained each time the notebook is run it is unlikely to get the exact same results each time. While the predicted players seem likely candidates for the All-NBA team the validity of this prediction is still unknown due the fact that the actual All-NBA team is unreleased. That being said the 2019 neural network was able to classify the team with 87% (13/15) accuracy after minor positional adjustments.  

An interesting aspect of the neural network is how it categorizes players positionally. Lebron James is a perfect example of a player who changed roles between 2019 and 2020 and through statistical patterns one can see that he changed from an All-NBA forward to an All-NBA guard https://github.com/aratzan/NBA-Machine-Learning/blob/master/2020-Neural-Net-All-NBA-Team-Predictor.ipynb. 

Lastly, the classification report above essentially shows the neural network is extremely good at predicting if a player will not be an All-NBA team player and still very good at predicting if they will be. 

Below is a script to take the top 3 centers, top 5 forwards, and top 5 guards to generate 3 five man All-NBA teams. 

In [31]:
import numpy as np

allprobs = clf.predict_proba(curr_test)
#add index as 5th column of matrix 
idx = range(0, allprobs.shape[0])
print(allprobs.shape)
idx = np.asarray(idx)
idx = np.reshape(idx, (529, 1))
allprobs = np.hstack((allprobs, idx))

allcenters = allprobs[allprobs[:,1].argsort()]
c = 1
while c < 4:
    c_index = int(allcenters[-c][4])
    print("center: ", currentDf.iloc[c_index]['name'])
    c+=1

allforwards = allprobs[allprobs[:,2].argsort()]
f = 1
while f < 7:
    f_index = int(allforwards[-f][4])
    print("forward: ", currentDf.iloc[f_index]['name'])
    f+=1

allguards = allprobs[allprobs[:,3].argsort()]
g = 1
while g < 7:
    g_index = int(allguards[-g][4])
    print("guard: ", currentDf.iloc[g_index]['name'])
    g+=1
    


(529, 4)
center:  Rudy Gobert
center:  Nikola Jokić
center:  Domantas Sabonis
forward:  Giannis Antetokounmpo
forward:  Kawhi Leonard
forward:  Jimmy Butler
forward:  Pascal Siakam
forward:  Jayson Tatum
forward:  Anthony Davis
guard:  LeBron James
guard:  Damian Lillard
guard:  James Harden
guard:  Kyle Lowry
guard:  Trae Young
guard:  Luka Dončić


See notebook on feature importance for further analysis: https://github.com/aratzan/NBA-Machine-Learning/blob/master/final_project_code_RandomForest_2020_FeatureImportance.ipynb

### References:

https://pypi.org/project/basketball-reference-web-scraper/

https://www.basketball-reference.com/

https://scikit-learn.org/stable/

http://ataspinar.com/2017/05/26/classification-with-scikit-learn/