# Recommender System Application (Notebook 1)
## Course project for **SCC0284 - Sistemas de Recomendação** (Recommender Systems)
### A recommender system applied to the NBA Draft model.

## Members

* Arthur Santorum Lorenzetto - 12559465
* Gustavo Silva de Oliveira - 12567231

## Brief summary:

Our goal is to recommend the first 15 college players selected in the 2021 draft from among all college players who entered that year's draft. To do so, we use content-based recommendation, where our metadata is the set of all college players who were among the first 15 picks in the drafts from 2009 through 2020. (In other words, we look among the 2021 players for those most similar to the 2009-2020 players.)

In this notebook we only organize the data so that the recommendation can be carried out.
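
To make the idea concrete, the sketch below ranks candidates by their mean cosine similarity to a set of reference players, using binary feature vectors of the kind this notebook prepares. The arrays and the helper name `rank_by_similarity` are hypothetical illustration data, not taken from the project's datasets, and averaging over the reference players is just one possible aggregation, assumed here.

```python
import numpy as np

def rank_by_similarity(candidates, references):
    """Return candidate indices sorted by mean cosine similarity to the references."""
    candidates = np.asarray(candidates, dtype=float)
    references = np.asarray(references, dtype=float)
    # Normalize each row; all-zero rows keep norm 1.0 to avoid division by zero.
    c_norm = np.linalg.norm(candidates, axis=1, keepdims=True)
    r_norm = np.linalg.norm(references, axis=1, keepdims=True)
    c_unit = candidates / np.where(c_norm == 0, 1.0, c_norm)
    r_unit = references / np.where(r_norm == 0, 1.0, r_norm)
    sims = c_unit @ r_unit.T          # shape: (n_candidates, n_references)
    scores = sims.mean(axis=1)        # assumed aggregation: mean over references
    order = np.argsort(-scores, kind="stable")
    return order, scores

# Made-up binary feature vectors (rows are players, columns are indicator features).
references = [[1, 0, 0, 1], [1, 0, 1, 0]]
candidates = [[0, 1, 0, 0], [1, 0, 0, 1], [1, 0, 1, 1]]
order, scores = rank_by_similarity(candidates, references)
print(order)
```

In this toy case the third candidate shares features with both reference players and therefore ranks first.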

In [43]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [44]:
# splitting the data used to train our model (college players from 2021)
# from the data used as metadata (college players from 2009 through 2020)

df = pd.read_csv('./CollegeBasketballPlayers2009-2021.csv')
df_train = df.loc[(df.year == 2021)]
df = df.loc[(df.year!=2021)]

In [45]:
# choosing the variables used to compute our similarities (we chose to use only the more advanced
# statistics) (metadata)

df_advanced_stats = df[['player_name','GP', 'Min_per','Ortg', 'usg','eFG','TS_per', 'ORB_per', 'DRB_per', 'AST_per',
                       'TO_per', 'FT_per', 'twoP_per','TP_per','blk_per', 'stl_per']]

# group by player name and take the mean to avoid repeated names (names repeat in our dataframe when
# a player spent more than one year in college)

df_advanced_stats = df_advanced_stats.groupby(['player_name']).mean().reset_index()
# df already excludes 2021, so no further year filter is needed here
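
As a small illustration of the aggregation step, the sketch below averages the seasons of a repeated player into a single row. The toy frame is made up for the example; only the column names `player_name`, `GP` and `usg` follow the dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "player_name": ["Player A", "Player A", "Player B"],
    "GP": [30, 20, 25],
    "usg": [20.0, 24.0, 15.0],
})
# One row per player: each statistic becomes the mean over that player's seasons.
per_player = toy.groupby("player_name").mean().reset_index()
print(per_player)
```

Keying on `player_name` also means that two different players who share a name would be averaged together.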

In [46]:
# choosing the variables used to compute our similarities (we chose to use only the more advanced
# statistics) (training)

df_train_advanced_stats = df_train[['player_name','GP', 'Min_per','Ortg', 'usg','eFG','TS_per', 'ORB_per', 'DRB_per', 'AST_per',
                       'TO_per', 'FT_per', 'twoP_per','TP_per','blk_per', 'stl_per']]

df_train_advanced_stats = df_train_advanced_stats.groupby(['player_name']).mean().reset_index()
df_train_advanced_stats

Unnamed: 0,player_name,GP,Min_per,Ortg,usg,eFG,TS_per,ORB_per,DRB_per,AST_per,TO_per,FT_per,twoP_per,TP_per,blk_per,stl_per
0,A.J. Caldwell,24.0,80.2,103.1,11.4,52.7,51.94,1.6,14.9,13.7,17.2,0.333,0.366,0.400,1.3,2.1
1,A.J. Hoggard,26.0,30.6,78.3,17.9,32.7,36.83,1.3,14.6,26.8,26.5,0.600,0.351,0.167,1.5,1.5
2,A.J. Labriola,5.0,0.9,0.0,0.0,0.0,0.00,0.0,14.4,20.6,0.0,0.000,0.000,0.000,0.0,0.0
3,A.J. Lawson,17.0,73.3,91.4,26.0,47.8,49.64,5.3,14.9,33.9,27.7,0.561,0.475,0.324,0.4,1.4
4,A.J. McGinnis,28.0,35.2,104.1,18.5,51.6,51.95,0.9,8.1,5.7,12.4,0.571,0.559,0.336,0.3,1.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4934,Zion Young,30.0,46.2,98.0,20.5,45.8,49.72,2.8,14.2,5.3,13.5,0.830,0.324,0.353,0.4,1.1
4935,Zoar Nedd,4.0,0.8,0.0,0.0,0.0,0.00,0.0,41.1,0.0,0.0,0.000,0.000,0.000,0.0,0.0
4936,Zondrick Garrett,5.0,1.0,63.6,14.9,50.0,50.00,0.0,10.7,0.0,39.7,0.000,1.000,0.000,0.0,0.0
4937,Zool Kueth,18.0,51.0,114.7,14.4,55.9,56.67,5.8,9.2,4.8,12.6,0.700,0.558,0.373,4.3,0.5


In [47]:
# mapping our metadata by athlete name

map_players = {user: idx for idx, user in enumerate(df_advanced_stats.player_name.unique())}
df_advanced_stats['player_name_id'] = df_advanced_stats['player_name'].map(map_players)
df_advanced_stats

Unnamed: 0,player_name,GP,Min_per,Ortg,usg,eFG,TS_per,ORB_per,DRB_per,AST_per,TO_per,FT_per,twoP_per,TP_per,blk_per,stl_per,player_name_id
0,A'Torey Everett,23.000000,19.000000,88.000000,15.700000,40.500000,50.080000,1.600000,8.100,14.500000,30.600000,0.783000,0.300000,0.571000,0.600000,0.300000,0
1,A'Torri Shine,27.000000,79.300000,94.100000,23.100000,45.400000,49.985000,4.000000,10.100,8.200000,18.100000,0.759500,0.448000,0.306000,0.650000,1.200000,1
2,A'Uston Calhoun,30.000000,83.800000,99.900000,25.500000,47.100000,51.500000,7.800000,16.800,5.700000,15.500000,0.796000,0.461000,0.352000,1.500000,0.700000,2
3,A'uston Calhoun,21.333333,45.333333,99.466667,24.633333,47.833333,50.153333,5.700000,11.300,3.533333,10.366667,0.643667,0.319333,0.322333,1.633333,0.933333,3
4,A.C. Reid,23.750000,39.950000,69.025000,16.700000,34.475000,35.722500,0.625000,9.725,11.900000,16.825000,0.541250,0.285000,0.236750,0.950000,1.375000,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23878,Zvonko Buljan,29.500000,68.100000,95.200000,27.600000,51.400000,54.550000,8.900000,28.850,17.950000,24.800000,0.713000,0.537500,0.307500,1.150000,1.750000,23878
23879,Zygis Sestokas,23.000000,30.900000,132.700000,11.300000,58.800000,59.150000,2.700000,6.400,6.800000,6.700000,0.750000,0.600000,0.391000,1.100000,0.900000,23879
23880,Zylan Cheatham,33.333333,61.500000,104.766667,21.700000,53.700000,58.666667,9.066667,22.000,13.866667,23.533333,0.692333,0.546333,0.263000,3.266667,1.866667,23880
23881,Zyon Dobbs,16.000000,8.000000,66.300000,13.700000,31.600000,31.350000,2.200000,5.900,7.700000,17.400000,0.286000,0.300000,0.222000,1.200000,2.400000,23881
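
The output above lists `A'Uston Calhoun` and `A'uston Calhoun` as separate players because the mapping is case-sensitive. The sketch below shows, on a short list of names, how such a mapping behaves and how lower-casing the key first would merge spelling variants. Whether merging is desirable for this dataset is an assumption that would need checking, since distinct players can share a name.

```python
import pandas as pd

names = pd.Series(["A'Uston Calhoun", "A'uston Calhoun", "Zion Young"])

# Same construction as in the notebook: one id per distinct spelling.
ids_exact = {name: idx for idx, name in enumerate(names.unique())}

# Hypothetical variant: normalize case first so spelling variants share an id.
normalized = names.str.lower()
ids_normalized = {name: idx for idx, name in enumerate(normalized.unique())}

print(len(ids_exact), len(ids_normalized))
```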


In [48]:
# mapping our training data by athlete name

map_train_players = {user: idx for idx, user in enumerate(df_train_advanced_stats.player_name.unique())}
df_train_advanced_stats['player_name_id'] = df_train_advanced_stats['player_name'].map(map_train_players)
df_train_advanced_stats

Unnamed: 0,player_name,GP,Min_per,Ortg,usg,eFG,TS_per,ORB_per,DRB_per,AST_per,TO_per,FT_per,twoP_per,TP_per,blk_per,stl_per,player_name_id
0,A.J. Caldwell,24.0,80.2,103.1,11.4,52.7,51.94,1.6,14.9,13.7,17.2,0.333,0.366,0.400,1.3,2.1,0
1,A.J. Hoggard,26.0,30.6,78.3,17.9,32.7,36.83,1.3,14.6,26.8,26.5,0.600,0.351,0.167,1.5,1.5,1
2,A.J. Labriola,5.0,0.9,0.0,0.0,0.0,0.00,0.0,14.4,20.6,0.0,0.000,0.000,0.000,0.0,0.0,2
3,A.J. Lawson,17.0,73.3,91.4,26.0,47.8,49.64,5.3,14.9,33.9,27.7,0.561,0.475,0.324,0.4,1.4,3
4,A.J. McGinnis,28.0,35.2,104.1,18.5,51.6,51.95,0.9,8.1,5.7,12.4,0.571,0.559,0.336,0.3,1.6,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4934,Zion Young,30.0,46.2,98.0,20.5,45.8,49.72,2.8,14.2,5.3,13.5,0.830,0.324,0.353,0.4,1.1,4934
4935,Zoar Nedd,4.0,0.8,0.0,0.0,0.0,0.00,0.0,41.1,0.0,0.0,0.000,0.000,0.000,0.0,0.0,4935
4936,Zondrick Garrett,5.0,1.0,63.6,14.9,50.0,50.00,0.0,10.7,0.0,39.7,0.000,1.000,0.000,0.0,0.0,4936
4937,Zool Kueth,18.0,51.0,114.7,14.4,55.9,56.67,5.8,9.2,4.8,12.6,0.700,0.558,0.373,4.3,0.5,4937


In [49]:
# this dataframe contains every player selected in the drafts from 2009 through 2021

drafted_players = pd.read_csv('DraftedPlayers2009-2021.csv')
drafted_players = drafted_players.drop([0])
drafted_players = drafted_players.loc[(drafted_players.YEAR!=2021)]
drafted_players = drafted_players[['PLAYER','OVERALL']]

In [50]:
# in this step we keep only the players selected in the TOP 15 of the drafts from 2009 through 2020
# (2021 was filtered out above), since they are the ones we will compare our training data against.

drafted_players['player_name_id'] = drafted_players['PLAYER'].map(map_players)
drafted_players["OVERALL"] = pd.to_numeric(drafted_players["OVERALL"])
lista = list(range(1, 16))

# boolean filter instead of iterrows() + DataFrame.append (append was removed in pandas 2.0)
lottery_drafted_players = drafted_players.loc[drafted_players['OVERALL'].isin(lista),
                                              ['PLAYER', 'OVERALL', 'player_name_id']]
lottery_drafted_players = lottery_drafted_players.rename(columns={'PLAYER': 'player_name', 'OVERALL': 'Overall'})
lottery_drafted_players = lottery_drafted_players.reset_index(drop=True)

# NaN values correspond to players who were drafted but did not go through COLLEGE, so they are
# not of interest to us.

lottery_drafted_players = lottery_drafted_players.dropna(axis='rows')
lottery_drafted_players

Unnamed: 0,player_name,Overall,player_name_id
0,Anthony Edwards,1,1251.0
1,James Wiseman,2,10379.0
3,Patrick Williams,4,18302.0
4,Isaac Okoro,5,9195.0
5,Onyeka Okongwu,6,18025.0
...,...,...,...
175,Terrence Williams,11,21588.0
176,Gerald Henderson,12,8512.0
177,Tyler Hansbrough,13,22707.0
178,Earl Clark,14,7371.0
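
The `NaN` handling can be seen on a toy example: names missing from the mapping come back as `NaN` from `Series.map`, which also turns the id column into floats (hence values such as `1251.0` above). The names and ids below are invented.

```python
import pandas as pd

toy_map = {"Player A": 0, "Player B": 1}
picks = pd.DataFrame({"PLAYER": ["Player A", "Player X", "Player B"],
                      "OVERALL": [1, 2, 3]})

# "Player X" is not a key of toy_map, so its id becomes NaN.
picks["player_name_id"] = picks["PLAYER"].map(toy_map)

# Dropping rows with NaN keeps only players found in the mapping.
matched = picks.dropna(axis="rows")
print(matched)
```

The original row labels survive `dropna`, which is why the index in the output above skips some values.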


In [51]:
# function that returns the quantiles (20th, 40th, 60th and 80th percentiles) of a range of values; needed so that
# we can binarize our data in order to compute cosine similarity later

def faixa_de_valores(series):
    # use the argument itself rather than relying on the global `lista`
    Q1 = np.quantile(series, .20)
    Q2 = np.quantile(series, .40)
    Q3 = np.quantile(series, .60)
    Q4 = np.quantile(series, .80)

    return Q1, Q2, Q3, Q4
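
For reference, `np.quantile` interpolates linearly between sorted values by default, so the cut points need not be values that occur in the data. A quick check on the integers 1 to 15 gives cut points of approximately 3.8, 6.6, 9.4 and 12.2:

```python
import numpy as np

values = list(range(1, 16))
# Default linear interpolation: the q-quantile sits at position q * (n - 1) in the sorted data.
cuts = [float(np.quantile(values, q)) for q in (.20, .40, .60, .80)]
print(cuts)
```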

In [52]:
# from here on we binarize every selected variable into 5 value ranges based on the computed quantiles;
# our idea is to compute the similarity between the players listed for the 2021 draft and the players who were
# top 15 in the 2009 to 2020 drafts based on these values.

lista1 = df_train_advanced_stats['GP'].tolist()
lista2 = df_advanced_stats['GP'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

10.0 20.0 26.0 30.0


In [53]:
df_advanced_stats['GP (<10)'] = np.where(df_advanced_stats['GP'] < Q1, 1, 0)
df_advanced_stats['GP (>=10 e < 20)'] = np.where((df_advanced_stats['GP'] >= Q1) & (df_advanced_stats['GP'] < Q2), 1, 0)
df_advanced_stats['GP (>=20 e < 26)'] = np.where((df_advanced_stats['GP'] >= Q2) & (df_advanced_stats['GP'] < Q3), 1, 0)
df_advanced_stats['GP (>=26 e < 30)'] = np.where((df_advanced_stats['GP'] >= Q3) & (df_advanced_stats['GP'] < Q4), 1, 0)
df_advanced_stats['GP (>=30)'] = np.where((df_advanced_stats['GP'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['GP'])

df_train_advanced_stats['GP (<10)'] = np.where(df_train_advanced_stats['GP'] < Q1, 1, 0)
df_train_advanced_stats['GP (>=10 e < 20)'] = np.where((df_train_advanced_stats['GP'] >= Q1) & (df_train_advanced_stats['GP'] < Q2), 1, 0)
df_train_advanced_stats['GP (>=20 e < 26)'] = np.where((df_train_advanced_stats['GP'] >= Q2) & (df_train_advanced_stats['GP'] < Q3), 1, 0)
df_train_advanced_stats['GP (>=26 e < 30)'] = np.where((df_train_advanced_stats['GP'] >= Q3) & (df_train_advanced_stats['GP'] < Q4), 1, 0)
df_train_advanced_stats['GP (>=30)'] = np.where((df_train_advanced_stats['GP'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['GP'])
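
The same five-bin pattern repeats for every statistic that follows. As a sketch of how it could be factored out, the hypothetical helper `binarize_column` (not part of the original notebook) builds the indicator columns from the cut points and drops the source column. The label rounding via `:g` is an assumption and will not reproduce the hand-written labels for every statistic.

```python
import numpy as np
import pandas as pd

def binarize_column(frame, col, cuts):
    """Replace `col` with one 0/1 indicator column per interval defined by `cuts`."""
    out = frame.copy()
    edges = [-np.inf, *cuts, np.inf]
    for lo, hi in zip(edges[:-1], edges[1:]):
        if np.isinf(lo):
            label = f"{col} (<{hi:g})"
        elif np.isinf(hi):
            label = f"{col} (>={lo:g})"
        else:
            label = f"{col} (>={lo:g} e < {hi:g})"
        # Left-closed, right-open intervals, matching the np.where conditions above.
        out[label] = ((out[col] >= lo) & (out[col] < hi)).astype(int)
    return out.drop(columns=[col])

toy = pd.DataFrame({"GP": [5, 10, 25, 30, 33]})
binned = binarize_column(toy, "GP", [10.0, 20.0, 26.0, 30.0])
print(binned)
```

Applying one helper to both frames with the same cut points would also keep their column labels identical by construction.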

In [54]:
lista1 = df_train_advanced_stats['Min_per'].tolist()
lista2 = df_advanced_stats['Min_per'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

4.3 22.040000000000024 41.53333333333334 60.3


In [55]:
df_advanced_stats['Min_per (<4.3)'] = np.where(df_advanced_stats['Min_per'] < Q1, 1, 0)
df_advanced_stats['Min_per (>=4.3 e < 22.04)'] = np.where((df_advanced_stats['Min_per'] >= Q1) & (df_advanced_stats['Min_per'] < Q2), 1, 0)
df_advanced_stats['Min_per (>=22.04 e < 41.53)'] = np.where((df_advanced_stats['Min_per'] >= Q2) & (df_advanced_stats['Min_per'] < Q3), 1, 0)
df_advanced_stats['Min_per (>=41.53 e < 60.3)'] = np.where((df_advanced_stats['Min_per'] >= Q3) & (df_advanced_stats['Min_per'] < Q4), 1, 0)
df_advanced_stats['Min_per (>=60.3)'] = np.where((df_advanced_stats['Min_per'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['Min_per'])

df_train_advanced_stats['Min_per (<4.3)'] = np.where(df_train_advanced_stats['Min_per'] < Q1, 1, 0)
df_train_advanced_stats['Min_per (>=4.3 e < 22.04)'] = np.where((df_train_advanced_stats['Min_per'] >= Q1) & (df_train_advanced_stats['Min_per'] < Q2), 1, 0)
df_train_advanced_stats['Min_per (>=22.04 e < 41.53)'] = np.where((df_train_advanced_stats['Min_per'] >= Q2) & (df_train_advanced_stats['Min_per'] < Q3), 1, 0)
df_train_advanced_stats['Min_per (>=41.53 e < 60.3)'] = np.where((df_train_advanced_stats['Min_per'] >= Q3) & (df_train_advanced_stats['Min_per'] < Q4), 1, 0)
df_train_advanced_stats['Min_per (>=60.3)'] = np.where((df_train_advanced_stats['Min_per'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['Min_per'])

In [56]:
lista1 = df_train_advanced_stats['Ortg'].tolist()
lista2 = df_advanced_stats['Ortg'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

75.1 90.75 99.1 106.9


In [57]:
df_advanced_stats['Ortg (<75.1)'] = np.where(df_advanced_stats['Ortg'] < Q1, 1, 0)
df_advanced_stats['Ortg (>=75.1 e < 90.75)'] = np.where((df_advanced_stats['Ortg'] >= Q1) & (df_advanced_stats['Ortg'] < Q2), 1, 0)
df_advanced_stats['Ortg (>=90.75 e < 99.1)'] = np.where((df_advanced_stats['Ortg'] >= Q2) & (df_advanced_stats['Ortg'] < Q3), 1, 0)
df_advanced_stats['Ortg (>=99.1 e < 106.9)'] = np.where((df_advanced_stats['Ortg'] >= Q3) & (df_advanced_stats['Ortg'] < Q4), 1, 0)
df_advanced_stats['Ortg (>=106.9)'] = np.where((df_advanced_stats['Ortg'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['Ortg'])

df_train_advanced_stats['Ortg (<75.1)'] = np.where(df_train_advanced_stats['Ortg'] < Q1, 1, 0)
df_train_advanced_stats['Ortg (>=75.1 e < 90.75)'] = np.where((df_train_advanced_stats['Ortg'] >= Q1) & (df_train_advanced_stats['Ortg'] < Q2), 1, 0)
df_train_advanced_stats['Ortg (>=90.75 e < 99.1)'] = np.where((df_train_advanced_stats['Ortg'] >= Q2) & (df_train_advanced_stats['Ortg'] < Q3), 1, 0)
df_train_advanced_stats['Ortg (>=99.1 e < 106.9)'] = np.where((df_train_advanced_stats['Ortg'] >= Q3) & (df_train_advanced_stats['Ortg'] < Q4), 1, 0)
df_train_advanced_stats['Ortg (>=106.9)'] = np.where((df_train_advanced_stats['Ortg'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['Ortg'])

In [58]:
lista1 = df_train_advanced_stats['usg'].tolist()
lista2 = df_advanced_stats['usg'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

13.666666666666666 16.7 19.3 22.3


In [59]:
df_advanced_stats['usg (<13.67)'] = np.where(df_advanced_stats['usg'] < Q1, 1, 0)
df_advanced_stats['usg (>=13.67 e < 16.7)'] = np.where((df_advanced_stats['usg'] >= Q1) & (df_advanced_stats['usg'] < Q2), 1, 0)
df_advanced_stats['usg (>=16.7 e < 19.3)'] = np.where((df_advanced_stats['usg'] >= Q2) & (df_advanced_stats['usg'] < Q3), 1, 0)
df_advanced_stats['usg (>=19.3 e < 22.3)'] = np.where((df_advanced_stats['usg'] >= Q3) & (df_advanced_stats['usg'] < Q4), 1, 0)
df_advanced_stats['usg (>=22.3)'] = np.where((df_advanced_stats['usg'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['usg'])

df_train_advanced_stats['usg (<13.67)'] = np.where(df_train_advanced_stats['usg'] < Q1, 1, 0)
df_train_advanced_stats['usg (>=13.67 e < 16.7)'] = np.where((df_train_advanced_stats['usg'] >= Q1) & (df_train_advanced_stats['usg'] < Q2), 1, 0)
df_train_advanced_stats['usg (>=16.7 e < 19.3)'] = np.where((df_train_advanced_stats['usg'] >= Q2) & (df_train_advanced_stats['usg'] < Q3), 1, 0)
df_train_advanced_stats['usg (>=19.3 e < 22.3)'] = np.where((df_train_advanced_stats['usg'] >= Q3) & (df_train_advanced_stats['usg'] < Q4), 1, 0)
df_train_advanced_stats['usg (>=22.3)'] = np.where((df_train_advanced_stats['usg'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['usg'])

In [60]:
lista1 = df_train_advanced_stats['eFG'].tolist()
lista2 = df_advanced_stats['eFG'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

34.475 44.4 49.03333333333333 53.45


In [61]:
df_advanced_stats['eFG (<34.475)'] = np.where(df_advanced_stats['eFG'] < Q1, 1, 0)
df_advanced_stats['eFG (>=34.475 e < 44.4)'] = np.where((df_advanced_stats['eFG'] >= Q1) & (df_advanced_stats['eFG'] < Q2), 1, 0)
df_advanced_stats['eFG (>=44.4 e < 49.03)'] = np.where((df_advanced_stats['eFG'] >= Q2) & (df_advanced_stats['eFG'] < Q3), 1, 0)
df_advanced_stats['eFG (>=49.03 e < 53.45)'] = np.where((df_advanced_stats['eFG'] >= Q3) & (df_advanced_stats['eFG'] < Q4), 1, 0)
df_advanced_stats['eFG (>=53.45)'] = np.where((df_advanced_stats['eFG'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['eFG'])

df_train_advanced_stats['eFG (<34.475)'] = np.where(df_train_advanced_stats['eFG'] < Q1, 1, 0)
df_train_advanced_stats['eFG (>=34.475 e < 44.4)'] = np.where((df_train_advanced_stats['eFG'] >= Q1) & (df_train_advanced_stats['eFG'] < Q2), 1, 0)
df_train_advanced_stats['eFG (>=44.4 e < 49.03)'] = np.where((df_train_advanced_stats['eFG'] >= Q2) & (df_train_advanced_stats['eFG'] < Q3), 1, 0)
df_train_advanced_stats['eFG (>=49.03 e < 53.45)'] = np.where((df_train_advanced_stats['eFG'] >= Q3) & (df_train_advanced_stats['eFG'] < Q4), 1, 0)
df_train_advanced_stats['eFG (>=53.45)'] = np.where((df_train_advanced_stats['eFG'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['eFG'])

In [62]:
lista1 = df_train_advanced_stats['TS_per'].tolist()
lista2 = df_advanced_stats['TS_per'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

38.67400000000001 47.7625 51.92 56.164000000000016


In [63]:
df_advanced_stats['TS_per (<38.67)'] = np.where(df_advanced_stats['TS_per'] < Q1, 1, 0)
df_advanced_stats['TS_per (>=38.67 e < 47.76)'] = np.where((df_advanced_stats['TS_per'] >= Q1) & (df_advanced_stats['TS_per'] < Q2), 1, 0)
df_advanced_stats['TS_per (>=47.76 e < 51.92)'] = np.where((df_advanced_stats['TS_per'] >= Q2) & (df_advanced_stats['TS_per'] < Q3), 1, 0)
df_advanced_stats['TS_per (>=51.92 e < 56.16)'] = np.where((df_advanced_stats['TS_per'] >= Q3) & (df_advanced_stats['TS_per'] < Q4), 1, 0)
df_advanced_stats['TS_per (>=56.16)'] = np.where((df_advanced_stats['TS_per'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['TS_per'])

df_train_advanced_stats['TS_per (<38.67)'] = np.where(df_train_advanced_stats['TS_per'] < Q1, 1, 0)
df_train_advanced_stats['TS_per (>=38.67 e < 47.76)'] = np.where((df_train_advanced_stats['TS_per'] >= Q1) & (df_train_advanced_stats['TS_per'] < Q2), 1, 0)
df_train_advanced_stats['TS_per (>=47.76 e < 51.92)'] = np.where((df_train_advanced_stats['TS_per'] >= Q2) & (df_train_advanced_stats['TS_per'] < Q3), 1, 0)
df_train_advanced_stats['TS_per (>=51.92 e < 56.16)'] = np.where((df_train_advanced_stats['TS_per'] >= Q3) & (df_train_advanced_stats['TS_per'] < Q4), 1, 0)
df_train_advanced_stats['TS_per (>=56.16)'] = np.where((df_train_advanced_stats['TS_per'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['TS_per'])
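
Since the two frames are meant to be compared feature by feature, their indicator columns should carry identical labels. A quick check along these lines, shown on toy frames with a deliberately mismatched label, can flag a stray typo in a column name. The frame contents are invented.

```python
import pandas as pd

meta = pd.DataFrame({"player_name": ["A"], "x (<1)": [1], "x (>=1)": [0]})
train = pd.DataFrame({"player_name": ["B"], "x (<1)": [0], "x (>=1": [1]})  # missing ")"

# Labels present in one frame but not in the other.
only_meta = sorted(set(meta.columns) - set(train.columns))
only_train = sorted(set(train.columns) - set(meta.columns))
print(only_meta, only_train)
```

Both lists being empty is the condition to aim for before computing similarities.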

In [64]:
lista1 = df_train_advanced_stats['ORB_per'].tolist()
lista2 = df_advanced_stats['ORB_per'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

1.4000000000000001 3.1 5.625 8.8


In [65]:
df_advanced_stats['ORB_per (<1.4)'] = np.where(df_advanced_stats['ORB_per'] < Q1, 1, 0)
df_advanced_stats['ORB_per (>=1.4 e < 3.1)'] = np.where((df_advanced_stats['ORB_per'] >= Q1) & (df_advanced_stats['ORB_per'] < Q2), 1, 0)
df_advanced_stats['ORB_per (>=3.1 e < 5.62)'] = np.where((df_advanced_stats['ORB_per'] >= Q2) & (df_advanced_stats['ORB_per'] < Q3), 1, 0)
df_advanced_stats['ORB_per (>=5.62 e < 8.8)'] = np.where((df_advanced_stats['ORB_per'] >= Q3) & (df_advanced_stats['ORB_per'] < Q4), 1, 0)
df_advanced_stats['ORB_per (>=8.8)'] = np.where((df_advanced_stats['ORB_per'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['ORB_per'])

df_train_advanced_stats['ORB_per (<1.4)'] = np.where(df_train_advanced_stats['ORB_per'] < Q1, 1, 0)
df_train_advanced_stats['ORB_per (>=1.4 e < 3.1)'] = np.where((df_train_advanced_stats['ORB_per'] >= Q1) & (df_train_advanced_stats['ORB_per'] < Q2), 1, 0)
df_train_advanced_stats['ORB_per (>=3.1 e < 5.62)'] = np.where((df_train_advanced_stats['ORB_per'] >= Q2) & (df_train_advanced_stats['ORB_per'] < Q3), 1, 0)
df_train_advanced_stats['ORB_per (>=5.62 e < 8.8)'] = np.where((df_train_advanced_stats['ORB_per'] >= Q3) & (df_train_advanced_stats['ORB_per'] < Q4), 1, 0)
df_train_advanced_stats['ORB_per (>=8.8)'] = np.where((df_train_advanced_stats['ORB_per'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['ORB_per'])

In [66]:
lista1 = df_train_advanced_stats['DRB_per'].tolist()
lista2 = df_advanced_stats['DRB_per'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

7.6000000000000005 10.5 13.35 16.973333333333358


In [67]:
df_advanced_stats['DRB_per (<7.6)'] = np.where(df_advanced_stats['DRB_per'] < Q1, 1, 0)
df_advanced_stats['DRB_per (>=7.6 e < 10.5)'] = np.where((df_advanced_stats['DRB_per'] >= Q1) & (df_advanced_stats['DRB_per'] < Q2), 1, 0)
df_advanced_stats['DRB_per (>=10.5 e < 13.35)'] = np.where((df_advanced_stats['DRB_per'] >= Q2) & (df_advanced_stats['DRB_per'] < Q3), 1, 0)
df_advanced_stats['DRB_per (>=13.35 e < 16.97)'] = np.where((df_advanced_stats['DRB_per'] >= Q3) & (df_advanced_stats['DRB_per'] < Q4), 1, 0)
df_advanced_stats['DRB_per (>=16.97)'] = np.where((df_advanced_stats['DRB_per'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['DRB_per'])

df_train_advanced_stats['DRB_per (<7.6)'] = np.where(df_train_advanced_stats['DRB_per'] < Q1, 1, 0)
df_train_advanced_stats['DRB_per (>=7.6 e < 10.5)'] = np.where((df_train_advanced_stats['DRB_per'] >= Q1) & (df_train_advanced_stats['DRB_per'] < Q2), 1, 0)
df_train_advanced_stats['DRB_per (>=10.5 e < 13.35)'] = np.where((df_train_advanced_stats['DRB_per'] >= Q2) & (df_train_advanced_stats['DRB_per'] < Q3), 1, 0)
df_train_advanced_stats['DRB_per (>=13.35 e < 16.97)'] = np.where((df_train_advanced_stats['DRB_per'] >= Q3) & (df_train_advanced_stats['DRB_per'] < Q4), 1, 0)
df_train_advanced_stats['DRB_per (>=16.97)'] = np.where((df_train_advanced_stats['DRB_per'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['DRB_per'])

In [68]:
lista1 = df_train_advanced_stats['AST_per'].tolist()
lista2 = df_advanced_stats['AST_per'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

3.5 7.050000000000001 10.649999999999999 16.3


In [69]:
df_advanced_stats['AST_per (<3.5)'] = np.where(df_advanced_stats['AST_per'] < Q1, 1, 0)
df_advanced_stats['AST_per (>=3.5 e < 7.05)'] = np.where((df_advanced_stats['AST_per'] >= Q1) & (df_advanced_stats['AST_per'] < Q2), 1, 0)
df_advanced_stats['AST_per (>=7.05 e < 10.65)'] = np.where((df_advanced_stats['AST_per'] >= Q2) & (df_advanced_stats['AST_per'] < Q3), 1, 0)
df_advanced_stats['AST_per (>=10.65 e < 16.3)'] = np.where((df_advanced_stats['AST_per'] >= Q3) & (df_advanced_stats['AST_per'] < Q4), 1, 0)
df_advanced_stats['AST_per (>=16.3)'] = np.where((df_advanced_stats['AST_per'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['AST_per'])

df_train_advanced_stats['AST_per (<3.5)'] = np.where(df_train_advanced_stats['AST_per'] < Q1, 1, 0)
df_train_advanced_stats['AST_per (>=3.5 e < 7.05)'] = np.where((df_train_advanced_stats['AST_per'] >= Q1) & (df_train_advanced_stats['AST_per'] < Q2), 1, 0)
df_train_advanced_stats['AST_per (>=7.05 e < 10.65)'] = np.where((df_train_advanced_stats['AST_per'] >= Q2) & (df_train_advanced_stats['AST_per'] < Q3), 1, 0)
df_train_advanced_stats['AST_per (>=10.65 e < 16.3)'] = np.where((df_train_advanced_stats['AST_per'] >= Q3) & (df_train_advanced_stats['AST_per'] < Q4), 1, 0)
df_train_advanced_stats['AST_per (>=16.3)'] = np.where((df_train_advanced_stats['AST_per'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['AST_per'])

In [70]:
lista1 = df_train_advanced_stats['TO_per'].tolist()
lista2 = df_advanced_stats['TO_per'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

13.8 17.710000000000036 21.1 26.166666666666668


In [71]:
df_advanced_stats['TO_per (<13.8)'] = np.where(df_advanced_stats['TO_per'] < Q1, 1, 0)
df_advanced_stats['TO_per (>=13.8 e < 17.71)'] = np.where((df_advanced_stats['TO_per'] >= Q1) & (df_advanced_stats['TO_per'] < Q2), 1, 0)
df_advanced_stats['TO_per (>=17.71 e < 21.1)'] = np.where((df_advanced_stats['TO_per'] >= Q2) & (df_advanced_stats['TO_per'] < Q3), 1, 0)
df_advanced_stats['TO_per (>=21.1 e < 26.16)'] = np.where((df_advanced_stats['TO_per'] >= Q3) & (df_advanced_stats['TO_per'] < Q4), 1, 0)
df_advanced_stats['TO_per (>=26.16)'] = np.where((df_advanced_stats['TO_per'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['TO_per'])

df_train_advanced_stats['TO_per (<13.8)'] = np.where(df_train_advanced_stats['TO_per'] < Q1, 1, 0)
df_train_advanced_stats['TO_per (>=13.8 e < 17.71)'] = np.where((df_train_advanced_stats['TO_per'] >= Q1) & (df_train_advanced_stats['TO_per'] < Q2), 1, 0)
df_train_advanced_stats['TO_per (>=17.71 e < 21.1)'] = np.where((df_train_advanced_stats['TO_per'] >= Q2) & (df_train_advanced_stats['TO_per'] < Q3), 1, 0)
df_train_advanced_stats['TO_per (>=21.1 e < 26.16)'] = np.where((df_train_advanced_stats['TO_per'] >= Q3) & (df_train_advanced_stats['TO_per'] < Q4), 1, 0)
df_train_advanced_stats['TO_per (>=26.16)'] = np.where((df_train_advanced_stats['TO_per'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['TO_per'])

In [72]:
lista1 = df_train_advanced_stats['FT_per'].tolist()
lista2 = df_advanced_stats['FT_per'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

0.357 0.5775 0.6755 0.75675


In [73]:
df_advanced_stats['FT_per (<0.357)'] = np.where(df_advanced_stats['FT_per'] < Q1, 1, 0)
df_advanced_stats['FT_per (>=0.357 e < 0.577)'] = np.where((df_advanced_stats['FT_per'] >= Q1) & (df_advanced_stats['FT_per'] < Q2), 1, 0)
df_advanced_stats['FT_per (>=0.577 e < 0.675)'] = np.where((df_advanced_stats['FT_per'] >= Q2) & (df_advanced_stats['FT_per'] < Q3), 1, 0)
df_advanced_stats['FT_per (>=0.675 e < 0.756)'] = np.where((df_advanced_stats['FT_per'] >= Q3) & (df_advanced_stats['FT_per'] < Q4), 1, 0)
df_advanced_stats['FT_per (>=0.756)'] = np.where((df_advanced_stats['FT_per'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['FT_per'])

df_train_advanced_stats['FT_per (<0.357)'] = np.where(df_train_advanced_stats['FT_per'] < Q1, 1, 0)
df_train_advanced_stats['FT_per (>=0.357 e < 0.577)'] = np.where((df_train_advanced_stats['FT_per'] >= Q1) & (df_train_advanced_stats['FT_per'] < Q2), 1, 0)
df_train_advanced_stats['FT_per (>=0.577 e < 0.675)'] = np.where((df_train_advanced_stats['FT_per'] >= Q2) & (df_train_advanced_stats['FT_per'] < Q3), 1, 0)
df_train_advanced_stats['FT_per (>=0.675 e < 0.756)'] = np.where((df_train_advanced_stats['FT_per'] >= Q3) & (df_train_advanced_stats['FT_per'] < Q4), 1, 0)
df_train_advanced_stats['FT_per (>=0.756)'] = np.where((df_train_advanced_stats['FT_per'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['FT_per'])

In [74]:
lista1 = df_train_advanced_stats['twoP_per'].tolist()
lista2 = df_advanced_stats['twoP_per'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

0.314 0.4225 0.47866666666666663 0.532


In [75]:
df_advanced_stats['twoP_per (<0.314)'] = np.where(df_advanced_stats['twoP_per'] < Q1, 1, 0)
df_advanced_stats['twoP_per (>=0.314 e < 0.422)'] = np.where((df_advanced_stats['twoP_per'] >= Q1) & (df_advanced_stats['twoP_per'] < Q2), 1, 0)
df_advanced_stats['twoP_per (>=0.422 e < 0.478)'] = np.where((df_advanced_stats['twoP_per'] >= Q2) & (df_advanced_stats['twoP_per'] < Q3), 1, 0)
df_advanced_stats['twoP_per (>=0.478 e < 0.532)'] = np.where((df_advanced_stats['twoP_per'] >= Q3) & (df_advanced_stats['twoP_per'] < Q4), 1, 0)
df_advanced_stats['twoP_per (>=0.532)'] = np.where((df_advanced_stats['twoP_per'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['twoP_per'])

df_train_advanced_stats['twoP_per (<0.314)'] = np.where(df_train_advanced_stats['twoP_per'] < Q1, 1, 0)
df_train_advanced_stats['twoP_per (>=0.314 e < 0.422)'] = np.where((df_train_advanced_stats['twoP_per'] >= Q1) & (df_train_advanced_stats['twoP_per'] < Q2), 1, 0)
df_train_advanced_stats['twoP_per (>=0.422 e < 0.478)'] = np.where((df_train_advanced_stats['twoP_per'] >= Q2) & (df_train_advanced_stats['twoP_per'] < Q3), 1, 0)
df_train_advanced_stats['twoP_per (>=0.478 e < 0.532)'] = np.where((df_train_advanced_stats['twoP_per'] >= Q3) & (df_train_advanced_stats['twoP_per'] < Q4), 1, 0)
df_train_advanced_stats['twoP_per (>=0.532)'] = np.where((df_train_advanced_stats['twoP_per'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['twoP_per'])

In [76]:
lista1 = df_train_advanced_stats['TP_per'].tolist()
lista2 = df_advanced_stats['TP_per'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

0.0 0.194 0.3 0.357


In [77]:
df_advanced_stats['TP_per (<0.0)'] = np.where(df_advanced_stats['TP_per'] < Q1, 1, 0)
df_advanced_stats['TP_per (>=0.0 e < 0.194)'] = np.where((df_advanced_stats['TP_per'] >= Q1) & (df_advanced_stats['TP_per'] < Q2), 1, 0)
df_advanced_stats['TP_per (>=0.194 e < 0.3)'] = np.where((df_advanced_stats['TP_per'] >= Q2) & (df_advanced_stats['TP_per'] < Q3), 1, 0)
df_advanced_stats['TP_per (>=0.3 e < 0.357)'] = np.where((df_advanced_stats['TP_per'] >= Q3) & (df_advanced_stats['TP_per'] < Q4), 1, 0)
df_advanced_stats['TP_per (>=0.357)'] = np.where((df_advanced_stats['TP_per'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['TP_per'])

df_train_advanced_stats['TP_per (<0.0)'] = np.where(df_train_advanced_stats['TP_per'] < Q1, 1, 0)
df_train_advanced_stats['TP_per (>=0.0 e < 0.194)'] = np.where((df_train_advanced_stats['TP_per'] >= Q1) & (df_train_advanced_stats['TP_per'] < Q2), 1, 0)
df_train_advanced_stats['TP_per (>=0.194 e < 0.3)'] = np.where((df_train_advanced_stats['TP_per'] >= Q2) & (df_train_advanced_stats['TP_per'] < Q3), 1, 0)
df_train_advanced_stats['TP_per (>=0.3 e < 0.357)'] = np.where((df_train_advanced_stats['TP_per'] >= Q3) & (df_train_advanced_stats['TP_per'] < Q4), 1, 0)
df_train_advanced_stats['TP_per (>=0.357)'] = np.where((df_train_advanced_stats['TP_per'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['TP_per'])
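
Because the first cut point for `TP_per` is 0.0 and a shooting percentage cannot be negative, the `TP_per (<0.0)` indicator is always 0 and cannot distinguish any players. The toy values below illustrate this. Dropping constant columns is one possible follow-up, mentioned here as an option and not something the notebook does.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"TP_per": [0.0, 0.15, 0.33, 0.40]})
Q1 = 0.0
toy["TP_per (<0.0)"] = np.where(toy["TP_per"] < Q1, 1, 0)

# Columns with a single distinct value cannot discriminate between players.
constant_cols = [c for c in toy.columns if toy[c].nunique() == 1]
print(constant_cols)
```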

In [78]:
lista1 = df_train_advanced_stats['blk_per'].tolist()
lista2 = df_advanced_stats['blk_per'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

0.0 0.5 1.2999999999999998 2.9000000000000004


In [79]:
df_advanced_stats['blk_per (<0.0)'] = np.where(df_advanced_stats['blk_per'] < Q1, 1, 0)
df_advanced_stats['blk_per (>=0.0 e < 0.5)'] = np.where((df_advanced_stats['blk_per'] >= Q1) & (df_advanced_stats['blk_per'] < Q2), 1, 0)
df_advanced_stats['blk_per (>=0.5 e < 1.3)'] = np.where((df_advanced_stats['blk_per'] >= Q2) & (df_advanced_stats['blk_per'] < Q3), 1, 0)
df_advanced_stats['blk_per (>=1.3 e < 2.9)'] = np.where((df_advanced_stats['blk_per'] >= Q3) & (df_advanced_stats['blk_per'] < Q4), 1, 0)
df_advanced_stats['blk_per (>=2.9)'] = np.where((df_advanced_stats['blk_per'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['blk_per'])

df_train_advanced_stats['blk_per (<0.0)'] = np.where(df_train_advanced_stats['blk_per'] < Q1, 1, 0)
df_train_advanced_stats['blk_per (>=0.0 e < 0.5)'] = np.where((df_train_advanced_stats['blk_per'] >= Q1) & (df_train_advanced_stats['blk_per'] < Q2), 1, 0)
df_train_advanced_stats['blk_per (>=0.5 e < 1.3)'] = np.where((df_train_advanced_stats['blk_per'] >= Q2) & (df_train_advanced_stats['blk_per'] < Q3), 1, 0)
df_train_advanced_stats['blk_per (>=1.3 e < 2.9)'] = np.where((df_train_advanced_stats['blk_per'] >= Q3) & (df_train_advanced_stats['blk_per'] < Q4), 1, 0)
df_train_advanced_stats['blk_per (>=2.9)'] = np.where((df_train_advanced_stats['blk_per'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['blk_per'])

In [80]:
lista1 = df_train_advanced_stats['stl_per'].tolist()
lista2 = df_advanced_stats['stl_per'].tolist()

lista = lista1 + lista2

Q1, Q2, Q3, Q4 = faixa_de_valores(lista)

print(Q1, Q2, Q3, Q4)

0.7 1.3 1.75 2.4


In [81]:
df_advanced_stats['stl_per (<0.7)'] = np.where(df_advanced_stats['stl_per'] < Q1, 1, 0)
df_advanced_stats['stl_per (>=0.7 e < 1.3)'] = np.where((df_advanced_stats['stl_per'] >= Q1) & (df_advanced_stats['stl_per'] < Q2), 1, 0)
df_advanced_stats['stl_per (>=1.3 e < 1.75)'] = np.where((df_advanced_stats['stl_per'] >= Q2) & (df_advanced_stats['stl_per'] < Q3), 1, 0)
df_advanced_stats['stl_per (>=1.75 e < 2.4)'] = np.where((df_advanced_stats['stl_per'] >= Q3) & (df_advanced_stats['stl_per'] < Q4), 1, 0)
df_advanced_stats['stl_per (>=2.4)'] = np.where((df_advanced_stats['stl_per'] >= Q4), 1, 0)
df_advanced_stats = df_advanced_stats.drop(columns = ['stl_per'])

df_train_advanced_stats['stl_per (<0.7)'] = np.where(df_train_advanced_stats['stl_per'] < Q1, 1, 0)
df_train_advanced_stats['stl_per (>=0.7 e < 1.3)'] = np.where((df_train_advanced_stats['stl_per'] >= Q1) & (df_train_advanced_stats['stl_per'] < Q2), 1, 0)
df_train_advanced_stats['stl_per (>=1.3 e < 1.75)'] = np.where((df_train_advanced_stats['stl_per'] >= Q2) & (df_train_advanced_stats['stl_per'] < Q3), 1, 0)
df_train_advanced_stats['stl_per (>=1.75 e < 2.4)'] = np.where((df_train_advanced_stats['stl_per'] >= Q3) & (df_train_advanced_stats['stl_per'] < Q4), 1, 0)
df_train_advanced_stats['stl_per (>=2.4)'] = np.where((df_train_advanced_stats['stl_per'] >= Q4), 1, 0)
df_train_advanced_stats = df_train_advanced_stats.drop(columns = ['stl_per'])
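
The cells above repeat the same five `np.where` lines for every statistic and for both dataframes. As a minimal sketch, that pattern could be wrapped in one function. The name `add_bin_indicators` and the toy data are hypothetical and not part of the notebook; the cut points are passed in explicitly, so nothing is assumed about how `faixa_de_valores` computes them. Labels are rounded to two decimals, which reproduces the `stl_per` and `blk_per` column names shown above.

```python
import numpy as np
import pandas as pd


def add_bin_indicators(frame, column, cuts):
    """Return a copy of `frame` where `column` is replaced by five 0/1 bin columns.

    Hypothetical helper, not part of the notebook.
    `cuts` is the (Q1, Q2, Q3, Q4) tuple, e.g. the values printed by the cells above.
    """
    q1, q2, q3, q4 = cuts
    # Rounded copies are used only for the column labels, never for comparisons.
    l1, l2, l3, l4 = (round(q, 2) for q in cuts)
    values = frame[column]
    out = frame.drop(columns=[column])
    out[f"{column} (<{l1})"] = np.where(values < q1, 1, 0)
    out[f"{column} (>={l1} e < {l2})"] = np.where((values >= q1) & (values < q2), 1, 0)
    out[f"{column} (>={l2} e < {l3})"] = np.where((values >= q2) & (values < q3), 1, 0)
    out[f"{column} (>={l3} e < {l4})"] = np.where((values >= q3) & (values < q4), 1, 0)
    out[f"{column} (>={l4})"] = np.where(values >= q4, 1, 0)
    return out


# Toy data, invented for illustration only.
toy = pd.DataFrame({"player_name": list("ABCDE"),
                    "stl_per": [0.5, 0.7, 1.5, 2.0, 3.0]})
binned = add_bin_indicators(toy, "stl_per", (0.7, 1.3, 1.75, 2.4))
print(binned)
```

Because the function returns a new dataframe instead of modifying its input, the same call can be applied to `df_advanced_stats` and `df_train_advanced_stats` with the same cut points, which keeps the two sets of column names identical.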

In [82]:
# final result for our metadata (players from 2009 to 2020)

df_advanced_stats

Unnamed: 0,player_name,player_name_id,GP (<10),GP (>=10 e < 20),GP (>=20 e < 26),GP (>=26 e < 30),GP (>=30),Min_per (<4.3),Min_per (>=4.3 e < 22.04),Min_per (>=22.04 e < 41.53),...,blk_per (<0.0),blk_per (>=0.0 e < 0.5),blk_per (>=0.5 e < 1.3),blk_per (>=1.3 e < 2.9),blk_per (>=2.9),stl_per (<0.7),stl_per (>=0.7 e < 1.3),stl_per (>=1.3 e < 1.75),stl_per (>=1.75 e < 2.4),stl_per (>=2.4)
0,A'Torey Everett,0,0,0,1,0,0,0,1,0,...,0,0,1,0,0,1,0,0,0,0
1,A'Torri Shine,1,0,0,0,1,0,0,0,0,...,0,0,1,0,0,0,1,0,0,0
2,A'Uston Calhoun,2,0,0,0,0,1,0,0,0,...,0,0,0,1,0,0,1,0,0,0
3,A'uston Calhoun,3,0,0,1,0,0,0,0,0,...,0,0,0,1,0,0,1,0,0,0
4,A.C. Reid,4,0,0,1,0,0,0,0,1,...,0,0,1,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23878,Zvonko Buljan,23878,0,0,0,1,0,0,0,0,...,0,0,1,0,0,0,0,0,1,0
23879,Zygis Sestokas,23879,0,0,1,0,0,0,0,1,...,0,0,1,0,0,0,1,0,0,0
23880,Zylan Cheatham,23880,0,0,0,0,1,0,0,0,...,0,0,0,0,1,0,0,0,1,0
23881,Zyon Dobbs,23881,0,1,0,0,0,0,1,0,...,0,0,1,0,0,0,0,0,0,1


In [83]:
# final result for our training data (players from 2021)

df_train_advanced_stats

Unnamed: 0,player_name,player_name_id,GP (<10),GP (>=10 e < 20),GP (>=20 e < 26),GP (>=26 e < 30),GP (>=30),Min_per (<4.3),Min_per (>=4.3 e < 22.04),Min_per (>=22.04 e < 41.53),...,blk_per (<0.0),blk_per (>=0.0 e < 0.5),blk_per (>=0.5 e < 1.3),blk_per (>=1.3 e < 2.9),blk_per (>=2.9),stl_per (<0.7),stl_per (>=0.7 e < 1.3),stl_per (>=1.3 e < 1.75),stl_per (>=1.75 e < 2.4),stl_per (>=2.4)
0,A.J. Caldwell,0,0,0,1,0,0,0,0,0,...,0,0,0,1,0,0,0,0,1,0
1,A.J. Hoggard,1,0,0,0,1,0,0,0,1,...,0,0,0,1,0,0,0,1,0,0
2,A.J. Labriola,2,1,0,0,0,0,1,0,0,...,0,1,0,0,0,1,0,0,0,0
3,A.J. Lawson,3,0,1,0,0,0,0,0,0,...,0,1,0,0,0,0,0,1,0,0
4,A.J. McGinnis,4,0,0,0,1,0,0,0,1,...,0,1,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4934,Zion Young,4934,0,0,0,0,1,0,0,0,...,0,1,0,0,0,0,1,0,0,0
4935,Zoar Nedd,4935,1,0,0,0,0,1,0,0,...,0,1,0,0,0,1,0,0,0,0
4936,Zondrick Garrett,4936,1,0,0,0,0,1,0,0,...,0,1,0,0,0,1,0,0,0,0
4937,Zool Kueth,4937,0,1,0,0,0,0,0,0,...,0,0,0,0,1,1,0,0,0,0
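
Each statistic is now spread over five indicator columns, so every row should have exactly one `1` per statistic. A quick check along these lines can catch rows where that does not hold. One known cause is a missing value: `NaN` fails every comparison, so all five indicators end up `0`. The function name `rows_with_bad_bins` and the toy frame are hypothetical; this is only a sketch of the idea.

```python
import pandas as pd


def rows_with_bad_bins(frame, stat):
    """Return the rows whose indicator columns for `stat` do not sum to exactly 1."""
    # Indicator columns are named "<stat> (<range>)", so match on that prefix.
    bin_cols = [c for c in frame.columns if c.startswith(f"{stat} (")]
    return frame[frame[bin_cols].sum(axis=1) != 1]


# Toy data, invented for illustration: player "C" has no bin set.
toy = pd.DataFrame({
    "player_name": ["A", "B", "C"],
    "stl_per (<0.7)": [1, 0, 0],
    "stl_per (>=0.7 e < 1.3)": [0, 1, 0],
    "stl_per (>=1.3 e < 1.75)": [0, 0, 0],
    "stl_per (>=1.75 e < 2.4)": [0, 0, 0],
    "stl_per (>=2.4)": [0, 0, 0],
})
bad = rows_with_bad_bins(toy, "stl_per")
print(bad["player_name"].tolist())
```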


In [84]:
# saving the finished data; it will be used in the notebook that actually produces the top-15 draft recommendation
# for 2021, based on the college players who declared for that year's draft.

df_advanced_stats.to_csv("advanced_stats_2009_2020.csv")
df_train_advanced_stats.to_csv("advanced_stats_2021.csv")
lottery_drafted_players.to_csv("lottery_drafted_players.csv")
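
With its default settings, `to_csv` also writes the dataframe index as a first, unnamed column. When such a file is read back with a plain `pd.read_csv`, that column appears as `Unnamed: 0`. The sketch below shows the round trip on a toy frame, using an in-memory buffer instead of a real file; passing `index_col=0` when reading is one way to restore the original layout. The toy data is invented for illustration.

```python
import io

import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({"player_name": ["A", "B"], "GP (<10)": [0, 1]})

buffer = io.StringIO()
toy.to_csv(buffer)  # default index=True: the index becomes an unnamed first column

buffer.seek(0)
naive = pd.read_csv(buffer)  # the index comes back as a data column
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # the first column is used as the index

print(list(naive.columns))
print(list(restored.columns))
```

Writing with `index=False` would avoid the extra column altogether, but any notebook that already reads these files may expect the current layout, so the choice depends on how the files are consumed.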