## Bart Torvik Data

This notebook builds a dataframe from Bart Torvik's public website, which publishes individual and team college basketball stats going back to 2008. The data loaded here contains season-level stats for every player who has played since 2008.

### Contents

- [Imports](#Imports)
- [Reading Data](#Reading-Data)
- [Data Review](#Data-Review)


### Imports

In [1]:
import pandas as pd

### Reading Data

In [2]:
#creating dataframe from the json file
df = pd.read_json('http://barttorvik.com/getadvstats.php?year=all&specialSource=0&conyes=0&start=-11101&end=all0501&top=0&xvalue=All&page=playerstat.json')
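
The integer column labels seen in the review below suggest that the endpoint returns a JSON array of arrays, one inner array per player-season, with no header row. The following is a minimal sketch of how `pd.read_json` handles that shape. The two-row payload and its values are made-up placeholders, not data from the site.

```python
from io import StringIO

import pandas as pd

# Hypothetical miniature payload: a list of rows with no column names.
sample_json = '[["Player A", "School X", "B10", 30], ["Player B", "School Y", "SEC", 28]]'

# Each inner list becomes one row; columns are labeled 0..n-1.
sample_df = pd.read_json(StringIO(sample_json))

print(sample_df.columns.tolist())
print(sample_df.shape)
```

Because no names come with the payload, the columns have to be renamed by position, which is what the dictionary in the next section does.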


### Data Review

In [3]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,47,48,49,50,51,52,53,54,55,56
0,Jason Praet,Detroit,Horz,6,1.0,0.0,16.8,0.0,0.0,0.0,...,98.5024,0.196087,2.50101,-16.1407,-14.7415,-1.39926,-19.178,2.0,-21.543,2.365
1,DeAndrae Ross,South Alabama,SB,22,8.4,62.0,14.1,32.6,33.4,0.0,...,104.687,0.365511,15.0379,-8.33266,-6.31752,-2.01514,-10.622,5.125,-9.557,-1.065
2,DeAndrae Ross,South Alabama,SB,26,29.5,97.3,16.6,42.5,44.43,1.6,...,108.527,0.893017,49.9644,-5.15203,-1.77499,-3.37704,-4.381,14.5769,-2.722,-1.659
3,DeAndrae Ross,Troy,SB,7,11.5,67.1,16.9,34.5,36.11,3.1,...,111.956,0.962461,17.4434,-10.4455,-6.83856,-3.60694,-9.617,18.8571,-6.524,-3.093
4,Pooh Williams,Utah St.,WAC,28,24.3,93.1,13.8,45.2,46.2,1.4,...,110.366,0.672348,48.4848,-3.29203,-1.17901,-2.11302,-3.537,12.1379,-2.517,-1.02


In [4]:
#creating a dictionary to rename columns
#column titles provided by Bart Torvik

new_columns = {0: 'player_name', 1: 'school', 2: 'conference', 3: 'GP', 4: 'Min_per',
               5: 'ORtg', 6: 'usg', 7: 'eFG', 8: 'TS_per', 9: 'ORB_per', 10: 'DRB_per',
               11: 'AST_per', 12: 'TO_per', 13: 'FTM', 14: 'FTA', 15: 'FT_per', 16: 'twoPM',
               17: 'twoPA', 18: 'twoP_per', 19: 'TPM', 20: 'TPA', 21: 'TP_per', 22: 'blk_per',
               23: 'stl_per', 24: 'ftr', 25: 'yr', 26: 'ht', 27: 'num', 28: 'porpag', 29: 'adjoe',
               30: 'pfr', 31: 'year', 32: 'pid', 33: 'type', 34: 'rec-rk', 35: 'ast/tov', 36: 'rimmade',
               37: 'rimmade + rimmiss', 38: 'midmade', 39: 'midmade + midmiss', 40: 'rimmade/(rimmade+rimmiss)',
               41: 'midmade/(midmade+mismiss)', 42: 'dunksmade', 43: 'dunksmiss + dunksmade',
               44: 'dunksmade/(dunksmade+dunksmiss)', 45: 'pick', 46: 'drtg', 47: 'adrtg', 48: 'dporpag',
               49: 'stops', 50: 'bpm', 51: 'obpm', 52: 'dbpm', 53: 'gbpm', 54: 'mp', 55: 'ogbpm', 56: 'dgbpm'}

In [5]:
#renaming columns
df = df.rename(columns = new_columns)
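
`DataFrame.rename` silently leaves any label that is missing from the mapping unchanged, so it can be worth checking that the dictionary covers every column before renaming. A minimal sketch of that check follows, using a hypothetical three-column frame (`demo`) and a deliberately incomplete mapping (`demo_names`) in place of `df` and `new_columns`.

```python
import pandas as pd

# Hypothetical stand-ins for df and new_columns.
demo = pd.DataFrame([["Player A", "School X", 30]])
demo_names = {0: "player_name", 1: "school"}  # key 2 is deliberately missing

# Labels that the mapping does not cover.
unmapped = [col for col in demo.columns if col not in demo_names]
print(unmapped)

# rename keeps unmapped labels as they were.
renamed = demo.rename(columns=demo_names)
print(renamed.columns.tolist())
```

Running the same list comprehension against `df.columns` and `new_columns` before the rename should return an empty list if all 57 positions are covered.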

In [6]:
df.head()

Unnamed: 0,player_name,school,conference,GP,Min_per,ORtg,usg,eFG,TS_per,ORB_per,...,adrtg,dporpag,stops,bpm,obpm,dbpm,gbpm,mp,ogbpm,dgbpm
0,Jason Praet,Detroit,Horz,6,1.0,0.0,16.8,0.0,0.0,0.0,...,98.5024,0.196087,2.50101,-16.1407,-14.7415,-1.39926,-19.178,2.0,-21.543,2.365
1,DeAndrae Ross,South Alabama,SB,22,8.4,62.0,14.1,32.6,33.4,0.0,...,104.687,0.365511,15.0379,-8.33266,-6.31752,-2.01514,-10.622,5.125,-9.557,-1.065
2,DeAndrae Ross,South Alabama,SB,26,29.5,97.3,16.6,42.5,44.43,1.6,...,108.527,0.893017,49.9644,-5.15203,-1.77499,-3.37704,-4.381,14.5769,-2.722,-1.659
3,DeAndrae Ross,Troy,SB,7,11.5,67.1,16.9,34.5,36.11,3.1,...,111.956,0.962461,17.4434,-10.4455,-6.83856,-3.60694,-9.617,18.8571,-6.524,-3.093
4,Pooh Williams,Utah St.,WAC,28,24.3,93.1,13.8,45.2,46.2,1.4,...,110.366,0.672348,48.4848,-3.29203,-1.17901,-2.11302,-3.537,12.1379,-2.517,-1.02


In [7]:
#FYI, each row is a single player-season, not career totals
df[df['player_name'] == 'Jordan Poole']

Unnamed: 0,player_name,school,conference,GP,Min_per,ORtg,usg,eFG,TS_per,ORB_per,...,adrtg,dporpag,stops,bpm,obpm,dbpm,gbpm,mp,ogbpm,dgbpm
51706,Jordan Poole,Michigan,B10,38,28.9,108.0,23.2,53.5,57.55,0.8,...,89.312,1.69519,86.5604,4.50062,2.36598,2.13464,7.046,12.2051,3.399,3.647
51707,Jordan Poole,Michigan,B10,37,82.6,108.4,20.0,53.8,57.32,0.8,...,88.0083,4.63319,204.665,7.55575,4.04507,3.51068,6.312,33.027,2.763,3.549
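
Since each row is one player-season, career-level figures have to be aggregated. The sketch below shows one way to do that with `groupby`. The rows are made-up placeholders, and it assumes that `pid` identifies a player across seasons, which the source does not confirm.

```python
import pandas as pd

# Hypothetical player-season rows; pid is assumed to be a per-player id.
seasons = pd.DataFrame({
    "pid": [101, 101, 202],
    "player_name": ["Player A", "Player A", "Player B"],
    "GP": [38, 37, 30],
})

# One row per player: number of seasons and total games played.
career = (
    seasons.groupby("pid")
    .agg(
        player_name=("player_name", "first"),
        n_seasons=("GP", "size"),
        total_GP=("GP", "sum"),
    )
    .reset_index()
)
print(career)
```

Grouping on an identifier rather than `player_name` avoids merging two different players who happen to share a name.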


In [8]:
#looking at null values
#will clean in Cleaning and EDA Notebook
df.isnull().sum()

player_name                            0
school                                 0
conference                             0
GP                                     0
Min_per                                0
ORtg                                   0
usg                                    0
eFG                                    0
TS_per                                 0
ORB_per                                0
DRB_per                                0
AST_per                                0
TO_per                                 0
FTM                                    0
FTA                                    0
FT_per                                 0
twoPM                                  0
twoPA                                  0
twoP_per                               0
TPM                                    0
TPA                                    0
TP_per                                 0
blk_per                                0
stl_per                                0
ftr             
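
With 57 columns the full per-column listing is long, so filtering it down to the columns that actually contain nulls can be easier to scan. A minimal sketch, using a hypothetical three-column frame (`demo`) rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in the 'pick' column.
demo = pd.DataFrame({
    "player_name": ["Player A", "Player B"],
    "pick": [np.nan, 12.0],
    "GP": [30, 28],
})

# Count nulls per column, then keep only the columns that have any.
null_counts = demo.isnull().sum()
cols_with_nulls = null_counts[null_counts > 0].sort_values(ascending=False)
print(cols_with_nulls)
```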

In [9]:
df.shape

(55939, 57)

In [10]:
df.to_csv('../Data_Files/torvik_data.csv')
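
By default `to_csv` also writes the row index, which comes back as an `Unnamed: 0` column when the file is read with `pd.read_csv`. The sketch below illustrates the difference with an in-memory buffer and a hypothetical one-row frame. Whether to pass `index=False` here depends on how the Cleaning and EDA notebook reads the file, so treat it as an option rather than a required change.

```python
from io import StringIO

import pandas as pd

# Hypothetical one-row frame standing in for df.
demo = pd.DataFrame({"player_name": ["Player A"], "GP": [30]})

# Default behaviour: the index is written as an unnamed first column.
with_index = StringIO()
demo.to_csv(with_index)
with_index.seek(0)
cols_default = pd.read_csv(with_index).columns.tolist()
print(cols_default)  # ['Unnamed: 0', 'player_name', 'GP']

# index=False: only the data columns are written.
without_index = StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_no_index = pd.read_csv(without_index).columns.tolist()
print(cols_no_index)  # ['player_name', 'GP']
```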