## NCAA Early Applicant to NBA Picker
In this notebook we try to determine whether a US NCAA player who has applied to the NBA draft is likely to be drafted. We approach this with a variety of techniques.

First, we must generate our own dataset from BigQuery's ncaa_mbb dataset. The first issue with ncaa_mbb is that it gives no indication of whether a player has applied to the NBA. To solve this, we partition the data by adding a categorical feature to each player that indicates whether the player applied. Specifically, we label whether a player is an early NBA applicant, because early-applicant lists were the only reliable data on NBA applicants we could find.
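
As a rough illustration of the labeling idea, the sketch below marks players who appear on an early-applicant list. The tables, column names, and the `early_applicant` flag are hypothetical stand-ins, not the actual logic inside the `Applicants` class.

```python
import pandas as pd

# Hypothetical player table; names and columns are illustrative only.
players = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cy"],
    "last_name": ["Lee", "Ray", "Poe"],
    "season": [2015, 2015, 2016],
})

# Hypothetical early-applicant list for the same seasons.
early_applicants = pd.DataFrame({
    "first_name": ["Ann", "Cy"],
    "last_name": ["Lee", "Poe"],
    "season": [2015, 2016],
})

# A left merge with indicator=True records which players matched the list.
labeled = players.merge(
    early_applicants,
    on=["first_name", "last_name", "season"],
    how="left",
    indicator=True,
)

# Turn the merge indicator into a 0/1 categorical feature.
labeled["early_applicant"] = (labeled["_merge"] == "both").astype(int)
labeled = labeled.drop(columns="_merge")
print(labeled)
```

Matching on names is fragile (spelling variants, duplicate names), so a real pipeline would likely need some normalization before the join.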

In [1]:
# Standard library
import csv
import os

# Third-party libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Project-local data loaders
from data_loader.DatasetsGenerator import getall, get_classifier_data
from data_loader.Draft import Draft
from data_loader.Applicants import Applicants

%matplotlib inline

# Generate all custom-made datasets
getall()

# Create an NBA draft validation class
draft = Draft()

# Create an NBA applicant validation class
applied = Applicants()

In [2]:
player_box = pd.read_csv("./data_loader/data/ncaa/player_box.csv")
player_info = pd.read_csv("./data_loader/data/ncaa/player_info.csv")

# Filter out unwanted rows: player_id -101, missing IDs or seasons, and season 2018.
# Note: `!= None` is always True element-wise in pandas, so notna() is used instead.
player_box = player_box[player_box['player_id'] != -101]
player_box = player_box[player_box['player_id'].notna()]
player_box = player_box[player_box['season'].notna()]
player_box = player_box[player_box['season'] != 2018]

# Same filters for player_info, plus rows that lack a first or last name
player_info = player_info[player_info['player_id'] != -101]
player_info = player_info[player_info['player_id'].notna()]
player_info = player_info[player_info['season'].notna()]
player_info = player_info[player_info['season'] != 2018]
player_info = player_info[player_info['first_name'].notna()]
player_info = player_info[player_info['last_name'].notna()]

# Combine the two datasets
player_box["player_id_AND_season"] = player_box["player_id"].map(int).map(str) + "_AND_" + player_box["season"].map(int).map(str) 
player_info["player_id_AND_season"] = player_info["player_id"].map(int).map(str) + "_AND_" + player_info["season"].map(int).map(str) 
player_stats = pd.merge(player_box, player_info, on='player_id_AND_season', how='outer')
player_stats = player_stats.dropna()

# Keep only players who applied early to the NBA draft
player_stats = applied.join(player_stats)

# Add drafted categorical value
player_stats = draft.join(player_stats)
player_stats.to_csv('./data_loader/data/ncaa/player_stats.csv')
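
The cell above joins the two tables on a concatenated `player_id_AND_season` string. As a minimal sketch on made-up data, the same rows can be obtained by merging on the two columns directly. The toy frames below are hypothetical. The equivalence with outer merge plus `dropna()` only holds when no other column contains missing values, since `dropna()` also discards rows with any missing stat.

```python
import pandas as pd

# Toy stand-ins for player_box and player_info (values are made up).
box = pd.DataFrame({
    "player_id": [1.0, 1.0, 2.0],
    "season": [2015.0, 2016.0, 2015.0],
    "pts": [26.0, 25.0, 10.0],
})
info = pd.DataFrame({
    "player_id": [1.0, 2.0, 3.0],
    "season": [2015.0, 2015.0, 2015.0],
    "school": ["Duke", "UNC", "UK"],
})

def add_key(df):
    """Return a copy with the string key used in the notebook cell."""
    out = df.copy()
    out["player_id_AND_season"] = (
        out["player_id"].map(int).map(str) + "_AND_" + out["season"].map(int).map(str)
    )
    return out

# Approach from the cell: outer merge on the string key, then drop incomplete rows.
by_string = pd.merge(
    add_key(box), add_key(info), on="player_id_AND_season", how="outer"
).dropna()

# Alternative: inner merge on both columns at once.
by_columns = pd.merge(box, info, on=["player_id", "season"], how="inner")

print(by_string[["player_id_x", "season_x", "pts", "school"]])
print(by_columns)
```

One visible difference is the column naming: the string-key merge keeps both copies of the ID and season columns with `_x` and `_y` suffixes, while the column-list merge keeps a single `player_id` and `season`.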


In [3]:
X, Y = get_classifier_data()

In [7]:
X

Unnamed: 0.1,Unnamed: 0,team_code_x,season_x,game_id,game_date,player_id_x,jersey_num,pts,fga,fga3,...,season_y,class_desc,xml_name,school,conference,height_ft,conference_id,player_id_y,position,team_code_y
0,0,193.0,2015.0,193-253-2015-12-15,2015-12-15,1711099.0,14,26.0,13.0,4.0,...,2015.0,Fr.,"INGRAM,BRANDON",Duke,ACC,6.0,821.0,1711099.0,F,193.0
1,1,193.0,2015.0,67-193-2016-01-02,2016-01-02,1711099.0,14,25.0,18.0,9.0,...,2015.0,Fr.,"INGRAM,BRANDON",Duke,ACC,6.0,821.0,1711099.0,F,193.0
2,2,193.0,2015.0,529-193-2016-03-24,2016-03-24,1711099.0,14,24.0,20.0,7.0,...,2015.0,Fr.,"INGRAM,BRANDON",Duke,ACC,6.0,821.0,1711099.0,F,193.0
3,3,193.0,2015.0,193-1068-2015-12-28,2015-12-28,1711099.0,14,26.0,16.0,6.0,...,2015.0,Fr.,"INGRAM,BRANDON",Duke,ACC,6.0,821.0,1711099.0,F,193.0
4,4,193.0,2015.0,193-306-2015-12-02,2015-12-02,1711099.0,14,24.0,15.0,6.0,...,2015.0,Fr.,"INGRAM,BRANDON",Duke,ACC,6.0,821.0,1711099.0,F,193.0
5,5,193.0,2015.0,193-513-2016-01-16,2016-01-16,1711099.0,14,25.0,14.0,7.0,...,2015.0,Fr.,"INGRAM,BRANDON",Duke,ACC,6.0,821.0,1711099.0,F,193.0
6,6,193.0,2015.0,193-746-2016-02-13,2016-02-13,1711099.0,14,25.0,22.0,9.0,...,2015.0,Fr.,"INGRAM,BRANDON",Duke,ACC,6.0,821.0,1711099.0,F,193.0
7,7,193.0,2015.0,193-813-2016-03-19,2016-03-19,1711099.0,14,25.0,19.0,7.0,...,2015.0,Fr.,"INGRAM,BRANDON",Duke,ACC,6.0,821.0,1711099.0,F,193.0
8,8,193.0,2015.0,490-193-2016-01-23,2016-01-23,1711099.0,14,25.0,16.0,6.0,...,2015.0,Fr.,"INGRAM,BRANDON",Duke,ACC,6.0,821.0,1711099.0,F,193.0
9,9,193.0,2015.0,193-86-2015-12-05,2015-12-05,1711099.0,14,23.0,15.0,4.0,...,2015.0,Fr.,"INGRAM,BRANDON",Duke,ACC,6.0,821.0,1711099.0,F,193.0
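
The `Unnamed: 0` and `Unnamed: 0.1` columns in this output are most likely leftovers from saving a DataFrame with its index and reading it back, twice. The sketch below reproduces the effect on a tiny made-up frame using in-memory buffers; it is an illustration, not the notebook's actual loading code.

```python
import io
import pandas as pd

df = pd.DataFrame({"pts": [26.0, 25.0]})

def round_trip(frame, **to_csv_kwargs):
    """Write a frame to an in-memory CSV and read it back."""
    buf = io.StringIO()
    frame.to_csv(buf, **to_csv_kwargs)
    buf.seek(0)
    return pd.read_csv(buf)

# By default to_csv writes the index as a header-less first column,
# which read_csv then names "Unnamed: 0".
once = round_trip(df)
print(list(once.columns))  # ['Unnamed: 0', 'pts']

# A second round trip adds another unnamed index column.
twice = round_trip(once)
print(list(twice.columns))

# Writing with index=False avoids the extra columns.
clean = round_trip(round_trip(df, index=False), index=False)
print(list(clean.columns))  # ['pts']
```

Passing `index=False` to `to_csv`, or `index_col=0` to `read_csv`, are the usual ways to keep these columns out of the feature matrix.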


In [12]:
type(Y)

pandas.core.series.Series

In [13]:
X.to_csv('./data_loader/data/X.csv')