Our proposal is to explore datasets containing the physical attributes and statistics of drafted NBA players, identify the features that scouts and recruiters weigh most heavily in prospective players, and then develop a model that predicts whether a player will be drafted.  

For data analysis and exploration, we intend to examine the physical attributes of drafted players using NBA draft combine data, which records each player's height, weight, vertical leap, sprint speed, and other measurements. We also intend to examine the college statistics (points, games played, etc.) of potential prospects to evaluate their level of performance in college. The datasets cover over 30 years, with 2021/2022 as the most recent year.  
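
As a rough illustration of the exploration described above, the sketch below compares average combine measurements between drafted and undrafted players. The column names (`height_in`, `vertical_in`, `drafted`) and the toy rows are assumptions made up for this example; they are not the schema or contents of the Kaggle files.

```python
import pandas as pd

# Toy rows with made-up values; column names are hypothetical,
# not taken from the actual combine dataset.
combine = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "height_in": [78.0, 80.0, 75.0, 77.0],
    "vertical_in": [34.0, 36.0, 30.0, 32.0],
    "drafted": [1, 1, 0, 0],  # 1 = drafted, 0 = not drafted
})


def compare_by_draft_status(df, feature_cols, label_col="drafted"):
    """Return the mean of each feature, grouped by draft status."""
    return df.groupby(label_col)[feature_cols].mean()


summary = compare_by_draft_status(combine, ["height_in", "vertical_in"])
print(summary)
```

A large gap between the two group means for a feature would be a first hint that the feature matters, though a proper analysis would also need to account for position and draft year.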

Datasets:
* https://www.kaggle.com/code/adityak2003/predicting-the-nba-draft-using-college-stats/input  
* https://www.kaggle.com/datasets/marcusfern/nba-draft-combine  
* https://www.kaggle.com/datasets/mattop/nba-draft-basketball-player-data-19892021
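
Predicting whether a player is drafted requires a label for each college player. One possible way to build it, sketched below, is to left-join the college statistics onto the draft records and flag the rows that find a match. The join keys (`player`, `year`), the other column names, and the toy rows are all assumptions for illustration; the real files may need name cleaning before such a join works.

```python
import pandas as pd

# Made-up rows; "player" and "year" are hypothetical join keys.
college = pd.DataFrame({
    "player": ["A", "B", "C"],
    "year": [2020, 2020, 2021],
    "points_per_game": [18.2, 9.5, 14.1],
})
draft = pd.DataFrame({
    "player": ["A", "C"],
    "year": [2020, 2021],
    "pick": [3, 41],
})


def label_drafted(college_df, draft_df, keys=("player", "year")):
    """Left-join college stats onto draft records and flag matched rows."""
    key_list = list(keys)
    merged = college_df.merge(
        draft_df[key_list].drop_duplicates(),
        on=key_list,
        how="left",
        indicator=True,  # adds a "_merge" column: "both" or "left_only"
    )
    merged["drafted"] = (merged["_merge"] == "both").astype(int)
    return merged.drop(columns="_merge")


labeled = label_drafted(college, draft)
print(labeled)
```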

In [None]:
# Imports
from google.colab import auth
import os
import glob
import requests
import pandas as pd

In [None]:
workspaces = {
    "ahhussain": "ECE_143_Project_Team_Number_7_Codebase",
    "stdong": "/content/drive/MyDrive/ECE143/ECE_143_Project_Team_Number_7_Codebase"
}

In [None]:
def import_setup_work_env(workspaces):
    """
    Imports all datasets and sets up working environment per user

    * Data will be imported from the Drug Crime dataset (Drug_Crime_20231111.csv) and the 2020 Census. The census data was manually
    preprocessed by splitting the original Excel dataset into 5 separate datasets (Total Population, Hispanic, White, Black, Asian)
    to reformat it into a supported CSV format with correctly parsable and distinguishable columns.

    * With the removal of shared drives, the location of the work directory can no longer be hard-coded and guaranteed to be the same for everyone.
    In addition to importing the data, this function changes the work directory of the runtime according to a set of user preferences.
    Reference for gmail fetching: https://colab.research.google.com/drive/1VVWs_pcjjz2vg0H2Ti6-12FzcCojRF6a

    Args:
        workspaces (dict): dictionary mapping each username to the absolute path of that user's workspace directory

    Returns:
        Tuple of pandas dataframes containing the data from each dataset. The tuple is returned in this order:
            (drug_crime, census_total, census_hispanic, census_white, census_black, census_asian)

    """
    assert isinstance(workspaces, dict)

    # Fetch user's gmail and extract username
    auth.authenticate_user()

    gcloud_token = !gcloud auth print-access-token
    gcloud_tokeninfo = requests.get('https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=' + gcloud_token[0]).json()

    user_email = gcloud_tokeninfo['email']
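    # Take everything before the "@"; unlike slicing with find(), this does not drop a character when "@" is missing
    username = user_email.split("@")[0]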

    assert username in workspaces

    # Change work directory
    os.chdir(workspaces[username])
    print(f"Working directory changed to {os.getcwd()}")

    # Read all csv's as dataframe
    base_dir = os.getcwd()
    census_prefix = os.path.join(base_dir, "2020_Census", "dcp-comps-of-chg-StoryMap-data-032023-Total-Population")
    drug_crime = pd.read_csv(os.path.join(base_dir, "Drug_Crime_20231111.csv"))
    tot_pop = pd.read_csv(census_prefix + ".csv")
    hisp_pop = pd.read_csv(census_prefix + "-Hispanic.csv")
    white_pop = pd.read_csv(census_prefix + "-White.csv")
    black_pop = pd.read_csv(census_prefix + "-Black.csv")
    asian_pop = pd.read_csv(census_prefix + "-Asian.csv")

    return (drug_crime, tot_pop, hisp_pop, white_pop, black_pop, asian_pop)

raw_drug_crime, raw_census_total, raw_census_hisp, raw_census_white, raw_census_black, raw_census_asian = import_setup_work_env(workspaces)

FileNotFoundError: ignored
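
The truncated `FileNotFoundError` above does not show which path failed. A minimal sketch, assuming the cause is a CSV missing from the working directory, is to check the expected files before calling `pd.read_csv`. The helper name `missing_files` is hypothetical; the file names are the ones used in the cell above (only two are listed here for brevity).

```python
import os


def missing_files(base_dir, relative_paths):
    """Return the relative paths that do not exist as files under base_dir."""
    return [
        path for path in relative_paths
        if not os.path.isfile(os.path.join(base_dir, path))
    ]


# File names copied from the read_csv calls in the cell above.
expected = [
    "Drug_Crime_20231111.csv",
    os.path.join(
        "2020_Census",
        "dcp-comps-of-chg-StoryMap-data-032023-Total-Population.csv",
    ),
]

absent = missing_files(os.getcwd(), expected)
if absent:
    print("Missing files:", absent)
```

Running this check right after `os.chdir` would report the missing file names directly instead of failing partway through the reads.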

In [None]:
raw_drug_crime

Unnamed: 0,CMPLNT_NUM,CMPLNT_FR_DT,CMPLNT_FR_TM,CMPLNT_TO_DT,CMPLNT_TO_TM,ADDR_PCT_CD,RPT_DT,KY_CD,OFNS_DESC,PD_CD,...,LOC_OF_OCCUR_DESC,PREM_TYP_DESC,JURIS_DESC,PARKS_NM,HADEVELOPT,X_COORD_CD,Y_COORD_CD,Latitude,Longitude,Lat_Lon
0,80488152,08/27/2011,02:00:00,,(null),41.0,08/27/2011,117,DANGEROUS DRUGS,503,...,(null),STREET,N.Y. POLICE DEPT,(null),(null),1013037.0,236657.0,40.816206,-73.896001,"(40.8162058439227, -73.8960011932583)"
1,25436757,11/21/2000,19:10:00,11/21/2006,19:15:00,75.0,11/21/2006,235,DANGEROUS DRUGS,511,...,(null),STREET,N.Y. POLICE DEPT,(null),(null),1017036.0,183890.0,40.671360,-73.881811,"(40.6713598203364, -73.8818110231735)"
2,10354189,02/09/2005,21:25:00,02/09/2006,21:30:00,113.0,02/09/2006,117,DANGEROUS DRUGS,503,...,INSIDE,RESIDENCE-HOUSE,N.Y. POLICE DEPT,(null),(null),1046315.0,187088.0,40.679981,-73.776234,"(40.6799807384666, -73.7762339071953)"
3,10049866,01/10/2005,21:50:00,,(null),42.0,01/10/2006,117,DANGEROUS DRUGS,503,...,(null),STREET,N.Y. POLICE DEPT,(null),(null),1008690.0,238862.0,40.822271,-73.911698,"(40.8222710411331, -73.911697780277)"
4,26116195,01/04/2005,19:18:00,,(null),48.0,01/04/2007,235,DANGEROUS DRUGS,511,...,INSIDE,RESIDENCE - APT. HOUSE,N.Y. POLICE DEPT,(null),(null),1011751.0,246839.0,40.844157,-73.900605,"(40.8441566000203, -73.9006054489734)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
443490,261180996,12/31/2022,22:59:00,12/31/2022,23:42:00,48.0,12/31/2022,117,DANGEROUS DRUGS,503,...,FRONT OF,STREET,N.Y. POLICE DEPT,(null),(null),1014655.0,248327.0,40.848224,-73.890098,"(40.848224, -73.890098)"
443491,261166565,12/31/2022,11:53:00,12/31/2022,11:55:00,28.0,12/31/2022,117,DANGEROUS DRUGS,501,...,INSIDE,STREET,N.Y. POLICE DEPT,(null),(null),997602.0,230430.0,40.799146,-73.951772,"(40.799146, -73.951772)"
443492,261150748,12/30/2022,21:05:00,12/30/2022,21:07:00,52.0,12/30/2022,117,DANGEROUS DRUGS,503,...,OPPOSITE OF,STREET,N.Y. POLICE DEPT,(null),(null),1016545.0,255351.0,40.867497,-73.883234,"(40.867497, -73.883234)"
443493,261158092,12/31/2022,02:55:00,12/31/2022,03:00:00,71.0,12/31/2022,117,DANGEROUS DRUGS,510,...,(null),TRANSIT - NYC SUBWAY,N.Y. TRANSIT POLICE,(null),(null),995908.0,183618.0,40.670658,-73.957974,"(40.67065802, -73.95797447)"


In [None]:
raw_census_total

Unnamed: 0,Orig Order,GeoType,Borough,GeoID,Name,NTA Type,Pop_10,Pop_20,Pop Change,Natural Change,Net Migration
0,1,NYC,New York City,0,NYC (adjusted for citywide total population in...,,8242624,8804190,561566,612638,-51072
1,2,Boro,Manhattan,1,Manhattan,,1585873,1694251,108378,81949,26429
2,3,Boro,Bronx,2,Bronx,,1385108,1472654,87546,114402,-26856
3,4,Boro,Brooklyn,3,Brooklyn,,2504700,2736074,231374,246479,-15105
4,5,Boro,Queens,4,Queens,,2230722,2405464,174742,152976,21766
...,...,...,...,...,...,...,...,...,...,...,...
263,264,NTA2020,Staten Island,SI0391,Freshkills Park (South),9.0,97,95,-2,9,-11
264,265,NTA2020,Staten Island,SI9561,Fort Wadsworth,6.0,731,495,-236,120,-356
265,266,NTA2020,Staten Island,SI9591,Hoffman & Swinburne Islands,9.0,0,0,0,0,0
266,267,NTA2020,Staten Island,SI9592,Miller Field,9.0,31,46,15,-1,16
