# XGBoost model
The NBA draft is an annual event in which teams select players from American colleges and international professional leagues to join their rosters. Moving to the NBA is a big deal for any basketball player.

Sports commentators and fans eagerly follow the careers of college players and try to guess who will be drafted by an NBA team.

You are tasked with building a model that predicts whether a college basketball player will be drafted into the NBA, based on his statistics for the current season.

The metric used to assess model performance is AUROC (area under the ROC curve).
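
As a quick illustration of the metric, the sketch below computes AUROC on four made-up labels and scores (they are not taken from the competition data). AUROC equals the fraction of (drafted, not drafted) pairs in which the drafted player receives the higher score, so only the ranking of the scores matters, not their absolute values.

```python
from itertools import product
from sklearn.metrics import roc_auc_score

# Toy labels and scores, invented for illustration only
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auroc = roc_auc_score(y_true, y_score)

# Same quantity by hand: share of (positive, negative) pairs ranked correctly,
# counting ties as half a correct pair
pos = [s for s, t in zip(y_score, y_true) if t == 1]
neg = [s for s, t in zip(y_score, y_true) if t == 0]
pairwise = sum((p > n) + 0.5 * (p == n) for p, n in product(pos, neg)) / (len(pos) * len(neg))

print(auroc, pairwise)
```

Three of the four positive/negative pairs are ordered correctly here, so both computations give 0.75.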

# Import the required libraries

In [33]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score
from xgboost import XGBRegressor
from sklearn.impute import SimpleImputer

# Load the training and test datasets

In [34]:
train_data = pd.read_csv('/content/train.csv', dtype={'yr': str})
test_data = pd.read_csv('/content/test.csv', dtype={'yr': str})




# Define features and target variable

In [35]:
numeric_features = ['GP', 'Min_per', 'Ortg', 'usg', 'eFG', 'TS_per', 'ORB_per', 'DRB_per',
                   'AST_per', 'TO_per', 'FTM', 'FTA', 'FT_per', 'twoPM', 'twoPA', 'twoP_per',
                   'TPM', 'TPA', 'TP_per', 'blk_per', 'stl_per', 'ftr', 'porpag', 'adjoe',
                   'Rec_Rank', 'ast_tov', 'rim_ratio', 'mid_ratio', 'dunks_ratio',
                   'pick', 'drtg', 'adrtg', 'dporpag', 'stops', 'bpm', 'obpm', 'dbpm', 'gbpm',
                   'mp', 'ogbpm', 'dgbpm', 'oreb', 'dreb', 'treb', 'ast', 'stl', 'blk', 'pts']

target = 'drafted'


# Drop rows with NaN values in the selected numeric features

In [36]:
train_data = train_data.dropna(subset=numeric_features)
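
Dropping every row with a missing value can remove a large share of the data and shift the class balance, so it may be worth measuring the effect before committing to it. The sketch below uses a small made-up frame: the values are invented, and only the column names `GP`, `Rec_Rank` and `drafted` come from this notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; not the competition data
toy = pd.DataFrame({
    'GP': [30, 28, np.nan, 31, 25, 29],
    'Rec_Rank': [np.nan, 12.0, 40.0, np.nan, 55.0, np.nan],
    'drafted': [0, 1, 0, 0, 0, 1],
})
features = ['GP', 'Rec_Rank']

# Share of missing values per feature
missing_share = toy[features].isna().mean()

# Row count and positive rate before and after dropping incomplete rows
kept = toy.dropna(subset=features)
print(missing_share)
print(f'rows: {len(toy)} -> {len(kept)}')
print(f"drafted rate: {toy['drafted'].mean():.2f} -> {kept['drafted'].mean():.2f}")
```

If a single sparse feature is responsible for most of the dropped rows, removing that feature or imputing it may preserve more training data than `dropna`.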

# Split data into features and target

In [37]:
X = train_data[numeric_features]
y = train_data[target]

# Split data into training and validation sets

In [38]:
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
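
Drafted players are presumably a small minority of college players, so a purely random split can leave the validation set with very few positives. A possible variation, sketched here on synthetic labels, is to pass `stratify=y` so that both splits keep the same class ratio. The 5% positive rate is an assumption for illustration, not a figure from the dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 1000 rows, 5% positives (assumed rate, for illustration)
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(1000, 3))
y_toy = np.array([1] * 50 + [0] * 950)

X_tr, X_va, y_tr, y_va = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# Both splits keep the 5% positive rate
print(y_tr.mean(), y_va.mean())
```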


# Initialize and train an XGBoost Regressor

In [39]:
xgb_regressor = XGBRegressor(random_state=42)
xgb_regressor.fit(X_train, y_train)


# Predict scores on the validation set (a regressor returns unbounded scores, not probabilities)

In [40]:

y_val_pred_prob = xgb_regressor.predict(X_val)


# Calculate AUROC score

In [41]:
auroc_score = roc_auc_score(y_val, y_val_pred_prob)
print(f'AUROC Score: {auroc_score:.4f}')


AUROC Score: 0.7434


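# Initialize a SimpleImputer that fills missing values with the training-set means

In [42]:
imputer = SimpleImputer(strategy='mean')
# Learn the column means from the training features only, then apply them to the test features
imputer.fit(X_train)
X_test_imputed = imputer.transform(test_data[numeric_features])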
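
To keep test-set statistics out of the preprocessing, an imputer is usually fitted on the training features and then only applied to the test features. A minimal sketch with made-up single-column arrays:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up single-feature arrays for illustration
train_col = np.array([[1.0], [2.0], [3.0]])       # training mean = 2.0
test_col = np.array([[10.0], [np.nan], [20.0]])   # test mean (ignoring NaN) = 15.0

toy_imputer = SimpleImputer(strategy='mean')
toy_imputer.fit(train_col)                    # learn the mean from training data only
test_filled = toy_imputer.transform(test_col)

# The missing test value is replaced by the training mean, not the test mean
print(test_filled.ravel())
```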

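# Keep the test features unscaled, matching the training data

In [43]:
# The model was trained on unscaled features, so the test features must not be
# standardized; tree-based models such as XGBoost do not need feature scaling.
X_test_prepared = pd.DataFrame(X_test_imputed, columns=numeric_features)

# Predict on the test set

In [44]:
y_test_pred_prob = xgb_regressor.predict(X_test_prepared)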


# Submission file

In [45]:
submission = pd.DataFrame({'player_id': test_data['player_id'], 'drafted': y_test_pred_prob})
submission.to_csv('submission_xgboost.csv', index=False)
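
Before uploading, a few cheap checks on the submission frame can catch common mistakes such as missing predictions or duplicated ids. The sketch below runs on a made-up frame with the same two columns. The specific checks are suggestions, not requirements stated by the competition.

```python
import io
import pandas as pd

# Made-up predictions for illustration
toy_submission = pd.DataFrame({
    'player_id': ['a1', 'b2', 'c3'],
    'drafted': [0.02, 0.91, 0.15],
})

checks = {
    'no_missing_predictions': bool(toy_submission['drafted'].notna().all()),
    'unique_player_ids': bool(toy_submission['player_id'].is_unique),
    'expected_columns': list(toy_submission.columns) == ['player_id', 'drafted'],
}
print(checks)

# Round-trip through CSV (in memory) to confirm the file reads back as expected
buffer = io.StringIO()
toy_submission.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip.shape)
```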
