The dataset can be downloaded from https://www.kaggle.com/datasets/amrmaree/student-performance-prediction?resource=download. The code below reads it from `student_performance_dataset.csv` in the working directory.
### 1. Barebones (Standard XGBoost and scikit-learn)
**Description:**
- This function loads and preprocesses the student performance dataset.
- It splits the data into training and testing sets.
- It trains an XGBoost model on the training data with XGBoost's native `xgb.train` API, using the `binary:logistic` objective, 100 boosting rounds, and otherwise default parameters.
- The model is then used to predict outcomes on the testing data.
- The accuracy of the model is calculated and printed.
- Some example predictions are printed to compare actual vs. predicted values.
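The notebook code reuses a single `LabelEncoder` for every categorical column. That works, because `fit_transform` refits on each call, but only the last column's mapping survives, so encoded values cannot be decoded afterwards. A minimal sketch that keeps one fitted encoder per column, using a small hypothetical DataFrame (the category strings such as `"Pass"`/`"Fail"` are assumptions, not read from the Kaggle file):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows standing in for student_performance_dataset.csv
data = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Internet_Access_at_Home": ["Yes", "No", "Yes", "Yes"],
    "Pass_Fail": ["Pass", "Fail", "Pass", "Pass"],
})

categorical_cols = ["Gender", "Internet_Access_at_Home", "Pass_Fail"]
encoders = {}
for col in categorical_cols:
    enc = LabelEncoder()
    # LabelEncoder sorts the unique values, so "Fail" -> 0 and "Pass" -> 1
    data[col] = enc.fit_transform(data[col])
    encoders[col] = enc  # keep the fitted encoder so codes can be decoded later

print(data)
# Decode numeric labels (e.g. model predictions) back to the original strings
print(encoders["Pass_Fail"].inverse_transform([0, 1]))
```

Keeping the encoders in a dictionary keyed by column name makes it straightforward to report predictions as the original labels.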

### 2. Using CuPy with an NVIDIA GPU for XGBoost
**Description:**
- This function uses CuPy to place the feature and label arrays on an NVIDIA GPU and trains the XGBoost model on that GPU.
- The student performance dataset is loaded and preprocessed with pandas, then converted to CuPy arrays.
- The data is split into training and testing sets with scikit-learn's `train_test_split`, applied to the CuPy arrays.
- An XGBoost model is trained on the GPU by setting the `device` parameter to `cuda` (together with `tree_method` set to `hist`).
- The model predicts outcomes on the testing data, and the accuracy is calculated and printed.
- Some example predictions are printed to compare actual vs. predicted values. Because they are built from plain arrays rather than a pandas Series, the printed row index starts at 0 instead of showing the original row labels.
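Setting `device` to `cuda` only works when a GPU is visible to the process. A minimal sketch that falls back to CPU training otherwise, assuming CuPy's `cupy.cuda.runtime.getDeviceCount()` as the availability check; the helper name `pick_xgb_device` is hypothetical:

```python
def pick_xgb_device() -> str:
    """Return "cuda" if CuPy can see at least one GPU, otherwise "cpu"."""
    try:
        import cupy as cp
        if cp.cuda.runtime.getDeviceCount() > 0:
            return "cuda"
    except Exception:
        # CuPy is not installed, or no CUDA driver/runtime is available
        pass
    return "cpu"


params = {
    "objective": "binary:logistic",
    "tree_method": "hist",
    "device": pick_xgb_device(),  # XGBoost 2.x accepts "cpu" or "cuda" here
    "random_state": 42,
}
print(params["device"])
```

The same `params` dictionary can then be passed to `xgb.train` unchanged on machines with or without a GPU.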

### 3. Using Intelex (Intel Extension for Scikit-learn)
**Description:**
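- This function applies the Intel Extension for Scikit-learn (`sklearnex`) by calling `patch_sklearn()` before scikit-learn is imported, so supported scikit-learn components are replaced with Intel-optimized versions.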
- The student performance dataset is loaded and preprocessed.
- The data is split into training and testing sets.
- An XGBoost model is trained on the training data. `patch_sklearn()` only affects scikit-learn, so `xgb.train` itself runs exactly as in the barebones version.
- The model predicts outcomes on the testing data, and the accuracy is calculated and printed.
- Some example predictions are printed to compare actual vs. predicted values.
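`patch_sklearn()` has to run before the scikit-learn pieces are imported; objects imported earlier keep the stock implementation. A minimal sketch that enables the extension when `sklearnex` is installed and otherwise continues with plain scikit-learn; the helper name `enable_intelex` is hypothetical:

```python
def enable_intelex() -> bool:
    """Patch scikit-learn with the Intel extension if it is installed.

    Returns True when the patch was applied, False otherwise.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        return False
    patch_sklearn()
    return True


# Patch first, then import scikit-learn so it picks up the patched versions
intelex_on = enable_intelex()
from sklearn.model_selection import train_test_split  # noqa: E402

print("Intelex enabled:", intelex_on)
```

Wrapping the patch this way lets the same notebook cell run on machines without the extension.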

In [1]:
import warnings
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

def main_barebones():
    # Suppress FutureWarning messages only
    warnings.filterwarnings("ignore", category=FutureWarning)
    
    # Load the Data
    data = pd.read_csv("student_performance_dataset.csv")
    
    # Preprocess the Data
    label_encoder = LabelEncoder()
    data['Gender'] = label_encoder.fit_transform(data['Gender'])
    data['Parental_Education_Level'] = label_encoder.fit_transform(data['Parental_Education_Level'])
    data['Internet_Access_at_Home'] = label_encoder.fit_transform(data['Internet_Access_at_Home'])
    data['Extracurricular_Activities'] = label_encoder.fit_transform(data['Extracurricular_Activities'])
    data['Pass_Fail'] = label_encoder.fit_transform(data['Pass_Fail'])
    
    # Split the Data
    X = data.drop(['Student_ID', 'Pass_Fail'], axis=1)
    y = data['Pass_Fail']
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Train the Model using XGBoost
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dtest = xgb.DMatrix(X_test, label=y_test)
    
    params = {'objective': 'binary:logistic', 'random_state': 42}
    model = xgb.train(params, dtrain, num_boost_round=100)
    
    # Evaluate the Model
    y_pred = (model.predict(dtest) > 0.5).astype(int)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy (Barebones): {accuracy * 100:.2f}%")
    
    # Print some predictions
    predictions_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
    print(predictions_df.head())

if __name__ == "__main__":
    main_barebones()


Accuracy (Barebones): 100.00%
     Actual  Predicted
120       0          0
247       1          1
324       1          1
204       1          1
603       1          1
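With the `binary:logistic` objective, `model.predict` returns probabilities, and the code turns them into 0/1 labels with a strict `> 0.5` test, so a probability of exactly 0.5 becomes class 0. A worked example on made-up probabilities (not taken from the model above):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up predicted probabilities and true labels for illustration
probs = np.array([0.2, 0.7, 0.5, 0.9])
y_true = np.array([0, 1, 1, 1])

y_pred = (probs > 0.5).astype(int)  # strict threshold: 0.5 maps to 0
print(y_pred)                          # [0 1 0 1]
print(accuracy_score(y_true, y_pred))  # 0.75
```

Three of the four thresholded predictions match, giving an accuracy of 0.75; the third sample is misclassified only because of the strict comparison.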


Using CuPy

In [2]:
import warnings
import pandas as pd
import cupy as cp
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

def main_cupy_gpu():
    # Suppress FutureWarning messages only
    warnings.filterwarnings("ignore", category=FutureWarning)
    
    # Load the Data
    data = pd.read_csv("student_performance_dataset.csv")
    
    # Preprocess the Data
    label_encoder = LabelEncoder()
    data['Gender'] = label_encoder.fit_transform(data['Gender'])
    data['Parental_Education_Level'] = label_encoder.fit_transform(data['Parental_Education_Level'])
    data['Internet_Access_at_Home'] = label_encoder.fit_transform(data['Internet_Access_at_Home'])
    data['Extracurricular_Activities'] = label_encoder.fit_transform(data['Extracurricular_Activities'])
    data['Pass_Fail'] = label_encoder.fit_transform(data['Pass_Fail'])
    
    # Convert features and labels to NumPy, then copy them to the GPU as CuPy arrays
    X = data.drop(['Student_ID', 'Pass_Fail'], axis=1).to_numpy()
    y = data['Pass_Fail'].to_numpy()
    
    X = cp.asarray(X)
    y = cp.asarray(y)
    
    # Split the Data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Train the Model using XGBoost with GPU support.
    # DMatrix accepts CuPy arrays directly, so the data does not need to be copied back to the host.
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dtest = xgb.DMatrix(X_test, label=y_test)
    
    params = {
        'objective': 'binary:logistic',
        'tree_method': 'hist',  # Histogram-based tree construction
        'device': 'cuda',       # Run training on the GPU
        'random_state': 42
    }
    model = xgb.train(params, dtrain, num_boost_round=100)
    
    # Evaluate the Model
    y_pred = (model.predict(dtest) > 0.5).astype(int)
    accuracy = accuracy_score(cp.asnumpy(y_test), y_pred)
    print(f"Accuracy (cuPY + GPU): {accuracy * 100:.2f}%")
    
    # Print some predictions
    predictions_df = pd.DataFrame({'Actual': cp.asnumpy(y_test), 'Predicted': y_pred})
    print(predictions_df.head())

if __name__ == "__main__":
    main_cupy_gpu()


Accuracy (cuPY + GPU): 100.00%
   Actual  Predicted
0       0          0
1       1          1
2       1          1
3       1          1
4       1          1
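The CuPy cell hands CuPy arrays to scikit-learn's `train_test_split`. If the goal is to keep the split itself on the device, one option is to shuffle row indices and slice with the same array module. A minimal sketch, assuming a hypothetical helper `split_on_device` that accepts NumPy or CuPy arrays via its `xp` argument; it rounds the test count up as `train_test_split` does, but its shuffle will not match `train_test_split(random_state=42)` row for row:

```python
import math

import numpy as np


def split_on_device(X, y, test_size=0.2, seed=42, xp=np):
    """Shuffle row indices and split X, y while the data stays on its device.

    Pass xp=cupy for CuPy arrays: the permutation is drawn on the host with
    NumPy and copied to the device as a (small) index array.
    """
    n = X.shape[0]
    n_test = math.ceil(n * test_size)  # round the test count up
    perm = xp.asarray(np.random.default_rng(seed).permutation(n))
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Usage with NumPy; with CuPy, pass CuPy arrays and xp=cupy instead
X = np.arange(20, dtype=np.float32).reshape(10, 2)
y = np.arange(10) % 2
X_train, X_test, y_train, y_test = split_on_device(X, y)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

Only the index array crosses the host/device boundary here, which is small compared with the feature matrix.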


Using Intelex

In [3]:
import warnings
import pandas as pd
import xgboost as xgb
from sklearnex import patch_sklearn
patch_sklearn()  # Patch scikit-learn; must run before the sklearn imports below

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

def main_intelex():
    # Suppress FutureWarning messages only
    warnings.filterwarnings("ignore", category=FutureWarning)
    
    # Load the Data
    data = pd.read_csv("student_performance_dataset.csv")
    
    # Preprocess the Data
    label_encoder = LabelEncoder()
    data['Gender'] = label_encoder.fit_transform(data['Gender'])
    data['Parental_Education_Level'] = label_encoder.fit_transform(data['Parental_Education_Level'])
    data['Internet_Access_at_Home'] = label_encoder.fit_transform(data['Internet_Access_at_Home'])
    data['Extracurricular_Activities'] = label_encoder.fit_transform(data['Extracurricular_Activities'])
    data['Pass_Fail'] = label_encoder.fit_transform(data['Pass_Fail'])
    
    # Split the Data
    X = data.drop(['Student_ID', 'Pass_Fail'], axis=1)
    y = data['Pass_Fail']
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Train the Model using XGBoost
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dtest = xgb.DMatrix(X_test, label=y_test)
    
    params = {'objective': 'binary:logistic', 'random_state': 42}
    model = xgb.train(params, dtrain, num_boost_round=100)
    
    # Evaluate the Model
    y_pred = (model.predict(dtest) > 0.5).astype(int)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy (Intelex): {accuracy * 100:.2f}%")
    
    # Print some predictions
    predictions_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
    print(predictions_df.head())

if __name__ == "__main__":
    main_intelex()


Accuracy (Intelex): 100.00%
     Actual  Predicted
120       0          0
247       1          1
324       1          1
204       1          1
603       1          1


Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
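All three variants report 100% test accuracy. A perfect held-out score is usually worth a sanity check for target leakage, for instance a feature from which the label is derived directly. One quick check, sketched here on hypothetical toy data (the column names `Score` and `Noise` are assumptions, not taken from the Kaggle file), is whether a depth-1 decision tree on any single column already reproduces the label:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: "Score" fully determines the label, "Noise" does not
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Score": rng.integers(0, 100, 200),
    "Noise": rng.normal(size=200),
})
y = (X["Score"] >= 50).astype(int)

# Fit a one-split tree per column; training accuracy near 1.0 flags a suspect column
scores = {}
for col in X.columns:
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X[[col]], y)
    scores[col] = stump.score(X[[col]], y)
    print(col, round(scores[col], 3))
```

On the real dataset, the same loop over the columns of `X` would show whether a single feature accounts for the perfect score.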
