To create three binary classification problems from the dataset, I need to select three target variables and process the remaining columns into features. Categorical columns can be converted into numerical data with one-hot encoding, which replaces each categorical column with one binary (0 or 1) column per category value.
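As a small illustration of what one-hot encoding does, the sketch below runs `pd.get_dummies` on a hypothetical three-row frame (the values are made up, not taken from the student data). Numeric columns pass through unchanged, and each categorical column is expanded into one indicator column per category; `dtype=int` is passed so the indicators are 0/1 rather than the bool default of pandas 2.0+.

```python
import pandas as pd

# Hypothetical toy frame: one numeric column and two categorical columns
toy = pd.DataFrame({
    "age": [15, 16, 17],
    "sex": ["F", "M", "F"],
    "Mjob": ["teacher", "health", "other"],
})

# Numeric "age" is kept as-is; "sex" and "Mjob" each become one 0/1 column
# per category (categories are sorted alphabetically in the column names)
encoded = pd.get_dummies(toy, dtype=int)
print(encoded)
```

Each row has exactly one 1 among the `sex_*` columns and exactly one 1 among the `Mjob_*` columns.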

The three target variables I selected are G3 (final grade), school (the school the student attended), and internet (whether the student has internet access at home). The binary classes for these variables are created as follows:

import pandas as pd

# Load the data
file_path = '/Users/jamescheng/Desktop/WASHU/CSE 514/student+performance/student/student-mat.csv'
data = pd.read_csv(file_path, delimiter=';')

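# One-hot encode the categorical features, excluding the three raw target columns.
# dtype=int yields 0/1 columns (pandas 2.0+ returns bool columns by default).
# Note: G1 and G2 (first- and second-period grades) remain as features and are
# likely to be highly predictive of G3.
features = pd.get_dummies(data.drop(['G3', 'school', 'internet'], axis=1), dtype=int)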

# Binary classification problem 1: high (1) vs. low (0) final grade, split at the median G3.
# Students scoring exactly the median fall into the low class.
median_g3 = data['G3'].median()
data['high_performance'] = (data['G3'] > median_g3).astype(int)

# Binary classification problem 2: school (already binary: GP -> 1, MS -> 0)
data['school_binary'] = (data['school'] == 'GP').astype(int)

# Binary classification problem 3: internet access at home (yes -> 1, no -> 0)
data['internet_binary'] = (data['internet'] == 'yes').astype(int)

# Collect the three binary targets; each one is paired with `features` for one problem
targets = data[['high_performance', 'school_binary', 'internet_binary']]
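Because the median split uses a strict `>`, every student whose G3 equals the median lands in the low class. G3 is an integer grade, so ties at the median are common and the two classes may not be balanced. The sketch below uses hypothetical G3 values (not the real data) to show how `>` and `>=` assign the tied scores differently; running `value_counts()` on the real `high_performance` column is a quick way to check the actual balance.

```python
import pandas as pd

# Hypothetical G3 scores; three of them equal the median
g3 = pd.Series([8, 10, 10, 10, 12, 15])
median_g3 = g3.median()  # middle values are 10 and 10, so the median is 10.0

# Strict ">" (as in the code above): scores equal to the median go to class 0
high_strict = (g3 > median_g3).astype(int)
# ">=" would put the tied scores in class 1 instead
high_inclusive = (g3 >= median_g3).astype(int)

print(high_strict.value_counts().sort_index())
print(high_inclusive.value_counts().sort_index())
```

Here the strict split gives 4 low and 2 high, while the inclusive split gives 1 low and 5 high, so neither is 50/50 when many scores sit on the median.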

In this process:

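1. I used get_dummies to one-hot encode the categorical features, after dropping the three target columns.
2. I derived binary targets from G3 (1 if above the median, else 0), school (1 for GP, 0 for MS), and internet (1 for yes, 0 for no), setting up three separate binary classification problems that share the same feature matrix.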
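The two steps above can be sketched as a single function that returns one (X, y) pair per problem. This is a minimal sketch under stated assumptions: `build_problems` is a hypothetical helper name, the toy frame below contains only a few of the real columns with made-up values, and `dtype=int` is used so the one-hot columns are 0/1.

```python
from __future__ import annotations

import pandas as pd


def build_problems(data: pd.DataFrame) -> dict[str, tuple[pd.DataFrame, pd.Series]]:
    """Hypothetical helper: return {problem_name: (X, y)} for the three tasks."""
    # Drop the raw target columns and one-hot encode what remains
    X = pd.get_dummies(data.drop(["G3", "school", "internet"], axis=1), dtype=int)
    targets = {
        "high_performance": (data["G3"] > data["G3"].median()).astype(int),
        "school_binary": (data["school"] == "GP").astype(int),
        "internet_binary": (data["internet"] == "yes").astype(int),
    }
    # All three problems share the same X; only the target y changes
    return {name: (X, y) for name, y in targets.items()}


# Hypothetical toy rows standing in for the real student data
toy = pd.DataFrame({
    "school": ["GP", "MS", "GP", "MS"],
    "internet": ["yes", "no", "yes", "yes"],
    "sex": ["F", "M", "M", "F"],
    "age": [15, 16, 17, 18],
    "G3": [6, 10, 14, 18],
})

problems = build_problems(toy)
for name, (X, y) in problems.items():
    print(name, X.shape, y.tolist())
```

Keeping X shared and swapping only y makes it straightforward to train and compare the same classifier on all three problems.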