# Performance Metrics Hands-On Exercise

This hands-on exercise practices applying different evaluation metrics to the Student Alcohol Consumption dataset: https://www.kaggle.com/datasets/uciml/student-alcohol-consumption

Instructions:
 1. Fork the given repository
 2. Rename 'Exercise.ipynb' to '( lastname ).ipynb'
 3. Load the given dataset
 4. For Nos. 2-4, preprocess the dataset, then construct, train, and evaluate a **classification** model (you may experiment in this step)
 5. For Nos. 5-7, preprocess the dataset, then construct, train, and evaluate a **regression** model (you may experiment in this step)

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import accuracy_score, mean_squared_error

### 1. Load the Data

In [50]:
usedCol=['school','sex','age','Medu','Fedu','Mjob','Fjob','studytime','failures','famrel','freetime','goout','Dalc','Walc','health','absences']
data = pd.read_csv('student-por.csv', usecols=usedCol)
data.head()

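|   | school | sex | age | Medu | Fedu | Mjob | Fjob | studytime | failures | famrel | freetime | goout | Dalc | Walc | health | absences |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | 4 | 4 | at_home | teacher | 2 | 0 | 4 | 3 | 4 | 1 | 1 | 3 | 4 |
| 1 | GP | F | 17 | 1 | 1 | at_home | other | 2 | 0 | 5 | 3 | 3 | 1 | 1 | 3 | 2 |
| 2 | GP | F | 15 | 1 | 1 | at_home | other | 2 | 0 | 4 | 3 | 2 | 2 | 3 | 3 | 6 |
| 3 | GP | F | 15 | 4 | 2 | health | services | 3 | 0 | 3 | 2 | 2 | 1 | 1 | 5 | 0 |
| 4 | GP | F | 16 | 3 | 3 | other | other | 2 | 0 | 4 | 3 | 2 | 1 | 2 | 5 | 0 |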


### 2. Preprocess the Data (Classification)

In [34]:
data['school'] = data['school'].map({'GP': 0, 'MS': 1})
data['sex'] = data['sex'].map({'F': 0, 'M': 1})
data['Mjob'] = data['Mjob'].map({'teacher': 0, 'health': 1, 'services': 2, 'at_home': 3, 'other': 4})
data['Fjob'] = data['Fjob'].map({'teacher': 0, 'health': 1, 'services': 2, 'at_home': 3, 'other': 4})
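
The integer codes above treat `Mjob` and `Fjob` as if their categories were ordered, and a linear model will read that order literally. One alternative worth trying in the experimentation step is one-hot encoding. The following is a minimal sketch, assuming a small made-up frame (`sample` is hypothetical and only mirrors the column names of the dataset):

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from student-por.csv.
sample = pd.DataFrame({
    "Mjob": ["at_home", "health", "other"],
    "Fjob": ["teacher", "other", "services"],
    "age": [18, 15, 16],
})

# Replace each nominal column with one 0/1 indicator column per category.
encoded = pd.get_dummies(sample, columns=["Mjob", "Fjob"], dtype=int)
print(encoded)
```

Each job category becomes its own indicator column, so no artificial ranking such as `teacher < health < services` is implied. The trade-off is a wider feature matrix.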

In [35]:
data.head()

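|   | school | sex | age | Medu | Fedu | Mjob | Fjob | studytime | failures | famrel | freetime | goout | Dalc | Walc | health | absences |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 18 | 4 | 4 | 3 | 0 | 2 | 0 | 4 | 3 | 4 | 1 | 1 | 3 | 4 |
| 1 | 0 | 0 | 17 | 1 | 1 | 3 | 4 | 2 | 0 | 5 | 3 | 3 | 1 | 1 | 3 | 2 |
| 2 | 0 | 0 | 15 | 1 | 1 | 3 | 4 | 2 | 0 | 4 | 3 | 2 | 2 | 3 | 3 | 6 |
| 3 | 0 | 0 | 15 | 4 | 2 | 1 | 2 | 3 | 0 | 3 | 2 | 2 | 1 | 1 | 5 | 0 |
| 4 | 0 | 0 | 16 | 3 | 3 | 4 | 4 | 2 | 0 | 4 | 3 | 2 | 1 | 2 | 5 | 0 |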


In [36]:
# study time as the target for both classification and regression

In [37]:
data=data.dropna()

In [39]:
data.head()

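|   | school | sex | age | Medu | Fedu | Mjob | Fjob | studytime | failures | famrel | freetime | goout | Dalc | Walc | health | absences |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 18 | 4 | 4 | 3 | 0 | 2 | 0 | 4 | 3 | 4 | 1 | 1 | 3 | 4 |
| 1 | 0 | 0 | 17 | 1 | 1 | 3 | 4 | 2 | 0 | 5 | 3 | 3 | 1 | 1 | 3 | 2 |
| 2 | 0 | 0 | 15 | 1 | 1 | 3 | 4 | 2 | 0 | 4 | 3 | 2 | 2 | 3 | 3 | 6 |
| 3 | 0 | 0 | 15 | 4 | 2 | 1 | 2 | 3 | 0 | 3 | 2 | 2 | 1 | 1 | 5 | 0 |
| 4 | 0 | 0 | 16 | 3 | 3 | 4 | 4 | 2 | 0 | 4 | 3 | 2 | 1 | 2 | 5 | 0 |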


In [41]:
X=data.drop('studytime', axis=1) 
y=data['studytime']

In [42]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=16)
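
The split above is purely random, so the class proportions of `studytime` in the test set can drift from those in the full data. For the classification part, a stratified split is one option to experiment with. The sketch below uses a made-up, imbalanced toy frame (`toy`, `X_toy`, and `y_toy` are hypothetical names) and the `stratify` parameter of `train_test_split`:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced target: 4 rows of class 1, 12 of class 2, 4 of class 3.
toy = pd.DataFrame({
    "feature": range(20),
    "studytime": [1] * 4 + [2] * 12 + [3] * 4,
})
X_toy = toy.drop("studytime", axis=1)
y_toy = toy["studytime"]

# stratify=y_toy keeps the class proportions the same in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=16, stratify=y_toy
)

print(y_tr.value_counts().sort_index())
print(y_te.value_counts().sort_index())
```

Stratification only makes sense for a categorical target, so it applies to the classification split rather than the regression one.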

### 3. Construct and Train the Model (Classification)

In [43]:
# Logistic Regression 

In [44]:
clf = LogisticRegression(max_iter=1250)
clf.fit(X_train, y_train)

### 4. Evaluate the Model (Classification)

In [45]:
# scikit-learn metrics expect the true labels first, then the predictions
classification_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Classification accuracy: {classification_accuracy:.4f}")

Classification accuracy: 0.5462
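
Accuracy alone can hide how the classifier behaves on each class, especially if the `studytime` values are unevenly distributed. Since the exercise is about trying different metrics, the sketch below shows a confusion matrix and a macro-averaged F1 score next to accuracy. The labels are hypothetical stand-ins for `y_test` and `clf.predict(X_test)`, not results from the model above:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hypothetical labels for illustration; studytime takes the values 1-4.
y_true = [1, 2, 2, 3, 2, 1, 4, 2]
y_pred = [2, 2, 2, 2, 2, 1, 2, 2]

accuracy = accuracy_score(y_true, y_pred)

# Macro averaging gives every class equal weight, so classes that are
# never predicted pull the score down. zero_division=0 scores undefined
# per-class values as 0 instead of raising a warning.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3, 4])

print(f"Accuracy: {accuracy:.4f}")
print(f"Macro F1: {macro_f1:.4f}")
print(cm)
```

In this toy case the predictions collapse almost entirely onto class 2, which the confusion matrix makes visible and which macro F1 penalizes more heavily than accuracy does.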


### 5. Preprocess the Data (Regression)

In [51]:
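# Reload the raw data first: the columns were already mapped to integers in
# section 2, and mapping them a second time would turn every value into NaN
data = pd.read_csv('student-por.csv', usecols=usedCol)
data['school'] = data['school'].map({'GP': 0, 'MS': 1})
data['sex'] = data['sex'].map({'F': 0, 'M': 1})
data['Mjob'] = data['Mjob'].map({'teacher': 0, 'health': 1, 'services': 2, 'at_home': 3, 'other': 4})
data['Fjob'] = data['Fjob'].map({'teacher': 0, 'health': 1, 'services': 2, 'at_home': 3, 'other': 4})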

In [52]:
data=data.dropna()

In [53]:
X=data.drop('studytime', axis=1) 
y=data['studytime']

In [54]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=16)

### 6. Construct and Train the Model (Regression)

In [55]:
reg = LinearRegression()
reg.fit(X_train, y_train)

### 7. Evaluate the Model (Regression)

In [56]:
# scikit-learn metrics expect the true values first, then the predictions
regression_mse = mean_squared_error(y_test, reg.predict(X_test))
print(f"Mean Squared Error for regression: {regression_mse:.4f}")

Mean Squared Error for regression: 0.7063
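
MSE is expressed in squared units of the target, which makes it hard to read on a 1-4 scale like `studytime`. The sketch below computes RMSE, MAE, and R² alongside it. The arrays are hypothetical stand-ins for `y_test` and `reg.predict(X_test)`, not output from the model above:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true values and predictions for illustration only.
y_true = np.array([2, 2, 3, 1, 2, 4])
y_pred = np.array([2.5, 1.5, 2.0, 1.5, 2.0, 3.0])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                        # same units as the target
mae = mean_absolute_error(y_true, y_pred)  # less sensitive to large errors
r2 = r2_score(y_true, y_pred)              # share of variance explained

print(f"MSE:  {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE:  {mae:.4f}")
print(f"R^2:  {r2:.4f}")
```

RMSE and MAE can be read directly as "typical error in study-time levels", while R² compares the model against always predicting the mean of the target.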
