## Machine Learning Exercise 3 - Multi-Class Classification

Importing the necessary libraries and loading the dataset from a CSV file.
Also, label-encoding the 'Extracurricular_Activities' column so that it becomes numeric.

In [14]:
import numpy as np
import pandas as pd
from scipy.optimize import minimize
from sklearn.preprocessing import LabelEncoder

# Load the data from a CSV file
data = pd.read_csv('Student_Performance.csv')
label_encoder = LabelEncoder()
data['Extracurricular_Activities'] = label_encoder.fit_transform(data['Extracurricular_Activities'])
data.head()


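|   | Hours_Studied | Previous_Scores | Extracurricular_Activities | Sleep_Hours | Sample_Question_Papers_Practiced | Performance_Index |
|---|---|---|---|---|---|---|
| 0 | 7 | 99 | 1 | 9 | 1 | 91.0 |
| 1 | 4 | 82 | 0 | 4 | 2 | 65.0 |
| 2 | 8 | 51 | 1 | 7 | 2 | 45.0 |
| 3 | 5 | 52 | 1 | 5 | 2 | 36.0 |
| 4 | 7 | 75 | 0 | 8 | 5 | 66.0 |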
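
To make the label-encoding step concrete, here is a minimal sketch of what `LabelEncoder` does to a categorical column. The sample values are hypothetical: it is an assumption that the original column holds the strings "Yes" and "No", since the raw CSV contents are not shown here.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample values; the real column is assumed to contain "Yes"/"No"
activities = ["Yes", "No", "Yes", "Yes", "No"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(activities)

# LabelEncoder sorts the distinct values, so "No" becomes 0 and "Yes" becomes 1
print(list(encoder.classes_))
print(encoded.tolist())

# inverse_transform maps the integer codes back to the original strings
print(encoder.inverse_transform(encoded).tolist())
```

Keeping the fitted `encoder` around is useful because `classes_` records which string each integer code stands for.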


Defining a function that maps the performance index to four levels: 0 (below 50), 1 (50 up to 70), 2 (70 up to 90), and 3 (90 and above).
Applying this function to create a new column 'Performance Level' in the dataset, which serves as the classification target.

In [15]:
# Define the performance levels
def performance_level(index):
    if index >= 90:
        return 3
    elif 70 <= index < 90:
        return 2
    elif 50 <= index < 70:
        return 1
    else:
        return 0

data['Performance Level'] = data['Performance_Index'].apply(performance_level)


# Split the data into features (X) and the target variable (y)
X = data.drop(['Performance_Index', 'Performance Level'], axis=1)
y = data['Performance Level']

print("Shape of X:", X.shape)
print("Shape of Y:", y.shape)




Shape of X: (999, 5)
Shape of Y: (999,)
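
The same binning can also be done without a Python-level loop. The sketch below compares `performance_level` with `np.digitize` on a few hypothetical scores (not taken from the dataset) that include the boundary values 50, 70 and 90.

```python
import numpy as np
import pandas as pd

def performance_level(index):
    # Same thresholds as in the notebook cell
    if index >= 90:
        return 3
    elif 70 <= index < 90:
        return 2
    elif 50 <= index < 70:
        return 1
    else:
        return 0

# Hypothetical scores, chosen to hit every level and every boundary
scores = pd.Series([91.0, 65.0, 45.0, 36.0, 66.0, 90.0, 70.0, 50.0])

# With the default right=False, np.digitize counts how many bin edges are <= each value,
# which reproduces the half-open intervals used by performance_level
levels_vectorized = np.digitize(scores.to_numpy(), bins=[50, 70, 90])
levels_apply = scores.apply(performance_level).to_numpy()

print(levels_vectorized.tolist())
print(bool((levels_vectorized == levels_apply).all()))
```

For a dataset of 999 rows the speed difference is negligible; the vectorized form mainly makes the bin edges explicit in one place.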


Defining the sigmoid function, the regularized cost function and its gradient for logistic regression, together with the one_vs_all training routine.
These functions will be used in the optimization process for parameter estimation.

In [16]:
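def sigmoid(z):
    # Logistic function: maps any real z into the interval (0, 1)
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y, learningRate):
    # Regularized cross-entropy cost. Despite its name, learningRate is the
    # L2 regularization strength (lambda), not an optimizer step size.
    m = len(y)
    h = sigmoid(X.dot(theta))
    # theta[1:] leaves theta[0] unpenalized, which assumes column 0 of X is an
    # intercept column of ones; X as built above has no such column.
    J = (1 / m) * (-y.dot(np.log(h)) - (1 - y).dot(np.log(1 - h))) + (learningRate / (2 * m)) * np.sum(theta[1:]**2)
    return J

def gradient(theta, X, y, learningRate):
    # Gradient of cost with respect to theta; theta[0] is again left unpenalized
    m = len(y)
    h = sigmoid(X.dot(theta))
    grad = (1 / m) * X.T.dot(h - y) + (learningRate / m) * np.r_[0, theta[1:]]
    return grad


def one_vs_all(X, y, num_labels, learning_rate):
    # Train one binary classifier per label and stack the parameters row-wise
    rows, params = X.shape

    all_theta = np.zeros((num_labels, params))

    # NOTE: this loop trains classifiers for labels 1..num_labels, while
    # performance_level() produces labels 0..num_labels-1. Label 0 gets no
    # classifier, and the last classifier is fitted to an all-zero target.
    for i in range(1, num_labels + 1):
        theta = np.zeros(params)
        y_i = (y == i).astype(int)  # 1 for instances of label i, 0 otherwise

        fmin = minimize(fun=cost, x0=theta, args=(X, y_i, learning_rate), method='TNC', jac=gradient)
        all_theta[i-1, :] = fmin.x  # row i-1 holds the classifier for label i

    return all_theta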
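
For comparison, here is a minimal sketch of a one-vs-all setup for zero-based labels. It is not the notebook's code: it runs on synthetic stand-in data, and the name `lam`, the intercept column, the feature standardization and the clipping inside the cost are all assumptions introduced for the illustration.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, lam):
    # Regularized cross-entropy; lam is the L2 strength, theta[0] is the intercept
    m = len(y)
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)  # keep log() finite
    data_term = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    return data_term + lam / (2 * m) * np.sum(theta[1:] ** 2)

def gradient(theta, X, y, lam):
    m = len(y)
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / m + lam / m * np.r_[0, theta[1:]]

def one_vs_all(X, y, num_labels, lam):
    # Row k of all_theta holds the classifier for label k (labels 0..num_labels-1)
    all_theta = np.zeros((num_labels, X.shape[1]))
    for label in range(num_labels):
        y_label = (y == label).astype(float)
        result = minimize(fun=cost, x0=np.zeros(X.shape[1]),
                          args=(X, y_label, lam), method='TNC', jac=gradient)
        all_theta[label] = result.x
    return all_theta

def predict_all(X, all_theta):
    # sigmoid is monotonic, so the argmax of the linear scores is the same
    return np.argmax(X @ all_theta.T, axis=1)

# Synthetic stand-in data: four Gaussian clusters, one per label (not the student data)
rng = np.random.default_rng(0)
centers = np.array([[-3.0, -3.0], [-3.0, 3.0], [3.0, -3.0], [3.0, 3.0]])
y_demo = rng.integers(0, 4, size=400)
features = centers[y_demo] + rng.normal(size=(400, 2))

# Standardize the features and prepend the intercept column of ones
features = (features - features.mean(axis=0)) / features.std(axis=0)
X_demo = np.c_[np.ones(len(features)), features]

all_theta_demo = one_vs_all(X_demo, y_demo, num_labels=4, lam=1.0)
y_hat = predict_all(X_demo, all_theta_demo)
accuracy_demo = np.mean(y_hat == y_demo)
print('Training accuracy on synthetic data = {:.2f}%'.format(accuracy_demo * 100))
```

The intercept column is what makes leaving `theta[0]` unpenalized meaningful, and looping over `range(num_labels)` gives every label, including 0, its own classifier.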

Training the one-vs-all classifiers with the one_vs_all function defined in the previous cell.

This function trains multiple logistic regression classifiers, one for each class. The call below uses 4 labels and a regularization strength of 1.

In [None]:
all_theta = one_vs_all(X, y, 4, 1)
print(all_theta)

[[-5.10029318e-02  7.05210761e-03 -2.73277158e-01 -1.09485366e-01
  -3.56906280e-02]
 [ 2.32052468e-02  4.15967959e-02 -4.97399210e-01 -5.13891195e-01
  -1.28854231e-01]
 [ 1.98372924e-01  1.06322355e-02 -7.05587317e-01 -7.66385506e-01
  -1.39277216e-01]
 [-1.67463578e+01 -5.07185466e-04 -4.13269000e-06 -4.75036495e-05
  -3.17465311e-05]]

Function to predict the class for each instance in the dataset using the trained model.

In [17]:
def predict_all(X, all_theta):
    # Work on a plain ndarray so that @ is ordinary matrix multiplication
    X = np.asarray(X)

    # Class probabilities: one row per instance, one column per classifier
    h = sigmoid(X @ all_theta.T)

    # Index of the most probable classifier for each instance
    h_argmax = np.argmax(h, axis=1)
    # Row k of all_theta was trained for label k + 1 in one_vs_all,
    # so shift the zero-based index to get the label
    h_argmax = h_argmax + 1

    return h_argmax
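
The `+ 1` in predict_all maps row index k back to label k + 1, matching how one_vs_all stores its classifiers. The sketch below uses made-up scores to show how much this mapping matters in a different setting, assumed here only for illustration: zero-based labels where the classifier in row k belongs to label k.

```python
import numpy as np

# Hypothetical class scores: one row per instance, one column per classifier
scores = np.array([
    [0.9, 0.1, 0.2, 0.1],
    [0.2, 0.8, 0.1, 0.3],
    [0.1, 0.3, 0.7, 0.2],
    [0.2, 0.1, 0.3, 0.6],
])
y_true = np.array([0, 1, 2, 3])  # zero-based labels, as performance_level produces

rows = np.argmax(scores, axis=1)  # index of the winning classifier per instance
pred_zero_based = rows            # right when row k was trained for label k
pred_one_based = rows + 1         # right only when row k was trained for label k + 1

acc_zero_based = np.mean(pred_zero_based == y_true)
acc_one_based = np.mean(pred_one_based == y_true)
print(acc_zero_based, acc_one_based)
```

Whichever convention is used, the training loop and the prediction step have to agree on it, and both have to agree with the label values actually present in `y`.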


Predicting the labels for the dataset and calculating the accuracy of the model.

In [18]:
y_pred = predict_all(X, all_theta)
y_pred = np.array(y_pred).ravel()

accuracy = np.mean(y_pred == y) * 100
print('Accuracy = {:.2f}%'.format(accuracy))

Accuracy = 34.03%


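The resulting model accuracy was 34.03%, indicating low model performance in the current configuration. One likely contributor is visible in the code: one_vs_all and predict_all work with labels 1 to 4, while performance_level assigns levels 0 to 3, so no instance of level 0 can be predicted correctly and the fourth classifier is trained for a label that never occurs. The features are also passed in without an intercept column.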

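Using scikit-learn's logistic regression model for comparison. Note that the cell below fits and scores the model against y_pred, the predictions of the hand-written classifier, not against the true labels y.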

In [19]:
from sklearn import linear_model
model = linear_model.LogisticRegression(solver='newton-cg')
model.fit(X, y_pred)


print(f'acc: {model.score(X, y_pred)}')


acc: 0.992992992992993


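Additionally, a logistic regression model from the scikit-learn library was used and reached a score of 99.30%. Because it was fitted and scored on y_pred, using the same data it was trained on, this figure shows how closely it reproduces the hand-written model's predictions on data it has already seen, not how well it predicts the true performance levels.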
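
To estimate how well a model predicts the true classes, it would be fitted on the true labels and scored on data held out from training. The following is a minimal sketch of that workflow on synthetic stand-in data from `make_classification`; the split ratio, the scaling step and all `_demo` names are assumptions for illustration, not part of the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 5 features and 4 classes (not the student data)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=5, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Hold out 25% of the rows; stratify keeps the class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0,
)

# Scale the features, then fit the same solver the notebook uses
model = make_pipeline(StandardScaler(), LogisticRegression(solver='newton-cg'))
model.fit(X_train, y_train)  # fit on the true labels, not on another model's predictions

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f'train acc: {train_acc:.3f}, test acc: {test_acc:.3f}')
```

The gap between the training and test scores gives a rough indication of how much of the performance carries over to unseen data.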