# About Dataset
Dataset Column Descriptions

This dataset includes various features that were known at the time of student enrollment. Below is a description of each column in the dataset:


**Marital Status:** The marital status of the student (e.g., single, married, divorced).

**Application Mode:** The mode or type of application the student used to enroll in the course.

**Application Order:** Indicates the order in which the student applied for the course. For example, whether it was the student’s first, second, or third choice.

**Course:** The course or degree program the student is enrolled in (e.g., Computer Science or Engineering).

**Daytime/Evening Attendance:** Whether the student attends classes during the day or in the evening.

**Previous Qualification:** The type of academic qualification the student had before enrolling in the course (e.g., high school diploma, vocational training).

**Previous Qualification (Grade):** The final grade or score associated with the student's previous qualification.

**Nationality:** The nationality of the student.

**Mother's Qualification:** The highest academic qualification attained by the student's mother.

**Father's Qualification:** The highest academic qualification attained by the student's father.

These features capture demographic, academic, and socio-economic factors. The models below use them to predict each student's academic outcome, stored in the `Target` column.

**Link to dataset:**

https://www.kaggle.com/datasets/syedfaizanalii/predict-students-dropout-and-academic-success

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import (
    accuracy_score, auc, confusion_matrix, f1_score,
    precision_score, recall_score, roc_curve,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC


In [2]:
# The CSV is semicolon-delimited, so the separator must be passed explicitly.
df = pd.read_csv("Predict_Student_Dropout_and_Academic_Success.csv", sep=";")
# Drop any rows that contain missing values.
df = df.dropna()
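Before modelling, it can help to confirm that the file parsed as expected. The sketch below uses a tiny made-up frame in place of the real CSV (the `grade` column and all values are hypothetical; only the `Target` column name comes from this notebook) and a hypothetical helper, `summarize`, that reports the shape, the number of missing cells, and the class balance.

```python
import pandas as pd


def summarize(frame: pd.DataFrame, target: str = "Target") -> dict:
    """Return basic sanity checks for a loaded dataset (hypothetical helper)."""
    return {
        "shape": frame.shape,
        # Total number of missing cells across all columns.
        "missing": int(frame.isna().sum().sum()),
        # Fraction of rows per class, to spot class imbalance.
        "class_share": frame[target].value_counts(normalize=True).to_dict(),
    }


# Made-up stand-in for the real CSV; the values are illustrative only.
demo = pd.DataFrame(
    {
        "grade": [120.5, 131.0, None, 142.3],
        "Target": ["Dropout", "Graduate", "Graduate", "Enrolled"],
    }
)

summary = summarize(demo)
print(summary)
```

On the real data, `summarize(df)` could be called right after `pd.read_csv`. A frame with a single column usually means the wrong separator was used.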




# SVM

In [6]:
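# Features: every column except the last one (the label column).
X = df.drop(df.columns[-1], axis=1)

# Encode the string class labels in "Target" as integers.
le = LabelEncoder()
df["Target"] = le.fit_transform(df["Target"])
y = df["Target"]

y.unique()  # no visible effect here: a notebook only displays a cell's last expression

# Note: train_size=0.3 trains on 30% of the rows and tests on the remaining 70%.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.3, random_state=32)

# Fit the scaler on the training split only, then apply the same transform to the test split.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)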

param_grid_svm = {
    # 'C': [0.1, 1, 10, 100],
    'C': [0.1, 1],
    # 'gamma': [1, 0.1, 0.01, 0.001],
    'gamma': [1, 0.1],
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
    'class_weight': ['balanced']
}

# Hyperparameter search over the grid above (GridSearchCV uses 5-fold cross-validation by default)
svm = GridSearchCV(estimator=SVC(), param_grid=param_grid_svm)

svm.fit(X_train, y_train)

y_pred_svm = svm.predict(X_test)


print("SVM metrics")
print("Accuracy:", accuracy_score(y_test, y_pred_svm))
# With average="micro" on single-label multiclass data, F1, recall, and precision all equal accuracy.
print("F1 score:", f1_score(y_test, y_pred_svm, average="micro"))
print("Recall score:", recall_score(y_test, y_pred_svm, average="micro"))
print("Precision score:", precision_score(y_test, y_pred_svm, average="micro"))
print("___")
print("Best params:")
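# Unpacking the dict lists only the parameter names; print(svm.best_params_) shows the selected values.
print([*svm.best_params_])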



SVM metrics
Accuracy: 0.7384565708750404
F1 score: 0.7384565708750404
Recall score: 0.7384565708750404
Precision score: 0.7384565708750404
___
Best params:
['C', 'class_weight', 'gamma', 'kernel']
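
The four scores above are identical because, for single-label multiclass predictions, micro-averaged precision, recall, and F1 all reduce to accuracy. A minimal sketch on made-up labels (not this notebook's data) shows the equality and contrasts it with macro averaging, which weights every class equally and so exposes weak performance on minority classes.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up labels for three classes: class 0 dominates, class 2 is never predicted.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_hat = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]

acc = accuracy_score(y_true, y_hat)
micro = {
    "precision": precision_score(y_true, y_hat, average="micro"),
    "recall": recall_score(y_true, y_hat, average="micro"),
    "f1": f1_score(y_true, y_hat, average="micro"),
}
# zero_division=0 scores the never-predicted class as 0 instead of emitting a warning.
macro_f1 = f1_score(y_true, y_hat, average="macro", zero_division=0)

print("accuracy:", acc)
print("micro:", micro)
print("macro F1:", macro_f1)
```

Passing `average="macro"` (or `average=None` for per-class values) to the metric calls in the SVM cell would give a more informative picture than the micro average.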


# Logistic regression

In [None]:

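# Baseline logistic regression on the scaled features.
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)
lr.score(X_train, y_train)  # training accuracy; not displayed because it is not the cell's last expression

# Sweep the inverse regularization strength C and record the training accuracy for each value.
scores = []
C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]

for choice in C:
    lr.set_params(C=choice)
    lr.fit(X_train, y_train)
    scores.append(lr.score(X_train, y_train))

# print(max(scores))

# After the loop, lr is the model fitted with the last value tried (C=2).
y_pred = lr.predict(X_test)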
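
The loop above scores each `C` on the training data and leaves `lr` fitted with the last value tried. One alternative, sketched here on synthetic data from `make_classification` (a stand-in, not the student dataset), is to choose `C` by cross-validation and then score the selected model on the held-out test set. Placing the scaler inside a `Pipeline` means each fold is scaled using only its own training part. The step names `scale` and `clf` and the `*_demo` variables are arbitrary choices for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for the student dataset.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=32
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=32
)

pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Same candidate values for C as in the loop above.
grid = GridSearchCV(
    pipe,
    param_grid={"clf__C": [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_tr, y_tr)

best_C = grid.best_params_["clf__C"]
test_acc = accuracy_score(y_te, grid.predict(X_te))
print("Selected C:", best_C)
print("Mean CV accuracy:", grid.best_score_)
print("Test accuracy:", test_acc)
```

With the default `refit=True`, `grid.predict` uses the pipeline refitted on the whole training split with the selected `C`, so the test set plays no part in model selection.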



