## Student Dropout and Academic Success Prediction

### 1. Problem Statement


Student dropout is a persistent problem across learning institutions. Its consequences reach well beyond the individual student: institutions lose tuition revenue and funding, governments see diminished returns on public investment in education, and societies forfeit the long-term economic and social contributions of a skilled graduate.

Students who drop out typically show early warning signals in academic performance, attendance patterns, engagement levels, and socio-economic factors. However, these signals are often identified too late or assessed subjectively, which limits the institution's ability to intervene in time.

This project proposes using supervised machine learning to build a classification model that identifies high-risk students while intervention is still possible.

Given a set of features collected at the time of student enrolment and at the end of the first two academic semesters, we aim to train a model that can accurately predict one of three possible student outcomes:

- Dropout: students who leave the institution before completing their programme.
- Enrolled: students who remain enrolled but have not yet graduated.
- Graduate: students who have successfully completed their degree.

The central research question is: can we identify students at risk of dropping out early enough — and with sufficient confidence — to enable timely, targeted interventions by academic advisors and student support services?
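
To make the three-class formulation above concrete, the sketch below encodes the outcome labels as integers and derives a binary "at risk" flag for the Dropout class. The series `target`, the mapping `LABEL_TO_CODE`, and the integer ordering are illustrative assumptions, not part of the dataset or of the modelling code in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for the real target column; the values mirror the
# three outcome labels described above.
target = pd.Series(
    ["Dropout", "Graduate", "Enrolled", "Graduate", "Dropout"], name="Target"
)

# Assumed integer encoding; the ordering is an illustrative choice only.
LABEL_TO_CODE = {"Dropout": 0, "Enrolled": 1, "Graduate": 2}
encoded = target.map(LABEL_TO_CODE)

# Binary "at risk" view of the same labels: 1 for Dropout, 0 otherwise.
at_risk = (target == "Dropout").astype(int)

print(encoded.tolist())
print(at_risk.tolist())
```

The binary view may be useful when the focus is on dropout recall, while the three-class encoding keeps the full problem as stated.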


### 2. Importing Libraries & Data

In [None]:
# import all basic libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as mn


In [1]:
# Import the dataset from the UCI Machine Learning Repository

from ucimlrepo import fetch_ucirepo 
  
# fetch dataset 
predict_students_dropout_and_academic_success = fetch_ucirepo(id=697) 
  
# data (as pandas dataframes) 
X = predict_students_dropout_and_academic_success.data.features 
y = predict_students_dropout_and_academic_success.data.targets 
  
# metadata 
print(predict_students_dropout_and_academic_success.metadata) 
  
# variable information 
print(predict_students_dropout_and_academic_success.variables) 


{'uci_id': 697, 'name': "Predict Students' Dropout and Academic Success", 'repository_url': 'https://archive.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success', 'data_url': 'https://archive.ics.uci.edu/static/public/697/data.csv', 'abstract': "A dataset created from a higher education institution (acquired from several disjoint databases) related to students enrolled in different undergraduate degrees, such as agronomy, design, education, nursing, journalism, management, social service, and technologies.\nThe dataset includes information known at the time of student enrollment (academic path, demographics, and social-economic factors) and the students' academic performance at the end of the first and second semesters. \nThe data is used to build classification models to predict students' dropout and academic sucess. The problem is formulated as a three category classification task, in which there is a strong imbalance towards one of the classes.", 'area': 'Social Sc

### 3. Exploratory Data Analysis

In [3]:
# Before starting EDA, preview the first rows of the features as a first step toward basic data cleaning and preprocessing

X.head()

Unnamed: 0,Marital Status,Application mode,Application order,Course,Daytime/evening attendance,Previous qualification,Previous qualification (grade),Nacionality,Mother's qualification,Father's qualification,...,Curricular units 1st sem (without evaluations),Curricular units 2nd sem (credited),Curricular units 2nd sem (enrolled),Curricular units 2nd sem (evaluations),Curricular units 2nd sem (approved),Curricular units 2nd sem (grade),Curricular units 2nd sem (without evaluations),Unemployment rate,Inflation rate,GDP
0,1,17,5,171,1,1,122.0,1,19,12,...,0,0,0,0,0,0.0,0,10.8,1.4,1.74
1,1,15,1,9254,1,1,160.0,1,1,3,...,0,0,6,6,6,13.666667,0,13.9,-0.3,0.79
2,1,1,5,9070,1,1,122.0,1,37,37,...,0,0,6,0,0,0.0,0,10.8,1.4,1.74
3,1,17,2,9773,1,1,122.0,1,38,37,...,0,0,6,10,5,12.4,0,9.4,-0.8,-3.12
4,2,39,1,8014,0,1,100.0,1,37,38,...,0,0,6,6,6,13.0,0,13.9,-0.3,0.79
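
The preview above only shows the first five rows. A basic cleaning pass would usually also count missing values, duplicate rows, and non-numeric columns. The following is a minimal sketch of such a check; the helper `summarize_quality` and the toy frame `sample` are hypothetical, and the toy values are partly invented so that the example has a missing cell and a duplicate row to detect.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> dict:
    """Return basic data-quality counts for a DataFrame."""
    return {
        "n_rows": len(df),
        "n_columns": df.shape[1],
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "non_numeric_columns": df.select_dtypes(exclude="number").columns.tolist(),
    }


# Toy frame reusing three column names from the preview; values are
# illustrative and include one missing cell and one duplicated row.
sample = pd.DataFrame(
    {
        "Marital Status": [1, 1, 1, 2],
        "Course": [171, 9254, 9254, 8014],
        "Previous qualification (grade)": [122.0, 160.0, 160.0, None],
    }
)

report = summarize_quality(sample)
print(report)
```

On the real data the same helper could be called as `summarize_quality(X)`, assuming `X` is the feature DataFrame loaded earlier.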


In [4]:
y.head()

Unnamed: 0,Target
0,Dropout
1,Graduate
2,Dropout
3,Graduate
4,Graduate
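
The dataset abstract notes a strong imbalance towards one of the classes, so a natural next check is the class distribution of the target. The sketch below shows one way to do this with `value_counts`. The frame `y_sample` is a toy stand-in: its first five labels follow the preview above and the sixth is invented so that all three classes appear.

```python
import pandas as pd

# Toy stand-in for the target frame `y`; not the real data.
y_sample = pd.DataFrame(
    {"Target": ["Dropout", "Graduate", "Dropout", "Graduate", "Graduate", "Enrolled"]}
)

# Absolute counts and relative shares per class.
counts = y_sample["Target"].value_counts()
shares = y_sample["Target"].value_counts(normalize=True).round(3)

# Combine both views into one small summary table, aligned on class label.
distribution = pd.DataFrame({"count": counts, "share": shares})
print(distribution)
```

Replacing `y_sample` with the real `y` would give the actual class shares, which can then inform choices such as stratified splitting or class weighting.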
