# Introduction

Education underpins personal and professional growth, giving individuals the knowledge and skills they need in a rapidly changing world. Higher education institutions shape the future of individuals and societies by offering opportunities for learning and skill development. Despite this role, student dropout and academic underperformance remain significant challenges worldwide.

Student dropout has serious social and economic consequences, including lost potential and talent, a smaller skilled workforce, and greater social inequality. Dropout rates have long concerned higher education institutions, policymakers, and educators, and many efforts have been made to reduce them. The problem nevertheless persists, which highlights the need for further research into the predictors of student dropout and academic success.

# Problem Statement

As outlined above, student dropout remains a significant challenge for higher education institutions worldwide, with serious social and economic consequences. Understanding the factors that contribute to student dropout and academic success is therefore crucial for educational institutions, policymakers, and educators.

To address this challenge, a comprehensive dataset has been collected that includes demographic data, socio-economic factors, and academic performance information of students enrolled in various undergraduate degrees offered at a higher education institution. The dataset also includes information about the courses chosen by the students, their application mode, marital status, and other relevant information available at the time of enrollment.

By analyzing this dataset, we can identify the predictors of student dropout and academic success across a wide range of disciplines offered at a higher education institution. This analysis can provide insight into which factors are associated with students staying in school or abandoning their studies, which in turn can inform interventions that support student retention and success.



![Screenshot%202023-03-09%20at%206.02.48%20AM.png](attachment:Screenshot%202023-03-09%20at%206.02.48%20AM.png)


# Import Libraries

In [10]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load Dataset

In [11]:
# Load the dataset
df = pd.read_csv('dataset.csv')

## Dataset Explanation

This dataset provides information about undergraduate students enrolled in various disciplines offered at a higher education institution. The dataset contains demographic data, socio-economic factors, and academic performance information, as well as information about the courses chosen by the students, their application mode, marital status, and other relevant information available at the time of enrollment.

The dataset includes categorical variables such as marital status, application mode, course, daytime/evening attendance, previous qualification, nationality, mother's qualification, father's qualification, mother's occupation, father's occupation, displaced, educational special needs, debtor, tuition fees up to date, gender, scholarship holder, and international. Most of these are stored as integer codes rather than text, which is why `Target` is the only column with the `object` dtype in the analysis below. The dataset also contains numerical variables such as age at enrollment, the order in which the student applied, and the number of curricular units credited, enrolled, evaluated, and approved in the first and second semesters, along with the semester grades.
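Because the categorical variables described above appear to be integer-coded, pandas treats them as numbers in `describe()` and `corr()`. The following is a minimal sketch, on made-up toy data, of how such columns could be converted to the `category` dtype so that they are separated from genuinely numerical columns. The helper `as_categorical` and the list `categorical_cols` are hypothetical and not part of the original notebook; the list would need to be checked against the dataset's documentation.

```python
import pandas as pd

# Toy stand-in for the real data: column names mirror the preview output,
# but the values are made up for illustration only.
toy = pd.DataFrame({
    "Marital status": [1, 1, 2, 1],
    "Course": [2, 11, 5, 15],
    "Application order": [5, 1, 5, 2],
    "Target": ["Dropout", "Graduate", "Dropout", "Graduate"],
})

# Hypothetical list of categorical columns (an assumption, not taken
# from the dataset's documentation).
categorical_cols = ["Marital status", "Course", "Target"]


def as_categorical(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of `frame` with `columns` converted to the category dtype."""
    out = frame.copy()
    for col in columns:
        out[col] = out[col].astype("category")
    return out


typed = as_categorical(toy, categorical_cols)

# Only genuinely numerical columns remain after the conversion.
numeric_cols = typed.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

With the real data, the same call would be `as_categorical(df, categorical_cols)`; summary statistics such as the mean of `Course` are then no longer computed by accident.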

This dataset can be used to analyze the possible predictors of student dropout and academic success. By understanding the factors that contribute to student retention and success across a wide range of disciplines, educational institutions, policymakers, and educators can develop strategies to promote student retention and success, leading to positive social and economic outcomes. The dataset also provides information about economic factors such as unemployment rate, inflation rate, and GDP from the region, which can help us further understand how economic factors play into student dropout rates or academic success outcomes.

# Exploratory Data Analysis

## Basic Analysis

In [12]:
# View the first 5 rows of the dataset
display(df.head())

# View the shape of the dataset
display(df.shape)

# View summary statistics of numerical columns
display(df.describe())

# Count the number of unique values in each categorical column
for column in df.select_dtypes(include=['object']):
    display(f"{column}: {df[column].nunique()}")

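# Calculate the correlation matrix between numerical columns
# (numeric_only=True skips the text-valued Target column, which would
# otherwise raise an error in pandas 2.0 and later)
corr_matrix = df.corr(numeric_only=True)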
display(corr_matrix)


Unnamed: 0,Marital status,Application mode,Application order,Course,Daytime/evening attendance,Previous qualification,Nacionality,Mother's qualification,Father's qualification,Mother's occupation,...,Curricular units 2nd sem (credited),Curricular units 2nd sem (enrolled),Curricular units 2nd sem (evaluations),Curricular units 2nd sem (approved),Curricular units 2nd sem (grade),Curricular units 2nd sem (without evaluations),Unemployment rate,Inflation rate,GDP,Target
0,1,8,5,2,1,1,1,13,10,6,...,0,0,0,0,0.0,0,10.8,1.4,1.74,Dropout
1,1,6,1,11,1,1,1,1,3,4,...,0,6,6,6,13.666667,0,13.9,-0.3,0.79,Graduate
2,1,1,5,5,1,1,1,22,27,10,...,0,6,0,0,0.0,0,10.8,1.4,1.74,Dropout
3,1,8,2,15,1,1,1,23,27,6,...,0,6,10,5,12.4,0,9.4,-0.8,-3.12,Graduate
4,2,12,1,3,0,1,1,22,28,10,...,0,6,6,6,13.0,0,13.9,-0.3,0.79,Graduate


(4424, 35)

Unnamed: 0,Marital status,Application mode,Application order,Course,Daytime/evening attendance,Previous qualification,Nacionality,Mother's qualification,Father's qualification,Mother's occupation,...,Curricular units 1st sem (without evaluations),Curricular units 2nd sem (credited),Curricular units 2nd sem (enrolled),Curricular units 2nd sem (evaluations),Curricular units 2nd sem (approved),Curricular units 2nd sem (grade),Curricular units 2nd sem (without evaluations),Unemployment rate,Inflation rate,GDP
count,4424.0,4424.0,4424.0,4424.0,4424.0,4424.0,4424.0,4424.0,4424.0,4424.0,...,4424.0,4424.0,4424.0,4424.0,4424.0,4424.0,4424.0,4424.0,4424.0,4424.0
mean,1.178571,6.88698,1.727848,9.899186,0.890823,2.53142,1.254521,12.322107,16.455244,7.317812,...,0.137658,0.541817,6.232143,8.063291,4.435805,10.230206,0.150316,11.566139,1.228029,0.001969
std,0.605747,5.298964,1.313793,4.331792,0.311897,3.963707,1.748447,9.026251,11.0448,3.997828,...,0.69088,1.918546,2.195951,3.947951,3.014764,5.210808,0.753774,2.66385,1.382711,2.269935
min,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.6,-0.8,-4.06
25%,1.0,1.0,1.0,6.0,1.0,1.0,1.0,2.0,3.0,5.0,...,0.0,0.0,5.0,6.0,2.0,10.75,0.0,9.4,0.3,-1.7
50%,1.0,8.0,1.0,10.0,1.0,1.0,1.0,13.0,14.0,6.0,...,0.0,0.0,6.0,8.0,5.0,12.2,0.0,11.1,1.4,0.32
75%,1.0,12.0,2.0,13.0,1.0,1.0,1.0,22.0,27.0,10.0,...,0.0,0.0,7.0,10.0,6.0,13.333333,0.0,13.9,2.6,1.79
max,6.0,18.0,9.0,17.0,1.0,17.0,21.0,29.0,34.0,32.0,...,12.0,19.0,23.0,33.0,20.0,18.571429,12.0,16.2,3.7,3.51


'Target: 3'

Unnamed: 0,Marital status,Application mode,Application order,Course,Daytime/evening attendance,Previous qualification,Nacionality,Mother's qualification,Father's qualification,Mother's occupation,...,Curricular units 1st sem (without evaluations),Curricular units 2nd sem (credited),Curricular units 2nd sem (enrolled),Curricular units 2nd sem (evaluations),Curricular units 2nd sem (approved),Curricular units 2nd sem (grade),Curricular units 2nd sem (without evaluations),Unemployment rate,Inflation rate,GDP
Marital status,1.0,0.224855,-0.125854,0.018925,-0.274939,0.120925,-0.020722,0.185522,0.128326,0.069734,...,0.034711,0.062831,0.039026,0.022784,-0.043739,-0.071506,0.020426,-0.020338,0.008761,-0.027003
Application mode,0.224855,1.0,-0.246497,-0.085116,-0.268616,0.433028,-0.00136,0.092867,0.072798,0.033489,...,0.040255,0.228973,0.127461,0.164992,-0.065203,-0.104424,0.042009,0.091567,-0.019613,-0.014563
Application order,-0.125854,-0.246497,1.0,0.118928,0.158657,-0.199029,-0.029385,-0.061719,-0.049936,-0.046591,...,-0.031699,-0.125815,0.028878,-0.055089,0.071793,0.055517,-0.015757,-0.098419,-0.011133,0.030201
Course,0.018925,-0.085116,0.118928,1.0,-0.070232,-0.158382,-0.004761,0.058909,0.045659,0.029672,...,-0.060483,-0.12039,0.185879,0.049236,0.12,0.178997,-0.013984,-0.050116,0.028775,-0.012518
Daytime/evening attendance,-0.274939,-0.268616,0.158657,-0.070232,1.0,-0.103022,0.024433,-0.195346,-0.137769,-0.037986,...,0.04563,-0.111953,0.000371,0.01461,0.034022,0.050493,-0.004229,0.061974,-0.024043,0.022929
Previous qualification,0.120925,0.433028,-0.199029,-0.158382,-0.103022,1.0,-0.038997,0.018868,0.013152,0.00619,...,0.018276,0.138463,0.05645,0.101501,-0.037265,-0.038765,0.024186,0.096914,-0.056388,0.053968
Nacionality,-0.020722,-0.00136,-0.029385,-0.004761,0.024433,-0.038997,1.0,-0.043847,-0.088892,0.044123,...,0.026203,-0.000747,-0.020103,-0.018023,-0.014142,-0.005409,-0.012052,-0.006013,-0.012331,0.044563
Mother's qualification,0.185522,0.092867,-0.061719,0.058909,-0.195346,0.018868,-0.043847,1.0,0.524529,0.295178,...,0.003293,0.036986,0.03307,0.018874,-0.013161,-0.028472,0.020364,-0.106107,0.056653,-0.079664
Father's qualification,0.128326,0.072798,-0.049936,0.045659,-0.137769,0.013152,-0.088892,0.524529,1.0,0.207067,...,-0.017785,0.041695,0.023635,0.009471,0.006052,-0.006508,-0.008493,-0.075417,0.056661,-0.0702
Mother's occupation,0.069734,0.033489,-0.046591,0.029672,-0.037986,0.00619,0.044123,0.295178,0.207067,1.0,...,-0.012569,-0.002057,0.009287,0.011546,0.022309,0.03523,-0.004903,-0.011772,0.015014,0.09188
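The correlation matrix above leaves out `Target` because it is text-valued. One possible way to relate the numerical columns to the outcome is to encode dropout as a 0/1 indicator and compute the Pearson correlation of each column with it. The sketch below uses made-up toy values, and the helper `correlation_with_dropout` is hypothetical rather than part of the original notebook.

```python
import pandas as pd

# Toy data for illustration only; column names follow the preview output.
toy = pd.DataFrame({
    "Curricular units 2nd sem (approved)": [0, 6, 0, 5, 6],
    "Curricular units 2nd sem (grade)": [0.0, 13.67, 0.0, 12.4, 13.0],
    "Target": ["Dropout", "Graduate", "Dropout", "Graduate", "Graduate"],
})


def correlation_with_dropout(frame: pd.DataFrame, target_col: str = "Target") -> pd.Series:
    """Pearson correlation of each numerical column with a 0/1 dropout indicator."""
    numeric = frame.select_dtypes(include="number")
    is_dropout = (frame[target_col] == "Dropout").astype(int)
    # Sort ascending so the most negative correlations come first.
    return numeric.corrwith(is_dropout).sort_values()


ranked = correlation_with_dropout(toy)
print(ranked)
```

Note that for integer-coded categorical columns such as `Course`, a Pearson coefficient has no natural interpretation, so this ranking is most meaningful for genuinely numerical columns.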


In [13]:
df['Target'].unique()

array(['Dropout', 'Graduate', 'Enrolled'], dtype=object)
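Since `Target` has three classes, it can help to check how balanced they are before any modeling. A minimal sketch on a made-up toy series (the counts are not those of the real dataset):

```python
import pandas as pd

# Toy target values for illustration only.
target = pd.Series(
    ["Dropout", "Graduate", "Dropout", "Graduate", "Graduate", "Enrolled"],
    name="Target",
)

# Absolute counts and relative shares per class.
counts = target.value_counts()
shares = target.value_counts(normalize=True).round(3)

summary = pd.DataFrame({"count": counts, "share": shares})
print(summary)
```

On the real data, the equivalent calls would be `df['Target'].value_counts()` and `df['Target'].value_counts(normalize=True)`.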

# Questions 

- Identify the strongest predictors of student dropout and suggest interventions to reduce the dropout rate.
- Analyze the correlation between marital status and academic success, considering other factors such as gender and age.
- Compare the academic performance of scholarship holders and non-scholarship holders, considering the course taken by the students.
- Evaluate the impact of special educational needs on academic success and suggest support measures for students with special needs.
- Analyze the relationship between the number of curricular units credited/enrolled/evaluated/approved and academic success.
- Investigate the effect of previous qualifications on student performance in higher education, considering other factors such as age, gender, and course.
- Analyze the relationship between the student's age at enrollment and academic success, considering other factors such as gender and course.
- Evaluate the impact of tuition fees payment status on academic performance.
- Investigate the impact of displacement on academic success and suggest support measures for displaced students.
- Analyze the relationship between the student's gender and academic performance, considering other factors such as course, age, and marital status.
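Several of the questions above compare outcomes between groups of students. One way to start is a row-normalized cross-tabulation of a grouping column against `Target`. The sketch below uses made-up toy data; the column name `Scholarship holder` is an assumption (that column is elided in the preview output), and the helper `outcome_rates` is hypothetical.

```python
import pandas as pd

# Toy data for illustration only. "Scholarship holder" is an assumed
# column name, coded as 0 (no) and 1 (yes).
toy = pd.DataFrame({
    "Scholarship holder": [0, 0, 0, 1, 1, 1, 0, 1],
    "Target": [
        "Dropout", "Dropout", "Graduate", "Graduate",
        "Graduate", "Enrolled", "Enrolled", "Graduate",
    ],
})


def outcome_rates(frame: pd.DataFrame, group_col: str, target_col: str = "Target") -> pd.DataFrame:
    """Share of each outcome within each group (every row sums to 1)."""
    return pd.crosstab(frame[group_col], frame[target_col], normalize="index")


rates = outcome_rates(toy, "Scholarship holder")
print(rates)
```

The same helper could be applied to other grouping columns mentioned in the questions. A difference in rates between groups shows an association only; it does not by itself establish that the grouping variable causes the outcome.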