## __Dropout and Success: Student Data Analysis__

Exploring the Impact of Dropout Rates on Student Success.

https://www.kaggle.com/datasets/marouandaghmoumi/dropout-and-success-student-data-analysis

### __Dataset Description__

This dataset was created from a higher education institution (acquired from several disjoint databases) and covers students enrolled in different undergraduate degrees, such as agronomy, design, education, nursing, journalism, management, social service, and technologies. It includes information known at the time of student enrollment (academic path, demographics, and socioeconomic factors) and the students' academic performance at the end of the first and second semesters. The data is used to build classification models that predict student dropout and academic success. The problem is formulated as a three-category classification task with a strong imbalance towards one of the classes.

The columns are described below:

- Marital status: Categorical variable indicating the marital status of the individual.
- Application mode: Categorical variable indicating the mode of application.
- Application order: Numeric variable indicating the order of application.
- Course: Categorical variable indicating the chosen course.
- evening attendance: Binary variable indicating whether the individual attends classes during the daytime or evening.
- Previous qualification: Numeric variable indicating the level of the previous qualification.
- Nationality: Categorical variable indicating the nationality of the individual (the column is spelled `Nacionality` in the CSV).
- Mother's qualification: Numeric variable indicating the level of the mother's qualification.
- Father's qualification: Numeric variable indicating the level of the father's qualification.
- Mother's occupation: Categorical variable indicating the mother's occupation.
- Father's occupation: Categorical variable indicating the father's occupation.
- Displaced: Binary variable indicating whether the individual has been displaced (1 for yes, 0 for no).
- Educational special needs: Binary variable indicating whether the individual has educational special needs (1 for yes, 0 for no).
- Debtor: Binary variable indicating whether the individual is a debtor (1 for yes, 0 for no).
- Tuition fees up to date: Binary variable indicating whether the tuition fees are up to date (1 for yes, 0 for no).
- Gender: Binary variable indicating the gender of the individual (1 for male, 0 for female).
- Scholarship holder: Binary variable indicating whether the individual holds a scholarship (1 for yes, 0 for no).
- Age at enrollment: Numeric variable indicating the age of the individual at the time of enrollment.
- International: Binary variable indicating whether the individual is international (1 for yes, 0 for no).
- Curricular units 1st sem (credited): Numeric variable indicating the number of credited curricular units in the 1st semester.
- Curricular units 1st sem (enrolled): Numeric variable indicating the number of enrolled curricular units in the 1st semester.
- Curricular units 1st sem (evaluations): Numeric variable indicating the number of evaluations for curricular units in the 1st semester.
- Curricular units 1st sem (approved): Numeric variable indicating the number of approved curricular units in the 1st semester.
- Curricular units 1st sem (grade): Numeric variable indicating the average grade for curricular units in the 1st semester.
- Curricular units 1st sem (without evaluations): Numeric variable indicating the number of curricular units in the 1st semester without evaluations.
- Curricular units 2nd sem (credited): Numeric variable indicating the number of credited curricular units in the 2nd semester.
- Curricular units 2nd sem (enrolled): Numeric variable indicating the number of enrolled curricular units in the 2nd semester.
- Curricular units 2nd sem (evaluations): Numeric variable indicating the number of evaluations for curricular units in the 2nd semester.
- Curricular units 2nd sem (approved): Numeric variable indicating the number of approved curricular units in the 2nd semester.
- Curricular units 2nd sem (grade): Numeric variable indicating the average grade for curricular units in the 2nd semester.
- Curricular units 2nd sem (without evaluations): Numeric variable indicating the number of curricular units in the 2nd semester without evaluations.
- Unemployment rate: Numeric variable indicating the unemployment rate (%).
- Inflation rate: Numeric variable indicating the inflation rate (%).
- GDP: Numeric variable indicating the Gross Domestic Product.
- Output: Categorical target variable (Dropout, Graduate, or Enrolled).
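
Several of the columns above are described as categorical but are stored as integer codes. The following is a minimal sketch of how such columns could be marked as categorical in pandas. The helper `mark_categorical`, the toy rows, and the choice of columns are assumptions made for illustration; only the column names come from the description.

```python
import pandas as pd

# Toy frame with made-up rows; only the column names mirror the description.
toy = pd.DataFrame({
    "Marital status": [1, 2, 1],
    "Course": [2, 11, 5],
    "Age at enrollment": [20, 19, 34],
    "Output": ["Dropout", "Graduate", "Enrolled"],
})

# Hypothetical selection: columns the description labels "Categorical".
categorical_cols = ["Marital status", "Course", "Output"]


def mark_categorical(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of df with the given columns cast to the category dtype."""
    present = [c for c in columns if c in df.columns]  # skip names not in df
    return df.astype({c: "category" for c in present})


typed = mark_categorical(toy, categorical_cols)
print(typed.dtypes)
```

Casting to `category` does not change the stored values; it only signals that the integer codes are labels rather than quantities, which matters for summaries and for encoders applied later.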

#### __Fetching data from Kaggle__

In [2]:
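import os
# Point the Kaggle CLI at the directory that will hold the API token.
os.environ['KAGGLE_CONFIG_DIR'] = "/content/kaggle"
os.makedirs("/content/kaggle", exist_ok=True)
# Move the uploaded token into place and restrict its permissions.
!mv kaggle.json /content/kaggle/
!chmod 600 /content/kaggle/kaggle.json
# Download the dataset archive and extract student_data.csv.
!kaggle datasets download -d marouandaghmoumi/dropout-and-success-student-data-analysis
!unzip dropout-and-success-student-data-analysis.zip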

Dataset URL: https://www.kaggle.com/datasets/marouandaghmoumi/dropout-and-success-student-data-analysis
License(s): apache-2.0
Downloading dropout-and-success-student-data-analysis.zip to /content
  0% 0.00/87.2k [00:00<?, ?B/s]
100% 87.2k/87.2k [00:00<00:00, 305MB/s]
Archive:  dropout-and-success-student-data-analysis.zip
  inflating: student_data.csv        
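
Outside a notebook shell, the extraction step can also be done with the standard library. This is a sketch rather than the notebook's own method; `extract_archive` is a hypothetical helper, and the archive name is the one shown in the download log above.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path: str, dest_dir: str = ".") -> list[str]:
    """Extract every member of zip_path into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


archive_name = "dropout-and-success-student-data-analysis.zip"
# Only attempt extraction if the archive has already been downloaded.
if Path(archive_name).exists():
    print(extract_archive(archive_name))
```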


#### __Import the necessary libraries__

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
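
Because the dataset description notes a strong class imbalance, one reasonable way to use `train_test_split` is with stratification on the target. The sketch below runs on a made-up toy frame; the `test_size` and `random_state` values are illustrative assumptions, not settings taken from this notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with made-up rows; "Output" mirrors the dataset's target column name.
toy = pd.DataFrame({
    "Age at enrollment": [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
    "Output": ["Graduate"] * 6 + ["Dropout"] * 4 + ["Enrolled"] * 2,
})

X = toy.drop(columns="Output")
y = toy["Output"]

# stratify=y keeps the class proportions the same in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42
)

print(y_train.value_counts().to_dict())
print(y_test.value_counts().to_dict())
```

Without `stratify`, a random split of imbalanced data can leave the smallest class under-represented in the test set, which makes per-class metrics unreliable.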


#### __Pre-processing the dataset__

In [5]:
student_data = pd.read_csv("student_data.csv", delimiter=';')
student_data.head()

| | Marital status | Application mode | Application order | Course | evening attendance | Previous qualification | Nacionality | Mother's qualification | Father's qualification | Mother's occupation | ... | Curricular units 2nd sem (credited) | Curricular units 2nd sem (enrolled) | Curricular units 2nd sem (evaluations) | Curricular units 2nd sem (approved) | Curricular units 2nd sem (grade) | Curricular units 2nd sem (without evaluations) | Unemployment rate | Inflation rate | GDP | Output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 8 | 5 | 2 | 1 | 1 | 1 | 13 | 10 | 6 | ... | 0 | 0 | 0 | 0 | 0.0 | 0 | 10.8 | 1.4 | 1.74 | Dropout |
| 1 | 1 | 6 | 1 | 11 | 1 | 1 | 1 | 1 | 3 | 4 | ... | 0 | 6 | 6 | 6 | 13.666667 | 0 | 13.9 | -0.3 | 0.79 | Graduate |
| 2 | 1 | 1 | 5 | 5 | 1 | 1 | 1 | 22 | 27 | 10 | ... | 0 | 6 | 0 | 0 | 0.0 | 0 | 10.8 | 1.4 | 1.74 | Dropout |
| 3 | 1 | 8 | 2 | 15 | 1 | 1 | 1 | 23 | 27 | 6 | ... | 0 | 6 | 10 | 5 | 12.4 | 0 | 9.4 | -0.8 | -3.12 | Graduate |
| 4 | 2 | 12 | 1 | 3 | 0 | 1 | 1 | 22 | 28 | 10 | ... | 0 | 6 | 6 | 6 | 13.0 | 0 | 13.9 | -0.3 | 0.79 | Graduate |
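
The `delimiter=';'` argument matters here: with the default comma separator, pandas would typically read a semicolon-separated file as a single wide column. A quick heuristic for checking the separator before loading is sketched below. `guess_delimiter` is a hypothetical helper that simply counts candidate characters in the header line; it is not a pandas feature, and the header strings are made up.

```python
def guess_delimiter(header_line: str, candidates: str = ";,\t|") -> str:
    """Return the candidate character that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


# Made-up header lines for illustration.
print(repr(guess_delimiter("Marital status;Application mode;Course")))
print(repr(guess_delimiter("a,b,c")))
```

In practice the header would come from the first line of `student_data.csv`, and the result would be passed to `pd.read_csv` as `delimiter`.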


In [6]:
student_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4424 entries, 0 to 4423
Data columns (total 35 columns):
 #   Column                                          Non-Null Count  Dtype  
---  ------                                          --------------  -----  
 0   Marital status                                  4424 non-null   int64  
 1   Application mode                                4424 non-null   int64  
 2   Application order                               4424 non-null   int64  
 3   Course                                          4424 non-null   int64  
 4   evening attendance                              4424 non-null   int64  
 5   Previous qualification                          4424 non-null   int64  
 6   Nacionality                                     4424 non-null   int64  
 7   Mother's qualification                          4424 non-null   int64  
 8   Father's qualification                          4424 non-null   int64  
 9   Mother's occupation                      
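
The non-null counts in `info()` show whether any column has missing values. A more compact check across all columns at once might look like the sketch below. `missing_summary` is a hypothetical helper, and the toy frame is made up to include one missing value.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


# Toy frame with one deliberately missing value.
toy = pd.DataFrame({"Course": [2, 11, None], "GDP": [1.74, 0.79, -3.12]})
print(missing_summary(toy))
```

An empty result means no column contains missing values.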

In [7]:
student_data.describe()

| | Marital status | Application mode | Application order | Course | evening attendance | Previous qualification | Nacionality | Mother's qualification | Father's qualification | Mother's occupation | ... | Curricular units 1st sem (without evaluations) | Curricular units 2nd sem (credited) | Curricular units 2nd sem (enrolled) | Curricular units 2nd sem (evaluations) | Curricular units 2nd sem (approved) | Curricular units 2nd sem (grade) | Curricular units 2nd sem (without evaluations) | Unemployment rate | Inflation rate | GDP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4424.0 | 4424.0 | 4424.0 | 4424.0 | 4424.0 | 4424.0 | 4424.0 | 4424.0 | 4424.0 | 4424.0 | ... | 4424.0 | 4424.0 | 4424.0 | 4424.0 | 4424.0 | 4424.0 | 4424.0 | 4424.0 | 4424.0 | 4424.0 |
| mean | 1.178571 | 6.88698 | 1.727848 | 9.899186 | 0.890823 | 2.53142 | 1.254521 | 12.322107 | 16.455244 | 7.317812 | ... | 0.137658 | 0.541817 | 6.232143 | 8.063291 | 4.435805 | 10.230206 | 0.150316 | 11.566139 | 1.228029 | 0.001969 |
| std | 0.605747 | 5.298964 | 1.313793 | 4.331792 | 0.311897 | 3.963707 | 1.748447 | 9.026251 | 11.0448 | 3.997828 | ... | 0.69088 | 1.918546 | 2.195951 | 3.947951 | 3.014764 | 5.210808 | 0.753774 | 2.66385 | 1.382711 | 2.269935 |
| min | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.6 | -0.8 | -4.06 |
| 25% | 1.0 | 1.0 | 1.0 | 6.0 | 1.0 | 1.0 | 1.0 | 2.0 | 3.0 | 5.0 | ... | 0.0 | 0.0 | 5.0 | 6.0 | 2.0 | 10.75 | 0.0 | 9.4 | 0.3 | -1.7 |
| 50% | 1.0 | 8.0 | 1.0 | 10.0 | 1.0 | 1.0 | 1.0 | 13.0 | 14.0 | 6.0 | ... | 0.0 | 0.0 | 6.0 | 8.0 | 5.0 | 12.2 | 0.0 | 11.1 | 1.4 | 0.32 |
| 75% | 1.0 | 12.0 | 2.0 | 13.0 | 1.0 | 1.0 | 1.0 | 22.0 | 27.0 | 10.0 | ... | 0.0 | 0.0 | 7.0 | 10.0 | 6.0 | 13.333333 | 0.0 | 13.9 | 2.6 | 1.79 |
| max | 6.0 | 18.0 | 9.0 | 17.0 | 1.0 | 17.0 | 21.0 | 29.0 | 34.0 | 32.0 | ... | 12.0 | 19.0 | 23.0 | 33.0 | 20.0 | 18.571429 | 12.0 | 16.2 | 3.7 | 3.51 |
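
By default `describe()` summarizes only numeric columns, so the `Output` target does not appear in the table. Since the description mentions a strong class imbalance, a natural next check is the distribution of the target. The sketch below uses made-up labels, and `class_balance` is a hypothetical helper; on the real data it would be called with `student_data["Output"]`.

```python
import pandas as pd


def class_balance(labels: pd.Series) -> pd.DataFrame:
    """Return the absolute count and relative share of each class label."""
    counts = labels.value_counts()
    share = labels.value_counts(normalize=True).round(3)
    return pd.DataFrame({"count": counts, "share": share})


# Made-up labels for illustration only.
toy_labels = pd.Series(["Graduate", "Graduate", "Dropout", "Enrolled"], name="Output")
print(class_balance(toy_labels))
```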
