# Classification with an Academic Success Dataset
#### [Source](https://www.kaggle.com/competitions/playground-series-s4e6)

### Description
The goal of this competition is to predict academic risk of students in higher education.
The dataset description is available [here](https://archive.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success).

### Imports 

In [None]:
import numpy as np
import pandas as pd
import sklearn

### Load Data

In [None]:
train = pd.read_csv('../dataset/train.csv', index_col='id')
test = pd.read_csv('../dataset/test.csv', index_col='id')
train.info()  # info() prints its summary itself and returns None, so no print() is needed

In [None]:
print(train.head())

### Current task:
- Perform exploratory data analysis
- Identify missing values in the data
- Decide how to handle the missing values found
- Identify categorical features
- Transform the categorical data
- Normalize the data using a chosen method
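As a starting point for the missing-value steps above, here is a minimal sketch of a per-column missing-value summary. The `toy` frame and its values are invented for illustration; in the notebook the same calls would be applied to `train`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `train`; the values are invented for illustration.
toy = pd.DataFrame({
    'Admission grade': [120.0, np.nan, 135.5, 150.0],
    'Age at enrollment': [18, 21, 19, 34],
})

# Count NaN values per column and express them as a share of all rows.
missing = toy.isna().sum().to_frame('n_missing')
missing['share'] = missing['n_missing'] / len(toy)
print(missing)
```

Note that this only finds values stored as NaN. Missing values encoded with a sentinel number need a separate check.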

### Analysis of target

In [None]:
print(train['Target'].value_counts())

There are 3 classes in the target variable, and they are imbalanced. The target variable is categorical, measured on a nominal scale.
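To put a number on the imbalance, relative class frequencies are often easier to read than raw counts. The sketch below uses a toy series in place of `train['Target']`; the labels and counts are made up and do not reflect the real distribution.

```python
import pandas as pd

# Toy stand-in for train['Target']; labels and counts are invented.
target = pd.Series(
    ['Graduate'] * 5 + ['Dropout'] * 3 + ['Enrolled'] * 2,
    name='Target',
)

# Relative class frequencies (shares sum to 1).
class_share = target.value_counts(normalize=True)
print(class_share)

# Ratio between the most and the least frequent class.
imbalance_ratio = class_share.max() / class_share.min()
print(imbalance_ratio)
```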

### Analysis of features

In [None]:
print(train.columns)

In [None]:
# for feature in train.columns.tolist():
#     print(train[feature].value_counts())

Based on the data description, we can assume the following feature types:
* Nominal scale:
    * Marital Status — Student's marital status
    * Application mode — Indicates the specific pathway or criterion under which the student was admitted
    * Course — The course in which the student is enrolled
    * Nationality — Encodes the nationality of the student as integers representing specific countries
    * Mother's occupation — Encodes the occupation of the student’s mother, classified into detailed job categories and broader occupational sectors.
    * Father's occupation — Encodes the occupation of the student’s father, classified into detailed job categories and broader occupational sectors.
* Binary nominal scale:
    * Daytime/Evening Attendance — Specifies whether a student attends classes during the day (1) or in the evening (0).
    * Displaced — Indicates whether the student resides away from home to attend their course (1 = Yes, 0 = No)
    * Educational special needs — Indicates whether the student has declared special educational needs (1 = Yes, 0 = No)
    * Debtor — Identifies whether the student owes debts (1 = Yes, 0 = No)
    * Tuition fees up to date — Indicates whether the student is up-to-date with tuition fee payments (1 = Yes, 0 = No)
    * Gender — Student's gender (1 = Male, 0 = Female)
    * Scholarship holder — Indicates whether the student receives a scholarship (1 = Yes, 0 = No)
    * International — Indicates whether the student is an international student (1 = Yes, 0 = No)
* Ordinal scale:
    * Application order — Denotes the student's ranking of the applied course in their list of preferences during the application process
    * Previous qualification — Represents the highest level of prior education achieved by the student
    * Mother's qualification — Represents the highest level of education achieved by the student's mother
    * Father's qualification — Represents the highest level of education achieved by the student's father
* Interval scale: 
    None
* Ratio scale:
    * Previous qualification (grade) — Represents the grade of the student’s previous qualification on a scale from 0 to 200. Not ordinal or interval, because the scale has a true zero and the values are floats, not integers.
    * Admission grade — The grade used for admission, ranging from 0 to 200. Not ordinal or interval, because the scale has a true zero and the values are floats, not integers.
    * Age at enrollment — The student's age at the time of enrollment
    * Curricular units 1st sem (credited) & Curricular units 2nd sem (credited) — The number of curricular units credited in the first and second semesters, respectively. Indicates prior recognition of a unit without requiring enrollment or evaluation
    * Curricular units 1st sem (enrolled) & Curricular units 2nd sem (enrolled) — The number of curricular units in which the student enrolled in the first and second semesters, respectively
    * Curricular units 1st sem (evaluations) & Curricular units 2nd sem (evaluations) — The number of curricular units evaluated in the first and second semesters, respectively
    * Curricular units 1st sem (approved) & Curricular units 2nd sem (approved) — The number of curricular units successfully passed by the student in the first and second semesters, respectively
    * Curricular units 1st sem (grade) & Curricular units 2nd sem (grade) — The average grade of the student in the first and second semesters, respectively. Scored between 0 and 20. Not ordinal or interval, because the scale has a true zero and the values are floats, not integers.
    * Curricular units 1st sem (without evaluations) & Curricular units 2nd sem (without evaluations) — The number of curricular units in which the student enrolled but was not evaluated in the first and second semesters, respectively
    * Unemployment rate — Probably(!) the percentage of the population unemployed during the semester.
    * Inflation rate — Probably(!) the percentage of inflation during the semester.
    * GDP — Probably(!) the Gross Domestic Product during the semester.
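The scale types above suggest different preprocessing per group. The following is a minimal sketch, not the final pipeline: it one-hot encodes a nominal column and z-score normalizes a ratio column using pandas only. The `toy` frame, its values, and the choice of z-scoring are assumptions made for illustration.

```python
import pandas as pd

# Toy frame with one column per scale type; all values are invented.
toy = pd.DataFrame({
    'Course': [171, 9254, 171, 9070],                  # nominal
    'Debtor': [0, 1, 0, 0],                            # binary nominal
    'Admission grade': [120.0, 140.0, 160.0, 180.0],   # ratio
})

nominal_cols = ['Course']
ratio_cols = ['Admission grade']

# One-hot encode nominal columns: the integer codes carry no order,
# so each category gets its own 0/1 indicator column.
encoded = pd.get_dummies(toy, columns=nominal_cols, dtype=int)

# Z-score normalization of ratio columns (population std, ddof=0).
mean = toy[ratio_cols].mean()
std = toy[ratio_cols].std(ddof=0)
encoded[ratio_cols] = (toy[ratio_cols] - mean) / std

print(encoded)
```

Binary columns are left as they are, since they are already 0/1. When applying this to `train` and `test`, the mean and standard deviation would normally be computed on `train` only and reused for `test`.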

Extra notes:
* The numeric codes in the Previous qualification, Mother's qualification, and Father's qualification columns do not reflect which level of education is higher or lower, so they should be re-mapped to an ordered scale.
* For Mother's occupation and Father's occupation, missing data is encoded as 99 ("blank").
* Debtor and Tuition fees up to date may be correlated; this should be checked.
* It should be checked whether enrolled curricular units can be calculated as evaluations + without evaluations, and likewise whether without evaluations = enrolled - evaluations, and so on.
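Three of the notes above can be checked with a few lines of pandas. The sketch below runs on an invented `toy` frame; the column names are taken from the feature list above and may need adjusting to match `train.columns` exactly.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `train`; all values are invented for illustration.
toy = pd.DataFrame({
    "Mother's occupation": [5, 99, 9, 99],
    'Debtor': [0, 1, 0, 1],
    'Tuition fees up to date': [1, 0, 1, 1],
    'Curricular units 1st sem (enrolled)': [6, 5, 6, 7],
    'Curricular units 1st sem (evaluations)': [6, 4, 8, 7],
    'Curricular units 1st sem (without evaluations)': [0, 1, 0, 0],
})

# 1. Treat the sentinel code 99 ("blank") as a missing value.
occupation = toy["Mother's occupation"].replace(99, np.nan)
n_blank = int(occupation.isna().sum())

# 2. Relationship between Debtor and Tuition fees up to date:
#    a contingency table plus the Pearson correlation of the 0/1 columns.
contingency = pd.crosstab(toy['Debtor'], toy['Tuition fees up to date'])
corr = toy['Debtor'].corr(toy['Tuition fees up to date'])

# 3. Share of rows where enrolled == evaluations + without evaluations.
identity_holds = (
    toy['Curricular units 1st sem (enrolled)']
    == toy['Curricular units 1st sem (evaluations)']
    + toy['Curricular units 1st sem (without evaluations)']
)
share_identity = identity_holds.mean()

print(n_blank, corr, share_identity)
print(contingency)
```

If the share in step 3 is well below 1 on the real data, the enrolled count cannot be derived from the other two columns and all three should be kept.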