<a href="https://colab.research.google.com/github/azakatov/ut-ml-project-2024/blob/main/main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Classification with an academic success dataset

* Project members: Andres Alumets, Anton Zakatov, Pihla Järv, Muhammad Sohaib Anwar
* GitHub: https://github.com/azakatov/ut-ml-project-2024
* Kaggle: https://www.kaggle.com/competitions/playground-series-s4e6/overview


For this project, we follow the workflow that Will Koehrsen proposes in his machine learning project walkthrough: https://github.com/WillKoehrsen/machine-learning-project-walkthrough/blob/master/Machine%20Learning%20Project%20Part%201.ipynb

Based on this workflow, the report has the following structure:

1. [Setup](#setup)
1. [Data cleaning and formatting](#data-cleaning-and-formatting)
1. [Exploratory data analysis](#exploratory-data-analysis)
1. [Feature engineering and selection](#feature-engineering-and-selection)
1. [Baseline model establishing](#baseline-model-establishing)
1. [Models comparison](#models-comparison)
1. [Hyperparameter tuning](#hyperparameter-tuning)
1. [Model evaluation](#model-evaluation)
1. [Results interpretation](#results-interpretation)
1. [Conclusion](#conclusion)

<a name="setup"></a>
## Setup

### Common imports

In [10]:
import numpy as np
import pandas as pd

# Show all columns in df
pd.options.display.max_columns = None

### Setting up Kaggle API

In [2]:
from google.colab import files

# Upload your Kaggle API token (kaggle.json) so the Kaggle CLI can authenticate
files.upload();

# Move the token to where the Kaggle CLI expects it and restrict its permissions
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json

# Downloads work without setting the path, and setting it breaks data loading, so it stays commented out
#!kaggle config set -n path -v{/content}

Saving kaggle.json to kaggle.json
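
The shell commands above can also be expressed in plain Python, which makes the step safe to re-run (no error if the directory already exists). This is a minimal sketch, not part of the original notebook: the helper name `install_kaggle_token` and its `config_dir` parameter are hypothetical, and it assumes the token has already been uploaded to the working directory.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, config_dir=None):
    """Copy a Kaggle API token into the config directory with owner-only access.

    `config_dir` defaults to ~/.kaggle, the location used by the shell commands above.
    """
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    target = config_dir / "kaggle.json"
    shutil.copyfile(token_path, target)
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to chmod 600
    return target
```

In the notebook this would be called as `install_kaggle_token("kaggle.json")` right after the upload.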


In [4]:
# Download the competition data and unpack it (-o overwrites files from earlier runs without prompting)
!kaggle competitions download -c playground-series-s4e6
!unzip -o playground-series-s4e6.zip

playground-series-s4e6.zip: Skipping, found more recently modified local copy (use --force to force download)
Archive:  playground-series-s4e6.zip
  inflating: sample_submission.csv   
  inflating: test.csv                
  inflating: train.csv               
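
If the `unzip` binary is not available, the archive can be unpacked with the standard library instead. The sketch below is an illustrative alternative, not what the notebook runs; `extract_archive` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract all members of a zip archive into dest_dir, overwriting existing files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        # Return the member names so the caller can see what was unpacked
        return sorted(archive.namelist())
```

For this competition, `extract_archive("playground-series-s4e6.zip")` should list the same three CSV files that appear in the output above.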


<a name="data-cleaning-and-formatting"></a>
## Data cleaning and formatting

In [8]:
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

In [11]:
train.head()

|  | id | Marital status | Application mode | Application order | Course | Daytime/evening attendance | Previous qualification | Previous qualification (grade) | Nacionality | Mother's qualification | Father's qualification | Mother's occupation | Father's occupation | Admission grade | Displaced | Educational special needs | Debtor | Tuition fees up to date | Gender | Scholarship holder | Age at enrollment | International | Curricular units 1st sem (credited) | Curricular units 1st sem (enrolled) | Curricular units 1st sem (evaluations) | Curricular units 1st sem (approved) | Curricular units 1st sem (grade) | Curricular units 1st sem (without evaluations) | Curricular units 2nd sem (credited) | Curricular units 2nd sem (enrolled) | Curricular units 2nd sem (evaluations) | Curricular units 2nd sem (approved) | Curricular units 2nd sem (grade) | Curricular units 2nd sem (without evaluations) | Unemployment rate | Inflation rate | GDP | Target |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 1 | 1 | 1 | 9238 | 1 | 1 | 126.0 | 1 | 1 | 19 | 5 | 5 | 122.6 | 0 | 0 | 0 | 1 | 0 | 1 | 18 | 0 | 0 | 6 | 6 | 6 | 14.5 | 0 | 0 | 6 | 7 | 6 | 12.428571 | 0 | 11.1 | 0.6 | 2.02 | Graduate |
| 1 | 1 | 1 | 17 | 1 | 9238 | 1 | 1 | 125.0 | 1 | 19 | 19 | 9 | 9 | 119.8 | 1 | 0 | 0 | 1 | 0 | 0 | 18 | 0 | 0 | 6 | 8 | 4 | 11.6 | 0 | 0 | 6 | 9 | 0 | 0.0 | 0 | 11.1 | 0.6 | 2.02 | Dropout |
| 2 | 2 | 1 | 17 | 2 | 9254 | 1 | 1 | 137.0 | 1 | 3 | 19 | 2 | 3 | 144.7 | 0 | 0 | 0 | 1 | 1 | 0 | 18 | 0 | 0 | 6 | 0 | 0 | 0.0 | 0 | 0 | 6 | 0 | 0 | 0.0 | 0 | 16.2 | 0.3 | -0.92 | Dropout |
| 3 | 3 | 1 | 1 | 3 | 9500 | 1 | 1 | 131.0 | 1 | 19 | 3 | 3 | 2 | 126.1 | 1 | 0 | 0 | 1 | 0 | 1 | 18 | 0 | 0 | 7 | 9 | 7 | 12.59125 | 0 | 0 | 8 | 11 | 7 | 12.82 | 0 | 11.1 | 0.6 | 2.02 | Enrolled |
| 4 | 4 | 1 | 1 | 2 | 9500 | 1 | 1 | 132.0 | 1 | 19 | 37 | 4 | 9 | 120.1 | 1 | 0 | 0 | 1 | 0 | 0 | 18 | 0 | 0 | 7 | 12 | 6 | 12.933333 | 0 | 0 | 7 | 12 | 6 | 12.933333 | 0 | 7.6 | 2.6 | 0.32 | Graduate |
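
With this many columns, a per-column overview is a convenient first cleaning check before looking at individual values. The following is a minimal sketch under our own assumptions rather than a step the notebook already performs: `summarize_quality` is a hypothetical helper that reports each column's dtype, number of missing values, and number of distinct values.

```python
import pandas as pd


def summarize_quality(df):
    """Return one row per column with its dtype, missing count, and distinct count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })
```

Calling `summarize_quality(train)` gives the overview for the training set, and `train.duplicated().sum()` counts fully duplicated rows. Columns with few distinct values, such as `Target`, are likely categorical even when they are stored as numbers.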


<a name="exploratory-data-analysis"></a>
## Exploratory data analysis

<a name="feature-engineering-and-selection"></a>
## Feature engineering and selection

<a name="baseline-model-establishing"></a>
## Baseline model establishing

<a name="models-comparison"></a>
## Models comparison

<a name="hyperparameter-tuning"></a>
## Hyperparameter tuning

<a name="model-evaluation"></a>
## Model evaluation

<a name="results-interpretation"></a>
## Results interpretation

<a name="conclusion"></a>
## Conclusion