## Conclusion

### Results
Existing approaches based on classification and regression typically achieve prediction accuracies ranging from 70% to 95%. In comparison, the autoencoder model demonstrates improved performance, with accuracies between 90% and 97% and an average of approximately 93%.

### Challenges
Initially, my primary objective was to minimize the **reconstruction loss**. After extensive tuning, I reduced it to approximately **0%** by narrowing the input feature set from $37$ to $7$ and constraining the **latent space to just 2 neurons**. Under these conditions, the autoencoder achieved near-perfect input reconstruction.

However, the model reconstructed **anomalous data points** just as accurately, meaning it could reproduce any 7-dimensional input and therefore could not separate anomalies from normal samples. Instead of learning what normal patterns look like, the autoencoder had effectively become an **identity function**.

As a result, I shifted the focus from minimizing reconstruction loss to **maximizing anomaly detection performance**. I began evaluating the model based on its ability to correctly identify anomalies and experimented with various configurations. The best results were obtained by retaining all 37 input features and expanding the latent space to 12 neurons.
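
To make the evaluation criterion concrete, here is a minimal sketch of scoring anomalies by reconstruction error. It assumes that a sample is flagged when its per-row mean squared error exceeds a fixed threshold; the toy arrays, the `threshold` value, and the `is_dropout` labels are hypothetical and are not taken from the project code.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Per-sample mean squared error between inputs and reconstructions."""
    return np.mean((x - x_hat) ** 2, axis=1)

def flag_anomalies(errors, threshold):
    """Mark a sample as anomalous when its error exceeds the threshold."""
    return errors > threshold

def accuracy(predicted, actual):
    """Fraction of samples whose predicted flag matches the label."""
    return float(np.mean(predicted == actual))

# Toy data: three rows that are reconstructed well and one that is not.
x = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
x_hat = np.array([[0.0, 1.0], [1.0, 0.1], [0.5, 0.4], [0.0, 0.0]])
is_dropout = np.array([False, False, False, True])  # hypothetical labels

errors = reconstruction_error(x, x_hat)
predicted = flag_anomalies(errors, threshold=0.1)  # threshold is an assumption
print(errors, predicted, accuracy(predicted, is_dropout))
```

A model that reconstructs every row perfectly would produce near-zero errors everywhere, so no threshold could separate the two groups. This is why a low reconstruction loss alone is not a useful target.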

### Future work
#### Curate data
Over the course of hundreds of training iterations, I observed significant **fluctuations** in anomaly detection accuracy, occasionally dropping to around 70%. I hypothesize that these variations are primarily due to **inconsistencies within the data** itself. Supporting this, when **data randomization is disabled**, the model consistently achieves 95.92% accuracy.

This suggests that the dataset contains a substantial number of inherent anomalies. The good news is that the model can detect these outliers, which would let us curate the dataset by removing them and potentially obtain more stable and improved model performance.
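
One possible way to do this curation is to drop the training rows with the highest reconstruction error. The sketch below assumes a quantile-based cutoff; the `curate` helper, the `keep_quantile` value, and the synthetic errors are illustrative assumptions, not results from this project.

```python
import numpy as np

def curate(x, errors, keep_quantile=0.95):
    """Keep rows whose reconstruction error is at or below the given quantile."""
    cutoff = np.quantile(errors, keep_quantile)
    keep = errors <= cutoff
    return x[keep], keep

rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 37))  # stand-in for the 37-feature dataset
errors = rng.random(100)              # stand-in for per-row reconstruction errors
errors[:3] = 10.0                     # three rows the model reconstructs badly

x_clean, keep = curate(x_train, errors, keep_quantile=0.95)
print(x_train.shape, "->", x_clean.shape)
```

Retraining on the curated rows with a fixed random seed, before and after curation, would show whether the accuracy fluctuations actually shrink.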

#### Feature engineering
Here are a few ideas for feature engineering:
1. **Aggregate Academic Performance**: Create composite features that combine performance across multiple semesters (e.g., total curricular units enrolled, grades).
2. **Interaction Terms**: Introduce features that combine socioeconomic factors with academic performance, such as multiplying unemployment rate by grades.
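
Both ideas above could be sketched as follows. The column names and values are hypothetical placeholders rather than the dataset's actual schema, and averaging the two semester grades is only one of several reasonable aggregates.

```python
import numpy as np

# Hypothetical column names and values, for illustration only.
students = {
    "units_enrolled_sem1": np.array([6, 5, 6]),
    "units_enrolled_sem2": np.array([6, 6, 5]),
    "grade_sem1": np.array([14.0, 11.5, 0.0]),
    "grade_sem2": np.array([13.0, 12.5, 10.0]),
    "unemployment_rate": np.array([10.8, 13.9, 9.4]),
}

def add_engineered_features(cols):
    """Return a copy of the columns with aggregate and interaction features added."""
    out = dict(cols)
    # Aggregate academic performance across both semesters.
    out["units_enrolled_total"] = cols["units_enrolled_sem1"] + cols["units_enrolled_sem2"]
    out["grade_mean"] = (cols["grade_sem1"] + cols["grade_sem2"]) / 2.0
    # Interaction term: socioeconomic factor multiplied by academic performance.
    out["unemployment_x_grade"] = cols["unemployment_rate"] * out["grade_mean"]
    return out

features = add_engineered_features(students)
print(features["units_enrolled_total"], features["grade_mean"])
```

Engineered features should be scaled the same way as the original inputs before they are fed to the autoencoder, since products of columns can have a much larger range.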

#### Denoising Autoencoder
A denoising autoencoder is a variant of the standard autoencoder in which the encoder receives a corrupted (noisy) version of the input, while the loss is still computed against the original, uncorrupted input. Because the model can no longer succeed by copying its input, it is pushed to learn more meaningful features, and the risk of the autoencoder becoming an **identity function** is significantly reduced.

A few papers suggest that good results can be obtained by using an autoencoder as a denoiser and then training a random forest or linear regression model on the denoised data.
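
The training loop of a denoising autoencoder could look roughly like the sketch below. It assumes Gaussian noise as the corruption, a single tanh hidden layer with a linear decoder, and plain full-batch gradient descent on synthetic data; none of the hyperparameters or function names come from this project.

```python
import numpy as np

def train_denoising_autoencoder(x, latent_dim, noise_std=0.1,
                                epochs=300, lr=0.05, seed=0):
    """Train a one-hidden-layer autoencoder on noisy inputs against clean targets."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    w1 = rng.normal(scale=0.1, size=(n_features, latent_dim))
    b1 = np.zeros(latent_dim)
    w2 = rng.normal(scale=0.1, size=(latent_dim, n_features))
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        # Corrupt the input with fresh Gaussian noise every epoch.
        x_noisy = x + rng.normal(scale=noise_std, size=x.shape)
        h = np.tanh(x_noisy @ w1 + b1)  # encoder
        x_hat = h @ w2 + b2             # linear decoder
        diff = x_hat - x                # loss is measured against the CLEAN input
        losses.append(float(np.mean(diff ** 2)))
        # Backpropagate the mean squared error.
        grad_out = 2.0 * diff / diff.size
        grad_w2 = h.T @ grad_out
        grad_b2 = grad_out.sum(axis=0)
        grad_pre = (grad_out @ w2.T) * (1.0 - h ** 2)
        grad_w1 = x_noisy.T @ grad_pre
        grad_b1 = grad_pre.sum(axis=0)
        w1 -= lr * grad_w1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2
    return (w1, b1, w2, b2), losses

def reconstruct(params, x):
    """Run the trained encoder and decoder on (clean) inputs."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2

# Synthetic data: 8 standardized features driven by 2 hidden factors.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))
x = (x - x.mean(axis=0)) / x.std(axis=0)

params, losses = train_denoising_autoencoder(x, latent_dim=4)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The output of `reconstruct` is the denoised data that a downstream model such as a random forest could then be trained on.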

#### Masked Autoencoder
[This paper](https://arxiv.org/abs/2111.06377) proposes an encoder-decoder architecture that, when pretrained with **masking**, yields significant improvements over the base model.

Masked Autoencoders introduce a novel approach by randomly masking portions of the input data and training the model to reconstruct the missing parts. This encourages the model to learn robust and meaningful representations that capture the underlying structure of the data.

We can experiment with this approach here.
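
As a starting point for that experiment, the masking step and a loss restricted to the masked positions could be sketched as below. Masking individual tabular feature values and filling them with zero are simplifying assumptions made for this dataset; the paper itself masks image patches and does not pass the masked patches to the encoder at all. The helper names and the `mask_ratio` value are hypothetical.

```python
import numpy as np

def random_mask(shape, mask_ratio, rng):
    """Boolean mask that is True where a feature value is hidden from the encoder."""
    return rng.random(shape) < mask_ratio

def apply_mask(x, mask, fill_value=0.0):
    """Replace the masked entries with a fill value and leave the rest untouched."""
    return np.where(mask, fill_value, x)

def masked_mse(x, x_hat, mask):
    """Mean squared error computed only over the masked positions."""
    n_masked = int(mask.sum())
    if n_masked == 0:
        return 0.0
    return float(np.sum(((x_hat - x) ** 2)[mask]) / n_masked)

rng = np.random.default_rng(0)
x = np.arange(12, dtype=float).reshape(3, 4)
mask = random_mask(x.shape, mask_ratio=0.5, rng=rng)
x_masked = apply_mask(x, mask)

# During training, x_masked would be fed to the encoder and the decoder output
# compared with x; here x_masked stands in for a model that predicts zeros.
print(masked_mse(x, x_masked, mask))
```

Computing the loss only on hidden values rewards the model for inferring a feature from the others, which is the behaviour an identity function cannot provide.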

### Final outcome
Anomaly detection is a robust and practical approach for identifying potential school dropouts. Moreover, the underlying model is relatively simple and straightforward to implement. Undertaking this type of project is highly valuable for junior data scientists, offering essential insights into both machine learning techniques and real-world problem solving.

## References

#### School Dropout Studies
* [Dropout early warning systems for high school students using machine learning](https://www.sciencedirect.com/science/article/abs/pii/S0190740918309721)
* [Application of Decision Trees for Detection of Student Dropout Profiles](https://ieeexplore.ieee.org/abstract/document/8260685)
* [A Real-Life Machine Learning Experience for Predicting University Dropout at Different Stages Using Academic Data](https://ieeexplore.ieee.org/document/9548895)
* [Multi-Model Stacking Ensemble Learning for Dropout Prediction in MOOCs](https://iopscience.iop.org/article/10.1088/1742-6596/1607/1/012004/pdf)
* [Predicting Student Dropout Using Multiclass Classification](https://arno.uvt.nl/show.cgi?fid=181524)
* [Student dropout prediction through machine learning optimization: insights from moodle log data](https://www.nature.com/articles/s41598-025-93918-1)
* [All-Year Dropout Prediction Modeling and Analysis for University Students](https://www.mdpi.com/2076-3417/13/2/1143)
* [Forecasting Students Dropout: A UTAD University Study](https://www.mdpi.com/1999-5903/14/3/76#B10-futureinternet-14-00076)
* [Predicting High School Dropout Rates: An Analysis of Machine Learning Models](https://nhsjs.com/2025/predicting-high-school-dropout-rates-an-analysis-of-machine-learning-models-and-socioeconomic-factors/)
* [An early warning system for school dropout in the state: a machine learning approach with variable selection methods](https://www.scielo.br/j/pope/a/RVTL5qnPmrmDFtYGZb3vQMD/)
* [A model for predicting dropout of higher education students](https://www.sciencedirect.com/science/article/pii/S2666764924000341)
* [Predictive Model to Identify College Students with High Dropout Rates](https://www.scielo.org.mx/scielo.php?pid=S1607-40412023000100113&script=sci_arttext)
* [(SMOTE) The Machine Learning-Based Dropout Early Warning System for Improving the Performance of Dropout Prediction](https://www.mdpi.com/2076-3417/9/15/3093)
* [Predicting Students Academic Success and Dropout Using Supervised Machine Learning](https://www.researchgate.net/publication/384055745_Predicting_Students_Academic_Success_and_Dropout_Using_Supervised_Machine_Learning)
* [Predicting Student Dropout and Academic Success](https://www.mdpi.com/2306-5729/7/11/146)

#### School Dropout Implementations
* [DropOuts anomaly detection with Auto Encoder](https://www.kaggle.com/code/ataneja2/dropouts-anomaly-detection-with-auto-encoder)
* [Predicting Student Dropout Using Machine Learning](https://medium.com/@sanahanji/predicting-student-dropout-using-machine-learning-b8e6069d6c79)
* [Student Dropout Prediction](https://deepnote.com/app/esviswajith/Student-Dropout-Prediction-c7cdc385-bd2f-45d4-98c5-f8c691d0a4cd)
* [Predict students dropout and academic success using Ensemble Learning](https://sshivam-singh96.medium.com/predict-students-dropout-and-academic-success-using-ensemble-learning-2b1a7bf63379)
* [Predict Students Dropout and Academic Success Using Machine Learning Algorithms](https://github.com/hamzaezzine/Predict-students-dropout-and-academic-success-using-machine-learning-algorithms)
* [Final Project: Solving Edutech Company Problems](https://github.com/alessandroryo/student-dropout-prediction)
* [Student's Dropout Prediction using Supervised Machine Learning Classifiers](https://github.com/Damiieibikun/Student-s-Dropout-Prediction-using-Supervised-Machine-Learning-Classifiers)


#### Autoencoders

* [Demystifying Neural Networks: Anomaly Detection with AutoEncoder](https://medium.com/@weidagang/demystifying-anomaly-detection-with-autoencoder-neural-networks-1e235840d879)
* [Autoencoders and Testing their Potential in Anomaly Detection](https://medium.com/@amnahhmohammed/autoencoders-and-testing-their-potential-in-anomaly-detection-09135140fd56)
* [(Reddit) Struggling with Autoencoder-Based Anomaly Detection for Fraud Detection](https://www.reddit.com/r/MachineLearning/comments/1gl92zm/d_struggling_with_autoencoderbased_anomaly/)
* [AutoEncoders: Theory + PyTorch Implementation](https://medium.com/@syed_hasan/autoencoders-theory-pytorch-implementation-a2e72f6f7cb7)

#### Anomaly Detection
* [A Beginner’s Guide to Anomaly Detection](https://medium.com/data-science-collective/a-beginners-guide-to-anomaly-detection-cdebe88fc985)
* [Mastering Anomaly Detection: Strategies, Tools, and Benefits](https://koshurai.medium.com/mastering-anomaly-detection-strategies-tools-and-benefits-7c91d3b7f3d0)
* [Anomaly Detection with Unsupervised Machine Learning](https://medium.com/simform-engineering/anomaly-detection-with-unsupervised-machine-learning-3bcf4c431aff)
* [3 Types of Anomalies in Anomaly Detection](https://hackernoon.com/3-types-of-anomalies-in-anomaly-detection)


#### Backpropagation
* [Building a Simple Neural Network from Scratch for MNIST Digit Recognition without using TensorFlow/PyTorch](https://medium.com/@ombaval/building-a-simple-neural-network-from-scratch-for-mnist-digit-recognition-without-using-7005a7733418)
* [How derivative of matrix leads to its transpose (Jacobian vs Gradient convention)?](https://math.stackexchange.com/questions/3285670/how-derivative-of-matrix-leads-to-its-transpose)
* [The Matrix Calculus You Need For Deep Learning](https://explained.ai/matrix-calculus/index.html)
* [How to Code a Neural Network with Backpropagation In Python (from scratch)](https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/)
* [Introduction to Neural Networks — Part 1](https://medium.com/deep-learning-demystified/introduction-to-neural-networks-part-1-e13f132c6d7e)
* [Introduction to Neural Networks — Part 2](https://medium.com/deep-learning-demystified/introduction-to-neural-networks-part-2-c261a99f4138)
* [(Video) Backpropagation calculus](https://www.youtube.com/watch?v=tIeHLnjs5U8)
* [(Video) Backpropagation, intuitively](https://www.youtube.com/watch?v=Ilg3gGewQ5U)
* [(Video) Gradient descent, how neural networks learn](https://www.youtube.com/watch?v=IHZwWFHWa-w)
* [(Video) But what is a neural network?](https://www.youtube.com/watch?v=aircAruvnKk)

#### Other
* [Outlier Detection and Removal using the IQR Method](https://medium.com/@pp1222001/outlier-detection-and-removal-using-the-iqr-method-6fab2954315d)
* [Activation functions in Neural Networks](https://www.geeksforgeeks.org/activation-functions-neural-networks/)