This is a mini-project for SC1015 (Data Science and Artificial Intelligence) which focuses on Learning Analytics using the Student Performance Dataset. For a detailed walkthrough, please view the notebooks in the following order:
- Data Exploration
- Correlation Table
- KMeans Clustering
- Hierarchical Clustering
- KPrototypes Clustering
- Baseline Decision Tree
- Numerical-based Decision Tree
- Decision Tree with identified variables
- Decision Tree with identified variables (Exploration)
- Decision Tree with identified variables (Binary-xgboosted)
- UMAP/PCA (Failed Exploration)
- Unsupervised UMAP (Failed Exploration)
Video Presentation available here
Alan, Keith, and Mavis: all work was done together, with an even distribution of work.
Are we able to determine if a student will fail based on their lifestyle?
Can we predict which "band" a student will fall into based on previous records?
Using these bands, we can provide appropriate support to help students improve their scores.
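As a rough illustration of what a "band" could look like, here is a minimal sketch that maps a numeric grade to a coarse label. It assumes grades on a 0 to 20 scale; the function name `grade_band`, the thresholds `pass_mark` and `high_mark`, and the band labels are all hypothetical and are not taken from the project notebooks.

```python
def grade_band(score, pass_mark=10, high_mark=15):
    """Map a grade to a coarse band.

    Assumption: grades lie on a 0-20 scale. The thresholds and the
    band names are hypothetical, chosen only for illustration.
    """
    if not 0 <= score <= 20:
        raise ValueError(f"score must be between 0 and 20, got {score}")
    if score < pass_mark:
        return "fail"
    if score < high_mark:
        return "pass"
    return "high"


# Band a few made-up grades (not real student records).
example_grades = [4, 10, 14, 18]
example_bands = [grade_band(g) for g in example_grades]
print(example_bands)
```

Banding turns grade prediction into a classification problem, which is the kind of target a decision tree can be trained on.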
Elbow Plot for Cluster Selection
KMeans & KPrototype clustering
Hierarchical Clustering (Dendrogram)
Decision Tree
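The elbow approach to cluster selection listed above can be sketched as follows. This is a minimal example under stated assumptions, not the notebooks' actual code: it uses scikit-learn's `KMeans` on synthetic data from `make_blobs` as a stand-in for the scaled numeric features, and the helper `elbow_inertias` is a hypothetical name.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (NOT the Student Performance Dataset).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)


def elbow_inertias(X, k_values):
    """Fit KMeans for each k and return the within-cluster sum of squares."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        inertias.append(model.inertia_)
    return inertias


k_values = list(range(1, 8))
inertias = elbow_inertias(X, k_values)
for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting `inertias` against `k_values` gives the elbow plot; the number of clusters is usually read off at the point where adding another cluster stops reducing the inertia by much.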
Parents' education plays a part in one's school performance.
Mother staying at home could be detrimental to one's school performance.
Studying more is associated with better grades, but because study time is reported on qualitative grounds, this result might be reliant on one's confidence.
Aiming for higher education is associated with better school performance, likely because such students are more driven.
Internet access plays a part in performance.
Being absent from classes does not necessarily mean poor performance, but students with more absences generally tend to perform worse.
Students who are self-aware and accepting of their high alcohol consumption on a weekly and daily basis tend to perform worse.
Learning Analytics
Clustering
Decision Tree
Collaborating using GitHub
Concepts about Distance, Accuracy, Noise and Data Handling