alanwalker23/SC1015-Overbyte

Welcome to the Learning Analytics repository

About

This is a mini-project for SC1015 (Data Science and Artificial Intelligence) that focuses on Learning Analytics using the Student Performance Dataset. For a detailed walkthrough, please view the following in order:

  1. Data Exploration
  2. Correlation Table
  3. KMeans Clustering
  4. Hierarchical Clustering
  5. KPrototypes Clustering
  6. Baseline Decision Tree
  7. Numerical-based Decision Tree
  8. Decision Tree with identified variables
  9. Decision Tree with identified variables (Exploration)
  10. Decision Tree with identified variables (Binary-xgboosted)
  11. UMAP/PCA (Failed Exploration)
  12. Unsupervised UMAP (Failed Exploration)
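The correlation step in the walkthrough above can be sketched with pandas as follows. This is a minimal illustration on made-up toy rows, not the project's notebook. The column names (studytime, absences, higher, internet, G3) are assumed to follow the UCI Student Performance dataset. It also shows why point-biserial correlation needs no special tooling: once a yes/no column is encoded as 0/1, its ordinary Pearson correlation with a numeric column is the point-biserial correlation. The helper `point_biserial` is hypothetical and only there to check that equivalence.

```python
import pandas as pd

# Toy rows for illustration only (not real data); column names assume the
# UCI Student Performance dataset, where G3 is the final grade (0-20).
df = pd.DataFrame({
    "studytime": [1, 2, 2, 3, 4, 1, 3, 2],
    "absences":  [10, 4, 6, 2, 0, 12, 3, 8],
    "higher":    ["no", "yes", "yes", "yes", "yes", "no", "yes", "no"],
    "internet":  ["no", "yes", "no", "yes", "yes", "no", "yes", "yes"],
    "G3":        [8, 12, 11, 15, 17, 6, 14, 10],
})

# Encode yes/no columns as 0/1 so they can enter a correlation table.
binary_cols = ["higher", "internet"]
encoded = df.copy()
for col in binary_cols:
    encoded[col] = (df[col] == "yes").astype(int)

# Pearson correlation of every column with the final grade.
corr_with_grade = encoded.corr()["G3"].drop("G3").sort_values()
print(corr_with_grade)


def point_biserial(binary, values):
    """Hypothetical helper: point-biserial r from group means.

    Uses the population standard deviation (ddof=0); the result matches
    Pearson's r between the 0/1 column and the numeric column.
    """
    ones, zeros = values[binary == 1], values[binary == 0]
    p = len(ones) / len(values)
    return (ones.mean() - zeros.mean()) / values.std(ddof=0) * (p * (1 - p)) ** 0.5


print(point_biserial(encoded["higher"], encoded["G3"]))
```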

A video presentation is available here.

Contributions

Alan, Keith and Mavis: all work was done together, with an even distribution of effort.

Problem Definition

Are we able to determine whether a student will fail based on their lifestyle?
Can we predict which "band" a student will fall into based on previous records?
Using these bands, we can help students improve their scores and provide appropriate support.
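The banding idea above can be sketched as a grade-to-band mapping followed by a decision tree classifier. This is a hedged illustration, not the project's code: the pass mark of 10 out of 20, the five band cut-offs and their labels, and the toy rows are all hypothetical. Feature names assume the UCI Student Performance columns (studytime, absences, Dalc, Walc, G3). It uses pandas and scikit-learn.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy records for illustration only; column names assume the UCI Student
# Performance dataset. G3 is the final grade on a 0-20 scale.
df = pd.DataFrame({
    "studytime": [1, 2, 2, 3, 4, 1, 3, 2, 1, 4],
    "absences":  [10, 4, 6, 2, 0, 12, 3, 8, 15, 1],
    "Dalc":      [3, 1, 2, 1, 1, 4, 1, 2, 5, 1],
    "Walc":      [4, 2, 3, 1, 1, 5, 2, 3, 5, 2],
    "G3":        [8, 12, 11, 15, 17, 6, 14, 10, 5, 18],
})

# Hypothetical bands: (-1, 9] is a fail, then four passing bands.
# These cut-offs are assumptions for illustration, not the project's own.
BAND_EDGES = [-1, 9, 11, 13, 15, 20]
BAND_LABELS = ["fail", "sufficient", "good", "very good", "excellent"]

df["band"] = pd.cut(df["G3"], bins=BAND_EDGES, labels=BAND_LABELS)
df["passed"] = df["G3"] >= 10  # assumed pass mark

features = ["studytime", "absences", "Dalc", "Walc"]

# A shallow tree keeps the learned rules small enough to read.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(df[features], df["band"].astype(str))

# Predict the band of a new (made-up) student.
new_student = pd.DataFrame([{"studytime": 2, "absences": 5, "Dalc": 2, "Walc": 3}])
print(clf.predict(new_student))
```

Predicting a band rather than the raw grade turns the task into classification, which is what a decision tree's readable if/else splits are suited to.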

Models Used

Elbow Plot for Cluster Selection
KMeans & KPrototypes clustering
Hierarchical Clustering (Dendrogram)
Decision Tree
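The elbow plot listed above chooses the number of clusters by fitting KMeans for several values of k and looking for where the inertia (within-cluster sum of squared distances to the nearest centroid) stops dropping sharply. Below is a minimal scikit-learn sketch on synthetic 2-D data with three planted groups. In the project the inputs would presumably be the dataset's scaled numeric columns, which is an assumption here. Plotting k against inertia (for example with matplotlib) gives the elbow plot itself.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data for illustration: three loose groups of 30 points each,
# so the elbow is easy to see.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centres])

# Fit KMeans for k = 1..6 and record the inertia for each k.
inertias = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X)
    inertias[k] = km.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

With three well-separated planted groups, the drop from k=2 to k=3 is large and later drops are small, so the elbow sits at k=3.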

Conclusion

Parents' education plays a part in one's school performance.
A mother staying at home could be detrimental to one's school performance.
Studying more equates to better grades; however, since study time is measured on qualitative grounds, this may partly reflect one's confidence.
Aiming for higher education does improve one's school performance, as such students tend to be more driven.
Internet access plays a part in performance.
Being absent from classes does not necessarily lead to poor performance, but there is a general trend of absent students performing worse.
Students who acknowledge high alcohol consumption, on both a weekly and a daily basis, tend to perform worse.

What did we learn from this project?

Learning Analytics
Clustering
Decision Tree
Collaborating using GitHub
Concepts about Distance, Accuracy, Noise and Data Handling
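The "Distance" concept above is the reason K-Prototypes suits a dataset that mixes numeric and categorical columns: plain KMeans only has Euclidean distance, which is not meaningful for labels such as a parent's job. Huang's K-Prototypes dissimilarity adds the squared Euclidean distance on numeric attributes to gamma times the number of mismatched categorical attributes. The sketch below is a minimal plain-Python version of that formula; the function name, the gamma value of 0.5 and the example students are hypothetical.

```python
def kprototypes_dissimilarity(a_num, b_num, a_cat, b_cat, gamma):
    """Hypothetical helper for Huang-style K-Prototypes dissimilarity.

    Squared Euclidean distance over numeric attributes plus gamma times
    the number of categorical attributes whose values differ.
    """
    numeric = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    mismatches = sum(1 for x, y in zip(a_cat, b_cat) if x != y)
    return numeric + gamma * mismatches


# Two made-up students: numeric (studytime, absences), categorical (higher, Mjob).
d = kprototypes_dissimilarity((2, 4), (3, 2), ("yes", "at_home"), ("yes", "teacher"), gamma=0.5)
print(d)  # 1 + 4 from the numeric part, plus 0.5 * 1 mismatch = 5.5
```

A larger gamma gives categorical attributes more weight relative to the numeric ones, so it is a tuning choice rather than a fixed constant.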

References

  1. Data Types
  2. One-Hot Encoding
  3. When to use clustering
  4. PANDAS Correlation
  5. Point Biserial
  6. Correlation
  7. Clustering
  8. Clustering
  9. K-Means
  10. K-Means
  11. K-Prototypes
  12. K-Prototypes
  13. UMAP
  14. Modelling
  15. Decision Tree
  16. xgboost
  17. xgboost
  18. xgboost
