Influencias en el rendimiento académico

English: Influences on academic performance

Results of Hackathon: Ranking #2 🥇

Connect with me:

Background

A study was carried out to determine whether children's academic performance is influenced by their parents' level of education. To that end, the students' academic results are evaluated against several variables.

Problem

This is a multi-class classification problem with two datasets. The first, Train, contains student data together with the parental educational level of each student. The second, Test, contains the same student data but no parental educational level.

We perform exploratory data analysis (EDA) and build a machine learning model that predicts the parental educational level for the Test dataset.

The predictive model uses GaussianNB.
We evaluated over 20 basic models and concluded that Gaussian Naive Bayes gives the best results.
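
A comparison like the one described can be sketched with cross-validation in scikit-learn. This is a minimal illustration, not the notebook's code: the synthetic data from `make_classification`, the three candidate models, the 5-fold setup, and the macro-F1 scoring are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Train dataset: 800 rows, 6 features, 6 classes.
X, y = make_classification(
    n_samples=800,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=6,
    n_clusters_per_class=1,
    random_state=0,
)

# A small, hypothetical subset of candidate models.
candidates = {
    "GaussianNB": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "KNeighbors": KNeighborsClassifier(),
}

# Mean macro-F1 over 5 stratified folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1_macro").mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.4f}")
```

On synthetic data the ranking says nothing about the real datasets; the point is only the shape of the comparison loop.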

The training dataset (train) has 800 rows, and the test dataset (test) has 200 rows.

Variables:

gender: student's gender
parental level of education: educational level of the parents
lunch: type of school lunch
test preparation course: whether the student attended the prep course
math score: math score
reading score: reading score
writing score: writing score

The numbers represent the following parental educational levels:

high school: 0,
some high school: 1,
some college: 2,
associate's degree: 3,
bachelor's degree: 4,
master's degree: 5
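
The encoding above can be written as a small lookup table. A minimal sketch in Python; the names `EDUCATION_LEVELS`, `LEVEL_NAMES`, `encode_levels`, and `decode_levels` are illustrative and not taken from the project's notebook.

```python
# Integer codes for the parental level of education, as listed above.
EDUCATION_LEVELS = {
    "high school": 0,
    "some high school": 1,
    "some college": 2,
    "associate's degree": 3,
    "bachelor's degree": 4,
    "master's degree": 5,
}

# Reverse lookup, to turn predicted codes back into readable labels.
LEVEL_NAMES = {code: name for name, code in EDUCATION_LEVELS.items()}


def encode_levels(labels):
    """Map label strings to integer codes (raises KeyError on unknown labels)."""
    return [EDUCATION_LEVELS[label] for label in labels]


def decode_levels(codes):
    """Map integer codes back to label strings."""
    return [LEVEL_NAMES[int(code)] for code in codes]
```

Note that the codes act as class labels rather than a strict ranking: "high school" (0) is coded below "some high school" (1).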

Goal

The first goal of the challenge is to determine whether students' academic results are influenced by the educational level of their parents.
The second goal is to build a predictive model and generate predictions for the test dataset.

Results

The results for the first goal are given in the Final Conclusion section of the notebook (TS3-DS.ipynb). In short: students' academic results are influenced by the educational level of their parents.
The predicted parental educational levels are in the predictions file (predictions.csv).

Analysis

We analyzed the data and concluded that children from families with higher educational levels tend to score better in all areas.

However, the parental educational level is not the key factor in students' performance. Students who completed the test preparation course achieved higher results than students who did not.

Solution

After analyzing the correlations between features, we dropped the high_income feature from the datasets.
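
A correlation check of this kind could look like the following sketch. The data is synthetic, and both the 0.9 threshold and the way `high_income` is constructed here are assumptions made purely for illustration; the source does not say how that feature was derived.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Toy data only. "high_income" is built here as an exact copy of "lunch" so
# that the check has something to find; the real feature's origin is unknown.
lunch = rng.integers(0, 2, size=n)
df = pd.DataFrame(
    {
        "lunch": lunch,
        "high_income": lunch,  # assumption: perfectly correlated with lunch
        "math score": rng.integers(0, 101, size=n),
        "reading score": rng.integers(0, 101, size=n),
    }
)

THRESHOLD = 0.9  # hypothetical cut-off for "too correlated"

corr = df.corr().abs()
# Keep only the upper triangle so each feature pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
# Drop the later column of every pair whose correlation exceeds the threshold.
to_drop = [col for col in upper.columns if (upper[col] > THRESHOLD).any()]

reduced = df.drop(columns=to_drop)
print(to_drop)
```

Dropping one column of a highly correlated pair is also a reasonable fit for Gaussian Naive Bayes, which assumes the features are conditionally independent given the class.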

Model: GaussianNB without optimizations. The best results obtained with this model:

Accuracy: 0.2917
F1-Score macro: 0.2608
F1-Score micro: 0.2917
F1-Score weighted: 0.2765
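
These are standard scikit-learn metrics. The sketch below trains GaussianNB with default settings on synthetic data and computes the same four scores; the data, the 70/30 split, and the variable names are assumptions, so the printed numbers will not match the ones reported here. It also illustrates why accuracy and micro-averaged F1 coincide (both 0.2917 above): for single-label multi-class predictions, micro F1 equals accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the Train dataset: 800 rows, 6 features, 6 classes.
X, y = make_classification(
    n_samples=800,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=6,
    n_clusters_per_class=1,
    random_state=0,
)

# Hold out 30% of the rows for validation, keeping class proportions.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = GaussianNB()  # default parameters, no tuning
model.fit(X_train, y_train)
y_pred = model.predict(X_val)

metrics = {
    "accuracy": accuracy_score(y_val, y_pred),
    "f1_macro": f1_score(y_val, y_pred, average="macro"),
    "f1_micro": f1_score(y_val, y_pred, average="micro"),
    "f1_weighted": f1_score(y_val, y_pred, average="weighted"),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```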

License

This project is released under the MIT License: https://opensource.org/licenses/MIT

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

Improvements 💡

There are several ways to improve this model, including:

Using more and better features: The current model uses only a few features (gender, lunch, test preparation course, and the three test scores) to predict the parental level of education. Adding more relevant features, such as the student's age or socioeconomic status, could improve the model's performance.
Using more advanced machine learning algorithms: The current model uses a simple Gaussian Naive Bayes classifier, which may not be the most appropriate algorithm for this problem. More expressive algorithms, such as decision trees, random forests, or support vector machines, could improve the model's performance.
Using hyperparameter tuning: The current model does not use any hyperparameter tuning, so its performance may not be optimal. Techniques such as grid search or random search could find better hyperparameters for the model.
Using more data: The current model is trained on a relatively small amount of data, which may not be enough for a high-performance model. Collecting more data, or using techniques such as data augmentation, could improve the model's performance.
Evaluating the model's performance more thoroughly: The current evaluation relies on only a few metrics (e.g. F1, accuracy, precision, recall). More comprehensive tools, such as receiver operating characteristic (ROC) curves, could give a more thorough understanding of the model's performance.
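
As one illustration of the hyperparameter tuning item above, scikit-learn's GaussianNB exposes a `var_smoothing` parameter that can be searched with `GridSearchCV`. A minimal sketch on synthetic data; the grid values, the 5-fold setup, and the macro-F1 scoring are assumptions, not settings used in the project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the Train dataset: 800 rows, 6 features, 6 classes.
X, y = make_classification(
    n_samples=800,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=6,
    n_clusters_per_class=1,
    random_state=0,
)

# Candidate smoothing values on a log scale (hypothetical grid).
param_grid = {"var_smoothing": np.logspace(-12, 0, num=13)}

search = GridSearchCV(GaussianNB(), param_grid, cv=5, scoring="f1_macro")
search.fit(X, y)

print(search.best_params_)
print(f"{search.best_score_:.4f}")
```

GaussianNB has very few tunable parameters, so gains from this search alone may be small; the same pattern applies to the other algorithms listed above.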
