PROJECT TITLE
Students Performance in Exams
GOAL
Aim: To understand how parents' background, test preparation, and similar factors influence students' performance, and to perform exploratory data analysis (EDA) on the dataset.
DATASET
https://www.kaggle.com/spscientist/students-performance-in-exams
DESCRIPTION
This project uses feature engineering, feature extraction, data analysis, and data visualization, and then applies machine learning classification algorithms to separate students into different grades.
WHAT I HAVE DONE
- Performed exploratory data analysis (EDA) on the given dataset
- Loaded the dataset and viewed the top 5 rows
- Checked for null values: none are present
- Computed the correlation between the features and the summary statistics of the dataset
- Visualized the data with the matplotlib and seaborn libraries
- Applied feature engineering to support the visualization and modelling of the data
- Set a passing mark that students must reach in each of the three subjects individually
- Computed the total score for each student
- Checked which students fail overall
- Assigned grades to the students according to the passing criteria
- Applied different data preprocessing techniques
- Trained 3 different algorithms to find the best-performing one
- Calculated the accuracy score of each algorithm so the algorithms can be compared
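The feature engineering steps above (per-subject pass mark, total score, overall pass/fail, grade) can be sketched roughly as follows. This is a minimal illustration, not the original notebook: the column names are assumed to match the Kaggle CSV, the five sample rows are made up, and the passing mark of 40 and the grade cut-offs are hypothetical values chosen only for the example.

```python
import pandas as pd

# Assumed passing mark; the value used in the original notebook is not stated here.
PASS_MARK = 40

# Made-up sample rows standing in for the Kaggle CSV (column names are assumed).
df = pd.DataFrame({
    "math score": [72, 35, 90, 47, 45],
    "reading score": [72, 50, 95, 38, 50],
    "writing score": [74, 44, 93, 39, 48],
})
score_cols = ["math score", "reading score", "writing score"]

# Null check and basic statistics, as in the EDA steps.
null_counts = df.isnull().sum()
correlations = df[score_cols].corr()

# Pass/fail flag for each subject individually.
pass_cols = []
for col in score_cols:
    flag = col.replace(" score", "") + " pass"
    df[flag] = df[col] >= PASS_MARK
    pass_cols.append(flag)

# Total score and percentage for each student.
df["total score"] = df[score_cols].sum(axis=1)
df["percentage"] = df["total score"] / len(score_cols)

# A student passes overall only if every subject is passed.
df["overall pass"] = df[pass_cols].all(axis=1)


def assign_grade(percentage, overall_pass):
    """Map a percentage to a letter grade (hypothetical cut-offs)."""
    if not overall_pass:
        return "F"
    if percentage >= 80:
        return "A"
    if percentage >= 70:
        return "B"
    if percentage >= 60:
        return "C"
    if percentage >= 50:
        return "D"
    return "E"


df["grade"] = [
    assign_grade(p, ok) for p, ok in zip(df["percentage"], df["overall pass"])
]
print(df[["total score", "overall pass", "grade"]])
```

With the real data, the sample DataFrame would be replaced by the CSV loaded through `pd.read_csv`, and the pass mark and cut-offs by whatever criteria the analysis actually uses.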
MODELS USED
- Logistic Regression: one of the simplest and most widely used algorithms for classification problems
- Random Forest
- Support Vector Machine (SVM)
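A comparison of these three models could look roughly like the sketch below. It is an illustration under stated assumptions rather than the original training code: the scores are synthetic random numbers, the grade bands (cut-offs at 40, 60, and 80) are hypothetical, and the train/test split, feature scaling, and hyperparameters are choices made only for this example. It will not reproduce the accuracies reported for the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the three subject scores (not the Kaggle data).
rng = np.random.default_rng(0)
X = rng.integers(0, 101, size=(500, 3)).astype(float)

# Hypothetical grade bands derived from the mean score: labels 0 to 3.
y = np.digitize(X.mean(axis=1), bins=[40, 60, 80])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The scaling step for the linear model and the SVM is an assumed preprocessing choice.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
}

# Fit each model and record its accuracy on the held-out test split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

best_model = max(accuracies, key=accuracies.get)
for name, acc in accuracies.items():
    print(f"{name}: {acc:.1%}")
print("Best model:", best_model)
```

Keeping the models in one dictionary makes it easy to score them all on the same test split, which is what makes the accuracy figures comparable.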
LIBRARIES NEEDED
- NumPy
- Pandas
- Matplotlib
- Seaborn
- scikit-learn
ACCURACIES
- Logistic Regression: 80.8%
- Random Forest: 99.2%
- Support Vector Machine: 85.2%
CONCLUSION
We can conclude that Random Forest gives the most accurate results for this problem statement (99.2%), ahead of Support Vector Machine (85.2%) and Logistic Regression (80.8%).
CONTRIBUTED BY
Tandrima Singha