PROJECT TITLE

Students Performance in Exams

GOAL

Aim: To understand how parental background, test preparation, and similar factors influence student performance, using exploratory data analysis (EDA).

DATASET

https://www.kaggle.com/spscientist/students-performance-in-exams

DESCRIPTION

The project uses feature engineering, feature extraction, data analysis, and data visualization, then applies machine learning classification algorithms to separate students into different grades.

WHAT I HAVE DONE

  1. Performed exploratory data analysis (EDA) on the given dataset.
  2. Loaded the dataset and viewed the top 5 rows.
  3. Checked for null values; none were present.
  4. Computed the correlation between the features and summary statistics for the dataset.
  5. Visualized the data with Matplotlib and Seaborn.
  6. Applied feature engineering to make the data easier to visualize and model.
  7. Set a passing mark for each of the three subjects individually.
  8. Computed the total score for each student.
  9. Checked which students failed overall.
  10. Assigned grades to the students according to the passing criteria.
  11. Applied different data preprocessing techniques.
  12. Trained 3 different algorithms to find the best one.
  13. Calculated the accuracy score of each algorithm for comparison with the others.
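
The EDA and feature engineering steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the small DataFrame is a made-up stand-in for the Kaggle CSV, the column names are assumed to match the dataset, and the passing mark of 40 and the grade bands are hypothetical values chosen only for the example.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle CSV (assumed column names).
# The real data would be loaded with pd.read_csv(...) instead.
df = pd.DataFrame({
    "test preparation course": ["none", "completed", "none", "completed"],
    "math score": [72, 35, 90, 55],
    "reading score": [72, 48, 95, 38],
    "writing score": [74, 41, 93, 60],
})

PASS_MARK = 40  # assumed threshold; the README does not state the real one
score_cols = ["math score", "reading score", "writing score"]

# Steps 2-4: first rows, null check, correlation and summary statistics.
print(df.head())
null_counts = df.isnull().sum()
correlation = df[score_cols].corr()
summary = df[score_cols].describe()

# Step 7: one pass/fail flag per subject.
pass_cols = []
for col in score_cols:
    flag = col.replace(" score", "_pass")
    df[flag] = df[col] >= PASS_MARK
    pass_cols.append(flag)

# Steps 8-9: total score, percentage, and overall pass status.
df["total score"] = df[score_cols].sum(axis=1)
df["percentage"] = df["total score"] / len(score_cols)
df["overall pass"] = df[pass_cols].all(axis=1)


def assign_grade(row):
    """Hypothetical grade bands; failing any subject gives 'F'."""
    if not row["overall pass"]:
        return "F"
    if row["percentage"] >= 80:
        return "A"
    if row["percentage"] >= 60:
        return "B"
    return "C"


# Step 10: assign a grade to every student.
df["grade"] = df.apply(assign_grade, axis=1)
print(df[["total score", "overall pass", "grade"]])
```

Here a student must pass every subject individually to pass overall, which is one reasonable reading of steps 7 to 9; a different rule (for example, a threshold on the total) would change the `overall pass` line only.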

MODELS USED

  1. Logistic Regression: a simple and widely used algorithm for classification problems
  2. Random Forest
  3. Support Vector Machine (SVM)
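
A comparison of these three models could look roughly like the sketch below. It is an illustration under stated assumptions: the features are randomly generated scores rather than the Kaggle data, the pass/fail label (all three scores at least 40) is hypothetical, and default Scikit-Learn hyperparameters are used, so the printed accuracies will not match the figures reported in this README.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 500 students, 3 subject scores in [0, 100].
rng = np.random.default_rng(0)
X = rng.integers(0, 101, size=(500, 3)).astype(float)

# Hypothetical label: 1 if the student scores at least 40 in every subject.
y = (X.min(axis=1) >= 40).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Logistic regression and SVM are scale-sensitive, so they get a scaler;
# random forest works on the raw scores.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
}

# Fit each model and record its accuracy on the held-out test split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.1%}")
```

All three models are scored on the same held-out split, which keeps the accuracy comparison fair.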

LIBRARIES NEEDED

  1. Numpy
  2. Pandas
  3. Matplotlib
  4. Seaborn
  5. Scikit-Learn

ACCURACIES

  1. Logistic Regression: 80.8% accuracy
  2. Random Forest: 99.2% accuracy
  3. Support Vector Machine: 85.2% accuracy

CONCLUSION

We can conclude that, of the three models compared, Random Forest gives the most accurate results for this problem statement.

CONTRIBUTED BY

Tandrima Singha