This project analyzes student exam scores to uncover patterns based on gender, parental education level, lunch type, and test preparation course. The analysis is performed on the Students Performance in Exams dataset from Kaggle.
The goal is to practice data handling, exploratory data analysis (EDA), and data visualization using Python.
Objectives:
- Explore the dataset structure and understand data types.
- Analyze how categorical features affect numerical scores (Math, Reading, Writing).
- Visualize trends and distributions with bar plots, box plots, and histograms.
- Derive meaningful insights to summarize student performance.
Tools:
- Python (programming language)
- Google Colab (Notebook environment)
- Libraries:
  - pandas – data handling
  - matplotlib & seaborn – visualization
Source: Kaggle – Students Performance in Exams
Key Features:
- gender – Male or Female
- race/ethnicity
- parental level of education – e.g., Bachelor's degree, High school
- lunch – Standard or Free/Reduced
- test preparation course – None or Completed
- math score, reading score, writing score – Student exam scores
Steps:
- Used files.upload() in Colab to upload the CSV file.
- Loaded the dataset into a pandas DataFrame.
- Checked dataset info (df.info()) and summary statistics (df.describe()).
- Identified unique values in categorical columns.
- Verified that the dataset has no missing or null values.
- Calculated average scores grouped by:
  - Gender
  - Lunch type
  - Test preparation course
  - Parental education level
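
The loading, inspection, and grouping steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `load_scores` and `mean_scores_by` are hypothetical, and the four rows in `SAMPLE_CSV` are made-up values that only mimic the dataset's column names so the snippet runs without the Kaggle file.

```python
import io

import pandas as pd

# Made-up rows that mimic the dataset's columns; NOT real data from Kaggle.
SAMPLE_CSV = """gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
female,group B,bachelor's degree,standard,none,70,80,78
male,group C,high school,free/reduced,none,60,55,50
female,group A,some college,free/reduced,completed,50,70,72
male,group D,master's degree,standard,completed,90,85,80
"""

SCORE_COLS = ["math score", "reading score", "writing score"]
CATEGORY_COLS = [
    "gender",
    "lunch",
    "test preparation course",
    "parental level of education",
]


def load_scores(source):
    """Read the CSV from a path or file-like object (hypothetical helper)."""
    return pd.read_csv(source)


def mean_scores_by(df, column):
    """Average the three score columns for each value of `column`."""
    return df.groupby(column)[SCORE_COLS].mean().round(2)


# In Colab, the upload step would look roughly like this instead:
#   from google.colab import files
#   uploaded = files.upload()  # dict mapping file name -> bytes
#   df = load_scores(io.BytesIO(uploaded["StudentsPerformance.csv"]))
df = load_scores(io.StringIO(SAMPLE_CSV))

df.info()                  # column names, dtypes, non-null counts
print(df.describe())       # summary statistics for the numeric columns

# Unique values in each categorical column.
for col in CATEGORY_COLS:
    print(col, "->", sorted(df[col].unique()))

# Total number of missing cells across the whole DataFrame.
missing = int(df.isnull().sum().sum())
print("missing values:", missing)

# Average scores per group for each categorical feature.
for col in CATEGORY_COLS:
    print(mean_scores_by(df, col), end="\n\n")
```

With the real file, only the `load_scores(...)` call changes; the inspection and grouping lines stay the same.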
- Boxplots for gender vs math scores.
- Bar plots for lunch type, test preparation, and parental education vs average scores.
- Histograms for distribution of all scores.
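
One way the three plot types could be produced is sketched below. The function name `plot_score_overview`, the figure layout, and the sample values are assumptions made for illustration; the notebook may arrange its plots differently.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

SCORE_COLS = ["math score", "reading score", "writing score"]

# Made-up rows using the dataset's column names; NOT real data from Kaggle.
sample_df = pd.DataFrame(
    {
        "gender": ["female", "male", "female", "male", "female", "male"],
        "lunch": ["standard", "free/reduced", "free/reduced",
                  "standard", "standard", "free/reduced"],
        "math score": [70, 60, 50, 90, 65, 72],
        "reading score": [80, 55, 70, 85, 75, 68],
        "writing score": [78, 50, 72, 80, 74, 66],
    }
)


def plot_score_overview(df):
    """Draw a boxplot, a bar plot, and histograms side by side (hypothetical helper)."""
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))

    # Boxplot: distribution of math scores for each gender.
    sns.boxplot(data=df, x="gender", y="math score", ax=axes[0])
    axes[0].set_title("Math score by gender")

    # Bar plot: average of each score column per lunch type.
    lunch_means = df.groupby("lunch")[SCORE_COLS].mean()
    lunch_means.plot(kind="bar", ax=axes[1], rot=0)
    axes[1].set_title("Average scores by lunch type")
    axes[1].set_ylabel("Average score")

    # Histograms: overlaid distributions of the three scores.
    for col in SCORE_COLS:
        axes[2].hist(df[col], bins=10, alpha=0.5, label=col)
    axes[2].set_title("Score distributions")
    axes[2].set_xlabel("Score")
    axes[2].set_ylabel("Number of students")
    axes[2].legend()

    fig.tight_layout()
    return fig
```

Calling `plot_score_overview(sample_df)` in a Colab cell should render the figure inline. Swapping `"lunch"` for `"test preparation course"` or `"parental level of education"` in the bar plot covers the other groupings, provided the DataFrame has those columns.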
Key Findings:
- Gender Difference: Female students score higher on average in reading and writing; male students score slightly higher in math.
- Lunch Effect: Students with standard lunch score higher than those with free/reduced lunch.
- Test Preparation: Students who completed the test preparation course score higher on average in reading and writing.
- Parental Education: Higher parental education correlates with better student performance.
- Score Distribution: Most students score between 60 and 80, with a few outliers.
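
A claim such as the score-distribution finding can be checked with a one-line computation. The sketch below uses a hypothetical helper, `share_in_range`, and ten made-up scores; it is meant to show the technique, not to reproduce the dataset's actual figures.

```python
import pandas as pd


def share_in_range(scores, low, high):
    """Fraction of scores that fall within [low, high], bounds included."""
    return float(scores.between(low, high).mean())


# Made-up scores for illustration only; NOT real data from Kaggle.
example_scores = pd.Series([45, 62, 68, 71, 75, 79, 80, 88, 95, 30])

share = share_in_range(example_scores, 60, 80)
print(f"Share of scores between 60 and 80: {share:.0%}")
```

On the real data, the same call can be applied to each score column, for example `share_in_range(df["math score"], 60, 80)`.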
All visualizations are created using matplotlib and seaborn:
- Boxplots, bar plots, and histograms.
- Clear titles, axis labels, and color schemes for readability.
How to Run:
- Clone or download the repository.
- Open the notebook Week1-StudentPerformance-EDA.ipynb in Google Colab.
- Upload StudentsPerformance.csv when prompted.
- Run all cells to view analysis and visualizations.