This repository contains a synthetic dataset of 3,000 students learning Python, designed for data analysis, machine learning, and educational research. The dataset captures demographics, learning behaviors, engagement metrics, and final exam outcomes, enabling insights into student performance and actionable recommendations for course design.
- Dataset Description
- Column Definitions
- Tools
- Project Overview
- Power BI Dashboard
- Key Insights
- Key Drivers of Performance
- Recommendations
- Conclusion
- License
The dataset simulates student engagement in a Python course and its relationship with performance outcomes.
Key Characteristics:
- Number of Students: 3,000
- Age Range: 16–55 years
- Course Duration: Up to 15 weeks
- Target Variables: `final_exam_score`, `passed_exam`
| Column Name | Type | Description |
|---|---|---|
| `student_id` | int | Unique student identifier |
| `age` | int | Age of learner (16–55) |
| `country` | object | Student country (e.g., India, Bangladesh, USA, UK) |
| `prior_programming_experience` | category | Programming experience level before Python |
| `weeks_in_course` | int | Number of weeks enrolled (1–15) |
| `hours_spent_learning_per_week` | float | Average weekly learning hours |
| `practice_problems_solved` | int | Number of coding challenges solved |
| `projects_completed` | int | Number of Python projects completed |
| `tutorial_videos_watched` | int | Number of tutorial videos watched |
| `uses_kaggle` | binary | Kaggle usage (1 = Yes, 0 = No) |
| `participates_in_discussion_forums` | binary | Forum participation (1 = Yes, 0 = No) |
| `debugging_sessions_per_week` | int | Average debugging sessions per week |
| `self_reported_confidence_python` | int | Self-reported Python confidence (1–10) |
| `final_exam_score` | float | Final exam score (0–100) |
| `passed_exam` | binary | Exam result (1 = Passed, 0 = Failed) |
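The documented ranges can double as a quick sanity check before any analysis. The snippet below is a minimal sketch using only the standard library: the helper name `validate_row` and the sample record are hypothetical, and only the ranges and binary columns come from the column definitions above.

```python
# Ranges taken from the column definitions; everything else here is illustrative.
RANGES = {
    "age": (16, 55),
    "weeks_in_course": (1, 15),
    "self_reported_confidence_python": (1, 10),
    "final_exam_score": (0, 100),
}
BINARY_COLUMNS = ("uses_kaggle", "participates_in_discussion_forums", "passed_exam")


def validate_row(row):
    """Return a list of problems found in one record (empty list means valid)."""
    problems = []
    for column, (low, high) in RANGES.items():
        value = float(row[column])
        if not low <= value <= high:
            problems.append(f"{column}={row[column]} outside {low}-{high}")
    for column in BINARY_COLUMNS:
        if str(row[column]) not in ("0", "1"):
            problems.append(f"{column}={row[column]} is not 0 or 1")
    return problems


# Made-up record for illustration, not taken from the dataset.
sample = {
    "age": "17",
    "weeks_in_course": "12",
    "self_reported_confidence_python": "6",
    "final_exam_score": "58.5",
    "uses_kaggle": "1",
    "participates_in_discussion_forums": "0",
    "passed_exam": "1",
}
print(validate_row(sample))
```

Values are accepted as strings so the same function can be applied to rows read with `csv.DictReader`.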
- Python – Data processing, cleaning, and exploratory data analysis (EDA)
- Power BI – Interactive data visualizations for insights and reporting
This analysis explores 3,000 students’ final exam performance to identify factors influencing pass and fail outcomes. By comparing learning behaviors, engagement metrics, and demographic variables, the project highlights actionable insights to improve pass rates.
The dashboard provides an interactive view of student performance, including pass rates, score distributions, and learning behavior comparisons between passed and failed students.
🔗 Dashboard Link: Power BI Dashboard
- Insufficient Study Time
- Project Completion
- Practice Problem Solving
- Only 18% of students passed, significantly below the 80% target passing rate.
- Indicates a systemic learning and assessment issue, not isolated underperformance.
- Average final exam score: 43.32, reinforcing the urgent need for intervention.
- Students who passed showed higher engagement in active learning (passed vs. failed):
  - Weekly study time: 8.6 vs. 6.7 hours
  - Projects completed: 2.8 vs. 1.8
  - Practice problems solved: 2.8 vs. 1.8
- These factors strongly align with higher average exam scores (69.0 vs. 37.8).
- Tutorial video views, forum participation, Kaggle usage, and debugging sessions were nearly identical between pass and fail groups.
- Passive or unstructured engagement alone is insufficient for meaningful outcomes.
- Self-reported confidence correlates with higher exam scores but does not independently predict performance.
- Confidence reflects mastery from hands-on practice rather than driving success.
- Age has no meaningful relationship with exam scores.
- Performance gaps are behavioral and instructional, not demographic.
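Pass-versus-fail comparisons like the ones above come down to averaging each metric per `passed_exam` group. The following is a rough sketch of that step with the standard library only; `group_means` is a hypothetical helper and the four rows are toy values, not records from the dataset.

```python
from collections import defaultdict
from statistics import mean


def group_means(rows, metrics, group_key="passed_exam"):
    """Average each metric separately for every value of group_key."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for metric in metrics:
            grouped[row[group_key]][metric].append(float(row[metric]))
    return {
        group: {metric: round(mean(values), 2) for metric, values in by_metric.items()}
        for group, by_metric in grouped.items()
    }


# Toy rows for illustration only (assumed values, not the real data).
toy_rows = [
    {"passed_exam": 1, "hours_spent_learning_per_week": 9.0, "projects_completed": 3},
    {"passed_exam": 1, "hours_spent_learning_per_week": 8.0, "projects_completed": 2},
    {"passed_exam": 0, "hours_spent_learning_per_week": 6.0, "projects_completed": 2},
    {"passed_exam": 0, "hours_spent_learning_per_week": 7.0, "projects_completed": 1},
]

summary = group_means(toy_rows, ["hours_spent_learning_per_week", "projects_completed"])
for group, averages in sorted(summary.items(), reverse=True):
    print("passed" if group == 1 else "failed", averages)
```

A difference in group means shows association only; it does not by itself establish that a behavior causes higher scores.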
- Increase required coding projects and practice problems.
- Tie activities to graded milestones.
- Introduce minimum weekly practice thresholds.
📌 Rationale: Active engagement shows the strongest link to passing.
- Identify students with low study time, few projects, or minimal practice.
- Trigger early support via tutoring, guided labs, or structured study plans.
📌 Rationale: Most failures likely stem from insufficient early engagement.
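One way such an early-warning rule might look is sketched below. The cut-off values, the two-flag trigger, and the helper names `risk_flags` and `needs_support` are all assumptions made for illustration; they are not derived from the dataset and would need tuning against real outcomes.

```python
# Hypothetical cut-offs for illustration only; not derived from the dataset.
THRESHOLDS = {
    "hours_spent_learning_per_week": 7.0,
    "projects_completed": 2,
    "practice_problems_solved": 2,
}


def risk_flags(student, thresholds=THRESHOLDS):
    """Return the metrics on which the student falls below the cut-off."""
    return [
        metric
        for metric, minimum in thresholds.items()
        if float(student[metric]) < minimum
    ]


def needs_support(student, min_flags=2):
    """Flag a student for early support when enough metrics are below cut-off."""
    return len(risk_flags(student)) >= min_flags


# Made-up students for illustration.
low_engagement = {
    "hours_spent_learning_per_week": 5.0,
    "projects_completed": 1,
    "practice_problems_solved": 4,
}
mostly_on_track = {
    "hours_spent_learning_per_week": 9.0,
    "projects_completed": 3,
    "practice_problems_solved": 1,
}
print(risk_flags(low_engagement), needs_support(low_engagement))
print(risk_flags(mostly_on_track), needs_support(mostly_on_track))
```

Requiring more than one flag is a design choice that reduces false alarms from a single weak week; a course team could lower `min_flags` to 1 to intervene sooner.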
- Convert tutorial videos into interactive exercises and code-along labs.
- Reduce reliance on purely observational formats.
📌 Rationale: Passive content use was nearly identical between pass and fail groups, so it shows no measurable link to performance.
- Replace self-assessed confidence with skill-based diagnostics and weekly coding challenges.
- Use confidence metrics only as supporting indicators.
📌 Rationale: Confidence without practice may create false readiness.
- Ensure exam questions align with taught material and activities.
- Ensure difficulty progression is gradual, not abrupt.
- Pilot midterm diagnostics to measure readiness.
📌 Rationale: An 82% failure rate suggests misalignment between teaching and assessment.
Consistent hands-on practice is the primary driver of student success. Improving pass rates requires:
- Curriculum redesign for active learning
- Assessment alignment
- Early interventions guided by data-driven monitoring
This dataset is provided for educational and research purposes only. You are free to use, modify, and share it with attribution.
Author: Data Analytics Project | Tools: Python, Power BI
