# Analyzing Factors that Influence Student Performance

In this project, we analyze a dataset describing several aspects of student life, including hours studied, sleep patterns, and attendance, to see which factors are associated with exam performance. The data lives in a table called `student_performance` with the following columns:


| Column                       | Definition                                     | Data type                     |
|------------------------------|------------------------------------------------|-------------------------------|
| `attendance`                 | Percentage of classes attended                 | `float`                       |
| `hours_studied`              | Number of hours studied per week               | `float`                       |
| `extracurricular_activities` | Participation in extracurricular activities    | `varchar` (Yes, No)           |
| `sleep_hours`                | Average number of hours of sleep per night     | `float`                       |
| `tutoring_sessions`          | Number of tutoring sessions attended per month | `integer`                     |
| `teacher_quality`            | Quality of the teachers                        | `varchar` (Low, Medium, High) |
| `exam_score`                 | Final exam score                               | `float`                       |


We will execute SQL queries on this dataset to answer the following three questions:

1. Do more study hours and extracurricular activities lead to better scores? Analyze how studying more than 10 hours per week, while also participating in extracurricular activities, impacts exam performance.

In [4]:
-- Average exam score, per value of hours_studied, for students who studied more than 10 hours and participated in extracurricular activities
SELECT hours_studied, AVG(exam_score) AS avg_exam_score
FROM student_performance
WHERE hours_studied > 10 AND extracurricular_activities = 'Yes'
GROUP BY hours_studied
ORDER BY hours_studied DESC;

Unnamed: 0,hours_studied,avg_exam_score
0,43,78.0
1,39,75.0
2,38,73.5
3,37,73.0
4,36,70.428571
5,35,72.3125
6,34,71.1875
7,33,70.333333
8,32,71.325
9,31,70.553191
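
The query above reports averages only inside the target group, so on its own it cannot show whether that group scores higher than other students. A minimal sketch of one way to add a baseline follows. It runs the SQL through Python's built-in `sqlite3` module so it is self-contained. The rows are invented toy data, not values from `student_performance`, and the names `student_group` and `n_students` are illustrative assumptions.

```python
import sqlite3

# Hypothetical toy rows: (hours_studied, extracurricular_activities, exam_score).
# These values are made up for illustration only.
rows = [
    (12.0, "Yes", 70.0),
    (15.0, "Yes", 74.0),
    (20.0, "No", 72.0),
    (8.0, "Yes", 64.0),
    (5.0, "No", 60.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_performance ("
    "hours_studied REAL, extracurricular_activities TEXT, exam_score REAL)"
)
conn.executemany("INSERT INTO student_performance VALUES (?, ?, ?)", rows)

# Label each student as inside or outside the target group,
# then compute the group size and average score per label.
query = """
SELECT
    CASE
        WHEN hours_studied > 10 AND extracurricular_activities = 'Yes'
            THEN 'target group'
        ELSE 'everyone else'
    END AS student_group,
    COUNT(*) AS n_students,
    AVG(exam_score) AS avg_exam_score
FROM student_performance
GROUP BY student_group
ORDER BY avg_exam_score DESC;
"""
group_averages = conn.execute(query).fetchall()
for row in group_averages:
    print(row)
```

Including `COUNT(*)` next to each average makes it easier to judge how much weight a group's average deserves, since an average over one or two students can swing widely.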


2. Is there a sweet spot for study hours? Explore how different ranges of study hours impact exam performance by calculating the average exam score for each study range.

In [5]:
-- Categorize students into four groups by hours studied per week (1-5, 6-10, 11-15, and 16+ hours), then compute the average exam score for each group

SELECT
	CASE
		WHEN hours_studied BETWEEN 1 AND 5 THEN '1-5 hours'
		WHEN hours_studied BETWEEN 6 AND 10 THEN '6-10 hours'
		WHEN hours_studied BETWEEN 11 AND 15 THEN '11-15 hours'
		WHEN hours_studied >= 16 THEN '16+ hours'
	END AS hours_studied_range,
	AVG(exam_score) AS avg_exam_score
FROM student_performance
GROUP BY hours_studied_range
ORDER BY avg_exam_score DESC;



Unnamed: 0,hours_studied_range,avg_exam_score
0,16+ hours,67.923363
1,11-15 hours,65.204386
2,6-10 hours,64.22549
3,1-5 hours,62.627119
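
Because `hours_studied` is typed as `float`, the `BETWEEN` ranges above would leave a value such as 5.5 or 0.5 without a label. The sketch below, again using Python's `sqlite3` on invented toy rows, first counts how many rows the original ranges would miss and then shows one possible alternative with half-open bounds. Assigning 5.5 to the '1-5 hours' bucket, the 'under 1 hour' and 'unknown' labels, and the `n_students` column are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical toy rows: (hours_studied, exam_score), invented for illustration.
rows = [
    (0.5, 55.0), (3.0, 60.0), (5.5, 62.0), (8.0, 64.0),
    (10.5, 66.0), (12.0, 68.0), (16.0, 70.0), (20.0, 74.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_performance (hours_studied REAL, exam_score REAL)")
conn.executemany("INSERT INTO student_performance VALUES (?, ?)", rows)

# Count rows that fall through every branch of the original BETWEEN-based CASE.
gap_query = """
SELECT COUNT(*)
FROM student_performance
WHERE CASE
        WHEN hours_studied BETWEEN 1 AND 5 THEN 1
        WHEN hours_studied BETWEEN 6 AND 10 THEN 1
        WHEN hours_studied BETWEEN 11 AND 15 THEN 1
        WHEN hours_studied >= 16 THEN 1
      END IS NULL;
"""
unlabeled_rows = conn.execute(gap_query).fetchone()[0]
print("rows without a range:", unlabeled_rows)

# Half-open bounds: each branch only needs an upper limit because
# earlier branches have already claimed the smaller values.
range_query = """
SELECT
    CASE
        WHEN hours_studied IS NULL THEN 'unknown'
        WHEN hours_studied < 1 THEN 'under 1 hour'
        WHEN hours_studied < 6 THEN '1-5 hours'
        WHEN hours_studied < 11 THEN '6-10 hours'
        WHEN hours_studied < 16 THEN '11-15 hours'
        ELSE '16+ hours'
    END AS hours_studied_range,
    COUNT(*) AS n_students,
    AVG(exam_score) AS avg_exam_score
FROM student_performance
GROUP BY hours_studied_range
ORDER BY MIN(hours_studied);
"""
range_averages = conn.execute(range_query).fetchall()
for row in range_averages:
    print(row)
```

If the gap count is zero on the real table, the original query already covers every row and the two versions should produce the same four groups.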


3. Teachers want to show students their relative rank in the class. How can they do this without revealing students' exam scores to each other?

In [3]:
-- We'll use a window function to assign ranks based on exam_score, ensuring that students with the same exam score share the same rank and no ranks are skipped.
SELECT
	attendance,
	hours_studied,
	sleep_hours,
	tutoring_sessions,
	exam_score,
	DENSE_RANK() OVER (ORDER BY exam_score DESC) AS exam_rank
FROM student_performance
ORDER BY exam_rank
LIMIT 30;

Unnamed: 0,attendance,hours_studied,sleep_hours,tutoring_sessions,exam_score,exam_rank
0,98,27,6,5,101,1
1,89,18,4,3,100,2
2,90,14,8,4,99,3
3,83,23,4,1,99,3
4,96,28,4,1,98,4
5,90,28,9,0,98,4
6,83,16,8,2,98,4
7,83,15,7,2,97,5
8,74,21,6,1,97,5
9,99,25,7,0,97,5
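
To illustrate why `DENSE_RANK()` fits the "no ranks are skipped" requirement, the sketch below compares it with `RANK()` on a handful of invented scores, using Python's `sqlite3` module. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions became available. The toy scores and the `rank_with_gaps` alias are illustrative and do not come from the original table.

```python
import sqlite3

# Hypothetical toy scores, invented for illustration; note the two ties.
scores = [(99.0,), (99.0,), (98.0,), (97.0,), (97.0,), (95.0,)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_performance (exam_score REAL)")
conn.executemany("INSERT INTO student_performance VALUES (?)", scores)

# DENSE_RANK gives tied scores the same rank and continues with the next integer.
# RANK also gives ties the same rank but then skips ahead by the number of tied rows.
query = """
SELECT
    exam_score,
    DENSE_RANK() OVER (ORDER BY exam_score DESC) AS exam_rank,
    RANK() OVER (ORDER BY exam_score DESC) AS rank_with_gaps
FROM student_performance
ORDER BY exam_score DESC;
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)

dense_ranks = [row[1] for row in ranked]
gapped_ranks = [row[2] for row in ranked]
```

On this toy data the dense ranks run 1, 1, 2, 3, 3, 4, while `RANK()` yields 1, 1, 3, 4, 4, 6.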
