Introduction: Comparing Grades of Sophomores and Juniors

Imagine a big university where students study hard. We're curious to find out whether there's a real difference in the average grades (GPAs) of sophomores (second-year students) and juniors (third-year students). Are juniors really doing better? To solve this puzzle, we collected GPAs from 17 sophomores and 13 juniors. We're going to use a two-sample t-test to figure out whether the difference we see in grades is big enough to mean something, or whether it could just be random chance. Let's find out if there's enough evidence to say that sophomores and juniors have different average grades.

In [1]:
import pandas as pd
from scipy.stats import t

In [4]:
data = pd.read_csv('student_gpa.txt', sep='\t')
data.head()

Unnamed: 0,Sophomores,Juniors
0,3.04,2.56
1,1.71,2.77
2,3.3,2.7
3,2.88,3.0
4,2.11,2.98


Set Up Hypotheses

In [5]:
# Null Hypothesis (H₀): The mean GPAs of sophomores and juniors are the same
null_hypothesis = "The mean GPAs of sophomores and juniors are the same (μ₁ = μ₂)"

# Alternative Hypothesis (H₁): The mean GPAs of sophomores and juniors are different
alternative_hypothesis = "The mean GPAs of sophomores and juniors are different (μ₁ ≠ μ₂)"

print("Null Hypothesis:", null_hypothesis)
print("Alternative Hypothesis:", alternative_hypothesis)

Null Hypothesis: The mean GPAs of sophomores and juniors are the same (μ₁ = μ₂)
Alternative Hypothesis: The mean GPAs of sophomores and juniors are different (μ₁ ≠ μ₂)


Null Hypothesis:

We're starting with a guess that sophomores and juniors have the same average grades.
It's like saying, "Maybe there's no real difference in how well they're doing."


Alternative Hypothesis:

But we're also curious, so our competing guess is that their average grades are not the same.
It's like saying, "Hmm, maybe one group is doing better than the other."

So, we're using a statistical test to figure out whether there's enough evidence to say the average grades are really different, or whether the gap could just be random chance. It's like being a detective with numbers, trying to find clues about students' grades.

In [8]:
print(data.columns)

Index(['Sophomores', '  Juniors'], dtype='object')
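
The column names printed above show that the juniors column is stored with leading spaces ('  Juniors'). One way to make it easier to work with is to strip the whitespace from the column names. Here is a minimal sketch, assuming a small stand-in table built from the five preview rows shown by data.head(); the names preview, n_sophomores and n_juniors are only for illustration, and the real file has more rows:

```python
import pandas as pd

# Stand-in for the real data: only the five preview rows shown by data.head().
# The column name '  Juniors' keeps the leading spaces seen in the notebook.
preview = pd.DataFrame({
    "Sophomores": [3.04, 1.71, 3.30, 2.88, 2.11],
    "  Juniors": [2.56, 2.77, 2.70, 3.00, 2.98],
})

# Remove leading and trailing whitespace from every column name.
preview.columns = preview.columns.str.strip()
print(list(preview.columns))

# count() reports the number of non-missing values in each column,
# which is one way to read group sizes from the data instead of typing them in.
n_sophomores = preview["Sophomores"].count()
n_juniors = preview["Juniors"].count()
print(n_sophomores, n_juniors)
```

If you apply the same step to data, the later cells would need data['Juniors'] instead of data['  Juniors'].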


Calculate Test Statistic

In [9]:
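# Sample sizes stated in the introduction
n1 = 17  # sophomores
n2 = 13  # juniors

# Difference in sample means and each group's sample standard deviation
mean_diff = data['Sophomores'].mean() - data['  Juniors'].mean()
std_dev1 = data['Sophomores'].std()
std_dev2 = data['  Juniors'].std()

# Degrees of freedom used later for the critical value.
# Note: n1 + n2 - 2 is the pooled-test value, while the statistic below
# uses the unpooled standard error.
degrees_of_freedom = (n1 - 1) + (n2 - 1)

# t = (difference in means) / sqrt(s1^2/n1 + s2^2/n2)
t_statistic = mean_diff / ((std_dev1**2 / n1 + std_dev2**2 / n2)**0.5)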

print("Calculated t-statistic:", t_statistic)

Calculated t-statistic: -0.9231495630900276


Calculated t-statistic: -0.92

Think of this number as a sort of "comparison score" between the two groups of students, the sophomores and the juniors.

If this score is far from zero in either direction (let's say +5 or -5), it's like one group is clearly ahead. But when the score is close to zero, like -0.92, the gap between the averages is small compared with how much the grades vary. It's kind of like saying, "Hmm, the two groups seem quite similar in whatever we're looking at."

So, in our case, this number suggests that the difference in grades might not be statistically significant. It's like saying, "Well, the grades of the two groups are not that different."
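
To see what goes into the comparison score, here is a minimal sketch in plain Python that uses the same formula as the cell above: the difference in means divided by the square root of s1²/n1 + s2²/n2. It also computes the Welch-Satterthwaite degrees of freedom, which is the value usually paired with this unpooled formula; that part is an alternative shown for comparison, not what the notebook uses (the notebook uses n1 + n2 - 2). The function name unpooled_t_and_df is made up for illustration, and the inputs are only the five preview rows from data.head(), so the result will not match the full-data value of -0.92.

```python
import math
import statistics


def unpooled_t_and_df(sample1, sample2):
    """Return the unpooled t-statistic and the Welch-Satterthwaite df."""
    n1, n2 = len(sample1), len(sample2)
    mean_diff = statistics.mean(sample1) - statistics.mean(sample2)

    # Sample variances (n - 1 denominator), i.e. the square of pandas' .std()
    var1 = statistics.variance(sample1)
    var2 = statistics.variance(sample2)

    # Each group's contribution to the squared standard error
    se1_sq = var1 / n1
    se2_sq = var2 / n2

    t_stat = mean_diff / math.sqrt(se1_sq + se2_sq)

    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t_stat, df


# Only the five preview rows shown earlier, not the full dataset
sophomores_preview = [3.04, 1.71, 3.30, 2.88, 2.11]
juniors_preview = [2.56, 2.77, 2.70, 3.00, 2.98]

t_preview, df_preview = unpooled_t_and_df(sophomores_preview, juniors_preview)
print("t-statistic (preview rows only):", t_preview)
print("Welch-Satterthwaite df (preview rows only):", df_preview)
```

The Welch-Satterthwaite value always lies between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2, so using it can only make the critical value the same or a little stricter.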

Calculate Critical Value

In [10]:
alpha = 0.05  # Significance level

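# Lower-tail cut-off for a two-sided test (a negative number);
# the upper cut-off is its mirror image on the positive side
critical_value = t.ppf(alpha/2, df=degrees_of_freedom)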

print("Calculated critical value:", critical_value)

Calculated critical value: -2.048407141795244


Calculated critical value: -2.05

Think of this number as a sort of "special line" we draw on our measurement scale. This line helps us make a decision about our comparison between sophomores and juniors. Because we're asking whether the grades differ in either direction, there's a matching line on the positive side at +2.05.

If our "comparison score" (which we calculated earlier) is beyond one of these lines (below -2.05 or above +2.05), it's like a detective's clue saying, "Hey, there's something really interesting happening here!" But if our score sits between the lines, it's like the detective saying, "Well, it might not be as exciting as we thought."

In our case, our "comparison score" of -0.92 sits between -2.05 and +2.05, so it isn't beyond the line. That suggests the difference in grades between sophomores and juniors is not statistically significant. It's like saying, "Hmm, the grades don't seem to show a big difference between the two groups."
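
Because the question is two-sided, it can help to compute both special lines explicitly. Here is a minimal sketch that reuses alpha = 0.05, the 28 degrees of freedom, and the t-statistic printed earlier in the notebook; the names lower_critical, upper_critical and in_rejection_region are only for illustration:

```python
from scipy.stats import t

alpha = 0.05
degrees_of_freedom = 28  # (17 - 1) + (13 - 1), as in the notebook

# Cut-offs that leave alpha/2 in each tail of the t distribution
lower_critical = t.ppf(alpha / 2, df=degrees_of_freedom)
upper_critical = t.ppf(1 - alpha / 2, df=degrees_of_freedom)

# Value printed earlier in the notebook
t_statistic = -0.9231495630900276

# The score counts as "beyond the line" only if it is further from zero
# than the cut-offs, in either direction.
in_rejection_region = abs(t_statistic) > upper_critical

print("Lower critical value:", lower_critical)
print("Upper critical value:", upper_critical)
print("Beyond the line?", in_rejection_region)
```

Comparing absolute values avoids sign mix-ups, since the lower cut-off is negative and the upper one is positive.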

Compare Test Statistic and Critical Value

In [11]:
# critical_value is negative (lower tail), so compare magnitudes for a two-sided test
if abs(t_statistic) > abs(critical_value):
    conclusion = "Reject the null hypothesis. There's enough evidence that the mean GPAs differ."
else:
    conclusion = "Fail to reject the null hypothesis. There's not enough evidence of a difference in mean GPAs."

conclusion

"Fail to reject the null hypothesis. There's not enough evidence of a difference in mean GPAs."

Final Verdict: "Fail to Reject the Null Hypothesis"

We've been checking whether the average grades of sophomores and juniors are really different. We've done some calculations to figure this out.

Here's what we found:

First, we started by guessing that maybe their grades are the same. But then we thought, "Hmm, maybe they're different."

We used a t-test to compare their grades and see if the difference is likely to be real or just random.
Now, after looking at all the numbers, we've come to a decision. Our comparison score of -0.92 is not beyond the special line at -2.05 (or its mirror image at +2.05), so we're saying, "You know what? We don't have enough evidence that the average grades of sophomores and juniors are different."

It's like being a detective and saying, "There isn't enough proof to say that one group's grades are different from the other group's grades." This doesn't prove the grades are the same. It only means the gap we saw in this sample is small enough to be explained by random chance.

We can't be 100% sure. There's always a chance that a real difference exists and a sample of 30 students was too small to detect it. But based on what we've seen, it's like saying, "Hey, these numbers don't show a clear difference in grades between the two groups."
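
Another common way to read the same comparison is the p-value: the chance of seeing a comparison score at least this far from zero if the null hypothesis were true. Here is a minimal sketch, assuming the t-statistic and the 28 degrees of freedom printed earlier in the notebook; the names p_value and reject_null are only for illustration:

```python
from scipy.stats import t

# Values printed earlier in the notebook
t_statistic = -0.9231495630900276
degrees_of_freedom = 28
alpha = 0.05

# Two-sided p-value: the area in both tails beyond |t|.
# t.sf is the survival function, i.e. 1 - CDF.
p_value = 2 * t.sf(abs(t_statistic), df=degrees_of_freedom)

# Reject the null hypothesis only when the p-value is below alpha.
reject_null = p_value < alpha

print("Two-sided p-value:", p_value)
print("Reject the null hypothesis?", reject_null)
```

The p-value rule and the critical-value rule are two views of the same test, so they lead to the same decision for a given alpha.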