✅ Problem:

You sampled 30 students. Their test scores are stored in a Python list.

Calculate:

1. The mean score

2. The 95% confidence interval

3. Interpret the result

🧾 Import libraries

In [2]:
import numpy as np 
from scipy import stats

🧾 Step 1: Sample Data (Scores of 30 students)

In [3]:
scores = [78, 85, 90, 72, 88, 95, 80, 76, 84, 91,
          89, 77, 85, 82, 90, 73, 79, 88, 81, 92,
          86, 75, 84, 87, 93, 78, 80, 83, 89, 85]

🧮 Step 2: Calculate the Mean

📘 Explanation:

- np.mean() calculates the average.

- This tells us the central tendency — the typical score among students.

In [4]:
mean_score = np.mean(scores)
print(f"Mean score: {mean_score:.2f}")

Mean score: 83.83

🧮 Step 3: Calculate the 95% Confidence Interval

📘 Explanation:

- stats.sem() computes the standard error of the mean: the sample standard deviation (with n - 1 in the denominator) divided by √n.

- stats.t.interval() returns mean ± t × SEM, using the t-distribution with n - 1 degrees of freedom because the population standard deviation is unknown.


In [5]:
# Standard Error of the mean (SEM)
sem = stats.sem(scores)

# Confidence Interval using t-distribution
confidence = 0.95
n = len(scores)

# degrees of freedom = n - 1
interval = stats.t.interval(confidence, df=n - 1, loc=mean_score, scale=sem)
print(f"95% Confidence Interval: {interval}")

95% Confidence Interval: (81.54809927790643, 86.11856738876023)
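
To make the call to stats.t.interval() less of a black box, here is a minimal sketch that rebuilds the same interval by hand as mean ± t × SEM. The names sample_sd, sem_manual, t_crit, margin, and manual_interval are introduced here for illustration only and are not part of the original notebook.

```python
import numpy as np
from scipy import stats

scores = [78, 85, 90, 72, 88, 95, 80, 76, 84, 91,
          89, 77, 85, 82, 90, 73, 79, 88, 81, 92,
          86, 75, 84, 87, 93, 78, 80, 83, 89, 85]

n = len(scores)
mean_score = np.mean(scores)

# Sample standard deviation: ddof=1 divides by n - 1, matching stats.sem()
sample_sd = np.std(scores, ddof=1)
sem_manual = sample_sd / np.sqrt(n)

# Two-sided 95% interval: 2.5% in each tail of the t-distribution
t_crit = stats.t.ppf(0.975, df=n - 1)
margin = t_crit * sem_manual

manual_interval = (mean_score - margin, mean_score + margin)
print(f"Manual 95% CI: ({manual_interval[0]:.2f}, {manual_interval[1]:.2f})")
```

The printed interval should agree with the stats.t.interval() result above, which shows that the function only combines the mean, the standard error, and a t critical value.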


🧠 Interpretation:

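- We are 95% confident that the true average score of all students (not just this sample) lies between about 81.55 and 86.12.

This means: if we repeated the sampling many times and built a 95% confidence interval from each sample, about 95% of those intervals would contain the true mean. It does not mean that 95% of sample means would fall inside this particular interval.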
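
What "95% confident" refers to can be checked with a small simulation: draw many samples from a population whose mean is known, build a 95% interval from each, and count how often the interval contains that known mean. This is only a sketch. The population values true_mean and true_sd, the trial count n_trials, and the random seed are assumptions chosen for illustration, not estimates from the student data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Hypothetical population, chosen only for illustration
true_mean, true_sd = 84.0, 6.0
n, n_trials = 30, 2000

covered = 0
for _ in range(n_trials):
    # Draw a fresh sample of 30 scores from the known population
    sample = rng.normal(true_mean, true_sd, size=n)
    low, high = stats.t.interval(0.95, df=n - 1,
                                 loc=np.mean(sample),
                                 scale=stats.sem(sample))
    # Count the trial if this interval captures the true mean
    if low <= true_mean <= high:
        covered += 1

coverage = covered / n_trials
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The fraction should land close to 0.95. The 95% describes how often the interval-building procedure succeeds across repeated samples, not the spread of sample means around one particular interval.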

🔍 Why This Matters in Data Science:

- A/B Testing: Which version of a product performs better?

- Business Decisions: Estimate true average purchase, bounce rate, or satisfaction score.

- ML Feature Understanding: Is the effect of a feature statistically significant?

