In [1]:
import pandas as pd
import numpy as np
import scipy.stats as stats

In [2]:
df = pd.read_csv("data/cleaned_student_lifestyle_dataset.csv")

In [None]:
df.head()

### 1. Can we reject the hypothesis that students get a healthy 8 hours of sleep per day on average?


Hypothesis testing: a one-sample, two-sided t-test of the null hypothesis that mean sleep is 8 hours per day

In [None]:
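t_stat, p_value = stats.ttest_1samp(df['sleep_hours_per_day'], 8)
# Display the results so the interpretation below can be checked against them
print(f"T-statistic: {t_stat:.2f}")
print(f"P-value: {p_value:.4g}")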
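
To make clear what `stats.ttest_1samp` computes, here is a minimal sketch that reproduces the statistic by hand. It runs on a synthetic sample (an assumption standing in for `df['sleep_hours_per_day']`, not the real data), and the names `sample`, `mu0`, `t_manual`, and `p_manual` are illustrative only.

```python
import numpy as np
import scipy.stats as stats

# Synthetic stand-in for df['sleep_hours_per_day'] (NOT the real dataset).
rng = np.random.default_rng(0)
sample = rng.normal(loc=7.5, scale=1.5, size=200)

mu0 = 8  # hypothesized mean under the null hypothesis
n = len(sample)
# Sample standard deviation (ddof=1), as pandas' .std() uses by default
standard_error = sample.std(ddof=1) / np.sqrt(n)

# t = (sample mean - hypothesized mean) / standard error
t_manual = (sample.mean() - mu0) / standard_error
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(sample, mu0)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4g}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4g}")
```

The sign of the statistic follows the sign of (sample mean - hypothesized mean), so a sample mean below 8 gives a negative t-statistic.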

In [3]:
mean_sleep = df['sleep_hours_per_day'].mean()
std_sleep = df['sleep_hours_per_day'].std()
n = len(df['sleep_hours_per_day'])

In [4]:
for conf in [0.90, 0.95, 0.99]:
    confidence_interval = stats.t.interval(conf, n-1, loc=mean_sleep, scale=std_sleep/np.sqrt(n))
    print(f"Confidence interval at {int(conf * 100)}%: {confidence_interval[0]:.2f} to {confidence_interval[1]:.2f}")

Confidence interval at 90%: 7.45 to 7.56
Confidence interval at 95%: 7.44 to 7.57
Confidence interval at 99%: 7.42 to 7.59
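
The intervals above come from the formula mean ± t* · s/√n, where t* is the critical value of the t distribution with n - 1 degrees of freedom. The sketch below checks that formula against `stats.t.interval` on a synthetic sample (an assumption; the real data is not used here). The names `sample`, `sem`, and `intervals` are illustrative.

```python
import numpy as np
import scipy.stats as stats

# Synthetic stand-in for the sleep column (NOT the real dataset).
rng = np.random.default_rng(0)
sample = rng.normal(loc=7.5, scale=1.5, size=200)

n = len(sample)
mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(n)  # standard error of the mean

intervals = {}
for conf in [0.90, 0.95, 0.99]:
    # Two-sided critical value: leave (1 - conf) / 2 in each tail
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    manual = (mean - t_crit * sem, mean + t_crit * sem)
    from_scipy = stats.t.interval(conf, n - 1, loc=mean, scale=sem)
    intervals[conf] = (manual, from_scipy)
    print(f"{int(conf * 100)}%: manual {manual[0]:.3f} to {manual[1]:.3f}, "
          f"scipy {from_scipy[0]:.3f} to {from_scipy[1]:.3f}")
```

A higher confidence level uses a larger critical value, which is why the 99% interval is the widest of the three.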


### Interpretation of Results

#### T-statistic
The t-statistic measures how far the sample mean lies from the hypothesized mean of 8 hours, in units of the standard error. The sample mean is about 7.5 hours (the midpoint of the confidence intervals above), which is below 8, so the t-statistic is negative.

#### P-value
Even the 99% confidence interval excludes 8 hours, so the two-sided p-value is below 0.01. This is strong evidence against the null hypothesis, and we can reject the null hypothesis that students sleep an average of 8 hours per day.

#### Confidence Intervals:
- **90% Confidence Interval: 7.45 to 7.56**: We are 90% confident that the true mean sleep hours per day for students lies between 7.45 and 7.56 hours.
- **95% Confidence Interval: 7.44 to 7.57**: We are 95% confident that the true mean sleep hours per day for students lies between 7.44 and 7.57 hours.
- **99% Confidence Interval: 7.42 to 7.59**: We are 99% confident that the true mean sleep hours per day for students lies between 7.42 and 7.59 hours.

Since none of these intervals includes 8 hours, they further support rejecting the null hypothesis.
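
The link between the intervals and the test is exact for the two-sided one-sample t-test: the hypothesized value falls outside the (1 - α) confidence interval exactly when the p-value is below α. A minimal sketch on synthetic data (an assumption, not the real dataset); the helper names `rejects_by_p_value` and `rejects_by_interval` are hypothetical.

```python
import numpy as np
import scipy.stats as stats

def rejects_by_p_value(sample, mu0, alpha):
    """Reject H0 when the two-sided p-value is below alpha."""
    return stats.ttest_1samp(sample, mu0).pvalue < alpha

def rejects_by_interval(sample, mu0, alpha):
    """Reject H0 when mu0 lies outside the (1 - alpha) confidence interval."""
    n = len(sample)
    sem = np.std(sample, ddof=1) / np.sqrt(n)
    low, high = stats.t.interval(1 - alpha, n - 1, loc=np.mean(sample), scale=sem)
    return not (low <= mu0 <= high)

# Synthetic stand-in for the sleep column (NOT the real dataset).
rng = np.random.default_rng(0)
sample = rng.normal(loc=7.5, scale=1.5, size=200)

decisions = {}
for alpha in [0.10, 0.05, 0.01]:
    decisions[alpha] = (
        rejects_by_p_value(sample, 8, alpha),
        rejects_by_interval(sample, 8, alpha),
    )
    print(f"alpha = {alpha}: p-value rule {decisions[alpha][0]}, "
          f"interval rule {decisions[alpha][1]}")
```

Both rules give the same decision at every significance level, which is why checking whether 8 lies inside an interval is a valid shortcut here.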

### Conclusion
We can reject the hypothesis that students sleep an average of 8 hours per day. The actual average sleep duration is likely between 7.42 and 7.59 hours (the 99% confidence interval), which is less than the recommended 8 hours.

Next, repeat the hypothesis test for each GPA group to check whether students in that group get a healthy 8 hours of sleep per day on average.

Ensure 'gpa_group' is of category dtype

In [5]:
df['gpa_group'] = df['gpa_group'].astype('category')

Function to perform hypothesis testing and calculate confidence intervals

In [6]:
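def analyze_sleep_by_gpa_group(df, group):
    """Test mean sleep against 8 hours for one GPA group.

    Returns the t-statistic, the two-sided p-value, and a dict of
    90%, 95%, and 99% confidence intervals for the group's mean.
    """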
    group_data = df[df['gpa_group'] == group]['sleep_hours_per_day']
    t_stat, p_value = stats.ttest_1samp(group_data, 8)
    mean_sleep = group_data.mean()
    std_sleep = group_data.std()
    n = len(group_data)

    confidence_intervals = {}
    for conf in [0.90, 0.95, 0.99]:
        confidence_interval = stats.t.interval(conf, n-1, loc=mean_sleep, scale=std_sleep/np.sqrt(n))
        confidence_intervals[f"{int(conf * 100)}%"] = confidence_interval

    return t_stat, p_value, confidence_intervals

Analyze each GPA group

In [7]:
results = {}
for group in df['gpa_group'].cat.categories:
    t_stat, p_value, confidence_intervals = analyze_sleep_by_gpa_group(df, group)
    results[group] = {
        'T-statistic': t_stat,
        'P-value': p_value,
        'Confidence Intervals': confidence_intervals
    }

In [8]:
for group, result in results.items():
    print(f"GPA Group: {group}")
    print(f"T-statistic: {result['T-statistic']:.2f}")
    print(f"P-value: {result['P-value']:.2f}")
    for conf, interval in result['Confidence Intervals'].items():
        print(f"{conf} Confidence Interval: {interval[0]:.2f} to {interval[1]:.2f}")
    print()

GPA Group: 2-2.5
T-statistic: -1.95
P-value: 0.06
90% Confidence Interval: 7.10 to 7.94
95% Confidence Interval: 7.01 to 8.02
99% Confidence Interval: 6.84 to 8.19

GPA Group: 2.5-3
T-statistic: -8.32
P-value: 0.00
90% Confidence Interval: 7.44 to 7.63
95% Confidence Interval: 7.43 to 7.65
99% Confidence Interval: 7.39 to 7.68

GPA Group: 3-3.5
T-statistic: -12.59
P-value: 0.00
90% Confidence Interval: 7.37 to 7.52
95% Confidence Interval: 7.36 to 7.53
99% Confidence Interval: 7.33 to 7.56

GPA Group: 3.5-4
T-statistic: -3.00
P-value: 0.00
90% Confidence Interval: 7.52 to 7.86
95% Confidence Interval: 7.48 to 7.89
99% Confidence Interval: 7.42 to 7.96



### Interpretation of Results

#### GPA Group: 2-2.5
- **T-statistic**: -1.95
- **P-value**: 0.06
- **90% Confidence Interval**: 7.10 to 7.94
- **95% Confidence Interval**: 7.01 to 8.02
- **99% Confidence Interval**: 6.84 to 8.19

#### GPA Group: 2.5-3
- **T-statistic**: -8.32
- **P-value**: 0.00
- **90% Confidence Interval**: 7.44 to 7.63
- **95% Confidence Interval**: 7.43 to 7.65
- **99% Confidence Interval**: 7.39 to 7.68

#### GPA Group: 3-3.5
- **T-statistic**: -12.59
- **P-value**: 0.00
- **90% Confidence Interval**: 7.37 to 7.52
- **95% Confidence Interval**: 7.36 to 7.53
- **99% Confidence Interval**: 7.33 to 7.56

#### GPA Group: 3.5-4
- **T-statistic**: -3.00
- **P-value**: 0.00
- **90% Confidence Interval**: 7.52 to 7.86
- **95% Confidence Interval**: 7.48 to 7.89
- **99% Confidence Interval**: 7.42 to 7.96

### Conclusion
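For GPA group 2-2.5, the evidence is not strong enough to reject the null hypothesis at the 5% significance level: the p-value is 0.06 and the 95% confidence interval (7.01 to 8.02) includes 8 hours. The null hypothesis would be rejected only at the 10% level, since the 90% interval (7.10 to 7.94) excludes 8. For GPA groups 2.5-3, 3-3.5, and 3.5-4, the t-statistics are negative, the p-values round to 0.00, and even the 99% confidence intervals lie entirely below 8 hours. This is strong evidence against the null hypothesis: the average sleep duration for these groups is significantly less than 8 hours.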
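
The interval for group 2-2.5 is much wider than the others, which usually points to a smaller group, although the group sizes are not printed above. A minimal sketch of collecting the per-group results into one table that also shows `n` is given below. It uses synthetic data with made-up group sizes (an assumption, not the real dataset), and `summarize_group`, `demo`, and `summary` are hypothetical names.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

def summarize_group(sleep, mu0=8, conf=0.95):
    """Return n, mean, t-test results, and one confidence interval for a group."""
    n = len(sleep)
    mean = sleep.mean()
    sem = sleep.std() / np.sqrt(n)  # pandas .std() uses ddof=1
    t_stat, p_value = stats.ttest_1samp(sleep, mu0)
    low, high = stats.t.interval(conf, n - 1, loc=mean, scale=sem)
    return {"n": n, "mean": mean, "t_stat": t_stat,
            "p_value": p_value, "ci_low": low, "ci_high": high}

# Synthetic data with invented group sizes (NOT the real dataset).
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "gpa_group": np.repeat(["2-2.5", "2.5-3"], [30, 300]),
    "sleep_hours_per_day": rng.normal(7.5, 1.5, size=330),
})

# One row per group, one column per statistic
rows = {
    group: summarize_group(values)
    for group, values in demo.groupby("gpa_group")["sleep_hours_per_day"]
}
summary = pd.DataFrame.from_dict(rows, orient="index")
print(summary.round(3))
```

With the same spread in both synthetic groups, the group of 30 gets a visibly wider interval than the group of 300, so a failure to reject in a small group is weaker evidence that its mean really is 8 hours.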