# RUNNING EXAMPLE - Titanic Dataset - Hypothesis testing

### Recap: <font color ='teal'> Hypothesis Testing - Part 1 PPT (Slide 21) </font>

Let’s return to our Titanic dataset. You have seen that 1st class fares averaged about 85 dollars, and someone told you that 3rd class fares were usually a fifth of 1st class fares. You are skeptical. Set up the hypotheses to test this claim.


In [None]:
!pip install --upgrade scipy


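**Notation:**

* μ₁ = mean fare in 1st class
* μ₃ = mean fare in 3rd class

The claim to test is:
* μ₃ = (1/5) × μ₁

**Hypotheses:**

* Null hypothesis (H₀): μ₃ = (1/5) × μ₁
* Alternative hypothesis (H₁): μ₃ ≠ (1/5) × μ₁

Because H₁ allows μ₃ to be either above or below (1/5) × μ₁, this is a two-sided test. In the code below, μ₁ is replaced by the observed mean 1st class fare, and the 3rd class fares are compared with that fixed value using a one-sample t-test.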
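For reference, the one-sample t-test computes t = (x̄₃ − μ₀) / (s₃ / √n₃), where x̄₃, s₃ and n₃ are the sample mean, sample standard deviation (ddof=1) and size of the 3rd class fares, and μ₀ = (1/5) × μ₁ is the value of μ₃ under H₀. The two-sided p-value is 2·P(T > |t|) for a t distribution with n₃ − 1 degrees of freedom. The sketch below checks this formula against `scipy.stats.ttest_1samp`; it uses a small array of made-up fares (hypothetical values, not the Titanic data) so that it runs without downloading the dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical toy fares (NOT Titanic data), used only to check the formula
fares = np.array([7.25, 8.05, 7.92, 15.50, 21.08, 9.50, 7.75, 31.27])
mu0 = 84.0 / 5  # hypothetical 1st class mean divided by 5, i.e. the H0 value

n = fares.size
x_bar = fares.mean()
s = fares.std(ddof=1)  # sample standard deviation
t_manual = (x_bar - mu0) / (s / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

res = stats.ttest_1samp(fares, popmean=mu0)
print(f"t (manual) = {t_manual:.4f}, t (scipy) = {res.statistic:.4f}")
print(f"p (manual) = {p_manual:.4f}, p (scipy) = {res.pvalue:.4f}")
```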

In [4]:
import seaborn as sns
from scipy import stats

# Load the Titanic dataset (seaborn ships a copy for convenience)
titanic = sns.load_dataset('titanic')

# Drop rows with a missing fare or class
titanic = titanic.dropna(subset=['fare', 'pclass'])

# Fares by passenger class
fare_first_class = titanic.loc[titanic['pclass'] == 1, 'fare']
fare_third_class = titanic.loc[titanic['pclass'] == 3, 'fare']

# Sample means
mean_first = fare_first_class.mean()
mean_third = fare_third_class.mean()

# Value of μ₃ under H₀, using the observed 1st class mean in place of μ₁
expected_mean_third = mean_first / 5

# One-sample, two-sided t-test: 3rd class fares vs. the H₀ value
# (expected_mean_third is treated as a fixed, known constant here)
t_stat, p_value = stats.ttest_1samp(fare_third_class, popmean=expected_mean_third)

# Output results
print(f"Mean fare (1st class): {mean_first:.2f}")
print(f"Mean fare (3rd class): {mean_third:.2f}")
print(f"Expected mean fare (3rd class, under H₀): {expected_mean_third:.2f}")
print(f"T-statistic: {t_stat:.2f}")
print(f"P-value: {p_value:.10f}")

# Interpret the result at the 5% significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: 3rd class fare is significantly different from 1/5 of 1st class fare.")
else:
    print("Fail to reject the null hypothesis: no significant difference from 1/5 of 1st class fare.")


Mean fare (1st class): 84.15
Mean fare (3rd class): 13.68
Expected mean fare (3rd class, under H₀): 16.83
T-statistic: -5.94
P-value: 0.0000000055
Reject the null hypothesis: 3rd class fare is significantly different from 1/5 of 1st class fare.
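The test above plugs the observed 1st class mean into H₀ as if μ₁ were known exactly, which ignores the sampling error in that estimate. A hedged alternative, not part of the original exercise: H₀ is equivalent to 5μ₃ − μ₁ = 0, so you can multiply each 3rd class fare by 5 and compare the result with the 1st class fares using Welch's two-sample t-test (`scipy.stats.ttest_ind` with `equal_var=False`), which accounts for uncertainty in both means. The helper name `welch_ratio_test` is hypothetical, and the toy arrays are made-up values used only to show the call.

```python
import numpy as np
from scipy import stats


def welch_ratio_test(first, third, ratio=5.0, alternative="two-sided"):
    """Hypothetical helper: test H0: mu_third = mu_first / ratio.

    Under H0, E[ratio * third] = ratio * mu_third = mu_first, so Welch's
    two-sample t-test of equal means on (ratio * third, first) tests the
    ratio claim while allowing for sampling error in both class means.
    """
    scaled_third = ratio * np.asarray(third, dtype=float)
    first = np.asarray(first, dtype=float)
    return stats.ttest_ind(scaled_third, first, equal_var=False,
                           alternative=alternative)


# Made-up fares for illustration only (NOT the Titanic data)
toy_first = np.array([71.28, 53.10, 51.86, 146.52, 82.17, 30.00, 263.00, 35.50])
toy_third = np.array([7.25, 8.05, 7.92, 15.50, 21.08, 9.50, 7.75, 31.27])

result = welch_ratio_test(toy_first, toy_third)
print(f"Welch t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

In the notebook, the corresponding call would be `welch_ratio_test(fare_first_class, fare_third_class)`.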


Now suppose you think 3rd class fares are even cheaper than a fifth of 1st class fares. Set up the hypotheses to test this.

In [None]:
#TODO
###########################
# Insert Your code Here  ##
###########################
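One possible way to set up the follow-up question, offered as a sketch rather than the intended solution: the claim you want evidence for (3rd class is even cheaper) goes in the alternative, giving H₀: μ₃ ≥ (1/5) × μ₁ versus H₁: μ₃ < (1/5) × μ₁, a one-sided (left-tailed) test. In SciPy this corresponds to `ttest_1samp(..., alternative='less')`; the `alternative` argument requires SciPy 1.6 or newer. The helper `third_class_cheaper_test` and the toy fares are hypothetical; in the notebook you would call it as `third_class_cheaper_test(fare_third_class, mean_first)`.

```python
import numpy as np
from scipy import stats


def third_class_cheaper_test(third_fares, first_class_mean, ratio=5.0, alpha=0.05):
    """Hypothetical helper for the left-tailed test
    H0: mu_3 >= mu_1 / ratio   vs   H1: mu_3 < mu_1 / ratio.

    Like the two-sided cell, it treats first_class_mean / ratio as a known constant.
    """
    mu0 = first_class_mean / ratio
    res = stats.ttest_1samp(third_fares, popmean=mu0, alternative="less")
    return res.statistic, res.pvalue, bool(res.pvalue < alpha)


# Made-up 3rd class fares for illustration only (NOT the Titanic data)
toy_third = np.array([7.25, 8.05, 7.92, 15.50, 21.08, 9.50, 7.75, 31.27])

t_stat, p_less, reject = third_class_cheaper_test(toy_third, first_class_mean=84.0)
p_two = stats.ttest_1samp(toy_third, popmean=84.0 / 5).pvalue
print(f"t = {t_stat:.3f}, one-sided p = {p_less:.4f}, two-sided p = {p_two:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

When the t-statistic is negative, the left-tailed p-value is half the two-sided one, which is why a one-sided test can reject H₀ where a two-sided test at the same α does not.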
