## 🧪 One-Sample *t*-Test

### 1️⃣ Purpose
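The **one-sample t-test** is used to determine whether the mean of a single sample  
is significantly different from a known or hypothesized **population mean** ($\mu_0$)  
when the population standard deviation is unknown and has to be estimated from the sample.  
It assumes independent observations from an approximately normal population (or a sample large enough for the sample mean to be roughly normal).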

---

### 2️⃣ Hypotheses

**Null Hypothesis ($H_0$):**  
$$
\mu = \mu_0
$$

**Alternative Hypothesis ($H_1$):**

- **Two-tailed:**  
  $$
  \mu \neq \mu_0
  $$

- **Left-tailed:**  
  $$
  \mu < \mu_0
  $$

- **Right-tailed:**  
  $$
  \mu > \mu_0
  $$

---

### 3️⃣ Test Statistic Formula

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
$$

where:  
- $\bar{x}$ = sample mean  
- $\mu_0$ = hypothesized population mean  
- $s$ = sample standard deviation  
- $n$ = sample size  
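To make the formula concrete, here is a minimal sketch that computes $t$ by hand with NumPy on a few made-up ages (illustrative values, not the Titanic sample) and cross-checks it against `scipy.stats.ttest_1samp`. Note that $s$ uses $n - 1$ in its denominator: NumPy's `std` needs `ddof=1` for that, while pandas' `Series.std()` already uses `ddof=1` by default.

```python
import numpy as np
from scipy import stats

# Illustrative ages (made-up values, not the notebook's sample)
x = np.array([22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 2.0, 27.0])
mu0 = 29.7  # hypothesized population mean

n = x.size
x_bar = x.mean()
s = x.std(ddof=1)  # sample standard deviation with n - 1 in the denominator
t_stat = (x_bar - mu0) / (s / np.sqrt(n))

# SciPy's one-sample t-test computes the same statistic
res = stats.ttest_1samp(x, popmean=mu0)
print(t_stat, res.statistic)
```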

---

### 4️⃣ Degrees of Freedom

$$
df = n - 1
$$
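The degrees of freedom control how heavy the tails of the *t*-distribution are. A quick sketch with `scipy.stats.t.ppf` (assuming SciPy is installed) prints the two-tailed 5% critical value for a few values of $df$; it shrinks toward the standard normal value as $df$ grows.

```python
from scipy.stats import norm, t

# Two-tailed 5% critical value of the t-distribution for increasing df
for df in (4, 9, 29, 49, 999):
    print(df, round(t.ppf(0.975, df), 3))

# The standard normal critical value that the t values approach
print("normal", round(norm.ppf(0.975), 3))
```

With few observations the critical value is noticeably larger, which is how the test accounts for the extra uncertainty of estimating $s$ from the sample.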

---

### 5️⃣ p-Value Calculation

Depending on the type of test:

- **Two-tailed:**  
  $$
  p = 2 \times \left(1 - F_t(|t|, df)\right)
  $$

- **Right-tailed:**  
  $$
  p = 1 - F_t(t, df)
  $$

- **Left-tailed:**  
  $$
  p = F_t(t, df)
  $$

where $F_t(t, df)$ is the **CDF (Cumulative Distribution Function)**  
of the *t*-distribution with $df$ degrees of freedom.
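The three formulas map directly onto SciPy's *t*-distribution CDF, `scipy.stats.t.cdf(x, df)`. This sketch uses made-up data and assumes SciPy 1.6 or newer (for the `alternative` argument of `ttest_1samp`); it computes each p-value by hand and compares it with SciPy's built-in test.

```python
import numpy as np
from scipy import stats

x = np.array([22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 2.0, 27.0])  # illustrative data
mu0 = 29.7
n = x.size
df = n - 1
t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))

# p-values from the formulas above, with F_t = stats.t.cdf
p_two = 2 * (1 - stats.t.cdf(abs(t_stat), df))
p_right = 1 - stats.t.cdf(t_stat, df)
p_left = stats.t.cdf(t_stat, df)

# Cross-check against SciPy's one-sample t-test
for alt, p in [("two-sided", p_two), ("greater", p_right), ("less", p_left)]:
    print(alt, p, stats.ttest_1samp(x, mu0, alternative=alt).pvalue)
```

For very large $|t|$, `stats.t.sf(x, df)` is numerically more accurate than `1 - stats.t.cdf(x, df)`.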

---

### 6️⃣ Decision Rule

For a chosen significance level $\alpha$ (e.g., 0.05):

- If $p < \alpha$: **Reject $H_0$**  
- Else: **Fail to reject $H_0$**
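A small sketch of the decision rule, using a hypothetical helper `decide`. For a left-tailed test it also checks that $p < \alpha$ is equivalent to $t$ falling below the critical value (the $\alpha$-quantile of the *t*-distribution, from `scipy.stats.t.ppf`). The $t$ values and $df = 49$ (i.e. $n = 50$) are arbitrary examples.

```python
from scipy.stats import t


def decide(p_value, alpha=0.05):
    """Decision rule: reject H0 when p < alpha (hypothetical helper)."""
    return "Reject H0" if p_value < alpha else "Fail to reject H0"


df, alpha = 49, 0.05
t_crit = t.ppf(alpha, df)  # left-tail critical value (negative)

# Left-tailed test: p < alpha happens exactly when t falls below t_crit
for t_stat in (-2.5, -1.0, 0.09):
    p = t.cdf(t_stat, df)
    print(f"t = {t_stat:5.2f}  p = {p:.4f}  {decide(p, alpha)}  (t < t_crit: {t_stat < t_crit})")
```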


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
data = sns.load_dataset('titanic')
```

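```python
data.head()
```

|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |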


```python
data.shape
```

```text
(891, 15)
```

```python
data.isna().sum()
```

```text
survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64
```

```python
pop = data['age']
len(pop)
```

```text
891
```

```python
# Note: we do not impute missing (NaN) ages in this test.
# We simply drop them, so the "population" is the passengers with a recorded age.
pop = pop.dropna()
len(pop)
```

```text
714
```
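Dropping NaNs before testing is one option; `scipy.stats.ttest_1samp` can also skip them itself via `nan_policy="omit"`. This sketch, on made-up ages with two missing values, shows that both routes give the same statistic.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Illustrative ages with missing values (made-up, not the Titanic data)
ages = pd.Series([22.0, np.nan, 26.0, 35.0, np.nan, 54.0, 2.0, 27.0])

complete = ages.dropna()  # same approach as the cell above
print(len(ages), len(complete))

# Dropping first vs. letting SciPy omit NaNs
r_drop = stats.ttest_1samp(complete, popmean=29.7)
r_omit = stats.ttest_1samp(ages, popmean=29.7, nan_policy="omit")
print(float(r_drop.statistic), float(r_omit.statistic))
```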

```python
print("mean of pop", pop.mean())
```

```text
mean of pop 29.69911764705882
```


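```python
# Draw a random sample of 50 ages. No random_state is set, so the sample
# (and the mean printed below) changes every time this cell runs.
sample = pop.sample(50)
print("sample mean", sample.mean())
```

```text
sample mean 24.8
```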
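Because no `random_state` is passed, the sample mean of 24.8 is just one possible draw. Here is a minimal sketch of making the draw reproducible with pandas' `Series.sample(..., random_state=...)`; the stand-in population of 100 values and the seed 42 are arbitrary assumptions.

```python
import pandas as pd

# Stand-in population (made-up values); in the notebook this would be `pop`
population = pd.Series(range(100), dtype=float)

# The same random_state yields the same rows, and therefore the same sample mean
a = population.sample(10, random_state=42)
b = population.sample(10, random_state=42)
print(a.mean() == b.mean(), a.index.equals(b.index))
```

By default `sample` draws without replacement, so each row appears at most once.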


## 🚢 One-Sample *t*-Test (Titanic Dataset)

### 🎯 Objective
We want to test whether the **mean age** of a random sample of 50 Titanic passengers  
is **significantly less** than the population mean (all 714 passengers with a recorded age).

---

### 📊 Given Data

- Population mean ($\mu_0$): 29.699  
- Sample mean ($\bar{x}$): 24.8 (one random draw; rerunning the sampling cell changes it)  
- Sample size ($n$): 50  
- Type of test: **Left-tailed (one-tailed)**  
- Significance level ($\alpha$): 0.05  

---

### 🧩 Step 1: State the Hypotheses

**Null Hypothesis ($H_0$):**  
$$
\mu = \mu_0
$$

**Alternative Hypothesis ($H_1$):**  
$$
\mu < \mu_0
$$

---

### 🧮 Step 2: Test Statistic Formula

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
$$

where:  
- $\bar{x}$ = sample mean  
- $\mu_0$ = hypothesized population mean  
- $s$ = sample standard deviation  
- $n$ = sample size  

---

### 🧠 Step 3: Degrees of Freedom

$$
df = n - 1
$$

---

### 🧾 Step 4: p-Value Calculation (Left-Tailed)

For a left-tailed test:

$$
p = F_t(t, df)
$$

where $F_t(t, df)$ is the **CDF** of the *t*-distribution.

---

### 🧭 Step 5: Decision Rule

At $\alpha = 0.05$:

- If $p < \alpha$: **Reject $H_0$**  
- Else: **Fail to reject $H_0$**

---

### 🧩 Step 6: Interpretation

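If the null hypothesis is rejected,  
we conclude that the **sample mean age** is **significantly lower**  
than the population mean (29.6991): a gap this large would be unlikely ($p < \alpha$) if $H_0$ were true.  
Otherwise, we conclude that the sample does not provide enough evidence  
to say that the mean age is lower. Failing to reject $H_0$ does not prove that the means are equal.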
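One point worth making explicit: `sample` is drawn at random from `pop` itself, so $H_0$ is true by construction and failing to reject is the expected outcome; rejections should occur in roughly $\alpha$ of repeated draws. The simulation below sketches this under stated assumptions: a synthetic right-skewed population of 714 values (a gamma distribution standing in for the non-missing Titanic ages) and 2,000 repeated samples of 50. Skewness and sampling without replacement can nudge the rate somewhat away from 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic right-skewed "ages" (assumption: stand-in for the 714 real ages)
population = rng.gamma(shape=4.0, scale=7.5, size=714)
mu0 = population.mean()

n_trials, n, alpha = 2000, 50, 0.05
rejections = 0
for _ in range(n_trials):
    sample = rng.choice(population, size=n, replace=False)  # like pop.sample(50)
    res = stats.ttest_1samp(sample, mu0, alternative="less")  # left-tailed test
    rejections += res.pvalue < alpha

rate = rejections / n_trials
print(f"left-tailed rejection rate: {rate:.3f} (nominal alpha = {alpha})")
```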


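```python
from scipy.stats import t, shapiro


def T_test(sample_mean, sample_size, pop_mean, sample_std, alpha, tails):
    """One-sample t-test from summary statistics.

    tails=2 -> two-tailed test; tails=1 -> left-tailed test (H1: mu < mu0).
    Returns (t statistic, p-value).
    """
    # t = (x̄ - μ0) / (s / √n)
    T_value = (sample_mean - pop_mean) / (sample_std / np.sqrt(sample_size))
    df = sample_size - 1  # degrees of freedom
    return P_value(T_value, df, alpha, tails)


def P_value(t_value, df, alpha, tails):
    if tails == 2:
        p = 2 * (1 - t.cdf(abs(t_value), df))  # two-tailed
    else:
        p = t.cdf(t_value, df)  # left-tailed: P(T <= t)
    if p < alpha:
        print("✅ Reject H₀: the data support H₁")
    else:
        print("❌ Fail to reject H₀: not enough evidence for H₁")
    return t_value, p
```

```python
# tails=1 -> left-tailed test; pandas' std() uses ddof=1, i.e. the sample standard deviation s
print(T_test(sample.mean(), len(sample), pop.mean(), sample.std(), 0.05, 1))
```

```text
❌ Fail to reject H₀: not enough evidence for H₁
(0.094346146394838, 0.4626094724894808)
```

The *t* statistic in this output is positive, so it must come from a sample whose mean was above $\mu_0$, i.e. a different random draw from the one with sample mean 24.8 shown earlier. Also, for a left-tailed test a positive $t$ gives $p = F_t(t, df) > 0.5$, so the recorded p-value of 0.4626 does not match this code (it looks like the right-tailed value $1 - F_t(t, df)$). Rerunning the sampling cell and this cell together gives consistent numbers; in either case the conclusion at $\alpha = 0.05$ is to fail to reject $H_0$.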
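The cell above imports `shapiro` without using it. The t-test assumes roughly normal data (or a sample large enough for the sample mean to be approximately normal), and `scipy.stats.shapiro` is one way to check that. A minimal sketch on a synthetic sample of 50 normal values (an assumption; in the notebook you would pass `sample`):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)
# Synthetic sample of 50 values (assumption: stand-in for the notebook's `sample`)
x = rng.normal(loc=30, scale=14, size=50)

stat, p = shapiro(x)  # Shapiro-Wilk test; H0: the data come from a normal distribution
print(f"W = {stat:.3f}, p = {p:.3f}")
```

A small p (for example below 0.05) suggests the data are not normal; with $n = 50$ the t-test is usually fairly robust to moderate departures, but strong skew or outliers deserve a closer look.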
