## 1 Kruskal-Wallis Test (KW test)

### 1.1 Assumptions and Conditions for KW test

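1. The KW test is non-parametric: the data do not need to be normally distributed.
1. The KW test works on the ranks of the pooled observations rather than on the raw values, so it is robust to outliers.
1. The observations must be independent, both within and across groups, and the measured variable must be at least ordinal.
1. Reading the result as a statement about medians assumes that the distributions of the groups have a similar shape.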
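
The robustness to outliers can be checked directly. The sketch below uses small made-up samples (the values are illustrative, not taken from any dataset): because the test only sees the ranks of the pooled observations, making the largest value far more extreme should leave the H-statistic unchanged.

```python
from scipy.stats import kruskal

# Illustrative, made-up samples; 100.0 is the largest pooled value
group_a = [1.0, 2.0, 3.0, 4.0, 100.0]
group_b = [5.0, 6.0, 7.0, 8.0, 9.0]

# Same sample, but the largest value is pushed much further out.
# Its rank among the pooled observations stays the same.
group_a_extreme = [1.0, 2.0, 3.0, 4.0, 1_000_000.0]

h_original, p_original = kruskal(group_a, group_b)
h_extreme, p_extreme = kruskal(group_a_extreme, group_b)

print("H (original):", h_original)
print("H (extreme outlier):", h_extreme)
print("Same statistic:", abs(h_original - h_extreme) < 1e-12)
```

A mean-based test such as one-way ANOVA would react strongly to the same change, which is one reason to prefer the KW test when outliers are present.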

### 1.2 Nature of Hypothesis

- $H_0: \text{The median of all the groups is the same}$
- $H_a: \text{At least one of the groups has a significantly different median}$

### 1.3 Test statistic

1. The test statistic of the KW test is called the H-statistic.
2. Under $H_0$, the H-statistic approximately follows a chi-square distribution with $k - 1$ degrees of freedom, where $k$ is the number of groups.
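
To make the H-statistic concrete, here is a minimal sketch that computes it by hand and compares the result with `scipy.stats.kruskal`. It assumes there are no tied values, in which case $H = \frac{12}{N(N+1)} \sum_i \frac{R_i^2}{n_i} - 3(N+1)$, where $R_i$ is the rank sum of group $i$, $n_i$ its size, and $N$ the total number of observations. The sample values are illustrative only.

```python
import numpy as np
from scipy.stats import chi2, kruskal, rankdata

# Illustrative samples with no tied values (an assumption of this sketch)
groups = [
    np.array([2.9, 3.0, 2.5, 2.6, 3.2]),
    np.array([3.8, 2.7, 4.0, 2.4]),
    np.array([2.8, 3.4, 3.7, 2.2, 2.0]),
]

# Rank all observations together, then split the ranks back per group
pooled = np.concatenate(groups)
ranks = rankdata(pooled)
split_points = np.cumsum([g.size for g in groups])[:-1]
rank_groups = np.split(ranks, split_points)

n_total = pooled.size
rank_term = sum(r.sum() ** 2 / r.size for r in rank_groups)
h_manual = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# p-value: upper tail of chi-square with k - 1 degrees of freedom
dof = len(groups) - 1
p_manual = chi2.sf(h_manual, dof)

h_scipy, p_scipy = kruskal(*groups)
print("manual:", h_manual, p_manual)
print("scipy: ", h_scipy, p_scipy)
```

When ties are present, `kruskal` additionally divides H by a tie-correction factor, so the hand computation above would no longer match exactly.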

### 1.4 API

```python
from scipy.stats import kruskal
```

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html

### 1.5 Examples

In [1]:
import numpy as np
import pandas as pd

from scipy.stats import kruskal

##### Import Dataset

In [2]:
af_df = pd.read_csv("../0_data/02_aerofit/aerofit_treadmill.csv")
af_df.head(3)

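|     | Product | Age | Gender | Education | MaritalStatus | Usage | Fitness | Income | Miles |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | KP281 | 18 | Male | 14 | Single | 3 | 4 | 29562 | 112 |
| 1 | KP281 | 19 | Male | 15 | Single | 2 | 3 | 31836 | 75 |
| 2 | KP281 | 19 | Female | 14 | Partnered | 4 | 3 | 30699 | 66 |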


##### Create Groups

###### Group: KP281

In [3]:
mask = af_df["Product"] == "KP281"
kp_281 = af_df[mask]
kp_281.head(2)

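|     | Product | Age | Gender | Education | MaritalStatus | Usage | Fitness | Income | Miles |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | KP281 | 18 | Male | 14 | Single | 3 | 4 | 29562 | 112 |
| 1 | KP281 | 19 | Male | 15 | Single | 2 | 3 | 31836 | 75 |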


###### Group: KP481

In [4]:
mask = af_df["Product"] == "KP481"
kp_481 = af_df[mask]
kp_481.head(2)

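|     | Product | Age | Gender | Education | MaritalStatus | Usage | Fitness | Income | Miles |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 80 | KP481 | 19 | Male | 14 | Single | 3 | 3 | 31836 | 64 |
| 81 | KP481 | 20 | Male | 14 | Single | 2 | 3 | 32973 | 53 |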


###### Group: KP781

In [5]:
mask = af_df["Product"] == "KP781"
kp_781 = af_df[mask]
kp_781.head(2)

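|     | Product | Age | Gender | Education | MaritalStatus | Usage | Fitness | Income | Miles |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 140 | KP781 | 22 | Male | 14 | Single | 4 | 3 | 48658 | 106 |
| 141 | KP781 | 22 | Male | 16 | Single | 3 | 5 | 54781 | 120 |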
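
The three boolean masks can also be replaced by a single `groupby`. The sketch below is one possible alternative, not the approach used in this notebook; `toy_df` is a hypothetical stand-in with made-up `Income` values so that the example runs without the CSV file.

```python
import pandas as pd
from scipy.stats import kruskal

# Hypothetical stand-in for af_df; the Income values are made up
toy_df = pd.DataFrame(
    {
        "Product": ["KP281"] * 3 + ["KP481"] * 3 + ["KP781"] * 3,
        "Income": [30000, 31000, 32000, 33000, 34000, 35000, 50000, 52000, 54000],
    }
)

# One array of incomes per product, keyed by product name
income_by_product = {
    product: group["Income"].to_numpy()
    for product, group in toy_df.groupby("Product")
}

# Unpack the per-product arrays as separate samples
h_stat, p_val = kruskal(*income_by_product.values())
print(list(income_by_product))
print("H-Statistic:", h_stat)
print("p-value:", p_val)
```

This scales to any number of products without adding a new mask per group.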


#### KW Test

##### STEP #1: Formulate Hypothesis

In [6]:
# H_0: No significant difference in median among the groups
# H_a: At least one of the groups has a significantly different median

##### STEP #2: Select Significance level

In [7]:
alpha = 0.05

##### STEP #3: Select type of test

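The Kruskal-Wallis test is always a right-tailed chi-square test: large values of the H-statistic are evidence against $H_0$, so the p-value is the upper-tail probability of a chi-square distribution with $k - 1$ degrees of freedom ($k$ = number of groups).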

##### STEP #4: Compute test-statistic and p-value

In [8]:
test_stat, p_value = kruskal(kp_281["Income"], kp_481["Income"], kp_781["Income"])
print("H-Statistic:", test_stat.item())
print("p-value:", p_value.item())

H-Statistic: 61.43670384567185
p-value: 4.562357014275808e-14
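
The p-value printed above can be reproduced from the H-statistic alone. As a quick check, the sketch below takes the reported H value and evaluates the upper tail of a chi-square distribution with $k - 1 = 2$ degrees of freedom (three products); the variable names are illustrative.

```python
from scipy.stats import chi2

# H-statistic as printed by kruskal() for the three Income groups
h_reported = 61.43670384567185
n_groups = 3

# Right-tailed test: survival function = P(chi-square >= H)
p_from_h = chi2.sf(h_reported, df=n_groups - 1)
print("p-value from H:", p_from_h)
```

The result should agree with the p-value returned by `kruskal`, about 4.56e-14.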


##### STEP #5: Compare p-value with alpha

In [9]:
if p_value < alpha:
    print("Reject Null hypothesis")
else:
    print("Fail to reject Null hypothesis")

Reject Null hypothesis


## Quizzes

### Quiz #1

#### Solution