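## In this notebook we perform hypothesis testing on the categorical features [sex, chest_pain, fast_bld_sugar, rest_ecg, ex_angina, slope, colored_vessels, thalassemia]
- The linear correlation tests were already done in the visual analysis notebook, so here we run only the two-sample tests.
- Each test compares the mean of `target` (the heart disease rate) in one category of a feature against all other categories of that feature, using a one-sided t-test.
- We define a helper function so the same steps are not repeated for every feature.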

In [1]:
import scipy.stats as stats
import pandas as pd
import warnings

warnings.filterwarnings('ignore')

In [14]:
df = pd.read_csv('heart.csv')
df.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [15]:
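def test_two_samples(col, categroy_of_interest, alternative='greater'):
    """One-sided two-sample t-test on `target`: rows where `col` equals
    `categroy_of_interest` versus all other rows."""
    mask = df[col] == categroy_of_interest

    # `target` values inside and outside the category of interest
    in_category = df['target'][mask]
    other_categories = df['target'][~mask]

    # alternative='greater' tests whether the mean of `target` is higher
    # in the category of interest than in the remaining rows
    t_stat, p_value = stats.ttest_ind(in_category, other_categories, alternative=alternative)
    print(f'statistic is : {t_stat:.3f}, p-value: {p_value:.3f}')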
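
Since `target` only takes the values 0 and 1, its mean within a group is that group's heart disease rate, so the t-test in the function above is effectively comparing two rates. A minimal sketch on made-up data shows the two rates next to the same one-sided test. The `toy` frame and its `group` column are hypothetical and not taken from `heart.csv`.

```python
import pandas as pd
import scipy.stats as stats

# Hypothetical toy data, NOT taken from heart.csv:
# group 1 has 8 positives out of 10, group 0 has 3 out of 10.
toy = pd.DataFrame({
    "group":  [1] * 10 + [0] * 10,
    "target": [1, 1, 1, 1, 1, 1, 1, 1, 0, 0] + [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
})

mask = toy["group"] == 1
in_group = toy["target"][mask]
out_group = toy["target"][~mask]

# The mean of a 0/1 column is the share of ones, i.e. the rate.
rate_in = in_group.mean()
rate_out = out_group.mean()

# Same call as in the helper: is the rate inside the group greater?
t_stat, p_value = stats.ttest_ind(in_group, out_group, alternative="greater")
print(f"rate in group: {rate_in:.2f}, rate outside: {rate_out:.2f}")
print(f"statistic is : {t_stat:.3f}, p-value: {p_value:.3f}")
```

Printing the two rates alongside the statistic makes it easier to see which direction the difference goes before reading the p-value.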

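- Males have a higher heart disease rate than females. (fail to reject the null hypothesis: the statistic is negative, so in this sample the rate for males is the lower one)
    - H_0 : disease rate of females >= disease rate of males
    - H_A : disease rate of females < disease rate of males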

In [19]:
test_two_samples(col='sex', categroy_of_interest=1)

statistic is : -5.079, p-value: 0.000
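
The sign of the statistic decides what a one-sided test can conclude. The sketch below, on made-up data (`group_a` and `group_b` are hypothetical), runs the same comparison under the three values that `scipy.stats.ttest_ind` accepts for `alternative`. With a negative statistic, `'greater'` gives a p-value above 0.5 and `'less'` gives the small one, which may be worth keeping in mind when reading the printed p-value above.

```python
import scipy.stats as stats

# Hypothetical binary outcomes; the first group has the LOWER rate.
group_a = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]   # 3 positives out of 10
group_b = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]   # 8 positives out of 10

t_greater, p_greater = stats.ttest_ind(group_a, group_b, alternative="greater")
t_less, p_less = stats.ttest_ind(group_a, group_b, alternative="less")
t_two, p_two = stats.ttest_ind(group_a, group_b, alternative="two-sided")

# The statistic is the same in all three cases; only the p-value changes.
print(f"statistic: {t_greater:.3f}")
print(f"greater: {p_greater:.3f}, less: {p_less:.3f}, two-sided: {p_two:.3f}")
```

The two one-sided p-values always sum to 1, and the two-sided p-value is twice the smaller of them.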


- People with non-anginal pain have a higher heart disease rate than people with the other chest pain types combined. (reject the null hypothesis)
    - H_0 : disease rate with other chest pain types >= disease rate with non-anginal pain
    - H_A : disease rate with other chest pain types < disease rate with non-anginal pain

In [17]:
test_two_samples(col='cp', categroy_of_interest=2)

statistic is : 5.794, p-value: 0.000
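
For a feature with more than two categories, the helper compares one category against all the others pooled together, so rejecting the null does not by itself show that the category has the highest rate of all. A quick per-category summary can complement the test. The sketch below uses a hypothetical `toy` frame with made-up values, not the real `cp` data.

```python
import pandas as pd

# Hypothetical toy data, NOT taken from heart.csv.
toy = pd.DataFrame({
    "cp":     [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3],
    "target": [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1],
})

# Disease rate and group size for every category, highest rate first.
rates = (
    toy.groupby("cp")["target"]
       .agg(rate="mean", n="count")
       .sort_values("rate", ascending=False)
)
print(rates)
```

Keeping the group size `n` next to the rate helps spot categories whose rate rests on very few rows.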


- People with fasting blood sugar <= 120 mg/dl have a higher heart disease rate than people with fasting blood sugar > 120 mg/dl. (fail to reject the null hypothesis)
    - H_0 : disease rate with > 120 mg/dl >= disease rate with <= 120 mg/dl
    - H_A : disease rate with > 120 mg/dl < disease rate with <= 120 mg/dl

In [22]:
test_two_samples(col='fbs', categroy_of_interest=0)

statistic is : 0.487, p-value: 0.313


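- People with ST-T wave abnormality have a higher heart disease rate than people with the other rest_ecg results combined. (reject the null hypothesis)
    - H_0 : disease rate with other rest_ecg results >= disease rate with ST-T wave abnormality
    - H_A : disease rate with other rest_ecg results < disease rate with ST-T wave abnormality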

In [23]:
test_two_samples(col='restecg', categroy_of_interest=1)

statistic is : 3.090, p-value: 0.001


- People with no exercise-induced angina have a higher heart disease rate than people with exercise-induced angina. (reject the null hypothesis)
    - H_0 : disease rate with exercise-induced angina >= disease rate with no exercise-induced angina
    - H_A : disease rate with exercise-induced angina < disease rate with no exercise-induced angina

In [24]:
test_two_samples(col='exang', categroy_of_interest=0)

statistic is : 8.423, p-value: 0.000


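- People with a downsloping slope have a higher heart disease rate than people with the other slope types combined. (reject the null hypothesis)
    - H_0 : disease rate with other slope types >= disease rate with downsloping
    - H_A : disease rate with other slope types < disease rate with downsloping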

In [25]:
test_two_samples(col='slope', categroy_of_interest=2)

statistic is : 7.439, p-value: 0.000


- People with no colored vessels have a higher heart disease rate than people with one or more colored vessels. (reject the null hypothesis)
    - H_0 : disease rate with colored vessels >= disease rate with no colored vessels
    - H_A : disease rate with colored vessels < disease rate with no colored vessels

In [26]:
test_two_samples(col='ca', categroy_of_interest=0)

statistic is : 9.127, p-value: 0.000


- People with Fixed Defect have a higher heart disease rate than people with the other thalassemia types combined. (reject the null hypothesis)
    - H_0 : disease rate with other thalassemia types >= disease rate with Fixed Defect
    - H_A : disease rate with other thalassemia types < disease rate with Fixed Defect

In [27]:
test_two_samples(col='thal', categroy_of_interest=2)

statistic is : 10.768, p-value: 0.000
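
The tests above each single out one category. As a possible complement for the multi-category features, a chi-square test of independence looks at all categories of a column at once. This is a sketch under assumptions, not part of the original analysis: the `toy` frame below is made up, and the test is two-sided by nature, so it says whether the rates differ, not which category is higher.

```python
import pandas as pd
import scipy.stats as stats

# Hypothetical toy data, NOT taken from heart.csv:
# three categories of 20 rows each with 5, 16 and 6 positives.
toy = pd.DataFrame({
    "thal":   [1] * 20 + [2] * 20 + [3] * 20,
    "target": [1] * 5 + [0] * 15 + [1] * 16 + [0] * 4 + [1] * 6 + [0] * 14,
})

# Contingency table: one row per category, one column per target value.
table = pd.crosstab(toy["thal"], toy["target"])

# H_0: the target distribution is the same in every category.
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(table)
print(f"chi2: {chi2:.3f}, dof: {dof}, p-value: {p_value:.4f}")
```

A small p-value here only indicates that at least one category differs, so the one-vs-rest tests and the per-category rates are still needed to say which one.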
