# What is hypothesis testing?

Hypothesis testing in the context of machine learning often involves evaluating whether the observed performance of a model is statistically significant, or whether differences in performance between models are due to random chance.

# Steps to perform hypothesis testing

1. Define the hypothesis:

      * Null Hypothesis (H0): There is no significant difference between the performances of the two models.

      * Alternative Hypothesis (H1): There is a significant difference between the performances of the two models.

2. Select the test:
  
    * For comparing the means of model performance metrics (e.g., accuracy), common tests include the t-test (for normally distributed data) and the Wilcoxon signed-rank test (for non-normal distributions).

3. Calculate the Test Statistic:

     * Use cross-validation to obtain performance scores (e.g., accuracy) for the models.

     * Apply the selected statistical test to compare the distributions of these scores.

4. Interpret the Results:

      * Compare the p-value to the significance level (e.g., 0.05) to decide whether to reject or fail to reject the null hypothesis.
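The choice in step 2 depends on whether the scores look normally distributed. One way to check this is a Shapiro-Wilk test on each set of fold scores. The sketch below is only an illustration: the helper name `suggest_test` and the example accuracies are hypothetical, and with only ten folds a normality test has little power, so treat its output as a hint rather than a rule.

```python
import numpy as np
from scipy.stats import shapiro


def suggest_test(scores_a, scores_b, alpha=0.05):
    """Suggest a test based on Shapiro-Wilk normality checks of both score sets.

    Hypothetical helper, not part of scipy or scikit-learn.
    """
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)

    # Shapiro-Wilk null hypothesis: the sample comes from a normal distribution
    _, p_normal_a = shapiro(scores_a)
    _, p_normal_b = shapiro(scores_b)

    # Suggest the t-test only if normality is not rejected for either sample
    if p_normal_a >= alpha and p_normal_b >= alpha:
        return "t-test"
    return "Wilcoxon signed-rank test"


# Made-up fold accuracies, for illustration only
example_a = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79]
example_b = [0.74, 0.76, 0.75, 0.77, 0.73, 0.75, 0.78, 0.72, 0.76, 0.74]

suggested = suggest_test(example_a, example_b)
print(f"Suggested test: {suggested}")
```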

Step 1: Import libraries and load data

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from scipy.stats import ttest_ind, wilcoxon

# Load Titanic dataset
df = sns.load_dataset('titanic')

# Data preprocessing (simplified)
df.drop(columns=['deck'], inplace=True)
# Assign the result back: in-place fillna on a selected column is deprecated in recent pandas
df['age'] = df['age'].fillna(df['age'].median())
df.dropna(subset=['embarked'], inplace=True)
df['sex'] = df['sex'].astype('category')
df['embarked'] = df['embarked'].astype('category')

# Select features and target
X = df[['age', 'fare', 'sibsp', 'parch']]
y = df['survived']

Step 2: Train two different models and perform cross-validation

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

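# Define models (logistic regression gets scaled inputs; the tree does not need scaling)
logistic_model = make_pipeline(StandardScaler(), LogisticRegression())
# random_state is fixed so the tree's scores are reproducible between runs
decision_tree_model = DecisionTreeClassifier(random_state=42)

# Perform 10-fold cross-validation, scoring each fold by accuracy
logistic_scores = cross_val_score(logistic_model, X, y, cv=10, scoring='accuracy')
decision_tree_scores = cross_val_score(decision_tree_model, X, y, cv=10, scoring='accuracy')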

# Display cross-validation results
print(f'Logistic Regression Mean Accuracy: {np.mean(logistic_scores):.4f}')
print(f'Decision Tree Mean Accuracy: {np.mean(decision_tree_scores):.4f}')
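With `cv=10`, scikit-learn uses unshuffled stratified folds for classifiers, so both models are scored on the same splits. If you want shuffled folds while keeping the splits identical for both models, you can pass one explicit splitter to both calls. The sketch below uses synthetic data from `make_classification` as a stand-in for the Titanic features; the variable names and the `max_iter` and `random_state` values are assumptions made for the illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: 4 numeric features, binary target)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# One splitter with a fixed seed yields the same folds every time it is used
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

scores_lr = cross_val_score(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=cv, scoring='accuracy'
)
scores_dt = cross_val_score(
    DecisionTreeClassifier(random_state=42), X_demo, y_demo, cv=cv, scoring='accuracy'
)

# Fold i of one model corresponds to fold i of the other, so differences are meaningful
fold_differences = scores_lr - scores_dt
print(fold_differences)
```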

In [None]:
sns.kdeplot(logistic_scores)

In [None]:
sns.kdeplot(decision_tree_scores)

Step 3: Perform hypothesis testing

In [None]:
# Perform independent (two-sample) t-test on the per-fold accuracies
t_stat, p_value_ttest = ttest_ind(logistic_scores, decision_tree_scores)
print(f"T-test p-value: {p_value_ttest:.4f}")
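Both models were scored on the same ten folds, so each fold gives a matched pair of scores. A paired test may therefore be a better fit than the independent t-test. The sketch below runs scipy's paired t-test (`ttest_rel`) and the Wilcoxon signed-rank test (`wilcoxon`) on made-up fold accuracies. The helper name `paired_tests` and the example numbers are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon


def paired_tests(scores_a, scores_b):
    """Return p-values of two paired tests on matched per-fold scores.

    Hypothetical helper, not part of scipy.
    """
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)

    # Paired t-test: assumes the per-fold differences are roughly normal
    _, p_paired_t = ttest_rel(scores_a, scores_b)

    # Wilcoxon signed-rank test: no normality assumption
    # (raises ValueError if every difference is exactly zero)
    _, p_wilcoxon = wilcoxon(scores_a, scores_b)

    return {"paired_t": p_paired_t, "wilcoxon": p_wilcoxon}


# Made-up fold accuracies, for illustration only
example_a = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79]
example_b = [0.74, 0.76, 0.75, 0.77, 0.73, 0.75, 0.78, 0.72, 0.76, 0.74]

p_values = paired_tests(example_a, example_b)
print(f"Paired t-test p-value: {p_values['paired_t']:.4f}")
print(f"Wilcoxon p-value: {p_values['wilcoxon']:.4f}")
```

In this notebook the same helper could be called as `paired_tests(logistic_scores, decision_tree_scores)`.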

Step 4: Interpret the results


In [None]:
alpha = 0.05

# Interpretation for t-test
if p_value_ttest < alpha:
    print("Reject the null hypothesis (t-test): There is a significant difference")
else:
    print("Fail to reject the null hypothesis (t-test): No significant difference")