### Hypothesis Testing
  Hypothesis testing is a statistical method for making **statistical decisions** from experimental data.
  * A hypothesis is an assumption that we make about a population parameter.
  * H0 --> Null hypothesis (what we generally believe; the default assumption)
  * H1 --> Alternative hypothesis (what we want to prove)

In [1]:
import pandas as pd

# Sample data for a survey on smart home devices
# This data includes satisfaction levels for two types of devices: Smart Thermostat and Smart Light
data = {
    'Satisfaction': ['Very Satisfied', 'Satisfied', 'Neutral', 'Unsatisfied', 'Very Unsatisfied', 'Total'],
    'Smart Thermostat': [50, 80, 60, 30, 20, 240],
    'Smart Light': [70, 100, 90, 50, 50, 360],
    'Total': [120, 180, 150, 80, 70, 600]
}
# Create a DataFrame from the sample data
df = pd.DataFrame(data)
display(df)

| | Satisfaction | Smart Thermostat | Smart Light | Total |
|---|---|---|---|---|
| 0 | Very Satisfied | 50 | 70 | 120 |
| 1 | Satisfied | 80 | 100 | 180 |
| 2 | Neutral | 60 | 90 | 150 |
| 3 | Unsatisfied | 30 | 50 | 80 |
| 4 | Very Unsatisfied | 20 | 50 | 70 |
| 5 | Total | 240 | 360 | 600 |


In [2]:
from scipy.stats import chi2_contingency

# Prepare the observed counts for the chi-square test.
# Keep only the two device columns: drop the last row (Total),
# the first column (Satisfaction labels), and the last column (Total).
observed_df = df.iloc[:-1, 1:3]

# The chi-square test of independence checks whether two categorical variables
# (here: satisfaction level and device type) are significantly associated.
# It compares the observed counts with the counts expected if there were no association.
# chi2_contingency returns the chi-square statistic, p-value, degrees of freedom,
# and the expected frequencies.
chi2_stat, p, dof, expected = chi2_contingency(observed_df)

print(f"Chi-square statistic: {chi2_stat}")
print(f"P-value: {p}")
print(f"Degrees of freedom: {dof}")
print(f"Expected frequencies: {expected}")


Chi-square statistic: 5.638227513227513
P-value: 0.22784371130697179
Degrees of freedom: 4
Expected frequencies: [[ 48.  72.]
 [ 72. 108.]
 [ 60.  90.]
 [ 32.  48.]
 [ 28.  42.]]
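The numbers above can be reproduced by hand, which makes it clearer what the test measures. The following is a minimal sketch, assuming the same observed counts as the survey table; the names `observed`, `expected_manual`, `chi2_manual`, and `dof_manual` are illustrative and not part of the original notebook. Each expected count is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells.

```python
import numpy as np

# Observed counts copied from the survey table (totals excluded).
# Rows: satisfaction levels; columns: Smart Thermostat, Smart Light.
observed = np.array([
    [50, 70],    # Very Satisfied
    [80, 100],   # Satisfied
    [60, 90],    # Neutral
    [30, 50],    # Unsatisfied
    [20, 50],    # Very Unsatisfied
])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (5, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
grand_total = observed.sum()

# Expected count for each cell if satisfaction and device type were independent
expected_manual = row_totals * col_totals / grand_total

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(f"Expected frequencies:\n{expected_manual}")
print(f"Chi-square statistic: {chi2_manual}")
print(f"Degrees of freedom: {dof_manual}")
```

These values should agree with the `chi2_contingency` output, since no continuity correction is applied when there is more than one degree of freedom.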


To make a decision about the null hypothesis, we compare the calculated Chi-Square statistic to the critical value, which depends on the chosen significance level (alpha) and the degrees of freedom:

*   If the Chi-Square statistic is greater than the critical value, we reject the null hypothesis.
*   If the Chi-Square statistic is less than or equal to the critical value, we fail to reject the null hypothesis.

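This leads to the same conclusion as comparing the p-value to the significance level: we reject the null hypothesis when the p-value is less than alpha, which happens exactly when the statistic exceeds the critical value.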
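The equivalence of the two decision rules can be checked directly. This is a small sketch, assuming the statistic and degrees of freedom printed earlier; the variable names (`stat_demo`, `dof_demo`, `alpha_demo`, and so on) are hypothetical and chosen so they do not overwrite the notebook's own variables. It uses `chi2.sf` (the survival function, 1 − CDF) to recover the p-value from the statistic and `chi2.ppf` to get the critical value.

```python
from scipy.stats import chi2

# Values copied from the chi2_contingency output above
stat_demo = 5.638227513227513
dof_demo = 4
alpha_demo = 0.05  # assumed significance level

# p-value: probability of a statistic at least this large under H0
p_from_stat = chi2.sf(stat_demo, dof_demo)

# Critical value: the (1 - alpha) quantile of the chi-square distribution
critical_demo = chi2.ppf(1 - alpha_demo, dof_demo)

# The two decision rules
reject_by_p = p_from_stat < alpha_demo
reject_by_critical = stat_demo > critical_demo

print(f"p-value from statistic: {p_from_stat}")
print(f"Critical value: {critical_demo}")
print(f"Reject H0 (p-value rule): {reject_by_p}")
print(f"Reject H0 (critical-value rule): {reject_by_critical}")
```

Both rules are two views of the same tail probability, so they always agree for a given alpha.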

In [3]:
from scipy.stats import chi2

# Calculate the critical value for the chi-square test.
# It is the (1 - alpha) quantile of the chi-square distribution with `dof` degrees of freedom,
# where alpha is the significance level (0.05 here) and dof comes from chi2_contingency above.
alpha = 0.05
critical_value = chi2.ppf(1 - alpha, dof)

print(f"The critical value for a significance level of {alpha} and {dof} degrees of freedom is: {critical_value}")

# Decision based on the chi-square statistic and critical value
# If the chi-square statistic is greater than the critical value, we reject the null hypothesis
# The null hypothesis (H0) states that there is no association between the two categorical variables
if chi2_stat > critical_value:
    print("Decision: Reject H0 → There is a significant association.")
else:
    print("Decision: Fail to reject H0 → No significant association.")

The critical value for a significance level of 0.05 and 4 degrees of freedom is: 9.487729036781154
Decision: Fail to reject H0 → No significant association.


The test gives a chi-square statistic of 5.638 with 4 degrees of freedom and a p-value of 0.2278. Since the p-value (0.2278) is greater than the significance level (alpha = 0.05), we fail to reject the null hypothesis.

**Conclusion:** This data provides no statistically significant evidence of an association between customer satisfaction level and device type (smart thermostat versus smart light).