# **Chi-Square Test**

Association between Device Type and Customer Satisfaction

Background:
Mizzare Corporation has collected data on customer satisfaction levels for two types of smart home devices: Smart Thermostats and Smart Lights. They want to determine if there's a significant association between the type of device purchased and the customer's satisfaction level.

Objective:
To use the Chi-Square test for independence to determine if there's a significant association between the type of smart home device purchased (Smart Thermostats vs. Smart Lights) and the customer satisfaction level.

Assignment Tasks:

1) State the Hypotheses

2) Compute the Chi-Square Statistic

3) Determine the Critical Value: Use a significance level (alpha) of 0.05 and the degrees of freedom, which for a test of independence is (number of rows - 1) × (number of columns - 1)

4) Make a Decision: Compare the Chi-Square statistic with the critical value to decide whether to reject the null hypothesis.

In [None]:
import pandas as pd
df=pd.read_csv("/content/SmartDevices.csv")
df

| Unnamed: 0 | Satisfaction | Smart Thermostat | Smart Light | Total |
|---|---|---|---|---|
| 0 | Very Satisfied | 50 | 70 | 120 |
| 1 | Satisfied | 80 | 100 | 180 |
| 2 | Neutral | 60 | 90 | 150 |
| 3 | Unsatisfied | 30 | 50 | 80 |
| 4 | Very Unsatisfied | 20 | 50 | 70 |
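
The observed counts for the test can be taken from the loaded table rather than typed in by hand. The following is a minimal sketch, assuming the column names shown in the output above. The table is rebuilt inline so the snippet runs without the CSV file, and the names `table` and `observed` are illustrative only.

```python
import pandas as pd

# Rebuild the table shown above inline (a stand-in for reading the CSV file)
table = pd.DataFrame({
    "Satisfaction": ["Very Satisfied", "Satisfied", "Neutral",
                     "Unsatisfied", "Very Unsatisfied"],
    "Smart Thermostat": [50, 80, 60, 30, 20],
    "Smart Light": [70, 100, 90, 50, 50],
    "Total": [120, 180, 150, 80, 70],
})

# Keep only the two device columns: the Total column is a margin, not an observed cell
observed = table[["Smart Thermostat", "Smart Light"]].to_numpy()

# Sanity check: each row of counts should add up to the reported Total
assert (observed.sum(axis=1) == table["Total"].to_numpy()).all()

print("Column totals:", observed.sum(axis=0))
print("Grand total:", observed.sum())
```

Leaving the `Total` column out matters: including a margin as if it were a category would inflate the counts and distort the test.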


**1) State the Hypotheses**

When conducting hypothesis testing, you generally state a null hypothesis ($H_0$) and an alternative hypothesis ($H_a$). These hypotheses are used to make inferences about a population based on sample data.

Here is how you can state the hypotheses for different scenarios:

Example 1: Testing a Population Mean
Suppose we want to test if the average height of students in a school is 170 cm.

Null Hypothesis ($H_0$): The average height of students is 170 cm.

$H_0: \mu = 170$ cm

Alternative Hypothesis ($H_a$): The average height of students is not 170 cm.

$H_a: \mu \neq 170$ cm

Example 2: Testing a Population Proportion
Suppose we want to test if the proportion of defective products in a batch is 5%.

Null Hypothesis ($H_0$): The proportion of defective products is 5%.

$H_0: p = 0.05$

Alternative Hypothesis ($H_a$): The proportion of defective products is not 5%.

$H_a: p \neq 0.05$

Example 3: Testing for a Difference Between Two Means
Suppose we want to test if there is a difference in the average test scores between two groups of students.

Null Hypothesis ($H_0$): The average test scores of the two groups are equal.

$H_0: \mu_1 = \mu_2$

Alternative Hypothesis ($H_a$): The average test scores of the two groups are not equal.

$H_a: \mu_1 \neq \mu_2$

Example 4: Testing for a Difference Between Two Proportions
Suppose we want to test if there is a difference in the proportion of males and females who prefer online shopping.

Null Hypothesis ($H_0$): The proportions of males and females who prefer online shopping are equal.

$H_0: p_1 = p_2$

Alternative Hypothesis ($H_a$): The proportions of males and females who prefer online shopping are not equal.

$H_a: p_1 \neq p_2$

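General Steps for Stating Hypotheses

1) Identify the Parameter: Determine whether you are testing a mean, proportion, difference of means, or difference of proportions.

2) State the Null Hypothesis ($H_0$): This hypothesis represents the status quo or no effect. It is what you assume to be true before collecting data.

3) State the Alternative Hypothesis ($H_a$): This hypothesis represents the effect or difference you are testing for. It is what you are trying to find evidence for through your data.

The null hypothesis ($H_0$) is typically a statement of no effect or no difference, while the alternative hypothesis ($H_a$) represents a statement of an effect or difference.

For the smart device data in this assignment, the Chi-Square test for independence uses the following hypotheses:

Null Hypothesis ($H_0$): The type of device purchased (Smart Thermostat vs. Smart Light) and the customer satisfaction level are independent, i.e. there is no association between them.

Alternative Hypothesis ($H_a$): The type of device purchased and the customer satisfaction level are associated.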

**2) Compute the Chi-Square Statistic**

In [None]:
import numpy as np
from scipy.stats import chi2

# Observed frequencies
observed = np.array([
    [50, 70],
    [80, 100],
    [60, 90],
    [30, 50],
    [20, 50]
])

# Row totals and column totals
row_totals = observed.sum(axis=1)
column_totals = observed.sum(axis=0)
grand_total = observed.sum()

# Expected frequencies
expected = np.outer(row_totals, column_totals) / grand_total

# Chi-Square statistic
chi_square_stat = ((observed - expected) ** 2 / expected).sum()

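# Degrees of freedom: (rows - 1) * (columns - 1)
# Named dof so it does not overwrite the DataFrame `df` loaded earlier
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value: upper-tail probability of the chi-square distribution
p_value = 1 - chi2.cdf(chi_square_stat, dof)

print(f"Chi-Square Statistic: {chi_square_stat}")
print(f"P-Value: {p_value}")
print(f"Degrees of Freedom: {dof}")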


Chi-Square Statistic: 5.638227513227513
P-Value: 0.22784371130697179
Degrees of Freedom: 4
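
As a cross-check on the manual calculation, SciPy's `chi2_contingency` runs the same test of independence in one call. This is a sketch of an alternative route, not the method the assignment requires. It also prints the smallest expected count, since a common rule of thumb (an assumption added here, not stated in the assignment) is that every expected count should be at least 5.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed frequencies (rows: satisfaction levels, columns: thermostat, light)
observed = np.array([
    [50, 70],
    [80, 100],
    [60, 90],
    [30, 50],
    [20, 50],
])

# Returns the statistic, p-value, degrees of freedom, and expected frequencies.
# The default Yates correction only applies when dof == 1, so it has no effect here.
stat, p_value, dof, expected = chi2_contingency(observed)

print(f"Chi-Square Statistic: {stat}")
print(f"P-Value: {p_value}")
print(f"Degrees of Freedom: {dof}")
print(f"Smallest expected count: {expected.min()}")
```

The statistic and degrees of freedom should agree with the manual computation above. If they do not, check the observed table first.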


**3) Determine the Critical Value:**

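Use a significance level (alpha) of 0.05 and the degrees of freedom, which for a test of independence is (number of rows - 1) × (number of columns - 1). With 5 satisfaction levels and 2 device types, that gives (5 - 1) × (2 - 1) = 4.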

In [None]:
from scipy.stats import chi2

# Significance level
alpha = 0.05

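# Degrees of freedom for the 5 x 2 table: (5 - 1) * (2 - 1) = 4
dof = 4

# Critical value: the (1 - alpha) quantile of the chi-square distribution
critical_value = chi2.ppf(1 - alpha, dof)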

print(f"Critical Value: {critical_value}")

Critical Value: 9.487729036781154
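
The degrees of freedom can also be derived from the table dimensions instead of being entered as a constant. The helper below is a hypothetical illustration (`critical_value_for` is not part of SciPy). It uses `chi2.isf`, the inverse survival function, which gives the same upper-tail cutoff as `chi2.ppf(1 - alpha, dof)`.

```python
from scipy.stats import chi2

def critical_value_for(n_rows, n_cols, alpha=0.05):
    """Upper-tail chi-square critical value for an n_rows x n_cols contingency table."""
    dof = (n_rows - 1) * (n_cols - 1)
    # isf(alpha, dof) is the value with probability alpha in the upper tail
    return chi2.isf(alpha, dof), dof

# 5 satisfaction levels by 2 device types
critical_value, dof = critical_value_for(5, 2)

print(f"Degrees of Freedom: {dof}")
print(f"Critical Value: {critical_value:.4f}")
```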


**4) Make a Decision:**

Compare the Chi-Square statistic with the critical value to decide whether to reject the null hypothesis.

In [None]:
# Decision / Conclusion

if chi_square_stat > critical_value:
    decision = "Reject the null hypothesis"
else:
    decision = "Fail to reject the null hypothesis"

print(f"Chi-Square Statistic: {chi_square_stat}")
print(f"Critical Value: {critical_value}")
print(f"Decision/Conclusion: {decision}")


Chi-Square Statistic: 5.638227513227513
Critical Value: 9.487729036781154
Decision/Conclusion: Fail to reject the null hypothesis
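
The same decision can be reached by comparing the p-value with alpha, since the statistic exceeds the critical value exactly when the p-value falls below alpha. The sketch below reuses the statistic printed above as a literal so that it runs on its own; the variable names are illustrative.

```python
from scipy.stats import chi2

chi_square_stat = 5.638227513227513  # statistic computed in step 2
dof = 4                              # (5 - 1) * (2 - 1)
alpha = 0.05

# Upper-tail probability of the statistic and the matching critical value
p_value = chi2.sf(chi_square_stat, dof)
critical_value = chi2.isf(alpha, dof)

# Two equivalent decision rules
reject_by_p_value = p_value < alpha
reject_by_critical_value = chi_square_stat > critical_value

print(f"P-Value: {p_value:.4f}")
print(f"Reject H0 (p-value rule): {reject_by_p_value}")
print(f"Reject H0 (critical value rule): {reject_by_critical_value}")
```

Under both rules the null hypothesis is not rejected: at the 5% significance level, the data do not provide sufficient evidence of an association between the type of device purchased and the customer satisfaction level.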
