# CHI-SQUARE TEST

### 1.State the Hypotheses


When conducting a Chi-Square test, we typically define two hypotheses: the null hypothesis (𝐻0) and the alternative hypothesis (𝐻1).

**Null Hypothesis (𝐻0)**: This hypothesis states that there is no association between the categorical variables. In other words, the variables are **independent**.

**Alternative Hypothesis (𝐻1)**: This hypothesis states that there is an association between the categorical variables. In other words, the variables are **not independent**.

### 2.Compute the Chi-Square Statistic
We will use the contingency table and the Chi-Square formula for independence:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where:

𝑂 is the observed frequency.

𝐸 is the expected frequency under the null hypothesis, which can be calculated as:

$$E = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Overall Total}}$$
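
As a rough illustration of the two formulas above, the sketch below computes the expected frequencies and the Chi-Square statistic by hand, using only plain Python. It borrows the 5×2 satisfaction table that is analysed later in this notebook; the variable names (`observed`, `row_totals`, `chi2_stat`, and so on) are chosen here for illustration only.

```python
# Observed counts: rows are satisfaction levels, columns are device types
# (the same 5x2 table analysed later in this notebook).
observed = [[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total * column total) / overall total, for every cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# chi^2 = sum over all cells of (O - E)^2 / E
chi2_stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)
print(round(chi2_stat, 4))
```

The result should agree with what `chi2_contingency` reports for the same table, which is a useful sanity check on the formula.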

### 3.Determine the Critical Value
The degrees of freedom (df) are calculated as:

$$df = (\text{number of rows} - 1) \times (\text{number of columns} - 1)$$

With a significance level (𝛼) of 0.05, we can use the Chi-Square distribution table to find the critical value.
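
Instead of reading the table, the critical value can also be looked up programmatically. The following is a minimal sketch using `scipy.stats.chi2.ppf`; the table shape of 5 rows by 2 columns is an assumption taken from the example analysed later in this notebook.

```python
from scipy.stats import chi2

alpha = 0.05
n_rows, n_cols = 5, 2              # assumed table shape (5 satisfaction levels x 2 devices)
dof = (n_rows - 1) * (n_cols - 1)  # degrees of freedom

# Critical value: the point that leaves probability alpha in the upper tail
critical_value = chi2.ppf(1 - alpha, dof)
print(dof, round(critical_value, 3))
```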

### 4.Make a Decision
To make a decision, compare the Chi-Square statistic to the critical value:

If the Chi-Square statistic is greater than the critical value, **reject the null hypothesis.**

If the Chi-Square statistic is less than or equal to the critical value, **fail to reject** the null hypothesis.

In [10]:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

data = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])
df = pd.DataFrame(data,
                  index=["Very Satisfied", "Satisfied", "Neutral", "Unsatisfied", "Very Unsatisfied"],
                  columns=["Smart Thermostat", "Smart Light"])

# Step 2: Perform the Chi-Square test
chi2, p, dof, expected = chi2_contingency(df)



In [11]:
df.head()

                  Smart Thermostat  Smart Light
Very Satisfied                  50           70
Satisfied                       80          100
Neutral                         60           90
Unsatisfied                     30           50
Very Unsatisfied                20           50


In [12]:
# Step 3: Output results
print("Chi-Square Statistic: ", chi2)
print("P-Value: ", p)

Chi-Square Statistic:  5.638227513227513
P-Value:  0.22784371130697179


In [13]:
print("Degrees of Freedom: ", dof)
# degrees of freedom = (rows - 1) * (columns - 1) = (5 - 1) * (2 - 1) = 4

Degrees of Freedom:  4


In [14]:
expected    # expected frequency

array([[ 48.,  72.],
       [ 72., 108.],
       [ 60.,  90.],
       [ 32.,  48.],
       [ 28.,  42.]])

In [15]:
print("            Expected Frequencies Table:")
print(pd.DataFrame(expected, index=df.index, columns=df.columns))

            Expected Frequencies Table:
                  Smart Thermostat  Smart Light
Very Satisfied                48.0         72.0
Satisfied                     72.0        108.0
Neutral                       60.0         90.0
Unsatisfied                   32.0         48.0
Very Unsatisfied              28.0         42.0


In [16]:
# Step 4: Decision
alpha = 0.05
if p < alpha:
    print("Reject the null hypothesis: There is a significant association between device type and customer satisfaction.")
else:
    print("Fail to reject the null hypothesis (H0): There is no significant association between device type and customer satisfaction.")


Fail to reject the null hypothesis (H0): There is no significant association between device type and customer satisfaction.


In [17]:
# p-value < alpha                                --> reject H0
# p-value >= alpha                               --> fail to reject H0
# chi-square calculated <= chi-square critical   --> fail to reject H0

# alpha = 0.05
# p-value = 0.2278
# p > alpha, so we fail to reject H0:
# no significant evidence that device type and customer satisfaction are associated

We compare the computed Chi-Square statistic to the critical value. If the Chi-Square statistic is greater than the critical value, we reject the null hypothesis.

### Conclusion
### Hypothesis:

Null Hypothesis (H0): There is no association between the type of smart home device and customer satisfaction (the variables are independent).

Alternative Hypothesis (H1): There is an association between the type of smart home device and customer satisfaction (the variables are dependent).


Chi-Square Statistic:

The Chi-Square statistic, calculated from the observed and expected frequencies, is approximately 5.638.


Critical Value:

With 4 degrees of freedom and a significance level of 0.05, the critical value is approximately 9.488.


Decision:

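Since the Chi-Square statistic (5.638) is less than the critical value (9.488), we fail to reject the null hypothesis. The data do not provide significant evidence of an association between the type of smart home device and customer satisfaction.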

# Hypothesis test


### 1.State the Hypotheses
We are testing whether the actual weekly operating costs are higher than the theoretical costs, which calls for a one-tailed (right-tailed) test. The hypotheses are:

### H0: μ ≤ μ0

Null Hypothesis (H0): The actual mean weekly cost is less than or equal to the theoretical mean weekly cost.

### H1: μ > μ0

Alternative Hypothesis (H1): The actual mean weekly cost is greater than the theoretical mean weekly cost.

Where:
μ is the actual mean weekly cost.

μ0 is the theoretical mean weekly cost based on the given model.


### 2.Calculate the test statistic :

#### Given
x̄ = sample mean weekly cost (Rs. 3,050),
μ0 = theoretical mean weekly cost according to the cost model (W = $1,000 + $5X, with X = 600 units, so μ0 = 1,000 + 5 × 600 = 4,000),
σ = standard deviation of the weekly cost = 5 × 25 = 125,
n = sample size (25 restaurants)

In [18]:

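x_bar = 3050             # sample mean weekly cost
mu_0 = 1000 + 5 * 600    # theoretical mean weekly cost = 4000
sigma = 5 * 25           # standard deviation of weekly cost = 125
n = 25                   # sample size
# sigma is treated as known, so this is a z-statistic (named t in this notebook)
t = (x_bar - mu_0) / (sigma / (n ** 0.5))
t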


-38.0
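
Written out step by step, the calculation behind this value looks as follows. Because σ is treated as known and the critical value is taken from the Z table, the statistic is written as z here, although the notebook stores it in the variable `t`.

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{3050 - 4000}{125 / \sqrt{25}} = \frac{-950}{25} = -38$$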

### 3.Determine the Critical Value
Using a **5% significance level** (𝛼=0.05) and a **one-tailed** test, the critical value from the standard normal (Z) distribution table is approximately 1.645.

### 4.Make a Decision
If the test statistic t is greater than 1.645, we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.

Compare the test statistic with the critical value:

Since 𝑡=−38 is less than 1.645, we fail to reject the null hypothesis.
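
The same decision can be reproduced in code. The sketch below uses `statistics.NormalDist` from the Python standard library to obtain the one-tailed critical value and the p-value; the names `z_stat`, `z_critical`, and `reject_h0` are illustrative and do not appear elsewhere in this notebook.

```python
from statistics import NormalDist

alpha = 0.05
z_stat = -38.0                               # test statistic computed above

std_normal = NormalDist()                    # mean 0, standard deviation 1
z_critical = std_normal.inv_cdf(1 - alpha)   # upper-tail critical value
p_value = 1 - std_normal.cdf(z_stat)         # P(Z >= z_stat) for H1: mu > mu0

reject_h0 = z_stat > z_critical
print(round(z_critical, 3), reject_h0)
```

The p-value is essentially 1 here, which agrees with the critical-value comparison: a statistic far below zero gives no support to a right-tailed alternative.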

### 5.Conclusion
Based on this decision, there is insufficient evidence to support the restaurant owners' claim that the weekly operating costs are higher than the model suggests. In fact, the sample mean (Rs. 3,050) is below the theoretical mean of 4,000.