# CHI-SQUARE TEST

Assignment Tasks:
1. State the Hypotheses:
2. Compute the Chi-Square Statistic:
3. Determine the Critical Value:
   Use the significance level (alpha) of 0.05 and the degrees of freedom, which for a contingency table is (number of rows - 1) * (number of columns - 1).
4. Make a Decision:
   Compare the Chi-Square statistic with the critical value to decide whether to reject the null hypothesis.


### 1. State the Hypotheses:

H0 (Null Hypothesis): There is no significant association between the type of device and customer satisfaction.


H1 (Alternative Hypothesis): There is a significant association between the type of device and customer satisfaction.

In [6]:
import numpy as np
import scipy.stats as stats

# Data provided
observed = np.array([
    [50, 70],
    [80, 100],
    [60, 90],
    [30, 50],
    [20, 50]
])

# Step 1: State the Hypotheses
# H0 (Null Hypothesis): There is no significant association between the type of device and customer satisfaction.
# H1 (Alternative Hypothesis): There is a significant association between the type of device and customer satisfaction.

# Step 2: Compute the Chi-Square Statistic
chi2_stat, p_val, dof, expected = stats.chi2_contingency(observed)

# Step 3: Determine the Critical Value
alpha = 0.05
critical_value = stats.chi2.ppf(1 - alpha, dof)

# Step 4: Make a Decision
if chi2_stat > critical_value:
    decision = "Reject the null hypothesis"
else:
    decision = "Fail to reject the null hypothesis"

# Print the results
print(f"Chi-Square Statistic: {chi2_stat:.3f}")
print(f"P-Value: {p_val:.3f}")
print(f"Degrees of Freedom: {dof}")
print(f"Critical Value: {critical_value:.3f}")
print(f"Decision: {decision}")


Chi-Square Statistic: 5.638
P-Value: 0.228
Degrees of Freedom: 4
Critical Value: 9.488
Decision: Fail to reject the null hypothesis


In [5]:
print("Decision = ", decision)

Decision =  Fail to reject the null hypothesis


# Explanation of each part:

### 1. Hypotheses
Null Hypothesis (H₀)
- Definition: There is no significant association between the type of device purchased (Smart Thermostats vs. Smart Lights) and customer satisfaction levels.
- Implication: Any observed differences in satisfaction levels between the two types of devices are due to random chance rather than a real underlying relationship.

Alternative Hypothesis (H₁)
- Definition: There is a significant association between the type of device purchased (Smart Thermostats vs. Smart Lights) and customer satisfaction levels.
- Implication: The satisfaction levels differ between the two types of devices in a way that is not due to random chance.


# ---------------------------------------------------------------------------------------------------------

### 2. Chi-Square Statistic Calculation:
- Chi-Square Test: The `stats.chi2_contingency()` function from the `scipy.stats` library performs the Chi-Square test for independence on the observed data.

It computes:

1. Chi-Square Statistic (χ²): Measures how much the observed counts deviate from the counts expected if there were no association between the variables.

2. P-Value: The probability of observing a Chi-Square statistic as extreme as, or more extreme than, the one calculated, assuming the null hypothesis is true.

3. Degrees of Freedom (dof): The number of values that are free to vary in the analysis. For a contingency table, it is calculated as (number of rows - 1) * (number of columns - 1); for the 5 x 2 table used here, (5 - 1) * (2 - 1) = 4.

4. Expected Frequencies: The counts you would expect in each cell of the table if the null hypothesis were true.
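
As a rough illustration of what `chi2_contingency` computes, the sketch below rebuilds the expected frequencies from the row and column totals and sums (O − E)² / E by hand. The names `row_totals`, `col_totals`, `expected_manual`, `chi2_manual`, and `dof_manual` are introduced here for illustration only. No continuity correction is applied, which should agree with SciPy for this table because the Yates correction is only used when there is a single degree of freedom.

```python
import numpy as np
import scipy.stats as stats

# Same observed counts as in the analysis above
observed = np.array([
    [50, 70],
    [80, 100],
    [60, 90],
    [30, 50],
    [20, 50]
])

# Marginal totals (kept 2-D so they broadcast into a full table)
row_totals = observed.sum(axis=1, keepdims=True)   # shape (5, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)
grand_total = observed.sum()

# Expected count in each cell under independence: row total * column total / grand total
expected_manual = row_totals * col_totals / grand_total

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Compare with SciPy's result
chi2_scipy, p_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(observed)

print(expected_manual)
print(f"Manual chi-square: {chi2_manual:.3f} (SciPy: {chi2_scipy:.3f})")
print(f"Manual dof: {dof_manual} (SciPy: {dof_scipy})")
```

The hand calculation and the library call should agree on the statistic, the degrees of freedom, and the expected table.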


# ---------------------------------------------------------------------------------------------------------




### 3. Determine the Critical Value:
- The critical value for the Chi-Square distribution is calculated using the degrees of freedom and the significance level (alpha). With dof = 4 and α = 0.05, it is the 95th percentile of the Chi-Square distribution, 9.488.
- Significance Level (α): Set at 0.05 (5%), which is a common choice for determining statistical significance.
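
A short sketch of how the critical value relates to α, using the degrees of freedom from this analysis (dof = 4). `chi2.ppf` returns the point below which a fraction 1 − α of the distribution lies, so the upper-tail area beyond that point (`chi2.sf`) should come back as α. The variable name `upper_tail_area` is an illustrative choice.

```python
from scipy.stats import chi2

alpha = 0.05
dof = 4  # (5 rows - 1) * (2 columns - 1)

# Critical value: the (1 - alpha) quantile of the chi-square distribution
critical_value = chi2.ppf(1 - alpha, dof)

# Sanity check: the probability of exceeding the critical value under H0
upper_tail_area = chi2.sf(critical_value, dof)

print(f"Critical value: {critical_value:.3f}")
print(f"Area above critical value: {upper_tail_area:.3f}")
```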


# ---------------------------------------------------------------------------------------------------------



### 4. Making a Decision:

The decision is made by comparing the calculated Chi-Square statistic to the critical value.

- If the Chi-Square statistic is greater than the critical value: Reject the null hypothesis. The data provide significant evidence of an association between the type of device and customer satisfaction.

- If the Chi-Square statistic is less than or equal to the critical value: Fail to reject the null hypothesis. The data do not provide sufficient evidence of an association between the type of device and customer satisfaction; this is not proof that no association exists.
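
The same decision can be reached from the p-value: rejecting when χ² exceeds the critical value is equivalent to rejecting when p < α. A minimal sketch using the statistic and degrees of freedom reported in the output above; `chi_square_decision` is a hypothetical helper name, not part of SciPy.

```python
from scipy.stats import chi2

def chi_square_decision(chi2_stat, dof, alpha=0.05):
    """Apply both decision rules for an upper-tail chi-square test."""
    critical_value = chi2.ppf(1 - alpha, dof)
    p_value = chi2.sf(chi2_stat, dof)  # upper-tail probability under H0
    reject_by_critical = bool(chi2_stat > critical_value)
    reject_by_p_value = bool(p_value < alpha)
    return reject_by_critical, reject_by_p_value, p_value

# Values taken from the printed results of the analysis
by_critical, by_p, p_value = chi_square_decision(5.638, 4)

print(f"Reject by critical value: {by_critical}")
print(f"Reject by p-value: {by_p} (p = {p_value:.3f})")
```

Both rules should give the same answer; the p-value additionally shows how far the result is from the 0.05 threshold.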
        
        
        
# ---------------------------------------------------------------------------------------------------------

## Summary:
- Chi-Square Statistic (χ²): 5.638
- P-Value: 0.228
- Degrees of Freedom (dof): 4
- Critical Value: 9.488
- Decision: Fail to reject the null hypothesis

Because 5.638 is less than 9.488 (equivalently, the p-value of 0.228 is greater than 0.05), the evidence is not strong enough to conclude that there is a significant relationship between the type of device and customer satisfaction.
        

# ---------------------------------------------------------------------------------------------------------


# HYPOTHESIS TESTING

In [3]:
import scipy.stats as stats
import numpy as np

# Given data
sample_mean = 3050  # Mean weekly cost from the sample
theoretical_mean = 4000  # Theoretical mean weekly cost
std_deviation_per_unit = 5 * 25  # Standard deviation of weekly cost: 5 (cost per unit) * 25 (std. dev. of units) = 125
sample_size = 25  # Sample size

# Calculate the test statistic (a z statistic, because the standard deviation is treated as known)
# Standard error (SE) = standard deviation / sqrt(sample size)
standard_error = std_deviation_per_unit / np.sqrt(sample_size)
test_statistic = (sample_mean - theoretical_mean) / standard_error

# Determine the critical value for a one-tailed (upper-tail) test at alpha = 0.05
alpha = 0.05
critical_value = stats.norm.ppf(1 - alpha)  # ppf is the percent point function, the inverse of the cdf

# Print the results
print(f"Test Statistic (z): {test_statistic:.2f}")
print(f"Critical Value (z): {critical_value:.2f}")

# Decision: reject H0 only if the statistic falls in the upper tail
if test_statistic > critical_value:
    print("Reject the null hypothesis. There is strong evidence to support the claim that weekly operating costs are higher.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to support the claim that weekly operating costs are higher.")


Test Statistic (z): -38.00
Critical Value (z): 1.64
Fail to reject the null hypothesis. There is not enough evidence to support the claim that weekly operating costs are higher.


### 1. State the Hypotheses:

**Null Hypothesis (H₀)**: The average weekly operating cost is equal to the cost predicted by the model.

**Alternative Hypothesis (H₁)**: The average weekly operating cost is greater than the cost predicted by the model.



### Explanation

- **Null Hypothesis (H₀)**: This hypothesis states that the average weekly operating cost is exactly as predicted by the cost model, which is $4000.

- **Alternative Hypothesis (H₁)**: This hypothesis states that the average weekly operating cost is greater than the $4000 predicted by the model.

The objective is to test if there is enough statistical evidence to support the claim that the weekly operating costs are higher than predicted.



### Explanation of the Code

1. **Import Libraries**:
   - `scipy.stats` is used for statistical functions, and `numpy` is used for numerical operations.

2. **Given Data**:
   - `sample_mean`: The mean weekly cost observed in the sample.
   - `theoretical_mean`: The mean weekly cost predicted by the model.
   - `std_deviation_per_unit`: Standard deviation calculated from the cost model.
   - `sample_size`: Number of restaurants in the sample.

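3. **Calculate the Test Statistic**:
   - The test statistic is a z statistic, because the standard deviation is treated as known: $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$, that is, (Sample Mean − Theoretical Mean) / Standard Error.
   - Here the standard error is 125 / √25 = 25, so z = (3050 − 4000) / 25 = −38.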
 

4. **Determine the Critical Value**:
   - The critical value for a one-tailed test at α = 0.05 is obtained using the `ppf` function from `scipy.stats.norm`, which is the quantile function (the inverse of the cumulative distribution function).
   - For a one-tailed test at α = 0.05, the critical value from the standard normal (Z) distribution is approximately 1.645.

5. **Decision and Conclusion**:
   - Test Statistic: -38 (negative because the sample mean is less than the theoretical mean).
   - Critical Value: 1.645 (positive because it is for the upper tail).
   
### Comparison
Since -38 (test statistic) is less than 1.645 (critical value), it falls in the non-rejection region for the null hypothesis in this one-tailed test. Thus, we fail to reject the null hypothesis.
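
For reference, the comparison can also be expressed through the p-value. The sketch below wraps the calculation in a hypothetical helper, `upper_tail_z_test` (not a SciPy function), and assumes the standard deviation of 125 is treated as known so that the standard normal distribution applies.

```python
import numpy as np
from scipy.stats import norm

def upper_tail_z_test(sample_mean, hypothesized_mean, sigma, n, alpha=0.05):
    """One-sample, upper-tail z test (H1: true mean > hypothesized mean)."""
    standard_error = sigma / np.sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    critical_value = norm.ppf(1 - alpha)  # upper-tail cutoff
    p_value = norm.sf(z)                  # P(Z >= z) under H0
    reject = bool(z > critical_value)
    return z, critical_value, p_value, reject

# Values from the analysis: sample mean 3050, model mean 4000, sigma 125, n = 25
z, critical_value, p_value, reject = upper_tail_z_test(3050, 4000, 125, 25)

print(f"z = {z:.2f}, critical value = {critical_value:.3f}")
print(f"Upper-tail p-value = {p_value:.3f}, reject H0: {reject}")
```

With a z statistic this far below zero, the upper-tail p-value is essentially 1, which is consistent with failing to reject the null hypothesis.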

### Conclusion
Based on these calculations, we fail to reject the null hypothesis. There is not sufficient evidence to support the claim that weekly operating costs are higher than the model predicts; in fact, the sample mean of 3050 is below the predicted 4000.




# Thank you!

# ------------------------------------------------------------------------------------------------------