
# Task
Perform a Chi-Square test for independence to determine if there is a significant association between the type of smart home device purchased (Smart Thermostats vs. Smart Lights) and customer satisfaction level, using the provided contingency table data. State the hypotheses, compute the Chi-Square statistic, determine the critical value at a 0.05 significance level, and make a decision based on the results. Provide a detailed report of the analysis.

## Load data

### Subtask:
Manually input the contingency table data into a format suitable for analysis, such as a Pandas DataFrame.


**Reasoning**:
Store the observed counts in a dictionary keyed by satisfaction level, then convert it to a Pandas DataFrame whose rows are the two device types.



In [1]:
import pandas as pd

data = {'Satisfied': [100, 150],
        'Neutral': [50, 70],
        'Dissatisfied': [30, 80]}

df = pd.DataFrame(data, index=['Smart Thermostats', 'Smart Lights'])

display(df)

| | Satisfied | Neutral | Dissatisfied |
|---|---|---|---|
| Smart Thermostats | 100 | 50 | 30 |
| Smart Lights | 150 | 70 | 80 |


## Hypotheses

### Subtask:
State the null and alternative hypotheses for the Chi-Square test.


**Reasoning**:
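In a test of independence, the null hypothesis says the two categorical variables are independent in the population, and the alternative says they are associated. Print both statements so they are on record before computing anything.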



In [2]:
print("Null Hypothesis (H₀): There is no association between the type of smart home device purchased and customer satisfaction level. The two variables are independent.")
print("Alternative Hypothesis (H₁): There is an association between the type of smart home device purchased and customer satisfaction level. The two variables are dependent.")

Null Hypothesis (H₀): There is no association between the type of smart home device purchased and customer satisfaction level. The two variables are independent.
Alternative Hypothesis (H₁): There is an association between the type of smart home device purchased and customer satisfaction level. The two variables are dependent.


## Calculate chi-square

### Subtask:
Compute the Chi-Square test statistic and the p-value using a statistical library.


**Reasoning**:
Import the chi2_contingency function and apply it to the dataframe to compute the Chi-Square statistic, p-value, degrees of freedom, and expected frequencies.



In [3]:
from scipy.stats import chi2_contingency

chi2_statistic, p_value, dof, expected_frequencies = chi2_contingency(df)

print(f"Chi-Square Statistic: {chi2_statistic}")
print(f"P-value: {p_value}")
print(f"Degrees of Freedom: {dof}")
print("Expected Frequencies:")
display(expected_frequencies)

Chi-Square Statistic: 6.4646464646464645
P-value: 0.03946570419995253
Degrees of Freedom: 2
Expected Frequencies:


array([[ 93.75,  45.  ,  41.25],
       [156.25,  75.  ,  68.75]])
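
The expected frequencies and the statistic reported above can be reproduced by hand, which makes clear what `chi2_contingency` computes. The following is a minimal sketch using NumPy; the names `observed`, `expected_manual`, `chi2_manual`, and `dof_manual` are illustrative and not part of the original notebook. No continuity correction is involved here, because SciPy applies Yates' correction only when there is a single degree of freedom.

```python
import numpy as np

# Observed counts from the contingency table above.
# Rows: Smart Thermostats, Smart Lights
# Columns: Satisfied, Neutral, Dissatisfied
observed = np.array([[100, 50, 30],
                     [150, 70, 80]], dtype=float)

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 3)
grand_total = observed.sum()

# Expected count under independence:
# E_ij = (row total i) * (column total j) / N
expected_manual = row_totals * col_totals / grand_total

# Pearson chi-square statistic: sum over all cells of (O - E)^2 / E
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(expected_manual)
print(round(chi2_manual, 4), dof_manual)
```

Every expected count is well above 5, so the usual rule of thumb for relying on the chi-square approximation is met for this table.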

## Determine critical value

### Subtask:
Determine the critical Chi-Square value for the given significance level (0.05) and degrees of freedom.


**Reasoning**:
Calculate the critical Chi-Square value using the degrees of freedom and significance level.



In [4]:
from scipy.stats import chi2

alpha = 0.05
critical_value = chi2.ppf(1 - alpha, dof)
print(f"Critical Chi-Square Value (alpha={alpha}, dof={dof}): {critical_value}")

Critical Chi-Square Value (alpha=0.05, dof=2): 5.991464547107979


## Decision

### Subtask:
Compare the calculated Chi-Square statistic to the critical value and the p-value to the significance level to make a decision about rejecting the null hypothesis.


**Reasoning**:
Compare the calculated Chi-Square statistic to the critical value and the p-value to the significance level, and print the decision based on these comparisons.



In [5]:
# Reuse chi2_statistic, p_value, critical_value, and alpha from the cells above
# instead of re-typing rounded copies of them.

print("\n--- Decision ---")
if chi2_statistic > critical_value and p_value < alpha:
  print(f"Since the Chi-Square statistic ({chi2_statistic:.4f}) is greater than the critical value ({critical_value:.4f})")
  print(f"and the p-value ({p_value:.4f}) is less than the significance level ({alpha:.2f}),")
  print("we reject the null hypothesis.")
  print("Conclusion: There is a significant association between the type of smart home device purchased and customer satisfaction level.")
else:
  print(f"Since the Chi-Square statistic ({chi2_statistic:.4f}) is not greater than the critical value ({critical_value:.4f})")
  print(f"or the p-value ({p_value:.4f}) is not less than the significance level ({alpha:.2f}),")
  print("we fail to reject the null hypothesis.")
  print("Conclusion: There is no significant association between the type of smart home device purchased and customer satisfaction level.")


--- Decision ---
Since the Chi-Square statistic (6.4646) is greater than the critical value (5.9915)
and the p-value (0.0395) is less than the significance level (0.05),
we reject the null hypothesis.
Conclusion: There is a significant association between the type of smart home device purchased and customer satisfaction level.
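
The two comparisons in the decision are not independent checks: the p-value is the upper-tail probability of the statistic, and the critical value is the point whose upper-tail probability equals alpha, so both always lead to the same decision. The sketch below illustrates this with `scipy.stats.chi2`; the names `p_from_stat`, `critical`, `reject_by_statistic`, and `reject_by_p_value` are illustrative, and the statistic is copied from the output above.

```python
from scipy.stats import chi2

alpha = 0.05
dof = 2
chi2_statistic = 6.4646464646464645  # value reported by chi2_contingency above

# p-value: probability of a statistic at least this large under H0
p_from_stat = chi2.sf(chi2_statistic, dof)

# Critical value: the point with upper-tail probability alpha
critical = chi2.ppf(1 - alpha, dof)

reject_by_statistic = chi2_statistic > critical
reject_by_p_value = p_from_stat < alpha

print(p_from_stat, critical)
print(reject_by_statistic, reject_by_p_value)
```

Because the two rules agree by construction, reporting either one is sufficient; showing both is a useful cross-check.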


## Report findings

### Subtask:
Summarize the results, including the hypotheses, Chi-Square statistic, p-value, critical value, and conclusion.


**Reasoning**:
Print a summary of the Chi-Square test results, including the hypotheses, statistic, p-value, critical value, and conclusion based on the previous calculations and decisions.



In [6]:
print("--- Chi-Square Test for Independence Summary ---")
print("\nHypotheses:")
print("Null Hypothesis (H₀): There is no association between the type of smart home device purchased and customer satisfaction level. The two variables are independent.")
print("Alternative Hypothesis (H₁): There is an association between the type of smart home device purchased and customer satisfaction level. The two variables are dependent.")

print(f"\nChi-Square Statistic: {chi2_statistic:.4f}")
print(f"Degrees of Freedom (dof): {dof}")
print(f"P-value: {p_value:.4f}")

print(f"\nSignificance Level (alpha): {alpha:.2f}")
print(f"Critical Chi-Square Value: {critical_value:.4f}")

print("\nDecision:")
if chi2_statistic > critical_value and p_value < alpha:
  print("Based on the results, we reject the null hypothesis.")
  print("Conclusion: There is a significant association between the type of smart home device purchased and customer satisfaction level.")
else:
  print("Based on the results, we fail to reject the null hypothesis.")
  print("Conclusion: There is no significant association between the type of smart home device purchased and customer satisfaction level.")

--- Chi-Square Test for Independence Summary ---

Hypotheses:
Null Hypothesis (H₀): There is no association between the type of smart home device purchased and customer satisfaction level. The two variables are independent.
Alternative Hypothesis (H₁): There is an association between the type of smart home device purchased and customer satisfaction level. The two variables are dependent.

Chi-Square Statistic: 6.4646
Degrees of Freedom (dof): 2
P-value: 0.0395

Significance Level (alpha): 0.05
Critical Chi-Square Value: 5.9915

Decision:
Based on the results, we reject the null hypothesis.
Conclusion: There is a significant association between the type of smart home device purchased and customer satisfaction level.


## Summary

### Q&A
*   **Is there a significant association between the type of smart home device purchased and customer satisfaction level?**
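    Yes. At the 0.05 significance level, the test finds a statistically significant association between the type of smart home device purchased and customer satisfaction level (Chi-Square ≈ 6.46, 2 degrees of freedom, p ≈ 0.0395).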

### Data Analysis Key Findings
*   The null hypothesis (H₀) states that the type of smart home device purchased and customer satisfaction level are independent (no association), while the alternative hypothesis (H₁) states that they are associated.
*   The calculated Chi-Square statistic is approximately 6.46.
*   The degrees of freedom for the test are 2.
*   The p-value for the test is approximately 0.0395.
*   At a significance level ($\alpha$) of 0.05, the critical Chi-Square value is approximately 5.9915.
*   Since the Chi-Square statistic (6.46) is greater than the critical value (5.9915) and the p-value (0.0395) is less than the significance level (0.05), the null hypothesis is rejected.

### Insights or Next Steps
*   The results suggest that the type of smart home device purchased (Smart Thermostats vs. Smart Lights) is not independent of customer satisfaction levels. Further investigation could explore which specific device type is associated with higher or lower satisfaction.
*   Analyze the expected frequencies versus the observed frequencies to understand where the significant differences in satisfaction lie between the two device types.
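
One way to follow up on both next steps is sketched below: Pearson residuals, `(observed - expected) / sqrt(expected)`, show which cells contribute most to the statistic, and row proportions show how satisfaction is distributed within each device type. This is an informal diagnostic rather than a formal post-hoc test, and the names `pearson_residuals` and `row_proportions` are illustrative.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Same contingency table as in the "Load data" step
df = pd.DataFrame(
    {'Satisfied': [100, 150],
     'Neutral': [50, 70],
     'Dissatisfied': [30, 80]},
    index=['Smart Thermostats', 'Smart Lights'])

_, _, _, expected = chi2_contingency(df)

# Pearson residuals: the squared residuals sum to the chi-square statistic
pearson_residuals = (df - expected) / np.sqrt(expected)

# Share of each satisfaction level within each device type
row_proportions = df.div(df.sum(axis=1), axis=0)

print(pearson_residuals.round(2))
print(row_proportions.round(3))
```

In this table the largest residuals fall in the Dissatisfied column: Smart Thermostats have fewer dissatisfied customers than expected under independence and Smart Lights have more (about 16.7% versus 26.7% of each device's customers).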
