## CHI-SQUARE TEST

In [2]:
## Mizzare Corporation has collected data on customer satisfaction levels for two types of smart home devices: Smart Thermostats and Smart Lights. They want to determine if there's a significant association between the type of device purchased and the customer's satisfaction level.

In [3]:
## Satisfaction         Smart Thermostat   Smart Light   Total
## Very Satisfied              50               70        120
## Satisfied                   80              100        180
## Neutral                     60               90        150
## Unsatisfied                 30               50         80
## Very Unsatisfied            20               50         70
## Total                      240              360        600

In [6]:
## Null Hypothesis (𝐻0): There is no association between device type and customer satisfaction level.
## Alternative Hypothesis (𝐻𝑎): There is an association between device type and customer satisfaction level.

In [8]:
import numpy as np
import scipy.stats as stats

In [9]:
# Data from given table
observed = np.array([[50,70],[80,100],[60,90],[30,50],[20,50]])

In [10]:
alpha=0.05

In [11]:
# Chi-square test of independence (returns the statistic, p-value, degrees of freedom and expected frequencies)
chi2_result = stats.chi2_contingency(observed)
chi2_result

Chi2ContingencyResult(statistic=5.638227513227513, pvalue=0.22784371130697179, dof=4, expected_freq=array([[ 48.,  72.],
       [ 72., 108.],
       [ 60.,  90.],
       [ 32.,  48.],
       [ 28.,  42.]]))
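The expected frequencies and the statistic reported above can be reproduced by hand, which makes it clearer what the test measures. The following is a minimal sketch, assuming the usual formulas for a test of independence (expected count = row total × column total / grand total, statistic = sum of (observed − expected)² / expected); the names `expected_manual` and `chi2_manual` are illustrative only.

```python
import numpy as np

# Observed counts from the table (rows: satisfaction levels, columns: device types)
observed = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (5, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
grand_total = observed.sum()

# Expected count for each cell under independence
expected_manual = row_totals * col_totals / grand_total

# Pearson chi-square statistic
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

print(expected_manual)
print(chi2_manual)
```

The printed values should agree with `expected_freq` and `statistic` in the result above.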

In [12]:
## alpha = 0.05, so 1 - alpha = 0.95; for a test of independence, df = (rows - 1) * (columns - 1) = (5 - 1) * (2 - 1) = 4

In [18]:
chi2_critical=stats.chi2.ppf(0.95,4)
chi2_critical

9.487729036781154
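Rather than typing 0.95 and 4 by hand, the degrees of freedom can be derived from the shape of the table and the critical value from `alpha`. This is a sketch under the assumption that the table has no total row or column included; the statistic is copied from the output above.

```python
import numpy as np
from scipy import stats

observed = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])
alpha = 0.05

# Degrees of freedom for a test of independence: (rows - 1) * (columns - 1)
n_rows, n_cols = observed.shape
dof = (n_rows - 1) * (n_cols - 1)

# Critical value: the (1 - alpha) quantile of the chi-square distribution
chi2_critical = stats.chi2.ppf(1 - alpha, dof)

# p-value: upper-tail probability of the statistic reported earlier
p_value = stats.chi2.sf(5.638227513227513, dof)

print(dof, chi2_critical, p_value)
```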

In [20]:
## Since the chi-square statistic (5.638227513227513) < the chi-square critical value (9.487729036781154), we fail to reject H0.
## Also, since the p-value (0.22784371130697179) > alpha (0.05), we fail to reject H0.
## i.e., there is not enough evidence of an association between device type and customer satisfaction level.
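The two decision rules used above (statistic versus critical value, p-value versus alpha) can also be checked in code. A small sketch, with the flag names `reject_by_critical` and `reject_by_p_value` chosen here purely for illustration:

```python
import numpy as np
from scipy import stats

observed = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])
alpha = 0.05

# The result unpacks as: statistic, p-value, degrees of freedom, expected counts
statistic, p_value, dof, expected_freq = stats.chi2_contingency(observed)
chi2_critical = stats.chi2.ppf(1 - alpha, dof)

# Both rules are equivalent and should give the same decision
reject_by_critical = statistic > chi2_critical
reject_by_p_value = p_value < alpha

if reject_by_critical and reject_by_p_value:
    print("Reject H0")
else:
    print("Fail to reject H0")
```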

## Hypothesis Testing

In [24]:
## Bombay hospitality Ltd. operates a franchise model for producing exotic Norwegian dinners throughout New England. The operating cost for a franchise in a week (W) is given by the equation W = $1,000 + $5X, where X represents the number of units produced in a week. Recent feedback from restaurant owners suggests that this cost model may no longer be accurate, as their observed weekly operating costs are higher.

In [1]:
# one-tailed (right-tailed) test; the cells below use the standard normal distribution (stats.norm), i.e. a z-test

In [3]:
import numpy as np
from scipy import stats

In [4]:
# Null Hypothesis (H₀): The actual mean weekly operating cost is equal to the theoretical mean cost. H₀: μ = W = 1000 + 5 × 600 = 4,000
# Alternative Hypothesis (H₁): The actual mean weekly operating cost is greater than the theoretical mean cost. H₁: μ > 4,000
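The theoretical mean in H₀ comes from plugging the mean number of units into the cost model. A minimal sketch, where `weekly_cost` is a hypothetical helper name for W = 1000 + 5X:

```python
def weekly_cost(units):
    """Weekly operating cost under the model W = 1000 + 5 * X."""
    return 1000 + 5 * units

# Because the model is linear, the mean cost is the cost at the mean number of units
units_produced_mean = 600
theoretical_mean = weekly_cost(units_produced_mean)

print(theoretical_mean)
```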

In [5]:
# Given data
sample_mean = 3050  # Mean weekly cost from the sample
population_mean = 4000  # Theoretical mean weekly cost
std_dev = 125  # Standard deviation of weekly costs
n = 25  # Sample size
units_produced_mean = 600  # Mean number of units produced in a week
alpha=0.05

In [6]:
# t statistic value
import math
t_stat=(sample_mean-population_mean)/(std_dev/math.sqrt(n))
t_stat

-38.0
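Breaking the statistic into its parts shows where -38.0 comes from: the difference between the sample and theoretical means, divided by the standard error. A short sketch using the values given above (`standard_error` and `mean_difference` are illustrative names):

```python
import math

sample_mean = 3050
population_mean = 4000
std_dev = 125
n = 25

# Standard error of the mean: standard deviation divided by sqrt(sample size)
standard_error = std_dev / math.sqrt(n)

# Difference between the observed and theoretical means
mean_difference = sample_mean - population_mean

# Test statistic: how many standard errors the sample mean lies from 4000
test_statistic = mean_difference / standard_error

print(standard_error, mean_difference, test_statistic)
```

The negative sign indicates that the sample mean lies below the theoretical mean.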

In [7]:
# critical value for a one-tailed test
critical_value = stats.norm.ppf(1 - alpha)
critical_value

1.6448536269514722

In [8]:
p_value = stats.norm.sf(t_stat)  # right-tailed test (H₁: μ > 4,000): P(Z >= t_stat)
p_value

1.0
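The tail used for the p-value has to match the direction of the alternative hypothesis. The sketch below compares the two one-tailed p-values for the statistic computed earlier; `p_left` and `p_right` are illustrative names, and the standard normal distribution is assumed, as in the cells above.

```python
from scipy import stats

test_statistic = -38.0
alpha = 0.05

# Left-tailed p-value (would apply to H1: mu < 4000)
p_left = stats.norm.cdf(test_statistic)

# Right-tailed p-value (applies to H1: mu > 4000)
p_right = stats.norm.sf(test_statistic)

print("left-tailed p-value:", p_left, "reject H0:", p_left < alpha)
print("right-tailed p-value:", p_right, "reject H0:", p_right < alpha)
```

With H₁: μ > 4,000, only the right-tailed value is relevant.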

In [15]:
# decision
# we fail to reject the null hypothesis

In [19]:
# conclusion
# For a right-tailed test, H₀ is rejected only when the test statistic exceeds the critical value.
# Here the test statistic (-38.0) is far below the critical value (1.645), so we fail to reject H₀.
# i.e., the sample (mean 3050) gives no evidence that the actual mean weekly operating cost is greater than the theoretical mean cost of 4,000.