### 1. Hypothesis Testing of Proportion

In [3]:
from statsmodels.stats.proportion import proportions_ztest

#### Proportion

A sample of 300 companies found that 183 of the CEOs were male. Test the claim that most CEOs are male. Use α = 0.05.

In [4]:
# Use the statsmodels package to run a one-sample z-test for a proportion
# https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportions_ztest.html


# significance level (alpha) for the test
significance = 0.05

# our sample: 183 of 300 CEOs (61%) are male
sample_success = 183
sample_size = 300

# H0: p <= 0.50; H1: p > 0.50 (most CEOs are male)
null_hypothesis = 0.50

In [5]:
stat, p_value = proportions_ztest(count=sample_success, nobs=sample_size, value=null_hypothesis, alternative='larger')

In [6]:

# report
print('z_stat: %0.3f, p_value: %0.5f' % (stat, p_value))

if p_value > significance:
    print("Fail to reject the null hypothesis - we have nothing else to say")
else:
    print("Reject the null hypothesis - suggest the alternative hypothesis is true")

z_stat: 3.906, p_value: 0.00005
Reject the null hypothesis - suggest the alternative hypothesis is true
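
The z statistic of 3.906 printed above is consistent with a standard error computed from the sample proportion (0.61). Many textbooks instead use the null proportion (0.50) in the standard error, which gives a slightly smaller statistic. The sketch below computes both by hand using only the standard library; the variable names are illustrative and not part of the original notebook.

```python
import math
from statistics import NormalDist

count, nobs, p0 = 183, 300, 0.50
p_hat = count / nobs  # sample proportion, 0.61

# Standard error based on the sample proportion
se_sample = math.sqrt(p_hat * (1 - p_hat) / nobs)
z_sample = (p_hat - p0) / se_sample

# Standard error based on the null proportion (common textbook form)
se_null = math.sqrt(p0 * (1 - p0) / nobs)
z_null = (p_hat - p0) / se_null

# Right-tailed p-values: P(Z > z)
p_sample = 1 - NormalDist().cdf(z_sample)
p_null = 1 - NormalDist().cdf(z_null)

print('sample-variance z: %0.3f, p_value: %0.5f' % (z_sample, p_sample))
print('null-variance z:   %0.3f, p_value: %0.5f' % (z_null, p_null))
```

Both versions lead to the same decision here, since each p-value is far below 0.05.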


### 2. Hypothesis Testing of Mean

#### You sample 465 M&Ms; the sample has a mean weight of 0.8635. The population standard deviation is 0.0565. Test the claim that the mean weight is greater than 0.8535. Significance level = 0.01

In [7]:
import math 
x_mean = 0.8635
mu = 0.8535
sigma = 0.0565
n = 465

z_stat = (x_mean - mu)/(sigma/math.sqrt(n))

print(z_stat)

3.8166121509465203


In [8]:
import scipy.stats

alpha = 0.01
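# find the z critical value; ppf(alpha) is the left-tail cutoff, so for this
# right-tailed test the rejection region is z_stat > -ppf(alpha), about 2.326
scipy.stats.norm.ppf(alpha)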

-2.3263478740408408
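
The critical value can also be used directly to make the decision, without a p-value. The sketch below is a standard-library cross-check (using `statistics.NormalDist` rather than `scipy`) that computes the upper-tail cutoff for this right-tailed test and compares the test statistic against it.

```python
import math
from statistics import NormalDist

x_mean, mu, sigma, n = 0.8635, 0.8535, 0.0565, 465
alpha = 0.01

z_stat = (x_mean - mu) / (sigma / math.sqrt(n))

# Right-tailed test: reject H0 when z_stat exceeds the upper-tail cutoff
z_crit = NormalDist().inv_cdf(1 - alpha)
reject = z_stat > z_crit

print('z_stat: %0.3f, z_crit: %0.3f, reject H0: %s' % (z_stat, z_crit, reject))
```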

In [9]:
# find the p-value for a right-tailed test: P(Z > z_stat)
p_value = scipy.stats.norm.sf(z_stat)

print('p_value: %0.5f' % (p_value))

if p_value > alpha:
    print("Fail to reject the null hypothesis - we have nothing else to say")
else:
    print("Reject the null hypothesis - suggest the alternative hypothesis is true")

p_value: 0.00007
Reject the null hypothesis - suggest the alternative hypothesis is true
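
The same three steps (compute the statistic, pick the tail, convert to a p-value) repeat in every z-test for a mean. A minimal sketch of a reusable helper follows; `z_test_mean` is a hypothetical name, and its `tail` values are chosen to mirror the `alternative` argument of `proportions_ztest`. It relies only on the standard library.

```python
import math
from statistics import NormalDist

def z_test_mean(x_mean, mu, sigma, n, tail='two-sided'):
    """Hypothetical helper: one-sample z-test for a mean; returns (z, p_value)."""
    z = (x_mean - mu) / (sigma / math.sqrt(n))
    cdf = NormalDist().cdf(z)
    if tail == 'larger':         # H1: mean > mu
        p = 1 - cdf
    elif tail == 'smaller':      # H1: mean < mu
        p = cdf
    elif tail == 'two-sided':    # H1: mean != mu
        p = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'larger', 'smaller', or 'two-sided'")
    return z, p

# M&M example: right-tailed test
z, p = z_test_mean(0.8635, 0.8535, 0.0565, 465, tail='larger')
print('z_stat: %0.3f, p_value: %0.5f' % (z, p))
```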


### Practice

The National Institute of Diabetes and Digestive and Kidney Diseases reports that the average cost of bariatric (weight loss) surgery is about \\$22,500. You think this information is incorrect. You randomly select 30 bariatric surgery patients and find that the average cost for their surgeries is \\$21,545, with a standard deviation of \\$3015. Is there enough evidence to support your claim at α = 0.05? Use a P-value.


In [28]:
## Z Test, Two tail

mu = 22500
n = 30
x_mean = 21545
sigma = 3015  # sample standard deviation, treated as the population value for this z-test
z_stat = (x_mean - mu)/(sigma/math.sqrt(n))
print(z_stat)

alpha = 0.05
# find the z critical value for a two-tailed test
scipy.stats.norm.ppf(1 - alpha/2)


-1.734908930074407


1.959963984540054

In [31]:
#find p-value for Z test, two tail
#https://www.geeksforgeeks.org/how-to-find-a-p-value-from-a-z-score-in-python/

p_value = scipy.stats.norm.sf(abs(z_stat))*2

print('p_value: %0.5f' % (p_value))

if p_value > alpha:
    print("Fail to reject the null hypothesis - we have nothing else to say")
else:
    print("Reject the null hypothesis - suggest the alternative hypothesis is true")

p_value: 0.08276
Fail to reject the null hypothesis - we have nothing else to say
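
The two-tailed critical value gives the same decision as the p-value: the null hypothesis is rejected only when the absolute value of the statistic exceeds the cutoff. A short standard-library sketch of that comparison for the surgery example, with illustrative variable names:

```python
import math
from statistics import NormalDist

mu, n, x_mean, sigma = 22500, 30, 21545, 3015
alpha = 0.05

z_stat = (x_mean - mu) / (sigma / math.sqrt(n))

# Two-tailed test: alpha is split evenly across both tails
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z_stat) > z_crit

print('|z_stat|: %0.3f, z_crit: %0.3f, reject H0: %s' % (abs(z_stat), z_crit, reject))
```

Here the absolute statistic (about 1.735) falls short of the cutoff (about 1.960), which agrees with the "fail to reject" result above.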


The U.S. Department of Agriculture claims that the mean cost of raising a child from birth to age 2 by husband-wife families in the United States is \\$13,120. A random sample of 500 children (age 2) has a mean cost of \\$12,925 with a standard deviation of \\$1745. At α = 0.10, is there enough evidence to reject the claim?

In [35]:
## Z Test, Two Tail

mu = 13120
n = 500
x_mean = 12925
sigma = 1745
z_stat = (x_mean - mu)/(sigma/math.sqrt(n))
print(z_stat)

alpha = 0.10
# find the z critical value for a two-tailed test
scipy.stats.norm.ppf(1 - alpha/2)


-2.4987579118192493


1.6448536269514722

In [37]:
#find p-value for z test, two tail
p_value = scipy.stats.norm.sf(abs(z_stat))*2

print('p_value: %0.5f' % (p_value))

if p_value > alpha:
    print("Fail to reject the null hypothesis - we have nothing else to say")
else:
    print("Reject the null hypothesis - suggest the alternative hypothesis is true")

p_value: 0.01246
Reject the null hypothesis - suggest the alternative hypothesis is true
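
A two-tailed z-test at level α is equivalent to checking whether the claimed mean lies inside a (1 − α) confidence interval around the sample mean. The sketch below is an optional cross-check, not part of the original exercise; it builds the 90% interval for the child-cost example with the standard library.

```python
import math
from statistics import NormalDist

mu, n, x_mean, sigma = 13120, 500, 12925, 1745
alpha = 0.10

# Margin of error for a (1 - alpha) confidence interval
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
margin = z_crit * sigma / math.sqrt(n)
lower, upper = x_mean - margin, x_mean + margin

# If the claimed mean is outside the interval, the two-tailed test rejects H0
mu_inside = lower <= mu <= upper

print('90%% CI: (%0.2f, %0.2f), claimed mean inside: %s' % (lower, upper, mu_inside))
```

The claimed mean of 13,120 falls outside the interval, which matches the rejection above.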


A used car dealer says that the mean price of a 2008 Honda CR-V is at least \\$20,500. You suspect this claim is incorrect and find that a random sample of 14 similar vehicles has a mean price of \\$19,850 and a standard deviation of \\$1084. Is there enough evidence to reject the dealer’s claim at α = 0.05? Assume the population is normally distributed.

In [40]:
## T-test, Left tail

mu = 20500
n = 14
x_mean = 19850
sigma = 1084  # sample standard deviation (s); the population sigma is unknown, hence the t-test
t_stat = (x_mean - mu)/(sigma/math.sqrt(n))
print(t_stat)

alpha = 0.05
df = n - 1
# find the t critical value for a left-tailed test
scipy.stats.t.ppf(q=alpha, df=df)


-2.2436137466817914


-1.7709333959867992

In [41]:
# find the p-value for a left-tailed t-test: P(T < t_stat)
#https://www.geeksforgeeks.org/how-to-find-a-p-value-from-a-t-score-in-python/

p_value = scipy.stats.t.cdf(t_stat, df=df)

print('p_value: %0.5f' % (p_value))

if p_value > alpha:
    print("Fail to reject the null hypothesis - we have nothing else to say")
else:
    print("Reject the null hypothesis - suggest the alternative hypothesis is true")

p_value: 0.02146
Reject the null hypothesis - suggest the alternative hypothesis is true
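
With only 14 observations, the choice between the t and normal distributions matters. The sketch below shows what the left-tail p-value would be if the same statistic were treated as standard normal, and compares it with the t-based value of 0.02146 reported above (copied in as a constant, not recomputed). It uses only the standard library, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, n, x_mean, s = 20500, 14, 19850, 1084
t_stat = (x_mean - mu) / (s / math.sqrt(n))

# Left-tail p-value if the statistic were (incorrectly) treated as standard normal
p_normal = NormalDist().cdf(t_stat)

# Left-tail p-value from the t distribution with 13 degrees of freedom,
# copied from the output printed above rather than recomputed here
p_t_reported = 0.02146

print('statistic: %0.3f' % t_stat)
print('normal p_value: %0.5f, t p_value (reported above): %0.5f' % (p_normal, p_t_reported))
```

The normal approximation gives a smaller p-value because it ignores the heavier tails of the t distribution. Both lead to rejection at α = 0.05, but the two could disagree at a stricter significance level.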
