# Week 10 - Hypothesis tests concerning two populations

This is a Jupyter notebook to explore the material in (Ross, 2017, Chp. 10). 



In [1]:
%matplotlib inline
# from now on we'll start each notebook with the library imports
# and special commands to keep these things in one place (which
# is good practice). The line above is a Jupyter magic command that makes
# matplotlib plot inline (between cells)
# Next we import the libraries and give them short names
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
from collections import Counter
from collections import defaultdict

## Exercise A

Complete questions 1, 5 and 6 from (Ross, 2017, Sec. 10.2 Problems). The text is repeated below for convenience:

> 1. An experiment is performed to test the difference in effectiveness of two
methods of cultivating wheat. A total of 12 patches of ground are treated
with shallow plowing and 14 with deep plowing. The average yield per
ground area of the first group is 45.2 bushels, and the average yield for
the second group is 48.6 bushels. Suppose it is known that shallow plowing
results in a ground yield having a standard deviation of 0.8 bushels,
while deep plowing results in a standard deviation of 1.0 bushels.
>
>    (a) Are the given data consistent, at the 5 percent level of significance,
with the hypothesis that the mean yield is the same for both methods?
>
>    (b) What is the p value for this hypothesis test?
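
A minimal sketch for checking parts (a) and (b) numerically, assuming (as the problem states) that both population standard deviations are known. `two_sample_z` is a hypothetical helper name; `scipy.stats.norm.sf` is SciPy's standard normal survival function, 1 − Φ(z), which stays accurate far into the tail where `1 - norm.cdf(z)` would round to 0.

```python
import math
from scipy import stats


def two_sample_z(xbar, ybar, sd_x, sd_y, n, m):
    """Z statistic for H0: mu_x = mu_y when both population SDs are known."""
    std_err = math.sqrt(sd_x**2 / n + sd_y**2 / m)
    return (xbar - ybar) / std_err


# X = shallow plowing, Y = deep plowing
z = two_sample_z(45.2, 48.6, 0.8, 1.0, 12, 14)

# Two-sided test: reject when |Z| > z_{alpha/2}; p-value = 2 P(Z >= |z|)
z_crit = stats.norm.ppf(1 - 0.05 / 2)  # about 1.96
p_value = 2 * stats.norm.sf(abs(z))

print(f"z = {z:.3f}, reject H0 at 5%: {abs(z) > z_crit}, p-value = {p_value:.1e}")
```

The p-value is many orders of magnitude below 0.05, which is why the worked answer reports it as approximately 0.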

> 5. In this section, we presented the test of
>
>    $H_0: \mu_x \le \mu_y$
>
>    against
>
>    $H_1: \mu_x > \mu_y$
>
>    Explain why it was not necessary to separately present the test of
>
>    $H_0: \mu_x \ge \mu_y$
>
>    against
>
>    $H_1: \mu_x < \mu_y$
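
A quick numerical check of the relabelling argument, using made-up summary numbers purely for illustration: exchanging the roles of the two samples flips the sign of the statistic, so the lower-tail p-value for $H_0: \mu_x \ge \mu_y$ equals the upper-tail p-value of the presented test applied with $Y$ in place of $X$. The helper `z_stat` is a hypothetical name.

```python
import math
from scipy import stats


def z_stat(xbar, ybar, sd_x, sd_y, n, m):
    """Two-sample Z statistic (known SDs) for the difference xbar - ybar."""
    return (xbar - ybar) / math.sqrt(sd_x**2 / n + sd_y**2 / m)


# Hypothetical summary data, for illustration only
xbar, ybar, sd_x, sd_y, n, m = 10.0, 11.0, 2.0, 1.5, 20, 25

# H0: mu_x >= mu_y vs H1: mu_x < mu_y  ->  lower-tail p-value P(Z <= z)
p_lower = stats.norm.cdf(z_stat(xbar, ybar, sd_x, sd_y, n, m))

# Same question as the presented test with the samples relabelled:
# H0: mu_y <= mu_x vs H1: mu_y > mu_x  ->  upper-tail p-value P(Z >= z)
p_upper_swapped = stats.norm.sf(z_stat(ybar, xbar, sd_y, sd_x, m, n))

print(p_lower, p_upper_swapped)  # the two p-values agree
```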

> 6. The device used by astronomers to measure distances results in measurements
that have a mean value equal to the actual distance of the
object being surveyed and a standard deviation of 0.5 light-years. An astronomer
is interested in testing the widely held hypothesis that asteroid
A is at least as close to the earth as is asteroid B. To test this hypothesis,
the astronomer made 8 independent measurements on asteroid A and
12 on asteroid B. If the average of the measurements for asteroid A was
22.4 light-years and the average of those for asteroid B was 21.3, will the
hypothesis be rejected at the 5 percent level of significance? What is the
p value?
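
A minimal sketch of Question 6 as a one-sided Z test, assuming the 0.5 light-year standard deviation applies to every measurement of both asteroids. "Asteroid A is at least as close as B" is the null $H_0: \mu_A \le \mu_B$, so only large positive values of $Z$ count against it and the p-value is the upper tail $P(Z \ge z)$.

```python
import math
from scipy import stats

sd = 0.5                 # known measurement SD (light-years), same for both asteroids
n_a, n_b = 8, 12
mean_a, mean_b = 22.4, 21.3

# H0: mu_A <= mu_B (A at least as close) vs H1: mu_A > mu_B
std_err = sd * math.sqrt(1 / n_a + 1 / n_b)
z = (mean_a - mean_b) / std_err
p_value = stats.norm.sf(z)        # upper-tail probability P(Z >= z)
z_crit = stats.norm.ppf(0.95)     # z_{0.05}, about 1.645

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.1e}")
```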

*complete your answers in Markdown using the code-block below for computation*

In [2]:
## supporting code for Exercise A

print("Exercise A")

"""
An experiment is performed to test the difference in effectiveness of two methods of cultivating wheat. 
A total of 12 patches of ground are treated with shallow plowing and 14 with deep plowing. 
The average yield per ground area of the first group is 45.2 bushels, and the average yield for the second 
group is 48.6 bushels. Suppose it is known that shallow plowing results in a ground yield having a 
standard deviation of 0.8 bushels, while deep plowing results in a standard deviation of 1.0 bushels.

(a) Are the given data consistent, at the 5 percent level of significance, with the hypothesis that 
the mean yield is the same for both methods?

(b) What is the p value for this hypothesis test?
"""

print("Question 1 - Part A")
n1 = 12
print(f"n1 = {n1}")
x1 = 45.2
print(f"x1 = {x1}")
sd1 = 0.8
print(f"sd1 = {sd1}")
n2 = 14
print(f"n2 = {n2}")
x2 = 48.6
print(f"x2 = {x2}")
sd2 = 1.0
print(f"sd2 = {sd2}")
sig_lev = 0.05
print(f"sig_lev = {sig_lev}")
print("Null Hypothesis: mu1 = mu2")
print("Alternative Hypothesis: mu1 != mu2")
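# Population SDs are known, so under H0 this statistic is standard normal (a Z test, not a t test)
z_stat = (x1-x2)/(((sd1**2)/n1)+((sd2**2)/n2))**0.5
print(f"Test Statistic = {z_stat} = {round(z_stat,3)}")
print(f"Conclusion: since |Z| = {round(abs(z_stat),3)} > z(0.025) = 1.96, reject H0; the mean yields differ")
print('\n')
print("Question 1 - Part B")
print(f"p_value = 2P(Z >= {round(abs(z_stat),3)}) ≈ 0")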
print('\n')

"""
In this section, we presented the test of

H0: μx ≤ μy

against

H1: μx > μy

Explain why it was not necessary to separately present the test of

H0: μx ≥ μy

against

H1: μx < μy
"""
print("Question 5")
print("Relabelling the samples turns one test into the other: H0: μx ≥ μy vs H1: μx < μy is the same as H0: μy ≤ μx vs H1: μy > μx, so we apply the presented test with the roles of X and Y swapped.")
print('\n')

"""
The device used by astronomers to measure distances results in measurements that have a mean value equal 
to the actual distance of the object being surveyed and a standard deviation of 0.5 light-years. 
An astronomer is interested in testing the widely held hypothesis that asteroid A is at least as close to 
the earth as is asteroid B. To test this hypothesis, the astronomer made 8 independent measurements on 
asteroid A and 12 on asteroid B. If the average of the measurements for asteroid A was 22.4 light-years 
and the average of those for asteroid B was 21.3, will the hypothesis be rejected at the 5 percent level 
of significance? What is the p value?
"""
print("Question 6")
sd1 = 0.5
print(f"sd1 = {sd1}")
sd2 = 0.5
print(f"sd2 = {sd2}")
n1 = 8
print(f"n1 = {n1}")
n2 = 12
print(f"n2 = {n2}")
x1 = 22.4
print(f"x1 = {x1}")
x2 = 21.3
print(f"x2 = {x2}")
sig_level = 0.05
print(f"Significance Level = {sig_level}")
z_alpha_val = 1.645
print(f"z alpha = {z_alpha_val}")
print("H0: μx ≤ μy")
print("H1: μx > μy")
std_err = (((sd1**2)/n1)+((sd2**2)/n2))**0.5  # standard error of x1 - x2
print(f"Standard error of (x1 - x2) = {std_err} = {round(std_err,4)}")
test_stat = (x1-x2)/std_err
print(f"Test Statistic = {test_stat} = {round(test_stat,4)}")
print("Conclusion: Since Z = 4.82 > Z alpha = 1.645, reject H0 (slide 10 - lecture 10 table)")
print("The data are consistent with the alternative hypothesis that asteroid A is farther from Earth than asteroid B")
print("p_value = P(Z > 4.82) ≈ 0")


Exercise A
Question 1 - Part A
n1 = 12
x1 = 45.2
sd1 = 0.8
n2 = 14
x2 = 48.6
sd2 = 1.0
sig_lev = 0.05
Null Hypothesis: mu1 = mu2
Alternative Hypothesis: mu1 != mu2
Test Statistic = -9.625824045224297 = -9.626
Conclusion: since |Z| = 9.626 > z(0.025) = 1.96, reject H0; the mean yields differ


Question 1 - Part B
p_value = 2P(Z >= 9.626) ≈ 0


Question 5
Relabelling the samples turns one test into the other: H0: μx ≥ μy vs H1: μx < μy is the same as H0: μy ≤ μx vs H1: μy > μx, so we apply the presented test with the roles of X and Y swapped.


Question 6
sd1 = 0.5
sd2 = 0.5
n1 = 8
n2 = 12
x1 = 22.4
x2 = 21.3
Significance Level = 0.05
z alpha = 1.645
H0: μx ≤ μy
H1: μx > μy
Standard error of (x1 - x2) = 0.2282177322938192 = 0.2282
Test Statistic = 4.819958506045452 = 4.82
Conclusion: Since Z = 4.82 > Z alpha = 1.645, reject H0 (slide 10 - lecture 10 table)
The data are consistent with the alternative hypothesis that asteroid A is farther from Earth than asteroid B
p_value = P(Z > 4.82) ≈ 0


## Exercise B

Complete question 1 from the problems for (Ross, 2017, Sec. 10.3). The text is repeated below for convenience:

> 1. A high school is interested in determining whether two of its instructors
are equally able to prepare students for a statewide examination in geometry.
Seventy students taking geometry this semester were randomly
divided into two groups of 35 each. Instructor 1 taught geometry to the
first group, and instructor 2 to the second. At the end of the semester, the
students took the statewide examination, with the following results:
>
>    Class of instructor 1
>
>    $\bar{X} = 72.6$
>
>    $S_x^2 = 6.6$
>
>    Class of instructor 2
>
>    $\bar{Y} = 74.0$
>
>    $S_y^2 = 6.2$
>
>    Can we conclude from these results that the instructors are not equally
able in preparing students for the examinations? Use the 5 percent level
of significance. Give the null and alternative hypotheses and the resulting
p value.
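
For large samples with unknown variances, $(\bar{X} - \bar{Y}) / \sqrt{S_x^2/n + S_y^2/m}$ is approximately standard normal under $H_0$. The sketch below, a cross-check rather than a prescribed method, compares that normal-approximation p-value with the $t_{68}$ reference distribution used in the worked answer; with 35 observations per group the two should be close.

```python
import math
from scipy import stats

n = m = 35
xbar, ybar = 72.6, 74.0
sx2, sy2 = 6.6, 6.2  # sample variances S_x^2 and S_y^2

ts = (xbar - ybar) / math.sqrt(sx2 / n + sy2 / m)

# Two-sided p-values P(|TS| >= |ts|) under two reference distributions
p_normal = 2 * stats.norm.sf(abs(ts))         # large-sample normal approximation
p_t = 2 * stats.t.sf(abs(ts), df=n + m - 2)   # Student t with 68 degrees of freedom

print(f"TS = {ts:.4f}, normal p = {p_normal:.4f}, t(68) p = {p_t:.4f}")
```

Both p-values fall below 0.05, so the conclusion does not depend on which reference distribution is used.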

*complete your answers in Markdown using the code-block below for computation*

In [3]:
## supporting code for Exercise B

"""
A high school is interested in determining whether two of its instructors are equally able to 
prepare students for a statewide examination in geometry. Seventy students taking geometry this 
semester were randomly divided into two groups of 35 each. 
Instructor 1 taught geometry to the first group, and instructor 2 to the second. 
At the end of the semester, the students took the statewide examination, with the following results:

Class of instructor 1

X_bar = 72.6

S_x^2 = 6.6

Class of instructor 2

Y_bar = 74.0

S_y^2 = 6.2

Can we conclude from these results that the instructors are not equally able in preparing students for the examinations? 
Use the 5 percent level of significance. Give the null and alternative hypotheses and the resulting p value.
"""

print("Exercise B - Question 1")
print("H0:mu1 = mu2")
print("H1:mu1 != mu2")
n1 = 35
print(f"n1 = {n1}")
x1 = 72.6
print(f"x1 = {x1}")
Sx2 = 6.6
print(f"Sx2 = {Sx2}")
n2 = 35
print(f"n2 = {n2}")
x2 = 74.0
print(f"x2 = {x2}")
Sy2 = 6.2
print(f"Sy2 = {Sy2}")
test_stat = (x1-x2)/((((Sx2)/n1)+((Sy2)/n2))**0.5)
print(f"test_stat = {test_stat}")
p_val = 0.0236  # two-sided p-value, 2P(T > 2.3150) with 68 degrees of freedom
print("p-value = P(|T| > |T0|) = 2P(T > |T0|)")
print("0.5 p-value = P(T > 2.3150)")
print("P(T > 2.3824) < P(T > 2.3150) < P(T > 1.9954)")
print("0.01 < 0.5 p-value < 0.025")
print("0.02 < p-value < 0.05")
print(f"p-value = {p_val}")
print("Conclusion: since |T0| = 2.3150 > t(0.025,68) = 1.9954 (equivalently, p-value < 0.05), we reject H0: the instructors do not appear equally able")


Exercise B - Question 1
H0:mu1 = mu2
H1:mu1 != mu2
n1 = 35
x1 = 72.6
Sx2 = 6.6
n2 = 35
x2 = 74.0
Sy2 = 6.2
test_stat = -2.3150323971815263
p-value = P(|T| > |T0|) = 2P(T > |T0|)
0.5 p-value = P(T > 2.3150)
P(T > 2.3824) < P(T > 2.3150) < P(T > 1.9954)
0.01 < 0.5 p-value < 0.025
0.02 < p-value < 0.05
p-value = 0.0236
Conclusion: since |T0| = 2.3150 > t(0.025,68) = 1.9954 (equivalently, p-value < 0.05), we reject H0: the instructors do not appear equally able


## Exercise C

Complete questions 2 and 3 from (Ross, 2017, Sec. 10.4 Problems). The text is repeated below for convenience:

> In the following problems, assume that the population distributions are normal and have equal variances.
>
> 2. A study was instituted to learn how the diets of women changed during
the winter and the summer. A random group of 12 women were observed
during the month of July, and the percentage of each woman’s calories
that came from fat was determined. Similar observations were made on
a different randomly selected group of size 12 during the month of January. 
Suppose the results were as follows:
>
>    July: 32.2, 27.4, 28.6, 32.4, 40.5, 26.2, 29.4, 25.8, 36.6, 30.3, 28.5, 32.0
>
>    January: 30.5, 28.4, 40.2, 37.6, 36.5, 38.8, 34.7, 29.5, 29.7, 37.2, 41.5, 37.0
>
>    Test the hypothesis that the mean fat intake is the same for both months.
>
>    Use the
>
>    (a) 5 percent
>
>    (b) 1 percent
>
>    level of significance.
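
A minimal sketch of Question 2 as a pooled two-sample t test, under the stated assumption of normal populations with equal variances. It computes the pooled variance $S_p^2 = \frac{(n-1)S_x^2 + (m-1)S_y^2}{n+m-2}$ by hand and cross-checks the statistic against `scipy.stats.ttest_ind` with `equal_var=True`; critical values come from `scipy.stats.t.ppf` rather than a printed table.

```python
import numpy as np
from scipy import stats

july = np.array([32.2, 27.4, 28.6, 32.4, 40.5, 26.2, 29.4, 25.8, 36.6, 30.3, 28.5, 32.0])
january = np.array([30.5, 28.4, 40.2, 37.6, 36.5, 38.8, 34.7, 29.5, 29.7, 37.2, 41.5, 37.0])
n, m = len(july), len(january)

# Pooled variance: weighted average of the two sample variances (ddof=1)
sp2 = ((n - 1) * np.var(july, ddof=1) + (m - 1) * np.var(january, ddof=1)) / (n + m - 2)
t_manual = (july.mean() - january.mean()) / np.sqrt(sp2 * (1 / n + 1 / m))

# SciPy's equal-variance two-sample t test should give the same statistic
result = stats.ttest_ind(july, january, equal_var=True)

for alpha in (0.05, 0.01):
    t_crit = stats.t.ppf(1 - alpha / 2, df=n + m - 2)  # two-sided critical value
    print(f"alpha = {alpha}: |T| = {abs(t_manual):.4f}, t_crit = {t_crit:.3f}, "
          f"reject H0: {abs(t_manual) > t_crit}")
print(f"two-sided p-value = {result.pvalue:.4f}")
```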

> 3. A consumer organization has compared the time it takes a generic pain
reliever tablet to dissolve with the time it takes a name-brand tablet. Nine
tablets of each were checked. The following data resulted:
>
>    Generic: 14.2, 14.7, 13.9, 15.3, 14.8, 13.6, 14.6, 14.9, 14.2
>
>    Name: 14.3, 14.9, 14.4, 13.8, 15.0, 15.1, 14.4, 14.7, 14.9
>
>    (a) Do the given data establish, at the 5 percent level of significance,
that the name-brand tablet is quicker to dissolve?
>
>    (b) What about at the 10 percent level of significance?
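
A sketch of Question 3 as a one-sided pooled t test, again assuming normal populations with equal variances. "The name-brand tablet is quicker to dissolve" means its mean time is smaller, so with $X$ = generic and $Y$ = name brand the claim to establish is $H_1: \mu_x > \mu_y$. The upper-tail p-value is computed with `scipy.stats.t.sf` (recent SciPy versions also accept `alternative="greater"` in `ttest_ind`, which is not used here).

```python
import numpy as np
from scipy import stats

generic = np.array([14.2, 14.7, 13.9, 15.3, 14.8, 13.6, 14.6, 14.9, 14.2])
name = np.array([14.3, 14.9, 14.4, 13.8, 15.0, 15.1, 14.4, 14.7, 14.9])
n, m = len(generic), len(name)
df = n + m - 2

# H0: mu_x <= mu_y vs H1: mu_x > mu_y (reject only for large positive T)
sp2 = ((n - 1) * np.var(generic, ddof=1) + (m - 1) * np.var(name, ddof=1)) / df
t_stat = (generic.mean() - name.mean()) / np.sqrt(sp2 * (1 / n + 1 / m))
p_value = stats.t.sf(t_stat, df)  # upper-tail p-value P(T_16 >= t_stat)

for alpha in (0.05, 0.10):
    t_crit = stats.t.ppf(1 - alpha, df)  # one-sided critical value t_{alpha,16}
    print(f"alpha = {alpha}: T = {t_stat:.4f}, t_crit = {t_crit:.3f}, "
          f"reject H0: {t_stat > t_crit}")
print(f"one-sided p-value = {p_value:.3f}")
```

Because the generic sample mean is actually smaller, $T$ is negative and the p-value exceeds 0.5, so the claim is not established at either level.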



*complete your answers in Markdown using the code-block below for computation*

In [4]:
## supporting code for exercise C

"""
A study was instituted to learn how the diets of women changed during the winter and the summer. 
A random group of 12 women were observed during the month of July, and the percentage of each woman’s 
calories that came from fat was determined. Similar observations were made on a different randomly selected 
group of size 12 during the month of January. Suppose the results were as follows:

July: 32.2, 27.4, 28.6, 32.4, 40.5, 26.2, 29.4, 25.8, 36.6, 30.3, 28.5, 32.0

January: 30.5, 28.4, 40.2, 37.6, 36.5, 38.8, 34.7, 29.5, 29.7, 37.2, 41.5, 37.0

Test the hypothesis that the mean fat intake is the same for both months.

Use the

(a) 5 percent

(b) 1 percent

level of significance.
"""

print("Exercise C - Question 2 - Part A")
n = 12
print(f"n = {n}")
m = 12
print(f"m = {m}")
july = np.array([32.2, 27.4, 28.6, 32.4, 40.5, 26.2, 29.4, 25.8, 36.6, 30.3, 28.5, 32.0])
x = np.sum(july)/n
print(f"x = {x}")
january = np.array([30.5, 28.4, 40.2, 37.6, 36.5, 38.8, 34.7, 29.5, 29.7, 37.2, 41.5, 37.0])
y = np.sum(january)/m
print(f"y = {y}")
# Sample variances (ddof=1 divides by n - 1), computed from the data instead of hard-coded
Sx2 = np.var(july, ddof=1)
print(f"Sx2 = {Sx2:.4f}")
Sy2 = np.var(january, ddof=1)
print(f"Sy2 = {Sy2:.4f}")
# Pooled variance: weighted average of the sample variances
# (not the square of the averaged standard deviations)
Sp2 = ((n-1)*Sx2 + (m-1)*Sy2)/(n+m-2)
print(f"Sp2 = {Sp2:.4f}")
sig_lev = 0.05
print(f"Significance Level = {sig_lev}")
alpha2 = sig_lev/2  # this cell's sig_lev, not sig_level left over from Exercise A
print(f"Alpha/2 = {alpha2}")
dof = n+m-2
print(f"dof = {dof}")
t_crit = 2.074  # t(0.025, 22) from tables
print(f"t(22, alpha/2) = {t_crit}")
print("H0: muX = muY")
print("H1: muX != muY")
test_stat = (x-y)/((Sp2*((1/n)+(1/m)))**0.5)
print(f"T = {test_stat:.4f}")
print(f"Conclusion: since |T| = {abs(test_stat):.4f} > {t_crit}, we reject H0")
print("The data support the alternative hypothesis that mean fat intake differs between July and January (5% significance level)")
print('\n')

print("Exercise C - Question 2 - Part B")
sig_lev = 0.01
print(f"Significance Level = {sig_lev}")
alpha2 = sig_lev/2
print(f"Alpha/2 = {alpha2}")
print(f"dof = {dof}")
t_crit = 2.819  # t(0.005, 22) from tables
print(f"t(22, alpha/2) = {t_crit}")
print("H0: muX = muY")
print("H1: muX != muY")
print(f"T = {test_stat:.4f}")
print(f"Conclusion: since |T| = {abs(test_stat):.4f} < {t_crit}, we cannot reject H0")
print("At the 1% significance level the data do not show that mean fat intake differs between July and January")
print('\n')

"""
A consumer organization has compared the time it takes a generic pain reliever tablet to dissolve with 
the time it takes a name-brand tablet. Nine tablets of each were checked. The following data resulted:

Generic: 14.2, 14.7, 13.9, 15.3, 14.8, 13.6, 14.6, 14.9, 14.2

Name: 14.3, 14.9, 14.4, 13.8, 15.0, 15.1, 14.4, 14.7, 14.9

(a) Do the given data establish, at the 5 percent level of significance, that the name-brand tablet is quicker to dissolve?

(b) What about at the 10 percent level of significance?
"""

print("Exercise C - Question 3 - Part A")
n = 9
print(f"n = {n}")
m = 9
print(f"m = {m}")
generic = np.array([14.2, 14.7, 13.9, 15.3, 14.8, 13.6, 14.6, 14.9, 14.2])
x = np.sum(generic)/n
print(f"x = {x}")
name = np.array([14.3, 14.9, 14.4, 13.8, 15.0, 15.1, 14.4, 14.7, 14.9])
y = np.sum(name)/m
print(f"y = {y}")
# Sample variances and pooled variance, as in Question 2
Sx2 = np.var(generic, ddof=1)
print(f"Sx2 = {Sx2:.4f}")
Sy2 = np.var(name, ddof=1)
print(f"Sy2 = {Sy2:.4f}")
Sp2 = ((n-1)*Sx2 + (m-1)*Sy2)/(n+m-2)
print(f"Sp2 = {Sp2:.4f}")
sig_lev = 0.05
print(f"Significance Level = {sig_lev}")
dof = n+m-2
print(f"dof = {dof}")
t_crit = 1.746  # one-sided critical value t(0.05, 16)
print(f"t(16, alpha) = {t_crit}")
# "Name brand dissolves quicker" means mu_generic > mu_name; here X = generic, Y = name
print("H0: muX <= muY")
print("H1: muX > muY")
test_stat = (x-y)/((Sp2*((1/n)+(1/m)))**0.5)
print(f"T = {test_stat:.4f}")
print(f"Conclusion: since T = {test_stat:.4f} < {t_crit}, we cannot reject H0")
print("The data do not establish that the name-brand tablet dissolves quicker (its sample mean time is actually longer)")
print('\n')

print("Exercise C - Question 3 - Part B")
sig_lev = 0.10
print(f"Significance Level = {sig_lev}")
print(f"dof = {dof}")
t_crit = 1.337  # one-sided critical value t(0.10, 16)
print(f"t(16, alpha) = {t_crit}")
print("H0: muX <= muY")
print("H1: muX > muY")
print(f"T = {test_stat:.4f}")
print(f"Conclusion: since T = {test_stat:.4f} < {t_crit}, we cannot reject H0")
print("Even at the 10% level the data do not establish that the name-brand tablet dissolves quicker")
print('\n')



Exercise C - Question 2 - Part A
n = 12
m = 12
x = 30.825000000000003
y = 35.13333333333333
Sx2 = 18.5220
Sy2 = 20.3279
Sp2 = 19.4250
Significance Level = 0.05
Alpha/2 = 0.025
dof = 22
t(22, alpha/2) = 2.074
H0: muX = muY
H1: muX != muY
T = -2.3944
Conclusion: since |T| = 2.3944 > 2.074, we reject H0
The data support the alternative hypothesis that mean fat intake differs between July and January (5% significance level)


Exercise C - Question 2 - Part B
Significance Level = 0.01
Alpha/2 = 0.005
dof = 22
t(22, alpha/2) = 2.819
H0: muX = muY
H1: muX != muY
T = -2.3944
Conclusion: since |T| = 2.3944 < 2.819, we cannot reject H0
At the 1% significance level the data do not show that mean fat intake differs between July and January


Exercise C - Question 3 - Part A
n = 9
m = 9
x = 14.466666666666665
y = 14.61111111111111
sd

## Exercise D

Complete question 6 from the problems for (Ross, 2017, Sec. 10.5). The text is repeated below for convenience:

> 6. Consider Prob. 2 of Sec. 10.4. Suppose that the same women were used
for both months and that the data in each of the columns referred to the
same woman’s fat intake during the summer and winter.
>
>    (a) Test the hypothesis that there is no difference in fat intake during
summer and winter. Use the 5 percent level of significance.
>
>    (b) Repeat (a), this time using the 1 percent level.
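
Because the same women supply both observations, the analysis works on the within-woman differences $d_i = x_i - y_i$ and uses $T = \bar{d}\sqrt{n}/S_d$ with $n - 1 = 11$ degrees of freedom. A minimal sketch, cross-checked against `scipy.stats.ttest_rel` (SciPy's paired t test, two-sided by default):

```python
import numpy as np
from scipy import stats

july = np.array([32.2, 27.4, 28.6, 32.4, 40.5, 26.2, 29.4, 25.8, 36.6, 30.3, 28.5, 32.0])
january = np.array([30.5, 28.4, 40.2, 37.6, 36.5, 38.8, 34.7, 29.5, 29.7, 37.2, 41.5, 37.0])
d = july - january  # same woman in both months, so work with her difference
n = len(d)

t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))
result = stats.ttest_rel(july, january)  # paired t test, two-sided by default

for alpha in (0.05, 0.01):
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    print(f"alpha = {alpha}: |T| = {abs(t_stat):.3f}, t_crit = {t_crit:.3f}, "
          f"reject H0: {abs(t_stat) > t_crit}")
print(f"two-sided p-value = {result.pvalue:.4f}")
```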


*complete your answers in Markdown using the code-block below for computation*

In [5]:
## supporting code for Exercise D

print("Exercise D - Question 6 - Part A")
# Paired design: the same 12 women were observed in July and in January,
# so test whether the mean of the within-woman differences is zero
july = np.array([32.2, 27.4, 28.6, 32.4, 40.5, 26.2, 29.4, 25.8, 36.6, 30.3, 28.5, 32.0])
january = np.array([30.5, 28.4, 40.2, 37.6, 36.5, 38.8, 34.7, 29.5, 29.7, 37.2, 41.5, 37.0])
d = july - january
n = len(d)
print(f"n = {n}")
S_d = np.std(d, ddof=1)  # sample SD of the differences
print(f"S_d = {S_d:.2f}")
alpha = 0.05
print(f"alpha = {alpha}")
t_crit = 2.201  # t(0.025, 11) from tables
print(f"t11,alpha/2 = {t_crit}")
print("H0: mu_d = 0")
print("H1: mu_d != 0")
test_stat = np.mean(d)/(S_d/np.sqrt(n))
print(f"test_stat = {test_stat:.2f}")
print(f"Conclusion: Since |T| = {abs(test_stat):.2f} > {t_crit}, reject H0.")
print("The data support the alternative hypothesis that mean fat intake differs between July and January, at the 5% significance level")
print('\n')

print("Exercise D - Question 6 - Part B")
alpha = 0.01
print(f"alpha = {alpha}")
t_crit = 3.106  # t(0.005, 11) from tables
print(f"t11,alpha/2 = {t_crit}")
print("H0: mu_d = 0")
print("H1: mu_d != 0")
print(f"test_stat = {test_stat:.2f}")
print(f"Conclusion: Since |T| = {abs(test_stat):.2f} < {t_crit}, do not reject H0.")
print("At the 1% significance level the data do not show a difference in mean fat intake between July and January")



Exercise D - Question 6 - Part A
n = 12
S_d = 6.39
alpha = 0.05
t11,alpha/2 = 2.201
H0: mu_d = 0
H1: mu_d != 0
test_stat = -2.34
Conclusion: Since |T| = 2.34 > 2.201, reject H0.
The data support the alternative hypothesis that mean fat intake differs between July and January, at the 5% significance level


Exercise D - Question 6 - Part B
alpha = 0.01
t11,alpha/2 = 3.106
H0: mu_d = 0
H1: mu_d != 0
test_stat = -2.34
Conclusion: Since |T| = 2.34 < 3.106, do not reject H0.
At the 1% significance level the data do not show a difference in mean fat intake between July and January


## Exercise E

Complete questions 1, 4 and 11 from problems for (Ross, 2017, Sec. 10.6). The text is repeated below for convenience:

> 1. Two methods have been proposed for producing transistors. If method
1 resulted in 20 unacceptable transistors out of a total of 100 produced
and method 2 resulted in 12 unacceptable transistors out of a total of 100
produced, can we conclude that the proportions of unacceptable transistors
that will be produced by the two methods are different?
>
>    (a) Use the 5 percent level of significance.
>
>    (b) What about at the 10 percent level of significance?
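
A sketch of the pooled two-proportion Z test for Question 1. The helper `two_proportion_z` is a hypothetical name; it uses the pooled estimate $\hat{p} = (X_1 + X_2)/(n_1 + n_2)$ in the standard error, as the test of $H_0: p_1 = p_2$ requires.

```python
import math
from scipy import stats


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z statistic for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_err


z = two_proportion_z(20, 100, 12, 100)
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value

for alpha in (0.05, 0.10):
    z_crit = stats.norm.ppf(1 - alpha / 2)
    print(f"alpha = {alpha}: |Z| = {abs(z):.2f}, z_crit = {z_crit:.3f}, "
          f"reject H0: {abs(z) > z_crit}")
print(f"two-sided p-value = {p_value:.3f}")
```

Since the p-value exceeds 0.10, $H_0$ is retained at both significance levels.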

> 4. A large swine flu vaccination program was instituted in 1976. Approximately
50 million of the roughly 220 million North Americans received
the vaccine. Of the 383 persons who subsequently contracted swine flu,
202 had received the vaccine.
>
>    (a) Test the hypothesis, at the 5 percent level, that the probability of
contracting swine flu is the same for the vaccinated portion of the
population as for the unvaccinated.
>
>    (b) Do the results of part (a) indicate that the vaccine itself was causing
the flu? Can you think of any other possible explanations?
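
A sketch for Question 4, treating the roughly 50 million vaccinated and 170 million unvaccinated people as the two samples (an assumption carried over from the worked answer). Part (a) asks whether the probabilities are the same, so the test is two-sided; the ratio of case rates is printed as well because it is useful when discussing part (b).

```python
import math
from scipy import stats

n1, n2 = 50_000_000, 220_000_000 - 50_000_000  # vaccinated, unvaccinated
x1, x2 = 202, 383 - 202                         # swine flu cases in each group

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
z = (p1_hat - p2_hat) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Two-sided test of H0: p1 = p2
p_value = 2 * stats.norm.sf(abs(z))
print(f"Z = {z:.2f}, two-sided p-value = {p_value:.1e}")
print(f"case rate ratio (vaccinated / unvaccinated) = {p1_hat / p2_hat:.2f}")
```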


> 11. A birthing class run by the University of California has recently added
a lecture on the importance of the use of automobile car seats for children.
This decision was made after a study of the results of an experiment
in which the lecture was given in some of the birthing classes and not in
others. A follow-up interview, carried out 1 year later, questioned 82
couples who had heard the lecture and 120 who had not. A total of 78 of the
couples who had heard the lecture stated that they always used an infant
car seat, whereas a total of 90 of those couples not attending the lecture
made the same claim.
>
>    (a) Assuming the accuracy of the given information, is the difference
significant enough to conclude that instituting the lecture will result
in increased use of car seats? Use the 5 percent level of significance.
>
>    (b) What is the p value?
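
A sketch for Question 11. The claim that the lecture increases car-seat use is one-sided ($H_1: p_1 > p_2$), so the p-value is the upper tail $P(Z \ge z)$ alone, not $2P(Z \ge |z|)$.

```python
import math
from scipy import stats

n1, x1 = 82, 78    # couples who heard the lecture
n2, x2 = 120, 90   # couples who did not

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
z = (p1_hat - p2_hat) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# One-sided alternative p1 > p2: only the upper tail counts
p_value = stats.norm.sf(z)
print(f"Z = {z:.2f}, one-sided p-value = {p_value:.5f}")
```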

*complete your answers in Markdown using the code-block below for computation*

In [6]:
## supporting code for Exercise E

"""
Two methods have been proposed for producing transistors. 
If method 1 resulted in 20 unacceptable transistors out of a total of 100 produced 
and method 2 resulted in 12 unacceptable transistors out of a total of 100 produced, 
can we conclude that the proportions of unacceptable transistors that will be produced by the two methods are different?

(a) Use the 5 percent level of significance.

(b) What about at the 10 percent level of significance?
"""

print("Exercise E - Question 1 - Part A")
print("H0: p1 = p2")
print("H1: p1 != p2")
sl = 0.05
print(f"Sig Lev = {sl}")
z_alpha2 = 1.96  # z(alpha/2) = z(0.025), the two-sided critical value
print(f"z(alpha/2) = {z_alpha2}")
n1 = 100
print(f"n1 = {n1}")
X1 = 20
print(f"X1 = {X1}")
phat1 = X1/n1
print(f"phat1 = {phat1}")
n2 = 100
print(f"n2 = {n2}")
X2 = 12
print(f"X2 = {X2}")
phat2 = X2/n2
print(f"phat2 = {phat2}")
pooled_est = (X1+X2)/(n1+n2)
print(f"pooled_est = {pooled_est}")
test_stat = (phat1-phat2)/((((1/n1)+(1/n2))*(pooled_est)*(1-pooled_est))**0.5)
print(f"test_stat = {test_stat} = {round(test_stat,2)}")
print(f"|Z| = {round(abs(test_stat),2)} < 1.96")
print("Therefore, do not reject H0.")
print("The data are consistent with the null hypothesis that both methods produce the same proportion of unacceptable transistors (5% significance level)")
print('\n')

print("Exercise E - Question 1 - Part B")
sl = 0.10
print(f"Sig Lev = {sl}")
z_alpha2 = 1.645  # z(0.05), the two-sided critical value at the 10% level
print(f"z(alpha/2) = {z_alpha2}")
print(f"|Z| = {round(abs(test_stat),2)} < {z_alpha2}")
print("Therefore, do not reject H0.")
print("The data are consistent with the null hypothesis that both methods produce the same proportion of unacceptable transistors (10% significance level)")
print('\n')

Exercise E - Question 1 - Part A
H0: p1 = p2
H1: p1 != p2
Sig Lev = 0.05
z(alpha/2) = 1.96
n1 = 100
X1 = 20
phat1 = 0.2
n2 = 100
X2 = 12
phat2 = 0.12
pooled_est = 0.16
test_stat = 1.5430334996209194 = 1.54
|Z| = 1.54 < 1.96
Therefore, do not reject H0.
The data are consistent with the null hypothesis that both methods produce the same proportion of unacceptable transistors (5% significance level)


Exercise E - Question 1 - Part B
Sig Lev = 0.1
z(alpha/2) = 1.645
|Z| = 1.54 < 1.645
Therefore, do not reject H0.
The data are consistent with the null hypothesis that both methods produce the same proportion of unacceptable transistors (10% significance level)




In [7]:
"""
A large swine flu vaccination program was instituted in 1976. 
Approximately 50 million of the roughly 220 million North Americans received the vaccine. 
Of the 383 persons who subsequently contracted swine flu, 202 had received the vaccine.

(a) Test the hypothesis, at the 5 percent level, that the probability of contracting swine flu 
is the same for the vaccinated portion of the population as for the unvaccinated.

(b) Do the results of part (a) indicate that the vaccine itself was causing the flu? 
Can you think of any other possible explanations?
"""

print("Exercise E - Question 4 - Part A")
n1 = 50000000
print(f"n1 = {n1}")
n2 = 220000000-n1
print(f"n2 = {n2}")
x1 = 202
print(f"x1 = {x1}")
x2 = 383-x1
print(f"x2 = {x2}")
ph1 = x1/n1
print(f"ph1 = {ph1}")
ph2 = x2/n2
print(f"ph2 = {ph2}")
pooled_est = (x1+x2)/(n1+n2)
print(f"pooled_est = {pooled_est}")
sig_level = 0.05
print(f"sig level = {sig_level}")
z = 1.96  # two-sided critical value z(0.025): part (a) asks whether the probabilities are the same
print(f"z value = {z}")
print("H0: p1 = p2")
print("H1: p1 != p2")
test_stat = (ph1-ph2)/((((1/n1)+(1/n2))*(pooled_est)*(1-pooled_est))**0.5)
print(f"test_stat = {test_stat} = {round(test_stat,2)}")
print(f"Test statistic |Z| = {round(abs(test_stat),2)}")
print(f"Conclusion: Since |Z| = {round(abs(test_stat),2)} > {z}, reject H0.")
print("That is, the data indicate that the probability of contracting swine flu differs between the vaccinated and unvaccinated portions of the population (it was higher among the vaccinated) at the 5% level of significance.")
print('\n')

print("Exercise E - Question 4 - Part B")
print("No. Association does not imply causation. For example, at-risk people may be more likely to have been vaccinated than other people.")


Exercise E - Question 4 - Part A
n1 = 50000000
n2 = 170000000
x1 = 202
x2 = 181
ph1 = 4.04e-06
ph2 = 1.0647058823529412e-06
pooled_est = 1.7409090909090909e-06
sig level = 0.05
z value = 1.96
H0: p1 = p2
H1: p1 != p2
test_stat = 14.016525050536215 = 14.02
Test statistic |Z| = 14.02
Conclusion: Since |Z| = 14.02 > 1.96, reject H0.
That is, the data indicate that the probability of contracting swine flu differs between the vaccinated and unvaccinated portions of the population (it was higher among the vaccinated) at the 5% level of significance.


Exercise E - Question 4 - Part B
No. Association does not imply causation. For example, at-risk people may be more likely to have been vaccinated than other people.


In [10]:
"""
A birthing class run by the University of California has recently added a lecture on the importance of the 
use of automobile car seats for children.
This decision was made after a study of the results of an experiment in which the lecture was given in some 
of the birthing classes and not in others. 
A follow-up interview, carried out 1 year later, questioned 82 couples who had heard the lecture and 120 who had not. 
A total of 78 of the couples who had heard the lecture stated that they always used an infant car seat, whereas a 
total of 90 of those couples not attending the lecture made the same claim.

(a) Assuming the accuracy of the given information, is the difference significant enough to conclude 
that instituting the lecture will result in increased use of car seats? Use the 5 percent level of significance.

(b) What is the p value?
"""

print("Exercise E - Question 11 - Part A")
n1 = 82
print(f"n1 = {n1}")
n2 = 120
print(f"n2 = {n2}")
x1 = 78
print(f"x1 = {x1}")
x2 = 90
print(f"x2 = {x2}")
ph1 = x1/n1
print(f"ph1 = {ph1}")
ph2 = x2/n2
print(f"ph2 = {ph2}")
pooled_est = (x1+x2)/(n1+n2)
print(f"pooled_est = {pooled_est}")
sig_level = 0.05  # defined here so the cell does not rely on a value left over from Question 4
print(f"sig level = {sig_level}")
z = 1.645
print(f"z value = {z}")
print("H0: p1 = p2")
print("H1: p1 > p2")
test_stat = (ph1-ph2)/((((1/n1)+(1/n2))*(pooled_est)*(1-pooled_est))**0.5)
print(f"test_stat = {test_stat} = {round(test_stat,2)}")
print(f"Test statistic Z = {round(test_stat,2)}")
print(f"Conclusion: Since Z = {round(test_stat,2)} > {z}, reject H0.")
print("That is, the data support the alternative hypothesis that instituting the lecture increases the use of car seats, at the 5% level of significance.")
print('\n')

print("Exercise E - Question 11 - Part B")
print(f"p value = P(Z ≥ {round(test_stat,2)}) ≈ 0.0001 (one-sided test, so only the upper tail counts)")


Exercise E - Question 11 - Part A
n1 = 82
n2 = 120
x1 = 78
x2 = 90
ph1 = 0.9512195121951219
ph2 = 0.75
pooled_est = 0.8316831683168316
sig level = 0.05
z value = 1.645
H0: p1 = p2
H1: p1 > p2
test_stat = 3.7536106823077553 = 3.75
Test statistic Z = 3.75
Conclusion: Since Z = 3.75 > 1.645, reject H0.
That is, the data support the alternative hypothesis that instituting the lecture increases the use of car seats, at the 5% level of significance.


Exercise E - Question 11 - Part B
p value = P(Z ≥ 3.75) ≈ 0.0001 (one-sided test, so only the upper tail counts)
