### STATISTICAL HYPOTHESES

Null Hypothesis

$H_{0}$ : $\mu_{1} - \mu_{2} \leq 0$

Alternative Hypothesis (Research Hypothesis)

$H_{1}$ : $\mu_{1} - \mu_{2} > 0$

##### Two Other Possible Alternative Hypotheses

1. Another directional hypothesis, expressed as

$H_{1}$ : $\mu_{1} - \mu_{2} < 0$ translates into a one-tailed test with the lower tail critical.

2. A nondirectional hypothesis, expressed as

$H_{1}$ : $\mu_{1} - \mu_{2} \neq 0$ translates into a two-tailed test.
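The three alternative hypotheses above correspond to three different rejection regions. The sketch below is one way to express that mapping; the function name `rejects_null` and the labels `"greater"`, `"less"`, and `"two-sided"` are illustrative choices, not notation from the text. The critical values 1.812 and 2.228 are the standard table entries for df = 10 at the .05 level (one-tailed and two-tailed, respectively), and the observed t of 2.15 is only an example.

```python
def rejects_null(t, critical_t, alternative):
    """Return True when the observed t falls in the rejection region.

    critical_t is the positive table value; alternative is one of
    "greater"   (H1: mu1 - mu2 > 0, upper tail critical),
    "less"      (H1: mu1 - mu2 < 0, lower tail critical),
    "two-sided" (H1: mu1 - mu2 != 0, both tails critical).
    """
    if alternative == "greater":
        return t >= critical_t
    if alternative == "less":
        return t <= -critical_t
    if alternative == "two-sided":
        return abs(t) >= critical_t
    raise ValueError(f"unknown alternative: {alternative!r}")


# Illustrative values only: df = 10, alpha = .05
print(rejects_null(2.15, 1.812, "greater"))     # upper tail critical
print(rejects_null(-2.15, 1.812, "greater"))    # wrong direction for this H1
print(rejects_null(-2.15, 1.812, "less"))       # lower tail critical
print(rejects_null(2.15, 2.228, "two-sided"))   # two-tailed cutoff is larger
```

The same observed t can lead to different decisions depending on the alternative hypothesis, which is why the direction has to be fixed before the data are examined.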

##### Progress Check *14.1 Identifying the treatment group with μ1, specify both the null and alternative hypotheses for each of the following studies. Select a directional alternative hypothesis only when a word or phrase justifies an exclusive concern about population mean differences in a particular direction.

(a) After randomly assigning migrant children to two groups, a school psychologist determines whether there is a difference in the mean reading scores between groups exposed to either a special bilingual or a traditional reading program.

Answer:

$H_{0}$ : $\mu_{1} - \mu_{2} = 0$

$H_{1}$ : $\mu_{1} - \mu_{2} \neq 0$

(b) On further reflection, the school psychologist decides that, because of the extra expense of the special bilingual program, the null hypothesis should be rejected only if there is evidence that reading scores are improved, on average, for the group exposed to the special
bilingual program.

Answer:

$H_{0}$ : $\mu_{1} - \mu_{2} \leq 0$

$H_{1}$ : $\mu_{1} - \mu_{2} > 0$

(c) An investigator wishes to determine whether, on average, cigarette consumption is reduced for smokers who chew caffeine gum. Smokers in attendance at an antismoking workshop are randomly assigned to two groups—one that chews caffeine gum and one that does not—and their daily cigarette consumption is monitored for six months after the workshop.

Answer:

$H_{0}$ : $\mu_{1} - \mu_{2} \geq 0$

$H_{1}$ : $\mu_{1} - \mu_{2} < 0$

(d) A political scientist determines whether males and females differ, on average, about the amount of money that, in their opinion, should be spent by the U.S. government on homeland security. After being informed about the size of the current budget for homeland security, in billions of dollars, randomly selected males and females are asked to indicate the percent by which they would alter this amount—for example, –8 percent for an 8 percent reduction, 0 percent for no change, 4 percent for a 4 percent increase.

Answer:

$H_{0}$ : $\mu_{1} - \mu_{2} = 0$

$H_{1}$ : $\mu_{1} - \mu_{2} \neq 0$

### Mean of the Sampling Distribution, $\mu_{\overline{X}_1-\overline{X}_2}$

The mean of the sampling distribution of $\overline{X}$ equals the population mean, that is,

$\mu_{\overline{X}} = \mu$

Similarly, the mean of the new sampling distribution of $\overline{X}_1-\overline{X}_2$ equals the difference between population means, that is,

$\mu_{\overline{X}_1-\overline{X}_2} = \mu_{1} - \mu_{2}$

### Standard Error of the Sampling Distribution (Standard Error of the Difference Between Means), $\sigma_{\overline{X}_1-\overline{X}_2}$

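For a single sample, the standard error of the mean equals

$\sigma_{\overline{X}} = \dfrac{\sigma}{\sqrt{n}} = \sqrt{\dfrac{\sigma^2}{n}}$

Similarly, the standard deviation of the new sampling distribution of $\overline{X}_1 - \overline{X}_2$ (the standard error of the difference between means) equals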

$\sigma_{\overline{X}_1-\overline{X}_2} = \sqrt{\dfrac{\sigma^2_1}{n_1} + \dfrac{\sigma^2_2}{n_2}}$
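A small simulation can make these two formulas concrete. The sketch below assumes two normally distributed populations with made-up parameters ($\mu_1 = 11$, $\sigma_1 = 4$, $\mu_2 = 6$, $\sigma_2 = 3$) and sample sizes of 6; none of these values come from the text. It draws many pairs of samples and compares the mean and standard deviation of the simulated differences with the theoretical values.

```python
import math
import random
import statistics

random.seed(14)  # fixed seed so the run is repeatable

# Hypothetical population parameters (assumptions for illustration)
mu1, sigma1, n1 = 11.0, 4.0, 6
mu2, sigma2, n2 = 6.0, 3.0, 6


def mean_difference():
    """Draw one sample from each population and return X1bar - X2bar."""
    sample1 = [random.gauss(mu1, sigma1) for _ in range(n1)]
    sample2 = [random.gauss(mu2, sigma2) for _ in range(n2)]
    return statistics.fmean(sample1) - statistics.fmean(sample2)


differences = [mean_difference() for _ in range(20000)]

simulated_mean = statistics.fmean(differences)
simulated_sd = statistics.pstdev(differences)

theoretical_mean = mu1 - mu2
theoretical_se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

print(f'simulated mean = {simulated_mean:.3f}; theoretical mean = {theoretical_mean:.3f}')
print(f'simulated sd   = {simulated_sd:.3f}; theoretical se   = {theoretical_se:.3f}')
```

With enough repetitions the simulated values should settle close to $\mu_1 - \mu_2$ and $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, although any single run will differ slightly because of sampling variability.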

### t TEST

t Ratio

$t = \dfrac{(\text{difference between sample means})-(\text{hypothesized difference between population means})}{\text{estimated standard error}}$

Expressed in symbols,

### t RATIO FOR TWO POPULATION MEANS (TWO INDEPENDENT SAMPLES)

$t = \dfrac{(\overline{X}_1 - \overline{X}_2)-(\mu_1 - \mu_2)_{hyp}}{S_{\overline{X}_1-\overline{X}_2}}$

### ESTIMATED STANDARD ERROR OF THE MEAN
$S_{\overline{X}} = \dfrac{S}{\sqrt{n}}$

### Sample Sums of Squares
$SS = \Sigma(X-\overline{X})^2 = \Sigma X^2 - \dfrac{(\Sigma X)^2}{n}$
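The definition formula and the computation formula for the sum of squares give the same result. As a quick check, the sketch below applies both to the six treatment-group scores used in the EPO experiment (Table 14.1); the variable names are illustrative.

```python
import statistics

X = [12, 5, 11, 11, 9, 18]   # treatment-group scores from the EPO experiment
n = len(X)
mean_X = statistics.mean(X)

# Definition formula: sum of squared deviations from the sample mean
ss_definition = sum((x - mean_X) ** 2 for x in X)

# Computation formula: sum of squared scores minus (sum of scores)^2 / n
ss_computation = sum(x ** 2 for x in X) - sum(X) ** 2 / n

print(f'ss_definition = {ss_definition}; ss_computation = {ss_computation}')
```

Both expressions evaluate to 90 for these scores. The computation formula avoids calculating each deviation separately, which is convenient when working by hand.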

### THE POOLED VARIANCE

$S_P^2 = \dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}$
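Because each sample variance equals $SS/(n-1)$, the pooled variance can also be read as an average of the two sample variances, each weighted by its degrees of freedom. The sketch below checks this reading with the scores from the EPO experiment (Table 14.1); it relies on `statistics.variance`, which divides by $n - 1$.

```python
import statistics

X1 = [12, 5, 11, 11, 9, 18]
X2 = [7, 3, 4, 6, 3, 13]
n1, n2 = len(X1), len(X2)

# Sample variances, each computed as SS / (n - 1)
var1 = statistics.variance(X1)
var2 = statistics.variance(X2)

# Pooled variance as a weighted average, with weights equal to degrees of freedom
pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

print(f'var1 = {var1}; var2 = {var2}; pooled_variance = {pooled_variance}')
```

When the two sample sizes are equal, as here, the weights are equal and the pooled variance is simply the midpoint of the two sample variances.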

### STANDARD ERROR

$S_{\overline{X}_1-\overline{X}_2} = \sqrt{\dfrac{S^2_P}{n_1} + \dfrac{S^2_P}{n_2}}$

##### Progress Check *14.2 Using Table B in Appendix C, find the critical t values for each of the following hypothesis tests:
(a) two-tailed test; α = .05; n1 = 12; n2 = 11; df = 12+11-2 = 21; t_value = $\pm$2.080

(b) one-tailed test, upper tail critical; α = .05; n1 = 15; n2 = 13; df = 15+13-2 = 26; t_value = +1.706

(c) one-tailed test, lower tail critical; α = .01; n1 = n2 = 25; df = 25+25-2 = 48; t_value = -2.423 (the tabled value for df = 40, the next smaller df to use when the table has no row for df = 48)

(d) two-tailed test; α = .01; n1 = 8; n2 = 10; df=8+10-2=16; t_value = $\pm$2.921
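The table lookups above can be cross-checked numerically. The sketch below is one possible approach using only the standard library: it integrates the density of the t distribution with Simpson's rule and then searches for the cutoff by bisection. The helper names (`t_pdf`, `upper_tail_area`, `critical_t`) are hypothetical, and the results are approximations rather than replacements for Table B.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_area(t, df, steps=1000):
    """P(T > t) for t >= 0, via Simpson's rule on the density from 0 to t."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def critical_t(alpha, df, two_tailed=False):
    """Positive critical t; alpha is split across both tails if two_tailed."""
    tail = alpha / 2 if two_tailed else alpha
    low, high = 0.0, 50.0
    for _ in range(50):                      # bisection on the tail area
        mid = (low + high) / 2
        if upper_tail_area(mid, df) > tail:
            low = mid                        # tail still too large: move right
        else:
            high = mid
    return (low + high) / 2


print(f'(a) +/-{critical_t(0.05, 21, two_tailed=True):.3f}')
print(f'(b) +{critical_t(0.05, 26):.3f}')
print(f'(c) -{critical_t(0.01, 48):.3f}')
print(f'(d) +/-{critical_t(0.01, 16, two_tailed=True):.3f}')
```

For (a), (b), and (d) the computed values should agree with the table to three decimals. For (c), the value computed with df = 48 is slightly smaller in magnitude than 2.423, because 2.423 is the table entry for df = 40.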

### Table 14.1 CALCULATIONS FOR THE t TEST: TWO INDEPENDENT SAMPLES (EPO EXPERIMENT)

In [1]:
import statistics
import math

X1 = [12,5,11,11,9,18]
X2 = [7,3,4,6,3,13]

n1 = len(X1)
n2 = len(X2)
u_hyp = 0

mean_X1 = statistics.mean(X1)
mean_X2 = statistics.mean(X2)
print(f'mean_X1 = {mean_X1}; mean_X2 = {mean_X2}')

sum_of_X1_list = sum(X1)
sum_of_X2_list = sum(X2)
print(f'sum_of_X1_list = {sum_of_X1_list}; sum_of_X2_list = {sum_of_X2_list}')

square_of_each_X1 = [num1**2 for num1 in X1]
square_of_each_X2 = [num2**2 for num2 in X2]

sum_of_square_of_each_X1 = sum(square_of_each_X1)
sum_of_square_of_each_X2 = sum(square_of_each_X2)
print(f'sum_of_square_of_each_X1 = {sum_of_square_of_each_X1}; sum_of_square_of_each_X2 = {sum_of_square_of_each_X2}')

SS1 = sum_of_square_of_each_X1 - (sum_of_X1_list**2/n1)
SS2 = sum_of_square_of_each_X2 - (sum_of_X2_list**2/n2)  # divide by n2, the size of the second sample
print(f'SS1 = {SS1}; SS2 = {SS2}')

pooled_variance = (SS1+SS2)/(n1+n2-2)
print(f'pooled_variance = {pooled_variance}')

std_error = math.sqrt((pooled_variance/n1)+(pooled_variance/n2))
print(f'std_error = {std_error}')

t_ratio = ((mean_X1-mean_X2)-u_hyp)/std_error
print(f't_ratio = {t_ratio}')

mean_X1 = 11; mean_X2 = 6
sum_of_X1_list = 66; sum_of_X2_list = 36
sum_of_square_of_each_X1 = 816; sum_of_square_of_each_X2 = 288
SS1 = 90.0; SS2 = 72.0
pooled_variance = 16.2
std_error = 2.32379000772445
t_ratio = 2.151657414559676


#### Markdown for Table 14.1 CALCULATIONS FOR THE t TEST: TWO INDEPENDENT SAMPLES (EPO EXPERIMENT)

$\overline{X}_1 = 11; \overline{X}_2 = 6$

$\Sigma{X}_1 = 66; \Sigma{X}_2 = 36$

$\Sigma{X}_1^2 = 816; \Sigma{X}_2^2 = 288$

$SS_1 = 90; SS_2 = 72$
    
$S_P^2 = 16.2$

$S_{\overline{X}_1-\overline{X}_2} = 2.324$

$t = 2.152$

For a one-tailed test (upper tail critical) at the .05 level of significance with df = 6+6-2 = 10, the critical t is 1.812.

Decision: Reject $H_{0}$ at the .05 level of significance because t = 2.152 exceeds 1.812.

Interpretation: The difference between population means is greater than zero. There is evidence that EPO increases the mean endurance scores of treatment patients.

##### Progress Check *14.3 A psychologist investigates the effect of instructions on the time required to solve a puzzle. Each of 20 volunteers is given the same puzzle to be solved as rapidly as possible. Subjects are randomly assigned, in equal numbers, to receive two different sets of instructions prior to the task. One group is told that the task is difficult (X1), and the other group is told that the task is easy (X2). The score for each subject reflects the time in minutes required to solve the puzzle. Use a t to test the null hypothesis at the .05 level of significance.

In [2]:
import statistics
import math

X1 = [5,20,7,23,30,24,9,8,20,12]
X2 = [13,6,6,5,3,6,10,20,9,12]

n1 = len(X1)
n2 = len(X2)
u_hyp = 0

mean_X1 = statistics.mean(X1)
mean_X2 = statistics.mean(X2)
print(f'mean_X1 = {mean_X1}; mean_X2 = {mean_X2}')

sum_of_X1_list = sum(X1)
sum_of_X2_list = sum(X2)
print(f'sum_of_X1_list = {sum_of_X1_list}; sum_of_X2_list = {sum_of_X2_list}')

square_of_each_X1 = [num1**2 for num1 in X1]
square_of_each_X2 = [num2**2 for num2 in X2]

sum_of_square_of_each_X1 = sum(square_of_each_X1)
sum_of_square_of_each_X2 = sum(square_of_each_X2)
print(f'sum_of_square_of_each_X1 = {sum_of_square_of_each_X1}; sum_of_square_of_each_X2 = {sum_of_square_of_each_X2}')

SS1 = sum_of_square_of_each_X1 - (sum_of_X1_list**2/n1)
SS2 = sum_of_square_of_each_X2 - (sum_of_X2_list**2/n2)  # divide by n2, the size of the second sample
print(f'SS1 = {SS1}; SS2 = {SS2}')

pooled_variance = (SS1+SS2)/(n1+n2-2)
print(f'pooled_variance = {pooled_variance}')

std_error = math.sqrt((pooled_variance/n1)+(pooled_variance/n2))
print(f'std_error = {std_error}')

t_ratio = ((mean_X1-mean_X2)-u_hyp)/std_error
print(f't_ratio = {t_ratio}')

mean_X1 = 15.8; mean_X2 = 9
sum_of_X1_list = 158; sum_of_X2_list = 90
sum_of_square_of_each_X1 = 3168; sum_of_square_of_each_X2 = 1036
SS1 = 671.5999999999999; SS2 = 226.0
pooled_variance = 49.86666666666666
std_error = 3.158058475287203
t_ratio = 2.1532216876958206


#### Markdown for Progress Check 14.3
Statistical Hypothesis

$H_{0}$ : $\mu_{1} - \mu_{2} = 0$

$H_{1}$ : $\mu_{1} - \mu_{2} \neq 0$

$\overline{X}_1 = 15.8; \overline{X}_2 = 9$

$\Sigma{X}_1 = 158; \Sigma{X}_2 = 90$

$\Sigma{X}_1^2 = 3168; \Sigma{X}_2^2 = 1036$

$SS_1 = 671.6; SS_2 = 226$

$S_P^2 = 49.87$

$S_{\overline{X}_1-\overline{X}_2} = 3.158$

$t = 2.153$

For a two-tailed test at the .05 level of significance with df = 10+10-2 = 18, the critical t is $\pm$2.101.

Decision: Reject $H_{0}$ at the .05 level of significance because t = 2.153 exceeds 2.101.

Interpretation: Puzzle-solving times are longer, on average, for subjects who are told that the
puzzle is difficult than for those who are told that the puzzle is easy.

In [5]:
# Function to compute the t ratio for two independent samples
import math
import statistics

X1 = [5,20,7,23,30,24,9,8,20,12]
X2 = [13,6,6,5,3,6,10,20,9,12]

def t_test_two_samples(X1, X2, u_hyp=0):
    """Print each intermediate result and return the t ratio (pooled variance)."""
    n1 = len(X1)
    n2 = len(X2)

    # Sample means
    mean_X1 = statistics.mean(X1)
    mean_X2 = statistics.mean(X2)
    print(f'mean_X1 = {mean_X1}; mean_X2 = {mean_X2}')

    # Sums of scores
    sum_of_X1_list = sum(X1)
    sum_of_X2_list = sum(X2)
    print(f'sum_of_X1_list = {sum_of_X1_list}; sum_of_X2_list = {sum_of_X2_list}')

    # Sums of squared scores
    sum_of_square_of_each_X1 = sum(num1**2 for num1 in X1)
    sum_of_square_of_each_X2 = sum(num2**2 for num2 in X2)
    print(f'sum_of_square_of_each_X1 = {sum_of_square_of_each_X1}; sum_of_square_of_each_X2 = {sum_of_square_of_each_X2}')

    # Sums of squares (computation formula); each sample uses its own n
    SS1 = sum_of_square_of_each_X1 - (sum_of_X1_list**2/n1)
    SS2 = sum_of_square_of_each_X2 - (sum_of_X2_list**2/n2)
    print(f'SS1 = {SS1}; SS2 = {SS2}')

    pooled_variance = (SS1+SS2)/(n1+n2-2)
    print(f'pooled_variance = {pooled_variance}')

    std_error = math.sqrt((pooled_variance/n1)+(pooled_variance/n2))
    print(f'std_error = {std_error}')

    t_ratio = ((mean_X1-mean_X2)-u_hyp)/std_error
    print(f't_ratio = {t_ratio}')
    return t_ratio

t_ratio = t_test_two_samples(X1, X2)

mean_X1 = 15.8; mean_X2 = 9
sum_of_X1_list = 158; sum_of_X2_list = 90
sum_of_square_of_each_X1 = 3168; sum_of_square_of_each_X2 = 1036
SS1 = 671.5999999999999; SS2 = 226.0
pooled_variance = 49.86666666666666
std_error = 3.158058475287203
t_ratio = 2.1532216876958206


#### Progress Check *14.4 Find the approximate p-value for each of the following test results:
(a) one-tailed test, upper tail critical; df = 12; t = 4.61 Answer: p < 0.001

(b) one-tailed test, lower tail critical; df = 19; t = –2.41 Answer: p < 0.05

(c) two-tailed test; df = 15; t = 3.76 Answer: p < 0.01

(d) two-tailed test; df = 42; t = 1.305 Answer: p > 0.05

(e) one-tailed test, upper tail critical; df = 11; t = –4.23 (Be careful!) Answer: p > 0.05
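The approximate p-values above come from bracketing t between table entries. For comparison, the sketch below computes the tail areas numerically with the standard library, integrating the density of the t distribution with Simpson's rule. The helper names (`t_pdf`, `upper_tail`, `p_value`) and the tail labels are hypothetical, and the results are numerical approximations.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > t), using symmetry for negative t and Simpson's rule otherwise."""
    if t < 0:
        return 1 - upper_tail(-t, df, steps)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def p_value(t, df, tail):
    """p-value for tail = "upper", "lower", or "two" (two-tailed)."""
    if tail == "upper":
        return upper_tail(t, df)
    if tail == "lower":
        return upper_tail(-t, df)     # P(T < t) equals P(T > -t)
    if tail == "two":
        return 2 * upper_tail(abs(t), df)
    raise ValueError(f"unknown tail: {tail!r}")


cases = {
    "a": (4.61, 12, "upper"),
    "b": (-2.41, 19, "lower"),
    "c": (3.76, 15, "two"),
    "d": (1.305, 42, "two"),
    "e": (-4.23, 11, "upper"),
}
for label, (t, df, tail) in cases.items():
    print(f'({label}) p = {p_value(t, df, tail):.4f}')
```

Case (e) shows why direction matters: the observed t is large in magnitude but falls in the tail opposite the critical one, so the upper-tail area is close to 1 rather than close to 0.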

#### Progress Check *14.5 Indicate which member of each of the following pairs of p-values describes the more rare test result:
(a1) p > .05 (a2) p < .05

(b1) p < .001 (b2) p < .01

(c1) p < .05 (c2) p < .01

(d1) p < .10 (d2) p < .20

(e1) p = .04 (e2) p = .02

Answer: a2, b1, c2, d1, e2