## Chapter 12 [Onlinestatbook.com](http://onlinestatbook.com): "Test of Means"
------  


#### Below are selected formulas and exercises from chapter 12 of the well-known onlinestatbook.com, a highly trusted resource for learning about statistics.  

#### The formulas and exercises were chosen based on difficulty and on whether using Python to understand the concept or answer the question was deemed useful.

#### Please note the below does not include the questions from the case studies.  A separate notebook for each case study can be found in this repository or is forthcoming. 

In [2]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

from scipy import stats

### Section 1: "Contents"

Many, if not most, experiments are designed to compare means. The experiment may involve only one sample mean that is to be compared to a specific value. Or the experiment could be testing differences among many different experimental conditions, and the experimenter could be interested in comparing each mean with each of the other means. This chapter covers methods of comparing means in many different experimental situations.

### Section 2: "Testing a Single Mean"

We wish to know the probability of obtaining a sample mean of 51 **or more** when the sampling distribution of the mean has a mean of 50 and a standard deviation (standard error) of 1.667. Note that the standard error was calculated for us. To compute this probability, we will make the assumption that the sampling distribution of the mean is normally distributed. Using the onlinestatbook normal calculator, we find that the probability is .274293. Here's the image:

![alt text][img1]

[img1]:http://onlinestatbook.com/2/tests_of_means/graphics/sig_mean1.gif

We can also obtain this value with python.

In [5]:
1-stats.norm(50, 1.667).cdf(51)

0.2742930981455225

The test conducted above was a one-tailed test because it computed the probability of a sample mean being one or more points higher than the hypothesized mean of 50 and the area computed was the area above 51.

To test the two-tailed hypothesis, you would compute the probability of a sample mean differing by one or more in either direction from the hypothesized mean of 50. You would do so by computing the probability of a mean being less than or equal to 49 or greater than or equal to 51. Here's what that looks like with the onlinestatbook normal calculator:

![alt text][img1]

[img1]:http://onlinestatbook.com/2/tests_of_means/graphics/sig_mean2.gif

Here's how to calculate that in python:


In [6]:
(1-stats.norm(50, 1.667).cdf(51))+stats.norm(50, 1.667).cdf(49)

0.54858619629104499

We can also leverage the survival function (sf), which is 1-cdf, to simplify somewhat:

In [16]:
stats.norm(50, 1.667).sf(51)+ stats.norm(50, 1.667).cdf(49)

0.5485861962910451
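
The one-tailed and two-tailed calculations above can be wrapped into one small helper. This is only a sketch: the function name `z_test_p` and its parameters are made up for illustration, and the one-tailed branch assumes the sample mean fell in the predicted direction.

```python
from scipy import stats

def z_test_p(sample_mean, mu0, std_err, two_tailed=True):
    """p value for a sample mean, assuming a normal sampling distribution."""
    # distance of the sample mean from the hypothesized mean, in standard errors
    z = abs(sample_mean - mu0) / std_err
    # area in one tail beyond |z|
    tail = stats.norm.sf(z)
    # a two-tailed test counts both tails, which are equal by symmetry
    return 2 * tail if two_tailed else tail

one_tailed = z_test_p(51, 50, 1.667, two_tailed=False)
two_tailed = z_test_p(51, 50, 1.667)
print(one_tailed, two_tailed)
```

Both values should agree with the cells above to within floating-point error.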

Typically σ is not known and is estimated in a sample by s, and σM is estimated by sM. Using an example from the ADHD study, we test the null hypothesis that the mean difference score in the population is 0, meaning the drug makes no difference.

1.  calculate the t statistic using a special case of this formula:

  ![alt text][img1]

[img1]:http://onlinestatbook.com/2/tests_of_means/graphics/t_general.gif

  ![alt text][img2]

[img2]:http://onlinestatbook.com/2/tests_of_means/graphics/t_mean.gif

where t is the value we compute for the significance test, M is the sample mean, μ is the hypothesized value of the population mean, and sM is the estimated standard error of the mean. 

2.  The mean (M) of the N = 24 difference scores is 4.958, the hypothesized value of μ is 0, and the standard deviation (s) is 7.538. Obtain sM by calculating:

![alt text][img3]

[img3]: http://onlinestatbook.com/2/tests_of_means/graphics/se1.gif

Therefore, t = 4.96/1.54 = 3.22.

3. Use t to calculate the shaded area.

![alt text][img4]

[img4]: http://onlinestatbook.com/2/tests_of_means/graphics/t_calc.gif


This is rather easy to do with python:

In [42]:
(1-stats.t.cdf(3.22, df = 23))*2

0.003792872940570291
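
Steps 1 through 3 can also be carried out from the summary statistics alone. The sketch below uses the values quoted above (N = 24, M = 4.958, s = 7.538); the variable names are chosen here for illustration. Because it keeps the unrounded t rather than 3.22, the p value differs slightly from the one above.

```python
from scipy import stats

n = 24       # number of difference scores
m = 4.958    # sample mean of the difference scores
s = 7.538    # sample standard deviation
mu0 = 0      # hypothesized population mean

s_m = s / n ** 0.5                      # estimated standard error of the mean
t_stat = (m - mu0) / s_m                # t = (M - mu) / sM
# two-tailed p value with N - 1 degrees of freedom
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(round(s_m, 2), round(t_stat, 2), round(p_two_tailed, 4))
```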

##### Question 1 out of 9.
You should do the test with Z rather than t when:

answer: You know the population standard deviation (σ). Sampling from a normal distribution is assumed by both tests, so it does not distinguish Z from t.

##### Question 2 out of 9.
Assume you know the standard deviation of test scores is 10 and that the distribution is normal. You sample 16 scores and find that the sample mean is 25. Find the p value for a two-tailed test of the hypothesis that the population mean is 20. 


In [64]:
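#need to calculate the standard error of the mean
std_err = 10/16**.5
#center the sampling distribution on the hypothesized mean (20)
#and double the upper-tail area beyond the sample mean (25)
stats.norm(20, std_err).sf(25)*2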

0.045500263896358389

##### Question 3 out of 9.
What is the standard deviation of these sample data?



Y
 -2
  1
  3
  2
 -1
  0
  4
  6


In [82]:
x = np.asarray(list(map(int,"-2 1 3 2 -1 0 4 6".split())))
x.std(ddof=1)

2.6692695630078278

##### Question 4 out of 9.
What is the estimated standard error of the mean based on these sample data? (These are the same data as the previous question.)

In [83]:
stats.sem(x)

0.94372930440884362

##### Question 5 out of 9.
What is the value of t testing the null hypothesis that the population mean is 0? (These are the same data as the previous question.)

In [88]:
t = x.mean()/stats.sem(x)
t

1.7218920641845572

##### Question 6 out of 9.
What is the two-tailed probability value testing the null hypothesis that the population mean is 0? (These are the same data as the previous question.)

In [89]:
(1-stats.t.cdf(t, df = 7))*2

0.1287617132182699
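
As a cross-check on Questions 5 and 6, `scipy.stats.ttest_1samp` returns the t statistic and the two-tailed p value in one call. A minimal sketch, with the array name `y` chosen here to match the column header:

```python
import numpy as np
from scipy import stats

y = np.array([-2, 1, 3, 2, -1, 0, 4, 6])
# one-sample t test of the null hypothesis that the population mean is 0
result = stats.ttest_1samp(y, popmean=0)
t_stat, p_value = result.statistic, result.pvalue
print(t_stat, p_value)
```

The two numbers should match the manual calculations above.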

##### Question 7 out of 9.
Using these data below, what is the t statistic for a single-sample t test (null hypothesis is that μ = 0)?

Y
  2.29
  2.20
  2.28
  0.65
 -0.18
 -1.18
  1.22
 -0.07

In [91]:
x = np.asarray(list(map(float,"2.29 2.20 2.28 0.65 -0.18 -1.18 1.22 -0.07".split())))
x.mean()/stats.sem(x)

1.9368538470747008

##### Question 8 out of 9.
Using the same data as the previous question, what is the t statistic for a single-sample t test (null hypothesis is that μ = .5)?

In [95]:
(x.mean()-.5)/stats.sem(x)

0.8623163452302065

##### Question 9 out of 9.
Using these data below, what is the two-tailed p value for a single-sample t test (null hypothesis is that μ = .5)?

Y
  0.05
  1.80
  0.23
  1.33
  0.17
  0.48
  0.47
  0.72

In [97]:
x = np.asarray(list(map(float,"0.05 1.80 0.23 1.33 0.17 0.48 0.47 0.72".split())))
t = (x.mean()-.5)/stats.sem(x)
(1-stats.t.cdf(t, df = len(x)-1))*2

0.4932915965578788

### Section 4: "Difference between Two Means (Independent Groups)"

It is much more common for a researcher to be interested in the difference between means than in the specific values of the means themselves. Here's a table of stats for two groups:

<div class="tableHolder300">
               <table>
                <tbody><tr>
                	<th>
                    	Group
                    </th>
                	<th>
                    	n
                    </th>
                	<th>
                    	Mean
                    </th>
                	<th>
                    	Variance
                    </th>
                </tr>
                <tr>
                	<td>
                    	Females
                    </td>
                	<td>
                    	17
                    </td>
                	<td>
                    	5.353
                    </td>
                	<td>
                    	2.743
                    </td>
                </tr>
                <tr>
                	<td>
                    	Males
                    </td>
                	<td>
                    	17
                    </td>
                	<td>
                    	3.882
                    </td>
                	<td>
                    	2.985
                    </td>
                </tr>
              </tbody></table>
              </div>

In order to test whether there is a difference between population means, we are going to make three assumptions:
1.  The two populations have the same variance. This assumption is called the assumption of homogeneity of variance.
2.  The populations are normally distributed.
3.  Each value is sampled independently from each other value. This assumption requires that each subject provide only one value. If a subject provides two scores, then the scores are not independent. The analysis of data with two scores per subject is discussed later (correlated t test)

**Note**: small-to-moderate violations of assumptions 1 and 2 do not make much difference. It is important not to violate assumption 3.


As seen in the previous section:

  ![alt text][img1]

[img1]:http://onlinestatbook.com/2/tests_of_means/graphics/t_general.gif


In this case, our statistic is the difference between sample means and our hypothesized value is 0. The hypothesized value is the null hypothesis that the difference between population means is 0.



In [98]:
diff = 5.353-3.882

The next step is to compute the estimate of the standard error of the statistic. In this case, the statistic is the difference between means, so the estimated standard error of the statistic is s<sub>M1-M2</sub>. Recall the formula for the standard error of the difference between means is:

  ![alt text][img1]

[img1]:http://onlinestatbook.com/2/sampling_distributions/graphics/equal_var.gif

We estimate σ² and use that estimate in place of σ². Since we are assuming the two population variances are the same, we estimate this variance by averaging our two sample variances. Thus, our estimate of variance is

  ![alt text][img2]

[img2]:http://onlinestatbook.com/2/estimation/graphics/MSE.gif



In [106]:
mse = (2.743 + 2.985)/2
mse

2.864

Therefore the standard error would be:

In [108]:
std_err = ((2*mse)/17)**.5 #17 is the number in each group
std_err

0.5804663439602578

and t is:

In [109]:
t = diff/std_err
t

2.534169319730126

Lastly, we compute the probability of getting a t as large or larger than 2.534 or as small or smaller than -2.534. To do this, we need to know the degrees of freedom. The degrees of freedom is the number of independent estimates of variance on which MSE is based. This is equal to (n1 - 1) + (n2 - 1) = 16 + 16 = 32.

In [110]:
df = (17-1) + (17-1) #(n1 - 1) + (n2 - 1)

We can see below from the onlinestatbook t distribution calculator that the probability value for a two-tailed test with this t and df = 32 is 0.0164.

  ![alt text][img1]

[img1]:http://onlinestatbook.com/2/tests_of_means/graphics/t_prob_2-tail.gif

We can also compute this fairly easily in python:

In [114]:
round((1-stats.t.cdf(t,df))*2, 4)

0.0164
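
The same test can be run directly from the summary table with `scipy.stats.ttest_ind_from_stats`, which takes means, standard deviations and group sizes. This is a sketch for cross-checking; the variable names are chosen here, and the standard deviations are the square roots of the tabled variances.

```python
from scipy import stats

n1 = n2 = 17
mean_f, var_f = 5.353, 2.743   # females
mean_m, var_m = 3.882, 2.985   # males

# pooled-variance (equal_var=True) independent-groups t test from summary statistics
result = stats.ttest_ind_from_stats(
    mean1=mean_f, std1=var_f ** 0.5, nobs1=n1,
    mean2=mean_m, std2=var_m ** 0.5, nobs2=n2,
    equal_var=True,
)
t_stat, p_value = result.statistic, result.pvalue
df = (n1 - 1) + (n2 - 1)       # degrees of freedom used by the pooled test
print(t_stat, p_value, df)
```

With equal group sizes, the pooled variance is the simple average of the two sample variances, so this reproduces the MSE-based calculation with df = 32.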

##### Computations for Unequal Sample Sizes (optional)

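When the sample sizes are unequal, MSE, the estimate of variance, should count the group with the larger sample size more than the group with the smaller sample size. This is done by computing the sum of squares error (SSE) across both groups and dividing it by the degrees of freedom.

For group_a = [3,4,5] and group_b = [2,4]: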

  * M1 = 4 and M2 = 3

  * SSE = (3-4)² + (4-4)² + (5-4)² + (2-3)² + (4-3)² = 4

Then, MSE is computed by: 

   * MSE = SSE/df

where the degrees of freedom (df) is computed as before: 

  * df = (n1 - 1) + (n2 - 1) = (3 - 1) + (2 - 1) = 3. 
  * MSE = SSE/df = 4/3 = 1.333.
  
the formula for the standard error is replaced by:

  ![alt text][img1]

[img1]:http://onlinestatbook.com/2/estimation/graphics/sed_uneq.gif

where nh is the harmonic mean and is calculated as follows:

  ![alt text][img2]

[img2]:http://onlinestatbook.com/2/estimation/graphics/nh.gif   


Therefore nh = 2/(1/3 + 1/2) = 2.4, the standard error is √(2 × 1.333/2.4) = 1.054, and:


  * t = (4-3)/1.054 = 0.949

  * and the two-tailed p = 0.413.
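
The unequal-sample-size steps can be sketched in a few lines. The variable names are chosen here for illustration; the final `ttest_ind` call is only a cross-check, since the pooled-variance t test is algebraically the same calculation.

```python
import numpy as np
from scipy import stats

group_a = np.array([3, 4, 5])
group_b = np.array([2, 4])
n1, n2 = len(group_a), len(group_b)

# sum of squared deviations of each score from its own group mean
sse = ((group_a - group_a.mean()) ** 2).sum() + ((group_b - group_b.mean()) ** 2).sum()
df = (n1 - 1) + (n2 - 1)
mse = sse / df
n_h = 2 / (1 / n1 + 1 / n2)             # harmonic mean of the sample sizes
std_err = (2 * mse / n_h) ** 0.5        # standard error of the difference
t_stat = (group_a.mean() - group_b.mean()) / std_err
p_value = 2 * stats.t.sf(abs(t_stat), df)

# cross-check against SciPy's pooled-variance t test
check = stats.ttest_ind(group_a, group_b, equal_var=True)
print(n_h, std_err, t_stat, p_value)
```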
  
  
##### Question 8 out of 9.
If there are 4 scores per group and the t value is 2.34, what is the p value for a two-tailed test (to 3 decimal places)? 



In [116]:
(1-stats.t.cdf(2.34,6))*2

0.057843940267015004

##### Question 9 out of 9.
What is the t for an independent-groups t test for these data? 



  G 1	  G 2
 54	 44
 39	 47
 40	 45
 45	 30
 56	 38
 51	 17
 67	 37
 48	 53
 
 Note: the two columns are flattened into a single string below, so the values alternate between G 1 and G 2.

In [138]:
x = list(map(int,"54 44 39 47 40 45 45 30 56 38 51 17 67 37 48 53".split()))
x

[54, 44, 39, 47, 40, 45, 45, 30, 56, 38, 51, 17, 67, 37, 48, 53]

In [139]:
#values alternate between the two groups, so take every other element
g_a = x[0::2] #positions 0, 2, 4, ...: G 1
g_b = x[1::2] #positions 1, 3, 5, ...: G 2
print(g_a)
print(g_b)

[54, 39, 40, 45, 56, 51, 67, 48]
[44, 47, 45, 30, 38, 17, 37, 53]


In [153]:
diff = np.mean(g_a)-np.mean(g_b)
mse = (np.var(g_a, ddof=1) + np.var(g_b, ddof=1))/2
std_err = ((2*mse)/len(g_a))**.5
t = diff/std_err
t

2.1619306640686724
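
The manual result can be checked against `scipy.stats.ttest_ind`, which performs the pooled-variance independent-groups t test when `equal_var=True`. A minimal sketch, with the list names `g_1` and `g_2` chosen here to mirror the column headers:

```python
from scipy import stats

g_1 = [54, 39, 40, 45, 56, 51, 67, 48]
g_2 = [44, 47, 45, 30, 38, 17, 37, 53]

# pooled-variance independent-groups t test
result = stats.ttest_ind(g_1, g_2, equal_var=True)
t_stat, p_value = result.statistic, result.pvalue
print(t_stat, p_value)
```

The t statistic should agree with the manual value; the call also returns the two-tailed p value for df = 14.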

### Section 6: "All Pairwise Comparisons Among Means"

Many experiments are designed to compare more than two conditions. An obvious way to proceed would be to do a t test of the difference between each group mean and each of the other group means. With four groups, for example, this leads to six pairwise comparisons. The problem with this approach is that if you did this analysis, you would have six chances to make a Type I error. The more means that are compared, the more the Type I error rate is inflated. Figure 1 shows the number of possible comparisons between pairs of means (pairwise comparisons) as a function of the number of means. 

  ![alt text][img1]

[img1]:http://onlinestatbook.com/2/tests_of_means/graphics/number_of_comparisons.gif

The figure below shows the probability of a Type I error as a function of the number of means. 

  ![alt text][img2]

[img2]:http://onlinestatbook.com/2/tests_of_means/graphics/familywise.gif
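
A rough way to see both curves numerically: k means give k(k-1)/2 pairwise comparisons, and if the comparisons were independent, the chance of at least one Type I error would be 1 - (1 - α)^c for c comparisons. That independence assumption is introduced here purely for illustration. Pairwise comparisons share groups and are not independent, so these numbers approximate the figure rather than reproduce it. The function names are hypothetical.

```python
def n_pairwise(k):
    """Number of pairwise comparisons among k means."""
    return k * (k - 1) // 2

def familywise_rate(k, alpha=0.05):
    """Chance of at least one Type I error, ASSUMING independent comparisons."""
    return 1 - (1 - alpha) ** n_pairwise(k)

for k in range(2, 7):
    print(k, n_pairwise(k), round(familywise_rate(k), 3))
```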


The probability of a Type I error is high even for a small number of means. The Type I error rate can be controlled using a test called the **Tukey Honestly Significant Difference test or Tukey HSD** for short. The Tukey HSD is based on a variation of the t distribution that takes into account the number of means being compared. This distribution is called the **studentized range distribution.**

Note:  The assumptions of the Tukey test are essentially the same as for an independent-groups t test: normality, homogeneity of variance, and independent observations. The test is quite robust to violations of normality. Violating homogeneity of variance can be more problematical than in the two-sample case since the MSE is based on data from all groups. The assumption of independence of observations is important and should not be violated.

Using the leniency study we will do an example using the data below:

<table>
  <tbody><tr>
   <th>
                        	Condition
                        </th>
                        <th> 
                        	Mean
                        </th>
                        <th> 
                        	Variance
                        </th>
                      </tr>
                      <tr> 
                        <td> 
                        	False
                        </td>
                        <td> 
                        	5.37
                        </td>
                        <td> 
                        	3.34
                        </td>
                      </tr>
                      <tr> 
                        <td> 
                        	Felt
                        </td>
                        <td> 
                        	4.91
                        </td>
                        <td> 
                        	2.83
                        </td>
                      </tr>
                      <tr> 
                        <td> 
                       		Miserable
                        </td>
                        <td> 
                        	4.91
                        </td>
                        <td> 
                        	2.11
                        </td>
                      </tr>
                      <tr> 
                        <td> 
                        	Neutral
                        </td>
                        <td> 
                        	4.12
                        </td>
                        <td> 
                        	2.32
                        </td>
                      </tr>
              </tbody></table>

1.  Compute MSE, which is simply the mean of the variances. It is equal to 2.65.

2.  Compute Q:

  ![alt text][img3]

[img3]:http://onlinestatbook.com/2/tests_of_means/graphics/ts_form.gif

   for each pair of means, where Mi is one mean, Mj is the other mean, and n is the number of scores in each group. For these data, there are 34 observations per group. The value in the denominator is 0.279.
   
   
3.  Compute p for each comparison using the Studentized Range Calculator. The degrees of freedom is equal to the total number of observations minus the number of means. For this experiment, df = 136 - 4 = 132.

<table>
               		 <tbody><tr>
                        <th>
                        	Comparison
                        </th>
                        <th> 
                        	M<sub>i</sub>-M<sub>j</sub>
                        </th>
                        <th> 
                        	Q
                        </th>
                        <th> 
                        	p
                        </th>
                      </tr>
                      <tr> 
                        <td> 
                        	False - Felt
                        </td>
                        <td> 
                        	0.46
                        </td>
                        <td> 
                        	1.65
                        </td>
                        <td> 
                        	0.649
                        </td>
                      </tr>
                      <tr> 
                        <td> 
                        	False - Miserable
                        </td>
                        <td> 
                        	0.46
                        </td>
                        <td> 
                        	1.65
                        </td>
                        <td> 
                        	0.649
                        </td>
                      </tr>
                      <tr> 
                        <td> 
                       		 False - Neutral
                        </td>
                        <td> 
                        	1.25
                        </td>
                        <td> 
                        	4.48
                        </td>
                        <td> 
                        	0.010
                        </td>
                      </tr>
                      <tr> 
                        <td> 
                        	Felt - Miserable
                        </td>
                        <td> 
                        	0.00
                        </td>
                        <td> 
                        	0.00
                        </td>
                        <td> 
                        	1.000
                        </td>
                      </tr>
                      <tr> 
                        <td> 
                        	Felt - Neutral
                        </td>
                        <td> 
                        	0.79
                        </td>
                        <td> 
                        	2.83
                        </td>
                        <td> 
                        	0.193
                        </td>
                      </tr>
                      <tr> 
                        <td> 
                        	Miserable - Neutral
                        </td>
                        <td> 
                        	0.79
                        </td>
                        <td> 
                        	2.83
                        </td>
                        <td> 
                        	0.193
                        </td>
                      </tr>
              </tbody></table>
      
The only significant comparison is between the false smile and the neutral smile.
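
The three steps can be sketched from the summary table. This assumes `scipy.stats.studentized_range` is available (SciPy 1.7 or newer) as a stand-in for the online Studentized Range Calculator. The dictionary and variable names are chosen here, and because Q is not rounded before the lookup, the p values may differ from the table in the last digit.

```python
from itertools import combinations
from scipy import stats  # studentized_range needs SciPy 1.7 or newer

means = {"False": 5.37, "Felt": 4.91, "Miserable": 4.91, "Neutral": 4.12}
variances = [3.34, 2.83, 2.11, 2.32]
n = 34                      # observations per group
k = len(means)              # number of means being compared

mse = sum(variances) / k    # step 1: mean of the variances
denom = (mse / n) ** 0.5    # denominator of Q
df = n * k - k              # total observations minus number of means

results = {}
for (name_i, m_i), (name_j, m_j) in combinations(means.items(), 2):
    q = (m_i - m_j) / denom                           # step 2
    p = stats.studentized_range.sf(abs(q), k, df)     # step 3
    results[(name_i, name_j)] = (q, p)
    print(f"{name_i} - {name_j}: Q = {q:.2f}, p = {p:.3f}")
```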
 

##### Tukey for Unequal Sample Sizes (optional)
The calculation of MSE for unequal sample sizes is similar to its calculation in an independent-groups t test. Here are the steps:

1. Compute a Sum of Squares Error (SSE) using the following formula  

   ![alt text][img1]
   [img1]:http://onlinestatbook.com/2/tests_of_means/graphics/SSE.gif
    where Mi is the mean of the ith group and k is the number of groups. 

2. Compute the degrees of freedom error (dfe) by subtracting the number of groups (k) from the total number of observations (N). Therefore, dfe = N - k.

3.  Compute MSE by dividing SSE by dfe: MSE = SSE/dfe.

4.  For each comparison of means, use the harmonic mean of the n's for the two means (nh).

All other aspects of the calculations are the same as when you have equal sample sizes.
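
A minimal sketch of steps 1 through 4. The three groups below are hypothetical data invented for illustration, not taken from the book, and `tukey_q` is a helper name chosen here.

```python
import numpy as np

# hypothetical data with unequal group sizes
groups = [np.array([3, 4, 5]), np.array([2, 4]), np.array([6, 7, 8, 9])]

k = len(groups)                               # number of groups
n_total = sum(len(g) for g in groups)         # total number of observations
# step 1: squared deviations of each score from its own group mean
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
dfe = n_total - k                             # step 2
mse = sse / dfe                               # step 3

def tukey_q(g_i, g_j, mse):
    """Q for one pair, using the harmonic mean of the two group sizes."""
    n_h = 2 / (1 / len(g_i) + 1 / len(g_j))   # step 4
    return (g_i.mean() - g_j.mean()) / (mse / n_h) ** 0.5

q_01 = tukey_q(groups[0], groups[1], mse)
print(sse, dfe, mse, q_01)
```

The resulting Q is then referred to the studentized range distribution with k means and dfe degrees of freedom, exactly as in the equal-sample-size case.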

##### Question 4 out of 7.
Assume you did an experiment with 3 groups and 16 subjects per group. The sample variances in the three groups were 14, 16, and 18. The value of MSE would be


In [154]:
sum([14, 16, 18])/3

16.0

##### Question 5 out of 7.
Assume you did an experiment with 3 groups and 16 subjects per group. The sample variances in the three groups were 14, 16, and 18. Using Tukey's test to compare the means, what would be the value of Q for a comparison of the first mean (14) with the last mean (18)?


In [155]:
#Q = (Mi - Mj)/sqrt(MSE/n); the difference needs parentheses so it is divided as a whole
(14-18)/(16/16)**.5

-4.0

##### Question 6 out of 7.
Assume you did an experiment with 3 groups and 16 subjects per group. The sample variances in the three groups were 14, 16, and 18. Using Tukey's test to compare the means, what would be the df for the test?

In [156]:
16*3-3

45

##### Question 7 out of 7.
Assume you did an experiment with 3 groups and 16 subjects per group. The sample variances in the three groups were 14, 16, and 18. Using Tukey's test to compare the means, what would be the two-tailed probability for a comparison of the first mean (14) with the last mean (18)? 

In [168]:
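from statsmodels.stats.libqsturng import psturng
mse = (14+16+18)/3 #mean of the sample variances
q = (18-14)/((mse/16)**.5) #Q = (Mi - Mj)/sqrt(MSE/n)
n = 3 #number of means being compared
df = 45 #total observations minus number of means: 48 - 3
psturng(q,n,df)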

0.018714290629434527