# T-Distribution and Comparing Two Means

## T-Distribution

* When $\sigma$ is unknown (almost always), use the t-distribution to address
the uncertainty of the standard error estimate
* Bell shaped but thicker tails than the normal distribution
    * Observations more likely to fall beyond 2 SDs from the mean (more conservative)
    * Extra thick tails helpful for mitigating the effect of a less reliable estimate for the standard error of the sampling distribution


* Always centered at 0 (like the standard normal)
* One parameter: **degrees of freedom (df)** - determines the thickness of tails (normal distribution has two parameters: mean and SD)
* When degrees of freedom increases, the shape of the t-distribution approaches the normal distribution

### T-Score

$$
T = \frac{obs-null}{SE}
$$

In [1]:
# P(|Z| > 2)
pnorm(2, lower.tail = FALSE) * 2

# P(|t_df=50| > 2)
pt(2, df = 50, lower.tail = FALSE) * 2

# P(|t_df=10| > 2)
pt(2, df = 10, lower.tail = FALSE) * 2

## Inference for a Mean

### Confidence Interval

$$
\bar{x} \pm t^{\star}_{df} SE_{\bar{x}} \\
\bar{x} \pm t^{\star}_{df} \frac{s}{\sqrt{n}} \\
\bar{x} \pm t^{\star}_{n-1} \frac{s}{\sqrt{n}}
$$

### Degrees of Freedom for T-Statistic for Inference on One Sample Mean

$$
df = n - 1
$$

In [2]:
# Critical t-score for a 95% confidence interval with df = 21
# (upper-tail quantile, so the result is positive: about 2.08)
qt(1 - (1-0.95)/2, df = 21)

**Example**

Suppose the suggested serving of these biscuits is 30 grams. Do these data provide convincing evidence that the amount of snacks consumed by distracted eaters post lunch is different than the suggested serving size?

* x̄ = 52.1
* s = 45.1
* n = 22
* t*_21 = 2.08 (critical value for a 95% confidence level)

In [4]:
# Confidence interval
52.1 + 2.08 * 45.1/sqrt(22)
52.1 - 2.08 * 45.1/sqrt(22)
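
The interval above can also be wrapped in a small helper so the critical value is computed rather than typed in. This is only a sketch: `t_ci` is a hypothetical name, not something defined in these notes or in base R.

```r
# Hypothetical helper (not part of the original notes): t-based confidence
# interval for a mean, computed from summary statistics.
t_ci <- function(xbar, s, n, conf = 0.95) {
  se <- s / sqrt(n)                              # standard error of the mean
  t_star <- qt(1 - (1 - conf) / 2, df = n - 1)   # positive critical value
  c(lower = xbar - t_star * se, upper = xbar + t_star * se)
}

# Biscuit example: x-bar = 52.1, s = 45.1, n = 22
t_ci(52.1, 45.1, 22)
```

Because `t_star` is not rounded to 2.08, the endpoints may differ from the hand calculation in the last decimal place.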

In [7]:
# H0: mu = 30
# HA: mu != 30

# T-score
(t = (52.1 - 30) / (45.1/sqrt(22)))

# P-value
pt(t, df = 21, lower.tail = FALSE) * 2

## Inference for Comparing Two Independent Means

### Confidence Interval

$$
(\bar{x}_1 - \bar{x}_2) \pm t^{\star}_{df} SE_{(\bar{x}_1 - \bar{x}_2)} \\
(\bar{x}_1 - \bar{x}_2) \pm t^{\star}_{df} \sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}} \\
(\bar{x}_1 - \bar{x}_2) \pm t^{\star}_{\min(n_1 - 1, n_2 - 1)} \sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}}
$$

### Standard Error of Difference between Two Independent Means

$$
SE_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}}
$$

### Degrees of Freedom for T-Statistic for Inference on Difference of Two Means

$$
df = \min(n_1 - 1, n_2 - 1)
$$

**Example**

* Solitaire
    * x̄ = 52.1
    * s = 45.1
    * n = 22
* No Distraction
    * x̄ = 27.1
    * s = 26.4
    * n = 22


In [12]:
# Confidence interval

(df = 22-1)
(t_21 = qt(1 - (1-0.95)/2, df = 21))  # upper-tail critical value, about 2.08

(se = sqrt(45.1^2/22 + 26.4^2/22))

(52.1 - 27.1) - 2.08 * se
(52.1 - 27.1) + 2.08 * se

# We are 95% confident that those who eat with distractions
# consume between 1.83 g and 48.17 g more snacks than those
# who eat without distractions, on average.

In [14]:
# H0: mu1 - mu2 = 0
# HA: mu1 - mu2 != 0

# T-score
(t = ((52.1 - 27.1) - 0) / se)

# P-value
pt(t, df = 21, lower.tail = FALSE) * 2
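
The standard error, degrees of freedom, T-score and two-sided p-value for two independent means can be collected in one function. This is a sketch under the conservative rule df = min(n1 - 1, n2 - 1) used in these notes; `two_sample_t` is a hypothetical name.

```r
# Hypothetical helper (not part of the original notes): two-sided test for a
# difference of two independent means from summary statistics, using the
# conservative df = min(n1 - 1, n2 - 1).
two_sample_t <- function(x1, s1, n1, x2, s2, n2, null = 0) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)       # SE of the difference
  df <- min(n1 - 1, n2 - 1)
  t  <- ((x1 - x2) - null) / se
  p  <- 2 * pt(abs(t), df = df, lower.tail = FALSE)
  list(se = se, df = df, t = t, p_value = p)
}

# Solitaire vs. no distraction
res <- two_sample_t(52.1, 45.1, 22, 27.1, 26.4, 22)
res
```

Using `abs(t)` keeps the p-value correct whichever group is listed first.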

## Inference for Comparing Two Paired Means

* Same as inference for a single population mean, except that the variable of interest is the difference within each pair, so the mean being tested is the mean of the paired differences.


* $\mu_{\text{diff}}$ : Parameter of Interest
* $\bar{x}_{\text{diff}}$ : Point Estimate

**Example**

* x̄_diff = -0.545
* s_diff = 8.887
* n_diff = 200

In [18]:
# H0: mu_diff = 0
# HA: mu_diff != 0

# Standard error
se = 8.887 / sqrt(200)

# T-score
(t = (-0.545 - 0) / se)

# P-value
pt(t, df = 199) * 2
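
When the raw paired observations are available, the same idea can be checked with `t.test`. The data below are simulated purely for illustration (they are not the data behind the example above), and the variable names are assumptions.

```r
# Simulated, illustrative data only (not from the original notes)
set.seed(1)
before <- rnorm(20, mean = 50, sd = 10)
after  <- before + rnorm(20, mean = 1, sd = 5)

# A paired test on (after, before) ...
paired  <- t.test(after, before, paired = TRUE)
# ... is a one-sample test on the within-pair differences
one_sam <- t.test(after - before, mu = 0)

c(paired = unname(paired$statistic), one_sample = unname(one_sam$statistic))
```

Both calls should report the same T-score, degrees of freedom and p-value, which is the sense in which paired inference reduces to one-sample inference.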

## Power

* $\alpha$ : Type I error - P(reject H0 | H0 true)
* $\beta$ : Type II error - P(fail to reject H0 | HA true)
* $1 - \beta$ : **Power** - P(reject H0 | HA true)

**Example**

* sd = 12
* n = 100 (per group)

In [21]:
# H0: mu1 - mu2 = 0
# HA: mu1 - mu2 != 0

# Standard error
sqrt(12^2/100 + 12^2/100)

* For what values of difference would we reject the null hypothesis at the 5% significance level?

In [23]:
1.96 * 1.7

* Suppose the company cares about finding any effect that is 3 mmHg or larger vs. the standard medication.
* What is the power of the test to detect this effect?


* effect size = -3

In [26]:
# Distribution with mu1 - mu2 = -3

(z = (-3.332 - (-3)) / 1.7)

# Power of the test
pnorm(z)
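
The power calculation can be generalized as a function of the effect size and standard error. This is a sketch using the normal approximation, as in the cell above; `power_two_sided` is a hypothetical name.

```r
# Hypothetical helper (not part of the original notes): power of a two-sided
# z-test when the true difference is `effect`.
power_two_sided <- function(effect, se, alpha = 0.05) {
  z_star <- qnorm(1 - alpha / 2)
  cutoff <- z_star * se                 # reject H0 when |difference| > cutoff
  # Probability of landing in either rejection region under the alternative
  pnorm(-cutoff, mean = effect, sd = se) +
    pnorm(cutoff, mean = effect, sd = se, lower.tail = FALSE)
}

se <- sqrt(12^2/100 + 12^2/100)
power_two_sided(effect = -3, se = se)
```

The result differs slightly from the hand calculation because the standard error and 1.96 are not rounded, and because the (tiny) probability of rejecting in the opposite tail is included.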

* What sample size will lead to a power of 80% for this test?

In [28]:
# Z-score that marks the 80th percentile of the normal curve
qnorm(0.8)
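
To finish the calculation: for 80% power the effect must sit `qnorm(0.8) + qnorm(0.975)` standard errors from 0, which can be solved for the per-group sample size. This is a sketch that assumes equal group sizes and a common SD of 12, as in the example; `n_per_group` is a hypothetical name.

```r
# Hypothetical helper (not part of the original notes): per-group sample size
# for a two-sided test of a difference in means (normal approximation).
n_per_group <- function(effect, sd, power = 0.80, alpha = 0.05) {
  z_total <- qnorm(power) + qnorm(1 - alpha / 2)
  # Solve |effect| = z_total * sqrt(2 * sd^2 / n) for n, then round up
  ceiling(2 * sd^2 * (z_total / effect)^2)
}

n_per_group(effect = 3, sd = 12)
```

The answer is roughly 250 patients per group; the exact integer depends on whether the z-values are rounded before the calculation.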

## Exercises

OpenIntro Statistics, 3rd edition<br>
5.1, 5.3, 5.5, 5.13, 5.17, 5.19, 5.21, 5.23, 5.27, 5.31, 5.35, 5.37<br>
5.39<br>
5.41, 5.43, 5.45, 5.47, 5.49, 5.51

**5.1 Identify the critical t.**

* An independent random sample is selected from an approximately normal population with unknown standard deviation. Find the degrees of freedom and the critical t-value (t*) for the given sample size and confidence level.
* (a) n = 6, CL = 90% 
* (b) n = 21, CL = 98% 
* (c) n = 29, CL = 95% 
* (d) n = 12, CL = 99%

In [1]:
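# (a) df = 5
qt(.05, 5, lower.tail = FALSE)

In [2]:
# (b) df = 20
qt(.01, 20, lower.tail = FALSE)

In [3]:
# (c) df = 28
qt(.025, 28, lower.tail = FALSE)

In [4]:
# (d) df = 11
qt(.005, 11, lower.tail = FALSE)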

**5.3 Find the p-value, Part I.** 
* An independent random sample is selected from an approximately normal population with an unknown standard deviation. Find the p-value for the given set of hypotheses and T test statistic. Also determine if the null hypothesis would be rejected at α = 0.05.
* (a) HA : μ > μ0, n = 11, T = 1.91 
* (b) HA : μ < μ0, n = 17, T = −3.45 
* (c) HA : μ != μ0, n = 7, T = 0.83 
* (d) HA : μ > μ0, n = 28, T = 2.1

In [6]:
# (a)
pt(1.91, 10, lower.tail = FALSE)
# H0 rejected

In [7]:
# (b)
pt(-3.45, 16)
# H0 rejected

In [9]:
# (c)
pt(0.83, 6, lower.tail = FALSE) * 2
# H0 not rejected

In [11]:
# (d)
pt(2.1, 27, lower.tail = FALSE)
# H0 rejected

**5.5 Working backwards, Part I.** 
* A 95% confidence interval for a population mean, μ, is given as (18.985, 21.015). This confidence interval is based on a simple random sample of 36 observations. 
* Calculate the sample mean and standard deviation. Assume that all conditions necessary for inference are satisfied. Use the t-distribution in any calculations.

In [14]:
# t
qt(.025, 35)

In [16]:
# me
(21.015 - 18.985) / 2

In [22]:
# me = t * s / sqrt(n)
# 1.015 = 2.03 * s / sqrt(36)
# s
1.015 / 2.03 * sqrt(36)

In [18]:
# mean
18.985 + 1.015
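
The same steps (midpoint for the mean, half-width for the margin of error, then solving for s) can be sketched as one function. `ci_to_stats` is a hypothetical name introduced only for illustration.

```r
# Hypothetical helper (not part of the original notes): recover the sample
# mean and SD from a t-based confidence interval and the sample size.
ci_to_stats <- function(lower, upper, n, conf = 0.95) {
  t_star <- qt(1 - (1 - conf) / 2, df = n - 1)
  me <- (upper - lower) / 2             # margin of error
  c(mean = (lower + upper) / 2,         # interval midpoint
    sd   = me * sqrt(n) / t_star)       # from me = t* * s / sqrt(n)
}

ci_to_stats(18.985, 21.015, 36)
```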

**5.13 Car insurance savings.** 
* A market researcher wants to evaluate car insurance savings at a competing company. Based on past studies he is assuming that the standard deviation of savings is 100. 
He wants to collect data such that he can get a margin of error of no more than 10 at a 95% confidence level. How large of a sample should he collect?

In [27]:
# s = 100
# me = 10

# me = z * s / sqrt(n)

# 10 = 1.96 * 100 / sqrt(n)
(1.96 * 100 / 10)^2

# 385
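
Solving the margin-of-error formula for n can be sketched as a function that always rounds up, since rounding down would give a margin of error slightly above the target. `n_for_margin` is a hypothetical name.

```r
# Hypothetical helper (not part of the original notes): smallest n giving a
# margin of error of at most `me`, using the normal critical value.
n_for_margin <- function(sd, me, conf = 0.95) {
  z_star <- qnorm(1 - (1 - conf) / 2)
  ceiling((z_star * sd / me)^2)
}

n_for_margin(sd = 100, me = 10)
```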

**5.17 Paired or not, Part I?** 
* In each of the following scenarios, determine if the data are paired.
* (a) Compare pre- (beginning of semester) and post-test (end of semester) scores of students.
* (b) Assess gender-related salary gap by comparing salaries of randomly sampled men and women.
* (c) Compare artery thicknesses at the beginning of a study and after 2 years of taking Vitamin E for the same group of patients.
* (d) Assess effectiveness of a diet regimen by comparing the before and after weights of subjects.

In [28]:
# (a)
# Yes, pre- and post-test scores are paired because they're scores of the same student, and thus are not independent.

# (b)
# No, as the men and women are randomly sampled, the two groups are assumed to be independent.

# (c)
# Yes, because they're of the same group of patients.

# (d)
# Yes, because they're of the same subjects.

**5.19 Global warming, Part I.** 
* Is there strong evidence of global warming? Let’s consider a small scale example, comparing how temperatures have changed in the US from 1968 to 2008. The daily high temperature reading on January 1 was collected in 1968 and 2008 for 51 randomly selected locations in the continental US. Then the difference between the two readings (temperature in 2008 - temperature in 1968) was calculated for each of the 51 different locations. The average of these 51 values was 1.1 degrees with a standard deviation of 4.9 degrees. We are interested in determining whether these data provide strong evidence of temperature warming in the continental US.
* (a) Is there a relationship between the observations collected in 1968 and 2008? Or are the observations in the two groups independent? Explain.
* (b) Write hypotheses for this research in symbols and in words.
* (c) Check the conditions required to complete this test.
* (d) Calculate the test statistic and find the p-value.
* (e) What do you conclude? Interpret your conclusion in context.
* (f) What type of error might we have made? Explain in context what the error means.
* (g) Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the temperature measurements from 1968 and 2008 to include 0? Explain your reasoning.

In [30]:
# (a)
# The observations collected in 1968 and 2008 are not independent. 
# Although the 51 locations are randomly selected, the observations collected
# in two different years are from the same locations.
# The data are paired.

In [31]:
# (b)
# H0: mu_diff = 0
# HA: mu_diff > 0

# The null hypothesis assumes that the mean of the differences in temperature (2008 - 1968) is 0.
# The alternative hypothesis assumes that the mean of the differences is greater than 0, i.e. warming.

In [36]:
# (d)
n = 51
x = 1.1
s = 4.9

(se = 4.9 / sqrt(n))

(t = 1.1 / se)

pt(t, 50, lower.tail = FALSE) 

In [37]:
# (e)
# Assuming a significance level of 0.05, the p-value (slightly above 0.05) is not small enough to reject the null hypothesis.
# Hence, there is not enough evidence to say that the temperature is warming in the continental US.

In [38]:
# (f)
# Type II error. There may actually be warming, but we failed to reject the null hypothesis.

In [39]:
# (g)
# Yes. Since we failed to reject H0: mu_diff = 0, a mean difference of 0 remains plausible.
# Hence the corresponding confidence interval should include 0.

**5.21 Global warming, Part II.** 
* We considered the differences between the temperature readings in January 1 of 1968 and 2008 at 51 locations in the continental US in Exercise 5.19. The mean and standard deviation of the reported differences are 1.1 degrees and 4.9 degrees.
* (a) Calculate a 90% confidence interval for the average difference between the temperature measurements between 1968 and 2008.
* (b) Interpret this interval in context.
* (c) Does the confidence interval provide convincing evidence that the temperature was higher in 2008 than in 1968 in the continental US? Explain

In [46]:
# (a)
# t
qt(0.05, 50)

# me
(me = 1.6759 * se)

In [47]:
# ci
1.1 + me
1.1 - me

In [48]:
# (b)
# We are 90% confident that the average difference (2008 - 1968) in the January 1 daily high temperature
# in the continental US is between -0.05 and 2.25 degrees.

In [49]:
# (c)
# The confidence interval includes 0 and some negative values. Hence there is no convincing evidence that the temperature
# was higher in 2008 than in 1968.

**5.23 Gifted children.** 
* Researchers collected a simple random sample of 36 children who had been identified as gifted in a large city. The following histograms show the distributions of the IQ scores of mothers and fathers of these children. Also provided are some sample statistics.

|Statistic|Mother|Father|Diff|
|---|---|---|---|
|Mean|118.2|114.8|3.4|
|SD|6.5|3.5|7.5|
|n|36|36|36|

* (a) Are the IQs of mothers and the IQs of fathers in this data set related? Explain.
* (b) Conduct a hypothesis test to evaluate if the scores are equal on average. Make sure to clearly state your hypotheses, check the relevant conditions, and state your conclusion in the context of the data

In [51]:
# (a)
# Yes. Each mother is matched with the father of the same child, so the IQs of mothers and fathers are paired.

In [54]:
# (b)

# H0: mu_diff = 0
# HA: mu_diff != 0 

# The null hypothesis assumes that there is no difference between the average IQ of mothers and fathers.
# The alternative assumes that there is a difference between the average IQ of mothers and fathers.

# A random sample of 36 obviously will be less than 10% of the population of a large city.
# The distribution of IQ difference is slightly skewed.
# But the sample size of 36 is large enough to meet the condition.

In [56]:
(se = 7.5/sqrt(36))
(t = (3.4-0)/se)
pt(t, 35, lower.tail = FALSE) * 2

In [57]:
# With a significance level of 0.05, a p-value of 0.01 is small enough to reject the null hypothesis
# and conclude that our data provide strong evidence that there is a difference between the average IQ of mothers and fathers,
# and the data indicate that mothers’ scores are higher than fathers’ scores for the parents of gifted children.

**5.27 Friday the 13th, Part I.** 
* In the early 1990’s, researchers in the UK collected data on traffic flow, number of shoppers, and traffic accident related emergency room admissions on Friday the 13th and the previous Friday, Friday the 6th. The histograms below show the distribution of number of cars passing by a specific intersection on Friday the 6th and Friday the 13th for many such date pairs. Also given are some sample statistics, where the difference is the number of cars on the 6th minus the number of cars on the 13th.

|Statistic|6th|13th|Diff|
|---|---|---|---|
|Mean|128385|126550|1835|
|SD|7259|7664|1176|
|n|10|10|10|

* (a) Are there any underlying structures in these data that should be considered in an analysis? Explain.
* (b) What are the hypotheses for evaluating whether the number of people out on Friday the 6th is different than the number out on Friday the 13th? 
* (c) Check conditions to carry out the hypothesis test from part (b).
* (d) Calculate the test statistic and the p-value.
* (e) What is the conclusion of the hypothesis test? 
* (f) Interpret the p-value in this context.
* (g) What type of error might have been made in the conclusion of your test? Explain.

In [58]:
# (a)
# The number of cars on the 6th and the number of cars on the 13th should be paired.

In [59]:
# (b)
# H0: mu_diff = 0
# HA: mu_diff != 0

In [66]:
# (d)
(df = 10 - 1)
(se = 1176/sqrt(10))
(t = (1835 - 0) / se)

pt(t, df, lower.tail = FALSE) * 2

In [70]:
# (e)
# The p-value of 0.0008 is much smaller than the significance level of 0.05,
# so we reject the null hypothesis.

In [68]:
# (f)
# The p-value is the probability of observing a mean difference in the number of cars
# between the 6th and the 13th at least as large as the observed difference, under
# the assumption that the null hypothesis is true, i.e. there is no difference.

In [69]:
# (g)
# Type I error. There might actually be no difference, but we wrongly
# rejected the null hypothesis and stated that there is a difference.

**5.31 Chicken diet and weight, Part I.** 

* Chicken farming is a multi-billion dollar industry, and any methods that increase the growth rate of young chicks can reduce consumer costs while increasing company profits, possibly by millions of dollars. An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens. Newly hatched chicks were randomly allocated into six groups, and each group was given a different feed supplement. Below are some summary statistics from this data set along with box plots showing the distribution of weights by feed type.

|Feed|Mean|SD|n|
|---|---|---|---|
|casein |323.58 |64.43 |12 |
|<mark>horsebean</mark> |160.20 |38.63 |10 |
|<mark>linseed</mark> |218.75 |52.24| 12 |
|meatmeal |276.91 |64.90 |11 |
|soybean |246.43 |54.13 |14 |
|sunflower |328.92 |48.84 |12 |

* (a) Describe the distributions of weights of chickens that were fed linseed and horsebean.
* (b) Do these data provide strong evidence that the average weights of chickens that were fed linseed and horsebean are different? Use a 5% significance level.
* (c) What type of error might we have committed? Explain.
* (d) Would your conclusion change if we used α = 0.01?

In [71]:
# (a)
# The distribution of weights of chickens fed linseed is approximately normal, whereas that of chickens fed horsebean is slightly skewed.

In [77]:
# (b)

# The newly hatched chicks were randomly allocated into groups, so there is no evidence of a dependent relationship between them.

# H0: mu_linseed - mu_horsebean = 0
# HA: mu_linseed - mu_horsebean != 0

df = min(12,10) - 1
(se = sqrt(52.24^2/12 + 38.63^2/10))

(t = (218.75-160.2)/se)

pt(t, df, lower.tail = FALSE) * 2

# Reject H0. There is strong evidence that the average weight of chickens fed linseed is different from that of chickens fed horsebean.

In [78]:
# (c)
# Type I error. We rejected H0, but there may actually be no difference in average weight.

In [79]:
# (d)
# Yes. If alpha is 0.01, we would fail to reject H0, since the p-value of about 0.015 is greater than 0.01.

**5.35 Gaming and distracted eating, Part I.**
* A group of researchers are interested in the possible effects of distracting stimuli during eating, such as an increase or decrease in the amount of food consumption. 
* To test this hypothesis, they monitored food intake for a group of 44 patients who were randomized into two equal groups. The treatment group ate lunch while playing solitaire, and the control group ate lunch without any added distractions. 
* Patients in the treatment group ate 52.1 grams of biscuits, with a standard deviation of 45.1 grams, and patients in the control group ate 27.1 grams of biscuits, with a standard deviation of 26.4 grams. 
* Do these data provide convincing evidence that the average food intake (measured in amount of biscuits consumed) is different for the patients in the treatment group? Assume that conditions for inference are satisfied.

In [80]:
# Since the two groups of patients are randomly assigned, they're independent of each other.

# H0: mu_a - mu_b = 0
# HA: mu_a - mu_b != 0

df = 22 - 1
(se = sqrt(45.1^2/22 + 26.4^2/22))
(t = (52.1-27.1-0)/se)

pt(t, df, lower.tail = FALSE) * 2

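# Yes. The p-value is below 0.05, so we reject H0: the data provide convincing evidence
# that the average biscuit intake differs between the treatment and control groups.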

**5.37 Prison isolation experiment, Part I.**
* Subjects from Central Prison in Raleigh, NC, volunteered for an experiment involving an “isolation” experience. The goal of the experiment was to find a treatment that reduces subjects’ psychopathic deviant T scores. This score measures a person’s need for control or their rebellion against control, and it is part of a commonly used mental health test called the Minnesota Multiphasic Personality Inventory (MMPI) test. The experiment had three treatment groups: 
* (1) Four hours of sensory restriction plus a 15 minute “therapeutic” tape advising that professional help is available.
* (2) Four hours of sensory restriction plus a 15 minute “emotionally neutral” tape on training hunting dogs.
* (3) Four hours of sensory restriction but no taped message.
* Forty-two subjects were randomly assigned to these treatment groups, and an MMPI test was administered before and after the treatment. Distributions of the differences between pre and post treatment scores (pre - post) are shown below, along with some sample statistics. Use this information to independently test the effectiveness of each treatment. Make sure to clearly state your hypotheses, check conditions, and interpret results in the context of the data.

|Statistic|Tr 1|Tr 2|Tr 3|
|---|---|---|---|
|Mean|6.21|2.86|-3.21|
|SD|12.3|7.94|8.57|
|n|14|14|14|

In [89]:
# The 42 subjects are randomly assigned, hence they're independent of each other.

# Treatment 1 

# H0: mu_diff_1 = 0
# HA: mu_diff_1 > 0

df = 14 - 1  # n = 14 per treatment group
(se = 12.3/sqrt(14))
(t = (6.21-0)/se)

pt(t, df, lower.tail = FALSE) 

In [90]:
# Treatment 2

# H0: mu_diff_2 = 0
# HA: mu_diff_2 > 0

df = 14 - 1  # n = 14 per treatment group
(se = 7.94/sqrt(14))
(t = (2.86-0)/se)

pt(t, df, lower.tail = FALSE) 

In [92]:
# Treatment 3

# H0: mu_diff_3 = 0
# HA: mu_diff_3 > 0

df = 14 - 1  # n = 14 per treatment group
(se = 8.57/sqrt(14))
(t = (-3.21-0)/se)

pt(t, df, lower.tail = FALSE) 
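
Since the three tests share the same structure, they can also be run in one vectorized pass. This is a sketch using the summary statistics from the table and df = n - 1 = 13; the variable names are assumptions.

```r
# Summary statistics from the table (pre - post differences), n = 14 per group
means <- c(tr1 = 6.21, tr2 = 2.86, tr3 = -3.21)
sds   <- c(12.3, 7.94, 8.57)
n     <- 14

se <- sds / sqrt(n)                          # one standard error per treatment
t  <- means / se                             # T-scores against H0: mu_diff = 0
p  <- pt(t, df = n - 1, lower.tail = FALSE)  # one-sided, HA: mu_diff > 0

round(rbind(t = t, p_value = p), 3)
```

At the 5% level only the first treatment gives a p-value below 0.05, so only it shows evidence of reducing scores. Running three tests at once also raises the chance of at least one Type I error, which is worth keeping in mind when interpreting the results.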