[![Open In
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/CU-Denver-MathStats-OER/Statistical-Theory/blob/main/Chap6/22-Permutation-Tests.ipynb)



# <a name="22intro">6.2 Permutation Tests</a>

---



<figure>
<img
src="https://upload.wikimedia.org/wikipedia/commons/b/b3/Permutation_generation_algorithms.svg"
alt="Permutation Illustration" width = "55%"/>
<figcaption aria-hidden="true">
Credit: Torsten Mütze, <a
href="https://creativecommons.org/licenses/by-sa/4.0">CC BY-SA 4.0</a>,
via Wikimedia Commons
</figcaption>
</figure>


# <a name="22summ">A Summary of Hypothesis Testing</a>

---

In the section [Introduction to Hypothesis
Tests](21-Intro-Hypothesis-Tests.ipynb), we walked through the process of
performing a statistical hypothesis test:

1.  State the <font color="dodgerblue">**null and alternative   hypotheses**</font> in terms of population parameter(s).
2.  Compute <font color="dodgerblue">**test statistic(s)**</font>   from random sample(s).
3.  Calculate the <font color="dodgerblue">**p-value**</font> to   help assess which claim is more likely to be true.
4.  What <font color="dodgerblue">**conclusions**</font> (if any)   can we make about the two competing claims?

See [Introduction to Hypothesis Tests](https://githubtocolab.com/CU-Denver-MathStats-OER/Statistical-Theory/blob/main/Chap6/21-Intro-Hypothesis-Tests.ipynb)
for a refresher on Steps 1 and 2. We also informally explored the
concept of [statistical significance](https://githubtocolab.com/CU-Denver-MathStats-OER/Statistical-Theory/blob/main/Chap6/21-Intro-Hypothesis-Tests.ipynb#21informal-p). Today, we
will mainly focus on Step 3 and discuss a resampling method, called a
<font color="dodgerblue">**permutation test**</font>, that we can
use to calculate a p-value to assess significance.

## <a name="22null-dist">The Null Distribution and p-values</a>

---

Recall the logic of a hypothesis test. We want to assess which of two
competing claims, the null hypothesis $H_0$ or the alternative
hypothesis $H_a$, is more likely to be true. We assume $H_0$ is true
and consider whether the sample data support or refute the null claim.

-   The <font color="dodgerblue">**null distribution**</font> is   the distribution of the test statistic **if the null hypothesis is   true**. We use the null distribution to calculate the p-value.
-   The <font color="dodgerblue">**p-value**</font> is the **probability that we would get a random sample** with a test statistic as extreme as or more extreme than the observed test statistic **if the null hypothesis were true**.

$${\large {\color{dodgerblue}{p\mbox{-value} = P( \mbox{test statistic as or more extreme than observed} \ | \ H_0 \mbox{ is true} )}}}.$$

-   The **smaller the $p$-value**, the **less likely the observed sample** is, assuming $H_0$ is true.
-   A small $p$-value is therefore evidence that **contradicts $H_0$ and supports $H_a$**.
-   The **smaller the $p$-value**, the **more statistically significant** the result.
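
As a hedged illustration of the definition above, consider a hypothetical example that is not part of the case studies in this section: a coin is flipped 10 times, $H_0$ says the coin is fair, and we observe 8 heads. Here the null distribution is known exactly (binomial), so a one-sided p-value can be computed directly. The names `n.flips`, `obs.heads`, and `p.value.coin` are illustrative assumptions.

```r
# Hypothetical example: 10 flips, H0 says P(heads) = 0.5, we observe 8 heads
n.flips <- 10
obs.heads <- 8

# p-value = P(X >= 8 | H0 is true): add up the null distribution over all
# outcomes that are as extreme as or more extreme than the observed one
p.value.coin <- sum(dbinom(obs.heads:n.flips, size = n.flips, prob = 0.5))
p.value.coin
```

When the null distribution is not known, as in the case studies that follow, the same probability has to be estimated another way.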

# <a name="22diner">Case Study: The Unscrupulous Diner's Dilemma</a>

---

We previously considered the [unscrupulous diner's
dilemma](https://githubtocolab.com/CU-Denver-MathStats-OER/Statistical-Theory/blob/main/Chap6/21-Intro-Hypothesis-Tests.ipynb#21q4)<sup>1</sup>.

> The unscrupulous diner's dilemma is a problem faced frequently in social settings. When a group of diners jointly enjoys a meal at a restaurant, often an unspoken agreement exists to divide the check equally. A selfish diner could thereby enjoy exceptional dinners at bargain prices. This dilemma typifies a class of serious social problems<sup>2</sup> from environmental protection and resource conservation to eliciting charity donations and slowing arms races.

Researchers wanted to test whether people order more food and beverages
when they know the bill is going to be split evenly compared to when
each person only pays for what they ordered.

<br>

<font size=2>1. Gneezy, U., E. Haruvy, and H. Yafe (2004), [“The Inefficiency of
Splitting the Bill”](https://rady.ucsd.edu/_files/faculty-research/uri-gneezy/splitting-bill.pdf), *The Economic Journal* 114.</font>

<font size=2>2. Glance, N., and B. Huberman, [“The Dynamics of Social
Dilemmas”](http://www.uvm.edu/~pdodds/files/papers/others/1994/glance1994a.pdf), *Scientific American*.
</font>

## <a name="22test-step1">Step 1: State the Hypotheses</a>

---

-   $H_0$: There is no difference in how much people order regardless of how the bill is split. ${\color{dodgerblue}{\mu_{\rm even} - \mu_{\rm control}=0}}.$
-   $H_a$: People order more when the bill is split evenly as opposed to when each person pays for what they order. ${\color{dodgerblue}{\mu_{\rm even} - \mu_{\rm control}>0}}.$


## <a name="22test-step2">Step 2: Collect Sample Data and Define a Test Statistic</a>

---

8 people volunteered to take part in the study.

-   4 people were randomly assigned to sit at a table where they were   told the bill would be evenly split between the 4 people.
-   4 people were randomly assigned to sit at a table where they were   told each person would pay only for what they order themselves.

The results of the study are given in the tables below:

| <font size=3>Even Split Group</font>  | <font size=3>Pay for what you order (control) Group </font>   |
|-------------------------|------------------------------|
| <font size=3>$\$15.00$, $\$8.00$, $\$8.75$, $\$13.17$</font> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | <font size=3>$\$8.50$ , $\$7.90$ , $\$10.85$, $\$7.43$</font> |
| <font size=3>$\bar{x}_{\rm even} = 11.23$</font> | <font size=3>$\bar{x}_{\rm control} = 8.67$</font>  |

-   **We can use the difference in the two sample means as a test   statistic:**

$${\color{dodgerblue}{T= \bar{x}_{\rm even} - \bar{x}_{\rm control} = 2.56}}$$
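
The arithmetic behind the observed test statistic can be checked with a short R sketch. The values come from the table above; the vector names `even` and `control` and the other variable names are assumptions made for this illustration.

```r
# food and beverage costs (in dollars) from the table above
even <- c(15.00, 8.00, 8.75, 13.17)     # even-split group
control <- c(8.50, 7.90, 10.85, 7.43)   # pay-for-what-you-order group

xbar.even <- mean(even)                  # sample mean, even-split group
xbar.control <- mean(control)            # sample mean, control group
obs.diff <- xbar.even - xbar.control     # observed test statistic

round(c(xbar.even, xbar.control, obs.diff), 2)
```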

## <a name="22test-step3">Step 3: How Extreme Is the Observed Difference?</a>

---

The p-value is the probability of getting a difference in sample means
between the even-split and self-pay groups as or more extreme than the
observed test statistic,
$\bar{x}_{\rm even} - \bar{x}_{\rm control} = 2.56$. In this case
(one-tailed test), more extreme implies a difference that is even bigger
than $\$2.56$.

$${\color{dodgerblue}{\mbox{p-value} = P(\bar{x}_{\rm even} - \bar{x}_{\rm control} \geq 2.56 \ | \ H_0 \mbox{ is true} )}}$$

If $H_0$ is true, then $\mu_{\rm even} - \mu_{\rm control} =0$. Under
this assumption, the center of the sampling distribution for the
difference in sample means would be $0$. However, we do not know how
much variability there is due to the sampling. **Is a difference in
sample means equal to $\$2.56$ a “big” difference, or is the difference
within the margins we would expect due to the uncertainty of sampling?**

> How can we calculate the p-value if we do not know the underlying
> probability distribution for the test statistic
> $T = \bar{x}_{\rm{even}} - \bar{x}_{\rm{control}}$ if $H_0$ is true?

## <a name="22q1">Question 1</a>

---

If people order the same amount of food no matter how the bill is split
(assuming $H_0$ is true), we assume each person would have ordered the
same amount of food regardless of the table they were seated at.
**Splitting by groups based on how the bill is split is no different
than if the eight values were just randomly split into two groups of
four people.**

If each person would have ordered the same regardless of which table
they were seated at, then another possible sample could have been:

| <font size=3>Even Split Group</font>  | <font size=3>Pay for what you order (control) Group</font>  |
|----------------------------------|-----------------------------|
| <font size=3>$\mathbf{\color{dodgerblue}{\$7.43}}$ , $\$8.00$, $\$8.75$, $\$13.17$</font> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | <font size=3>$\$8.50$ , $\$7.90$ , $\$10.85$, $\mathbf{\color{dodgerblue}{\$15.00}}$</font> |

### <a name="22q1a">Question 1a</a>

---

What would be the test statistic for the two samples above?

#### <a name="22sol1a">Solution to Question 1a</a>

---

In [None]:
# find the test statistic for another possible sample


<br>  
<br>

### <a name="22q1b">Question 1b</a>

---

How many different ways can we divide the eight participants into two
groups of four?

#### <a name="22sol1b">Solution to Question 1b</a>

---

In [None]:
# how many possible ways can we create two groups of 4


<br>  
<br>

# <a name="22two-samp-perm">Two-Sample Permutation Tests</a>

---

To perform a <font color="dodgerblue">**two-sample permutation
test**</font> on data collected from two samples of sizes $m$ and $n$:

1.  Pool the $m+n$ values together.
2.  Draw a <font color="dodgerblue">**permutation   resample**</font> of size $m$ (size of first sample), **without   replacement**.
3.  Use the remaining $n$ (size of second sample) observations for the   other sample.
4.  Calculate the difference in means or another statistic that compares the samples.
5.  Repeat the resampling process many times and construct the   distribution of test statistics.

<font color="dodgerblue">**The p-value is the proportion of times
the randomized statistics are as or more extreme than the observed
difference.**</font>


<figure>
<img
src="https://raw.githubusercontent.com/CU-Denver-MathStats-OER/Statistical-Theory/main/Images/22fig-meal-perm.png"
alt="Permutation resamples of the Unscrupulous Diners Dilemma Data" width = "95%"/>
<figcaption aria-hidden="true">
Permutation resamples of the Unscrupulous
Diners Dilemma Data
</figcaption>
</figure>
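
Because the diner data set is so small, every possible split can be listed rather than sampled. The sketch below is one possible way to do this, and it deliberately swaps the random resampling in Step 5 for a full enumeration using `combn()`. The names `pooled.diner`, `diff.means`, `exact.stats`, and `p.exact` are assumptions introduced for this illustration.

```r
# diner data pooled together; the first four values are the observed even-split group
pooled.diner <- c(15.00, 8.00, 8.75, 13.17, 8.50, 7.90, 10.85, 7.43)
m <- 4  # size of the first sample

# difference in means when the positions in idx form the first group
diff.means <- function(idx) mean(pooled.diner[idx]) - mean(pooled.diner[-idx])

obs.stat <- diff.means(1:m)  # statistic for the observed split

# each column of combn() is one way to choose the m positions for group 1
all.idx <- combn(length(pooled.diner), m)
exact.stats <- apply(all.idx, 2, diff.means)

# proportion of splits with a statistic at least as large as the observed one
# (the small tolerance guards against floating-point ties)
p.exact <- mean(exact.stats >= obs.stat - 1e-9)
p.exact
```

Full enumeration is only practical for very small samples. With larger samples the number of possible splits grows quickly, which is why the procedure above relies on many random resamples instead.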


# <a name="22melanoma">Case Study: Melanoma Lesion Thickness</a>

---

<figure>
<img
src="https://upload.wikimedia.org/wikipedia/commons/d/d0/Superficial_spreading_melanoma_in_situ_on_dermoscopy%2C_reflectance_confocal_microscopy_and_histopathology_2.jpg"
alt="Superficial spreading melanoma" width = "55%"/>
<figcaption aria-hidden="true">Credit: Braga, Macedo, Pinto, et al., <a
href="https://creativecommons.org/licenses/by/4.0">CC BY 4.0</a>, via
Wikimedia Commons
</figcaption>
</figure>


Skin is the largest organ in the human body. In the United States, skin
cancer is the most common cancer. Current estimates are that one in five
Americans will develop skin cancer in their lifetime<sup>3</sup>. There are
different types of skin cancers, and melanoma is one form of skin cancer
that is particularly dangerous if not detected early. However, if a skin
lesion is detected early, it can be surgically removed before it
spreads, and a patient generally has good long-term outcomes.

The data set `melanoma` in the `boot` package contains measurements from
a random sample of $205$ patients with malignant melanoma at the
University Hospital of Odense in Denmark. Each patient had a skin lesion
(or tumor) surgically removed and various attributes of the patient and
tumors are recorded. Run the code cell below to load the `boot` package
and summarize the variables in the `melanoma` data set.

<br>

<font size=2>3. American Academy of Dermatology, <https://www.aad.org/media/stats-skin-cancer>.</font>

In [None]:
library(boot)
summary(melanoma)

## <a name="22q2">Question 2</a>

---

Based on the `summary(melanoma)` output, give a possible statistical
question that could be analyzed using a hypothesis test. Which
variable(s) in the `melanoma` data set would be involved in your
analysis?

Run `?melanoma` to access the help documentation and learn more about
the data.

### <a name="22sol2">Solution to Question 2</a>

---

<br>  
<br>  
<br>

## <a name="22q3">Question 3</a>

---

Which variables in `melanoma` are categorical? Are those variables being
stored as categorical variables? If so, explain how you can tell. If
not, in the code cell below, convert the categorical variables to a
`factor`.

### <a name="22sol3">Solution to Question 3</a>

---

In [None]:
# if needed, convert each categorical variable to a factor


<br>  
<br>

## <a name="22create-pool">Creating a Pooled Sample</a>

---

One possible question we could ask: *“Is the mean tumor thickness
greater for all people with fatal melanoma compared to the mean tumor
thickness for all people that survive melanoma?”* The two variables of
interest are `status` and `thickness`.

-   `status`: The patient's status at the end of the study. `1` indicates that they had died from melanoma, `2` indicates that they were still alive, and `3` indicates that they had died from causes unrelated to their melanoma.
-   `thickness`: Tumor thickness in millimeters (mm).

For our study, we are comparing two independent populations:

-   People that have a melanoma tumor removed and survive. This is   `status` group `2`.
-   People that have a melanoma tumor removed and died from melanoma.   This is `status` group `1`.
-   The people in `status` group `3` we exclude from this analysis.

## <a name="22q3">Question 3</a>

---

Answer the questions below to state our hypotheses, organize our data,
and calculate a test statistic to help determine whether the mean tumor
thickness is greater for all people with fatal melanoma compared to the
mean tumor thickness for all people that survive melanoma.

### <a name="22q3a">Question 3a</a>

---

Create side-by-side box plots to display the distribution of tumor
thickness for each of the three `status` groups `1` (died from
melanoma), `2` (survived), and `3` (died from other causes).

#### <a name="22sol3a">Solution to Question 3a</a>

---

Fill in the `boxplot()` command to answer the question.

In [None]:
boxplot(??)

<br>  
<br>

### <a name="22q3b">Question 3b</a>

---

The function `tapply(data, index, function)` has three inputs:

-   The `data` is the data you want to summarize.
-   The `index` is a categorical feature that will split the data into   two or more different classes or factors.
-   The `function` is some function that you want to apply to the data.
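
As a quick illustration of the three inputs, here is a minimal sketch on hypothetical toy data (not the `melanoma` data). The vectors `toy.value` and `toy.group` are made up for this example.

```r
# hypothetical toy data: six measurements in two groups
toy.value <- c(2.1, 3.4, 1.8, 5.0, 4.2, 3.9)
toy.group <- c("a", "b", "a", "b", "b", "a")

# apply mean() separately to the values in each group
toy.means <- tapply(toy.value, toy.group, mean)
toy.means
```

The result has one entry per group, labeled by the group name.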

Interpret the output from the code cell below.

In [None]:
# run code and interpret output below
tapply(melanoma$thickness, melanoma$status, length)  # nothing to edit

#### <a name="22sol3b">Solution to Question 3b</a>

---

<br>  
<br>  
<br>

### <a name="22q3c">Question 3c</a>

---

We would like to use the data in `melanoma` to test whether the mean
tumor thickness is greater for all people with fatal melanoma compared
to the mean tumor thickness for all people that survive melanoma. State
the corresponding null and alternative hypotheses for this test. Be sure
to state each hypothesis both in words and using mathematical notation.

#### <a name="22sol3c">Solution to Question 3c</a>

---

-   $H_0$:

-   $H_a$:

<br>  
<br>

### <a name="22q3d">Question 3d</a>

---

Using the code cell below, subset the `melanoma` data to create three
different vectors:

1.  The vector `died` is the `thickness` values for `status` group 1.
2.  The vector `survived` is the `thickness` values for `status` group   2.
3.  The vector `pooled` is the `thickness` values for both `status`   groups 1 and 2 (excluding group 3).

The logical operator `==` means “is equal to” while the logical operator
`!=` means “is not equal to”.

The option `drop = TRUE` means the output will be a vector of numerical
values as opposed to a data frame that has a variable with the name
`thickness`. For example, if we want to calculate the sample mean for
the `pooled` data:

-   Since the data is stored as a vector, we use `mean(pooled)`. There   are no headers in vectors.
-   If the data is stored as a data frame, we do need to use headers and   have `mean(pooled$thickness)`.

The outputs of the code cell below are vectors, so we do not need the
`$var_name` convention when referring to the sample data, which helps
simplify the code a little.
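
The effect of `drop = TRUE` can be seen on a small example. The sketch below uses a hypothetical toy data frame named `toy` (an assumption for illustration, not the `melanoma` data) to compare the two kinds of output.

```r
# hypothetical toy data frame with a grouping column and a measurement column
toy <- data.frame(group = c(1, 1, 2, 3), value = c(0.5, 1.2, 3.3, 2.0))

# without drop = TRUE the result is a one-column data frame
as.df <- subset(toy, select = value, group != 3)

# with drop = TRUE the result is a plain numeric vector
as.vec <- subset(toy, select = value, group != 3, drop = TRUE)

class(as.df)         # data frame: refer to the column with as.df$value
class(as.vec)        # numeric vector: no column name needed
mean(as.df$value)
mean(as.vec)
```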

#### <a name="22sol3d">Solution to Question 3d</a>

---

Replace each of the 9 `??` with an appropriate value or variable name.

In [None]:
died <- subset(melanoma, select = ??, ?? == "??", drop = TRUE)
survived <- subset(melanoma, select = ??, ?? == "??", drop = TRUE)
pooled <- subset(melanoma, select = ??, ?? != "??", drop = TRUE)

<br>  
<br>

### <a name="22q3e">Question 3e</a>

---

Using the vectors `died`, `survived`, and `pooled` from [Question 3d](#22q3d),
calculate the sample size and the sample mean for each of the samples
`died`, `survived`, and `pooled`.

#### <a name="22sol3e">Solution to Question 3e</a>

---

In [None]:
n.died <- ??  # size of sample of melanoma deaths
n.survived <- ??  # size of sample of survivors
n.pooled <- ??  # size of both samples pooled together
xbar.died <- ??  # mean thickness of sample that died
xbar.survived <- ??  # mean thickness of sample that survived
xbar.pooled <- ??  # mean thickness of pooled sample

# print each result to screen
n.died
n.survived
n.pooled
xbar.died
xbar.survived
xbar.pooled

<br>  
<br>

### <a name="22q3f">Question 3f</a>

---

Based on the output in [Question 3e](#22q3e), calculate, store, and print the
test statistic for this test to the screen.

#### <a name="22sol3f">Solution to Question 3f</a>

---

In [None]:
test.stat <- ??  # compute test statistic
test.stat  # print test statistic to screen

<br>  
<br>

# <a name="22perm-dist">Constructing a Permutation Distribution</a>

---

## <a name="22perm-step1">Step 1: Create a Vector of Pooled Data</a>

---

See the code used to create `pooled` in [Question 3d](#22q3d).

## <a name="22perm-step2">Step 2: Create Resamples for Each Treatment Group</a>

---

For this step, it is important to note the size of each original sample.
From [Question 3e](#22q3e) we know

-   The sample `died` consists of 57 observations.
-   The sample `survived` consists of 134 observations.
-   The `pooled` sample consists of $57 + 134= 191$ observations.

#### <a name="22step2-index">Create an Index Vector</a>

---

We first create a vector called `index` that selects (without
replacement) 57 random integers out of the integers
$1, 2, \ldots , 191$.

In [None]:
set.seed(3021)  # fix the randomization seeding
index <- sample(191, size = 57, replace = FALSE)  # these are the 57 observations chosen for resample 1

The command `head(index)` shows the first 6 values in the vector
`index`.

In [None]:
head(index)

The first `index` value 55 means observation 55 from the `pooled` vector
is the first thickness value randomly assigned to the `died` sample.
From the code cell below, we see the 55th observation in `pooled` is a
thickness of $0.81$ mm.

In [None]:
pooled[55]

The next `index` value 164 means observation 164 from the `pooled`
vector is the second thickness value randomly assigned to the `died`
sample. From the code cell below, we see the 164th observation in
`pooled` is a thickness of $0.65$ mm.

In [None]:
pooled[164]

#### <a name="22step2-resample1">Use `index` to Select Resample of Group 1</a>

---

The `index` vector is a vector of integers that tells us which values in
`pooled` are randomly assigned to the `died` resample. However, `index`
does not contain the corresponding thicknesses of the selected
observations. The vector `pooled[index]` will contain the tumor
thicknesses for all 57 randomly selected observations picked in `index`.

In [None]:
died.resample <- pooled[index]

#### <a name="22step2-resample2">Use `-index` to Select the Remaining Values for Resample Group 2</a>

---

-   The vector `index` consists of 57 randomly selected integers out of   the integers $1, 2, \ldots , 191$.
-   Indexing with `-index` excludes those 57 positions, leaving the remaining 134 positions that were not selected for `index`.
-   The vector `pooled[-index]` will contain thicknesses for the   remaining 134 observations in resample group 2.

In [None]:
survived.resample <- pooled[-index]
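
To see how positive and negative indexing split a pooled vector into two groups, consider the following sketch on a hypothetical pooled vector of five values. The names `toy.pooled`, `toy.index`, `group1`, and `group2`, and the seed, are arbitrary choices for this illustration.

```r
# hypothetical pooled vector of 5 values; assign 2 of them to the first group
toy.pooled <- c(10, 20, 30, 40, 50)

set.seed(1)  # arbitrary seed so the sketch is reproducible
toy.index <- sample(length(toy.pooled), size = 2, replace = FALSE)

group1 <- toy.pooled[toy.index]    # values at the selected positions
group2 <- toy.pooled[-toy.index]   # negative index drops those positions

# together the two groups use every pooled value exactly once
sort(c(group1, group2))
```

No value is lost or duplicated: the resample only changes which group each value lands in.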

## <a name="22perm-step3">Step 3: Calculate the Test Statistic for the Resamples</a>

---

Calculate the difference in the sample means between the two resamples.

In [None]:
perm.stat <- mean(died.resample) - mean(survived.resample)
perm.stat

## <a name="22perm-step4">Step 4: Repeat this Many Times to Construct a Permutation Distribution</a>

---

-   In practice, it takes a lot of time and energy (and money) to   generate all possible resamples (without any duplicate resamples).
    -   Instead we'll generate a lot of resamples rather than all       possible resamples.
-   We'll use $N=10^5-1=99,\!999$ as the default number of resamples.   <font color="tomato">**Why use 99,999 resamples?**</font>
    -   We may not generate the original sample as one of the resamples.
    -   We want to be sure that we do include the original sample when       we calculate the p-value.
    -   We will add the original sample back in with the $99,\!999$       resamples giving $100,\!000$ samples.
-   The resulting distribution of test statistics of the permutation   resamples is called a <font color="dodgerblue">**permutation   distribution**</font>.
-   We use the permutation distribution as an estimate for the null   distribution to compute the p-value.

## <a name="22q4">Question 4</a>

---

Complete the first code cell below to create a permutation distribution
for the difference in sample mean tumor thickness. Be sure you have
already created the vector `pooled` in [Question 3d](#22q3d) and stored
`test.stat` in [Question 3f](#22q3f).

After generating a permutation distribution, run the second code cell
below to create a histogram to display the distribution of resample
statistics along with a red vertical line through the observed test
statistic. There is nothing to edit in the second code cell.

### <a name="22sol4">Solution to Question 4</a>

---

In [None]:
##########################################
# save all 99,999 permutation resample
# statistics to the vector perm.stat
##########################################
N <- 10^5 - 1
perm.stat <- numeric(N)

for (i in 1:N)
{
  index <- sample(??, size = ??, replace = ??)  # create index vector
  x.died <- ??  # use index to select died resample
  x.survived <- ??  # the rest of the values go to survived resample
  perm.stat[i] <- ??  # calc difference in sample means
}

In [None]:
##################################################
# plot permutation distribution as a histogram
# mark observed test stat with red vertical line
##################################################
hist(perm.stat, xlab = "xbar.died - xbar.survived",
     main = "Permutation Distribution",
     cex.lab=1.5, cex.axis=1.5, cex.main=1.5)  # increase font size on labels
abline(v = test.stat, col = "red")

# <a name="22p-value">p-values with Permutation Distributions</a>

---

We use the permutation distribution as an estimate for the null
distribution. The p-value is the proportion of resampled test statistics
that are as or more extreme than the observed test statistic. We can use
a logical test to help compute this proportion.
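
As a small illustration of how a logical test yields a count or a proportion, here is a sketch with hypothetical values. The vector `toy.stats` and the value `toy.obs` are made up and are not resamples from the `melanoma` data.

```r
# hypothetical resampled statistics and a hypothetical observed statistic
toy.stats <- c(-1.2, 0.4, 2.5, 0.9, 3.1, -0.3, 1.7, 2.5)
toy.obs <- 2.5

toy.stats >= toy.obs        # logical vector: TRUE where as or more extreme
sum(toy.stats >= toy.obs)   # TRUE counts as 1, so this counts the TRUEs
mean(toy.stats >= toy.obs)  # proportion of TRUEs
```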

## <a name="22q5">Question 5</a>

---

How likely is it to get resamples with a difference in means as or more
extreme than the observed test statistic? Explain what the code below is
doing in practical terms.

In [None]:
p.value <- sum(perm.stat >= test.stat)/N
p.value

### <a name="22sol5">Solution to Question 5</a>

---

Interpret the code cell above.

<br>  
<br>  
<br>

## <a name="22q6">Question 6</a>

---

The calculation from [Question 5](#22q5) used the $N=10^5-1=99,\!999$
permutation resamples, but recall we want to be sure to include the
original, observed sample when computing the p-value. Explain how the
code cell below accomplishes this goal.

In [None]:
p.value <- (sum(perm.stat >= test.stat) + 1) / (N + 1)
p.value

### <a name="22sol6">Solution to Question 6</a>

---

<br>  
<br>  
<br>

## <a name="22q7">Question 7</a>

---

Interpret the practical meaning of the p-value from [Question 6](#22q6) to a
person who is not very familiar with statistics.

### <a name="22sol7">Solution to Question 7</a>

---

<br>  
<br>  
<br>

# <a name="22perm-var">Permutation Test for a Difference in Variances</a>

---

## <a name="22q8">Question 8</a>

---

Ulceration is a breakdown of the skin over the melanoma tumor. Using the
data set `melanoma` from the `boot` package, perform a permutation test
to see if the variance of the tumor thickness for ulcerated tumors is
different from the variance of the tumor thickness for non-ulcerated
tumors.

The variable `ulcer` indicates whether the removed tumor was ulcerated
(`ulcer` group `1`) or not ulcerated (`ulcer` group `0`).

### <a name="22q8a">Question 8a</a>

---

Write out the null and alternative hypotheses using appropriate
notation.

#### <a name="22sol8a">Solution to Question 8a</a>

---

-   $H_0$:

-   $H_a$:

<br>  
<br>  
<br>

### <a name="22q8b">Question 8b</a>

---

What can we use as the test statistic? What is the value of the observed
test statistic?

#### <a name="22sol8b">Solution to Question 8b</a>

---

In [None]:
test.stat2 <- ??  # compute observed test statistic
test.stat2  # print output to screen

<br>  
<br>

### <a name="22q8c">Question 8c</a>

---

Create a permutation distribution for the difference in sample
variances.

#### <a name="22sol8c">Solution to Question 8c</a>

---

In [None]:
# nothing to edit in this cell
pooled2 <- melanoma$thickness  # create pooled vector of all tumor thicknesses

In [None]:
# Save resamples to vector called result
N <- 10^5 - 1
result <- numeric(N)

# Create permutation distribution
for (i in 1:N)
{
  index <- sample(??, size = ??, replace = ??)
  result[i] <- ??
}

# Display permutation distribution and observed sample diff
hist(result, xlab = "diff in sample variances",
     main = "Permutation Distribution",
     cex.lab=1.5, cex.axis=1.5, cex.main=1.5)  # increase font size on labels
abline(v = c(-test.stat2, test.stat2), col = c("blue", "red"))

<br>  
<br>

### <a name="22q8d">Question 8d</a>

---

Calculate the p-value of the observed test statistic and interpret its
meaning in practical terms.

#### <a name="22sol8d">Solution to Question 8d</a>

---

In [None]:
# compute the p-value


<br>  
<br>  
<br>

# <a name="22perm-prop">Permutation Test for a Difference in Proportions</a>

---

## <a name="22q9">Question 9</a>

---

Is the proportion of females with ulcerated tumors less than the
proportion of males with ulcerated tumors?

### <a name="22q9a">Question 9a</a>

---

Write out the null and alternative hypotheses using appropriate
notation.

#### <a name="22sol9a">Solution to Question 9a</a>

---

-   $H_0$:

-   $H_a$:

<br>  
<br>  
<br>

### <a name="22q9b">Question 9b</a>

---

What can we use as the test statistic? What is the value of the observed
test statistic?

#### <a name="22sol9b">Solution to Question 9b</a>

---

In [None]:
# original ulceration data for female sample
female <- subset(melanoma, select = "ulcer", sex == "0", drop = TRUE)

# original ulceration data for male sample
male <- subset(melanoma, select = "ulcer", sex == "1", drop = TRUE)

# original ulceration data for both samples pooled together
pooled.sex <- melanoma$ulcer

# enter a formula to compute the test statistic
test.diff.prop <- ??

<br>  
<br>

### <a name="22q9c">Question 9c</a>

---

Create a permutation distribution for the difference in sample
proportions.

#### <a name="22sol9c">Solution to Question 9c</a>

---

Complete the code cell below.

In [None]:
N <- 10^5 - 1
result.prop <- numeric(N)

for (i in 1:N)
{
  index <- sample(??, size = ??, replace = ??)
  result.prop[i] <- ??
}

hist(result.prop, xlab = "phat1-phat2",
     main = "Permutation Distribution",
     cex.lab=1.5, cex.axis=1.5, cex.main=1.5)  # increase font size on labels
abline(v = test.diff.prop, col = "red")

<br>  
<br>

### <a name="22q9d">Question 9d</a>

---

Calculate the p-value of the observed test statistic and interpret its
meaning in practical terms.

#### <a name="22sol9d">Solution to Question 9d</a>

---

In [None]:
# compute the p-value


<br>  
<br>  
<br>

# <a name="22perm-match">Permutation Test for Matched Pairs</a>

---

## <a name="22q10">Question 10</a>

---

In this example, we use data collected from a matched-pairs study
to determine whether smoking during pregnancy is associated with
lower birth weight. In our study, we solicit volunteers that have
already given birth to two babies. During one of the pregnancies, the
parent smoked. During the other pregnancy, they did not smoke. Below is
hypothetical data from such a study. A sample of $n=10$ people volunteer
to share their data with the researchers, which gives 10 different
pairs of birth weights (in grams) summarized in the table below.

|            |  <font size=3>1</font>    | <font size=3>2</font>    | <font size=3>3</font>    | <font size=3>4</font>    | <font size=3>5</font>    | <font size=3>6</font>    | <font size=3>7</font>    | <font size=3>8</font>    | <font size=3>9</font>    | <font size=3>10</font>  |
|------------|------|------|------|------|------|------|------|------|------|------|
| <font size=3>No Smoking</font> | <font size=3>2750</font> | <font size=3>2920</font> | <font size=3>3860</font> | <font size=3>3402</font> | <font size=3>2282</font> | <font size=3>3790</font> | <font size=3>3586</font> | <font size=3>3487</font> | <font size=3>2920</font> | <font size=3>2835</font> |
| <font size=3>Smoked</font>    | <font size=3>1790</font> | <font size=3>2381</font> | <font size=3>3940</font> | <font size=3>3317</font> | <font size=3>2125</font> | <font size=3>2665</font> | <font size=3>3572</font> | <font size=3>3156</font> | <font size=3>2721</font> | <font size=3>2225</font> |

### <a name="22q10a">Question 10a</a>

---

Researchers are testing whether a baby born to a parent that smokes
while pregnant weighs less at birth, on average, than a baby born to
the same parent when they do not smoke while pregnant. Write out
the null and alternative hypotheses using appropriate notation.

#### <a name="22sol10a">Solution to Question 10a</a>

---

-   $H_0$:

-   $H_a$:

<br>  
<br>  
<br>

### <a name="22q10b">Question 10b</a>

---

What can we use as the test statistic? What is the value of the observed
test statistic?

#### <a name="22sol10b">Solution to Question 10b</a>

---

In [None]:
# data from study
no <- c(2750, 2920, 3860, 3402, 2282,
        3790, 3586, 3487, 2920, 2835)  # non-smoking birth weights

smoker <- c(1790, 2381, 3940, 3317, 2125,
            2665, 3572, 3156, 2721, 2225)  # matching smoking birth weight

diff <- no - smoker  # differences between matched pairs

# Calculate the observed test statistic


<br>  
<br>

### <a name="22q10c">Question 10c</a>

---

When we construct a permutation distribution for the sample mean
difference between matched pairs, we want to be sure the resampling we
use **preserves each pairing**.

-   We do not randomize how the pairs are formed.
-   Each pair of values should remain paired after resampling.
-   Instead, we randomly assign values in each pair to the smoker and   non-smoker positions.

For example, our original sample of matched pair differences is

|            |  <font size=3>1</font>    | <font size=3>2</font>    | <font size=3>3</font>    | <font size=3>4</font>    | <font size=3>5</font>    | <font size=3>6</font>    | <font size=3>7</font>    | <font size=3>8</font>    | <font size=3>9</font>    | <font size=3>10</font>  |
|------------|------|------|------|------|------|------|------|------|------|------|
| <font size=3>No Smoking</font> | <font size=3>2750</font> | <font size=3>2920</font> | <font size=3>3860</font> | <font size=3>3402</font> | <font size=3>2282</font> | <font size=3>3790</font> | <font size=3>3586</font> | <font size=3>3487</font> | <font size=3>2920</font> | <font size=3>2835</font> |
| <font size=3>Smoked</font>    | <font size=3>1790</font> | <font size=3>2381</font> | <font size=3>3940</font> | <font size=3>3317</font> | <font size=3>2125</font> | <font size=3>2665</font> | <font size=3>3572</font> | <font size=3>3156</font> | <font size=3>2721</font> | <font size=3>2225</font> |
| <font size=3>Difference</font> | <font size=3>960</font>  | <font size=3>539</font> | <font size=3>-80</font>  | <font size=3>85</font>   | <font size=3>157</font>  | <font size=3>1125</font> | <font size=3>14</font>   | <font size=3>331</font>  | <font size=3>199</font>  | <font size=3>610</font>  |

One possible resample is given below.

|            |  <font size=3>1</font>    | <font size=3>2</font>    | <font size=3>3</font>    | <font size=3>4</font>    | <font size=3>5</font>    | <font size=3>6</font>    | <font size=3>7</font>    | <font size=3>8</font>    | <font size=3>9</font>    | <font size=3>10</font>  |
|------------|------|------|------|------|------|------|------|------|------|------|
| <font size=3>No Smoking <br> Resample</font> | <font color="tomato" size=3>1790</font> | <font size=3>2920</font> | <font size=3>3860</font> | <font color="tomato" size=3>3317</font> | <font size=3>2282</font> | <font color="tomato" size=3>2665</font> | <font size=3>3586</font> | <font size=3>3487</font> | <font size=3>2920</font> | <font size=3>2835</font> |
| <font size=3>Smoked <br> Resample</font>    | <font color="tomato" size=3>2750</font> | <font size=3>2381</font> | <font size=3>3940</font> | <font color="tomato" size=3>3402</font> | <font size=3>2125</font> | <font color="tomato" size=3>3790</font> | <font size=3>3572</font> | <font size=3>3156</font> | <font size=3>2721</font> | <font size=3>2225</font> |
| <font size=3>Difference <br> Resample</font> | <font color="tomato" size=3>-960</font>  | <font size=3>539</font> | <font size=3>-80</font>  | <font color="tomato" size=3>-85</font>   | <font size=3>157</font>  | <font color="tomato" size=3>-1125</font> | <font size=3>14</font>   | <font size=3>331</font>  | <font size=3>199</font>  | <font size=3>610</font>  |
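
The resample above swaps the two values in pairs 1, 4, and 6. As a minimal sketch, the code below checks that swapping the values within a pair gives the same result as flipping the sign of that pair's difference. The vectors are copied from the tables above; the helper names `swap`, `flip`, `diff.swap`, and `diff.flip` are introduced here for illustration only.

```r
# Birth weights copied from the tables above
no     <- c(2750, 2920, 3860, 3402, 2282, 3790, 3586, 3487, 2920, 2835)
smoker <- c(1790, 2381, 3940, 3317, 2125, 2665, 3572, 3156, 2721, 2225)
diff   <- no - smoker

# Pairs whose two values trade places in the resample shown above
swap <- c(1, 4, 6)

# Route 1: physically swap the two values within those pairs
no.resample     <- no
smoker.resample <- smoker
no.resample[swap]     <- smoker[swap]
smoker.resample[swap] <- no[swap]
diff.swap <- no.resample - smoker.resample

# Route 2: leave the values alone and flip the sign of those differences
flip <- rep(1, length(diff))
flip[swap] <- -1
diff.flip <- flip * diff

all(diff.swap == diff.flip)  # both routes give the same resampled differences
```

Because the two routes agree, a resample never needs to move the original values around. It is enough to work with the vector of differences.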


To create a permutation distribution for the sample mean difference
between matched pairs, <font color="dodgerblue">**we randomly
choose a sign (positive or negative) for each observed matched-pair
difference.**</font> Complete the code cell below to generate a
permutation distribution for the sample mean difference between matched
pairs.

#### <a name="22sol10c">Solution to Question 10c</a>

---

Complete and run the code cell below to create a permutation
distribution.

In [None]:
N <- 10^5 - 1  # number of resamples
perm.match <- numeric(N)  # stores the mean difference from each resample

# For each pair, randomly assign the difference to be positive or negative,
# then calculate the new mean of the paired differences
for (i in 1:N)
{
  sign <- sample(c(-1, 1), size = ??, replace = ??)  # randomly choose a sign, -1 or 1, for each pair
  diff.resample <- sign * diff
  perm.match[i] <- ??
}
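
The loop above draws sign assignments at random. When the number of pairs is small, every sign assignment can be listed instead, which may help clarify what the loop is sampling from. The sketch below uses three hypothetical differences stored in `d.toy` (not the birth weight data), and the names `signs` and `perm.toy` are assumptions made for this illustration.

```r
# Hypothetical matched-pair differences (not the birth weight data)
d.toy <- c(4, -1, 3)

# Every possible way to attach a sign of -1 or 1 to each of the three differences
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), length(d.toy))))
nrow(signs)  # 2^3 = 8 sign assignments

# Apply each row of signs to the differences and take the mean
perm.toy <- apply(signs, 1, function(s) mean(s * d.toy))
sort(perm.toy)
```

With 10 matched pairs there are $2^{10} = 1024$ possible sign assignments, and each pass through the loop picks one of them at random.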

There is nothing to edit in the code cell below. Run it to plot the
permutation distribution and the observed test statistic.

In [None]:
# create a histogram of the permutation distribution
# and add a vertical line at the observed test statistic
hist(perm.match,  xlab = "xbar-diff",
     main = "Permutation Distribution",
     cex.lab=1.5, cex.axis=1.5, cex.main=1.5)  # increase font size on labels
abline(v = test.match, col ="red")

<br>  
<br>

### <a name="22q10d">Question 10d</a>

---

Calculate the p-value of the observed test statistic and interpret its
meaning in practical terms.

#### <a name="22sol10d">Solution to Question 10d</a>

---

In [None]:
# compute the p-value


<br>  
<br>

## <a name="22q11">Question 11</a>

---

**Is there a difference in the price of groceries sold by Target and
Walmart?** The data set `Groceries` in the `resampledata` package
contains a sample of $n=24$ different grocery items, each with a pair of
prices (the price at Target and the price at Walmart) advertised on the
stores' respective websites on a specific day.

-   First we load the `resampledata` package.

In [None]:
library(resampledata)  # load resampledata package

-   Then we print the first six rows of the `Groceries` data.
-   Notice this is matched pairs data!

If you receive an error when running the code cell below, you may not
have the `resampledata` package installed. From the R console, run the
command `install.packages("resampledata")` to install the package. Then
run the `library(resampledata)` command in the code cell above again,
and try running the code cell below again.

In [None]:
head(Groceries)

-   Next, we save the corresponding Target and Walmart prices to separate vectors `target` and `walmart`.
-   The first six values in each vector are printed to the screen.
-   Notice the ordering of the values in each vector is very important to preserve.

In [None]:
target <- Groceries$Target
walmart <- Groceries$Walmart
head(target)
head(walmart)
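
To see why the ordering matters, here is a small sketch with hypothetical prices for four items (the vectors `store.a` and `store.b` are made up for illustration and are not taken from `Groceries`). Reversing one vector breaks the pairing: the mean difference is unchanged, but the individual differences are not.

```r
# Hypothetical prices for four items (not taken from Groceries)
store.a <- c(2.50, 4.00, 1.25, 6.75)
store.b <- c(2.40, 4.10, 1.00, 6.50)

# Correct pairing: item i at store A is compared with item i at store B
d.paired <- store.a - store.b

# Broken pairing: reversing one vector compares different items with each other
d.broken <- store.a - rev(store.b)

mean(d.paired)  # the mean difference is the same either way ...
mean(d.broken)
sd(d.paired)    # ... but the individual differences and their spread are not
sd(d.broken)
```

A permutation distribution for matched pairs is built from the individual differences, so scrambling the order of either vector would change that distribution even though the observed mean difference stays the same.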

Using the sample data stored in `target` and `walmart`, answer the
questions below to perform a permutation test.

1.  Set up hypotheses to test whether there is a difference in the price of groceries sold by Target and Walmart.

2.  What is the observed test statistic?

3.  Create a permutation distribution for the sample mean difference between matched pairs.

4.  Calculate the p-value.

5.  Interpret the meaning of the p-value.

### <a name="22sol11">Solution to Question 11</a>

---

<br>  
<br>  
<br>


# <a name="22CC License">Creative Commons License Information</a>

---

![Creative Commons
License](https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png)

*Statistical Methods: Exploring the Uncertain* by [Adam
Spiegler (University of Colorado Denver)](https://github.com/CU-Denver-MathStats-OER/Statistical-Theory)
is licensed under a [Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International
License](http://creativecommons.org/licenses/by-nc-sa/4.0/). This work is funded by an [Institutional OER Grant from the Colorado Department of Higher Education (CDHE)](https://cdhe.colorado.gov/educators/administration/institutional-groups/open-educational-resources-in-colorado).

For similar interactive OER materials in other courses funded by this project in the Department of Mathematical and Statistical Sciences at the University of Colorado Denver, visit <https://github.com/CU-Denver-MathStats-OER>.