In [2]:
suppressPackageStartupMessages(library(rstanarm))
suppressPackageStartupMessages(library(ggformula))
library(tibble)
suppressPackageStartupMessages(library(glue))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(modelr))
suppressPackageStartupMessages(library(rstan))
library(stringr)
library(tidyr)

In [3]:
# Set the maximum number of columns and rows to display
options(repr.matrix.max.cols=150, repr.matrix.max.rows=200)
# Set the default plot size
options(repr.plot.width=18, repr.plot.height=12)
rstan_options(auto_write=TRUE)
options(mc.cores = parallel::detectCores())

In [4]:
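# Download `url` to `filename` unless the file already exists,
# creating the parent directory if necessary.
download_if_missing <- function(filename, url) {
    if (!file.exists(filename)) {
        dir.create(dirname(filename), showWarnings=FALSE, recursive=TRUE)
        download.file(url, destfile = filename, method="curl")
    }
}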

# Sample size calculations for estimating a proportion

## Sample size for a main effect

How large a sample survey would be required to estimate, to within a standard error of ±3%, the proportion of the U.S. population who support the death penalty?
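
One way to set up the calculation, as a minimal sketch: it assumes simple random sampling and the conservative value $p = 0.5$, which the exercise does not supply. The helper name `n_for_proportion` is hypothetical.

```r
# Smallest n such that sqrt(p * (1 - p) / n) <= se_target.
# p = 0.5 is an assumed worst-case proportion, not a value from the exercise.
n_for_proportion <- function(p, se_target) {
    ceiling(p * (1 - p) / se_target^2)
}

n_for_proportion(p = 0.5, se_target = 0.03)
```

Any other value of $p$ gives a smaller required sample, since $p(1-p)$ is largest at 0.5.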

## Sample size for an interaction

About 14% of the U.S. population is Latino.
How large would a national sample of Americans have to be in order to estimate, to within a standard error of ±3%, the proportion of Latinos in the United States who support the death penalty?
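
A possible sketch of the scaling argument. It assumes simple random sampling, support among Latinos near the worst case $p = 0.5$, and that about 14% of any national sample will be Latino. All variable names are illustrative.

```r
# Assumptions (not given in the exercise): worst-case p = 0.5 and an expected
# Latino share of 14% in the national sample.
p_support <- 0.5
se_target <- 0.03
latino_share <- 0.14

n_latino <- p_support * (1 - p_support) / se_target^2  # Latino respondents needed
n_national <- ceiling(n_latino / latino_share)         # national sample needed
n_national
```

The national sample has to be roughly seven times the subgroup sample, because only about one respondent in seven belongs to the subgroup.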

## Sample size with increasing precision

How large would a national sample of Americans have to be in order to estimate, to within a standard error of ±1%, the proportion who are Latino?
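
Here the proportion itself is roughly known, so a sketch can plug in $p = 0.14$ rather than the worst case. Simple random sampling is assumed.

```r
# Sample size so that sqrt(p * (1 - p) / n) is about 0.01 when p = 0.14.
p_latino <- 0.14
se_target <- 0.01

n_needed <- p_latino * (1 - p_latino) / se_target^2
n_needed
```

Under these assumptions the answer is on the order of 1,200 respondents.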

# Sample size calculation for estimating a difference

Consider an election with two major candidates, A and B, and a minor candidate, C, who are believed to have support of approximately 45%, 35%, and 20% in the population.
A poll is conducted with the goal of estimating the difference in support between candidates A and B.
How large a sample is needed to estimate this difference to within a standard error of 5 percentage points?
(Hint: consider an outcome variable that is coded as +1, -1, and 0 for supporters of A, B, and C, respectively.)
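
Following the hint, a minimal sketch: the mean of the coded variable is the difference in support, and its variance determines the sample size. Simple random sampling is assumed, and the variable names are illustrative.

```r
# Code each respondent as +1 (A), -1 (B), or 0 (C); the mean of this variable
# is the difference in support between A and B.
support <- c(A = 0.45, B = 0.35, C = 0.20)
value <- c(A = 1, B = -1, C = 0)

mean_diff <- sum(support * value)                 # 0.45 - 0.35
var_diff <- sum(support * value^2) - mean_diff^2  # E[y^2] - (E[y])^2
se_target <- 0.05

n_needed <- var_diff / se_target^2
n_needed
```

With these inputs the required sample is a little over 300 respondents.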

# Power

Following Figure 16.3, determine the power (the probability of getting an estimate that is "statistically significantly" different from zero at the 5% level) of a study where the true effect size is X standard errors from zero.
Answer for the following values of X: 0, 1, 2, and 3.
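
A sketch of the calculation, assuming the estimate is normally distributed with known standard error and a two-sided test. The function name `power_at` is hypothetical.

```r
# Power of a two-sided test at the 5% level when the true effect is
# x standard errors from zero: P(|estimate / se| > z_crit).
power_at <- function(x, alpha = 0.05) {
    z_crit <- qnorm(1 - alpha / 2)
    pnorm(-z_crit - x) + (1 - pnorm(z_crit - x))
}

x <- c(0, 1, 2, 3)
data.frame(x = x, power = round(power_at(x), 3))
```

At X = 0 the "power" is just the 5% false-positive rate. It passes 50% near X = 2 and reaches roughly 85% at X = 3.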

# Power, type M error, and type S error

Consider the experiment shown in Figure 16.1 where the true effect could not realistically be more than 2 percentage points and it is estimated with a standard error of 8.1 percentage points.

## Power at 2% effect size

Assuming the estimate is unbiased and normally distributed and the true effect size is 2 percentage points, use simulation to answer the following questions:

* What is the power of this study?
* What is the type M error rate?
* What is the type S error rate?
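
The three quantities can be approximated by simulation, as in the sketch below. It treats the type M error as the exaggeration ratio (the average absolute size of significant estimates relative to the true effect). The seed and the number of simulations are arbitrary choices.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
true_effect <- 2            # percentage points
se <- 8.1                   # percentage points
n_sims <- 1e6

estimate <- rnorm(n_sims, mean = true_effect, sd = se)
significant <- abs(estimate) > qnorm(0.975) * se

power <- mean(significant)
# Type S: among significant estimates, the share with the wrong sign.
type_s <- mean(estimate[significant] < 0)
# Type M (exaggeration ratio): among significant estimates, the average
# absolute estimate relative to the true effect.
type_m <- mean(abs(estimate[significant])) / true_effect

c(power = power, type_s = type_s, type_m = type_m)
```

Power is barely above the 5% false-positive rate, so any estimate that clears the significance threshold must be many times larger than the true effect.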

## Power

Assuming the estimate is unbiased and normally distributed and the true effect size is *no more than* 2 percentage points in absolute value, what can you say about the power, type M error rate, and type S error rate?

# Design analysis for an experiment

You conduct an experiment in which half the people get a special get-out-the-vote message and others do not. Then you follow up after the election with a random sample of 500 people to see if they voted.

## Estimating standard error

What will be the standard error of your estimate of effect size? Figure this out making reasonable assumptions about voter turnout and the true effect size.

## Impact of assumptions

Check how sensitive your standard error calculation is to your assumptions.
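
One way to check sensitivity is to tabulate the standard error over a grid of assumptions. The sketch below assumes the 500 people followed up split evenly between treatment and control. The grid values are hypothetical, and `se_diff` is an illustrative helper.

```r
# Standard error of a difference in proportions between two groups.
se_diff <- function(p_control, effect, n_total = 500) {
    n_group <- n_total / 2          # assumes an even split of the 500 people
    p_treated <- p_control + effect
    sqrt(p_control * (1 - p_control) / n_group +
         p_treated * (1 - p_treated) / n_group)
}

# Hypothetical grid of assumptions about control turnout and true effect.
grid <- expand.grid(p_control = c(0.3, 0.5, 0.7), effect = c(0, 0.02, 0.05))
grid$se <- round(se_diff(grid$p_control, grid$effect), 4)
grid
```

Across this grid the standard error stays between about 4 and 4.5 percentage points, so the answer is not very sensitive to the turnout or effect-size assumptions.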

## Conducting research

For a range of plausible effect sizes, consider conclusions from this study, in light of the statistical significance filter. As a researcher, how can you avoid this problem?

# Design analysis with pre-treatment information

A new teaching method is hoped to increase scores by 5 points on a certain standardized test.
An experiment is performed on $n$ students, where half get this intervention and half get the control.
Suppose that the standard deviation of test scores in the population is 20 points.
Further suppose that a pre-test is available which has a correlation of 0.8 with the post-test under the control condition.
What will be the standard error of the estimated treatment effect based on a fitted regression, assuming that the treatment effect is constant and independent of the value of the pre-test?
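
A sketch of the derivation, assuming the residual standard deviation after adjusting for the pre-test is $\sigma\sqrt{1-\rho^2}$ in both groups, with $\sigma = 20$ and $\rho = 0.8$, and ignoring the small correction for estimating the pre-test coefficient:

$$
\text{se}
= \sigma\sqrt{1-\rho^2}\,\sqrt{\frac{1}{n/2}+\frac{1}{n/2}}
= \frac{2\,\sigma\sqrt{1-\rho^2}}{\sqrt{n}}
= \frac{2 \cdot 20 \cdot 0.6}{\sqrt{n}}
= \frac{24}{\sqrt{n}}
$$

Without the pre-test, the corresponding standard error would be $2\sigma/\sqrt{n} = 40/\sqrt{n}$.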

# Decline effect

After a study is published on the effect of some treatment or intervention, it is common for the estimated effect in future studies to be lower.
Give five reasons why you might expect this to happen.

# Effect size and sample size

Consider a toxin that can be tested on animals at different doses.
Suppose a typical exposure level for humans is 1 (in some units), and at this level the toxin is hypothesized to introduce a risk of 0.01% of death per person.

## Sample with effect size

Consider different animal studies, each time assuming a linear dose-response relation (that is, 0.01% risk of death per animal per unit of the toxin), with doses of 1, 100, and 10&thinsp;000.
At each of these exposure levels, what is the sample size needed to have 80% power of detecting the effect?

## Sample with logarithmic response

This time assume that response is a logged function of dose and redo the calculations.

# Cluster sampling with equal-sized clusters

A survey is being planned with the goal of interviewing $n$ people in some number $J$ of clusters.
For simplicity, assume simple random sampling of clusters and a simple random sample of size $n/J$ (appropriately rounded) within each sampled cluster.

Consider inferences for the proportion of Yes responses in the population for some question of interest.
The estimate will simply be the average response for the $n$ people in the sample.
Suppose that the true proportion of Yes responses is not too far from 0.5 and that the standard deviation among the mean responses of clusters is 0.1.

## Sample error with cluster size

Suppose the total sample size is $n=1000$.
What is the standard error for the sample average if $J=1000$?
What if $J = 100, 10, 1$?
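
A minimal sketch of the variance decomposition: the total variance of a 0/1 response near $p = 0.5$ is split into a between-cluster part ($0.1^2$) and a within-cluster remainder. Finite-population corrections are ignored, and the helper name `cluster_se` is hypothetical.

```r
# Standard error of the sample mean under two-stage sampling with J equal-sized
# clusters and n respondents in total (finite-population corrections ignored).
cluster_se <- function(n, J, sd_between = 0.1, p = 0.5) {
    var_total <- p * (1 - p)                  # variance of a single 0/1 response
    var_within <- var_total - sd_between^2    # remaining within-cluster variance
    sqrt(sd_between^2 / J + var_within / n)
}

J <- c(1000, 100, 10, 1)
data.frame(J = J, se = round(cluster_se(n = 1000, J = J), 4))
```

With $J = n$ the design reduces to simple random sampling. As $J$ shrinks, the between-cluster term $0.1^2/J$ dominates and the standard error approaches 0.1.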

## Optimal number of clusters

Suppose the cost of the survey is \$50 per interview, plus \$500 per cluster.
Further suppose that the goal is to estimate the proportion of Yes responses in the population with a standard error of no more than 2%.
What values of $n$ and $J$ will achieve this at the lowest cost?
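
One way to search for the cheapest design is a grid over $J$, as sketched below. It assumes $p = 0.5$, a between-cluster standard deviation of 0.1, no finite-population corrections, and it ignores the requirement that $n/J$ be a whole number.

```r
sd_between <- 0.1
var_within <- 0.5 * (1 - 0.5) - sd_between^2   # assumes p = 0.5
se_target <- 0.02

# For each candidate J, find the smallest n meeting the target, then its cost.
designs <- data.frame(J = 26:200)   # J <= 25 cannot reach the target for any n
designs$n <- ceiling(var_within / (se_target^2 - sd_between^2 / designs$J))
designs$cost <- 50 * designs$n + 500 * designs$J

designs[which.min(designs$cost), ]
```

Several neighbouring values of $J$ cost almost the same, so the minimum is flat: designs with roughly 60 to 65 clusters and about 1,000 interviews are all close to optimal under these assumptions.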

# Simulation for design analysis

The folder [ElectricCompany](https://github.com/avehtari/ROS-Examples/tree/master/ElectricCompany/) contains data from the Electric Company experiment analyzed in Chapter 19.
Suppose you wanted to perform a new experiment under similar conditions, but for simplicity just for second graders, with the goal of having 80% power to find a statistically significant result (at the 5% level) in grade 2.

In [6]:
filename <- "./data/ElectricCompany/electric.csv"

download_if_missing(filename,
                    'https://raw.githubusercontent.com/avehtari/ROS-Examples/master/ElectricCompany/data/electric.csv')

electric <- read.csv(filename)

In [7]:
electric %>% head()

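|   | X | post_test | pre_test | grade | treatment | supp | pair_id |
|---|---|---|---|---|---|---|---|
|   | `<int>` | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<int>` | `<int>` |
| 1 | 1 | 48.9 | 13.8 | 1 | 1 | 1 | 1 |
| 2 | 2 | 70.5 | 16.5 | 1 | 1 | 0 | 2 |
| 3 | 3 | 89.7 | 18.5 | 1 | 1 | 1 | 3 |
| 4 | 4 | 44.2 | 8.8 | 1 | 1 | 0 | 4 |
| 5 | 5 | 77.5 | 15.3 | 1 | 1 | 1 | 5 |
| 6 | 6 | 84.7 | 15.0 | 1 | 1 | 0 | 6 |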


## Assumptions

State clearly the assumptions you are making for your design calculations.
(Hint: you can set the numerical values for these assumptions based on the analysis of the existing Electric Company data.)

## Sample size for average comparisons

Suppose that the data will be analyzed by simply comparing the average scores for the treated classrooms to the average scores for the controls.
How many classrooms would be needed for 80% power?

## Sample size for gain

Repeat the analysis, but supposing that the new data will be analyzed by comparing the average gain scores for the treated classrooms to the average gain scores of the controls.

## Sample size for regression

Repeat, but supposing that the new data will be analyzed by regression, adjusting for pre-test scores as well as the treatment indicator.

# Optimal design

## Power at minimal cost

Suppose that the zinc study described in Section 16.3 would cost \$150 for each treated child and \$100 for each control.
Under the assumptions given in that section, determine the number of control and treated children needed to attain 80% power at minimal total cost.
You will need to set up a loop of simulations as illustrated for the example in the text.
Assume that the number of measurements per child is fixed at $K=7$ (that is, measuring every two months for a year).

## Impact of changing design parameter

Make a generalization of Figure 16.1 with several lines corresponding to different values of the design parameter $K$, the number of measurements for each child.

# Experiment with pre-treatment information

An intervention is hoped to increase voter turnout in a local election from 20% to 25%.

## Sample size for estimating treatment effect

In a simple randomized experiment, how large a sample size would be needed so that the standard error of the estimated treatment effect is less than 2 percentage points?
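
A sketch of the calculation, assuming equal-sized treatment and control groups and turnout of 20% and 25% as hoped. The helper name `se_effect` is hypothetical.

```r
# Standard error of the difference in turnout with n / 2 people per arm.
se_effect <- function(n, p_control = 0.20, p_treated = 0.25) {
    sqrt(p_control * (1 - p_control) / (n / 2) +
         p_treated * (1 - p_treated) / (n / 2))
}

# Smallest total n with se below 2 percentage points, by solving for n.
n_needed <- ceiling(2 * (0.20 * 0.80 + 0.25 * 0.75) / 0.02^2)
c(n = n_needed, se = se_effect(n_needed))
```

Note that a standard error of 2 percentage points still leaves a 5-point effect only 2.5 standard errors from zero.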

## Impact of a pre-treatment indicator

Now suppose that previous voter turnout was known for all participants in the experiment.
Make a reasonable assumption about the correlation between turnout in two successive elections.
Under this assumption, how much would the standard error decrease if previous voter turnout was included as a pre-treatment predictor in a regression to estimate the treatment effect?
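
As a rough guide, adding a predictor that has correlation $\rho$ with the outcome multiplies the residual standard deviation, and hence the standard error, by about $\sqrt{1-\rho^2}$. The sketch below applies this linear-regression approximation to a binary outcome. The correlation values are hypothetical, not estimates from data.

```r
# Approximate factor by which the standard error shrinks when a pre-treatment
# predictor with correlation rho to the outcome is added to the regression.
se_reduction <- function(rho) sqrt(1 - rho^2)

rho <- c(0.3, 0.5, 0.7)   # hypothetical values for the turnout correlation
data.frame(rho = rho, se_ratio = round(se_reduction(rho), 3))
```

Even a fairly strong correlation of 0.5 reduces the standard error by only about 13% under this approximation.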

# Sample size calculations for main effects and interactions

In causal inference, it is often important to study varying treatment effects: for example, a treatment could be more effective for men than for women, or for healthy than for unhealthy patients.
Suppose a study is designed to have 80% power to detect a main effect at a 95% confidence level.
Further suppose that interactions of interest are half the size of main effects.

## Power for interaction

What is its power for detecting an interaction, comparing men to women (say), in a study that is half men and half women?
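
A sketch of the argument, assuming normally distributed estimates and equal residual variance in all groups. The interaction is estimated as a difference between two half-sample estimates, so its standard error is twice that of the main effect, while the effect itself is half as large.

```r
z_crit <- qnorm(0.975)
# 80% power for the main effect means it sits about 2.8 standard errors
# from zero.
main_z <- z_crit + qnorm(0.80)

# Interaction: half the effect size, and twice the standard error because it
# compares two half-samples, so it sits main_z / 4 standard errors from zero.
interaction_z <- main_z / 4
power_interaction <- pnorm(-z_crit - interaction_z) +
    (1 - pnorm(z_crit - interaction_z))
power_interaction
```

Under these assumptions the power for the interaction is only around 10%.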

## Type M error

Suppose 1000 studies of this size are performed.
How many of the studies would you expect to report a "statistically significant" interaction?
Of these, what is the expectation of the ratio of the estimated effect size to actual effect size?
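
Both questions can be answered by simulation, as in the sketch below. It assumes the interaction is a quarter as many standard errors from zero as the main effect (half the size, twice the standard error) and works in standard-error units. The seed and the number of replications are arbitrary choices.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
interaction_z <- (qnorm(0.975) + qnorm(0.80)) / 4   # true effect in se units
n_studies <- 1000

# Repeat the batch of 1000 studies many times to average out simulation noise.
n_reps <- 2000
z <- matrix(rnorm(n_studies * n_reps, mean = interaction_z, sd = 1),
            nrow = n_reps)
significant <- abs(z) > qnorm(0.975)

expected_significant <- mean(rowSums(significant))   # per 1000 studies
ratio <- mean(z[significant]) / interaction_z        # estimate / true effect
c(expected_significant = expected_significant, ratio = ratio)
```

Roughly one study in ten reaches significance, and those that do overestimate the interaction by a factor of about three on average.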