Updated week 8 lecture materials.

SML201 · Mar 30, 2016 · commit ceb7601
1 parent c7f70a1

Showing 5 changed files with 64 additions and 28 deletions.
diff --git a/week8/week8.R b/week8/week8.R
@@ -32,6 +32,9 @@ ggplot(data=df) +
 library(BSDA)
 str(z.test)
 
+## ---- display=FALSE------------------------------------------------------
+set.seed(210)
+
 ## ------------------------------------------------------------------------
 n <- 40
 lam <- 14
@@ -42,6 +45,7 @@ z.test(x=x, sigma.x=stddev, mu=lam)
 
 ## ------------------------------------------------------------------------
 lam.hat <- mean(x)
+lam.hat
 stderr <- sqrt(lam.hat)/sqrt(n)
 lam.hat - abs(qnorm(0.025)) * stderr # lower bound
 lam.hat + abs(qnorm(0.025)) * stderr # upper bound
@@ -135,14 +139,16 @@ htwt %>% group_by(sex) %>% summarize(sd(height))
 t.test(x = m_ht$height, y = f_ht$height, var.equal = TRUE)
 
 ## ------------------------------------------------------------------------
-htwt <- htwt %>% mutate(diffwt = (weight - repwt), diffht = (height - repht))
+htwt <- htwt %>% mutate(diffwt = (weight - repwt), 
+                        diffht = (height - repht))
 t.test(x = htwt$diffwt) %>% tidy()
 t.test(x = htwt$diffht) %>% tidy()
 
 ## ------------------------------------------------------------------------
 t.test(x=htwt$weight, y=htwt$repwt, paired=TRUE) %>% tidy()
 t.test(x=htwt$height, y=htwt$repht, paired=TRUE) %>% tidy()
-htwt %>% select(height, repht) %>% na.omit() %>% summarize(mean(height), mean(repht))
+htwt %>% select(height, repht) %>% na.omit() %>% 
+  summarize(mean(height), mean(repht))
 
 ## ------------------------------------------------------------------------
 str(binom.test)

diff --git a/week8/week8.Rmd b/week8/week8.Rmd
@@ -158,9 +158,9 @@ $$z = \frac{\hat{\lambda} - \lambda_0}{\sqrt{\frac{\hat{\lambda}}{n}}} \mbox{ an
 
 where $Z^*$ is a Normal$(0,1)$ random variable.
 
-## Two-Sided CIs and HTs
+## One-Sided CIs and HTs
 
-The two-sided versions of these approximate confidence intervals and hypothesis tests work analogously.
+The one-sided versions of these approximate confidence intervals and hypothesis tests work analogously.
 
 The procedures shown for the $\mbox{Normal}(\mu, \sigma^2)$ case with known $\sigma^2$ from last week are utilized with the appropriate substitutions as in the above examples. 
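
As a minimal sketch of the one-sided case, assuming simulated Poisson data (the seed, `n`, and `lam0` below are illustrative choices, not values fixed by the slides): for $H_1: \lambda > \lambda_0$ the p-value is the upper tail probability of the same $z$ statistic, and the matching one-sided 95% interval has only a lower bound, so it uses `qnorm(0.95)` rather than `qnorm(0.975)`.

```r
# One-sided test of H0: lambda = lam0 vs H1: lambda > lam0 (illustrative sketch)
set.seed(210)                        # seed chosen for reproducibility only
n    <- 40                           # assumed sample size
lam0 <- 14                           # assumed null value
x    <- rpois(n = n, lambda = lam0)

lam.hat <- mean(x)
se      <- sqrt(lam.hat / n)         # plug-in standard error of lam.hat
z       <- (lam.hat - lam0) / se     # same test statistic as the two-sided case

p.upper <- pnorm(z, lower.tail = FALSE)  # Pr(Z* >= z)
lower   <- lam.hat - qnorm(0.95) * se    # one-sided 95% CI is (lower, Inf)

c(z = z, p.value = p.upper, lower.bound = lower)
```

For $H_1: \lambda < \lambda_0$ the same sketch applies with `pnorm(z)` as the p-value and an upper bound `lam.hat + qnorm(0.95) * se`.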
 
@@ -215,7 +215,7 @@ Let $X_1, X_2, \ldots, X_{n_1}$ be iid $\mbox{Poisson}(\lambda_1)$ and $Y_1, Y_2
 We have $\hat{\lambda}_1 = \overline{X}$ and $\hat{\lambda}_2 = \overline{Y}$.  For large $n_1$ and $n_2$, it approximately holds that:
 
 $$ 
-\frac{\hat{\lambda}_1 - \hat{\lambda}_2 - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\lambda_1}{n_1} + \frac{\lambda_2}{n_2}}} \sim \mbox{Normal}(0,1).
+\frac{\hat{\lambda}_1 - \hat{\lambda}_2 - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\hat{\lambda}_1}{n_1} + \frac{\hat{\lambda}_2}{n_2}}} \sim \mbox{Normal}(0,1).
 $$
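
The estimated rates appear in the denominator because the true rates are unknown, which makes the statistic computable from data. A minimal sketch of how it might be used, assuming simulated samples (the sample sizes, rates, and seed are hypothetical and chosen only for illustration):

```r
# Two-sample Poisson z statistic with plug-in standard error (illustrative sketch)
set.seed(1)                           # arbitrary seed
n1 <- 50; n2 <- 60                    # hypothetical sample sizes
x <- rpois(n1, lambda = 10)           # hypothetical rate for sample 1
y <- rpois(n2, lambda = 12)           # hypothetical rate for sample 2

lam1.hat <- mean(x)
lam2.hat <- mean(y)
se.diff  <- sqrt(lam1.hat / n1 + lam2.hat / n2)  # estimated rates in the denominator

# Test H0: lambda1 - lambda2 = 0 against a two-sided alternative
z       <- (lam1.hat - lam2.hat - 0) / se.diff
p.value <- 2 * pnorm(-abs(z))

# Approximate 95% confidence interval for lambda1 - lambda2
ci <- (lam1.hat - lam2.hat) + c(-1, 1) * qnorm(0.975) * se.diff

list(z = z, p.value = p.value, conf.int = ci)
```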
 
 ## Normal (Unequal Variances)
@@ -297,6 +297,10 @@ str(z.test)
 
 Apply `z.test()`:
 
+```{r, display=FALSE}
+set.seed(210)
+```
+
 ```{r}
 n <- 40
 lam <- 14
@@ -312,6 +316,7 @@ Confidence interval:
 
 ```{r}
 lam.hat <- mean(x)
+lam.hat
 stderr <- sqrt(lam.hat)/sqrt(n)
 lam.hat - abs(qnorm(0.025)) * stderr # lower bound
 lam.hat + abs(qnorm(0.025)) * stderr # upper bound
@@ -568,18 +573,24 @@ t.test(x = m_ht$height, y = f_ht$height, var.equal = TRUE)
 
 ## Paired Sample Test (v. 1)
 
+First take the difference between the paired observations. Then apply the one-sample t-test.
+
 ```{r}
-htwt <- htwt %>% mutate(diffwt = (weight - repwt), diffht = (height - repht))
+htwt <- htwt %>% mutate(diffwt = (weight - repwt), 
+                        diffht = (height - repht))
 t.test(x = htwt$diffwt) %>% tidy()
 t.test(x = htwt$diffht) %>% tidy()
 ```
 
 ## Paired Sample Test (v. 2)
 
+Enter each sample into the `t.test()` function, but use the `paired=TRUE` argument. This is operationally equivalent to the previous version.
+
 ```{r}
 t.test(x=htwt$weight, y=htwt$repwt, paired=TRUE) %>% tidy()
 t.test(x=htwt$height, y=htwt$repht, paired=TRUE) %>% tidy()
-htwt %>% select(height, repht) %>% na.omit() %>% summarize(mean(height), mean(repht))
+htwt %>% select(height, repht) %>% na.omit() %>% 
+  summarize(mean(height), mean(repht))
 ```
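
The equivalence of the two versions can be checked directly. The sketch below uses simulated paired measurements rather than the `htwt` data (the vectors `before` and `after` are hypothetical) and only base R, so it runs without `dplyr` or `broom`:

```r
# Check that the two paired-test versions agree (illustrative sketch)
set.seed(2)                                       # arbitrary seed
before <- rnorm(30, mean = 70, sd = 10)           # hypothetical first measurement
after  <- before + rnorm(30, mean = 1, sd = 2)    # hypothetical second measurement

v1 <- t.test(x = before - after)                  # v. 1: one-sample test on differences
v2 <- t.test(x = before, y = after, paired = TRUE)  # v. 2: paired = TRUE

# Names of the estimates differ between versions, so compare unnamed values
same.stat <- all.equal(unname(v1$statistic), unname(v2$statistic))
same.p    <- all.equal(v1$p.value, v2$p.value)
same.ci   <- all.equal(as.numeric(v1$conf.int), as.numeric(v2$conf.int))

c(statistic = same.stat, p.value = same.p, conf.int = same.ci)
```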
 
 # Inference on Binomial Data in R
@@ -637,7 +648,7 @@ Exercise: Figure out what happened here.
 
 ## *OIS* Exercise 6.10
 
-The way a question is phrased can influence a person’s response. For example, Pew Research Center conducted a survey with the following question:
+The way a question is phrased can influence a person's response. For example, Pew Research Center conducted a survey with the following question:
 
 "As you may know, by 2014 nearly all Americans will be required to have health insurance. [People who do not buy insurance will pay a penalty] while [People who cannot afford it will receive financial help from the government]. Do you approve or disapprove of this policy?"
 

diff --git a/week8/week8.html b/week8/week8.html
@@ -218,9 +218,9 @@ <h1>Poisson</h1>
 <p>Hypothesis test, <span class="math inline">\(H_0: \lambda=\lambda_0\)</span> vs <span class="math inline">\(H_1: \lambda \not= \lambda_0\)</span>:</p>
 <p><span class="math display">\[z = \frac{\hat{\lambda} - \lambda_0}{\sqrt{\frac{\hat{\lambda}}{n}}} \mbox{ and } \mbox{p-value} = {\rm Pr}(|Z^*| \geq |z|)\]</span></p>
 <p>where <span class="math inline">\(Z^*\)</span> is a Normal<span class="math inline">\((0,1)\)</span> random variable.</p>
-</section><section id="two-sided-cis-and-hts" class="slide level2">
-<h1>Two-Sided CIs and HTs</h1>
-<p>The two-sided versions of these approximate confidence intervals and hypothesis tests work analogously.</p>
+</section><section id="one-sided-cis-and-hts" class="slide level2">
+<h1>One-Sided CIs and HTs</h1>
+<p>The one-sided versions of these approximate confidence intervals and hypothesis tests work analogously.</p>
 <p>The procedures shown for the <span class="math inline">\(\mbox{Normal}(\mu, \sigma^2)\)</span> case with known <span class="math inline">\(\sigma^2\)</span> from last week are utilized with the appropriate substitutions as in the above examples.</p>
 </section><section id="comment" class="slide level2">
 <h1>Comment</h1>
@@ -254,7 +254,7 @@ <h1>Poisson</h1>
 <p>Let <span class="math inline">\(X_1, X_2, \ldots, X_{n_1}\)</span> be iid <span class="math inline">\(\mbox{Poisson}(\lambda_1)\)</span> and <span class="math inline">\(Y_1, Y_2, \ldots, Y_{n_2}\)</span> be iid <span class="math inline">\(\mbox{Poisson}(\lambda_2)\)</span>.</p>
 <p>We have <span class="math inline">\(\hat{\lambda}_1 = \overline{X}\)</span> and <span class="math inline">\(\hat{\lambda}_2 = \overline{Y}\)</span>. For large <span class="math inline">\(n_1\)</span> and <span class="math inline">\(n_2\)</span>, it approximately holds that:</p>
 <p><span class="math display">\[ 
-\frac{\hat{\lambda}_1 - \hat{\lambda}_2 - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\lambda_1}{n_1} + \frac{\lambda_2}{n_2}}} \sim \mbox{Normal}(0,1).
+\frac{\hat{\lambda}_1 - \hat{\lambda}_2 - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\hat{\lambda}_1}{n_1} + \frac{\hat{\lambda}_2}{n_2}}} \sim \mbox{Normal}(0,1).
 \]</span></p>
 </section><section id="normal-unequal-variances" class="slide level2">
 <h1>Normal (Unequal Variances)</h1>
@@ -309,6 +309,7 @@ <h1><code>BSDA</code> Package</h1>
 </section><section id="example-poisson" class="slide level2">
 <h1>Example: Poisson</h1>
 <p>Apply <code>z.test()</code>:</p>
+<pre class="r"><code>&gt; set.seed(210)</code></pre>
 <pre class="r"><code>&gt; n &lt;- 40
 &gt; lam &lt;- 14
 &gt; x &lt;- rpois(n=n, lambda=lam)
@@ -319,28 +320,30 @@ <h1>Example: Poisson</h1>
     One-sample z-Test
 
 data:  x
-z = 0.95256, p-value = 0.3408
+z = 0.41885, p-value = 0.6753
 alternative hypothesis: true mean is not equal to 14
 95 percent confidence interval:
- 13.3919 15.7581
+ 13.08016 15.41984
 sample estimates:
 mean of x 
-   14.575 </code></pre>
+    14.25 </code></pre>
 </section><section id="by-hand-calculations" class="slide level2">
 <h1>By Hand Calculations</h1>
 <p>Confidence interval:</p>
 <pre class="r"><code>&gt; lam.hat &lt;- mean(x)
+&gt; lam.hat
+[1] 14.25
 &gt; stderr &lt;- sqrt(lam.hat)/sqrt(n)
 &gt; lam.hat - abs(qnorm(0.025)) * stderr # lower bound
-[1] 13.3919
+[1] 13.08016
 &gt; lam.hat + abs(qnorm(0.025)) * stderr # upper bound
-[1] 15.7581</code></pre>
+[1] 15.41984</code></pre>
 <p>Hypothesis test:</p>
 <pre class="r"><code>&gt; z &lt;- (lam.hat - lam)/stderr
 &gt; z # test statistic
-[1] 0.9525627
+[1] 0.4188539
 &gt; 2 * pnorm(-abs(z)) # two-sided p-value
-[1] 0.3408117</code></pre>
+[1] 0.6753229</code></pre>
 </section><section id="exercise" class="slide level2">
 <h1>Exercise</h1>
 <p>Figure out how to get the <code>z.test()</code> function to work on Binomial data.</p>
@@ -590,7 +593,9 @@ <h1>Test with Equal Variances</h1>
  178.0114  164.7143 </code></pre>
 </section><section id="paired-sample-test-v.-1" class="slide level2">
 <h1>Paired Sample Test (v. 1)</h1>
-<pre class="r"><code>&gt; htwt &lt;- htwt %&gt;% mutate(diffwt = (weight - repwt), diffht = (height - repht))
+<p>First take the difference between the paired observations. Then apply the one-sample t-test.</p>
+<pre class="r"><code>&gt; htwt &lt;- htwt %&gt;% mutate(diffwt = (weight - repwt), 
++                         diffht = (height - repht))
 &gt; t.test(x = htwt$diffwt) %&gt;% tidy()
      estimate statistic   p.value parameter   conf.low
 1 0.005464481 0.0319381 0.9745564       182 -0.3321223
@@ -601,6 +606,7 @@ <h1>Paired Sample Test (v. 1)</h1>
 1 2.076503  13.52629 2.636736e-29       182 1.773603  2.379403</code></pre>
 </section><section id="paired-sample-test-v.-2" class="slide level2">
 <h1>Paired Sample Test (v. 2)</h1>
+<p>Enter each sample into the <code>t.test()</code> function, but use the <code>paired=TRUE</code> argument. This is operationally equivalent to the previous version.</p>
 <pre class="r"><code>&gt; t.test(x=htwt$weight, y=htwt$repwt, paired=TRUE) %&gt;% tidy()
      estimate statistic   p.value parameter   conf.low
 1 0.005464481 0.0319381 0.9745564       182 -0.3321223
@@ -609,7 +615,8 @@ <h1>Paired Sample Test (v. 2)</h1>
 &gt; t.test(x=htwt$height, y=htwt$repht, paired=TRUE) %&gt;% tidy()
   estimate statistic      p.value parameter conf.low conf.high
 1 2.076503  13.52629 2.636736e-29       182 1.773603  2.379403
-&gt; htwt %&gt;% select(height, repht) %&gt;% na.omit() %&gt;% summarize(mean(height), mean(repht))
+&gt; htwt %&gt;% select(height, repht) %&gt;% na.omit() %&gt;% 
++   summarize(mean(height), mean(repht))
 Source: local data frame [1 x 2]
 
   mean(height) mean(repht)
@@ -797,7 +804,8 @@ <h1><code>poisson.test()</code></h1>
 
 r    hypothesized rate or rate ratio
 
-alternative  indicates the alternative hypothesis and must be one of &quot;two.sided&quot;, &quot;greater&quot; or &quot;less&quot;. You can specify just the initial letter.
+alternative  indicates the alternative hypothesis and must be one of 
+&quot;two.sided&quot;, &quot;greater&quot; or &quot;less&quot;. You can specify just the initial letter.
 
 conf.level  confidence level for the returned confidence interval.</code></pre>
 </section><section id="example-rna-seq" class="slide level2">

diff --git a/week8/week8_notes.Rmd b/week8/week8_notes.Rmd
@@ -154,9 +154,9 @@ $$z = \frac{\hat{\lambda} - \lambda_0}{\sqrt{\frac{\hat{\lambda}}{n}}} \mbox{ an
 
 where $Z^*$ is a Normal$(0,1)$ random variable.
 
-## Two-Sided CIs and HTs
+## One-Sided CIs and HTs
 
-The two-sided versions of these approximate confidence intervals and hypothesis tests work analogously.
+The one-sided versions of these approximate confidence intervals and hypothesis tests work analogously.
 
 The procedures shown for the $\mbox{Normal}(\mu, \sigma^2)$ case with known $\sigma^2$ from last week are utilized with the appropriate substitutions as in the above examples. 
 
@@ -211,7 +211,7 @@ Let $X_1, X_2, \ldots, X_{n_1}$ be iid $\mbox{Poisson}(\lambda_1)$ and $Y_1, Y_2
 We have $\hat{\lambda}_1 = \overline{X}$ and $\hat{\lambda}_2 = \overline{Y}$.  For large $n_1$ and $n_2$, it approximately holds that:
 
 $$ 
-\frac{\hat{\lambda}_1 - \hat{\lambda}_2 - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\lambda_1}{n_1} + \frac{\lambda_2}{n_2}}} \sim \mbox{Normal}(0,1).
+\frac{\hat{\lambda}_1 - \hat{\lambda}_2 - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\hat{\lambda}_1}{n_1} + \frac{\hat{\lambda}_2}{n_2}}} \sim \mbox{Normal}(0,1).
 $$
 
 ## Normal (Unequal Variances)
@@ -293,6 +293,10 @@ str(z.test)
 
 Apply `z.test()`:
 
+```{r, display=FALSE}
+set.seed(210)
+```
+
 ```{r}
 n <- 40
 lam <- 14
@@ -308,6 +312,7 @@ Confidence interval:
 
 ```{r}
 lam.hat <- mean(x)
+lam.hat
 stderr <- sqrt(lam.hat)/sqrt(n)
 lam.hat - abs(qnorm(0.025)) * stderr # lower bound
 lam.hat + abs(qnorm(0.025)) * stderr # upper bound
@@ -564,18 +569,24 @@ t.test(x = m_ht$height, y = f_ht$height, var.equal = TRUE)
 
 ## Paired Sample Test (v. 1)
 
+First take the difference between the paired observations. Then apply the one-sample t-test.
+
 ```{r}
-htwt <- htwt %>% mutate(diffwt = (weight - repwt), diffht = (height - repht))
+htwt <- htwt %>% mutate(diffwt = (weight - repwt), 
+                        diffht = (height - repht))
 t.test(x = htwt$diffwt) %>% tidy()
 t.test(x = htwt$diffht) %>% tidy()
 ```
 
 ## Paired Sample Test (v. 2)
 
+Enter each sample into the `t.test()` function, but use the `paired=TRUE` argument. This is operationally equivalent to the previous version.
+
 ```{r}
 t.test(x=htwt$weight, y=htwt$repwt, paired=TRUE) %>% tidy()
 t.test(x=htwt$height, y=htwt$repht, paired=TRUE) %>% tidy()
-htwt %>% select(height, repht) %>% na.omit() %>% summarize(mean(height), mean(repht))
+htwt %>% select(height, repht) %>% na.omit() %>% 
+  summarize(mean(height), mean(repht))
 ```
 
 # Inference on Binomial Data in R
@@ -633,7 +644,7 @@ Exercise: Figure out what happened here.
 
 ## *OIS* Exercise 6.10
 
-The way a question is phrased can influence a person’s response. For example, Pew Research Center conducted a survey with the following question:
+The way a question is phrased can influence a person's response. For example, Pew Research Center conducted a survey with the following question:
 
 "As you may know, by 2014 nearly all Americans will be required to have health insurance. [People who do not buy insurance will pay a penalty] while [People who cannot afford it will receive financial help from the government]. Do you approve or disapprove of this policy?"
 

diff --git a/week8/week8_notes.pdf b/week8/week8_notes.pdf