In [1]:
library(dplyr)
library(ggplot2)


Attaching package: 'dplyr'

The following object is masked from 'package:stats':

    filter

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union



In [7]:
n <- 20
samp <- sample_n(cars, n)

In [9]:
samp

Unnamed: 0,speed,dist
31,17,50
6,9,10
5,8,16
21,14,36
35,18,84
33,18,56
44,22,66
10,11,17
11,11,28
48,24,93


### Confidence intervals

Based only on this single sample, the best estimate of the population mean speed is the sample mean, usually denoted $\bar{x}$. That serves as a good **point estimate**, but it is also useful to communicate how uncertain we are about that estimate. This uncertainty can be quantified using a **confidence interval**.

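A confidence interval for a population mean has the form $\left(\bar{x} - z^\star \times SE,\ \bar{x} + z^\star \times SE\right)$, where $z^\star$ is the critical value for the chosen confidence level and $SE = s / \sqrt{n}$ is the standard error of the sample mean ($s$ is the sample standard deviation and $n$ the sample size).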

We can use the `qnorm` function to find the critical value associated with a given percentile of the normal distribution. Remember that confidence levels and percentiles are not equivalent. For example, a 95% confidence level refers to the middle 95% of the distribution, which leaves 2.5% in each tail, so the critical value associated with this area corresponds to the 97.5th percentile.

In [15]:
z_star_95 <- qnorm(0.975)
round(z_star_95,2)
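The same percentile logic extends to other confidence levels. As a minimal sketch, the snippet below converts a few confidence levels into the percentile passed to `qnorm`; the name `conf_levels` is introduced here for illustration and does not appear elsewhere in the notebook.

```r
# Confidence levels to convert (illustrative choices)
conf_levels <- c(0.90, 0.95, 0.99)

# A level C leaves (1 - C) / 2 in each tail, so the critical value
# sits at the 1 - (1 - C) / 2 percentile of the standard normal
z_star <- qnorm(1 - (1 - conf_levels) / 2)

round(z_star, 2)
# [1] 1.64 1.96 2.58
```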

Let's finally calculate the confidence interval:

In [17]:
samp %>%
  summarise(lower = mean(speed) - z_star_95 * (sd(speed) / sqrt(n)),
            upper = mean(speed) + z_star_95 * (sd(speed) / sqrt(n)))

Unnamed: 0,lower,upper
1,13.35429,18.14571
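The calculation above can also be wrapped in a small function so it can be reused for any numeric vector and confidence level. This is only a sketch: `z_interval` is a hypothetical helper written for illustration, not a function from `dplyr` or `statsr`, and it is applied to the full `cars$speed` column here because the random sample `samp` cannot be reproduced exactly.

```r
# Hypothetical helper: normal-based confidence interval for a mean
z_interval <- function(x, conf_level = 0.95) {
  n <- length(x)                              # sample size
  z_star <- qnorm(1 - (1 - conf_level) / 2)   # critical value
  se <- sd(x) / sqrt(n)                       # standard error of the mean
  c(lower = mean(x) - z_star * se,
    upper = mean(x) + z_star * se)
}

# 95% and 99% intervals for the speed column of the built-in cars data
ci_95 <- z_interval(cars$speed)
ci_99 <- z_interval(cars$speed, conf_level = 0.99)
ci_95
ci_99
```

A higher confidence level uses a larger critical value, so the 99% interval is wider than the 95% interval.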


In this case we have the rare luxury of knowing the true population mean, since we treat the full `cars` data set as the population. Let's calculate this value so that we can determine whether our confidence intervals actually capture it. We'll store it in a data frame called `params` (short for population parameters) and name it `mu`.

In [18]:
params <- cars %>%
  summarise(mu = mean(speed))

In [19]:
params

Unnamed: 0,mu
1,15.4
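With both numbers in hand, checking whether the interval captures the population mean is a simple comparison. The sketch below hard-codes the bounds and mean printed above; the variable names `lower`, `upper`, `mu`, and `captures_mu` are chosen here for illustration.

```r
# Interval bounds and population mean copied from the outputs above
lower <- 13.35429
upper <- 18.14571
mu    <- 15.4

# TRUE when the interval contains the population mean
captures_mu <- lower <= mu & mu <= upper
captures_mu
# [1] TRUE
```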


To see how confidence intervals vary from sample to sample, we can repeat this procedure many times using the `rep_sample_n` function from the `statsr` package. The following code takes 50 random samples of size `n` from the population (recall that we defined $n = 20$ earlier) and computes the lower and upper bounds of a confidence interval for each sample.

In [42]:
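# rep_sample_n() is provided by the statsr package, so statsr must be
# installed and loaded (library(statsr)) before this cell will run.
ci <- cars %>%
  rep_sample_n(size = n, reps = 50, replace = TRUE) %>%
  summarise(lower = mean(speed) - z_star_95 * (sd(speed) / sqrt(n)),
            upper = mean(speed) + z_star_95 * (sd(speed) / sqrt(n)))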

ERROR: Error in function_list[[i]](value): could not find function "rep_sample_n"
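When `statsr` cannot be loaded, the same repeated-sampling idea can be sketched with `dplyr` alone. This is an alternative under stated assumptions, not the `rep_sample_n` implementation: it loops with `lapply`, draws each sample with `sample_n`, and stacks the results with `bind_rows`. The seed, the `replicate` column, and the `coverage` variable are additions made for illustration.

```r
library(dplyr)

set.seed(1)                # assumption: fixed seed so the sketch is repeatable
n <- 20                    # sample size, as defined earlier
z_star_95 <- qnorm(0.975)  # critical value for a 95% interval

# Draw 50 samples of size n (with replacement, mirroring the call above)
# and compute one confidence interval per sample
ci <- lapply(1:50, function(i) {
  cars %>%
    sample_n(n, replace = TRUE) %>%
    summarise(replicate = i,
              lower = mean(speed) - z_star_95 * (sd(speed) / sqrt(n)),
              upper = mean(speed) + z_star_95 * (sd(speed) / sqrt(n)))
}) %>%
  bind_rows()

# Proportion of the 50 intervals that contain the population mean
mu <- mean(cars$speed)
coverage <- mean(ci$lower <= mu & mu <= ci$upper)
coverage
```

The result is an ordinary data frame with one row per replicate, so `ci %>% slice(1:5)` works on it as expected.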


In [33]:
ci

In [34]:
ci %>%
  slice(1:5)

ERROR: Error in UseMethod("slice_"): no applicable method for 'slice_' applied to an object of class "list"


In [41]:
library(statsr)

In [40]:
# install_github() comes from the devtools package
devtools::install_github("StatsWithR/statsr")

Downloading GitHub repo StatsWithR/statsr@master
from URL https://api.github.com/repos/StatsWithR/statsr/zipball/master
Installing statsr
Installing 26 packages: BH, broom, colorspace, DBI, digest, dplyr, evaluate, formatR, ggplot2, gridExtra, gtable, highr, htmltools, knitr, mime, mnormt, munsell, plyr, psych, R6, Rcpp, rmarkdown, scales, stringi, stringr, tidyr



  There are binary versions available (and will be installed) but the
  source versions are later:
          binary   source
BH      1.60.0-1 1.60.0-2
DBI          0.4    0.4-1
formatR      1.3      1.4
highr      0.5.1      0.6
knitr     1.12.3     1.13
psych      1.5.8    1.6.4
Rcpp      0.12.4   0.12.5



: packages 'dplyr', 'ggplot2' are in use and will not be installed

package 'BH' successfully unpacked and MD5 sums checked
package 'broom' successfully unpacked and MD5 sums checked
package 'colorspace' successfully unpacked and MD5 sums checked


: cannot remove prior installation of package 'colorspace'

package 'DBI' successfully unpacked and MD5 sums checked
package 'digest' successfully unpacked and MD5 sums checked


: cannot remove prior installation of package 'digest'

package 'evaluate' successfully unpacked and MD5 sums checked
package 'formatR' successfully unpacked and MD5 sums checked
package 'gridExtra' successfully unpacked and MD5 sums checked
package 'gtable' successfully unpacked and MD5 sums checked
package 'highr' successfully unpacked and MD5 sums checked
package 'htmltools' successfully unpacked and MD5 sums checked
package 'knitr' successfully unpacked and MD5 sums checked
package 'mime' successfully unpacked and MD5 sums checked
package 'mnormt' successfully unpacked and MD5 sums checked
package 'munsell' successfully unpacked and MD5 sums checked
package 'plyr' successfully unpacked and MD5 sums checked


: cannot remove prior installation of package 'plyr'

package 'psych' successfully unpacked and MD5 sums checked
package 'R6' successfully unpacked and MD5 sums checked
package 'Rcpp' successfully unpacked and MD5 sums checked


: cannot remove prior installation of package 'Rcpp'

package 'rmarkdown' successfully unpacked and MD5 sums checked
package 'scales' successfully unpacked and MD5 sums checked
package 'stringi' successfully unpacked and MD5 sums checked
package 'stringr' successfully unpacked and MD5 sums checked
package 'tidyr' successfully unpacked and MD5 sums checked


"C:/Users/Bryan/Anaconda3/R/bin/x64/R" --no-site-file --no-environ --no-save  \
  --no-restore CMD INSTALL  \
  "C:/Users/Bryan/AppData/Local/Temp/RtmpusOvHg/devtools227821b55d06/StatsWithR-statsr-2d25c46"  \
  --library="C:/Users/Bryan/Anaconda3/R/library" --install-tests 



ERROR: Error: Command failed (1)
