# Solving Problems Statistically v. Computationally

Randomly generate a population of 100,000 normally distributed values with a mean of 10 and a standard deviation of 1. Then randomly sample 50 units from this population without replacement.

In [1]:
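pop <- rnorm(100000, mean = 10, sd = 1) # simulated population of 100,000 normal draws
pop_samp <- sample(pop, 50, replace = FALSE) # 50 units sampled without replacement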

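Create a function to calculate the 95% confidence interval from scratch using Student's t distribution, based on this formula:

$$\bar{x} \pm t_{0.975,\,n-1} \cdot \frac{s}{\sqrt{n}}$$

Here $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size, and $t_{0.975,\,n-1}$ is the 97.5th percentile of the t distribution with $n-1$ degrees of freedom.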

In [2]:
confidence_interval <- function(x) {
  n <- length(x) # sample size
  samp_mean <- mean(x) # sample mean
  st_dev <- sd(x) # sample standard deviation
  st_err <- st_dev / sqrt(n) # standard error of the sample mean
  ci <- qt(.975, df = n - 1) * st_err # margin of error (half-width of the interval)
  LCL <- samp_mean - ci # lower confidence limit
  UCL <- samp_mean + ci # upper confidence limit
  results <- list(LCL = LCL, sample_mean = samp_mean, UCL = UCL) # compile to a list
  return(results) # return the list
}
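The `.975` in the function comes from splitting the 5% of excluded probability evenly between the two tails. A minimal sketch of the same calculation with the confidence level as a parameter follows; the name `confidence_interval_level`, the `conf_level` argument, the seed, and the simulated vector `x` are all assumptions made for illustration and are not part of the original notebook.

```r
# Hypothetical generalization of the formula above: the confidence level is a
# parameter instead of the hard-coded 0.975 quantile.
confidence_interval_level <- function(x, conf_level = 0.95) {
  n <- length(x)                              # sample size
  samp_mean <- mean(x)                        # sample mean
  st_err <- sd(x) / sqrt(n)                   # standard error of the mean
  alpha <- 1 - conf_level                     # total probability left in both tails
  t_crit <- qt(1 - alpha / 2, df = n - 1)     # 0.975 quantile when conf_level = 0.95
  margin <- t_crit * st_err                   # margin of error
  list(LCL = samp_mean - margin, sample_mean = samp_mean, UCL = samp_mean + margin)
}

set.seed(1)                                   # assumed seed, for reproducibility only
x <- rnorm(50, mean = 10, sd = 1)             # stand-in data, not the notebook's sample
ci_95 <- confidence_interval_level(x)         # same interval as the fixed 95% version
ci_90 <- confidence_interval_level(x, conf_level = 0.90)
print(ci_95)
print(ci_90)
```

A lower confidence level uses a smaller t quantile, so the 90% interval is narrower than the 95% one.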

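Create a function that bootstraps the sample mean: resample the sample with replacement many times, record the mean of each resample, and take the mean of those resampled means plus or minus two standard deviations as an approximate 95% interval.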

In [3]:
sampling_eng <- function(x, n_samples = 10000) {
  n <- length(x) # sample size
  results <- numeric(n_samples) # preallocated vector for the resampled means
  for (i in seq_len(n_samples)) {
    samp <- sample(x, n, replace = TRUE) # resample the sample with replacement
    results[i] <- mean(samp) # store the mean of this resample
  }
  results_mu <- mean(results) # mean of the resampled means
  results_sd <- sd(results) # standard deviation of the resampled means (bootstrap standard error)
  UCL <- results_mu + (2 * results_sd) # two standard deviations up (approximate upper 95% limit)
  LCL <- results_mu - (2 * results_sd) # two standard deviations down (approximate lower 95% limit)
  results_update <- list(boot_LCL = LCL, boot_sample_mean = results_mu, boot_UCL = UCL) # compile to a list
  return(results_update) # return the list
}
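The function above builds its interval from a normal approximation (mean plus or minus two standard deviations). A common alternative, which is not what the notebook uses, is the percentile bootstrap, where the interval limits are read directly from the quantiles of the resampled means. The sketch below assumes a hypothetical function name `boot_percentile_ci`, an arbitrary seed, and simulated stand-in data.

```r
# Percentile bootstrap sketch (a swapped-in technique, not the notebook's method).
boot_percentile_ci <- function(x, n_samples = 10000, conf_level = 0.95) {
  n <- length(x)
  # Mean of each resample, drawn with replacement from the original sample
  boot_means <- replicate(n_samples, mean(sample(x, n, replace = TRUE)))
  alpha <- 1 - conf_level
  # Interval limits are the empirical quantiles of the resampled means
  limits <- quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(boot_LCL = limits[1], boot_sample_mean = mean(boot_means), boot_UCL = limits[2])
}

set.seed(42)                          # assumed seed, for reproducibility only
x <- rnorm(50, mean = 10, sd = 1)     # stand-in data, not the notebook's sample
ci <- boot_percentile_ci(x)
print(ci)
```

For a roughly symmetric statistic such as the mean of normal data, the percentile limits should land close to the two-standard-deviation limits.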

View the results.

In [4]:
sampling_eng(pop_samp, n_samples = 10000)
confidence_interval(pop_samp)
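The two approaches tend to give similar intervals because the standard deviation of the bootstrap means approximates the formula-based standard error, and the multiplier of 2 is close to the t quantile for 49 degrees of freedom. The following sketch illustrates this on simulated stand-in data; the seed and variable names are assumptions and not part of the original notebook.

```r
set.seed(123)                               # assumed seed, for reproducibility only
x <- rnorm(50, mean = 10, sd = 1)           # stand-in for pop_samp
n <- length(x)

formula_se <- sd(x) / sqrt(n)               # standard error from the formula
boot_means <- replicate(10000, mean(sample(x, n, replace = TRUE)))
boot_se <- sd(boot_means)                   # standard error estimated by the bootstrap
t_mult <- qt(0.975, df = n - 1)             # t multiplier that the bootstrap replaces with 2

print(c(formula_se = formula_se,
        boot_se = boot_se,
        ratio = boot_se / formula_se,
        t_multiplier = t_mult))
```

The ratio should be close to 1, with the bootstrap estimate typically slightly smaller because resampling uses the sample's own variance with divisor n rather than n - 1.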

Compare these with the confidence interval reported by base R's t.test function.

In [5]:
t.test(pop_samp, mu = 10)


	One Sample t-test

data:  pop_samp
t = 1.8346, df = 49, p-value = 0.07264
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
  9.974435 10.561580
sample estimates:
mean of x 
 10.26801 
