# Confidence Intervals


In [8]:
import numpy as np
from scipy.stats import norm

In [4]:
mysamp = np.random.normal(loc=1, scale=np.sqrt(2), size=10)
xbar = np.mean(mysamp)  # Estimate the mean with the sample mean

In [9]:
print(xbar)

0.7572739251676908


> Let's find the critical values for a 95% confidence interval. We want to find two values that, when marked on the x-axis of a standard normal curve, capture area 0.95 between them. By symmetry, the upper value cuts off area 0.95+0.05/2=0.975 to its left and 0.025 to its right, and the lower value is its negative. We can get the upper value by typing  
`norm.ppf(0.975)`

In [10]:
# Find the critical value for a 95% confidence interval
cv = norm.ppf(0.975)  # ppf = percent point function (inverse CDF)
cv  # View the critical value

np.float64(1.959963984540054)
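As a quick sanity check on the critical value, one can confirm that the area between `-cv` and `cv` under the standard normal curve really is 0.95. This is only a sketch; the names `area_between`, `lower_tail`, and `upper_tail` are introduced here for illustration.

```python
from scipy.stats import norm

# Critical value for a 95% interval: cuts off area 0.975 to its left
cv = norm.ppf(0.975)

# Area under the standard normal curve between -cv and cv
area_between = norm.cdf(cv) - norm.cdf(-cv)

# Area in each tail (sf is the survival function, 1 - CDF)
lower_tail = norm.cdf(-cv)
upper_tail = norm.sf(cv)

print(area_between, lower_tail, upper_tail)
```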

In [12]:
# Compute the confidence interval endpoints
lower = xbar - cv * np.sqrt(2/10)
upper = xbar + cv * np.sqrt(2/10)

# Store them in a list (vector)
myci = [lower, upper]

# Display the confidence interval
print(myci)


[np.float64(-0.11924861540889076), np.float64(1.6337964657442723)]
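For reference, the endpoints above come from the usual z-interval formula with known variance $\sigma^{2}=2$ and sample size $n=10$. The same arithmetic written out, with values rounded to four decimal places, is:

$$
\bar{x} \pm z_{0.975}\sqrt{\frac{\sigma^{2}}{n}} = 0.7573 \pm 1.9600\sqrt{\frac{2}{10}} \approx 0.7573 \pm 0.8765 = (-0.1192,\ 1.6338)
$$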


In [None]:
# Initialize count
count = 0

# Set the critical value for 95% CI
cv = norm.ppf(0.975)

n_simulations = 100000

# Perform 100,000 simulations
for i in range(n_simulations):
    # Generate a random sample of size 10 from N(1,2)
    mysamp = np.random.normal(loc=1, scale=np.sqrt(2), size=10)
    
    # Compute the sample mean
    xbar = np.mean(mysamp)
    
    # True mean
    true_mean = 1
    
    # Check if the CI contains the true mean
    if (xbar - cv * np.sqrt(2/10) < true_mean) and (xbar + cv * np.sqrt(2/10) > true_mean):
        count += 1


count / n_simulations  # Proportion of CIs that contain the true mean

0.9493

> Look at the proportion by typing  
`count/100000`  
What do you see?
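The same coverage experiment can also be written without an explicit loop. The version below is a sketch, not a replacement for the cell above: the seed, the generator `rng`, and the names `half_width`, `covered`, and `mc_se` are assumptions introduced here. It also reports a rough Monte Carlo standard error, which helps judge how far the simulated proportion can plausibly sit from 0.95.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # seed chosen only for reproducibility (assumption)

n_simulations = 100_000
n = 10
true_mean = 1
variance = 2

cv = norm.ppf(0.975)
half_width = cv * np.sqrt(variance / n)

# Each row is one sample of size n from N(1, 2)
samples = rng.normal(loc=true_mean, scale=np.sqrt(variance), size=(n_simulations, n))
xbars = samples.mean(axis=1)

# True where the interval around that row's sample mean contains the true mean
covered = (xbars - half_width < true_mean) & (true_mean < xbars + half_width)
coverage = covered.mean()

# Monte Carlo standard error of the estimated proportion
mc_se = np.sqrt(coverage * (1 - coverage) / n_simulations)
print(coverage, mc_se)
```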

# Making a Confidence Interval Function

> R has built-in functions to make confidence intervals for the mean of a population or the difference in two means. That is, any confidence interval for a mean or difference of means that requires a t-critical value. In order to get a confidence interval with z-critical values, one would have to load a special package. Instead of doing this, let's work with the base packages in R and write our own function.  

>In the cell below, type  
`normCI<-function(data,variance,level){}`  

> In **between the braces** (which can be on different lines for clarity) add the lines  
`cv<-qnorm(level+(1-level)/2)  
xbar<-mean(data)  
c(xbar-cv*sqrt(variance/length(data)),xbar+cv*sqrt(variance/length(data)))`


> Now type  
`normCI(mysamp,2,0.95)`  
  
> Note that you will not get the exact same confidence interval that you originally computed at the beginning of this lab because we have overwritten the vector "mysamp"-- many times in fact!  
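Since the code cells in this notebook are written in Python, a rough Python counterpart of the R function `normCI` might look like the following. This is a sketch under the same assumption of a known population variance; the name `norm_ci`, the seed, and the freshly simulated `mysamp` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import norm


def norm_ci(data, variance, level):
    """z-based confidence interval for the mean, population variance known."""
    data = np.asarray(data, dtype=float)
    cv = norm.ppf(level + (1 - level) / 2)  # same critical value as qnorm(...)
    xbar = data.mean()
    half_width = cv * np.sqrt(variance / data.size)
    return xbar - half_width, xbar + half_width


# Illustrative sample from N(1, 2); the seed is an arbitrary choice
rng = np.random.default_rng(1)
mysamp = rng.normal(loc=1, scale=np.sqrt(2), size=10)

lower, upper = norm_ci(mysamp, 2, 0.95)
print(lower, upper)
```

Returning a tuple mirrors the two-element vector that the R version returns.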

# Built-in t-Confidence Intervals in R  
> Compressive strength of concrete is measured in $\mbox{kN/m}^{2}$. A random sample of one type of concrete (cement mixed with pulverized fuel ash) and a random sample of another type of concrete (cement mixed with a new artificial siliceous material produced in a lab) were obtained.  

>Read in the first random sample from provided data files by typing the following.  
`flyash<-read.table("flyash")`    
`flyash<-c(unlist(flyash))`  
`flyash<-as.vector(flyash)`

Do the same thing for the second sample. The filename for this is 'silicate'.
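In Python, a comparable way to read such a file is `np.loadtxt`. The helper below is a sketch that assumes each file contains only whitespace-separated numbers with no header row, which has not been verified against the actual data files; `read_sample` is a hypothetical name.

```python
import numpy as np


def read_sample(path):
    """Read a whitespace-separated file of numbers into a flat 1-D array."""
    # Assumption: the file holds only numeric values and no header row
    return np.loadtxt(path).ravel()


# Usage, assuming files named 'flyash' and 'silicate' are in the working directory:
# flyash = read_sample("flyash")
# silicate = read_sample("silicate")
```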

> Assume that the populations are both normally distributed.  
> Find a 95% confidence interval for the true mean compressive strength of the fly ash mix by typing  
`t.test(flyash)`  
> Can you pick the confidence interval out from this information?
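For comparison, the one-sample t-interval that `t.test` reports can be computed by hand in Python. The sketch below uses the sample standard deviation and a t-critical value with $n-1$ degrees of freedom; `t_ci` is a hypothetical helper, and `demo` is simulated placeholder data, not the lab's fly ash measurements.

```python
import numpy as np
from scipy.stats import t


def t_ci(data, level=0.95):
    """t-based confidence interval for the mean, population variance unknown."""
    data = np.asarray(data, dtype=float)
    n = data.size
    xbar = data.mean()
    se = data.std(ddof=1) / np.sqrt(n)  # estimated standard error of the mean
    cv = t.ppf(level + (1 - level) / 2, df=n - 1)
    return xbar - cv * se, xbar + cv * se


# Placeholder sample (simulated; parameters are arbitrary assumptions)
rng = np.random.default_rng(2)
demo = rng.normal(loc=50, scale=5, size=12)

lower, upper = t_ci(demo)
print(lower, upper)
```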

Suppose that we want to change the default confidence level from 95% to 90%. Type the following  
`t.test(flyash,conf.level=0.90)`  
Does the width of the resulting confidence interval compare to the width of the previous 95% interval in the way that you expected?
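One way to reason about the width question is to compare the t-critical values directly, since for a fixed sample the interval width is proportional to the critical value. The degrees of freedom below are a hypothetical choice for illustration, not taken from the fly ash data.

```python
from scipy.stats import t

df = 11  # hypothetical degrees of freedom (n - 1 for a sample of size 12)

cv_90 = t.ppf(0.95, df)   # critical value for a 90% interval
cv_95 = t.ppf(0.975, df)  # critical value for a 95% interval

# Ratio of the 90% interval width to the 95% interval width for the same sample
ratio = cv_90 / cv_95
print(cv_90, cv_95, ratio)
```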

> Finally, let us do a two-sample t-test to compare the means for both concrete populations by typing  
`t.test(flyash,silicate)`

> Does it appear that the new silicate mix is stronger than the fly ash mix?  

> You'll notice that the "Welch t-test" was performed. This is the more general test, used when you cannot assume that the populations have equal variances. This is most likely what you will be using in "real life". However, if you would like to perform a "pooled variance test", you would include "var.equal=T" in your last command.  

> Try this. Is your resulting confidence interval wider or narrower than the Welch confidence interval? Does the relative length make sense to you?
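To see where the two intervals differ, here is a hand-rolled Python sketch of both the Welch and the pooled-variance interval for a difference in means. The Welch branch uses the Welch-Satterthwaite approximation for the degrees of freedom. The function name `two_sample_ci` and the simulated samples `a` and `b` are assumptions for illustration and are not the concrete data.

```python
import numpy as np
from scipy.stats import t


def two_sample_ci(x, y, level=0.95, equal_var=False):
    """CI for mean(x) - mean(y): Welch by default, pooled if equal_var=True."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    vx, vy = x.var(ddof=1), y.var(ddof=1)
    diff = x.mean() - y.mean()
    if equal_var:
        # Pooled variance with nx + ny - 2 degrees of freedom
        df = nx + ny - 2
        pooled_var = ((nx - 1) * vx + (ny - 1) * vy) / df
        se = np.sqrt(pooled_var * (1 / nx + 1 / ny))
    else:
        # Welch: separate variances, Welch-Satterthwaite degrees of freedom
        se = np.sqrt(vx / nx + vy / ny)
        df = (vx / nx + vy / ny) ** 2 / (
            (vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1)
        )
    cv = t.ppf(level + (1 - level) / 2, df)
    return diff - cv * se, diff + cv * se


# Simulated placeholder samples (arbitrary parameters, not the lab data)
rng = np.random.default_rng(3)
a = rng.normal(50, 5, size=12)
b = rng.normal(55, 8, size=15)

welch_ci = two_sample_ci(a, b)
pooled_ci = two_sample_ci(a, b, equal_var=True)
print(welch_ci, pooled_ci)
```

Both intervals are centered at the same difference in sample means; only the standard error and the degrees of freedom change between the two methods.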
