In [1]:
import numpy as np
import os
import sys

sys.path.append(os.path.join(sys.path[0], os.path.pardir))

from utils import ConfidenceInterval as CI
from utils import NormalDistribution as ND

### **12.1** POINT ESTIMATION FOR $\mu$

- Definition:
    + Point Estimate: A single value that represents some unknown characteristic, such as the population mean.

#### **A Basic Deficiency**

- Although straightforward, simple, and precise, point estimates have a basic deficiency: they tend to be inaccurate.
- Because of sampling variability, it's unlikely that the observed sample mean will coincide with the population mean.
- Since a point estimate conveys no information about its degree of inaccuracy, it is replaced with a more realistic type of estimate, known as an interval estimate or confidence interval.

#### **Progress Check 12.1** A random sample of 200 graduates of U.S. colleges reveals a mean annual income of $62,600. What is the best estimate of the unknown mean annual income for all graduates of U.S. colleges?

The best point estimate of the unknown population mean is the sample mean, \$62,600.

### **12.2** CONFIDENCE INTERVAL (CI) FOR $\mu$

- Definition:
    + Confidence Interval (CI): A range of values that, with a known degree of certainty, includes the unknown population characteristic, such as the population mean.

#### **Why Confidence Intervals Work**

- The sampling distribution of the mean has three important properties:
    + The mean of the sampling distribution always equals the population mean.
    + The standard error equals the standard deviation of the population divided by the square root of the sample size.
    + The shape of the sampling distribution approximates a normal curve when the sample size is sufficiently large (typically somewhere between 25 and 100) to satisfy the central limit theorem.
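
The first two properties can be checked with a small simulation. The sketch below is only an illustration: the population mean, population standard deviation, sample size, and number of repeated samples are all hypothetical values chosen for the demonstration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 100.0, 15.0   # hypothetical population parameters
N = 36                           # hypothetical sample size
N_SAMPLES = 5000                 # number of repeated random samples

# Draw many random samples and record the mean of each one.
sample_means = [
    statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(N_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to POP_MEAN
observed_se = statistics.stdev(sample_means)     # spread of the sample means
theoretical_se = POP_SD / N ** 0.5               # sigma / sqrt(n) = 2.5

print(f"mean of sample means: {mean_of_means:.2f} (population mean {POP_MEAN})")
print(f"observed standard error: {observed_se:.2f} (theoretical {theoretical_se:.2f})")
```

With enough repeated samples, the mean of the sample means should settle near the population mean, and their standard deviation should settle near $\sigma / \sqrt{n}$.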

#### **A Series of Confidence Intervals**

- For each observed sample mean, we can construct a 95% confidence interval by adding and subtracting 1.96 standard errors: <br><br>
<center>$\Large \overline{X} \pm 1.96\sigma_{\overline{X}}$</center>

#### **True Confidence Intervals**

- In a normal sampling distribution, 95% of all sample means deviate less than 1.96 standard errors from the unknown population mean. Therefore, 95% of all intervals constructed as $\overline{X} \pm 1.96\sigma_{\overline{X}}$ are true, that is, they include the population mean.

- Graphical illustration of a normal distribution and the confidence intervals: <br>
![image.png](attachment:d4f0bb0c-a01f-4e78-a7e7-fefb1a776cda.png)

#### **False Confidence Intervals**

- Conversely, 5% of all sample means deviate more than 1.96 standard errors from the unknown population mean. Therefore, 5% of all intervals constructed this way are false, that is, they fail to include the population mean.

#### **Confidence Interval for $\mu$ Based on z**

- The level of confidence determines the z score that is used to calculate the limits of the interval.

<center><b>CONFIDENCE INTERVAL FOR $\mu$ (BASED ON z)</b></center>
<center>$\Large \overline{X} \pm (z_{conf})(\sigma_{\overline{X}})$</center>
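
As a minimal sketch of this formula, the function below computes the limits using only the Python standard library. The name `z_confidence_interval` and the input values are hypothetical; this is not the `utils.ConfidenceInterval` helper imported at the top of the notebook, whose implementation is not shown here.

```python
from statistics import NormalDist


def z_confidence_interval(sample_mean, pop_sd, n, confidence=0.95):
    """Return (lower, upper) limits of a z-based confidence interval for the mean."""
    standard_error = pop_sd / n ** 0.5
    # Two-tailed critical z: leave (1 - confidence) / 2 in each tail.
    z_conf = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_conf * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical inputs: sample mean 100, population SD 15, sample size 36.
lower, upper = z_confidence_interval(100, 15, 36)
print(round(lower, 2), round(upper, 2))
```

Passing `confidence=0.99` swaps in the larger z score for a 99 percent interval without changing anything else in the calculation.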

#### **Two Assumptions**

- The construction of the confidence interval assumes that:
    + The population standard deviation is known and
    + The shape of the population distribution is normal or the sample size is sufficiently large to satisfy the central limit theorem.

#### **Progress Check 12.2** Reading achievement scores are obtained for a group of fourth graders. A score of 4.0 indicates a level of achievement appropriate for fourth grade, a score below 4.0 indicates underachievement, and a score above 4.0 indicates overachievement. Assume that the population standard deviation equals 0.4. A random sample of 64 fourth graders reveals a mean achievement score of 3.82.

(a) Construct a 95 percent confidence interval for the unknown population mean. (Remember to convert the standard deviation to a standard error.)

In [4]:
ci = CI.confidence_interval(3.82, 0.4, 64)
ci

array([3.722, 3.918])

(b) Interpret this confidence interval; that is, do you find any consistent evidence either of overachievement or of underachievement?

We can claim, with 95% confidence, that the interval from 3.722 to 3.918 includes the true population mean. Because the entire interval lies below 4.0, there is consistent evidence that fourth graders are underachieving on average.

### **12.3** INTERPRETATION OF A CONFIDENCE INTERVAL

- Remark:
    + A 95 percent confidence claim reflects a long-term performance rating for an extended series of confidence intervals.
    + In practice, an investigator constructs only one confidence interval, and the stated percentage describes the long-run proportion of such intervals that are true rather than a guarantee about that one interval. We can never really know whether the obtained interval is true unless the entire population is surveyed. However,
    <b>when the level of confidence equals 95 percent or more, we can be reasonably confident that the one observed confidence interval includes the population mean.</b>
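
The long-run reading of the 95 percent claim can be illustrated by simulation. In the sketch below, the population parameters, the sample size, and the number of trials are hypothetical. Unlike a real investigation, the simulation knows the population mean, so it can count how many intervals are true.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD, N = 50.0, 10.0, 25   # hypothetical population and sample size
Z_95 = NormalDist().inv_cdf(0.975)     # about 1.96
SE = POP_SD / N ** 0.5                 # standard error of the mean
TRIALS = 10_000

hits = 0
for _ in range(TRIALS):
    sample_mean = fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    lower, upper = sample_mean - Z_95 * SE, sample_mean + Z_95 * SE
    if lower <= POP_MEAN <= upper:     # this interval is "true"
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of {TRIALS} intervals include the population mean")
```

The proportion of true intervals should land close to 95 percent, while any single interval in the series simply does or does not include the population mean.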

#### **Progress Check 12.3** Before taking the GRE, a random sample of college seniors received special training on how to take the test. After analyzing their scores on the GRE, the investigator reported a dramatic gain, relative to the national average of 500, as indicated by a 95 percent confidence interval of 507 to 527. Are the following interpretations true or false?

(a) About 95 percent of all subjects scored between 507 and 527. <br>
False. The confidence interval refers to possible values of the population mean, not to the scores of individual subjects.

(b) The interval from 507 to 527 refers to possible values of the population mean for all students who undergo special training. <br>
True.

(c) The true population mean definitely is between 507 and 527. <br>
False. We can be reasonably confident, but never certain, that the true population mean is between 507 and 527.

(d) This particular interval describes the population mean about 95 percent of the time. <br>
False. This particular interval either describes the one true population mean or fails to describe the one true population mean.

(e) In practice, we never really know whether the interval from 507 to 527 is true or false. <br>
True. Unless the entire population is surveyed.

(f) We can be reasonably confident that the population mean is between 507 and 527. <br>
True.

### **12.4** LEVEL OF CONFIDENCE

- Definition:
    + Level of Confidence: The percent of time that a series of confidence intervals includes the unknown population characteristic, such as the population mean.

#### **Effect on Width of Interval**

- A higher level of confidence requires a larger z score in the confidence interval formula, which makes the interval wider and therefore less precise, unless offset by a larger sample size.
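
To make the trade-off concrete, the sketch below compares interval widths at three levels of confidence. The population standard deviation and sample size are hypothetical values; only the z score changes from one level to the next.

```python
from statistics import NormalDist

pop_sd, n = 15.0, 36                 # hypothetical values
standard_error = pop_sd / n ** 0.5   # 2.5

widths = {}
for confidence in (0.90, 0.95, 0.99):
    # Two-tailed critical z for this level of confidence.
    z_conf = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    widths[confidence] = 2 * z_conf * standard_error
    print(f"{confidence:.0%}: z = {z_conf:.2f}, width = {widths[confidence]:.2f}")
```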

#### **Choosing a Level of Confidence**

- Although many levels of confidence can be used, the 95 percent and 99 percent levels are the most prevalent.
- Generally, a level as high as 99 percent should be reserved for situations in which a false interval might have particularly serious consequences.

### **12.5** EFFECT OF SAMPLE SIZE

- Remark:
    + The larger the sample size, the smaller the standard error, hence, the more precise (narrower) the confidence interval will be.
    + Given this perspective, the sample size for a confidence interval, unlike that for a hypothesis test, can never be too large.
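
Because the standard error shrinks with the square root of the sample size, quadrupling the sample size halves the width of the interval. The sketch below shows this with a hypothetical population standard deviation; `interval_width` is an illustrative helper, not part of the notebook's `utils` module.

```python
from statistics import NormalDist

pop_sd = 15.0                         # hypothetical population standard deviation
z_95 = NormalDist().inv_cdf(0.975)    # about 1.96


def interval_width(n):
    """Width of a 95% z-based confidence interval for sample size n."""
    return 2 * z_95 * pop_sd / n ** 0.5


for n in (25, 100, 400):
    print(n, round(interval_width(n), 2))
```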

#### **Selection of Sample Size**

<center><b>CRUDE FORMULA OF APPROPRIATE SAMPLE SIZE CALCULATION</b></center>
<center>$\Large n = (z_{conf}\frac{\sigma_{pop}}{E})^2$</center>
Where: <br>
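+ $z_{conf}$ = the z score for the chosen level of confidence
+ $\sigma_{pop}$ = the population standard deviation
+ E = the margin of error, that is, half the desired width of the interval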
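
A minimal sketch of this calculation follows. The function name and the input values (a population standard deviation of 0.4 and a desired margin of error of 0.05) are hypothetical. The result is rounded up because a sample size must be a whole number and rounding down would miss the target margin.

```python
import math
from statistics import NormalDist


def required_sample_size(pop_sd, margin_of_error, confidence=0.95):
    """Approximate sample size so the interval is no wider than +/- margin_of_error."""
    z_conf = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_conf * pop_sd / margin_of_error) ** 2
    return math.ceil(n)   # round up to the next whole subject


n_needed = required_sample_size(pop_sd=0.4, margin_of_error=0.05)
print(n_needed)
```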

#### **Progress Check 12.4** On the basis of a random sample of 120 adults, a pollster reports, with 95 percent confidence, that between 58 and 72 percent of all Americans believe in life after death.

(a) If this interval is too wide, what, if anything, can be done with the existing data to obtain a narrower confidence interval? <br>
Lower the level of confidence.

(b) What can be done to obtain a narrower 95 percent confidence interval if another similar investigation is being planned? <br>
Increase the sample size of adults being surveyed.

### **12.6** HYPOTHESIS TESTS OR CONFIDENCE INTERVALS?

- Remark:
    + Ordinarily, data are used either to test a hypothesis or to construct a confidence interval, but not both.
    + Hypothesis tests usually have been preferred to confidence intervals in the behavioral sciences.
    + Confidence intervals tend to be more informative than hypothesis tests.
    + A hypothesis test merely indicates whether or not an effect is present, whereas a confidence interval indicates the possible size of the effect.

#### **When to Use Confidence Intervals**

- If the primary concern of an investigator is to determine whether an effect exists, use a hypothesis test.
- In other situations, such as when the null hypothesis has already been rejected, use a confidence interval to identify the possible size of the effect.

### **12.7** CONFIDENCE INTERVAL FOR POPULATION PERCENT

- Definition:
    + Margin of Error: That which is added to or subtracted from some sample value, such as the sample proportion or sample mean, to obtain the limits of a confidence interval.

- Confidence intervals for population percents or proportions are often encountered in the media. This type of CI is calculated with essentially the same formula as in Section 12.2:
<center>sample percent $\pm$ (1.96)(standard error of percent)</center> <br>
Where:
+ Standard error of percent: the counterpart of the standard error of the mean; it measures how much sample percents vary around the population percent.
+ Sample percent: the percentage of subjects in the sample who answer in a specified manner.
+ 1.96: the z score for the 95% level of confidence.
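
The notes above do not spell out how the standard error of a percent is computed. The sketch below assumes the common large-sample formula $\sqrt{P(100 - P)/n}$, with $P$ expressed in percent. The function name and the poll figures (50 percent of 1,000 respondents) are hypothetical.

```python
from statistics import NormalDist


def percent_confidence_interval(sample_percent, n, confidence=0.95):
    """Return (lower, upper, margin) for a population percent, all in percent units."""
    # Assumed large-sample standard error of a percent: sqrt(P * (100 - P) / n).
    standard_error = (sample_percent * (100 - sample_percent) / n) ** 0.5
    z_conf = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_conf * standard_error
    return sample_percent - margin, sample_percent + margin, margin


lower, upper, margin = percent_confidence_interval(50, 1000)
print(f"{lower:.1f}% to {upper:.1f}% (margin of error about {margin:.1f} points)")
```

Published polls may report a somewhat different margin of error than this formula gives, since they can adjust for the sampling design.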

#### **Sample Size and Margin of Error**

- The margin of error shrinks as the sample size grows, because the standard error is inversely proportional to the square root of the sample size. If the investigator can tolerate a larger margin of error, a smaller sample will suffice.

#### **Pollsters Use Larger Samples**

- In contrast to experiments, the samples in surveys can never be too large. As long as the budget allows, investigators tend to use the largest sample sizes they can.

#### **A Final Caution** 

- Opinion polls might use biased questions or tactics that the report does not disclose. These are called nonstatistical errors, and they can compromise the value of confidence intervals.

#### **Progress Check 12.5** In a recent scientific sample of about 900 adult Americans, 70 percent favor stricter gun control of assault weapons, with a margin of error of ±4 percent for a 95 percent confidence interval. Therefore, the 95 percent confidence interval equals 66 to 74 percent. Indicate whether the following interpretations are true or false:

(a) The interval from 66 to 74 percent refers to possible values of the sample percent. <br>
False. The interval refers to possible values of the population percent, not the sample percent.

(b) The true population percent is between 66 and 74 percent. <br>
False. We can be reasonably confident, but not certain, that the true population percent is between 66 and 74 percent.

(c) In the long run, a series of intervals similar to this one would fail to include the population percent about 5 percent of the time. <br>
True.

(d) We can be reasonably confident that the population percent is between 66 and 74 percent. <br>
True.

#### **Other Types of Confidence Intervals**

- Confidence intervals can be constructed not only for population means and percents but also for differences between two population means, as discussed in subsequent chapters. Although not discussed in this book, confidence intervals also can be constructed for other characteristics of populations, including variances and correlation coefficients.

### **Review Questions**

#### **12.6** (True or False) You should consider using a confidence interval whenever

(a) the null hypothesis has been rejected. <br>
True.

(b) the issue is whether or not an effect is present. <br>
False. A hypothesis test should be used in such a case.

(c) the issue involves possible effect sizes. <br>
True.

(d) there is no meaningful null hypothesis. <br>
True.

#### **12.7** In Question 10.5 on page 191, it was concluded that the mean salary among the population of female members of the American Psychological Association is less than that (\$82,500) for all comparable members who have a doctorate and teach full time.

(a) Given a population standard deviation of $\$6,000$ and a sample mean salary of $\$80,100$ for a random sample of 100 female members, construct a 99 percent confidence interval for the mean salary for all female members.

In [9]:
conf_zscore = ND.find_zscore(0.995)[0]
CI.confidence_interval(80100, 6000, conf_zscore, 100)

array([78552., 81648.])

(b) Given this confidence interval, is there any consistent evidence that the mean salary for all female members falls below $82,500, the mean salary for all members? <br>
Yes. The entire interval, from \$78,552 to \$81,648, lies below \$82,500, so we can claim with 99% confidence that the mean salary of all female members is lower than the mean salary of all members.

#### **12.8** In Review Question 11.12 on page 218, instead of testing a hypothesis, you might prefer to construct a confidence interval for the mean weight of all 2-pound boxes of candy during a recent production shift.

(a) Given a population standard deviation of .30 ounce and a sample mean weight of 33.09 ounces for a random sample of 36 candy boxes, construct a 95 percent confidence interval.

In [11]:
conf_zscore = ND.find_zscore(0.975)[0]
CI.confidence_interval(33.09, 0.3, conf_zscore, 36)

array([32.99, 33.19])

(b) Interpret this interval, given the manufacturer’s desire to produce boxes of candy that, on the average, exceed 32 ounces. <br>
Because the entire interval, from 32.99 to 33.19 ounces, lies above 32 ounces, we can be reasonably confident that the mean weight of all boxes produced during the shift exceeds 32 ounces, as the manufacturer desires.

#### **12.9** It’s tempting to claim that, once a particular 95 percent confidence interval has been constructed, it includes the unknown population characteristic with a probability of .95. What is wrong with this claim?

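The 95% describes the proportion of intervals, in an extended series of confidence intervals, that include the true population mean. It does not apply to a single interval: once constructed, a particular interval either includes the population mean or fails to include it.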
