# Possibility of Incorrect Decisions
Having made a decision about the null hypothesis, we never know absolutely
whether that decision is correct or incorrect, unless, of course, we survey the entire
population. Even if $H_0$ is true (and, therefore, the hypothesized distribution of z about
$H_0$ also is true), there is a slight possibility that, just by chance, the one observed z actually
originates from one of the shaded rejection regions of the hypothesized distribution
of z, thus causing the true $H_0$ to be rejected. This type of incorrect decision (rejecting a
true $H_0$) is referred to as a type I error or a false alarm.<br>
On first impulse, it might seem desirable to abolish the shaded rejection regions in
the hypothesized sampling distribution to ensure that a true $H_0$ never is rejected. A most
unfortunate consequence of this strategy, however, is that no $H_0$, not even a radically
false $H_0$, ever would be rejected. This second type of incorrect decision (retaining a
false $H_0$) is referred to as a type II error or a miss.
### Even though we never really know whether a particular decision is correct or incorrect, it is reassuring that in the long run, most decisions will be correct, assuming the null hypotheses are either true or seriously false.

# Strong or Weak Decisions
## Retaining $H_0$ Is a Weak Decision
There are subtle but important differences in the interpretation of decisions to
retain $H_0$ and to reject $H_0$. $H_0$ is retained whenever the observed z qualifies as a
common outcome on the assumption that $H_0$ is true. Therefore, $H_0$ could be true.
However, the same observed result also would qualify as a common outcome when
the original value in $H_0$ (500) is replaced with a slightly different value. Thus, the
retention of $H_0$ must be viewed as a relatively weak decision. Because of this weakness, many statisticians prefer to describe this decision as simply a failure to reject $H_0$
rather than as the retention of $H_0$. In any event, the retention of $H_0$ can’t be interpreted
as proving $H_0$ to be true. If $H_0$ had been retained in the present example, it would have
been appropriate to conclude not that the mean SAT math score for all local freshmen equals the national average, but that the mean SAT math score could equal the
national average, as well as many other possible values in the general vicinity of the
national average.
## Rejecting $H_0$ Is a Strong Decision
On the other hand, $H_0$ is rejected whenever the observed z qualifies as a rare outcome
(one that could have occurred just by chance with a probability of .05 or less)
on the assumption that $H_0$ is true. This suspiciously rare outcome implies that $H_0$ is
probably false (and conversely, that $H_1$ is probably true). Therefore, the rejection of $H_0$
can be viewed as a strong decision. When $H_0$ was rejected in the present example, it
was appropriate to report a definitive conclusion that the mean SAT math score for all
local freshmen probably exceeds the national average.
To summarize,
### the decision to retain $H_0$ implies not that $H_0$ is probably true, but only that $H_0$ could be true, whereas the decision to reject $H_0$ implies that $H_0$ is probably false (and that $H_1$ is probably true).
Since most investigators hope to reject $H_0$ in favor of $H_1$ , the relative weakness of the
decision to retain $H_0$ usually does not pose a serious problem.

# Why the Research Hypothesis Isn’t Tested Directly
Even though $H_0$, the null hypothesis, is the focus of a statistical test, it is usually
of secondary concern to the investigator. Nevertheless, there are several reasons why,
although of primary concern, the research hypothesis is identified with $H_1$ and tested
indirectly.
## 1. Lacks Necessary Precision:-
#### The research hypothesis, but not the null hypothesis, lacks the necessary precision to be tested directly.
To be tested, a hypothesis must specify a single number about which the hypothesized sampling distribution can be constructed. Because it specifies a single number,
the null hypothesis, rather than the research hypothesis, is tested directly. In the SAT
example, the null hypothesis specifies that a precise value (the national average of 500)
describes the mean for the current population of interest (all local freshmen). Typically, the research hypothesis lacks the required precision. It merely specifies that some
inequality exists between the hypothesized value (500) and the mean for the current
population of interest (all local freshmen).
## 2. Supported by a Strong Decision to Reject
Logical considerations also argue for the indirect testing of the research hypothesis and the direct testing of the null hypothesis.
#### Because the research hypothesis is identified with the alternative hypothesis, the decision to reject the null hypothesis, should it be made, will provide strong support for the research hypothesis, while the decision to retain the null hypothesis, should it be made, will provide, at most, weak support for the null hypothesis.
As mentioned, the decision to reject the null hypothesis is stronger than the decision to retain it. Logically, a statement such as “All cows have four legs” can never
be proven in spite of a steady stream of positive instances. It only takes one negative
instance—one cow with three legs—to disprove the statement. By the same token,
one positive instance (common outcome) doesn’t prove the null hypothesis, but one negative instance (rare outcome) disproves the null hypothesis. (Strictly speaking,
however, since a rare outcome implies that the null hypothesis is probably but not definitely false, remember that there always is a very small possibility that the rare outcome
reflects a true null hypothesis.)<br>
Logically, therefore, it makes sense to identify the research hypothesis with the
alternative hypothesis. If, as hoped, the data favor the research hypothesis, the test will
generate strong support for your hunch: It’s probably true. If the data do not favor the
research hypothesis, the hypothesis test will generate, at most, weak support for the
null hypothesis:
### It could be true. Weak support for the null hypothesis is of little consequence, as this hypothesis—that nothing special is happening in the population—usually serves only as a convenient testing device.
## Reminder: Rejecting $H_0$ implies that it probably is false, while retaining $H_0$ implies only that it might be true.

# One-Tailed and Two-Tailed Tests
Let’s consider some techniques that make the hypothesis test more responsive to special conditions.
### Two-Tailed or Nondirectional Test:- Rejection regions are located in both tails of the sampling distribution.
Generally, the alternative hypothesis, $H_1$, is the complement of the null hypothesis,
$H_0$. Under typical conditions, the form of $H_1$ resembles that shown for the SAT
example, namely,
$$ H_1 : \mu \neq 500$$
This alternative hypothesis says that the null hypothesis should be rejected if the mean
math score for the population of local freshmen differs in either direction from the
national average of 500. An observed z will qualify as a rare outcome if it deviates too
far either below or above the national average. Panel A of Figure 11.2 shows rejection
regions that are associated with both tails of the hypothesized sampling distribution.
The corresponding decision rule, with its pair of critical z scores of ±1.96, is referred to
as a two-tailed or nondirectional test.
### One-Tailed Test (Lower Tail Critical) :- Rejection region is located in just one tail of the sampling distribution.
Now let’s assume that the research hypothesis for the investigation of SAT math
scores was based on complaints from instructors about the poor preparation of local
freshmen. Assume also that if the investigation supports these complaints, a remedial
program will be instituted. Under these circumstances, the investigator might prefer a
hypothesis test that is specially designed to detect only whether the population mean
math score for all local freshmen is less than the national average.
This alternative hypothesis reads:
$$ H_1 : \mu < 500$$
It reflects a concern that the null hypothesis should be rejected only if the population
mean math score for all local freshmen is less than the national average of 500. Accordingly, an observed z triggers the decision to reject $H_0$ only if z deviates too far below the
national average. Panel B of Figure 11.2 illustrates a rejection region that is associated
with only the lower tail of the hypothesized sampling distribution. The corresponding
decision rule, with its critical z of –1.65, is referred to as a one-tailed or directional
test with the lower tail critical.
### Notice that the level of significance, α, equals .05 for this one-tailed test and also for the original two-tailed test.
![image.png](attachment:192bba3a-7aed-4c63-97b9-68a6368ddc94.png)<br>

### Extra Sensitivity of One-Tailed Tests
This new one-tailed test is extra sensitive to any drop in the population mean for the
local freshmen below the national average. If $H_0$ is false because a drop has occurred,
then the observed z will be more likely to deviate below the national average. As can
be seen in panels A and B of Figure 11.2, an observed deviation in the direction of
concern—below the national average—is more likely to penetrate the broader rejection
region for the one-tailed test than that for the two-tailed test. Therefore, the decision
to reject a false $H_0$ (in favor of the research hypothesis) is more likely to occur in the
one-tailed test than in the two-tailed test.
### One-Tailed Test (Upper Tail Critical)
Panel C of Figure 11.2 illustrates a one-tailed or directional test with the upper tail
critical. This one-tailed test is the mirror image of the previous test. Now the alternative
hypothesis reads: $$ H_1 : \mu > 500 $$
and its critical z equals 1.65. This test is specially designed to detect only whether
the population mean math score for all local freshmen exceeds the national average.
For example, the research hypothesis for this investigation might have been inspired
by the possibility of eliminating an existing remedial math program if it can be demonstrated that, on the average, the SAT math scores of all local freshmen exceed the
national average.
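The three decision rules above can be sketched as a small Python function. This is a minimal illustration, assuming α = .05 and the rounded critical values quoted in the text (±1.96 for the two-tailed test, −1.65 and +1.65 for the one-tailed tests); the function name `decide` and the `tail` labels are hypothetical, not part of any library.

```python
def decide(z: float, tail: str) -> str:
    """Apply a z-test decision rule at alpha = .05 (critical values as quoted in the text).

    tail: "two"   -> H1: mu != 500, reject if z <= -1.96 or z >= 1.96
          "lower" -> H1: mu < 500,  reject if z <= -1.65
          "upper" -> H1: mu > 500,  reject if z >= 1.65
    """
    if tail == "two":
        reject = z <= -1.96 or z >= 1.96
    elif tail == "lower":
        reject = z <= -1.65
    elif tail == "upper":
        reject = z >= 1.65
    else:
        raise ValueError(f"unknown tail: {tail!r}")
    return "reject H0" if reject else "retain H0"


if __name__ == "__main__":
    # An observed z of 3 deviates far ABOVE the hypothesized mean.
    for tail in ("two", "lower", "upper"):
        print(f"{tail:>5}-tailed rule: {decide(3.0, tail)}")
```

With an observed z of 3, the two-tailed and upper-tail rules reject $H_0$, while the lower-tail rule retains it because the deviation lies in the direction of no concern.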


# One or Two Tails?
### Before a hypothesis test, if there is a concern that the true population mean differs from the hypothesized population mean only in a particular direction, use the appropriate one-tailed or directional test for extra sensitivity. Otherwise, use the more customary two-tailed or nondirectional test.
Having committed yourself to a one-tailed test with its single rejection region, you
must retain $H_0$, regardless of how far the observed z deviates from the hypothesized
population mean in the direction of “no concern.” For instance, if a one-tailed test
with the lower tail critical had been used with the data for 100 freshmen from the SAT
example, $H_0$ would have been retained because, even though the observed z equals an
impressive value of 3, it deviates in the direction of no concern—in this case, above
the national average. Clearly, a one-tailed test should be adopted only when there is
absolutely no concern about deviations, even very large deviations, in one direction.
If there is the slightest concern about these deviations, use a two-tailed test.<br>
The selection of a one- or two-tailed test should be made before the data are collected. Never “peek” at the value of the observed z to determine whether to locate the
rejection region for a one-tailed test in the upper or the lower tail of the distribution
of z. To qualify as a one-tailed test, the location of the rejection region must reflect
the investigator’s concern only about deviations in a particular direction before any
inspection of the data. Indeed, the investigator should be able to muster a compelling
reason, based on an understanding of the research hypothesis, to support the direction
of the one-tailed test.
## New Null Hypothesis for One-Tailed Tests
When tests are one-tailed, a complete statement of the null hypothesis also should
include all possible values of the population mean in the direction of no concern. For
example, given a one-tailed test with the lower tail critical, such as $H_1$ : μ < 500, the
complete null hypothesis should be stated as $H_0$ : μ ≥ 500 instead of $H_0$ : μ = 500. By
the same token, given a one-tailed test with the upper tail critical, such as $H_1$ : μ > 500,
the complete null hypothesis should be stated as $H_0$ : μ ≤ 500.<br>
If you think about it, the complete $H_0$ describes all of the population means that could
be true if a one-tailed test results in the retention of the null hypothesis. For instance,
if a one-tailed test with the lower tail critical results in the retention of $H_0$ : μ ≥ 500, the
complete $H_0$ accurately reflects the fact that not only μ = 500 could be true, but also that
any other value of the population mean in the direction of no concern, that is, μ > 500,
could be true. (Remember, when the test is one-tailed, even a very deviant result in the
direction of no concern—possibly reflecting a mean much larger than 500—still would
trigger the decision to retain $H_0$ .) 
### Henceforth, whenever a one-tailed test is employed, write $H_0$ to include values of the population mean in the direction of no concern—even though the single number in the complete $H_0$ identified by the equality sign is the one value about which the hypothesized sampling distribution is centered and, therefore, the one value actually used in the hypothesis test.

## Reminder: In the absence of compelling reasons for a one-tailed test, use a two-tailed test.

# Choosing a Level of Significance ($\alpha$)
The level of significance indicates how rare an observed z must be before $H_0$ can be
rejected. To reject $H_0$ at the .05 level of significance implies that the observed z would
have occurred, just by chance, with a probability of only .05 (one chance out of twenty)
or less.<br>
The level of significance also spotlights an inherent risk in hypothesis testing, that
is, the risk of rejecting a true $H_0$. When the level of significance equals .05, there is a
probability of .05 that, even though $H_0$ is true, the observed z will stray into the rejection region and cause the true $H_0$ to be rejected.
## Which Level of Significance?
When the rejection of a true $H_0$ is particularly serious, a smaller level of significance
can be selected. For example, the .01 level of significance implies that before $H_0$ can
be rejected, the observed z must achieve a degree of rarity equal to .01 (one chance out
of one hundred) or less; it also limits, to a probability of .01, the risk of rejecting a true
$H_0$. The .01 level might be used in a hypothesis test in which the rejection of a true $H_0$
would cause the introduction of a costly new remedial education program, even though
the population mean math score for all local freshmen really equals the national average.
An even smaller level of significance, such as the .001 level, might be used when
the rejection of a true $H_0$ would have horrendous consequences—for instance, the treatment of serious illnesses, such as AIDS, exclusively with a new, very expensive drug that not only is worthless but also has severe side effects.<br>
Although many different levels of significance are possible, most tables for hypothesis tests are geared to the .05 and .01 levels. However, in real-life applications, you, as an investigator, might have to select a level of significance. Unless there are obvious reasons for selecting
either a larger or a smaller level of significance, use the customary .05 level—the largest level of significance reported in most professional journals.<br>
When testing hypotheses with the z test, you may find it helpful to refer to Table 11.1,
which lists the critical z values for one- and two-tailed tests at the .05 and .01 levels of
significance.<br>
![image.png](attachment:dd9dd506-cacb-4cb7-90e6-6351f020e142.png)<br>
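The critical values in Table 11.1 can be recovered from the inverse of the standard normal cumulative distribution function. Below is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); the helper name `critical_z` is hypothetical. A two-tailed test splits α equally between the two tails, whereas a one-tailed test places all of α in one tail, so the total rejection area equals α in both cases. The unrounded values differ slightly from those in the text: for example, the one-tailed .05 value is about 1.645, which the text rounds to 1.65.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def critical_z(alpha: float, two_tailed: bool) -> float:
    """Positive critical z for a z test at significance level alpha.

    A two-tailed test splits alpha equally between the two tails;
    a one-tailed test puts all of alpha in a single tail.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return STANDARD_NORMAL.inv_cdf(1 - tail_area)


if __name__ == "__main__":
    for alpha in (0.05, 0.01, 0.001):
        one = critical_z(alpha, two_tailed=False)
        two = critical_z(alpha, two_tailed=True)
        print(f"alpha = {alpha}: one-tailed {one:.3f}, two-tailed ±{two:.3f}")
```

For α = .05 this prints 1.645 (one-tailed) and ±1.960 (two-tailed); for α = .01 it prints 2.326 and ±2.576. For a one-tailed test with the lower tail critical, use the negative of the one-tailed value.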

# Testing a Hypothesis About Vitamin C
Let’s look more closely at the four possible outcomes of a hypothesis test by focusing
on a study to determine whether vitamin C increases the intellectual aptitude of high
school students. After being randomly selected from some large school district, each of
36 students takes a daily dose of 90 milligrams of vitamin C for a period of two months
before being tested for IQ.<br>
Ordinarily, IQ scores for all students in this school district approximate a normal
distribution with a mean of 100 and a standard deviation of 15. According to the null
hypothesis, a mean of 100 still would describe the distribution of IQ scores even if all
of the students in the district were to receive the vitamin C treatment. Furthermore,
given our exclusive concern with detecting only those deviations of the population mean
above 100, the null hypothesis takes the form appropriate for a one-tailed test with the
upper tail critical, namely: $$ H_0 : \mu \leq 100 $$
The rejection of $H_0$ would support $H_1$, the research hypothesis that something special
is happening in the underlying population (because vitamin C increases intellectual
aptitude), namely: $$ H_1 : \mu > 100 $$
## z Test Is Appropriate
To determine whether the sample mean IQ for the 36 students qualifies as a common or a rare outcome under the null hypothesis, a z test will be used. The z test for a
population mean is appropriate since, for IQ scores, the population standard deviation
is known to be 15 and the shape of the population is known to be normal.
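As a sketch of the calculation, the z test converts the sample mean into a z score using the standard error $\sigma / \sqrt{n}$ (here $15/\sqrt{36} = 2.5$) and compares it with the one-tailed critical value of 1.65. The text does not report an observed sample mean, so the value 104.5 below is purely hypothetical, as is the helper name `z_score`.

```python
from math import sqrt

MU_H0 = 100    # hypothesized population mean IQ under H0
SIGMA = 15     # known population standard deviation of IQ scores
N = 36         # number of students in the sample
Z_CRIT = 1.65  # one-tailed test, upper tail critical, alpha = .05


def z_score(sample_mean: float, mu: float = MU_H0, sigma: float = SIGMA, n: int = N) -> float:
    """z test statistic for a sample mean when the population sigma is known."""
    standard_error = sigma / sqrt(n)  # 15 / sqrt(36) = 2.5
    return (sample_mean - mu) / standard_error


if __name__ == "__main__":
    xbar = 104.5  # HYPOTHETICAL sample mean; not reported in the text
    z = z_score(xbar)
    decision = "reject H0" if z >= Z_CRIT else "retain H0"
    print(f"z = {z:.2f}: {decision}")  # z = 1.80: reject H0
```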
## Two Groups Would Have Been Better
Although poorly designed, the present experiment supplies a perspective that will
be most useful in later chapters. A better-designed experiment would contrast the IQ
scores for the group of subjects who receive vitamin C with the IQ scores for a placebo
control group of subjects who receive fake vitamin C—thereby controlling for the
“placebo effect,” a self-induced improvement in performance caused by the subject’s
awareness of being treated in a special way. <br>
![image.png](attachment:2575106a-76a8-4377-955a-446aa03a393e.png)

# Four Possible Outcomes
Table 11.2 summarizes the four possible outcomes of any hypothesis test. Before testing a hypothesis, we must be concerned about all four possible outcomes because we
don’t know whether $H_0$ is true or false—that’s why we’re testing the hypothesis. If,
unknown to us, $H_0$ really is true, a well-designed hypothesis test will tend to confirm
this fact; that is, it will cause us to retain $H_0$ and conclude that $H_0$ could be true. To
conclude otherwise, as is always a slight possibility, reflects a type I error. On the
other hand, if, unknown to us, $H_0$ really is seriously false, a well-designed hypothesis
test also will tend to confirm this fact; that is, it will cause us to reject $H_0$ and conclude
that $H_0$ is false. To conclude otherwise, as is always a slight possibility, reflects a type
II error.<br>
## Four Possible Outcomes of the Vitamin C Experiment
It’s instructive to describe the four possible outcomes in Table 11.2 in terms of the
vitamin C experiment.
## Type I Error :- Rejecting a true null hypothesis.
## Type II Error :- Retaining a false null hypothesis.
1. If $H_0$ really is true (because vitamin C does not cause an increase in the population mean IQ), then it is a correct decision to retain the true $H_0$. In this case, we would conclude correctly that there is no evidence that vitamin C increases IQ.
2. If $H_0$ really is true, then it is a type I error to reject the true $H_0$ and conclude that vitamin C increases IQ when, in fact, it doesn’t. Type I errors are sometimes called false alarms because, as with their firehouse counterparts, they trigger wild goose chases after something that does not exist. For instance, a type I error might encourage a batch of worthless experimental efforts to discover precisely what dosage of vitamin C maximizes the nonexistent “increase” in IQ.
3. If $H_0$ really is false (because vitamin C really causes an increase in the population mean IQ), then it is a type II error to retain the false $H_0$ and conclude that there is no evidence that vitamin C increases IQ when, in fact, it does. Type II errors are sometimes called misses because they fail to detect a potentially important relationship, such as that between vitamin C and IQ.
4. If $H_0$ really is false, then it is a correct decision to reject the false $H_0$ and conclude that vitamin C increases IQ.<br>
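The four outcomes depend on just two facts: whether $H_0$ really is true, and whether the test rejected it. The following sketch, built around a hypothetical helper `outcome`, simply encodes the two-by-two layout of Table 11.2 as described in the list above.

```python
def outcome(h0_true: bool, h0_rejected: bool) -> str:
    """Classify one hypothesis-test result into the four cells of Table 11.2."""
    if h0_true:
        return "type I error (false alarm)" if h0_rejected else "correct decision"
    return "correct decision" if h0_rejected else "type II error (miss)"


if __name__ == "__main__":
    for h0_true in (True, False):
        for rejected in (False, True):
            state = "H0 true " if h0_true else "H0 false"
            action = "reject H0" if rejected else "retain H0"
            print(f"{state} | {action} | {outcome(h0_true, rejected)}")
```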

## Importance of Null Hypothesis
Refer to Table 11.2 when, as in the following exercise, you must describe the four
possible outcomes for a particular hypothesis test. To avoid confusing the type I and II
errors, first identify the null hypothesis, $H_0$ . Typically, the null hypothesis asserts that
there is no effect, thereby contradicting the research hypothesis. In the present case,
contrary to the research hypothesis, the null hypothesis ($H_0 : μ ≤ 100$) assumes that
vitamin C has no positive effect on IQ.
## Decisions Usually Are Correct
When generalizing beyond existing observations, there is always the possibility of
a type I or type II error, and we never can be absolutely certain of having made the
correct decision. At best, we can use a test procedure that usually produces a correct
decision when $H_0$ is either true or seriously false. This claim will be examined in the
context of the vitamin C experiment, assuming first that $H_0$ really is true and then that
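$H_0$ really is false. You might view this approach as hopelessly theoretical,
since we never know whether $H_0$ really is true or false; even so, examining each case separately shows how often the test procedure reaches a correct decision under each condition.<br>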
![image.png](attachment:88162370-44a7-4433-a7f3-b379259fbc51.png)

# If $H_0$ is really True
Assume that $H_0$ really is true because vitamin C doesn’t increase the population mean
IQ. In this case, we need be concerned only about either retaining or rejecting a true $H_0$
(the two leftmost outcomes in Table 11.2). It’s instructive to view these two possible
outcomes in terms of the sampling distribution in Figure 11.3. Centered about a value
of 100, the hypothesized sampling distribution in Figure 11.3 reflects the properties
of the projected one-tailed test for vitamin C. If $H_0$ really is true—and this is a crucial
point—the hypothesized sampling distribution also can be viewed as the true sampling
distribution (from which the one observed sample mean actually originates). Therefore,
the one observed sample mean (or z) in the experiment can be viewed as being
randomly selected from the hypothesized distribution.<br>
![image.png](attachment:dc8a2ba7-b50d-4d60-9afe-529d904361be.png)<br>
## Alpha ($\alpha$) :- The probability of a type I error, that is, the probability of rejecting a true null hypothesis.
## Probability of a Type I Error:-
When, just by chance, a randomly selected sample mean originates from the small,
shaded portion of the sampling distribution in Figure 11.3, its z value equals or exceeds
1.65, and hence $H_0$ is rejected. Because $H_0$ really is true, this is an incorrect decision or
type I error—a false alarm, announced as evidence that vitamin C increases IQ, even
though it really does not. The probability of a type I error equals alpha (α), the level of
significance. (The level of significance, remember, indicates the proportion of the total
area of the sampling distribution in the rejection region for $H_0$.) In the present case, the
probability of a type I error equals .05, as indicated in Figure 11.3.
## Probability of a Correct Decision:-
When, just by chance, a randomly selected sample mean originates from the large
white portion of the sampling distribution in Figure 11.3, its z value is less than 1.65
and $H_0$ is retained. Because $H_0$ really is true, this is a correct decision—announced as
a lack of evidence that vitamin C increases IQ. The probability of a correct decision
equals 1 − α, that is, .95.<br>
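A quick Monte Carlo check of these two probabilities is sketched below, assuming the vitamin C setup (n = 36, σ = 15, critical z = 1.65) and a true population mean of exactly 100. Because the population is normal, each simulated sample mean is drawn directly from the sampling distribution with mean 100 and standard error 2.5, mirroring the view that the one observed mean is randomly selected from that distribution. The function name `rejection_rate`, the number of trials, and the seed are arbitrary choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

MU_H0 = 100    # hypothesized population mean IQ
SIGMA = 15     # known population standard deviation
N = 36         # sample size
Z_CRIT = 1.65  # one-tailed test, upper tail critical, alpha = .05


def rejection_rate(true_mu: float, trials: int = 20_000, seed: int = 1) -> float:
    """Estimate P(reject H0) by simulating many replications of the experiment.

    Each replication draws one sample mean from the true sampling distribution,
    which is normal with mean true_mu and standard error SIGMA / sqrt(N).
    """
    se = SIGMA / sqrt(N)
    sample_means = NormalDist(true_mu, se).samples(trials, seed=seed)
    rejections = sum((m - MU_H0) / se >= Z_CRIT for m in sample_means)
    return rejections / trials


if __name__ == "__main__":
    type_i = rejection_rate(true_mu=100)  # H0 really is true
    exact = 1 - NormalDist().cdf(Z_CRIT)
    print(f"simulated P(type I error)     = {type_i:.3f}")
    print(f"simulated P(correct decision) = {1 - type_i:.3f}")
    print(f"exact P(type I error)         = {exact:.4f}")
```

The simulated type I error rate should land close to the exact value $1 - \Phi(1.65) \approx .0495$, which the text rounds to .05. Calling `rejection_rate(98)`, with a true mean in the direction of no concern, gives a much smaller rejection rate, in line with the footnote below.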
## Reducing the Probability of a Type I Error
If $H_0$ really is true, the present test will produce a correct decision with a probability of .95 and a type I error with a probability of .05.* If a false alarm has serious
consequences, the probability of a type I error can be reduced to .01 or even to .001
simply by using the .01 or .001 level of significance, respectively. One of these levels
of significance might be preferred for the vitamin C test if, for instance, a false alarm
could cause the adoption of an expensive program to supply worthless vitamin C to
all students in the district and, perhaps, the creation of an accelerated curriculum to
accommodate the fictitious increase in intellectual aptitude.
## True $H_0$ Usually Retained:- If $H_0$ really is true, the probability of a type I error, α, equals the level of significance, and the probability of a correct decision equals 1 − α.
Because values of .05 or less are usually selected for α, we can conclude that if $H_0$ really is true, correct decisions will occur much more frequently than will type I errors.
## Reminder: If $H_0$ is true and an error is committed, it must be a type I error.<br>
#### *Strictly speaking, if $H_0$ : μ ≤ 100 really is true, the true sampling distribution also could be centered about some value less than 100, in the direction of no concern. In this case, the consequences of the hypothesis test would be even more favorable than suggested. Essentially, because the true sampling distribution would be shifted to the left of the one shown in Figure 11.3, while everything else remains the same, the type I error would have a smaller probability than .05, and a correct decision would have a larger probability than .95.

# If $H_0$ is Really False Because of a Large Effect
Next, assume that $H_0$ really is false because vitamin C increases the population mean
by not just a few points, but by many points—for example, by ten points. Using the
vocabulary of most investigators, we also could describe this increase as a “ten-point effect,” since any difference between a true and a hypothesized population mean is
referred to as an effect. If $H_0$ really is false, because of the relatively large ten-point
effect of vitamin C on IQ, we need be concerned only about either retaining or rejecting a false $H_0$ (the two rightmost outcomes in Table 11.2). Let’s view each of these two
possible outcomes in terms of the sampling distributions in Figure 11.4.<br>
![image.png](attachment:de9f3867-cb58-4177-a592-871c17fd353d.png)<br>
## Effect:- Any difference between a true and a hypothesized population mean.
## Hypothesized Sampling Distribution:- Centered about the hypothesized population mean, this distribution is used to generate the decision rule.
It is essential to distinguish between the hypothesized sampling distribution and
the true sampling distribution shown in Figure 11.4. Centered about the hypothesized
population mean of 100, the hypothesized sampling distribution serves as the parent distribution for the familiar decision rule with a critical z of 1.65 for the projected
one-tailed test. Once the decision rule has been identified, attention shifts from the
hypothesized sampling distribution to the true sampling distribution.
## True Sampling Distribution:- Centered about the true population mean, this distribution produces the one observed mean (or z).
Centered about the true population mean of 110 (which reflects the ten-point effect,
that is, 100 + 10 = 110), the true sampling distribution serves as the parent distribution for the one randomly selected sample mean (or z) that will be observed in the
experiment. Viewed relative to the decision rule (based on the hypothesized sampling
distribution), the one randomly selected sample mean (originating from the true sampling distribution) dictates whether we retain or reject the false $H_0$.
## Beta ($\beta$) :- The probability of a type II error, that is, the probability of retaining a false null hypothesis.
## Low Probability of a Type II Error for a Large Effect:-
When, just by chance, a randomly selected sample mean originates from the very
small black portion of the true sampling distribution of the mean, its z value is
less than 1.65, and therefore, in compliance with the decision rule, $H_0$ is retained.
Because $H_0$ really is false, this is an incorrect decision or type II error—a miss,
announced as a lack of evidence that vitamin C increases IQ, even though, in fact,
it does. With the aid of tables for the normal curve, it can be demonstrated that in
the present case, the probability of a type II error, symbolized by the Greek letter
beta ($\beta$), equals .01.<br>
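One way to see where β = .01 comes from, assuming the vitamin C setup with n = 36 so that $\sigma_{\bar{x}} = 15/\sqrt{36} = 2.5$: first convert the critical z of 1.65 into a critical sample mean using the hypothesized distribution (centered at 100), then find the area of the true sampling distribution (centered at 110) that falls below that cutoff.

$$ \bar{x}_{\text{crit}} = 100 + 1.65\,\sigma_{\bar{x}} = 100 + 1.65(2.5) = 104.125 $$

$$ \beta = P\left(\bar{x} < 104.125 \mid \mu = 110\right) = P\left(z < \frac{104.125 - 110}{2.5}\right) = P(z < -2.35) \approx .01 $$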
## High Probability of a Correct Decision for a Large Effect:-
When, just by chance, a sample mean originates from the large shaded portion of
the true sampling distribution, its z value equals or exceeds 1.65, and $H_0$ is rejected.
Because $H_0$ really is false, this is a correct decision—announced as evidence that vitamin C increases IQ. In the present case, the probability of a correct decision, symbolized as 1 − β, equals 0.99.
## Review
If $H_0$ really is false, because vitamin C has a large ten-point effect on the population
mean IQ, the projected one-tailed test will do quite well. There is a high probability of
.99 that a correct decision will be made and a probability of only .01 that a type II error
will be committed. This conclusion, when combined with that for the previous section,
justifies the earlier claim that hypothesis tests tend to produce correct decisions when
either $H_0$ really is true or $H_0$ really is false because of a large effect.

# If $H_0$ is Really False Because of a Small Effect
The projected hypothesis test does not fare nearly as well if $H_0$ really is false because
vitamin C increases the population mean IQ by only a few points—for example, by
only three points. Once again, as indicated in Figure 11.5, there are two different
distributions of sample means: the hypothesized sampling distribution centered about
the hypothesized population mean of 100 and the true sampling distribution centered
about the true population mean of 103 (which reflects the three-point effect, that is,
100 + 3 = 103). After the decision rule has been constructed with the aid of the
hypothesized sampling distribution, attention shifts to the true sampling distribution
from which the one randomly selected sample mean actually will originate.<br>
![image.png](attachment:ad4bf980-b8db-4356-b5ec-c5c72a4f7bc4.png)<br>
## Low Probability of a Correct Decision for a Small Effect
Viewed relative to the decision rule, the true sampling distribution supplies two
types of randomly selected sample means: those that produce a type II error because
they originate from the black sector and those that produce a correct decision because
they originate from the shaded sector. Because of the small three-point effect, the true
and hypothesized population means are much closer in Figure 11.5 than in Figure 11.4.
As a result, the entire true sampling distribution in Figure 11.5 is shifted toward the
retention region for the false $H_0$, and proportionately more of this distribution is black.
Now the projected one-tailed test performs more poorly; there is a fairly high probability
of .67 that a type II error will be committed and a low probability of .33 that the
correct decision will be made. (Remember, you need not determine these normal curve
probabilities to understand the argument.)
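The normal-curve probabilities in this section and the previous one can be reproduced with a short sketch, assuming the one-tailed test with the upper tail critical, n = 36, σ = 15 and a critical z of 1.65. The function `beta_and_power` is a hypothetical helper: it converts the critical z into a critical sample mean using the hypothesized distribution, then measures how much of the true distribution falls below that cutoff.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def beta_and_power(true_mu: float, mu_h0: float = 100, sigma: float = 15,
                   n: int = 36, z_crit: float = 1.65):
    """Return (beta, 1 - beta) for a one-tailed z test, upper tail critical."""
    se = sigma / sqrt(n)
    # Decision rule from the hypothesized distribution: reject when the mean reaches this cutoff.
    critical_mean = mu_h0 + z_crit * se
    # Type II error: area of the TRUE sampling distribution below the cutoff.
    beta = STANDARD_NORMAL.cdf((critical_mean - true_mu) / se)
    return beta, 1 - beta


if __name__ == "__main__":
    for effect in (10, 3):
        beta, power = beta_and_power(100 + effect)
        print(f"{effect}-point effect: beta = {beta:.2f}, 1 - beta = {power:.2f}")
```

For the ten-point effect this gives β ≈ .01 and 1 − β ≈ .99; for the three-point effect, β ≈ .67 and 1 − β ≈ .33, matching the values quoted in the text.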
## Rejection of False $H_0$ Depends on Size of Effect:-
### If $H_0$ really is false, the probability of a type II error, β, and the probability of a correct decision, 1 − β, depend on the size of the effect, that is, the difference between the true and the hypothesized population means. The smaller the effect, the higher the probability of a type II error and the lower the probability of a correct decision.
If you think about it, this conclusion is not particularly surprising. If $H_0$ really is
false, there must be some effect. The smaller this effect is, the less likely that it will be
detected (by correctly rejecting the false $H_0$ ) and the more likely that it will be missed
(by erroneously retaining the false $H_0$ ). As will be described in the next section, if it’s
important to detect even a relatively small effect, the probability of a correct decision
can be raised to any desired value by increasing the sample size.
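To see this dependence on effect size numerically, the sketch below holds n = 36 and σ = 15 fixed, assumes an upper one-tailed z test at the .05 level, and varies only the effect (true minus hypothesized mean). The helper `power_upper_z` and the particular effect sizes are illustrative choices, not values from the text.

```python
from statistics import NormalDist

Z = NormalDist()


def power_upper_z(effect, sigma=15, n=36, alpha=0.05):
    """Power of an upper one-tailed z test when the true mean exceeds the
    hypothesized mean by `effect` points."""
    se = sigma / n ** 0.5
    z_crit = Z.inv_cdf(1 - alpha)
    # Under the true distribution, (xbar - mu0) / se is normal with mean effect / se,
    # so H0 is rejected with probability P(N(0, 1) > z_crit - effect / se).
    return 1 - Z.cdf(z_crit - effect / se)


for effect in (1, 3, 5, 7, 10):
    power = power_upper_z(effect)
    print(f"effect = {effect:>2} points: beta = {1 - power:.2f}, power = {power:.2f}")
```

As the effect grows, β falls and power rises, exactly as the boxed statement says.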

# Influence of Sample Size
Ordinarily, the investigator might not be too concerned about the low detection rate of
.33 for the relatively small three-point effect of vitamin C on IQ. Under special circumstances, however, this low detection rate might be unacceptable. For example, previous
experimentation might have established that vitamin C has many positive effects,
including the reduction in the duration and severity of common colds, and no apparent
negative side effects. Furthermore, huge quantities of vitamin C might be available at
no cost to the school district. The establishment of one more positive effect, even a
fairly mild one such as a small increase in the population mean IQ, might clinch the
case for supplying vitamin C to all students in the district. The investigator, therefore,
might wish to use a test procedure for which, if $H_0$ really is false because of a small
effect, the detection rate is appreciably higher than .33.
## To increase the probability of detecting a false $H_0$ , increase the sample size.
Assuming that vitamin C still has only a small three-point effect on IQ, we can
check the properties of the projected one-tailed test when the sample size is increased
from 36 to 100 students.
The standard error of the mean equals
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} $$
For the original experiment with its sample size of 36,
$$ \sigma_{\bar{x}} = \frac{15}{\sqrt{36}} = 2.5$$ 
whereas for the new experiment with its sample size of 100,
$$ \sigma_{\bar{x}} = \frac{15}{\sqrt{100}} = 1.5$$ 
Clearly, any increase in sample size causes a reduction in the standard error of the mean.<br>
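The formula is easy to tabulate. This short sketch, with σ = 15 as in the text, reproduces the standard errors for 36 and 100 students, plus the value for 10,000 students discussed later in this section; `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


for n in (36, 100, 10_000):
    print(f"n = {n:>6}: standard error = {standard_error(15, n):.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.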
![image.png](attachment:e7046f47-90f8-4a15-802c-7c0112abfba4.png)<br>
## Consequences of Reducing Standard Error
As can be seen by comparing Figure 11.5 and Figure 11.6, the reduction of the standard error from 2.5 to 1.5 has two important consequences:
1. It shrinks the upper retention region back toward the hypothesized population mean of 100.
2. It shrinks the entire true sampling distribution toward the true population mean of 103.<br>
The net result is that, among randomly selected sample means for 100 students,
fewer sample means (.36) produce a type II error because they originate from the black
sector, and more sample means (.64) produce a correct decision—that is, more lead to
the detection of a false $H_0$ —because they originate from the shaded sector.
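Both consequences can be checked numerically. Assuming an upper one-tailed z test at the .05 level (an assumption), σ = 15 and a true mean of 103, the sketch prints the boundary of the upper rejection region together with β and power for n = 36 and n = 100. The helper name `type_ii_and_power` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()


def type_ii_and_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Return (cutoff, beta, power) for an upper one-tailed z test of H0: mu = mu0."""
    se = sigma / n ** 0.5
    cutoff = mu0 + Z.inv_cdf(1 - alpha) * se   # boundary of the upper rejection region
    beta = Z.cdf((cutoff - mu_true) / se)      # true-distribution area in the retention region
    return cutoff, beta, 1 - beta


for n in (36, 100):
    cutoff, beta, power = type_ii_and_power(100, 103, 15, n)
    print(f"n = {n:>3}: reject H0 if mean > {cutoff:.2f}; beta = {beta:.2f}, power = {power:.2f}")
```

The cutoff moves toward 100 (the first consequence), while the narrower true distribution puts less of its area below that cutoff (the second consequence).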
An obvious implication is that the standard error can be reduced to any desired
value merely by increasing the sample size. To cite an extreme case, when the sample
size equals 10,000 students (!), the standard error drops to 0.15. In this case, the upper
retention region shrinks to the immediate vicinity of the hypothesized population mean
of 100, and the entire true sampling distribution of the mean shrinks to the immediate
vicinity of the true population mean of 103. The net result is that a type II error hardly
ever is committed, and the small three-point effect virtually always is detected.
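A quick way to watch the type II error vanish is to compute β for the same three-point effect over a range of sample sizes. The sketch assumes an upper one-tailed z test at the .05 level and σ = 15; the intermediate sample sizes and the helper `beta_upper_z` are illustrative choices.

```python
from statistics import NormalDist

Z = NormalDist()


def beta_upper_z(effect, sigma, n, alpha=0.05):
    """Probability of a type II error for an upper one-tailed z test."""
    se = sigma / n ** 0.5
    return Z.cdf(Z.inv_cdf(1 - alpha) - effect / se)


# Three-point effect on IQ; at n = 10,000 beta is effectively zero.
for n in (36, 100, 400, 1_000, 10_000):
    print(f"n = {n:>6}: beta = {beta_upper_z(3, 15, n):.3g}")
```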
## Samples Can Be Too Large:-
At this point, you might think that the sample size always should be as large as
possible in order to maximize the detection of a false $H_0$ . Not so. An excessively large
sample size produces an extra-sensitive hypothesis test that detects even a very small
effect that, from almost any perspective, lacks importance. For example, an excessively large sample size could cause $H_0$ to be rejected even though vitamin C actually
increases the population mean IQ by only 1/2 point. Because this very small effect lacks importance, most investigators would just as soon miss it;
that is, most would just as soon retain this false $H_0$. Thus, before an experiment, a wise
investigator attempts to select a sample size that, because it is not excessively large,
minimizes the detection of a small, unimportant effect.
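The sketch below shows how an extremely large sample makes even a trivial half-point effect likely to be detected. It assumes the same kind of upper one-tailed z test at the .05 level with σ = 15; the sample sizes are illustrative choices, not values from the text, and `power_upper_z` is a hypothetical helper.

```python
from statistics import NormalDist

Z = NormalDist()


def power_upper_z(effect, sigma, n, alpha=0.05):
    """Power of an upper one-tailed z test for a true effect of `effect` points."""
    se = sigma / n ** 0.5
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - effect / se)


# A trivial half-point effect on IQ (sigma = 15) at several sample sizes
for n in (36, 100, 1_000, 10_000):
    print(f"n = {n:>6}: P(reject H0 | effect = 0.5) = {power_upper_z(0.5, 15, n):.2f}")
```

With 10,000 students the unimportant half-point effect would be flagged most of the time.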
## Samples Can Be Too Small
On the other hand, the sample size can be too small. An unduly small sample size
will produce an insensitive hypothesis test (with a large standard error) that will miss
even a very large, important effect. For example, an unduly small sample size can cause
$H_0$ to be retained, even though vitamin C actually increases the population mean IQ by
15 points. Before an experiment, a wise investigator also attempts to select a sample size
that, because it is not unduly small, maximizes the detection of a large, important effect.
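The opposite problem can be sketched the same way: a large 15-point effect (one full standard deviation) with very small samples. The test is again assumed to be an upper one-tailed z test at the .05 level with σ = 15; the sample sizes and the hypothetical helper `power_upper_z` are for illustration only.

```python
from statistics import NormalDist

Z = NormalDist()


def power_upper_z(effect, sigma, n, alpha=0.05):
    """Power of an upper one-tailed z test for a true effect of `effect` points."""
    se = sigma / n ** 0.5
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - effect / se)


# A large 15-point effect with very small samples
for n in (1, 2, 4, 9):
    print(f"n = {n}: P(reject H0 | effect = 15) = {power_upper_z(15, 15, n):.2f}")
```

With only one or two students, even this large effect is missed more often than it is detected.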
## Neither Too Large Nor Too Small
For the purposes of most investigators, a sample size in the hundreds is excessively large
and one of less than about five is unduly small. There remains, of course, considerable
latitude for sample size selection between these rough extremes. Statistics supplies
investigators with charts, often referred to as <b>power curves</b>, to help select the appropriate
sample size for a particular experiment.

# Power and Sample Size
## Power ($1 - \beta$):- The probability of detecting a particular effect.
The power of a hypothesis test equals the probability (1 − β) of detecting a particular
effect when the null hypothesis ($H_0$) is false. Power is simply the complement (1 − β)
of the probability (β) of failing to detect the effect, that is, the complement of the probability of a type II error. The shaded sectors in Figures 11.4, 11.5, and 11.6 illustrate
varying degrees of power.<br>
In Figures 11.5 and 11.6, sample sizes of 36 and 100 were selected, with computational convenience in mind, to dramatize different degrees of power for a small
three-point effect of vitamin C on IQ. Preferably, the selection of sample size should
reflect—as much as circumstances permit—your considered judgment about what
constitutes (1) the smallest important effect and (2) a reasonable degree of power for
detecting that effect. For example, the following considerations might influence the
selection of a new sample size for the vitamin C study.
1. The smallest effect that merits detection, we might conclude, equals seven points. This might reflect our judgment, possibly supported by educational consultants, that only a mean IQ of at least 107 for all students in the school district justifies the effort and expense of upgrading the entire curriculum. Another possible reason for focusing on a seven-point effect—in the absence of any compelling reason to the contrary—might be that, since 7 is about one-half the size of the standard deviation of 15, it avoids extreme effect sizes by qualifying as a “medium” effect size, according to Jacob Cohen’s widely adopted guidelines described in Chapter 14.
2. A reasonable degree of power for this seven-point effect, we might conclude, equals .80. This degree of power will detect the specified effect with a tolerable rate of eighty times out of one hundred. In the absence of special concerns about the type II error, many investigators would choose .80 as a default value for power—along with .05 as the default value for the level of significance—to avoid the large sample sizes required by high degrees of power, such as .95 or .99.
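Instead of reading a power curve, the same two judgments (a seven-point effect and power of .80) can be plugged into the standard normal-approximation formula for a z test with known σ, $n = \left(\frac{(z_{1-\alpha} + z_{1-\beta})\,\sigma}{\delta}\right)^2$, rounded up. This is a swapped-in technique, not the book's power-curve method. The sketch assumes an upper one-tailed test with α = .05 and σ = 15; `sample_size_one_tailed` is a hypothetical helper.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def sample_size_one_tailed(effect, sigma, alpha=0.05, power=0.80):
    """Smallest n for which an upper one-tailed z test reaches the desired power
    (normal approximation with known sigma)."""
    z_alpha = Z.inv_cdf(1 - alpha)  # critical value of the test
    z_beta = Z.inv_cdf(power)       # z score that leaves beta = 1 - power below it
    return math.ceil(((z_alpha + z_beta) * sigma / effect) ** 2)


n = sample_size_one_tailed(effect=7, sigma=15)
# Check the power actually achieved with that n
se = 15 / math.sqrt(n)
achieved = 1 - Z.cdf(Z.inv_cdf(0.95) - 7 / se)
print(n, round(achieved, 3))  # n = 29 under these assumptions
```

For a two-tailed test, replacing `z_alpha` with `Z.inv_cdf(1 - alpha / 2)` gives a common approximation (it ignores the tiny chance of rejecting in the wrong tail), which raises the required sample size to about 37 here.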

## Power Curve:- Shows how the likelihood of detecting any possible effect varies for a fixed sample size.
### The use of power curves represents a distinct improvement over the arbitrary selection of sample size, for power curves help identify a sample size that, being neither unduly small nor excessively large, produces a hypothesis test with the proper sensitivity.
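A power curve plots power against effect size for one fixed sample size. The sketch below prints two such curves side by side as a table, assuming an upper one-tailed z test with α = .05 and σ = 15; the effect sizes and the `power_curve` helper are illustrative choices. At an effect of zero (a true $H_0$) the "power" equals α, the type I error rate.

```python
from statistics import NormalDist

Z = NormalDist()


def power_curve(effects, sigma=15, n=36, alpha=0.05):
    """Power of an upper one-tailed z test at each effect size, for a fixed n."""
    se = sigma / n ** 0.5
    z_crit = Z.inv_cdf(1 - alpha)
    return [(d, 1 - Z.cdf(z_crit - d / se)) for d in effects]


# Text version of two power curves: n = 36 versus n = 100
effects = range(0, 13, 2)
for (d, p36), (_, p100) in zip(power_curve(effects, n=36), power_curve(effects, n=100)):
    print(f"effect = {d:>2}: power (n=36) = {p36:.2f}, power (n=100) = {p100:.2f}")
```

Reading across a row shows the gain from a larger sample; reading down a column shows how power climbs with effect size for a fixed sample.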
