# Preview
## Whenever, as usually is the case, the population standard deviation is unknown, it must be estimated with the sample standard deviation. Estimating the unknown population standard deviation has important implications that require both the use of degrees of freedom and the replacement of the z test with the t test.

# Gas Mileage Investigation
Federal law might eventually specify that new automobiles must average, for example,
45 miles per gallon (mpg) of gasoline. Because it’s impossible to test all new cars,
compliance tests would be based on random samples from the entire production of each
car model. If a hypothesis test indicates substandard performance, the manufacturer
would be penalized, we’ll assume, USD 200 per car for the entire production.<br>
In these tests, the null hypothesis states that, with respect to the mandated mean
of 45 mpg, nothing special is happening in the population for some car model—that
is, there is no substandard performance and the population mean equals or exceeds
45 mpg. The alternative hypothesis reflects a concern that the population mean is less
than 45 mpg. Symbolically, the two statistical hypotheses read: $$ H_0: \mu \ge 45 $$ $$ H_1 : \mu < 45$$
From the manufacturer’s perspective, a type I error (a stiff penalty, even though the car
complies with the standard) is very serious. Accordingly, to control the type I error,
let’s use the .01 instead of the customary .05 level of significance. From the federal
regulator’s perspective, a type II error (not penalizing the manufacturer even though
the car fails to comply with the standard) also is serious. In practice, a sample size
should be selected to control the type II error, that is, to
ensure a reasonable detection rate for the smallest decline (judged to be important) of
the true population mean below the mandated 45 mpg. To simplify computations in the
present example, however, the projected one-tailed test is based on data from a very
small sample of only six randomly selected cars.<br>
For reasons that will become apparent, the z test must be replaced by a new hypothesis test, the t test.

# Sampling Distribution of t
### Like the sampling distribution of z, the sampling distribution of t represents the distribution that would be obtained if a value of t were calculated for each sample mean for all possible random samples of a given size from some population.
In the early 1900s,
William Gosset discovered the sampling distribution of t and subsequently reported his
achievement under the pen name of “Student.” Actually, Gosset discovered not just one
but an entire family of t sampling distributions (or “Student’s” distributions).
### Each t distribution is associated with a special number referred to as degrees of freedom.
The concept of degrees of freedom is introduced because we’re
using variability in a sample to estimate the unknown variability in the population.
Recall that when the n deviations about the sample mean are used to estimate variability
in the population, only n − 1 are free to vary because of the restriction that the sum of
these deviations must always equal zero. Since one degree of freedom is lost because of
the zero-sum restriction, there are only n − 1 degrees of freedom, that is, symbolically, $$ df = n -1 $$
where df represents degrees of freedom and n equals the sample size. Since the gas
mileage investigation involves six cars, the corresponding t test is based on a sampling
distribution with five degrees of freedom (from df = 6 − 1).
### Reminder:- Degrees of freedom (df) refers to the number of values free to vary when, for example, sample variability is used to estimate the unknown population variability.

## Compared to the Standard Normal Distribution
Figure 13.1 shows three t distributions. When there is an infinite (∞) number of
degrees of freedom (and, therefore, the sample standard deviation becomes the same as
the population standard deviation), the distribution of t is the same as the standard normal
distribution of z. Notice that even with only four or ten degrees of freedom, a t distribution
shares a number of properties with the normal distribution. All t distributions
are symmetrical, unimodal, and bell-shaped, with a dense concentration that peaks in
the middle (when t equals 0) and tapers off both to the right and left of the middle (as t
becomes more positive or negative, respectively).
![image.png](attachment:f8cbc5ce-0c00-4a8e-94ac-67b1c5f6a3b5.png)
### The inflated tails of the t distribution, particularly apparent with small values of df, constitute the most important difference between t and z distributions.
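The heavier tails can be checked numerically. The sketch below is a minimal illustration that uses only the standard library and the textbook density formula for Student's t. The function names `t_pdf` and `z_pdf` are chosen here for illustration and do not come from the source.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow of the gamma function for large df.
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def z_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Compare the height of each curve far out in the tail (3 units from the middle).
for df in (4, 10, 1000):
    print(f"df = {df:>4}: density at t = 3 is {t_pdf(3.0, df):.5f}")
print(f"normal   : density at z = 3 is {z_pdf(3.0):.5f}")
```

The tail density should shrink toward the normal value as df grows, which is consistent with the t distribution merging into z at infinite degrees of freedom.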
## Table for t Distributions:-
To save space, tables for t distributions concentrate only on the critical values of t that correspond to the more common levels of significance. Table B lists the critical t values for either one- or two-tailed hypothesis tests at the .05, .01, and .001 levels of significance. All listed critical t values are positive and originate from the upper half of each distribution. Because of the symmetry of the t distribution, you can obtain the corresponding critical t values for the lower half of each distribution merely by placing a negative sign in front of any entry in the table.
## Finding Critical t Values:-
1. To find a critical t in Table B, read the entry in the cell intersected by the row for the correct number of degrees of freedom and the column for the test specifications.
2. For example, to find the critical t for the gas mileage investigation, first go to the right hand panel for a one-tailed test, then locate both the row corresponding to five degrees of freedom and the column for a one-tailed test at the .01 level of significance. The intersected cell specifies 3.365. A negative sign must be placed in front of 3.365, since the hypothesis test requires the lower tail to be critical. Thus, –3.365 is the critical t for the gas mileage investigation, and the corresponding decision rule is illustrated in Figure 13.2, where the distribution of t is centered about zero (the equivalent value of t for the original null hypothesized value of 45 mpg).
3. If the gas mileage investigation had involved a two-tailed test (still at the .01 level with five degrees of freedom), then the left-hand panel for a two-tailed test would have been appropriate, and the intersected cell would have specified 4.032. Both positive and negative signs would have to be placed in front of 4.032, since both tails are critical. In this case, ±4.032 would have been the pair of critical t values.<br>
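
The table lookups described above can be approximated without a table. The sketch below is one possible approach, not the method used to build Table B: it integrates the t density with Simpson's rule and bisects for the t value whose upper-tail area matches the level of significance. The names `upper_tail_area` and `critical_t`, the step count, and the search bracket of 0 to 50 are assumptions made for this illustration (the bracket suits df of 2 or more at the levels listed in Table B).

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail_area(t, df, steps=2000):
    """Area to the right of t (t >= 0); Simpson's rule on [0, t], steps must be even."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry, half of the total area lies to the right of zero.
    return 0.5 - total * h / 3

def critical_t(alpha, df, tails=1):
    """Positive critical t for a one- or two-tailed test at level alpha."""
    target = alpha / tails          # area that must remain in the upper tail
    lo, hi = 0.0, 50.0              # assumed bracket for the search
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail_area(mid, df) > target:
            lo = mid                # too much area in the tail: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(f"one-tailed, .01, df = 5: {critical_t(0.01, 5, tails=1):.3f}")
print(f"two-tailed, .01, df = 5: {critical_t(0.01, 5, tails=2):.3f}")
```

The two printed values should land close to the tabled 3.365 and 4.032; for a lower-tail test such as the gas mileage investigation, a negative sign is attached afterward.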
## Missing df in Table B
If the desired number of degrees of freedom doesn’t appear in the df column of
Table B, use the row in the table with the next smallest number of degrees of freedom.
For example, if 36 degrees of freedom are specified, use the information from the
row for 30 degrees of freedom. Always rounding off to the next smallest df produces
a slightly larger critical t, making the null hypothesis slightly more difficult to reject.
This procedure defuses potential disputes about borderline decisions by investigators
with a stake in rejecting the null hypothesis.
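
The "next smallest df" rule amounts to a floor lookup. The sketch below assumes a small excerpt of one-tailed .05 critical values as they appear in standard t tables; it is not a copy of Table B, and the names `ONE_TAILED_05` and `lookup_critical_t` are hypothetical.

```python
from bisect import bisect_right

# Illustrative excerpt only (one-tailed test, .05 level); consult Table B for the full set.
ONE_TAILED_05 = {25: 1.708, 30: 1.697, 40: 1.684, 60: 1.671, 120: 1.658}

def lookup_critical_t(df, table):
    """Return (row_df, critical_t) using the largest tabled df that does not exceed df."""
    rows = sorted(table)
    idx = bisect_right(rows, df) - 1
    if idx < 0:
        raise ValueError(f"df = {df} is below the smallest row in this excerpt")
    row_df = rows[idx]
    return row_df, table[row_df]

row_df, crit = lookup_critical_t(36, ONE_TAILED_05)
print(f"df = 36 uses the row for df = {row_df}: critical t = {crit}")
```

Because tabled critical values decrease as df increases, falling back to the smaller df can only make the critical t slightly larger, never smaller.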

![image.png](attachment:72b48834-efdd-44e4-863b-dc6524a59ef6.png)

# t Test (Equation 13.2)
## t ratio :- A replacement for the z ratio whenever the unknown population standard deviation must be estimated.
Usually, as in the gas mileage investigation, <b>the population standard deviation is
unknown and must be estimated from the sample.</b> The subsequent shift from the standard error of the mean, $ \sigma_{\bar{X}}$ , to its estimate, $s_{\bar{X}}$ , has an important effect on the entire
hypothesis test for a population mean. The familiar z test, $$ z = \frac{sample \ mean - hypothesized \ population \ mean}{standard \ error} = \frac{\bar{X}-\mu_{hyp}}{\sigma_{\bar{X}}}$$
with its normal distribution, must be replaced by a new t test, $$ t = \frac{sample \ mean - hypothesized \ population \ mean}{estimated \ standard \ error} = \frac{\bar{X}-\mu_{hyp}}{s_{\bar{X}}}$$
with its t sampling distribution and n − 1 degrees of freedom. For the gas mileage
investigation, given that the sample mean gas mileage, $\bar{X}$, equals 43; that the hypothesized population mean, $\mu_{hyp}$ , equals 45; and that the estimated standard error, $s_{\bar{X}}$ , equals
0.89 (from Table 13.1), Formula 13.2 becomes $$ t = \frac{43 - 45}{0.89} = -2.25 $$
with df = 5. Since the observed value of t (–2.25) is less negative than the critical value
of t (–3.365), the null hypothesis is retained, and we can conclude that the auto manufacturer shouldn’t be penalized since the mean gas mileage for the population of cars could equal the mandated 45 mpg.
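
The arithmetic and the decision rule for this lower-tail test can be sketched in a few lines. The helper names `t_ratio` and `decide_lower_tail` are made up for this illustration; the numbers are those quoted in the text.

```python
def t_ratio(sample_mean, mu_hyp, est_std_error):
    """t ratio: distance of the sample mean from the hypothesized mean, in estimated standard errors."""
    return (sample_mean - mu_hyp) / est_std_error

def decide_lower_tail(t_observed, t_critical):
    """Reject H0 only if the observed t is at or below the (negative) critical t."""
    return "reject H0" if t_observed <= t_critical else "retain H0"

t_obs = t_ratio(43, 45, 0.89)
decision = decide_lower_tail(t_obs, -3.365)
print(f"t = {t_obs:.2f}, decision: {decision}")
```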
## Greater Variability of t Ratio
As has been noted, the tails of the sampling distribution for t are more inflated than
those for z, particularly when the sample size is small. Consequently, to accommodate
the greater variability of t, the critical t value must be larger in absolute value than the corresponding critical z value. For example, given the one-tailed test at the .01 level of significance for the
gas mileage investigation, the critical value for t (–3.365) lies farther from zero than that for z (–2.33).

![image.png](attachment:ebed0610-9764-4cea-8d89-61f8684fefe2.png)

# Common Theme for Hypothesis Testing
The remainder of this book discusses an alphabet variety of tests—z, t, F, U, T, and
H—for an assortment of situations. Notwithstanding the new formulas with their special symbols,
## all of these hypothesis tests represent variations on the same theme: If some observed characteristic, such as the mean for a random sample, qualifies as a rare outcome under the null hypothesis, the hypothesis will be rejected. Otherwise, the hypothesis will be retained.

# Details: Estimating The Standard Error ($s_{\bar{X}}$)
If the population standard deviation is unknown, it must be estimated from the sample.
This seemingly minor complication has important implications for hypothesis testing;
indeed, it is the reason why the z test must be replaced by the t test. Now s
replaces σ in the formula for the standard error of the mean. Instead of $$ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
we have, 
### Estimated Standard Error of the Mean -- 13.3
$$ s_{\bar{X}} = \frac{s}{\sqrt{n}}$$
where $s_{\bar{X}}$ represents the estimated standard error of the mean; n equals the sample size; and s has been defined as
$$ s = \sqrt{\frac{SS}{n-1}} = \sqrt{\frac{SS}{df}}$$
where s is the sample standard deviation; df refers to the degrees of freedom; and SS
has been defined as $$ SS = \Sigma{(X - \bar{X})^2} = \Sigma{X^2} - \frac{(\Sigma{X})^2}{n}$$
## This new version of the standard error, the estimated standard error of the mean, is used whenever the unknown population standard deviation must be estimated.
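The chain from SS to s to the estimated standard error can be traced in code. Table 13.1 is not reproduced in these notes, so the six mileage values below are an assumption: they were chosen only because they reproduce the quoted sample mean of 43 and an estimated standard error of about 0.89.

```python
import math

# Assumed data (not taken from Table 13.1): six mileages with a mean of 43.
mileages = [40, 44, 46, 41, 43, 44]
n = len(mileages)
sample_mean = sum(mileages) / n

# Sum of squares, first from the definition, then from the computation formula.
ss_definition = sum((x - sample_mean) ** 2 for x in mileages)
ss_computation = sum(x * x for x in mileages) - sum(mileages) ** 2 / n

df = n - 1
s = math.sqrt(ss_computation / df)        # sample standard deviation
est_std_error = s / math.sqrt(n)          # estimated standard error of the mean

print(f"SS = {ss_definition:.1f} (definition), {ss_computation:.1f} (computation)")
print(f"s = {s:.2f}, estimated standard error = {est_std_error:.2f}")
```

Both versions of SS should agree, and note that n − 1 appears only when SS is divided to obtain s, while the full n is used under the square root for the standard error.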

# Details: Calculating t test
The three panels in Table 13.1 show the computational steps that produce a t of –2.25 for the gas mileage investigation.
## Panel 1
This panel involves most of the computational labor, and it generates values for the sample mean, $\bar{X}$, and the sample standard deviation, s. The sample standard deviation is obtained by first calculating the sum of squares,
$$ SS = \Sigma{X^2} - \frac{(\Sigma{X})^2}{n}$$
then dividing the sum of squares, SS, by its degrees of freedom, n − 1, and finally extracting the square root.
## Reminder:- Replace n with n − 1 only when dividing SS to obtain s.
## Panel 2
Dividing the sample standard deviation, s, by the square root of the sample size, n,
gives the value for the estimated standard error, $s_{\bar{X}}$ .
## Panel 3
Finally, dividing the difference between the sample mean, $\bar{X}$, and the null hypothesized value, $\mu_{hyp}$ , by the estimated standard error, $s_{\bar{X}}$, yields the value of the t ratio.<br>
![image.png](attachment:dc9b551e-56cd-45ad-82b5-b9eebb8c77bb.png)<br>
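
As a cross-check on the three panels, the sketch below leans on Python's `statistics` module, whose `stdev` function already divides by n − 1. The function name `one_sample_t` and the data are assumptions for illustration; the six values are not copied from Table 13.1 but reproduce its quoted mean of 43.

```python
import math
import statistics

def one_sample_t(data, mu_hyp):
    """Return (t, df) for a one-sample t test of the mean of data against mu_hyp."""
    n = len(data)
    sample_mean = statistics.mean(data)           # Panel 1: sample mean
    s = statistics.stdev(data)                    # Panel 1: sqrt(SS / (n - 1))
    est_std_error = s / math.sqrt(n)              # Panel 2: estimated standard error
    t = (sample_mean - mu_hyp) / est_std_error    # Panel 3: t ratio
    return t, n - 1

assumed_mileages = [40, 44, 46, 41, 43, 44]       # assumed data, mean = 43
t_value, df = one_sample_t(assumed_mileages, 45)
print(f"t = {t_value:.3f} with df = {df}")
```

With these assumed values and unrounded intermediate results, t comes out near –2.24; the –2.25 in the text follows from rounding the estimated standard error to 0.89 before dividing.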

# Confidence Intervals For $\mu$ Based On t
Under slightly different circumstances, you might wish to estimate the unknown mean
gas mileage for the population of cars, rather than test a hypothesis based on 45 mpg.
For example, there might be no legally required minimum of 45 mpg, but merely a
desire on the part of the manufacturer to estimate the mean gas mileage for a population of cars—possibly as a first step toward the design of a new, improved version of
the current model.
When the population standard deviation is unknown and, therefore, must be estimated, as in the present case, t replaces z in the new formula for a confidence interval:
### Equation 13.4
## $$ \bar{X} \pm( t_{conf} )( s_{\bar{X}})$$
## Finding $t_{conf}$
Read the entry from the cell intersected by the row for the correct number of degrees of
freedom and the column for the confidence specifications. In the present case, if a 95
percent confidence interval is desired, first locate the row corresponding to 5 degrees of
freedom (from df = n − 1 = 6 − 1 = 5), and then locate the column for the 95 percent
level of confidence, that is, the column heading identified with a single asterisk. (A
double asterisk identifies the column for the 99 percent level of confidence.) The intersected cell specifies that a value of 2.571 should be entered in Formula 13.4.<br>
Given this value for $t_{conf}$ , as well as the value of 43 for $\bar{X}$ (from Table 13.1), the
sample mean gas mileage, and 0.89 for $s_{\bar{X}}$ , the estimated standard error, Formula 13.4
becomes
$$ 43 \pm (2.571)(0.89) = 43 \pm 2.29 = \begin{cases} 45.29 \\ 40.71 \end{cases}$$
It can be claimed, with 95 percent confidence, that the interval between 40.71 and 45.29 includes the true mean gas mileage for all of the cars in the population.
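
The interval arithmetic can be sketched as follows. The function name `t_confidence_interval` is hypothetical; the inputs are the values quoted in the text.

```python
def t_confidence_interval(sample_mean, t_conf, est_std_error):
    """Return (lower, upper) limits of the interval: sample mean ± t_conf × estimated standard error."""
    margin = t_conf * est_std_error
    return sample_mean - margin, sample_mean + margin

lower, upper = t_confidence_interval(43, 2.571, 0.89)
print(f"95 percent confidence interval: {lower:.2f} to {upper:.2f}")
```

Substituting the double-asterisk table entry for the 99 percent level would widen the interval, since a larger value of t is multiplied by the same estimated standard error.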

# Assumptions
1. Whether testing hypotheses or constructing confidence intervals for population means, use t rather than z whenever, as almost always is the case, the population standard deviation is unknown.
2. Strictly speaking, when using t, you must assume that the underlying population is normally distributed. Even if this normality assumption is violated, t retains much of its accuracy as long as the sample size isn’t too small.
3. If a very small sample (less than about 10) is being used and you believe that the sample originates from a non-normal population—possibly because of a pronounced positive or negative skew among the observations in the sample—it would be wise to increase the size of
the sample before testing a hypothesis or constructing a confidence interval.