# T-Test Problem Statements

## Testing the mean of a single population

### Problem Statement
It is claimed that sports-car owners drive on average 18580 km per year. A consumer firm believes that the average mileage is probably higher. To check, the consumer firm obtained information from 10 randomly selected sports-car owners, which resulted in a sample mean of 17352 km with a sample standard deviation of 2012 km. What can be concluded about this claim at the:
(a) 5% level of significance?
(b) 1% level of significance?

### Solution
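To test the claim about the average mileage of sports-car owners, we conduct a one-sample t-test. The population standard deviation is unknown and the sample is small ($ n = 10 $), so the sample standard deviation is used in place of $ \sigma $.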

Given:
- Claimed population mean ($ \mu $) = 18580 km
- Sample mean ($ \bar{X} $) = 17352 km
- Sample standard deviation ($ s $) = 2012 km
- Sample size ($ n $) = 10

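The consumer firm's belief is the alternative hypothesis, so the test is right-tailed:

- $ H_0: \mu = 18580 $ (the claim)
- $ H_1: \mu > 18580 $ (the average mileage is higher)

The test statistic is: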

$ t = \frac{{\bar{X} - \mu}}{{\frac{s}{\sqrt{n}}}} $
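The formula translates directly into code. Below is a minimal standard-library Python sketch; the function name `one_sample_t` is illustrative, not taken from any library.

```python
import math


def one_sample_t(xbar: float, mu0: float, s: float, n: int) -> float:
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    standard_error = s / math.sqrt(n)
    return (xbar - mu0) / standard_error


t = one_sample_t(xbar=17352, mu0=18580, s=2012, n=10)
print(round(t, 2))  # → -1.93
```

The standard error here is $ 2012 / \sqrt{10} \approx 636.25 $ km.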

### (a) At 5% level of significance:
For a right-tailed test (the alternative is that the average mileage is higher), we find the critical value from the t-distribution table with 9 degrees of freedom ($ n - 1 = 10 - 1 = 9 $) and a 5% significance level.

$ \text{Critical value} = t_{\alpha, df} $

Let's calculate the t-value:

$ t = \frac{{17352 - 18580}}{{\frac{2012}{\sqrt{10}}}} $

$ t = \frac{{-1228}}{{636.25}} $

$ t \approx -1.93 $

Since this is a right-tailed test (we are testing whether the mean is higher), we compare the calculated t-value with the critical value:

- If $ t > t_{\alpha, df} $, we reject the null hypothesis.
- If $ t \leq t_{\alpha, df} $, we fail to reject the null hypothesis.

From the t-distribution table, $ t_{0.05, 9} \approx 1.833 $.

Since $ -1.93 < 1.833 $, we fail to reject the null hypothesis.

### (b) At 1% level of significance:
Again, this is a right-tailed test, so we find the critical value from the t-distribution table with 9 degrees of freedom and a 1% significance level.

$ \text{Critical value} = t_{\alpha, df} $

Calculating the t-value (it does not depend on the significance level):

$ t = \frac{{17352 - 18580}}{{\frac{2012}{\sqrt{10}}}} $

$ t \approx -1.93 $

From the t-distribution table, $ t_{0.01, 9} \approx 2.821 $.

Since $ -1.93 < 2.821 $, we fail to reject the null hypothesis.

### Conclusion:
In both cases (at the 5% and 1% significance levels), we fail to reject the null hypothesis. The sample mean of 17352 km is actually below the claimed 18580 km, so the data provide no evidence that the average mileage of sports-car owners is higher than 18580 km per year; the consumer firm's belief is not supported.
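The critical values 1.833 and 2.821 come from a printed t-table. As a cross-check, the sketch below computes them numerically using only Python's standard library: it integrates the Student-t density with Simpson's rule and inverts the upper-tail probability by bisection. The helper names `t_pdf`, `t_sf` and `t_critical` are illustrative, not a library API; in practice `scipy.stats.t` (its `sf` and `ppf` methods) provides the same quantities.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_sf(t: float, df: int, steps: int = 2000) -> float:
    """Upper-tail probability P(T > t), using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|); the density is symmetric about 0
    return 0.5 - area if t >= 0 else 0.5 + area


def t_critical(alpha: float, df: int) -> float:
    """Right-tail critical value t_{alpha, df}: solves P(T > t) = alpha by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_sf(mid, df) > alpha:
            lo = mid  # tail area still larger than alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2


# Right-tailed test of H0: mu = 18580 against H1: mu > 18580
xbar, mu0, s, n = 17352, 18580, 2012, 10
df = n - 1
t_stat = (xbar - mu0) / (s / math.sqrt(n))
for alpha in (0.05, 0.01):
    crit = t_critical(alpha, df)
    print(f"alpha={alpha}: t={t_stat:.3f}, critical={crit:.3f}, reject H0: {t_stat > crit}")
p_value = t_sf(t_stat, df)  # P(T > t) for the right-tailed test
print(f"one-sided p-value = {p_value:.3f}")
```

Both computed critical values agree with the table to three decimals, and both decisions are "fail to reject". Because the sample mean lies below the claimed value, the one-sided p-value falls between 0.95 and 0.975.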

## Testing the difference between means of two populations (µ1 – µ2)

### Problem Statement

The manager of a courier service believes that packets delivered at the beginning of the month are heavier than those delivered at the end of the month. As an experiment, he weighed a random sample of 15 packets at the beginning of the month and found that the mean weight was 5.25 kg. A random sample of 10 packets at the end of the month had a mean weight of 4.56 kg. It was observed from past experience that the sample variances are 1.20 kg² and 1.15 kg², respectively. At the 5% level of significance, can it be concluded that the packets delivered at the beginning of the month weigh more? Also find the p-value and the 95% confidence interval for the difference between the means.

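### Solution

To test whether the packets delivered at the beginning of the month weigh more than those delivered at the end of the month, we use a two-sample t-test for independent samples with a pooled variance, which assumes that the two populations have equal variances (the sample variances 1.20 and 1.15 are close). Let $ \mu_1 $ and $ \mu_2 $ be the mean weights at the beginning and at the end of the month. The manager's belief is the alternative hypothesis, so the test is right-tailed:

- $ H_0: \mu_1 = \mu_2 $
- $ H_1: \mu_1 > \mu_2 $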

Given:
- Sample size at the beginning of the month ($ n_1 $) = 15
- Sample size at the end of the month ($ n_2 $) = 10
- Sample mean weight at the beginning of the month ($ \bar{X}_1 $) = 5.25 kg
- Sample mean weight at the end of the month ($ \bar{X}_2 $) = 4.56 kg
- Sample variances: $ s_1^2 = 1.20 $ kg² and $ s_2^2 = 1.15 $ kg²
- Level of significance ($ \alpha $) = 0.05

First, let's calculate the pooled standard deviation ($ s_p $):

$ s_p = \sqrt{\frac{{(n_1 - 1) \cdot s_1^2 + (n_2 - 1) \cdot s_2^2}}{{n_1 + n_2 - 2}}} $

$ s_p = \sqrt{\frac{{(15 - 1) \cdot 1.20 + (10 - 1) \cdot 1.15}}{{15 + 10 - 2}}} $

$ s_p = \sqrt{\frac{{14 \cdot 1.20 + 9 \cdot 1.15}}{{23}}} $

$ s_p = \sqrt{\frac{{16.8 + 10.35}}{{23}}} $

$ s_p = \sqrt{\frac{{27.15}}{{23}}} $

$ s_p \approx \sqrt{1.1804} $

$ s_p \approx 1.086 $

Now, let's calculate the t-value:

$ t = \frac{{\bar{X}_1 - \bar{X}_2}}{{s_p \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}} $

$ t = \frac{{5.25 - 4.56}}{{1.086 \cdot \sqrt{\frac{1}{15} + \frac{1}{10}}}} $

$ t = \frac{{0.69}}{{1.086 \cdot \sqrt{0.06667 + 0.1}}} $

$ t \approx \frac{{0.69}}{{1.086 \cdot \sqrt{0.16667}}} $

$ t \approx \frac{{0.69}}{{1.086 \cdot 0.408}} $

$ t \approx \frac{{0.69}}{{0.443}} $

$ t \approx 1.556 $
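The pooled standard deviation and t-value can be reproduced with a few lines of Python. This is a minimal standard-library sketch that, like the pooled test itself, assumes equal population variances; the function name `pooled_two_sample_t` is hypothetical.

```python
import math


def pooled_two_sample_t(x1, x2, var1, var2, n1, n2):
    """Pooled-variance two-sample t statistic (assumes equal population variances).

    Returns (s_p, standard_error, t); t has n1 + n2 - 2 degrees of freedom.
    """
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    s_p = math.sqrt(pooled_var)
    se = s_p * math.sqrt(1 / n1 + 1 / n2)
    return s_p, se, (x1 - x2) / se


s_p, se, t = pooled_two_sample_t(5.25, 4.56, 1.20, 1.15, 15, 10)
print(f"s_p = {s_p:.3f}, t = {t:.3f}")
```

Carrying full precision throughout, it gives $ s_p \approx 1.086 $ and $ t \approx 1.556 $.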

Next, we find the critical value from the t-distribution table with $ n_1 + n_2 - 2 = 15 + 10 - 2 = 23 $ degrees of freedom at a 5% level of significance.

Because the alternative hypothesis ($ \mu_1 > \mu_2 $) is one-sided, this is a right-tailed test, so the whole $ \alpha = 0.05 $ lies in the upper tail.

From the t-distribution table, $ t_{\alpha, df} = t_{0.05, 23} \approx 1.714 $.

Since $ 1.556 < 1.714 $, we fail to reject the null hypothesis.

### Conclusion:
At a 5% level of significance, we do not have enough evidence to conclude that the packets delivered at the beginning of the month weigh more than those delivered at the end of the month.

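The p-value for this right-tailed test is $ P(T_{23} > 1.556) $. From the t-table, $ t_{0.10, 23} \approx 1.319 $ and $ t_{0.05, 23} \approx 1.714 $; since $ 1.319 < 1.556 < 1.714 $, the p-value lies between 0.05 and 0.10. It exceeds $ \alpha = 0.05 $, which agrees with failing to reject $ H_0 $. Statistical software gives the exact value.

The 95% confidence interval for $ \mu_1 - \mu_2 $ uses the two-tailed critical value $ t_{0.025, 23} \approx 2.069 $:

$ (\bar{X}_1 - \bar{X}_2) \pm t_{0.025, 23} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 0.69 \pm 2.069 \cdot 0.443 = 0.69 \pm 0.917 $

So the interval is approximately $ (-0.23, 1.61) $ kg. It contains 0, so the data are compatible with no difference in mean weight.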
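The p-value and interval can also be computed without a printed table. The standard-library sketch below integrates the Student-t density numerically with Simpson's rule and finds critical values by bisection; `t_pdf`, `t_sf` and `t_critical` are illustrative helper names, not a library API (`scipy.stats.t` offers equivalent `sf` and `ppf` methods).

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_sf(t: float, df: int, steps: int = 2000) -> float:
    """Upper-tail probability P(T > t), using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def t_critical(alpha: float, df: int) -> float:
    """Solve P(T > t) = alpha for t by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n1, n2 = 15, 10
diff = 5.25 - 4.56                                # xbar1 - xbar2
pooled_var = ((n1 - 1) * 1.20 + (n2 - 1) * 1.15) / (n1 + n2 - 2)
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))    # s_p * sqrt(1/n1 + 1/n2)
df = n1 + n2 - 2
t_stat = diff / se

one_tailed_crit = t_critical(0.05, df)  # right-tailed test at alpha = 0.05
p_value = t_sf(t_stat, df)              # P(T > t), since H1 is mu1 > mu2
margin = t_critical(0.025, df) * se     # half-width of the two-sided 95% interval
ci = (diff - margin, diff + margin)
print(f"t = {t_stat:.3f}, critical = {one_tailed_crit:.3f}, p = {p_value:.4f}")
print(f"95% CI for mu1 - mu2: ({ci[0]:.2f}, {ci[1]:.2f}) kg")
```

The computed p-value falls inside the 0.05 to 0.10 bracket read from the table, and the interval rounds to (-0.23, 1.61) kg.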

## Testing the difference between means of paired observations

### Problem Statement

The Human Resources Development (HRD) manager wishes to investigate whether there has been any change in the aptitude of trainees after undergoing a specific training program. To assess this, trainees take an aptitude test both before and after the training program. The HRD manager wants to determine if there is a significant difference in the test scores before and after the training program.

The data provided includes the test scores of nine trainees before and after the training program:

| Subject | Before (x) | After (y) |
|---------|------------|-----------|
| 1       | 75         | 70        |
| 2       | 70         | 77        |
| 3       | 46         | 57        |
| 4       | 68         | 60        |
| 5       | 68         | 79        |
| 6       | 43         | 64        |
| 7       | 55         | 55        |
| 8       | 68         | 77        |
| 9       | 77         | 76        |

The HRD manager wants to perform a hypothesis test at a 5% level of significance to determine if there is a significant difference in the test scores after the training program compared to before.

### Solution 

To determine whether there has been any significant change in the aptitude of trainees after the training program, we use a paired-samples t-test on the differences $ d = y - x $. Because the manager is asking about any change rather than a change in a particular direction, the test is two-tailed:

- $ H_0: \mu_d = 0 $ (no change in the mean score)
- $ H_1: \mu_d \neq 0 $

Let's start by calculating the differences between the scores before and after the training program for each subject:

| Subject | Before (x) | After (y) | Difference (y - x) |
|---------|------------|-----------|--------------------|
| 1       | 75         | 70        | -5                 |
| 2       | 70         | 77        | 7                  |
| 3       | 46         | 57        | 11                 |
| 4       | 68         | 60        | -8                 |
| 5       | 68         | 79        | 11                 |
| 6       | 43         | 64        | 21                 |
| 7       | 55         | 55        | 0                  |
| 8       | 68         | 77        | 9                  |
| 9       | 77         | 76        | -1                 |

First, let's compute $ \bar{d} $, the mean difference:

$ \bar{d} = \frac{{\sum (y - x)}}{n} $

$ \bar{d} = \frac{{(-5 + 7 + 11 - 8 + 11 + 21 + 0 + 9 - 1)}}{9} $

$ \bar{d} = \frac{{45}}{9} $

$ \bar{d} = 5 $
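Recomputing the differences and their mean in code is a quick way to catch arithmetic slips in the sum. A minimal standard-library sketch, with the scores copied from the table:

```python
before = [75, 70, 46, 68, 68, 43, 55, 68, 77]
after = [70, 77, 57, 60, 79, 64, 55, 77, 76]

d = [y - x for x, y in zip(before, after)]  # difference = after - before
print(d)       # → [-5, 7, 11, -8, 11, 21, 0, 9, -1]
print(sum(d))  # → 45

d_bar = sum(d) / len(d)
print(d_bar)   # → 5.0
```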

Next, let's calculate $ s_d $, the standard deviation of the differences:

$ s_d = \sqrt{\frac{{\sum (y - x - \bar{d})^2}}{n - 1}} $

$ s_d = \sqrt{\frac{{(-5 - 5)^2 + (7 - 5)^2 + (11 - 5)^2 + (-8 - 5)^2 + (11 - 5)^2 + (21 - 5)^2 + (0 - 5)^2 + (9 - 5)^2 + (-1 - 5)^2}}{9 - 1}} $

$ s_d = \sqrt{\frac{{100 + 4 + 36 + 169 + 36 + 256 + 25 + 16 + 36}}{8}} $

$ s_d = \sqrt{\frac{{678}}{8}} $

$ s_d = \sqrt{84.75} $

$ s_d \approx 9.21 $
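The same standard deviation can be checked in a few lines. This sketch computes the sum of squared deviations explicitly and compares it with `statistics.stdev`, which also uses $ n - 1 $ in the denominator:

```python
import math
from statistics import stdev

before = [75, 70, 46, 68, 68, 43, 55, 68, 77]
after = [70, 77, 57, 60, 79, 64, 55, 77, 76]
d = [y - x for x, y in zip(before, after)]  # difference = after - before

d_bar = sum(d) / len(d)
ss = sum((di - d_bar) ** 2 for di in d)     # sum of squared deviations from the mean
s_d = math.sqrt(ss / (len(d) - 1))          # sample SD with n - 1 in the denominator
print(ss)                                   # → 678.0
print(round(s_d, 2))                        # → 9.21
print(math.isclose(s_d, stdev(d)))          # → True
```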

Now, let's calculate the t-value using the formula:

$ t = \frac{{\bar{d}}}{{\frac{s_d}{\sqrt{n}}}} $

$ t = \frac{{5}}{{\frac{9.21}{\sqrt{9}}}} $

$ t = \frac{{5}}{{\frac{9.21}{3}}} $

$ t = \frac{{5}}{{3.07}} $

$ t \approx 1.63 $

Now, we find the critical value from the t-distribution table with 8 degrees of freedom (since $ n - 1 = 9 - 1 = 8 $) at a 5% level of significance. Because the test is two-tailed, $ \alpha / 2 = 0.025 $ goes in each tail.

Looking up $ t_{0.025, 8} $ in the table, the critical value is approximately 2.306.

![Table_t-distribution](images/Table_t-distribution.png)

Since the absolute value of the calculated t-value (1.63) is less than the critical value (2.306), we fail to reject the null hypothesis.

### Conclusion:

At the 5% level of significance, there is insufficient evidence to conclude that the training program changed the mean aptitude test score.
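The whole paired test fits in a short standard-library sketch. It uses the table value $ t_{0.025, 8} \approx 2.306 $ rather than computing the critical value, and applies the two-tailed rule of rejecting $ H_0 $ only when $ |t| $ exceeds it:

```python
import math
from statistics import mean, stdev

before = [75, 70, 46, 68, 68, 43, 55, 68, 77]
after = [70, 77, 57, 60, 79, 64, 55, 77, 76]
d = [y - x for x, y in zip(before, after)]  # difference = after - before
n = len(d)

t_stat = mean(d) / (stdev(d) / math.sqrt(n))  # paired t = d_bar / (s_d / sqrt(n))
t_crit = 2.306                                # t_{0.025, 8} from the t-table
reject = abs(t_stat) > t_crit                 # two-tailed decision rule
print(f"t = {t_stat:.2f}, reject H0: {reject}")  # → t = 1.63, reject H0: False
```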