<h1 style="color:orange">EXERCISE CLASS 2 (Part 1/3)</h1>

# Review of basic statistical concepts - Hypothesis testing

### Chapter 3-4, D.C. Montgomery: "Statistical Quality Control - an introduction", 7th Ed., Wiley




# Statistical Inference

![Slide2.PNG](attachment:Slide2.PNG)

![Slide3.PNG](attachment:Slide3.PNG)

# EXERCISE T1

Given a sample of $n$ independent and identically distributed observations, demonstrate that the sample mean $\bar{X}$ and the sample variance $S^2$ are unbiased estimators of the population mean $\mu$ and the population variance $\sigma^2$, respectively.

Exercise T1 (solution) (1/2)

![Slide5.PNG](attachment:Slide5.PNG)

Exercise T1 (solution) (2/2)

![Slide6.PNG](attachment:Slide6.PNG)
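
The analytical result can also be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original solution: the population parameters (`mu_true`, `sigma_true`), the sample size and the number of replications are arbitrary values chosen only for illustration. It averages $\bar{X}$ and $S^2$ over many simulated samples and compares them with the true mean and variance, and it also shows that dividing by $n$ instead of $n-1$ gives a biased variance estimate.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed (arbitrary) for reproducibility

mu_true = 10.0       # hypothetical population mean
sigma_true = 2.0     # hypothetical population standard deviation
n_obs = 5            # size of each simulated sample
n_rep = 200_000      # number of simulated samples

# Each row is one i.i.d. sample of n_obs observations
samples = rng.normal(mu_true, sigma_true, size=(n_rep, n_obs))

xbar = samples.mean(axis=1)                  # sample means
s2_unbiased = samples.var(axis=1, ddof=1)    # sample variances, divisor n - 1
s2_biased = samples.var(axis=1, ddof=0)      # divisor n, for comparison

print('Average of the sample means:           %.3f (true mean: %.3f)' % (xbar.mean(), mu_true))
print('Average of S^2 (divisor n - 1):        %.3f (true variance: %.3f)' % (s2_unbiased.mean(), sigma_true**2))
print('Average of the variance with divisor n: %.3f' % s2_biased.mean())
```

With divisor $n$ the average settles near $\frac{n-1}{n}\sigma^2$ rather than $\sigma^2$, which is why $S^2$ uses $n-1$.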

# EXERCISE T2

A synthetic fiber used in the manufacturing industry has an ultimate tensile strength that is normally distributed with mean 75.5 psi and standard deviation 3.5 psi.

a) Compute the probability that a random sample of 6 observations has a sample mean larger than 75.75 psi.

b) How does the standard deviation of the sample mean change when the sample size increases from 6 to 49 observations?

In [1]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

In [2]:
# Input data
mu = 75.5       # Mean
sigma = 3.5     # Standard deviation

## Point a

Compute the probability that a random sample of 6 observations has a sample mean larger than 75.75 psi.

$$\mu = 75.5$$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3.5}{\sqrt{6}}$$
$$P(\bar{X} \geq \mu_0) = P\left(\frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \geq \frac{\mu_0 - \mu}{\sigma_{\bar{X}}}\right) = P\left(Z \geq \frac{75.75 - 75.5}{1.429}\right) = 1 - P(Z \leq 0.175)$$

In [3]:
n = 6          # Sample size
mu0 = 75.75    # Threshold for the sample mean (psi)

# Under the assumption of normality, the probability of observing a sample mean larger than mu0 is: 
Z_0 = (mu0 - mu)/(sigma/np.sqrt(n))
prob = 1 - stats.norm.cdf(Z_0)
print('The probability of observing a sample mean larger than mu0 is: %.3f' % prob)

The probability of observing a sample mean larger than mu0 is: 0.431
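
As a cross-check, the same probability can be obtained without standardizing by hand. This is a sketch of an alternative, not the method used above: it builds the distribution of $\bar{X}$ as a frozen `scipy.stats.norm` object and calls its survival function `sf`, which returns $1 - \text{cdf}$ directly. The names `xbar_dist` and `prob_sf` are introduced here only for illustration.

```python
import numpy as np
from scipy import stats

mu = 75.5       # Population mean (psi)
sigma = 3.5     # Population standard deviation (psi)
n = 6           # Sample size
mu0 = 75.75     # Threshold for the sample mean (psi)

# Distribution of the sample mean: normal with mean mu and std sigma/sqrt(n)
xbar_dist = stats.norm(loc=mu, scale=sigma / np.sqrt(n))

# Survival function: P(Xbar > mu0) = 1 - cdf(mu0)
prob_sf = xbar_dist.sf(mu0)
print('P(sample mean > %.2f) = %.3f' % (mu0, prob_sf))
```

Both approaches should agree to numerical precision; `sf` is generally preferable to `1 - cdf` when the tail probability is very small.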


## Point b

How does the standard deviation of the sample mean change when the sample size increases from 6 to 49 observations?

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
$$\sigma_{\bar{X}}(n=6) = \frac{3.5}{\sqrt{6}} = 1.429$$
$$\sigma_{\bar{X}}(n=49) = \frac{3.5}{\sqrt{49}} = 0.5$$

In [4]:
n_new = 49      # New sample size

sigma_n = sigma/np.sqrt(n)              # Standard deviation of the mean with n = 6 samples
sigma_n_new = sigma/np.sqrt(n_new)      # Standard deviation of the mean with n = 49 samples

print('The standard deviation of the mean with n = 6 samples is: %.3f psi' % sigma_n)
print('The standard deviation of the mean with n = 49 samples is: %.3f psi' % sigma_n_new)

print('The difference between the two standard deviations is: %.3f psi' % (sigma_n_new - sigma_n))

The standard deviation of the mean with n = 6 samples is: 1.429 psi
The standard deviation of the mean with n = 49 samples is: 0.500 psi
The difference between the two standard deviations is: -0.929 psi
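
Besides the absolute difference, it can help to look at the ratio between the two standard deviations, because $\sigma_{\bar{X}}$ scales as $1/\sqrt{n}$ and the ratio therefore does not depend on $\sigma$. The short sketch below is an addition to the exercise; the variable names `n_old` and `ratio` are chosen here for illustration.

```python
import numpy as np

sigma = 3.5     # Population standard deviation (psi)
n_old = 6       # Original sample size
n_new = 49      # New sample size

# Ratio of the standard deviations of the sample mean
ratio = (sigma / np.sqrt(n_old)) / (sigma / np.sqrt(n_new))

# The same ratio written directly in terms of the sample sizes
ratio_from_n = np.sqrt(n_new / n_old)

print('Ratio of standard deviations (n = 6 over n = 49): %.3f' % ratio)
print('sqrt(n_new / n_old):                              %.3f' % ratio_from_n)
```

Increasing the sample size from 6 to 49 reduces the standard deviation of the sample mean by a factor of about 2.86.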


# EXERCISE T3

A random sample of size 16 is drawn from a normal population with mean 75 and standard deviation 8. A second sample of size 9 is drawn from a normal population with mean 70 and standard deviation 12.

a) Compute the probability that the sample mean difference between the first and the second sample is greater than 4 (assume that the two populations are independent).

b) Compute the probability that the sample mean difference between the first and the second sample ranges between 3.5 and 5.5 (same assumption).

In [4]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

In [5]:
# Input data
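n1 = 16          # Size of the first sample
mu1 = 75         # Mean of the first population
sigma1 = 8       # Standard deviation of the first population

n2 = 9           # Size of the second sample
mu2 = 70         # Mean of the second population
sigma2 = 12      # Standard deviation of the second population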

## Point a

Compute the probability that the sample mean difference between the first and the second sample is greater than 4 (assume that the two populations are independent).

![Slide11.PNG](attachment:Slide11.PNG)
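
For readers without access to the slide image, the calculation can be written out as follows. This is a reconstruction from the problem data, assuming independent normal populations with known variances:

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) = N\left(5,\ \frac{64}{16} + \frac{144}{9}\right) = N(5,\ 20)$$
$$Pr(\bar{X}_1 - \bar{X}_2 > 4) = Pr\left(Z > \frac{4 - 5}{\sqrt{20}}\right) = 1 - Pr(Z \leq -0.224) \approx 0.5885$$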

In [6]:
# Answer to point a
# Compute the mean and the standard deviation of the difference between the two sample means
mu_diff = mu1 - mu2 
sigma_diff = np.sqrt(sigma1**2/n1 + sigma2**2/n2)   # the operator ** stands for ^ (i.e., power of)

mu0 = 4       # Threshold for the difference between the sample means

# P(Xbar1 - Xbar2 > mu0) = P(Z > (mu0 - mu_diff)/sigma_diff)
prob = 1 - stats.norm.cdf((mu0 - mu_diff)/sigma_diff)

print('Probability of the difference between the means being greater than %.1f is %.4f' % (mu0, prob))


Probability of the difference between the means being greater than 4.0 is 0.5885
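
The analytical value can be compared with a simulation. The following Monte Carlo sketch is an addition to the exercise: it draws many pairs of samples from the two populations, computes the difference between their sample means, and estimates the probability as the fraction of differences above 4. The seed and the number of replications `n_rep` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # fixed seed (arbitrary) for reproducibility

n1, mu1, sigma1 = 16, 75, 8     # First sample size and population parameters
n2, mu2, sigma2 = 9, 70, 12     # Second sample size and population parameters
threshold = 4                   # Threshold for the difference between the sample means
n_rep = 200_000                 # Number of simulated pairs of samples

# Sample means of the simulated samples (one value per replication)
xbar1 = rng.normal(mu1, sigma1, size=(n_rep, n1)).mean(axis=1)
xbar2 = rng.normal(mu2, sigma2, size=(n_rep, n2)).mean(axis=1)

# Fraction of replications in which the difference exceeds the threshold
prob_mc = np.mean(xbar1 - xbar2 > threshold)
print('Monte Carlo estimate of P(Xbar1 - Xbar2 > %.1f): %.3f' % (threshold, prob_mc))
```

The estimate should fall close to the analytical value of 0.5885, within the sampling error of the simulation.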


## Point b

Compute the probability that the sample mean difference between the first and the second sample ranges between 3.5 and 5.5 (same assumption).


> ### Solution
> 
> We can use the following formula to compute the probability:
> $$Pr(3.5 \leq \bar{X}_1 - \bar{X}_2 \leq 5.5) = Pr\left(\frac{3.5 - 5}{\sqrt{20}} \leq Z \leq \frac{5.5 - 5}{\sqrt{20}}\right) = Pr\left(Z \leq \frac{5.5 - 5}{\sqrt{20}}\right) - Pr\left(Z \leq \frac{3.5 - 5}{\sqrt{20}}\right)$$

In [8]:
# Answer to point b
lower_bound = 3.5      # Lower bound of the interval
upper_bound = 5.5      # Upper bound of the interval

# P(lower_bound < Xbar1 - Xbar2 < upper_bound) = P(Xbar1 - Xbar2 < upper_bound) - P(Xbar1 - Xbar2 < lower_bound)
prob = stats.norm.cdf((upper_bound - mu_diff)/sigma_diff) - stats.norm.cdf((lower_bound - mu_diff)/sigma_diff)

print('Probability of the difference between the means being between %.1f and %.1f is %.4f' % (lower_bound, upper_bound, prob))

Probability of the difference between the means being between 3.5 and 5.5 is 0.1759
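
Both points of this exercise use the same distribution, so the calculation can be wrapped in a small reusable helper. The sketch below is one possible way to organize it, not part of the original solution: `diff_of_means_dist` is a hypothetical function name, and it assumes independent normal populations with known standard deviations.

```python
import numpy as np
from scipy import stats

def diff_of_means_dist(mu_a, sigma_a, n_a, mu_b, sigma_b, n_b):
    """Return the distribution of Xbar_a - Xbar_b as a frozen scipy normal.

    Assumes independent normal populations with known standard deviations.
    """
    mean = mu_a - mu_b
    std = np.sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)
    return stats.norm(loc=mean, scale=std)

# Data of Exercise T3
dist = diff_of_means_dist(75, 8, 16, 70, 12, 9)

p_above_4 = dist.sf(4)                      # Point a: P(difference > 4)
p_between = dist.cdf(5.5) - dist.cdf(3.5)   # Point b: P(3.5 < difference < 5.5)

print('P(difference > 4)         = %.4f' % p_above_4)
print('P(3.5 < difference < 5.5) = %.4f' % p_between)
```

Passing `loc` and `scale` to `stats.norm` avoids standardizing by hand, which makes it harder to mix up the mean and the standard deviation of the difference.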
