Among 400 ingots, 68 have defects

$n = 400$

$k = 68$

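We can assume that the sampling distribution of $\hat{p}$ is approximately normal, provided the usual conditions hold (check them!): the ingots are sampled independently, and, by a common rule of thumb, there are at least 10 defective and 10 non-defective ingots (here 68 and 332).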
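
As a minimal sketch of the condition check, the helper below tests the success-failure rule. The function name `normal_approx_ok` and the threshold of 10 are assumptions for illustration (some texts use a different cutoff); independence of the sample still has to be judged from how the data were collected.

```python
def normal_approx_ok(n, k, threshold=10):
    """Success-failure check: both counts must reach the (assumed) threshold."""
    successes = k        # defective ingots
    failures = n - k     # non-defective ingots
    return successes >= threshold and failures >= threshold

# 68 defective and 332 non-defective ingots both exceed 10
print(normal_approx_ok(400, 68))
```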

In [3]:
from math import sqrt
from scipy.stats import norm

In [12]:
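def ci1():
    n = 400
    k = 68
    p_hat = k / n
    # Standard error estimated from the sample proportion
    SE = sqrt(p_hat * (1 - p_hat) / n)
    # Critical value for a 95% two-sided interval
    z = norm.ppf(0.975)
    ME = z * SE
    print(f"95% CI: ({p_hat - ME:.3f}, {p_hat + ME:.3f})")
ci1()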

95% CI: (0.133, 0.207)


Note that, when calculating a CI, we do not assume knowledge of the true value of $p$; the standard error is computed from $\hat{p}$.

We are 95% confident that, after the changes in the casting process, the defect rate is between 13.3% and 20.7%.  Since this interval contains the original estimate of 20%, we can hardly claim any improvement.

A hypothesis test (HT) leads to the same conclusion.

Null hypothesis $H_0$: $p = p_0 = 0.2$, against the two-sided alternative $H_A$: $p \neq 0.2$.

In [14]:
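def ht1():
    n = 400
    k = 68
    p_hat = k / n
    p0 = 0.2
    # Standard error under H0 uses p0, not p_hat
    sp0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sp0
    # Two-sided p-value; -abs(z) keeps it valid whether z is negative or positive
    p_val = 2 * norm.cdf(-abs(z))
    print(f"p-value: {p_val:.3f}")
ht1()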

p-value: 0.134


In HT, $H_0$ supplies an assumed true value of the parameter, and the analysis proceeds under that assumption.  This is why the standard error in the test is computed from $p_0$ rather than from $\hat{p}$.
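
To make the contrast between the two procedures concrete, here is a small sketch that computes both standard errors for the same data. The function name `standard_errors` is hypothetical and not part of the original analysis.

```python
from math import sqrt

def standard_errors(n, k, p0):
    """Return (SE used for the CI, SE used under H0) for a sample proportion."""
    p_hat = k / n
    se_ci = sqrt(p_hat * (1 - p_hat) / n)  # CI: plug in the sample estimate
    se_ht = sqrt(p0 * (1 - p0) / n)        # HT: plug in the null value
    return se_ci, se_ht

se_ci, se_ht = standard_errors(400, 68, 0.2)
print(f"SE for CI: {se_ci:.4f}, SE under H0: {se_ht:.4f}")
```

The two values differ slightly because $\hat{p} = 0.17$ and $p_0 = 0.2$ give different variances $p(1-p)$.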

Taking a significance level of 5%, we fail to reject the null hypothesis (so we retain it), since the p-value of 0.134 exceeds 0.05.  There is no strong evidence that the changes in processing have lowered the defect rate.

Now think about what happens if the engineers take a larger sample.  Suppose they collect 10 times as many, i.e. 4000 ingots, and find that 680 have cracks.

In [16]:
def ci2():
    # Same calculation as ci1, with a sample 10 times as large
    n = 400 * 10
    k = 68 * 10
    p_hat = k / n
    SE = sqrt(p_hat * (1 - p_hat) / n)
    z = norm.ppf(0.975)
    ME = z * SE
    print(f"95% CI: ({p_hat - ME:.3f}, {p_hat + ME:.3f})")
ci2()

95% CI: (0.158, 0.182)


In [18]:
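def ht2():
    n = 400 * 10
    k = 68 * 10
    p_hat = k / n
    p0 = 0.2
    # Standard error under H0 uses p0, not p_hat
    sp0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sp0
    # Two-sided p-value; -abs(z) keeps it valid whether z is negative or positive
    p_val = 2 * norm.cdf(-abs(z))
    print(f"p-value: {p_val:.6f}")
ht2()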

p-value: 0.000002


We see that, although the estimate $\hat{p}$ has not changed, $n$ is 10 times as large, so our estimate becomes more precise and we now have strong enough evidence to reject the null hypothesis.  We embrace the alternative hypothesis that the defect rate has indeed changed; in this case specifically, it has been lowered.
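
The gain in precision can be quantified: with $\hat{p}$ fixed, the margin of error scales as $1/\sqrt{n}$, so a sample 10 times as large narrows the interval by a factor of $\sqrt{10} \approx 3.16$. The sketch below illustrates this; the helper `margin_of_error` is a hypothetical name, and it uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm` so that it runs without extra dependencies.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, level=0.95):
    """Margin of error of a normal-approximation CI for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided critical value
    return z * sqrt(p_hat * (1 - p_hat) / n)

me_small = margin_of_error(0.17, 400)
me_large = margin_of_error(0.17, 4000)
print(f"n=400: {me_small:.4f}, n=4000: {me_large:.4f}, "
      f"ratio: {me_small / me_large:.3f}")
```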

# Summary:

In HT, we start with a null hypothesis (a conservative, skeptical claim about a model parameter, which we would potentially overturn), whose complement is what we call the alternative hypothesis.  We collect data and analyze them to reach one of two possible conclusions:

1. Assuming $H_0$ is true, if the data yield a p-value below the threshold that we chose, we reject the null hypothesis in favor of the alternative hypothesis.
2. Otherwise, if the p-value is above the chosen threshold (i.e., data like ours would be common under $H_0$), we "fail to reject" the null hypothesis.  In this case, we retain $H_0$.
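
The two outcomes above can be sketched as a single decision function. This is an illustrative sketch rather than the notebook's own code: the name `two_sided_proportion_test` and the default `alpha=0.05` are assumptions, and `statistics.NormalDist` stands in for `scipy.stats.norm`.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_proportion_test(n, k, p0, alpha=0.05):
    """One-sample z-test for a proportion with a two-sided alternative."""
    p_hat = k / n
    se0 = sqrt(p0 * (1 - p0) / n)          # standard error under H0
    z = (p_hat - p0) / se0
    p_val = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value
    decision = "reject H0" if p_val < alpha else "fail to reject H0"
    return p_val, decision

# The two samples discussed above: same p_hat, different n
for n, k in [(400, 68), (4000, 680)]:
    p_val, decision = two_sided_proportion_test(n, k, 0.2)
    print(f"n={n}: p-value={p_val:.6f} -> {decision}")
```

With the smaller sample the function fails to reject $H_0$, and with the larger one it rejects, matching the two analyses above.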