# Chapter 5: Inference for categorical data

### Guided Practice 6.2

We get that $SE_{\hat{p}} = \sqrt{\frac{\hat{p} \times (1 - \hat{p})}{n}} = 0.0159$.
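
As a quick check of this value, the standard error can be computed directly. This is a minimal sketch: the helper name `standard_error` is hypothetical, and the inputs $\hat{p} = 0.70$ and $n = 826$ are assumptions chosen because they reproduce the reported value; they are not stated in this solution.

```python
import math

def standard_error(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Assumed inputs (not given in the text above): p_hat = 0.70, n = 826.
se = standard_error(0.70, 826)
print(round(se, 4))
```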

### Guided Practice 6.4

A plausible pair of hypotheses is $H_{0}$: support and opposition are equally common ($p = 0.50$), against $H_{A}$: support and opposition occur in different proportions ($p \ne 0.50$).

### Guided Practice 6.5

Let's check the success-failure condition using the null value $p_{0} = 0.50$: $np_{0} = n(1 - p_{0}) = 412.5 \approx 413$. Both expected counts are well above 10, so we can use the normal distribution.
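
The same check can be wrapped in a small helper. This is only a sketch: the function name `success_failure` and the default threshold of 10 (the usual rule of thumb) are choices made here, and $n = 825$ is an assumption inferred from $np_{0} = 412.5$ with $p_{0} = 0.5$.

```python
def success_failure(n, p, threshold=10):
    """Return expected successes, expected failures, and whether both reach the threshold."""
    successes = n * p
    failures = n * (1 - p)
    return successes, failures, min(successes, failures) >= threshold

# n = 825 is an assumption inferred from n * p0 = 412.5 with p0 = 0.5.
print(success_failure(825, 0.5))
```

The third element of the returned tuple is `True` only when both expected counts reach the threshold, which is the condition for using the normal approximation.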

### Guided Practice 6.8

We want $1.645 \times \sqrt{\frac{p(1 - p)}{n}} \lt 0.01$. A convenient way to solve this is a Python routine that takes `p`, the confidence level, and the margin of error as input and returns the minimum sample size `n` that satisfies the inequality. We run the function on the three proportions found previously to get the respective `n`.

In [5]:
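import math

from scipy import stats

def get_sample_size(p, confidence_level, error_margin):
    """Return the smallest n whose margin of error is at most error_margin."""
    # The upper endpoint of the central interval is the critical value z*.
    z = stats.norm.interval(confidence_level)[1]

    # Solve z * sqrt(p * (1 - p) / n) <= error_margin for n, then round up:
    # rounding to the nearest integer could return an n that is too small.
    return math.ceil((z ** 2) * p * (1 - p) / (error_margin ** 2))

for p in [0.017, 0.062, 0.013]:
    print(f"With p = {p}, we need a sample size of n = {get_sample_size(p, 0.90, 0.01)}.")


With p = 0.017, we need a sample size of n = 453.
With p = 0.062, we need a sample size of n = 1574.
With p = 0.013, we need a sample size of n = 348.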
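
The expression inside `get_sample_size` follows from solving the margin-of-error inequality for $n$. As a short derivation, writing $z^{\star}$ for the critical value and $m$ for the target margin of error (both symbols are introduced here for illustration only):

$$
z^{\star} \sqrt{\frac{p(1 - p)}{n}} \lt m
\iff \frac{p(1 - p)}{n} \lt \frac{m^{2}}{(z^{\star})^{2}}
\iff n \gt \frac{(z^{\star})^{2} \, p(1 - p)}{m^{2}}
$$

Since $n$ must be a whole number, the minimum sample size is the smallest integer above the right-hand side.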


### Guided Practice 6.10

We can again leverage the previously created function to answer this question.

In [6]:
get_sample_size(0.70, 0.95, 0.05)

323
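
One way to sanity-check this result is to plug it back into the margin-of-error formula. The sketch below uses only the standard library; `margin_of_error` is a hypothetical helper name, and it uses the unrounded critical value from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence_level):
    """Margin of error z* x sqrt(p * (1 - p) / n) for a sample proportion."""
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z * math.sqrt(p * (1 - p) / n)

# Compare the reported sample size with the next smaller one.
for n in (322, 323):
    print(n, round(margin_of_error(0.70, n, 0.95), 4))
```

With $n = 323$ the margin of error falls just under 0.05, while $n = 322$ lands slightly above it, so 323 is the smallest sample size that meets the target.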

### Exercise 6.1 - Vegetarian college students.

* (a) Since $np = 60 \times 0.08 = 4.8$ is below 10 (while $n(1 - p) = 60 \times 0.92 = 55.2$), the success-failure condition fails and the normal approximation won't work well.
* (b) Yes, since the sample proportion is closer to 0 than 1 and the sample size is not very big for such a proportion.
* (c) A sample size of $n = 125$ gives $SE_{\hat{p}} = 0.024$ and therefore $Z = 1.65$. This is less than 2 standard errors from the expected proportion, so the value would not be considered unusual.
* (d) With a sample size of $n = 250$ we get $SE_{\hat{p}} = 0.017$ and therefore $Z = 2.33$. This is more than 2 standard errors away, so the same proportion would be considered unusual.
* (e) Doubling the sample size from 125 to 250 reduces the standard error by about 29% (a factor of $1/\sqrt{2} \approx 0.71$), not by half.
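
The numbers in parts (c) to (e) can be reproduced with a few lines of Python. This is a sketch under one assumption: the observed proportion `p_hat = 0.12` is not stated in the answers above and is inferred from the reported standard errors and Z-scores. The helper name `se_and_z` is hypothetical, and the standard errors are left unrounded.

```python
import math

p = 0.08      # population proportion from part (a)
p_hat = 0.12  # assumed observed proportion, inferred from the reported Z-scores

def se_and_z(p_hat, p, n):
    """Return the standard error under p and the Z-score of p_hat."""
    se = math.sqrt(p * (1 - p) / n)
    return se, (p_hat - p) / se

se_125, z_125 = se_and_z(p_hat, p, 125)
se_250, z_250 = se_and_z(p_hat, p, 250)

print(f"n = 125: SE = {se_125:.4f}, Z = {z_125:.2f}")
print(f"n = 250: SE = {se_250:.4f}, Z = {z_250:.2f}")
# Doubling n scales the standard error by 1 / sqrt(2).
print(f"SE ratio = {se_250 / se_125:.3f}")
```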

### Exercise 6.2 - Young Americans, Part I.

* (a) Maybe only slightly left skewed due to $\hat{p}$ being closer to 1 and the sample size being 20.
* (b) Since $np = 30.8$ and $n(1 - p) = 9.2$, we fail the success-failure condition so the normal approximation won't work well.
* (c) Since $SE_{\hat{p}} = 0.054$ we have $Z = 1.47$, which is within 2 standard errors, so the observation would not be considered unusual.
* (d) In this case $Z = 2.08$, which is more than 2 standard errors away, so the observation would be considered unusual.
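
The direction and size of the skew mentioned in part (a) can be quantified with the closed-form skewness of a binomial proportion, $(1 - 2p) / \sqrt{np(1 - p)}$. The sketch below is illustrative: `proportion_skewness` is a hypothetical helper, and $p = 0.77$ is the value implied by $np = 30.8$ and $n(1 - p) = 9.2$ in part (b).

```python
import math

def proportion_skewness(n, p):
    """Skewness of a sample proportion from n independent trials with success probability p."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

p = 0.77  # implied by np = 30.8 and n(1 - p) = 9.2 in part (b)
for n in (20, 40):
    print(n, round(proportion_skewness(n, p), 3))
```

A negative value indicates left skew, and its magnitude shrinks as $n$ grows, which matches the observation that the skew is only slight.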

### Exercise 6.3 - Orange tabbies.

* (a) It is left skewed.
* (b) True.
* (c) True.
* (d) True.

### Exercise 6.4 - Young Americans, Part II.