# Chapter 9 Inferences for Proportions and Count Data

```python
import polars as pl
from polars import col, lit
from scipy import stats
import numpy as np

RNG = np.random.default_rng()
```

## 9.1 Inferences on Proportion

This chapter begins with inference procedures for an unknown proportion $p$ in a Bernoulli population. The sample proportion $\hat{p}$ from a random sample of size $n$ is an unbiased estimator of $p$. Inferences on $p$ are based on the central limit theorem (CLT) result that, for large $n$, the sample proportion $\hat{p}$ is approximately normal with mean $p$ and standard deviation $\sqrt{pq/n}$, where $q = 1 - p$. A large-sample two-sided $100(1 - \alpha)\%$ confidence interval for $p$ is given by

$$
\left[ \hat{p} \pm z_{\alpha /2} \sqrt{\frac{\hat{p} \hat{q}}{n}}\;\right]
$$

where $\hat{q} = 1 - \hat{p}$ and $z_{\alpha/2}$ is the upper $\alpha/2$ critical point of the standard normal distribution. A large-sample test of $H_0: p = p_0$ can be based on either of the test statistics

$$
z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}\hat{q}/n}} \quad \text{or} \quad 
z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}.
$$

Here $q_0 = 1 - p_0$. Both of these statistics are asymptotically standard normal under $H_0$.
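The interval and the two test statistics above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function names `proportion_ci` and `proportion_z_stats` and the counts (60 successes in 100 trials, $p_0 = 0.5$) are made up for the example, and the two-sided p-value assumes the alternative $H_1: p \neq p_0$, which the text above does not specify.

```python
import numpy as np
from scipy import stats


def proportion_ci(x, n, alpha=0.05):
    """Large-sample two-sided CI for p (hypothetical helper name)."""
    p_hat = x / n
    z_crit = stats.norm.ppf(1 - alpha / 2)  # upper alpha/2 critical point
    half_width = z_crit * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def proportion_z_stats(x, n, p0):
    """Return both z statistics for H0: p = p0 (hypothetical helper name)."""
    p_hat = x / n
    # Standard error estimated from the sample proportion
    z_wald = (p_hat - p0) / np.sqrt(p_hat * (1 - p_hat) / n)
    # Standard error computed under the null hypothesis
    z_score = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
    return z_wald, z_score


# Illustrative (made-up) data: 60 successes out of 100 trials, p0 = 0.5
ci_low, ci_high = proportion_ci(60, 100)
z_wald, z_score = proportion_z_stats(60, 100, 0.5)

# Two-sided p-value, assuming the alternative is H1: p != p0
p_value_score = 2 * stats.norm.sf(abs(z_score))

print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
print(f"z (estimated SE) = {z_wald:.4f}, z (null SE) = {z_score:.4f}")
print(f"two-sided p-value (null SE) = {p_value_score:.4f}")
```

The two statistics differ only in the standard error used in the denominator, so they agree closely when $\hat{p}$ is near $p_0$ and can diverge otherwise.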

### Ex 10.1

Tell whether the following mathematical models are theoretical and deterministic or empirical and probabilistic.

1. Maxwell's equations of electromagnetism. ✍️ theoretical / deterministic
2. An econometric model of the U.S. economy. ✍️ empirical / probabilistic
3. A credit scoring model for the probability of a credit applicant being a good risk as a function of selected variables, e.g., income, outstanding debts, etc. ✍️ empirical / probabilistic

### Ex 10.2

Tell whether the following mathematical models are theoretical and deterministic or empirical and probabilistic.

1. An item response model for the probability of a correct response to an item on a "true-false" test as a function of the item's intrinsic difficulty.  ✍️ empirical / probabilistic
2. The Cobb-Douglas production function, which relates the output of a firm to its capital and labor inputs. ✍️ empirical / probabilistic
3. Kepler's laws of planetary motion. ✍️ theoretical / deterministic

### Ex 10.3

Give an example of an experimental study in which the explanatory variable is controlled at fixed values, while the response variable is random. Also, give an example of an observational study in which both variables are uncontrolled and random.

## 9.2 Inferences for Comparing Two Proportions

Next we consider the problem of comparing two Bernoulli proportions, $p_1$ and $p_2$, based on two independent random samples of sizes $n_1$ and $n_2$. The basis for inferences on $p_1 - p_2$ is the result that, for large $n_1$ and $n_2$, the difference in the sample proportions, $\hat{p}_1 - \hat{p}_2$, is approximately normal with mean $p_1 - p_2$ and standard deviation $\sqrt{p_1 q_1 / n_1 + p_2 q_2 / n_2}$, where $q_i = 1 - p_i$. A large-sample two-sided $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$ is given by

$$
\left[ \hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}\; \right].
$$
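As a rough sketch, the interval for $p_1 - p_2$ can be computed as follows. The function name `two_proportion_ci` and the counts (60 of 100 versus 45 of 100) are hypothetical and chosen only to illustrate the formula.

```python
import numpy as np
from scipy import stats


def two_proportion_ci(x1, n1, x2, n2, alpha=0.05):
    """Large-sample two-sided CI for p1 - p2 (hypothetical helper name)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Estimated standard deviation of the difference in sample proportions
    se = np.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_crit = stats.norm.ppf(1 - alpha / 2)  # upper alpha/2 critical point
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


# Illustrative (made-up) data: 60/100 successes in sample 1, 45/100 in sample 2
diff_low, diff_high = two_proportion_ci(60, 100, 45, 100)
print(f"95% CI for p1 - p2: [{diff_low:.4f}, {diff_high:.4f}]")
```

An interval that excludes 0 suggests, at the corresponding significance level, that the two proportions differ.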

## 9.3 Inferences for One-way Count Data

## 9.4 Inferences for Two-way Count Data