**Probability and statistics - Chapter 11: Tables of Counts**

Using the textbook by Wild & Seber as a guide, I document some statistical lessons in this notebook.

This chapter is about variables that define classes or group membership. Tables of measurements are now replaced by tables of counts, and we focus on proportions (probabilities) rather than means. Tables are based on samples, and the sample proportions in the tables will be estimates of the true underlying population proportions.

Just as the t- and F-distributions are used for inference about quantitative variables, the chi-square distribution is the main distribution for inference about categorical data. It is based on comparing the observed counts in the tables with the counts one would expect under some hypothesis about how the underlying population proportions or probabilities are structured. If the differences between the observed and expected counts are sufficiently large, the hypothesis is rejected.
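
For reference, the comparison of observed and expected counts is usually summarized with Pearson's chi-square statistic. The notation here is my own assumption rather than the book's: $O_i$ is the observed count in cell $i$ and $E_i$ is the count expected under the hypothesis.

$$
\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}
$$

Large values of $\chi^2$ indicate that the observed counts sit far from what the hypothesis predicts.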

**One-way tables:**
- single qualitative variable with a single vector of data
- **Test for goodness-of-fit**: Do the underlying population proportions follow a hypothesized pattern?

**Two-way tables:**
- Similar to the above, but now two situations are of interest:
    1. **Test for homogeneity**: When we have several different populations, each represented by a 1D table, do they have the same underlying proportions?
    2. **Test for independence**: When there is just one population, but its categories can be arranged in a 2D array, are the row and column categories independent of each other?
- Note the perils of collapsing complex tables to simpler ones
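
As a rough sketch of how a two-way table might be tested, the cell below uses `scipy.stats.chi2_contingency`. The counts are hypothetical and chosen only for illustration; they are not from the book. The same calculation serves both the homogeneity and the independence test; only the sampling scheme and the interpretation differ.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts for illustration only (not from the book):
# rows are two groups, columns are three response categories.
observed = np.array([[20, 30, 50],
                     [30, 30, 40]])

# Expected count per cell under the null hypothesis:
# (row total * column total) / grand total.
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected_by_hand = np.outer(row_totals, col_totals) / observed.sum()

# scipy computes the same expected counts, plus the statistic,
# the p-value and the degrees of freedom (rows - 1) * (cols - 1).
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

print(expected_by_hand)
print(chi2_stat, dof, p_value)
```

Computing the expected counts by hand alongside the library call is a cheap check that the table is oriented the way I think it is.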


In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib inline
import seaborn as sns
from scipy import integrate, stats

# Code formatting
%load_ext nb_black

## Introduction



## Chapter review exercises

### Question 1

A random sample of 300 voters in a district is asked which of three mayoral candidates they would vote for. The results were:

| Preferred mayor | A | B | C |
| --- | ---- | --- | -- |
| Number of voters | 119 | 97 | 84 |

In [2]:
from scipy.stats import chisquare

chisquare([119, 97, 84])

Power_divergenceResult(statistic=6.26, pvalue=0.04371779725275094)

This matches both my manual calculation and the answer in the book.
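
For the record, the manual calculation can be sketched as follows. When no expected frequencies are passed, `chisquare` assumes all categories are equally likely, so the sketch makes the same assumption: 100 expected voters per candidate. The variable names are my own.

```python
import numpy as np
from scipy import stats

observed = np.array([119, 97, 84])

# Null hypothesis assumed here: the three candidates are equally preferred,
# so each expected count is 300 / 3 = 100.
expected = np.full(observed.shape, observed.sum() / observed.size)

# Pearson's chi-square statistic: sum of (O - E)^2 / E over the categories.
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom for a one-way table: number of categories minus one.
df = observed.size - 1

# Upper-tail probability of the chi-square distribution.
p_value = stats.chi2.sf(chi2_stat, df)

print(chi2_stat, df, p_value)
```

The statistic works out to (361 + 9 + 256) / 100 = 6.26 on 2 degrees of freedom, in agreement with the `chisquare` output above.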