In [7]:
import numpy as np
from scipy import stats

# Goodness of Fit

## Examples

### Example 1

Jiao works as an usher at a theater. The theater has $1000$ seats that are accessed through five entrances. Each guest should use the entrance that's marked on their ticket. Jiao wants to test whether the distribution of guests across entrances matches the official distribution. He collects information about the number of guests who went through each entrance on a certain night. Here are the results:

|Entrance|A|B|C|D|E|Total|
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|Expected|30%|30%|20%|10%|10%|100%|
|# of people|398|202|205|87|108|1000|

Jiao wants to perform a $\chi^2$ goodness-of-fit test to determine if these results suggest that the actual distribution of people doesn't match the expected distribution.

**What is the expected count of guests in entrance $\text A$ in Jiao's sample?**  
_You may round your answer to the nearest hundredth._

In [18]:
total = 1000  # guests in Jiao's sample
total * 0.3   # expected count for entrance A (official share: 30%)

300.0

Conclusion: The expected count of guests in entrance $\text A$ in Jiao's sample is $\text{Total} \times P(\text A) = 1000 \times 30\% = 300$.
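The same calculation extends to every entrance. The sketch below applies $\text{Total} \times P$ to all five official proportions from the table; the variable names are illustrative, and the threshold of 5 for the large-counts check is a common rule of thumb assumed here, not something stated in the problem.

```python
import numpy as np

# Official proportions for entrances A-E and observed counts, from the table above.
proportions = np.array([0.30, 0.30, 0.20, 0.10, 0.10])
observed = np.array([398, 202, 205, 87, 108])
total = observed.sum()

# Expected count per entrance = sample size * official proportion.
expected = total * proportions

for entrance, count in zip("ABCDE", expected):
    print(f"Entrance {entrance}: expected {count:.2f}")

# Large-counts check (assumed rule of thumb: every expected count is at least 5).
print("all expected counts >= 5:", bool((expected >= 5).all()))
```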

### Example 2

In the game rock-paper-scissors, Kenny expects to win, tie, and lose with equal frequency. Kenny plays rock-paper-scissors often, but he suspected his own games were not following that pattern, so he took a random sample of $24$ games and recorded their outcomes. Here are his results:

|Outcome|Win|Loss|Tie|
|:-:|:-:|:-:|:-:|
|Games|4|13|7|

He wants to use these results to carry out a $\chi^2$ goodness-of-fit test to determine if the distribution of his outcomes disagrees with an even distribution.

**What are the values of the test statistic and P-value for Kenny's test?**

In [40]:
# Null hypothesis: wins, losses, and ties are equally likely.
p = np.array([1/3, 1/3, 1/3])
total = 24
expected = p * total  # 8 games per outcome
observed = np.array([4, 13, 7])  # win, loss, tie
statistic, pvalue = stats.chisquare(f_obs=observed, f_exp=expected, ddof=0)
precision = 2
print('test statistic =', round(statistic, precision))
print('p-value =', round(pvalue, precision))

test statistic = 5.25
p-value = 0.07
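To make the numbers less of a black box, here is a minimal sketch that computes Kenny's statistic by hand as $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$ and reads the P-value from the upper tail of a $\chi^2$ distribution with (number of categories $- 1$) degrees of freedom. The names `chi2_stat`, `df`, and `p_value` are illustrative.

```python
import numpy as np
from scipy import stats

observed = np.array([4, 13, 7])             # win, loss, tie
expected = np.full(3, observed.sum() / 3)   # 8 games each under equal frequency

# Chi-square statistic: sum over categories of (observed - expected)^2 / expected.
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: number of categories minus one.
df = len(observed) - 1

# P-value: upper-tail area of the chi-square distribution beyond the statistic.
p_value = stats.chi2.sf(chi2_stat, df)

print("test statistic =", round(chi2_stat, 2))
print("p-value =", round(p_value, 2))
```

The per-category contributions are $2$, $3.125$, and $0.125$, so most of the statistic comes from the wins and losses rather than the ties.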


### Example 3

In the following table, Meryem modeled the number of rooms she believes are in use at any given time at the veterinary hospital where she works.

|Number of rooms in use|1|2|3|4|5|
|:-:|:-:|:-:|:-:|:-:|:-:|
|Percent of the time|10%|10%|25%|45%|10%|

To test her model, she took a random sample of $80$ times and recorded the numbers of rooms in use at those times. Here are her results:

|Number of rooms in use|1|2|3|4|5|
|:-:|:-:|:-:|:-:|:-:|:-:|
|Number of times observed|12|4|20|36|8|

She wants to use these results to carry out a $\chi^2$ goodness-of-fit test to determine if the distribution of numbers of rooms in use at her veterinary hospital disagrees with the claimed percentages.

**What are the values of the test statistic and P-value for Meryem's test?**

In [41]:
# Meryem's claimed proportions for 1-5 rooms in use.
p = np.array([0.1, 0.1, 0.25, 0.45, 0.1])
total = 80
expected = p * total  # expected counts in a sample of 80 times
observed = np.array([12, 4, 20, 36, 8])
statistic, pvalue = stats.chisquare(f_obs=observed, f_exp=expected, ddof=0)
precision = 2
print('test statistic =', round(statistic, precision))
print('p-value =', round(pvalue, precision))

test statistic = 4.0
p-value = 0.41
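Since Examples 2 and 3 repeat the same steps, they could be wrapped in a small helper. The sketch below is one possible shape: `goodness_of_fit` is a hypothetical name, and the significance level `alpha=0.05` is an assumption for illustration, since the problems above do not specify one.

```python
import numpy as np
from scipy import stats

def goodness_of_fit(observed, proportions, alpha=0.05):
    """Hypothetical helper: chi-square goodness-of-fit test from counts and claimed proportions."""
    observed = np.asarray(observed, dtype=float)
    # Expected counts = claimed proportion * sample size.
    expected = np.asarray(proportions, dtype=float) * observed.sum()
    statistic, pvalue = stats.chisquare(f_obs=observed, f_exp=expected)
    # Reject the claimed distribution only if the P-value falls below alpha (assumed level).
    return statistic, pvalue, bool(pvalue < alpha)

# Meryem's data from Example 3.
statistic, pvalue, reject = goodness_of_fit(
    [12, 4, 20, 36, 8], [0.1, 0.1, 0.25, 0.45, 0.1]
)
print("test statistic =", round(statistic, 2))
print("p-value =", round(pvalue, 2))
print("reject claimed distribution:", reject)
```

Under the assumed level of $0.05$, Meryem's P-value is too large to reject her model; a different significance level could be passed through `alpha`.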


In [44]:
p = np.array([0.66, 0.25, 0.09])
total = 500
expected = p * total
observed = np.array([345, 125, 30])
statistic, pvalue = stats.chisquare(f_obs=observed, f_exp=expected, ddof=0)
precision = 2
print('test statistic =', round(statistic, precision))
print('p-value =', round(pvalue, precision))

test statistic = 5.68
p-value = 0.06
