# Riddler Express

From Ernie Cohen comes a scintillating stumper of a survey:

You’re reviewing some of the survey data that was randomly collected from the residents of Riddler City. As you’ll recall, the city is quite large.

Ten randomly selected residents were asked how many people (including them) lived in their household. As it so happened, their answers were 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.

It’s your job to use this (admittedly limited) data to estimate the average household size in Riddler City. Your co-worker suggests averaging the 10 numbers, which would give you an answer of about 5.5 people. But you’re not so sure.

Would your best estimate be exactly 5.5, less than 5.5 or greater than 5.5?

In [1]:
import numpy as np
import pandas as pd

In [2]:
HOUSE_SIZES = list(range(1, 11))
HOUSE_SIZES

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [3]:
# Give each household size a population equal to the least common multiple
# of the sizes, so that every household count comes out as a whole number.
min_population = np.lcm.reduce(HOUSE_SIZES)
min_population

2520

In [4]:
df = pd.DataFrame({"household_size":HOUSE_SIZES})

In [5]:
# Number of households of each size needed to house min_population people.
df["houses"] = min_population // df.household_size

In [6]:
df["population"] = df.household_size * df.houses

In [7]:
# Share of all residents who live in a household of each size.
df["household_frac_population"] = df.population / df.population.sum()

In [8]:
df

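| | household_size | houses | population | household_frac_population |
|---|---|---|---|---|
| 0 | 1 | 2520 | 2520 | 0.1 |
| 1 | 2 | 1260 | 2520 | 0.1 |
| 2 | 3 | 840 | 2520 | 0.1 |
| 3 | 4 | 630 | 2520 | 0.1 |
| 4 | 5 | 504 | 2520 | 0.1 |
| 5 | 6 | 420 | 2520 | 0.1 |
| 6 | 7 | 360 | 2520 | 0.1 |
| 7 | 8 | 315 | 2520 | 0.1 |
| 8 | 9 | 280 | 2520 | 0.1 |
| 9 | 10 | 252 | 2520 | 0.1 |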


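With these counts, each household size accounts for the same share of the population, so a randomly chosen resident is equally likely to come from a household of any size from 1 to 10. That matches the survey, in which each size was reported exactly once. But the survey sampled residents, not households: a household of 10 is ten times as likely to be reached as a household of 1. Equal populations therefore imply many more small households than large ones, which pulls the average household size below the average of the answers.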
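The difference between sampling residents and sampling households can be checked with a small simulation. This is a rough sketch, not part of the original notebook: it assumes the same equal-population city as the table above, and the names `household_sizes`, `resident_answers`, the seed, and the sample size of 100,000 are all illustrative choices.

```python
import numpy as np

sizes = np.arange(1, 11)
houses = 2520 // sizes  # assumed: 2520 residents for every household size

# One entry per household, holding that household's size.
household_sizes = np.repeat(sizes, houses)
# One entry per resident, holding the size of the household they live in.
resident_answers = np.repeat(household_sizes, household_sizes)

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
resident_sample = rng.choice(resident_answers, size=100_000)
household_sample = rng.choice(household_sizes, size=100_000)

# Mean answer when surveying residents vs. mean size when surveying households.
resident_mean = resident_sample.mean()
household_mean = household_sample.mean()
print(resident_mean, household_mean)
```

Under these assumptions the resident-based mean should land close to 5.5, while the household-based mean should land well below it.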

In [9]:
# Average over households rather than residents: weight each size by its number of houses.
average_household_size = np.average(df.household_size, weights=df.houses)
average_household_size

3.414171521474055

This is less than the suggested 5.5, so the best estimate of the average household size is less than 5.5.
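Because the number of households of size k is proportional to 1/k in this model, the weighted average above is the harmonic mean of the ten answers, 10 / (1/1 + 1/2 + ... + 1/10). The following sketch recomputes it exactly with Python's standard `fractions` module as a cross-check; the variable names are illustrative and not from the original notebook.

```python
from fractions import Fraction

sizes = range(1, 11)

# Households of size k are assumed proportional to 1/k (equal population per size),
# so the average household size is n divided by the sum of reciprocals.
harmonic_sum = sum(Fraction(1, k) for k in sizes)
estimate = len(sizes) / harmonic_sum

print(estimate)         # 25200/7381
print(float(estimate))  # approximately 3.4142
```

The exact fraction agrees with the `np.average` result to floating-point precision.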