# Three girls

In which we solve the three-girls-in-family problem.

## The problem

If there is a family of four children, what is the chance that
family will consist of exactly three girls and one boy?

We decided we could simulate this situation, by taking four
random numbers, between 0 and 1.  For each number, if it is less
than 0.5, we label this as a girl, otherwise we label it as
a boy.  Then we could count how many girls we got.  That is one
family.  We repeat the procedure many times, and count how many
families have three girls (three of four random numbers less
than 0.5).

## A simulation

In [1]:
import numpy as np
np.set_printoptions(precision=2)

First we do a simulation of a single family.

Start with 4 random numbers, between 0 and 1.

We could do these one at a time:

In [2]:
first_child = np.random.uniform()
first_child

0.2275878331047121

In [3]:
second_child = np.random.uniform()
second_child

0.765186012224795

In [4]:
third_child = np.random.uniform()
third_child

0.12518217908956397

In [5]:
fourth_child = np.random.uniform()
fourth_child

0.42676148437855577

That gets boring.  It is neater to make an array of 4 numbers in one shot, like this:

In [6]:
one_family = np.random.uniform(size=4)
one_family

array([0.58, 0.68, 0.46, 0.2 ])

Arrays allow us to do the same operation on all the elements.

For example, we can ask whether each random number is less than
0.5.

In [7]:
girls = one_family < 0.5
girls

array([False, False,  True,  True])

Notice that the new array, `girls`, has four elements, like the
original array `one_family`.  At each position in the `girls`
array, there is a `True` if the corresponding element in
`one_family` was less than 0.5, and `False` otherwise.

We consider `True` to mean "girl" and `False` to mean "boy".  To count the number of girls in this family, we need to count the number of `True` values in the array.

In [8]:
n_girls = np.count_nonzero(girls)
n_girls

2

That is the result of our simulation, for one family.

We want to do this many times.  How would we do that?

One way is to make a two-dimensional array of random numbers.

A two-dimensional array has rows and columns.

In our case, each row will be a single family.  There are four columns, so each row has four elements, corresponding to the four children in the family.

Here we get ready to simulate 10 families, with one 2D array.

In [9]:
ten_families = np.random.uniform(size=(10, 4))
ten_families

array([[0.96, 0.61, 0.4 , 0.35],
       [0.97, 0.17, 0.05, 0.64],
       [0.98, 0.79, 0.06, 0.6 ],
       [0.34, 0.53, 0.42, 0.33],
       [0.29, 0.82, 0.44, 0.52],
       [0.21, 0.47, 0.79, 0.37],
       [0.89, 0.49, 0.47, 0.59],
       [0.47, 0.81, 0.04, 0.65],
       [0.04, 0.58, 0.17, 0.49],
       [0.29, 0.37, 0.06, 0.64]])

Notice the `size=` argument to `np.random.uniform`.  When we
wanted an array of 4 values, the size was `4`.  Now that we want a 2D
array, the size is two values between parentheses: the first
value is the number of rows, and the second is the number of
columns.
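
As a small check on how `size=` sets the layout, the sketch below builds one array of each kind and prints its `shape` attribute.  The names `example_1d` and `example_2d` are made up for this illustration, so that they do not overwrite the arrays above.

```python
import numpy as np

# size=4 gives a one-dimensional array with 4 elements.
example_1d = np.random.uniform(size=4)
# size=(10, 4) gives a two-dimensional array: 10 rows, 4 columns.
example_2d = np.random.uniform(size=(10, 4))

# The shape attribute reports the number of elements along each dimension.
print(example_1d.shape)  # (4,)
print(example_2d.shape)  # (10, 4)
```

The random values will differ on every run, but the shapes stay the same.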

We can apply our test `< 0.5` to all the 10 * 4 elements at the
same time.

In [10]:
are_girls = ten_families < 0.5
are_girls

array([[False, False,  True,  True],
       [False,  True,  True, False],
       [False, False,  True, False],
       [ True, False,  True,  True],
       [ True, False,  True, False],
       [ True,  True, False,  True],
       [False,  True,  True, False],
       [ True, False,  True, False],
       [ True, False,  True,  True],
       [ True,  True,  True, False]])

Remember, each row represents a family, and each `True` value
represents a girl.  We want to count how many `True` values
there are in each row.  We can try `np.count_nonzero` on this
array, but:

In [11]:
np.count_nonzero(are_girls)

23

By default, `np.count_nonzero` counts the number of True values in the entire 2D array.

We want it to count the number of `True` values in each *row*.

We can do that, by using the `axis` argument to
`np.count_nonzero`.  See [Arrays and axes](arrays_and_axes) for
a more detailed explanation.

In [12]:
n_girls = np.count_nonzero(are_girls, axis=1)
n_girls

array([2, 2, 1, 3, 2, 3, 2, 2, 3, 3])

`n_girls` has one element per *row* in the `are_girls` array.  The element corresponding to the first row has the count of `True` values in the first row, and so on.
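
To see what `axis` is doing, it can help to try it on a small array written out by hand, where the counts are easy to check by eye.  The array `example_girls` below is invented for this illustration: 3 families of 4 children.

```python
import numpy as np

# A small hand-written array: 3 families (rows) of 4 children (columns).
example_girls = np.array([[True, False, True, True],
                          [False, False, True, False],
                          [True, True, True, True]])

# No axis argument: one count for the whole array.
print(np.count_nonzero(example_girls))          # 8
# axis=1: count across the columns, giving one count per row (family).
print(np.count_nonzero(example_girls, axis=1))  # [3 1 4]
# axis=0: count down the rows, giving one count per column.
print(np.count_nonzero(example_girls, axis=0))  # [2 1 3 2]
```

With `axis=1` we get one count per family, which is what the simulation needs.  With `axis=0` we would instead get the number of girls among the first children, among the second children, and so on.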

Now we need to ask the question, how many of the counts in `n_girls` are equal to 3?

To do this, we can use another comparison operator, like the `<`
in `< 0.5`.  The operator is `==`.  Notice the double `=`
sign.  It is a test that returns `True` or `False`.
For example:

In [13]:
4 == 3

False

In [14]:
4 == 4

True

These are expressions, because they return values.

Compare to the single equals, which is the assignment operator, in an assignment statement.

In [15]:
a = 4

Notice this does not return anything, because it is not an expression; it is an assignment statement.  `a` now has the value 4.

In [16]:
a

4

We can test whether the value of `a` is 4 like this:

In [17]:
a == 4

True

This is an equality test expression, so it does return a value.

How does this operate on arrays?   It operates the same way as the other comparison operators - element by element:

In [18]:
my_array = np.array([2, 3, 4, 2])
my_array

array([2, 3, 4, 2])

In [19]:
my_array == 2

array([ True, False, False,  True])

We can use this trick on the `n_girls` array, to find counts that are equal to 3.

In [20]:
n_girls == 3

array([False, False, False,  True, False,  True, False, False,  True,
        True])

To find the number of 3s in `n_girls`:

In [21]:
np.count_nonzero(n_girls == 3)

4

Now the proportion of the counts that are equal to 3:

In [22]:
prop_3 = np.count_nonzero(n_girls == 3) / 10
prop_3

0.4
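
The `10` above is the number of families.  As a sketch of a slightly more general way to write the same calculation, the code below takes the number of families from the array itself, using `len`.  It also shows an alternative that is not used elsewhere on this page: `np.mean` of a `True` / `False` array gives the proportion of `True` values.  The names `counts`, `n_families`, `prop_by_count` and `prop_by_mean` are made up for this illustration; the values in `counts` are copied from the `n_girls` output above.

```python
import numpy as np

# Counts of girls for the 10 families, copied from the n_girls output.
counts = np.array([2, 2, 1, 3, 2, 3, 2, 2, 3, 3])

# Take the number of families from the array, rather than typing 10.
n_families = len(counts)
prop_by_count = np.count_nonzero(counts == 3) / n_families
print(prop_by_count)  # 0.4

# Alternative: the mean of a True / False array is the proportion of True.
prop_by_mean = np.mean(counts == 3)
print(prop_by_mean)  # 0.4
```

Using `len` means the same line still works if you change the number of families.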

## Exercises

### 10000 families

Now you have seen how to simulate 10 families, with a 2D array.
Copy and paste from the code above, into the cell below, and
change what you need to change, to simulate 10000 families of
4 children.  It will have these steps:

* make a 2D array of random numbers between 0 and 1, with 10000
  rows and 4 columns.
* make a new array of the same shape, with `True` where the random
  number corresponds to "girl" and `False` when the number corresponds to "boy"
* count the number of girls (`True` values) in each row.
* count the number of times there were exactly 3 girls.
* divide by the number of rows to give an estimate for the
  proportion of 4-child families with exactly 3 girls.

In [23]:
# Simulate 10000 families of 4 children.
# Show proportion with 3 girls.
# Your code below

### No girls in a family of 4.

Estimate the chances that a 4-child family will have no girls.  You can copy the code from the cell above, and modify it, or you may be able to use variables from the code above, to get the answer, without repeating the simulation.

In [24]:
# Show proportion with 0 girls.
# Your code below

For extra points - the answer above is easier to work out with
probability than the chance of three girls.  What's the exact answer, from probability?

### 3 girls in a family of 5.

Simulate the chances that a family with 5 children will have
exactly 3 girls.

In [25]:
# Simulate 10000 families of 5 children.
# Show proportion with 3 girls.
# Your code below

### 3 or fewer girls in a family of 5.

Simulate the chances that a family with 5 children will have 3 or
fewer girls.

Hint: `<=` tests whether the thing on the left is *less than or equal to* the thing on the right.

In [26]:
3 <= 4

True

In [27]:
3 <= 3

True

In [28]:
3 <= 2

False

In [29]:
my_array = np.array([1, 2, 3, 4])
my_array <= 2

array([ True,  True, False, False])

Hints done, now:

In [30]:
# Proportion of families of 5 children with 3 or fewer girls.
# Your code below

### More realistic simulation

Now we are back to the situation of exactly 3 girls in a family of 4.

In reality, when you have a child, the probability of having a girl
is slightly less than 0.5.

The [proportion of boys born in the
UK](https://www.gov.uk/government/statistics/gender-ratios-at-birth-in-great-britain-2010-to-2014)
is 0.513.  Hence the proportion of girls is 1 - 0.513 = 0.487.

With that probability of having a girl, what are the chances of having exactly three girls in a family of four?

In [32]:
# Simulate 10000 families of 4 children.
chance_of_girl = 0.487
# Estimate chance of having exactly 3 girls.
# Your code here