# The Left-Handed Sister Problem

Think Bayes, Second Edition

Copyright 2021 Allen B. Downey

License: [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)

Suppose you meet someone who looks like the brother of your friend Mary. 
You ask if he has a sister named Mary, and he says "Yes I do, but I don't think I know you."

You remember that Mary has a sister who is left-handed, but you don't remember her name.
So you ask your new friend if he has another sister who is left-handed.

If he does, how much evidence does that provide that he is the brother of your friend, rather than a random person who coincidentally has a sister named Mary and another sister who is left-handed? In other words, what is the Bayes factor of the left-handed sister?

Let's assume:

* Out of 100 families with children, 20 have one child, 30 have two children, 40 have three children, and 10 have four children.

* All children are either boys or girls with equal probability, one girl in 10 is left-handed, and one girl in 100 is named Mary.

* Name, sex, and handedness are independent, so every child has the same probability of being a girl, left-handed, or named Mary.

* If the person you met had more than one sister named Mary, he would have said so, but he could have more than one sister who is left-handed.
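
These assumptions determine the per-sibling probabilities used in the updates that follow. Here is a minimal sketch of that arithmetic; the variable names are my own and are not used elsewhere in the notebook.

```python
from fractions import Fraction

# Assumptions from the list above: sex, name, and handedness are independent.
p_girl = Fraction(1, 2)
p_mary_given_girl = Fraction(1, 100)
p_left_given_girl = Fraction(1, 10)

# Probability that any one sibling is a girl named Mary.
p_sister_mary = p_girl * p_mary_given_girl

# Probability that any one sibling is a left-handed girl.
p_sister_left = p_girl * p_left_given_girl

print(p_sister_mary, p_sister_left)  # 1/200 1/20
```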

I'll use the following function to do Bayesian updates.

In [1]:
import pandas as pd

def make_table(prior, likelihood):
    """Make a DataFrame representing a Bayesian update."""
    table = pd.DataFrame(index=[1, 2, 3, 4])
    table.index.name = '# children'
    table['Prior'] = prior
    table['Likelihood'] = likelihood
    table['Product'] = (table['Prior'] * table['Likelihood'])
    total = table['Product'].sum()
    table['Posterior'] = table['Product'] / total
    return table

## The first update

Due to [length-biased sampling](https://towardsdatascience.com/the-inspection-paradox-is-everywhere-2ef1c2e9d709), the person you met is more likely to come from a big family.
Specifically, the likelihood of meeting someone from a family with size $n$ is proportional to $n$.
So when we meet a person, we have to update our belief about the size of their family.

In [2]:
prior = [20, 30, 40, 10]
likelihood1 = [1, 2, 3, 4]
table1 = make_table(prior, likelihood1)
table1

| # children | Prior | Likelihood | Product | Posterior |
|---|---|---|---|---|
| 1 | 20 | 1 | 20 | 0.083333 |
| 2 | 30 | 2 | 60 | 0.25 |
| 3 | 40 | 3 | 120 | 0.5 |
| 4 | 10 | 4 | 40 | 0.166667 |


The posterior probabilities for families with one or two children are smaller than the priors; for families with three or four children, they are larger.
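
One way to check the length-biased result is by simulation. The sketch below is not part of the original solution: it draws a hypothetical population of families, lists every child, and compares the share of children from each family size with the exact posterior. The seed and population size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=17)  # arbitrary seed for reproducibility

sizes = np.array([1, 2, 3, 4])
family_probs = np.array([20, 30, 40, 10]) / 100

# Draw a population of families, then list each child once,
# labeled with the size of the family they come from.
families = rng.choice(sizes, size=100_000, p=family_probs)
children = np.repeat(families, families)

# Share of children (not families) that come from each family size.
simulated = np.array([(children == n).mean() for n in sizes])

# Exact answer: prior times family size, normalized.
exact = sizes * family_probs / (sizes * family_probs).sum()

print(simulated.round(3))
print(exact.round(3))
```

The two arrays should agree to within sampling error, which is what we expect if meeting a random person is like choosing a random child.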

## The second update

The likelihood that a person has exactly one sister named Mary is given by the binomial distribution where `n` is the number of siblings and `p` is the probability that a sibling is a girl named Mary.
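
Written out for $k = 1$, the binomial probability for a person with $n$ siblings simplifies because the binomial coefficient is just $n$:

$$
P(\text{exactly one sister named Mary} \mid n) = \binom{n}{1} \, p \, (1-p)^{n-1} = n \, p \, (1-p)^{n-1}, \qquad p = \frac{1}{2} \cdot \frac{1}{100} = \frac{1}{200}
$$

For example, with $n = 2$ siblings this gives $2 \cdot \frac{1}{200} \cdot \frac{199}{200} = 0.00995$.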

In [3]:
from scipy.stats import binom

p = 1 / 200  # P(girl) * P(named Mary | girl) = 1/2 * 1/100
n = [0, 1, 2, 3]  # number of siblings
k = 1  # exactly one sister named Mary

likelihood2 = binom.pmf(k, n, p)
likelihood2

array([0.        , 0.005     , 0.00995   , 0.01485038])

Here's the second update.

In [4]:
prior = table1['Posterior']
table2 = make_table(prior, likelihood2)
table2

| # children | Prior | Likelihood | Product | Posterior |
|---|---|---|---|---|
| 1 | 0.083333 | 0.0 | 0.0 | 0.0 |
| 2 | 0.25 | 0.005 | 0.00125 | 0.143677 |
| 3 | 0.5 | 0.00995 | 0.004975 | 0.571835 |
| 4 | 0.166667 | 0.01485 | 0.002475 | 0.284488 |


Based on the sister named Mary, we can rule out the possibility that the person you met is an only child, and the probability is higher that your interlocutor comes from a big family.

## Probability of a left-handed sister

Finally, we can compute the probability that he has at least one left-handed sister.
Because the question is about *another* sister, the candidates are his siblings other than Mary.
The likelihood comes from the binomial distribution again, this time using the survival function to compute the probability of one or more.

In [5]:
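p = 1 / 20  # P(girl) * P(left-handed | girl) = 1/2 * 1/10
n = [0, 0, 1, 2]  # siblings other than Mary, by family size
k = 1  # at least one left-handed sister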

likelihood3 = binom.sf(k-1, n, p)
likelihood3

array([0.    , 0.    , 0.05  , 0.0975])
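
As a quick check on the survival function: `binom.sf(k-1, n, p)` returns the probability of more than `k-1` successes, which for integer counts is the probability of `k` or more. The sketch below compares it with two equivalent expressions for a four-child family; the variable names are illustrative.

```python
from scipy.stats import binom

p_left = 1 / 20  # a sibling is a girl (1/2) and left-handed (1/10)
n_other = 2      # siblings other than Mary in a four-child family

at_least_one = binom.sf(0, n_other, p_left)     # P(X > 0)
complement = 1 - binom.pmf(0, n_other, p_left)  # 1 - P(X = 0)
closed_form = 1 - (1 - p_left) ** n_other       # 1 - (1 - p)^n

print(at_least_one, complement, closed_form)
```

All three agree with the last element of `likelihood3`, up to floating-point error.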

A convenient way to compute the total probability of an outcome is to do an update as if it happened, ignore the posterior probabilities, and compute the sum of the products. 

In [6]:
prior = table2['Posterior']
table3 = make_table(prior, likelihood3)
table3

| # children | Prior | Likelihood | Product | Posterior |
|---|---|---|---|---|
| 1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.143677 | 0.0 | 0.0 | 0.0 |
| 3 | 0.571835 | 0.05 | 0.028592 | 0.507582 |
| 4 | 0.284488 | 0.0975 | 0.027738 | 0.492418 |


Here's the total probability that your new friend has a left-handed sister.

In [7]:
p = table3['Product'].sum()
p

0.05632931875489403
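
The sum of the products is the law of total probability: the probability of the data under each family size, weighted by the probability of that size. Here is a small sketch of the same sum without the table, using the rounded values printed above, so the result matches only to about five decimal places.

```python
import numpy as np

# Probability of each family size after the second update (rounded).
p_size = np.array([0.0, 0.143677, 0.571835, 0.284488])

# P(at least one left-handed sister | family size).
p_left_given_size = np.array([0.0, 0.0, 0.05, 0.0975])

# Law of total probability: sum over sizes of P(size) * P(data | size).
total = np.sum(p_size * p_left_given_size)

print(f"{total:.4f}")  # 0.0563
```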

## The Bayes factor

If your interlocutor is the brother of your friend, the probability is 1 that he has a left-handed sister.
If he is not the brother of your friend, the probability is about 0.056.
So the Bayes factor is the ratio of these probabilities.

In [8]:
1/p

17.752744434054023
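
To see what a Bayes factor of this size does, we can multiply it by prior odds to get posterior odds. The sketch below uses prior odds of 1:1, which is a hypothetical value chosen only for illustration; the problem does not specify a prior. The helper functions are my own.

```python
def update_odds(prior_odds, bayes_factor):
    """Bayes's rule in odds form: posterior odds = prior odds * Bayes factor."""
    return prior_odds * bayes_factor

def odds_to_prob(odds):
    """Convert odds in favor to a probability."""
    return odds / (1 + odds)

bayes_factor = 1 / 0.05632931875489403

prior_odds = 1.0  # hypothetical: even odds that he is Mary's brother
posterior_odds = update_odds(prior_odds, bayes_factor)
posterior_prob = odds_to_prob(posterior_odds)

print(round(posterior_odds, 2), round(posterior_prob, 3))
```

With even prior odds, the posterior probability is about 0.95; with smaller prior odds, the same Bayes factor yields a smaller posterior.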

## Now with rational numbers

Some people solved the problem in terms of rational numbers, so I'll do the same calculation with `Fraction` objects.

In [9]:
from fractions import Fraction

prior = [Fraction(x) for x in [20, 30, 40, 10]]
likelihood1 = [1, 2, 3, 4]
table1 = make_table(prior, likelihood1)
table1

| # children | Prior | Likelihood | Product | Posterior |
|---|---|---|---|---|
| 1 | 20 | 1 | 20 | 1/12 |
| 2 | 30 | 2 | 60 | 1/4 |
| 3 | 40 | 3 | 120 | 1/2 |
| 4 | 10 | 4 | 40 | 1/6 |


We can't use SciPy to compute the binomial distribution with `Fraction` objects, but we can do it ourselves.

In [10]:
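p = Fraction(1, 200)  # P(girl) * P(named Mary | girl)
ns = [0, 1, 2, 3]  # number of siblings
k = 1  # exactly one sister named Mary

# Binomial PMF for k=1, where the binomial coefficient C(n, 1) is n
likelihood2 = [n * p**k * (1-p)**(n-k) for n in ns]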
likelihood2

[Fraction(0, 1),
 Fraction(1, 200),
 Fraction(199, 20000),
 Fraction(118803, 8000000)]

Here's the second update.

In [11]:
prior = table1['Posterior']
table2 = make_table(prior, likelihood2)
table2

| # children | Prior | Likelihood | Product | Posterior |
|---|---|---|---|---|
| 1 | 1/12 | 0 | 0 | 0 |
| 2 | 1/4 | 1/200 | 1/800 | 20000/139201 |
| 3 | 1/2 | 199/20000 | 199/40000 | 79600/139201 |
| 4 | 1/6 | 118803/8000000 | 39601/16000000 | 39601/139201 |


Next is the likelihood of at least one left-handed sister. Again, we have to compute the binomial distribution ourselves.

In [12]:
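p = Fraction(1, 20)  # P(girl) * P(left-handed | girl)
ns = [0, 0, 1, 2]  # siblings other than Mary, by family size
k = 1

# P(at least one left-handed sister) = 1 - P(none)
likelihood3 = [1 - (1-p)**n for n in ns]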
likelihood3

[Fraction(0, 1), Fraction(0, 1), Fraction(1, 20), Fraction(39, 400)]

And the pseudo-update as if your new friend has a left-handed sister.

In [13]:
prior = table2['Posterior']
table3 = make_table(prior, likelihood3)
table3

| # children | Prior | Likelihood | Product | Posterior |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 |
| 2 | 20000/139201 | 0 | 0 | 0 |
| 3 | 79600/139201 | 1/20 | 3980/139201 | 8000/15761 |
| 4 | 39601/139201 | 39/400 | 1544439/55680400 | 7761/15761 |


Here's the total probability as a rational number.

In [14]:
p = table3['Product'].sum()
print(p)

3136439/55680400


And here's the Bayes factor.

In [15]:
print(1/p)

55680400/3136439


Converting to floating-point, it's consistent with the previous result. 

In [16]:
float(1/p)

17.752744434054033
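
For readers who want to reproduce the exact result without Pandas, here is a compact sketch of the whole calculation using only the standard library. The helper functions `normalize` and `binom_pmf` are my own and are not part of the notebook above.

```python
from fractions import Fraction
from math import comb

def normalize(weights):
    """Divide through by the total so the weights add up to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def binom_pmf(k, n, p):
    """Exact binomial PMF; returns 0 when k > n."""
    if k > n:
        return Fraction(0)
    return comb(n, k) * p**k * (1 - p)**(n - k)

sizes = [1, 2, 3, 4]
prior = [Fraction(x, 100) for x in [20, 30, 40, 10]]

# Update 1: length-biased sampling, likelihood proportional to family size.
post1 = normalize([pr * n for pr, n in zip(prior, sizes)])

# Update 2: exactly one sister named Mary among the n - 1 siblings.
p_mary = Fraction(1, 200)
post2 = normalize([pr * binom_pmf(1, n - 1, p_mary)
                   for pr, n in zip(post1, sizes)])

# Total probability of at least one left-handed sister
# among the siblings other than Mary.
p_left = Fraction(1, 20)
others = [max(n - 2, 0) for n in sizes]
p_data = sum(pr * (1 - (1 - p_left)**m) for pr, m in zip(post2, others))

print(p_data)      # 3136439/55680400
print(1 / p_data)  # 55680400/3136439
```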

If you like this article, you might like the [new second edition of *Think Bayes*](https://thinkbayes.com).