# The Left-Handed Sister Problem

Think Bayes, Second Edition

Copyright 2021 Allen B. Downey

License: [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)

Suppose you meet someone who looks like the brother of your friend Mary. 
You ask if he has a sister named Mary, and he says "Yes I do, but I don't think I know you."

You remember that Mary has a sister who is left-handed, but you don't remember her name.
So you ask your new friend if he has another sister who is left-handed.

If he does, how much evidence does that provide that he is the brother of your friend, rather than a random person who coincidentally has a sister named Mary and another sister who is left-handed? In other words, what is the Bayes factor of the left-handed sister?

Let's assume:

* Out of 100 families with children, 20 have one child, 30 have two children, 40 have three children, and 10 have four children.

* All children are either boys or girls with equal probability, one girl in 10 is left-handed, and one girl in 100 is named Mary.

* Name, sex, and handedness are independent, so the probability that a child is a girl, is left-handed, or is named Mary does not depend on the child's other attributes or on the child's siblings.

* If the person you met had more than one sister named Mary, he would have said so, but he could have more than one sister who is left-handed.

## Constructing the prior

I'll make a Pandas `Series` that enumerates possible families with 2, 3, or 4 children. Families with one child are left out because the person you met has at least one sister.

In [1]:
import pandas as pd

qs = [(2, 0),
      (1, 1),
      (0, 2),
      (3, 0),
      (2, 1),
      (1, 2),
      (0, 3),
      (4, 0),
      (3, 1),
      (2, 2),
      (1, 3),
      (0, 4),
    ]
index = pd.MultiIndex.from_tuples(qs, names=['Boys', 'Girls'])

The prior weight of each family type is the frequency of its size: 30, 40, or 10 out of 100 families for two, three, or four children.

In [2]:
ps = [30, 30, 30, 40, 40, 40, 40, 10, 10, 10, 10, 10]

prior1 = pd.Series(ps, index=index, name='Prior')
pd.DataFrame(prior1)

| Boys | Girls | Prior |
|---|---|---|
| 2 | 0 | 30 |
| 1 | 1 | 30 |
| 0 | 2 | 30 |
| 3 | 0 | 40 |
| 2 | 1 | 40 |
| 1 | 2 | 40 |
| 0 | 3 | 40 |
| 4 | 0 | 10 |
| 3 | 1 | 10 |
| 2 | 2 | 10 |
| 1 | 3 | 10 |
| 0 | 4 | 10 |


So that's the (unnormalized) prior.
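As a cross-check, the same weights can be generated from the family-size frequencies instead of being typed by hand. This is only a sketch: the names `size_freq`, `qs_sketch`, `ps_sketch`, and `prior_sketch` are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Families per 100, by number of children, from the assumptions above.
# One-child families are omitted because the man has at least one sister.
size_freq = {2: 30, 3: 40, 4: 10}

# Enumerate (boys, girls) splits for each size, from all boys to all girls.
# Every split of a given size gets that size's weight, matching `ps`.
rows = [((size - girls, girls), freq)
        for size, freq in size_freq.items()
        for girls in range(size + 1)]

qs_sketch = [pair for pair, _ in rows]
ps_sketch = [freq for _, freq in rows]

index_sketch = pd.MultiIndex.from_tuples(qs_sketch, names=['Boys', 'Girls'])
prior_sketch = pd.Series(ps_sketch, index=index_sketch, name='Prior')
```

Built this way, the list of weights follows directly from the first assumption, which makes it easier to rerun the analysis with a different family-size distribution.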

I'll use the following function to do Bayesian updates.

In [3]:
import pandas as pd

def make_table(prior, likelihood):
    """Make a DataFrame representing a Bayesian update."""
    table = pd.DataFrame(prior)
    table.columns = ['Prior']
    table['Likelihood'] = likelihood
    table['Product'] = (table['Prior'] * table['Likelihood'])
    total = table['Product'].sum()
    table['Posterior'] = table['Product'] / total
    return table

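This function takes a prior and a likelihood and returns a `DataFrame` with one row per hypothesis and columns for the prior, the likelihood, their product, and the normalized posterior.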
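To see what `make_table` produces, here is a small usage sketch on a toy problem that is not part of this article: two equally likely hypotheses, labeled `Bowl 1` and `Bowl 2`, with made-up likelihoods of 0.75 and 0.5. The function is repeated so the example runs on its own.

```python
import pandas as pd

def make_table(prior, likelihood):
    """Make a DataFrame representing a Bayesian update."""
    table = pd.DataFrame(prior)
    table.columns = ['Prior']
    table['Likelihood'] = likelihood
    table['Product'] = (table['Prior'] * table['Likelihood'])
    total = table['Product'].sum()
    table['Posterior'] = table['Product'] / total
    return table

# Hypothetical toy problem: the hypotheses and likelihoods are illustrative.
toy_prior = pd.Series([1, 1], index=['Bowl 1', 'Bowl 2'], name='Prior')
toy_likelihood = [0.75, 0.5]

toy_table = make_table(toy_prior, toy_likelihood)
print(toy_table)
```

The products are 0.75 and 0.5, their sum is 1.25, and the posteriors come out to 0.6 and 0.4. The prior does not need to be normalized, because the division by `total` takes care of that.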

## The first update

Due to [length-biased sampling](https://towardsdatascience.com/the-inspection-paradox-is-everywhere-2ef1c2e9d709), the person you met is more likely to come from a family with more boys.
Specifically, the likelihood of meeting someone from a family with $n$ boys is proportional to $n$.

In [4]:
likelihood1 = prior1.index.to_frame()['Boys']
table1 = make_table(prior1, likelihood1)
table1

| Boys | Girls | Prior | Likelihood | Product | Posterior |
|---|---|---|---|---|---|
| 2 | 0 | 30 | 2 | 60 | 0.139535 |
| 1 | 1 | 30 | 1 | 30 | 0.069767 |
| 0 | 2 | 30 | 0 | 0 | 0.0 |
| 3 | 0 | 40 | 3 | 120 | 0.27907 |
| 2 | 1 | 40 | 2 | 80 | 0.186047 |
| 1 | 2 | 40 | 1 | 40 | 0.093023 |
| 0 | 3 | 40 | 0 | 0 | 0.0 |
| 4 | 0 | 10 | 4 | 40 | 0.093023 |
| 3 | 1 | 10 | 3 | 30 | 0.069767 |
| 2 | 2 | 10 | 2 | 20 | 0.046512 |
| 1 | 3 | 10 | 1 | 10 | 0.023256 |
| 0 | 4 | 10 | 0 | 0 | 0.0 |


So that's what we should believe about the family after the first update.
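One way to convince yourself that the likelihood really is proportional to the number of boys is to simulate it. The following sketch draws families according to the prior, treats every boy in every sampled family as someone you might meet, and compares the resulting proportions with the exact update. The seed, the sample size, and all variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(17)  # seed chosen arbitrarily

# Number of boys and prior weight for each family type, in the same
# order as the index of the prior.
boys = np.array([2, 1, 0, 3, 2, 1, 0, 4, 3, 2, 1, 0])
weights = np.array([30, 30, 30, 40, 40, 40, 40, 10, 10, 10, 10, 10])

# Draw families according to the prior.
n_families = 200_000
families = rng.choice(len(boys), size=n_families, p=weights / weights.sum())

# Count each sampled family once per boy, then normalize, to get the
# distribution of family types as seen from a randomly chosen boy.
boys_per_type = np.bincount(families, weights=boys[families],
                            minlength=len(boys))
simulated = boys_per_type / boys_per_type.sum()

# Exact posterior for comparison: prior times number of boys, normalized.
exact = weights * boys / (weights * boys).sum()

print(np.round(simulated, 3))
print(np.round(exact, 3))
```

The two printed rows should agree to within sampling error, and families with no boys get probability 0 in both.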

## The second update

The likelihood that a person has exactly one sister named Mary is given by the binomial distribution where `n` is the number of girls in the family and `p` is the probability that a girl is named Mary.

In [5]:
from scipy.stats import binom

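ns = prior1.index.to_frame()['Girls']  # number of girls in each family type
p = 1 / 100                            # probability that a girl is named Mary
k = 1                                  # exactly one sister named Mary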

likelihood2 = binom.pmf(k, ns, p)
likelihood2

array([0.        , 0.01      , 0.0198    , 0.        , 0.01      ,
       0.0198    , 0.029403  , 0.        , 0.01      , 0.0198    ,
       0.029403  , 0.03881196])
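For reference, these values follow from the binomial probability mass function. With $k = 1$ it reduces to a simple product:

$$
P(k \mid n, p) = \binom{n}{k} p^k (1-p)^{n-k}
\quad\Longrightarrow\quad
P(1 \mid n, p) = n \, p \, (1-p)^{n-1}
$$

With $p = 0.01$, a family with $n = 2$ girls gives $2 \times 0.01 \times 0.99 = 0.0198$, and a family with $n = 3$ girls gives $3 \times 0.01 \times 0.99^2 = 0.029403$, which match the array above.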

Here's the second update.

In [6]:
prior2 = table1['Posterior']
table2 = make_table(prior2, likelihood2)
table2

| Boys | Girls | Prior | Likelihood | Product | Posterior |
|---|---|---|---|---|---|
| 2 | 0 | 0.139535 | 0.0 | 0.0 | 0.0 |
| 1 | 1 | 0.069767 | 0.01 | 0.000698 | 0.104093 |
| 0 | 2 | 0.0 | 0.0198 | 0.0 | 0.0 |
| 3 | 0 | 0.27907 | 0.0 | 0.0 | 0.0 |
| 2 | 1 | 0.186047 | 0.01 | 0.00186 | 0.277582 |
| 1 | 2 | 0.093023 | 0.0198 | 0.001842 | 0.274806 |
| 0 | 3 | 0.0 | 0.029403 | 0.0 | 0.0 |
| 4 | 0 | 0.093023 | 0.0 | 0.0 | 0.0 |
| 3 | 1 | 0.069767 | 0.01 | 0.000698 | 0.104093 |
| 2 | 2 | 0.046512 | 0.0198 | 0.000921 | 0.137403 |
| 1 | 3 | 0.023256 | 0.029403 | 0.000684 | 0.102022 |
| 0 | 4 | 0.0 | 0.038812 | 0.0 | 0.0 |


Based on the sister named Mary, we can rule out families with no girls, and families with more than one girl become more likely.

## Probability of a left-handed sister

Finally, we can compute the probability that he has at least one left-handed sister.
The likelihood comes from the binomial distribution again, where `n` is the number of *additional* sisters, and we use the survival function to compute the probability that one or more are left-handed.

In [7]:
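# One sister is Mary, so only the remaining girls could be the left-handed sister.
ns = prior1.index.to_frame()['Girls'] - 1
ns.name = 'Additional sisters'
# Families with no girls would get -1 here; clamp those to 0.
neg = (ns < 0)
ns[neg] = 0
pd.DataFrame(ns)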

| Boys | Girls | Additional sisters |
|---|---|---|
| 2 | 0 | 0 |
| 1 | 1 | 0 |
| 0 | 2 | 1 |
| 3 | 0 | 0 |
| 2 | 1 | 0 |
| 1 | 2 | 1 |
| 0 | 3 | 2 |
| 4 | 0 | 0 |
| 3 | 1 | 0 |
| 2 | 2 | 1 |
| 1 | 3 | 2 |
| 0 | 4 | 3 |


In [8]:
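p = 1 / 10   # probability that a girl is left-handed
k = 1

# sf(k-1) is P(X > k-1), that is, P(X >= k): at least one left-handed sister.
likelihood3 = binom.sf(k-1, ns, p)
likelihood3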

array([0.   , 0.   , 0.1  , 0.   , 0.   , 0.1  , 0.19 , 0.   , 0.   ,
       0.1  , 0.19 , 0.271])
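These survival-function values can be checked by hand, because "at least one left-handed sister" is the complement of "no left-handed sisters". The following sketch uses only the standard library; `p_left` and `at_least_one` are illustrative names.

```python
p_left = 1 / 10  # probability that a girl is left-handed

# P(at least one left-handed among n additional sisters)
#   = 1 - P(none of them is left-handed)
#   = 1 - (1 - p)**n
at_least_one = {n: 1 - (1 - p_left) ** n for n in range(4)}

for n, prob in at_least_one.items():
    print(n, round(prob, 3))
```

For 0, 1, 2, and 3 additional sisters this gives 0, 0.1, 0.19, and 0.271, the same four distinct values that appear in `likelihood3`.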

A convenient way to compute the total probability of an outcome is to do an update as if it happened, ignore the posterior probabilities, and compute the sum of the products. 
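This works because of the law of total probability. If $H_i$ are the family types and $D$ is the outcome "he has a left-handed sister", then

$$
P(D) = \sum_i P(H_i) \, P(D \mid H_i)
$$

Each term in the sum is one entry of the `Product` column, so their sum is the value `make_table` stores in `total` and uses as the normalizing constant.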

In [9]:
prior3 = table2['Posterior']
table3 = make_table(prior3, likelihood3)
table3

| Boys | Girls | Prior | Likelihood | Product | Posterior |
|---|---|---|---|---|---|
| 2 | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1 | 0.104093 | 0.0 | 0.0 | 0.0 |
| 0 | 2 | 0.0 | 0.1 | 0.0 | 0.0 |
| 3 | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1 | 0.277582 | 0.0 | 0.0 | 0.0 |
| 1 | 2 | 0.274806 | 0.1 | 0.027481 | 0.453438 |
| 0 | 3 | 0.0 | 0.19 | 0.0 | 0.0 |
| 4 | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 1 | 0.104093 | 0.0 | 0.0 | 0.0 |
| 2 | 2 | 0.137403 | 0.1 | 0.01374 | 0.226719 |
| 1 | 3 | 0.102022 | 0.19 | 0.019384 | 0.319844 |
| 0 | 4 | 0.0 | 0.271 | 0.0 | 0.0 |


At this point, only three family types are left standing. Written as (boys, girls), they are (1, 2), (2, 2), and (1, 3).

Here's the total probability that your new friend has a left-handed sister.

In [10]:
p = table3['Product'].sum()
p

0.06060509432587445

## The Bayes factor

If your interlocutor is the brother of your friend, the probability is 1 that he has a left-handed sister.
If he is not the brother of your friend, the probability is `p`.
So the Bayes factor is the ratio of these probabilities, $1/p$.

In [11]:
1/p

16.500263073975034
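To get a feel for what a Bayes factor of about 16.5 means, it helps to apply it to some prior odds. The prior odds below are purely hypothetical, since the problem does not specify how likely you thought it was, before asking, that he is Mary's brother.

```python
bayes_factor = 16.5  # rounded from 1/p above

# Hypothetical prior: suppose that, before asking about the left-handed
# sister, you thought the odds were 1:10 that he is Mary's brother.
prior_odds = 1 / 10

# Bayes's rule in odds form: posterior odds = prior odds * Bayes factor.
posterior_odds = prior_odds * bayes_factor

# Convert odds back to a probability.
posterior_prob = posterior_odds / (1 + posterior_odds)

print(round(posterior_odds, 2), round(posterior_prob, 3))
```

Under that assumption, the left-handed sister moves you from odds of 1:10 against to about 1.65:1 in favor, or a probability of about 62%.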

## Now with rational numbers

Some people solved the problem in terms of rational numbers, so I'll do the same calculation with `Fraction` objects.

COMING SOON
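In the meantime, here is one way the calculation might look with `Fraction` objects. This is a sketch, not the author's version: it folds the three updates into a single loop rather than using `make_table`, and the variable names are illustrative.

```python
from fractions import Fraction

# Families per 100, by number of children (one-child families excluded).
size_freq = {2: 30, 3: 40, 4: 10}
p_mary = Fraction(1, 100)  # probability that a girl is named Mary
p_left = Fraction(1, 10)   # probability that a girl is left-handed

numerator = Fraction(0)    # weight of families with a left-handed sister
denominator = Fraction(0)  # weight of all families after the first two updates

for size, freq in size_freq.items():
    for girls in range(size + 1):
        boys = size - girls
        if girls == 0:
            continue  # no sisters, so no sister named Mary
        # Prior weight, times the number of boys (length-biased sampling),
        # times the binomial probability of exactly one girl named Mary.
        weight = freq * boys * girls * p_mary * (1 - p_mary) ** (girls - 1)
        # Probability that at least one of the other sisters is left-handed.
        left = 1 - (1 - p_left) ** (girls - 1)
        denominator += weight
        numerator += weight * left

p_exact = numerator / denominator
bayes_factor_exact = 1 / p_exact

print(p_exact, bayes_factor_exact, float(bayes_factor_exact))
```

Converting `bayes_factor_exact` to a float should reproduce the value of `1/p` computed above, up to floating-point error.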

If you like this article, you might like the [new second edition of *Think Bayes*](https://thinkbayes.com).