# Did Elvis Have an Identical Twin?  Probably.

Copyright 2020 Allen B. Downey

License: [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)

[Elvis Presley](https://en.wikipedia.org/wiki/Elvis_Presley#1935%E2%80%931953:_Early_years) had a twin brother who died at birth.  We don't know whether they were identical or fraternal twins, but we can estimate the probability that they were identical.

Here's how:

1. First, we need some background information about the relative frequencies of identical and fraternal twins.

2. Then we'll use Bayes's Rule to take into account one piece of data, which is that Elvis's twin was male.

3. Then we'll take into account a second piece of data, which is that Elvis's twin died at birth.

For background information, I'll use data from the U.S. Census Bureau for 1935, the year Elvis was born:
[Birth, Stillbirth, and Infant Mortality Statistics for the Continental United States, the Territory of Hawaii, the Virgin Islands 1935](https://www.cdc.gov/nchs/data/vsushistorical/birthstat_1935.pdf).

It includes this table, which shows the total number of plural births in the United States.

<img width="300" src="https://github.com/AllenDowney/BiteSizeBayes/raw/master/birth_data_1935.png">

The table doesn't report which twins are identical or fraternal, but we can use the data to estimate it.

With the numbers in the table, we can compute the fraction of twins that are opposite sex, which I'll call `x`.

In [1]:
opposite = 8397
same = 8678 + 8122

x = opposite / (opposite + same)
x

0.3332539588046196

But the quantity we want is the fraction of twins who are fraternal, which I'll call `p_f`. 
Let's see how we can get from `x` to `p_f`.

Because identical twins have the same genes, they are almost always the same sex.
Fraternal twins do not have the same genes; like other siblings, they are about equally likely to be the same or opposite sex.

So we can write the relationship:

`x = p_f / 2 + 0`

which says that opposite-sex twins include half of the fraternal twins and none of the identical twins.

And that implies

In [2]:
p_f = 2 * x
p_f

0.6665079176092392
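
As a rough sanity check on the relation `x = p_f / 2`, here is a small simulation sketch. The function name `simulate_opposite_fraction`, the illustrative value `p_f = 0.6`, and the seed are assumptions made for this example, not values from the 1935 table. The sketch also assumes that identical twins are always the same sex and that fraternal twins are opposite sex half the time.

```python
import random


def simulate_opposite_fraction(p_f, n_pairs, seed=0):
    """Simulate twin pairs and return the fraction that are opposite sex.

    Assumes identical pairs are always same sex and fraternal pairs are
    opposite sex with probability 0.5.
    """
    rng = random.Random(seed)
    opposite_count = 0
    for _ in range(n_pairs):
        is_fraternal = rng.random() < p_f
        # Only fraternal pairs can be opposite sex under this model
        if is_fraternal and rng.random() < 0.5:
            opposite_count += 1
    return opposite_count / n_pairs


# Illustrative (hypothetical) fraction of fraternal pairs
x_sim = simulate_opposite_fraction(p_f=0.6, n_pairs=100_000)
print(x_sim, 2 * x_sim)
```

With enough simulated pairs, the opposite-sex fraction should land near `p_f / 2`, and doubling it should recover something close to the `p_f` we started with.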

We can also compute the fraction of twins that are identical, `p_i`:

In [3]:
p_i = 1 - p_f
p_i

0.3334920823907608

In 1935 about 2/3 of twins were fraternal and 1/3 were identical.
So if we know nothing else about Elvis, the probability is about 1/3 that he was an identical twin.

But we have two pieces of information that affect our estimate of this probability:

* Elvis's twin was male, which is more likely if he was identical.

* Elvis's twin died at birth, which is also more likely if he was identical.

To take this information into account, we will use Bayes's Rule:

`odds(H|D) = odds(H) * likelihood_ratio(D)`

That is, the posterior odds of the hypothesis `H`, after seeing data `D`, are the product of the prior odds of `H` and the likelihood ratio of `D`.
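
To make the odds form concrete, here is a minimal sketch of the conversions and the update step. The helper names `prob_to_odds`, `odds_to_prob`, and `bayes_update` are hypothetical, and the prior of 0.2 and likelihood ratio of 4 are made-up numbers chosen only to keep the arithmetic easy to follow.

```python
def prob_to_odds(p):
    """Convert a probability to odds in favor (assumes 0 <= p < 1)."""
    return p / (1 - p)


def odds_to_prob(o):
    """Convert odds in favor back to a probability."""
    return o / (o + 1)


def bayes_update(prior_odds, likelihood_ratio):
    """Odds form of Bayes's Rule: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio


# Made-up example: prior probability 0.2, data 4 times as likely under H
example_odds = bayes_update(prob_to_odds(0.2), 4)
example_prob = odds_to_prob(example_odds)
print(example_odds, example_prob)
```

In this made-up case the prior odds are 1:4, the likelihood ratio of 4 brings the posterior odds to about 1:1, and the posterior probability comes out at about 50%.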

We can use `p_i`  and `p_f` to compute the prior odds that Elvis was an identical twin.

In [4]:
prior_odds = p_i / p_f
prior_odds

0.5003572704537335

The prior odds are about `0.5:1`.

Now let's compute the likelihood ratio of `D`, where `D` is the observation that Elvis's twin was male, so the twins were the same sex.
The probability that twins are the same sex is nearly 100% if they are identical and about 50% if they are fraternal.
So the likelihood ratio is `100 / 50 = 2`.

In [5]:
likelihood_ratio = 2

Now we can apply Bayes's Rule:

In [6]:
posterior_odds = prior_odds * likelihood_ratio
posterior_odds

1.000714540907467

The posterior odds are close to 1, or, in terms of probabilities:

In [7]:
posterior_prob = posterior_odds / (posterior_odds + 1)
posterior_prob

0.5001785714285715

Taking into account that Elvis's twin was male, the probability is close to 50% that he was identical.
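
As a cross-check, the same update can be done in the probability form of Bayes's Rule by multiplying priors by likelihoods and normalizing. This is a sketch, not the method used in the cells above. The dictionary names are assumptions, and the likelihoods of 1.0 and 0.5 encode the same approximations as the likelihood ratio of 2.

```python
# Counts read from the 1935 table, as used earlier
opposite = 8397
same = 8678 + 8122

# Prior probabilities, estimated as before
p_f = 2 * opposite / (opposite + same)
p_i = 1 - p_f

prior = {"identical": p_i, "fraternal": p_f}

# Probability of same-sex twins under each hypothesis (approximate)
likelihood = {"identical": 1.0, "fraternal": 0.5}

# Multiply prior by likelihood, then normalize so the results sum to 1
unnormalized = {h: prior[h] * likelihood[h] for h in prior}
total = sum(unnormalized.values())
posterior = {h: unnormalized[h] / total for h in unnormalized}

print(posterior)
```

The normalized result for `"identical"` should agree with the odds-form calculation, close to 50%.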

## More Data

Now let's take into account the second piece of data: Elvis's twin died at birth.

It seems likely that there are different risks for fraternal and identical twins, so I'll define:

* `r_f`: the probability that one twin is stillborn, given that they are fraternal.

* `r_i`: the probability that one twin is stillborn, given that they are identical.

We can't get those quantities directly from the table, but we can compute:

* `y`: the probability of "1 living", given that the twins are opposite sex.

* `z`: the probability of "1 living", given that the twins are the same sex.

In [8]:
y = (258 + 299) / opposite
y

0.06633321424318209

In [9]:
z = (655 + 564) / same
z

0.07255952380952381

Assuming that all opposite-sex twins are fraternal, we can infer that the risk for fraternal twins is `y`:

In [10]:
r_f = y
r_f

0.06633321424318209

To compute `r_i`, we can write the following relation:

`z = q_i * r_i + q_f * r_f`

which says that the risk for same-sex twins is the weighted sum of the risks for identical and fraternal twins, with weights

* `q_i`, the fraction of same-sex twins who are identical, and 

* `q_f`, the fraction of same-sex twins who are fraternal.

`q_i` is the posterior probability we computed in the previous update; `q_f` is its complement.

In [11]:
q_i = posterior_prob
q_f = 1 - posterior_prob

Solving for `r_i`, we get

In [12]:
r_i = (z - q_f * r_f) / q_i
r_i

0.07878138759966678
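
One way to check the algebra is to plug `r_i` back into the weighted sum and confirm that it recovers `z`. The sketch below recomputes every input from the table counts so it can run on its own. The function name `solve_identical_risk` and the variable `z_check` are assumptions introduced for this check.

```python
def solve_identical_risk(z, r_f, q_i):
    """Solve z = q_i * r_i + (1 - q_i) * r_f for r_i."""
    q_f = 1 - q_i
    return (z - q_f * r_f) / q_i


# Counts read from the 1935 table, as used in the cells above
opposite = 8397
same = 8678 + 8122

# Risk of "1 living" for opposite-sex (assumed all fraternal) and same-sex pairs
r_f = (258 + 299) / opposite
z = (655 + 564) / same

# q_i: probability that same-sex twins are identical, from the first update
p_f = 2 * opposite / (opposite + same)
p_i = 1 - p_f
odds = (p_i / p_f) * 2
q_i = odds / (odds + 1)

r_i = solve_identical_risk(z, r_f, q_i)

# Plugging r_i back into the weighted sum should recover z
z_check = q_i * r_i + (1 - q_i) * r_f
print(r_i, z_check, z)
```

If the rearrangement is right, `z_check` and `z` should match up to floating-point error.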

Now we can compute the likelihood ratio:

In [13]:
likelihood_ratio2 = r_i / r_f
likelihood_ratio2

1.1876612417852819

In this dataset, the probability that one twin dies at birth is about 19% higher if the twins are identical.

Now we can apply Bayes's Rule again to compute the posterior odds after both updates:

In [14]:
posterior_odds2 = posterior_odds * likelihood_ratio2
posterior_odds2

1.1885098743267504

Or, if you prefer probabilities:

In [15]:
posterior_prob2 = posterior_odds2 / (posterior_odds2 + 1)
posterior_prob2

0.5430680885972108

Taking into account both pieces of data, the posterior probability that Elvis was an identical twin is about 54%.
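
Because each update multiplies the odds by a likelihood ratio, the two steps can also be collapsed into a single product. The sketch below reuses the prior odds and the two likelihood ratios reported above as literal values. The list name `likelihood_ratios` is an assumption, and `math.prod` requires Python 3.8 or later.

```python
from math import prod

# Values reported in the cells above
prior_odds = 0.5003572704537335
likelihood_ratios = [2, 1.1876612417852819]  # twin was male, twin died at birth

# Multiply the prior odds by every likelihood ratio at once
posterior_odds = prior_odds * prod(likelihood_ratios)
posterior_prob = posterior_odds / (posterior_odds + 1)

# Multiplication is commutative, so the order of the updates does not matter
reversed_odds = prior_odds * prod(reversed(likelihood_ratios))

print(posterior_odds, posterior_prob)
```

The combined result should match the two-step calculation, about 54%.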

The code in this example is in [a Jupyter notebook you can run on Colab](https://colab.research.google.com/github/AllenDowney/ProbablyOverthinkingIt2/blob/master/elvis.ipynb).

This example is from the second edition of *Think Bayes*, forthcoming from O'Reilly Media.  The current draft is available from [Green Tea Press](https://greenteapress.com/wp/).

I learned about this problem from [*Bayesian Data Analysis*](http://www.stat.columbia.edu/~gelman/book/).
Their solution takes into account that Elvis's twin was male, but not the additional evidence that his twin died at birth.

Jonah Spicher, who took my Bayesian Statistics class at Olin College, came up with the idea to use data from 1935 to compute the likelihood of the data.