<div style="text-align: right">INFO 6105 Data Sci Engineering Methods and Tools, Week 3 Lecture 1</div>
<div style="text-align: right">Dino Konstantopoulos, 16 September 2019</div>


# 1. The Birthday paradox
  
Our game was: What's the probability that someone in this classroom shares your birthday? 

Each person can have your birthday with probability 1/365. There are n−1 people other than yourself, so the probability that someone shares your birthday is ...

Now, what is the probability that *two* students in this classroom have the *same* birthday? Which of the two do you think is higher?

### Probability that someone in this classroom shares your birthday? 

Each person can have your birthday with probability 1/365. There are n-1 people other than yourself, so the probability that someone shares your birthday is (n-1)/ 365. 

Why do we add probabilities? Because we are treating it as the **union** of $n-1$ possibilities that have no intersection.

Hmm, what happens when there are 365 people in class?

### Probability that two students in this classroom have the same birthday? 

For 2 students to share a birthday, this problem is best approached the other way around, because the probability that no two people have the same birthday is easier to find.

Let $A$ be the event that at least two people have the same birthday. Then $A^c$ is the event that no two people have the same birthday. Note that $P(A) = 1 - P(A^c)$.

We start with person 1; this person can have any 1 of 365 days out of the year.

A second person can only have a birthday on one of the 364 days of the year that haven’t been ‘taken.’ By assumption of random birthdays, and of uniform probability, the chance that this person has any of the 364 birthdays is 364/365.

A third person can only have a birthday on one of the 363 days not ‘taken,’ and the corresponding probability of such an event is 363/365.

This continues until we’ve covered all n people.

Since this is an **intersection** of events, probabilities multiply.


In [1]:
from operator import mul
from functools import reduce

# assume 23 people in class
def probSomeoneShares():
    """Naive estimate that one of the 22 other students shares your birthday."""
    return 22/365

def prob2StudentsShare():
    """Return 1 minus the probability that no two of the 23 students share a birthday."""
    # 1 - (365 * 364 * 363 * ... * 343) / (365 ** 23)
    days_left = list(range(365, 365 - 23, -1))  # 365, 364, ..., 343
    return 1 - (reduce(mul, days_left, 1) / (365 ** 23))

print("Shares: " + str(probSomeoneShares()))
print("2 share: " + str(prob2StudentsShare()))

Shares: 0.06027397260273973
2 share: 0.5072972343239854


Wow, 50%??
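
If the 50% figure is hard to believe, a quick simulation is one way to check it. The sketch below is not part of the original cell: the function name `simulate_shared_birthday`, the trial count, and the seed are all illustrative choices, and it assumes 365 equally likely birthdays (no leap years).

```python
import random

def simulate_shared_birthday(n_people=23, n_trials=20_000, seed=0):
    """Estimate P(at least two of n_people share a birthday) by simulation."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(n_trials):
        birthdays = [rng.randrange(365) for _ in range(n_people)]
        # A set drops duplicates, so a shorter set means a shared birthday.
        if len(set(birthdays)) < n_people:
            hits += 1
    return hits / n_trials

estimate = simulate_shared_birthday()
print("Simulated 2 share: " + str(estimate))
```

With enough trials the estimate should land close to the exact value printed above, up to sampling noise.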

### Another approach

We should compare ***everyone*** to ***everyone else***! 

The chance of 2 people having *different* birthdays is:

$$ 1 - \frac{1}{365} = \frac{364}{365} = .997$$

oh... so the chance of your friend having a different birthday than *you* is:

$$ 1 - \frac{1}{365} = \frac{364}{365} = .997$$

what is the chance of 2 people having different birthdays than you? The product of each:

$$ (1 - \frac{1}{365})^2$$

And so the chance of $n$ people having different birthdays than you must be:

$$ (1 - \frac{1}{365})^n$$

And so the chance of people having different birthdays than you in a classroom of $n$ people must be:

$$ (1 - \frac{1}{365})^{n-1}$$

And so the chance of someone in a classroom of $n$ people having the *same* birthday as you must be:

$$ 1 - (1 - \frac{1}{365})^{n-1}$$

And we see that with 365 classmates besides you ($n - 1 = 365$), where the additive estimate *probSomeoneShares* would give $365/365 = 1$, the real probability is actually:


In [2]:
1 - (1 - 1/365) ** 365

0.6326250793368262
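
To see how far the additive estimate drifts from the real probability, the two formulas can be put side by side. This is a minimal sketch; the function names `naive_someone_shares` and `exact_someone_shares` are made up for illustration, and `n` counts everyone in the room including you.

```python
def naive_someone_shares(n):
    """Additive estimate (n-1)/365; it overcounts and reaches 1 at n = 366."""
    return (n - 1) / 365

def exact_someone_shares(n):
    """1 - (364/365)**(n-1): at least one of the n-1 others matches you."""
    return 1 - (364 / 365) ** (n - 1)

for n in (23, 100, 366):
    print(n, round(naive_someone_shares(n), 4), round(exact_someone_shares(n), 4))
```

For a small class the two numbers are close, which is why the additive shortcut looked reasonable at first; the gap only becomes obvious as the room fills up.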

Now, with 23 people we have 253 pairs: 

$$\frac{23 \times 22}{2} = 253$$ 

Once again, the chance of 2 people having *different* birthdays is:

$$ 1 - \frac{1}{365} = \frac{364}{365} = .997$$

Making 253 comparisons and having them all come out different is like flipping a heavily biased coin 253 times and getting the same side every time:

$$\left(\frac{364}{365}\right)^{253} \approx 0.4995$$

So the chance we find a match is: 1 – 49.95% = 50.05%, or just over half.

For any number of people $n$, the (approximate) probability of a match is:

$$1 - (\frac{364}{365})^{n(n-1)/2}$$

In fact, roughly $\sqrt{N}$ people (more precisely, about $1.18\sqrt{N}$) are enough for a 50% chance of a match among $N$ equally likely items; for $N = 365$ that gives about 22.5.
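
The pair-counting formula and the square-root rule of thumb can both be checked numerically. The sketch below uses hypothetical helper names (`approx_match`, `smallest_group_for_half`) and takes the constant in the rule of thumb from solving the pair formula for a 50% chance, which gives $\sqrt{2 \ln 2 \cdot 365}$.

```python
import math

def approx_match(n, days=365):
    """Pair-counting approximation: 1 - ((days-1)/days) ** (number of pairs)."""
    pairs = n * (n - 1) // 2
    return 1 - ((days - 1) / days) ** pairs

def smallest_group_for_half(days=365):
    """Smallest n for which the approximation reaches a 50% chance."""
    n = 1
    while approx_match(n, days) < 0.5:
        n += 1
    return n

rule_of_thumb = math.sqrt(2 * math.log(2) * 365)  # about 1.18 * sqrt(365)

print("Smallest group:", smallest_group_for_half())
print("Rule of thumb:", rule_of_thumb)
```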

### Note 

For n = 30, the odds of a common birthday increase to 70.6%, and most people still find it hard to believe that among 30 people there are probably two who have the same birthday! The table below lists various values of n and the probabilities, 1 − Pn, that at least two people have a common birthday.

|n |10| 20| 23| 30| 50| 60| 70| 100|
|--|--|--|--|--|--|--|--|--|
|1 − Pn| 11.7%| 41.1%| 50.7%| 70.6%| 97.0%| 99.4%| 99.92%| 99.99997%|
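
The table can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the lecture (365 equally likely birthdays); `prob_shared` is an illustrative name, not a function from the notebook.

```python
def prob_shared(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1 - p_all_different

for n in (10, 20, 23, 30, 50, 60, 70, 100):
    print(f"{n:>3}  {prob_shared(n):.5%}")
```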

### Course Correction

Remember how we assumed the pairwise birthday comparisons are independent? Well, ***they aren’t!***

<br />
<center>
    <img src="images/slam-brakes.jpg" width=400 />
</center>

If Person A and Person B match birthday-wise, and Person B and C match too, we know that A and C must match also. The outcome of matching A and C thus depends on their results with B, so the probabilities aren’t independent! If truly independent, A and C would have a 1/365 chance of matching even if they both match with B, yet we know it's a 100% guaranteed match!
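
The A, B, C argument can be checked by brute force. The sketch below enumerates every birthday assignment for three people on a deliberately tiny, hypothetical 7-day calendar (365 days would mean about 48 million triples), and compares what full independence would predict with what actually happens. The names `DAYS`, `outcomes`, and `prob` are illustrative.

```python
from fractions import Fraction
from itertools import product

DAYS = 7  # hypothetical tiny calendar so full enumeration stays cheap

# Every equally likely assignment of birthdays (a, b, c) to three people.
outcomes = list(product(range(DAYS), repeat=3))

def prob(event):
    """Exact probability of an event, given as a function of (a, b, c)."""
    return Fraction(sum(1 for o in outcomes if event(*o)), len(outcomes))

p_ab = prob(lambda a, b, c: a == b)
p_bc = prob(lambda a, b, c: b == c)
p_ac = prob(lambda a, b, c: a == c)
p_ab_and_bc = prob(lambda a, b, c: a == b and b == c)
p_all_three = prob(lambda a, b, c: a == b and b == c and a == c)

print("If independent:", p_ab * p_bc * p_ac)
print("Actual:", p_all_three)
print("P(A=C given A=B and B=C):", p_all_three / p_ab_and_bc)
```

Each single comparison still has probability 1/DAYS, but the three together are more likely than the product suggests, because the third match comes for free once the first two hold.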

When counting pairs, we treated birthday matches like coin flips, multiplying the same probability over and over. This assumption isn’t strictly true, even though it’s good enough for a small number of people (23) compared to the number of possible birthdays (365). It’s unlikely that several people match and screw up the independence, so it’s a good approximation.

It’s unlikely, but it can happen. Let’s figure out the ***real*** chances of each person having a different birthday:

* The first person has a 100% chance of a unique birthday (of course)
* The second has a (1 – 1/365) chance (all but 1 day of the 365)
* The third has a (1 – 2/365) chance (all but 2 birthdays)
* ...
* The 23rd has a (1 – 22/365) chance (all but 22 birthdays)

And so:

$$p(\text{different}) = 1 \cdot \left(1 - \frac{1}{365}\right) \cdot \left(1 - \frac{2}{365}\right) \cdots \left(1 - \frac{22}{365}\right)$$
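
To see how much the correction matters, the exact product can be compared with the coin-flip approximation from the pair count. This is a sketch with illustrative function names; it assumes Python 3.8 or later for `math.prod`.

```python
import math

def p_all_different_exact(n, days=365):
    """Exact product: 1 * (1 - 1/days) * (1 - 2/days) * ... * (1 - (n-1)/days)."""
    return math.prod(1 - k / days for k in range(n))

def p_all_different_approx(n, days=365):
    """Coin-flip approximation: ((days-1)/days) raised to the number of pairs."""
    return ((days - 1) / days) ** (n * (n - 1) // 2)

for n in (23, 50, 100):
    exact = 1 - p_all_different_exact(n)
    approx = 1 - p_all_different_approx(n)
    print(n, exact, approx)
```

For 23 people the two answers differ by less than one percentage point, which is why the independence shortcut was good enough for the classroom.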

### Why the Odds are Higher than you think

One person has a 1/365 chance of meeting someone with the same birthday.

Two people have a 2/365 (about 1/183) chance of meeting someone who shares a birthday with one of them. But those two people might *also* share a birthday with each other, so you have to add odds of 1/365 for that.

The odds become roughly 1/365 + 2/365 = 3/365 ≈ 0.008, or 0.8 percent.

Four people (let's call them ABCD) have a 4/365 (about 1/91) chance, but there are 6 possible pairs among them (AB AC AD BC BD CD), so the probability becomes roughly 4/365 + 6/365 = 10/365, and so on.
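
The running tally above amounts to adding 1/365 for every pair in the group. A small sketch makes the pattern explicit and compares it with the exact answer. The names `additive_estimate` and `exact_match` are illustrative, and `group_size` counts everyone involved, including the newcomer.

```python
from fractions import Fraction
from math import comb

def additive_estimate(group_size, days=365):
    """Add 1/days for every pair in the group (a slight overestimate)."""
    return Fraction(comb(group_size, 2), days)

def exact_match(group_size, days=365):
    """Exact probability that at least two people in the group match."""
    p_different = Fraction(1)
    for k in range(group_size):
        p_different *= Fraction(days - k, days)
    return 1 - p_different

for size in (2, 3, 5):  # one person meeting 1, 2, or 4 others
    print(size, float(additive_estimate(size)), float(exact_match(size)))
```

Because the number of pairs grows like the square of the group size, the odds climb much faster than the head count does, even though the simple sum slightly overstates them.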

### Want more?

[Wikipedia](https://en.wikipedia.org/wiki/Birthday_problem)