# Birthday Paradox Explanation

The **birthday paradox** refers to the counterintuitive fact that, even in a fairly small group, it is likely that at least two people share the same birthday. For example, in a group of just 23 people the probability already exceeds 50%.
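
The 23-person claim can be checked with a quick Monte Carlo simulation. This is a minimal sketch that assumes 365 equally likely birthdays and ignores February 29. The function name `simulate_match_rate` and its parameters are hypothetical, chosen for illustration.

```python
import random

def simulate_match_rate(n_people: int, trials: int = 20_000, seed: int = 0) -> float:
    """Estimate P(at least two of n_people share a birthday) by simulation.

    Assumption: 365 equally likely birthdays, February 29 ignored.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        days = [rng.randrange(365) for _ in range(n_people)]
        # A repeated day makes the set smaller than the list
        if len(set(days)) < n_people:
            hits += 1
    return hits / trials

print(simulate_match_rate(23))  # roughly 0.5; the exact value is about 0.507
```

More trials reduce the noise in the estimate, at the cost of running time.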

## Formula Using Combinatorics

For \( n \) people, the probability that **at least two people share a birthday** can be approximated using combinatorics:

$$
P(\text{at least one match}) \approx 1 - \left( \frac{364}{365} \right)^{\binom{n}{2}}
$$

Where:

- $\frac{364}{365}$ represents the probability that two randomly chosen people do not share the same birthday.
- $\binom{n}{2} = \frac{n(n-1)}{2}$ is the number of unique pairs of people in a group of \( n \).

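This formula simplifies the computation by counting only the unique pairs and treating each pair's chance of not sharing a birthday as independent of the others. That independence does not strictly hold (if A shares a birthday with B and B shares one with C, then A and C must match too), which is why the result is an approximation. The formula also assumes 365 equally likely birthdays and ignores February 29.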
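
To see how good the approximation is, it can be compared with the exact formula. The exact version multiplies together the chances that each successive person avoids every earlier birthday: $P(\text{no match}) = \prod_{k=0}^{n-1} \frac{365-k}{365}$. This is a minimal sketch under the same 365-equally-likely-days assumption, and the function names are hypothetical.

```python
import math

def approx_match(n: int) -> float:
    """Pair-based approximation: 1 - (364/365)^C(n, 2)."""
    return 1 - (364 / 365) ** math.comb(n, 2)

def exact_match(n: int) -> float:
    """Exact probability: 1 - (365/365)(364/365)...((365-n+1)/365)."""
    p_no_match = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken
        p_no_match *= (365 - k) / 365
    return 1 - p_no_match

for n in (10, 15, 23, 50):
    print(n, round(approx_match(n), 4), round(exact_match(n), 4))
```

For these group sizes the approximation comes out slightly below the exact value, by less than one percentage point.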

## Complement Formula

The complement gives the probability that **no two people share a birthday**:

$$
P(\text{no match}) \approx \left( \frac{364}{365} \right)^{\binom{n}{2}}
$$
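
The complement formula also makes it easy to find the smallest group in which a shared birthday becomes more likely than not, meaning $P(\text{no match})$ drops below one half. This is a minimal sketch, and `p_no_match_approx` is a hypothetical helper name.

```python
import math

def p_no_match_approx(n: int) -> float:
    # Complement formula: (364/365)^C(n, 2)
    return (364 / 365) ** math.comb(n, 2)

# Grow the group until P(no match) falls below 0.5,
# i.e. until P(at least one match) exceeds 50%
n = 1
while p_no_match_approx(n) >= 0.5:
    n += 1
print(n)  # 23
```

This matches the figure quoted at the top. The same search with the exact product formula also stops at 23.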

In [1]:
import pandas as pd
import math


In [2]:
df = pd.read_csv("nba_2017.csv", parse_dates=["Birth Date"])
df

|  | Player | Pos | Age | Team | Birth Date |
|---|---|---|---|---|---|
| 0 | Alex Abrines | SG | 23.0 | Oklahoma City Thunder | 1993-08-01 |
| 1 | Quincy Acy | PF | 26.0 | Dallas Mavericks | 1990-10-06 |
| 2 | Quincy Acy | PF | 26.0 | Brooklyn Nets | 1990-10-06 |
| 3 | Steven Adams | C | 23.0 | Oklahoma City Thunder | 1993-07-20 |
| 4 | Arron Afflalo | SG | 31.0 | Sacramento Kings | 1985-10-15 |
| ... | ... | ... | ... | ... | ... |
| 546 | Cody Zeller | PF | 24.0 | Charlotte Hornets | 1992-10-05 |
| 547 | Tyler Zeller | C | 27.0 | Boston Celtics | 1990-01-17 |
| 548 | Stephen Zimmerman | C | 20.0 | Orlando Magic | 1996-09-09 |
| 549 | Paul Zipser | SF | 22.0 | Chicago Bulls | 1994-02-18 |


In [3]:
df["Team"].value_counts()

Team
New Orleans Pelicans      27
Dallas Mavericks          24
Cleveland Cavaliers       22
Philadelphia 76ers        22
Atlanta Hawks             22
Brooklyn Nets             21
Milwaukee Bucks           20
Oklahoma City Thunder     19
Denver Nuggets            19
Charlotte Hornets         19
Los Angeles Lakers        19
Sacramento Kings          19
Orlando Magic             19
Phoenix Suns              18
Washington Wizards        18
Houston Rockets           18
Chicago Bulls             18
Golden State Warriors     17
Toronto Raptors           17
Memphis Grizzlies         17
Indiana Pacers            17
San Antonio Spurs         17
Minnesota Timberwolves    16
New York Knicks           16
Miami Heat                15
Los Angeles Clippers      15
Portland Trail Blazers    15
Detroit Pistons           15
Utah Jazz                 15
Boston Celtics            15
Name: count, dtype: int64

In [4]:
def nCr(n, k):
    # Binomial coefficient C(n, k); integer division keeps the result an exact int
    f = math.factorial
    return f(n) // (f(k) * f(n - k))

In [44]:
nCr(22,2)

231
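
Since Python 3.8, the standard library has provided `math.comb`, which returns the same exact integer without building full factorials. Below is a small check that the two agree. The `nCr` here is the factorial formula restated with integer division so the block runs on its own.

```python
import math

def nCr(n, k):
    # Factorial formula with integer division, so the result stays an int
    f = math.factorial
    return f(n) // (f(k) * f(n - k))

# math.comb (Python 3.8+) computes the same binomial coefficient directly
for n in range(40):
    for k in range(n + 1):
        assert nCr(n, k) == math.comb(n, k)

print(math.comb(22, 2))  # 231
```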

### What's the probability when n = 10?
What's the (approximate) probability that at least two people share a birthday in a group of 10 people?

In [6]:
1-((364/365)**(nCr(10,2)))

0.11614023654879224

### Implement the function `birthday_probability`

In [7]:
def birthday_probability(number_of_people):
    # Approximate P(at least one shared birthday) from the number of unique pairs
    pairs = nCr(number_of_people, 2)
    return 1 - (364 / 365) ** pairs

### Probability when n = 15

In [8]:
birthday_probability(15)

0.25028790861398265
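
The same function can be applied to the roster sizes from the `value_counts()` output above. This is a sketch using the approximate formula. The four team sizes below are copied from that output rather than read from `nba_2017.csv`, so the block runs on its own.

```python
import math

def birthday_probability(number_of_people):
    # Approximate P(at least one shared birthday) via the pair formula
    pairs = math.comb(number_of_people, 2)
    return 1 - (364 / 365) ** pairs

# Roster sizes taken from the value_counts() output above
team_sizes = {
    "New Orleans Pelicans": 27,
    "Dallas Mavericks": 24,
    "Atlanta Hawks": 22,
    "Boston Celtics": 15,
}

for team, size in team_sizes.items():
    print(f"{team:22s} n={size:2d}  P={birthday_probability(size):.3f}")
```

With the real data loaded, `df["Team"].value_counts().apply(birthday_probability)` should give the value for every team at once.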

### Create the column `Birth Day` in `df`
Use the `Birth Date` column to extract each player's birthday (month and day only) in the format MM-DD. Example: 08-01 is August 1st.

In [9]:
df.head()

|  | Player | Pos | Age | Team | Birth Date |
|---|---|---|---|---|---|
| 0 | Alex Abrines | SG | 23.0 | Oklahoma City Thunder | 1993-08-01 |
| 1 | Quincy Acy | PF | 26.0 | Dallas Mavericks | 1990-10-06 |
| 2 | Quincy Acy | PF | 26.0 | Brooklyn Nets | 1990-10-06 |
| 3 | Steven Adams | C | 23.0 | Oklahoma City Thunder | 1993-07-20 |
| 4 | Arron Afflalo | SG | 31.0 | Sacramento Kings | 1985-10-15 |


In [11]:
df['Birth Day'] = df['Birth Date'].dt.strftime("%m-%d")
df

|  | Player | Pos | Age | Team | Birth Date | Birth Day |
|---|---|---|---|---|---|---|
| 0 | Alex Abrines | SG | 23.0 | Oklahoma City Thunder | 1993-08-01 | 08-01 |
| 1 | Quincy Acy | PF | 26.0 | Dallas Mavericks | 1990-10-06 | 10-06 |
| 2 | Quincy Acy | PF | 26.0 | Brooklyn Nets | 1990-10-06 | 10-06 |
| 3 | Steven Adams | C | 23.0 | Oklahoma City Thunder | 1993-07-20 | 07-20 |
| 4 | Arron Afflalo | SG | 31.0 | Sacramento Kings | 1985-10-15 | 10-15 |
| ... | ... | ... | ... | ... | ... | ... |
| 546 | Cody Zeller | PF | 24.0 | Charlotte Hornets | 1992-10-05 | 10-05 |
| 547 | Tyler Zeller | C | 27.0 | Boston Celtics | 1990-01-17 | 01-17 |
| 548 | Stephen Zimmerman | C | 20.0 | Orlando Magic | 1996-09-09 | 09-09 |
| 549 | Paul Zipser | SF | 22.0 | Chicago Bulls | 1994-02-18 | 02-18 |
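
`Series.dt.strftime` formats each timestamp with a strftime-style pattern, where `%m` and `%d` give the zero-padded month and day. Below is a self-contained check on two birth dates from the table above.

```python
import pandas as pd

birth_dates = pd.Series(pd.to_datetime(["1993-08-01", "1990-10-06"]))
# Keep only month and day, zero-padded, as "MM-DD"
print(birth_dates.dt.strftime("%m-%d").tolist())  # ['08-01', '10-06']
```

The zero padding means that sorting these strings alphabetically also sorts them by calendar date within a year.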


### Combinatorics


In [17]:
from itertools import combinations
names = ["John", "Mary", "Rob"]
birthdays= ["March 5", "Sept 20", "March 5"]
# combinations() returns an iterator; wrap it in list() to display the pairs
list(combinations(birthdays,2))

[('March 5', 'Sept 20'), ('March 5', 'March 5'), ('Sept 20', 'March 5')]
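
`itertools.combinations(iterable, 2)` yields each unordered pair exactly once, in input order. It pairs elements by position rather than by value, which is why the two identical "March 5" entries still form a pair. The number of pairs is therefore $\binom{n}{2}$, the same count `nCr` computes. A quick check:

```python
import math
from itertools import combinations

for n in (3, 10, 22):
    pairs = list(combinations(range(n), 2))
    # Each unordered pair appears once: C(n, 2) pairs in total
    assert len(pairs) == math.comb(n, 2)

print(len(list(combinations(range(22), 2))))  # 231
```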

In [19]:
names_df = pd.DataFrame(combinations(names, 2), columns=["Person 1", "Person 2"])
birthdays_df = pd.DataFrame(combinations(birthdays, 2), columns=["Birthday 1", "Birthday 2"])
birthdays_df


|  | Birthday 1 | Birthday 2 |
|---|---|---|
| 0 | March 5 | Sept 20 |
| 1 | March 5 | March 5 |
| 2 | Sept 20 | March 5 |


In [20]:
df_concat = pd.concat([names_df, birthdays_df], axis=1)
df_concat

|  | Person 1 | Person 2 | Birthday 1 | Birthday 2 |
|---|---|---|---|---|
| 0 | John | Mary | March 5 | Sept 20 |
| 1 | John | Rob | March 5 | March 5 |
| 2 | Mary | Rob | Sept 20 | March 5 |


In [22]:
df_concat.loc[df_concat['Birthday 1'] == df_concat['Birthday 2']]

|  | Person 1 | Person 2 | Birthday 1 | Birthday 2 |
|---|---|---|---|---|
| 1 | John | Rob | March 5 | March 5 |
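
Listing every pair grows quadratically with group size. An alternative sketch, which is not the approach this notebook uses, counts matches per birthday instead: a birthday shared by $k$ people contributes $\binom{k}{2}$ matching pairs. The example below uses the toy data above, and `count_shared_pairs` is a hypothetical name.

```python
import math
import pandas as pd

def count_shared_pairs(birthdays: pd.Series) -> int:
    """Number of unordered pairs whose birthdays are equal."""
    # How many people have each birthday
    counts = birthdays.value_counts()
    # k people on the same day form C(k, 2) matching pairs
    return int(sum(math.comb(int(k), 2) for k in counts))

birthdays = pd.Series(["March 5", "Sept 20", "March 5"])
print(count_shared_pairs(birthdays))  # 1
```

Applied to a roster's `Birth Day` column, this should give the same count as filtering the full pair table, because both methods treat each row as a separate person.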


# Activities
### 5. How many pairs of players share a birthday for the Atlanta Hawks?


In [41]:
# Find all the players for the Atlanta Hawks
df_hawks = df.loc[df["Team"].str.contains("atlanta", case=False, na=False)]
df_hawks

|  | Player | Pos | Age | Team | Birth Date | Birth Day |
|---|---|---|---|---|---|---|
| 37 | Kent Bazemore | SF | 27.0 | Atlanta Hawks | 1989-07-01 | 07-01 |
| 42 | DeAndre' Bembry | SF | 22.0 | Atlanta Hawks | 1994-07-04 | 07-04 |
| 75 | Jose Calderon | PG | 35.0 | Atlanta Hawks | 1981-09-28 | 09-28 |
| 116 | Malcolm Delaney | PG | 27.0 | Atlanta Hawks | 1989-03-11 | 03-11 |
| 130 | Mike Dunleavy | SF | 36.0 | Atlanta Hawks | 1954-03-21 | 03-21 |
| 131 | Mike Dunleavy | SF | 36.0 | Atlanta Hawks | 1980-09-15 | 09-15 |
| 192 | Tim Hardaway | SG | 24.0 | Atlanta Hawks | 1966-09-01 | 09-01 |
| 193 | Tim Hardaway | SG | 24.0 | Atlanta Hawks | 1992-03-16 | 03-16 |
| 231 | Dwight Howard | C | 31.0 | Atlanta Hawks | 1985-12-08 | 12-08 |
| 234 | Kris Humphries | PF | 31.0 | Atlanta Hawks | 1985-02-06 | 02-06 |
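
In these rows, Mike Dunleavy (age 36) appears with birth dates 1954-03-21 and 1980-09-15. Tim Hardaway (age 24) appears with 1966-09-01 and 1992-03-16. Only the later date in each case fits the listed age. This suggests the birth dates were matched by name and picked up a second person with the same name, which adds extra rows, and therefore extra pairs, to the roster.

Below is a minimal sketch of a consistency filter. It assumes the ages refer to roughly the 2017 season (inferred from the file name `nba_2017.csv`, not stated in the data) and allows a one-year tolerance. `REFERENCE_YEAR` and the tolerance are hypothetical choices.

```python
import pandas as pd

REFERENCE_YEAR = 2017  # assumption: season year inferred from "nba_2017.csv"

# A few rows copied from the Atlanta Hawks output above
hawks = pd.DataFrame({
    "Player": ["Kent Bazemore", "Mike Dunleavy", "Mike Dunleavy",
               "Tim Hardaway", "Tim Hardaway"],
    "Age": [27.0, 36.0, 36.0, 24.0, 24.0],
    "Birth Date": pd.to_datetime(["1989-07-01", "1954-03-21", "1980-09-15",
                                  "1966-09-01", "1992-03-16"]),
})

# Age implied by the birth year; allow +/- 1 year because the exact
# reference date within the season is unknown
implied_age = REFERENCE_YEAR - hawks["Birth Date"].dt.year
consistent = (implied_age - hawks["Age"]).abs() <= 1
print(hawks.loc[consistent, ["Player", "Birth Date"]])
```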


In [42]:
# Create a data frame containing all pairs of names and birthdays
names_df = pd.DataFrame(combinations(df_hawks["Player"],2), columns = ["Player 1", "Player 2"])
birthdays_df = pd.DataFrame(combinations(df_hawks["Birth Day"], 2), columns=["Birthday 1", "Birthday 2"])


In [43]:
names_df

|  | Player 1 | Player 2 |
|---|---|---|
| 0 | Kent Bazemore | DeAndre' Bembry |
| 1 | Kent Bazemore | Jose Calderon |
| 2 | Kent Bazemore | Malcolm Delaney |
| 3 | Kent Bazemore | Mike Dunleavy |
| 4 | Kent Bazemore | Mike Dunleavy |
| ... | ... | ... |
| 226 | Mike Scott | Edy Tavares |
| 227 | Mike Scott | Taurean Waller-Prince |
| 228 | Thabo Sefolosha | Edy Tavares |
| 229 | Thabo Sefolosha | Taurean Waller-Prince |


In [45]:
birthdays_df

|  | Birthday 1 | Birthday 2 |
|---|---|---|
| 0 | 07-01 | 07-04 |
| 1 | 07-01 | 09-28 |
| 2 | 07-01 | 03-11 |
| 3 | 07-01 | 03-21 |
| 4 | 07-01 | 09-15 |
| ... | ... | ... |
| 226 | 07-16 | 03-22 |
| 227 | 07-16 | 03-22 |
| 228 | 05-02 | 03-22 |
| 229 | 05-02 | 03-22 |


In [47]:
check_df = pd.concat([names_df, birthdays_df], axis=1)
check_df

|  | Player 1 | Player 2 | Birthday 1 | Birthday 2 |
|---|---|---|---|---|
| 0 | Kent Bazemore | DeAndre' Bembry | 07-01 | 07-04 |
| 1 | Kent Bazemore | Jose Calderon | 07-01 | 09-28 |
| 2 | Kent Bazemore | Malcolm Delaney | 07-01 | 03-11 |
| 3 | Kent Bazemore | Mike Dunleavy | 07-01 | 03-21 |
| 4 | Kent Bazemore | Mike Dunleavy | 07-01 | 09-15 |
| ... | ... | ... | ... | ... |
| 226 | Mike Scott | Edy Tavares | 07-16 | 03-22 |
| 227 | Mike Scott | Taurean Waller-Prince | 07-16 | 03-22 |
| 228 | Thabo Sefolosha | Edy Tavares | 05-02 | 03-22 |
| 229 | Thabo Sefolosha | Taurean Waller-Prince | 05-02 | 03-22 |


In [48]:
# Final result
check_df.loc[check_df["Birthday 1"] == check_df["Birthday 2"]]

|  | Player 1 | Player 2 | Birthday 1 | Birthday 2 |
|---|---|---|---|---|
| 13 | Kent Bazemore | Mike Muscala | 07-01 | 07-01 |
| 106 | Mike Dunleavy | Dennis Schroder | 09-15 | 09-15 |
| 230 | Edy Tavares | Taurean Waller-Prince | 03-22 | 03-22 |
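
The filtered frame has three rows, so three pairs of Hawks rows share a birthday; `len()` of the filtered frame returns that count directly. For comparison with chance alone: under the 365-equally-likely-days assumption, linearity of expectation says $n$ people produce $\binom{n}{2}/365$ matching pairs on average. Below is a sketch using the 22 Atlanta Hawks rows from the `value_counts()` output.

```python
import math

n_rows = 22  # Atlanta Hawks rows in the value_counts() output above
# Each of the C(n, 2) pairs matches with probability 1/365
expected_pairs = math.comb(n_rows, 2) / 365
print(round(expected_pairs, 3))  # 0.633
```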


### 7. How many pairs of players share a birthday for the Cleveland Cavaliers?

In [None]:
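# Approximate probability of a shared birthday, using the Cavaliers' roster size from value_counts()
birthday_probability(int(df["Team"].value_counts()["Cleveland Cavaliers"]))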
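
To answer this question with the same pairwise method used for the Hawks, the steps can be wrapped in a reusable function. This is a minimal sketch, and `shared_birthday_pairs` is a hypothetical name. The tiny DataFrame below uses made-up players in place of `df`, so the block runs without `nba_2017.csv`.

```python
from itertools import combinations
import pandas as pd

def shared_birthday_pairs(players: pd.DataFrame, team: str) -> pd.DataFrame:
    """Return every pair of rows on `team` whose "Birth Day" values match."""
    roster = players.loc[players["Team"] == team]
    # Pair up (name, birthday) tuples in row order, as in the Hawks example
    pairs = [
        (p1, p2, b1)
        for (p1, b1), (p2, b2) in combinations(
            zip(roster["Player"], roster["Birth Day"]), 2)
        if b1 == b2
    ]
    return pd.DataFrame(pairs, columns=["Player 1", "Player 2", "Birth Day"])

# Hypothetical placeholder rows, not taken from the dataset
toy = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Team": ["Cleveland Cavaliers"] * 3 + ["Boston Celtics"],
    "Birth Day": ["01-15", "06-30", "01-15", "01-15"],
})
print(shared_birthday_pairs(toy, "Cleveland Cavaliers"))
```

With the notebook's data, `len(shared_birthday_pairs(df, "Cleveland Cavaliers"))` would give the number of matching pairs.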