In [1]:
# import
import pandas as pd
import math

* The birthday problem asks: "What is the probability that at least two people in a room of N people share a birthday?" The birthday paradox refers to the counterintuitive fact that only 23 people are needed for that probability to exceed 50%.


* In this project, I'll use pandas to answer the question: how many players on each Premier League team share a birthday?


***The first step is to calculate the probability that at least two people in a group of n people share a birthday. We can use the following approximation formula:***

<img src='birthday-probability-formula.png' width='300px' height='300px' align='left' />
<img src='n-choose-k-formula.png'  width='280px' height='280px' align='right'/>
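
In case the images above do not render, the approximation can be written out as follows. This is reconstructed from the code used in this notebook, so the notation is an assumption: $P(n)$ stands for the probability that at least two of $n$ people share a birthday, and each of the $\binom{n}{2}$ pairs is treated as if it were independent of the others.

```latex
P(n) \approx 1 - \left(\frac{364}{365}\right)^{\binom{n}{2}},
\qquad
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}
```

For example, with $n = 10$ there are $\binom{10}{2} = 45$ pairs, so $P(10) \approx 1 - (364/365)^{45} \approx 0.116$.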

### 1. What's the probability when n = 10?

In [2]:
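def nCr(n, k):
    # Number of ways to choose k items out of n (binomial coefficient)
    f = math.factorial
    # Integer division keeps the result an exact int, even for large n
    return f(n) // (f(k) * f(n - k))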

1 - ((364 / 365) ** nCr(10, 2)) 

0.11614023654879224

### 2. What's the probability when n = 15?

In [3]:
1 - ((364 / 365) ** nCr(15, 2))

0.25028790861398265

### 3. Implement the birthday_probability function

In [4]:
def birthday_probability(number_of_people):
    return 1 - ((364 / 365) ** nCr(number_of_people, 2))

(birthday_probability(10), birthday_probability(15))

(0.11614023654879224, 0.25028790861398265)
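
Since the formula above is an approximation, it can be useful to compare it with the exact value. The sketch below is a minimal check, assuming 365 equally likely birthdays and no leap years; the function names `approx_probability`, `exact_probability` and `smallest_group` are hypothetical helpers introduced only for this comparison.

```python
import math

def approx_probability(n):
    # Pairwise approximation used in this notebook:
    # treats all C(n, 2) pairs as if they were independent
    return 1 - (364 / 365) ** math.comb(n, 2)

def exact_probability(n, days=365):
    # Exact value under the same assumptions:
    # 1 minus the probability that all n birthdays are distinct
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

def smallest_group(prob_fn, threshold=0.5):
    # Smallest n whose probability exceeds the threshold
    n = 1
    while prob_fn(n) <= threshold:
        n += 1
    return n

print(smallest_group(approx_probability), smallest_group(exact_probability))
for n in (10, 15, 23):
    print(n, round(approx_probability(n), 4), round(exact_probability(n), 4))
```

Under these assumptions both versions first exceed 50% at n = 23, which is the figure quoted for the birthday paradox.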

### Premier League Birthday Paradox Analysis

In [5]:
# load csv
df = pd.read_csv('Premier_League_2022-2023.csv')

In [6]:
df.head()

Unnamed: 0.1,Unnamed: 0,player_name,team,birthday,position
0,0,Daniel Adu-Adjei,AFC Bournemouth,21/06/2005,FW
1,1,Jaidon Anthony,AFC Bournemouth,01/12/1999,FW
2,2,Philip Billing,AFC Bournemouth,11/06/1996,MF
3,3,David Brooks,AFC Bournemouth,08/07/1997,FW
4,4,Ryan Christie,AFC Bournemouth,22/02/1995,MF


In [7]:
df['team'].value_counts()

team
Nottingham Forest          39
West Ham United            38
Liverpool FC               37
Leeds United               37
Manchester United          36
AFC Bournemouth            34
Brighton & Hove Albion     34
Fulham FC                  34
Crystal Palace             33
Everton FC                 33
Arsenal FC                 33
Wolverhampton Wanderers    32
Newcastle United           31
Brentford FC               31
Southampton FC             31
Leicester City             31
Chelsea FC                 30
Aston Villa                30
Manchester City            27
Tottenham Hotspur          27
Name: count, dtype: int64

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 658 entries, 0 to 657
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Unnamed: 0   658 non-null    int64 
 1   player_name  658 non-null    object
 2   team         658 non-null    object
 3   birthday     658 non-null    object
 4   position     658 non-null    object
dtypes: int64(1), object(4)
memory usage: 25.8+ KB


### 4. Create the Birth Date column

In [9]:
df['birthday'] = pd.to_datetime(df['birthday'], format='%d/%m/%Y')

In [10]:
df['Birth Date'] = df['birthday'].dt.strftime("%m-%d")

In [11]:
df.head()

Unnamed: 0.1,Unnamed: 0,player_name,team,birthday,position,Birth Date
0,0,Daniel Adu-Adjei,AFC Bournemouth,2005-06-21,FW,06-21
1,1,Jaidon Anthony,AFC Bournemouth,1999-12-01,FW,12-01
2,2,Philip Billing,AFC Bournemouth,1996-06-11,MF,06-11
3,3,David Brooks,AFC Bournemouth,1997-07-08,FW,07-08
4,4,Ryan Christie,AFC Bournemouth,1995-02-22,MF,02-22
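
The same two steps (parse a day-first date, then keep only month and day) can be sketched on a single value with the standard library. The sample string is one of the `birthday` values shown in the first rows of the dataset; the variable names are illustrative only.

```python
from datetime import datetime

# One raw value in the dataset's DD/MM/YYYY layout
raw = "01/12/1999"

# Day comes first, so this is 1 December 1999, not 12 January
parsed = datetime.strptime(raw, "%d/%m/%Y")

# Drop the year: two players born in different years can still share a birthday
month_day = parsed.strftime("%m-%d")

print(month_day)  # 12-01
```

Passing the format explicitly avoids any ambiguity about whether the day or the month comes first.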


Using combinations, I can take all the samples in pairs (`r=2`) to compare them:

| Name | Birthday  |
|------|-----------|
| John | March 5th |
| Mary | Sept 20th |
| Rob  | March 5th |


| Person 1 | Person 2  |
|------|-----------|
| John | Mary |
| John | Rob |
| Mary  | Rob |

Using Python:

In [12]:
from itertools import combinations

In [13]:
# Example
names = ["Rodrigo Bahia", "Lucas Fields", "Arthur comegundi", "Ian Alcatra"]

In [14]:
list(combinations(names, 2))

[('Rodrigo Bahia', 'Lucas Fields'),
 ('Rodrigo Bahia', 'Arthur comegundi'),
 ('Rodrigo Bahia', 'Ian Alcatra'),
 ('Lucas Fields', 'Arthur comegundi'),
 ('Lucas Fields', 'Ian Alcatra'),
 ('Arthur comegundi', 'Ian Alcatra')]

In [15]:
# Same for birthdays
birthdays = ["sep 20th", "mar 5th", "sep 20th", "jan 15th"]

In [16]:
list(combinations(birthdays, 2))

[('sep 20th', 'mar 5th'),
 ('sep 20th', 'sep 20th'),
 ('sep 20th', 'jan 15th'),
 ('mar 5th', 'sep 20th'),
 ('mar 5th', 'jan 15th'),
 ('sep 20th', 'jan 15th')]

In [17]:
names_df = pd.DataFrame(combinations(names, 2), columns=["Person 1", "Person 2"])
names_df

Unnamed: 0,Person 1,Person 2
0,Rodrigo Bahia,Lucas Fields
1,Rodrigo Bahia,Arthur comegundi
2,Rodrigo Bahia,Ian Alcatra
3,Lucas Fields,Arthur comegundi
4,Lucas Fields,Ian Alcatra
5,Arthur comegundi,Ian Alcatra


In [18]:
birthdays_df = pd.DataFrame(combinations(birthdays, 2), columns=['Birthday 1', 'Birthday 2'])
birthdays_df

Unnamed: 0,Birthday 1,Birthday 2
0,sep 20th,mar 5th
1,sep 20th,sep 20th
2,sep 20th,jan 15th
3,mar 5th,sep 20th
4,mar 5th,jan 15th
5,sep 20th,jan 15th


Combining it:

In [19]:
df_concat = pd.concat([names_df, birthdays_df], axis=1)
df_concat

Unnamed: 0,Person 1,Person 2,Birthday 1,Birthday 2
0,Rodrigo Bahia,Lucas Fields,sep 20th,mar 5th
1,Rodrigo Bahia,Arthur comegundi,sep 20th,sep 20th
2,Rodrigo Bahia,Ian Alcatra,sep 20th,jan 15th
3,Lucas Fields,Arthur comegundi,mar 5th,sep 20th
4,Lucas Fields,Ian Alcatra,mar 5th,jan 15th
5,Arthur comegundi,Ian Alcatra,sep 20th,jan 15th


In [20]:
df_concat.loc[(df_concat['Birthday 1']) == (df_concat['Birthday 2'])]

Unnamed: 0,Person 1,Person 2,Birthday 1,Birthday 2
1,Rodrigo Bahia,Arthur comegundi,sep 20th,sep 20th
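
The example above relies on `combinations` producing the name pairs and the birthday pairs in the same order, so that row i of one table lines up with row i of the other. As an alternative sketch (not the approach used in this notebook), each name can be zipped with its birthday first, so that only one set of combinations is needed. It reuses the same example lists; `people` and `shared` are names introduced here for illustration.

```python
from itertools import combinations

names = ["Rodrigo Bahia", "Lucas Fields", "Arthur comegundi", "Ian Alcatra"]
birthdays = ["sep 20th", "mar 5th", "sep 20th", "jan 15th"]

# Pair each name with its birthday first, then build the 2-combinations once
people = list(zip(names, birthdays))

shared = [
    (name_1, name_2, bday_1)
    for (name_1, bday_1), (name_2, bday_2) in combinations(people, 2)
    if bday_1 == bday_2
]

print(shared)  # [('Rodrigo Bahia', 'Arthur comegundi', 'sep 20th')]
```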


### End of examples! Time to practice on the dataset.

Building a function for reuse across different teams.

In [21]:
def BirthdayParadox(name_of_team):
    # Select the rows for the given team
    team_df = df.loc[df['team'] == name_of_team]

    # Build every pair of player names
    players_df = pd.DataFrame(combinations(team_df['player_name'], 2), columns=['player 1', 'player 2'])

    # Build every pair of birthdays, in the same order as the name pairs
    birthday_df = pd.DataFrame(combinations(team_df['Birth Date'], 2), columns=['birthday 1', 'birthday 2'])

    # Put the name pairs and birthday pairs side by side
    df_concat = pd.concat([players_df, birthday_df], axis=1)

    # Keep only the pairs whose birthdays match
    return df_concat.loc[df_concat['birthday 1'] == df_concat['birthday 2']]

#### Activities

### 5. How many pairs of Liverpool FC players share a birthday?

In [22]:
BirthdayParadox('Liverpool FC') 

Unnamed: 0,player 1,player 2,birthday 1,birthday 2
97,Alisson,Roberto Firmino,10-02,10-02
188,Luke Chambers,Darwin Núñez,06-24,06-24
256,Harvey Davies,Layton Stewart,09-03,09-03


#### R: 3

### 6. What is the probability that at least two Liverpool FC players share a birthday?

In [23]:
birthday_probability((df['team'] == 'Liverpool FC').sum())

0.8391304739689956

In [24]:
(df['team'] == 'Liverpool FC').sum()

37

#### R: 0.84

### 7. How many pairs of Manchester City players share a birthday?

In [25]:
BirthdayParadox('Manchester City') 

Unnamed: 0,player 1,player 2,birthday 1,birthday 2
177,Phil Foden,John Stones,05-28,05-28
178,Phil Foden,Kyle Walker,05-28,05-28
232,João Cancelo,Aymeric Laporte,05-27,05-27
348,John Stones,Kyle Walker,05-28,05-28


#### R: 4

### 8. What is the probability that at least two Manchester City players share a birthday?

In [26]:
birthday_probability((df['team'] == 'Manchester City').sum())

0.6182401629679479

In [27]:
(df['team'] == 'Manchester City').sum()

27
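
The probability says how likely at least one shared birthday is, not how many pairs to expect. As a rough complement, the sketch below estimates the expected number of matching pairs, assuming 365 equally likely birthdays; `expected_shared_pairs` is a hypothetical helper, and the squad sizes (37 and 27) are the ones computed in this notebook.

```python
import math

def expected_shared_pairs(n, days=365):
    # Each of the C(n, 2) pairs matches with probability 1/days, so by
    # linearity of expectation the mean number of matching pairs is C(n, 2) / days
    return math.comb(n, 2) / days

for team, size in [("Liverpool FC", 37), ("Manchester City", 27)]:
    print(team, size, round(expected_shared_pairs(size), 2))
```

This gives about 1.82 expected pairs for Liverpool FC and about 0.96 for Manchester City, against the 3 and 4 pairs actually found in the data.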

### 9. At Arsenal FC, who shares a birthday with Fábio Vieira?


In [28]:
BirthdayParadox('Arsenal FC')

Unnamed: 0,player 1,player 2,birthday 1,birthday 2
187,Fábio Vieira,Eddie Nketiah,05-30,05-30


#### R: Eddie Nketiah

### 10. Which team has the most pairs of players sharing a birthday?

In [29]:
# Unique team names, in order of first appearance
teams = df['team'].unique()

for team in teams:
    print(f"{team}: {BirthdayParadox(team).shape[0]}")

AFC Bournemouth: 0
Arsenal FC: 1
Aston Villa: 1
Brentford FC: 1
Brighton & Hove Albion: 0
Chelsea FC: 1
Crystal Palace: 1
Everton FC: 1
Fulham FC: 1
Leeds United: 3
Leicester City: 0
Liverpool FC: 3
Manchester City: 4
Manchester United: 3
Newcastle United: 2
Nottingham Forest: 0
Southampton FC: 0
Tottenham Hotspur: 0
West Ham United: 2
Wolverhampton Wanderers: 0


#### R: Manchester City
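
The per-team pair counts can also be obtained without building every pair, by grouping on team and birthday: a group of k players with the same birthday contributes k(k-1)/2 pairs. This is why Manchester City's three players born on 05-28 account for three of its four pairs. The sketch below is one possible way to do this, run on hypothetical toy data that only borrows the notebook's column names.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the notebook's DataFrame
toy = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B"],
    "player_name": ["p1", "p2", "p3", "p4", "p5", "p6", "p7"],
    "Birth Date": ["05-28", "05-28", "05-28", "05-27", "01-01", "01-01", "03-03"],
})

# Number of players in each (team, birthday) group
group_sizes = toy.groupby(["team", "Birth Date"]).size()

# A group of k players contributes k * (k - 1) / 2 pairs
pairs_per_group = group_sizes * (group_sizes - 1) // 2

# Sum the pairs within each team and sort from most to fewest
pairs_per_team = pairs_per_group.groupby(level="team").sum().sort_values(ascending=False)

print(pairs_per_team)
```

In the toy data, team A has three players on the same date (3 pairs) and team B has two (1 pair), so team A comes first.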