# The Birthday Probability Lab
In this lab, we develop a program to check the claim that in a group of 23 people there is roughly a 50% chance that at least two of them share a birthday. We build a Monte Carlo simulation that runs a large number of experiments and takes the ratio of successes to the total number of experiments as the estimated probability.

The experiments need to be stochastic: each run can have a different outcome, and the outcomes are distributed among all possible outcomes according to a probability distribution. Hence, if we run N experiments and observe a particular outcome of interest M times, we can estimate the probability of that outcome as M / N.
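The M / N estimate can be sketched as a small generic helper. The name `monte_carlo_estimate` and its `trial` callback are hypothetical and not part of the lab's code: `trial` is assumed to run one experiment with the supplied random generator and return `True` when the outcome of interest occurs.

```python
import random

def monte_carlo_estimate(trial, n_experiments, rng):
    """Estimate P(outcome) as M / N over n_experiments independent runs.

    `trial` is a hypothetical callback: it runs one experiment using `rng`
    and returns True when the outcome of interest is observed.
    """
    m = sum(1 for _ in range(n_experiments) if trial(rng))  # M: observed successes
    return m / n_experiments  # M / N

# Usage: estimate the probability that a fair coin lands heads
rng = random.Random(0)  # seeded so the run is reproducible
heads_probability = monte_carlo_estimate(lambda r: r.random() < 0.5, 10_000, rng)
print(heads_probability)
```

The birthday experiment fits the same shape: one trial generates a group of birthdays and reports whether any day repeats.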

### Importing Random Library
The `random` library in Python provides functions for generating random numbers and selecting random elements from sequences. In this Jupyter Notebook, we use it to generate the random birthdays used in the simulations.

In [1]:
import random
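Two properties of `random` matter for this notebook: `random.randint(a, b)` includes both endpoints, so `randint(1, 365)` can return both 1 and 365, and seeding a generator makes a run repeatable. The sketch below uses a separate `random.Random` instance so that reproducible experiments leave the module-level generator alone; the helper name `draw_days` and the seed value 42 are arbitrary, illustrative choices.

```python
import random

def draw_days(seed, count=1000):
    """Draw `count` day numbers in 1..365 from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator with its own state
    return [rng.randint(1, 365) for _ in range(count)]  # randint is inclusive on both ends

first_run = draw_days(42)
second_run = draw_days(42)
print(first_run == second_run)          # same seed, so the sequences match
print(min(first_run), max(first_run))   # always within 1..365
```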

### Generating a Group of Birthdays
In this method, we create an empty list of birthdays. We then use a for loop to generate a random day number from 1 to 365 for each person and append it to the list. Finally, we return the list at the end of the function.

In [2]:
def generate_group_size_birthdays(group_size=100):
    """
    method for generating a list of birthdays for group sizes
    :param group_size: number of people in the group
    :return: list of birthdays, each a day number from 1 to 365
    """

    birthday_list = []

    # one uniformly random day of the year per person (leap days ignored)
    for _ in range(group_size):
        birthday_list.append(random.randint(1, 365))

    return birthday_list
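The same list can be built more compactly. This sketch keeps the assumption of 365 equally likely days with no leap day and uses `random.choices` with `k=group_size`. The function name `generate_birthdays` and the optional `rng` parameter, added so that runs can be seeded, are hypothetical and not part of the original method.

```python
import random

def generate_birthdays(group_size=100, rng=random):
    """Return `group_size` birthdays, each a day number from 1 to 365."""
    # choices() samples with replacement, so repeated days are possible,
    # just as with repeated calls to randint(1, 365)
    return rng.choices(range(1, 366), k=group_size)

sample = generate_birthdays(23, random.Random(7))
print(len(sample), sample[:5])
```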

### Computing the Probability
In this method, we estimate the probability that at least two people in a group share a birthday. We start by initializing a counter for the number of simulated groups that contain a shared birthday. We then loop over the simulations. In each one, we generate a list of birthdays for a group of fixed size, convert the list into a set, and compare the length of the list with the length of the set.

If the lengths differ, the set has discarded at least one repeated value, so the group contains a shared birthday and we increment the counter by one. After the loop, we calculate the probability by dividing the counter by the number of simulations.

In [3]:
def compute_probability(user_group_size=23, number_of_simulation=1000):
    """
    method for computing the probability of 2 people sharing the same birthday
    :param user_group_size: number of people in each simulated group
    :param number_of_simulation: number of simulated groups (N)
    :return: the estimated probability value
    """

    duplicate_count = 0
    for _ in range(number_of_simulation):
        birthday_list_sample = generate_group_size_birthdays(user_group_size)
        # a set drops repeated values, so a shorter set means a shared birthday
        if len(birthday_list_sample) != len(set(birthday_list_sample)):
            duplicate_count = duplicate_count + 1
    probability_calculation = duplicate_count / number_of_simulation
    print("The probability of a group size of", user_group_size, "having a common birthday is",
          round(probability_calculation, 2))
    return probability_calculation
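One possible refinement stops scanning a group as soon as the first repeated birthday appears, instead of building a full set and comparing lengths, and returns the estimate without printing. The names `has_shared_birthday` and `estimate_shared_birthday` and the `rng` parameter are hypothetical. This is a sketch of the same Monte Carlo estimate, not a replacement for the notebook's method.

```python
import random

def has_shared_birthday(group_size, rng):
    """Return True as soon as a repeated birthday appears in one simulated group."""
    seen = set()
    for _ in range(group_size):
        day = rng.randint(1, 365)
        if day in seen:
            return True  # early exit: one shared pair is enough
        seen.add(day)
    return False

def estimate_shared_birthday(group_size=23, number_of_simulation=1000, rng=None):
    """Fraction of simulated groups that contain at least one shared birthday."""
    if rng is None:
        rng = random.Random()
    hits = sum(has_shared_birthday(group_size, rng) for _ in range(number_of_simulation))
    return hits / number_of_simulation

estimate = estimate_shared_birthday(23, 20_000, random.Random(1))
print(estimate)
```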

### Smallest Group Size
In this method, we set up an experimental_group_size variable, initialized to 1, and an experimental_probability variable, initialized to 0. Next, we use a while loop that runs as long as the estimated probability is less than 0.5.

Inside the loop, we call compute_probability() for the current group size and then increase the group size by one, repeating until the estimated probability reaches 0.5. Because the group size is incremented after the probability is computed, the value the method returns is one greater than the first group size whose estimated probability reached 0.5.

In [4]:
def smallest_group_size(number_of_simulation=1000):
    """
    method to compute the smallest group size that has a probability of greater than 50%
    of two people sharing the same birthday
    :return the probability value:
    """
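    experimental_group_size = 1
    experimental_probability = 0
    while experimental_probability < 0.5:
        # estimate the probability for the current group size
        experimental_probability = compute_probability(experimental_group_size, number_of_simulation)
        # the size is advanced after the estimate, so the value returned below is
        # one greater than the first group size whose estimate reached 0.5
        experimental_group_size = experimental_group_size + 1
    return experimental_group_size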
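The recorded outputs further below come from the version above, which returns one more than the first group size whose estimate reached 0.5. A corrected variant could test the probability before advancing the size, as in this sketch. The names `smallest_group_size_corrected` and `estimate_shared_birthday` are hypothetical, and the estimator is repeated here so that the block runs on its own.

```python
import random

def estimate_shared_birthday(group_size, number_of_simulation, rng):
    """Fraction of simulated groups with at least one shared birthday."""
    hits = 0
    for _ in range(number_of_simulation):
        birthdays = [rng.randint(1, 365) for _ in range(group_size)]
        if len(birthdays) != len(set(birthdays)):  # duplicates present
            hits += 1
    return hits / number_of_simulation

def smallest_group_size_corrected(number_of_simulation=1000, rng=None):
    """Return the first group size whose estimated probability is at least 0.5."""
    if rng is None:
        rng = random.Random()
    group_size = 1
    # test the current size first, and only advance when it falls short
    while estimate_shared_birthday(group_size, number_of_simulation, rng) < 0.5:
        group_size += 1
    return group_size

result = smallest_group_size_corrected(10_000, random.Random(3))
print(result)
```

With this ordering, the returned value is directly the group size at which the estimate first reaches the threshold, so no adjustment is needed when reading the result.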

### Method Calls
Here, we call the methods compute_probability() and smallest_group_size().

In [5]:
print(compute_probability())
print(smallest_group_size())
print(compute_probability(50, 500))
print(smallest_group_size(2000))

The probability of a group size of 23 having a common birthday is 0.51
0.514
The probability of a group size of 1 having a common birthday is 0.0
The probability of a group size of 2 having a common birthday is 0.0
The probability of a group size of 3 having a common birthday is 0.01
The probability of a group size of 4 having a common birthday is 0.02
The probability of a group size of 5 having a common birthday is 0.03
The probability of a group size of 6 having a common birthday is 0.04
The probability of a group size of 7 having a common birthday is 0.05
The probability of a group size of 8 having a common birthday is 0.07
The probability of a group size of 9 having a common birthday is 0.09
The probability of a group size of 10 having a common birthday is 0.11
The probability of a group size of 11 having a common birthday is 0.14
The probability of a group size of 12 having a common birthday is 0.17
The probability of a group size of 13 having a common birthday is 0.2
The probabil

### Calling the Help Method
Here, we call the built-in help() function to display the docstring of the method smallest_group_size().

In [6]:
help(smallest_group_size)

Help on function smallest_group_size in module __main__:

smallest_group_size(number_of_simulation=1000)
    method to compute the smallest group size that has a probability of greater than 50%
    of two people sharing the same birthday
    :return the probability value:



### Benchmarking of Methods using Local PC vs ROSIE Supercomputer
Here, we benchmark the methods using the `%%time` cell magic. We use it to find the longest-running cell, compare the runtimes on the local computer and the ROSIE supercomputer, and then explain why the runtimes differ.

In [7]:
%%time
birthday_list = generate_group_size_birthdays()

CPU times: total: 0 ns
Wall time: 1.01 ms


In [8]:
%%time
benchmark_probability = compute_probability()

The probability of a group size of 23 having a common birthday is 0.52
CPU times: total: 0 ns
Wall time: 29 ms


In [9]:
%%time
small_group_size = smallest_group_size()

The probability of a group size of 1 having a common birthday is 0.0
The probability of a group size of 2 having a common birthday is 0.0
The probability of a group size of 3 having a common birthday is 0.01
The probability of a group size of 4 having a common birthday is 0.01
The probability of a group size of 5 having a common birthday is 0.03
The probability of a group size of 6 having a common birthday is 0.03
The probability of a group size of 7 having a common birthday is 0.05
The probability of a group size of 8 having a common birthday is 0.07
The probability of a group size of 9 having a common birthday is 0.08
The probability of a group size of 10 having a common birthday is 0.12
The probability of a group size of 11 having a common birthday is 0.14
The probability of a group size of 12 having a common birthday is 0.18
The probability of a group size of 13 having a common birthday is 0.21
The probability of a group size of 14 having a common birthday is 0.22
The probability o

### Benchmarking Results for PC and ROSIE
The longest-running cell is the one that calls smallest_group_size(), as the results below show.

Here are the benchmarking results for the PC:
- for generate_group_size_birthdays(), the runtime is 1.01 ms
- for compute_probability(), the runtime is 29 ms
- for smallest_group_size(), the runtime is 202 ms

Here are the benchmarking results for ROSIE:
- for generate_group_size_birthdays(), the runtime is 269 microseconds
- for compute_probability(), the runtime is 28.8 ms
- for smallest_group_size(), the runtime is 244 ms

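The results are mixed: ROSIE was faster for generate_group_size_birthdays() (269 microseconds vs. 1.01 ms), the two machines were essentially tied for compute_probability() (28.8 ms vs. 29 ms), and the PC was faster for the longest-running cell, smallest_group_size() (202 ms vs. 244 ms). Although one might initially expect the supercomputer to be faster everywhere, this code runs serially in a single Python process, so it cannot take advantage of the many cores and nodes that make a supercomputer fast. Its speed depends mainly on single-core performance, where a desktop CPU can often match or exceed a server node. Furthermore, supercomputers run jobs through a scheduler that queues tasks based on the availability of resources, which can delay when a job starts; that waiting time, however, is not part of the `%%time` measurements, which cover only the execution of the cell itself. Finally, differences of a few milliseconds from a single run are within normal run-to-run variation, so small gaps should not be over-interpreted.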
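Because a single `%%time` run at the millisecond scale is noisy, one way to make the PC and ROSIE comparison sturdier is to repeat each measurement and keep the fastest run, for example with the standard `timeit` module. The `workload` function below is a hypothetical stand-in for the notebook's methods, and the `repeat` and `number` values are arbitrary choices.

```python
import random
import timeit

def workload(group_size=23, number_of_simulation=1000):
    """Stand-in for compute_probability(): fraction of groups with a shared birthday."""
    hits = 0
    for _ in range(number_of_simulation):
        birthdays = [random.randint(1, 365) for _ in range(group_size)]
        hits += len(birthdays) != len(set(birthdays))  # True counts as 1
    return hits / number_of_simulation

# 5 repeats of 3 calls each; the minimum is the run least disturbed by other activity
timings = timeit.repeat(workload, repeat=5, number=3)
best_per_call = min(timings) / 3
print(f"best time per call: {best_per_call * 1000:.1f} ms")
```

Running the same script on both machines and comparing the best times gives a fairer picture than one `%%time` measurement per cell.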

### Lab Questions 
1. What is the probability that, in a group of size 20, at least one pair shares the same birthday?
2. What is the smallest group size to have a probability of greater than 50% that two people share the same birthday?
3. What changes do you observe in the results as you increase the value of N, the number of simulations?

In [10]:
compute_probability(20)

The probability of a group size of 20 having a common birthday is 0.42


0.421

In [11]:
smallest_group_size()

The probability of a group size of 1 having a common birthday is 0.0
The probability of a group size of 2 having a common birthday is 0.0
The probability of a group size of 3 having a common birthday is 0.01
The probability of a group size of 4 having a common birthday is 0.02
The probability of a group size of 5 having a common birthday is 0.03
The probability of a group size of 6 having a common birthday is 0.03
The probability of a group size of 7 having a common birthday is 0.06
The probability of a group size of 8 having a common birthday is 0.08
The probability of a group size of 9 having a common birthday is 0.09
The probability of a group size of 10 having a common birthday is 0.11
The probability of a group size of 11 having a common birthday is 0.14
The probability of a group size of 12 having a common birthday is 0.13
The probability of a group size of 13 having a common birthday is 0.19
The probability of a group size of 14 having a common birthday is 0.21
The probability o

25

In [12]:
smallest_group_size(10000)

The probability of a group size of 1 having a common birthday is 0.0
The probability of a group size of 2 having a common birthday is 0.0
The probability of a group size of 3 having a common birthday is 0.01
The probability of a group size of 4 having a common birthday is 0.02
The probability of a group size of 5 having a common birthday is 0.03
The probability of a group size of 6 having a common birthday is 0.04
The probability of a group size of 7 having a common birthday is 0.05
The probability of a group size of 8 having a common birthday is 0.08
The probability of a group size of 9 having a common birthday is 0.09
The probability of a group size of 10 having a common birthday is 0.11
The probability of a group size of 11 having a common birthday is 0.14
The probability of a group size of 12 having a common birthday is 0.17
The probability of a group size of 13 having a common birthday is 0.19
The probability of a group size of 14 having a common birthday is 0.22
The probability o

24

In [13]:
smallest_group_size(100000)

The probability of a group size of 1 having a common birthday is 0.0
The probability of a group size of 2 having a common birthday is 0.0
The probability of a group size of 3 having a common birthday is 0.01
The probability of a group size of 4 having a common birthday is 0.02
The probability of a group size of 5 having a common birthday is 0.03
The probability of a group size of 6 having a common birthday is 0.04
The probability of a group size of 7 having a common birthday is 0.06
The probability of a group size of 8 having a common birthday is 0.07
The probability of a group size of 9 having a common birthday is 0.09
The probability of a group size of 10 having a common birthday is 0.12
The probability of a group size of 11 having a common birthday is 0.14
The probability of a group size of 12 having a common birthday is 0.17
The probability of a group size of 13 having a common birthday is 0.19
The probability of a group size of 14 having a common birthday is 0.22
The probability o

24

### Responses to Lab Questions
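1. The estimated probability that at least one pair in a group of size 20 shares a birthday is 0.42 (0.421 with 1000 simulations), close to the exact value of about 0.411.
2. With 1000 simulations, smallest_group_size() returned 25; with 10000 and 100000 simulations, it returned 24. Because the method increments the group size after computing each probability, these values are one greater than the first group size whose estimate reached 0.5, so the smallest group size is 24 for N = 1000 and 23 for N = 10000 and N = 100000.
3. As the number of simulations increases from 1000 to 10000 and 100000, the result settles at the same value (a returned 24, i.e. a group size of 23), and the printed per-size probabilities fluctuate less. For example, with 1000 simulations the estimate for a group of 12 (0.13) came out lower than the estimate for a group of 11 (0.14).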
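The effect of N can also be quantified. Each estimate is a proportion M / N, so its standard error is about sqrt(p(1 - p) / N). Taking p to be about 0.5073, the exact probability for 23 people, this gives roughly 0.016, 0.005 and 0.0016 for N = 1000, 10000 and 100000. At N = 1000, the true value for 23 people sits less than half a standard error above 0.5, so an estimate below 0.5 for that size is common, which is consistent with the 1000-simulation run returning a different value from the larger runs. The helper name `standard_error` is illustrative.

```python
import math

P_23 = 0.5073  # exact P(shared birthday) for 23 people, rounded to 4 places

def standard_error(p, n):
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(p * (1 - p) / n)

for n in (1_000, 10_000, 100_000):
    se = standard_error(P_23, n)
    margin = (P_23 - 0.5) / se  # distance of the true value above 0.5, in standard errors
    print(f"N = {n:>7}: standard error {se:.4f}, margin above 0.5 = {margin:.2f}")
```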

### Conclusion
To conclude, for a group of 20 people, the simulation estimates the probability that at least two people share a birthday at 0.42 with 1000 simulations. When we run the `smallest_group_size()` method, it returns 25 with 1000 simulations and 24 with 10000 and 100000 simulations. Since the method returns one more than the first group size whose estimated probability reached 0.5, this corresponds to a smallest group size of 23 once the number of simulations is large enough.

This result is known as the Birthday Paradox: in a group of only 23 people, the probability that at least two share a birthday already exceeds 50% (about 50.7%, assuming 365 equally likely birthdays). Because the true probability for 23 people is only slightly above 0.5, a small number of simulations can push the estimate below the threshold, which explains the different result with 1000 simulations. As the number of simulations increases, the estimates become more stable and agree with the exact value, which gives us confidence that the simulation is working correctly.
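As a cross-check on the simulation, the exact probability can be computed from the complement: the chance that all n birthdays are distinct is the product of (365 - k) / 365 for k = 0 to n - 1, and the probability of a shared birthday is one minus that product. This sketch assumes 365 equally likely birthdays and ignores leap days; the function names are hypothetical.

```python
def exact_shared_birthday_probability(group_size, days=365):
    """P(at least one shared birthday) among group_size people, uniform days."""
    p_all_distinct = 1.0
    for k in range(group_size):
        p_all_distinct *= (days - k) / days  # person k must avoid the k days already taken
    return 1.0 - p_all_distinct

def exact_smallest_group_size(threshold=0.5, days=365):
    """First group size whose exact probability is at least `threshold`."""
    group_size = 1
    while exact_shared_birthday_probability(group_size, days) < threshold:
        group_size += 1
    return group_size

print(round(exact_shared_birthday_probability(20), 4))  # 0.4114
print(round(exact_shared_birthday_probability(23), 4))  # 0.5073
print(exact_smallest_group_size())                      # 23
```

The exact values give a fixed target for the simulated estimates: 0.4114 for a group of 20 and a threshold size of 23.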