# Binomial Distribution

The binomial probability model has a wide range of real-world uses. It describes situations where each trial has exactly two possible outcomes, conventionally labeled success and failure. Research often involves quantifying the probability that an event occurs, the proportion of successes in a population, or the likelihood of outcomes that can be classed as success or failure.

Each probability model has its own set of properties. For a random variable to follow a binomial distribution, the process that generates it must meet the following conditions:

- The random process consists of a fixed number of trials (n).
- The outcome of each trial can be classified as success or failure.
- The probability of success (p) is the same for each trial.
- The trials are independent, so the outcome of one trial does not influence the outcome of any other.
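To make these conditions concrete, here is a minimal sketch (an addition, not taken from the original notebook) that simulates one binomial observation as the count of successes in `n` independent trials, each succeeding with the same probability `p`. The function name `simulate_binomial` is hypothetical, and the sketch uses only Python's standard `random` module.

```python
import random

def simulate_binomial(n, p, rng=random):
    """Hypothetical helper: count successes in n independent trials with success probability p."""
    successes = 0
    for _ in range(n):            # a fixed number of trials
        if rng.random() < p:      # each trial is a success or a failure, with the same p every time
            successes += 1        # each call draws a fresh, independent uniform number
    return successes

rng = random.Random(42)           # seeded generator so the run is repeatable
draws = [simulate_binomial(10, 0.3, rng) for _ in range(10_000)]
print(sum(draws) / len(draws))    # sample mean; should be close to n * p = 3
```

Each draw is a whole number between 0 and n, and averaging many draws should land near the binomial mean n·p.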

In this notebook I will demonstrate how to simulate data and variables that follow the binomial distribution.

### Calculating the binomial manually

The binomial distribution is described by a probability mass function (pmf). It is a discrete distribution: the random variable X counts the number of successes, so it can only take the whole-number values 0, 1, ..., n and never arbitrary real values. The pmf gives the probability of exactly x successes in n trials when each trial has success probability p.

The pmf for a random variable X is:

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$$
 
This formula uses the binomial coefficient $\binom{n}{x}$, often read as "n choose x": the number of ways to choose which x of the n trials are successes. Calculating this coefficient uses the factorial operator (!), where n! is the product of every positive integer from 1 up to n (for example, 4! = 4 × 3 × 2 × 1 = 24), and 0! is defined as 1.

The formula for the binomial coefficient is:

$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$
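As a worked example, take the values used in the code later in this notebook: n = 10 trials, x = 4 successes and p = 0.3. The coefficient and the pmf then evaluate to:

$$\binom{10}{4} = \frac{10!}{4!\,6!} = \frac{3628800}{24 \times 720} = 210$$

$$P(X = 4) = 210 \times 0.3^4 \times 0.7^6 = 210 \times 0.0081 \times 0.117649 \approx 0.2001$$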

#### Calculating the factorial of a number

In [1]:
```python
# To calculate the binomial manually, without any software libraries, we first
# need the binomial coefficient shown above.

# Python has no factorial operator (!). The standard library does provide
# math.factorial, but here we define our own function to show the calculation.

def factorial(x):
    # Multiply 1 * 2 * ... * x; for x = 0 the loop never runs, so 0! = 1
    f = 1
    for i in range(x):
        f = f * (i + 1)
    return f
```

In [2]:
```python
# Test that the function works using some arbitrary numbers.
# This calculation works as: 3x2x1 = 6
factorial(3)
```

6

In [3]:
```python
# E.g. 4! = 4x3x2x1 = 24
factorial(4)
```

24

In [4]:
```python
# E.g. 8! = 8x7x6x5x4x3x2x1 = 40320
factorial(8)
```

40320
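As a quick cross-check (an addition, not part of the original notebook), the hand-written function can be compared with `math.factorial` from Python's standard library over a range of inputs, including 0, where 0! = 1. The function is redefined here so the cell runs on its own.

```python
import math

def factorial(x):
    # Iteratively multiply 1 * 2 * ... * x; the loop body never runs for x = 0, so 0! = 1
    f = 1
    for i in range(x):
        f = f * (i + 1)
    return f

# Compare against the standard-library implementation for x = 0..20
mismatches = [x for x in range(21) if factorial(x) != math.factorial(x)]
print(mismatches)  # []  (an empty list means every value agreed)
```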

In [5]:
```python
# Recursion can also be used as an alternative method to calculate the factorial of a number.

def factorial_v2(x):
    # Base case: 0! = 1! = 1 (testing x <= 1 also stops x = 0 from recursing forever)
    if x <= 1:
        return 1
    else:
        return x * factorial_v2(x - 1)
```


In [6]:
```python
# We can see this works and gives the same result as the previous method.
factorial_v2(3)
```

6

In [7]:
```python
factorial_v2(4)
```

24

In [9]:
```python
factorial_v2(8)
```

40320
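One practical difference between the two versions: each recursive call adds a frame to Python's call stack, and CPython caps the recursion depth (the current limit is reported by `sys.getrecursionlimit()`, 1000 by default). The minimal sketch below redefines both approaches under hypothetical names, `factorial_loop` and `factorial_rec`, and shows the recursive one failing for an input deeper than the limit while the loop version still succeeds.

```python
import math
import sys

def factorial_loop(x):
    # Iterative version: no call-stack growth, so no depth limit
    f = 1
    for i in range(1, x + 1):
        f *= i
    return f

def factorial_rec(x):
    # Recursive version: one stack frame per call
    if x <= 1:
        return 1
    return x * factorial_rec(x - 1)

n = sys.getrecursionlimit() + 100    # deeper than the interpreter allows
try:
    factorial_rec(n)
    recursion_failed = False
except RecursionError:
    recursion_failed = True

print(recursion_failed)                          # True: the recursive version ran out of stack depth
print(factorial_loop(n) == math.factorial(n))    # True: the loop version is unaffected
```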

#### Calculating the binomial coefficient

We can now use the factorial function created above inside a function that returns the binomial coefficient. The function takes two arguments: the number of trials (n) and the number of successes (x).

In [10]:
```python
# Defining a function that gives us the binomial coefficient for n choose x.

def binomcoef(n, x):
    # n! / (x! (n - x)!); true division "/" returns a float, e.g. 45.0
    comb = factorial(n) / (factorial(x) * factorial(n - x))
    return comb
```

In [11]:
```python
# Testing the binomcoef function using arbitrary choices, e.g. 10 trials and 2 successes.
binomcoef(10, 2)
```

45.0
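The `/` operator in `binomcoef` returns a float (hence `45.0`). Because n! is always exactly divisible by x!(n - x)!, integer division `//` gives an exact integer instead. The sketch below defines such a variant, named `binomcoef_int` here purely for illustration, and compares it with `math.comb`, which the standard library provides from Python 3.8 onwards.

```python
import math

def binomcoef_int(n, x):
    # Hypothetical integer variant: n! is exactly divisible by x! * (n - x)!,
    # so floor division loses nothing and the result stays an int
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))

print(binomcoef_int(10, 2))   # 45
print(math.comb(10, 2))       # 45

# Agreement check over every valid (n, x) pair up to n = 30
agree = all(binomcoef_int(n, x) == math.comb(n, x) for n in range(31) for x in range(n + 1))
print(agree)                  # True
```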

#### Obtaining the Probability Mass Function (pmf)


In [12]:
```python
# To obtain the pmf for the binomial distribution we need to define a function that takes
# arguments for number of trials (n), number of successes (x), and probability (p) of success on each trial.

def binom_pmf(n, x, p):
    # P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
    return binomcoef(n, x) * p**x * (1-p)**(n-x)
```

In [13]:
```python
# Now testing the binom_pmf function using arbitrary values of 10 trials (n=10), 4 successes (x=4), and a probability
# of success on each trial of 0.3 (p=0.3).

binom_pmf(10, 4, 0.3)
```

0.2001209489999999

We can see above that the probability of exactly 4 successes from 10 trials, with a success probability of 0.3 per trial, is about 0.2, or 20%.
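Two properties give a quick sanity check on any binomial pmf implementation: the probabilities over every possible outcome x = 0, 1, ..., n should sum to 1, and the mean, the sum of x·P(X = x), should equal n·p. The sketch below restates the pmf with `math.comb` so it runs on its own; the 1e-12 tolerance is an arbitrary choice for floating-point comparison.

```python
import math

def binom_pmf(n, x, p):
    # P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3
probs = [binom_pmf(n, x, p) for x in range(n + 1)]

total = sum(probs)                                  # should be 1
mean = sum(x * px for x, px in enumerate(probs))    # should be n * p = 3
print(abs(total - 1) < 1e-12, abs(mean - n * p) < 1e-12)  # True True
```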

#### Obtaining the Cumulative Distribution Function (cdf)

The binomial cumulative distribution function (cdf) gives the probability of x or fewer successes in n trials, P(X ≤ x). To calculate it we add up the pmf for every whole number of successes from 0 up to x. Because X can only take whole-number values, the cdf formula applies the floor operator to x. The floor of a number is the largest integer less than or equal to it: positive decimals are rounded towards 0 (5.7 becomes 5) and negative decimals away from 0 (-3.4 becomes -4). This fits the cdf, which asks for the probability that X takes the value x or lower.
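Written out, the binomial cdf sums the pmf over every whole number of successes from 0 up to the floor of x:

$$F(x) = P(X \le x) = \sum_{i=0}^{\lfloor x \rfloor} \binom{n}{i} p^i (1-p)^{n-i}$$

For n = 10, x = 4 and p = 0.3 this is P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4).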

In [14]:
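```python
# First, define a function that carries out the floor operation by rounding down.
# Positive values of x are truncated to the integer below (towards 0).
# Negative non-integer values are rounded down to the next smallest integer (away from 0);
# negative whole numbers such as -3.0 are already their own floor.
# (Python's math.floor does the same job; this version shows the logic.)

def floor_op(x):
    f = int(x)              # int() truncates towards zero
    if x < 0 and f != x:    # only negative non-integers need one more step down
        f = f - 1
    return f
```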

In [15]:
```python
# Testing this function on a positive value of x.
# We can see it rounds down to the integer below: the floor of the decimal value.
floor_op(5.7)
```

5

In [16]:
```python
# Testing this function with a negative value of x.
# We can see it rounds down to the next smallest integer, away from 0.
floor_op(-3.4)
```

-4
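Python's standard library already provides `math.floor`, which makes a convenient reference for testing. The sketch below defines a floor function under the hypothetical name `floor_checked`, taking care with integer-valued negative inputs (the floor of -3.0 is -3 itself, so an approach that always subtracts 1 from negatives would wrongly give -4), and checks it against `math.floor`.

```python
import math

def floor_checked(x):
    # Hypothetical floor implementation for illustration
    f = int(x)              # int() truncates towards zero
    if x < 0 and f != x:    # negative non-integers need one more step down
        f -= 1
    return f

values = [5.7, 5.0, 0.0, -0.5, -3.0, -3.4]
print([floor_checked(v) for v in values])                       # [5, 5, 0, -1, -3, -4]
print(all(floor_checked(v) == math.floor(v) for v in values))   # True
```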

With the floor function defined, we can now use it in a function that gives us the binomial cdf.

In [17]:
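```python
# Defining a function to calculate the binomial cdf.

def binom_cdf(n, x, p):
    p_val = 0
    # Sum P(X = i) for i = 0, 1, ..., floor(x), capped at n because X can never exceed n
    x_floor = min(floor_op(x), n)
    for i in range(x_floor + 1):
        p_val = p_val + binom_pmf(n, i, p)
    return p_val
```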

In [18]:
```python
# Calling the function using the example of 4 or fewer successes from 10 trials, with a success
# probability of 0.3 for each trial.

binom_cdf(10, 4, 0.3)
```

0.8497316673999995

Using the binomial cdf, we can see that the probability of 4 or fewer successes from 10 trials, with a success probability of 0.3 on each trial, is about 0.85, or roughly an 85% chance.
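The complement rule gives the opposite tail directly: the probability of 5 or more successes is 1 - F(4). The short, self-contained sketch below (using `math.comb` and `math.floor` in place of the notebook's hand-written helpers) computes that tail both via the complement and by summing the pmf directly, to confirm the two agree.

```python
import math

def binom_pmf(n, x, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(n, x, p):
    # Sum P(X = i) for i = 0..floor(x), capped at n; a negative x gives an empty sum of 0
    upper = min(math.floor(x), n)
    return sum(binom_pmf(n, i, p) for i in range(upper + 1))

n, p = 10, 0.3
tail_via_complement = 1 - binom_cdf(n, 4, p)                     # P(X >= 5) = 1 - P(X <= 4)
tail_direct = sum(binom_pmf(n, i, p) for i in range(5, n + 1))   # P(X = 5) + ... + P(X = 10)
print(round(tail_via_complement, 4), round(tail_direct, 4))      # 0.1503 0.1503
```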