# Binomial Distribution with Python and Scipy
*By P. Stikker*<br>
https://PeterStatistics.com<br>
https://www.youtube.com/stikpet<br>
____

The binomial distribution gives the probability of each possible number of successes $k$ in $n$ trials, where the probability of success $p$ is the same on every trial.

The binomial distribution has the following probability mass function:

$$bpmf\left(n,k,p\right) = \binom{n}{k}p^k\left(1 - p\right)^{n - k}$$

Where $n$ is the number of trials, $k$ the number of successes, and $p$ the probability of success on each trial.

This formula uses the binomial coefficient, which is defined as:
$$\binom{a}{b}=\frac{a!}{b!\left(a-b\right)!}$$

The factorial operator (!) can be defined as:
$$x! = \prod_{i=1}^{x}i$$
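
As a minimal sketch, the three formulas above can be written directly in plain Python. The function names `factorial`, `binomial_coefficient`, and `bpmf` are illustrative choices and not part of any library, and the values $n=10$, $k=4$, $p=0.3$ are just example inputs. In practice `math.factorial` and `math.comb` from the standard library do the same job.

```python
def factorial(x):
    """Product of the integers 1..x (the empty product gives 0! = 1)."""
    result = 1
    for i in range(1, x + 1):
        result *= i
    return result


def binomial_coefficient(a, b):
    """Number of ways to choose b items out of a, ignoring order."""
    return factorial(a) // (factorial(b) * factorial(a - b))


def bpmf(n, k, p):
    """Probability of exactly k successes in n trials, success probability p."""
    return binomial_coefficient(n, k) * p**k * (1 - p) ** (n - k)


print(binomial_coefficient(10, 4))  # 210
print(bpmf(10, 4, 0.3))             # roughly 0.2001
```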

The binomial cumulative distribution function simply sums the probability mass function over all values from 0 up to $k$:
$$bcdf\left(n,k,p\right) = \sum_{i=0}^{\lfloor k\rfloor}bpmf\left(n,i,p\right)$$

Where $\lfloor...\rfloor$ indicates the floor operator, which gives the greatest integer less than or equal to the given value. In other words, it rounds a decimal towards 0 if it is positive, and away from 0 if it is negative.
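
The effect of the floor operator can be illustrated with a small sketch. The names `bpmf` and `bcdf` simply mirror the notation above and are not library functions. The example values are assumptions chosen for illustration.

```python
from math import comb, floor


def bpmf(n, k, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def bcdf(n, k, p):
    """Probability of at most k successes; a non-integer k is floored first."""
    return sum(bpmf(n, i, p) for i in range(floor(k) + 1))


print(bcdf(10, 4, 0.3))     # roughly 0.8497
print(bcdf(10, 4.7, 0.3))   # same value, since floor(4.7) = 4
print(bcdf(10, -0.5, 0.3))  # 0, since floor(-0.5) = -1 leaves nothing to sum
```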

How these formulas came to be is explained in the appendix.

# Using Scipy

The simplest way to use the binomial distribution is through the <a href="https://www.scipy.org">*scipy*</a> library. If you have never used this library before, you need to install it first. After that you can import the <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html">**binom**</a> distribution object directly from the <a href="https://docs.scipy.org/doc/scipy/reference/stats.html">*stats*</a> sub-library:

In [None]:
#!pip install scipy
from scipy.stats import binom

The functions will usually require the number of successes $k$, the number of trials $n$, and the probability of success on each trial $p$. Let's use as an example $n=10$, $k=4$, and $p=0.3$:

In [None]:
n = 10
k = 4
p = 0.3

The binomial probability mass function can then be used by providing the number of successes, the number of trials, and the probability of success on each trial, in that order.

In [None]:
binom.pmf(k, n, p)

0.2001209489999999

So, the probability of having exactly 4 successes out of 10 trials, if the probability of success is 0.3 on each trial, is about 0.2001.

The cumulative distribution function is also available as **cdf**:

In [None]:
binom.cdf(k, n, p)

0.8497316674000001

This is the probability of having 4 successes **or fewer** out of 10 trials, if the probability of success is 0.3 on each trial. It is about 0.8497.

Another option is the **survival function**, which is the complement of the **cdf**, i.e. sf = 1 - cdf.

In [None]:
binom.sf(k, n, p)

0.15026833259999992

This is the probability of **NOT** having 4 successes or fewer out of 10 trials, if the probability of success is 0.3 on each trial.
Note that this is therefore the probability of having 5 or more successes.
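
As a quick cross-check that does not rely on SciPy, the sketch below sums the probability mass function directly using only the standard library. The helper name `bpmf` is an illustrative choice. The two sums should match the **cdf** and **sf** results above, and together they should add up to 1.

```python
from math import comb

n, k, p = 10, 4, 0.3


def bpmf(n, k, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of 0, 1, 2, 3 or 4 successes (what cdf reports)
at_most_k = sum(bpmf(n, i, p) for i in range(0, k + 1))

# Probability of 5, 6, ..., 10 successes (what sf reports)
more_than_k = sum(bpmf(n, i, p) for i in range(k + 1, n + 1))

print(at_most_k)                # roughly 0.8497
print(more_than_k)              # roughly 0.1503
print(at_most_k + more_than_k)  # 1, up to floating-point rounding
```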

## Appendix: The Formula Explained

If you were to repeat an event $n$ times, and each time the probability for a success $p$ is the same, the binomial distribution will show the probability of having $k$ successes.

A dice is a standard example. A regular dice has six sides, and the probability for rolling a five would then be $\frac{1}{6}$, each time we roll the dice.

If we were to roll the dice four times (i.e. $n = 4$), and consider rolling a five a success (hence $p = \frac{1}{6}$), the binomial distribution shows the probability of rolling 0 times a five, 1 time a five, 2 times a five, 3 times a five or even 4 times a five.

If we were to roll 0 times a five, it means we never rolled a five. The chance of not rolling a five (the chance of failure, $q$) is:
$$q = 1 - p = 1 - \frac{1}{6} = \frac{5}{6}$$
We need to not roll a five four times in a row, so the chance for this is:
$$P\left(k=0\right) = q \times q \times q \times q = q^4 = \left(\frac{5}{6}\right)^4 = \frac{5^4}{6^4} = \frac{625}{1296}\approx0.4823$$

Rolling a five exactly one time, also means not rolling it the other three times. We could have rolled a five on the first throw, and then three times not. The probability for this would be:
$$p \times q \times q \times q = p \times q^3 = \frac{1}{6} \times \left(\frac{5}{6}\right)^3 = \frac{1}{6} \times\frac{5^3}{6^3} = \frac{1}{6} \times\frac{125}{216} = \frac{125}{1296}\approx0.0965$$

But this is just for having the first roll being a five and the other three not. We could also have only the second roll being a five, or the third, or the fourth. So using 5 for rolling a five and X for not, we can represent these with: 5-X-X-X, X-5-X-X, X-X-5-X, X-X-X-5.
However, since we are only multiplying, the order does not matter and each of these has the same probability. So we can simply multiply our previous result by the number of possible variations, in this case four:
$$P\left(k=1\right) = 4\times\frac{125}{1296} = \frac{500}{1296}= \frac{125}{324} \approx0.3858$$

How about rolling twice a five. With the same reasoning we can calculate the probability of one of the possible ways to do this, to be:
$$p \times p \times q \times q = p^2 \times q^2 = \left(\frac{1}{6}\right)^2 \times \left(\frac{5}{6}\right)^2 = \frac{1^2}{6^2} \times\frac{5^2}{6^2} = \frac{1}{36} \times\frac{25}{36} = \frac{25}{1296}\approx0.0193$$
We can have the first two throws being a five, the first and the third, the first and the fourth, the second and the third, the second and the fourth, and the third and the fourth. Short notation: 5-5-X-X, 5-X-5-X, 5-X-X-5, X-5-5-X, X-5-X-5, X-X-5-5. So, in total 6 possible variations:
$$P\left(k=2\right) = 6\times\frac{25}{1296} = \frac{150}{1296}= \frac{25}{216} \approx0.1157$$

Okay, almost there. Rolling it three times:
$$p \times p \times p \times q = p^3 \times q = \left(\frac{1}{6}\right)^3 \times \frac{5}{6} = \frac{1^3}{6^3} \times\frac{5}{6} = \frac{1}{216} \times\frac{5}{6} = \frac{5}{1296}\approx0.0039$$
We can have three times a five with the first three, the first two and the last, the first and the last two, or the last three. Short notation: 5-5-5-X, 5-5-X-5, 5-X-5-5, X-5-5-5. So 4 possible variations:
$$P\left(k=3\right) = 4\times\frac{5}{1296} = \frac{20}{1296}= \frac{5}{324} \approx0.0154$$

And finally four times a five, for which there is only one possible variation (5-5-5-5):
$$P\left(k=4\right) = p \times p \times p \times p = p^4 = \frac{1^4}{6^4} = \frac{1}{1296} \approx0.0008$$

So, the binomial distribution of our dice rolling can be shown as a bar chart, with the values of $k$ from 0 to 4 on the horizontal axis and the probabilities found above as the bar heights.
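
The five probabilities derived above can be reproduced exactly with a short sketch that uses Python's `fractions` module, so the results appear as the same reduced fractions as in the text. The variable names are illustrative. The last line confirms that the probabilities of all possible outcomes add up to 1.

```python
from fractions import Fraction
from math import comb

n = 4               # number of rolls
p = Fraction(1, 6)  # probability of rolling a five
q = 1 - p           # probability of not rolling a five

# Number of variations times p^k times q^(n-k), for k = 0..4
probabilities = [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]

for k, prob in enumerate(probabilities):
    print(k, prob, round(float(prob), 4))

print(sum(probabilities))  # 1
```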




Notice that in the calculations we always had the probability of success ($p$), raised to the power of how many successes ($k$) we wanted, and then multiplied by the probability of failure ($q$) to the power of the remaining number of throws ($n - k$). So if we were to generalize the calculation we could use:
$$p^k\times q^{n-k}$$

The tricky bit is that this needs to be multiplied by the number of possible combinations.

If we have $n$ items that we can choose from, and $k$ spots, we could choose out of $n$ items for our first spot. If we are not allowed to repeat the same item, this leaves $n-1$ items for our second spot, $n-2$ for the third, etc.

Now with $n$ options for the first spot, and $n-1$ options for the second, this gives $n\times\left(n-1\right)$ possible ordered choices for the first two spots. For the first three spots we get $n\times\left(n-1\right)\times\left(n-2\right)$. We can repeat this until all $k$ spots are used, i.e. $n\times\left(n-1\right)\times\left(n-2\right)\times...\times\left(n-k+1\right)$.

To put this in math notation we can make use of the factorial operator !. This indicates a product of all integers going from 1 to the indicated value. So for example $5! = 1\times2\times3\times4\times5$. If we would use $n!$ we get:
$$n! = 1\times2\times...\times\left(n-1\right)\times n$$
This is too much for what we need. We only need the factors from $n-k+1$ up to $n$. We can get rid of the rest, the factors from 1 up to $n-k$, by dividing by $\left(n-k\right)!$.

$$\frac{n!}{\left(n-k\right)!} = \frac{1\times2\times...\times\left(n-k-1\right)\times\left(n-k\right)\times\left(n-k+1\right)\times...\times n}{1\times2\times...\times\left(n-k-1\right)\times\left(n-k\right)}$$

If you cross out the factors that the numerator and denominator have in common, all the factors in the denominator disappear and you are left with:
$$\frac{n!}{\left(n-k\right)!} = \left(n-k+1\right)\times\left(n-k+2\right)\times...\times n$$

So, for example, if I roll four times and am interested in rolling twice a five, I get:
$$\frac{4!}{\left(4-2\right)!} = \left(4-2+1\right)\times 4= 3\times 4 = 12$$

These twelve options are first roll and second roll, first roll and third, first roll and fourth, second roll and first, second roll and third, etc.
(on roll 1-2, 1-3, 1-4, 2-1, 2-3, 2-4, 3-1, 3-2, 3-4, 4-1, 4-2, 4-3).
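
These twelve ordered picks can also be listed with `itertools.permutations` from the standard library. The sketch below is only an illustration, with the rolls numbered 1 to 4 as in the list above.

```python
from itertools import permutations
from math import factorial

n, k = 4, 2

# Every ordered way to pick k different rolls out of rolls 1..n
ordered_picks = list(permutations(range(1, n + 1), k))

print(ordered_picks)
print(len(ordered_picks))                # 12
print(factorial(n) // factorial(n - k))  # 12, from n! / (n-k)!
```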

However, as you might notice, rolling a five on the first and second roll is the same as rolling it on the second and first. We have too many variations, so we need to divide our result by how many times each variation is repeated. This depends on how many successes we wanted. In the example it was twice a five, so $k=2$: I can pick out of 2 for the first one, and then have only one option left for the second, which gives $2\times1=2$ orderings. In general this is $k!$. If we were interested in rolling three times a five, then $k=3$ and we would get each variation $3!=6$ times.

Finally we can create the formula for determining how many variations we have in our original example. We need to divide the previous formula by this $k!$:
$$\frac{\left(\frac{n!}{\left(n-k\right)!}\right)}{k!} = \frac{n!}{k!\left(n-k\right)!}$$

This is known as the binomial coefficient, and written as:
$$\binom{n}{k} = \frac{n!}{k!\left(n-k\right)!}$$
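
For the example of two fives in four rolls, the sketch below counts the variations in three ways: by listing them with `itertools.combinations`, with the factorial formula, and with the standard library's `math.comb`. The variable names are illustrative. All three should agree on 6, matching the six variations listed earlier for $k=2$.

```python
from itertools import combinations
from math import comb, factorial

n, k = 4, 2

# Every way to pick k different rolls out of rolls 1..n, ignoring order
unordered_picks = list(combinations(range(1, n + 1), k))
print(unordered_picks)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

by_formula = factorial(n) // (factorial(k) * factorial(n - k))
print(len(unordered_picks), by_formula, comb(n, k))  # 6 6 6
```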

Remember that this only produces how many variations we have, this still needs to be multiplied with our earlier found $p^k\times q^{n-k}$:
$$ \binom{n}{k}\times p^k \times q^{n-k} $$

This is the binomial probability mass function (bpmf). With $q = 1 - p$ it is the same formula as given at the start.