# Bernoulli distribution

In many real-life applications, we need to keep track of whether a specific event occurs. The outcome of such an event is recorded as a success or a failure. Some situations where the Bernoulli distribution can be used are:

- A newborn child being male or female.
- `Healthcare`: success/failure of a medical treatment
- Transmission or non-transmission of a disease
- `Quality control`: determine whether an individual product is defective or non-defective.
- `Finance`: modeling the default or non-default of a loan, evaluating whether a particular investment strategy leads to a gain or loss
- `Marketing and customer behaviour`: determine whether a customer makes a purchase (is converted) or not after seeing an advert (this is known as customer conversion)
- `Sports analytics`: determine whether a player makes or misses a specific action like a goal
- `IT`: spam detection, system uptime (modeling whether a system is up or down)

# Bernoulli experiment with n trials (the binomial distribution)

The cases below illustrate Bernoulli experiments and the binomial distribution. A Bernoulli experiment with n trials is simply n repeated independent trials, where each trial has exactly two possible outcomes and the same probability of success.

`1. Flip a coin 12 times, count the number of heads (the number of successes) in these trials.`

Here each flip is a trial, so we have 12 independent trials. Each trial has exactly one of two outcomes: heads (success) or tails (failure).

In each trial, the probability of success is 1/2, i.e. p=1/2.

The probability of failure is therefore q = 1-p = 1-(1/2) = 1/2.

Our interest is in the variable X which counts the number of successes in 12 trials.

**P.S: This is a Bernoulli experiment with 12 trials**
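The coin-flip experiment can be simulated with a minimal sketch like the one below. It uses only Python's standard library; the function name `count_heads` and the seed values are illustrative choices, not part of the original example.

```python
import random

def count_heads(n_flips=12, p_heads=0.5, seed=None):
    """Simulate n_flips independent coin flips and return the number of heads."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    # Each flip is one trial: heads (success) with probability p_heads
    return sum(1 for _ in range(n_flips) if rng.random() < p_heads)

# One run of the experiment: X = number of heads in 12 flips
x = count_heads(seed=42)  # seed chosen arbitrarily for repeatability
print(f"Heads in 12 flips: {x}")

# Repeating the experiment many times: the average of X settles near 6,
# i.e. half of the 12 flips
runs = [count_heads(seed=s) for s in range(10_000)]
average = sum(runs) / len(runs)
print(f"Average number of heads over {len(runs)} runs: {average:.2f}")
```

A single run can give any value from 0 to 12; only the long-run average is stable.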

`2. A basketball player takes 4 independent shots, with a probability of 0.7 of making a basket on each shot. Count the number of baskets made.`

Here, p = 0.7 based on historical data.

By calculation, the probability of failure is q = 1-p = 1-0.7 = 0.3.

There are 4 independent trials. Each trial has exactly one of two outcomes: a basket (success) or no basket (failure).

We are interested in the variable X, which counts the number of successes in 4 trials.

**P.S: This is a Bernoulli experiment with 4 trials**


`3. A bag contains 6 red marbles and 4 blue marbles. Five marbles (5 trials) are drawn from the bag without replacement. What is the number of red marbles observed?`

A trial here consists of drawing a marble from the bag, and success can be defined as getting a red marble.

However, this is not a Bernoulli experiment, because the probability of success changes from trial to trial when the marbles drawn are not replaced. That is, we would have p=6/10 on the first trial, p=5/9 on the second trial if the first marble was red (or p=6/9 if it was blue), and so forth.

`4. A bag contains 6 red marbles and 4 blue marbles. A marble is drawn at random from the bag, its color is noted, and it is then replaced. 5 marbles (5 trials) are drawn in this way and the number of red marbles (successes) is of interest.`

In this case, we have a Bernoulli experiment in which each draw of a marble is a trial.

The trials are independent, and the probability of success is the same on each trial (p=6/10), because we draw randomly from the bag and replace the marble after each trial.
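The difference between cases 3 and 4 can be made concrete with a small sketch. It lists the probability of drawing red on each of the 5 draws, under the simplifying assumption that every earlier draw was red. The helper name `red_probabilities` is hypothetical and only the standard library is used.

```python
from fractions import Fraction

def red_probabilities(red=6, blue=4, draws=5, replace=True):
    """Probability of red on each draw, assuming every earlier draw was red."""
    probs = []
    for _ in range(draws):
        probs.append(Fraction(red, red + blue))
        if not replace:
            red -= 1  # a red marble has left the bag, so the odds change
    return probs

with_replacement = red_probabilities(replace=True)
without_replacement = red_probabilities(replace=False)

print("with replacement:   ", [str(p) for p in with_replacement])
print("without replacement:", [str(p) for p in without_replacement])
```

With replacement the probability stays at 3/5 on every draw, which is what the binomial model requires. Without replacement it drifts from draw to draw.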

**Example:**

A basketball player takes 4 independent free throws with a probability of 0.7 of getting a basket on each shot. 

`a) What is the probability that he gets exactly 2 baskets? Let X = the number of baskets he gets.`

Therefore, k=2, n=4, p=0.7

The formula for the probability of exactly k successes in a Bernoulli experiment with n trials (the binomial distribution) is:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$


where:
- $\binom{n}{k}$ = binomial coefficient, calculated as:

$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$

- p = probability of success in a single trial
- (1-p) = probability of failure in a single trial
- n = total number of trials
- k = number of successes


$$P(X = 2) = \binom{4}{2} 0.7^2 (1-0.7)^{4-2}$$

First calculate the binomial coefficient:

$$\binom{4}{2} = \frac{4!}{2!(4-2)!} = \frac{4\times 3\times 2\times 1}{2\times 1\times 2\times 1} = \frac{4\times 3}{2\times 1} = 6$$

Calculate $p^k$ and $(1-p)^{n-k}$:

$$(0.7)^2 = 0.49$$

$$(1-0.7)^{4-2} = (0.3)^2 = 0.09$$

Putting things together:

$$P(X=2) = 6\times 0.49\times 0.09 = 0.2646$$

**Therefore the probability of the basketball player getting exactly two baskets out of 4 free throws is approximately 0.26 or 26%**
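The hand calculation above can be checked with a short sketch. The function name `binomial_pmf` is an illustrative choice. It relies on `math.comb` from the standard library for the binomial coefficient.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Free-throw example: n = 4 shots, p = 0.7, exactly k = 2 baskets
coefficient = comb(4, 2)  # number of ways to choose which 2 shots go in
prob_two = binomial_pmf(k=2, n=4, p=0.7)

print(f"Binomial coefficient: {coefficient}")
print(f"P(X = 2) = {prob_two:.4f}")
```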

`b) What is the probability that he gets all 4 shots/baskets?`

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

$$P(X = 4) = \binom{4}{4} 0.7^4 (1-0.7)^{4-4}$$

$$\binom{4}{4} = \frac{4!}{4!(4-4)!} = \frac{4!}{4!\,0!} = \frac{24}{24} = 1$$

Calculate $p^k$ and $(1-p)^{n-k}$:

$$(0.7)^4 = 0.2401$$

$$(1-0.7)^{4-4} = (0.3)^0 = 1$$

Putting things together:

$$P(X=4) = 1\times 0.2401\times 1 = 0.2401$$

**Therefore the probability of the basketball player getting all baskets out of 4 free throws is approximately 0.24 or 24%**


**In other words, when every trial is a success (k = n):**

$$P(X=n) = p^n$$

**On the flip side, when every trial is a failure (k = 0):**

$$P(X=0) = (1-p)^n$$

The above means that the probability that the player does not make any basket, i.e. k = 0, is the probability of failure (1-p) raised to the power of the number of trials n, because each of the n independent shots must be a miss.
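A quick numerical check of these two special cases for the free-throw example (n = 4, p = 0.7), as a minimal sketch using only the standard library. The variable names are illustrative.

```python
from math import comb

n, p = 4, 0.7

# All shots made: every one of the n independent trials is a success
prob_all = p**n

# No shot made: every one of the n independent trials is a failure
prob_none = (1 - p)**n

# Both agree with the general formula, since C(n, n) = C(n, 0) = 1
general_all = comb(n, n) * p**n * (1 - p)**0
general_none = comb(n, 0) * p**0 * (1 - p)**n

print(f"P(X = {n}) = {prob_all:.4f}")
print(f"P(X = 0) = {prob_none:.4f}")
```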

## Binomial random variables

For a Bernoulli experiment with n trials, X denotes the number of successes in the n trials, where the probability of success in each trial is p. The distribution of the variable X is called the `binomial distribution with parameters n,p`.

The expected value of X can be calculated as:

$$E(X) = np$$

and the standard deviation of X is calculated as:

$$\sigma (X) = \sqrt{n \cdot p \cdot q}$$
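These two shortcuts can be compared against the definitions of the mean and standard deviation, computed directly from the individual probabilities. The sketch below is a check under assumed example values (n = 10, p = 0.2). The helper names are hypothetical.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mean_and_std(n, p):
    """Mean and standard deviation of X from the definition (sums over all k)."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, sqrt(variance)

n, p = 10, 0.2  # assumed example values
q = 1 - p
mean, std = mean_and_std(n, p)

print(f"mean from definition: {mean:.4f}   np:        {n * p:.4f}")
print(f"std from definition:  {std:.4f}   sqrt(npq): {sqrt(n * p * q):.4f}")
```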


### Example:

`1. A basketball player takes 8 independent free throws, with a probability of 0.7 of getting a basket on each shot. What is the probability that he gets exactly 6 baskets?`

$$P(X = 6) = \binom{8}{6} 0.7^6 (1-0.7)^{8-6}$$

First calculate the binomial coefficient:

$$\binom{8}{6} = \frac{8!}{6!(8-6)!} = \frac{8\times 7}{2\times 1} = \frac{56}{2} = 28$$

Calculate $p^k$ and $(1-p)^{n-k}$:

$$(0.7)^6 = 0.117649$$

$$(1-0.7)^{8-6} = (0.3)^2 = 0.09$$

Putting things together:

$$P(X=6) = 28\times 0.117649\times 0.09 \approx 0.2965$$

**The probability of the basketball player getting exactly 6 baskets out of 8 free throws is approximately 0.30 or 30%**

`What then is the expected number of baskets that he gets?`

$$E(X) = np$$

$$np = 8 \times 0.7 = 5.6$$

**Not 5 or 6. The expected value doesn't have to be a value that can actually occur.**

`2. A student is given a multiple choice exam with 10 questions, each question with 5 possible answers. He guesses randomly for each question, so the probability of getting a question correct is 0.2.`

a) What is the probability that he will get exactly 6 questions correct?

b) What is the probability that he will get **at least 6** questions correct?

c) What is the expected number of correct answers, and what is the standard deviation?

Here, 

n=10, p=0.2

`a) X=6`


$$P(X = 6) = \binom{10}{6} 0.2^6 (1-0.2)^{10-6}$$

First calculate the binomial coefficient:

$$\binom{10}{6} = \frac{10!}{6!(10-6)!} = \frac{10\times 9\times 8\times 7}{4\times 3\times 2\times 1} = \frac{5040}{24} = 210$$

Calculate $p^k$ and $(1-p)^{n-k}$:

$$(0.2)^6 = 0.000064$$

$$(1-0.2)^{10-6} = (0.8)^4 = 0.4096$$

Putting things together:

$$P(X=6) = 210\times 0.000064\times 0.4096 \approx 0.0055$$

**The probability of the student getting exactly 6 questions correct out of 10 is approximately 0.55%**

`b) X>=6`

**`P(X>=6) = P(X=6) + P(X=7) + P(X=8) + P(X=9) + P(X=10) = 0.0064`**

The probability of the student getting at least 6 questions correct is approximately 0.64%.
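The sum in part b) is tedious by hand, so here is a sketch that adds up the five terms explicitly. The function name `binomial_pmf` is illustrative and only the standard library is used.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.2  # 10 questions, 1-in-5 chance of guessing each one correctly

# Individual terms P(X = 6), ..., P(X = 10)
terms = {k: binomial_pmf(k, n, p) for k in range(6, n + 1)}
prob_at_least_6 = sum(terms.values())

for k, value in terms.items():
    print(f"P(X = {k:2d}) = {value:.7f}")
print(f"P(X >= 6) = {prob_at_least_6:.6f}")
```

Almost all of the total comes from the first term, P(X = 6). The remaining terms shrink quickly.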


`c) E(X) and sigma(X)`

$$E(X) = np = 10 \times 0.2 = 2$$

The expected number of correct answers is 2, while the standard deviation:

$$\sigma (X) = \sqrt{10 \cdot 0.2 \cdot 0.8} = \sqrt{1.6} = 1.26$$

`3. The AllWell Lightbulb company produces light bulbs, which are packaged in boxes of 20 for shipment. Tests have shown that 4% of their light bulbs are defective.`

a) What is the probability that a box, ready for shipment, contains exactly 3 defective light bulbs?

$n = 20, p = 0.04, k=3$

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

$$P(X = 3) = \binom{20}{3} 0.04^3 (1-0.04)^{20-3}$$

First calculate the binomial coefficient:

$$\binom{20}{3} = \frac{20!}{3!(20-3)!} = \frac{20\times 19\times 18}{3\times 2\times 1} = 1140$$

Calculate $p^k$ and $(1-p)^{n-k}$:

$$p^k = (0.04)^3 = 0.000064$$

$$(1-p)^{n-k} = (1-0.04)^{20-3} = (0.96)^{17} \approx 0.4996$$

Putting things together:

$$P(X=3) = 1140\times (0.04)^3\times (0.96)^{17} \approx 0.036$$

**The probability that a box ready for shipment contains exactly three defective light bulbs is 3.6%.**


b) What is the probability that the box contains 3 or more defective bulbs?

`P(X>=3) = 1 - (P(X=2) + P(X=1) + P(X=0))`

The binomial coefficients are $\binom{20}{2} = 190$, $\binom{20}{1} = 20$ and $\binom{20}{0} = 1$, so:

$P(X=2) = 190\times (0.04)^2\times (0.96)^{18} = 0.146$

$P(X=1) = 20\times (0.04)^1\times (0.96)^{19} = 0.368$

$P(X=0) = 1\times (0.04)^0\times (0.96)^{20} = 0.442$

$P(X \ge 3) = 1 - (0.146+0.368+0.442) = 1 - 0.956 = 0.044$

**The probability that a box ready for shipment contains three or more defective light bulbs is 4.4%.**
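Each term of the light bulb calculation can be recomputed from the formula, so the individual values can be checked one by one. This is a minimal sketch with illustrative names, using only the standard library.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.04  # 20 bulbs per box, 4% defective

# Part a): exactly 3 defective bulbs
prob_exactly_3 = binomial_pmf(3, n, p)

# Part b): 3 or more defective bulbs, via the complement of "2 or fewer"
low_terms = [binomial_pmf(k, n, p) for k in range(3)]  # P(X=0), P(X=1), P(X=2)
prob_at_least_3 = 1 - sum(low_terms)

print(f"P(X = 3)  = {prob_exactly_3:.4f}")
for k, value in enumerate(low_terms):
    print(f"P(X = {k})  = {value:.4f}")
print(f"P(X >= 3) = {prob_at_least_3:.4f}")
```

Using the complement needs only three terms, whereas summing P(X = 3) through P(X = 20) directly would need eighteen.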

`4. Suppose that the voting population in Nigeria is 300 million and that 60% of this voting population intend to vote for Xavier in the next election. We take a random sample of 100 persons from this same voting population and ask each person chosen whether they will vote for Xavier in the next election or not.`

**X is the number of YESes in the sample. The possible values of X (i.e the number of successes) are 0,1,2,3,.....,100. => n = 100, p = 0.6, q=0.4**

Calculate:

`a) What is the probability that we get exactly 60 YESes P(X=60)?`

$$P(X=60) = 0.0812$$


`b) What is the probability that we get 20 YESes or less P(X<=20)?`

$$P(X \le 20) = P(X=20) + P(X=19) + P(X=18) + \dots + P(X=0) = 3.42 \times 10^{-16}$$

`c) What is the probability that we get more than 70 YESes P(X>70)?`

$$P(X>70) = 1 - P(X \le 70) = 0.0148$$

Here, the equation `P(X>70) = 1 - P(X<=70)` applies because:

`P(X<=k)` represents the probability that X takes a value less than or equal to k

`P(X>k)` represents the probability that X takes a value greater than k

These two events are complementary: either X <= k or X > k, and never both. Their probabilities therefore add up to 1:

`P(X<=k) + P(X>k) = 1`,

Rearranging:

`P(X>k) =  1- P(X<=k)`


`d) What is the probability that we get fewer than 50 YESes P(X<50)?`

$$P(X<50) = P(X \le 49) = 0.0168$$

`e) What is the probability that we get between 50 and 60 YESes (inclusive) P(50<= X <=60)?`

$$P(50 \le X \le 60) = P(X=50) + P(X=51) + P(X=52) + \dots + P(X=60) = 0.521$$
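With n = 100 these sums are impractical by hand. The sketch below computes parts a) to e) with the standard library only. The helper names `binomial_pmf` and `binomial_cdf` are illustrative and stand in for a statistics library.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(k, n, p):
    """P(X <= k): the sum of P(X = i) for i = 0, 1, ..., k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

n, p = 100, 0.6  # sample of 100 voters, 60% support

results = {
    "a) P(X = 60)": binomial_pmf(60, n, p),
    "b) P(X <= 20)": binomial_cdf(20, n, p),
    "c) P(X > 70)": 1 - binomial_cdf(70, n, p),          # complement rule
    "d) P(X < 50)": binomial_cdf(49, n, p),              # X < 50 means X <= 49
    "e) P(50 <= X <= 60)": binomial_cdf(60, n, p) - binomial_cdf(49, n, p),
}

for label, value in results.items():
    print(f"{label:<22} {value:.4g}")
```

Part e) uses the fact that the interval from 50 to 60 is everything up to 60 minus everything up to 49.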

**Other forms of the complement rule include:**

For n=10:

**1. P(X>=2) = P(X=2) + ..... + P(X=10)**

Since X >= 2 and X < 2 are complementary events:

$$P(X \ge 2) + P(X<2) = 1$$

Rearranging:

$$P(X \ge 2) = 1-P(X<2)$$

Remember also that `P(X<2)` can be written as `P(X<=1)`, because X only takes whole-number values:

$$P(X \ge 2) = 1-P(X \le 1)$$


Using the `cdf` function from `scipy.stats`, the probability of at least 2 successes out of 10 trials is:

`P(X>=2) = 1 - stats.binom.cdf(k=1, n=n_trials, p=prob_success)`
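The identity can be confirmed numerically by computing both sides independently. In the sketch below, p = 0.2 is an assumed value (the text only fixes n = 10). The helper name is illustrative and only the standard library is used.

```python
from math import comb, isclose

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 10
p = 0.2  # assumed success probability, for illustration only

# Left-hand side: add up P(X = 2) through P(X = 10) directly
direct = sum(binomial_pmf(k, n, p) for k in range(2, n + 1))

# Right-hand side: 1 - P(X <= 1), which needs only two terms
complement = 1 - sum(binomial_pmf(k, n, p) for k in range(0, 2))

print(f"direct sum:      {direct:.6f}")
print(f"complement rule: {complement:.6f}")
print("equal:", isclose(direct, complement))
```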

## Creating functions to get the work done

In [1]:
from scipy import stats

def binomial_proba_exact(k,n,p):
    return stats.binom.pmf(k,n,p)

In [2]:
prob = binomial_proba_exact(k=60, n=100, p= 0.6)
prob

0.08121914499610608

### Creating a function that calculates the probability for:

- A specific number of successes (X = k)
- Greater than (X > k) or less than (X < k) a specific number in a binomial distribution
- Greater than/equal (X>=k) or Less than/equal(X<=k) 


**ARGUMENTS:**

- n = number of trials

- p = probability of success in each trial

- k = specific number of successes

In [3]:
def prob_func(n, p, k=None, comparison='equal'):  # default comparison is 'equal'
    
    if k is not None:
        if comparison == 'less_than_equal':
            return stats.binom.cdf(k,n,p)
        
        elif comparison == 'greater_than_equal':
            return 1-stats.binom.cdf(k-1, n, p)
        
        elif comparison == 'greater_than':
            return 1-stats.binom.cdf(k, n, p)
        
        elif comparison == 'less_than':
            return stats.binom.cdf(k-1, n, p)  # P(X < k) is the same as P(X <= k-1)
        
        elif comparison == 'equal':
            return stats.binom.pmf(k,n,p)
        
        else:
            raise ValueError('Invalid comparison. \n Choose between: less_than_equal, greater_than_equal, greater_than, less_than, or equal')
    else:
        raise ValueError('Provide k value')

In [4]:
prob_func(n=100, p=0.6, k=50, comparison='less')

ValueError: Invalid comparison. 
 Choose between: less_than_equal, greater_than_equal, greater_than, less_than, or equal

In [6]:
round(prob_func(n=100, p=0.6, k=50, comparison='less_than'), 4)

0.0168

In [7]:
round(prob_func(n=100, p=0.6, k=70, comparison='greater_than'), 4)

0.0148

In [9]:
prob_func(n=100, p=0.6, k=20, comparison='less_than_equal')

3.420435841660837e-16

In [15]:
#using python
import scipy.stats as stats

n = 100
p = 0.3

#calculating the cumulative probability P(X <= 39)
cdf = stats.binom.cdf(39, n, p)

#the probability
prob_40 = (1 - cdf)*100

print(f'The probability of selling at least 40 cartons daily = {prob_40:.2f}%')

The probability of selling at least 40 cartons daily = 2.10%


In [21]:
prob_func(n, p, k=10, comparison='equal')

1.1704179678540452e-06