<a href="https://colab.research.google.com/github/berkayopak/Bernoulli_Binomial_Variance_Examples/blob/master/Bernoulli_binomial_variance_examples.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bernoulli distribution

The Bernoulli distribution is a discrete distribution with two possible outcomes, labelled n=0 and n=1, in which n=1 ("success") occurs with probability p and n=0 ("failure") occurs with probability q=1-p, where 0<p<1. Because the distribution is discrete, its probabilities are given by a probability mass function (some references call it a probability density function):

![alt text](http://mathworld.wolfram.com/images/equations/BernoulliDistribution/NumberedEquation1.gif)

which can also be written 

![alt text](http://mathworld.wolfram.com/images/equations/BernoulliDistribution/NumberedEquation2.gif)

The corresponding distribution function is 

![alt text](http://mathworld.wolfram.com/images/equations/BernoulliDistribution/NumberedEquation3.gif)
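
As a minimal sketch of the two functions above, assuming the outcome is encoded as the integer 0 or 1 (the names `bernoulli_pmf` and `bernoulli_cdf` are illustrative and are not used elsewhere in this notebook):

```python
def bernoulli_pmf(n, p):
    """Probability of outcome n: p**n * (1 - p)**(1 - n) for n in {0, 1}."""
    if n not in (0, 1):
        return 0.0  # any other outcome is impossible
    return p**n * (1 - p)**(1 - n)


def bernoulli_cdf(n, p):
    """Probability that the outcome is less than or equal to n."""
    if n < 0:
        return 0.0
    if n < 1:
        return 1 - p  # only the failure outcome 0 is included
    return 1.0        # both outcomes are included


print(bernoulli_pmf(0, 0.3), bernoulli_pmf(1, 0.3))
print(bernoulli_cdf(0, 0.3), bernoulli_cdf(1, 0.3))
```

The two mass values always add up to 1, which is why the distribution function reaches 1 at n=1.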

In [0]:
import operator as op
import functools


def ncr(n, r):
    """Return the binomial coefficient C(n, r) using integer arithmetic."""
    r = min(r, n-r)
    if r == 0: return 1
    numer = functools.reduce(op.mul, range(n, n-r, -1))
    denom = functools.reduce(op.mul, range(1, r+1))
    return numer//denom

def bernoulli_trial(n, k, p):
    """Return the probability of exactly k successes in n independent Bernoulli trials."""
    return ncr(n, k) * (p**k) * ((1 - p)**(n - k))


### Example
A coin is tossed 6 times. What is the probability that the result is exactly 4 heads and 2 tails? Both counts describe the same outcome, so the two calls below return the same value.

In [74]:
print("4 times head : ")
print(bernoulli_trial(6, 4, 0.5))
print("2 times tail : ")
print(bernoulli_trial(6, 2, 0.5))

4 times head : 
0.234375
2 times tail : 
0.234375
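
The value can be cross-checked by hand: there are C(6, 4) = 15 ways to place 4 heads among 6 tosses, and each particular sequence has probability 0.5^6 = 1/64, giving 15/64. A short sketch of that check, using `math.comb` (Python 3.8+) and `fractions.Fraction` from the standard library rather than the helper functions defined earlier:

```python
import math
from fractions import Fraction

ways = math.comb(6, 4)            # number of sequences with exactly 4 heads
prob = ways * Fraction(1, 2)**6   # each sequence has probability (1/2)^6

print(ways, prob, float(prob))
```

Working with exact fractions avoids any floating-point rounding in the intermediate steps.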


# Binomial distribution
A binomial distribution gives the probability of each possible number of SUCCESS outcomes in an experiment or survey that is repeated a fixed number of times, where every repetition ends in either SUCCESS or FAILURE. The binomial is a type of distribution built on two possible outcomes (the prefix “bi” means two, or twice). For example, a coin toss has only two possible outcomes, heads or tails, and taking a test could have two possible outcomes, pass or fail.

## Criteria

Binomial distributions must also meet the following three criteria:

   * The number of observations or trials is fixed. In other words, you can only figure out the probability of something happening if you do it a certain number of times. This is common sense: if you toss a coin once, your probability of getting tails is 50%. If you toss a coin 20 times, your probability of getting at least one tails is very, very close to 100%.
   * Each observation or trial is independent. In other words, none of your trials have an effect on the probability of the next trial.
   * The probability of success (tails, heads, fail or pass) is exactly the same from one trial to another.
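
To make the coin example in the first criterion concrete: assuming independent tosses with the same probability each time, the probability of at least one tails in n tosses is 1 - 0.5^n, the complement of getting heads every time. A small sketch (the helper name `p_at_least_one_tails` is hypothetical):

```python
def p_at_least_one_tails(n_tosses, p_tails=0.5):
    """Return 1 minus the probability that every toss comes up heads."""
    return 1 - (1 - p_tails)**n_tosses


print(p_at_least_one_tails(1))    # a single toss
print(p_at_least_one_tails(20))   # twenty tosses
```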


### Example
60% of people who purchase sports cars are men.  If 10 sports car owners are randomly selected, find the probability that exactly 7 are men.

In [75]:
n = 10      # number of trials (sports car owners selected)
p = 0.6     # probability of success (the owner is a man)
start = 7   # smallest number of successes to include
end = 7     # largest number of successes to include

def factorial(n):
    # Base case: 0! = 1! = 1
    if n <= 1:
        return 1
    else:
        return n*factorial(n-1)

def binomial(n,p,x):
    # Probability of exactly x successes in n trials
    return ((factorial(n)/(factorial(n-x)*factorial(x)))*(p**x)*((1-p)**(n-x)))

ans=0
for i in range(start,end+1): # end+1 so that end itself is included
    ans=ans+binomial(n,p,i)

print("Answer is",ans)

Answer is 0.21499084799999998
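
The `start` and `end` variables also allow a range of success counts to be summed. As a hedged sketch, here is the same calculation extended to "at least 7 men", a variation that the example itself does not pose. It uses `math.comb` from the standard library (Python 3.8+), and `binomial_pmf` is an illustrative name:

```python
import math


def binomial_pmf(n, p, x):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)


exactly_7 = binomial_pmf(10, 0.6, 7)
# Sum the probabilities for 7, 8, 9 and 10 successes
at_least_7 = sum(binomial_pmf(10, 0.6, x) for x in range(7, 11))

print("Exactly 7:", exactly_7)
print("At least 7:", at_least_7)
```

Because the counts 7 through 10 are mutually exclusive outcomes, their probabilities can simply be added.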


# Variance
The Variance is defined as the average of the squared differences from the Mean.

To calculate the variance follow these steps:
* Work out the Mean (the simple average of the numbers)
* Then for each number: subtract the Mean and square the result (the squared difference).
* Then work out the average of those squared differences.
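
The three steps above can be sketched directly in Python. The function name `variance_by_steps` and the sample numbers are made up for illustration:

```python
def variance_by_steps(numbers):
    # Step 1: work out the mean
    mean = sum(numbers) / len(numbers)
    # Step 2: subtract the mean from each number and square the result
    squared_diffs = [(x - mean)**2 for x in numbers]
    # Step 3: average the squared differences
    return sum(squared_diffs) / len(squared_diffs)


# Illustrative data: the mean is 5 and the squared differences sum to 32
print(variance_by_steps([2, 4, 4, 4, 5, 5, 7, 9]))
```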

## Why square the differences?
If we just add up the differences from the mean ... the negatives cancel the positives: 

![alt text](https://www.mathsisfun.com/data/images/standard-deviation-why-a.gif)
```(4 + 4 − 4 − 4) / 4 = 0```

So that won't work. How about we use absolute values?

![alt text](https://www.mathsisfun.com/data/images/standard-deviation-why-a.gif)
```(|4| + |4| + |−4| + |−4|) / 4 = (4 + 4 + 4 + 4) / 4 = 4```

That looks good (and is the Mean Deviation), but what about this case:

![alt text](https://www.mathsisfun.com/data/images/standard-deviation-why-b.gif)
``` (|7| + |1| + |−6| + |−2|) / 4 = (7 + 1 + 6 + 2) / 4 = 4```

Oh no! It also gives a value of 4, even though the differences are more spread out.

So let us try squaring each difference (and taking the square root at the end): 

![alt text](https://www.mathsisfun.com/data/images/standard-deviation-why-b.gif)
```√( (7^2 + 1^2 + 6^2 + 2^2) / 4) = √( 90 / 4 ) = 4.74...```

That is nice! For the first set of differences the same calculation gives √( 64 / 4 ) = 4, so the Standard Deviation is bigger when the differences are more spread out ... just what we want.

In fact this method is a similar idea to distance between points, just applied in a different way.

And it is easier to use algebra on squares and square roots than absolute values, which makes the standard deviation easy to use in other areas of mathematics.
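
The comparison above can be reproduced with a short sketch. The two lists hold the differences from the mean used in the figures, and the function names are illustrative:

```python
import math


def mean_deviation(diffs):
    """Average of the absolute differences."""
    return sum(abs(d) for d in diffs) / len(diffs)


def root_mean_square(diffs):
    """Square root of the average squared difference."""
    return math.sqrt(sum(d**2 for d in diffs) / len(diffs))


tight = [4, 4, -4, -4]     # differences in the first figure
spread = [7, 1, -6, -2]    # differences in the second figure

for diffs in (tight, spread):
    print(diffs, mean_deviation(diffs), round(root_mean_square(diffs), 2))
```

The mean deviation is 4 for both lists, while the root of the mean square rises from 4 to about 4.74 for the more spread-out list.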


### Example

In [0]:
from statistics import StatisticsError

def mean(data):
    """Return the arithmetic mean of data."""
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    total = sum(data)
    return total/n

def _ss(data):
    """Return sum of square deviations of sequence data."""
    c = mean(data)
    total = sum((x-c)**2 for x in data)
    # The deviations sum to zero in exact arithmetic; this term corrects
    # for any rounding error in the computed mean
    total2 = sum((x-c) for x in data)
    total -=  total2**2/len(data)
    assert not total < 0, 'negative sum of square deviations: %f' % total
    return total

def variance(data):
    """Return the population variance of data (divides by n, not n-1)."""
    n = len(data)
    if n < 2:
        raise StatisticsError('variance requires at least two data points')
    ss = _ss(data)
    return ss/n

You and your friends have just measured the heights of your dogs (in millimeters):
![alt text](https://www.mathsisfun.com/data/images/statistics-dogs-graph.gif)

The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.

Find out the Mean, the Variance, and the Standard Deviation.

Your first step is to find the Mean:

Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394

so the mean (average) height is 394 mm. Let's plot this on the chart:

![alt text](https://www.mathsisfun.com/data/images/statistics-dogs-mean.gif)

Now we calculate each dog's difference from the Mean:

![alt text](https://www.mathsisfun.com/data/images/statistics-dogs-deviation.gif)

To calculate the Variance, take each difference, square it, and then average the result:


```
# Variance
σ^2 = (206^2 + 76^2 + (−224)^2 + 36^2 + (−94)^2) / 5
    = (42436 + 5776 + 50176 + 1296 + 8836) / 5
    = 108520 / 5
    = 21704
```



In [83]:
data = [600, 470, 170, 430, 300]
print("So the Variance is " + str(variance(data)))

So the Variance is 21704.0
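
The exercise also asks for the Standard Deviation, which is the square root of the Variance. A minimal sketch using `statistics.pvariance` and `statistics.pstdev` from the standard library; these are the population versions, which divide by n as the `variance` function above does:

```python
import statistics

heights = [600, 470, 170, 430, 300]   # dog heights in millimeters

pop_variance = statistics.pvariance(heights)
pop_std = statistics.pstdev(heights)  # square root of the population variance

print("Variance:", pop_variance)
print("Standard Deviation:", round(pop_std, 2))
```

The Standard Deviation comes out at roughly 147 mm, in the same units as the heights themselves.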
