# Lecture 9: Averages

In [1]:
import numpy as np

from helper_functions import nCr

## Part 1. Definition

When talking about averages, there are several different ways to compute one:

1. Mean
2. Median
3. Mode
4. Weighted averages
5. Expectation values
6. Etc.

For us, the average is defined in the discrete case as:

$$E(X) = \frac{1}{n}\sum_{i=1}^{n}x_i$$

Equivalently, we can sum over the distinct values $x$, weighting each by its probability:

$$E(X) = \sum_{x} x\,P(X = x)$$

The first formula is the familiar one. The second takes a little more thought: instead of adding up every observation, it adds up each distinct value multiplied by the fraction of the time it occurs. Let's create an array that we can take the average of and implement both formulas as functions.

In [2]:
a = np.array([1, 1, 1, 1, 2, 2, 3, 3, 3, 4])

a

array([1, 1, 1, 1, 2, 2, 3, 3, 3, 4])

In [3]:
def mean(a):
    array_sum = 0
    
    for i in a:
        array_sum+=i
    
    return array_sum/len(a)

In [4]:
def probability_mean(a):
    
    unique_vals = np.unique(a)
    
    mean_val = 0
    
    for i in unique_vals:
        prob_i = len(a[a==i])/len(a)
        
        mean_val+=i*prob_i
    
    return mean_val

Now that we have defined our functions, let's try them out and see what happens.

In [5]:
a.mean(), mean(a), probability_mean(a)

(2.1, 2.1, 2.1)

The last function might seem a little odd, but it is simply a weighted average over the unique values. Each value's weight is the fraction of the array it makes up, i.e. how frequently it appears.
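The same weighted average can be written with NumPy built-ins. This is a minimal sketch, assuming `np.unique(..., return_counts=True)` to get each distinct value and its frequency and `np.average(..., weights=...)` to combine them; it should agree with `probability_mean` above.

```python
import numpy as np

a = np.array([1, 1, 1, 1, 2, 2, 3, 3, 3, 4])

# Distinct values and how many times each one appears
values, counts = np.unique(a, return_counts=True)

# Empirical probability of each distinct value
probs = counts / counts.sum()

# Weighted average: sum of value * probability over the distinct values
weighted_mean = np.average(values, weights=probs)

print(values, counts)  # [1 2 3 4] [4 2 3 1]
print(weighted_mean)
```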

We can now apply this definition to various distributions to see what their expectation values should be.

## Part 2. Expectation of a Bernoulli trial

Let $X \sim \text{Bernoulli}(p)$. We can use our definition of the average above to get the following:

$$E(X) = 1 \cdot P(X=1) + 0 \cdot P(X=0)$$

$$\therefore E(X) = p$$

This is really obvious, but it is worth stating because it is an instance of the `fundamental bridge`: the probability of an event equals the expected value of its indicator random variable.

It also matches intuition: if you drew from a Bernoulli distribution many times, the average of the draws (the fraction of successes) would approach `p`.
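A quick simulation illustrates this. It is a sketch under assumed values: `p = 0.3` and the sample size are arbitrary choices, and each Bernoulli draw is generated by comparing a uniform random number with `p` using NumPy's `default_rng`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is reproducible
p = 0.3                              # illustrative success probability (assumed)
n_draws = 100_000

# A Bernoulli(p) draw is 1 with probability p and 0 otherwise
draws = (rng.random(n_draws) < p).astype(int)

# The sample mean is the fraction of successes, which should be close to p
sample_mean = draws.mean()
print(sample_mean)
```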

## Part 3. Expectation of the binomial distribution

Now let $X \sim Binomial(n, p)$.

$$E(X) = \sum_{k=0}^{n}k {n \choose k} p^k (1-p)^{n-k}$$

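Note that $k {n \choose k} = n {n-1 \choose k-1}$. Why? Because of the story "choosing a committee of $k$ people and then choosing its president is the same as choosing the president first and then choosing the other $k-1$ committee members from the remaining $n-1$ people". The left side picks the committee in ${n \choose k}$ ways and the president in $k$ ways; the right side picks the president in $n$ ways and the rest in ${n-1 \choose k-1}$ ways.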
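The committee identity can be checked exactly with integer arithmetic. This sketch uses Python's `math.comb` rather than the notebook's own `nCr` helper; `identity_holds` is a hypothetical name for illustration.

```python
from math import comb

def identity_holds(n):
    # Check k * C(n, k) == n * C(n-1, k-1) for every k from 1 to n
    return all(k * comb(n, k) == n * comb(n - 1, k - 1) for k in range(1, n + 1))

print(all(identity_holds(n) for n in range(1, 20)))  # True
```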

$$E(X) = \sum_{k=1}^{n} n {n-1 \choose k-1} p^k (1-p)^{n-k}$$

The $k = 0$ term is zero, so the sum can start at $k = 1$.

Pulling out $n$ and one factor of $p$, and substituting $j = k - 1$, leaves a sum that equals 1 by the binomial theorem (see Part 3.5), so the expectation is `np`:

$$E(X) = np\sum_{j=0}^{n-1} {n-1 \choose j} p^j (1-p)^{n-1-j} = np$$

This also makes sense by linearity: a binomial random variable is the sum of $n$ independent Bernoulli trials, each with expectation `p`, so its expectation is $n$ times `p`.

See Lecture 10 for a proof of linearity.
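Both views of the binomial expectation can be compared numerically. This is a sketch with assumed parameters `n = 10`, `p = 0.3`: it sums $k \cdot P(X = k)$ directly from the PMF (using `math.comb` instead of the notebook's `nCr`), and separately simulates each binomial draw as the sum of $n$ Bernoulli draws.

```python
from math import comb

import numpy as np

n, p = 10, 0.3  # illustrative parameters (assumed)

# Expectation straight from the PMF: sum of k * P(X = k)
pmf_mean = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

# Linearity view: a Binomial(n, p) draw is the sum of n Bernoulli(p) draws
rng = np.random.default_rng(seed=1)
bernoulli = rng.random((100_000, n)) < p  # each row holds n independent trials
sim_mean = bernoulli.sum(axis=1).mean()

print(pmf_mean, n * p, sim_mean)
```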

### Part 3.5. The binomial theorem

The binomial theorem states that for any $a$, $b$ and non-negative integer $n$:

$$(a+b)^n = \sum_{k = 0}^{n} {n \choose k} a^k b^{n-k}$$

What if $a + b = 1$? Writing $a = p$ and $b = 1 - p$, we get

$$(a + b)^n = (p + (1-p))^n = 1^n = 1$$

$$\therefore \sum_{k = 0}^{n} {n \choose k} p^k (1-p)^{n-k} = 1$$

This holds because the term inside the brackets, $p + (1-p)$, always equals 1, and 1 raised to any power is 1.

We can also check this numerically:

In [6]:
# Parameters: number of trials n and success probability p
n = 10
p = 0.5

tot = sum([nCr(n, k) * p**(k) * (1-p)**(n-k) for k in range(0, n+1)])

tot

1.0

No matter what values you choose for `n` and `p`, the result will be 1 (up to floating-point rounding).
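To try several parameter choices at once, here is a sketch that uses `math.comb` in place of the notebook's `nCr` helper; `binomial_total` is a hypothetical name and the `(n, p)` pairs are arbitrary.

```python
from math import comb

def binomial_total(n, p):
    # Sum the Binomial(n, p) PMF over every possible outcome k = 0..n
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

for n, p in [(1, 0.5), (5, 0.1), (20, 0.73), (50, 0.99)]:
    print(n, p, binomial_total(n, p))
```

Each printed total should be 1, possibly off in the last few decimal places because of floating-point rounding.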

This also makes sense from the story:

"The probabilities of all possible outcomes of flipping a coin n times must sum to one". That is exactly what a PMF requires!

## Part 4. Geometric distribution

The geometric distribution is also built from Bernoulli trials, with the story "how many failures occur before the first success?"

PMF:

$$P(X = k) = q^kp = (1-p)^kp$$

i.e. "The probability of exactly k failures before the first success is the probability of k failures in a row followed by one success." Makes sense.

This means that the expectation value is:

$$E(X) = \sum_{k=0}^{\infty}kq^kp = p\sum_{k=0}^{\infty}kq^k$$

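Differentiating the geometric series $\sum_{k=0}^{\infty}q^k = \frac{1}{1-q}$ (valid for $0 \le q < 1$) with respect to $q$ gives $\sum_{k=1}^{\infty}kq^{k-1} = \frac{1}{(1-q)^2}$. Multiplying both sides by $q$ (and noting that the $k = 0$ term is zero) gives $\sum_{k=0}^{\infty}kq^{k} = \frac{q}{(1-q)^2}$.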

Substituting this into the expression for $E(X)$, and using $1 - q = p$, gives

$$E(X) = p\sum_{k=0}^{\infty}kq^k = p \cdot \frac{q}{(1-q)^2} = \frac{pq}{p^2} = \frac{q}{p}$$
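The series identity $\sum_{k=0}^{\infty}kq^k = \frac{q}{(1-q)^2}$ used in this derivation can be sanity-checked by truncating the infinite sum. This is a sketch with an assumed `q = 0.6` and a cutoff of 200 terms, chosen because $q^{200}$ is far below double-precision resolution here.

```python
q = 0.6      # illustrative value (assumed); the series needs 0 <= q < 1
terms = 200  # truncation point; the neglected tail is negligible for this q

partial_sum = sum(k * q**k for k in range(terms))
closed_form = q / (1 - q)**2

print(partial_sum, closed_form)
```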

i.e. "The expectation is the ratio of the failure probability to the success probability." If we instead counted the number of successes before the first failure, the roles of `p` and `q` would swap and the expectation would be $\frac{p}{q}$; the two agree only when $p = q = \frac{1}{2}$.

In [1]:
def geom(p, k):
    """Geometric PMF: probability of exactly k failures before the first success."""
    q = (1-p)
    return q**k * p
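As a check on $E(X) = \frac{q}{p}$, here is a sketch that computes the expectation from a truncated PMF sum and from simulation, assuming `p = 0.25`. The function `geom_pmf` is a hypothetical, self-contained copy of the PMF above. NumPy's `Generator.geometric` counts trials up to and including the first success (support 1, 2, ...), so 1 is subtracted from each draw to count failures instead.

```python
import numpy as np

def geom_pmf(p, k):
    # P(X = k): k failures followed by one success
    return (1 - p)**k * p

p = 0.25  # illustrative success probability (assumed)
q = 1 - p

# Expectation from a truncated sum of k * P(X = k); the tail beyond k = 500 is negligible for this p
series_mean = sum(k * geom_pmf(p, k) for k in range(500))

# NumPy counts trials until the first success, so subtract 1 to count failures
rng = np.random.default_rng(seed=2)
failures = rng.geometric(p, size=100_000) - 1

print(series_mean, q / p, failures.mean())
```

Both estimates should be close to $q/p = 3$; the simulated mean fluctuates slightly with the seed.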