In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

## Weighted averages, in the context you are probably most familiar with them

As students, you are probably familiar with weighted averages as they apply to course grades. For example, a syllabus might say something like: "Homework will be 20% of your grade, the two short mid-term exams will be 20% each, and the final exam will be 40%".

### Question:  

1.1  Write down the formula for grades that corresponds to the sentence above.

## Summary data and weighted averages

Now we are going to work through an exercise that shows another context in which weighted averages occur.  



In [None]:
def rollD6(nTimes):
    # Roll a fair six-sided die nTimes: randint(6) gives 0..5, so add 1 to get faces 1..6
    return np.random.randint(6, size=nTimes) + 1
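The cell above uses the legacy `np.random.randint`. A minimal sketch of the same roll using NumPy's `Generator` interface, assuming you want reproducible results: seeding the generator makes every run produce the same rolls. The function name `roll_d6_seeded` and the seed value 42 are illustrative choices, not part of the original notebook.

```python
import numpy as np

def roll_d6_seeded(n_times, seed=None):
    """Roll a fair six-sided die n_times; returns integers in 1..6."""
    rng = np.random.default_rng(seed)
    # integers(low, high) excludes high, so (1, 7) yields faces 1 through 6
    return rng.integers(1, 7, size=n_times)

rolls_a = roll_d6_seeded(60, seed=42)
rolls_b = roll_d6_seeded(60, seed=42)
print(np.array_equal(rolls_a, rolls_b))  # same seed, same rolls
```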

In [None]:
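# Roll the die 60 times.
diceRolls = rollD6(60)

# Count how many times each face occurred: values[k] is the number of rolls equal to k.
# minlength=7 guarantees one slot for each face 0..6 (index 0 is always 0).
values = np.bincount(diceRolls, minlength=7)
# The face value for each slot. Note the naming: in the weighted-mean formula below,
# the counts in `values` play the role of the weights w_i and these faces are the x_i.
weights = np.arange(7)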
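`np.bincount` returns an array whose entry k is the number of times the integer k appears. A small example with hand-picked rolls (made-up values, chosen so that no 6 appears) shows why `minlength=7` is useful: without it, the output is only as long as the largest value seen plus one.

```python
import numpy as np

rolls = np.array([1, 3, 3, 5])  # hypothetical rolls; no 6 in this sample

short = np.bincount(rolls)              # [0 1 0 2 0 1], only 6 entries
counts = np.bincount(rolls, minlength=7)  # [0 1 0 2 0 1 0], one slot per face 0..6
faces = np.arange(7)                    # index k lines up with face value k

print(short)
print(counts)
print(counts.sum())  # total number of rolls
```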

In [None]:
print(diceRolls)
print(values, weights)

### Now let's write down the equation for the mean of the data in two different ways

#### Using the individual rolls

It would look something like 

(4 + 4 + 4 + 4 + 6 + 1 + ... + 6 + 3) / 60

#### Using the bin counts

It would look something like

((10 * 1) + (4 * 2) + (9 * 3) + (16 * 4) + (11 * 5) + (10 * 6)) / 60
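Carrying that arithmetic through for the example counts:

$$\bar{x} = \frac{(10 \cdot 1) + (4 \cdot 2) + (9 \cdot 3) + (16 \cdot 4) + (11 \cdot 5) + (10 \cdot 6)}{60} = \frac{10 + 8 + 27 + 64 + 55 + 60}{60} = \frac{224}{60} \approx 3.73$$

The denominator, 60, is also the sum of the bin counts, $10 + 4 + 9 + 16 + 11 + 10$.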

### Formulas

mean = $\frac{\sum_i x_i}{n}$

weighted mean = $\frac{\sum_i w_i x_i}{\sum_i w_i}$
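A quick numerical check of the two formulas, using the example bin counts from the bin-count equation above (10, 4, 9, 16, 11, 10 for faces 1 through 6). `np.average` accepts a `weights` argument and computes $\frac{\sum_i w_i x_i}{\sum_i w_i}$; `np.repeat` rebuilds a list of individual rolls from the counts so the plain mean can be compared directly. The rebuilt rolls come out sorted rather than in their original order, which does not affect the mean.

```python
import numpy as np

faces = np.arange(1, 7)                    # x_i: the possible die values
counts = np.array([10, 4, 9, 16, 11, 10])  # w_i: how many times each value came up

# Weighted mean: sum(w_i * x_i) / sum(w_i)
weighted = np.average(faces, weights=counts)

# Rebuild the individual rolls (sorted) and take the ordinary mean
rolls = np.repeat(faces, counts)
plain = np.mean(rolls)

print(len(rolls), weighted, plain)  # 60 rolls; both means equal 224/60
```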

#### Let's compute both of those using numpy and compare them to the numpy.mean() function

In [None]:
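mean_v1 = np.sum(diceRolls) / len(diceRolls)       # plain mean over the individual rolls
mean_v2 = np.sum(values*weights) / len(diceRolls)  # weighted mean; the counts sum to len(diceRolls)
mean_check = np.mean(diceRolls)                    # NumPy's built-in mean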

In [None]:
print("V1:    ", mean_v1)
print("V2:    ", mean_v2)
print("Check: ", mean_check)

#### Pro-tip: element-wise array multiplication in numpy

`values*weights` multiplies each element of `values` by the corresponding element of `weights` (element-wise, not a matrix product). It is equivalent to

    n = len(values)
    outArray = np.zeros(n)
    for i in range(n):
        outArray[i] = values[i] * weights[i]
        
Or, written mathematically:

$\mathbf{v} = \mathbf{x}\mathbf{w}$ is equivalent to $v_i = x_i w_i$ for each element $i$; we use **bold** to indicate arrays.
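Summing the element-wise product is the same as taking a dot product, which is why the numerator of the weighted mean, $\sum_i w_i x_i$, can also be computed with `np.dot`. A small sketch with made-up arrays `x` and `w` compares the explicit loop, the vectorized `*`, and `np.dot`:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # hypothetical values
w = np.array([4.0, 5.0, 6.0])  # hypothetical weights

# Element-wise product with an explicit loop
v_loop = np.zeros(len(x))
for i in range(len(x)):
    v_loop[i] = x[i] * w[i]

v_vec = x * w         # same thing, vectorized: [ 4. 10. 18.]
total = np.dot(x, w)  # sum of the element-wise products: 32.0

print(np.array_equal(v_loop, v_vec), total, np.sum(v_vec))
```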
        

In [None]:
values*weights

### When summary data "loses information"

Now, instead of rolling a die, let's draw 60 real numbers uniformly at random between 0.5 and 6.5 and use a histogram with six unit-width bins to summarize them.

The "a.u." on the axis labels stands for "arbitrary units".

In [None]:
dataSample = np.random.uniform(low=0.5, high=6.5, size=60)
hist = plt.hist(dataSample, bins=np.linspace(0.5, 6.5, 7))
_ = plt.xlabel("Value [a.u.]")
_ = plt.ylabel("Trials [a.u.]")


In [None]:
# plt.hist returns a tuple (counts, bin_edges, patches); grab the counts and the bin edges
values = hist[0]
edges = hist[1]
centers = (edges[0:-1] + edges[1:])/2.

print("Average bin content:  ", np.mean(values))
print("Average value:        ", np.mean(dataSample))
print("Average binned value: ", np.sum(values*centers) / len(dataSample))
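To experiment with binnings without redrawing the plot, `np.histogram` returns the same counts and edges as `plt.hist`, without the drawing. A minimal sketch, where the helper name `binned_mean` and the seed are hypothetical choices: it bins a uniform sample like the one above with different numbers of bins and compares each binned average to the exact mean.

```python
import numpy as np

def binned_mean(data, n_bins, low=0.5, high=6.5):
    """Mean computed only from histogram bin counts and bin centers."""
    counts, edges = np.histogram(data, bins=np.linspace(low, high, n_bins + 1))
    centers = (edges[:-1] + edges[1:]) / 2.0
    return np.sum(counts * centers) / np.sum(counts)

rng = np.random.default_rng(0)
sample = rng.uniform(low=0.5, high=6.5, size=60)

print("exact mean:", np.mean(sample))
for n_bins in (1, 3, 6, 30):
    print(n_bins, "bins:", binned_mean(sample, n_bins))
```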

# Questions for discussion

1.1 Explain, in your own words, the difference between the three values computed in the previous cell.  

1.2 How would these numbers change if you changed the bin size when histogramming the data?

In [None]:
# Use this cell to try out different binnings for summarizing the data

# Questions for discussion

2.1 Under what circumstances might it be easier to use weights to compute a mean than to use the individual values?

2.2 In many cases the data might be presented already summarized, or binned into a histogram.  Can you think of some examples in real-world data when this might be the case?  List a few.

2.3 Oftentimes the way we collect data involves some averaging or sampling, so that we are effectively making a histogram as we collect the data. An example of this might be an X-ray detector that counts how many X-rays it sees in each second for 6 seconds, then sends out only the number of X-rays it saw in each second (i.e., it sends out 6 numbers). Explain how this corresponds to the example above. What does the total number of X-rays seen correspond to in our earlier example? How about the rate of X-rays?
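A hypothetical simulation of the detector in 2.3 may help when thinking about the question. It assumes X-rays arrive at random (a Poisson process) at an average of 50 per second; the rate of 50 and the seed are made-up values, not properties of any real detector. The detector reports only the six per-second counts, yet the total and the per-second average can still be computed from them.

```python
import numpy as np

rng = np.random.default_rng(1)
true_rate = 50.0  # hypothetical average number of X-rays per second
n_seconds = 6

# What the detector sends out: one count per one-second interval
counts_per_second = rng.poisson(lam=true_rate, size=n_seconds)

total_seen = np.sum(counts_per_second)
average_per_second = total_seen / n_seconds

print(counts_per_second)
print("total:", total_seen, " average per second:", average_per_second)
```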