## Part 1: Sums of uncorrelated random variables

For uncorrelated random variables, the variance of the sum equals the sum of the variances.

In [1]:
from numpy.random import randn
import numpy as np
n = 1000000
x1 = np.sqrt(9) * randn(n) # 1M samples from normal distribution with variance=9
print(x1.var()) # 9
x2 = np.sqrt(16) * randn(n) # 1M samples from normal distribution with variance=16
print(x2.var()) # 16
xp = x1 + x2
print(xp.var()) # 25

8.997334037920371
16.00182243912966
24.9763400156124


![title](img/formula1.png)
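The rule holds only when the variables are uncorrelated. In general, Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), so the covariance term must vanish for the simple sum to apply. The sketch below is one way to check this. The seed, the variable `x3`, and its 0.5 mixing coefficient are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n = 1_000_000

x1 = 3.0 * rng.standard_normal(n)  # variance 9
x2 = 4.0 * rng.standard_normal(n)  # variance 16, drawn independently of x1

# Uncorrelated case: the sample covariance is close to zero,
# so the variance of the sum is close to 9 + 16 = 25.
cov_12 = np.cov(x1, x2)[0, 1]
var_sum_uncorrelated = (x1 + x2).var()

# Correlated case (hypothetical): x3 contains a scaled copy of x1.
x3 = 0.5 * x1 + 4.0 * rng.standard_normal(n)
cov_13 = np.cov(x1, x3)[0, 1]
var_sum_correlated = (x1 + x3).var()

# General formula, including the covariance term.
predicted_correlated = x1.var() + x3.var() + 2 * cov_13

print(cov_12, var_sum_uncorrelated)
print(var_sum_correlated, predicted_correlated)
```

In the correlated case, simply adding the two variances would underestimate the variance of the sum, because the positive covariance term is left out.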

## Part 2: Weighted sums of uncorrelated random variables: Applications to machine learning and scientific meta-analysis

![title](img/formula2.png)

In [2]:
from numpy.random import randn
import numpy as np
n = 1000000
baseline_var = 10
w = 0.7
x1 = np.sqrt(baseline_var) * randn(n) # Array of 1M samples from normal distribution with variance=10
print(x1.var()) # 10
xp = w * x1 # Scale this by w=0.7
print(w**2 * baseline_var) # 4.9 (predicted variance)
print(xp.var()) # 4.9 (empirical variance) 

10.006355951074033
4.8999999999999995
4.903114416026275


![title](img/formula3.png)
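Combining the scaling rule with the sum rule gives the variance of a weighted sum of two uncorrelated variables: each variance is multiplied by the square of its weight before adding. The following is a minimal sketch of that check. The variances (10 and 20), the weights (0.7 and 0.3), and the seed are assumed values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility
n = 1_000_000

var1, var2 = 10.0, 20.0  # assumed variances of the two inputs
w1, w2 = 0.7, 0.3        # assumed weights

x1 = np.sqrt(var1) * rng.standard_normal(n)
x2 = np.sqrt(var2) * rng.standard_normal(n)  # independent of x1

# Predicted variance: squared weights times the component variances.
predicted = w1**2 * var1 + w2**2 * var2

# Empirical variance of the weighted sum.
empirical = (w1 * x1 + w2 * x2).var()

print(predicted)
print(empirical)
```

With these assumed values the predicted variance is 0.49 × 10 + 0.09 × 20 = 6.7, and the empirical value should land close to it.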

### An ensemble model with equal weights

Imagine that you have built two separate models to predict car prices. Both models are unbiased, but their errors have variance: sometimes a prediction is too high, and sometimes it is too low. Model 1 has a mean squared error (MSE) of 1,000 and Model 2 has an MSE of 2,000. Because the models are unbiased, each MSE equals the variance of that model's errors.

A valuable insight from machine learning is that you can often create a better model by simply averaging the predictions of other models. Let’s demonstrate this with simulations below.



In [3]:
from numpy.random import randn
import numpy as np
n = 1000000
actual = 20000 + 5000 * randn(n)
errors1 = np.sqrt(1000) * randn(n)
print(errors1.var()) # 1000
errors2 = np.sqrt(2000) * randn(n)
print(errors2.var()) # 2000

# Note: the four lines below are equivalent to
# errors_ensemble = 0.5 * errors1 + 0.5 * errors2
# because `actual` cancels out when it is subtracted again.
preds1 = actual + errors1
preds2 = actual + errors2
preds_ensemble = 0.5 * preds1 + 0.5 * preds2
errors_ensemble = preds_ensemble - actual

print(errors_ensemble.var()) # 750. Lower than variance of component models!

1000.1305790375542
2002.9286008216736
751.6656691755195
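The value of roughly 750 follows directly from the weighted-sum rule. As a worked equation, writing e_1 and e_2 for the two models' errors (symbols introduced here for illustration) and assuming the errors are uncorrelated, as they are in this simulation:

```latex
\begin{aligned}
\operatorname{Var}(0.5\,e_1 + 0.5\,e_2)
  &= 0.5^2 \operatorname{Var}(e_1) + 0.5^2 \operatorname{Var}(e_2) \\
  &= 0.25 \cdot 1000 + 0.25 \cdot 2000 \\
  &= 750
\end{aligned}
```

The result is lower than either component's error variance. If the errors of the two models were positively correlated, a covariance term would be added and the improvement would be smaller.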
