# VARIANCE

## Problem 1

Suppose you have 3 jars with the following distributions:

- Jar 1 --> 4 red balls and 7 white balls. It has 33% probability of selection.
- Jar 2 --> 16 red balls and 3 white balls. It has 28% probability of selection.
- Jar 3 --> 11 red balls and 9 white balls. It has 39% probability of selection.

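One jar is selected according to these probabilities, and a random sample of size n = 6 is drawn from it (with replacement, so the number of red balls is binomial).  
X represents the number of red balls in the sample.  
Calculate the variance of the random variable X, Var[X].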
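
The calculation below appears to rest on the law of total variance for a finite mixture. As a sketch, writing $w_i$ for the probability of selecting jar $W_i$ (a symbol introduced here for illustration, not taken from the library), the decomposition into a within-group and a between-group term is:

$$
\operatorname{Var}[X] \;=\; \underbrace{\sum_i w_i \operatorname{Var}[X \mid W_i]}_{\text{within-group}} \;+\; \underbrace{\sum_i w_i \operatorname{E}[X \mid W_i]^2 \;-\; \Big(\sum_i w_i \operatorname{E}[X \mid W_i]\Big)^2}_{\text{between-group}}
$$

For a binomial component with parameters $n$ and $p_i$, $\operatorname{E}[X \mid W_i] = n p_i$ and $\operatorname{Var}[X \mid W_i] = n p_i (1 - p_i)$.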

In [1]:
from neoBayesian.tools.variance import varianceWithFormula, varianceBruteForce
from neoBayesian.models.discrete import BinomialDist

In [2]:
sample = 6

jar1proba = 4/11
jar2proba = 16/19
jar3proba = 11/20

jar1weight = 0.33
jar2weight = 0.28
jar3weight = 0.39

# The second argument, k, is not used when computing the variance
# or the expected value, so any value can be passed for it.
jar1var = BinomialDist(sample, 0, jar1proba, "var")
jar2var = BinomialDist(sample, 0, jar2proba, "var")
jar3var = BinomialDist(sample, 0, jar3proba, "var")

jar1exp = BinomialDist(sample, 0, jar1proba, "exp")
jar2exp = BinomialDist(sample, 0, jar2proba, "exp")
jar3exp = BinomialDist(sample, 0, jar3proba, "exp")

In [3]:
varianceWithFormula(
    [jar1var, jar2var, jar3var],
    [jar1exp, jar2exp, jar3exp],
    [jar1weight, jar2weight, jar3weight]
)

Within variance values:
--> (1.388 * 0.33) + (0.798 * 0.28) + (1.485 * 0.39)
--> Var[X=x|W_n] * W_n: 0.458 + 0.223 + 0.579

Between expected values:
--> (2.182 * 0.33) + (5.053 * 0.28) + (3.3 * 0.39)
--> E[X=x|W_n] * W_n: 0.72 + 1.415 + 1.287

E[X]:  3.422
E[X]^2:  11.708
E[X^2]:  12.966
---------------------------------------

Within-group variance: 1.26071
Between-group variance: 1.25787
VARIANCE: 2.51858


2.5185814381767857
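
The same decomposition can be cross-checked without the library. The following is a minimal sketch in plain Python, assuming each jar is sampled with replacement so that the conditional distributions are binomial; `mixture_variance` is a hypothetical helper written for this illustration and is not part of `neoBayesian`.

```python
def mixture_variance(variances, means, weights):
    """Return (within, between, total) variance of a finite mixture.

    Hypothetical helper for illustration; not part of neoBayesian.
    """
    # Within-group term: weighted average of the conditional variances.
    within = sum(w * v for v, w in zip(variances, weights))
    # Overall mean: weighted average of the conditional means.
    mean = sum(w * m for m, w in zip(means, weights))
    # Between-group term: weighted variance of the conditional means.
    between = sum(w * (m - mean) ** 2 for m, w in zip(means, weights))
    return within, between, within + between


n = 6
probs = [4 / 11, 16 / 19, 11 / 20]   # Pr(red) in jars 1, 2, 3
weights = [0.33, 0.28, 0.39]         # selection probabilities

# Binomial moments: E = n*p, Var = n*p*(1-p)
means = [n * p for p in probs]
variances = [n * p * (1 - p) for p in probs]

within, between, total = mixture_variance(variances, means, weights)
print(round(within, 5), round(between, 5), round(total, 5))
```

The three printed values should agree with the within-group, between-group, and total variance reported above. Note that the line labelled `E[X^2]` in that output (12.966) equals the weighted mean of the squared conditional means, which is the quantity the between-group term needs; it is not the second moment of X itself.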

In [4]:
# The brute-force method is implemented only for the binomial distribution.
# Both functions produce the same result.
varianceBruteForce(
    sample,
    [
        (jar1proba, jar1weight),
        (jar2proba, jar2weight),
        (jar3proba, jar3weight),
    ]
)

Pr(X = k:0) -->  (0.066 * 0.33) + (0.0 * 0.28) + (0.008 * 0.39)
Pr(X = k:1) -->  (0.228 * 0.33) + (0.0 * 0.28) + (0.061 * 0.39)
Pr(X = k:2) -->  (0.325 * 0.33) + (0.007 * 0.28) + (0.186 * 0.39)
Pr(X = k:3) -->  (0.248 * 0.33) + (0.047 * 0.28) + (0.303 * 0.39)
Pr(X = k:4) -->  (0.106 * 0.33) + (0.188 * 0.28) + (0.278 * 0.39)
Pr(X = k:5) -->  (0.024 * 0.33) + (0.401 * 0.28) + (0.136 * 0.39)
Pr(X = k:6) -->  (0.002 * 0.33) + (0.357 * 0.28) + (0.028 * 0.39)

--------------
E[X]:  3.422
E[X]^2:  11.708
E[X^2]:  14.227

Var[X]:  2.519


2.5185814381767795
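
For readers without the library, the brute-force idea can be sketched directly: build the mixture probability Pr(X = k) for every k from 0 to n, then take Var[X] = E[X^2] - E[X]^2. This is an illustrative reimplementation under the same binomial assumption, not the library's own code; `binomial_pmf` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(n, k, p):
    """Pr(K = k) for a Binomial(n, p) variable (illustrative helper)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n = 6
# (probability of red, probability of selecting the jar)
jars = [(4 / 11, 0.33), (16 / 19, 0.28), (11 / 20, 0.39)]

# Mixture pmf: Pr(X = k) = sum_i w_i * Pr(X = k | jar i)
pmf = [sum(w * binomial_pmf(n, k, p) for p, w in jars) for k in range(n + 1)]

mean = sum(k * q for k, q in enumerate(pmf))               # E[X]
second_moment = sum(k * k * q for k, q in enumerate(pmf))  # E[X^2]
variance = second_moment - mean ** 2                       # Var[X]

print(round(mean, 3), round(second_moment, 3), round(variance, 3))
```

Here `second_moment` is the true second moment of X, which is why it corresponds to the 14.227 printed by the brute-force output.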

## Problem 2

A factory produces 3 types of gadgets. The mean number of defective products per day for each type is:

- type 1 --> 6.4
- type 2 --> 14.3
- type 3 --> 24.2

A large sample of gadgets is taken with 31% of type 1, 42% of type 2, and 27% of type 3.   
Random variable X represents the number of defective products observed. Calculate Var[X].

In [5]:
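# A Poisson distribution is used to model the number of defective gadgets.
# For a Poisson variable, lambda = expected value = variance,
# so the same list is passed for both the variances and the expected values.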
varianceWithFormula(
    [6.4, 14.3, 24.2],
    [6.4, 14.3, 24.2],
    [0.31, 0.42, 0.27]
)

Within variance values:
--> (6.4 * 0.31) + (14.3 * 0.42) + (24.2 * 0.27)
--> Var[X=x|W_n] * W_n: 1.984 + 6.006 + 6.534

Between expected values:
--> (6.4 * 0.31) + (14.3 * 0.42) + (24.2 * 0.27)
--> E[X=x|W_n] * W_n: 1.984 + 6.006 + 6.534

E[X]:  14.524
E[X]^2:  210.947
E[X^2]:  256.706
---------------------------------------

Within-group variance: 14.524
Between-group variance: 45.75962
VARIANCE: 60.28362


60.283624
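
Because a Poisson variable has mean and variance both equal to lambda, the decomposition simplifies for this problem: the within-group term is the weighted mean of the lambdas and the between-group term is their weighted variance. The sketch below checks this in plain Python, assuming the Poisson model stated in the cell above; the variable names are chosen for illustration only.

```python
lambdas = [6.4, 14.3, 24.2]    # mean defects per day for types 1, 2, 3
weights = [0.31, 0.42, 0.27]   # share of each type in the sample

# Within-group variance: E[Var[X | type]] = E[lambda]
mean_lambda = sum(w * lam for lam, w in zip(lambdas, weights))

# Between-group variance: Var[E[X | type]] = Var[lambda]
var_lambda = sum(w * (lam - mean_lambda) ** 2 for lam, w in zip(lambdas, weights))

total_variance = mean_lambda + var_lambda
print(round(mean_lambda, 5), round(var_lambda, 5), round(total_variance, 5))
```

The between-group term dominates here because the three defect rates are far apart relative to their average.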