# Probability
## Generating independent and dependent random variables
In this assignment we will generate independent and dependent random variables in Python to understand these notions more deeply. The assignment is graded partly automatically (PMF calculations) and partly by peer review. Please submit it for automatic grading using the "Submit" button, then download the ipynb file and submit it for peer review (in the corresponding course element).

### Preliminaries
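We need this function to test our generators. It counts how many times each element occurs in `data`; with `relative=True` it returns relative frequencies instead of counts.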

In [14]:
def count_frequencies(data, relative=False):
    counter = {}
    for element in data:
        if element not in counter:
            counter[element] = 1
        else:
            counter[element] += 1
    if relative:
        for element in counter:
            counter[element] /= len(data)
    return counter
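
For reference, the same counts can be obtained with `collections.Counter` from the standard library. The sketch below is an alternative rather than part of the assignment, and the name `count_frequencies_counter` is hypothetical.

```python
from collections import Counter


def count_frequencies_counter(data, relative=False):
    """Hypothetical Counter-based equivalent of count_frequencies."""
    counter = Counter(data)  # maps each distinct element to its count
    if relative:
        total = len(data)
        return {element: count / total for element, count in counter.items()}
    return dict(counter)


print(count_frequencies_counter(["a", "b", "a", "a"]))
# {'a': 3, 'b': 1}
print(count_frequencies_counter(["a", "b", "a", "a"], relative=True))
# {'a': 0.75, 'b': 0.25}
```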

### Independent random variables: PMF calculation
Consider random variables $X$ and $Y$. Assume that $X$ takes values $x_1, \ldots, x_n$ with probabilities $p_1, \ldots, p_n$ and $Y$ takes values $y_1, \ldots, y_m$ with probabilities $q_1, \ldots, q_m$. Assume that $X$ and $Y$ are independent. Implement function `joint_pmf(xvalues, xprobs, yvalues, yprobs)` that takes an array of values $x_1, \ldots, x_n$ as `xvalues` and an array of probabilities $p_1, \ldots, p_n$ as `xprobs`, and likewise `yvalues` and `yprobs` for $Y$. The function should return a dictionary whose keys are tuples `(x, y)`, where `x` is some value $x_i$ and `y` is some value $y_j$, and whose values are the corresponding values of the joint probability mass function $pmf_{X, Y}(x_i, y_j)$.
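
Independence is what turns the joint PMF into a simple product. Spelled out in the notation above:

$$
pmf_{X, Y}(x_i, y_j) = P(X = x_i,\, Y = y_j) = P(X = x_i)\, P(Y = y_j) = p_i q_j .
$$

For instance, with $p_1 = 0.5$ and $q_1 = 0.3$ the entry for $(x_1, y_1)$ is $0.5 \cdot 0.3 = 0.15$. The entries always sum to one, since $\sum_{i,j} p_i q_j = \left(\sum_i p_i\right)\left(\sum_j q_j\right) = 1$.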

In [6]:
def joint_pmf(xvalues, xprobs, yvalues, yprobs):
    # your code here
    # By independence, P(X = x, Y = y) = P(X = x) * P(Y = y).
    pmf = {}
    for x, p in zip(xvalues, xprobs):
        for y, q in zip(yvalues, yprobs):
            pmf[(x, y)] = p * q
    return pmf

print(joint_pmf([1, 2], [0.5, 0.5], [3, 4, 5], [0.3, 0.3, 0.4]))

{(1, 3): 0.15, (1, 4): 0.15, (1, 5): 0.2, (2, 3): 0.15, (2, 4): 0.15, (2, 5): 0.2}


In [7]:
testdata = [([1], [1], [2, 3], [0.2, 0.8]),
            ([1, 2], [0.5, 0.5], [3, 4, 5], [0.3, 0.3, 0.4])]
answers = [{(1, 2): 0.2, (1, 3): 0.8},
           {(1, 3): 0.15,
            (1, 4): 0.15,
            (1, 5): 0.2,
            (2, 3): 0.15,
            (2, 4): 0.15,
            (2, 5): 0.2}]
for data, answer in zip(testdata, answers):
    assert joint_pmf(*data) == answer
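
The assertions above compare floating-point values exactly, which works for these inputs but is fragile in general, because products such as `0.2 * 0.4` are not represented exactly. A minimal sketch of a tolerance-based comparison follows; the helper name `pmf_close` and the tolerance `1e-12` are assumptions made for illustration.

```python
import math


def pmf_close(actual, expected, abs_tol=1e-12):
    """Return True if both PMFs have the same keys and nearly equal values."""
    if actual.keys() != expected.keys():
        return False
    return all(
        math.isclose(actual[key], expected[key], abs_tol=abs_tol)
        for key in expected
    )


# 0.2 * 0.4 evaluates to 0.08000000000000002, not 0.08.
computed = {(0, 5): 0.2 * 0.4, (0, 6): 0.2 * 0.6}
expected = {(0, 5): 0.08, (0, 6): 0.12}

print(computed == expected)           # False: exact comparison fails
print(pmf_close(computed, expected))  # True: values agree within tolerance
```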

### Independent random variables: generation
Implement function `indep_choice(xvalues, xprobs, yvalues, yprobs)` that samples a value `x` from random variable $X$ and a value `y` from random variable $Y$ and returns the tuple `(x, y)`. As before, `xvalues` is an array of values $x_1, \ldots, x_n$, `xprobs` is an array of probabilities $p_1, \ldots, p_n$, `yvalues` is an array of values $y_1, \ldots, y_m$ and `yprobs` is an array of probabilities $q_1, \ldots, q_m$. Use `numpy.random.choice` in each case.

In [12]:
from numpy.random import choice

def indep_choice(xvalues, xprobs, yvalues, yprobs):
    # your code here
    # Two separate calls to choice make the draws of x and y independent.
    return (choice(xvalues, p=xprobs), choice(yvalues, p=yprobs))
    
xvalues = [0, 1, 2]
xprobs = [0.2, 0.5, 0.3]

yvalues = [5, 6]
yprobs = [0.4, 0.6]

print(indep_choice(xvalues, xprobs, yvalues, yprobs))


(2, 5)


Now let us generate a large sample of these pairs and compare the relative frequency of each combination with the corresponding value of the PMF.

In [16]:
xvalues = [0, 1, 2]
xprobs = [0.2, 0.5, 0.3]

yvalues = [5, 6]
yprobs = [0.4, 0.6]

size = 10000

sample = [indep_choice(xvalues, xprobs, yvalues, yprobs) 
          for _ in range(size)] 

def print_sorted_keys(dictionary):
    for k in sorted(dictionary):
        print(f"{k}: {dictionary[k]}")

print("Obtained relative frequencies")
print_sorted_keys(count_frequencies(sample, relative=True))

print("\nValues of probability mass function")
print_sorted_keys(joint_pmf(xvalues, xprobs, yvalues, yprobs))

Obtained relative frequencies
(0, 5): 0.0806
(0, 6): 0.1195
(1, 5): 0.2016
(1, 6): 0.2964
(2, 5): 0.1216
(2, 6): 0.1803

Values of probability mass function
(0, 5): 0.08000000000000002
(0, 6): 0.12
(1, 5): 0.2
(1, 6): 0.3
(2, 5): 0.12
(2, 6): 0.18


**Peer review grading:** The obtained relative frequencies should be close to the values of the PMF.
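
To judge how close is close enough, note that a relative frequency computed from $n$ independent draws is a binomial proportion, so its standard deviation is $\sqrt{p(1-p)/n}$, where $p$ is the true probability of that pair. The sketch below evaluates this for the PMF values printed above with $n = 10000$; the helper name `standard_errors` is hypothetical.

```python
import math


def standard_errors(pmf, size):
    """Standard deviation of each relative frequency for `size` independent draws."""
    return {key: math.sqrt(p * (1 - p) / size) for key, p in pmf.items()}


pmf = {(0, 5): 0.08, (0, 6): 0.12, (1, 5): 0.2,
       (1, 6): 0.3, (2, 5): 0.12, (2, 6): 0.18}

for key, se in sorted(standard_errors(pmf, 10000).items()):
    print(f"{key}: {se:.4f}")
```

Each standard deviation is a few thousandths, which matches the size of the deviations between the frequencies and the PMF in the output above.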

### Dependent random variables: probability mass function
Consider a system $(X, Y)$ of random variables defined as follows. Let $X$ be a Bernoulli random variable with parameter $p$, i.e. a random variable that takes value $1$ with probability $p$ and value $0$ with probability $1-p$. Assume that $Y$ also takes values $0$ and $1$, with $P(Y=1\mid X = 0) = q_0$ and $P(Y=1 \mid X = 1) = q_1$. Implement function `dependent_bernoulli_pmf(p, q0, q1)` that returns a dictionary with the joint probability mass function (as in the first problem).
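
Here the joint PMF follows from the multiplication rule $P(X = x,\, Y = y) = P(X = x)\, P(Y = y \mid X = x)$. Written out for the four possible pairs:

$$
\begin{aligned}
P(X = 0,\, Y = 0) &= (1 - p)(1 - q_0), & P(X = 0,\, Y = 1) &= (1 - p)\, q_0, \\
P(X = 1,\, Y = 0) &= p\,(1 - q_1), & P(X = 1,\, Y = 1) &= p\, q_1 .
\end{aligned}
$$

The four values sum to $(1 - p) + p = 1$, as a PMF must.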

In [17]:
# your code here
def dependent_bernoulli_pmf(p, q0, q1):
    # P(X = x, Y = y) = P(X = x) * P(Y = y | X = x)
    return {
        (0, 0): (1 - p) * (1 - q0),
        (0, 1): (1 - p) * q0,
        (1, 0): p * (1 - q1),
        (1, 1): p * q1,
    }

In [18]:
assert dependent_bernoulli_pmf(0.25, 0.125, 0.25) == {(0, 0): 0.65625, 
                                                      (0, 1): 0.09375, 
                                                      (1, 0): 0.1875, 
                                                      (1, 1): 0.0625}
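
One way to see that $X$ and $Y$ are dependent is to compare each joint probability with the product of the marginals; they agree for every pair only when $q_0 = q_1$ (or when $p$ is $0$ or $1$). The sketch below does this check on a joint PMF dictionary. The helper names `marginals` and `is_independent` are hypothetical, and the check assumes every pair `(x, y)` is present as a key.

```python
import math


def marginals(pmf):
    """Return the marginal PMFs of X and Y from a joint PMF dictionary."""
    px, py = {}, {}
    for (x, y), prob in pmf.items():
        px[x] = px.get(x, 0.0) + prob
        py[y] = py.get(y, 0.0) + prob
    return px, py


def is_independent(pmf, abs_tol=1e-12):
    """Check whether every joint probability equals the product of marginals."""
    px, py = marginals(pmf)
    return all(
        math.isclose(prob, px[x] * py[y], abs_tol=abs_tol)
        for (x, y), prob in pmf.items()
    )


# Joint PMF for p = 0.25, q0 = 0.125, q1 = 0.25 (values from the assertion above).
dependent = {(0, 0): 0.65625, (0, 1): 0.09375, (1, 0): 0.1875, (1, 1): 0.0625}
# Joint PMF for p = 0.25, q0 = q1 = 0.5, where Y does not depend on X.
independent = {(0, 0): 0.375, (0, 1): 0.375, (1, 0): 0.125, (1, 1): 0.125}

print(is_independent(dependent))    # False
print(is_independent(independent))  # True
```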

### Dependent random variables: generation

Implement function `dependent_bernoulli(p, q0, q1)` that returns a pair `(x, y)` sampled from the system $(X, Y)$ of random variables described above.

In [21]:
# your code here
def dependent_bernoulli(p, q0, q1):
    # Draw x first, then draw y with the success probability that matches x.
    x = choice([1, 0], p=[p, 1 - p])
    q = q1 if x == 1 else q0
    y = choice([1, 0], p=[q, 1 - q])
    return (x, y)
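
The two-step procedure above is not the only option: the pair can also be drawn in a single step from the joint PMF, which gives the same distribution. The sketch below is an alternative, not the assignment's required approach; it uses `random.choices` from the standard library instead of `numpy.random.choice`, and the name `sample_from_joint` is hypothetical.

```python
import random


def sample_from_joint(pmf, size, seed=None):
    """Draw `size` pairs from a joint PMF dictionary (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, so a seed gives reproducible draws
    pairs = list(pmf)
    weights = [pmf[pair] for pair in pairs]
    return rng.choices(pairs, weights=weights, k=size)


# Joint PMF for p = 0.25, q0 = 0.125, q1 = 0.25.
pmf = {(0, 0): 0.65625, (0, 1): 0.09375, (1, 0): 0.1875, (1, 1): 0.0625}
sample = sample_from_joint(pmf, 10000, seed=0)
print(sample[:5])
```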

In [22]:
def test_dependent(p, q0, q1, size):
    sample = [dependent_bernoulli(p, q0, q1) for _ in range(size)]

    print("Obtained relative frequencies")
    print_sorted_keys(count_frequencies(sample, relative=True))

    print("\nValues of probability mass function")
    print_sorted_keys(dependent_bernoulli_pmf(p, q0, q1))
    
test_dependent(0.25, 0.125, 0.25, 10000)

Obtained relative frequencies
(0, 0): 0.647
(0, 1): 0.096
(1, 0): 0.194
(1, 1): 0.063

Values of probability mass function
(0, 0): 0.65625
(0, 1): 0.09375
(1, 0): 0.1875
(1, 1): 0.0625


**Peer review grading:** The obtained relative frequencies should be close to the values of the PMF.

In [23]:
test_dependent(0.5, 0.125, 0.75, 10000)

Obtained relative frequencies
(0, 0): 0.4441
(0, 1): 0.0602
(1, 0): 0.1262
(1, 1): 0.3695

Values of probability mass function
(0, 0): 0.4375
(0, 1): 0.0625
(1, 0): 0.125
(1, 1): 0.375


**Peer review grading:** The obtained relative frequencies should be close to the values of the PMF.