# Let's *Program to Learn*: Probability

**Motivation:** Assume we have weather statistics for the last 200 years. Analysing the data, we find that, *on average*, a certain city has 5 rainy days out of the 31 days of July. Suppose the climate does not change much. What is the probability of a rainy day on 10 July next year?
In this case, we use the past relative frequency of an event (rainy days) to predict the same event in the future. We use statistical data about past events to assign a *numerical value* to the probability of an event. Basically, we are employing [the empirical law of large numbers](https://en.wikipedia.org/wiki/Law_of_large_numbers). 
In the following, we use this concept to practise probability.

## Simulation

**Problem:** There is a jar containing 3 yellow, 2 red, 2 green and 1 blue marbles. What is the probability of pulling *one yellow* marble?

We simulate the experiment to find an estimate of the answer. 
The first step is to model the experiment: how should we model the jar and the marbles? To choose a proper structure for the elements of our experiment, it is important to understand which actions we need to perform on them.

First, we scale the problem down to a simpler case and see how to simulate an experiment for a probability problem. We start with tossing a fair coin. What is the probability of getting a *Head*?
We simply choose to model the head of the coin with 0 and the tail with 1. This is just one choice; a more readable alternative would be $\{'H','T'\}$. How do we model the action of tossing the coin? The result of the action (tossing) is a random selection from $\{0,1\}$ (or $\{'H','T'\}$), where 0 represents the head and 1 represents the tail. Let's see how Python supports this operation: check the [`random` module documentation](https://docs.python.org/3/library/random.html).

In [24]:
import random

count_head = 0
max_num_experiments = 1_000_000
for _ in range(0,max_num_experiments):
    side = random.randint(0,1)
    if side == 0: # if side is Head
        count_head = count_head + 1
print('Number of heads:',count_head)
print('Probability of Head:',count_head/max_num_experiments)

Number of heads: 499644
Probability of Head: 0.499644


We have seen that, based on [the empirical law of large numbers](https://en.wikipedia.org/wiki/Law_of_large_numbers), we can calculate the relative frequency of an event and use it as an *estimate* of its probability. 
Back to our original jar-of-marbles problem: we need a structure that (1) can simulate pulling the marbles randomly and (2) can log the frequency of the target event (*what is our target event?*). Here is our first attempt.
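
To watch the law of large numbers at work, the sketch below repeats the coin experiment for an increasing number of tosses. The helper `estimate_head_probability` is hypothetical (not part of the original notebook) and follows the same 0-means-Head convention as the cell above; the fixed `seed` only makes the run reproducible. The estimates generally drift towards 0.5 as `n` grows, although a single run need not improve at every step.

```python
import random

def estimate_head_probability(n, seed=None):
    '''Hypothetical helper: estimate P(Head) of a fair coin from n tosses (0 = Head).'''
    rng = random.Random(seed)  # private generator, so a given seed gives a reproducible run
    heads = sum(1 for _ in range(n) if rng.randint(0, 1) == 0)
    return heads / n

# Relative frequency of Head for growing numbers of tosses
for n in [10, 100, 1_000, 10_000, 100_000]:
    estimate = estimate_head_probability(n, seed=42)
    print(f'n = {n:>7}: estimate = {estimate:.4f}, error = {abs(estimate - 0.5):.4f}')
```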

In [2]:
import random as rand

jar = ['Y','Y','Y','B','R','R','G','G'] # a jar containing 3 yellow, 2 red, 2 green and 1 blue marbles

count_yellow = 0 # target event
max_num_experiments = 1_000_000

for _ in range(0,max_num_experiments):
    rand.shuffle(jar) # we shuffle the container
    marble = rand.choice(jar) # select an item from a list randomly
    if marble == 'Y':
        count_yellow = count_yellow + 1
print('Number of yellow pulls:',count_yellow)
print('Probability of pulling one Yellow marble from ',jar,' is :',count_yellow/max_num_experiments)

Number of yellow pulls: 374874
Probability of pulling one Yellow marble from  ['Y', 'Y', 'R', 'R', 'G', 'Y', 'G', 'B']  is : 0.374874


It seems we have found a solution. But now we are curious about the probabilities of other events: what about pulling one red marble? Or one green marble? We need to refactor the code so that it keeps the frequencies of all possible events. Then, after a single run of the experiment, we have the answers for all events.

In [3]:
import random as rand

jar = ['Y','Y','Y','B','R','R','G','G'] # a jar containing 3 yellow, 2 red, 2 green and 1 blue marbles
count_events = [0,0,0,0] # each index indicates a color
yellow_index = 0
red_index = 1
blue_index = 2
green_index = 3

max_num_experiments = 1_000_000
for _ in range(0,max_num_experiments):
    rand.shuffle(jar)
    marble = rand.choice(jar) # select an item from a list randomly
    if marble == 'Y':
        count_events[yellow_index] = count_events[yellow_index] + 1
    if marble == 'R':
        count_events[red_index] = count_events[red_index] + 1
    if marble == 'B':
        count_events[blue_index] = count_events[blue_index] + 1
    if marble == 'G':
        count_events[green_index] = count_events[green_index] + 1
print('Number of events:',count_events)
print('Probability of pulling one Yellow marble from ',jar,' is :',count_events[yellow_index]/max_num_experiments)
print('Probability of pulling one Red marble from ',jar,' is :',count_events[red_index]/max_num_experiments)
print('Probability of pulling one Green marble from ',jar,' is :',count_events[green_index]/max_num_experiments)
print('Probability of pulling one Blue marble from ',jar,' is :',count_events[blue_index]/max_num_experiments)

Number of events: [375464, 249155, 125128, 250253]
Probability of pulling one Yellow marble from  ['R', 'B', 'Y', 'Y', 'Y', 'G', 'G', 'R']  is : 0.375464
Probability of pulling one Red marble from  ['R', 'B', 'Y', 'Y', 'Y', 'G', 'G', 'R']  is : 0.249155
Probability of pulling one Green marble from  ['R', 'B', 'Y', 'Y', 'Y', 'G', 'G', 'R']  is : 0.250253
Probability of pulling one Blue marble from  ['R', 'B', 'Y', 'Y', 'Y', 'G', 'G', 'R']  is : 0.125128


Is it possible to write the code differently? Encoding the counts as a list of frequencies works, but it hides information: the reader has to remember which index stands for which color, and this hurts the readability of the code. Let's refactor the code to use a dictionary instead.

In [4]:
import random as rand

jar = ['Y','Y','Y','B','R','R','G','G'] # a jar containing 3 yellow, 2 red, 2 green and 1 blue marbles
count_events = {'Y':0 , 'R':0 , 'G':0 , 'B':0 } # a dictionary whose keys are colors and whose values are frequencies

max_num_experiments = 1_000_000
for _ in range(0,max_num_experiments):
    rand.shuffle(jar)
    marble = rand.choice(jar) # select an item from a list randomly
    count_events[marble] = count_events[marble] + 1
print('Number of events:',count_events)
probabilities = {'Yellow':count_events['Y']/max_num_experiments , 'Red':count_events['R']/max_num_experiments , 'Green':count_events['G']/max_num_experiments , 'Blue':count_events['B']/max_num_experiments }
print('Probability of pulling marbles from ',jar,' is :',probabilities)

Number of events: {'Y': 374527, 'R': 250737, 'G': 249287, 'B': 125449}
Probability of pulling marbles from  ['Y', 'G', 'R', 'B', 'R', 'G', 'Y', 'Y']  is : {'Yellow': 0.374527, 'Red': 0.250737, 'Green': 0.249287, 'Blue': 0.125449}
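
The tallying pattern above is common enough that Python's standard library covers it. The sketch below is an alternative we suggest, not the notebook's approach: `random.choices` draws `k` items with replacement in one call, and `collections.Counter` counts them. It drops the `shuffle`, because a uniform random pick does not depend on the order of the list.

```python
import random
from collections import Counter

jar = ['Y', 'Y', 'Y', 'B', 'R', 'R', 'G', 'G']  # 3 yellow, 2 red, 2 green and 1 blue marbles
max_num_experiments = 1_000_000

# Draw max_num_experiments marbles with replacement, then tally each color
draws = random.choices(jar, k=max_num_experiments)
count_events = Counter(draws)

probabilities = {color: count / max_num_experiments for color, count in count_events.items()}
print('Number of events:', dict(count_events))
print('Probability of pulling marbles from', jar, 'is:', probabilities)
```

One difference from the hand-built dictionary: a `Counter` only holds colors that were actually drawn, whereas `{'Y':0, 'R':0, 'G':0, 'B':0}` always lists all four.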


Next step: refactor the first (coin) experiment as well, and turn both pieces of code into reusable functions. Functions help divide long code into smaller segments, each of which can be reused (called) from other parts of the program.


In [7]:
import random as rand


max_num_experiments = 1_000_000 # global variable

def simulate_marble():
    '''This function simulates pulling one marble from a jar containing 3 yellow, 2 red, 2 green and 1 blue marbles.'''
    jar = ['Y','Y','Y','B','R','R','G','G'] # a jar containing 3 yellow, 2 red, 2 green and 1 blue marbles
    count_events = {'Y':0 , 'R':0 , 'G':0 , 'B':0 } # a dictionary whose keys are colors and whose values are frequencies

    for _ in range(0,max_num_experiments):
        rand.shuffle(jar)
        marble = rand.choice(jar) # select an item from a list randomly
        count_events[marble] = count_events[marble] + 1
    print('Number of events:',count_events)
    probabilities = {'Yellow':count_events['Y']/max_num_experiments , 'Red':count_events['R']/max_num_experiments , 'Green':count_events['G']/max_num_experiments , 'Blue':count_events['B']/max_num_experiments }
    print('Probability of pulling marbles from ',jar,' is :',probabilities)

def simulate_toss_coin():
    '''This function simulates tossing a fair coin.'''
    coin = ['H','T']
    count_events = {'H':0 , 'T':0}
    
    for _ in range(0,max_num_experiments):
        rand.shuffle(coin)
        side = rand.choice(coin)
        count_events[side] = count_events[side] + 1
    print('Number of events:',count_events)
    probabilities = {'Head':count_events['H']/max_num_experiments , 'Tail':count_events['T']/max_num_experiments}
    print('Probability of sides:',coin,' is :',probabilities)
    
simulate_marble()
simulate_toss_coin()

Number of events: {'Y': 375368, 'R': 250030, 'G': 249365, 'B': 125237}
Probability of pulling marbles from  ['R', 'G', 'R', 'Y', 'Y', 'Y', 'B', 'G']  is : {'Yellow': 0.375368, 'Red': 0.25003, 'Green': 0.249365, 'Blue': 0.125237}
Number of events: {'H': 500345, 'T': 499655}
Probability of sides: ['T', 'H']  is : {'Head': 0.500345, 'Tail': 0.499655}


Reviewing the implementation of both functions we recognise some similarities (*like what?*). Can we improve the code? Let's refactor the code.

In [8]:
import random as rand


def simulate_one_rand_event(sample_space, frequencies, max_num_experiments = 1_000_000):
    '''This function simulates an experiment given the sample space and a structure to count frequencies.'''

    for _ in range(0,max_num_experiments):
        rand.shuffle(sample_space)
        sample = rand.choice(sample_space) # select an item from a list randomly
        frequencies[sample] = frequencies[sample] + 1
    probabilities = {}
    for key in frequencies.keys():
        probabilities[key] = frequencies[key]/max_num_experiments
    return probabilities

    
def main():
    result_marble = simulate_one_rand_event(['Y','Y','Y','B','R','R','G','G'] , {'Y':0 , 'R':0 , 'G':0 , 'B':0 })
    result_coin = simulate_one_rand_event(['H','T'] , {'H':0 , 'T':0} , 10_000_000)
    print(result_marble)
    print(result_coin)

main()

{'Y': 0.374752, 'R': 0.250385, 'G': 0.24915, 'B': 0.125713}
{'H': 0.4999809, 'T': 0.5000191}


**Exercise:** Do we need to pass frequencies? Refactor the code. 
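
One possible answer to the exercise, as a sketch: the frequency table can be built inside the function from the distinct items of `sample_space`, so callers only pass the sample space. The `shuffle` is dropped too, because `random.choice` already selects uniformly, and `shuffle` would otherwise reorder the caller's list in place.

```python
import random

def simulate_one_rand_event(sample_space, max_num_experiments=1_000_000):
    '''Estimate the probability of each distinct outcome by repeated random draws.'''
    # Frequency table built from the distinct items; duplicates just reset the same key to 0
    frequencies = {outcome: 0 for outcome in sample_space}
    for _ in range(max_num_experiments):
        sample = random.choice(sample_space)  # uniform draw from the list
        frequencies[sample] += 1
    return {outcome: count / max_num_experiments for outcome, count in frequencies.items()}

print(simulate_one_rand_event(['Y', 'Y', 'Y', 'B', 'R', 'R', 'G', 'G']))
print(simulate_one_rand_event(['H', 'T'], 100_000))
```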

## Theory

### Basic Concepts

**Relative Frequency:** Suppose $A$ is a random event. The relative frequency of the event $A$ in $n$ repetitions of the experiment is defined as $f_n(A)=n(A)/n$, where $n(A)$ is the number of times that event $A$ occurred in the $n$ repetitions of the experiment. The relative frequency is a number between 0 and 1 (why?).
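
The definition $f_n(A)=n(A)/n$ translates almost word for word into Python. In this sketch, `relative_frequency` is a hypothetical helper that takes the list of observed outcomes and an event given as a set; the die-rolling example (event: an even number) is our own illustration.

```python
import random

def relative_frequency(outcomes, event):
    '''f_n(A) = n(A) / n, where n = len(outcomes) and n(A) counts outcomes in the event A.'''
    n = len(outcomes)
    n_A = sum(1 for outcome in outcomes if outcome in event)
    return n_A / n

# A = "the die shows an even number", observed over 10,000 simulated rolls
rolls = [random.randint(1, 6) for _ in range(10_000)]
print(relative_frequency(rolls, {2, 4, 6}))

print(relative_frequency([1, 2, 3, 4], {2, 4}))  # 0.5
```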

**Random Process:** A process is random if, each time it is carried out, exactly one of its possible outcomes occurs, but it is impossible to predict with certainty which one it will be.

**Sample Space:** In theory, the sample space of a random process (or an experiment) is the set of all possible outcomes of that experiment. 

**Event:** Any *subset* of the sample space is called an *event*. 

**Probability of an Event:** A *probability measure* is a function $P$ that assigns a numerical probability to each event, i.e. to each subset of the sample space. If $S$ is a finite sample space in which all outcomes are equally likely and $E \subseteq S$ is an event, then the probability of $E$ is: $P(E) = |E| / |S|$

For example, tossing a fair coin can be considered a random process. The sample space is the set of all possible outcomes, i.e. $\{Head, Tail\}$. One event is getting a head, which is the subset $\{Head\}$; another event is $\{Tail\}$. The probability of getting a head is: $P(\{Head\}) = \frac{|\{Head\}|}{|\{Head, Tail\}|} = \frac{1}{2}=0.5$.
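
The equally-likely formula $P(E)=|E|/|S|$ can be written as a small function. In this sketch the name `classical_probability` is ours; it uses `fractions.Fraction` so the result stays an exact ratio such as `1/2` instead of a rounded float.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    '''P(E) = |E| / |S| for a finite sample space of equally likely outcomes.'''
    event, sample_space = set(event), set(sample_space)
    if not event <= sample_space:
        raise ValueError('an event must be a subset of the sample space')
    return Fraction(len(event), len(sample_space))

coin = {'Head', 'Tail'}
p_head = classical_probability({'Head'}, coin)
print(p_head, float(p_head))  # 1/2 0.5
```

Because Python sets discard duplicates, identical items such as the two red marbles would need distinct labels (for example `'R1'` and `'R2'`) before this function can be applied to the jar.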

**Exercise:** What are the domain, codomain and range of function $P$ (in the coin tossing experiment)?

**Exercise:** There is a jar containing 3 yellow, 2 red, 2 green and 1 blue marbles. We pull one marble. What is the random process? What is the sample space? What are the events? What is the probability of pulling *one yellow* marble?

**Problem:** There are two fair coins. We toss both independently and at the same time. What is the probability of having (only) one head? Propose your solution both in Python simulation and theory.
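
One way to approach the two-coin problem, sketched under the assumption that both coins are fair and tossed independently: enumerate the sample space with `itertools.product` for the theoretical answer, and repeat random tosses for the simulated one.

```python
import itertools
import random
from fractions import Fraction

# Theory: the four equally likely outcomes HH, HT, TH, TT
sample_space = list(itertools.product('HT', repeat=2))
event = [outcome for outcome in sample_space if outcome.count('H') == 1]  # exactly one head
p_exact = Fraction(len(event), len(sample_space))

# Simulation: toss both coins many times and count "exactly one head"
num_experiments = 100_000
hits = sum(
    1 for _ in range(num_experiments)
    if (random.choice('HT'), random.choice('HT')).count('H') == 1
)
p_estimate = hits / num_experiments

print('exact:', p_exact, 'estimate:', p_estimate)
```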

**Problem:** What is the probability of getting the sum of 7 when rolling two 6-sided dice? Propose your solution both in Python simulation and theory.
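
The dice problem follows the same pattern. This sketch assumes two fair, independent 6-sided dice, so the 36 ordered pairs (die 1, die 2) form an equally likely sample space.

```python
import itertools
import random
from fractions import Fraction

faces = range(1, 7)

# Theory: all 36 ordered pairs are equally likely
sample_space = list(itertools.product(faces, repeat=2))
event = [pair for pair in sample_space if sum(pair) == 7]
p_exact = Fraction(len(event), len(sample_space))

# Simulation: roll two dice many times and record how often the sum is 7
num_experiments = 100_000
hits = sum(1 for _ in range(num_experiments) if random.randint(1, 6) + random.randint(1, 6) == 7)
p_estimate = hits / num_experiments

print('exact:', p_exact, 'estimate:', p_estimate)
```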

## We need to count

**Motivation:** We have learned that in order to calculate the probability of an event, we need to count: we need to know the size of the event and the size of the sample space. In some cases this is easy, as with tossing a fair coin: there are two possibilities and we need the likelihood of one of them. But in other cases counting is not as simple as tossing a coin. For example, in a survey there are 20 questions with three possible answers each (yes, no, maybe). What is the probability of ...? To calculate the probability of any event here, we need to know the size of the sample space and the size of the event. So, let's explore some basic counting techniques: 1, 2, 3, ...

**Problem:** In a survey there are 20 questions with three possible answers each (yes, no, maybe). What is the probability of answering no to all the questions?

**Exercise:** Implement a simulation for the experiment and find an estimation for the probability.

In [30]:
'''In a survey there are 20 questions with three possible answers for each (yes, no, maybe). 
 What is the probability of having no for all questions?'''
# todo


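
A possible starting point for the simulation exercise, sketched under the assumption that each answer is chosen uniformly at random and independently of the others. The number of questions is scaled down to 3 (our choice): with 20 questions the all-no event is so rare that a simulation of this size would almost certainly never observe it, and the estimate would simply be 0.

```python
import random

options = ['Yes', 'No', 'Maybe']
num_questions = 3          # scaled down from 20 so that the event is not too rare to observe
num_experiments = 200_000

all_no = 0
for _ in range(num_experiments):
    # One experiment: fill in the whole survey with random answers
    answers = [random.choice(options) for _ in range(num_questions)]
    if all(answer == 'No' for answer in answers):
        all_no += 1

p_estimate = all_no / num_experiments
print('estimated probability of all no:', p_estimate)
```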

To find a solution for the survey problem, we first scale the problem down and see whether we can find a pattern in the solution. Let's assume a survey with only 3 questions. How many possible answers are there for question 1? Three, right? Easy. How many for question 2? Again, three. Now, how many options are there for the first two questions together? For each choice for question 1, there are three possible choices for question 2. The problem turns into generating the Cartesian product of two sets: $Q1=\{Y,M,N\}$ and $Q2=\{Y,M,N\}$. The product $Q1 \times Q2$ (what is it?) contains all possible combinations of answers to questions 1 and 2. If we extend this with the possible answers to question 3, we get the product of three sets, $Q1 \times Q2 \times Q3$ (what is it?). Let's generalise this to 20 questions. What is $Q1 \times Q2 \times \dots \times Q20$?
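
For two questions, the pairing described above can be written directly as a nested list comprehension. This is a sketch for $Q1 \times Q2$ only; generating the product for 20 questions without libraries is left to the exercise that follows.

```python
Q1 = ['Y', 'M', 'N']
Q2 = ['Y', 'M', 'N']

# For each answer to question 1, pair it with every answer to question 2
Q1_x_Q2 = [(a1, a2) for a1 in Q1 for a2 in Q2]
print(Q1_x_Q2)
print(len(Q1_x_Q2))  # 9
```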

**Exercise (challenging):** Implement your own Python solution (without using existing libraries) that generates all possible combinations of answers for the 20 questions.

While you are busy proposing your solution for the previous exercise, I take the easy path and use an existing library: [itertools](https://docs.python.org/3/library/itertools.html). Check the code below.

In [5]:
import itertools # imported under its full name: the alias `iter` would shadow the built-in iter()

options = {1,2,3} # to save some space in the print, 1,2,3 are used instead of Yes, Maybe, No.
num_questions = 4 # kept small to save space; 20 questions give 3**20 (about 3.5 billion) tuples, too many to hold in a set
survey_sample_space = itertools.product(options,repeat=num_questions)
print(set(survey_sample_space))

{(3, 2, 1, 3), (2, 3, 3, 1), (1, 2, 1, 1), (1, 3, 2, 3), (1, 3, 3, 2), (2, 1, 1, 1), (3, 3, 2, 1), (1, 1, 2, 3), (2, 1, 3, 1), (3, 3, 1, 2), (2, 3, 3, 3), (1, 2, 1, 3), (3, 1, 1, 2), (2, 1, 1, 3), (3, 3, 3, 2), (3, 2, 3, 1), (2, 1, 3, 3), (1, 2, 3, 2), (3, 1, 2, 2), (3, 1, 3, 1), (3, 2, 2, 2), (2, 2, 2, 1), (2, 3, 1, 1), (3, 2, 3, 3), (3, 1, 3, 3), (1, 1, 3, 2), (2, 2, 2, 3), (1, 3, 1, 1), (2, 3, 1, 3), (1, 3, 3, 1), (2, 3, 2, 1), (1, 3, 1, 3), (3, 3, 2, 3), (2, 3, 3, 2), (1, 2, 1, 2), (3, 3, 1, 1), (1, 2, 2, 1), (3, 1, 1, 1), (1, 3, 3, 3), (2, 1, 1, 2), (2, 1, 2, 1), (2, 3, 2, 3), (2, 2, 1, 2), (2, 1, 3, 2), (1, 2, 3, 1), (1, 2, 2, 3), (3, 1, 1, 3), (2, 2, 3, 2), (1, 1, 1, 2), (3, 2, 2, 1), (2, 1, 2, 3), (3, 2, 1, 2), (3, 1, 3, 2), (1, 3, 2, 2), (1, 1, 3, 1), (3, 2, 2, 3), (1, 1, 2, 2), (2, 3, 1, 2), (1, 1, 3, 3), (3, 3, 2, 2), (3, 3, 3, 1), (3, 3, 1, 3), (3, 1, 2, 1), (2, 3, 2, 2), (2, 2, 1, 1), (3, 3, 3, 3), (1, 2, 2, 2), (3, 2, 3, 2), (2, 2, 3, 1), (1, 1, 1, 1), (1, 2, 3, 3), (3, 1

**Combinatorics** is the field of mathematics concerned with counting (and generating) arrangements of objects.

**Rule 1 (Multiplication):** If there are $n$ options for stage (step, problem, etc.) one and $m$ options for stage two, then there are $n \times m$ options for both stages, provided that the stages are *independent* (the number of options for stage two does not depend on the choice made in stage one).
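
A quick check of Rule 1 by enumeration, as a sketch: the two stages (picking a marble color and a coin side) are a made-up example. `itertools.product` lists every way of doing both stages, and the count matches $n \times m$.

```python
import itertools

stage_one = ['Y', 'R', 'G', 'B']  # n = 4 options (marble colors)
stage_two = ['H', 'T']            # m = 2 options (coin sides)

# Every way of doing both stages is one pair (choice in stage one, choice in stage two)
both_stages = list(itertools.product(stage_one, stage_two))
print(len(both_stages))                 # 8
print(len(stage_one) * len(stage_two))  # 8
```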

**Exercise:** You are asked to generate unique codes for a group of students using four characters: the first character can be I or T, the second character must be 0, and the third and fourth characters can be any digit except 0. How many students can be uniquely coded?



In [6]:
'''Exercise: What is the final value of count after finishing the execution?'''

count = 0
for _ in range(0,25):
    for _ in range(5,30):
        for _ in range(0,100):
            count = count + 1

Let's not forget our starting problem: the probability of answering **NO** to all the questions in a survey with 20 questions, where each question has three options. What is the cardinality of the sample space? It is $3^{20}$. What is the cardinality of the event (no for all the questions)? It is 1. Assuming all combinations of answers are equally likely, what is the probability?
$P(\text{No for all 20 questions})=\frac{1}{3^{20}}$
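
Evaluating the formula, as a sketch: `fractions.Fraction` keeps the exact value and converting it to `float` shows how small it is. The last lines repeat the counting argument on the 4-question survey from the `itertools` cell, confirming that its sample space has $3^4$ elements and exactly one of them is all-no.

```python
import itertools
from fractions import Fraction

num_questions = 20
num_options = 3  # Yes, Maybe, No

sample_space_size = num_options ** num_questions  # |S| = 3^20
event_size = 1                                    # only (No, No, ..., No)
p_all_no = Fraction(event_size, sample_space_size)

print(sample_space_size)  # 3486784401
print(float(p_all_no))    # roughly 2.87e-10

# Sanity check on the small 4-question survey: enumerate every answer sheet
small_space = list(itertools.product(['Y', 'M', 'N'], repeat=4))
print(len(small_space) == num_options ** 4, small_space.count(('N', 'N', 'N', 'N')))  # True 1
```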