# Probabilities

## Definition of probability of event
Probability of an event $E$ is the `ratio` of the number of successful outcomes to the number of all possible outcomes:

$$\boxed{P(E) = \frac{\text{\# of successes}}{\text{\# of possible outcomes}}}$$

If examined from the standpoint of `set theory`, we can explicitly collect desired and possible outcomes in sets:
$$\text{all outcomes} = \{outcome_1, outcome_2,\dots\}$$
$$\text{desired outcomes} = \{outcome_6,\dots\}$$
and write
$$\boxed{P(E) = \frac{|\text{desired outcomes}|}{|\text{all outcomes}|}}$$
where $|\cdot|$ is the size of a set. For example, for a `coin toss` with a sample space of `heads` and `tails`:
$$C = \{H,T\} \rightarrow |C| = 2$$

The "all outcomes" set is often referred to as the `sample space` (or state space in RL).

<br>
<i>REMARK 1: Counting set sizes makes sense only if each entry has the same probability. This does not stop you from listing duplicates, though (strictly speaking, the collection is then a multiset rather than a set).

For example, if $A$ is twice as likely as $B$, set the state space to $S = \{A,B,A\}$.</i>
$$P(A) = \frac{|\{A,A\}|}{|\{A,B,A\}|} = \frac{2}{3}$$
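
The duplicate trick from Remark 1 can be sketched in Python. This is only an illustration: I assume a plain list as the container, because a Python `set` would silently collapse the repeated `A`.

```python
from fractions import Fraction

# A list keeps duplicates; set(["A", "B", "A"]) would shrink to {"A", "B"}.
outcomes = ["A", "B", "A"]

# Count how many equally likely entries are "A".
p_a = Fraction(outcomes.count("A"), len(outcomes))
p_b = Fraction(outcomes.count("B"), len(outcomes))
print(p_a, p_b)  # 2/3 1/3
```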

<i>REMARK 2: I will use natural language notation (and, or, not) and set notation interchangeably:
$$P(A \text{ or } B) = P(A \cup B)$$
It may be best to interpret events $A$ and $B$ as sets by default.</i>
***

### A few examples (overview)

For a fair dice roll:
$$S =  \{1,2,3,4,5,6\}; |S| = 6$$
The term "fair" implies equal probabilities for all outcomes.
1. The probability of rolling a 2 is
    $$P(2) = \frac{|\{2\}|}{|\{1,2,3,4,5,6\}|} = \frac{1}{6}$$ 
    <i>(Optional) you may construct desired outcome space by a rule-set:</i>
    $$S_2 = \{n\in S:n = 2\} = \{2\}$$
1. The probability of rolling an even number (a case of mutually exclusive events) is
    $$P(\text{even}) = P(\text{ 2 or 4 or 6}) = \frac{|\{2,4,6\}|}{|\{1,2,3,4,5,6\}|} = \frac{3}{6}= \frac{1}{2}$$
    <i>(Optional) Rule-set:</i>
    $$S_{even} = \{n\in S: n \% 2 = 0\} = \{2,4,6\}$$
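
The counting definition and the rule-sets above can be sketched in Python. The helper name `prob` is hypothetical, and the sketch assumes all outcomes are equally likely (a fair die).

```python
from fractions import Fraction

def prob(event, sample_space):
    """Counting probability |event| / |sample_space| for equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

S = {1, 2, 3, 4, 5, 6}                  # sample space of a fair die
S_2 = {n for n in S if n == 2}          # rule-set for "roll 2"
S_even = {n for n in S if n % 2 == 0}   # rule-set for "roll even"

p_two = prob(S_2, S)
p_even = prob(S_even, S)
print(p_two, p_even)  # 1/6 1/2
```

Using `Fraction` instead of floats keeps the results in the same form as the hand calculation.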

***

## Set operations
Given the following sets:
$$S  = \{1,2,3,4,5,6\}$$
$$S_{even}  = \{2,4,6\} \; \ S_{odd}  = \{1,3,5\}$$
$$\emptyset = \{ \}$$
We can manipulate sets using set operations: $\{ \setminus \ , \ \cup \ , \ \cap \ , \ \cdot^c \}$

$$\text{Union (or): } S = S_{even} \cup S_{odd}$$
$$\text{Set minus: } S_{odd} = S \setminus S_{even}$$
$$\text{Intersection (and): } S_{odd} \cap S_{even} = \emptyset$$
$$\text{Complement (rest): } S_{even} \cup S_{even}^c = S $$

If $X$ is a subset of $S$ ($X\subseteq S$), then intersecting it with $S$ leaves it unchanged:

$$ X \cap S = X, \text{ e.g. } \{1,2\} \cap S = \{1,2\} $$
$$ S \cap S = S$$
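
These operations map directly onto Python's built-in `set` type. A minimal sketch; the complement is taken relative to $S$, since Python sets have no notion of a universal set.

```python
S = {1, 2, 3, 4, 5, 6}
S_even = {2, 4, 6}
S_odd = {1, 3, 5}

union = S_even | S_odd            # union (or)
difference = S - S_even           # set minus
intersection = S_odd & S_even     # intersection (and)
complement_even = S - S_even      # complement of S_even relative to S

X = {1, 2}
is_subset = X <= S                # subset test
print(union == S, difference == S_odd, intersection == set())  # True True True
print(is_subset, X & S == X, S & S == S)                       # True True True
```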
***

## Complementary & certain events
* Complementary events "complement" each other such that together they take up the `whole` state space,
* Complementary events cannot occur at the `same time`,
* Certain events have a probability of 1.

### Example with a coin flip
Having only two outcomes makes this problem binary: if the outcome is `not` heads, then it is tails:
$$P(H) = P(\text{not }T) = P(T^c); \ P(T) = P(\text{not }H) = P(H^c)$$


Stating the obvious: "everything" is made up of "something" and "not something" (= "everything else"):
$$H\text{ or not }H = H\text{ or }H^c = H\text{ or }T = C$$
Probability of "everything" (whole state space) is 1. 
$$P(C) = P(H\text{ or }T) = P(H \cup T) = P(H) + P(T) = 1$$
Either/or is represented by the union $\cup$ of sets. The probabilities simply add here because $H$ and $T$ cannot occur together.
### Example with a dice roll
Sample space size is not important (given it is greater than 1). What matters is that the events divide the whole sample space into 2 parts.

For example two dice roll events $S_{even}$ and $S_{odd}$ are complementary, and event of `either` of them occurring is certain.
$$S_{even} \cup S_{odd} = S \ ; \ P(S) = 1$$

For that matter, it works for any event $X\subseteq S$ and its complement:
$$X\cup X^c = S, \text{ e.g. } \{1,6\}\cup \{2,3,4,5\} = S$$
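
A small Python check of the complement property, assuming a fair die and the example event $X = \{1,6\}$. The helper name `prob` is hypothetical.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
X = {1, 6}
X_c = S - X  # complement of X relative to S

def prob(event):
    """Counting probability on the fair-die sample space S."""
    return Fraction(len(event), len(S))

covers_everything = (X | X_c == S)     # together they fill the sample space
never_together = (X & X_c == set())    # they share no outcome
p_total = prob(X) + prob(X_c)
print(covers_everything, never_together, p_total)  # True True 1
```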

## Impossible events
Probability of `impossible` events is `zero`. 

We cannot expect multiple outcomes from one trial.

For example a coin flip that produces both heads and tails simultaneously:
$$H \text{ and } T = H \cap T = \emptyset \ ; \ P(\emptyset) = 0$$
Dice roll that produces two different numbers 2 and 3:
$$\{2\}\cap\{3\} = \emptyset \rightarrow P(S_2 \cap S_3) = 0$$

## Mutually exclusive events

Mutual exclusivity is a 'weaker' version of complementarity.
* Mutually exclusive events cannot occur at the same time, but together they need not cover the whole sample space.

If events A and B are not complementary, but mutually exclusive:

$$P(A \cup B) = P(A) + P(B) < 1$$

For example, the chance to roll 2 or 3:

$$P(2 \text{ or } 3) = P(S_2 \cup S_3)= \frac{1}{6} + \frac{1}{6} = \frac{2}{6}  = \frac{1}{3} $$


## Mutually non-exclusive events
If events can occur at the `same time`, their shared outcomes might be `double counted`.


In a single coin toss, what is the chance to get either (H or T) or (T)?

Since the events are not mutually exclusive, we `cannot` compute the probability as a plain sum, which would exceed 1:
$$P((H \cup T) \cup T) \neq P(H \cup T) + P(T) = 1 + 0.5 = 1.5 > P(C)$$
Event $(H \cup T)$ already covers case of $(T)$, so we should subtract one intersection/overlap (shared event part):

$$P((H \cup T) \cup T) = P(H \cup T) + P(T) - P((H \cup T) \cap T) = $$
$$ = \bigg| P((H \cup T) \cap T)  = \frac{|\{H,T\}\cap \{T\}|}{|\{H,T\}|} = \frac{|\{T\}|}{|\{H,T\}|} = P(T) \bigg| = $$
$$=  P(H \cup T) + P(T) - P(T) = P(H \cup T) = P(C) = 1$$

Or, for general events $A$ and $B$:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
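
The general formula can be checked numerically. In this sketch the overlapping events "even" and "at most 3" on a fair die are my own choice for illustration, not taken from the examples above.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Counting probability on the fair-die sample space S."""
    return Fraction(len(event), len(S))

A = {2, 4, 6}   # even
B = {1, 2, 3}   # at most 3 (overlaps with A on the outcome 2)

direct = prob(A | B)                            # count the union directly
incl_excl = prob(A) + prob(B) - prob(A & B)     # subtract the overlap once
naive = prob(A) + prob(B)                       # double counts the outcome 2
print(direct, incl_excl, naive)  # 5/6 5/6 1
```

The naive sum overshoots by exactly $P(A \cap B) = \frac{1}{6}$.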

## Conditional probability $P(A|B)$ (Prelude to dependent events)
$P(A|B)$ is the probability of event $A$ given that event $B$ is true/has been observed.

<i>In short: we isolate and focus on subspace of sample space, in which $B$ has occurred.</i>

We invoke conditional probability if the result of $B$ somehow affects the result of $A$.

* Counter-example: a second coin toss is not affected by the first coin toss.<br>
Even if there were 5 heads in a row, the chance of a 6th heads is still $\frac{1}{2}$

Example from https://www.sydney.edu.au/content/dam/students/documents/mathematics-learning-centre/basic-probability.pdf <br>
(I introduce trivial conditioning to transition from P(N) to conditional probability.)

A lecture with 300 students can be split into the following categories:

| Gender | Doctors | Nurses |
| --- | --- | --- |
| Female | 90 | 90 |
| Male | 100 | 20 |

1. We can ask what is the chance that a `randomly selected student` is a nurse:
    $$ P(\text{nurse}) = P(N) = \frac{\text{total nurses (M \& F)}}{\text{total students}} = \frac{|N|}{|S|} = \frac{90+20}{300} = \frac{110}{300}$$

2. Or ask what is the chance that a random student is, specifically, a female ($F$) nurse:
    $$P(N \cap F) = \frac{\text{total female nurses}}{\text{total students}} = \frac{|N \cap F|}{|S|} = \frac{90}{300}$$

    <i>REMARK: the set of female nurses is an intersection of two data slices: $N \cap F =  (\text{female and male nurses}) \cap (\text{female nurses and doctors})$</i>

3. We can add trivial conditioning to 1., in which we consider only students:
    $$ P(\text{nurse given person is a student}) = P(\text{N|S})= \frac{\text{total nurses which are students}}{\text{total students which are students}} = \frac{|N \cap S|}{|S \cap S|} =  \frac{|N|}{|S|} = P(N)$$

    Of course, this conditioning does not change the probability, because our data contains only students.

4. But if we condition by cherry picking only female students ($F \subseteq S$):
    $$P(N|F) = \frac{P(N \cap F)}{P(S \cap F)} = \frac{P(N \cap F)}{P(F)} = \frac{\text{total nurses which are female}}{\text{total students which are female}} = \frac{90}{90 + 90} = \frac{1}{2}$$
    <i>REMARK: Our choice to cherry pick females affected result of query</i>

The general expression for the conditional probability of $A$ given that $B$ is true (with $P(B) > 0$) is:

$$\boxed{P(A|B) = \frac{P(A\cap B)}{P(B)}}$$
so
$$\boxed{P(A\cap B) = P(A|B)\cdot P(B)}$$

This expression is interesting since it "decouples" the problem into two "independent" factors, $P(A|B)$ and $P(B)$ (but only in the scope of determining $P(A\cap B)$).

So $P(A\cap B)$ can be viewed as "what is the chance that $B$ happened and, on top of that, what is the chance of $A$ given that $B$ happened".

And due to symmetry of $(A\cap B)$ and $(B \cap A)$:

$$P(B\cap A) = P(B|A)\cdot P(A)$$

Notice that in general $P(N|F) \neq P(F|N)$
$$P(F|N) = \frac{P(F \cap N)}{P(N)} = \frac{\text{total nurses which are female}}{\text{total students which are nurses}} = \frac{90}{90 + 20} = \frac{90}{110}$$

From the data, nurses are predominantly female, while females are equally likely to be doctors and nurses.<br>
The asymmetry is due to the lower total male count, and the males who are present are mostly doctors.
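
The table queries can be reproduced in Python. This is a minimal sketch: the counts come from the table above, while the helper names (`prob`, `cond`, `is_nurse`, `is_female`) are hypothetical.

```python
from fractions import Fraction

# Counts from the lecture table, keyed by (gender, role).
counts = {
    ("F", "doctor"): 90,
    ("F", "nurse"): 90,
    ("M", "doctor"): 100,
    ("M", "nurse"): 20,
}
total = sum(counts.values())  # 300 students

def is_nurse(gender, role):
    return role == "nurse"

def is_female(gender, role):
    return gender == "F"

def prob(event):
    """P(event): share of students for which event(gender, role) is true."""
    hits = sum(n for key, n in counts.items() if event(*key))
    return Fraction(hits, total)

def cond(event_a, event_b):
    """P(A|B) = P(A and B) / P(B)."""
    both = prob(lambda g, r: event_a(g, r) and event_b(g, r))
    return both / prob(event_b)

p_n = prob(is_nurse)
p_n_given_f = cond(is_nurse, is_female)
p_f_given_n = cond(is_female, is_nurse)
print(p_n, p_n_given_f, p_f_given_n)  # 11/30 1/2 9/11
```

Swapping the arguments of `cond` changes the denominator, which is where the asymmetry between $P(N|F)$ and $P(F|N)$ comes from.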

## Independent events
If event A is `independent` of B, then 
$$ P(A|B) = P(A)$$

In the example with doctors and nurses this does not hold, so $N$ and $F$ are not independent:
$$P(N) = \frac{110}{300} \approx 0.37 \neq P(N|F) = 0.5$$

<i>Overall probability that a student is a nurse is brought down by males being mostly doctors.</i>

For independent events
$$P(A\cap B) = P(A|B)\cdot P(B) \rightarrow P(A)\cdot P(B)$$
$$\boxed{P(A\cap B) =P(A)\cdot P(B)}$$

Examples for the and ($\cap$) constraint:

1. It can be a `series` of `unrelated` experiments, such as a coin flip & dice roll.
    $$C = \{H,T\}$$
    $$P(H \cap S_{1}) = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12}$$

    *   We can expand the state space to $S^\prime = C\times S$ and view event $H \cap S_{1}$ as some event $E^\prime$ in that space. <br>
        The space $S^\prime = \{(H,1), (H,2), \dots ,(T,6)\}$ has $2\times 6 = 12$ possible states, so 
        $$P(E^\prime) = P(H \cap S_{1}) = \frac{|\{(H,1)\}|}{|S^\prime|} = \frac{1}{12}$$

2. Or a series of repetitions of the same independent experiment, e.g. two dice rolls (the first roll does not affect the second). <br>
    Double condition: the first roll is $1$ and the second roll is any number from $1$ to $3$, i.e. $S_{1-3} = \{1,2,3\}$
    $$P(S_1 \cap S_{1-3}) = \frac{1}{6} \cdot \frac{3}{6}  = \frac{1}{12}$$
    *   Desired outcome space 
        $S_{1, 1-3} = S_1\times S_{1-3} = \{(1,1), (1,2), (1,3)\}$

        And the space of all possible outcomes (sample space) is 
        $S^\prime = S \times S = \{(1,1),\dots, (6,1), \dots, (6,6)\}$
        $$P(S_1 \cap S_{1-3}) = \frac{|S_{1, 1-3} |}{|S^\prime|} = \frac{3}{36} = \frac{1}{12}$$
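
Both product-space counts can be reproduced by brute-force enumeration. A minimal sketch using `itertools.product`, assuming a fair coin and a fair die; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

C = ["H", "T"]
S = [1, 2, 3, 4, 5, 6]

# 1. Coin flip and die roll: expanded space C x S with 12 states.
coin_die_space = list(product(C, S))
heads_and_one = [(c, n) for c, n in coin_die_space if c == "H" and n == 1]
p_coin_die = Fraction(len(heads_and_one), len(coin_die_space))

# 2. Two die rolls: first roll is 1, second roll is 1, 2 or 3.
two_roll_space = list(product(S, S))
desired = [(a, b) for a, b in two_roll_space if a == 1 and b in {1, 2, 3}]
p_two_rolls = Fraction(len(desired), len(two_roll_space))

print(len(coin_die_space), p_coin_die)   # 12 1/12
print(len(two_roll_space), p_two_rolls)  # 36 1/12
```

Both counts agree with multiplying the individual probabilities, as expected for independent experiments.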

## Check if event is independent
Example (from same resource as doctors and nurses): 3 white and 3 black cards with numbers 1 or 2.

White cards: $W_H = \{W_1,W_2,W_2\}$, black cards: $B_L = \{B_1,B_1,B_2\}$.

$$S = W_H \cup B_L = \{W_1,W_2,W_2,B_1,B_1,B_2\}$$
Event $A$: draw a card and it is black. 
$$A = \{B_1,B_1,B_2\} $$
Event $B$: draw a card and it is a $2$.
$$B = \{W_2,W_2,B_2\}$$

Are $A$ and $B$ independent events?

If so, then conditioning on $B$ changes nothing:
$$P(A|B) = P(A) \rightarrow P(A \cap B) = P(A)\cdot P(B)$$
Individual probabilities:
$$P(A) = \frac{3}{6}; \ P(B) = \frac{3}{6}$$
$$P(A)\cdot P(B) = \frac{9}{36} = \frac{1}{4}$$
Combined probability of card being black and having a number $2$ is
$$P(A \cap B) = \frac{|A \cap B|}{|S|} = \frac{|\{B_1,B_1,B_2\} \cap \{W_2,W_2,B_2\}|}{|S|} = \frac{|\{B_2\}|}{|S|} = \frac{1}{6}$$

We see that 
$$P(A \cap B) = \frac{1}{6} \neq \frac{1}{4} = P(A)\cdot P(B)$$
Thus the events are not independent.
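
The same independence check can be sketched in Python. Cards are modelled as hypothetical `(color, number)` tuples in a list, so the duplicate cards are kept.

```python
from fractions import Fraction

# Deck from the example: W1, W2, W2, B1, B1, B2.
deck = [("W", 1), ("W", 2), ("W", 2), ("B", 1), ("B", 1), ("B", 2)]

def prob(event):
    """Share of cards in the deck for which event(card) is true."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

p_a = prob(lambda card: card[0] == "B")                     # black
p_b = prob(lambda card: card[1] == 2)                       # number 2
p_ab = prob(lambda card: card[0] == "B" and card[1] == 2)   # black and 2

independent = (p_ab == p_a * p_b)
print(p_a * p_b, p_ab, independent)  # 1/4 1/6 False
```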
***

## Solved example
<b>TASK:</b><br>
Two missiles $A$ and $B$ are shot in the direction of a target. 

The chance to hit the target for each missile is
$$P(A_H) = 1/4; \ P(B_H) = 2/5$$

What is the chance that
1. Both missiles hit?
2. At least one missile hits?
***
<b>SOLUTION</b>:
Each shot has its own sample space:

$$E_A = \{A_H, A_M\}; E_B = \{B_H, B_M\}$$

where $\square_M$ denotes a miss.

***
1. <b>Probability that both missiles hit (assuming the two shots are independent):</b>
$$P(A_H \cap B_H) = P(A_H)\cdot P(B_H) = \frac{1}{4} \cdot \frac{2}{5} = \frac{1}{10}$$
***
2. <b>Probability that at least one missile hits</b> 

    *  <b>Solution by Non-Mutually Exclusive Events:</b>
        $$P(A_H \cup B_H) = P(A_H) + P(B_H) - P(A_H \cap B_H)$$
        $$P(A_H \cup B_H) = \frac{1}{4} + \frac{2}{5} - \frac{1}{10} = \frac{5}{20} + \frac{8}{20} - \frac{2}{20} = \frac{11}{20} $$
        
    * <b>Solution by a Complement:</b><br>
        Total sample space is  
        $$S = E_A \times E_B = \{(A_H,B_H), (A_H, B_M), (A_M, B_H), (A_M, B_M)\}$$

        $$P(A_H \text{ or } B_H) = P(A_H \cap B_H) +  P(A_H \cap B_M) +  P(A_M \cap B_H)$$
        We know that probability of all outcomes $S$ is $P(S) = 1$ 
        $$P(S)= P(A_H \cap B_H) +  P(A_H \cap B_M) +  P(A_M \cap B_H) + P(A_M \cap B_M) = 1 $$
        As we can observe
        $$P(A_H \text{ or } B_H) = P(S) - P(A_M \cap B_M) = P(S) - P(A_M)\cdot P(B_M) $$
        is the probability of all outcomes except the one where both missiles miss
        $$P(A_H \text{ or } B_H) = P(S) - [1 - P(A_H)]\cdot[1 - P(B_H)]= 1 - \left(1 - \frac{1}{4}\right)\cdot\left(1 - \frac{2}{5}\right) = 1 - \frac{3}{4}\cdot\frac{3}{5} = 1 - \frac{9}{20} = \frac{11}{20}$$

        <i>
        TEST:

        We can test the solution by setting $P(A_H) = 1$, which guarantees that at least one missile hits
        $$P(A_H \text{ or } B_H) = 1-[1 - 1][1 - P(B_H)] = 1 - 0 = 1$$
        </i>
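
Both routes to "at least one hit" can be cross-checked in Python. This is a sketch that assumes the two shots are independent, so each joint probability is a product; the dictionary layout and names are hypothetical.

```python
from fractions import Fraction
from itertools import product

p_a_hit = Fraction(1, 4)
p_b_hit = Fraction(2, 5)

# Per-missile distributions over {hit, miss}.
p_a = {"hit": p_a_hit, "miss": 1 - p_a_hit}
p_b = {"hit": p_b_hit, "miss": 1 - p_b_hit}

# Joint distribution on E_A x E_B, assuming independence.
joint = {(a, b): p_a[a] * p_b[b] for a, b in product(p_a, p_b)}

p_both = joint[("hit", "hit")]
p_any_incl_excl = p_a_hit + p_b_hit - p_both       # non-mutually exclusive events
p_any_complement = 1 - joint[("miss", "miss")]     # everything except a double miss

print(p_both, p_any_incl_excl, p_any_complement)   # 1/10 11/20 11/20
```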
***

# Extensions
## Probability chain rule
Suppose we have a problem where $s$ = starting position index, $a$ = action taken from this position and $f$ =  final position index.

$P(s,a,f) = P(s \cap a \cap f)$ is the joint probability of starting in state $s$, taking action $a$ and ending up in state $f$.

Let $d = a \cap f$, then
$$P(s,a,f)  = P(s \cap (a \cap f)) = P(s \cap d) = P(s)\cdot P(d|s) = P(s)\cdot P(a, f|s)$$
via 
$$P(d|s) = \frac{P(s \cap d)}{ P(s)}$$
and its extension to a world where everything is conditioned on $s$
$$P([f|a]|s) = P(f|a,s) = \frac{P([a \cap f]|s)}{ P(a|s)}  \Rightarrow P(a,f|s) = P(a|s)\cdot P(f|a,s) $$
so
$$P(s,a,f)  = P(s)\cdot P(a, f|s) = P(s)\cdot P(a|s)\cdot P(f|a,s)$$
The interpretation is that we apply more and more constraints and examine what remains of the space:

"what is the chance to be at state $s$" $\rightarrow$ "to be in state $s$ and take action $a$" $\rightarrow \dots$ 
$$\boxed{P(s,a,f)=P(s)\cdot P(a|s)\cdot P(f|a,s)}$$
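
The chain rule can be verified numerically on a toy joint distribution. Everything in this sketch is made up for illustration: two states, two actions (`"L"`, `"R"`), and arbitrary weights that are normalised into probabilities.

```python
from fractions import Fraction

# Hypothetical weights for (s, a, f); not taken from any real problem.
weights = {
    (0, "L", 0): 3, (0, "L", 1): 1, (0, "R", 0): 2, (0, "R", 1): 4,
    (1, "L", 0): 1, (1, "L", 1): 2, (1, "R", 0): 2, (1, "R", 1): 5,
}
total = sum(weights.values())
joint = {key: Fraction(w, total) for key, w in weights.items()}

def marginal(**fixed):
    """Sum the joint over all entries matching the fixed values of s, a, f."""
    total_p = Fraction(0)
    for (s, a, f), p in joint.items():
        values = {"s": s, "a": a, "f": f}
        if all(values[name] == wanted for name, wanted in fixed.items()):
            total_p += p
    return total_p

chain_rule_holds = True
for (s, a, f), p_saf in joint.items():
    p_s = marginal(s=s)                           # P(s)
    p_a_given_s = marginal(s=s, a=a) / p_s        # P(a|s)
    p_f_given_as = p_saf / marginal(s=s, a=a)     # P(f|a,s)
    if p_s * p_a_given_s * p_f_given_as != p_saf:
        chain_rule_holds = False

print(chain_rule_holds)  # True
```

The intermediate marginals cancel pairwise, which is exactly why the product collapses back to $P(s,a,f)$.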