*SMTB-2021 :: "A Practical Intro to Probability Theory."*
*a@bochkarev.io*

**Last time we discussed a model for random events:**
- Classical definition of probabilities; frequencies.
- Experiments with dice.
- Experiments with ~crocodiles~ black and white balls.
- A math model for random events: Probability space. Outcomes, events. Notes on Set theory.
- A couple of numerical illustrations for random events.

**This time, we start...**

# Topic 2: (in)dependence, tests, and Co.
- Crocodiles and rabbits revisited (aka black and white balls).
- Dependent and Independent events.
- Conditional probability.
- COVID test as a random variable. Characterizing tests.

In [12]:
import numpy as np

  *First, let's warm up with some set theory...*

If we introduce a notation for a "complement" to any set $A$ (within $\Omega$), like this:
- $\bar{A}:=\Omega\setminus A$, i.e. including all elements $x\in \Omega$, such that $x\notin A$...

... then how can we calculate $\mathbb{P}(\bar{A})$, knowing $\mathbb{P}(A)$? In other words, what's the probability of getting 2, 3, 4, 5, or 6 on a die?

**Answer:** $\mathbb{P}(\bar{A}) = 1-\mathbb{P}(A)$.

- Well, if we take $A$ and $B:=\bar{A}$, then by definition $A\cap B=\varnothing$ and $A\cup B=\Omega$,
- hence, $\mathbb{P}(A)+\mathbb{P}(B)=\mathbb{P}(A\cup B)= \mathbb{P}(\Omega)=1$.

(Just a useful trick.)
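The same trick can be checked on a fair die by brute-force enumeration with exact fractions. This is a minimal sketch; `omega` and the helper `prob` are illustrative names, not part of the lecture:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # outcomes of one throw of a fair die


def prob(event):
    """Classical probability: share of equally likely outcomes in the event."""
    return Fraction(len(event & omega), len(omega))


A = {1}              # "getting 1"
A_bar = omega - A    # complement: "getting 2, 3, 4, 5, or 6"

print(prob(A_bar), 1 - prob(A))  # both are 5/6
```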

## Questions from the previous lecture.
### Probability for the "union" of events.
- we remember that *if* $A\cap B=\varnothing$, then $\mathbb{P}(A\cup B) = \mathbb{P}(A)+\mathbb{P}(B).$
- given the properties of probability, $\mathbb{P}$, how would you find $\mathbb{P}(A\cup B)$ for the case when $A\cap B \neq \varnothing$?
  
*Ideas / suggestions?*

We know that: $\mathbb{P}(A\cup B) = \mathbb{P}(A) + \mathbb{P}(B)$, if $A\cap B=\varnothing$ (i.e., no shared elements).

Note that:
- $\mathbb{P}(A)=\mathbb{P}(A\setminus B) + \mathbb{P}(A\cap B)$,
- $\mathbb{P}(B)=\mathbb{P}(B\setminus A) + \mathbb{P}(A\cap B)$.

(Right?)

Let's just sum up these two equations:
- $\mathbb{P}(A)+\mathbb{P}(B) = \Big[\mathbb{P}(A\setminus B) + \mathbb{P}(A\cap B)+\mathbb{P}(B\setminus A) \Big]+ \mathbb{P}(A\cap B)$.

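Obviously (the sets $A\setminus B$, $A\cap B$, and $B\setminus A$ are pairwise disjoint and together make up $A\cup B$), $\mathbb{P}(A\setminus B) + \mathbb{P}(A\cap B)+\mathbb{P}(B\setminus A)=\mathbb{P}(A\cup B)$.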

... therefore,
$$\Rightarrow \mathbb{P}(A\cup B) = \mathbb{P}(A)+\mathbb{P}(B) - \mathbb{P}(A\cap B) \nonumber$$

Well, actually, we could have just imagined the picture:
![A and B](./images/AnB.png)
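A quick numerical check of the formula on one throw of a fair die, with two overlapping events chosen for illustration ("even" and "at least 4"). A sketch; the helper `prob` is a hypothetical name:

```python
from fractions import Fraction

omega = set(range(1, 7))  # one throw of a fair die


def prob(event):
    """Classical probability: share of equally likely outcomes in the event."""
    return Fraction(len(event & omega), len(omega))


A = {2, 4, 6}  # "even"
B = {4, 5, 6}  # "at least 4"; shares {4, 6} with A

print(prob(A | B))                       # 2/3
print(prob(A) + prob(B) - prob(A & B))   # 2/3, the formula above
print(prob(A) + prob(B))                 # 1, because A ∩ B is counted twice
```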

### "Rabbits and Crocodiles", revisited.
- Assume the hat still contains 10 🐇 and 5 🐊. I pick the animals out one by one, but now they run away every time (i.e., we draw without replacement). How would you assess the probability of picking out five rabbits in a row? 
  
  *Ideas?*

- "Good" outcomes: $10\cdot 9\cdot 8\cdot 7 \cdot 6=30{,}240$
- All outcomes: $15\cdot 14 \cdot 13\cdot 12 \cdot 11=360{,}360$
- therefore, the probability is $30{,}240/360{,}360 \approx 8\%$.
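The same count can be done with `math.perm` (ordered selections without replacement, Python 3.8+) and double-checked by simulation. A minimal sketch; the constant names, the seed, and the number of trials are arbitrary choices for illustration:

```python
import math
import random

RABBITS, CROCS, DRAWS = 10, 5, 5

# Exact: ordered "good" sequences / all ordered sequences (no replacement)
good = math.perm(RABBITS, DRAWS)            # 10*9*8*7*6 = 30240
total = math.perm(RABBITS + CROCS, DRAWS)   # 15*14*13*12*11 = 360360
exact = good / total

# Monte Carlo: draw 5 animals without replacement, many times
rng = random.Random(2021)  # fixed seed, only for reproducibility
hat = ["rabbit"] * RABBITS + ["croc"] * CROCS
N = 100_000
hits = sum(
    all(animal == "rabbit" for animal in rng.sample(hat, DRAWS))
    for _ in range(N)
)

print(f"exact: {exact:.4f}, simulated: {hits / N:.4f}")
```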

**sometimes we discuss general "combinatorial" concepts** instead:
(denoting $n! := n (n-1) \ldots 1$)

- number of options to choose $k$ elements out of $n$, **taking their order into account** ("arrangements"), $A_n^k=\frac{n!}{(n-k)!}$;
- number of permutations of $k$ elements, $P_k=k!$;
- now, choosing $k$ elements out of $n$, **regardless of their order** ("combinations"), $C_n^k=\frac{n!}{k!(n-k)!}$;
- finally, the number of subsets of a set with $N$ elements is $2^N$ (*why?!*).

**Sidenote:** both $2^N$ and $N!$ grow **very fast**!
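A small brute-force check of these formulas, enumerating selections from a toy set with `itertools` and comparing against `math.perm`, `math.factorial`, and `math.comb` (Python 3.8+). The values `n=5, k=3` are arbitrary:

```python
import math
from itertools import combinations, permutations

n, k = 5, 3
items = range(n)  # a toy set {0, 1, 2, 3, 4}

arrangements = len(list(permutations(items, k)))  # ordered selections
perms_of_k = len(list(permutations(range(k))))     # orderings of k elements
combos = len(list(combinations(items, k)))         # unordered selections
subsets = sum(1 for j in range(n + 1) for _ in combinations(items, j))

print(arrangements, math.perm(n, k))   # A_n^k = n!/(n-k)!       -> 60 60
print(perms_of_k, math.factorial(k))   # P_k = k!                -> 6 6
print(combos, math.comb(n, k))         # C_n^k = n!/(k!(n-k)!)   -> 10 10
print(subsets, 2 ** n)                 # number of subsets = 2^n -> 32 32

# Both 2^N and N! grow very fast:
for N in (5, 10, 20):
    print(N, 2 ** N, math.factorial(N))
```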

## Dependent and Independent Events.
We introduced probabilities as $(\Omega, \mathcal{F}, \mathbb{P})$ -- "Probability Space". Respectively,
- $\Omega$ -- a set of *outcomes* (a-la "atomic results").
- $\mathcal{F}$ -- a collection of *events* (obviously, these are subsets of $\Omega$).
- $\mathbb{P}$ -- "probability" function, mapping $\mathcal{F}$ to $[0,1]$.

We demand some more or less natural properties from these three. We now treat events as *sets*. Given all this, how would you define *dependent* and *independent* events?

E.g., intuitively speaking, are these events independent:
- getting HEADS in each of two consecutive tosses of a coin?
- getting ⚀ and ⚅ on different dice?
- ... on the same die (in the same throw)?
- "getting ⚅" and "getting even"?
- "getting ⚅" and "getting odd"?

We could, for example, define independent events like this:
$A$ and $B$ are **independent** (by definition!) iff their probabilities satisfy:
$$\mathbb{P}(A\cap B)=\mathbb{P}(A)\mathbb{P}(B) \nonumber$$

Again, would this work for our examples?
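One way to check these examples is to enumerate equally likely outcomes and test the definition $\mathbb{P}(A\cap B)=\mathbb{P}(A)\mathbb{P}(B)$ exactly with fractions. A sketch; the helpers `prob` and `independent` are hypothetical, and "⚀ and ⚅ on different dice" is read here as "⚀ on the first die, ⚅ on the second":

```python
from fractions import Fraction
from itertools import product


def prob(event, omega):
    """Classical probability: share of equally likely outcomes where `event` holds."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def independent(a, b, omega):
    """Check the definition P(A ∩ B) == P(A) P(B) exactly."""
    both = prob(lambda w: a(w) and b(w), omega)
    return both == prob(a, omega) * prob(b, omega)


coins = list(product("HT", repeat=2))            # two consecutive tosses
two_dice = list(product(range(1, 7), repeat=2))  # two different dice
one_die = list(range(1, 7))                      # one die, one throw

checks = {
    "HEADS on toss 1, HEADS on toss 2": independent(
        lambda w: w[0] == "H", lambda w: w[1] == "H", coins),
    "1 on die 1, 6 on die 2": independent(
        lambda w: w[0] == 1, lambda w: w[1] == 6, two_dice),
    "1 and 6 on the same die": independent(
        lambda w: w == 1, lambda w: w == 6, one_die),
    "6 and even": independent(lambda w: w == 6, lambda w: w % 2 == 0, one_die),
    "6 and odd": independent(lambda w: w == 6, lambda w: w % 2 == 1, one_die),
}
for name, result in checks.items():
    print(f"{name}: {'independent' if result else 'dependent'}")
```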

### Example: double-checking.

Assume a machine-learning-based system estimates the size of iron nuggets from photographs. It uses two independent cameras of different models.

**The goal** is to spot cases when the size is out of specification.
- The first camera detects such cases with probability $0.6$; the second one, a slightly newer model, with probability $0.8$. 

**Question:** what's the probability that the system will spot the problem, assuming it is present? (given two photos, one from each camera)

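- first, $0.6+0.8=1.4$ can't be right: a probability cannot exceed 1 :)
- but why? With $A$ = "camera 1 spots the defect" and $B$ = "camera 2 spots it", we are looking for $\mathbb{P}(A\cup B)$, and simple addition only works when $A\cap B=\varnothing$. Here both cameras can spot the same defect.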

- okay, let's just find the probability that neither of the cams will find it.
- obviously, this is $(1-0.6)(1-0.8)=0.4\cdot 0.2=0.08$.
- then, the probability we are interested in is just $1-0.08 = 0.92$!
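Cross-checking with the union formula from above: let $A$ = "camera 1 spots the defect" and $B$ = "camera 2 spots it" (notation introduced for this check). Since the cameras are independent, $\mathbb{P}(A\cap B)=\mathbb{P}(A)\mathbb{P}(B)$, hence

$$\mathbb{P}(A\cup B) = \mathbb{P}(A)+\mathbb{P}(B) - \mathbb{P}(A)\mathbb{P}(B) = 0.6 + 0.8 - 0.6\cdot 0.8 = 1.4 - 0.48 = 0.92\nonumber$$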

*Wow-wow, waitasecond...*

In [13]:
success = 0
N = 50000

for i in range(N):
    # note: we are ASSUMING we have a defect!
    check_camera_1 = (np.random.uniform() <= 0.6)  # that's an event with prob 0.6
    check_camera_2 = (np.random.uniform() <= 0.8)  # this is the event with prob 0.8
    
    if check_camera_1 or check_camera_2:
        success += 1
        
print(f"Share of successful detections (simulated): {success*100 / N:.1f}%")

Share of successful detections (simulated): 92.1%


## Conditional probability
Sometimes, we want to think in terms of "conditionals". What's the probability of `something`, **assuming** `something else` has already happened?

- a work email has been sent to the wrong client, and now there is a scandal. What's the probability that it was our new intern Joe?
- what's the probability that my algorithm makes a profit on the stock market, assuming prices are going down?
- what's the probability that our method will detect cancer, assuming it is indeed there?

How would you define such a thing numerically, say, $\mathbb{P}(A|B)$ (A "given" B)? For starters, in terms of the number of outcomes:

- well, like a share of outcomes, right?
- that is, #`both A and B happened` / #`B happened`.

Pursuing the same logic with probabilities instead of counts (for $\mathbb{P}(B)>0$):
$$\mathbb{P}(A|B) := \frac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)}\nonumber$$

By the way, with this definition we could instead call events *independent* iff $\mathbb{P}(A|B)=\mathbb{P}(A)$ (for $\mathbb{P}(B)>0$). This is intuitively pretty clear: learning that $B$ happened does not change the chances of $A$. Multiplying both sides by $\mathbb{P}(B)$ gives back $\mathbb{P}(A\cap B)=\mathbb{P}(A)\mathbb{P}(B)$.
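Here is the idea for "⚅ given even" on one fair die, computed three ways: by counting outcomes, by the formula, and by simulation. A minimal sketch; the names and the seed are illustrative:

```python
import random
from fractions import Fraction

omega = range(1, 7)   # one throw of a fair die
A = {6}               # "getting 6"
B = {2, 4, 6}         # "getting even"


def prob(event):
    """Classical probability: share of equally likely outcomes."""
    return Fraction(len(event), len(omega))


# 1) Counting: #(A and B happened) / #(B happened)
by_counting = Fraction(len(A & B), len(B))
# 2) Definition: P(A ∩ B) / P(B)
by_formula = prob(A & B) / prob(B)
# 3) Simulation: among throws where B happened, the share where A happened too
rng = random.Random(0)  # fixed seed, only for reproducibility
throws = [rng.randint(1, 6) for _ in range(60_000)]
given_b = [t for t in throws if t in B]
simulated = sum(t in A for t in given_b) / len(given_b)

print(by_counting, by_formula, f"{simulated:.3f}")
print("P(A) =", prob(A))  # 1/6 != 1/3, so "6" and "even" are dependent
```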

### A quick lifehack: "Law of Total Probability".
Note that $$\mathbb{P}(A\cap B) = \mathbb{P}(A|B) \mathbb{P}(B)\nonumber$$

So, if I "split" our $\Omega = B_1 \cup \ldots \cup B_K$ into pairwise disjoint pieces ($B_i\cap B_j=\varnothing$ for $i\neq j$), then

$$\mathbb{P}(A) = \mathbb{P}(A\cap\Omega) =\mathbb{P}(A\cap B_1)+ \ldots + \mathbb{P}(A\cap B_K)=\mathbb{P}(A|B_1)\mathbb{P}(B_1) + \ldots + \mathbb{P}(A|B_K)\mathbb{P}(B_K)\nonumber$$

**For example:** sometimes it is easier to work with conditional probabilities than with unconditional ones. We have three cars in the rental: the first one will break down on the first day with probability 10%, the second one with 30%, and the third with 100%. You get a car at random. What's the probability of getting stuck on the road today?

*(Check out the course folder for the [solution](./2021-07-30-Total_prob.pdf))*

**ProTip:** if you own the car rental, pick the probabilities so that your clients get stuck at most 50% of the time (so you are "overall okay").
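For the car rental example, the law of total probability is a one-liner. A sketch, assuming "a car at random" means each of the three cars with probability 1/3:

```python
from fractions import Fraction

# Assumption: "a car at random" means each of the 3 cars with probability 1/3
p_car = [Fraction(1, 3)] * 3
# P(breaks down today | car i), from the example: 10%, 30%, 100%
p_break_given_car = [Fraction(1, 10), Fraction(3, 10), Fraction(1)]

# Law of total probability: P(stuck) = sum_i P(stuck | car i) * P(car i)
p_stuck = sum(pb * pc for pb, pc in zip(p_break_given_car, p_car))
print(p_stuck, float(p_stuck))  # 7/15, i.e. about 0.467
```

Under this assumption the rental stays just below the 50% mark from the ProTip.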

## Speaking about tests.
This concept pops up a lot when we talk about *tests*. Let's discuss a COVID test, of course [*]

[*] The numbers do not correspond to any real test; everything is made up.

In [2]:
## Let's wrap it up beautifully, for discussion.
class Life:
    """Determines the real condition."""
    def __init__(self):
        self.COVID_prob = np.random.uniform(0, 0.6)
        
    def happen(self):
        outcome = np.random.uniform()
        if outcome <= self.COVID_prob:
            return "💀"
        else:
            return "😊"
        
class COVIDtest:
    """Describes a test."""
    def __init__(self):
        self.recall = np.random.uniform(0.6,1.0)
        self.specificity = np.random.uniform(0,0.3)
        
    def test(self, life_state):
        """Gives the test result, depending on the true state."""
        outcome = np.random.uniform()
        if life_state == "💀":
            if outcome <= self.recall:
                return "😷 (sick)"
            else:
                return "👍 (healthy)"
        else:
            if outcome <= self.specificity:
                return "👍 (healthy)"
            else:
                return "😷 (sick)"
            
def make_table(TP, FP, TN, FN):
    N = TP+FP+TN+FN
    print(f"{'Truth / Test ->':<15} | {'😷 (+ sick)':^15} | {'👍 (- healthy)':^15} | {'∑':^15} | %")
    print(f"{'=':=<80}")
    print(f"{'💀 ':<15}   {TP:^15,}   {FN:^15,}   {TP+FN:^15,}   {(TP+FN) / N:.2f}")
    print(f"{'😊 ':<15}   {FP:^15,}   {TN:^15,}   {FP+TN:^15,}   {(FP+TN) / N:.2f}")
    print(f"{'-':-<80}")
    print(f"{'∑  ':<15}   {TP+FP:^15,}   {FN+TN:^15,}   {N:^15,}    1.0")
    print(f"{'%  ':<15}   {(TP+FP)/N:^15.2f}   {(FN+TN)/N:^15.2f}   {'100%':^15}")

In [5]:
import numpy as np
life = Life()
s=''
doc = COVIDtest()

In [19]:
s += life.happen()
s

'💀😊😊😊💀😊😊😊😊😊😊😊😊💀'

In [29]:
state = life.happen()
print(f"{state} --> {doc.test(state)}")

😊 --> 😷 (sick)


How would we characterize the test? (experimentally or theoretically)

- **option 1**: the share of sick people that the test detects (virus detection), right?

In [31]:
success = 0
sick_cases = 0
N = 10000

for n in range(N):
    state = life.happen()
    if state == "💀":
        sick_cases += 1
    res = doc.test(state)
    if state == "💀" and res == "😷 (sick)":
        success += 1
        
print(f"Got {success*100 / sick_cases:.1f}% cases (population: {sick_cases} sick per {N} patients)")

Got 76.2% cases (population: 2408 sick per 10000 patients)


**Do we have a problem here?**

In [32]:
def run_experiments(life, doc, N=1000):
    results = {("💀","😷 (sick)"):0, ("💀","👍 (healthy)"):0, ("😊","😷 (sick)"):0, ("😊","👍 (healthy)"):0}

    for n in range(N):
        state = life.happen()
        result = doc.test(state)
        results[(state,result)] += 1

    TP = results[("💀","😷 (sick)")]
    FP = results[("😊","😷 (sick)")]
    TN = results[("😊","👍 (healthy)")]
    FN = results[("💀","👍 (healthy)")]

    return TP, FP, TN, FN

In [33]:
TP, FP, TN, FN = run_experiments(life, doc, 10000)
make_table(TP, FP, TN, FN)

Truth / Test -> |   😷 (+ sick)    |  👍 (- healthy)  |        ∑        | %
💀                      1,782              548              2,330        0.23
😊                      5,421             2,249             7,670        0.77
--------------------------------------------------------------------------------
∑                      7,203             2,797            10,000         1.0
%                      0.72              0.28              100%      


**Other characteristics of tests:**
- the rate of "type I errors", i.e. "false positives": healthy people who test positive;
- the rate of "type II errors", i.e. "false negatives": sick people who test negative;
- sensitivity *aka* recall *aka* true positive rate = `True Positives` / (all actually sick) = TP / (TP + FN);
- specificity *aka* selectivity *aka* true negative rate = `True Negatives` / (all actually healthy) = TN / (TN + FP).

In practice, there is almost always a trade-off between specificity and sensitivity (see [Wiki](https://en.wikipedia.org/wiki/Precision_and_recall) on this).
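These rates can be computed directly from the four counts of the table. A sketch; `rates_from_counts` is a hypothetical helper, and the type I/II error rates are taken here as shares among the actually healthy and the actually sick, respectively:

```python
def rates_from_counts(TP, FP, TN, FN):
    """Test characteristics from confusion-table counts."""
    return {
        "sensitivity (TPR, recall)": TP / (TP + FN),
        "specificity (TNR)": TN / (TN + FP),
        "false positive rate (type I)": FP / (FP + TN),   # = 1 - specificity
        "false negative rate (type II)": FN / (FN + TP),  # = 1 - sensitivity
    }


# Counts taken from the simulated table above
rates = rates_from_counts(TP=1782, FP=5421, TN=2249, FN=548)
for name, value in rates.items():
    print(f"{name:>30}: {value:.2f}")
```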

In [34]:
# So the true parameters are:
print(f"P(COVID) = {life.COVID_prob:.2f}")
print(f"Recall = {doc.recall:.2f}, Specificity = {doc.specificity:.2f}")

# Sample parameters:
print(f"Recall (sample) = {TP / (TP+FN):.2f}, Specificity (sample) = {TN/(TN+FP):.2f}")

P(COVID) = 0.24
Recall = 0.75, Specificity = 0.30
Recall (sample) = 0.76, Specificity (sample) = 0.29


### Some more examples of "tests"

- air defence systems and false alarms;
- spam filtering;
- image recognition problems (e.g., handwriting);
- QC: finding defects in production;
- TSA: luggage and passengers;
- biometric identification;
- clinics: testing vs. screening (screening can tolerate many false positives, but it must be cheap!).

## Bayes formula

Sometimes we want our probability "inside out".
**What for?** What's the probability that I have COVID, given that I have a runny nose? Assume we know:
- the share of people with a runny nose (on average),
- the share of people with COVID,
- and how often people with COVID have a runny nose...

So, let $A = \{\text{I have the virus}\}$ and $B = \{\text{runny nose}\}$; we know $\mathbb{P}(A)$, $\mathbb{P}(B)$, and $\mathbb{P}(B|A)$. *What do we do next?*

We will come back to this example today, but let us first derive the key formula for it:

Note that $\mathbb{P}(A\cap B) = \mathbb{P}(A|B)\mathbb{P}(B) = \mathbb{P}(B|A)\mathbb{P}(A)$. Hence:

$$\mathbb{P}(A|B) = \frac{\mathbb{P}(B|A)\mathbb{P}(A)}{\mathbb{P}(B)}\nonumber$$

### Numerical example: COVID test

Assume I have taken an antibody test (say, in Moscow), and it is positive. What's the probability that I actually have antibodies?

We know that:
- in Moscow, ~5% of the population had antibodies[*].

The test documentation contains the following data:
- Sensitivity ("True Positive Rate") = 99%
- Specificity ("True Negative Rate") = 98%

So, what do we do?

[*] At the time these notes were first written.

- denote "+" to mean "the test is positive", "✅" -- antibodies are indeed there; "❌" -- no antibodies. Then:

$$\mathbb{P}(✅|+) = \frac{\mathbb{P}(+|✅)\mathbb{P}(✅)}{\mathbb{P}(+)}\nonumber$$

- we have a problem in the denominator. Let's expand it (using the "Law of Total Probability"):

$$\mathbb{P}(+) = \mathbb{P}(+|✅)\mathbb{P}(✅) + \mathbb{P}(+|❌)\mathbb{P}(❌)\nonumber$$

- so, we can just plug in the numbers:

- obviously, $\mathbb{P}(✅)=0.05$. Then, $\mathbb{P}(❌) = 1-0.05 = 0.95$.
- $\mathbb{P}(+|✅)$ -- this is "sensitivity", 0.99
- now we need $\mathbb{P}(+|❌)$ = 1 - Specificity = 0.02.

therefore,

$$\mathbb{P}(✅|+) = \frac{\mathbb{P}(+|✅)\mathbb{P}(✅)}{\mathbb{P}(+|✅)\mathbb{P}(✅) + \mathbb{P}(+|❌)\mathbb{P}(❌)} = \frac{0.99\cdot0.05}{0.99\cdot 0.05 + 0.02\cdot 0.95}\nonumber$$
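Carrying out the arithmetic ($0.99\cdot 0.05 = 0.0495$ and $0.02\cdot 0.95 = 0.019$):

$$\mathbb{P}(✅|+) = \frac{0.0495}{0.0495 + 0.019} = \frac{0.0495}{0.0685} \approx 0.72\nonumber$$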

In [35]:
Sensitivity = 0.99
Specificity = 0.98
P_antibodies = 0.05  # prior ℙ(✅): share of the population with antibodies

print(f"For ℙ(✅)={P_antibodies:.2f}: ℙ(✅|+)={(Sensitivity * P_antibodies)/(Sensitivity*P_antibodies + (1-Specificity)*(1-P_antibodies)):.2f}")

# Now, if we have more people with antibodies:
P_antibodies = 0.5
print(f"For ℙ(✅)={P_antibodies:.2f}: ℙ(✅|+)={(Sensitivity * P_antibodies)/(Sensitivity*P_antibodies + (1-Specificity)*(1-P_antibodies)):.2f}")

For ℙ(✅)=0.05: ℙ(✅|+)=0.72
For ℙ(✅)=0.50: ℙ(✅|+)=0.98
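To double-check ℙ(✅|+) without the Bayes formula, one can simulate a population directly and look at the share of people with antibodies among those who tested positive. A minimal sketch using a NumPy `Generator`; the function names, the population size, the seed, and the extra prevalence values are hypothetical choices for illustration:

```python
import numpy as np


def posterior(prior, sensitivity, specificity):
    """P(antibodies | +) via Bayes with the law of total probability."""
    p_plus = sensitivity * prior + (1 - specificity) * (1 - prior)  # P(+)
    return sensitivity * prior / p_plus


def simulate_posterior(prior, sensitivity, specificity, n=1_000_000, seed=0):
    """Monte Carlo estimate: share of people with antibodies among all positives."""
    rng = np.random.default_rng(seed)
    has_antibodies = rng.random(n) < prior
    # A person with antibodies tests "+" w.p. sensitivity,
    # a person without them tests "+" w.p. 1 - specificity.
    p_positive = np.where(has_antibodies, sensitivity, 1 - specificity)
    positive = rng.random(n) < p_positive
    return has_antibodies[positive].mean()


for prior in (0.01, 0.05, 0.2, 0.5):
    exact = posterior(prior, 0.99, 0.98)
    approx = simulate_posterior(prior, 0.99, 0.98)
    print(f"P(antibodies)={prior:.2f}: exact {exact:.3f}, simulated {approx:.3f}")
```

The sweep makes the same point as the two printed lines above: with a rare condition, even a very accurate test gives a noticeably lower ℙ(✅|+).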


## Next time: random variables
- Definition of the random variable.
- Quick examples: the score on a die; an RV with countably many values (Poisson).
- Bernoulli scheme (a "biased" coin) and Binomial distribution (counting the number of "Heads").
- PMF, CDF.
   
**Thinking with 🍵:** maybe review the lecture (the notebook).