# Slot Vacancy According to Bayes' Theorem
Very often, we'll want to consider more than one random variable at a time. 
For instance, we may want to model the relationship between diseases and symptoms.
Given a disease and symptom, say 'flu' and 'cough', 
either may or may not occur in a patient with some probability.
While we hope that the probability of both would be close to zero,
we may want to estimate these probabilities and their relationships to each other
so that we may apply our inferences to effect better medical care.

As a more complicated example, images contain millions of pixels, thus millions of random variables. 
And in many cases images will come with a label, identifying objects in the image.
We can also think of the label as a random variable.
We can even get crazy and think of all the metadata as random variables
such as location, time, aperture, focal length, ISO, focus distance, camera type, etc.
All of these are random variables that occur jointly. 
When we deal with multiple random variables, 
there are several quantities of interest.
The first is called the joint distribution $\Pr(A, B)$. 
Given any elements $a$ and $b$,
the joint distribution lets us answer,
what is the probability that $A=a$ and $B=b$ simultaneously?
It might be clear that for any values $a$ and $b$, $\Pr(A=a, B=b) \leq \Pr(A=a)$. 

This has to be the case, since for $A$ and $B$ to happen, 
$A$ has to happen *and* $B$ also has to happen (and vice versa). 
Thus $A,B$ cannot be more likely than $A$ or $B$ individually. 
This brings us to an interesting ratio: $0 \leq \frac{\Pr(A,B)}{\Pr(A)} \leq 1$ (assuming $\Pr(A) > 0$). 
We call this a **conditional probability** and denote it by $\Pr(B|A)$, 
the probability that $B$ happens, provided that $A$ has happened. 
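
As a rough illustration of this ratio, the sketch below estimates $\Pr(A)$, $\Pr(A,B)$ and $\Pr(B|A)$ by counting. The `records` list is made-up data (not from any real study), with $A$ standing for 'flu' and $B$ for 'cough'.

```python
# Hypothetical patient records as (has_flu, has_cough) pairs; invented for illustration only.
records = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False),
    (False, False), (False, False), (False, False), (False, False),
]

n = len(records)
pr_a = sum(1 for flu, _ in records if flu) / n                 # Pr(A): patient has flu
pr_ab = sum(1 for flu, cough in records if flu and cough) / n  # Pr(A, B): flu and cough together
pr_b_given_a = pr_ab / pr_a                                    # Pr(B | A) = Pr(A, B) / Pr(A)

print(f"Pr(A) = {pr_a:.2f}, Pr(A,B) = {pr_ab:.2f}, Pr(B|A) = {pr_b_given_a:.2f}")
```

Because the joint count can never exceed the count for $A$ alone, the ratio always lands between 0 and 1.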

Using the definition of conditional probabilities, 
we can derive one of the most useful and celebrated equations in statistics: Bayes' theorem. 
It goes as follows: By construction, we have that $\Pr(A, B) = \Pr(B|A) \Pr(A)$. 
By symmetry, this also holds for $\Pr(A,B) = \Pr(A|B) \Pr(B)$. 
Solving for one of the conditional probabilities we get:
$$\Pr(A|B) = \frac{\Pr(B|A) \Pr(A)}{\Pr(B)}$$

This is very useful if we want to infer one thing from another, 
say cause from effect, but we only know the properties in the reverse direction. 
One important operation that we need to make this work is **marginalization**, i.e., 
the operation of determining $\Pr(A)$ and $\Pr(B)$ from $\Pr(A,B)$.
We can see that the probability of seeing $A$ amounts to accounting 
for all possible choices of $B$ and aggregating the joint probabilities over all of them, i.e. 

$$\Pr(A) = \sum_{B'} \Pr(A,B') \text{ and } \Pr(B) = \sum_{A'} \Pr(A',B)$$
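
A minimal sketch of marginalization and Bayes' theorem on a small table follows. The joint distribution `joint` is a hypothetical example (any non-negative table summing to one would do); rows index the values of $A$ and columns the values of $B$.

```python
import numpy as np

# Hypothetical joint distribution Pr(A, B); rows index A, columns index B.
joint = np.array([[0.80, 0.10],
                  [0.02, 0.08]])

pr_a = joint.sum(axis=1)  # marginalize out B: Pr(A)
pr_b = joint.sum(axis=0)  # marginalize out A: Pr(B)

# Conditionals straight from the definition.
pr_b_given_a = joint / pr_a[:, None]  # entry [a, b] = Pr(B=b | A=a)
pr_a_given_b = joint / pr_b[None, :]  # entry [a, b] = Pr(A=a | B=b)

# Bayes' theorem: Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)
bayes = pr_b_given_a * pr_a[:, None] / pr_b[None, :]

print(np.allclose(bayes, pr_a_given_b))
```

The final check confirms that going through $\Pr(B|A)$ and the two marginals recovers the same $\Pr(A|B)$ as dividing the joint by $\Pr(B)$ directly.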

A really useful property to check for is **dependence** versus **independence**. 
Independence is when the occurrence of one event does not influence the occurrence of the other.
In this case $\Pr(B|A) = \Pr(B)$. Statisticians typically use $A \perp\!\!\!\perp B$ to express this. 
From Bayes' theorem it follows immediately that also $\Pr(A|B) = \Pr(A)$. 
In all other cases we call $A$ and $B$ dependent. 
For instance, two successive rolls of a die are independent. 
On the other hand, the position of a light switch and the brightness in the room are not 
(they are not perfectly deterministic, though, 
since we could always have a broken lightbulb, power failure, or a broken switch). 
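
The dice example can be checked exactly by enumerating all outcomes. This is only a sketch: the three events below (first roll is a six, second roll is even, total is at least ten) are chosen for illustration, with the last one added as a contrasting dependent case.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two successive rolls of a fair die.
outcomes = list(product(range(1, 7), repeat=2))

def pr(event):
    """Exact probability of an event (a predicate on outcomes) under the uniform distribution."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def conditional(event, given):
    """Pr(event | given) = Pr(event, given) / Pr(given)."""
    return pr(lambda o: event(o) and given(o)) / pr(given)

def first_is_six(o):
    return o[0] == 6

def second_is_even(o):
    return o[1] % 2 == 0

def total_at_least_ten(o):
    return o[0] + o[1] >= 10

# Independent: knowing the first roll does not change the second roll's distribution.
print(conditional(second_is_even, first_is_six) == pr(second_is_even))
# Dependent: knowing the first roll is a six makes a high total more likely.
print(conditional(total_at_least_ten, first_is_six) == pr(total_at_least_ten))
```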

## Dealing With Our Data
Assume that our model is really accurate (which it is).
If the slot is vacant, it wrongly predicts occupied with 2% probability;
if the slot is occupied, it wrongly predicts vacant with 1% probability.

We use $P$ to indicate the prediction and $T$ to denote the slot status, i.e., the true label.
Written as a table the outcome $\Pr(P|T)$ looks as follows:

|                          | Slot is vacant (T = 0) | Slot is occupied (T = 1) |
|:-------------------------|-----------------------:|-------------------------:|
|Predicted vacant (P = 0)  | 0.98                   | 0.01                     |
|Predicted occupied (P = 1)| 0.02                   | 0.99                     |

Note that the column sums are all one (but the row sums aren't), 
since each column is a conditional distribution $\Pr(P|T)$ for a fixed $T$ and must sum to $1$, just like any probability distribution. 
Let us work out the probability of the slot being vacant if the prediction says that it's vacant. 
Obviously this is going to depend on how often the slot is vacant, since that determines how many of the vacant predictions are false alarms.
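
One possible way to hold the table in code is a nested dictionary keyed first by the true status $T$ and then by the prediction $P$. The name `likelihood` and this layout are assumptions made for this sketch; the numbers are the ones from the table.

```python
# Pr(P | T) from the table. Outer key: true status T, inner key: prediction P
# (0 = vacant, 1 = occupied).
likelihood = {
    0: {0: 0.98, 1: 0.02},  # slot is vacant
    1: {0: 0.01, 1: 0.99},  # slot is occupied
}

# Each column of the table (fixed T) is a distribution over P, so it sums to one.
column_sums = {t: sum(p_given_t.values()) for t, p_given_t in likelihood.items()}

# Rows (fixed P) are not distributions over T, so they need not sum to one.
row_sums = {p: sum(likelihood[t][p] for t in likelihood) for p in (0, 1)}

print(column_sums)
print(row_sums)
```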

### Example 1 (busy parking lot)
Assume that we are in a busy parking lot where the slot is vacant only *10% of the time*, i.e., $\Pr(\text{slot is vacant}) = 0.1$.

To apply Bayes' theorem we need to determine the marginal probability of a vacant prediction:

$$\Pr(\text{predicted vacant}) = \Pr(P=0|T=0) \Pr(T=0) + \Pr(P=0|T=1) \Pr(T=1) = 0.98 \cdot 0.1 +0.01 \cdot 0.9 = 0.107$$

So $\Pr(T=0) = 0.1$ and $\Pr(P=0) = 0.107$.

Hence we get

$$\Pr(T = 0|P = 0) = \frac{\Pr(P=0|T=0) \Pr(T=0)}{\Pr(P=0)} = \frac{0.98 \cdot 0.1}{0.107} \approx 0.916$$

In other words, there's only a 91.6% chance that the slot is actually vacant, despite using a model that is around 98-99% accurate!

### Example 2 (not so busy parking lot)
Now assume that the parking lot is quite empty and the slot is vacant around *95% of the time*, i.e., $\Pr(\text{slot is vacant}) = 0.95$.

To apply Bayes' theorem we need to determine the marginal probability of a vacant prediction:

$$\Pr(\text{predicted vacant}) = \Pr(P=0|T=0) \Pr(T=0) + \Pr(P=0|T=1) \Pr(T=1) = 0.98 \cdot 0.95 +0.01 \cdot 0.05 = 0.9315$$

So $\Pr(T=0) = 0.95$ and $\Pr(P=0) = 0.9315$.

Hence we get

$$\Pr(T = 0|P = 0) = \frac{\Pr(P=0|T=0) \Pr(T=0)}{\Pr(P=0)} = \frac{0.98 \cdot 0.95}{0.9315} \approx 0.9995$$

There's about a 99.95% chance that the slot is actually vacant, which is great!

### Example 3 (50/50 parking lot)
Now assume that the slot is vacant around *50% of the time*, i.e., $\Pr(\text{slot is vacant}) = 0.50$.

To apply Bayes' theorem we need to determine the marginal probability of a vacant prediction:

$$\Pr(\text{predicted vacant}) = \Pr(P=0|T=0) \Pr(T=0) + \Pr(P=0|T=1) \Pr(T=1) = 0.98 \cdot 0.50 +0.01 \cdot 0.50 = 0.495$$

So $\Pr(T=0) = 0.50$ and $\Pr(P=0) = 0.495$.

Hence we get

$$\Pr(T = 0|P = 0) = \frac{\Pr(P=0|T=0) \Pr(T=0)}{\Pr(P=0)} = \frac{0.98 \cdot 0.50}{0.495} \approx 0.9899$$

There's a 98.99% chance that the slot is actually vacant, which is still quite good!
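
The three examples differ only in the prior, so they can be reproduced with one small function. This is a sketch: `posterior_vacant` and its parameter names are made up here, and the default error rates are the ones from the table above.

```python
def posterior_vacant(prior_vacant, p_vacant_given_vacant=0.98, p_vacant_given_occupied=0.01):
    """Pr(T=0 | P=0): probability the slot is vacant given a 'vacant' prediction."""
    # Law of total probability: Pr(P=0) = Pr(P=0|T=0) Pr(T=0) + Pr(P=0|T=1) Pr(T=1)
    pr_predicted_vacant = (p_vacant_given_vacant * prior_vacant
                           + p_vacant_given_occupied * (1 - prior_vacant))
    # Bayes' theorem
    return p_vacant_given_vacant * prior_vacant / pr_predicted_vacant

for prior in (0.10, 0.95, 0.50):
    print(f"Pr(T=0) = {prior:.2f} -> Pr(T=0 | P=0) = {posterior_vacant(prior):.4f}")
```

The printed posteriors should agree with the three worked examples up to rounding in the last digit.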


## What's the Point of This Example?
Perhaps the main takeaway is that when we deal with slot recommendation (for navigation) in the future, we should remember to also take into account $\Pr(\text{slot is vacant})$, which can easily be estimated from the slot's history.
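
As a rough sketch of what that could look like, the prior can be taken as the fraction of past observations in which the slot was vacant. The `history` list and its 0/1 encoding are hypothetical; a real system would likely also condition on time of day and similar factors.

```python
# Hypothetical observation history for one slot: 0 = vacant, 1 = occupied.
history = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

# Empirical prior Pr(T=0): fraction of observations where the slot was vacant.
prior_vacant = history.count(0) / len(history)

# Posterior Pr(T=0 | P=0), using the error rates from the table above.
pr_predicted_vacant = 0.98 * prior_vacant + 0.01 * (1 - prior_vacant)
posterior = 0.98 * prior_vacant / pr_predicted_vacant

print(f"estimated prior = {prior_vacant:.2f}, posterior = {posterior:.4f}")
```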

**Let's plot the outcome probability given different $\Pr(T=0)$:**

```python
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

# Set default figure size
matplotlib.rcParams['figure.figsize'] = [12, 8]
```

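```python
# Prior Pr(T=0), swept from 0 to 1
Pr_T = np.linspace(0.0, 1.0, 101)
# Marginal Pr(P=0) via the law of total probability
Pr_P = 0.98 * Pr_T + 0.01 * (1 - Pr_T)
# Posterior Pr(T=0 | P=0) via Bayes' theorem
prob = (0.98 * Pr_T) / Pr_P

plt.plot(Pr_T, prob)
plt.xlabel('Pr(slot is vacant)')
plt.ylabel('Pr(slot is vacant | Predicted vacant)')
plt.ylim([0.85, 1.0])
# Red reference line at 0.99
plt.plot([0, 1], [0.99, 0.99], 'r-', lw=2)
plt.grid()
plt.show()
```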

We can see that for small values of $\Pr(T=0)$ the posterior $\Pr(T=0|P=0)$ falls well below the 0.99 reference line, so a vacant prediction is much less trustworthy when vacancies are rare.
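
To put a number on where the curve meets the red line, one can solve $\Pr(T=0|P=0) = 0.99$ for the prior. The sketch below does this in closed form; the variable names are made up for illustration, and the error rates are again those from the table.

```python
p_vacant_given_vacant = 0.98    # Pr(P=0 | T=0)
p_vacant_given_occupied = 0.01  # Pr(P=0 | T=1)
target = 0.99                   # height of the red reference line

# Solve  s*p / (s*p + f*(1-p)) = target  for the prior p, where
# s = Pr(P=0|T=0) and f = Pr(P=0|T=1):
#   p = target*f / (s*(1 - target) + target*f)
crossover_prior = (target * p_vacant_given_occupied
                   / (p_vacant_given_vacant * (1 - target) + target * p_vacant_given_occupied))

# Sanity check: plugging the crossover prior back in should give the target posterior.
evidence = (p_vacant_given_vacant * crossover_prior
            + p_vacant_given_occupied * (1 - crossover_prior))
posterior_at_crossover = p_vacant_given_vacant * crossover_prior / evidence

print(f"crossover prior = {crossover_prior:.4f}")
```

Under these error rates the crossover sits just above one half, so a vacant prediction is at least 99% reliable only when the slot is vacant a little more than half of the time.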