#Name: Chaeyoon Kim
#City Email: Chaeyoon.Kim@city.ac.uk
#Chris Bishop, "Pattern Recognition and Machine Learning", Springer, 2006 (https://g.co/kgs/CsLSX8)

# Probability

A probability distribution expresses uncertainty about the outcome of an event. We often encode this uncertainty in a variable. In the *1st week tutorial question*, if we are considering the outcome of an event, $Y$, to be a coin toss, then we might consider $Y=1$ to be heads and $Y=0$ to be tails. We represent the probability of a given outcome with the notation:
$$
P(Y=1) = 0.5
$$
The first rule of probability is that probabilities must normalize: the probabilities of all possible outcomes must sum to 1. So if the probability of heads ($Y=1$) is 0.5, then the probability of tails (the only other possible outcome) is given by
$$
P(Y=0) = 1-P(Y=1) = 0.5
$$

Probabilities are often defined as the limit, as the number of trials goes to infinity, of the ratio of the number of positive outcomes (e.g. *heads*) to the number of trials. If the number of positive outcomes for event $y$ is denoted by $n_y$ and the number of trials is denoted by $N$, then this gives
$$
P(Y=y) = \lim_{N\rightarrow \infty}\frac{n_y}{N}.
$$
In practice we never get to observe an event an infinite number of times, so instead we use the finite-sample estimate
$$
P(Y=y) \approx \frac{n_y}{N}.
$$
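A minimal sketch of this frequency estimate, assuming a fair coin simulated with Python's `random` module (the helper `estimate_heads_probability` is a hypothetical name, not part of the tutorial). As $N$ grows, the ratio $n_y/N$ should settle near 0.5, although any finite run will wobble around it.

```python
import random

def estimate_heads_probability(num_trials, seed=0):
    """Estimate P(Y=1) for a simulated fair coin by the ratio n_y / N."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    # Simulate N tosses: 1 = heads, 0 = tails, each with probability 0.5.
    tosses = [rng.randint(0, 1) for _ in range(num_trials)]
    n_heads = sum(tosses)  # n_y: the number of positive outcomes
    return n_heads / num_trials

# Larger N gives an estimate closer (on average) to the true value 0.5.
for N in (10, 100, 10_000):
    print(N, estimate_heads_probability(N))
```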

# Conditioning

When predicting whether a fruit drawn from a box is an apple or an orange, we might ask whether the outcome is *independent* of which box we picked. If we include an observation, such as the box being red, then in probability this is known as *conditioning*. We use the notation $P(Y=y|X=x)$ to condition the outcome on a second variable (in this case, whether the box is red or blue). Often, as shorthand, we write $P(y|x)$ for this distribution (the $Y=$ and $X=$ being implicit). If we believe the outcome, whether a coin toss or a fruit selection, does not depend on the conditioning variable, then we might write
$$
P(y|x) = P(y).
$$
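Conditioning can be sketched with counts: to estimate $P(y|x)$, keep only the observations where $X=x$ and apply the same ratio $n_y/N$ to that subset. The (box, fruit) draws below are hypothetical, made up for illustration. With them, $P(\text{orange}|\text{red})$ and $P(\text{orange}|\text{blue})$ both differ from $P(\text{orange})$, so the independence assumption $P(y|x)=P(y)$ would not hold for these boxes.

```python
# Hypothetical (box, fruit) observations; the counts are illustrative only.
draws = ([("red", "apple")] * 2 + [("red", "orange")] * 6
         + [("blue", "apple")] * 3 + [("blue", "orange")] * 1)

def p_fruit(fruit):
    """Marginal estimate P(Y=fruit) = n_fruit / N over all draws."""
    return sum(1 for _, f in draws if f == fruit) / len(draws)

def p_fruit_given_box(fruit, box):
    """Conditional estimate P(Y=fruit | X=box): restrict to draws from that box."""
    in_box = [f for b, f in draws if b == box]
    return in_box.count(fruit) / len(in_box)

print(p_fruit("orange"))                    # 7/12, about 0.583
print(p_fruit_given_box("orange", "red"))   # 6/8 = 0.75
print(p_fruit_given_box("orange", "blue"))  # 1/4 = 0.25
```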

#### Making a link to the exercise of the PoDS! (part1)
Let's use the frequency estimate $n_y/N$ to compute the approximate probability that a female passenger in the Titanic dataset was aged 40 to 50 and paid a fare of at least 40.
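A minimal sketch of that computation, assuming each passenger is reduced to a (sex, age, fare) triple mirroring the dataset's `Sex`, `Age` and `Fare` fields. The rows are made up, so the numbers are illustrative only; with the real data you would load the file and apply the same filters. Here $Y$ is "aged 40 to 50" and $X$ is "paid at least 40", both counted among female passengers.

```python
# Hypothetical (sex, age, fare) rows standing in for the Titanic data.
# The values are made up for illustration; they are NOT real passengers.
passengers = [
    ("female", 45.0, 52.0),
    ("female", 42.0, 26.0),
    ("female", 38.0, 71.3),
    ("female", 49.0, 80.0),
    ("female", 22.0, 7.25),
    ("female", 47.0, 13.0),
    ("male", 44.0, 90.0),
    ("male", 30.0, 8.05),
]

def aged_40_to_50(age):
    return 40 <= age <= 50

def paid_at_least_40(fare):
    return fare >= 40

# Restrict to female passengers; all probabilities below are among them.
females = [(age, fare) for sex, age, fare in passengers if sex == "female"]
N = len(females)

# Joint: n(age in [40, 50] and fare >= 40) / N
n_joint = sum(1 for age, fare in females
              if aged_40_to_50(age) and paid_at_least_40(fare))
p_joint = n_joint / N

# Marginal for the fare condition, and the conditional P(age | fare).
n_fare = sum(1 for _, fare in females if paid_at_least_40(fare))
p_fare = n_fare / N
p_age_given_fare = n_joint / n_fare

print(f"P(age, fare)  = {p_joint:.3f}")           # 2/6
print(f"P(age | fare) = {p_age_given_fare:.3f}")  # 2/3
print(f"P(fare)       = {p_fare:.3f}")            # 3/6
```

With pandas, the same filters could be written as boolean masks such as `df['Age'].between(40, 50)` (inclusive on both ends by default) and `df['Fare'] >= 40`.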

### The Product Rule

This number is the joint probability, $P(Y, X)$, which is *smaller* than the conditional probability. The joint probability can never be larger than the conditional probability, because the two are related by the *product rule*:
$$
p(Y=y, X=x) = p(Y=y|X=x)p(X=x)
$$
and $p(X=x)$ is a probability, which is at most 1, so the joint probability can never exceed the conditional probability (it is strictly smaller whenever $p(X=x) < 1$ and $p(Y=y|X=x) > 0$).

The product rule is a *fundamental* rule of probability that *I must remember!* It links the joint question, "What's the probability that a female passenger was aged 40 to 50 *and* paid at least 40?", to two simpler ones: 1) What's the probability that a female passenger paid at least 40? and 2) Given that she paid at least 40, what's the probability that she was aged 40 to 50?

In our shorter notation we can write the product rule as
$$
p(y, x) = p(y|x)p(x)
$$
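The product rule can be checked on a small joint table. This sketch builds $p(y, x)$ from an assumed $p(x)$ over two boxes and an assumed $p(y|x)$ over fruit; the numbers are illustrative, in the spirit of Bishop's red and blue box example. Exact `Fraction` arithmetic avoids floating-point noise, and the assertion confirms that each joint entry is no larger than its conditional.

```python
from fractions import Fraction as F

# Illustrative (assumed) numbers: p(x) over boxes, p(y|x) over fruit per box.
p_x = {"red": F(2, 5), "blue": F(3, 5)}
p_y_given_x = {
    "red": {"apple": F(1, 4), "orange": F(3, 4)},
    "blue": {"apple": F(3, 4), "orange": F(1, 4)},
}

# Product rule: p(y, x) = p(y|x) p(x), for every combination of y and x.
p_yx = {(y, x): p_y_given_x[x][y] * p_x[x]
        for x in p_x for y in p_y_given_x[x]}

for (y, x), joint in p_yx.items():
    # The joint never exceeds the conditional, because p(x) <= 1.
    assert joint <= p_y_given_x[x][y]
    print(y, x, joint)
```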

### The Sum Rule

The other *fundamental rule* of probability is the *sum rule*, which tells us how to obtain a *marginal* distribution from the joint distribution. Simply put, it says that we sum over the variable we'd like to remove.
$$
P(Y=y) = \sum_{x} P(Y=y, X=x)
$$
Or in our shortened notation
$$
P(y) = \sum_{x} P(y, x)
$$
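A sketch of the sum rule on a hypothetical joint table $p(y, x)$ (fruit $y$, box $x$), stored as a dictionary keyed by $(y, x)$. Summing out $x$ leaves the marginal $p(y)$; the function name `marginal_y` is our own, not from any library.

```python
from fractions import Fraction as F

# Hypothetical joint table p(y, x): y = fruit, x = box. Entries sum to 1.
p_yx = {
    ("apple", "red"): F(1, 10), ("orange", "red"): F(3, 10),
    ("apple", "blue"): F(9, 20), ("orange", "blue"): F(3, 20),
}

def marginal_y(joint):
    """Sum rule: p(y) = sum over x of p(y, x)."""
    p_y = {}
    for (y, _x), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p  # accumulate over the removed variable x
    return p_y

print(marginal_y(p_yx))  # {'apple': Fraction(11, 20), 'orange': Fraction(9, 20)}
```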

## Bayes' Rule

Bayes' rule is a very simple rule, so don't be too afraid of it. It follows directly from the product rule of probability. Because $P(y, x) = P(y|x)P(x)$ and, by symmetry, $P(y,x)=P(x,y)=P(x|y)P(y)$, equating these two expressions and dividing through by $P(y)$ gives
$$
P(x|y) = \frac{P(y|x)P(x)}{P(y)}, \text{ or } P(y|x) = \frac{P(x|y)P(y)}{P(x)}
$$
which is known as Bayes' rule (or Bayes's rule, depending on how I choose to pronounce it). It's not difficult to derive; its importance lies more in the semantic operation that it enables. Each of these probability distributions represents the answer to a question we have about the world. Bayes' rule (via the product rule) tells us how to *invert* the probability: to go from $P(y|x)$ to $P(x|y)$.
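To see the inversion concretely, here is a sketch that computes $P(x|y)$ from an assumed prior $P(x)$ over boxes and likelihood $P(y|x)$ over fruit. The numbers are illustrative; the denominator $P(y)$ comes from the sum rule applied to the product-rule numerators, so the posterior normalizes automatically. The function name `posterior` is hypothetical.

```python
from fractions import Fraction as F

# Illustrative (assumed) numbers: prior p(x) over boxes, likelihood p(y|x).
p_x = {"red": F(2, 5), "blue": F(3, 5)}
p_y_given_x = {
    "red": {"apple": F(1, 4), "orange": F(3, 4)},
    "blue": {"apple": F(3, 4), "orange": F(1, 4)},
}

def posterior(y):
    """Bayes' rule: p(x|y) = p(y|x) p(x) / p(y), with p(y) from the sum rule."""
    # Numerators p(y|x) p(x) are joint probabilities (product rule).
    unnormalised = {x: p_y_given_x[x][y] * p_x[x] for x in p_x}
    p_y = sum(unnormalised.values())  # sum rule: marginalize over x
    return {x: v / p_y for x, v in unnormalised.items()}

print(posterior("orange"))  # {'red': Fraction(2, 3), 'blue': Fraction(1, 3)}
```

Under these assumed numbers, observing an orange raises the probability of the red box from the prior $2/5$ to $2/3$, because oranges are more common in the red box.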