# Conditional Probability and Bayes' Theorem

Conditional probability is about determining the probability of an event $A$ *conditioned on* a piece of information $B$. This is denoted $P(A \mid B)$.

What 'conditioned on' means can vary a bit. Its simplest application is to experimental data divided into subsets, where it can be used to find the probabilities of events within a subset. For example: 'what is the probability that a randomly selected student is female, given that they are a Comp Sci student?' This is written $P(F \mid CS)$.

Here, and in general, we can use the foundational property of conditional probability:

$$P(A\mid B) = \frac{P(A \cap B)}{P(B)}$$

(Note that the numerator there is the **joint probability** of $A$ and $B$, which can also be denoted $P(A, B)$.)
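As a rough illustration of the student example, the sketch below computes $P(F \mid CS)$ as $P(F \cap CS) / P(CS)$ from a table of counts. The counts, the `counts` dictionary and the `prob` helper are all invented for this example and are not real data.

```python
from fractions import Fraction

# Hypothetical survey counts, invented purely for illustration.
counts = {
    ("F", "CS"): 30,
    ("M", "CS"): 70,
    ("F", "Other"): 220,
    ("M", "Other"): 180,
}

total = sum(counts.values())

def prob(predicate):
    """Probability that a uniformly random student satisfies the predicate."""
    matching = sum(n for key, n in counts.items() if predicate(key))
    return Fraction(matching, total)

p_cs = prob(lambda key: key[1] == "CS")          # P(CS)
p_f_and_cs = prob(lambda key: key == ("F", "CS"))  # P(F ∩ CS)
p_f_given_cs = p_f_and_cs / p_cs                 # P(F | CS)

print(p_f_given_cs)  # 3/10
```

Note that the result is the same as counting females within the Comp Sci subset directly (30 out of 100), which is the "probability within a subset" reading of conditioning.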

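The same property, with the roles of $A$ and $B$ swapped, gives $P(A \cap B) = P(B \mid A)P(A)$. Substituting this into the numerator lets us state the result in purely conditional terms: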

$$P(A \mid B) = \frac{P(B\mid A)P(A)}{P(B)}$$

This is **Bayes' Theorem**.

The denominator there, $P(B)$, might benefit from unpacking a bit. In the mini example above, this would be the probability that a random student of any gender is a Comp Sci student.

The numerator contains the probability of $B$ *conditioned* on $A$. In general, the denominator is the *unconditioned* probability of $B$, obtained by adding up $P(B \mid A)P(A)$ over every possible value of $A$.

So if you have a bunch of possible outcomes of $B$, where the outcome depends on information/event $A$, the probability of event $A$ conditioned on $B$ is equal to the probability of $B$ given $A$ multiplied by the probability of $A$, divided by the probability of $B$ across all possible values of $A$.

For a continuous variable $A$, with probability densities $f$, this is equivalent to saying

$$f(B) = \int_{-\infty}^\infty f(B\mid A)f(A) dA$$

This gives the full generalised version of Bayes' Theorem for continuous variables:

$$f(A\mid B) = \frac{f(B\mid A)f(A) }{\int_{-\infty}^\infty f(B\mid A)f(A) dA}$$

(For discrete variables you would use the *sum* across values of $A$ as the denominator.)

$$f(A\mid B) = \frac{f(B\mid A)f(A) }{\sum_{A} f(B\mid A)f(A)}$$
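A minimal sketch of the discrete form, assuming $A$ takes a small number of values held in dictionaries. The function name `posterior` and the numbers fed to it are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Discrete Bayes' Theorem.

    prior:      dict mapping each value a to P(A = a)
    likelihood: dict mapping each value a to P(B | A = a)
    Returns a dict mapping each value a to P(A = a | B).
    """
    # Numerator terms P(B | A = a) P(A = a), one per value of A.
    joint = {a: likelihood[a] * prior[a] for a in prior}
    # Denominator P(B): the sum of those terms over every value of A.
    evidence = sum(joint.values())
    return {a: joint[a] / evidence for a in joint}

# Hypothetical numbers, invented for illustration only.
prior = {"F": Fraction(1, 2), "M": Fraction(1, 2)}         # P(gender)
likelihood = {"F": Fraction(3, 25), "M": Fraction(7, 25)}  # P(CS | gender)

post = posterior(prior, likelihood)
print(post["F"], post["M"])  # 3/10 7/10
```

Because every posterior value is divided by the same denominator, the returned probabilities always sum to 1.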

We saw above how you can apply conditional probability across subsets of experimental data. A different use would be to make assessments about the underlying distribution of a variable where you have or will have experimental data and want to infer from that the statistical parameters of the distribution. This is Statistical Inference in the Bayesian Paradigm, or **Bayesian Inference**.

The idea here is that you are studying a random variable $X$ and are going to perform some experiments to get observations of $X$. You can guess that $X$ is approximated by some type of distribution, but you want to know the parameters, $\theta$, of that distribution. You also have a **prior**: an initial guess of what values those parameters could take, expressed as a probability function $f(\theta)$, made before actually conducting the experiment. The idea of Bayesian Inference is to use Bayes' Theorem to arrive at a **posterior**: a new, better guess of $\theta$ after you have conducted your experiment and obtained result $x$.

In statistical language:

$$f(\theta \mid x) = \frac{f(x \mid \theta)f(\theta)}{\int_{-\infty}^\infty f(x \mid \theta)f(\theta) d\theta}$$
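One way to make this concrete is a grid approximation, sketched below for an assumed example that is not part of the notes above: $X$ is a coin flip, $\theta$ is the unknown probability of heads, and the prior is uniform on $(0, 1)$. The names `grid_posterior` and `uniform_prior` and the data (7 heads in 10 flips) are hypothetical. The integral in the denominator is approximated with a midpoint-rule sum rather than evaluated exactly.

```python
def grid_posterior(heads, flips, prior, n_grid=1000):
    """Approximate f(theta | x) on a grid of midpoints in (0, 1)."""
    width = 1.0 / n_grid
    thetas = [(i + 0.5) * width for i in range(n_grid)]
    # Numerator of Bayes' Theorem: f(x | theta) f(theta).
    # The binomial coefficient is omitted because it cancels in the ratio.
    unnorm = [t**heads * (1 - t)**(flips - heads) * prior(t) for t in thetas]
    # Denominator: midpoint-rule approximation of the integral over theta.
    evidence = sum(unnorm) * width
    density = [u / evidence for u in unnorm]
    return thetas, density, width

def uniform_prior(theta):
    """Assumed prior: every theta in (0, 1) is equally plausible."""
    return 1.0

# Hypothetical experiment: 7 heads observed in 10 flips.
thetas, density, width = grid_posterior(heads=7, flips=10, prior=uniform_prior)

# Posterior mean of theta, again as a midpoint-rule sum.
post_mean = sum(t * d for t, d in zip(thetas, density)) * width
print(round(post_mean, 3))
```

For this particular prior and likelihood the exact posterior is a Beta(8, 4) distribution with mean $8/12 \approx 0.667$, so the grid result can be checked against it. Swapping in a different `prior` function changes the posterior without touching the rest of the code.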