# Probability

### Learning Objectives:
- [Random Variables](#Random-Variables)
- [Introduction to Probability](#Introduction-to-Probability)
- [Joint Probability](#Joint-Probability)
- [Marginal Probability](#Marginal-Probability)
- [Conditional Probability](#Conditional-Probability)

# Random Variables
A __random variable__ is a variable whose value depends on the outcome of a random event. For example, when we repeat an experiment, we never get exactly the same result twice, perhaps because of measurement errors or flaws in the system we are using to measure. But what is __randomness__?

Some argue that this variation is not truly random, but simply the result of highly complex processes that we are in no position to follow accurately. For example, Robert Brown, a botanist, saw that pollen particles in water seem to jitter about randomly as they are knocked around by the water molecules. Some would argue that this is not random, and that if we knew where __all__ the water molecules were and where they were going at all times, we could predict exactly where the particles would be later on. However, this is impossible to do in practice, and there has been far more success in treating these collisions as random forces!

How do we generate a random value? We can use built-in functions, such as the ones from the __random__ and __numpy__ modules. NumPy comes with the added functionality of creating vectors and matrices of random values in one line:

In [None]:
import random
import numpy as np

# Generating randomness with random!
random_number = random.randint(0,4) # random integer between 0 and 4 (both ends included)
random_number2 = random.random() # random floating point value in [0, 1)
print("random module:", random_number, random_number2)

# Generating randomness with numpy!
random_number = np.random.uniform(0,4) # random value from uniform distribution between 0 and 4
random_number2 = np.random.normal(0,1) # random value from standard normal distribution
print("numpy module:",random_number, random_number2)

print()

# Generating a random vector and matrix
random_vector = np.random.normal(0,1,5) # vector of 5 values from standard normal distribution
random_matrix = np.random.uniform(0,4,(2,2)) # 2x2 matrix of random values from uniform distribution between 0 and 4
print(random_vector)
print(random_matrix)

When it comes to random variables, we cannot predict exactly what values they will take. Instead, we focus on determining the likelihood of each value using various statistical methods. But how do we quantify ___how likely___ a value is to occur?


# Introduction to Probability
__Probability__ is a measure of how likely an outcome is to occur given all other possible outcomes and the given circumstances. Therefore, when dealing with random variables, we do not concern ourselves with __what will happen__, but instead with __the probability of given outcome(s), also known as events, occurring__. When all possible outcomes are equally likely, probability is defined as:

$$ \text{Probability of an outcome} = \frac{\text{Number of wanted outcomes}}{\text{Number of possible outcomes}} $$
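As a minimal sketch of the formula above, assuming a fair six-sided die (an example not taken from the text), we can count the wanted outcomes directly and then check the result against a simulation. The event "the roll is even", the seed and the number of rolls are arbitrary choices made for illustration.

```python
import random
from fractions import Fraction

# Classical definition: wanted outcomes divided by all equally likely outcomes.
outcomes = [1, 2, 3, 4, 5, 6]                  # faces of a fair die
wanted = [x for x in outcomes if x % 2 == 0]   # event: the roll is even
p_even = Fraction(len(wanted), len(outcomes))
print("exact:", p_even)                        # 1/2

# Sanity check by simulation: the relative frequency of the event
# should get close to the exact value as the number of rolls grows.
rng = random.Random(0)                         # fixed seed so the run is repeatable
n_rolls = 100_000
hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) % 2 == 0)
p_even_estimate = hits / n_rolls
print("simulated:", p_even_estimate)
```

The simulated value will not match the exact one perfectly, but it should move closer as `n_rolls` increases.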

Intuitively, we know that the probability of something occurring has to be somewhere between 0, where the outcome _cannot_ occur, and 1, where the outcome _will_ occur. Given the events A and B, we can also define $P(A\cup B)$ as the probability of ___either___ A or B (or both) taking place and $P(A\cap B)$ as the probability of ___both___ A and B taking place. In terms of digital logic, __union__ ($\cup$) corresponds to the __or__ operator and __intersection__ ($\cap$) to the __and__ operator. Using these, we can define key properties of probability theory:

$$1. P(A) = 1-P(A') $$
$$2. \sum_{i=1}^{N}P(A_{i}) = 1$$
$$3. P(A\cup B) = P(A) + P(B) - P(A\cap B)$$


A' is the __complement__ of A, and represents all outcomes that are not included in A. The complement behaves like the __not__ operator, so the first property says that the probability of an event taking place is equal to 1 minus the probability of the event __NOT__ taking place. If the probability of rain tomorrow is 0.8, then the probability that it will NOT rain tomorrow is 1 - 0.8 = 0.2.

The second property says that the probabilities of all possible events $A_{1}, \dots, A_{N}$ must sum to 1, provided the events are mutually exclusive and together cover every possible outcome. This makes sense, since exactly one of them must take place. For example, there is a probability of 1 that it will either rain or not rain tomorrow, and there is a probability of 1 that the height of a human we pick at random will be between 0 and infinity!

The third property shows that the probability of A _or_ B occurring is given by the _sum_ of the probability of A and the probability of B, minus the probability of both events taking place. This is because adding the two individual probabilities counts the intersection between them twice. If two events A and B cannot occur simultaneously, they are known as __mutually exclusive__; in that case $P(A\cap B) = 0$ and the third property reduces to $P(A\cup B) = P(A) + P(B)$.
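The three properties can be checked on a small example. The sketch below assumes one roll of a fair die and represents events as Python sets, so that `|` plays the role of union and `&` the role of intersection. The helper `prob` and the events `A` and `B` are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # the roll is even
B = {4, 5, 6}   # the roll is greater than 3

# Property 1: complement rule, P(A) = 1 - P(A')
complement_A = sample_space - A
assert prob(A) == 1 - prob(complement_A)

# Property 2: mutually exclusive events that cover the sample space sum to 1
single_faces = [{face} for face in sample_space]
assert sum(prob(event) for event in single_faces) == 1

# Property 3: addition rule, P(A or B) = P(A) + P(B) - P(A and B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

print(prob(A | B))   # 2/3
```

Using `Fraction` keeps the arithmetic exact, so the equalities can be tested without worrying about floating point rounding.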


# Joint & Marginal Probability



# Conditional Probability
Probability measures how likely an event is to occur, and that likelihood can change with the circumstances. Given that it is sunny now, the probability of rain soon is smaller than if it were cloudy. This leads to __conditional probability__, which is the probability of an event taking place ___given___ that another event has occurred. If we let $P(A|B)$ represent the probability of A occurring _given_ that B has occurred, we can define it, for $P(B) > 0$, with the equation below.

$$P(A|B) = \frac{P(A\cap B)}{P(B)}$$

The equation above is a definition rather than something we prove, but we can make intuitive sense of it. Before we know anything, there is uncertainty about whether B occurs at all. Once we know that B has happened, that uncertainty is gone: the only outcomes still possible are those inside B, and A can only occur through $A\cap B$. Dividing $P(A\cap B)$ by $P(B)$ rescales the probabilities so that the outcomes inside B sum to 1 again, which means that the less likely B was in the first place, the larger the rescaling. If the resulting $P(A|B)$ differs from $P(A)$, we say that A is __dependent__ on B.
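To see the definition of $P(A|B)$ at work, here is a small sketch that again assumes a fair die. The events (A: the roll is a 6, B: the roll is even) and the helper `prob` are illustrative choices, not part of the original example set.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

A = {6}         # the roll is a 6
B = {2, 4, 6}   # the roll is even

# Definition: P(A|B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Same answer by counting only inside B, since B is all that remains possible
p_direct = Fraction(len(A & B), len(B))

print(prob(A), p_a_given_b, p_direct)   # 1/6 1/3 1/3
```

Knowing that the roll is even doubles the probability of a 6, because $P(B) = 1/2$ and dividing by it rescales the remaining outcomes.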

For example, let's say that I'm a birdwatcher following a rare yellow flamingo. My goal is to find the flamingo and take a picture of it that we can publish in the _BirdsBirdsBirds Weekly_ magazine. To succeed, both events need to take place: finding it and getting a good picture. However, if I have already found the yellow flamingo, the probability of succeeding that day goes up by a factor of $1/P(\text{finding it})$, so the harder it was to find in the first place, the bigger the increase!

We can also establish the __law of total probability__, which is given by:

$$P(B) = \sum_{i=1}^{N}P(B|A_{i})P(A_{i}) $$

Here N is the total number of prior outcomes and $A_{i}$ is the $i^{th}$ possible outcome prior to B. The prior events must be mutually exclusive (no two can occur at the same time) and together cover every possibility. Since $P(B|A_{i})P(A_{i}) = P(A_{i}\cap B)$, each term in the sum is the probability of $A_{i}$ and B both occurring, and adding these up over all possible prior events $A_{i}$ gives the probability of B taking place.

For example, let's assume that if I don't find the flamingo, I can still get a picture of one from one of my sources. The probability of me getting a picture of a yellow flamingo is then the sum of two probabilities: that of _finding_ the flamingo and getting a picture, and that of _not finding_ the flamingo and getting a picture.
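The flamingo example can be sketched in a few lines. All three input probabilities below are hypothetical numbers made up for illustration; the text does not give any values.

```python
# Hypothetical inputs, chosen only to illustrate the law of total probability
p_find = 0.1                    # P(find the flamingo)
p_pic_given_find = 0.7          # P(get a picture | found it)
p_pic_given_not_find = 0.05     # P(get a picture | did not find it), e.g. from a source

# "Find" and "not find" are mutually exclusive and cover every possibility
p_not_find = 1 - p_find

# P(picture) = P(picture | find) P(find) + P(picture | not find) P(not find)
p_picture = p_pic_given_find * p_find + p_pic_given_not_find * p_not_find
print(round(p_picture, 3))      # 0.115
```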

The law of total probability enables us to derive another incredibly useful theorem, known as __Bayes' Theorem__, which is used for revising predictions (updating probabilities) given additional evidence. Bayes' Theorem is given as follows:

$$P(A_{j}|B) = \frac{P(A_{j}\cap B)}{P(B)} = \frac{P(B\cap A_{j})}{P(B)} =  \frac{P(B|A_{j})P(A_{j})}{\sum_{i=1}^{N}P(B|A_{i})P(A_{i})} $$

This equation states that the probability that the prior outcome $A_{j}$ has taken place, given that the event B has now taken place, is the ratio between the probability that both $A_{j}$ and B occurred and the probability that B followed any possible event $A_{i}$.

We will now look at a quick example to understand these concepts in practice, shown in the __tree diagram__ below, with the properties:
- Imagine a type of bolt that can be produced either in factory A or factory B. They sometimes end up defective.
- 60% of bolts are produced in A and 40% of bolts are produced in B
- 2% of bolts produced in A are defective and 4% of bolts produced in B are defective

<img src="tree.png" alt="tree-diagram"
	title="Tree diagram of the bolt production process" width="750px" height="500px" />
    
Given the diagram above and the process, we can answer the following questions:
1. What is the probability that the bolt is from factory A and it is defective? <br>
$P(A\cap D) = P(D|A)P(A) = 0.02\cdot 0.6 = 0.012 $
2. Using the law of total probability, what is the probability that a bolt is defective? <br>
$P(D) = \sum_{i=1}^{N}P(D|A_{i})P(A_{i}) = P(D|A)P(A) + P(D|B)P(B) = 0.02\cdot 0.6 + 0.04\cdot 0.4 = 0.028 $
3. Using Bayes' Theorem, what is the probability that a bolt is from factory B, given that it is defective? <br>
$P(B|D) = \frac{P(D|B)P(B)}{\sum_{i=1}^{N}P(D|A_{i})P(A_{i})} = \frac{0.04\cdot 0.4}{0.04\cdot 0.4 + 0.02\cdot 0.6} \approx 0.57$
4. Using Bayes' Theorem, what is the probability that a bolt is from factory A, given that it is defective? <br>
$P(A|D) = \frac{P(D|A)P(A)}{\sum_{i=1}^{N}P(D|A_{i})P(A_{i})} = \frac{0.02\cdot 0.6}{0.04\cdot 0.4 + 0.02\cdot 0.6} \approx 0.43$
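
The four answers above can be reproduced in code. The sketch below uses the percentages from the bolt example; the dictionary names, the seed and the number of simulated bolts are assumptions made for illustration. The simulation at the end is only a rough cross-check of the Bayes' Theorem result.

```python
import random

# Figures from the bolt example
p_factory = {"A": 0.6, "B": 0.4}          # P(A), P(B)
p_defect_given = {"A": 0.02, "B": 0.04}   # P(D|A), P(D|B)

# 1. Joint probability P(A and D)
p_a_and_d = p_defect_given["A"] * p_factory["A"]

# 2. Law of total probability: P(D)
p_d = sum(p_defect_given[f] * p_factory[f] for f in p_factory)

# 3. and 4. Bayes' Theorem: P(factory | D)
posterior = {f: p_defect_given[f] * p_factory[f] / p_d for f in p_factory}

print(round(p_a_and_d, 3), round(p_d, 3))
print({f: round(p, 2) for f, p in posterior.items()})

# Rough cross-check: simulate bolts and look only at the defective ones
rng = random.Random(0)                     # fixed seed so the run is repeatable
n_bolts = 200_000
defective_total = 0
defective_from_b = 0
for _ in range(n_bolts):
    factory = "B" if rng.random() < p_factory["B"] else "A"
    if rng.random() < p_defect_given[factory]:
        defective_total += 1
        defective_from_b += factory == "B"
estimate_b_given_d = defective_from_b / defective_total
print("simulated P(B|D):", round(estimate_b_given_d, 2))
```

Because only a few percent of the simulated bolts are defective, the estimate of $P(B|D)$ rests on a few thousand samples and will fluctuate slightly around the exact value.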

So with the laws of probability we've gone over, we were able to calculate the probability that a bolt is defective no matter where it came from, and even which factory is responsible for the majority of the defective bolts. In this case it is factory B, even though it produces fewer bolts overall!

Lastly, if the occurrence of one event does not affect the probability of another, meaning for instance $P(A|B) = P(A)$, then A and B are __independent events__. If I eat a croissant for breakfast today, this will not affect the probability of it raining in two weeks' time. Substituting $P(A|B) = P(A)$ into the definition of conditional probability, we can conclude that for independent events A and B:
$$P(A \cap B) = P(A)P(B)$$
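
As a final sketch, the product rule can be used as a test for independence. The example assumes a fair coin flipped together with a fair die (an illustration not taken from the text), and the event names and the `prob` helper are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Flip a fair coin and roll a fair die: 12 equally likely (coin, die) outcomes
sample_space = set(product("HT", range(1, 7)))

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

heads = {o for o in sample_space if o[0] == "H"}      # the coin shows heads
six = {o for o in sample_space if o[1] == 6}          # the die shows a 6
even = {o for o in sample_space if o[1] % 2 == 0}     # the die shows an even number

# The coin and the die do not influence each other: the product rule holds
independent = prob(heads & six) == prob(heads) * prob(six)

# "six" and "even" concern the same die: the product rule fails
dependent = prob(six & even) != prob(six) * prob(even)

print(independent, dependent)   # True True
```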