# Probability

We first need to sort out some terms.

### Sample space

The sample space is the set of all possible outcomes of an experiment. For example, the sample space of a coin toss is $\{H,T\}$.

### Event

An event is a subset of the sample space. For example, the event "the coin lands on heads" is $\{H\}$.

### Probability

When all outcomes are equally likely, the probability of an event is the number of outcomes in the event divided by the number of outcomes in the sample space. For example, for a fair coin the probability of the event $\{H\}$ is $\frac{1}{2}$.


> Measure of uncertainty: It is tempting to think of probability as representing some natural randomness in the world. That might be the case. But perhaps the world isn't random. I propose a deeper way of thinking about probability. There is so much that we as humans don't know, and probability is our robust language for expressing our belief that an event will happen given our limited knowledge. This interpretation acknowledges your own uncertainty of an event. Perhaps if you knew the position of every water molecule, you could perfectly predict tomorrow's weather. But we don't have such knowledge and as such we use probability to talk about the chance of rain tomorrow given the information that we have access to.

> - [Probability for Computer Scientists](https://chrispiech.github.io/probabilityForComputerScientists/en/part1/probability/)

### Axioms of probability

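The three axioms are:

1. $P(A) \geq 0$ for every event $A$.

2. $P(S) = 1$, where $S$ is the sample space.

3. If $A$ and $B$ are mutually exclusive ($A \cap B = \emptyset$), then $P(A \cup B) = P(A) + P(B)$.

Several useful properties follow from these axioms:

- $P(\emptyset) = 0$

- $P(A) \leq 1$

- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$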


## Equally Likely Outcomes

If all outcomes are equally likely, then the probability of an event is the number of outcomes in the event divided by the number of outcomes in the sample space.

In real problems, this is much more difficult than it seems. You have to:

1. Define the sample space and argue that all outcomes in it are equally likely.

2. Count the number of elements in the sample space.

3. Count the number of elements in the event.
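The three steps above can be sketched for a small case. The example below is an assumption chosen for illustration (two fair six-sided dice and the event "the sum is 7"); it is not taken from the notes themselves.

```python
from fractions import Fraction
from itertools import product

# Step 1: define the sample space. Treating the two dice as distinguishable
# gives 36 ordered pairs, and each pair is equally likely for fair dice.
sample_space = list(product(range(1, 7), repeat=2))

# The event "the two dice sum to 7" is a subset of the sample space.
event = [outcome for outcome in sample_space if sum(outcome) == 7]

# Steps 2 and 3: count both sets, then divide.
probability = Fraction(len(event), len(sample_space))
print(len(sample_space), len(event), probability)  # 36 6 1/6
```

Note that choosing the 11 possible sums (2 through 12) as the sample space would not work here, because those sums are not equally likely.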


## Mutually exclusive events and Non-mutually exclusive events

For any two events: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

For mutually exclusive events, $P(A \cap B) = 0$, so this reduces to $P(A \cup B) = P(A) + P(B)$.
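A minimal sketch that checks both cases by counting, assuming one roll of a fair six-sided die. The events `A`, `B`, `C`, `D` and the helper `prob` are hypothetical names introduced for this example.

```python
from fractions import Fraction

# Assumed setup: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# Non-mutually exclusive events: they share the outcomes 4 and 6.
A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3
p_union = prob(A | B)
assert p_union == prob(A) + prob(B) - prob(A & B)
print(p_union)  # 2/3

# Mutually exclusive events: no shared outcomes, so the last term vanishes.
C = {1, 3}
D = {2, 4}
assert prob(C & D) == 0
assert prob(C | D) == prob(C) + prob(D)
```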

## Conditional Probability

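$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, defined when $P(B) > 0$.

This is widely used in machine learning, especially in recommendation systems.

For example, Netflix can use it to estimate the probability that a user watches movie $E$ given that the user has watched movie $F$. Both probabilities are approximated by counts over a large number of users:

$P(E) = \lim_{n \to \infty} \frac{\text{count}(E)}{n} \approx \frac{\text{count(people who watched movie E)}}{\text{count(people on Netflix)}}$

$P(E \mid F) = \frac{P(E \cap F)}{P(F)} \approx \frac{\text{count(people who watched movies E and F)}}{\text{count(people who watched movie F)}}$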
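The counting estimates above can be sketched as follows. The watch histories are made-up data, and `p` and `p_given` are hypothetical helper names; this is only an illustration of the arithmetic, not a description of any real recommendation system.

```python
from fractions import Fraction

# Hypothetical watch histories: user -> set of movies watched.
watched = {
    "u1": {"E", "F"},
    "u2": {"F"},
    "u3": {"E"},
    "u4": {"E", "F"},
    "u5": set(),
}

def p(movie):
    """Estimate P(movie) as the fraction of all users who watched it."""
    count = sum(movie in movies for movies in watched.values())
    return Fraction(count, len(watched))

def p_given(movie, given):
    """Estimate P(movie | given) by counting only users who watched `given`.

    Raises ZeroDivisionError if nobody watched `given`, which mirrors the
    requirement P(F) > 0 in the definition.
    """
    watched_given = [movies for movies in watched.values() if given in movies]
    watched_both = [movies for movies in watched_given if movie in movies]
    return Fraction(len(watched_both), len(watched_given))

print(p("E"))             # 3/5
print(p_given("E", "F"))  # 2/3
```

In this toy data, knowing that a user watched F raises the estimated probability of E from 3/5 to 2/3.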


## Independence
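
Two events $A$ and $B$ are independent when knowing that one occurred does not change the probability of the other: $P(A \cap B) = P(A)P(B)$, or equivalently $P(A \mid B) = P(A)$ when $P(B) > 0$. A minimal sketch of checking this by counting, assuming two tosses of a fair coin (the events and helper names are hypothetical):

```python
from fractions import Fraction
from itertools import product

# Assumed setup: two tosses of a fair coin, 4 equally likely outcomes.
sample_space = set(product("HT", repeat=2))

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

def independent(a, b):
    """Check the definition P(A and B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

A = {o for o in sample_space if o[0] == "H"}  # first toss is heads
B = {o for o in sample_space if o[1] == "H"}  # second toss is heads
C = {o for o in sample_space if "H" in o}     # at least one heads

print(independent(A, B))  # True
print(independent(A, C))  # False
```

The second check fails because a heads on the first toss guarantees "at least one heads", so the two events carry information about each other.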

## Law of Total Probability

$P(A) = P(A \mid B)P(B) + P(A \mid B')P(B')$, where $B'$ is the complement of $B$.
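
A short worked example of this formula, with made-up numbers: let $B$ be "the email is spam" and $A$ be "the email contains the word free". All three input probabilities below are assumptions for illustration.

```python
from fractions import Fraction

# Assumed inputs (hypothetical numbers).
p_B = Fraction(1, 5)               # P(B): the email is spam
p_A_given_B = Fraction(1, 2)       # P(A | B): "free" appears in a spam email
p_A_given_not_B = Fraction(1, 20)  # P(A | B'): "free" appears in a non-spam email

# P(B') follows from P(B) because B and B' partition the sample space.
p_not_B = 1 - p_B

# Law of total probability: sum over the two cases B and B'.
p_A = p_A_given_B * p_B + p_A_given_not_B * p_not_B
print(p_A)  # 7/50
```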


## Bayes' Theorem

$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}$

This theorem is exceptionally useful when you face this kind of problem:

"How can I update my belief about something that is not directly observable, given my observation of something else that depends on it and is directly observable?"

It turns out we use this kind of reasoning all the time. Why do we use exams to test people's ability? Because we cannot observe the ability of the test taker directly, but we can observe the exam result and use it to infer that ability.

There are names for the different terms in Bayes' Theorem:

1. $P(A \mid B)$ is called the **posterior** probability of $A$ given $B$.

2. $P(B \mid A)$ is called the **likelihood** of $B$ given $A$.

3. $P(A)$ is called the **prior** probability of $A$.

4. $P(B)$ is called the **marginal likelihood** of $B$.
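
The four terms can be seen together in a small worked example. The scenario and numbers are assumptions for illustration: $A$ is "the person has a condition" (not directly observable) and $B$ is "the test is positive" (observable). The marginal likelihood is computed by summing over the two cases $A$ and not $A$.

```python
from fractions import Fraction

# Assumed inputs (hypothetical numbers).
prior = Fraction(1, 100)             # P(A): has the condition
likelihood = Fraction(95, 100)       # P(B | A): positive test if condition present
false_positive = Fraction(5, 100)    # P(B | not A): positive test otherwise

# Marginal likelihood P(B): total probability of a positive test.
marginal = likelihood * prior + false_positive * (1 - prior)

# Posterior P(A | B) from Bayes' Theorem.
posterior = likelihood * prior / marginal
print(marginal)                    # 59/1000
print(posterior)                   # 19/118
print(round(float(posterior), 3))  # 0.161
```

Even with a fairly accurate test, the posterior stays low here because the prior is small: most positive results come from the much larger group without the condition.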


## Log Probability

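Why do we use log probability?

Multiplying many small probabilities quickly produces numbers too small for floating-point arithmetic to represent, and the result underflows to zero. Taking logs keeps the values in a manageable range. For example, $\ln(0.00001) \approx -11.51$.

Logs also turn multiplications into additions, so a product of many probabilities becomes a sum of log probabilities, which does not underflow.

$\log(P(A) \cdot P(B)) = \log P(A) + \log P(B)$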
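
A minimal sketch of the underflow problem, assuming 1000 independent events that each have probability 0.01 (a made-up scenario) and standard double-precision floats:

```python
import math

# Assumed input: 1000 independent events, each with probability 0.01.
probs = [0.01] * 1000

# Naive approach: multiply the probabilities directly.
# The true value is 1e-2000, far below the smallest positive double,
# so the running product underflows to exactly 0.0.
naive_product = 1.0
for p in probs:
    naive_product *= p
print(naive_product)  # 0.0

# Log approach: add the log probabilities instead.
log_prob = sum(math.log(p) for p in probs)
print(log_prob)  # roughly -4605.17
```

The log probability remains an ordinary float that can still be compared with other log probabilities, whereas the naive product has lost all information.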