# Sample spaces

Sample spaces are the fundamental building blocks of probability theory. Here's an example using a scenario from a typical systems neuroscience experiment.

Imagine there's a mouse in an experiment box with two levers available. We'll call them lever $f$ for food and lever $w$ for water; when the mouse pulls a lever, it gets the corresponding reward. When the mouse chooses one of the two, the **random event** of the mouse picking a lever crystallizes into a sample (namely, either water or food appears). The sample space here is the set of all choices the mouse can make. So if we denote the sample space by the symbol $\mathbb{S}$, it has two sets in it:

$$\mathbb{S} = \big(f, w \big)$$

$f$ is the set that contains food, while $w$ is the set that contains water.

Let's further say that $f$ consists of 3 different foods, $f = \{oats, seeds, apples\}$, and that the set $w$ consists of 3 flavors of water, $w = \{lemon, strawberry, cucumber\}$.


To reiterate, a sample space is the space of all possible things that can happen in an experiment. In this case the sample space consists of two sets, and each of those sets has multiple elements. 

$$\mathbb{S} = \big(f = \{oats, seeds, apples\}, w = \{lemon, strawberry, cucumber\} \big)$$
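One way to make this concrete is to write the sample space down as a data structure. The sketch below is only an illustration: the names `sample_space` and `all_outcomes` are made up for this notebook, and a Python `dict` of `set`s is just one convenient encoding.

```python
# Hypothetical encoding of the sample space: lever name -> set of outcomes.
sample_space = {
    "f": {"oats", "seeds", "apples"},          # foods behind lever f
    "w": {"lemon", "strawberry", "cucumber"},  # water flavors behind lever w
}

# Every individual outcome the experiment can produce, across both levers.
all_outcomes = set().union(*sample_space.values())

print(len(sample_space))   # number of sets in the sample space -> 2
print(len(all_outcomes))   # number of individual outcomes -> 6
```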


# Random variables

The term random variable is a misnomer. Random variables are neither random nor variables. Random variables are **functions** that take random events as input (this is where the randomness comes in) and **deterministically map** those events to some output space.

To be overly technical for a moment: we'll denote a random variable by a capital letter and say that $P(A = a)$ is the probability that the random variable $A$ equals the specific value $a$. We can be messy with notation and, instead of writing $P(A = a)$, simply write $P(a)$. While this shorthand is sloppy, it isn't incorrect. On the other hand, if someone writes $P(A)$ and they really do intend $A$ to be a random variable, they are confused, as that makes no sense.


In the systems neuroscience example of the mouse picking a lever to press, we'll say that the random variable is the process of the mouse making the choice. Then, when the choice is made, the event of food or water appearing crystallizes. Notice that the randomness is in the animal making the choice. Once the choice is made, however, it always deterministically maps to either food or water.
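A small sketch can separate the two pieces: the random event (which lever gets pressed) and the deterministic mapping (which reward appears). The function name `choice_to_reward`, the seeded generator, and the use of a uniform `choice` to stand in for the mouse are all assumptions made for illustration; a real mouse need not choose uniformly.

```python
import random

def choice_to_reward(lever_pressed):
    """Deterministically map a lever press to the type of reward that appears."""
    rewards = {"f": "food", "w": "water"}
    return rewards[lever_pressed]

rng = random.Random(0)            # seeded only so the sketch is reproducible
lever = rng.choice(["f", "w"])    # the randomness lives here (assumed uniform)
reward = choice_to_reward(lever)  # no randomness here: same input, same output

print(lever, "->", reward)
```

Calling `choice_to_reward("f")` any number of times always returns `"food"`; only the input varies from trial to trial.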

# Probability functions

A probability function assigns a numerical value between 0 and 1 to the outcome of a random variable. It answers the question "what is the probability that the random variable takes on a certain value?" Going with our running example, we'll let $C$ denote the random variable of the mouse's choice. Then, if we want to denote the probability of getting water, we'd write

$$P(C = w)$$
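In code, a probability function over a discrete random variable can be sketched as a lookup table. The numbers below are deliberately made up (a hypothetical mouse that prefers food) and are not the probabilities used in the exercises that follow; the names `p_choice`, `P`, and `is_valid` are likewise illustrative.

```python
import math

# Hypothetical probabilities for a mouse that prefers food. These values are
# invented for illustration and are not taken from the experiment above.
p_choice = {"f": 0.7, "w": 0.3}

def P(value, table=p_choice):
    """Return P(C = value); values the table does not list get probability 0."""
    return table.get(value, 0.0)

def is_valid(table):
    """Check that every probability lies in [0, 1] and that they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in table.values())
    return in_range and math.isclose(sum(table.values()), 1.0)

print(P("w"))              # P(C = w) under the made-up table
print(is_valid(p_choice))  # True: values are in range and sum to 1
```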

# Distributions of random variables

The distribution of a random variable specifies the probabilities of **all** events associated with the random variable. There are commonly known distributions that you'll get to know in the next notebook. For now, we'll make our own distribution using our running example. 


Recall that our sample space consists of two sets of events: 

$$\mathbb{S} = \big(f = \{oats, seeds, apples\}, w = \{lemon, strawberry, cucumber\} \big)$$


Assume that all events, across both sets, have equal probability. An axiom of probability is that the probabilities in a distribution must sum to 1. Using this fact, what is the probability of:

$$P(C = \{f=oats\})$$

Write your answer as a code comment in the next cell.

In [2]:
"""
""";

What is the probability of 

$$P(C = w)$$

In [3]:
"""
""";

# Independence & Joint probability
In probability, if two events are independent, then knowing that one event occurred does not influence the probability of the other occurring, and vice versa. When we consider the probability of two events together, we say we are considering their **joint probability**, denoted as:

$$P(f, w)$$


If the events are independent, then:

$$P(f, w) = P(f)P(w)$$
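The factorization can be checked numerically for any small joint table. The sketch below uses two hypothetical coin-flip tables rather than the mouse experiment, so it does not answer the question that follows; `marginals`, `is_independent`, and both tables are illustrative names and values.

```python
import math
from itertools import product

def marginals(joint):
    """Sum a joint table {(a, b): p} down to the two marginal tables."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return pa, pb

def is_independent(joint):
    """True if P(a, b) == P(a) P(b) for every pair of values (a, b)."""
    pa, pb = marginals(joint)
    return all(
        math.isclose(joint.get((a, b), 0.0), pa[a] * pb[b], abs_tol=1e-12)
        for a, b in product(pa, pb)
    )

# Hypothetical table 1: two separate fair coins.
two_fair_coins = {("H", "H"): 0.25, ("H", "T"): 0.25,
                  ("T", "H"): 0.25, ("T", "T"): 0.25}
# Hypothetical table 2: the second "coin" just copies the first.
copied_coin = {("H", "H"): 0.5, ("T", "T"): 0.5}

print(is_independent(two_fair_coins))  # True
print(is_independent(copied_coin))     # False
```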

Recall how the experiment in our running example works. Are the events $f$ and $w$ independent? That is, can we write $P(f, w) = P(f)P(w)$?


Write your answer in the code comment in the next cell and explain why.

In [None]:
"""
""";

# Conditional probability

Conditional probabilities are written as $P(f|w)$. The way to read this is "the probability of $f$, conditioned on $w$." That is, if we have information about $w$, what is the probability of $f$?

One way to define conditional probability is: 

$$P(f|w) = \frac{P(f, w)}{P(w)}$$

Use the information above to compute $P(f|w)$.

Write your answer in the code block below as a comment.

In [None]:
"""
""";

What is the following conditional probability? 

$$P(lemon | w)$$

Write your answer in the code comment in the next cell and explain why.

In [4]:
"""
""";

# Two most important operations in probability

### Product rule:

$$P(a, b) = P(a|b) P(b) = P(b|a) P(a)$$

Above we show for 2 variables, here's another example for 3 variables:

$$P(a, b, c) = P(a|b, c) P(b, c) = P(a|b,c)P(b|c)P(c)$$

We can generalize this for $n$ variables inside of $P(\cdot)$ once we have decided on an ordering of the variables.
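As a sanity check, the two-variable product rule can be verified on a small table. The joint distribution below is hypothetical (the values are made up), and the conditionals are computed from the definition $P(a|b) = P(a, b) / P(b)$, so the check mainly shows how the pieces fit together rather than proving anything new.

```python
import math

# Hypothetical joint distribution over two binary variables a and b.
joint = {
    ("a0", "b0"): 0.1,
    ("a0", "b1"): 0.3,
    ("a1", "b0"): 0.2,
    ("a1", "b1"): 0.4,
}

def p_a(a):
    """Marginal P(a): sum the joint over every value of b."""
    return sum(p for (ai, _), p in joint.items() if ai == a)

def p_b(b):
    """Marginal P(b): sum the joint over every value of a."""
    return sum(p for (_, bi), p in joint.items() if bi == b)

def p_a_given_b(a, b):
    """Conditional P(a | b) = P(a, b) / P(b)."""
    return joint[(a, b)] / p_b(b)

def p_b_given_a(b, a):
    """Conditional P(b | a) = P(a, b) / P(a)."""
    return joint[(a, b)] / p_a(a)

# Both factorizations recover the joint probability for every pair.
for (a, b), p_ab in joint.items():
    assert math.isclose(p_ab, p_a_given_b(a, b) * p_b(b))
    assert math.isclose(p_ab, p_b_given_a(b, a) * p_a(a))
```

For $n$ variables in the ordering $x_1, \dots, x_n$, the same pattern gives

$$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i | x_{i+1}, \dots, x_n)$$

where the last factor is simply $P(x_n)$.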

### Law of total probability:

$$P(a) = \sum_{b_i \in b} P(a, b_i)$$

Above we assume $b$ is discrete. If it is continuous, then we replace the sum with an integral:

$$P(a) = \int P(a, b) \mathop{db}$$
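Both versions can be sketched in a few lines. Everything below is hypothetical: the discrete table reuses made-up numbers, and the continuous case assumes $b$ is uniform on $[0, 1]$ with $P(a = 1 | b) = b$, approximating the integral with a simple midpoint rule.

```python
import math

# Discrete case: hypothetical joint table P(a, b) with made-up values.
joint = {
    ("a0", "b0"): 0.1,
    ("a0", "b1"): 0.3,
    ("a1", "b0"): 0.2,
    ("a1", "b1"): 0.4,
}

def marginal_discrete(joint, a):
    """P(a) = sum over b_i of P(a, b_i)."""
    return sum(p for (ai, _), p in joint.items() if ai == a)

# Continuous case: hypothetical joint where b ~ Uniform(0, 1), P(a = 1 | b) = b.
def joint_density(a, b):
    """Joint P(a, b) for binary a and b in [0, 1] under the assumption above."""
    return b if a == 1 else 1.0 - b

def marginal_continuous(a, n=1000):
    """Approximate the integral of P(a, b) db over [0, 1] with the midpoint rule."""
    width = 1.0 / n
    return sum(joint_density(a, (i + 0.5) * width) * width for i in range(n))

print(marginal_discrete(joint, "a0"))  # 0.1 + 0.3, up to floating-point rounding
print(marginal_continuous(1))          # close to 0.5
```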