### Sample Space and Events

>The **sample space S** of an **experiment** is the `set of all possible outcomes` of the `experiment`. 

>An **event A** is a (`condition`) `subset of the sample space S`, and we say that A occurred if the actual outcome is in A.

<img src="images/sample-space-and-events.png" width="500" align="center">

#### For example: 

>**The experiment** in which `a coin is flipped 10 times.`  (Writing `Heads as H and Tails as T`  )   

>**The sample space** of the **experiment** is the `set of all possible strings of length 10 of H’s and T’s`  

>`A possible outcome (pebble) is HHHTHHTTHT`  

We can **encode** `H as 1 and T as 0`, so that an outcome is a `sequence` (s1 , . . . , s10) where sj belongs to {0, 1} for 1 <= j <= 10.  

>**The event A1** such that (with the `condition`) the `first flip is Heads`  

>**A1 is a set**, A1 = {(1, s2 , . . . , s10 ) : sj belongs to {0, 1} for 2 <= j <= 10}.

>**A1 is a subset of the sample space S**, so `it is indeed an event`   

Saying that A1 **occurs** is the same thing as saying that **the first flip is Heads**.  
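
The coin-flip example above can be sketched in Python. This is only an illustrative sketch: the names `S`, `A1` and `outcome` mirror the notation of these notes, and storing each outcome as a tuple of 0s and 1s is a choice made here, not something the notes prescribe.

```python
from itertools import product

# Sample space S: every length-10 sequence of 0s and 1s (1 = Heads, 0 = Tails).
S = set(product((0, 1), repeat=10))

# Event A1: the subset of outcomes whose first flip is Heads.
A1 = {s for s in S if s[0] == 1}

# The example outcome HHHTHHTTHT, encoded with H -> 1 and T -> 0.
outcome = tuple(1 if c == "H" else 0 for c in "HHHTHHTTHT")

print(len(S))         # number of possible outcomes
print(A1 <= S)        # True when A1 is a subset of S, i.e. an event
print(outcome in A1)  # True when A1 occurred for this outcome
```

Checking `outcome in A1` is exactly the statement "A1 occurred": the actual outcome belongs to the set A1.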

<img src="images/sentence-to-sets.png" width="500" align="center">

### Probabilities

>A probability is a **function** that **takes an event** and `assigns to it a real number between 0 and 1`

<img src="images/events-and-probabilities.png" width="500" align="center">

>`Events are sets` while `Probabilities are numbers`  

>`Before the experiment` is done, we generally don’t know `whether or not a particular event will occur` (happen)  
So `we assign it a probability of happening`, using a `probability function P` .
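
One possible probability function is the naive one, which counts outcomes. The sketch below assumes that every outcome of the 10 flips is equally likely (a fair coin, which the notes do not state for this experiment), and the function name `P` is just the notation used above.

```python
from fractions import Fraction
from itertools import product

# Sample space of 10 flips (1 = Heads, 0 = Tails).
S = set(product((0, 1), repeat=10))

def P(event):
    """Naive probability: assumes every outcome in S is equally likely."""
    return Fraction(len(event & S), len(S))

# Event A1: the first flip is Heads.
A1 = {s for s in S if s[0] == 1}

print(P(A1))     # probability of the event A1
print(P(S))      # the whole sample space
print(P(set()))  # the empty event
```

The input of `P` is a set (an event) and the output is a number between 0 and 1, which is the distinction the notes draw between events and probabilities.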

### Why probabilities

> A probability is a function (just as a random variable is), but its purpose is to `quantify the uncertainty of each possible event (a set of outcomes)` of the sample space.

### Random Variables

```
Expressing compound events in terms of basic events can become unnecessarily overwhelming. Taking the example of the coin flips, we could ask about something like *getting an odd number of heads after n flips with two coins*. So, instead of defining this event in terms of the basic events `Aj`, we use random variables.
```

>Given an **experiment with sample space S**, a random variable (r.v.) is a **function** `from the sample space S to the real numbers R`  

>A `random variable X` assigns a `numerical value X(s)` to each possible `outcome "s" of the experiment`.

```
For a sample space with a finite number of outcomes we can visualize the outcomes as pebbles, with the mass of a pebble corresponding to its probability, such that the total mass of the pebbles is 1. A random variable simply labels each pebble with a number.
```

<img src="images/rv-mapping.png" width="500" align="center">

### Why random variables

> A random variable is a function too (just as a probability is), but its purpose is to `quantify the results of an experiment`. How is that useful?

```
If we flip 5 coins and want to answer questions like:
1. What is the probability of getting exactly 3 heads?
2. What is the probability of getting fewer than 4 heads?
3. What is the probability of getting more than 1 head?

Then our general way of writing would be:

· P(getting exactly 3 heads when we flip a coin 5 times)
· P(getting fewer than 4 heads when we flip a coin 5 times)
· P(getting more than 1 head when we flip a coin 5 times)

But if we use a random variable X (the number of heads) to represent the above questions, then we would write:
1. P(X=3)
2. P(X<4)
3. P(X>1)
```
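
The three questions can be answered by brute-force enumeration. This is a minimal sketch that assumes a fair coin, so that all 32 outcomes are equally likely. The helper `P` takes a condition on the value of X and is a hypothetical convenience, not standard notation.

```python
from fractions import Fraction
from itertools import product

# Sample space: all sequences of 5 flips, assumed equally likely (fair coin).
S = list(product("HT", repeat=5))

def X(s):
    """Random variable: number of Heads in outcome s."""
    return s.count("H")

def P(condition):
    """Probability that the value X(s) satisfies the given condition."""
    favorable = [s for s in S if condition(X(s))]
    return Fraction(len(favorable), len(S))

p_exactly_3 = P(lambda x: x == 3)    # P(X=3)
p_less_than_4 = P(lambda x: x < 4)   # P(X<4)
p_more_than_1 = P(lambda x: x > 1)   # P(X>1)

print(p_exactly_3, p_less_than_4, p_more_than_1)
```

Each question becomes a one-line condition on X, which is the point of the shorthand.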

#### For example:

>`The experiment` where we `toss a fair coin twice`  

>`The sample space S` consists of the set of four possible **outcomes**: `S = {HH, HT, TH, TT }`  

>`The event A` (**subset of outcomes**) such that in both flips we get Heads: `A = { HH }`

>`The probability of each outcome` (**including those of the event**) can be represented as `the size of each pebble`

>`The random variable X` as the number of Heads.

>`Quantifying each outcome` we could express the probability of `the event A of getting two heads`  as `P(X=2)`  

>`Mapping events into numbers` lets us translate qualities (language) into quantities (numbers):  
`P(the event B of getting at least one head) === P(X>=1)`  


<img src="images/space-event-rv.png" width="500" align="center">
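
The two-toss example can be written out with the pebble picture in mind. In this sketch the dictionary `pebbles` holds the mass of each outcome (equal masses, since the coin is fair), and `X` labels each pebble with a number. Both names are illustrative choices.

```python
from fractions import Fraction

# Pebbles: each outcome with its mass (probability). Fair coin, so equal masses.
pebbles = {"HH": Fraction(1, 4), "HT": Fraction(1, 4),
           "TH": Fraction(1, 4), "TT": Fraction(1, 4)}

# The random variable X labels each pebble with its number of Heads.
X = {s: s.count("H") for s in pebbles}

def P(condition):
    """Total mass of the pebbles whose label X(s) satisfies the condition."""
    return sum((mass for s, mass in pebbles.items() if condition(X[s])),
               Fraction(0))

A = {s for s in pebbles if X[s] == 2}  # event "both flips are Heads"
B = {s for s in pebbles if X[s] >= 1}  # event "at least one head"

print(A, P(lambda x: x == 2))          # the event A and P(X=2)
print(sorted(B), P(lambda x: x >= 1))  # the event B and P(X>=1)
```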

### Expected Value

>The intuition behind the expected value comes from the two ways in which a mean can be expressed: as an `arithmetic mean` <img src="images/arithmetic-mean.png" width="140" align="center"> or as a `weighted mean` <img src="images/weighted-mean.png" width="270" align="center">

>The definition of `expectation for a discrete r.v.` is inspired by the `weighted mean` of a list of numbers, `with weights given by probabilities`.

>Then, the `expected value` is defined as <img src="images/expected-value-formula.png" width="230" align="center">
>Note that `its value could be undefined if the sum diverges`.
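
As a small check of the weighted-mean intuition, the sketch below takes X to be the number of Heads in two fair coin tosses (the running example, assumed here) and computes the mean in both ways. The names `pmf` and `expected_value` are illustrative.

```python
from fractions import Fraction

# PMF of X = number of Heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expected_value(pmf):
    """Weighted mean of the values of X, with weights given by probabilities."""
    return sum(x * p for x, p in pmf.items())

# Arithmetic mean over the four equally likely outcomes HH, HT, TH, TT.
values = [2, 1, 1, 0]
arithmetic_mean = Fraction(sum(values), len(values))

print(expected_value(pmf), arithmetic_mean)
```

Both computations agree because grouping equal values in the arithmetic mean produces exactly the probability weights.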

### LOTUS a.k.a. Law of the unconscious statistician

>What the law says is that: <img src="images/lotus.png" width="270" align="center">. What this means is that to compute E(g(X)) we do not need to find the distribution of g(X), i.e. `P(g(X)=y)`; we can almost *unconsciously* plug *g(x)* into the sum and keep weighting by the base distribution `P(X=x)`.  
>Note that `this works for any function g, not only for linear ones`. What does require g to be linear is the shortcut `E(g(X)) = g(E(X))`, which fails in general for nonlinear g.
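
A minimal numeric check of LOTUS, assuming X is the number of Heads in two fair coin tosses and picking the nonlinear function g(x) = (x - 1)^2 purely for illustration. The "long way" builds the distribution of Y = g(X) first, and LOTUS skips that step.

```python
from fractions import Fraction

# PMF of X = number of Heads in two fair coin tosses.
pmf_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def g(x):
    """A nonlinear function of X, chosen only for illustration."""
    return (x - 1) ** 2

# LOTUS: weight g(x) by P(X=x), no distribution of g(X) needed.
e_lotus = sum(g(x) * p for x, p in pmf_X.items())

# Long way: first build the PMF of Y = g(X), then take its weighted mean.
pmf_Y = {}
for x, p in pmf_X.items():
    pmf_Y[g(x)] = pmf_Y.get(g(x), Fraction(0)) + p
e_direct = sum(y * p for y, p in pmf_Y.items())

# Plugging the mean of X into g is a different quantity in general.
mean_X = sum(x * p for x, p in pmf_X.items())

print(e_lotus, e_direct, g(mean_X))
```

The first two numbers agree, while `g(mean_X)` differs, since moving g outside the expectation is only valid for linear g.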

### Moments

>`Moments describe the shape of a distribution`.

>Let X be an r.v. with mean μ and variance σ².  

>For any positive integer n, the nth moment of X is <img src="images/single-moment.png" width="60" align="center">  

>the nth central moment is <img src="images/central-moment.png" width="100" align="center">  

>the nth standardized moment is <img src="images/standar-moment.png" width="100" align="center">  

<img src="images/moments-analogy.png" width="600" align="center">  
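
The three kinds of moments can be computed side by side. This sketch assumes the usual definitions E(X^n), E((X - μ)^n) and E(((X - μ)/σ)^n), and again uses the number of Heads in two fair coin tosses as X. The function names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# PMF of X = number of Heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def E(g):
    """E[g(X)], computed as a probability-weighted sum over the PMF."""
    return sum(g(x) * p for x, p in pmf.items())

mu = E(lambda x: x)               # mean
var = E(lambda x: (x - mu) ** 2)  # variance (2nd central moment)
sigma = sqrt(var)                 # standard deviation, as a float

def moment(n):
    return E(lambda x: x ** n)

def central_moment(n):
    return E(lambda x: (x - mu) ** n)

def standardized_moment(n):
    return E(lambda x: ((x - mu) / sigma) ** n)

print(moment(1), moment(2))      # mean and second moment
print(central_moment(2))         # variance
print(standardized_moment(3))    # skewness
print(standardized_moment(4))    # kurtosis
```

The third standardized moment comes out as zero here because this distribution is symmetric around its mean.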


### Moment Generating Functions

>The general idea behind a generating function is as follows: `starting with a sequence of numbers, create a continuous function —the generating function— that encodes the sequence`.  

>We then have all the `tools of calculus` at our disposal `for manipulating the generating function`.

>An m.g.f. `is a generating function that encodes the moments of a distribution`.

>MGFs are useful for three main reasons:  
1. they make `computing moments` easier `with derivatives` (as an alternative to LOTUS, which uses sums or integrals)  
2. they help in `studying sums of independent r.v.s`  
3. `they fully determine the distribution` and `thus serve as an additional blueprint for it`.

><div style="float: left">
<img src="images/mgf-def.png" width="700" align="left">
<img src="images/mgf-use.png" width="700" align="right">
</div>
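
The first two uses can be checked numerically. This sketch assumes the standard definition M(t) = E(e^(tX)) and takes X to be the number of Heads in two fair coin tosses. The derivatives at 0 are approximated with finite differences, a numerical shortcut chosen here to avoid symbolic algebra, so the results are approximate.

```python
from math import exp, isclose

# PMF of X = number of Heads in two fair coin tosses.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def M(t):
    """MGF of X: M(t) = E[e^(tX)]."""
    return sum(exp(t * x) * p for x, p in pmf.items())

def M_bernoulli(t):
    """MGF of a single fair flip (1 = Heads, 0 = Tails)."""
    return 0.5 + 0.5 * exp(t)

h = 1e-4  # step size for the finite-difference approximations
first_moment = (M(h) - M(-h)) / (2 * h)             # approximates M'(0) = E[X]
second_moment = (M(h) - 2 * M(0) + M(-h)) / h ** 2  # approximates M''(0) = E[X^2]

print(first_moment, second_moment)

# X is a sum of two independent fair flips, so its MGF is the product of theirs.
print(isclose(M(0.3), M_bernoulli(0.3) ** 2))
```

The approximated derivatives should land close to the moments E(X) = 1 and E(X^2) = 1.5 of this distribution, and the last line illustrates why MGFs are convenient for sums of independent r.v.s.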

### The big Picture

>The concepts can be seen tied together as in the following image:  

<img src="images/big-picture-X-rv.png" width="700" align="center">

### Resources

* [cheat sheet 1 - essential](https://stanford.edu/~shervine/teaching/cme-106/cheatsheet-probability)
* [cheat sheet 2 - summary](https://static1.squarespace.com/static/54bf3241e4b0f0d81bf7ff36/t/55e9494fe4b011aed10e48e5/1441352015658/probability_cheatsheet.pdf)
* [cheat sheet 3 - pd stories](http://web.cs.elte.hu/~mesti/valszam/kepletek)