### Terminology used in inference

#### Random Variable (r.v.)

$ S = \{s_1, s_2, ..., s_n \} $

Here $ S $ is the entire sample space and $ n $ is the number of simple events.

A random variable is:

$ X : S \to \mathbb{R} $

So $ X $ maps each simple event to a real value.

The value assigned to a particular simple event $ s_i $ is $ x_i = X(s_i) $.

The value of a random variable is not known with certainty before the experiment is carried out.
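As a small illustration of the mapping $ X : S \to \mathbb{R} $, the sketch below assumes an experiment of two coin flips (an example that is not part of these notes) and a hypothetical function `X` that counts heads.

```python
from itertools import product

# Sample space for two coin flips (assumed example): each simple event is a tuple
S = list(product("HT", repeat=2))

def X(s):
    """Random variable: map a simple event to the number of heads."""
    return s.count("H")

# x_i = X(s_i) for every simple event s_i in S
values = {s: X(s) for s in S}
print(values)
```

Several simple events can map to the same real value, here both `("H", "T")` and `("T", "H")` map to 1.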

#### Discrete r.v.

An r.v. is discrete if it assumes a finite or countably infinite number of possible values.

#### Continuous r.v.

An r.v. is continuous if it can assume any value in an interval, which is an uncountable set of values. This is usually anything we measure.

Since the possible values can't be counted, the probability that the r.v. takes on any one particular value is always 0.

$ P(X = a) = 0 $

However the probability that it lies within an interval can be non-zero.

And these are all equivalent:

$ P(a \le X \le b) $

$ P(a < X \le b) $

$ P(a \le X < b) $

$ P(a < X < b) $

But this is generally not true for a discrete r.v., where a single value such as $ a $ or $ b $ can have non-zero probability.

#### Random Variable (distributed)

If a random variable $ X $ is distributed by a certain distribution, $ D $, then:

$ X \sim D $

#### Probability Mass Function (pmf)

Concerning a discrete r.v.

$ p(x) = P(X = x) $

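| $ x $    | $ x_1 $ | $ x_2 $ | ... | $ x_n $ |
|----------|---------|---------|-----|---------|
| $ p(x) $ | $ p_1 $ | $ p_2 $ | ... | $ p_n $ |

In other words, $ p(x) $ is the probability that $ X $ takes the value $ x $, and the table lists $ p_i = p(x_i) $ for each possible value.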

This is not the same as how we consider probabilities for continuous random variables.

The pmf has these properties:

- $ p(x) \ge 0 $ for each $ x $

- $ \sum_x p(x) = 1 $, where the sum runs over all possible values $ x $
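The two pmf properties can be checked mechanically. This is a minimal sketch, assuming a fair six-sided die as the example and a hypothetical helper `is_valid_pmf`; exact fractions are used so the sum is compared without rounding error.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Check p(x) >= 0 for each x and that the probabilities sum to 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

# Assumed example: a fair six-sided die
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(is_valid_pmf(die))
```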

#### Probability Density Function (pdf)

Concerning a continuous r.v.

_Denoted by $ f(x) $ as opposed to $ F(x) $ for the CDF._

To determine the probability that an r.v. lies within $ (a, b) $, we calculate the area under the curve of the pdf between $ a $ and $ b $.

$ P(a \le X \le b) = \int_a^b f(x) dx $

Where $ f(x) $ is the pdf.

_A pdf must have these properties:_

$ f(x) \ge 0 $ for all $ x $

$ \int_{-\infty}^\infty f(x) dx = 1 $

The integral runs over the whole real line, but $ f(x) $ does not have to be strictly positive everywhere: it may be $ 0 $ outside some interval. For such pdfs the effective range of integration is much smaller than the real number line.
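To make the area interpretation concrete, here is a sketch that assumes an exponential pdf with rate 1 (chosen only as an example, it is zero for $ x < 0 $) and a hypothetical midpoint-rule helper `integrate`.

```python
import math

def f(x):
    """Assumed example pdf: exponential with rate 1, zero for x < 0."""
    return math.exp(-x) if x >= 0 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

p = integrate(f, 1.0, 2.0)            # approximates P(1 <= X <= 2)
exact = math.exp(-1) - math.exp(-2)   # closed form for this particular pdf
total = integrate(f, 0.0, 50.0)       # the tail beyond 50 is negligible here
print(p, exact, total)
```

Because this pdf is zero on the negative half-line, integrating from 0 is enough to recover a total area of approximately 1.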

#### Sample density histogram (sdh)

We can approximate the pdf by using a sample density histogram. Each bar's height is the relative frequency of the results in that interval divided by the interval's width, so the total area of the bars is 1 and the shape of the histogram gives an idea of the probability density.

To approximate the probability that the r.v. falls in an interval, multiply the histogram bar height by its width (the difference between the interval endpoints on the x axis). This approximates the area under the pdf curve over that interval.
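A sample density histogram can be built by hand. The sketch below assumes standard normal samples, a fixed seed, and 16 equal-width bins on $ [-4, 4) $; all of these choices are illustrative only.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]

lo, hi, bins = -4.0, 4.0, 16
width = (hi - lo) / bins
counts = [0] * bins
for x in samples:
    if lo <= x < hi:
        counts[int((x - lo) / width)] += 1

# Bar height = relative frequency / bin width, so bar area = relative frequency
heights = [c / (len(samples) * width) for c in counts]

# Approximate P(-1 <= X < 1) by summing height * width over the bins in [-1, 1)
area = sum(h * width for i, h in enumerate(heights)
           if -1.0 <= lo + i * width < 1.0)
print(round(area, 3))
```

For a standard normal the true probability is roughly 0.68, so the summed bar areas should land close to that.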

#### Cumulative Distribution Function (cdf) (discrete r.v.)

$ F(x) = P(X \le x) $

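In other words, it's the probability that the random variable $ X $ is less than or equal to $ x $. The definition is the same for discrete and continuous r.v.

For a discrete r.v. the graph of the cdf is a step function, where each $ p_i $ is the height of the step at $ x_i $, climbing towards the upper limit of 1.

Intuitively, we add up the probabilities of all the discrete values up to and including $ x $.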
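The "adding up" can be sketched as a running sum over the pmf. The example assumes a fair six-sided die; the names `cdf` and `F` are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

# Assumed example: a fair six-sided die
xs = [1, 2, 3, 4, 5, 6]
ps = [Fraction(1, 6)] * 6

# F(x_k) = p_1 + ... + p_k: each p_i is the height of one step
cdf = dict(zip(xs, accumulate(ps)))

def F(x):
    """Step-function cdf: total probability of all values <= x."""
    return sum((p for v, p in zip(xs, ps) if v <= x), Fraction(0))

print(cdf[3], F(3.5), F(0))
```

Between two possible values the cdf stays flat, so `F(3.5)` equals `F(3)`.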

#### Cumulative Distribution Function (cdf) (continuous r.v.)

_Denoted by $ F(x) $ as opposed to $ f(x) $ for the PDF._

$ F(x) = P(X \le x) $

And therefore it's conceptually the same as for discrete r.v. except for how we add up the values:

$ F(x) = \int_{-\infty}^x f(t) dt $

$ \frac{d}{dx}(F(x)) = f(x) $
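The derivative relationship can be checked numerically. This sketch assumes an exponential distribution with rate 1, whose pdf and cdf are known in closed form, and a hypothetical central-difference helper `derivative`.

```python
import math

def f(x):
    """Assumed example pdf: exponential with rate 1."""
    return math.exp(-x) if x >= 0 else 0.0

def F(x):
    """Matching cdf: F(x) = 1 - exp(-x) for x >= 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def derivative(g, x, h=1e-5):
    """Central-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

x = 1.3
slope = derivative(F, x)
print(slope, f(x))
```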

#### Relationship between a pmf and its cdf

_These relations hold for any r.v., but the left limits only make a difference at points where the cdf jumps, which is the discrete case._

Given two fixed points $ x_1 < x_2 $

$ P(x_1 < X \le x_2) = F(x_2) - F(x_1) $

Think about this one above first. Then:

$ P(x_1 < X < x_2) = F(x_2 -) - F(x_1) $

$ P(x_1 \le X \le x_2) = F(x_2) - F(x_1 -) $

$ P(x_1 \le X < x_2) = F(x_2 -) - F(x_1 -) $

Where $ F(a -) $ refers to the limit of $ F $ at $ a $ approaching from the left, which equals $ P(X < a) $.
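The four interval formulas can be verified on a small discrete example. The sketch assumes a fair six-sided die with $ x_1 = 2 $ and $ x_2 = 5 $; `F_left` is a hypothetical helper standing in for $ F(a -) $.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # assumed example: fair die

def F(x):
    """F(x) = P(X <= x)."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

def F_left(x):
    """F(x-) = P(X < x), the limit of F from the left."""
    return sum((p for v, p in pmf.items() if v < x), Fraction(0))

x1, x2 = 2, 5
half_open_left = F(x2) - F(x1)              # P(2 <  X <= 5): values 3, 4, 5
open_both = F_left(x2) - F(x1)              # P(2 <  X <  5): values 3, 4
closed_both = F(x2) - F_left(x1)            # P(2 <= X <= 5): values 2, 3, 4, 5
half_open_right = F_left(x2) - F_left(x1)   # P(2 <= X <  5): values 2, 3, 4
print(half_open_left, open_both, closed_both, half_open_right)
```

Each result is the number of included die faces divided by 6, which is why the four probabilities differ for a discrete r.v.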

#### Relationship between a pdf and its cdf 

$ P(a < X < b) = F(b) - F(a) = \int_a^b f(x) dx $

#### Mean

On its own, "mean" typically refers to the mean taken from the sample, $ \bar{x} $. We compare this to the expected value, which is the mean of the underlying distribution.

#### Expected value (discrete r.v.)

"mean of the random variable" or $ \mu $

This is the mean of a probability distribution. This is what we would expect to see before any data is collected.

Let:

$ X $ be a discrete r.v.

$ p(x) $ is the probability mass function of $ X $

$ \mu = E(X) = \sum_i x_i p(x_i) = \sum_i x_i p_i $

For example, if there is a game where you could win or lose then the payoffs are:

| result | p   |
|--------|-----|
| +1     | 0.2 |
| +2     | 0.3 |
| -2     | 0.2 |
| -1     | 0.3 |

Then:

$ E(X) = (1)(0.2) + (2)(0.3) + (-2)(0.2) + (-1)(0.3) $

$ E(X) = 0.2 + 0.6 - 0.4 - 0.3 = 0.1 $ is the expected payoff
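The same calculation as a short sketch, using the payoff table from the example and a hypothetical helper `expected_value`. Exact fractions avoid floating-point rounding in the sum.

```python
from fractions import Fraction

# Payoff table from the game example, as exact fractions
pmf = {
    1: Fraction(2, 10),
    2: Fraction(3, 10),
    -2: Fraction(2, 10),
    -1: Fraction(3, 10),
}

def expected_value(pmf):
    """E(X) = sum of x_i * p(x_i) over all possible values."""
    return sum(x * p for x, p in pmf.items())

mu = expected_value(pmf)
print(mu, float(mu))
```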

#### Expected value (continuous r.v.)

"mean of the random variable" or $ \mu $

Let:

$ X $ be a continuous r.v.

$ f(x) $ is the probability density function of $ X $


$ \mu = E(X) = \int_{-\infty}^\infty x f(x) dx $

#### Expected value with a constant

For any random variable $ X $ and a constant $ a $

$ E(X + a) = E(X) + a $

$ E(Xa) = aE(X) $

#### Expected values and linear combinations

If you have sequences:

$ \{ X_1, X_2, ..., X_n \} $ random variables

$ \{ a_1, a_2, ..., a_n \} $ constants

Then:

$ E(a_1 X_1 + a_2 X_2 + ... + a_n X_n) = a_1E(X_1) + a_2E(X_2) + ... + a_nE(X_n) $

The linear combination $ Y = a_1 X_1 + a_2 X_2 + ... + a_n X_n $ is itself a random variable, and its expected value is the same linear combination of the individual expected values. This holds whether or not the $ X_i $ are independent.
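Linearity of expectation can be checked on a small joint distribution. The sketch below assumes a made-up joint pmf for two dependent variables and arbitrary constants `a1`, `a2`; the helper `E` is hypothetical.

```python
from fractions import Fraction

# Assumed joint pmf of two dependent variables (X1, X2)
joint = {
    (0, 0): Fraction(1, 2),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
}
a1, a2 = 3, -2

def E(g):
    """Expected value of g(X1, X2) under the joint pmf."""
    return sum(g(x1, x2) * p for (x1, x2), p in joint.items())

lhs = E(lambda x1, x2: a1 * x1 + a2 * x2)                    # E(a1 X1 + a2 X2)
rhs = a1 * E(lambda x1, x2: x1) + a2 * E(lambda x1, x2: x2)  # a1 E(X1) + a2 E(X2)
print(lhs, rhs)
```

The two sides agree even though `X1` and `X2` are not independent in this example.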

#### Variance of a discrete r.v.

This is the measure of dispersion of $ X $ from $ E(X) $

This is:

$ \sigma^2 = Var(X) = E((X - \mu)^2) = \sum_i (x_i - \mu)^2 p(x_i) $
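As a worked sketch, the variance of the payoff table from the expected value example can be computed from the definition and cross-checked against the shortcut $ E(X^2) - \mu^2 $. The variable names are illustrative only.

```python
import math
from fractions import Fraction

# Payoff table from the expected value example, as exact fractions
pmf = {
    1: Fraction(2, 10),
    2: Fraction(3, 10),
    -2: Fraction(2, 10),
    -1: Fraction(3, 10),
}

mu = sum(x * p for x, p in pmf.items())

# Definition: sum of (x_i - mu)^2 * p(x_i)
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Shortcut: E(X^2) - mu^2
var_short = sum(x * x * p for x, p in pmf.items()) - mu ** 2

sigma = math.sqrt(var_def)  # standard deviation as a float
print(var_def, var_short, sigma)
```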

#### Variance of a continuous r.v.

$ Var(X) = E((X - \mu)^2) = \int_{-\infty}^\infty (x - \mu)^2 f(x) dx $

$ \sigma^2 = \int_{-\infty}^\infty x^2 f(x) dx - \mu^2$
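Both forms of the continuous variance can be compared numerically. This sketch assumes a uniform pdf on $ [2, 6] $ (mean 4, variance $ 4/3 $) and a hypothetical midpoint-rule helper `integrate`.

```python
def f(x):
    """Assumed example pdf: uniform on [2, 6]."""
    return 0.25 if 2.0 <= x <= 6.0 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

a, b = 2.0, 6.0  # f is zero outside [a, b], so integrating there is enough

mu = integrate(lambda x: x * f(x), a, b)                    # E(X)
var_def = integrate(lambda x: (x - mu) ** 2 * f(x), a, b)   # E((X - mu)^2)
var_short = integrate(lambda x: x * x * f(x), a, b) - mu ** 2
print(mu, var_def, var_short)
```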

#### Variance with a constant

For any random variable $ X $ and a constant $ a $

$ Var(X + a) = Var(X) $

$ Var(Xa) = a^2 Var(X) $

If you have sequences, where the $ X_i $ are independent (or at least pairwise uncorrelated):

$ \{ X_1, X_2, ..., X_n \} $ random variables

$ \{ a_1, a_2, ..., a_n \} $ constants

Then:

$ Var(a_1 X_1 + a_2 X_2 + ... + a_n X_n) = a_1^2 Var(X_1) + a_2^2 Var(X_2) + ... + a_n^2 Var(X_n) $

#### Variance of a sum of two random variables

Let:

$ X, Y $ be random variables.

$ Var(X + Y) = Var(X) + Var(Y) + 2Cov(X,Y) $

If $X, Y$ are independent then $ Cov(X, Y) = 0 $

Then:

$ Var(X + Y) = Var(X) + Var(Y) $
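The decomposition of the variance of a sum can be checked on a small joint pmf. The joint distribution below is a made-up example of two dependent variables, and the helper `E` is hypothetical.

```python
from fractions import Fraction

# Assumed joint pmf of two dependent variables (X, Y)
joint = {
    (0, 0): Fraction(1, 2),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
}

def E(g):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = E(lambda x, y: x)
mu_y = E(lambda x, y: y)
var_x = E(lambda x, y: (x - mu_x) ** 2)
var_y = E(lambda x, y: (y - mu_y) ** 2)
cov_xy = E(lambda x, y: (x - mu_x) * (y - mu_y))

# Variance of the sum, computed directly from the definition
mu_s = E(lambda x, y: x + y)
var_sum = E(lambda x, y: (x + y - mu_s) ** 2)
print(var_sum, var_x + var_y + 2 * cov_xy)
```

Here the covariance is non-zero, so dropping the $ 2Cov(X, Y) $ term would give the wrong variance.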

#### Standard deviation

$ \sigma = \sqrt{Var(X)} $


#### Expected value of an arbitrary function (discrete r.v.)

$ E(g(X)) = \sum_i g(x_i) p(x_i) $

#### Expected value of an arbitrary function (continuous r.v.)

$ E(g(X)) = \int_{-\infty}^\infty g(x) f(x) dx $

#### Skewness (discrete r.v.)

Using the result above, we can use this to find the skewness of a pmf.

$ g(x) = \frac{(x-\mu)^3}{\sigma^3} $

$ \gamma = E(g(X)) = \sum_i g(x_i) p(x_i) $

$ \gamma = \sum_i \frac{(x_i-\mu)^3}{\sigma^3} p(x_i) $

$ \gamma < 0 $ indicates the pmf has a negative skew.

$ \gamma = 0 $ indicates the pmf has no skew (every symmetric pmf has $ \gamma = 0 $).

$ \gamma > 0 $ indicates the pmf has a positive skew.
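The sign behaviour of $ \gamma $ can be seen on three small pmfs. These distributions are made up for illustration, and `skewness` is a hypothetical helper that applies the formula directly.

```python
import math

def skewness(pmf):
    """gamma = sum of ((x_i - mu) / sigma)^3 * p(x_i)."""
    mu = sum(x * p for x, p in pmf.items())
    sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))
    return sum(((x - mu) / sigma) ** 3 * p for x, p in pmf.items())

symmetric = {-1: 0.25, 0: 0.5, 1: 0.25}      # mirror image of itself
right = {0: 0.7, 1: 0.2, 5: 0.1}             # long tail towards large values
left = {-x: p for x, p in right.items()}     # same shape, mirrored

print(skewness(symmetric), skewness(right), skewness(left))
```

Mirroring a pmf around zero flips the sign of its skewness and leaves the magnitude unchanged.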

#### Skewness (continuous r.v.)

Using $ E(g(X)) $ for a continuous r.v. from above, we can find the skewness of a pdf.

$ g(x) = \frac{(x-\mu)^3}{\sigma^3} $

This is the same as for the discrete r.v. except that the sum becomes an integral:

$ \gamma = E(g(X)) = \frac{\int_{-\infty}^\infty (x-\mu)^3 f(x) dx}{\sigma^3} $

#### Covariance

Let: 

$ X, Y $ be two random variables

$ \mu_X, \mu_Y $ be their respective means, i.e. the expected values $ E(X) $ and $ E(Y) $.

$ Cov(X, Y) = E((X - \mu_X)(Y - \mu_Y)) $

If $ X, Y $ are independent then $ Cov(X, Y) = 0 $

But:

$ Cov(X, Y) = 0 $ does not imply that $ X, Y $ are independent.
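A standard counterexample shows zero covariance without independence. The sketch assumes $ X $ uniform on $ \{-1, 0, 1\} $ and $ Y = X^2 $, so $ Y $ is completely determined by $ X $; the helper `E` is hypothetical.

```python
from fractions import Fraction

# Assumed example: X uniform on {-1, 0, 1} and Y = X**2
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def E(g):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = E(lambda x, y: x)
mu_y = E(lambda x, y: y)
cov = E(lambda x, y: (x - mu_x) * (y - mu_y))

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0)
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_joint = joint.get((0, 0), Fraction(0))
print(cov, p_joint, p_x0 * p_y0)
```

The covariance is exactly 0, yet the joint probability differs from the product of the marginals, so $ X $ and $ Y $ are dependent.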