# Chapter 4: Discrete Random Variables

* `Discrete data`: Data you can count.
* `Random variable`: describes the outcomes of a statistical experiment in words.
    * The values of a random variable can vary with each repetition of an experiment.

## Random Variable Notation
* **Upper case** letters such as $X$ or $Y$ denote a random variable. 
* **Lower case** letters such as $x$ or $y$ denote the _value_ of a random variable.
> Said another way, $X$ and $Y$ are described in words, whereas $x$ and $y$ are given as numbers.

##### <span style="color:orange">Example:</span>
Let:
* $X$ = the number of heads you get when you toss three fair coins.
* The sample space for the toss of three fair coins is:
    * TTT; THH; HTH; HHT; HTT; THT; TTH; HHH

Then:
* $x$ = 0, 1, 2, 3

Notice $X$ is in words and $x$ is a number.  $x$ values are countable outcomes.

In [3]:
from itertools import combinations_with_replacement
flip_options = 'HT'
list(combinations_with_replacement(flip_options, 3))

[('H', 'H', 'H'), ('H', 'H', 'T'), ('H', 'T', 'T'), ('T', 'T', 'T')]
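
The cell above lists only unordered outcomes, which is why it shows four tuples rather than the eight outcomes in the sample space. As a minimal sketch, the ordered sample space can be enumerated with `itertools.product`, and counting heads in each outcome gives the values $x$ can take. The names `sample_space` and `heads_counts` are illustrative, not part of the original notebook.

```python
from itertools import product

flip_options = 'HT'

# Every ordered outcome of three tosses: 2 * 2 * 2 = 8 outcomes.
sample_space = [''.join(outcome) for outcome in product(flip_options, repeat=3)]

# The value x of the random variable X for each outcome is its number of heads.
heads_counts = sorted({outcome.count('H') for outcome in sample_space})

print(sample_space)
print(heads_counts)
```

Collecting the head counts into a set keeps each possible value of $x$ once, which matches the list $x$ = 0, 1, 2, 3 given in the example.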

## 4.1 Probability Distribution Function (PDF) for a Discrete Random Variable
Two Characteristics:
1. Each probability is between zero and one, inclusive.
2. The sum of the probabilities is one.

##### <span style="color:orange">Example 4.1:</span>
$P(x)$ = probability that $X$ takes on a value of $x$.

|$x$|$P(x)$|
|--|--|
|0|$P(x=0)=\frac{2}{50}$|
|1|$P(x=1)=\frac{11}{50}$|
|2|$P(x=2)=\frac{23}{50}$|
|3|$P(x=3)=\frac{9}{50}$|
|4|$P(x=4)=\frac{4}{50}$|
|5|$P(x=5)=\frac{1}{50}$|

$X$ takes on the values 0, 1, 2, 3, 4, 5.  This is a discrete PDF because:
* Each $P(x)$ is between **zero** and **one**, inclusive.
* The sum of the probabilities is **one**, that is,
    * $\frac{2}{50} + \frac{11}{50} +\frac{23}{50} +\frac{9}{50} +\frac{4}{50} +\frac{1}{50} = 1$
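
The two characteristics can also be checked programmatically. The sketch below uses the probabilities from the Example 4.1 table and exact fractions so that the sum is not affected by floating-point rounding. The helper name `is_valid_pdf` is a hypothetical name chosen for illustration.

```python
from fractions import Fraction

# Probabilities from the Example 4.1 table, keyed by the value x.
pdf = {
    0: Fraction(2, 50),
    1: Fraction(11, 50),
    2: Fraction(23, 50),
    3: Fraction(9, 50),
    4: Fraction(4, 50),
    5: Fraction(1, 50),
}

def is_valid_pdf(pdf):
    """Return True if every probability is in [0, 1] and they sum to one."""
    each_in_range = all(0 <= p <= 1 for p in pdf.values())
    sums_to_one = sum(pdf.values()) == 1
    return each_in_range and sums_to_one

print(is_valid_pdf(pdf))
```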


## 4.2 Mean or Expected Value and Standard Deviation
* `Expected Value`: often referred to as the **"long-term" average or mean**.  This means that over the long term of doing an experiment over and over, you would **expect** this average.
* `Probability`: does not describe the short-term results of an experiment. It gives information about what can be expected in the long term (illustrating the **Law of Large Numbers**).
* `The Law of Large Numbers`: states that, as the number of trials in a probability experiment increases, the difference between the theoretical probability of an event and the relative frequency approaches zero (**the theoretical probability and the relative frequency get closer and closer together**).
* $\mu$ : the **mean** or **expected value** of the experiment is denoted by the Greek letter $\mu$. After conducting many trials of an experiment, you would expect this average value.
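
To illustrate the Law of Large Numbers, the sketch below simulates tossing a fair coin and compares the relative frequency of heads with the theoretical probability of 0.5. The function name `relative_frequency_of_heads`, the seed, and the trial counts are assumptions made for this illustration. For any single run the gap is not guaranteed to shrink at every step, but it tends toward zero as the number of flips grows.

```python
import random

def relative_frequency_of_heads(num_flips, seed=0):
    """Simulate fair coin flips and return the fraction that land heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

theoretical = 0.5
for num_flips in (10, 1_000, 100_000):
    freq = relative_frequency_of_heads(num_flips)
    # Print the number of flips, the relative frequency, and the gap.
    print(num_flips, freq, abs(freq - theoretical))
```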

> Note: To find the expected value or long-term average, $\mu$, multiply each value of the random variable by its probability and add the products.


##### <span style="color:orange">Example 4.3:</span>

A men's soccer team plays soccer zero, one, or two days a week. The probability that they play zero days is 0.2,
the probability that they play one day is 0.5, and the probability that they play two days is 0.3. Find the long-term
average or expected value, $\mu$, of the number of days per week the men's soccer team plays soccer.

To solve this, let:
* the random variable $X$ = the number of days the men's soccer team plays soccer per week.
* $X$ takes on the values 0, 1, 2
* Construct a PDF table and add a column $x*P(x)$. In this column, multiply each $x$ value by its probability.

##### The Expected Value Table:
|$x$|$P(x)$|$x*P(x)$|
|--|--|--|
|0|0.2|(0)(0.2)=0|
|1|0.5|(1)(0.5)=0.5|
|2|0.3|(2)(0.3)=0.6|

The _long term average_ or _expected value_ is 0 + 0.5 + 0.6 = 1.1

The number 1.1 is the long-term average or expected value if the men's soccer team plays soccer week after week after week.

We say $\mu=1.1$
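
The same calculation can be sketched in a few lines of Python using the probabilities from Example 4.3. The helper name `expected_value` is hypothetical and chosen for illustration.

```python
# PDF from Example 4.3: days of soccer per week -> probability.
pdf = {0: 0.2, 1: 0.5, 2: 0.3}

def expected_value(pdf):
    """Multiply each value x by its probability P(x) and add the products."""
    return sum(x * p for x, p in pdf.items())

mu = expected_value(pdf)
print(round(mu, 10))
```

Rounding before printing only hides possible floating-point noise; it does not change the method.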

##### <span style="color:orange">Example 4.4:</span>
Find the expected value of the number of times a newborn baby's crying wakes its mother after midnight.
The expected value is the expected number of times per week a newborn baby's crying wakes its mother after
midnight. Calculate the standard deviation of the variable as well.

|$x$|$P(x)$|$x*P(x)$|$(x-\mu)^2 \cdot P(x)$|
|--|--|--|--|
|0|$P(x=0)=\frac{2}{50}$|$(0) (\frac{2}{50} ) = 0$|$(0-2.1)^2 \cdot 0.04 = 0.1764$|
|1|$P(x=1)=\frac{11}{50}$|$(1) (\frac{11}{50} ) = \frac{11}{50}$|$(1-2.1)^2 \cdot 0.22 = 0.2662$|
|2|$P(x=2)=\frac{23}{50}$|$(2) (\frac{23}{50} ) = \frac{46}{50}$|$(2-2.1)^2 \cdot 0.46 = 0.0046$|
|3|$P(x=3)=\frac{9}{50}$|$(3) (\frac{9}{50} ) = \frac{27}{50}$|$(3-2.1)^2 \cdot 0.18 = 0.1458$|
|4|$P(x=4)=\frac{4}{50}$|$(4) (\frac{4}{50} ) = \frac{16}{50}$|$(4-2.1)^2 \cdot 0.08 = 0.2888$|
|5|$P(x=5)=\frac{1}{50}$|$(5) (\frac{1}{50} ) = \frac{5}{50}$|$(5-2.1)^2 \cdot 0.02 = 0.1682$|

Add the values in the third column of the table to find the expected value of $X$:

$\mu$ = Expected Value = $\frac{105}{50}=2.1$

Use $\mu$ to complete the table.  The fourth column of this table will provide the values you need to calculate the standard deviation.  For each value $x$, multiply the square of its _deviation_ by its _probability_.  (Each deviation has the format $x-\mu$).

Add the values in the fourth column of the table:

0.1764 + 0.2662 + 0.0046 + 0.1458 + 0.2888 + 0.1682 = 1.05

The _standard deviation_ of $X$ is the square root of this sum: $\sigma = \sqrt{1.05} \approx 1.0247$
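
As a check on the table arithmetic, the sketch below recomputes the expected value, the sum of the squared deviations weighted by probability, and the standard deviation from the probabilities in the table. It uses exact fractions for the sums and converts to a float only for the square root. The variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# P(x) for x = 0..5, taken from the table (numerators over 50).
numerators = (2, 11, 23, 9, 4, 1)
pdf = {x: Fraction(n, 50) for x, n in enumerate(numerators)}

# Expected value: sum of x * P(x), the third column of the table.
mu = sum(x * p for x, p in pdf.items())

# Sum of (x - mu)^2 * P(x), the fourth column of the table.
variance = sum((x - mu) ** 2 * p for x, p in pdf.items())

# Standard deviation: square root of that sum.
sigma = sqrt(float(variance))

print(float(mu), float(variance), round(sigma, 4))
```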