# Expected Value

Let's start by showing how we can use the idea of the average of a data set to build a similar concept for a random variable. 
Let $X$ be a  discrete random variable that takes on values from a finite range $\operatorname{Range}(X) = \{ a_0, a_1, \ldots, a_{k-1} \}$.  Let the PMF of $X$ be denoted by $p_X(x)$. 

Now suppose we have $n$ random values sampled from this distribution, 
$x_0, x_1, \ldots, x_{n-1}$. Then the average of the data is 
```{math}
:label: average
 \overline{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i.
```
We would like to find a similar average for $X$ that does not require sampling values from the distribution of $X$. We will call this quantity an *ensemble average* because it is computed over the ensemble of potential values that $X$ takes on, and is computed from the distribution of $X$. 

We can use *relative frequency* to connect the average of the sample to the ensemble average. Note that in {eq}`average`, some of the sample values $x_i$ may actually be the same number. For instance, $X$ may take on only 10 distinct values, but we may draw 100 sample values; then at least one of those values must be repeated. For each possible value $a_j$, let $n_j$ be the number of times $a_j$ appears in the sample $x_0, x_1, \ldots, x_{n-1}$. The total contribution of all the terms with value $a_j$ to the sum in {eq}`average` is then $n_j \cdot a_j$.
Then we can rewrite {eq}`average` as
```{math}
:label: average2
 \overline{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i 
 = \frac{1}{n} \sum_{j=0}^{k-1}  n_j\cdot a_j.
```
Let's take the factor $1/n$ inside the summation in {eq}`average2` to yield
```{math}
:label: average3
 \overline{x} 
 = \sum_{j=0}^{k-1}  a_j \left(\frac{ n_j}{n} \right).
```
Note that $n_j/n$ is the relative frequency of the value $a_j$. If the experiment possesses statistical regularity, then

$$
\lim_{n \rightarrow \infty} \frac{n_j}{n}  = p_X(a_j), 
$$
where $p_X(a_j)$ is the probability that $X = a_j$.
Applying this to {eq}`average3` and moving the limit inside the summation (allowed because the sum has finitely many terms) yields

$$
\lim_{n \rightarrow \infty} \overline{x}  = 
\sum_{j=0}^{k-1}  a_j \, p_X(a_j).
$$
In the limit, the average converges to a value that does not depend on the particular data sampled from the distribution of $X$; instead, it depends directly on the distribution of $X$ through $p_X(x)$.

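To see the relative-frequency argument in action, here is a minimal simulation sketch. The range $\{0, 1, 5\}$ and its PMF are hypothetical values chosen only for illustration, and the seed is arbitrary. The code computes the relative frequencies $n_j/n$ from the sample, forms $\sum_j a_j (n_j/n)$ as in {eq}`average3`, and compares it with $\sum_j a_j \, p_X(a_j)$.

```python
import numpy as np

# Hypothetical finite range and PMF, chosen only for illustration
values = np.array([0, 1, 5])
pmf = np.array([0.5, 0.3, 0.2])

# Ensemble average: sum of a_j * p_X(a_j)
ensemble_avg = np.sum(values * pmf)

rng = np.random.default_rng(seed=0)
for n in [10, 1_000, 100_000]:
    samples = rng.choice(values, size=n, p=pmf)
    # n_j / n: relative frequency of each value a_j in the sample
    rel_freq = np.array([np.mean(samples == a) for a in values])
    # The sample average rewritten as sum of a_j * (n_j / n)
    sample_avg = np.sum(values * rel_freq)
    print(f'n = {n:>6}: sum of a_j * n_j/n = {sample_avg:.4f}, '
          f'ordinary sample mean = {samples.mean():.4f}')

print(f'Ensemble average = {ensemble_avg}')
```

The two sample-based columns agree (they are the same quantity written two ways), and as $n$ grows they typically settle near the ensemble average of $1.3$, although any single run shows some random fluctuation.
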
We use this approach to define the *expected value* or *mean* of $X$:

````{card}
DEFINITION
^^^
```{glossary}
expected value, discrete random variable
    The expected value, or ensemble mean, is denoted by $E[X]$ or by $\mu_X$ and is given by 
      \begin{equation*}
      \mu_X = E \left[ X \right] = \sum_{x \in \operatorname{Range}(X)} x \, p_X (x). 
      \end{equation*}
```
````
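
The definition translates directly into code. Below is a minimal sketch, assuming the PMF is stored as a Python dictionary that maps each value $x$ in $\operatorname{Range}(X)$ to $p_X(x)$; the function name `expected_value` and the example PMF are hypothetical.

```python
def expected_value(pmf):
    """Return E[X] = sum over x of x * p_X(x) for a PMF given as {x: p_X(x)}."""
    # Sanity check: the probabilities of a valid PMF sum to 1
    total = sum(pmf.values())
    if abs(total - 1) > 1e-9:
        raise ValueError(f'PMF sums to {total}, not 1')
    return sum(x * p for x, p in pmf.items())

# Hypothetical example: X takes the values -1, 0, 2 with these probabilities
pmf_X = {-1: 0.25, 0: 0.5, 2: 0.25}
print(f'E[X] = {expected_value(pmf_X)}')
```

Here $E[X] = (-1)(0.25) + (0)(0.5) + (2)(0.25) = 0.25$.
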




Continuous random variables do not have a PMF, so our argument about the convergence of the sample average does not apply in the same way. If $X$ is a continuous random variable with PDF $f_X(x)$, then $\mu_X=E[X]$ is defined as follows:
````{card}
DEFINITION
^^^
```{glossary}
expected value, continuous random variable
    The expected value, or ensemble mean, is denoted by $E[X]$ or by $\mu_X$ and is given by 
      \begin{equation*}
      \mu_X = E \left[ X \right] = 
      \int_{-\infty}^{\infty} x f_X (x) ~dx.
      \end{equation*}
```
````
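
The integral can also be approximated numerically. The sketch below assumes an exponential PDF $f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (and $0$ otherwise) with $\lambda = 2$, chosen only for illustration; for this PDF the exact mean is $1/\lambda = 0.5$. It uses `scipy.integrate.quad` and cross-checks the result against `scipy.stats.expon`, which is parameterized by `scale` $= 1/\lambda$.

```python
import numpy as np
from scipy import integrate, stats

lam = 2.0  # assumed rate parameter for this illustration

def f_X(x):
    """Exponential PDF with rate lam: lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * np.exp(-lam * x) if x >= 0 else 0.0

# E[X] = integral of x * f_X(x) over the real line; f_X is zero for x < 0,
# so integrating from 0 to infinity covers all of the probability
mu_X, abs_err = integrate.quad(lambda x: x * f_X(x), 0, np.inf)
print(f'E[X] by numerical integration = {mu_X:.6f} (error estimate {abs_err:.1e})')

# Cross-check with SciPy's exponential distribution (scale = 1 / rate)
print(f'E[X] from scipy.stats = {stats.expon(scale=1/lam).mean()}')
```
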

```{note}

There are some special cases where $E[X]$ may not be defined. Such cases are outside the scope of this book.  

In some cases, $E[X]$ may be defined and still be infinite.
```

The concept of expected value is broader than just the mean. For a random variable $X$, the mean is defined above and is $\mu_X=E[X]$. But we can also compute expected values of functions of $X$, such as $E[X^2]$ or $E[(X- \mu_X)^2]$.

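For a discrete random variable, the expected value of a function $g(X)$ can be computed as $E[g(X)] = \sum_x g(x) \, p_X(x)$; this standard formula is used here without derivation. A minimal sketch, with a hypothetical PMF and a hypothetical helper named `expectation_of`:

```python
def expectation_of(g, pmf):
    """Return E[g(X)] = sum over x of g(x) * p_X(x) for a PMF {x: p_X(x)}."""
    return sum(g(x) * p for x, p in pmf.items())

# Hypothetical PMF used only for illustration
pmf_X = {0: 0.25, 1: 0.5, 2: 0.25}

mu_X = expectation_of(lambda x: x, pmf_X)                   # E[X]
second_moment = expectation_of(lambda x: x**2, pmf_X)       # E[X^2]
centered = expectation_of(lambda x: (x - mu_X)**2, pmf_X)   # E[(X - mu_X)^2]
print(f'E[X] = {mu_X}, E[X^2] = {second_moment}, E[(X - mu_X)^2] = {centered}')
```

Passing `lambda x: x` recovers the ordinary mean, so the mean is just one particular expected value.
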

##  Why do we care about the mean?

There are several reasons we care about the mean.
1. As we already saw, for experiments that possess statistical regularity, the sample average converges to the mean as the number of samples grows. 
<!-- 
In fact, we will show that we can determine a limit on the
  number of times the experiment must be repeated to ensure that the
  average is within a range around the mean with a specified
  probability \pause (Chebyshev's inequality, covered later)
-->
2. If we wish to use a constant value to estimate a random
  variable, then the mean is the value that minimizes the mean-square error. 
3. The mean is commonly used to parameterize distributions.  

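The mean-square-error property in item 2 can be checked numerically. This minimal sketch assumes a fair 6-sided die as the random variable and scans an arbitrary grid of candidate constants $c$; for each $c$ it computes $E[(X - c)^2] = \sum_x (x - c)^2 \, p_X(x)$ and reports the minimizer.

```python
import numpy as np

# Hypothetical example: X is the top face of a fair 6-sided die
values = np.arange(1, 7)
pmf = np.full(6, 1/6)
mu = np.sum(values * pmf)

# Mean-square error E[(X - c)^2] for each candidate constant estimate c
candidates = np.linspace(0, 6, 601)
mse = np.array([np.sum((values - c)**2 * pmf) for c in candidates])

best_c = candidates[np.argmin(mse)]
print(f'mean = {mu:.4f}, MSE-minimizing constant on the grid = {best_c:.4f}')
```

The minimizing grid point lands at (or immediately next to) the mean $3.5$; a finer grid only narrows the search.
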

## Examples


**Rolling a fair 6-sided die**
      
Let $D$ be a random variable whose value is the top face when a fair 6-sided die is rolled. Then the PMF of $D$ is 

$$
p_D(d) = 
\begin{cases}
\frac 1 6, & d =1,2,3,4,5,6 \\
0, & \mbox{o.w.}
\end{cases}
$$

Then the mean of $D$ is 

\begin{align*}
E[D] &= \sum_{d=1}^{6} d \cdot p_D(d) \\
&= \sum_{d=1}^{6} d \cdot \frac 1 6
\end{align*}
which we can evaluate in Python:

```python
mu_d = 0

# Be careful! This sum starts at 1 and includes 6. Since
# the upper limit of a range is not included, we need to
# set the upper limit of the range to 7
for d in range(1, 7):
    mu_d += d * (1/6)

print(f'E[D] = {mu_d}')
```

```
E[D] = 3.5
```


**Bernoulli Random Variable**

This may seem like a trivial example, but it will be used to demonstrate an important property of expected values. From {doc}`Section 8.4.2<../08-random-variables/important-discrete-rvs>`, the PMF of a Bernoulli random variable $B$ with probability of success $p$ is 
\begin{equation*}
p_B(b) = 
\begin{cases}
1-p, & b = 0 \\
p, & b =1 \\
0, & \mbox{o.w.}
\end{cases}
\end{equation*}

Then $E[B]$ is

\begin{align*}
E[B] & = \sum_{b=0}^{1} b p_B(b) \\
&= 0 \cdot (1-p) + 1 \cdot (p) \\
& = p
\end{align*}

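A quick numerical check of $E[B] = p$, assuming $p = 0.3$ for illustration. It evaluates $\sum_b b \, p_B(b)$ using the PMF from `scipy.stats.bernoulli` and compares the result with that distribution's `mean()` method.

```python
import scipy.stats as stats

p = 0.3  # assumed probability of success, for illustration only
B = stats.bernoulli(p)

# Direct evaluation of the sum over b = 0, 1 of b * p_B(b)
mean_by_pmf = sum(b * B.pmf(b) for b in (0, 1))

print(f'By the PMF: {mean_by_pmf:.4f}, SciPy mean(): {B.mean():.4f}')
```
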
This simple result can help us find the expected value of the Binomial random variable, which has a much more complicated PMF. To do that, we need to know more about the properties of expected value.

## Properties of Expected Value

**1. Expected value of a constant is that constant.**

A constant $c$ can be treated as a discrete random variable with all of its probability mass at $c$:

$$
p_C(x) = 
\begin{cases}
1, & x=c \\
0, & x \ne c
\end{cases}.
$$
Then we can find the expected value of the constant as

$$
E[c] = \sum_{x} x \, p_C(x) = c \cdot 1 = c.
$$

**2. Expected value is a linear operator.**

If $X$ and $Y$ are random variables, 
      and $a$ and $b$ are arbitrary constants, then
      
\begin{equation*}
E[aX +bY] = aE[X] +bE[Y]
\end{equation*}

*Note that this result holds regardless of whether $X$ and $Y$ are independent.*

This result generalizes easily, so if $X_i, ~i = 0, 1, \ldots, N-1$ are random variables and $a_i, ~i = 0,1, \ldots, N-1$ are arbitrary constants, then 
\begin{equation*}
E\left[ \sum_{i=0}^{N-1} a_i X_i \right] = \sum_{i=0}^{N-1} a_i E \left[ X_i\right].
\end{equation*}

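To illustrate that linearity does not require independence, here is a minimal sketch in which $Y = X^2$ is completely determined by a fair die roll $X$. The constants $a = 2$ and $b = 3$ are arbitrary, and `fractions.Fraction` keeps the arithmetic exact so the two sides can be compared with `==`. Because $x \mapsto x^2$ and $x \mapsto 2x + 3x^2$ are one-to-one on $\{1, \ldots, 6\}$, each of $Y$ and $aX + bY$ takes six distinct values with probability $1/6$ each, so the sums below are the definition of expected value applied directly.

```python
from fractions import Fraction

# Hypothetical example: X is a fair die roll and Y = X**2, so X and Y are
# strongly dependent (Y is completely determined by X)
outcomes = range(1, 7)
prob = Fraction(1, 6)  # each face of the die is equally likely

a, b = 2, 3  # arbitrary constants

E_X = sum(x * prob for x in outcomes)
E_Y = sum(x**2 * prob for x in outcomes)
# E[aX + bY], computed directly from the values of aX + bY
E_combo = sum((a * x + b * x**2) * prob for x in outcomes)

print(f'E[aX + bY]      = {E_combo}')
print(f'a E[X] + b E[Y] = {a * E_X + b * E_Y}')
```

Both lines print the same exact value, $105/2$, even though $X$ and $Y$ are dependent.
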
    



## Example: Expected Value of Binomial RV

Suppose we want to find the formula for the mean of a general
Binomial random variable with $N$ trials and probability of success $p$. Let $X$ denote this random variable. We now know two ways to find $E[X]$ analytically.

**1.** We can write an equation for the mean using the values and the PMF, where the PMF is 

$$
p_X(x) = 
\begin{cases}
\binom{N}{x} p^x (1-p)^{N-x}, & x = 0, 1, \ldots, N\\
0, & \mbox{otherwise}
\end{cases}
$$

Then

$$
E[X] = \sum_{x=0}^{N} x \cdot \binom{N}{x} p^x (1-p)^{N-x}.
$$
This can be manipulated into a very simple final result by expanding the binomial coefficient and then canceling factors, or we could solve this using Python for specific values of $N$ and $p$. However, there is a simpler way.

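As mentioned above, the sum can be evaluated in Python for specific values of $N$ and $p$. A minimal sketch using `math.comb`, with $N = 100$ and $p = 0.25$ assumed for illustration:

```python
from math import comb

N, p = 100, 0.25  # assumed parameters for this illustration

# E[X] = sum over x = 0..N of x * C(N, x) * p^x * (1 - p)^(N - x)
mean_direct = sum(x * comb(N, x) * p**x * (1 - p)**(N - x) for x in range(N + 1))
print(f'E[X] by direct summation = {mean_direct:.10f}')
print(f'N * p                    = {N * p}')
```
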
**2.** Recall from {doc}`Section 8.4.3<../08-random-variables/important-discrete-rvs>` that we can think of a Binomial$(N,p)$ random variable as the sum of $N$ independent Bernoulli$(p)$ random variables. Then we can use the fact that expected value is a linear operator to find the mean quickly.

Let $B_i,~~~ i=1,2,\ldots, N$ be these Bernoulli$(p)$ random variables, so that $X = \sum_{i=1}^{N} B_i$. Then
\begin{align*}
E[X] &= E \left[ \sum_{i=1}^{N}  B_i \right] \\
&= \sum_{i=1}^{N} E \left[  B_i \right]  ~~~\mbox{(by linearity)}\\
&= \sum_{i=1}^{N} p ~~~\mbox{(using the mean of a Bernoulli RV)} \\
&= Np.
\end{align*}

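The Bernoulli-sum view also suggests a simulation check. A minimal sketch, assuming $N = 100$, $p = 0.25$, and an arbitrary seed: each row of a random matrix holds $N$ independent Bernoulli$(p)$ draws, and each row sum is one Binomial$(N, p)$ value.

```python
import numpy as np

N, p = 100, 0.25     # assumed parameters
num_trials = 20_000  # number of simulated Binomial values

rng = np.random.default_rng(seed=1)
# Each row holds N independent Bernoulli(p) values (True = success)
bernoullis = rng.random((num_trials, N)) < p
# Summing each row gives one Binomial(N, p) value
X = bernoullis.sum(axis=1)

print(f'Sample average of X = {X.mean():.3f}, Np = {N * p}')
```

The sample average should land close to $Np = 25$, with small run-to-run fluctuations.
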

Note that `scipy.stats` distributions have a `mean()` method. So, if we have a Binomial(100, 0.25) random variable, we can find its mean using `scipy.stats` as follows:

```python
import scipy.stats as stats

X = stats.binom(100, 0.25)
print(f'E[X] = {X.mean()}')
```

```
E[X] = 25.0
```


The results match our formula, $E[X] = Np = (100)(0.25) =25$.

Continuous random variables require integration to find the mean, which can sometimes be complicated and error-prone when done by hand. To ease the burden of doing calculus, in the next section we show how to use the SymPy library to carry out the calculus and evaluate the expected value of a continuous random variable.