## 1) Expected Value
The expected value of a random variable $X$, denoted ${\displaystyle E(X)}$, ${\displaystyle E[X]}$, or $\mathbb{E}[X]$, is the "average" value of $X$. It is a generalization of the weighted average and is also known as the expectation, mathematical expectation, mean, average, or first moment.

Let ${\displaystyle X}$ be a random variable with a finite number of finite outcomes ${\displaystyle x_{1},x_{2},\ldots ,x_{k}}$ occurring with probabilities ${\displaystyle p_{1},p_{2},\ldots ,p_{k},}$ (**p.m.f**) respectively. The expectation of ${\displaystyle X}$ is defined as:

${\displaystyle \operatorname {E} [X]=\sum _{i=1}^{k}x_{i}\,p_{i}=x_{1}p_{1}+x_{2}p_{2}+\cdots +x_{k}p_{k}.}$
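
As a minimal sketch of the finite sum above, the snippet below computes $E[X]$ for a fair six-sided die. The die and the helper name `expected_value` are illustrative assumptions, not part of the definition.

```python
from fractions import Fraction

def expected_value(outcomes, probs):
    """Return sum_i x_i * p_i for a finite discrete distribution."""
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probs))

# Hypothetical example: a fair six-sided die.
die_faces = [1, 2, 3, 4, 5, 6]
die_probs = [Fraction(1, 6)] * 6
die_mean = expected_value(die_faces, die_probs)
print(die_mean)  # 7/2
```

Exact fractions are used so that the check on the probabilities is not affected by floating-point rounding.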


If ${\displaystyle X}$ is a continuous random variable with probability density function ${\displaystyle f(x)}$, then the expected value is defined as the Lebesgue integral

${\displaystyle \operatorname {E} [X]=\int _{\mathbb {R} }xf(x)\,dx.}$

#### Example of expected value for a continuous random variable 
Imagine that a random variable $X$ has the following p.d.f.:


$f(x) = \left\{\begin{array}{l l} 
x, & \text{for}\ 0\leq x\leq 1 \\ 
2-x, & \text{for}\ 1< x\leq 2 \\ 
0, & \text{otherwise} 
\end{array}\right.\notag$


$\text{E}[X] = \int\limits^1_0\! x\cdot x\, dx + \int\limits^2_1\! x\cdot (2-x)\, dx = \int\limits^1_0\! x^2\, dx + \int\limits^2_1\! (2x - x^2)\, dx = \frac{1}{3} + \frac{2}{3} = 1.\notag$
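
The integral above can be cross-checked numerically. The following is a rough sketch using a plain midpoint rule; the helper names `triangular_pdf` and `midpoint_integral` and the number of sub-intervals are assumptions made for illustration.

```python
def triangular_pdf(x):
    """p.d.f. from the example: x on [0, 1], 2 - x on (1, 2], 0 elsewhere."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0

def midpoint_integral(func, a, b, n=20_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

# The density should integrate to 1, and x * f(x) should integrate to E[X] = 1.
total_mass = midpoint_integral(triangular_pdf, 0.0, 2.0)
mean_x = midpoint_integral(lambda x: x * triangular_pdf(x), 0.0, 2.0)
print(round(total_mass, 6), round(mean_x, 6))
```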

### 1-1) Expected Value Properties

${\displaystyle {\begin{aligned}\operatorname {E} [X+Y]&=\operatorname {E} [X]+\operatorname {E} [Y]\end{aligned}}}$

$\operatorname {E}[aX]=a\operatorname {E}[X]$

If ${\displaystyle X\leq Y}$ , and both ${\displaystyle \operatorname {E} [X]}$ and ${\displaystyle \operatorname {E} [Y]}$ exist, then ${\displaystyle \operatorname {E} [X]\leq \operatorname {E} [Y]}$


In general, the expected value is not multiplicative, i.e. ${\displaystyle \operatorname {E} [XY]}$ is not necessarily equal to ${\displaystyle \operatorname {E} [X]\cdot \operatorname {E} [Y]}$. If ${\displaystyle X}$ and ${\displaystyle Y}$ are **independent**, then  ${\displaystyle \operatorname {E} [XY]=\operatorname {E} [X]\operatorname {E} [Y]}$. 
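
These properties can be checked by exact enumeration. The sketch below assumes two independent fair dice $X$ and $Y$ (a hypothetical example, not from the text) and then uses the dependent pair $(X, X)$ to show that the product rule can fail without independence.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 36)  # joint probability of each (x, y) pair for two independent fair dice

def expect(g):
    """E[g(X, Y)] by exact enumeration over the joint distribution."""
    return sum(g(x, y) * p for x, y in product(faces, repeat=2))

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)

linearity_holds = expect(lambda x, y: x + y) == e_x + e_y
scaling_holds = expect(lambda x, y: 3 * x) == 3 * e_x
product_holds_independent = expect(lambda x, y: x * y) == e_x * e_y
# Dependent case: replace Y by X itself, so E[XY] becomes E[X^2].
product_holds_dependent = expect(lambda x, y: x * x) == e_x * e_x
print(linearity_holds, scaling_holds, product_holds_independent, product_holds_dependent)
# True True True False
```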


### 1-2) Expected Value of a Function $E[h(X)]$
Sometimes interest will focus on the expected value of some function
$h(X)$ rather than on just $E(X)$.

If the random variable $X$ has a set of possible values $D$ and pmf $p(x)$, then the expected value of any function $h(X)$, denoted by $E[h(X)]$ or $\mu_{h(X)}$, is:

$E[h(X)]=\sum_{x \in D} h(x)\cdot p(x)$


For a continuous random variable $X$ with p.d.f $p(x)$, the expected value of the function $h(X)$ is:

$\mathbb{E}[h(X)]=\int_x h(x) \cdot p(x) \ dx$

### Subscript Notation in Expected Value of a Function $\mathbb{E}_{p(\mathbf{x};\mathbf{\theta})}[f(\mathbf{x};\mathbf{\phi})]$

In the following $f(\mathbf{x};\mathbf{\phi})$ is a function of random variable $X$ and $p(\mathbf{x};\mathbf{\theta})$ is the p.d.f:


$\mathbb{E}_{p(\mathbf{x};\mathbf{\theta})}[f(\mathbf{x};\mathbf{\phi})] = \int p(\mathbf{x};\mathbf{\theta}) f(\mathbf{x};\mathbf{\phi}) d\mathbf{x}$


The above can be written as: 

$\mathbb{E}_{\mathbf{x}}[f(\mathbf{x};\mathbf{\phi})]$

$\mathbf{x} \sim p(\mathbf{x};\mathbf{\theta})$
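
In practice, an expectation written this way is often approximated by averaging $f$ over samples drawn from $p$. The sketch below is one way to do that; the choice of a standard normal for $p$ and $f(x)=x^2$ (exact value 1) is an assumption made only to have something concrete to check.

```python
import random

def monte_carlo_expectation(f, sampler, n_samples=100_000):
    """Estimate E_{x ~ p}[f(x)] by averaging f over samples drawn from p."""
    return sum(f(sampler()) for _ in range(n_samples)) / n_samples

random.seed(0)  # fixed seed so the run is repeatable
# Hypothetical choice: p is the standard normal and f(x) = x^2, so the exact value is 1.
estimate = monte_carlo_expectation(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
print(estimate)
```

The estimate fluctuates around the exact value, with an error that shrinks roughly like $1/\sqrt{n}$ in the number of samples.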


#### Example of expected value of a function for a discrete random variable
A computer store has purchased three computers of a certain type at 500 USD apiece. It will sell them for 1000 USD apiece. The manufacturer has agreed to repurchase any computers still unsold after a specified period at 200 USD apiece.
Let $X$ denote the number of computers sold, and suppose that: 
- $p(0) =0.1$ 
- $p(1) =0.2$ 
- $p(2) =0.3$
- $p(3) =0.4$

With $h(X)$ denoting the profit associated with selling $X$ units, the given information implies that:

$h(x)=\text{revenue}-\text{cost}$

$\text{cost}=3\times 500=1500$

$\text{revenue}=1000x + 200(3 - x)$

The expected profit is then:

$E[h(X)]=p(0)\times h(0) +p(1)\times h(1) +p(2)\times h(2) +p(3)\times h(3)= (-900)(.1) + (- 100)(.2) + (700)(.3) + (1500)(.4)=700$
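
The same calculation as a short script, as a sketch: the function name `profit` and its keyword parameters are naming choices made here, while the numbers come from the example.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(4, 10)}

def profit(x, stock=3, unit_cost=500, sale_price=1000, buyback_price=200):
    """h(x): revenue from sales and buy-backs minus the purchase cost."""
    revenue = sale_price * x + buyback_price * (stock - x)
    return revenue - unit_cost * stock

expected_profit = sum(profit(x) * p for x, p in pmf.items())
print([profit(x) for x in pmf], expected_profit)  # [-900, -100, 700, 1500] 700
```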



Refs: [1](https://www.stat.purdue.edu/~zhanghao/STAT511/handout/Stt511%20Sec3.3.pdf)

#### Example of expected value of a function for a continuous random variable

$h(x)=x^2$

$\mathbb{E}[h(X)]=\int_{-\infty} ^\infty  h(x) \cdot p(x) \ dx=\int_{-\infty} ^\infty x^2 \cdot p(x) \ dx$
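
As a worked instance, assume $X$ has the triangular p.d.f. from the continuous example in section 1 ($p(x)=x$ on $[0,1]$ and $p(x)=2-x$ on $(1,2]$). This choice of $p(x)$ is an illustrative assumption, since the example here leaves $p(x)$ unspecified:

$\mathbb{E}[X^2]=\int_0^1 x^2\cdot x\,dx+\int_1^2 x^2(2-x)\,dx=\frac{1}{4}+\frac{11}{12}=\frac{7}{6}$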


### 1-3) Conditional expectation $\operatorname {E} (X\mid Y=y)$
The conditional expectation (or conditional expected value) of a random variable is its expected value (the value it would take "on average" over an arbitrarily large number of occurrences) given that a certain set of "conditions" is known to occur.

Depending on the context, the conditional expectation can be either a random variable, ${\displaystyle E(X\mid Y)=f(Y)}$, or a function of $y$, ${\displaystyle E(X\mid Y=y)=f(y)}$.


Note: The conditional expected value $E(X \mid Y)$ is a random variable whose value depends on the value of $Y$, whereas the conditional expected value of $X$ given the event $Y = y$ is a function of $y$. If we write $E(X \mid Y = y) = f(y)$, then the random variable $E(X \mid Y)$ is $f(Y)$.



### 1-3-1) Conditional expectation for discrete random variables

${\displaystyle {\begin{aligned}\operatorname {E} (X\mid Y=y)&=\sum _{x}xP(X=x\mid Y=y)\\&=\sum _{x}x{\frac {P(X=x,Y=y)}{P(Y=y)}}\end{aligned}}}$

${\displaystyle P(X=x,Y=y)}$ is the **joint probability mass function** of $X$ and $Y$.


The joint probability mass function of two discrete random variables ${\displaystyle X,Y}$ is:

${\displaystyle p_{X,Y}(x,y)=\mathrm {P} (X=x\ \mathrm {and} \ Y=y)}$

#### Example

Consider the roll of a fair die and let $A = 1$ if the number is even (i.e., 2, 4, or 6) and $A = 0$ otherwise. Furthermore, let $B = 1$ if the number is prime (i.e., 2, 3, or 5) and $B = 0$ otherwise.


| Roll | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| $A$ | 0 | 1 | 0 | 1 | 0 | 1 |
| $B$ | 0 | 1 | 1 | 0 | 1 | 0 |


1. The unconditional expectation of $A$ is ${\displaystyle E[A]=(0+1+0+1+0+1)/6=1/2}$
2. The expectation of A conditional on $B = 1$ (i.e., conditional on the die roll being 2, 3, or 5) is ${\displaystyle E[A\mid B=1]=(1+0+0)/3=1/3}$
3. The expectation of A conditional on $B = 0$ (i.e., conditional on the die roll being 1, 4, or 6) is ${\displaystyle E[A\mid B=0]=(0+1+1)/3=2/3}$
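
The three values above can be reproduced by enumerating the six equally likely rolls. This is a small sketch; the helper name `cond_expectation_A` is an illustrative choice.

```python
from fractions import Fraction

rolls = range(1, 7)
A = {r: int(r % 2 == 0) for r in rolls}        # 1 if the roll is even
B = {r: int(r in (2, 3, 5)) for r in rolls}    # 1 if the roll is prime

def cond_expectation_A(b):
    """E[A | B = b]: average A over the equally likely rolls with B = b."""
    matching = [r for r in rolls if B[r] == b]
    return Fraction(sum(A[r] for r in matching), len(matching))

e_a = Fraction(sum(A.values()), 6)
print(e_a, cond_expectation_A(1), cond_expectation_A(0))  # 1/2 1/3 2/3
```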

### 1-3-2) Conditional expectation for continuous random variables

Suppose $X$ and $Y$ are continuous random variables with:
- $f(x,y)$: the joint probability density function (joint p.d.f)
- $f_X(x)=\int_{-\infty}^\infty f(x,y)dy$: the marginal probability density function of $X$
- $f_Y(y)=\int_{-\infty}^\infty f(x,y)dx$: the marginal probability density function of $Y$
- $f_{Y|X}(y|x)=\dfrac{f(x,y)}{f_X(x)}$: the conditional probability density function of $Y$ given $X=x$


The conditional expectation of $Y$ given $X=x$ is then:

$\operatorname {E} (Y\mid X=x)=\int _{-\infty }^{\infty }y\,f_{Y|X}(y|x)\,\mathrm {d} y=\int _{-\infty }^{\infty }{\frac {y\,f(x,y)}{f_{X}(x)}}\,\mathrm {d} y$

Refs: [1](https://online.stat.psu.edu/stat414/lesson/20/20.2)

## 1-4) Expectation of random variables from the joint distribution $E(X)$
### 1-4-1) Expectations of  continuous random variables from the joint p.d.f
The expected value of a continuous random variable $X$ can be found from the joint p.d.f of $X$ and $Y$ by:

$E(X)=\int_{-\infty}^\infty \int_{-\infty}^\infty xf(x,y)dxdy$

$E(Y)=\int_{-\infty}^\infty \int_{-\infty}^\infty yf(x,y)dydx$


where $f(x,y)$ is the joint p.d.f of $X$ and $Y$.

#### Example

Let $X$ and $Y$ have the joint probability density function:

$f(x,y) = \left\{\begin{array}{l l} 
4xy &  0<x<1 , 0<y<1 \\ 
0 & \text{otherwise}. 
\end{array}\right.\notag$

The expected value of $X$ is $\frac{2}{3}$, computed as follows:


$E(X)=\int_{-\infty}^\infty \int_{-\infty}^\infty xf(x,y)dxdy=\int_{0}^{1} \int_{0}^{1} (x)(4xy)dxdy=\frac{2}{3}$ 
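
A numerical cross-check of this double integral, sketched with a midpoint rule on a regular grid. The grid size and the helper names `joint_pdf` and `expect_2d` are assumptions for illustration; by symmetry of $4xy$, $E(Y)$ comes out the same.

```python
def joint_pdf(x, y):
    """f(x, y) = 4xy on the open unit square, 0 elsewhere."""
    return 4.0 * x * y if 0.0 < x < 1.0 and 0.0 < y < 1.0 else 0.0

def expect_2d(g, n=200):
    """Midpoint-rule approximation of the double integral of g(x, y) f(x, y) over [0, 1]^2."""
    h = 1.0 / n
    mids = [(i + 0.5) * h for i in range(n)]
    return h * h * sum(g(x, y) * joint_pdf(x, y) for x in mids for y in mids)

e_x = expect_2d(lambda x, y: x)
e_y = expect_2d(lambda x, y: y)
print(round(e_x, 4), round(e_y, 4))  # 0.6667 0.6667
```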


## 1-5) Expectations of Functions of Jointly Distributed Random Variables ${E}[g(X,Y)]$
### 1-5-1) Discrete Random Variables

Suppose that  $X$  and  $Y$  are jointly distributed discrete random variables with joint **p.m.f**  $p(x,y)$.

If  $g(X,Y)$  is a function of these two random variables, then its expected value is given by the following:

$\text{E}[g(X,Y)] = \mathop{\sum\sum}_{(x,y)}g(x,y)p(x,y).\notag$

#### Example


We toss a fair coin three times and record the sequence of heads $(h)$ and tails $(t)$. Let the random variable $X$ denote the number of heads obtained, and let the random variable $Y$ denote the winnings earned in a single play of a game with the following rules:

- $\$1$ if the first $h$ occurs on the first toss
- $\$2$ if the first $h$ occurs on the second toss
- $\$3$ if the first $h$ occurs on the third toss
- $-\$1$ if no $h$ occurs


Note that the possible values of $X$ are $x=0,1,2,3$, and the possible values of $Y$ are $y=-1,1,2,3$.


Joint pmf of $X$ and $Y$
<table>
    <thead>
        <tr>
            <th>p (x,y)</th>
            <th  colspan="4" rowspan="1" scope="row">\(X\)</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <th  scope="row">$Y$</th>
            <th >0</th>
            <th >1</th>
            <th >2</th>
            <th >3</th>
        </tr>
        <tr>
            <th  scope="row">-1</th>
            <td ><span  >1/8</span></td>
            <td >0</td>
            <td >0</td>
            <td >0</td>
        </tr>
        <tr>
            <th  scope="row">1</th>
            <td >0</td>
            <td ><span  >1/8</span></td>
            <td ><span  >2/8</span></td>
            <td ><span  >1/8</span></td>
        </tr>
        <tr>
            <th  scope="row">2</th>
            <td >0</td>
            <td ><span  >1/8</span></td>
            <td ><span  >1/8</span></td>
            <td >0</td>
        </tr>
        <tr>
            <th  scope="row">3</th>
            <td >0</td>
            <td ><span  >1/8</span></td>
            <td >0</td>
            <td >0</td>
        </tr>
    </tbody>
</table>




1. If we define $g(x,y)=xy$, we can compute the expected value of $XY$:
$\begin{align*} 
    \text{E}[XY] = \mathop{\sum\sum}_{(x,y)}xy\cdot p(x,y) &= (0)(-1)\left(\frac{1}{8}\right) \\ 
    &\ + (1)(1)\left(\frac{1}{8}\right) + (2)(1)\left(\frac{2}{8}\right) + (3)(1)\left(\frac{1}{8}\right) \\ 
    &\ + (1)(2)\left(\frac{1}{8}\right) + (2)(2)\left(\frac{1}{8}\right) \\ 
    &\ + (1)(3)\left(\frac{1}{8}\right) \\ 
    &= \frac{17}{8} = 2.125 
    \end{align*}$

2. Next, if we define $g(x,y)=x$, we can compute the expected value of $X$:
$\begin{align*} 
    \text{E}[X] = \mathop{\sum\sum}_{(x,y)}x\cdot p(x,y) &= (0)\left(\frac{1}{8}\right) \\ 
    &\ + (1)\left(\frac{1}{8}\right) + (2)\left(\frac{2}{8}\right) + (3)\left(\frac{1}{8}\right) \\ 
    &\ + (1)\left(\frac{1}{8}\right) + (2)\left(\frac{1}{8}\right) \\ 
    &\ + (1)\left(\frac{1}{8}\right)\\ 
    &= \frac{12}{8} = 1.5 
    \end{align*}$
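
The joint pmf and both expectations can be rebuilt by enumerating the eight equally likely toss sequences. A minimal sketch, with `winnings` as a helper name chosen here:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

def winnings(seq):
    """Y: position of the first head (1, 2, or 3), or -1 if there is no head."""
    return seq.index("h") + 1 if "h" in seq else -1

# Each of the 8 sequences of three fair tosses has probability 1/8.
joint_pmf = Counter()
for seq in product("ht", repeat=3):
    joint_pmf[(seq.count("h"), winnings(seq))] += Fraction(1, 8)

e_xy = sum(x * y * p for (x, y), p in joint_pmf.items())
e_x = sum(x * p for (x, y), p in joint_pmf.items())
print(e_xy, e_x)  # 17/8 3/2
```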

## 1-6) Conditional Expectations of Functions of Jointly Distributed Random Variables $E_X[h(X,Y)]$

When many random variables are involved and there is no subscript on the $E$ symbol, the expected value is taken with respect to their joint distribution:

$E[h(X,Y)] = \int_{-\infty}^\infty \int_{-\infty}^\infty h(x,y) f_{XY}(x,y) \, dx \, dy$

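When a subscript is present, one common convention is that it names the variable on which we condition (this differs from the notation $\mathbb{E}_{p(\mathbf{x};\mathbf{\theta})}[\cdot]$ above, where the subscript names the distribution being averaged over):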


$E_X[h(X,Y)] = E[h(X,Y)\mid X] = \int_{-\infty}^\infty h(x,y) g(y|x)\,dy$


$g(y|x)=f_{Y|X}(y|x)=\dfrac{f(x,y)}{f_X(x)}$


Refs: [1](https://stats.stackexchange.com/questions/72613/subscript-notation-in-expectations)



#### Example
Suppose that the joint distribution of $X$ and $Y$ is uniform over the unit disk (the region enclosed by the unit circle):


$f(x,y) = \left\{\begin{array}{l l} 
\frac{1}{\pi} &  0\leq x^2 +y^2 \leq1 \\ 
0 & \text{otherwise}. 
\end{array}\right.\notag$


Let $h(X,Y)=X^2+Y^2$. 

Find $E(h(X,Y)|Y=-0.2)$

$E(h(X,Y)|Y=y)= \int_{-\infty}^\infty h(x,y) g(x|y)\,dx$


We have: 

$h(x,y)=x^2+y^2$


First we need $g(x|y)$ which is:


$g(x|y)=f_{X|Y}(x|y)=\dfrac{f(x,y)}{f_Y(y)}$

We can easily verify that:


$f_Y(y)=\int_{-\infty}^\infty f(x,y)dx = \int_{-\sqrt{1-y^2}}^{\sqrt{1-y^2}}  \frac{1}{\pi}dx=\frac{2}{\pi}\sqrt{1-y^2}$


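So, for $-\sqrt{1-y^2}\leq x\leq \sqrt{1-y^2}$ (and $0$ otherwise):

$g(x|y)=f_{X|Y}(x|y)=\dfrac{1}{2 \sqrt{1-y^2}}$

Since $g(x|y)$ vanishes outside this interval, the integral reduces to:

$E(h(X,Y)|Y=y)= \int_{-\sqrt{1-y^2}}^{\sqrt{1-y^2}} (x^2+y^2) \dfrac{1}{2 \sqrt{1-y^2}}\,dx=\dfrac{1-y^2}{3}+y^2=\dfrac{1+2y^2}{3}$

For $y=-0.2$ this gives $E(h(X,Y)|Y=-0.2)=\dfrac{1+2(0.04)}{3}=0.36$.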
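
The requested value at $y=-0.2$ can be cross-checked numerically. The sketch below integrates only over the support of $g(x|y)$, that is $|x|\leq\sqrt{1-y^2}$, and compares the result with the hand-evaluated form $(1+2y^2)/3$; the function name and step count are illustrative assumptions.

```python
import math

def cond_expectation_h(y, n=100_000):
    """E[X^2 + Y^2 | Y = y] for (X, Y) uniform on the unit disk, via the midpoint rule."""
    half_width = math.sqrt(1.0 - y * y)   # X | Y = y is uniform on [-half_width, half_width]
    density = 1.0 / (2.0 * half_width)    # g(x | y)
    step = 2.0 * half_width / n
    xs = (-half_width + (i + 0.5) * step for i in range(n))
    return step * sum((x * x + y * y) * density for x in xs)

numeric = cond_expectation_h(-0.2)
closed_form = (1.0 + 2.0 * (-0.2) ** 2) / 3.0  # result of evaluating the integral by hand
print(round(numeric, 6), round(closed_form, 6))  # 0.36 0.36
```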

Refs: [1](https://www.asc.ohio-state.edu/herbei.1/6201/PDF/10_21_15.pdf)

## 1-7) Expected Value and Marginal Probability Density Functions $f_{X}(x)=\operatorname {E} _{Y}[f_{X\mid Y}(x\mid y)]$





A marginal probability density can always be written as an expected value:

${\displaystyle f_{X}(x)=\int_{-\infty}^\infty f_{X\mid Y}(x\mid y)\,f_{Y}(y)\,\mathrm {d} y=\operatorname {E} _{Y}[f_{X\mid Y}(x\mid y)]\;.}$

where:
- $f(x,y)$: the joint probability density function (joint p.d.f)
- $f_X(x)=\int_{-\infty}^\infty f(x,y)dy$: the marginal probability density function of $X$
- $f_Y(y)=\int_{-\infty}^\infty f(x,y)dx$: the marginal probability density function of $Y$
- $f_{X|Y}(x|y)=\dfrac{f(x,y)}{f_Y(y)}$: the conditional probability density function of $X$ given $Y=y$






## 1-8) Law of total expectation ${\displaystyle \operatorname {E} [X]=\operatorname {E} [\operatorname {E} [X\mid Y]]}$

If $X$ is a random variable whose expected value $\operatorname {E}[X]$ is defined, and $Y$ is any random variable on the same probability space, then:

${\displaystyle \operatorname {E} [X]=\operatorname {E} [\operatorname {E} [X\mid Y]]}$

i.e., the expected value of the conditional expected value of $X$ given $Y$ is the same as the expected value of $X$.


Proof (for discrete random variables).

${\displaystyle {\begin{aligned}\operatorname {E} \left(\operatorname {E} (X\mid Y)\right)&=\operatorname {E} {\Bigg [}\sum _{x}x\cdot \operatorname {P} (X=x\mid Y){\Bigg ]}\\[6pt]&=\sum _{y}{\Bigg [}\sum _{x}x\cdot \operatorname {P} (X=x\mid Y=y){\Bigg ]}\cdot \operatorname {P} (Y=y)\\[6pt]&=\sum _{y}\sum _{x}x\cdot \operatorname {P} (X=x,Y=y)=\sum _{x}x\sum _{y}\operatorname {P} (X=x,Y=y)\\[6pt]&=\sum _{x}x\cdot \operatorname {P} (X=x)\\[6pt]&=\operatorname {E} (X).\end{aligned}}}$


One special case states that if ${\displaystyle {\left\{A_{i}\right\}}_{i}}$ is a finite or countable partition of the sample space, then

${\displaystyle \operatorname {E} (X)=\sum _{i}{\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})}}$


### Example

Suppose that only two factories supply light bulbs to the market. Factory $X$'s bulbs work for an average of 5000 hours, whereas factory $Y$'s bulbs work for an average of 4000 hours. It is known that factory $X$ supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work for?



$E[L]$ is the expected life of the bulb.

$P[X]=\frac{6}{10}$ is the probability that the purchased bulb was manufactured by factory $X$.

$P[Y]=\frac{4}{10}$ is the probability that the purchased bulb was manufactured by factory $Y$.

$E[L|X]=5000$ is the expected lifetime of a bulb manufactured by $X$.

$E[L|Y]=4000$ is the expected lifetime of a bulb manufactured by $Y$.



${\displaystyle {\begin{aligned}\operatorname {E} (L)&=\operatorname {E} (L\mid X)\operatorname {P} (X)+\operatorname {E} (L\mid Y)\operatorname {P} (Y)\\[3pt]&=5000(0.6)+4000(0.4)\\[2pt]&=4600\end{aligned}}}$
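
The partition form of the law translates directly into a weighted sum. A minimal sketch using the numbers from this example (the dictionary layout is a choice made here):

```python
from fractions import Fraction

# (P(A_i), E[L | A_i]) for each factory in the partition
factories = {
    "X": (Fraction(6, 10), 5000),
    "Y": (Fraction(4, 10), 4000),
}

assert sum(p for p, _ in factories.values()) == 1  # the events must form a partition

expected_life = sum(p * mean_life for p, mean_life in factories.values())
print(expected_life)  # 4600
```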

## Taking the expectation with respect to a probability measure

In a neural network classifier, the posterior probability of the classes $\mathbf{y}=[y_1,y_2,\ldots,y_K]$ given an input feature vector $\mathbf{x}$ is $p(\mathbf{y}|\mathbf{x};\mathbf{w})$, where $\mathbf{w}$ are the parameters of the network. Note that $\mathbf{y}$ is in one-hot encoding.


This posterior probability is estimated using maximum likelihood estimation, and therefore the objective is to maximize $E_{p(\mathbf{x},\mathbf{y})}[\log p(\mathbf{y}|\mathbf{x};\mathbf{w})]$ with respect to $\mathbf{w}$.
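
Since $p(\mathbf{x},\mathbf{y})$ is not known in practice, this expectation is typically approximated by an average over training examples. The sketch below shows that average for a toy batch; the softmax output layer, the logits, and the labels are all made-up assumptions for illustration, not something the text specifies.

```python
import math

def softmax(logits):
    """Turn raw scores into class probabilities p(y | x; w)."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_log_likelihood(batch_logits, one_hot_labels):
    """Sample-average estimate of E_{p(x, y)}[log p(y | x; w)]."""
    log_liks = []
    for logits, label in zip(batch_logits, one_hot_labels):
        probs = softmax(logits)
        # With one-hot y, only the true class contributes to sum_k y_k * log p_k.
        log_liks.append(sum(y_k * math.log(p_k) for y_k, p_k in zip(label, probs)))
    return sum(log_liks) / len(log_liks)

# Made-up network outputs and labels for two inputs and three classes.
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [[1, 0, 0], [0, 0, 1]]
avg_ll = mean_log_likelihood(logits, labels)
print(avg_ll)
```

Maximizing this average is the same as minimizing the average cross-entropy between the one-hot labels and the predicted probabilities.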



Let $f$ be a function and $\mu$ be a probability measure. The notation $\mathbb E_\mu[f]$ means

$\mathbb E_\mu[f]=\int f\,d\mu=\int f(x)\,d\mu(x)$





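When $\mu$ has a density $p(\mathbf{x}|\theta)$, the same expectation is commonly written with the sampling distribution in the subscript, for example:

$\mathbb{E}_{\mathbf{x} \sim p(\mathbf{x}|\theta)}[X].$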


Refs: [1](https://www.youtube.com/watch?v=9zKuYvjFFS8&ab_channel=ArxivInsights), [2](https://www.youtube.com/watch?v=2pEkWk-LHmU&ab_channel=JordanBoyd-GraberJordanBoyd-Graber)