## Expected Value
The expected value of a random variable $X$, denoted ${\displaystyle E(X)}$ or ${\displaystyle E[X]}$, is a generalization of the weighted average. It is also known as the expectation, mathematical expectation, mean, average, or first moment.

Let ${\displaystyle X}$ be a random variable with a finite number of finite outcomes ${\displaystyle x_{1},x_{2},\ldots ,x_{k}}$ occurring with probabilities ${\displaystyle p_{1},p_{2},\ldots ,p_{k},}$ respectively. The expectation of ${\displaystyle X}$ is defined as:

${\displaystyle \operatorname {E} [X]=\sum _{i=1}^{k}x_{i}\,p_{i}=x_{1}p_{1}+x_{2}p_{2}+\cdots +x_{k}p_{k}.}$
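
As a minimal sketch of the finite sum above, the snippet below computes $\sum_i x_i p_i$ with exact fractions. The helper name `expectation` and the fair-die example are illustrative assumptions, not part of the original notes.

```python
from fractions import Fraction


def expectation(outcomes, probs):
    """Return sum_i x_i * p_i for a finite discrete distribution."""
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probs))


# Assumed example: a fair six-sided die, each face with probability 1/6.
die_faces = [1, 2, 3, 4, 5, 6]
die_probs = [Fraction(1, 6)] * 6

die_mean = expectation(die_faces, die_probs)
print(die_mean)  # 7/2
```

Using `Fraction` keeps the arithmetic exact, so the check that the probabilities sum to 1 is not affected by floating-point rounding.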


If ${\displaystyle X}$ is a random variable with probability density function ${\displaystyle f(x)}$, then the expected value is defined as the Lebesgue integral

${\displaystyle \operatorname {E} [X]=\int _{\mathbb {R} }xf(x)\,dx.}$
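
For a well-behaved density the integral can be approximated numerically. The sketch below uses a plain midpoint rule on a truncated interval. The exponential density with rate 2 (whose mean is $1/2$), the truncation at 20, and the function names are all assumptions chosen for illustration.

```python
import math


def expected_value(pdf, lower, upper, steps=100_000):
    """Approximate the integral of x * pdf(x) over [lower, upper] (midpoint rule)."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width  # midpoint of the i-th subinterval
        total += x * pdf(x)
    return total * width


RATE = 2.0  # assumed rate parameter of the example density


def exponential_pdf(x):
    """Density of an exponential distribution with rate RATE, for x >= 0."""
    return RATE * math.exp(-RATE * x)


# The tail beyond x = 20 is negligible for this density, so we truncate there.
approx_mean = expected_value(exponential_pdf, 0.0, 20.0)
print(approx_mean)  # close to 1 / RATE = 0.5
```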

## Expected Value Properties

${\displaystyle {\begin{aligned}\operatorname {E} [X+Y]&=\operatorname {E} [X]+\operatorname {E} [Y]\end{aligned}}}$

$\operatorname {E}[aX]=a\operatorname {E}[X]$

If ${\displaystyle X\leq Y}$, and both ${\displaystyle \operatorname {E} [X]}$ and ${\displaystyle \operatorname {E} [Y]}$ exist, then ${\displaystyle \operatorname {E} [X]\leq \operatorname {E} [Y]}$.


In general, the expected value is not multiplicative, i.e. ${\displaystyle \operatorname {E} [XY]}$ is not necessarily equal to ${\displaystyle \operatorname {E} [X]\cdot \operatorname {E} [Y]}$. If ${\displaystyle X}$ and ${\displaystyle Y}$ are **independent**, then  ${\displaystyle \operatorname {E} [XY]=\operatorname {E} [X]\operatorname {E} [Y]}$. 
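
The properties above can be checked on a small example. This sketch compares two assumed joint distributions on $\{0,1\}^2$: two independent fair coins, and a fully dependent pair with $Y = X$. The helper names `expect` and `summarize` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expect(joint, g):
    """E[g(X, Y)] for a joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


half = Fraction(1, 2)
# Two independent fair 0/1 coins: every pair has probability 1/4.
independent = {(x, y): half * half for x, y in product([0, 1], repeat=2)}
# Fully dependent case: Y always equals X.
dependent = {(0, 0): half, (1, 1): half}


def summarize(joint):
    """Compute both sides of each property for one joint pmf."""
    e_x = expect(joint, lambda x, y: x)
    e_y = expect(joint, lambda x, y: y)
    return {
        "E[X+Y]": expect(joint, lambda x, y: x + y),
        "E[X]+E[Y]": e_x + e_y,
        "E[3X]": expect(joint, lambda x, y: 3 * x),
        "3E[X]": 3 * e_x,
        "E[XY]": expect(joint, lambda x, y: x * y),
        "E[X]E[Y]": e_x * e_y,
    }


ind = summarize(independent)
dep = summarize(dependent)
print(ind["E[XY]"], ind["E[X]E[Y]"])  # 1/4 1/4
print(dep["E[XY]"], dep["E[X]E[Y]"])  # 1/2 1/4
```

Additivity and scaling hold in both cases, while the product rule holds only for the independent pair.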


## Expected Value


$\mathbb{E}[X] = \int_x x \cdot p(x) \ dx$

This integral gives the "average" value of a continuous random variable $X$ with density $p(x)$.



## Expected Value of a Function
Sometimes the quantity of interest is the expected value of some function
$h(X)$ rather than $E(X)$ itself.

If the random variable $X$ has a set of possible values $D$ and pmf $p(x)$, then the expected value of any function $h(X)$, denoted by $E[h(X)]$ or $\mu_{h(X)}$, is:

$E[h(X)]=\sum_{x \in D} h(x) \cdot p(x)$


For a continuous random variable $X$ with density $p(x)$, the expected value of $h(X)$ is:

$\mathbb{E}[h(X)]=\int_x h(x) \cdot p(x) \ dx$

### Example
A computer store has purchased three computers of a certain type at 500 USD apiece. It will sell them for 1000 USD apiece. The manufacturer has agreed to repurchase any computers still unsold after a specified period at 200 USD apiece.
Let $X$ denote the number of computers sold, and suppose that: 
- $p(0) =0.1$ 
- $p(1) =0.2$ 
- $p(2) =0.3$
- $p(3) =0.4$

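With $h(X)$ denoting the profit associated with selling $X$ units, the given information implies that:

$h(x)= \text{revenue}-\text{cost}$

$\text{cost}=3 \times 500 = 1500$

$\text{revenue}=1000 \times x + 200\times(3 - x)$

so $h(x) = 800x - 900$, which gives $h(0)=-900$, $h(1)=-100$, $h(2)=700$, and $h(3)=1500$.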

The expected profit is then:

${\displaystyle {\begin{aligned}E[h(X)]&=p(0)\times h(0)+p(1)\times h(1)+p(2)\times h(2)+p(3)\times h(3)\\&=(-900)(0.1)+(-100)(0.2)+(700)(0.3)+(1500)(0.4)\\&=700\end{aligned}}}$
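
The same calculation can be reproduced in a few lines. This is a sketch using the prices and pmf from the example. The constant and function names are illustrative choices, and the probabilities are stored as exact fractions to avoid rounding.

```python
from fractions import Fraction

UNITS_BOUGHT = 3
COST_PER_UNIT = 500    # USD paid by the store per computer
SALE_PRICE = 1000      # USD received per computer sold
BUYBACK_PRICE = 200    # USD received per unsold computer

# pmf of X, the number of computers sold (values from the example).
pmf = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(4, 10)}


def profit(units_sold):
    """h(x) = revenue - cost for a given number of units sold."""
    revenue = SALE_PRICE * units_sold + BUYBACK_PRICE * (UNITS_BOUGHT - units_sold)
    return revenue - UNITS_BOUGHT * COST_PER_UNIT


expected_profit = sum(profit(x) * p for x, p in pmf.items())
print([profit(x) for x in pmf])  # [-900, -100, 700, 1500]
print(expected_profit)           # 700
```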



Refs: [1](https://www.stat.purdue.edu/~zhanghao/STAT511/handout/Stt511%20Sec3.3.pdf)

## Conditional expectation
The conditional expectation, conditional expected value, or conditional mean of a random variable is its expected value (the value it would take "on average" over an arbitrarily large number of occurrences) given that a certain set of "conditions" is known to occur.

Depending on the context, the conditional expectation can be either a random variable, written ${\displaystyle E(X\mid Y)}$, or a function of the observed value $y$, written ${\displaystyle E(X\mid Y=y)}$. The two are linked by a function $f$ with $f(y)=E(X\mid Y=y)$ and ${\displaystyle E(X\mid Y)=f(Y)}$.


### Discrete random variables

${\displaystyle {\begin{aligned}\operatorname {E} (X\mid Y=y)&=\sum _{x}xP(X=x\mid Y=y)\\&=\sum _{x}x{\frac {P(X=x,Y=y)}{P(Y=y)}}\end{aligned}}}$

${\displaystyle P(X=x,Y=y)}$ is the **joint probability mass function** of $X$ and $Y$.


The joint probability mass function of two discrete random variables ${\displaystyle X,Y}$ is:

${\displaystyle p_{X,Y}(x,y)=\mathrm {P} (X=x\ \mathrm {and} \ Y=y)}$
 



### Continuous random variables

${\displaystyle {\begin{aligned}\operatorname {E} (X\mid Y=y)&=\int _{-\infty }^{\infty }xf_{X\mid Y}(x\mid y)\,\mathrm {d} x\\&=\int _{-\infty }^{\infty }{\frac {xf_{X,Y}(x,y)}{f_{Y}(y)}}\,\mathrm {d} x\end{aligned}}}$


### Example

Consider the roll of a fair die and let $A = 1$ if the number is even (i.e., 2, 4, or 6) and $A = 0$ otherwise. Furthermore, let $B = 1$ if the number is prime (i.e., 2, 3, or 5) and $B = 0$ otherwise.


|   |1	|2	|3	|4	|5	|6  |
|---|---|---|---|---|---|---|
|A	|0	|1	|0	|1	|0	|1  |
|B	|0	|1	|1	|0	|1	|0  |


1. The unconditional expectation of $A$ is ${\displaystyle E[A]=(0+1+0+1+0+1)/6=1/2}$
2. The expectation of A conditional on $B = 1$ (i.e., conditional on the die roll being 2, 3, or 5) is ${\displaystyle E[A\mid B=1]=(1+0+0)/3=1/3}$
3. The expectation of A conditional on $B = 0$ (i.e., conditional on the die roll being 1, 4, or 6) is ${\displaystyle E[A\mid B=0]=(0+1+1)/3=2/3}$
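
The three values above can be reproduced by averaging $A$ over the rolls that satisfy each condition, which is valid here because every roll is equally likely. This is a small sketch, and the function names are illustrative.

```python
from fractions import Fraction

ROLLS = range(1, 7)  # outcomes of a fair die, all equally likely


def a(n):
    """A = 1 if the roll is even, 0 otherwise."""
    return 1 if n % 2 == 0 else 0


def b(n):
    """B = 1 if the roll is prime (2, 3, or 5), 0 otherwise."""
    return 1 if n in (2, 3, 5) else 0


def conditional_mean(f, condition=lambda n: True):
    """Average f over the equally likely rolls that satisfy the condition."""
    kept = [n for n in ROLLS if condition(n)]
    return Fraction(sum(f(n) for n in kept), len(kept))


e_a = conditional_mean(a)
e_a_given_b1 = conditional_mean(a, lambda n: b(n) == 1)
e_a_given_b0 = conditional_mean(a, lambda n: b(n) == 0)
print(e_a, e_a_given_b1, e_a_given_b0)  # 1/2 1/3 2/3
```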


## Expectations of Functions of Jointly Distributed Discrete Random Variables


Suppose that  $X$  and  $Y$  are jointly distributed discrete random variables with joint pmf  $p(x,y)$.

If  $g(X,Y)$  is a function of these two random variables, then its expected value is given by the following:

$\text{E}[g(X,Y)] = \mathop{\sum\sum}_{(x,y)}g(x,y)\,p(x,y)$

### Example


We toss a fair coin three times and record the sequence of heads $(h)$ and tails $(t)$. Let the random variable $X$ denote the number of heads obtained, and let the random variable $Y$ denote the winnings earned in a single play of a game with the following rules:

- $\$1$ if the first $h$ occurs on the first toss
- $\$2$ if the first $h$ occurs on the second toss
- $\$3$ if the first $h$ occurs on the third toss
- $-\$1$ if no $h$ occurs


Note that the possible values of $X$ are $x=0,1,2,3$, and the possible values of $Y$ are $y=-1,1,2,3$.


Joint pmf of $X$ and $Y$
<table>
    <thead>
        <tr>
            <th>p (x,y)</th>
            <th  colspan="4" rowspan="1" scope="row">\(X\)</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <th  scope="row">$Y$</th>
            <th >0</th>
            <th >1</th>
            <th >2</th>
            <th >3</th>
        </tr>
        <tr>
            <th  scope="row">-1</th>
            <td ><span  >1/8</span></td>
            <td >0</td>
            <td >0</td>
            <td >0</td>
        </tr>
        <tr>
            <th  scope="row">1</th>
            <td >0</td>
            <td ><span  >1/8</span></td>
            <td ><span  >2/8</span></td>
            <td ><span  >1/8</span></td>
        </tr>
        <tr>
            <th  scope="row">2</th>
            <td >0</td>
            <td ><span  >1/8</span></td>
            <td ><span  >1/8</span></td>
            <td >0</td>
        </tr>
        <tr>
            <th  scope="row">3</th>
            <td >0</td>
            <td ><span  >1/8</span></td>
            <td >0</td>
            <td >0</td>
        </tr>
    </tbody>
</table>




1. If we define $g(x,y)=xy$, the expected value of $XY$ is:
$\begin{aligned} 
    \text{E}[XY] = \mathop{\sum\sum}_{(x,y)}xy\cdot p(x,y) &= (0)(-1)\left(\frac{1}{8}\right) \\ 
    &\ + (1)(1)\left(\frac{1}{8}\right) + (2)(1)\left(\frac{2}{8}\right) + (3)(1)\left(\frac{1}{8}\right) \\ 
    &\ + (1)(2)\left(\frac{1}{8}\right) + (2)(2)\left(\frac{1}{8}\right) \\ 
    &\ + (1)(3)\left(\frac{1}{8}\right) \\ 
    &= \frac{17}{8} = 2.125 
    \end{aligned}$

2. Next, if we define $g(x,y)=x$, the expected value of $X$ is:
$\begin{aligned} 
    \text{E}[X] = \mathop{\sum\sum}_{(x,y)}x\cdot p(x,y) &= (0)\left(\frac{1}{8}\right) \\ 
    &\ + (1)\left(\frac{1}{8}\right) + (2)\left(\frac{2}{8}\right) + (3)\left(\frac{1}{8}\right) \\ 
    &\ + (1)\left(\frac{1}{8}\right) + (2)\left(\frac{1}{8}\right) \\ 
    &\ + (1)\left(\frac{1}{8}\right)\\ 
    &= \frac{12}{8} = 1.5 
    \end{aligned}$
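
The joint pmf and both expectations can also be obtained by enumerating the eight equally likely toss sequences. The following is a sketch under the game rules stated above. The helper names `winnings` and `expect` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def winnings(sequence):
    """Y: the toss number of the first head, or -1 if there is no head."""
    for toss_number, face in enumerate(sequence, start=1):
        if face == "h":
            return toss_number
    return -1


# Build the joint pmf p(x, y) from the 8 equally likely sequences.
joint = defaultdict(Fraction)
for sequence in product("ht", repeat=3):
    x = sequence.count("h")  # X: number of heads
    y = winnings(sequence)
    joint[(x, y)] += Fraction(1, 8)


def expect(g):
    """E[g(X, Y)] as the double sum of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


e_xy = expect(lambda x, y: x * y)
e_x = expect(lambda x, y: x)
print(e_xy)  # 17/8
print(e_x)   # 3/2
```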

## Conditional expectation of joint distribution

Refs: [1](https://web.stanford.edu/class/archive/cs/cs109/cs109.1196/lectures/13%20-%20ConditionalJoints.pdf)

## Subscript notation in expectations

When many random variables are involved, and there is no subscript on the $E$ symbol, the expected value is taken with respect to their joint distribution:

$E[h(X,Y)] = \int_{-\infty}^\infty \int_{-\infty}^\infty h(x,y) f_{XY}(x,y) \, dx \, dy$

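When a subscript is present, one common convention is that it names the variable on which we condition. Evaluated at $X=x$, this gives:

$E_X[h(X,Y)] = E[h(X,Y)\mid X=x] = \int_{-\infty}^\infty h(x,y) f_{Y\mid X}(y\mid x)\,dy$

Other authors use the subscript to name the variable being averaged over, so the convention should be checked in context.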


Refs: [1](https://stats.stackexchange.com/questions/72613/subscript-notation-in-expectations)



## Take the expectation with respect to a probability measure

In a neural network classifier, the posterior probability of the classes $\mathbf{y}=[y_1,y_2,\ldots,y_K]$ given an input feature vector $\mathbf{x}$ is $p(\mathbf{y}|\mathbf{x};\mathbf{w})$, where $\mathbf{w}$ are the parameters of the network. Note that $\mathbf{y}$ is one-hot encoded.


This posterior probability is estimated using maximum likelihood estimation, and therefore the objective is to maximize $E_{p(\mathbf{x},\mathbf{y})}[\log p(\mathbf{y}|\mathbf{x};\mathbf{w})]$.
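
In practice the true distribution $p(\mathbf{x},\mathbf{y})$ is unknown, so this expectation is typically approximated by an average over training examples. The sketch below shows that average for a tiny batch. The predicted probabilities are made-up numbers standing in for the output of a network, and `log_likelihood` is a hypothetical helper.

```python
import math


def log_likelihood(one_hot, predicted):
    """log p(y | x; w) for a one-hot label: the log-probability of the true class."""
    return sum(y * math.log(p) for y, p in zip(one_hot, predicted) if y)


# Assumed batch of (one-hot label, predicted class probabilities) pairs.
batch = [
    ([1, 0, 0], [0.7, 0.2, 0.1]),
    ([0, 1, 0], [0.1, 0.8, 0.1]),
    ([0, 0, 1], [0.25, 0.25, 0.5]),
]

# Sample average standing in for the expectation over p(x, y).
average_log_likelihood = sum(log_likelihood(y, p) for y, p in batch) / len(batch)
print(average_log_likelihood)
```

Maximizing this average is equivalent to minimizing the average cross-entropy between the one-hot labels and the predicted probabilities.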



Let $f$ be a function and $\mu$ be a probability measure. The notation $\mathbb E_\mu[f]$ means

$\mathbb E_\mu[f]=\int f\,d\mu=\int f(x)\,d\mu(x)$





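The distribution can also be named in the subscript with a sampling statement, for example the expectation of $\mathbf{x}$ when $\mathbf{x}$ is drawn from $p(\mathbf{x}|\theta)$:

$\mathbb{E}_{\mathbf{x} \sim p(\mathbf{x}|\theta)}[\mathbf{x}]$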


Refs: [1](https://www.youtube.com/watch?v=9zKuYvjFFS8&ab_channel=ArxivInsights), [2](https://www.youtube.com/watch?v=2pEkWk-LHmU&ab_channel=JordanBoyd-GraberJordanBoyd-Graber)

## Expectation with respect to a probability distribution

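For a random variable $X$ with density $p(x)$ and a function $g$:

$\mathbb{E}[X] = \int_x x \cdot p(x) \ dx$

$\mathbb{E}[g(X)]=\int_x g(x) \cdot p(x) \ dx$

When the density depends on parameters $\mathbf{\theta}$ and the function on parameters $\mathbf{\phi}$, the distribution is written as a subscript:

$\mathbb{E}_{p(\mathbf{x};\mathbf{\theta})}[f(\mathbf{x};\mathbf{\phi})] = \int p(\mathbf{x};\mathbf{\theta}) f(\mathbf{x};\mathbf{\phi}) d\mathbf{x}$

This is often abbreviated to $\mathbb{E}_{\mathbf{x}}[f(\mathbf{x};\mathbf{\phi})]$, with the distribution stated separately as $\mathbf{x} \sim p(\mathbf{x};\mathbf{\theta})$.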
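
When the integral has no convenient closed form, an expectation with respect to $p(\mathbf{x};\mathbf{\theta})$ is commonly estimated by averaging $f$ over samples drawn from that distribution. The sketch below assumes a one-dimensional normal distribution with mean $\theta$ and unit variance, and the function $f(x;\phi)=(x-\phi)^2$. Both choices are illustrative, picked because the exact answer $1+(\theta-\phi)^2$ is known.

```python
import random


def monte_carlo_expectation(sample, f, n_samples=100_000):
    """Estimate E[f(x)] by averaging f over draws x ~ p produced by sample()."""
    return sum(f(sample()) for _ in range(n_samples)) / n_samples


THETA = 1.0  # assumed parameter: p(x; theta) = Normal(mean=theta, std=1)
PHI = 0.0    # assumed parameter: f(x; phi) = (x - phi) ** 2

rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = monte_carlo_expectation(
    sample=lambda: rng.gauss(THETA, 1.0),
    f=lambda x: (x - PHI) ** 2,
)

exact = 1.0 + (THETA - PHI) ** 2  # variance plus squared offset of the mean
print(estimate, exact)
```

The estimate fluctuates around the exact value, and the error shrinks roughly in proportion to $1/\sqrt{n}$ as the number of samples $n$ grows.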

## Marginal distribution and Expected Value

For discrete random variables $X$ and $Y$ with joint pmf $p(x,y)$, the marginal pmfs are obtained by summing over the other variable:

${\displaystyle p_{X}(x_{i})=\sum _{j}p(x_{i},y_{j})}$ and ${\displaystyle p_{Y}(y_{j})=\sum _{i}p(x_{i},y_{j})}$

A marginal probability can always be written as an expected value:

${\displaystyle p_{X}(x)=\int _{y}p_{X\mid Y}(x\mid y)\,p_{Y}(y)\,\mathrm {d} y=\operatorname {E} _{Y}[p_{X\mid Y}(x\mid Y)]\;.}$
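
As a discrete illustration, the sketch below reuses the joint pmf from the coin-toss example earlier in these notes and checks that summing the joint pmf over $y$ gives the same marginal as averaging the conditional pmf $p_{X\mid Y}(x\mid y)$ over $p_Y$. The function names are illustrative.

```python
from fractions import Fraction

eighth = Fraction(1, 8)
# Joint pmf p(x, y) from the coin-toss table (zero entries omitted).
joint = {
    (0, -1): eighth,
    (1, 1): eighth, (2, 1): 2 * eighth, (3, 1): eighth,
    (1, 2): eighth, (2, 2): eighth,
    (1, 3): eighth,
}


def marginal_x(x):
    """p_X(x): sum the joint pmf over all y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(y):
    """p_Y(y): sum the joint pmf over all x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def conditional_x_given_y(x, y):
    """p_{X|Y}(x | y) = p(x, y) / p_Y(y)."""
    return joint.get((x, y), Fraction(0)) / marginal_y(y)


def marginal_x_as_expectation(x):
    """E_Y[p_{X|Y}(x | Y)]: average the conditional pmf over p_Y."""
    y_values = {y for (_, y) in joint}
    return sum(conditional_x_given_y(x, y) * marginal_y(y) for y in y_values)


for x in range(4):
    print(x, marginal_x(x), marginal_x_as_expectation(x))
```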
