# Week 10: Expectation and Variance

**Course**: BSMA1002 - Statistics for Data Science I  
**Topic**: Expected Value, Variance, Standard Deviation  
**Week**: 10

## 🎯 Objectives
- Calculate Expected Value $E[X]$ (weighted average)
- Calculate Variance $Var(X)$ and Standard Deviation $\sigma$
- Understand properties of Expectation and Variance (Linearity)
- Apply concepts to risk analysis and decision making

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# This style name requires matplotlib >= 3.6 (earlier versions call it 'seaborn-whitegrid')
plt.style.use('seaborn-v0_8-whitegrid')
print("Setup complete!")

## 1. Expected Value (The Center)

The expected value of a discrete random variable $X$ is its theoretical long-run average: each possible value weighted by its probability.
$$E[X] = \sum x \cdot P(X=x)$$

It is **not** necessarily the most likely value, nor even a possible value (e.g., 3.5 for a die).
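
The "long-run average" interpretation can be checked by simulation. The sketch below is illustrative only: the seed and the number of rolls are arbitrary choices, not part of the course material. It shows the running mean of simulated fair-die rolls drifting toward 3.5 even though no single roll can equal 3.5.

```python
import numpy as np

# Simulate fair die rolls; the seed is an arbitrary choice for reproducibility
rng = np.random.default_rng(seed=0)
rolls = rng.integers(1, 7, size=100_000)  # upper bound 7 is exclusive

# Running sample mean after each roll
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

for n in (10, 1_000, 100_000):
    print(f"Mean of first {n:>7,} rolls: {running_mean[n - 1]:.4f}")
```

With few rolls the sample mean can sit well away from 3.5; it settles as the number of rolls grows.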

In [None]:
# Example: Fair Die
x = np.arange(1, 7)
p = np.array([1/6] * 6)

# Manual calculation
expected_value = np.sum(x * p)
print(f"E[X] (Manual): {expected_value}")

# Using scipy.stats
die_rv = stats.rv_discrete(values=(x, p))
print(f"E[X] (Scipy): {die_rv.mean()}")

## 2. Variance (The Spread)

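Variance is the expected squared deviation from the mean $\mu = E[X]$. Its square root, the standard deviation $\sigma$, measures spread in the same units as $X$.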
$$Var(X) = E[(X - \mu)^2] = \sum (x - \mu)^2 \cdot P(X=x)$$
Computational formula:
$$Var(X) = E[X^2] - (E[X])^2$$
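
The computational formula follows from expanding the square and applying linearity of expectation, with $\mu = E[X]$ treated as a constant:
$$\begin{aligned}
Var(X) &= E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] \\
&= E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2
\end{aligned}$$
For the fair die, $E[X^2] = \frac{1 + 4 + 9 + 16 + 25 + 36}{6} = \frac{91}{6}$ and $\mu = \frac{7}{2}$, so
$$Var(X) = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} \approx 2.9167$$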

In [None]:
# Manual calculation
variance_manual = np.sum((x - expected_value)**2 * p)
print(f"Var(X) (Manual): {variance_manual:.4f}")

# Computational formula
e_x2 = np.sum(x**2 * p)
variance_comp = e_x2 - expected_value**2
print(f"Var(X) (Computational): {variance_comp:.4f}")

# Scipy
print(f"Var(X) (Scipy): {die_rv.var():.4f}")
print(f"Std Dev (Scipy): {die_rv.std():.4f}")

## 3. Linearity Properties

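For constants $a$ and $b$:

- $E[aX + b] = aE[X] + b$
- $Var(aX + b) = a^2 Var(X)$

Only expectation is truly linear. For variance, the shift $b$ drops out (moving every value by the same amount does not change the spread) and the scale factor $a$ is squared.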

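**Example**: Temperature conversion.
Let $C$ be temperature in Celsius and $F = 1.8C + 32$ the same temperature in Fahrenheit. Then $E[F] = 1.8E[C] + 32$ and $Var(F) = 1.8^2 Var(C) = 3.24 Var(C)$.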
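
As a quick sketch of the temperature example, the code below applies both properties to a small Celsius distribution. The temperatures and probabilities are hypothetical values made up for illustration; the names `temps_c` and `probs` are not from the course material.

```python
import numpy as np

# Hypothetical distribution of daily temperature in Celsius (illustrative values only)
temps_c = np.array([15.0, 20.0, 25.0, 30.0])
probs = np.array([0.1, 0.4, 0.3, 0.2])

mean_c = np.sum(temps_c * probs)
var_c = np.sum((temps_c - mean_c) ** 2 * probs)

# Direct calculation on the converted values
temps_f = 1.8 * temps_c + 32
mean_f = np.sum(temps_f * probs)
var_f = np.sum((temps_f - mean_f) ** 2 * probs)

print(f"E[F] direct: {mean_f:.2f}, via property: {1.8 * mean_c + 32:.2f}")
print(f"Var(F) direct: {var_f:.2f}, via property: {1.8**2 * var_c:.2f}")
```

The two routes should agree up to floating-point rounding. Note that the offset of 32 affects the mean but not the variance.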

In [None]:
# Verify Linearity
# Let X be the die roll
a, b = 2, 5
Y = a * x + b  # Transformed values

# E[Y] direct
e_y_direct = np.sum(Y * p)
# E[Y] using property
e_y_prop = a * expected_value + b

print(f"E[2X+5] Direct: {e_y_direct}")
print(f"E[2X+5] Property: {e_y_prop}")

# Var[Y] direct
var_y_direct = np.sum((Y - e_y_direct)**2 * p)
# Var[Y] using property
var_y_prop = a**2 * variance_manual

print(f"Var(2X+5) Direct: {var_y_direct:.4f}")
print(f"Var(2X+5) Property: {var_y_prop:.4f}")

## 4. Application: Investment Portfolio

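- **Asset A**: returns -10% (prob 0.2), 5% (prob 0.5), 20% (prob 0.3).
- **Asset B**: returns 2% (prob 1.0), i.e. risk-free.

Which is better? Compare each asset's expected return (the reward) against its standard deviation (the risk).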

In [None]:
# Asset A
ret_a = np.array([-10, 5, 20])
prob_a = np.array([0.2, 0.5, 0.3])

mean_a = np.sum(ret_a * prob_a)
var_a = np.sum((ret_a - mean_a)**2 * prob_a)
std_a = np.sqrt(var_a)

print(f"Asset A: Mean Return = {mean_a}%, Risk (Std) = {std_a:.2f}%")
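risk_free = 2.0  # Asset B's guaranteed return, in percent
print(f"Asset B: Mean Return = {risk_free}%, Risk (Std) = 0.00%")

# Sharpe ratio: excess return over the risk-free rate, per unit of risk
sharpe_a = (mean_a - risk_free) / std_a
print(f"Sharpe Ratio A: {sharpe_a:.4f}")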
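
The choice need not be all-or-nothing. As a hedged sketch, suppose a fraction $w$ of the money goes into Asset A and the rest into Asset B. The portfolio return is then $wA + (1 - w) \cdot 2$, which has the form $aX + b$, so the properties from Section 3 apply. The weights below are hypothetical, chosen only for illustration.

```python
import numpy as np

ret_a = np.array([-10, 5, 20])       # Asset A returns in percent (from the example above)
prob_a = np.array([0.2, 0.5, 0.3])
risk_free = 2.0                      # Asset B's certain return in percent

mean_a = np.sum(ret_a * prob_a)
std_a = np.sqrt(np.sum((ret_a - mean_a) ** 2 * prob_a))

results = []
# Hypothetical weights: fraction of the portfolio placed in Asset A
for w in (0.0, 0.25, 0.5, 1.0):
    # Direct calculation from the possible portfolio returns
    port_ret = w * ret_a + (1 - w) * risk_free
    mean_direct = np.sum(port_ret * prob_a)
    std_direct = np.sqrt(np.sum((port_ret - mean_direct) ** 2 * prob_a))
    # Same quantities via E[aX + b] = aE[X] + b and SD(aX + b) = |a| SD(X)
    mean_prop = w * mean_a + (1 - w) * risk_free
    std_prop = abs(w) * std_a
    results.append((w, mean_direct, mean_prop, std_direct, std_prop))
    print(f"w = {w:.2f}: mean = {mean_direct:.3f}% (property {mean_prop:.3f}%), "
          f"std = {std_direct:.3f}% (property {std_prop:.3f}%)")
```

Under these assumptions both the expected return and the risk scale with $w$, so the weight acts as a dial between the safety of Asset B and the higher expected return of Asset A.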