# FTML Project Exercise 2

In this exercise, we study the behavior of the Bayes risk under the absolute loss.

## Question 0

For this exercise, we use the random function

$$
f(x) = S \cdot (x - 7)^3
$$

and set $Y = f(X)$, where $S$ is an integer-valued random variable, independent of $X$, with the following distribution:

$$
\begin{array}{c|ccccc}
s & 1 & 2 & 3 & 4 & 5 \\ \hline
P(S = s) & 0.05 & 0.05 & 0.20 & 0.30 & 0.40
\end{array}
$$

The expectation of $S$ is $\mathbb{E}[S] = 3.95$.
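As a quick check, the sketch below recomputes $\mathbb{E}[S] = \sum_s s \, P(S = s)$ from the table with NumPy; the array names are illustrative and simply mirror the simulation code of Question 1.

```python
import numpy as np

# Support and probabilities of S, copied from the table above
s_values = np.array([1, 2, 3, 4, 5])
s_probs = np.array([0.05, 0.05, 0.20, 0.30, 0.40])

# Sanity check: the probabilities form a valid distribution
assert np.isclose(s_probs.sum(), 1.0)

# E[S] = sum over s of s * P(S = s)
expectation = float(np.dot(s_values, s_probs))
print(f"E[S] = {expectation:.2f}")  # E[S] = 3.95
```

By hand: $0.05 + 0.10 + 0.60 + 1.20 + 2.00 = 3.95$.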

---
This function is differentiable in $x$, with derivative:
$$
f'(x) = 3S \cdot (x - 7)^2
$$

Its derivative therefore vanishes at $x_0 = 7$, although $x_0 = 7$ is not a local extremum. Let us plot this function and its derivative for $S = 1$:

```python
import numpy as np
import matplotlib.pyplot as plt

# f for a fixed value s of S (S = 1 by default)
def f(x, s=1):
    return s * (x - 7)**3

# Derivative of f with respect to x
def f_derivative(x, s=1):
    return 3 * s * (x - 7)**2

# Create a range of x values
x = np.linspace(0, 14, 200)
y = f(x)
z = f_derivative(x)

# Plot the function and its derivative for S = 1
plt.plot(x, y, label='f(x) = (x - 7)^3')
plt.plot(x, z, label="f'(x) = 3(x - 7)^2")
plt.title("Plot of f(x) and f'(x), S = 1")
plt.xlabel('x')
plt.ylabel('y')
plt.axhline(0, color='black', linewidth=0.5)  # x-axis
plt.axvline(0, color='black', linewidth=0.5)  # y-axis
plt.axvline(7, color='gray', linestyle='--', linewidth=0.5)  # x0 = 7
plt.grid(True)
plt.legend()
plt.show()
```
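The claim that $x_0 = 7$ is not a local extremum can also be checked by hand. A short worked argument, assuming $S \ge 1$ as in the distribution table:

$$
f'(x) = 3S \cdot (x - 7)^2 \ge 0 \quad \text{for all } x,
\qquad
f(7 - t) = -S\,t^3 < 0 = f(7) < S\,t^3 = f(7 + t) \quad \text{for } t > 0
$$

So $f$ is increasing and every neighborhood of $7$ contains values both below and above $f(7)$: $x_0 = 7$ is an inflection point with a horizontal tangent.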

## Question 1

Let us define an estimator $h$ as follows:

$$
\LARGE
h(x) = 3 \cdot (x - 7)^3
$$

Recall that the Bayes predictor for the squared loss is the conditional expectation of $Y$ given $X = x$. Since $S$ is independent of $X$:

$$
\LARGE
f^*_{\ell_{\text{squared}}}(x) = 
\mathbb{E}[Y \mid X = x] = \bigl( x - 7 \bigr)^3 \cdot \mathbb{E}[S]
$$
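The reason the conditional expectation is the Bayes predictor for the squared loss can be seen from a short worked decomposition, at a fixed $x$ and for any constant prediction $z$:

$$
\mathbb{E}\big[(Y - z)^2 \mid X = x\big]
= \operatorname{Var}(Y \mid X = x) + \big(\mathbb{E}[Y \mid X = x] - z\big)^2
$$

The first term does not depend on $z$, and the second is zero exactly when $z = \mathbb{E}[Y \mid X = x]$.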

With $\mathbb{E}[S] = 3.95$, this gives:
$$
f^*_{\ell_{\text{squared}}}(x) = 3.95 \cdot \bigl( x - 7 \bigr)^3
$$

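Let us estimate their generalization error for both the absolute and the squared loss. The simulation fixes a value of $x$ and samples $S$, so it estimates the risks conditional on $X = x$: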
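At a fixed $x \neq 7$, the risk of a predictor of the form $a \cdot (x - 7)^3$ depends on $x$ only through a positive scaling factor, so it can also be computed exactly from the distribution table. This gives reference values for the simulation (worked computations, not simulation output):

$$
\mathbb{E}\big[\,|Y - a (x - 7)^3| \mid X = x\big] = |x - 7|^3 \, \mathbb{E}\big[|S - a|\big],
\qquad
\mathbb{E}\big[(Y - a (x - 7)^3)^2 \mid X = x\big] = (x - 7)^6 \, \mathbb{E}\big[(S - a)^2\big]
$$

$$
\begin{array}{c|cc}
a & \mathbb{E}[|S - a|] & \mathbb{E}[(S - a)^2] \\ \hline
3.95 \ (\text{Bayes predictor, squared loss}) & 0.87 & 1.2475 \\
3 \ (h) & 1.25 & 2.15
\end{array}
$$

For example, $\mathbb{E}[|S - 3|] = 0.05 \cdot 2 + 0.05 \cdot 1 + 0.20 \cdot 0 + 0.30 \cdot 1 + 0.40 \cdot 2 = 1.25$.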

```python
import numpy as np
import matplotlib.pyplot as plt

SAMPLE_SIZE = 1_000_000

# Bayes predictor for the squared loss: E[S] * (x - 7)^3
bayes_pred = lambda x: 3.95 * (x - 7)**3

# Suboptimal estimator
h = lambda x: 3 * (x - 7)**3

# Distribution of S
s_values = np.array([1, 2, 3, 4, 5])
s_probs = np.array([0.05, 0.05, 0.20, 0.30, 0.40])
s = np.random.choice(s_values, p=s_probs, size=SAMPLE_SIZE)

# Fix x, excluding x = 7 where Y = 0 and both risks are exactly 0
x = 7
while x == 7:
    x = np.random.randint(low=-20, high=20)

# Samples of Y given X = x
Y = s * (x - 7)**3

pred_bayes = bayes_pred(x)
pred_h = h(x)

# Absolute loss
R_absolute_bayes = np.mean(np.abs(Y - pred_bayes))
R_absolute_h = np.mean(np.abs(Y - pred_h))

# Squared loss
R_squared_bayes = np.mean((Y - pred_bayes)**2)
R_squared_h = np.mean((Y - pred_h)**2)

print(f"Settings: x = {x}")

assert R_absolute_bayes < R_absolute_h

plt.title("Risk comparison (absolute)")
plt.bar(x="Bayes predictor", height=R_absolute_bayes)
plt.bar(x="Bad estimator (h)", height=R_absolute_h)
plt.show()

plt.title("Risk comparison (squared)")
plt.bar(x="Bayes predictor", height=R_squared_bayes)
plt.bar(x="Bad estimator (h)", height=R_squared_h)
plt.show()
```

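We observe that the empirical risk of the squared-loss Bayes predictor is lower than that of $h$, for both losses. This comparison alone does not show that the two Bayes predictors differ. The difference appears when the squared-loss Bayes predictor is compared with $4 \cdot (x - 7)^3$, where 4 is the median of $S$: conditionally on $X = x$ (with $x \neq 7$), their absolute risks are $0.87\,|x - 7|^3$ and $0.85\,|x - 7|^3$ respectively. The squared-loss Bayes predictor is therefore not optimal for the absolute loss, which implies that
$$
\LARGE
f^*_{\ell_{\text{absolute}}} \neq f^*_{\ell_{\text{squared}}}
$$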
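The comparison between the mean and the median of $S$ can be reproduced exactly from the table, without simulation. The sketch below (the helpers `abs_risk` and `sq_risk` are illustrative names) evaluates $\mathbb{E}[|S - a|]$ and $\mathbb{E}[(S - a)^2]$ on a grid of constants $a$. At a fixed $x \neq 7$, the conditional risks of $a \cdot (x - 7)^3$ are these quantities times the positive factors $|x - 7|^3$ and $(x - 7)^6$, so their minimizers give the coefficient of each Bayes predictor.

```python
import numpy as np

s_values = np.array([1, 2, 3, 4, 5])
s_probs = np.array([0.05, 0.05, 0.20, 0.30, 0.40])

def abs_risk(a):
    """E|S - a|, computed exactly from the distribution table."""
    return float(np.dot(np.abs(s_values - a), s_probs))

def sq_risk(a):
    """E[(S - a)^2], computed exactly from the distribution table."""
    return float(np.dot((s_values - a) ** 2, s_probs))

# Grid of candidate coefficients a with step 0.01 (contains 3.95 and 4.00)
grid = np.round(np.arange(1.0, 5.0 + 1e-9, 0.01), 2)
best_abs = grid[np.argmin([abs_risk(a) for a in grid])]
best_sq = grid[np.argmin([sq_risk(a) for a in grid])]

print(f"argmin E|S - a|     = {best_abs:.2f}")  # 4.00, the median of S
print(f"argmin E[(S - a)^2] = {best_sq:.2f}")   # 3.95, the mean of S
print(f"E|S - 3.95| = {abs_risk(3.95):.4f}, E|S - 4| = {abs_risk(4.0):.4f}")
```

The two minimizers differ (4 versus 3.95), which is the numerical counterpart of $f^*_{\ell_{\text{absolute}}} \neq f^*_{\ell_{\text{squared}}}$.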

## Question 2

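We look for the value $z$ that minimizes the conditional expectation
$$
\large \mathbb{E}\big[\,|Y - z| \mid X = x\big]
$$

Assume that, given $X = x$, $Y$ has a density, denoted $f$ (unrelated to the function $f$ of Question 0):
$
f(y) = p_{Y \mid X = x}(y)
$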

---
$$
\large
g(z) = \int_{-\infty}^{\infty} |y - z| \, f(y) \, dy
$$
$$
\large
= \int_{-\infty}^{z} |y - z| \, f(y) \, dy +
\int_{z}^{\infty} |y - z| \, f(y) \, dy
$$
$$
\large
= \int_{-\infty}^{z} (z - y) \, f(y) \, dy +
\int_{z}^{\infty} (y - z) \, f(y) \, dy
$$
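To see the function $g$ at work, here is a small Monte Carlo sketch under an assumption that is not part of the exercise: $Y$ given $X = x$ follows an exponential distribution with rate 1, whose median is $\ln 2 \approx 0.693$ and whose mean is 1. The empirical version of $g$ is minimized near the median, not near the mean.

```python
import numpy as np

# Assumption for illustration only: Y | X = x ~ Exp(1), with median ln 2 and mean 1
rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=100_000)

def g(z):
    """Monte Carlo estimate of g(z) = E[|Y - z| | X = x]."""
    return np.mean(np.abs(y - z))

# Evaluate g on a grid of candidate predictions z (step 0.01)
z_grid = np.linspace(0.0, 2.0, 201)
g_values = np.array([g(z) for z in z_grid])
z_best = z_grid[np.argmin(g_values)]

print(f"argmin of g: {z_best:.2f}  (median ln 2 = {np.log(2):.3f}, mean = 1)")
```

For this distribution one can also check by hand that $g(z) = z - 1 + 2e^{-z}$ for $z \ge 0$, whose derivative $1 - 2e^{-z}$ vanishes at $z = \ln 2$.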

---
Now let $z_0 > z$ be the candidate minimizer (it will be chosen below), and split the second integral at $z_0$:
$$
g(z) = \int_{-\infty}^{z} (z - y) \, f(y) \, dy +
\int_{z}^{z_0} (y - z) \, f(y) \, dy +
\int_{z_0}^{\infty} (y - z) \, f(y) \, dy
$$

---
We extend the first integral up to $z_0$, subtracting the part between $z$ and $z_0$ that this adds:
$$
g(z) = \int_{-\infty}^{z_0} (z - y) \, f(y) \, dy -
\int_{z}^{z_0} (z - y) \, f(y) \, dy +
\int_{z}^{z_0} (y - z) \, f(y) \, dy +
\int_{z_0}^{\infty} (y - z) \, f(y) \, dy
$$

---
Since $-(z - y) = y - z$, the two middle integrals are equal and can be merged:
$$
g(z) = \int_{-\infty}^{z_0} (z - y) \, f(y) \, dy +
2 \int_{z}^{z_0} (y - z) \, f(y) \, dy +
\int_{z_0}^{\infty} (y - z) \, f(y) \, dy
$$

---
We insert $z_0$ into the factors in front of $f(y)$, writing $z - y = (z - z_0) + (z_0 - y)$ and $y - z = (y - z_0) + (z_0 - z)$, then split the integrals:
$$
g(z) = \int_{-\infty}^{z_0} \big((z - z_0) + (z_0 - y)\big) \, f(y) \, dy +
2 \int_{z}^{z_0} (y - z) \, f(y) \, dy +
\int_{z_0}^{\infty} \big((y - z_0) + (z_0 - z)\big) \, f(y) \, dy
$$
$$
= \int_{-\infty}^{z_0} (z - z_0) \, f(y) \, dy +
\int_{-\infty}^{z_0} (z_0 - y) \, f(y) \, dy +
2 \int_{z}^{z_0} (y - z) \, f(y) \, dy +
\int_{z_0}^{\infty} (y - z_0) \, f(y) \, dy +
\int_{z_0}^{\infty} (z_0 - z) \, f(y) \, dy
$$

---
We recognize the expression of $\mathbb{E}[|Y - z_0| \mid X = x]$:
$$
\mathbb{E}\big[|Y - z_0| \mid X = x\big] =
\int_{-\infty}^{z_0} (z_0 - y) \, f(y) \, dy +
\int_{z_0}^{\infty} (y - z_0) \, f(y) \, dy
$$



Hence:
$$
g(z) = \mathbb{E}\big[|Y - z_0| \mid X = x\big] +
\int_{-\infty}^{z_0} (z - z_0) \, f(y) \, dy +
2 \int_{z}^{z_0} (y - z) \, f(y) \, dy +
\int_{z_0}^{\infty} (z_0 - z) \, f(y) \, dy
$$
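This decomposition holds for any pair $z < z_0$, before $z_0$ is chosen. A quick numerical sanity check, using an arbitrary empirical sample (standard normal draws, chosen only for illustration) in place of the density $f$, and writing the second and fourth terms together as $(z - z_0)\big(P(Y \le z_0) - P(Y > z_0)\big)$:

```python
import numpy as np

# Arbitrary sample standing in for Y | X = x (illustration only)
rng = np.random.default_rng(1)
y = rng.normal(loc=0.0, scale=1.0, size=50_000)

z, z0 = -0.3, 0.8  # any pair with z < z0

lhs = np.mean(np.abs(y - z))  # g(z)
rhs = (
    np.mean(np.abs(y - z0))                                # E[|Y - z0|]
    + (z - z0) * (np.mean(y <= z0) - np.mean(y > z0))      # the two constant-factor terms
    + 2 * np.mean((y - z) * ((y > z) & (y <= z0)))         # 2 * int_z^{z0} (y - z) f(y) dy
)
print(lhs, rhs)
```

Both sides agree up to floating-point rounding, because the identity already holds pointwise for every value of $y$.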

---
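The second and fourth terms have integrands that are constant in $y$: together they equal $(z - z_0)\big(P(Y \le z_0 \mid X = x) - P(Y > z_0 \mid X = x)\big)$. This quantity vanishes when $z_0$ is the median of $Y$ given $X = x$: half of the probability mass then lies below $z_0$ and half above, so $\int_{-\infty}^{z_0} f(y)\,dy = \int_{z_0}^{\infty} f(y)\,dy = \tfrac{1}{2}$. We obtain:

$$
g(z) = \mathbb{E}\big[|Y - z_0| \mid X = x\big] +
(z - z_0) \cdot \tfrac{1}{2} +
2 \int_{z}^{z_0} (y - z) \, f(y) \, dy +
(z_0 - z) \cdot \tfrac{1}{2}
$$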

$$
= \mathbb{E}\big[|Y - z_0| \mid X = x\big] +
2 \int_{z}^{z_0} (y - z) \, f(y) \, dy
$$
where $z_0$ is the median of $Y$ given $X = x$.

---
We now minimize this expression over $z$.

The first term does not depend on $z$ (since $z_0$ is fixed), so we only need to minimize the second term over $z$: $2 \int_{z}^{z_0} (y - z) \, f(y) \, dy$.

On $[z, z_0]$ we have $y - z \ge 0$, and $f(y) \ge 0$ since $f$ is a density, so this integral is nonnegative. Its smallest possible value, 0, is reached for $z = z_0$. The case $z > z_0$ is symmetric: the same steps give $g(z) = \mathbb{E}[|Y - z_0| \mid X = x] + 2 \int_{z_0}^{z} (z - y) \, f(y) \, dy$, which is again minimized at $z = z_0$.

Therefore, $\mathbb{E}\big[\,|Y - z| \mid X = x\big]$ is minimized when $z$ is the median of $Y$ given $X = x$:

$$f^*_{\ell_{\text{absolute}}}(x) = \operatorname{Median}(Y \mid X = x)$$
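Applied to the setting of Question 0: given $X = x$, $Y = S \cdot (x - 7)^3$, so a median of $Y$ is a median of $S$ scaled by $(x - 7)^3$. For $x < 7$ the scaling is negative, but $4$ still works because $P(S \le 4) = 0.6 \ge 1/2$ and $P(S \ge 4) = 0.7 \ge 1/2$. The sketch below reads a median off the cumulative distribution of $S$; `discrete_median`, `bayes_abs` and `bayes_sq` are illustrative helper names, not part of the original notebook.

```python
import numpy as np

s_values = np.array([1, 2, 3, 4, 5])
s_probs = np.array([0.05, 0.05, 0.20, 0.30, 0.40])

def discrete_median(values, probs):
    """Smallest value m with P(S <= m) >= 1/2, a median of a discrete law."""
    cdf = np.cumsum(probs)
    return values[np.searchsorted(cdf, 0.5)]

m = discrete_median(s_values, s_probs)

def bayes_abs(x):
    """Bayes predictor for the absolute loss here: median(S) * (x - 7)^3."""
    return m * (x - 7) ** 3

def bayes_sq(x):
    """Bayes predictor for the squared loss here: E[S] * (x - 7)^3."""
    return float(np.dot(s_values, s_probs)) * (x - 7) ** 3

print(m)                                       # 4
print(f"{bayes_abs(9)} vs {bayes_sq(9):.2f}")  # 32 vs 31.60
```

Here the derivation above assumed a density, while $S$ is discrete; the sketch relies on the standard fact that any median still minimizes the expected absolute deviation, which matches the exact grid comparison of $\mathbb{E}[|S - a|]$.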