# Question 1

### Data

We model the grade that a student receives for an oral presentation.

X ∈ {1, ..., 30} is the variable representing the chosen student. Y ∈ {1, ..., 5} is the variable representing the grade of that student.

We assume that:
- X is uniformly distributed.
- l is the 0-1 loss (1 if the prediction is wrong, 0 otherwise).
- Given X = x, Y has the following distribution, which does not depend on x (it reflects the severity of the professor and the previous scores of the students in this class):
    - P(Y=1∣X=x)=0.15
    - P(Y=2∣X=x)=0.25
    - P(Y=3∣X=x)=0.35
    - P(Y=4∣X=x)=0.15
    - P(Y=5∣X=x)=0.1



### Bayes predictor

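$$f^*(x) = \arg\min_{z \in \{1, \dots, 5\}} E[l(Y, z) \mid X = x] \\
= \arg\min_{z \in \{1, \dots, 5\}} P(Y \neq z \mid X = x) \quad \text{because } l \text{ is the 0-1 loss} \\
= \arg\min_{z \in \{1, \dots, 5\}} \left(1 - P(Y = z \mid X = x)\right) \\
= \arg\max_{z \in \{1, \dots, 5\}} P(Y = z \mid X = x)$$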
 
We have:
 - P(Y=1∣X=x)=0.15
 - P(Y=2∣X=x)=0.25
 - P(Y=3∣X=x)=0.35
 - P(Y=4∣X=x)=0.15
 - P(Y=5∣X=x)=0.1
 
So $$f^*(x) = 3$$ for all x, since 0.35 is the largest of these probabilities.
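As a quick check, the argmax above can be computed directly. This is a minimal sketch; the names `grades`, `probs` and `bayes_prediction` are illustrative and not part of the original answer.

```python
# Conditional distribution P(Y = y | X = x) from the Data section;
# it is the same for every student x.
grades = [1, 2, 3, 4, 5]
probs = [0.15, 0.25, 0.35, 0.15, 0.10]

# Under the 0-1 loss, the Bayes predictor picks the most probable grade.
bayes_prediction = max(zip(grades, probs), key=lambda gp: gp[1])[0]
print(bayes_prediction)  # prints 3
```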

### Bayes Risk

$$R^*(f^*) = E[l(Y, f^*(X))] \\
= 0 \cdot P(Y = f^*(X)) + 1 \cdot P(Y \neq f^*(X)) \\
= P(Y \neq f^*(X)) \\
= P(\{Y \neq f^*(X)\} \cap \{X = 1\}) + \dots + P(\{Y \neq f^*(X)\} \cap \{X = 30\}) \\
= P(Y \neq f^*(X) \mid X = 1)\,P(X = 1) + \dots + P(Y \neq f^*(X) \mid X = 30)\,P(X = 30)$$

For every x ∈ {1, ..., 30}, we have:
$$P(Y \neq f^*(X) \mid X = x) = 1 - P(Y = 3 \mid X = x) = 1 - 0.35$$
and $$P(X=1) = \dots = P(X=30) = 1/30$$ because X follows a uniform distribution.

So $$R^*(f^*) = 30 \cdot \frac{1}{30} \cdot (1 - 0.35) = 0.65$$
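The same sum can be evaluated numerically with the law of total probability. This is only a sketch: the variable names (`p_x`, `cond_error`, `bayes_risk`) are assumptions introduced for illustration.

```python
import numpy as np

# P(Y = y | X = x) for y = 1..5, identical for every student x
probs = np.array([0.15, 0.25, 0.35, 0.15, 0.10])
# P(X = x): uniform over the 30 students
p_x = np.full(30, 1 / 30)

# P(Y != f*(X) | X = x) = 1 - P(Y = 3 | X = x), the same for every x
cond_error = np.full(30, 1 - probs.max())

# Law of total probability: sum over x of P(error | X = x) * P(X = x)
bayes_risk = float(np.sum(cond_error * p_x))
print(round(bayes_risk, 4))  # prints 0.65
```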

# Question 2

In [15]:
import numpy as np

In [18]:
def generate_data(n_samples):
    X = np.random.randint(1, 31, n_samples)
    # np.random.choice(5, ...) draws values in 0..4, so we add 1 to get grades in 1..5
    Y = np.random.choice(5, n_samples, p=[0.15, 0.25, 0.35, 0.15, 0.1]) + 1
    return X, Y

# Generate a test set
n_test = 100000
X_test, Y_test = generate_data(n_test)

# Compute the empirical risk (fraction of wrong predictions) for the Bayes estimator
Y_pred_bayes = np.full(n_test, 3)
empirical_risk_bayes = np.mean(Y_test != Y_pred_bayes)

# Compute the empirical risk for a suboptimal constant estimator, f(x) = 2
Y_pred_proposed = np.full(n_test, 2)
empirical_risk_proposed = np.mean(Y_test != Y_pred_proposed)

print(f"Test error for Bayes estimator: {empirical_risk_bayes:.4f}")
print(f"Test error for proposed estimator: {empirical_risk_proposed:.4f}")

Test error for Bayes estimator: 0.6514
Test error for proposed estimator: 0.7499


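The estimated generalization error (test error) is smaller for $f^*(x)$ than for the estimator f(x) = 2 (0.6514 < 0.7499). The latter value is consistent with the true risk of f(x) = 2, which is 1 − P(Y = 2) = 0.75.

The test error for $f^*$ is 0.6514, which is very close to the Bayes risk (0.65), as expected with a large number of samples.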
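To illustrate the effect of the sample size, the sketch below recomputes the test error of $f^*$ for a few values of n. It is not part of the original experiment: the seed, the sample sizes, and the use of `np.random.default_rng` instead of the legacy `np.random` functions are assumptions made here for reproducibility. X is not sampled because the constant predictor ignores it.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility
probs = [0.15, 0.25, 0.35, 0.15, 0.10]  # P(Y = y) for y = 1..5

sample_sizes = [100, 10_000, 1_000_000]  # illustrative sizes
empirical_risks = []
for n in sample_sizes:
    # Draw n grades in 1..5 according to probs
    y = rng.choice(np.arange(1, 6), size=n, p=probs)
    # 0-1 loss of the constant Bayes prediction 3, averaged over the sample
    empirical_risks.append(float(np.mean(y != 3)))

for n, risk in zip(sample_sizes, empirical_risks):
    print(f"n = {n:>9,}: empirical risk = {risk:.4f}")
```

The estimates should fluctuate noticeably for small n and settle near 0.65 as n grows, since the standard deviation of the empirical risk scales as 1/√n.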