# Subject 67 - Invariant Risk Minimization

In [5]:
import numpy as np
from sklearn.linear_model import LinearRegression

## What can go wrong with traditional machine learning training

### Introduction 

### Let's take a look at a more concrete example

We consider the following three random variables, defined for each environment $e\in\xi_{all}$:

$$X_1^e \sim \mathcal{N}(0,\,\sigma^{2}_e)$$
$$Y^e \sim X_1^e + \mathcal{N}(0,\,\sigma^{2}_e)$$
$$X_2^e \sim Y^e + \mathcal{N}(0,\,1)$$

where $\xi_{all}$ denotes the set of all possible environments and the noise terms are drawn independently.

What do you think will happen if we try to predict $Y$ from $X_1$ and $X_2$?

Before going further, let's compute a few statistics of these random variables that will be useful later (writing $\sigma^2(\cdot)$ for the variance and $\sigma(\cdot,\cdot)$ for the covariance):

$\sigma^2(X_1^e)=\sigma^{2}_e$,

$\sigma^2(Y^e)=2\sigma^{2}_e$,

$\sigma^2(X_2^e)=2\sigma^{2}_e+1$,

$\sigma(X_1^e,Y^e)=\sigma^{2}_e$,

$\sigma(X_2^e,Y^e)=2\sigma^{2}_e$,

$\sigma(X_1^e,X_2^e)=\sigma^{2}_e$
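
These moments can be checked numerically. The snippet below is a minimal sketch, not part of the original notebook: the helper `sample_environment`, the seed, the sample size and the value `sigma_e = 2.0` are all illustrative assumptions. It draws samples from the model above and compares the empirical covariance matrix of $(X_1, Y, X_2)$ with the values just listed.

```python
import numpy as np

def sample_environment(sigma_e, n, rng):
    """Draw n samples of (X1, Y, X2).

    sigma_e is a standard deviation, so the noise variance is sigma_e**2.
    (Hypothetical helper, introduced only for this check.)
    """
    x1 = rng.normal(0.0, sigma_e, n)
    y = x1 + rng.normal(0.0, sigma_e, n)
    x2 = y + rng.normal(0.0, 1.0, n)
    return x1, y, x2

rng = np.random.default_rng(0)  # arbitrary fixed seed
sigma_e = 2.0                   # illustrative value, not taken from the text
x1, y, x2 = sample_environment(sigma_e, 200_000, rng)

# Rows and columns are ordered (X1, Y, X2).
empirical_cov = np.cov(np.vstack([x1, y, x2]))

s2 = sigma_e ** 2
expected_cov = np.array([
    [s2,     s2,         s2],
    [s2, 2 * s2,     2 * s2],
    [s2, 2 * s2, 2 * s2 + 1],
])

print(np.round(empirical_cov, 2))
print(expected_cov)
```

The two printed matrices should agree up to sampling noise, which shrinks as the number of samples grows.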


To predict $Y$, we consider three different linear regressions, $Y^e=aX_1^e+c$, $Y^e=bX_2^e+c$ and $Y^e=aX_1^e+bX_2^e+c$, and fit each of them on two distinct environments.

In [6]:
# Define our random variables on two different environments

n = 10000 # number of samples for each environment

# np.random.normal takes the standard deviation (not the variance) as its
# scale argument, so the noise variance in environment 1 is sigma1**2 = 100.
sigma1 = 10 # standard deviation of the environment 1 noise
X1e1 = np.random.normal(0,sigma1,n).reshape(-1,1)
Ye1 = X1e1 + np.random.normal(0,sigma1,n).reshape(-1,1)
X2e1 = Ye1 + np.random.normal(0,1,n).reshape(-1,1)

sigma2 = 0.1 # standard deviation of the environment 2 noise (variance 0.01)
X1e2 = np.random.normal(0,sigma2,n).reshape(-1,1)
Ye2 = X1e2 + np.random.normal(0,sigma2,n).reshape(-1,1)
X2e2 = Ye2 + np.random.normal(0,1,n).reshape(-1,1)

In [7]:
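def regressor(Y, Xs):
  """Fit Y on the column-stacked features in Xs and return the slope coefficients."""
  lr = LinearRegression()
  lr.fit(np.concatenate(Xs, axis=1),Y)
  # Y has shape (n, 1), so coef_ has shape (1, n_features); keep its only row.
  return lr.coef_[0]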

In [8]:
reg1_env1 = regressor(Ye1,[X1e1])
reg2_env1 = regressor(Ye1,[X2e1])
reg3_env1 = regressor(Ye1,[X1e1,X2e1])

reg1_env2 = regressor(Ye2,[X1e2])
reg2_env2 = regressor(Ye2,[X2e2])
reg3_env2 = regressor(Ye2,[X1e2,X2e2])

In [9]:
print("By running such regressions (despite different environments), one might naively expect to \nhave quite the same results for each pair of regressions on both environments but this is not \nthe case. \nFor the first regression we obtain %f as coefficient on environment 1 and %f \non environment 2.\nFor the second regression we obtain %f as coefficient on environment 1 and %f \non environment 2.\nFor the third regression we obtain the couple (%f, %f) as coefficients on \nenvironment 1 and (%f, %f) on environment 2." % (reg1_env1[0], reg1_env2[0], reg2_env1[0], reg2_env2[0], reg3_env1[0], reg3_env1[1], reg3_env2[0], reg3_env2[1]))

By running such regressions (despite different environments), one might naively expect to 
have quite the same results for each pair of regressions on both environments but this is not 
the case. 
For the first regression we obtain 0.996474 as coefficient on environment 1 and 1.004092 
on environment 2.
For the second regression we obtain 0.995075 as coefficient on environment 1 and 0.018855 
on environment 2.
For the third regression we obtain the couple (0.009765, 0.990180) as coefficients on 
environment 1 and (0.994990, 0.009961) on environment 2.


In our case study, only the regression $Y^e=aX_1^e+c$ gives the same result in both environments. Let's look in more detail at why we obtained these results.

Our linear regressor, as defined above, minimizes the least-squares criterion (here $b$ denotes the intercept):
$$\min_{a_1,\dots,a_m,\,b} \frac1n\sum\limits_{k=1}^n(y_k-a_1x_{1,k}-\dots-a_mx_{m,k}-b)^2$$

Setting the gradient with respect to $(a_i)_i$ and $b$ to zero gives:

$$\begin{cases}
a_1\sigma^2(x_1)  &=\sigma(x_1,y) \\
b &=\bar{y}-a_1\bar{x}_1
\end{cases}\Leftrightarrow\begin{cases}
a_1  &=\frac{\sigma(x_1,y)}{\sigma^2(x_1)} \\
b &=\bar{y}-a_1\bar{x}_1
\end{cases}$$
for the 1D case, and:

$$\begin{cases}
a_1\sigma^2(x_1) + a_2\sigma(x_1,x_2)  &=\sigma(x_1,y) \\
a_1\sigma(x_1,x_2) + a_2\sigma^2(x_2)  &=\sigma(x_2,y) \\
b &=\bar{y}-a_1\bar{x}_1-a_2\bar{x}_2
\end{cases}\Leftrightarrow\begin{cases}
a_1  &=\frac{\sigma^2(x_2)\sigma(x_1,y)-\sigma(x_1,x_2)\sigma(x_2,y)}{\sigma^2(x_1)\sigma^2(x_2)-\sigma(x_1,x_2)^2} \\
a_2  &=\frac{\sigma^2(x_1)\sigma(x_2,y)-\sigma(x_1,x_2)\sigma(x_1,y)}{\sigma^2(x_1)\sigma^2(x_2)-\sigma(x_1,x_2)^2} \\
b &=\bar{y}-a_1\bar{x}_1-a_2\bar{x}_2
\end{cases}$$
for the 2D case.
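
The systems above say that the slopes solve a small linear system built from variances and covariances, and that the intercept then follows from the means. As a rough sketch of this (the function `ols_from_moments` and the synthetic data are assumptions made for illustration, not part of the original notebook), the same solution can be computed directly from the empirical moments and compared against a generic least-squares solver:

```python
import numpy as np

def ols_from_moments(X, y):
    """Solve Cov(X) a = Cov(X, y) for the slopes a, then b = mean(y) - a . mean(X).

    Hypothetical helper mirroring the gradient-equals-zero equations.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # centered features
    yc = y - y.mean()                # centered target
    cov_xx = Xc.T @ Xc / len(y)      # variances and covariances of the features
    cov_xy = Xc.T @ yc / len(y)      # covariances between features and target
    slopes = np.linalg.solve(cov_xx, cov_xy)
    intercept = y.mean() - slopes @ X.mean(axis=0)
    return slopes, intercept

# Synthetic data with arbitrary illustrative coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 + rng.normal(scale=0.1, size=1000)

slopes, intercept = ols_from_moments(X, y)

# Reference: least squares on the design matrix with an explicit intercept column.
design = np.column_stack([X, np.ones(len(y))])
reference, *_ = np.linalg.lstsq(design, y, rcond=None)

print(slopes, intercept)
print(reference)
```

Both approaches minimize the same criterion, so the slopes and intercept should coincide up to floating-point error.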





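Plugging the statistics computed earlier into these formulas gives the following analytical solutions:

The coefficients of the three regressions are, respectively, $1$, $\frac{\sigma^2_e}{\sigma^2_e+0.5}$ and $\left(\frac{1}{\sigma^2_e+1},\frac{\sigma^2_e}{\sigma^2_e+1}\right)$. Only the first one does not depend on $\sigma^2_e$, which is why only the regression on $X_1$ gives the same coefficient in both environments.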
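
To relate these closed forms to the printed results, the short sketch below evaluates them for the two noise levels used in the experiment. The function name `analytic_coefficients` is a hypothetical helper; it assumes, as in the sampling code, that the value passed in is a standard deviation, so that $\sigma^2_e$ is its square.

```python
def analytic_coefficients(sigma_e):
    """Population regression coefficients for the three regressions.

    sigma_e is a standard deviation; the formulas use the variance sigma_e**2.
    """
    s2 = sigma_e ** 2
    y_on_x1 = 1.0                                   # Y ~ X1
    y_on_x2 = s2 / (s2 + 0.5)                       # Y ~ X2
    y_on_both = (1.0 / (s2 + 1.0), s2 / (s2 + 1.0)) # Y ~ X1 + X2
    return y_on_x1, y_on_x2, y_on_both

for sigma_e in (10, 0.1):
    print(sigma_e, analytic_coefficients(sigma_e))
```

For a standard deviation of 10 this gives roughly 0.995 and (0.0099, 0.990), and for 0.1 roughly 0.0196 and (0.990, 0.0099), in line with the empirical coefficients printed above up to sampling noise.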
