In [None]:
# default_exp monte_carlo_shapley

In [None]:
#export
import numpy as np
import pandas as pd
from tqdm import tqdm

In [None]:
#hide
%load_ext autoreload
%autoreload 2

# Monte Carlo Shapley

> Estimate the Shapley Values using an optimized Monte Carlo version.

## Theory

### Shapley Value definition

In collaborative game theory, Shapley Values ([Shapley,1953]) distribute a reward fairly among players according to their contribution to the win in a cooperative game. We denote by $\mathcal{M}$ a set of $d$ players and by $v : P(\mathcal{M}) \rightarrow R_v$ a reward function such that $v(\emptyset) = 0$. The range $R_v$ can be $\Re$ or a subset of $\Re$, and $P(\mathcal{M})$ is a family of sets over $\mathcal{M}$. For $S \subset \mathcal{M}$, $v(S)$ is the amount of wealth produced by the players of coalition $S$ when they cooperate.

The Shapley Value of a player $j$ is a fair share of the global wealth $v(\mathcal{M})$ produced by all players together:

$$\phi_j(\mathcal{M},v) = \sum_{S \subset \mathcal{M}\backslash \{j\}}\frac{(d -|S| - 1)!|S|!}{d!}\left(v(S\cup \{j\}) - v(S)\right),$$

with $|S| = \text{cardinal}(S)$, i.e. the number of players in coalition $S$.
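
The formula above can be evaluated directly when $d$ is small. The following is a minimal sketch, not part of the exported module: the names `exact_shapley` and `toy_reward` and the three-player game are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, v):
    """Exact Shapley values by enumerating every coalition S that excludes j."""
    d = len(players)
    phi = {}
    for j in players:
        others = [p for p in players if p != j]
        total = 0.0
        for size in range(d):
            # The weight (d - |S| - 1)! |S|! / d! depends only on |S|.
            weight = factorial(d - size - 1) * factorial(size) / factorial(d)
            for coalition in combinations(others, size):
                S = frozenset(coalition)
                total += weight * (v(S | {j}) - v(S))
        phi[j] = total
    return phi


# Hypothetical toy game: players 1 and 2 produce 10 only together,
# player 3 adds 2 on its own. Note that v(empty set) = 0.
def toy_reward(S):
    return (10.0 if {1, 2} <= S else 0.0) + (2.0 if 3 in S else 0.0)


players = [1, 2, 3]
phi = exact_shapley(players, toy_reward)
print(phi)
print(sum(phi.values()), toy_reward(frozenset(players)))
```

The enumeration visits $2^{d-1}$ coalitions per player, which is why a Monte Carlo estimate becomes attractive as $d$ grows.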

### Shapley Values as contrastive local attribute importance in Machine Learning

Let $X^*\subset\Re^d$ be a dataset of individuals on which a Machine Learning model $f$ is trained and/or tested, where $d$ is the dimension of $X^*$. We assume $d>1$; otherwise there is no need to compute Shapley Values. We consider the attribute importance of an individual $\mathbf{x^*} = \{x_1^*, \dots, x_d^*\} \in X^*$ with respect to a given reference $\mathbf{r} = \{r_1, \dots, r_d\}\in X^*$. We look for $\boldsymbol{\phi}=(\phi_j)_{j\in\{1, \dots, d\}}\in \Re^d$ such that:
$$ \sum_{j=1}^{d} \phi_j = f(\mathbf{x^*}) - f(\mathbf{r}), $$ 
where $\phi_j$ is the attribute contribution of feature indexed $j$.  We loosely identify each feature by its column number. Here the set of players $\mathcal{M}=\{1, \dots, d\}$ is the feature set.

In Machine Learning, a common choice for the reward is $ v(S) = \mathbb{E}[f(X) | X_S = \mathbf{x_S^*}]$, where $\mathbf{x_S^*}=(x_j^*)_{j\in S}$ and $X_S$ denotes the components of $X$ indexed by the coalition $S$.
For any $S\subset\mathcal{M}$, we define $ z(\mathbf{x^*},\mathbf{r},S)$ such that $z(\mathbf{x^*},\mathbf{r},\emptyset) = \mathbf{r}$, $z(\mathbf{x^*},\mathbf{r},\mathcal{M}) = \mathbf{x^*}$ and

$$ z(\mathbf{x^*},\mathbf{r},S) = (z_1,..., z_d) \text{ with } z_i =  \left\{
\begin{array}{ll}
x_i^* & \mbox{if} \ i \in S \\
r_i & \mbox{if} \ i \notin S
\end{array}
\right. .$$ 

As explained in [Merrick,2019], each reference $\textbf{r}$ defines a single-reference game with $ v(S) = f(z(\mathbf{x^*},\mathbf{r},S)) - f(\mathbf{r}) $, so that $v(\emptyset) = 0 $ and $v(\mathcal{M}) = f(\mathbf{x^*}) - f(\mathbf{r}) $.
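
As an illustration of the hybrid instance $z$ and the reward it induces, here is a small sketch. The helper names `z` and `make_reward` and the quadratic model `f` are assumptions made for this example, and features are indexed from 0 in the code whereas the text indexes them from 1.

```python
import numpy as np


def z(x_star, r, S):
    """Hybrid instance: x_star on the features in S, r elsewhere (0-based indices)."""
    out = np.array(r, dtype=float)  # np.array copies, so r is left untouched
    idx = np.array(sorted(S), dtype=int)
    out[idx] = np.asarray(x_star, dtype=float)[idx]
    return out


def make_reward(f, x_star, r):
    """Reward of the game set by reference r: v(S) = f(z(x_star, r, S)) - f(r)."""
    f_r = f(np.asarray(r, dtype=float))
    return lambda S: f(z(x_star, r, S)) - f_r


# Hypothetical model, used only to have something to evaluate.
f = lambda a: float(np.sum(a ** 2))
x_star = np.array([1.0, 2.0, 3.0])
r = np.zeros(3)
v = make_reward(f, x_star, r)

print(z(x_star, r, {1}))   # only the second feature is taken from x_star
print(v(set()))            # empty coalition
print(v({0, 1, 2}))        # full coalition: f(x_star) - f(r)
```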

### Optimized Monte Carlo Algorithm

**Inputs:** the instance $\mathbf{x^*}$, the reference $\mathbf{r}$, the reward function $v$ and the number of iterations $\text{T}$.

**Result:** the Shapley Values $\boldsymbol{\widehat{\phi}} \in \Re^d$.

1.&emsp;Initialization: $\boldsymbol{\widehat{\phi}}=  \{0,\dots,0\} \text{ and } \boldsymbol{\widehat{\sigma^2}} = \{0,\dots,0\}$ \;

2.&emsp;For $t=1,\dots,T$:<br>
&emsp;&emsp;(a).&emsp;Draw a permutation $O\in\pi(\{1, \dots,d\})$ of the feature indices uniformly at random \;<br>
&emsp;&emsp;(b).&emsp; $v^{(1)} = v(\mathbf{r})$, $\mathbf{b} = \mathbf{r}$ \;<br>
&emsp;&emsp;(c).&emsp; For j in $O$:<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;$\mathbf{b} = \left\{
\begin{array}{ll}
x_i^* & \mbox{if} \ i = j \\
b_i & \mbox{if} \ i \neq j
\end{array}
\right.$, with $i\in\{1, \dots, d\}$ \;<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;$ v^{(2)} = v(\mathbf{b})$ \;<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;$\phi_j =  v^{(2)} - v^{(1)}$ \;<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;Update online $\widehat{\phi}$ and $\widehat{\sigma^2}$ (if $t > 1$):<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;$\widehat{\sigma_j^2} = \dfrac{t-2}{t-1} \widehat{\sigma_j^2}+ \dfrac{(\phi_j - \widehat{\phi_j})^{2}}{t}$ \;<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;$\widehat{\phi_j} = \dfrac{t-1}{t} \widehat{\phi_j} + \dfrac{1}{t} \phi_j$ \;<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;$v^{(1)} = v^{(2)}$ \;<br>
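
The steps above can be sketched as follows. This is an illustrative sketch and not the exported `MonteCarloShapley`: the function name `mc_shapley_online`, the `seed` argument and the toy model are assumptions. It evaluates the model `f` directly instead of $v$, which gives the same differences $v^{(2)} - v^{(1)}$ because the constant $f(\mathbf{r})$ cancels. Features are indexed from 0.

```python
import numpy as np


def mc_shapley_online(x_star, r, f, n_iter, seed=0):
    """Permutation sampling with online updates of the mean and sample variance."""
    rng = np.random.default_rng(seed)
    x_star = np.asarray(x_star, dtype=float)
    r = np.asarray(r, dtype=float)
    d = x_star.shape[0]
    phi_hat = np.zeros(d)
    sigma2_hat = np.zeros(d)
    for t in range(1, n_iter + 1):
        order = rng.permutation(d)      # step (a)
        b = r.copy()                    # step (b)
        v1 = f(b)
        for j in order:                 # step (c)
            b[j] = x_star[j]
            v2 = f(b)
            phi_j = v2 - v1
            if t > 1:
                # The variance update uses the mean of the first t - 1 draws,
                # so it must come before the mean update.
                sigma2_hat[j] = ((t - 2) / (t - 1) * sigma2_hat[j]
                                 + (phi_j - phi_hat[j]) ** 2 / t)
            phi_hat[j] = (t - 1) / t * phi_hat[j] + phi_j / t
            v1 = v2
    return phi_hat, sigma2_hat


# Hypothetical non-additive model, for illustration only.
toy_model = lambda a: float(np.prod(1.0 + a))
phi_hat, sigma2_hat = mc_shapley_online([1.0, 2.0, 3.0], [0.0, 0.0, 0.0],
                                        toy_model, n_iter=200)
print(phi_hat, phi_hat.sum())
print(sigma2_hat)
```

Within one permutation the marginal contributions telescope to $f(\mathbf{x^*}) - f(\mathbf{r})$, so the estimates sum to that difference for any number of iterations, up to floating-point error.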

### References

[Shapley,1953] _A value for n-person games_. Lloyd S Shapley. In Contributions to the Theory of Games, 2.28 (1953), pp. 307 - 317.

[Merrick,2019] _The Explanation Game: Explaining Machine Learning Models with Cooperative Game Theory_. Luke Merrick, Ankur Taly, 2019.

## Function 

__Parameters__

* `x`: pandas Series. The instance $\mathbf{x^*}$ for which we want to calculate the Shapley value of each attribute,

* `fc`: Python function. The reward function $v$,

* `r`: pandas Series. The reference $\mathbf{r}$. The Shapley values (attribute importance) are a contrastive explanation with respect to this individual,

* `n_iter`: integer. The number of iterations,

* `callback`: a Python callable invoked at each iteration, for example to record the distance to a target value. At each iteration, `callback(Φ)` receives the current running estimate.

__Returns__

* `Φ`: pandas Series. Shapley values of each attribute
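
As a sketch of what a `callback` could look like, the class below records the Euclidean distance between each running estimate and a target Series. The class name `ConvergenceRecorder` and the target values are hypothetical; the two direct calls stand in for the calls made inside the Monte Carlo loop.

```python
import numpy as np
import pandas as pd


class ConvergenceRecorder:
    """Hypothetical callback: stores the L2 distance of each estimate to a target."""

    def __init__(self, target):
        self.target = target
        self.distances = []

    def __call__(self, phi):
        diff = phi.values - self.target.values
        self.distances.append(float(np.linalg.norm(diff)))


target = pd.Series([0.5, -0.5], index=["x1", "x2"])
recorder = ConvergenceRecorder(target)

# Stand-ins for the running estimates passed at two successive iterations.
recorder(pd.Series([1.0, -1.0], index=["x1", "x2"]))
recorder(pd.Series([0.5, -0.5], index=["x1", "x2"]))
print(recorder.distances)
```

Such an object would be passed as `callback=recorder`; since the running estimate is recomputed at every iteration when a callback is given, this adds some overhead for large `n_iter`.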

In [None]:
#export
def MonteCarloShapley(x, fc, r, n_iter, callback=None):
    """
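    Estimate the Shapley Values using an optimized Monte Carlo version.

    Each iteration samples one permutation of the features, switches the
    features of the reference `r` to those of `x` in that order, and records
    the change in `fc` at each switch. The returned pandas Series is the mean
    of these marginal contributions over the `n_iter` permutations.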
    """
    # Get general information
    f_r = fc(r.values)
    feature_names = list(x.index)
    d = len(feature_names) # dimension

    # Store Shapley Values in a pandas Series
    # Φ = pd.Series(np.zeros(d), index=feature_names)
    Φ_storage = np.empty((n_iter,d))
    # Store also the sample variance of the estimator
    # σ2 = pd.Series(np.zeros(d), index=feature_names)

    # Monte Carlo loop
    for m in tqdm(range(1, n_iter+1)):
        # Sample a random permutation order
        o = np.random.permutation(d)
        # init useful variables for this iteration
        f_less_j = f_r
        x_plus_j = r.values.copy()
        for j in o:
            x_plus_j[j] = x.values[j]
            f_plus_j = fc(x_plus_j)
            # update Φ and σ²
            Φ_j = f_plus_j - f_less_j
            # if m == 1:
            #     Φ[name_j] = Φ_j
            # else:
            #     # σ2[name_j] = (m-2)/(m-1) * σ2[name_j] + (Φ_j - Φ[name_j])**2/m
            #     Φ[name_j] = (m-1)/m * Φ[name_j] + 1/m * Φ_j
            # Φ[j] += 1/n_iter * Φ_j
            Φ_storage[m-1,j] = Φ_j
            # reassign f_less_j
            f_less_j = f_plus_j
        if callback:
            Φ = pd.Series(np.mean(Φ_storage[:m,:],axis=0), index=feature_names)
            callback(Φ)

    Φ = pd.Series(np.mean(Φ_storage,axis=0), index=feature_names)

    return Φ
    # return Φ #, σ2

## Example

We use a simulated dataset from the book _Elements of Statistical Learning_ ([hastie,2009], the Radial example). $X_1, \dots , X_{d}$ are standard independent Gaussian. The model is determined by:

$$ Y = \prod_{j=1}^{d} \rho(X_j), $$

where $\rho\text{: } t \rightarrow \sqrt{(0.5 \pi)} \exp(- t^2 /2)$. The regression function $f_{regr}$ is deterministic and simply defined by $f_{regr}\text{: } \textbf{x} \rightarrow \prod_{j=1}^{d} \rho(x_j)$. For a reference $\mathbf{r^*}$ and a target $\mathbf{x^*}$, we define the reward function $v_r^{\mathbf{r^*}, \mathbf{x^*}}$ such that, for each coalition $S$, $v_r^{\mathbf{r^*}, \mathbf{x^*}}(S) = f_{regr}(\mathbf{z}(\mathbf{x^*}, \mathbf{r^*}, S)) - f_{regr}(\mathbf{r^*}).$

[hastie,2009] _The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition_. Trevor Hastie, Robert Tibshirani, Jerome Friedman. Springer Series in Statistics, 2009.

In [None]:
def generate_sample(dim, n_samples, rho=0):
    """
    Generate a dataset of Gaussian features with unit variance and pairwise
    correlation `rho` (the features are independent when `rho=0`)
    """
    mu = np.zeros(dim)
    sigma = np.ones((dim, dim)) * rho
    np.fill_diagonal(sigma, [1] * dim)
    # Simulation
    X = np.random.multivariate_normal(mean=mu, cov=sigma, size=n_samples)
    df_X = pd.DataFrame(X, columns=['x'+str(i) for i in range(1, dim+1)])
    return df_X

In [None]:
d, n_samples = 5, 100
X = generate_sample(d, n_samples)
y = np.zeros(len(X))
for i in range(len(X)):
    phi_x = np.sqrt(.5 * np.pi) * np.exp(-0.5 * X.values[i] ** 2)
    y[i] = np.prod(phi_x)

In [None]:
n = 2**d - 2
def fc(x):
    phi_x = np.sqrt(.5 * np.pi) * np.exp(-0.5 * x ** 2)
    return np.prod(phi_x)
print("dimension = {0} ; nb of coalitions = {1}".format(str(d), str(n)))

dimension = 5 ; nb of coalitions = 30


In [None]:
idx_r, idx_x = np.random.choice(np.arange(len(X)), size=2, replace=False)
r = X.iloc[idx_r,:]
x = X.iloc[idx_x,:]

In [None]:
mc_shap = MonteCarloShapley(x=x, fc=fc, r=r, n_iter=100, callback=None)

100%|██████████| 100/100 [00:00<00:00, 6739.79it/s]


In [None]:
mc_shap

x1    0.241355
x2   -0.399226
x3    0.288679
x4    0.037454
x5    0.079285
dtype: float64

## Tests

In [None]:
r_pred = fc(r.values)
x_pred = fc(x.values)
v_M = x_pred - r_pred

In [None]:
assert np.abs(mc_shap.sum() - v_M) <= 1e-10 

## Export-

In [None]:
#hide
from nbdev.export import notebook2script
notebook2script()

Converted index.ipynb.
Converted monte_carlo_shapley.ipynb.
Converted shapley_values.ipynb.
