In [None]:
# default_exp monte_carlo_shapley

In [None]:
#export
import numpy as np
import pandas as pd
from tqdm import tqdm

# Monte Carlo Shapley

> Estimate the Shapley Values using an optimized Monte Carlo version.

## Theory

### Shapley Value definition

In Collaborative Game Theory, Shapley Values ([Shapley,1953]) distribute a reward fairly among players according to their contribution to the win in a cooperative game. We denote by $\mathcal{M}$ a set of $d$ players and by $v : P(\mathcal{M}) \rightarrow R_v$ a reward function such that $v(\emptyset) = 0$. The range $R_v$ can be $\Re$ or a subset of $\Re$, and $P(\mathcal{M})$ is the family of subsets of $\mathcal{M}$. For $S \subset \mathcal{M}$, $v(S)$ is the amount of wealth produced by the players of coalition $S$ when they cooperate.

The Shapley Value of a player $j$ is a fair share of the global wealth $v(\mathcal{M})$ produced by all players together:

$$\phi_j(\mathcal{M},v) = \sum_{S \subset \mathcal{M}\backslash \{j\}}\frac{(d -|S| - 1)!|S|!}{d!}\left(v(S\cup \{j\}) - v(S)\right),$$

with $|S| = \text{cardinal}(S)$, i.e. the number of players in coalition $S$.
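
To make the formula concrete, here is a minimal sketch that evaluates it by brute force on a toy game. The function name `exact_shapley` and the game `toy_v` are illustrative assumptions, not part of this library, and the enumeration is only practical for a handful of players since it visits $2^{d-1}$ coalitions per player.

```python
from itertools import combinations
from math import factorial

def exact_shapley(players, v):
    """Exact Shapley Values by enumerating every coalition S that excludes j."""
    d = len(players)
    phi = {}
    for j in players:
        others = [p for p in players if p != j]
        total = 0.0
        for size in range(d):
            # Weight (d - |S| - 1)! |S|! / d! shared by all coalitions of this size
            weight = factorial(d - size - 1) * factorial(size) / factorial(d)
            for S in combinations(others, size):
                S = frozenset(S)
                # Marginal contribution of player j to coalition S
                total += weight * (v(S | {j}) - v(S))
        phi[j] = total
    return phi

# Hypothetical 3-player game: a coalition earns the square of its size
toy_v = lambda S: float(len(S) ** 2)
phi = exact_shapley([1, 2, 3], toy_v)
print(phi)
```

Because the toy game is symmetric, each player should receive the same share, and the shares should add up to $v(\mathcal{M})$.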

### Shapley Values as contrastive local attribute importance in Machine Learning

Let $X^*\subset\Re^d$ be a dataset of individuals on which a Machine Learning model $f$ is trained and/or tested, where $d$ is the dimension of $X^*$. We assume $d>1$, since otherwise there is no need to compute Shapley Values. We consider the attribute importance of an individual $\mathbf{x^*} = \{x_1^*, \dots, x_d^*\} \in X^*$ according to a given reference $\mathbf{r} = \{r_1, \dots, r_d\}\in X^*$. We are looking for $\boldsymbol{\phi}=(\phi_j)_{j\in\{1, \dots, d\}}\in \Re^d$ such that:
$$ \sum_{j=1}^{d} \phi_j = f(\mathbf{x^*}) - f(\mathbf{r}), $$
where $\phi_j$ is the contribution of the feature indexed by $j$. We loosely identify each feature with its column number. Here the set of players $\mathcal{M}=\{1, \dots, d\}$ is the feature set.

In Machine Learning, a common choice for the reward is $ v(S) = \mathbb{E}[f(X) | X_S = \mathbf{x_S^*}]$, where $\mathbf{x_S^*}=(x_j^*)_{j\in S}$ and $X_S$ denotes the components of $X$ indexed by the coalition $S$.
For any $S\subset\mathcal{M}$, let us define $ z(\mathbf{x^*},\mathbf{r},S)$ such that $z(\mathbf{x^*},\mathbf{r},\emptyset) = \mathbf{r}$, $z(\mathbf{x^*},\mathbf{r},\mathcal{M}) = \mathbf{x^*}$ and

$$ z(\mathbf{x^*},\mathbf{r},S) = (z_1,\dots, z_d) \text{ with } z_i =  x_i^* \text{ if } i \in S \text{ and } r_i  \text{ otherwise }$$ 

As explained in [Merrick,2019], each reference $\textbf{r}$ defines a single game with $ v(S) = f(z(\mathbf{x^*},\mathbf{r},S)) - f(\mathbf{r}) $, $v(\emptyset) = 0 $ and $v(\mathcal{M}) = f(\mathbf{x^*}) - f(\mathbf{r}) $.
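
The construction of $z$ and of the single-reference reward can be sketched as follows. The helper names `z` and `make_reward` and the model `f_toy` are assumptions made for illustration, and coalitions are written with 0-based indices here, unlike the 1-based indices of the text.

```python
import numpy as np

def z(x_star, r, S):
    """Hybrid point: x_star on the coordinates in S, r elsewhere (0-based indices)."""
    out = np.array(r, dtype=float)
    idx = np.array(sorted(S), dtype=int)
    out[idx] = np.asarray(x_star, dtype=float)[idx]
    return out

def make_reward(f, x_star, r):
    """Single-reference game: v(S) = f(z(x_star, r, S)) - f(r)."""
    f_r = f(np.asarray(r, dtype=float))
    return lambda S: f(z(x_star, r, S)) - f_r

# Hypothetical model and points, chosen only for illustration
f_toy = lambda u: u[0] + 2.0 * u[1] * u[2]
x_star = np.array([1.0, 2.0, 3.0])
r = np.array([0.0, 1.0, 1.0])
v = make_reward(f_toy, x_star, r)

print(v(set()), v({0}), v({0, 1, 2}))
```

The empty coalition yields $0$ and the full coalition yields $f(\mathbf{x^*}) - f(\mathbf{r})$, which matches the two boundary conditions stated above.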

### Optimized Monte Carlo Algorithm

**Inputs:** the instance $\mathbf{x^*}$, the reference $\mathbf{r}$, the reward function $v$ and the number of iterations $\text{T}$.

**Result:** the Shapley Values $\boldsymbol{\widehat{\phi}} \in \Re^d$.

1.&emsp;Initialization: $\boldsymbol{\widehat{\phi}}=  \{0,\dots,0\} \text{ and } \boldsymbol{\widehat{\sigma^2}} = \{0,\dots,0\}$.

2.&emsp;For $t=1,\dots,T$:<br>
&emsp;&emsp;(a).&emsp;Draw a permutation $O\in\pi(\{1, \dots,d\})$ of the feature indices uniformly at random.<br>
&emsp;&emsp;(b).&emsp; $v^{(1)} = v(\mathbf{r})$, $\mathbf{b} = \mathbf{r}$.<br>
&emsp;&emsp;(c).&emsp; For $j$ in $O$:<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;$b_i = x_i^* \text{ if } i = j \text{ and } b_i \text{ otherwise}$, with $i\in\{1, \dots, d\}$<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;$ v^{(2)} = v(\mathbf{b})$<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;$\phi_j =  v^{(2)} - v^{(1)}$<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;Update $\widehat{\phi_j}$ and, if $t > 1$, $\widehat{\sigma_j^2}$ online:<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;$\widehat{\sigma_j^2} = \dfrac{t-2}{t-1} \widehat{\sigma_j^2}+ \dfrac{(\phi_j - \widehat{\phi_j})^{2}}{t}$<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;$\widehat{\phi_j} = \dfrac{t-1}{t} \widehat{\phi_j} + \dfrac{1}{t} \phi_j$<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;$v^{(1)} = v^{(2)}$<br>
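
The online update in step 2(c) can be checked numerically. The sketch below assumes the variance increment is the squared deviation from the previous mean divided by $t$, which is the usual recursion for the unbiased sample variance. The function name `online_update` and the random samples are illustrative, not part of the library.

```python
import numpy as np

def online_update(mean, var, sample, t):
    """One step of the running mean / unbiased variance update (t is 1-based)."""
    if t > 1:
        # The variance uses the mean from the previous step, so update it first
        var = (t - 2) / (t - 1) * var + (sample - mean) ** 2 / t
    mean = (t - 1) / t * mean + sample / t
    return mean, var

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 3))   # stand-in for the per-iteration phi_j
mean, var = np.zeros(3), np.zeros(3)
for t, s in enumerate(samples, start=1):
    mean, var = online_update(mean, var, s, t)

# Compare with the batch estimates
print(np.allclose(mean, samples.mean(axis=0)))
print(np.allclose(var, samples.var(axis=0, ddof=1)))
```

Updating online avoids storing every marginal contribution, at the cost of having to update the variance before the mean at each step.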

### References

[Shapley,1953] _A value for n-person games_. Lloyd S Shapley. In Contributions to the Theory of Games, 2.28 (1953), pp. 307 - 317.

[Merrick,2019] _The Explanation Game: Explaining Machine Learning Models with Cooperative Game Theory_. Luke Merrick, Ankur Taly, 2019.

## Function 

__Parameters__

* `x`: pandas Series. The instance $\mathbf{x^*}$ for which we want to compute the Shapley Value of each attribute,

* `fc`: Python function. The reward function $v$, called on a feature vector (a numpy array) and returning a scalar,

* `ref`: pandas Series or DataFrame. The reference $\mathbf{r}$. The Shapley Values (attribute importance) are a contrastive explanation relative to this individual. If a DataFrame with several rows is given, one reference is drawn at random at each iteration,

* `n_iter`: integer. The number of iterations,

* `callback`: a Python callable invoked at each iteration, for example to record the distance to the minimum. At each iteration it is called as `callback(Φ)` with the current estimate.

__Returns__

* `Φ`: pandas Series. Shapley values of each attribute
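
As an illustration of the `callback` parameter, here is a minimal sketch of a callable that stores every running estimate it receives. The class `EstimateRecorder` is hypothetical and not part of the library. It is exercised below with hand-made Series rather than with real estimates.

```python
import pandas as pd

class EstimateRecorder:
    """Hypothetical callback: stores each running estimate it receives."""

    def __init__(self):
        self.history = []

    def __call__(self, phi):
        # Copy so that later in-place changes do not alter the stored value
        self.history.append(phi.copy())

    def as_frame(self):
        # One row per iteration, one column per attribute
        return pd.DataFrame(self.history).reset_index(drop=True)

recorder = EstimateRecorder()
for est in ([0.1, -0.2], [0.15, -0.25]):
    recorder(pd.Series(est, index=["x1", "x2"]))

print(recorder.as_frame())
```

Such an object could presumably be passed as `callback=recorder`, after which `recorder.as_frame()` would give the trajectory of the estimate across iterations.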

In [None]:
#export
def MonteCarloShapley(x, fc, ref, n_iter, callback=None):
    """
    Estimate the Shapley Values using an optimized Monte Carlo version.
    """
    
    # Get general information
    feature_names = list(x.index)
    d = len(feature_names) # dimension

    # Individual reference or dataset of references 
    if isinstance(ref, pd.Series):
        individual_ref = True
        f_r = fc(ref.values)
    elif isinstance(ref, pd.DataFrame):
        if ref.shape[0] == 1:
            ref = ref.iloc[0]
            individual_ref = True
            f_r = fc(ref.values)
        else:
            individual_ref = False
            n_ref = len(ref)
    else:
        # Fail early instead of hitting an undefined individual_ref below
        raise TypeError("ref must be a pandas Series or DataFrame")
            
    if individual_ref:
        # If x[j] = r[j] => Φ[j] = 0 and we can reduce the dimension
        distinct_feature_names = list(x[x!=ref].index)
        if set(distinct_feature_names) == set(feature_names):
            distinct_feature_names = feature_names
            sub_d = d
            x_cp = x.copy()
            r_cp = ref.copy()
            reward = fc
        else:
            sub_d = len(distinct_feature_names) # new dimension
            x_cp = x[distinct_feature_names].copy()
            r_cp = ref[distinct_feature_names].copy()
            print("new dimension {0}".format(sub_d))
            def reward(z):
                z_tmp = ref.copy()
                z_tmp[distinct_feature_names] = z
                return fc(z_tmp.values)
    else:
        distinct_feature_names = feature_names
        sub_d = d
        x_cp = x.copy()
        reward = fc
            
    # Store all Shapley Values in a numpy array
    Φ_storage = np.empty((n_iter, sub_d))

    # Monte Carlo loop
    for m in tqdm(range(1, n_iter+1)):
        # Sample a random permutation order
        o = np.random.permutation(sub_d)
        # initiate useful variables for this iteration 
        # if several references select at random one new ref at each iter
        if individual_ref:
            f_less_j = f_r
            x_plus_j = r_cp.values.copy()
        else:
            r_cp = ref.values[np.random.choice(n_ref, size=1)[0],:].copy()
            f_less_j = fc(r_cp)
            x_plus_j = r_cp.copy()
        # iterate through the permutation of features
        for j in o:
            x_plus_j[j] = x_cp.values[j]
            f_plus_j = reward(x_plus_j)
            # update Φ
            Φ_j = f_plus_j - f_less_j
            Φ_storage[m-1,j] = Φ_j
            # reassign f_less_j
            f_less_j = f_plus_j
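        if callback:
            # Running mean over the iterations done so far; attributes equal
            # to the reference were dropped and keep a zero Shapley value
            Φ = pd.Series(np.zeros(d), index=feature_names)
            Φ[distinct_feature_names] = np.mean(Φ_storage[:m,:], axis=0)
            callback(Φ)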
            
    Φ_mean = np.mean(Φ_storage,axis=0)
    Φ = pd.Series(np.zeros(d), index=feature_names)
    Φ[distinct_feature_names] = Φ_mean
    return Φ

## Example

We use a simulated dataset from the book _Elements of Statistical Learning_ ([hastie,2009], the Radial example). $X_1, \dots , X_{d}$ are independent standard Gaussian variables. The model is determined by:

$$ Y = \prod_{j=1}^{d} \rho(X_j), $$

where $\rho\text{: } t \rightarrow \sqrt{(0.5 \pi)} \exp(- t^2 /2)$. The regression function $f_{regr}$ is deterministic and simply defined by $f_{regr}\text{: } \textbf{x} \rightarrow \prod_{j=1}^{d} \rho(x_j)$. For a reference $\mathbf{r^*}$ and a target $\mathbf{x^*}$, we define the reward function $v_r^{\mathbf{r^*}, \mathbf{x^*}}$ such that, for each coalition $S$, $v_r^{\mathbf{r^*}, \mathbf{x^*}}(S) = f_{regr}(\mathbf{z}(\mathbf{x^*}, \mathbf{r^*}, S)) - f_{regr}(\mathbf{r^*}).$

[hastie,2009] _The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition_. Trevor Hastie, Robert Tibshirani, Jerome Friedman. Springer Series in Statistics, 2009.

In [None]:
d, n_samples = 5, 100
mu = np.zeros(d)
Sigma = np.zeros((d,d))
np.fill_diagonal(Sigma, [1] * d)
X = np.random.multivariate_normal(mean=mu, cov=Sigma, size=n_samples)
X = pd.DataFrame(X, columns=['x'+str(i) for i in range(1, d+1)])
def fc(x):
    phi_x = np.sqrt(.5 * np.pi) * np.exp(-0.5 * x ** 2)
    return np.prod(phi_x)
y = np.zeros(len(X))
for i in range(len(X)):
    y[i] = fc(X.values[i])
n = 2**d - 2
print("dimension = {0} ; nb of coalitions = {1}".format(str(d), str(n)))

dimension = 5 ; nb of coalitions = 30


### Pick an individual x to explain

In [None]:
x = X.iloc[np.random.choice(len(X), size=1)[0],:]
x

x1    1.399541
x2    0.000105
x3    0.084210
x4   -1.057178
x5    1.122922
Name: 76, dtype: float64

### Single reference

In [None]:
reference = X.iloc[np.random.choice(len(X), size=1)[0],:]
reference

x1    1.286635
x2    0.269172
x3   -0.715402
x4    0.037090
x5   -1.110788
Name: 69, dtype: float64

In [None]:
mc_shap = MonteCarloShapley(x=x, fc=fc, ref=reference, n_iter=100, callback=None)

100%|██████████| 100/100 [00:00<00:00, 5619.45it/s]


In [None]:
mc_shap

x1   -0.070221
x2    0.016489
x3    0.119197
x4   -0.251128
x5   -0.006191
dtype: float64

### Several references 

In [None]:
references = X.iloc[np.random.choice(len(X), size=10, replace=False),:]
references

          x1        x2        x3        x4        x5
46  0.899300  0.222094  0.839331 -1.998841 -0.737770
83  0.818545 -0.361986 -0.014656  1.022367 -0.838450
61 -0.169111  0.102140  0.397960  1.241908  0.764572
60 -0.818126  1.400868  0.825624 -0.342146  0.187695
80 -0.670673  0.719710  0.304841  0.702144  0.333756
34 -0.141383  0.021533  1.745541 -0.407626  0.244311
29  1.221184  1.559900  0.074129  0.109264  1.148664
59 -0.468309  0.344122 -0.713913 -0.169124  0.035903
51  1.241050 -1.942694  0.118107  0.239774  0.222236
50  1.514379 -0.134645 -0.426708 -0.793782 -0.398505


In [None]:
mc_shaps = MonteCarloShapley(x=x, fc=fc, ref=references, n_iter=1000, callback=None)

100%|██████████| 1000/1000 [00:00<00:00, 4428.95it/s]


In [None]:
mc_shaps

x1   -0.367884
x2    0.190118
x3    0.167193
x4   -0.148092
x5   -0.266249
dtype: float64

In [None]:
#hide
from shapkit_nbdev.shapley_values import ShapleyValues

## Tests

In [None]:
x_pred = fc(x.values)
reference_pred = fc(reference.values)
fcs = []
for r in references.values:
    fcs.append(fc(r))
references_pred = np.mean(fcs)

In [None]:
assert np.abs(mc_shap.sum() - (x_pred - reference_pred)) <= 1e-10 

In [None]:
assert np.abs(mc_shaps.sum() - (x_pred - references_pred)) <= 1e-1

In [None]:
true_shap = ShapleyValues(x=x, fc=fc, ref=reference)
assert np.linalg.norm(mc_shap - true_shap, 2) <= 0.1

100%|██████████| 5/5 [00:00<00:00, 344.47it/s]


In [None]:
true_shaps = ShapleyValues(x=x, fc=fc, ref=references, K=len(references))
assert np.linalg.norm(mc_shaps - true_shaps, 2) <= 0.1

100%|██████████| 5/5 [00:00<00:00, 46.02it/s]


## Export-

In [None]:
#hide
from nbdev.export import notebook2script
notebook2script()

Converted index.ipynb.
Converted inspector.ipynb.
Converted monte_carlo_shapley.ipynb.
Converted sgd_shapley.ipynb.
Converted shapley_values.ipynb.
