The Shapley value is a cooperative game theoretic tool used to share a resource between players.
In this tutorial we will use it to identify the importance of the different variables in a linear regression. This is commonly referred to as Shapley Value Regression.
To install the coopgt library, open a command line tool with a working installation of Python and type:
$ python -m pip install coopgt
In cooperative game theory a characteristic function maps every group of players to a value. In this case the players are the predictor variables, and the value of a group is the R^2 of the linear model fitted using only the variables in that group. The y variable is going to be predicted by fitting a linear model to three variables:
y = c_1 x_1 + c_2 x_2 + c_3 x_3
Here are the R^2 values (you are welcome to see :download:`main.py </_static/data-for-shapley-regression-tutorial/main.py>` for the code used to generate them):
Model | R^2 |
---|---|
y=c_1x_1 | 0.075 |
y= c_2x_2 | 0.086 |
y= c_3x_3 | 0.629 |
y=c_1x_1 + c_2x_2 | 0.163 |
y=c_1x_1 + c_3x_3 | 0.63 |
y= c_2x_2 + c_3x_3 | 0.906 |
y=c_1x_1 + c_2x_2 + c_3x_3 | 0.907 |
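As an illustration of how a table like the one above could be produced, here is a minimal sketch that fits every non-empty subset of three variables by least squares and records the R^2. It is not the code in main.py: the data is synthetic, the helper `r_squared` is a hypothetical name, and the choice of a model without an intercept scored against the centered total sum of squares is an assumption.

```python
import itertools

import numpy as np

# Hypothetical synthetic data: NOT the data behind the table above.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 0.2 * X[:, 0] + 0.5 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)


def r_squared(X, y, columns):
    """R^2 of the least squares fit of y on the chosen columns (no intercept).

    Assumption: R^2 = 1 - SS_residual / SS_total, with SS_total centered on mean(y).
    """
    A = X[:, list(columns)]
    coefficients, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coefficients
    return 1 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)


# One entry per non-empty subset of variables, keyed by 1-based variable index.
r_squared_values = {
    tuple(i + 1 for i in columns): r_squared(X, y, columns)
    for size in range(1, 4)
    for columns in itertools.combinations(range(3), size)
}

for variables, value in r_squared_values.items():
    print(variables, round(value, 3))
```

Keying the dictionary by tuples of variable indices means the result already has the shape of a characteristic function, apart from the empty group.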
We can use that table of R^2 values to create the characteristic function:
>>> characteristic_function = {
... (): 0,
... (1,): 0.075,
... (2,): 0.086,
... (3,): 0.629,
... (1, 2): 0.163,
... (1, 3): 0.63,
... (2, 3): 0.906,
... (1, 2, 3): 0.907,
... }
We now compute the Shapley value:
>>> import coopgt.shapley_value
>>> shapley_value = coopgt.shapley_value.calculate(characteristic_function=characteristic_function)
>>> shapley_value.round(4)
array([0.0383, 0.1818, 0.6868])
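To see where these numbers come from, the following sketch recomputes them directly from the definition of the Shapley value: each player's marginal contribution, averaged over every order in which the players can join. It uses only the standard library and does not reflect how coopgt is implemented; `shapley_from_permutations` is a hypothetical helper name.

```python
import itertools
import math

characteristic_function = {
    (): 0,
    (1,): 0.075,
    (2,): 0.086,
    (3,): 0.629,
    (1, 2): 0.163,
    (1, 3): 0.63,
    (2, 3): 0.906,
    (1, 2, 3): 0.907,
}


def shapley_from_permutations(characteristic_function):
    """Average each player's marginal contribution over all join orders."""
    # The largest key is the group containing every player.
    players = max(characteristic_function, key=len)
    totals = {player: 0.0 for player in players}
    for ordering in itertools.permutations(players):
        coalition = ()
        for player in ordering:
            # Keys are sorted tuples, so sort after adding the new player.
            enlarged = tuple(sorted(coalition + (player,)))
            totals[player] += (
                characteristic_function[enlarged] - characteristic_function[coalition]
            )
            coalition = enlarged
    number_of_orderings = math.factorial(len(players))
    return [totals[player] / number_of_orderings for player in players]


values = shapley_from_permutations(characteristic_function)
print([round(value, 4) for value in values])  # [0.0383, 0.1818, 0.6868]
```

Enumerating all orderings is only practical for a small number of variables, since the number of orderings grows factorially.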
From this analysis we would conclude that the variable that contributes the most to the model is x_3, which accounts for 0.6868 of the full model's R^2 of 0.907.
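One way to read the result is as a share of the explained variance. The sketch below starts from the rounded Shapley values reported above, checks that they add up to the R^2 of the full model (up to rounding), and expresses each one as a fraction of the total. The variable names used here are illustrative only.

```python
import numpy as np

# Rounded Shapley values reported above, one per variable x_1, x_2, x_3.
shapley_value = np.array([0.0383, 0.1818, 0.6868])
full_model_r_squared = 0.907

# The Shapley values share out the value of the full group of players,
# so their sum should match the full model's R^2 up to rounding error.
rounding_gap = abs(shapley_value.sum() - full_model_r_squared)

# Fraction of the explained variance attributed to each variable.
shares = shapley_value / shapley_value.sum()

for name, share in zip(("x_1", "x_2", "x_3"), shares):
    print(f"{name}: {share:.1%}")
```

With these values the shares come out at roughly 4%, 20% and 76% for x_1, x_2 and x_3 respectively.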