
BUG: TypeError: waterfall() got an unexpected keyword argument 'features' #3628

Open · 2 of 4 tasks
iou0515 opened this issue Apr 26, 2024 · 1 comment
Labels: awaiting feedback (further information is required from the issue creator), bug (an unexpected problem or unintended behaviour)

iou0515 commented Apr 26, 2024

Issue Description

I'm trying to use shap's waterfall plot to chart the output of a deep learning model for a "regression" task, but it fails with "TypeError: waterfall() got an unexpected keyword argument 'features'".
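
For reference, the waterfall API changed in recent shap versions: it now takes a single shap.Explanation object instead of separate arrays and a features= keyword. A minimal sketch of that calling convention, using hypothetical example data:

import numpy as np
import shap

# Hypothetical per-feature attributions for one instance (example data only)
shap_values = np.array([0.20, -0.10, 0.05])
base_value = 0.5                         # e.g. explainer.expected_value for this output
x_instance = np.array([1.0, 2.0, 3.0])   # the input row being explained
feature_names = ["f1", "f2", "f3"]

# Bundle the attributions, base value, data, and names into an Explanation
explanation = shap.Explanation(values=shap_values,
                               base_values=base_value,
                               data=x_instance,
                               feature_names=feature_names)
shap.plots.waterfall(explanation, max_display=10)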

Minimal Reproducible Example

from tensorflow.keras import models, layers, utils, backend as K
import matplotlib.pyplot as plt
import shap


model = models.Sequential(name="Perceptron", layers=[
    layers.Dense(             #a fully connected layer
          name="dense",
          input_dim=3,        #with 3 features as the input
          units=1,            #and 1 node because we want 1 output
          activation='linear' #f(x)=x
    )
])
model.summary()

# Define an activation function for the model that follows; included only to demonstrate how to call TensorFlow's own functions
import tensorflow as tf
def binary_step_activation(x):
    ##return 1 if x>0 else 0 
    return K.switch(x>0, tf.math.divide(x,x), tf.math.multiply(x,0))

# build the model
model = models.Sequential(name="Perceptron", layers=[
      layers.Dense(             
          name="dense",
          input_dim=3,        
          units=1,            
          activation=binary_step_activation
      )
])

n_features = 10
model = models.Sequential(name="DeepNN", layers=[
    ### hidden layer 1
    layers.Dense(name="h1", input_dim=n_features,
                 units=int(round((n_features+1)/2)), 
                 activation='relu'),
    layers.Dropout(name="drop1", rate=0.2),
    
    ### hidden layer 2
    layers.Dense(name="h2", units=int(round((n_features+1)/4)), 
                 activation='relu'),
    layers.Dropout(name="drop2", rate=0.2),
    
    ### layer output
    layers.Dense(name="output", units=1, activation='sigmoid')
])
model.summary()

########## use the Model class to build our Perceptron and DeepNN
# Perceptron
inputs = layers.Input(name="input", shape=(3,))
outputs = layers.Dense(name="output", units=1, 
                       activation='linear')(inputs)
model = models.Model(inputs=inputs, outputs=outputs, 
                     name="Perceptron")

# DeepNN
### layer input
inputs = layers.Input(name="input", shape=(n_features,))
### hidden layer 1
h1 = layers.Dense(name="h1", units=int(round((n_features+1)/2)), activation='relu')(inputs)
h1 = layers.Dropout(name="drop1", rate=0.2)(h1)
### hidden layer 2
h2 = layers.Dense(name="h2", units=int(round((n_features+1)/4)), activation='relu')(h1)
h2 = layers.Dropout(name="drop2", rate=0.2)(h2)
### layer output
outputs = layers.Dense(name="output", units=1, activation='sigmoid')(h2)
model = models.Model(inputs=inputs, outputs=outputs, name="DeepNN")



######## VISUALIZATION
#'''
#Extract info for each layer in a keras model.
#'''
def utils_nn_config(model):
    lst_layers = []
    if "Sequential" in str(model): #-> Sequential doesn't show the input layer
        layer = model.layers[0]
        lst_layers.append({"name":"input", "in":int(layer.input.shape[-1]), "neurons":0, 
                           "out":int(layer.input.shape[-1]), "activation":None,
                           "params":0, "bias":0})
    for layer in model.layers:
        try:
            dic_layer = {"name":layer.name, "in":int(layer.input.shape[-1]), "neurons":layer.units, 
                         "out":int(layer.output.shape[-1]), "activation":layer.get_config()["activation"],
                         "params":layer.get_weights()[0], "bias":layer.get_weights()[1]}
        except Exception:
            dic_layer = {"name":layer.name, "in":int(layer.input.shape[-1]), "neurons":0, 
                         "out":int(layer.output.shape[-1]), "activation":None,
                         "params":0, "bias":0}
        lst_layers.append(dic_layer)
    return lst_layers



#'''
#Plot the structure of a keras neural network.
#'''
def visualize_nn(model, description=False, figsize=(10,8)):
    ## get layers info
    lst_layers = utils_nn_config(model)
    layer_sizes = [layer["out"] for layer in lst_layers]
    
    ## fig setup
    fig = plt.figure(figsize=figsize)
    ax = fig.gca()
    ax.set(title=model.name)
    ax.axis('off')
    left, right, bottom, top = 0.1, 0.9, 0.1, 0.9
    x_space = (right-left) / float(len(layer_sizes)-1)
    y_space = (top-bottom) / float(max(layer_sizes))
    p = 0.025
    
    ## nodes
    for i,n in enumerate(layer_sizes):
        top_on_layer = y_space*(n-1)/2.0 + (top+bottom)/2.0
        layer = lst_layers[i]
        color = "green" if i in [0, len(layer_sizes)-1] else "blue"
        color = "red" if (layer['neurons'] == 0) and (i > 0) else color
        
        ### add description
        if (description is True):
            d = i if i == 0 else i-0.5
            if layer['activation'] is None:
                plt.text(x=left+d*x_space, y=top, fontsize=10, color=color, s=layer["name"].upper())
            else:
                plt.text(x=left+d*x_space, y=top, fontsize=10, color=color, s=layer["name"].upper())
                plt.text(x=left+d*x_space, y=top-p, fontsize=10, color=color, s=layer['activation']+" (")
                plt.text(x=left+d*x_space, y=top-2*p, fontsize=10, color=color, s="Σ"+str(layer['in'])+"[X*w]+b")
                out = " Y"  if i == len(layer_sizes)-1 else " out"
                plt.text(x=left+d*x_space, y=top-3*p, fontsize=10, color=color, s=") = "+str(layer['neurons'])+out)
        
        ### circles
        for m in range(n):
            color = "limegreen" if color == "green" else color
            circle = plt.Circle(xy=(left+i*x_space, top_on_layer-m*y_space-4*p), radius=y_space/4.0, color=color, ec='k', zorder=4)
            ax.add_artist(circle)
            
            ### add text
            if i == 0:
                plt.text(x=left-4*p, y=top_on_layer-m*y_space-4*p, fontsize=10, s=r'$X_{'+str(m+1)+'}$')
            elif i == len(layer_sizes)-1:
                plt.text(x=right+4*p, y=top_on_layer-m*y_space-4*p, fontsize=10, s=r'$y_{'+str(m+1)+'}$')
            else:
                plt.text(x=left+i*x_space+p, y=top_on_layer-m*y_space+(y_space/8.+0.01*y_space)-4*p, fontsize=10, s=r'$H_{'+str(m+1)+'}$')
    
    ## links
    for i, (n_a, n_b) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        layer = lst_layers[i+1]
        color = "green" if i == len(layer_sizes)-2 else "blue"
        color = "red" if layer['neurons'] == 0 else color
        layer_top_a = y_space*(n_a-1)/2. + (top+bottom)/2. -4*p
        layer_top_b = y_space*(n_b-1)/2. + (top+bottom)/2. -4*p
        for m in range(n_a):
            for o in range(n_b):
                line = plt.Line2D([i*x_space+left, (i+1)*x_space+left], 
                                  [layer_top_a-m*y_space, layer_top_b-o*y_space], 
                                  c=color, alpha=0.5)
                if layer['activation'] is None:
                    if o == m:
                        ax.add_artist(line)
                else:
                    ax.add_artist(line)
    plt.show()
# indentation is important to make plt.show() work here
visualize_nn(model, description=True, figsize=(10,8))

utils.plot_model(model, to_file='model.png', show_shapes=True, show_layer_names=True)
#import os  # optionally delete the generated png afterwards
#os.remove('model.png')

# Define accuracy and F1-score metrics, which need to be implemented as below. The optimizer is Adam and the loss function is binary_crossentropy.
# For regression problems, on the other hand, I usually set MAE as the loss and R-squared as the metric.
def Recall(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def Precision(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def F1(y_true, y_pred):
    precision = Precision(y_true, y_pred)
    recall = Recall(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))

# compile the neural network
model.compile(optimizer='adam', loss='binary_crossentropy', 
              metrics=['accuracy',F1])

# define the R-squared metric for the regression model (MAE loss)
def R2(y, y_hat):
    ss_res =  K.sum(K.square(y - y_hat)) 
    ss_tot = K.sum(K.square(y - K.mean(y))) 
    return ( 1 - ss_res/(ss_tot + K.epsilon()) )

# compile the neural network
model.compile(optimizer='adam', loss='mean_absolute_error', 
              metrics=[R2])

import numpy as np
X = np.random.rand(1000,10)
y = np.random.choice([1,0], size=1000)
print(X)
print(y)

# train/validation
training = model.fit(x=X, y=y, batch_size=32, epochs=100, shuffle=True, verbose=0, validation_split=0.3)

# plot
metrics = [k for k in training.history.keys() if ("loss" not in k) and ("val" not in k)]    
fig, ax = plt.subplots(nrows=1, ncols=2, sharey=True, figsize=(15,3))
       
## training    
ax[0].set(title="Training")    
ax11 = ax[0].twinx()    
ax[0].plot(training.history['loss'], color='black')
ax[0].set_xlabel('Epochs')    
ax[0].set_ylabel('Loss', color='black')    
for metric in metrics:        
    ax11.plot(training.history[metric], label=metric)
ax11.set_ylabel("Score", color='steelblue')    
ax11.legend()
        
## validation    
ax[1].set(title="Validation")    
ax22 = ax[1].twinx()    
ax[1].plot(training.history['val_loss'], color='black')
ax[1].set_xlabel('Epochs')    
ax[1].set_ylabel('Loss', color='black')    
for metric in metrics:          
    ax22.plot(training.history['val_'+metric], label=metric)
ax22.set_ylabel("Score", color="steelblue")    
plt.show()

#######Explainability

#'''
#Use shap to build an explainer.
#:parameter
#    :param model: model instance (after fitting)
#    :param X_names: list
#    :param X_instance: array of size n x 1 (n,)
#    :param X_train: array - if None the model is simple machine learning, if not None then it's a deep learning model
#    :param task: string - "classification", "regression"
#    :param top: num - top features to display
#:return
#    dtf with explanations
#'''
def explainer_shap(model, X_names, X_instance, X_train=None, task="regression", top=10):
    ## create explainer
    ### machine learning
    if X_train is None:
        explainer = shap.TreeExplainer(model)
        shap_values = explainer.shap_values(X_instance)
    ### deep learning
    else:
        explainer = shap.DeepExplainer(model, data=X_train[:100])
        shap_values = explainer.shap_values(X_instance.reshape(1,-1))[0].reshape(-1)
        
    ## plot
    ### classification
    if task == "classification":
        shap.decision_plot(explainer.expected_value, shap_values, link='logit', feature_order='importance',
                           features=X_instance, feature_names=X_names, feature_display_range=slice(-1,-top-1,-1))
    ### regression
    else:     
        shap.waterfall_plot(explainer.expected_value[0], shap_values, 
                            features=X_instance, feature_names=X_names, max_display=top)
        
list_feature_names=["f1","f2","f3","f4","f5","f6","f7","f8","f9","f10"]
i = 1
explainer_shap(model, 
               X_names=list_feature_names, 
               X_instance=X[i], 
               X_train=X,  
               task="regression", #task="classification",
               top=10)

Traceback

Warning (from warnings module):
  File "C:\Users\Administrator\AppData\Roaming\Python\Python310\site-packages\shap\explainers\_deep\deep_tf.py", line 99
    warnings.warn("Your TensorFlow version is newer than 2.4.0 and so graph support has been removed in eager mode and some static graphs may not be supported. See PR #1483 for discussion.")
UserWarning: Your TensorFlow version is newer than 2.4.0 and so graph support has been removed in eager mode and some static graphs may not be supported. See PR #1483 for discussion.

Warning (from warnings module):
  File "C:\Program Files\Python310\lib\site-packages\keras\backend.py", line 452
    warnings.warn(
UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
Traceback (most recent call last):
  File "E:/SQL server machine learning inl SQLserver 2022 iso & Acrobat 2023 iso/SQL Server Scripts for machine learning/SQL Server Scripts for machine learning/DL with Python Tensorflow linear.py", line 279, in <module>
    explainer_shap(model,
  File "E:/SQL server machine learning inl SQLserver 2022 iso & Acrobat 2023 iso/SQL Server Scripts for machine learning/SQL Server Scripts for machine learning/DL with Python Tensorflow linear.py", line 274, in explainer_shap
    shap.waterfall_plot(explainer.expected_value[0], shap_values,
TypeError: waterfall() got an unexpected keyword argument 'features'
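
For what it's worth, the features= keyword matches the signature of shap's older waterfall implementation, which recent releases appear to keep as a private legacy helper. A hedged workaround sketch, assuming shap.plots._waterfall.waterfall_legacy is still present in shap 0.45.0 (an assumption, not verified):

# Hedged sketch: waterfall_legacy historically accepted
# (expected_value, shap_values, features=..., feature_names=..., max_display=...).
# Its availability as a private helper in shap 0.45.0 is an assumption.
shap.plots._waterfall.waterfall_legacy(explainer.expected_value[0], shap_values,
                                       features=X_instance, feature_names=X_names,
                                       max_display=top)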

Expected Behavior

Expected to see a chart similar to the attached screenshot.
Screenshot 2024-04-26 053916

The full script is attached in text-file format:
code.py.txt

Bug report checklist

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest release of shap.
  • I have confirmed this bug exists on the master branch of shap.
  • I'd be interested in making a PR to fix this bug

Installed Versions

cloudpickle-3.0.0 llvmlite-0.42.0 numba-0.59.1 shap-0.45.0 slicer-0.0.7 tqdm-4.66.2

CloseChoice (Collaborator) commented May 16, 2024

Thanks for the report and the example. Unfortunately, this is not reproducible; when I run your script I get:

Traceback (most recent call last):
  File "/home/tobias/programming/github/shap_master/bugs/waterfall_unexpected_argument.py", line 168, in <module>
    visualize_nn(model, description=True, figsize=(10,8))
  File "/home/tobias/programming/github/shap_master/bugs/waterfall_unexpected_argument.py", line 103, in visualize_nn
    lst_layers = utils_nn_config(model)
  File "/home/tobias/programming/github/shap_master/bugs/waterfall_unexpected_argument.py", line 90, in utils_nn_config
    dic_layer = {"name":layer.name, "in":int(layer.input.shape[-1]), "neurons":0,
AttributeError: 'list' object has no attribute 'shape'

Even if I change that line to:

    dic_layer = {"name":layer.name, "in":int(linputs.shape[-1]), "neurons":0,

this throws an error at another line. Could you please fix this? I feel it's close. Otherwise, I'm afraid we'll put this on low priority, since reverse-engineering issues is very time-consuming.
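
One hedged guess at the AttributeError, assuming it stems from layer.input returning a list of tensors in newer Keras versions (an assumption, not verified here): unwrap the list before reading .shape, for example with a small helper that utils_nn_config could call in both branches instead of int(layer.input.shape[-1]):

# Hedged sketch: defensively read a layer's input width, assuming
# layer.input may come back as a list of tensors in newer Keras versions.
def layer_input_width(layer):
    inp = layer.input
    if isinstance(inp, list):  # unwrap list-valued inputs
        inp = inp[0]
    return int(inp.shape[-1])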
