<a id="top"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Adversarial Attack on XWhy</b></div>

<a id="1.2"></a>
<h2 style="font-family: Verdana; font-size: 20px; font-style: normal; font-weight: normal; text-decoration: none; text-transform: none; letter-spacing: 2px; color: #155D07; background-color: #ffffff;"><b>Fooling</b> LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods</h2>

<p style="text-align: justify;">"Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods" is a research paper by Slack et al. that examines the vulnerability of two popular post hoc explanation methods, LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive exPlanations), to adversarial attacks. Unlike classical adversarial examples, which perturb inputs to cause incorrect predictions, the attack studied here targets the explanation itself: an adversary wraps a biased classifier in a scaffold that keeps the biased behavior on real data but behaves innocuously on the synthetic perturbations that LIME and SHAP generate, so the resulting explanations hide the bias.</p>
    
<p style="text-align: justify;">The paper demonstrates that both LIME and SHAP can be fooled in this way. Because the perturbed samples they query are often distinguishable from real data, an attacker can produce explanations that are inconsistent with the model's true behavior on real inputs. These findings highlight the importance of considering the robustness of post hoc explanation methods before deploying them in security-sensitive applications. Post hoc explanations can provide valuable insight into model behavior, but they should be used with caution, and further research is needed to make them robust to such attacks.</p>


Source: [https://github.com/dylan-slack/Fooling-LIME-SHAP](https://github.com/dylan-slack/Fooling-LIME-SHAP)

Paper: [Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods](https://arxiv.org/pdf/1911.02508.pdf)

<a id="top"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Table of Contents</b></div>

<div style="background-color:aliceblue; padding:30px; font-size:15px;color:#034914">
    
* [Import Libraries](#2)
* [COMPAS Example](#compas)
* [Defining Racist Model](#racist)
* [Defining XWhy Explainability Functions](#xwhy)
* [Comparing XWhy with LIME and SHAP - Before Attack](#before_AA)
* [Comparing XWhy with LIME and SHAP - After Adversarial Attack](#AA)

</div>

<a id="2"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Import libraries</b></div>

In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import plotly.express as px
import matplotlib

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from  matplotlib.colors import LinearSegmentedColormap
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier
%matplotlib inline

In [None]:
!git clone https://github.com/dylan-slack/Fooling-LIME-SHAP.git --quiet

In [None]:
import sys
sys.path.insert(1, '/kaggle/working/Fooling-LIME-SHAP')

In [None]:
import warnings
warnings.filterwarnings('ignore')

# Helper modules from the cloned Fooling-LIME-SHAP repository
from adversarial_models import *
from utils import *
from get_data import *

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

import lime
import lime.lime_tabular
import shap
from copy import deepcopy

import xgboost

import plotly.io as pio


<a id="compas"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>COMPAS Example</b></div>

<p style="text-align: justify;">COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a commercial risk assessment tool developed by Northpointe and used in parts of the US criminal justice system to estimate a defendant's risk of reoffending. The COMPAS dataset used here was compiled by ProPublica from records in Broward County, Florida, and contains defendants' demographic information, criminal history, COMPAS risk scores, and two-year recidivism outcomes. It is widely used to study fairness and bias in risk assessment tools, and it has also drawn controversy over data quality and the potential for algorithmic bias in the scores. A helper for the separate Communities and Crime dataset is defined below as well, although this notebook only uses COMPAS.</p>


In [None]:
def get_and_preprocess_compas_data(params):
    """Handle processing of COMPAS according to: https://github.com/propublica/compas-analysis

    Parameters
    ----------
    params : Params
    Returns
    ----------
    Pandas data frame X of processed data, np.ndarray y, and list of column names
    """
    PROTECTED_CLASS = params.protected_class
    UNPROTECTED_CLASS = params.unprotected_class
    POSITIVE_OUTCOME = params.positive_outcome
    NEGATIVE_OUTCOME = params.negative_outcome

    compas_df = pd.read_csv("./Fooling-LIME-SHAP/data/compas-scores-two-years.csv", index_col=0)
    compas_df = compas_df.loc[(compas_df['days_b_screening_arrest'] <= 30) &
                              (compas_df['days_b_screening_arrest'] >= -30) &
                              (compas_df['is_recid'] != -1) &
                              (compas_df['c_charge_degree'] != "O") &
                              (compas_df['score_text'] != "NA")]

    compas_df['length_of_stay'] = (
                pd.to_datetime(compas_df['c_jail_out']) - pd.to_datetime(compas_df['c_jail_in'])).dt.days
    X = compas_df[['age', 'two_year_recid', 'c_charge_degree', 'race', 'sex', 'priors_count', 'length_of_stay']]

    # if person has high score give them the _negative_ model outcome
    y = np.array([NEGATIVE_OUTCOME if score == 'High' else POSITIVE_OUTCOME for score in compas_df['score_text']])
    sens = X.pop('race')

    # assign African-American as the protected class
    X = pd.get_dummies(X)
    sensitive_attr = np.array(pd.get_dummies(sens).pop('African-American'))
    X['race'] = sensitive_attr

    # make sure everything is lining up
    assert all((sens == 'African-American') == (X['race'] == PROTECTED_CLASS))
    cols = [col for col in X]

    return X, y, cols


def get_and_preprocess_cc(params):
    """Handle processing of Communities and Crime. We drop rows with a missing target and columns
    with missing values, and predict whether the violent crime rate is above the 50th percentile.

    Parameters
    ----------
    params : Params
    Returns
    ----------
    Pandas data frame X of processed data, np.ndarray y, and list of column names
    """
    PROTECTED_CLASS = params.protected_class
    UNPROTECTED_CLASS = params.unprotected_class
    POSITIVE_OUTCOME = params.positive_outcome
    NEGATIVE_OUTCOME = params.negative_outcome

    X = pd.read_csv("./Fooling-LIME-SHAP/data/communities_and_crime_new_version.csv", index_col=0)

    # everything over the 50th percentile gets the negative outcome (lots of crime is bad)
    high_violent_crimes_threshold = 50
    y_col = 'ViolentCrimesPerPop numeric'

    X = X[X[y_col] != "?"]
    X[y_col] = X[y_col].values.astype('float32')

    # just dump all x's that have missing values 
    cols_with_missing_values = []
    for col in X:
        if len(np.where(X[col].values == '?')[0]) >= 1:
            cols_with_missing_values.append(col)

    y = X[y_col]
    y_cutoff = np.percentile(y, high_violent_crimes_threshold)
    X = X.drop(
        cols_with_missing_values + ['communityname string', 'fold numeric', 'county numeric', 'community numeric',
                                    'state numeric'] + [y_col], axis=1)

    # setup ys
    cols = [c for c in X]
    y = np.array([NEGATIVE_OUTCOME if val > y_cutoff else POSITIVE_OUTCOME for val in y])

    return X, y, cols

In [None]:
# Get the data set and do some preprocessing
params = Params("./Fooling-LIME-SHAP/model_configurations/experiment_params.json") 
np.random.seed(params.seed)
X, y, cols = get_and_preprocess_compas_data(params)

# Add a random column -- this is what we'll have LIME/SHAP explain.
X['unrelated_column'] = np.random.choice([0,1],size=X.shape[0])
features = [c for c in X]

categorical_feature_name = ['two_year_recid', 'c_charge_degree_F', 'c_charge_degree_M',\
                            'sex_Female', 'sex_Male', 'race', 'unrelated_column']

categorical_feature_indcs = [features.index(c) for c in categorical_feature_name]

race_indc = features.index('race')
unrelated_indcs = features.index('unrelated_column')
X = X.values

<a id="racist"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Defining Racist Model</b></div>

<p style="text-align: justify;">Next, we define the two models that the attack combines, <code>f</code> and <code>psi</code>. <code>f</code> is the biased model that the adversary actually wants to deploy; here it decides purely on race. <code>psi</code> is an innocuous model that decides on the unrelated random column and exists only to shape the explanations.</p>

<p style="text-align: justify;">The adversarial model routes each input to one of the two: inputs that look like real data are sent to <code>f</code>, while inputs that look like the synthetic perturbations generated by LIME or SHAP are sent to <code>psi</code>. Predictions on real data therefore stay biased, but an explainer that probes the model with perturbed samples mostly sees <code>psi</code> and attributes the decisions to the unrelated feature.</p>
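
As a rough illustration of how a biased model and an innocuous model can be combined, the following minimal sketch puts two toy decision rules behind a hand-written in-distribution check. The names `ScaffoldedModel`, `f_biased`, `psi_innocuous` and `looks_real` are hypothetical, and the check itself (real rows contain only exact 0/1 values) is an assumption made for this toy example; the adversarial classes in the Fooling-LIME-SHAP repository learn this step from data instead.

```python
import numpy as np

COL_SENSITIVE, COL_UNRELATED = 0, 1  # hypothetical column layout for this toy example

def f_biased(X):
    # Biased rule: the prediction depends only on the sensitive column.
    return (X[:, COL_SENSITIVE] > 0.5).astype(int)

def psi_innocuous(X):
    # Innocuous rule: the prediction depends only on the unrelated column.
    return (X[:, COL_UNRELATED] > 0.5).astype(int)

def looks_real(X):
    # Assumed toy detector: real rows contain only exact 0/1 values,
    # while explainer perturbations are continuous.
    return np.all(np.isin(X, (0.0, 1.0)), axis=1)

class ScaffoldedModel:
    """Routes in-distribution rows to f and everything else to psi."""

    def __init__(self, f, psi, detector):
        self.f, self.psi, self.detector = f, psi, detector

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.where(self.detector(X), self.f(X), self.psi(X))

model = ScaffoldedModel(f_biased, psi_innocuous, looks_real)

X_real = np.array([[1, 0], [0, 1]])                 # real-looking rows
X_perturbed = np.array([[0.3, 0.9], [1.2, -0.4]])   # perturbation-like rows

pred_real = model.predict(X_real)            # follows the sensitive column
pred_perturbed = model.predict(X_perturbed)  # follows the unrelated column
print(pred_real, pred_perturbed)
```

An explainer that only ever queries rows like `X_perturbed` would see predictions driven by the unrelated column, even though every real row is classified by the sensitive one.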


In [None]:
class racist_model_f:
    # Decision rule: classify negatively if race is black
    def predict(self,X):
        return np.array([params.negative_outcome if x[race_indc] > 0 else params.positive_outcome for x in X])

    def predict_proba(self, X): 
        return one_hot_encode(self.predict(X))

    def score(self, X,y):
        return np.sum(self.predict(X)==y) / len(X)
    
class innocuous_model_psi:
    # Decision rule: classify according to randomly drawn column 'unrelated column'
    def predict(self,X):
        return np.array([params.negative_outcome if x[unrelated_indcs] > 0 else params.positive_outcome for x in X])

    def predict_proba(self, X): 
        return one_hot_encode(self.predict(X))

    def score(self, X,y):
        return np.sum(self.predict(X)==y) / len(X)

In [None]:
# Split the data and normalize
xtrain,xtest,ytrain,ytest = train_test_split(X,y)
xtest_not_normalized = deepcopy(xtest)
ss = StandardScaler().fit(xtrain)
xtrain = ss.transform(xtrain)
xtest = ss.transform(xtest)

# Train the adversarial model for LIME with f and psi 
adv_lime = Adversarial_Lime_Model(racist_model_f(), innocuous_model_psi()).\
            train(xtrain, ytrain, feature_names=features, categorical_features=categorical_feature_indcs)

In [None]:
# Pick a random example from the test set
ex_indc = np.random.choice(xtest.shape[0])

# To get a baseline, we'll look at LIME applied to the biased model f
normal_explainer = lime.lime_tabular.LimeTabularExplainer(xtrain,feature_names=adv_lime.get_column_names(),
                                                          discretize_continuous=False,
                                                          categorical_features=categorical_feature_indcs)

normal_exp = normal_explainer.explain_instance(xtest[ex_indc], racist_model_f().predict_proba).as_list()

print ("Explanation on biased f:\n",normal_exp[:3],"\n\n")

# Now, lets look at the explanations on the adversarial model 
adv_explainer = lime.lime_tabular.LimeTabularExplainer(xtrain,feature_names=adv_lime.get_column_names(), 
                                                       discretize_continuous=False,
                                                       categorical_features=categorical_feature_indcs)

adv_exp = adv_explainer.explain_instance(xtest[ex_indc], adv_lime.predict_proba).as_list()

print ("Explanation on adversarial model:\n",adv_exp[:3],"\n")

print("Prediction fidelity: {0:3.2}".format(adv_lime.fidelity(xtest[ex_indc:ex_indc+1])))

In [None]:
# Train the adversarial model for SHAP
adv_shap = Adversarial_Kernel_SHAP_Model(racist_model_f(), innocuous_model_psi()).\
            train(xtrain, ytrain, feature_names=features)

In [None]:
# Set the background distribution for the shap explainer using kmeans
background_distribution = shap.kmeans(xtrain,10)

# Use the SHAP kernel explainer on the same test instance as the LIME cell above,
# so that the explanations compared later refer to the same point
to_examine = ex_indc

# Explain the biased model
biased_kernel_explainer = shap.KernelExplainer(racist_model_f().predict, background_distribution)
biased_shap_values = biased_kernel_explainer.shap_values(xtest[to_examine:to_examine+1])

# Explain the adversarial model
adv_kernel_explainer = shap.KernelExplainer(adv_shap.predict, background_distribution)
adv_shap_values = adv_kernel_explainer.shap_values(xtest[to_examine:to_examine+1])

# Plot it using SHAP's plotting features.
shap.summary_plot(biased_shap_values, feature_names=features, plot_type="bar")
shap.summary_plot(adv_shap_values, feature_names=features, plot_type="bar")

print ("Fidelity: {0:3.2}".format(adv_shap.fidelity(xtest[to_examine:to_examine+1])))

<a id="xwhy"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Defining XWhy Explainability Functions</b></div>

In [None]:
def Wasserstein_Dist(XX, YY):
    """ECDF-based 1-Wasserstein distance between two 1-D samples."""
    nx = len(XX)
    ny = len(YY)
    n = nx + ny

    # Pool both samples; X2/Y2 hold the ECDF step contributed by each pooled point
    XY = np.concatenate([XX,YY])
    X2 = np.concatenate([np.repeat(1/nx, nx), np.repeat(0, ny)])
    Y2 = np.concatenate([np.repeat(0, nx), np.repeat(1/ny, ny)])

    S_Ind = np.argsort(XY)
    XY_Sorted = XY[S_Ind]
    X2_Sorted = X2[S_Ind]
    Y2_Sorted = Y2[S_Ind]

    Res = 0
    E_CDF = 0
    F_CDF = 0
    power = 1

    # Integrate |F - E| over the n - 1 intervals between consecutive pooled points
    for ii in range(n - 1):
        E_CDF = E_CDF + X2_Sorted[ii]
        F_CDF = F_CDF + Y2_Sorted[ii]
        height = abs(F_CDF-E_CDF)
        width = XY_Sorted[ii+1] - XY_Sorted[ii]
        Res = Res + (height ** power) * width

    return Res

def WasserstainLIME2(X_input, model, num_perturb = 500, L_num_perturb = 100, kernel_width2 = 0.75, epsilon = 0.1):
    '''
    WasserstainLIME2 is a statistical variant of LIME (Local Interpretable Model-agnostic Explanations)
    that weights perturbations by an ECDF-based (Wasserstein) distance instead of the Euclidean distance.
    
    X_input: a 1-D numpy array that represents one point in an n-dimensional space.
    
    model: the trained classification or regression model; it must expose predict().
    
    num_perturb: the number of perturbations that the algorithm uses.
    
    L_num_perturb: the number of local samples drawn around the input and around each perturbation.
    
    kernel_width2: the kernel width. When the decision space is very dynamic, use a low value such as 0.2;
    otherwise a kernel width of around 0.75 is a reasonable default.
    
    epsilon: a scaling factor applied to the Wasserstein distances before the kernel is evaluated.
    
    '''
    
    # Standardization; X_input is a single point, so the mean and std are taken across its features
    X_input = (X_input - np.mean(X_input,axis=0)) / np.std(X_input,axis=0)

    X_lime = np.random.normal(0,1,size=(num_perturb,X_input.shape[0]))
    
    Xi2 = np.zeros((L_num_perturb,X_input.shape[0]))
    
    for jj in range(X_input.shape[0]):
        Xi2[:,jj] = X_input[jj] + np.random.normal(0,0.05,L_num_perturb)

    y_lime2  = np.zeros((num_perturb,1))
    WD       = np.zeros((num_perturb,1))
    weights2 = np.zeros((num_perturb,1))
    
    for ind, ii in enumerate(X_lime):
        
        df2 = pd.DataFrame()
        
        for jj in range(X_input.shape[0]):
            temp1 = ii[jj] + np.random.normal(0,0.3,L_num_perturb)
            df2[len(df2.columns)] = temp1

        temp3 = model.predict(df2.to_numpy())
#         print(temp3)

        y_lime2[ind] = np.mean(temp3)  # For classification: np.argmax(np.bincount(temp3))
        
        WD1 = np.zeros((X_input.shape[0],1))
        
        df2 = df2.to_numpy()
        
        for kk in range(X_input.shape[0]):
            #print( df2.shape)
            WD1[kk] = Wasserstein_Dist(Xi2[:,kk], df2[:,kk])
        
        #print(WD1)
        #print(ind)
        WD[ind] = sum(WD1)
        #print(WD)
    
        weights2[ind] = np.sqrt(np.exp(-((epsilon*WD[ind])**2)/(kernel_width2**2))) 
        #print(weights2[ind])
        
        del df2
    
    weights2 = weights2.flatten()
    #print(weights2)
    
    simpler_model2 = LinearRegression() 
    simpler_model2.fit(X_lime, y_lime2, sample_weight=weights2)
    y_linmodel2 = simpler_model2.predict(X_lime)
    y_linmodel2 = y_linmodel2 < 0.5 # Convert to a binary class
    y_linmodel2 = y_linmodel2.flatten()
    
    return X_lime, y_lime2, weights2, y_linmodel2, simpler_model2.coef_.flatten()
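
To make the ECDF-based distance concrete, here is a small self-contained sketch that integrates the absolute difference between two empirical CDFs and compares the result with the sorted-sample formula, which gives the same 1-Wasserstein distance when both samples have equal size. The helper name `ecdf_w1` is hypothetical, and the code is a vectorized restatement for illustration rather than the notebook's own function.

```python
import numpy as np

def ecdf_w1(x, y):
    """1-Wasserstein distance between two 1-D samples via their ECDFs."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    pooled = np.sort(np.concatenate([x, y]))
    widths = np.diff(pooled)  # the n - 1 interval widths between pooled points
    # ECDF values that hold on each interval [pooled[i], pooled[i+1])
    cdf_x = np.searchsorted(x, pooled[:-1], side="right") / len(x)
    cdf_y = np.searchsorted(y, pooled[:-1], side="right") / len(y)
    return float(np.sum(np.abs(cdf_x - cdf_y) * widths))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=200)
b = rng.normal(0.5, 1.0, size=200)

d_ecdf = ecdf_w1(a, b)
# For equal-size samples, W1 equals the mean absolute difference of the sorted values.
d_sorted = float(np.mean(np.abs(np.sort(a) - np.sort(b))))
print(d_ecdf, d_sorted)

# Shifting a sample by a constant c moves it by a distance of |c|.
d_shift = ecdf_w1(a, a + 2.0)

# Two single-point samples are separated by their absolute difference.
d_single = ecdf_w1([0.0], [1.0])
```

The single-point case is worth checking when `L_num_perturb = 1`, because the distance then reduces to the absolute difference between the two values.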

In [None]:
def xwhy_tabular_wd(X_input, model, num_perturb = 500, L_num_perturb = 100, kernel_width2 = 0.75, epsilon = 0.1):
    '''
    xwhy_tabular_wd is a statistical variant of LIME (Local Interpretable Model-agnostic Explanations)
    that weights perturbations by an ECDF-based (Wasserstein) distance instead of the Euclidean distance.
    
    X_input: a 1-D numpy array that represents one point in an n-dimensional space.
    
    model: the trained classification or regression model; it must expose predict().
    
    num_perturb: the number of perturbations that the algorithm uses.
    
    L_num_perturb: the number of local samples drawn around the input and around each perturbation.
    
    kernel_width2: the kernel width. When the decision space is very dynamic, use a low value such as 0.2;
    otherwise a kernel width of around 0.75 is a reasonable default.
    
    epsilon: a scaling factor applied to the Wasserstein distances before the kernel is evaluated.
    
    '''
    
    # Standardization; X_input is a single point, so the mean and std are taken across its features
    X_input = (X_input - np.mean(X_input,axis=0)) / np.std(X_input,axis=0)

    X_lime = np.random.normal(0,1,size=(num_perturb,X_input.shape[0]))
    
    Xi2 = np.zeros((L_num_perturb,X_input.shape[0]))
    
    for jj in range(X_input.shape[0]):
        Xi2[:,jj] = X_input[jj] + np.random.normal(0,0.001,L_num_perturb)

    y_lime2  = np.zeros((num_perturb,1))
    WD       = np.zeros((num_perturb,1))
    weights2 = np.zeros((num_perturb,1))
    
    for ind, ii in enumerate(X_lime):
        
        df2 = pd.DataFrame()
        
        for jj in range(X_input.shape[0]):
            temp1 = ii[jj] + np.random.normal(0,0.001,L_num_perturb)
            df2[len(df2.columns)] = temp1

        temp3 = model.predict(df2.to_numpy())
#         print(temp3)

        y_lime2[ind] = np.mean(temp3)  # For classification: np.argmax(np.bincount(temp3))
        
        WD1 = np.zeros((X_input.shape[0],1))
        
        df2 = df2.to_numpy()
        
        for kk in range(X_input.shape[0]):
            #print( df2.shape)
            WD1[kk] = Wasserstein_Dist(Xi2[:,kk], df2[:,kk])
        
        #print(WD1)
        #print(ind)
        WD[ind] = sum(WD1)
        #print(WD)
    
        weights2[ind] = np.sqrt(np.exp(-((epsilon*WD[ind])**2)/(kernel_width2**2))) 
        #print(weights2[ind])
        
        del df2
    
    weights2 = weights2.flatten()
    #print(weights2)
    
    simpler_model2 = LinearRegression() 
    simpler_model2.fit(X_lime, y_lime2, sample_weight=weights2)
    y_linmodel2 = simpler_model2.predict(X_lime)
    y_linmodel2 = y_linmodel2 < 0.5 # Convert to a binary class
    y_linmodel2 = y_linmodel2.flatten()
    
    return X_lime, y_lime2, weights2, y_linmodel2, simpler_model2.coef_.flatten()
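
The effect of `kernel_width2` and `epsilon` is easiest to see by evaluating the weighting formula on a few distances. The sketch below restates the kernel from the function above as a standalone helper (the name `wd_kernel_weight` and the example distances are made up for illustration). Since `sqrt(exp(-a))` equals `exp(-a / 2)`, the weight is a Gaussian kernel in the scaled distance.

```python
import numpy as np

def wd_kernel_weight(wd, kernel_width=0.75, epsilon=0.1):
    """Weight of a perturbation given its Wasserstein distance to the input."""
    wd = np.asarray(wd, dtype=float)
    return np.sqrt(np.exp(-((epsilon * wd) ** 2) / kernel_width ** 2))

distances = np.array([0.0, 1.0, 5.0, 10.0, 20.0])  # illustrative values only

w_wide = wd_kernel_weight(distances, kernel_width=0.75, epsilon=0.1)
w_narrow = wd_kernel_weight(distances, kernel_width=0.2, epsilon=0.1)

# Equivalent Gaussian form: exp(-(epsilon * wd)^2 / (2 * kernel_width^2))
w_gauss = np.exp(-((0.1 * distances) ** 2) / (2 * 0.75 ** 2))

for d, ww, wn in zip(distances, w_wide, w_narrow):
    print(f"WD={d:5.1f}  weight(kw=0.75)={ww:.4f}  weight(kw=0.2)={wn:.4f}")
```

A narrower kernel concentrates the surrogate fit on perturbations very close to the explained point, while a larger `epsilon` has the same effect as shrinking the kernel width.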

In [None]:
%%time
X_lime, y_lime2, weights2, y_linmodel2, WLIME_Coef1 = xwhy_tabular_wd(xtest[ex_indc].flatten(), 
                                                                      num_perturb = 10000, 
                                                                      kernel_width2 = 0.1, 
                                                                      L_num_perturb = 1,
                                                                      model = racist_model_f(), 
                                                                      epsilon = 1.5)



print(WLIME_Coef1)

df3 = pd.DataFrame({'WDL': WLIME_Coef1, 'feature_names': features})

fig = px.bar(df3, x='WDL', y='feature_names', orientation='h', color='feature_names')
fig.show()

In [None]:
%%time
X_lime, y_lime2, weights2, y_linmodel2, WLIME_Coef2 = xwhy_tabular_wd(xtest[ex_indc].flatten(), 
                                                                      num_perturb = 1000, 
                                                                      kernel_width2 = 0.6, 
                                                                      L_num_perturb = 100,
                                                                      model = adv_lime, 
                                                                      epsilon = 0.3)
print(WLIME_Coef2)

df3 = pd.DataFrame({'WDL': WLIME_Coef2, 'feature_names': features})

fig = px.bar(df3, x='WDL', y='feature_names', orientation='h', color='feature_names')
fig.show()

<a id="before_AA"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Comparing XWhy with LIME and SHAP - Before Attack</b></div>

In [None]:
xg_model = xgboost.XGBClassifier().fit(xtrain, ytrain)

# Explain the same test instance that XWhy and LIME explain below
explainer = shap.Explainer(xg_model)
xg_shap_values = explainer(xtest[ex_indc:ex_indc+1])

In [None]:
X_lime, y_lime2, weights2, y_linmodel2, WLIME_Coef_xg = xwhy_tabular_wd(xtest[ex_indc].flatten(), 
                                                                      num_perturb = 5000, 
                                                                      kernel_width2 = 0.75, 
                                                                      L_num_perturb = 10,
                                                                      model = xg_model, 
                                                                      epsilon = 0.3)

In [None]:
LIME_xg = lime.lime_tabular.LimeTabularExplainer(xtrain,feature_names=adv_lime.get_column_names(),
                                                 discretize_continuous=False,
                                                 categorical_features=categorical_feature_indcs)

LIME_xg_exp = LIME_xg.explain_instance(xtest[ex_indc], xg_model.predict_proba).as_list()

In [None]:
# LIME returns (feature description, weight) pairs sorted by importance. Categorical
# features are described as "name=value", so keep only the part before "=".
lime_features = [name.split('=')[0] for name, _ in LIME_xg_exp][::-1]
exp_simple = [weight for _, weight in LIME_xg_exp][::-1]

In [None]:
!pip install -U kaleido -q

In [None]:
fig = make_subplots(rows=1, cols=3,
                    subplot_titles=("SMILE","SHAP","LIME"),
                    horizontal_spacing = 0.1)

fig.add_trace(
    go.Bar(x=WLIME_Coef_xg, y=features, orientation='h', 
           marker=dict(color=np.argsort(WLIME_Coef_xg), coloraxis="coloraxis")),
    row=1, col=1
)

fig.add_trace(
    go.Bar(x=xg_shap_values[0].values, y=features, orientation='h', 
           marker=dict(color=np.argsort(xg_shap_values[0].values), coloraxis="coloraxis")),
    row=1, col=2
)

fig.add_trace(
    go.Bar(x=exp_simple, y=lime_features, orientation='h', 
           marker=dict(color=np.argsort(exp_simple), coloraxis="coloraxis")),
    row=1, col=3
)

fig.update_layout(height=600, width=1500, title_text="Comparing SHAP, LIME and SMILE without Adversarial Attack on COMPAS Dataset",
                  coloraxis=dict(colorscale='Bluered_r'), showlegend=False) # px.colors.sequential.Viridis
fig.show()

# fig.write_image("images/before_AA.png")

<a id="AA"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Comparing XWhy with LIME and SHAP - After Adversarial Attack</b></div>

In [None]:
# Reorder the LIME weights for the biased model f to match `features`
# (LIME describes categorical features as "name=value")
racist_weights = {name.split('=')[0]: weight for name, weight in normal_exp}
exp_racist = [racist_weights.get(name, 0.0) for name in features]

fig = make_subplots(rows=1, cols=3,
                    subplot_titles=("SMILE","SHAP","LIME"),
                    horizontal_spacing = 0.1)

fig.add_trace(
    go.Bar(x=WLIME_Coef1, y=features, orientation='h', 
           marker=dict(color=np.argsort(WLIME_Coef1), coloraxis="coloraxis")),
    row=1, col=1
)

fig.add_trace(
    go.Bar(x=biased_shap_values[0], y=features, orientation='h', 
           marker=dict(color=np.argsort(biased_shap_values[0]), coloraxis="coloraxis")),
    row=1, col=2
)

fig.add_trace(
    go.Bar(x=exp_racist, y=features, orientation='h', 
           marker=dict(color=np.argsort(exp_racist), coloraxis="coloraxis")),
    row=1, col=3
)

# Shared font for the title and the x-axis tick labels
plot_font = dict(family="Courier New Bold, monospace", size=22, color="RebeccaPurple")

fig.update_layout(
    height=600, width=1500,
    title=dict(text="Comparing SHAP, LIME and SMILE for a Racist Model", font=plot_font),
    xaxis=dict(tickfont=plot_font),
    xaxis2=dict(tickfont=plot_font),
    xaxis3=dict(tickfont=plot_font),
    coloraxis=dict(colorscale='Bluered_r'),
    showlegend=False
)

fig.show()

# fig.write_image("images/Racist.png")

In [None]:
# Reorder the LIME weights for the adversarial model to match `features`
# (LIME describes categorical features as "name=value")
adv_weights = {name.split('=')[0]: weight for name, weight in adv_exp}
exp_adv = [adv_weights.get(name, 0.0) for name in features]

fig = make_subplots(rows=1, cols=3,
                    subplot_titles=("SMILE","SHAP","LIME"),
                    horizontal_spacing = 0.1)

fig.add_trace(
    go.Bar(x=WLIME_Coef2, y=features, orientation='h', 
           marker=dict(color=np.argsort(WLIME_Coef2), coloraxis="coloraxis")),
    row=1, col=1
)

fig.add_trace(
    go.Bar(x=np.array(adv_shap_values[0]), y=features, orientation='h', 
           marker=dict(color=np.argsort(adv_shap_values[0]), coloraxis="coloraxis")),
    row=1, col=2
)

fig.add_trace(
    go.Bar(x=exp_adv, y=features, orientation='h', 
           marker=dict(color=np.argsort(exp_adv), coloraxis="coloraxis")),
    row=1, col=3
)

# Shared font for the title and the x-axis tick labels
plot_font = dict(family="Courier New Bold, monospace", size=22, color="RebeccaPurple")

fig.update_layout(
    height=600, width=1500,
    title=dict(text="Comparing SHAP, LIME and SMILE against Adversarial Attack - Unrelated Feature",
               font=plot_font),
    xaxis=dict(tickfont=plot_font),
    xaxis2=dict(tickfont=plot_font),
    xaxis3=dict(tickfont=plot_font),
    coloraxis=dict(colorscale='Bluered_r'),
    showlegend=False
)

fig.show()

# fig.write_image("images/Unrealated.png")

In [None]:
index_unrelated = features.index("unrelated_column")

# LIME weights in the order of `features` (LIME describes categorical features as "name=value")
lime_adv_weights = {name.split('=')[0]: weight for name, weight in adv_exp}
lime_adv = np.array([lime_adv_weights.get(name, 0.0) for name in features])

# Share of the total absolute attribution assigned to "unrelated_column" by SMILE, SHAP and LIME
print(np.abs(WLIME_Coef2[index_unrelated]) / np.sum(np.abs(WLIME_Coef2)))
print(np.abs(adv_shap_values[0][index_unrelated]) / np.sum(np.abs(adv_shap_values[0])))
print(np.abs(lime_adv[index_unrelated]) / np.sum(np.abs(lime_adv)))

<center> <a href="#top" role="button" aria-pressed="true" >⬆️Back to Table of Contents ⬆️</a>

<div style="border-radius:10px;border:#034914 solid;padding: 15px;background-color:aliceblue;font-size:90%;text-align:left">

<h4><b>Author :</b> Koorosh Aslansefat </h4>

<h4> <b>Some information:</b> </h4>

<b>👉Check my Kaggle Notebooks :</b> https://www.kaggle.com/kooaslansefat <br>
<b>👉Contact Me :</b> <a href="mailto:koo.ec2008@gmail.com">koo.ec2008@gmail.com</a><br>
<b>👉Find me on LinkedIn :</b> www.linkedin.com/in/koorosh-aslansefat <br>
<b>👉Find me on GitHub :</b> https://github.com/koo-ec <br>
    
    
<center> <strong> If you liked this Notebook, please do upvote. </strong>
    
<center> <strong> If you have any questions, feel free to contact me! </strong>
    
<center> <strong> ✨Best Wishes✨ </strong>

<center> <img src="https://raw.githubusercontent.com/ntclai/PictureForMyProject/main/87481-of-thanks-letter-text-logo-calligraphy-drawing%20(1).png" style='width: 600px; height: 300px;'>