# ONE HOT NEURAL NETWORK (OHNN)

We present here One Hot Neural Network, which works as following

There are $K$ dense blocks that receive input data, each with the same number of neurons $N$. Each neuron has $m$ weights (one for each input feature of the data); however, only the one with highest absolute value is kept, while others are set to zero. So, for each block, each neuron selects a feature and assigns a weight to it. At this point, each block is summed up, with the addition of a bias, o form a rule as:

Block $k$ has rule

$r_k = w_{k,1} * a_{\beta_1} + w_{k, 2} * a_{\beta_2} + ... + w_{k,N} * a_{\beta_N} + b_k$

Where $a_{\beta_n}$ indicates the feature selected by neuron $n$, associated with the respective weight $w_{k,n}$.

Each rule passes through a sigmoid activation function, which we assume it captures the probability to respect the rule.

$p(r_k = 1) = Sigmoid(r_k)$

Each rule probability from $K$ blocks is then weighted by a single weight, called $W_{f_k}$ for probability of rule $k$. After assigning weight to each rule, all rules are summed:

$Out = W_{f_1} * p(r_1) + W_{f_2} * p(r_2) + ... + W_{f_K} * p(r_K)$

Result is finally passed through a sigmoid function to obtain probability of output being 1

$p(Out = 1) = Sigmoid(Out)$

### Dense Block

The dense block is set with N neurons, so, defined with $W$ as the weight matrix, it will have dimensions $m \times N$, with $m$ being the number of input features. In the forward pass, instead of having $m$ weights with different values, each neuron will have the weight with the highest absolute value that retains its value, while the other $m-1$ weights are set to zero. The output of a single neuron changes from:

$w_1 * a_1 + w_2 * a_2 + ... + w_m * a_m$

to

$w_i * a_i$

Where $i$ corresponds to the weight with the highest absolute value. However, it should be noted that during backpropagation, this modification is ignored, so it is possible that at some point during training, a weight that surpasses the absolute value of weight $i$ takes its place, while the latter becomes zero.


### Aggregation of the Dense Block: Rule

Through the aggregation of the output of the dense block, it is possible to obtain a rule of the form:

$r_k = w_{k,1} * a_{\beta_1} + w_{k,2} * a_{\beta_2} + ... + w_{k,N} * a_{\beta_N} + b_k$

Where $a_{\beta_n}$ indicates the feature selected by neuron $n$, associated with the respective weight $w_{k,n}$.

Each rule passes through a sigmoid activation function, which we assume it captures the probability to respect the rule.

$p(r_k = 1) = Sigmoid(r_k)$


### Weighting of Rules

At this point we have $K$ probabilities of comply respective rule. For each of these probabilities, a weight is assigned, called $W_{f_k}$ for rule $k$. 

$Out = W_{f_1} * p(r_1) + W_{f_2} * p(r_2) + ... + W_{f_K} * p(r_K)$

Notice we applied L1 regularization to rule's weights, such that rules adding little information or redundant rules are invited to be ignored (i.e. zero weight).


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import fede_code.torch_functions as tof
import fede_code.objects as obj
import fede_code.model_trainer as mt
import fede_code.graphics_functions as grafun
from torch.optim import AdamW, RMSprop
from sklearn import svm
from sklearn.metrics import f1_score, accuracy_score
import seaborn as sns
import matplotlib.pyplot as plt
from fede_code.loss_functions import penalizedLoss
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, plot_tree
from tabulate import tabulate

## 1. HEARTH DATASET

In [2]:
data_path = 'C:\\Users\\drikb\\Desktop\\CRISP\\XAI project\\nn\\data\\heart.csv'

headers = ['age', 'sex', 'chest_pain', 'resting_blood_pressure',
           'serum_cholestoral', 'fasting_blood_sugar', 'resting_ecg_results',
           'max_heart_rate_achieved', 'exercise_induced_angina', 'oldpeak', "slope of the peak",
           'num_of_major_vessels', 'thal', 'heart_disease']

df_heart = pd.read_csv(data_path)

print(df_heart.head())

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   63    1   3       145   233    1        0      150      0      2.3      0   
1   37    1   2       130   250    0        1      187      0      3.5      0   
2   41    0   1       130   204    0        0      172      0      1.4      2   
3   56    1   1       120   236    0        1      178      0      0.8      2   
4   57    0   0       120   354    0        1      163      1      0.6      2   

   ca  thal  target  
0   0     1       1  
1   0     2       1  
2   0     2       1  
3   0     2       1  
4   0     2       1  


In [3]:
x = df_heart.drop(columns=['target'])
y = df_heart['target'].values

hearth_x_train, hearth_x_test, hearth_y_train, hearth_y_test = train_test_split(x, y, test_size=0.3, random_state=42)

#standardize the dataset
sc = StandardScaler()
sc.fit(hearth_x_train)
hearth_x_train = sc.transform(hearth_x_train)
hearth_x_test = sc.transform(hearth_x_test)

In [4]:
hearth_train_set = mt.TrainDataset(torch.tensor(hearth_x_train, dtype=torch.float32),
                     torch.tensor(hearth_y_train, dtype=torch.float32))

hearth_test_set = mt.TrainDataset(torch.tensor(hearth_x_test, dtype=torch.float32),
                           torch.tensor(hearth_y_test, dtype=torch.float32))

hearth_train_loader = DataLoader(hearth_train_set, batch_size=32, shuffle=True)
hearth_test_loader = DataLoader(hearth_test_set, batch_size=32, shuffle=False)


## 2. PREDICTIVE MODELS

### 2.1 Support Vector Machine (SVM)

In [5]:
svm_clf = svm.SVC(kernel='linear')
svm_clf.fit(hearth_x_train,hearth_y_train)
predicted_svm = svm_clf.predict(hearth_x_test)
f1_0_svm, f1_1_svm = f1_score(hearth_y_test,predicted_svm,
                                 average = None,
                                zero_division = 0,
                                labels = np.array([0, 1]))
accuracy_svm = accuracy_score(hearth_y_test, predicted_svm)

print('f1 on 0 score SVM:', f1_1_svm)
print('f1 on 1 score SVM:', f1_0_svm)
print('accuracy SVM:', accuracy_svm)

f1 on 0 score SVM: 0.8316831683168315
f1 on 1 score SVM: 0.7901234567901235
accuracy SVM: 0.8131868131868132


In [6]:
# build the dataset of predictions to perform explanation
pred_set_svm = mt.TrainDataset(torch.tensor(hearth_x_test, dtype=torch.float32),
                               torch.tensor(predicted_svm, dtype=torch.float32))

pred_loader_svm = DataLoader(pred_set_svm, batch_size=16, shuffle=True)

### 2.2 Deep Neural Network (DNN)

In [7]:
class deepNN(nn.Module):
    def __init__(self, in_features):
        super(deepNN, self).__init__()
        self.linear_1 = nn.Linear(in_features, 128)
        self.act_1 = nn.LeakyReLU()
        self.linear_2 = nn.Linear(128, 64)
        self.act_2 = nn.LeakyReLU()
        self.linear_out = nn.Linear(64, 1)
        self.final_act = nn.Sigmoid()
        self.num_rules = 0 #<--- this allows to use model trainer, must be fixed

    def forward(self, x):
        x = self.linear_1(x)
        x = self.act_1(x)
        x = self.linear_2(x)
        x = self.act_2(x)
        x = self.linear_out(x)
        x = self.final_act(x)
        return x

In [8]:
dnn = deepNN(in_features = 13)

loss_fn = penalizedLoss(loss_fn = nn.BCELoss(reduction='sum'), #<--- reduction='sum' is mandatory
                        parameters = dnn.parameters(),
                        l1_lambda = 0.5,
                        l2_lambda = 0.5)

if torch.cuda.is_available():
    device = torch.device('cuda:0')
    print('GPU')
else:
    device = torch.device('cpu')
    print('CPU')


trainer = mt.modelTrainer(model=dnn,
                          loss_fn=loss_fn,
                          optimizer=AdamW)

trainer.train_model(train_data_loader=hearth_train_loader,
                    num_epochs=1000,
                    device=device,
                    learning_rate=1e-5,
                    display_log='epoch',
                   store_weights_history=False,
                   store_weights_grad=False)


CPU


  0%|          | 0/1000 [00:00<?, ?it/s]

In [9]:
loss_dnn, accuracy_dnn, f1_0_dnn, f1_1_dnn = trainer.eval_model(hearth_test_loader, device=device)
print(f'Average loss: {loss_dnn}')
print(f'Accuracy: {accuracy_dnn}')
print(f'F1 score on label 0: {f1_0_dnn}')
print(f'F1 score on label 1: {f1_1_dnn}')

  0%|          | 0/3 [00:00<?, ?it/s]

Average loss: 0.4506212590814947
Accuracy: 0.8461538461538461
F1 score on label 0: 0.815311004784689
F1 score on label 1: 0.8612993584530586


In [10]:
predicted_dnn = trainer.predict(hearth_test_loader, device=device, prob=False)

  0%|          | 0/3 [00:00<?, ?it/s]

In [11]:
# build the dataset of predictions to perform explanation
pred_set_dnn = mt.TrainDataset(torch.tensor(hearth_x_test, dtype=torch.float32),
                               torch.tensor(predicted_dnn.detach().numpy(), dtype=torch.float32))

pred_loader_dnn = DataLoader(pred_set_dnn, batch_size=16, shuffle=True)

## 3. EXPLANATION MODELS

### 3.1 LOGISTIC REGRESSION (LR)

In [12]:
log_reg = LogisticRegression()

log_reg_svm = log_reg.fit(hearth_x_test, predicted_svm)
log_reg_dnn = log_reg.fit(hearth_x_test, predicted_dnn)

log_reg_pred_svm = log_reg_svm.predict(hearth_x_test)
log_reg_pred_dnn = log_reg_dnn.predict(hearth_x_test)

In [13]:
# SVM
log_reg_accuracy_svm = (log_reg_pred_svm == predicted_svm).sum()/len(predicted_svm)
log_reg_f1_0_svm, log_reg_f1_1_svm = f1_score(predicted_svm,log_reg_pred_svm,
                                              average = None,
                                              zero_division = 0,
                                              labels = np.array([0, 1]))

# DNN
log_reg_accuracy_dnn = (log_reg_pred_dnn == predicted_dnn.detach().numpy()).sum()/len(predicted_dnn)
log_reg_f1_0_dnn, log_reg_f1_1_dnn = f1_score(predicted_dnn,log_reg_pred_dnn,
                                              average = None,
                                              zero_division = 0,
                                              labels = np.array([0, 1]))

print('Explanation accuracy on SVM:', log_reg_accuracy_svm)
print('Explanation f1 of 0 on SVM:', log_reg_f1_0_svm)
print('Explanation f1 of 1 on SVM:', log_reg_f1_1_svm)

print('Explanation accuracy on DNN:', log_reg_accuracy_dnn)
print('Explanation f1 of 0 on DNN:', log_reg_f1_0_dnn)
print('Explanation f1 of 1 on DNN:', log_reg_f1_1_dnn)

Explanation accuracy on SVM: 0.967032967032967
Explanation f1 of 0 on SVM: 0.9629629629629629
Explanation f1 of 1 on SVM: 0.9702970297029702
Explanation accuracy on DNN: 1.0
Explanation f1 of 0 on DNN: 1.0
Explanation f1 of 1 on DNN: 1.0


### 3.2 DECISION TREE (DT)

In [14]:
dt = DecisionTreeClassifier(criterion='gini',
                            max_depth = 4,
                            max_leaf_nodes = 16)

dt_svm = dt.fit(hearth_x_test, predicted_svm)
dt_dnn = dt.fit(hearth_x_test, predicted_dnn)

dt_pred_svm = dt_svm.predict(hearth_x_test)
dt_pred_dnn = dt_dnn.predict(hearth_x_test)

In [15]:
# SVM
dt_accuracy_svm = (dt_pred_svm == predicted_svm).sum()/len(predicted_svm)
dt_f1_0_svm, dt_f1_1_svm = f1_score(predicted_svm,dt_pred_svm,
                                              average = None,
                                              zero_division = 0,
                                              labels = np.array([0, 1]))

# DNN
dt_accuracy_dnn = (dt_pred_dnn == predicted_dnn.detach().numpy()).sum()/len(predicted_dnn)
dt_f1_0_dnn, dt_f1_1_dnn = f1_score(predicted_dnn,dt_pred_dnn,
                                              average = None,
                                              zero_division = 0,
                                              labels = np.array([0, 1]))

print('Explanation accuracy on SVM:', dt_accuracy_svm)
print('Explanation f1 of 0 on SVM:', dt_f1_0_svm)
print('Explanation f1 of 1 on SVM:', dt_f1_1_svm)

print('Explanation accuracy on DNN:', dt_accuracy_dnn)
print('Explanation f1 of 0 on DNN:', dt_f1_0_dnn)
print('Explanation f1 of 1 on DNN:', dt_f1_1_dnn)

Explanation accuracy on SVM: 0.967032967032967
Explanation f1 of 0 on SVM: 0.9629629629629629
Explanation f1 of 1 on SVM: 0.9702970297029702
Explanation accuracy on DNN: 0.9560439560439561
Explanation f1 of 0 on DNN: 0.9512195121951219
Explanation f1 of 1 on DNN: 0.96


### ONE HOT NEURAL NETWORK (OHNN)

In [16]:
ohnn_svm = obj.oneHotRuleNN(input_size=13,
                        num_neurons=4, #tree max depth
                        num_rules=16, #tree max leaves
                        final_activation=nn.Sigmoid,
                        rule_weight_constraint=tof.getMax,
                        out_weight_constraint=None,
                        rule_bias=True,
                        neuron_bias=False,
                        rule_activation=nn.Sigmoid,
                        neuron_activation=None,
                        out_bias=False,
                        force_positive_hidden_init=False,
                        force_positive_out_init=False,
                        dtype=torch.float32)

#loss_fn = nn.BCELoss(reduction='sum')
loss_fn = penalizedLoss(loss_fn = nn.BCELoss(reduction='sum'), #<--- reduction='sum' is mandatory
                        parameters = ohnn_svm.rule_weight.linear.weight,
                        l1_lambda = 0.3,
                        l2_lambda = 0.0)

if torch.cuda.is_available():
    device = torch.device('cuda:0')
    print('GPU')
else:
    device = torch.device('cpu')
    print('CPU')


trainer_svm = mt.modelTrainer(model=ohnn_svm,
                          loss_fn=loss_fn,
                          optimizer=AdamW)

trainer_svm.train_model(train_data_loader=pred_loader_svm,
                    num_epochs=1000,
                    device=device,
                    learning_rate=1e-3,
                    display_log='epoch')


CPU


  0%|          | 0/1000 [00:00<?, ?it/s]

In [17]:
ohnn_dnn = obj.oneHotRuleNN(input_size=13,
                        num_neurons=4, #tree max depth
                        num_rules=16, #tree max leaves
                        final_activation=nn.Sigmoid,
                        rule_weight_constraint=tof.getMax,
                        out_weight_constraint=None,
                        rule_bias=True,
                        neuron_bias=False,
                        rule_activation=nn.Sigmoid,
                        neuron_activation=None,
                        out_bias=False,
                        force_positive_hidden_init=False,
                        force_positive_out_init=False,
                        dtype=torch.float32)

#loss_fn = nn.BCELoss(reduction='sum')
loss_fn = penalizedLoss(loss_fn = nn.BCELoss(reduction='sum'), #<--- reduction='sum' is mandatory
                        parameters = ohnn_dnn.rule_weight.linear.weight,
                        l1_lambda = 0.3,
                        l2_lambda = 0.0)

if torch.cuda.is_available():
    device = torch.device('cuda:0')
    print('GPU')
else:
    device = torch.device('cpu')
    print('CPU')


trainer_dnn = mt.modelTrainer(model=ohnn_dnn,
                          loss_fn=loss_fn,
                          optimizer=AdamW)

trainer_dnn.train_model(train_data_loader=pred_loader_dnn,
                    num_epochs=1000,
                    device=device,
                    learning_rate=1e-3,
                    display_log='epoch')

CPU


  0%|          | 0/1000 [00:00<?, ?it/s]

In [18]:
ohnn_loss_svm, ohnn_accuracy_svm, ohnn_f1_0_svm, ohnn_f1_1_svm = trainer_svm.eval_model(pred_loader_svm, device='cpu')
ohnn_loss_dnn, ohnn_accuracy_dnn, ohnn_f1_0_dnn, ohnn_f1_1_dnn = trainer_dnn.eval_model(pred_loader_dnn, device='cpu')

print('Explanation accuracy on SVM:', ohnn_accuracy_svm)
print('Explanation f1 of 0 on SVM:', ohnn_f1_0_svm)
print('Explanation f1 of 1 on SVM:', ohnn_f1_1_svm)

print('Explanation accuracy on DNN:', ohnn_accuracy_dnn)
print('Explanation f1 of 0 on DNN:', ohnn_f1_0_dnn)
print('Explanation f1 of 1 on DNN:', ohnn_f1_1_dnn)

  0%|          | 0/6 [00:00<?, ?it/s]

  0%|          | 0/6 [00:00<?, ?it/s]

Explanation accuracy on SVM: 1.0
Explanation f1 of 0 on SVM: 1.0
Explanation f1 of 1 on SVM: 1.0
Explanation accuracy on DNN: 0.967032967032967
Explanation f1 of 0 on DNN: 0.9633699633699634
Explanation f1 of 1 on DNN: 0.9727095516569202


## 4. RESULTS

In [24]:
scores = {
    'Logistic regression': {
        'SVM': {'Accuracy': log_reg_accuracy_svm, 'F1 Score (Class 0)': log_reg_f1_0_svm, 'F1 Score (Class 1)': log_reg_f1_1_svm},
        'DNN': {'Accuracy': log_reg_accuracy_dnn, 'F1 Score (Class 0)': log_reg_f1_0_dnn, 'F1 Score (Class 1)': log_reg_f1_1_dnn},
    },
    'Decision tree': {
        'SVM': {'Accuracy': dt_accuracy_svm, 'F1 Score (Class 0)': dt_f1_0_svm, 'F1 Score (Class 1)': dt_f1_1_svm},
        'DNN': {'Accuracy': dt_accuracy_dnn, 'F1 Score (Class 0)': dt_f1_0_dnn, 'F1 Score (Class 1)': dt_f1_1_dnn},
    },
    'OHNN': {
        'SVM': {'Accuracy': ohnn_accuracy_svm, 'F1 Score (Class 0)': ohnn_f1_0_svm, 'F1 Score (Class 1)': ohnn_f1_1_svm},
        'DNN': {'Accuracy': ohnn_accuracy_dnn, 'F1 Score (Class 0)': ohnn_f1_0_dnn, 'F1 Score (Class 1)': ohnn_f1_1_dnn},
    }
}

In [25]:
df_scores = pd.DataFrame.from_dict({(i, j): scores[i][j] 
                              for i in scores.keys() 
                              for j in scores[i].keys()}, orient='index')

# Rename index and columns
df_scores.index.names = ['Model', 'Explained Predictions']
df_scores.reset_index(inplace=True)

# Format the DataFrame to round the values to a certain number of decimal places (e.g., 2)
df_scores = df_scores.round(4)

# Print the table using tabulate
table = tabulate(df_scores, headers='keys', tablefmt='pretty', showindex=False)

print(table)

+---------------------+-----------------------+----------+--------------------+--------------------+
|        Model        | Explained Predictions | Accuracy | F1 Score (Class 0) | F1 Score (Class 1) |
+---------------------+-----------------------+----------+--------------------+--------------------+
| Logistic regression |          SVM          |  0.967   |       0.963        |       0.9703       |
| Logistic regression |          DNN          |   1.0    |        1.0         |        1.0         |
|    Decision tree    |          SVM          |  0.967   |       0.963        |       0.9703       |
|    Decision tree    |          DNN          |  0.956   |       0.9512       |        0.96        |
|        OHNN         |          SVM          |   1.0    |        1.0         |        1.0         |
|        OHNN         |          DNN          |  0.967   |       0.9634       |       0.9727       |
+---------------------+-----------------------+----------+--------------------+------------