# Title: Wisconsin Breast Cancer Database (January 8, 1991)

This breast cancer database was obtained from the University of Wisconsin
   Hospitals, Madison from Dr. William H. Wolberg.  If you publish results
   when using this database, then please include this information in your
   acknowledgements.  Also, please cite one or more of:

   1. O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via linear 
      programming", SIAM News, Volume 23, Number 5, September 1990, pp 1 & 18.

   2. William H. Wolberg and O.L. Mangasarian: "Multisurface method of 
      pattern separation for medical diagnosis applied to breast cytology", 
      Proceedings of the National Academy of Sciences, U.S.A., Volume 87, 
      December 1990, pp 9193-9196.

   3. O. L. Mangasarian, R. Setiono, and W.H. Wolberg: "Pattern recognition 
      via linear programming: Theory and application to medical diagnosis", 
      in: "Large-scale numerical optimization", Thomas F. Coleman and Yuying
      Li, editors, SIAM Publications, Philadelphia 1990, pp 22-30.

   4. K. P. Bennett & O. L. Mangasarian: "Robust linear programming 
      discrimination of two linearly inseparable sets", Optimization Methods
      and Software 1, 1992, 23-34 (Gordon & Breach Science Publishers).

2. Sources:
   -- Dr. William H. Wolberg (physician)
      University of Wisconsin Hospitals
      Madison, Wisconsin
      USA
   -- Donor: Olvi Mangasarian (mangasarian@cs.wisc.edu)
      Received by David W. Aha (aha@cs.jhu.edu)
   -- Date: 15 July 1992

3. Past Usage:

   Attributes 2 through 10 have been used to represent instances.
   Each instance has one of 2 possible classes: benign or malignant.

   1. Wolberg, W. H., & Mangasarian, O. L. (1990). Multisurface method of 
      pattern separation for medical diagnosis applied to breast cytology. In
      Proceedings of the National Academy of Sciences, 87,
      9193-9196.
      -- Size of data set: only 369 instances (at that point in time)
      -- Collected classification results: 1 trial only
      -- Two pairs of parallel hyperplanes were found to be consistent with
         50% of the data
         -- Accuracy on remaining 50% of dataset: 93.5%
      -- Three pairs of parallel hyperplanes were found to be consistent with
         67% of data
         -- Accuracy on remaining 33% of dataset: 95.9%

   2. Zhang, J. (1992). Selecting typical instances in instance-based
      learning.  In Proceedings of the Ninth International Machine
      Learning Conference (pp. 470-479).  Aberdeen, Scotland: Morgan
      Kaufmann.
      -- Size of data set: only 369 instances (at that point in time)
      -- Applied 4 instance-based learning algorithms 
      -- Collected classification results averaged over 10 trials
      -- Best accuracy result: 
         -- 1-nearest neighbor: 93.7%
         -- trained on 200 instances, tested on the other 169
      -- Also of interest:
         -- Using only typical instances: 92.2% (storing only 23.1 instances)
         -- trained on 200 instances, tested on the other 169

# Import libraries

In [None]:
pip install pyod

In [None]:
pip install mlconfig

In [None]:
pip install segmentation_models_3D

In [None]:
!pip install git+https://github.com/jonbarron/robust_loss_pytorch


In [None]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import seaborn as sns; sns.set()
import torch
import mlconfig
import random as rn
import keras
import segmentation_models_3D as sm
import robust_loss_pytorch
import shutil
import pickle  # used by RandAE.load_masks


from pathlib import Path
from time import strftime
from sklearn.metrics import (accuracy_score, auc, classification_report,
                             cohen_kappa_score, confusion_matrix, f1_score,
                             precision_recall_curve,
                             precision_recall_fscore_support, precision_score,
                             recall_score, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import normalize
from tensorflow.keras import layers, losses
from tensorflow.keras.models import Model

Caution: All files were stored in a folder on Drive. To replicate the code, I recommend creating a folder on Drive as well and adjusting the working directory as follows:

```
from google.colab import drive
drive.mount('/content/drive')
```

```
cd /content/drive/MyDrive/<specify here your path>
```



In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import sys

assert sys.version_info >= (3, 7) #python 3.7 or higher recommended

In [None]:
from packaging import version
import sklearn

assert version.parse(sklearn.__version__) >= version.parse("1.0.1") # scikit-learn 1.0.1 or higher recommended

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


assert version.parse(tf.__version__) >= version.parse("2.8.0") # tensorflow 2.8.0 or higher recommended

Adjust the default settings for plots

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.rc('font', size=14)
plt.rc('axes', labelsize=14, titlesize=14)
plt.rc('legend', fontsize=14)
plt.rc('xtick', labelsize=10)
plt.rc('ytick', labelsize=10)

In [None]:
cd /content/drive/MyDrive/Seminar2/ADAPL #specify here your path

In [None]:
ls

Create an image folder to collect all figure outputs

Define a function that automatically saves a plot to a given path.

In [None]:
from pathlib import Path

IMAGES_PATH = Path() / "images" / "breastW"
IMAGES_PATH.mkdir(parents=True, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = IMAGES_PATH / f"{fig_id}.{fig_extension}"
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

In [None]:
ls

In [None]:
tf.keras.backend.clear_session()

# Data Wrapping 

Relevant Information:

   Samples arrive periodically as Dr. Wolberg reports his clinical cases.
   The database therefore reflects this chronological grouping of the data.
   This grouping information appears immediately below, having been removed
   from the data itself:

     Group 1: 367 instances (January 1989)
     Group 2:  70 instances (October 1989)
     Group 3:  31 instances (February 1990)
     Group 4:  17 instances (April 1990)
     Group 5:  48 instances (August 1990)
     Group 6:  49 instances (Updated January 1991)
     Group 7:  31 instances (June 1991)
     Group 8:  86 instances (November 1991)
     -----------------------------------------
     Total:   699 points (as of the donated database on 15 July 1992)

Number of Instances: 699 (as of 15 July 1992)

Number of Attributes: 10 plus the class attribute

Attribute Information: (class attribute has been moved to last column)

   #  Attribute                     Domain
   -- -----------------------------------------
   1. Sample code number            id number
   2. Clump Thickness               1 - 10
   3. Uniformity of Cell Size       1 - 10
   4. Uniformity of Cell Shape      1 - 10
   5. Marginal Adhesion             1 - 10
   6. Single Epithelial Cell Size   1 - 10
   7. Bare Nuclei                   1 - 10
   8. Bland Chromatin               1 - 10
   9. Normal Nucleoli               1 - 10
  10. Mitoses                       1 - 10
  11. Class:                        (2 for benign, 4 for malignant)

In [None]:
PATH = '/content/drive/MyDrive/Seminar2/ADAPL/breastW_data/breast-cancer-wisconsin.txt'
df = pd.read_csv(PATH)
df.head()

In [None]:
df.describe()

In [None]:
df.info()

# Data Cleaning

Drop unnecessary columns

In [None]:
drop_cols = ["Sample code number"]
df.drop(drop_cols, axis=1, inplace=True)

Convert to dummy variable

In [None]:
rep_dict = {2: 0.0, 4: 1.0}
df['class outlier'] = df['class outlier'].replace(rep_dict)  # assign back instead of a chained in-place replace

In [None]:
print(f'Data size is {df.shape}.')

In [None]:
df.dtypes

Missing attribute values: 16

There are 16 instances in Groups 1 to 6 that contain a single missing (i.e., unavailable) attribute value, now denoted by "?" in column "Bare Nuclei"


In [None]:
df.loc[df['Bare Nuclei'] == "?"]

In [None]:
df.drop(df.loc[df['Bare Nuclei']=="?"].index, inplace=True)

In [None]:
df['Bare Nuclei'] = pd.to_numeric(df['Bare Nuclei'])

In [None]:
print(f'Data size is {df.shape}.')

In [None]:
#check for NA-Values 
df.isnull().sum()

# EDA and Visualization

In [None]:
#print boxplot for each variable
fig, axs = plt.subplots(ncols=5, nrows=2, figsize=(20, 10))
index = 0
axs = axs.flatten()
for k,v in df.items():
    sns.boxplot(y=k, data=df, ax=axs[index])
    index += 1
save_fig("boxplots")
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=5.0)

In [None]:
# share of values outside the 1.5 * IQR whiskers, per column
for k, v in df.items():
    q1 = v.quantile(0.25)
    q3 = v.quantile(0.75)
    iqr = q3 - q1
    v_col = v[(v <= q1 - 1.5 * iqr) | (v >= q3 + 1.5 * iqr)]
    perc = np.shape(v_col)[0] * 100.0 / np.shape(df)[0]
    print("Column %s outliers = %.2f%%" % (k, perc))

In [None]:
#print distributions
fig, axs = plt.subplots(ncols=5, nrows=2, figsize=(20, 10))
index = 0
axs = axs.flatten()
for k,v in df.items():
    sns.distplot(v, ax=axs[index])
    index += 1
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=5.0)
save_fig("distrubutions")

Class distribution:
 
   Benign: 458 (65.5%)
   Malignant: 241 (34.5%)

In [None]:
df['class outlier'].value_counts().plot(kind='bar', figsize=(8, 4));
plt.title('Distribution of outliers');
plt.xlabel('Diagnosis');
plt.ylabel('Frequency');
plt.xticks([0.0, 1.0], ['Benign', 'Malignant'], rotation=45);
save_fig("distrubution outliers")

In [None]:
from sklearn.manifold import TSNE
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib versions
import matplotlib.pyplot as plt

sns.set(style='whitegrid', context='notebook')

def tsne_scatter(features, labels, dimensions=2, save_as='graph.png', RANDOM_SEED = 42):
    if dimensions not in (2, 3):
        raise ValueError('tsne_scatter can only plot in 2d or 3d')

    # t-SNE dimensionality reduction
    features_embedded = TSNE(n_components=dimensions, random_state=RANDOM_SEED).fit_transform(features)
    labels = np.asarray(labels)

    # initialising the plot; create the 3D axes directly instead of stacking it on a 2D one
    fig = plt.figure(figsize=(8, 8))
    ax = fig.add_subplot(111, projection='3d' if dimensions == 3 else None)

    # plotting data
    ax.scatter(
        *zip(*features_embedded[labels == 1]),
        marker='o',
        color='r',
        s=2,
        alpha=0.7,
        label='Malignant'
    )
    ax.scatter(
        *zip(*features_embedded[labels == 0]),
        marker='o',
        color='g',
        s=2,
        alpha=0.3,
        label='Benign'
    )
    ax.legend()

    # saving the figure, then displaying it
    sns.set(style='whitegrid', context='notebook')
    save_fig(save_as)
    plt.show()

In [None]:
df['anomaly'] = df['class outlier'] == 1.0
anomaly = df[df['anomaly'] == True]
normal = df[df['anomaly'] == False]

In [None]:
sns.distplot(normal);
sns.distplot(anomaly);

plt.title('normal vs  anomaly Dist.');
plt.ylabel('Dist.');
save_fig("normal vs anomaly distrubution")

# Normalize The Data

In [None]:
data = df.iloc[:, 0:11]
data

In [None]:
# The last two columns contain the labels: 'anomaly' (boolean) and 'class outlier' (0.0/1.0)
labels_bool = df.iloc[:,-1]
labels  = df.iloc[:,-2]
# The other data points are the features
data = df.iloc[:, 0:9]

data = normalize(data) #normalize data
n_features = 9
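
Note that `sklearn.preprocessing.normalize`, called with its defaults as above, rescales each sample (row) to unit L2 norm rather than scaling each feature (column). A minimal NumPy sketch of that row-wise operation follows; `l2_normalize_rows` and the `toy` array are illustrative names and made-up values, not part of the notebook.

```python
import numpy as np

def l2_normalize_rows(x):
    """Divide every row by its Euclidean norm (all-zero rows are left unchanged)."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    return x / norms

toy = np.array([[3.0, 4.0], [1.0, 0.0]])  # made-up values for illustration
toy_normalized = l2_normalize_rows(toy)
print(toy_normalized)
print(np.linalg.norm(toy_normalized, axis=1))  # norm of each row after scaling
```

Because every row ends up with the same length, only the relative proportions of the nine features within a sample are kept.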

In [None]:
tsne_scatter(data, labels, dimensions=2, save_as='tsne_initial_2d')
tsne_scatter(data, labels, dimensions=3, save_as='tsne_initial_3d')

# Split the data

Generate training and test data

In [None]:
train_data, test_data, train_labels, test_labels = train_test_split(
    data, labels_bool,
    test_size=0.2,
    
)

Generate the validation set

In [None]:
train_data_no_validation, train_data_validation, train_data_no_validation_labels, train_data_validation_labels = train_test_split(
    train_data,
    train_labels,
    test_size = 0.2,
)
# only train_data_validation and train_data_validation_labels needed

In [None]:
anomalous_train_data = train_data[train_labels] #training data with outlier-label == True
anomalous_test_data = test_data[test_labels] #test data with outlier-label == True

normal_train_data = train_data[~train_labels] #training data with outlier-label == False
normal_test_data = test_data[~test_labels] #test data with outlier-label == False

In [None]:
for k in [train_data, train_labels, train_data_validation, train_data_validation_labels]:
    print(k.shape)

In [None]:
for k in [train_data, test_data, train_labels, test_labels]:
    print(k.shape)

In [None]:
for k in [anomalous_train_data, anomalous_test_data, normal_train_data, normal_test_data]:
    print(k.shape)

In [None]:
print("Anomaly Share for training data equals ={:10.4f}".format(len(anomalous_train_data)/len(train_data)))
print("Anomaly Share for test data equals ={:10.4f}".format(len(anomalous_test_data)/len(test_data)))

# Train the model with train & validation data

## Randomized autoencoders

The implementation of the RandAE class is taken from the following [repository](https://github.com/danieltsoukup/autoencoders):

Activation functions

*   Hidden Layer: ReLU
*   Output Layer: Sigmoid

A drop_ratio of 0.5 is used. A drop_ratio of 0.0 corresponds to a *fully connected architecture*.
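
To illustrate the role of the drop ratio, the following sketch samples a binary connection mask in the same way as `get_mask` in the class below and applies it to a weight matrix. The layer shape, the seed and the all-ones weights are made-up values for illustration, and `sample_mask` is a hypothetical helper rather than part of the repository.

```python
import numpy as np

def sample_mask(shape, drop_ratio, rng):
    """Sample a 0/1 mask; each connection is dropped with probability drop_ratio."""
    return rng.choice([0.0, 1.0], size=shape, p=[drop_ratio, 1 - drop_ratio])

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
weights = np.ones((9, 24))       # stand-in for a Dense kernel (inputs x units)

full_mask = sample_mask(weights.shape, drop_ratio=0.0, rng=rng)
half_mask = sample_mask(weights.shape, drop_ratio=0.5, rng=rng)

masked_weights = weights * half_mask  # dropped connections become exactly zero
print("share of kept connections, drop_ratio=0.0:", full_mask.mean())
print("share of kept connections, drop_ratio=0.5:", half_mask.mean())
```

With a drop ratio of 0.0 the mask contains only ones, so the masked layer behaves like an ordinary dense layer.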



In [None]:
class RandAE(tf.keras.Sequential):
    def __init__(self, input_dim, hidden_dims, drop_ratio=0.5, **kwargs):
        super(RandAE, self).__init__(**kwargs)
        
        self.input_dim = input_dim
        self.hidden_dims = hidden_dims
        self.drop_ratio = drop_ratio
        
        self.layer_masks = dict()
        
        self.build_model()
                
    def build_model(self) -> None:
        """
        Adds the layers and records masks.
        """
        
        self.add(layers.Input(self.input_dim, name="input"))
        
        for i, dim in enumerate(self.hidden_dims):
            layer_name = f"hidden_{i}"
            layer = layers.Dense(dim, 
                                 activation="relu" if i > 0 else "sigmoid", 
                                 name=layer_name)
            self.add(layer)
            
            # add layer mask
            self.layer_masks[layer_name] = self.get_mask(layer)
        
        layer_name = "output"
        output_layer = layers.Dense(self.input_dim, activation="sigmoid", name=layer_name)
        self.add(output_layer)
        self.layer_masks[layer_name] = self.get_mask(output_layer)
            
    def get_mask(self, layer) -> np.ndarray:
        """
        Build mask for a layer.
        """
        
        shape = layer.input_shape[1], layer.output_shape[1]
        
        return np.random.choice([0., 1.], size=shape, p=[self.drop_ratio, 1-self.drop_ratio])
        
    def load_masks(self, mask_pickle_path) -> None:
        """
        Load the masks from a pickled dictionary.
        """
        
        with open(mask_pickle_path, 'rb') as handle:
            self.layer_masks = pickle.load(handle)    
            
    def get_encoder(self) -> keras.Sequential:
        """
        Get the encoder from the full model.
        """
        
        n_layers = (len(self.hidden_dims)+1)//2
        encoder_layers = [layers.Input(self.input_dim)] + self.layers[:n_layers]

        return keras.Sequential(encoder_layers)
        
    
    def mask_weights(self) -> None:
        """
        Apply the masks to each layer in the encoder and decoder.
        """
        
        for layer in self.layers:
            layer_name = layer.name
            if layer_name in self.layer_masks:
                masked_w = layer.weights[0].numpy()*self.layer_masks[layer_name]
                b = layer.weights[1].numpy()
                layer.set_weights((masked_w, b))        

    def call(self, data, training=True) -> tf.Tensor:
        
        # mask the weights before original forward pass
        self.mask_weights()
        
        return super().call(data)

In [None]:
model_mse = RandAE(9,[24,12,6,3,6,12,24])
model_mae = RandAE(9,[24,12,6,3,6,12,24])
model_ce = RandAE(9,[24,12,6,3,6,12,24])
model_focal = RandAE(9,[24,12,6,3,6,12,24])
model_huber = RandAE(9,[24,12,6,3,6,12,24])

## Training

In [None]:
#load checkpoints
# checkpoint_mse = tf.keras.callbacks.ModelCheckpoint("checkpoint_mse",
#                                                    save_weights_only=True) #generate checkpoints


#define early stopping
early_stopping_mse = tf.keras.callbacks.EarlyStopping(patience=20,
                                                     restore_best_weights=True)

model_mse.compile(loss="mean_squared_error", optimizer="adam", run_eagerly=True)

history_mse = model_mse.fit(train_data,
                    train_data,
                    batch_size=16,
                    epochs = 1000,
                    validation_data=(train_data_validation, train_data_validation),
                    callbacks=[
                        # checkpoint_mse,
                        early_stopping_mse,
                        ],
                    verbose = 0,
                    validation_batch_size = 16,
                    shuffle=True)
model_mse.summary()

In [None]:
# checkpoint_mae = tf.keras.callbacks.ModelCheckpoint("checkpoint_mae",
#                                                    save_weights_only=True) #generate checkpoints
early_stopping_mae = tf.keras.callbacks.EarlyStopping(patience=20,
                                                     restore_best_weights=True) #define early stopping                                                  
model_mae.compile(loss="mean_absolute_error", optimizer="adam", run_eagerly=True)
history_mae = model_mae.fit(train_data,
                    train_data,
                    batch_size=16,
                    epochs = 1000,
                    validation_data=(train_data_validation, train_data_validation),
                    callbacks=[
                        # checkpoint_mae,
                        early_stopping_mae],
                    verbose = 0,
                    validation_batch_size = 16,
                    shuffle=True)
model_mae.summary()

In [None]:

# checkpoint_ce = tf.keras.callbacks.ModelCheckpoint("checkpoint_ce",
#                                                    save_weights_only=True) #generate checkpoints
early_stopping_ce = tf.keras.callbacks.EarlyStopping(patience=20,
                                                     restore_best_weights=True) #define early stopping  

model_ce.compile(loss="binary_crossentropy", optimizer="adam",  run_eagerly=True)

history_ce = model_ce.fit(train_data,
                    train_data,
                    batch_size=16,
                    epochs = 1000,
                    validation_data=(train_data_validation, train_data_validation),
                    callbacks=[
                        # checkpoint_ce,
                        early_stopping_ce],
                    verbose = 0,
                    validation_batch_size = 16,
                    shuffle=True)
model_ce.summary()

The focal loss is initialised with γ = 2.

[Source](https://github.com/qubvel/segmentation_models/blob/master/segmentation_models/losses.py)
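
As a rough illustration of what γ does, the sketch below evaluates a commonly used form of the binary focal loss for a single target/prediction pair. The α weighting (`alpha=0.25`) and the clipping constant `eps` are assumptions made for this sketch; the exact weighting inside `sm.losses.BinaryFocalLoss` may differ.

```python
import math

def binary_focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one target/prediction pair (alpha and eps are assumed values)."""
    p = min(max(y_pred, eps), 1.0 - eps)  # clip to keep log() finite
    pos = -alpha * (1.0 - p) ** gamma * math.log(p)
    neg = -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return y_true * pos + (1.0 - y_true) * neg

easy = binary_focal_loss(1.0, 0.9)              # confident and correct
hard = binary_focal_loss(1.0, 0.1)              # confident and wrong
plain = binary_focal_loss(1.0, 0.9, gamma=0.0)  # gamma = 0 removes the focusing term
print(easy, hard, plain)
```

Raising γ shrinks the contribution of well-reconstructed values (`easy` is far smaller than `plain`), so the training signal concentrates on the values the model gets wrong.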

In [None]:
focal_loss = sm.losses.BinaryFocalLoss()
# checkpoint_focal = tf.keras.callbacks.ModelCheckpoint("checkpoint_focal",
#                                                    save_weights_only=True) #generate checkpoints
early_stopping_focal = tf.keras.callbacks.EarlyStopping(patience=20,
                                                     restore_best_weights=True) #define early stopping  

model_focal.compile(loss=focal_loss, optimizer="adam",  run_eagerly=True)


history_focal = model_focal.fit(train_data,
                                train_data,
                                batch_size=16,
                                epochs = 1000,
                    validation_data=(train_data_validation, train_data_validation),
                    callbacks=[
                        # checkpoint_focal,
                        early_stopping_focal],
                    verbose = 0,
                    validation_batch_size = 16,
                    shuffle=True)
model_focal.summary()

The Huber loss is initialised with δ = 0.5.
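
With δ = 0.5 the Huber loss is quadratic for residuals of at most 0.5 in absolute value and linear beyond that, which limits the influence of large reconstruction errors. A minimal NumPy sketch of the element-wise formula (before any averaging over features or batch) is shown below; `huber` and the sample residuals are illustrative.

```python
import numpy as np

def huber(error, delta=0.5):
    """Element-wise Huber loss: quadratic for |error| <= delta, linear beyond."""
    abs_err = np.abs(error)
    quadratic = 0.5 * abs_err ** 2
    linear = delta * abs_err - 0.5 * delta ** 2
    return np.where(abs_err <= delta, quadratic, linear)

errors = np.array([0.1, 0.5, 2.0])  # made-up residuals
print(huber(errors))                # one loss value per residual
```

The two branches agree at |error| = δ, so the loss stays continuous at the switch-over point.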

In [None]:
huber = tf.keras.losses.Huber(delta= 0.5, reduction="auto", name="huber_loss")
# checkpoint_huber = tf.keras.callbacks.ModelCheckpoint("checkpoint_huber",
#                                                    save_weights_only=True) #generate checkpoints
early_stopping_huber = tf.keras.callbacks.EarlyStopping(patience=20,
                                                     restore_best_weights=True) #define early stopping  

model_huber.compile(loss=huber, optimizer="adam",  run_eagerly=True) 

history_huber = model_huber.fit(train_data,
                    train_data,
                    batch_size=16,
                    epochs = 1000,
                    validation_data=(train_data_validation, train_data_validation),
                    callbacks=[
                        #checkpoint_huber,
                        early_stopping_huber],
                    verbose = 0,
                    validation_batch_size = 16,
                    shuffle=True)
model_huber.summary()

Plot the training history for each model

In [None]:
def plot_training_performance():
    '''
    Plot the training and validation loss of each model
    '''
    histories = {
        "model_mse": history_mse,
        "model_mae": history_mae,
        "model_huber": history_huber,
        "model_ce": history_ce,
        "model_focal": history_focal,
    }
    figure, axis = plt.subplots(ncols=1, nrows=5, figsize=(10, 20))
    # one row per model: training vs. validation loss over the epochs
    for ax, (name, history) in zip(axis, histories.items()):
        ax.plot(history.history["loss"], label="Training Loss")
        ax.plot(history.history["val_loss"], label="Validation Loss")
        ax.set_title(f'Training performance of {name}')
        ax.set(xlabel='Epochs', ylabel='Loss')
        ax.legend()

    save_fig("Performances for training")
    plt.show()

plot_training_performance()

This function extracts the last executed epoch of a model

In [None]:
def get_last_epoch(history, max = "max"):
    # zero-based indices of all epochs that were actually run
    epochs = np.arange(len(history.history["loss"]), dtype=float)
    if max == "max":
        # index of the last executed epoch
        return int(np.max(epochs))
    else:
        # all epoch indices
        return epochs


In [None]:
for history in [
    history_mse,
    history_mae,
    history_ce,
    history_focal,
    history_huber]:
    print(get_last_epoch(history))

Plot the t-SNE graph of the latent space for each model

In [None]:
def plot_tnse_performance_bottleneck_layer():
    '''
    Plot a 2D t-SNE embedding of the bottleneck-layer output of each model
    '''
    models = {
        "model_mse": model_mse,
        "model_mae": model_mae,
        "model_huber": model_huber,
        "model_ce": model_ce,
        "model_focal": model_focal,
    }
    figure, axis = plt.subplots(ncols=1, nrows=5, figsize=(10, 20))
    for ax, (name, model) in zip(axis, models.items()):
        # encode the training data and embed the latent representation in 2D
        encoder = model.get_encoder()
        vis_data_latent = encoder.predict(train_data)
        tsne = TSNE(n_components=2)
        tsne_data = tsne.fit_transform(vis_data_latent)
        ax.scatter(tsne_data[train_labels == 0, 0],
                   tsne_data[train_labels == 0, 1], c="grey", alpha=0.1, label="inlier")
        ax.scatter(tsne_data[train_labels == 1, 0],
                   tsne_data[train_labels == 1, 1], c="crimson", alpha=1, label="outlier")
        ax.set_title(f't-SNE for outliers/inliers on the latent manifold for {name}')
        ax.legend()

    save_fig("TNSE-Distrubution in bottleneck-layer for all models")
    plt.show()

plot_tnse_performance_bottleneck_layer()

# Evaluation on test data

## Generate the reconstruction error

The MSE is used as the metric for computing the reconstruction error.
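
The idea can be sketched independently of the trained models: for every sample, the squared difference between the input and its reconstruction is averaged over the features, and a larger value marks a more anomalous sample. The arrays below are made-up stand-ins for `data` and `model.predict(data)`, and `reconstruction_mse` is an illustrative helper.

```python
import numpy as np

def reconstruction_mse(original, reconstructed):
    """Mean squared error per sample (row), averaged over the features."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return np.mean((original - reconstructed) ** 2, axis=1)

x = np.array([[0.0, 1.0], [1.0, 1.0]])      # made-up inputs
x_hat = np.array([[0.0, 1.0], [0.0, 0.0]])  # made-up reconstructions
scores = reconstruction_mse(x, x_hat)
print(scores)  # one anomaly score per sample
```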

In [None]:
def gen_error_df(model, metric: str = "mean_squared_error", data=test_data):
    supported_models = [model_mse, model_mae, model_ce, model_focal, model_huber]
    if not any(model is m for m in supported_models):
        # raising a plain string is a TypeError in Python 3, so raise a real exception
        raise ValueError("Chosen model is not supported yet. Please choose a model in "
                         "[model_mse, model_mae, model_ce, model_focal, model_huber]")
    print("Calculate reconstruction error for model: ", model)
    if metric == "mean_squared_error":
        # per-sample MSE between the input and its reconstruction
        mse = np.mean(np.power(data - model.predict(data), 2), axis=1)
        return pd.DataFrame({'reconstruction_error': mse,
                             'anomaly': test_labels})
    print("No further metric functions are implemented yet")
    return None




In [None]:
mse_error_df = gen_error_df(model_mse)
mae_error_df = gen_error_df(model_mae)
ce_error_df = gen_error_df(model_ce)
focal_error_df = gen_error_df(model_focal)
huber_error_df = gen_error_df(model_huber)

## ROC and AUC

Visualise the results on the test data for all models
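
The area under the ROC curve can be read as the probability that a randomly chosen anomaly receives a higher reconstruction error than a randomly chosen normal sample, with ties counting one half. The pure-Python sketch below computes it that way on made-up scores; `pairwise_auc` is an illustrative helper and not the `roc_curve`/`auc` route used in this notebook.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (anomaly, normal) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

toy_labels = [False, False, True, True]  # made-up ground truth
toy_scores = [0.1, 0.4, 0.35, 0.8]       # made-up reconstruction errors
toy_auc = pairwise_auc(toy_labels, toy_scores)
print(toy_auc)
```

A value of 0.5 corresponds to the dashed diagonal in the plots, and 1.0 to a perfect ranking of anomalies above normal samples.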

In [None]:
def plot_test_performance_roc(error_df = [mse_error_df,
                                          mae_error_df,
                                          huber_error_df,
                                          ce_error_df,
                                          focal_error_df]):
    '''
    Plot the ROC curve performance of each model 
    '''
    model_names = ['Model_MSE', 'Model_MAE', 'Model_Huber', 'Model_CE', 'Model_Focal']
    figure, axis = plt.subplots(ncols=1, nrows=5, figsize=(10, 20))
    # one row per model, in the same order as error_df
    for ax, name, errors in zip(axis, model_names, error_df):
        fpr, tpr, thresholds = roc_curve(errors.anomaly, errors.reconstruction_error)
        roc_auc = auc(fpr, tpr)
        ax.plot(fpr, tpr, label='AUC = %0.4f' % roc_auc)
        ax.plot([0, 1], [0, 1], 'r--')  # chance level
        ax.set_title(f'Receiver Operating Characteristic for {name}')
        ax.set_xlim([-0.001, 1])
        ax.set_ylim([0, 1.001])
        ax.set(xlabel='False Positive Rate', ylabel='True Positive Rate')
        ax.legend(loc='lower right')

    save_fig("Roc-Curves for test data")
    plt.show()

plot_test_performance_roc()

## Precision, Recall and F1-Score

Because these metrics depend on the threshold chosen for classification, the threshold is selected as follows, with the aim of maximising the F1-score:


*   Determine the threshold at which precision and recall are equal
*   Because the F1-score is the harmonic mean of precision and recall, this break-even point is used as a proxy for the F1-maximising threshold
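
The selection rule can be sketched in plain Python: scan the candidate thresholds and keep the one where precision and recall are closest. At that break-even point the F1-score equals the common value of precision and recall. The labels and scores are made up, and both helper functions are illustrative rather than part of the notebook.

```python
def precision_recall_at(labels, scores, threshold):
    """Precision and recall when every score >= threshold is flagged as an anomaly."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def break_even_threshold(labels, scores):
    """Return the observed score at which |precision - recall| is smallest."""
    def gap(threshold):
        precision, recall = precision_recall_at(labels, scores, threshold)
        return abs(precision - recall)
    return min(sorted(set(scores)), key=gap)

toy_labels = [True, False, False, False, True, False, True, True]  # made up
toy_scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.90]      # made up
t = break_even_threshold(toy_labels, toy_scores)
p, r = precision_recall_at(toy_labels, toy_scores, t)
f1 = 2 * p * r / (p + r)
print(t, p, r, f1)
```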



In [None]:
prec, recall, thresholds = precision_recall_curve(focal_error_df["anomaly"], focal_error_df["reconstruction_error"])

best_idx = np.argmin(np.abs(prec-recall)[:int(len(np.abs(prec-recall))*-0.1)]) # workaround for a bug: the last 10% of the curve is excluded from the search
best_prec, best_recall = prec[best_idx], recall[best_idx]
print(f"Best precision {np.round(best_prec, 2)}, "\
      f"recall: {np.round(best_recall, 2)} at {np.round(thresholds[best_idx], 2)} threshold.")

In [None]:
from sklearn.metrics import precision_recall_curve, auc
auc_score = auc(recall, prec)
auc_score

Visualize the results for all models

In [None]:
## Recall Vs Precision
def plot_test_performance_recall_vs_precision(error_df = [mse_error_df,
                                          mae_error_df,
                                          huber_error_df,
                                          ce_error_df,
                                          focal_error_df]):
    figure, axis = plt.subplots(ncols=2,nrows=5, figsize=(30, 30))
    # first row: complicated architecture
    precision, recall, thresholds = precision_recall_curve(error_df[0].anomaly, error_df[0].reconstruction_error)
    best_idx = np.argmin(np.abs(precision-recall)[:-4])
    best_prec, best_recall = precision[best_idx], recall[best_idx]
    precision_recall_auc = auc(recall, precision)
    axis[0,0].plot(recall, precision, label='AUC = %0.4f'% precision_recall_auc)
    axis[0,0].legend(loc='lower right')
    axis[0,0].set_title('Recall vs Precision for Model_MSE')
    axis[0,0].set(xlabel='Recall', ylabel='Precision')
    axis[0,1].set_title('Determine the threshold for Model_MSE')
    axis[0,1].plot(thresholds, precision[:-1], label="precision")
    axis[0,1].plot(thresholds, recall[:-1], label="recall")
    axis[0,1].axvline(thresholds[best_idx], 0, 1, c="grey", linestyle="--")
    axis[0,1].set_xlabel("thresholds")
    axis[0,1].legend()

    precision, recall, thresholds = precision_recall_curve(error_df[1].anomaly, error_df[1].reconstruction_error)
    best_idx = np.argmin(np.abs(precision-recall)[:-4])
    best_prec, best_recall = precision[best_idx], recall[best_idx]
    precision_recall_auc = auc(recall, precision)
    axis[1,0].plot(recall, precision, label='AUC = %0.4f'% precision_recall_auc)
    axis[1,0].legend(loc='lower right')
    axis[1,0].set_title('Recall vs Precision for Model_MAE')
    axis[1,0].set(xlabel='Recall', ylabel='Precision')
    axis[1,1].set_title('Determine the threshold for Model_MAE')
    axis[1,1].plot(thresholds, precision[:-1], label="precision")
    axis[1,1].plot(thresholds, recall[:-1], label="recall")
    axis[1,1].axvline(thresholds[best_idx], 0, 1, c="grey", linestyle="--")
    axis[1,1].set_xlabel("thresholds")
    axis[1,1].legend()

    precision, recall, thresholds = precision_recall_curve(error_df[2].anomaly, error_df[2].reconstruction_error)
    best_idx = np.argmin(np.abs(precision-recall)[:-4])
    best_prec, best_recall = precision[best_idx], recall[best_idx]
    precision_recall_auc = auc(recall, precision)
    axis[2,0].plot(recall, precision, label='AUC = %0.4f'% precision_recall_auc)
    axis[2,0].legend(loc='lower right')
    axis[2,0].set_title('Recall vs Precision for Model_Huber')
    axis[2,0].set(xlabel='Recall', ylabel='Precision')
    axis[2,1].set_title('Determine the threshold for Model_Huber')
    axis[2,1].plot(thresholds, precision[:-1], label="precision")
    axis[2,1].plot(thresholds, recall[:-1], label="recall")
    axis[2,1].axvline(thresholds[best_idx], 0, 1, c="grey", linestyle="--")
    axis[2,1].set_xlabel("thresholds")
    axis[2,1].legend()

    precision, recall, thresholds = precision_recall_curve(error_df[3].anomaly, error_df[3].reconstruction_error)
    best_idx = np.argmin(np.abs(precision-recall)[:-4])
    best_prec, best_recall = precision[best_idx], recall[best_idx]
    precision_recall_auc = auc(recall, precision)
    axis[3,0].plot(recall, precision, label='AUC = %0.4f'% precision_recall_auc)
    axis[3,0].legend(loc='lower right')
    axis[3,0].set_title('Recall vs Precision for Model_CE')
    axis[3,0].set(xlabel='Recall', ylabel='Precision')
    axis[3,1].set_title('Determine the threshold for Model_CE')
    axis[3,1].plot(thresholds, precision[:-1], label="precision")
    axis[3,1].plot(thresholds, recall[:-1], label="recall")
    axis[3,1].axvline(thresholds[best_idx], 0, 1, c="grey", linestyle="--")
    axis[3,1].set_xlabel("thresholds")
    axis[3,1].legend()

    precision, recall, thresholds = precision_recall_curve(error_df[4].anomaly, error_df[4].reconstruction_error)
    best_idx = np.argmin(np.abs(precision-recall)[:-4])
    best_prec, best_recall = precision[best_idx], recall[best_idx]
    precision_recall_auc = auc(recall, precision)
    axis[4,0].plot(recall, precision, label='AUC = %0.4f'% precision_recall_auc)
    axis[4,0].legend(loc='lower right')
    axis[4,0].set_title('Recall vs Precision for Model_Focal')
    axis[4,0].set(xlabel='Recall', ylabel='Precision')
    axis[4,1].set_title('Determine the threshold for Model_Focal')
    axis[4,1].plot(thresholds, precision[:-1], label="precision")
    axis[4,1].plot(thresholds, recall[:-1], label="recall")
    axis[4,1].axvline(thresholds[best_idx], 0, 1, c="grey", linestyle="--")
    axis[4,1].set_xlabel("thresholds")
    axis[4,1].legend()

    save_fig("Recall vs Precision curves for test data")
    sns.despine()
    plt.show()

plot_test_performance_recall_vs_precision()

In [None]:
def get_precision_threshold(error_df):
  '''
  Return the threshold at which precision and recall are (approximately) equal
  for the given error DataFrame. The last four points of the curve are ignored.
  '''
  prec, recall, threshold = precision_recall_curve(error_df["anomaly"], error_df["reconstruction_error"])
  best_idx = np.argmin(np.abs(prec-recall)[:-4])
  best_prec, best_recall = prec[best_idx], recall[best_idx]

  return threshold[best_idx]

for i in [mse_error_df, mae_error_df, huber_error_df, ce_error_df, focal_error_df]:   
  print(get_precision_threshold(i))

In [None]:
def plot_thresholds_data(index:int,save_name:str,error_df = [mse_error_df,
                                          mae_error_df,
                                          huber_error_df,
                                          ce_error_df,
                                          focal_error_df]):
  if index > len(error_df) - 1 or index < 0:
    raise IndexError("Index out of range. Please pass a valid model index.")
  else:
    groups = error_df[index].groupby('anomaly')
    threshold = get_precision_threshold(error_df[index])
    fig, ax = plt.subplots()
    for name, group in groups:
      ax.plot(group.index, group.reconstruction_error, marker='o', ms=3.5, linestyle='',
              label= "Anomaly" if name == 1 else "Normal")
    ax.hlines(threshold, ax.get_xlim()[0], ax.get_xlim()[1], colors="r", zorder=100, label='Threshold')
    ax.legend()
    plt.title("Reconstruction error visualisation to identify anomalies")
    plt.ylabel("Reconstruction error")
    plt.xlabel("Data point index")
    save_fig(save_name)
    plt.show();


In [None]:
plot_thresholds_data(0, "Reconstruction error visualisation to identify anomalies with model_mse")
plot_thresholds_data(1, "Reconstruction error visualisation to identify anomalies with model_mae")
plot_thresholds_data(2, "Reconstruction error visualisation to identify anomalies with model_huber")
plot_thresholds_data(3, "Reconstruction error visualisation to identify anomalies with model_ce")
plot_thresholds_data(4, "Reconstruction error visualisation to identify anomalies with model_focal")

In [None]:
## Predict class based on error threshold

# TODO: continue here
def plot_cm_matrix(index:int,save_name:str,error_df = [mse_error_df,
                                          mae_error_df,
                                          huber_error_df,
                                          ce_error_df,
                                          focal_error_df],
                   LABELS = ["Normal", "Anomaly"]):
  if index > len(error_df) - 1 or index < 0:
    raise IndexError("Index out of range. Please pass a valid model index.")
  else:
    error_df = error_df[index]
    threshold = get_precision_threshold(error_df)
    y_pred = [1 if e > threshold else 0 for e in error_df.reconstruction_error.values]
    conf_matrix = confusion_matrix(error_df.anomaly, y_pred,labels=[0,1])
    plt.figure(figsize=(12, 12))
    sns.heatmap(conf_matrix,xticklabels=LABELS, yticklabels=LABELS, annot=True, fmt="d",cmap='Blues');
    plt.title("Confusion matrix")
    plt.ylabel('True class')
    plt.xlabel('Predicted class')
    save_fig(save_name)
    plt.show()


In [None]:
plot_cm_matrix(0, "CM-matrix visualisation to identify anomalies with model_mse")
plot_cm_matrix(1, "CM-matrix visualisation to identify anomalies with model_mae")
plot_cm_matrix(2, "CM-matrix visualisation to identify anomalies with model_huber")
plot_cm_matrix(3, "CM-matrix visualisation to identify anomalies with model_ce")
plot_cm_matrix(4, "CM-matrix visualisation to identify anomalies with model_focal")

In [None]:
def calculate_f1_scores(index, error_df =  
                        [mse_error_df,
                         mae_error_df,
                         huber_error_df,
                         ce_error_df,
                         focal_error_df]
                        ):
  if index > len(error_df) - 1 or index < 0:
    raise IndexError("Index out of range. Please pass a valid model index.")
  error_df = error_df[index]
  threshold = get_precision_threshold(error_df)
  y_pred =  [1 if e > threshold else 0 for e in error_df.reconstruction_error.values]
  print('F1_Score with threshold:', threshold)
  return(f1_score(error_df.anomaly, y_pred))
calculate_f1_scores(0)

In [None]:
error_df = [mse_error_df,
             mae_error_df,
             huber_error_df,
             ce_error_df,
             focal_error_df]
for i in error_df:
  threshold = get_precision_threshold(i)  # break-even threshold for this model's error DataFrame
  y_pred =  [1 if e > threshold else 0 for e in i.reconstruction_error.values]

  cm1 = confusion_matrix(i.anomaly, y_pred,labels=[1,0])
  print('Confusion Matrix Val: \n', cm1)


  total1=sum(sum(cm1))
  #####from confusion matrix calculate accuracy

  accuracy1=(cm1[0,0]+cm1[1,1])/total1
  print ('Accuracy Val: ', accuracy1)


  sensitivity1 = cm1[0,0]/(cm1[0,0]+cm1[0,1])
  print('Sensitivity Val: ', sensitivity1 )


  specificity1 = cm1[1,1]/(cm1[1,0]+cm1[1,1])
  print('Specificity Val: ', specificity1)

  KappaValue=cohen_kappa_score(i.anomaly, y_pred)
  print("Kappa Value :",KappaValue)
  # Note: computed from hard 0/1 predictions, so this equals (sensitivity + specificity) / 2
  AUC=roc_auc_score(i.anomaly, y_pred)

  print("AUC         :",AUC)

  print("F1-Score Val  : ",f1_score(i.anomaly, y_pred))

  print("_______________________________________")
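
The metrics printed in the loop above can all be derived from the four counts of the confusion matrix. The following pure-Python sketch spells out the formulas, including Cohen's kappa. The function `metrics_from_counts` and the example counts are hypothetical and only illustrate the arithmetic; the sketch assumes that no denominator is zero.

```python
def metrics_from_counts(tp, fn, fp, tn):
    """Derive common metrics from confusion-matrix counts (anomaly = positive class)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)          # recall on the anomaly class
    specificity = tn / (tn + fp)          # recall on the normal class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for the agreement expected by chance
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    # ROC-AUC of hard 0/1 predictions reduces to the balanced accuracy
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
        "balanced_accuracy": balanced_accuracy,
    }

# Hypothetical counts for illustration
toy_metrics = metrics_from_counts(tp=3, fn=1, fp=1, tn=5)
for name, value in toy_metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the AUC in the loop is computed from thresholded predictions, it is not comparable to the ROC-AUC obtained from the raw reconstruction errors.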

# Simulation

In [None]:
def simulation(
    loss,
    model,
    patience,
    epochs,
    n:int = 0,
    end:int = 30,
    optimizer="adam",
    test_size = 0.2,
    batch_size=16,
    validation_batch_size = 16,
    delta = 0.5
    ):
  
  stopped_epochs = []
  auc_values_roc = []
  auc_values_recall_prec = []
  f1_values = []

  # Resolve loss names that need a loss object; any other string
  # (e.g. "mean_squared_error") is passed to compile() unchanged.
  if loss == "focal_loss":
    loss = sm.losses.BinaryFocalLoss()
  elif loss == "huber_loss":
    loss = tf.keras.losses.Huber(delta=delta, reduction="auto", name="huber_loss")

  print("Start the Simulation:")
  while True:
    print("Loss function: ", loss)
    print("Iteration: ", n)
    # Step 1: split the data
    train_data, test_data, train_labels, test_labels = train_test_split(
      data,
      labels_bool,
      test_size=test_size
      )
    train_data_no_validation, train_data_validation, train_data_no_validation_labels, train_data_validation_labels = train_test_split(
      train_data,
      train_labels,
      test_size=test_size
      )

    # Step 2: train the autoencoder (the input is also the target).
    # The same model object is reused in every iteration; compile() does not reset its weights.
    early_stopping = tf.keras.callbacks.EarlyStopping(patience=patience,
                                                      restore_best_weights=True)  # early-stopping callback
    model.compile(loss=loss, optimizer=optimizer, run_eagerly=True)
    history = model.fit(train_data,
                        train_data,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(train_data_validation, train_data_validation),
                        callbacks=[early_stopping],
                        verbose=0,
                        validation_batch_size=validation_batch_size,
                        shuffle=True)
    stopped_epochs.append(get_last_epoch(history))  # epoch at which training stopped

    # Step 3: evaluate test data with ROC-AUC
    error_df = gen_error_df(model)
    fpr, tpr, thresholds = roc_curve(error_df.anomaly, error_df.reconstruction_error)
    auc_values_roc.append(auc(fpr, tpr))

    # Step 4: evaluate test data with precision-recall AUC and F1 at the break-even threshold
    prec, recall, thresholds = precision_recall_curve(error_df["anomaly"], error_df["reconstruction_error"])
    auc_values_recall_prec.append(auc(recall, prec))
    best_idx = np.argmin(np.abs(prec-recall)[:-4])
    threshold = thresholds[best_idx]
    y_pred = [1 if e > threshold else 0 for e in error_df.reconstruction_error.values]
    f1_values.append(f1_score(error_df.anomaly, y_pred))

    if n >= end:
      print("Final Output:")
      print("Return stopped epochs per iteration, ")
      print("Return ROC-AUC values per iteration, ")
      print("Return precision-recall AUC values per iteration, ")
      print("Return F1 scores with best thresholds per iteration, ")
      return stopped_epochs, auc_values_roc, auc_values_recall_prec, f1_values
    n += 1

In [None]:
simulation_mse = simulation(model = model_mse,
                            end = 30,
                            patience = 20, # early stopping after 20 epochs without improvement
                            epochs = 1000,
                            loss = "mean_squared_error")
with open("simulation_mse_breastW_.txt", "w") as f:
   # Convert the list to a string and write it to the file
    f.write(str(simulation_mse))

In [None]:
simulation_mae = simulation(model = model_mae,
                            end = 30,
                            patience = 20, # early stopping after 20 epochs without improvement
                            epochs = 1000,
                            loss = "mean_absolute_error")
with open("simulation_mae_breastW_.txt", "w") as f:
   # Convert the list to a string and write it to the file
    f.write(str(simulation_mae))

In [None]:
simulation_huber = simulation(model = model_huber,
                            end = 30,
                            patience = 20, # early stopping after 20 epochs without improvement
                            epochs = 1000,
                            loss = "huber_loss",
                            delta = 0.5)
with open("simulation_huber_breastW_.txt", "w") as f:
   # Convert the list to a string and write it to the file
    f.write(str(simulation_huber))

In [None]:
simulation_ce= simulation(model = model_ce,
                          end = 30,
                          patience = 20, # early stopping after 20 epochs without improvement
                          epochs = 1000,
                          loss = "binary_crossentropy")
with open("simulation_ce_breastW_.txt", "w") as f:
   # Convert the list to a string and write it to the file
    f.write(str(simulation_ce))

In [None]:
simulation_focal =simulation(model = model_focal,
                             end = 30,
                             patience = 20, # early stopping after 20 epochs without improvement
                             epochs = 1000,
                             loss ="focal_loss")
with open("simulation_focal_breastW_.txt", "w") as f:
   # Convert the list to a string and write it to the file
    f.write(str(simulation_focal))

In [None]:
for i in [simulation_mse, simulation_mae, simulation_huber, simulation_ce, simulation_focal]:
  print("_______________________________________________________")
  print("Mean and standard deviation of last epochs")
  print(np.asarray(i[0]).mean())
  print(np.asarray(i[0]).std())
  print("Mean and standard deviation of ROC-AUC")
  print(np.asarray(i[1]).mean())
  print(np.asarray(i[1]).std())
  print("Mean and standard deviation of Recall-Precision-AUC")
  print(np.asarray(i[2]).mean())
  print(np.asarray(i[2]).std())
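
One detail worth keeping in mind when reading the printed values: NumPy's `.std()` defaults to the population standard deviation (`ddof=0`), which is slightly smaller than the sample standard deviation (`ddof=1`) for a small number of runs. The sketch below shows the difference with the standard library; the values are toy numbers, not simulation results.

```python
import statistics

# Hypothetical toy values standing in for one metric across repeated runs
toy_values = [2, 4, 4, 4, 5, 5, 7, 9]

toy_mean = statistics.mean(toy_values)
population_std = statistics.pstdev(toy_values)  # same definition as numpy's default std()
sample_std = statistics.stdev(toy_values)       # same definition as numpy's std(ddof=1)

print("mean:", toy_mean)
print("population std:", population_std)
print("sample std:", round(sample_std, 4))
```

Which variant is appropriate depends on whether the runs are treated as the full population or as a sample of possible runs.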



# Statistical Evaluation

The results of the simulation were saved to text files. To load the data, the contents of these files were copied in here manually.

Source:


*   simulation_mse_breastW_.txt

*   simulation_mae_breastW_.txt

*   simulation_huber_breastW_.txt

*   simulation_ce_breastW_.txt

*   simulation_focal_breastW_.txt

To see the results reported in the write-up, uncomment the following block. Otherwise, the results of a new simulation can be used (GPU support should be enabled for this).
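
As an alternative to copying the file contents by hand, the saved text could be parsed back into Python objects. The sketch below is one possible approach, assuming each file contains only plain Python literals (tuples, lists, ints, floats), as produced by `f.write(str(...))` above. The helper `load_simulation` and the toy file are hypothetical. If the stored values were written as NumPy scalars with a non-literal representation, `ast.literal_eval` would raise an error and the files would need a different format.

```python
import ast
import os
import tempfile

def load_simulation(path):
    """Parse a results file written with f.write(str(result)) back into Python objects."""
    with open(path) as f:
        return ast.literal_eval(f.read())

# Round trip with a small made-up result tuple (not real simulation output)
toy_result = ([83, 215, 57], [0.72, 0.71, 0.72], [0.48, 0.46, 0.47], [0.46, 0.46, 0.48])
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "simulation_toy.txt")
    with open(toy_path, "w") as f:
        f.write(str(toy_result))
    loaded = load_simulation(toy_path)

print(loaded == toy_result)
```

`ast.literal_eval` only evaluates literals, so it is safer than `eval` for reading files back in.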

In [None]:

# simulation_mse = [[83, 215, 57, 338, 44, 115, 71, 98, 41, 34, 97, 47, 28, 50, 47, 87, 92, 40, 34, 73, 77, 29, 26, 80, 63, 39, 60, 50, 22, 35, 32], [0.7263576779026217, 0.7146535580524345, 0.7205056179775281, 0.7315074906367041, 0.7256554307116105, 0.7357209737827715, 0.7312734082397004, 0.723314606741573, 0.7233146067415731, 0.7308052434456929, 0.7305711610486891, 0.7350187265917604, 0.7326779026217228, 0.7322097378277154, 0.7308052434456928, 0.7317415730337079, 0.7359550561797752, 0.7359550561797753, 0.7305711610486891, 0.7296348314606741, 0.7350187265917603, 0.7333801498127341, 0.7347846441947566, 0.7378277153558053, 0.7357209737827716, 0.7336142322097378, 0.7422752808988764, 0.7354868913857677, 0.7432116104868913, 0.7350187265917603, 0.7411048689138577], [0.48176857922048094, 0.4670740675305246, 0.4775567095162599, 0.4740908307338667, 0.4717914709485081, 0.476076902544321, 0.4748747898065486, 0.4698974893962126, 0.46709910802737914, 0.47229246844955336, 0.47117515637650004, 0.47749521431167735, 0.472484780642743, 0.47422774372666937, 0.4728186625328763, 0.47333476032415756, 0.47660213797134116, 0.47974132090976895, 0.47343942129970523, 0.4704574952104742, 0.47337968765219246, 0.47748451066693526, 0.47676490235837227, 0.4824133913904798, 0.48030668774202157, 0.4843403914346208, 0.4844864309003886, 0.48003156565189886, 0.48711794904534494, 0.4800151318751841, 0.4842461099644249], [0.46315789473684216, 0.46315789473684216, 0.4842105263157895, 0.5052631578947369, 0.5052631578947369, 0.5052631578947369, 0.46315789473684216, 0.4842105263157895, 0.4842105263157895, 0.4842105263157895, 0.4842105263157895, 0.46315789473684216, 0.46315789473684216, 0.46315789473684216, 0.4842105263157895, 0.46315789473684216, 0.4842105263157895, 0.4842105263157895, 0.4842105263157895, 0.4842105263157895, 0.4842105263157895, 0.4842105263157895, 0.4842105263157895, 0.4842105263157895, 0.4842105263157895, 0.4842105263157895, 0.4842105263157895, 0.4842105263157895, 0.5052631578947369, 
# 0.4842105263157895, 0.4842105263157895]]
# simulation_mae = [[128, 26, 70, 140, 22, 58, 101, 51, 36, 68, 58, 38, 24, 43, 20, 30, 43, 52, 62, 45, 36, 26, 37, 52, 39, 21, 23, 24, 22, 56, 50], [0.7078651685393258, 0.7116104868913857, 0.7158239700374532, 0.7202715355805244, 0.7186329588014981, 0.7109082397003745, 0.7286985018726592, 0.7272940074906367, 0.7249531835205992, 0.7219101123595506, 0.7256554307116105, 0.7258895131086143, 0.7244850187265918, 0.7205056179775281, 0.7244850187265918, 0.7305711610486891, 0.716994382022472, 0.7265917602996255, 0.7212078651685394, 0.7258895131086143, 0.7216760299625469, 0.7289325842696629, 0.7263576779026217, 0.7207397003745318, 0.7216760299625468, 0.7202715355805244, 0.722378277153558, 0.726123595505618, 0.7221441947565543, 0.7289325842696629, 0.7219101123595506], [0.45901317444488926, 0.4619196223201732, 0.4627237874503759, 0.4666265842335252, 0.4650070262117747, 0.45851249836724484, 0.47635534214147834, 0.4750241210448929, 0.4716508800866052, 0.46903469394525377, 0.4723533092192342, 0.47283404692277986, 0.471353590662186, 0.46799211503789206, 0.4712292213811213, 0.4747044461547346, 0.4645095370792167, 0.47353115856730893, 0.4678296348758231, 0.4738615875491491, 0.46922193610331026, 0.4748045446342627, 0.4733699413791063, 0.46876850377452717, 0.4690500513836704, 0.4680798457542617, 0.46983641508631485, 0.47286838051314706, 0.4686185712180406, 0.47458842935518186, 0.4684168894820359], [0.5319148936170214, 0.5473684210526316, 0.0, 0.0, 0.0, 0.0, 0.5473684210526316, 0.5473684210526316, 0.5473684210526316, 0.0, 0.5473684210526316, 0.5473684210526316, 0.0, 0.0, 0.5473684210526316, 0.5473684210526316, 0.0, 0.5473684210526316, 0.0, 0.5473684210526316, 0.0, 0.5473684210526316, 0.5473684210526316, 0.5473684210526316, 0.0, 0.0, 0.5473684210526316, 0.5473684210526316, 0.0, 0.5473684210526316, 0.0]]
# simulation_huber = [[22, 38, 104, 47, 85, 49, 22, 55, 33, 23, 35, 39, 32, 36, 84, 66, 69, 46, 45, 63, 32, 36, 30, 36, 27, 39, 22, 41, 46, 26, 48], [0.7055243445692884, 0.6914794007490637, 0.7010767790262172, 0.7057584269662921, 0.7106741573033708, 0.7043539325842697, 0.7080992509363295, 0.696629213483146, 0.7062265917602997, 0.7029494382022472, 0.7045880149812734, 0.6952247191011236, 0.7038857677902621, 0.7062265917602997, 0.7010767790262172, 0.7071629213483146, 0.7073970037453183, 0.7055243445692884, 0.7137172284644194, 0.7057584269662921, 0.7027153558052435, 0.7092696629213483, 0.7066947565543071, 0.7052902621722846, 0.7045880149812734, 0.7085674157303371, 0.7020131086142322, 0.7052902621722846, 0.7055243445692884, 0.7024812734082397, 0.7071629213483146], [0.48511685602285015, 0.4480950747034451, 0.4555284598253575, 0.45877104863888, 0.461021367751396, 0.47958902705348505, 0.46354960643998144, 0.4528752617026056, 0.45890730822589154, 0.4565499774702527, 0.46173628429241165, 0.45211201088517294, 0.4572496369132849, 0.4583010441593189, 0.453886154777847, 0.4618191677894363, 0.4632236357758274, 0.4595699686124987, 0.4857400122070571, 0.45789591490167236, 0.45765476878348255, 0.48556623361873247, 0.4592732580467916, 0.46305588212561893, 0.45744858299427427, 0.46016559096892384, 0.4556737282740109, 0.4597732182194534, 0.45909577716143274, 0.4539226879092227, 0.4596671143555374], [0.5473684210526316, 0.5263157894736842, 0.5052631578947369, 0.5263157894736842, 0.5473684210526316, 0.5052631578947369, 0.5263157894736842, 0.5263157894736842, 0.5052631578947369, 0.5052631578947369, 0.4842105263157895, 0.4842105263157895, 0.5263157894736842, 0.5263157894736842, 0.5263157894736842, 0.5052631578947369, 0.5263157894736842, 0.5263157894736842, 0.5052631578947369, 0.5263157894736842, 0.5263157894736842, 0.5263157894736842, 0.5052631578947369, 0.5263157894736842, 0.5263157894736842, 0.5473684210526316, 0.5052631578947369, 0.5052631578947369, 0.5052631578947369, 
# 0.5052631578947369, 0.4842105263157895]]
# simulation_ce = [[485, 189, 78, 62, 22, 27, 68, 39, 39, 108, 80, 54, 91, 56, 36, 32, 60, 33, 27, 58, 51, 81, 34, 41, 51, 92, 107, 32, 30, 21, 26], [0.7120300751879699, 0.7255639097744362, 0.7313283208020049, 0.7233082706766917, 0.7436090225563909, 0.7413533834586467, 0.712531328320802, 0.7208020050125313, 0.7360902255639098, 0.7255639097744361, 0.7263157894736842, 0.7345864661654136, 0.7343358395989975, 0.7378446115288221, 0.7328320802005012, 0.7340852130325815, 0.7413533834586467, 0.7418546365914787, 0.7466165413533834, 0.737593984962406, 0.7491228070175439, 0.7483709273182957, 0.7448621553884711, 0.7335839598997493, 0.7451127819548873, 0.756641604010025, 0.7546365914786968, 0.7526315789473683, 0.7493734335839598, 0.7506265664160401, 0.7421052631578947], [0.47619047619047616, 0.5238095238095238, 0.5, 0.5, 0.5238095238095238, 0.5238095238095238, 0.5, 0.5238095238095238, 0.5238095238095238, 0.5, 0.47619047619047616, 0.5238095238095238, 0.5, 0.5238095238095238, 0.5, 0.5, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5], [0.47619047619047616, 0.5238095238095238, 0.5, 0.5, 0.5238095238095238, 0.5238095238095238, 0.5, 0.5238095238095238, 0.5238095238095238, 0.5, 0.47619047619047616, 0.5238095238095238, 0.5, 0.5238095238095238, 0.5, 0.5, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5238095238095238, 0.5], [0.4819277108433735, 0.5060240963855421, 0.5060240963855421, 0.5060240963855421, 0.5301204819277109, 0.5301204819277109, 0.4819277108433735, 0.5060240963855421, 0.5060240963855421, 0.4819277108433735, 0.4819277108433735, 
# 0.5301204819277109, 0.5060240963855421, 0.5060240963855421, 0.5060240963855421, 0.5060240963855421, 0.5301204819277109, 0.5301204819277109, 0.5301204819277109, 0.5060240963855421, 0.5301204819277109, 0.5301204819277109, 0.5301204819277109, 0.5301204819277109, 0.5301204819277109, 0.5301204819277109, 0.5301204819277109, 0.5301204819277109, 0.5060240963855421, 0.5301204819277109, 0.5060240963855421]]
# simulation_focal = [[32, 26, 44, 35, 71, 21, 46, 74, 48, 34, 24, 39, 30, 27, 20, 44, 20, 25, 22, 38, 20, 26, 81, 31, 20, 27, 20, 41, 29, 30, 23], [0.4740168539325842, 0.4691011235955056, 0.4676966292134831, 0.46980337078651685, 0.47425093632958804, 0.47191011235955055, 0.4679307116104869, 0.4662921348314606, 0.46956928838951306, 0.4700374531835205, 0.4777621722846442, 0.46652621722846443, 0.46722846441947563, 0.4735486891385768, 0.47425093632958804, 0.4726123595505618, 0.4719101123595505, 0.4665262172284643, 0.46863295880149813, 0.4726123595505618, 0.4691011235955056, 0.46816479400749056, 0.47659176029962547, 0.47659176029962547, 0.47354868913857684, 0.4698033707865169, 0.46863295880149813, 0.47448501872659177, 0.4737827715355805, 0.4691011235955056, 0.4669943820224719], [0.31915164371855187, 0.31658236479639157, 0.3159374131222968, 0.3168479031200666, 0.31908087560415554, 0.31769398481776634, 0.31632000546305017, 0.3153905225025381, 0.31718954444088937, 0.3172862206179785, 0.3212815320978825, 0.3156302684987513, 0.3157176067165215, 0.3189039476561454, 0.31910487418090017, 0.31852958677993504, 0.3180342331985332, 0.3156481398496002, 0.31683704307058663, 0.3184592858328014, 0.31677252895686214, 0.3162018731406295, 0.320709574079507, 0.32025286140578596, 0.31846127498792043, 0.3175745337847127, 0.3167964246710562, 0.3196675170673019, 0.3189920353355094, 0.31743411013544826, 0.31570847905756383], [0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 0.3157894736842105, 
# 0.3157894736842105, 0.3157894736842105]]


In [None]:
# Each simulation_* list holds [last_epochs, auc, auprc]; collect them metric by metric
summary = [
    simulation_mse[0],
    simulation_mae[0],
    simulation_huber[0],
    simulation_ce[0],
    simulation_focal[0],
    simulation_mse[1],
    simulation_mae[1],
    simulation_huber[1],
    simulation_ce[1],
    simulation_focal[1],
    simulation_mse[2],
    simulation_mae[2],
    simulation_huber[2],
    simulation_ce[2],
    simulation_focal[2]   
]

In [None]:
results = pd.DataFrame(zip(*summary),
                       columns=[
                           "last_epochs_mse",
                           "last_epochs_mae",
                           "last_epochs_huber",
                           "last_epochs_ce",
                           "last_epochs_focal",
                           "auc_mse",
                           "auc_mae",
                           "auc_huber",
                           "auc_ce",
                           "auc_focal",
                           "auprc_mse",
                           "auprc_mae",
                           "auprc_huber",
                           "auprc_ce",
                           "auprc_focal"
                       ]

)

results

In [None]:
# Draw one boxplot per result column
fig, axs = plt.subplots(ncols=5, nrows=3, figsize=(20, 10))
axs = axs.flatten()
for ax, column in zip(axs, results.columns):
    sns.boxplot(y=column, data=results, ax=ax)
# Adjust the layout before saving so the saved figure matches the displayed one
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=5.0)
save_fig("boxplots_results")

## Kruskal-Wallis test

The [Kruskal-Wallis test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html) is a nonparametric test for comparing three or more independent groups. It resembles one-way ANOVA but works on ranks instead of raw values, so it does not assume normally distributed data. Its null hypothesis is that all groups have the same population median.

To run the test, the pooled observations are first ordered from lowest to highest and replaced by their ranks. The ranks are then used to compute a test statistic, which is compared with a critical value to decide whether the difference between the groups is statistically significant.

The Kruskal-Wallis test is appropriate when the populations being compared are independent and the data are ordinal (i.e. they can be ranked) or continuous (but not normally distributed). It is also suitable when the sample sizes are small or unequal.
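
The ranking procedure described above can be sketched by hand. This is a minimal illustration, not the implementation used in this notebook: the function name `kruskal_h` and the three toy samples are assumptions made up for the example, and the result is only compared against `scipy.stats.kruskal` as a sanity check.

```python
import numpy as np
from scipy import stats


def kruskal_h(*groups):
    """Kruskal-Wallis H statistic (tie-corrected) and its chi-square p-value."""
    sizes = [len(g) for g in groups]
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n = pooled.size

    # Rank all observations together; tied values share their average rank
    ranks = stats.rankdata(pooled)

    # Sum of squared rank sums, each divided by its group size
    total, start = 0.0, 0
    for size in sizes:
        total += ranks[start:start + size].sum() ** 2 / size
        start += size
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction (undefined if every observation is identical)
    _, counts = np.unique(pooled, return_counts=True)
    h /= 1.0 - (counts ** 3 - counts).sum() / (n ** 3 - n)

    # Under the null hypothesis, H is approximately chi-square with k - 1 df
    p = stats.chi2.sf(h, df=len(groups) - 1)
    return h, p


# Hypothetical toy samples, e.g. stopping epochs of three models
a = [32, 26, 44, 35, 71]
b = [20, 25, 22, 38, 20]
c = [50, 61, 48, 55, 49]

h, p = kruskal_h(a, b, c)
ref = stats.kruskal(a, b, c)
print(h, p)
print(ref.statistic, ref.pvalue)
```

Both printed lines should agree, which shows that the test only looks at the ranks of the pooled data and never at the raw distances between values.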



Last epochs: are there differences in convergence?

In [None]:
for i in [simulation_mse, simulation_mae, simulation_huber, simulation_ce, simulation_focal]:
  print("_______________________________________________________")
  print("Mean and standard deviation of last epochs")
  print(np.asarray(i[0]).mean())
  print(np.asarray(i[0]).std())

In [None]:
import scipy.stats as stats
'''
Null hypothesis: the population medians of all groups are equal.

Here: epoch at which training stopped, used to evaluate convergence
'''
stats.kruskal(
    np.hstack(simulation_mse[0]),
    np.hstack(simulation_mae[0]),
    np.hstack(simulation_huber[0]),
    np.hstack(simulation_ce[0]),
    np.hstack(simulation_focal[0]),
     )
# KruskalResult(statistic=22.23847083667068, pvalue=0.0001796605135019599)


AUC: are there differences in performance on the test data?

In [None]:
for i in [simulation_mse, simulation_mae, simulation_huber, simulation_ce, simulation_focal]:
  print("_______________________________________________________")
  print("Mean and standard deviation of roc_auc")
  print(np.asarray(i[1]).mean())
  print(np.asarray(i[1]).std())

In [None]:
import scipy.stats as stats
'''
Null hypothesis: the population medians of all groups are equal.

Here: ROC AUC on the test data, used to compare predictive performance
'''
stats.kruskal(
    np.hstack(simulation_mse[1]),
    np.hstack(simulation_mae[1]),
    np.hstack(simulation_huber[1]),
    np.hstack(simulation_ce[1]),
    np.hstack(simulation_focal[1]),
     )
# KruskalResult(statistic=131.99214948486943, pvalue=1.459883736690209e-27)


AUPRC: are there differences in performance on the test data?

In [None]:
for i in [simulation_mse, simulation_mae, simulation_huber, simulation_ce, simulation_focal]:
  print("_______________________________________________________")
  print("Mean and standard deviation of AUPRC (precision-recall AUC)")
  print(np.asarray(i[2]).mean())
  print(np.asarray(i[2]).std())

In [None]:
import scipy.stats as stats
'''
Null hypothesis: the population medians of all groups are equal.

Here: AUPRC on the test data, used to compare predictive performance
'''
stats.kruskal(
    np.hstack(simulation_mse[2]),
    np.hstack(simulation_mae[2]),
    np.hstack(simulation_huber[2]),
    np.hstack(simulation_ce[2]),
    np.hstack(simulation_focal[2]),
     )
# KruskalResult(statistic=131.86117955510082, pvalue=1.557160300415864e-27)


If the null hypothesis of the Kruskal-Wallis test is rejected, at least one group differs significantly from the others. The test does not say which group(s) differ, however. Finding that out requires an additional analysis, such as a post-hoc test.

Several post-hoc tests can compare specific pairs of groups. Common choices are [Dunn's test](https://scikit-posthocs.readthedocs.io/en/latest/generated/scikit_posthocs.posthoc_dunn), the [Conover test](https://scikit-posthocs.readthedocs.io/en/latest/generated/scikit_posthocs.posthoc_conover), and the [Steel-Dwass test](https://scikit-posthocs.readthedocs.io/en/latest/generated/scikit_posthocs.posthoc_dscf). These rank-based tests determine which pairs of groups differ significantly once the Kruskal-Wallis test has rejected the null hypothesis.



In [None]:
%pip install scikit-posthocs

### [Dunn's test](https://scikit-posthocs.readthedocs.io/en/latest/generated/scikit_posthocs.posthoc_dunn)

Dunn's test is a post-hoc test that compares all pairs of groups using the rank data from the Kruskal-Wallis test. It is based on the difference in mean ranks between the two groups being compared. The resulting p-values are then adjusted for multiple comparisons, here with the Holm method (`p_adjust='holm'`), which controls the family-wise error rate.
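
To make the Holm adjustment concrete, here is a minimal sketch of the step-down procedure. The function name `holm_adjust` and the raw p-values are hypothetical; the sketch illustrates the general method and is not taken from the scikit-posthocs source.

```python
import numpy as np


def holm_adjust(pvals):
    """Holm step-down adjustment of a 1-D collection of p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)

    # The smallest p-value is multiplied by m, the next by m - 1, and so on
    scaled = p[order] * (m - np.arange(m))

    # Adjusted values must not decrease along the sorted order and are capped at 1
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)

    # Put the adjusted values back into the original order
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons
raw = [0.01, 0.04, 0.03, 0.005]
adj = holm_adjust(raw)
print(adj)
```

A pair counts as significant at level alpha when its adjusted p-value is below alpha. With five losses there are ten pairwise comparisons, so the smallest raw p-value is multiplied by 10.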

In [None]:
# last-epochs
import scikit_posthocs as sp

last_epochs = [np.hstack(simulation_mse[0]),
    np.hstack(simulation_mae[0]),
    np.hstack(simulation_huber[0]),
    np.hstack(simulation_ce[0]),
    np.hstack(simulation_focal[0])]

sp.posthoc_dunn(last_epochs, p_adjust = 'holm')

Observation:


*   The focal loss differs significantly from MSE and CE in terms of convergence
*   Otherwise, no significant differences were found



In [None]:
# roc_auc
import scikit_posthocs as sp

roc_auc = [np.hstack(simulation_mse[1]),
    np.hstack(simulation_mae[1]),
    np.hstack(simulation_huber[1]),
    np.hstack(simulation_ce[1]),
    np.hstack(simulation_focal[1])]

sp.posthoc_dunn(roc_auc, p_adjust = 'holm')

In [None]:
# recall_precision_auc
import scikit_posthocs as sp

recall_precision_auc = [np.hstack(simulation_mse[2]),
    np.hstack(simulation_mae[2]),
    np.hstack(simulation_huber[2]),
    np.hstack(simulation_ce[2]),
    np.hstack(simulation_focal[2])]

sp.posthoc_dunn(recall_precision_auc, p_adjust = 'holm')
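
Reading a matrix of pairwise p-values by eye is error-prone, so a small helper can list the significant pairs. This is a sketch under assumptions: `significant_pairs` is a hypothetical helper, and the `demo` matrix contains made-up p-values for three placeholder groups, not results from this simulation.

```python
from itertools import combinations

import pandas as pd


def significant_pairs(pvalues, alpha=0.05):
    """Return (group_a, group_b, p) for every pair whose p-value is below alpha."""
    labels = list(pvalues.index)
    return [
        (a, b, float(pvalues.loc[a, b]))
        for a, b in combinations(labels, 2)  # each unordered pair once
        if pvalues.loc[a, b] < alpha
    ]


# Made-up symmetric p-value matrix for three placeholder groups
groups = ["a", "b", "c"]
demo = pd.DataFrame(
    [[1.0, 0.20, 0.001],
     [0.20, 1.0, 0.03],
     [0.001, 0.03, 1.0]],
    index=groups,
    columns=groups,
)

pairs = significant_pairs(demo)
print(pairs)
```

The same helper can be applied to the DataFrame returned by `sp.posthoc_dunn`. Assuming that result keeps the groups in input order, setting its index and columns to `["mse", "mae", "huber", "ce", "focal"]` first makes the listed pairs readable.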

# Final result for breastW

| Performance                  | Loss                          |
|------------------------------|-------------------------------|
| Fastest convergence?         | FL (compared with MSE and CE) |
| Best ROC AUC?                | MSE & CE                      |
| Best AUPRC?                  | CE                            |

In this simulation, CE showed the best performance on both AUPRC and AUC. MSE is listed in the table because its AUC does not differ significantly from that of the CE model. On average, CE converges the slowest.

Because of its AUPRC results, CE showed the best overall performance and is selected as the final model in this simulation.

The focal loss converged the fastest, but it performs significantly worse than all other models.

