***
<h1><center>Research Project</center></h1>
<h2><center>Enhancing actuarial non-life pricing models via transformers</center></h2>
<center>by Alexej Brauer </center>
<center>M.Sc. (TUM) / Aktuar DAV / CADS </center>

***

# This notebook provides the code to reproduce the data cleaning/preparation and the results of the paper: Enhancing actuarial non-life pricing models via transformers.

This notebook builds on the following work:
* Ronald Richman, Mario V. Wüthrich "LocalGLMnet: interpretable deep learning for tabular data" 2023
* Mario V. Wüthrich, M. Merz, "Statistical Foundations of Actuarial Learning and its Applications" 2023
* Gorishniy, Rubachev, Khrulkov, Babenko "Revisiting Deep Learning Models for Tabular Data" 2021
* Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, "Attention Is All You Need", NeurIPS 2017

# 1. Basic Setting:

## 1.1 Load packages:

In [1]:
# sys and os imports
import os
import sys
import platform
import subprocess
import re
import warnings

# display and plotting
from IPython.display import Image
import plotly.express as px
import plotly.graph_objects as go

# data
import numpy as np
import pandas as pd
from dataclasses import dataclass, field

# modelling
import random
# modelling scikit-learn
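import sklearn as sk
import sklearn.model_selection  # explicit import, so that sk.model_selection.train_test_split is available
from sklearn.linear_model import PoissonRegressor
from sklearn.preprocessing import StandardScaler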
# modelling tensorflow
import tensorflow as tf
import keras
from keras.activations import (tanh, exponential, gelu)

# saving & time
import pickle
import time
import datetime

# for multiprocessing
from multiprocessing import Process
import logging



## 1.2 Storage settings:

In [2]:
# set the path to the storage folder:
storage_path = "."

# import my classes for the ft-transformer models:
# ----------------------
sys.path.insert(1, f'{storage_path}/helper')
import main_model_classes as EnhActuar
import helper as helper



## 1.3 Display settings:

In [3]:
pd.set_option('display.max_columns', None)

## 1.4 Get information about the system:

In [4]:
# if one has a kernel installed that supports gpu's uncomment this:
# -----------
# Printing the versions of the main packages:
print(f"Version of Python: ")
print("--------------------")
print(f"Python {sys.version}")
print()
print(f"Version of main Packages (full list below): ")
print("--------------------")
print(f"Tensor Flow Version: {tf.__version__}")
print(f"Keras Version: {keras.__version__}")
print(f"Pandas Version: {pd.__version__}")
print(f"Scikit-Learn: {sk.__version__}")

print()
print("Information about CPU: ")
print("--------------------")
# code from: https://stackoverflow.com/a/13078519
def get_processor_name():
    if platform.system() == "Windows":
        return platform.processor()
    elif platform.system() == "Darwin":
        os.environ['PATH'] = os.environ['PATH'] + os.pathsep + '/usr/sbin'
        command = "sysctl -n machdep.cpu.brand_string"
        return subprocess.check_output(command, shell=True).decode().strip()
    elif platform.system() == "Linux":
        command = "cat /proc/cpuinfo"
        all_info = subprocess.check_output(command, shell=True).decode().strip()
        for line in all_info.split("\n"):
            if "model name" in line:
                return re.sub( ".*model name.*:", "", line,1)
    return ""
print(get_processor_name())

print()
print("Information about GPU: ")
print("--------------------")
print("GPU is", "available" if len(tf.config.list_physical_devices('GPU'))>0 else "NOT AVAILABLE")
gpus = tf.config.list_physical_devices('GPU')
if len(gpus) != 0:
  # Restrict TensorFlow to the first GPU and let it allocate memory on demand:
  tf.config.set_visible_devices(gpus[0], 'GPU')
  tf.config.experimental.set_memory_growth(gpus[0], True)
  print(gpus)
  !nvidia-smi

Version of Python: 
--------------------
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]

Version of main Packages (full list below): 
--------------------
Tensor Flow Version: 2.14.0
Keras Version: 2.14.0
Pandas Version: 1.5.3
Scikit-Learn: 1.2.2

Information about CPU: 
--------------------
 Intel(R) Xeon(R) CPU @ 2.00GHz

Information about GPU: 
--------------------
GPU is available
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Thu Nov  9 22:26:17 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4       

For further details regarding the environment, see the last chapter of this notebook.

## 1.5 Set random seeds:

In [5]:
def set_random_seeds(seed_nr):
    tf.random.set_seed(seed_nr)
    np.random.seed(seed_nr)
    random.seed(seed_nr)
    os.environ['PYTHONHASHSEED']=str(seed_nr)

set_random_seeds(42)

# create 15 random seeds for the 15 models
random_seeds = np.random.randint(0, 1000000000, 15)
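
The 15 per-model seeds are reproducible because they are drawn from NumPy's global generator right after it has been seeded. A minimal sketch of this behaviour, using only NumPy (the names `first_draw` and `second_draw` are illustrative and not part of the notebook):

```python
import numpy as np

# Seeding the global NumPy generator fixes every subsequent draw.
np.random.seed(42)
first_draw = np.random.randint(0, 1000000000, 15)

# Re-seeding with the same value reproduces exactly the same 15 seeds.
np.random.seed(42)
second_draw = np.random.randint(0, 1000000000, 15)

print(np.array_equal(first_draw, second_draw))
```

Note that any additional call to `np.random` between seeding and drawing would change the resulting seeds, so the order of the cells matters.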


# 2. Load & prepare the Data:

## 2.1 Load-File

We use the same data and data preparation as described in the following book and paper:
* 2023 Book by M. V. Wüthrich, M. Merz, "Statistical Foundations of Actuarial Learning and its Applications"
* 2023 Richman & Wüthrich: "LocalGLMnet: interpretable deep learning for tabular data"

See Section 3.4 of the LocalGLMnet paper.
Note that the authors do not simply take the French Motor Third Party Liability data files from CASdatasets.
They download a different version and explain why in Footnote 2 on page 553 of the 2023 book by M. V. Wüthrich, M. Merz.

So we download the data in the same way (the R code looks like this):
```R
# --------------------------
library(OpenML)
library(farff)
library(feather)
freMTPL2freq <- getOMLDataSet(data.id = 41214)$data
freMTPL2sev <- getOMLDataSet(data.id = 41215)$data

str(freMTPL2freq)
str(freMTPL2sev)

# Save the Datasets as feather files
write_feather(freMTPL2freq, "./Data/freMTPL2freq.feather")
write_feather(freMTPL2sev, "./Data/freMTPL2sev.feather")
```

In Python, we now load the feather files:

In [6]:
df_freq = pd.read_feather(f'{storage_path}/Data/freMTPL2freq.feather')
df_sev = pd.read_feather(f'{storage_path}/Data/freMTPL2sev.feather')

## 2.2 Data Cleaning

Now it gets a bit complicated:

To replicate the results of the first group of papers (1):
* 2018 Noll Case Study: "French Motor Third-Party Liability Claims"
* 2019 Schelldorfer Paper: "Nesting Classical Actuarial Models into Neural Networks"
* 2020 Wüthrich Paper: "From Generalized Linear Models to Neural Networks, and Back"

one has to:
1. Keep all rows of the data set
2. Use ClaimNb as it is in the original data set

To replicate the results of the second group (2):
* 2023 Book by M. V. Wüthrich, M. Merz, "Statistical Foundations of Actuarial Learning and its Applications"
* 2023 Richman & Wüthrich: "LocalGLMnet: interpretable deep learning for tabular data"

one has to:
1. Use ClaimNb as the aggregated number of claims per policy from the freMTPL2sev data set
2. Delete the rows of the data set that have more than 5 claims

Since this notebook focuses on the second group, the data preparation is done in the same way as described there.

The authors apply some basic data cleaning, which we also do before we move on to the actual data preparation chapter.
For the original R code of the data cleaning, see Listing 13.1 of the 2023 book by M. V. Wüthrich, M. Merz. For a summary of the data, see Listing 13.2 there.
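
The cleaning steps for the second group can be illustrated on a toy example. The sketch below is only an illustration under assumed data: the two small frames and their values are made up, and it uses `groupby(...).size()` and `clip`, which are equivalent alternatives to the calls used in the notebook cell.

```python
import pandas as pd

# Hypothetical miniature versions of the frequency and severity tables.
freq = pd.DataFrame({"IDpol": [1, 2, 3, 4], "Exposure": [0.5, 1.2, 0.3, 1.0]})
sev = pd.DataFrame({"IDpol": [2, 2, 4, 4, 4, 4, 4, 4], "ClaimAmount": [100.0] * 8})

# Count the claims per policy in the severity table.
claim_counts = sev.groupby("IDpol").size().rename("ClaimNb").reset_index()

# Attach the counts to the frequency table; policies without claims get 0.
toy = freq.merge(claim_counts, on="IDpol", how="left")
toy["ClaimNb"] = toy["ClaimNb"].fillna(0).astype(int)

# Drop policies with more than 5 claims and cap the exposure at 1.
toy = toy[toy["ClaimNb"] <= 5].copy()
toy["Exposure"] = toy["Exposure"].clip(upper=1.0)

print(toy)
```

Policy 4 has six claims and is removed, and the exposure of policy 2 is capped at 1.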

In [7]:
# drop the column ClaimNb from df_freq:
df_freq = df_freq.drop(columns=["ClaimNb"])
# convert the column "VehGas" into categorical:
df_freq["VehGas"] = df_freq["VehGas"].astype("category")
# create a temporary dataframe with the column IDpol and the number of claims per policy:
temp_df_ClaimNb_from_sev_df = pd.DataFrame(df_sev["IDpol"].value_counts()).reset_index()
temp_df_ClaimNb_from_sev_df.columns = ["IDpol", "ClaimNb_from_sev_df"]
# we merge the two dataframes so that we have the column "ClaimNb_from_sev_df" in df_freq:
df_freq = pd.merge(df_freq, temp_df_ClaimNb_from_sev_df, on='IDpol', how='left')
df_freq["ClaimNb_from_sev_df"] = df_freq["ClaimNb_from_sev_df"].fillna(0)
# rename the column "ClaimNb_from_sev_df" to "ClaimNb":
df_freq = df_freq.rename(columns={"ClaimNb_from_sev_df":"ClaimNb"})
# replace all nan values of numerical columns in the dataframe with 0:
for col in df_freq.select_dtypes(include=['number']).columns:
    df_freq[col] = df_freq[col].fillna(0)
# restrict the dataframe to those rows that have a ClaimNb smaller than or equal to 5
# (.copy() avoids pandas' SettingWithCopyWarning in the assignments below):
df_freq = df_freq[df_freq["ClaimNb"]<=5].copy()
# if exposure is greater than 1 set it to 1:
df_freq.loc[df_freq["Exposure"]>1,"Exposure"] = 1
# reordering the categories of the column VehBrand to "B1","B2","B3","B4","B5","B6","B10","B11","B12","B13","B14":
df_freq["VehBrand"] = df_freq["VehBrand"].cat.reorder_categories(["B1","B2","B3","B4","B5","B6","B10","B11","B12","B13","B14"])




The paper states that, after data cleaning, the data consist of the claim counts, time exposures and feature information, with
> six continuous feature components (called ‘Area Code’, ‘Bonus-Malus Level’, ‘Density’, ‘Driver’s Age’, ‘Vehicle Age’, ‘Vehicle Power’), 1 binary component (called ‘Vehicle Gas’) and two categorical components with more than two levels (called ‘Vehicle Brand’ and ‘Region’).

Note that the listing in the book transforms the area code at a later stage; we transform it here already, since the LocalGLMnet paper lists it among the continuous features:

In [8]:
df_freq["Area"] = df_freq["Area"].map({"A":1,"B":2,"C":3,"D":4,"E":5,"F":6}).astype(int)



Sort the Dataset:

In [9]:
# sort the dataframe by IDpol:
df_freq = df_freq.sort_values(by=["IDpol"])

## 2.3 Data Exploration

After loading the feather files in Python and applying the same small data cleaning steps as described in the book, the data look like this:

In [10]:
df_freq.head()

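| | IDpol | Exposure | Area | VehPower | VehAge | DrivAge | BonusMalus | VehBrand | VehGas | Density | Region | ClaimNb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.1 | 4 | 5.0 | 0.0 | 55.0 | 50.0 | B12 | Regular | 1217.0 | R82 | 0.0 |
| 1 | 3.0 | 0.77 | 4 | 5.0 | 0.0 | 55.0 | 50.0 | B12 | Regular | 1217.0 | R82 | 0.0 |
| 2 | 5.0 | 0.75 | 2 | 6.0 | 2.0 | 52.0 | 50.0 | B12 | Diesel | 54.0 | R22 | 0.0 |
| 3 | 10.0 | 0.09 | 2 | 7.0 | 0.0 | 46.0 | 50.0 | B12 | Diesel | 76.0 | R72 | 0.0 |
| 4 | 11.0 | 0.84 | 2 | 7.0 | 0.0 | 46.0 | 50.0 | B12 | Diesel | 76.0 | R72 | 0.0 |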


In [11]:
df_freq.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 678007 entries, 0 to 678012
Data columns (total 12 columns):
 #   Column      Non-Null Count   Dtype   
---  ------      --------------   -----   
 0   IDpol       678007 non-null  float64 
 1   Exposure    678007 non-null  float64 
 2   Area        678007 non-null  int64   
 3   VehPower    678007 non-null  float64 
 4   VehAge      678007 non-null  float64 
 5   DrivAge     678007 non-null  float64 
 6   BonusMalus  678007 non-null  float64 
 7   VehBrand    678007 non-null  category
 8   VehGas      678007 non-null  category
 9   Density     678007 non-null  float64 
 10  Region      678007 non-null  category
 11  ClaimNb     678007 non-null  float64 
dtypes: category(3), float64(8), int64(1)
memory usage: 53.7 MB


In [12]:
df_freq.describe(include="all")

| | IDpol | Exposure | Area | VehPower | VehAge | DrivAge | BonusMalus | VehBrand | VehGas | Density | Region | ClaimNb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 678007.0 | 678007.0 | 678007.0 | 678007.0 | 678007.0 | 678007.0 | 678007.0 | 678007 | 678007 | 678007.0 | 678007 | 678007.0 |
| unique | | | | | | | | 11 | 2 | | 22 | |
| top | | | | | | | | B12 | Regular | | R24 | |
| freq | | | | | | | | 166024 | 345871 | | 160601 | |
| mean | 2621857.0 | 0.528547 | 3.289692 | 6.454653 | 7.044218 | 45.499061 | 59.761588 | | | 1792.430975 | | 0.038913 |
| std | 1641789.0 | 0.364081 | 1.382689 | 2.050902 | 5.666235 | 14.137492 | 15.6367 | | | 3958.663031 | | 0.204752 |
| min | 1.0 | 0.002732 | 1.0 | 4.0 | 0.0 | 18.0 | 50.0 | | | 1.0 | | 0.0 |
| 25% | 1157948.0 | 0.18 | 2.0 | 5.0 | 2.0 | 34.0 | 50.0 | | | 92.0 | | 0.0 |
| 50% | 2272153.0 | 0.49 | 3.0 | 6.0 | 6.0 | 44.0 | 50.0 | | | 393.0 | | 0.0 |
| 75% | 4046278.0 | 0.99 | 4.0 | 7.0 | 11.0 | 55.0 | 64.0 | | | 1658.0 | | 0.0 |


In [13]:
for column in df_freq.columns.drop(["IDpol","Exposure","Density"]):
    value_counts = df_freq[column].value_counts(dropna=False).sort_index()
    fig = px.bar(x=value_counts.index, y=value_counts.values)
    fig.update_layout(
        title=f"Value Counts Bar Chart of: {column}",
        xaxis_title=f"{column}",
        yaxis_title="Count",
        showlegend=False)
    fig.show()
column="Density"
fig = px.histogram(df_freq[column])
fig.update_layout(
    title=f"Histogram {column}",
    xaxis_title="Values",
    yaxis_title="Frequency",
    showlegend=False
)



Quick check that the values in the column IDpol are unique: the maximum count per ID is 1, so they are unique :)

In [14]:
df_freq["IDpol"].value_counts().max()

1

## 2.4 Split: Learn/Val/Test definition:

The LocalGLMnet paper mentions that the split is done exactly as in the book by Wüthrich & Merz (2021):
> To do a proper out-of-sample generalization analysis we partition the data randomly into a learning data set $L$ and a test data set $T$ . The learning data L contains
$n = 610,206$ instances and the test data set $T$ contains $67,801$ instances; we use exactly the same split as in Table 5.2 of Wüthrich & Merz (2021). The learning data L will be used to learn the network parameters and the test data $T$ is used to perform an out-of-sample generalization analysis.

To get the same split, I ran the splitting code in R instead of Python and exported the IDs of the learning set, which are then imported into Python.
The R code to reproduce the split looks like this (note the RNGversion!):

```R

RNGversion("3.5.0")
set.seed(100)
ll_replicate_papers_2 <- sample(c(1:nrow(freMTPL2freq)), round(0.9 * nrow(freMTPL2freq)), replace = FALSE)
learn <- freMTPL2freq[ll_replicate_papers_2, ]
test <- freMTPL2freq[-ll_replicate_papers_2, ]

# Save the list to a text file
write.table(learn$IDpol, "./Data/learn_split_IDpols_2.txt", row.names = FALSE, col.names = FALSE)
```


In [15]:
ids_in_learn = list(np.genfromtxt(f"{storage_path}/Data/learn_split_IDpols_2.txt").astype(int))
ids_in_test = list(df_freq[~df_freq["IDpol"].isin(ids_in_learn)]["IDpol"].astype(int))

bool_in_learn = df_freq['IDpol'].isin(ids_in_learn) # be careful if the dataset is not sorted by IDpol
bool_in_test = df_freq['IDpol'].isin(ids_in_test) # be careful if the dataset is not sorted by IDpol

In [16]:
display(f"The learning data L contains so many instances: {len(ids_in_learn)}")
display(f"The test data T contains so many instances: {len(ids_in_test)}")

freq_learn = df_freq[bool_in_learn]['ClaimNb'].sum()/df_freq[bool_in_learn]['Exposure'].sum()
freq_test = df_freq[bool_in_test]['ClaimNb'].sum()/df_freq[bool_in_test]['Exposure'].sum()
display(f"Test the resulting portfolio freq (w.r.t Exposure) in learn df: {freq_learn: .2%}")
display(f"Test the resulting portfolio freq (w.r.t Exposure) in test df: {freq_test: .2%}")

'The learning data L contains so many instances: 610206'

'The test data T contains so many instances: 67801'

'Test the resulting portfolio freq (w.r.t Exposure) in learn df:  7.36%'

'Test the resulting portfolio freq (w.r.t Exposure) in test df:  7.35%'
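
The portfolio frequencies above are exposure-weighted, i.e. the total number of claims divided by the total exposure of the respective subset. A minimal sketch on made-up data (the frame `toy`, the ID list `learn_ids` and the helper `portfolio_frequency` are hypothetical and only serve as illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "IDpol": [1, 2, 3, 4],
    "Exposure": [0.5, 1.0, 0.25, 0.25],
    "ClaimNb": [0, 1, 0, 1],
})
learn_ids = [1, 2, 3]  # hypothetical IDs of the learning set

# Boolean masks: every policy is either in the learning or in the test set.
in_learn = toy["IDpol"].isin(learn_ids)
in_test = ~in_learn

def portfolio_frequency(df):
    """Total claim count divided by total exposure."""
    return df["ClaimNb"].sum() / df["Exposure"].sum()

freq_learn_toy = portfolio_frequency(toy[in_learn])
freq_test_toy = portfolio_frequency(toy[in_test])
print(f"learn: {freq_learn_toy:.2%}, test: {freq_test_toy:.2%}")
```

Similar frequencies in both subsets, as in the output above, indicate that the random split did not produce strongly different portfolios.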

We also split the learn dataset into train and validation sets (90% / 10%). I did not find a specific split for this in the LocalGLMnet paper, so I create a new one. We create 15 train/validation splits here because we want to fit 15 different models.

In [17]:
# create 15 new train and validation split with sklearn:
train_val_split = {}
for run_index in range(15):
  temp_learn_train, temp_learn_val = sk.model_selection.train_test_split(df_freq[bool_in_learn][['IDpol']],
                                                                        test_size=0.1,
                                                                        random_state=random_seeds[run_index])
  train_val_split[f"learn_train_{run_index}"] = df_freq['IDpol'].isin(temp_learn_train['IDpol']) # be careful if the dataset is not sorted by IDpol
  train_val_split[f"learn_val_{run_index}"]  = df_freq['IDpol'].isin(temp_learn_val['IDpol']) # be careful if the dataset is not sorted by IDpol

print("Example train/validation split freq: ")
freq_learn_train = df_freq[train_val_split[f"learn_train_{run_index}"]]['ClaimNb'
                          ].sum()/df_freq[train_val_split[f"learn_train_{run_index}"]]['Exposure'].sum()
freq_learn_val = df_freq[train_val_split[f"learn_val_{run_index}"]]['ClaimNb'
                        ].sum()/df_freq[train_val_split[f"learn_val_{run_index}"]]['Exposure'].sum()

display(f"Test the resulting portfolio freq (w.r.t Exposure) in learn df: {freq_learn: .2%}")
display(f"Test the resulting portfolio freq (w.r.t Exposure) in learn-train df: {freq_learn_train: .2%}")
display(f"Test the resulting portfolio freq (w.r.t Exposure) in learn-val df: {freq_learn_val: .2%}")
del temp_learn_train, temp_learn_val

Example train/validation split freq: 


'Test the resulting portfolio freq (w.r.t Exposure) in learn df:  7.36%'

'Test the resulting portfolio freq (w.r.t Exposure) in learn-train df:  7.37%'

'Test the resulting portfolio freq (w.r.t Exposure) in learn-val df:  7.28%'
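
The idea of one train/validation split per model run can be sketched on a toy ID column. The IDs, the three seeds and the dictionary name `toy_splits` below are assumptions for illustration only:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

toy_ids = pd.DataFrame({"IDpol": np.arange(1, 101)})
toy_seeds = [11, 22, 33]  # hypothetical seeds, one per model run

toy_splits = {}
for run, seed in enumerate(toy_seeds):
    # A different seed gives a different 90% / 10% partition of the same IDs.
    train_ids, val_ids = train_test_split(toy_ids, test_size=0.1, random_state=seed)
    # Store boolean masks over the full frame instead of copies of the data.
    toy_splits[f"train_{run}"] = toy_ids["IDpol"].isin(train_ids["IDpol"])
    toy_splits[f"val_{run}"] = toy_ids["IDpol"].isin(val_ids["IDpol"])

print({name: int(mask.sum()) for name, mask in toy_splits.items()})
```

Storing only boolean masks keeps the memory footprint small, since the actual training and validation frames can be built on demand.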

We also create a very small learn set (for dummy training of the transformers, basically just to check that the code runs).

In [18]:
# create a new train and test split with sklearn:
temp_1, temp_lean_train_dummy = sk.model_selection.train_test_split(df_freq[train_val_split[f"learn_train_{run_index}"]][['IDpol']],
                                                                    test_size=0.01,
                                                                    random_state=random_seeds[0]+1)
bool_in_learn_train_dummy = df_freq['IDpol'].isin(temp_lean_train_dummy['IDpol']) # be careful if the dataset is not sorted by IDpol

freq_learn_train_dummy = df_freq[bool_in_learn_train_dummy]['ClaimNb'].sum()/df_freq[bool_in_learn_train_dummy]['Exposure'].sum()

display(f"Test the resulting portfolio freq (w.r.t Exposure) in learn train dummy df: {freq_learn_train_dummy: .2%}")
del temp_1, temp_lean_train_dummy

'Test the resulting portfolio freq (w.r.t Exposure) in learn train dummy df:  7.66%'

## 2.5 Data Preparation for GLMs

We use the same data preparation as described in the book by Wüthrich & Merz (2023).


In [19]:
# Copy the dataframe df_freq:
df_freq_glm = df_freq.copy()
# Area:
# is already numerical (due to the mapping above)
# VehPower:
temp_dict_change_VehPower={}
for i,v in enumerate(sorted(df_freq["VehPower"].unique())):
    if v <9:
        temp_dict_change_VehPower[v]=i+1
    else:
        temp_dict_change_VehPower[v]=6
df_freq_glm["VehPower"] = df_freq["VehPower"].map(temp_dict_change_VehPower).astype('category')
# VehAge:
# note: this part is different from the one in these papers:
# * 2018 Noll Case Study: "French Motor Third-Party Liability Claims"
# * 2019 Schelldorfer Paper: "Nesting Classical Actuarial Models into Neural Networks"
# * 2020 Wüthrich Paper: "From Generalized Linear Models to Neural Networks, and Back"
bins = [0, 6, 13, float('inf')]
labels = ['[0, 6)', '[6, 13)', '[13, ∞)']
df_freq_glm["VehAge"] = pd.cut(df_freq["VehAge"],bins=bins, labels=labels, right=False).astype('category')
# DrivAge:
bins = [18, 21, 26, 31, 41, 51, 71, float('inf')]
labels = ['[18, 21)', '[21, 26)', '[26, 31)', '[31, 41)', '[41, 51)', '[51, 71)', '[71, ∞)']
df_freq_glm["DrivAge"] = pd.cut(df_freq["DrivAge"],bins=bins, labels=labels, right=False).astype('category')
df_freq_glm["DrivAge_Nr"] = df_freq["DrivAge"]
# BonusMalus:
df_freq_glm.loc[df_freq_glm["BonusMalus"] >= 150, "BonusMalus"] = 150
# VehBrand:
# is already categorical (due to the reordering above)
# VehGas:
# is already categorical (due to the cast above)
# Density:
df_freq_glm["Density"] = np.log(df_freq_glm["Density"])
# Region:
# is already categorical

# check if we have the same number of features that we need for the glms as in the paper:
'''
print("Check if we have the same number of features that we need for the glms as in the paper")
print("------------")
test_dim_feature_space = 0
for col in df_freq_glm.select_dtypes(include=[int,float]).columns.drop(["IDpol","ClaimNb","Exposure","DrivAge_Nr"]):
    display(f"Dimensions for feature space of {col}: 1")
    test_dim_feature_space+=1
for col in df_freq_glm.select_dtypes(include=['category']).columns:
    display(f"Dimensions for feature space of {col}: {len(df_freq_glm[col].cat.categories)-1}")
    test_dim_feature_space=test_dim_feature_space+len(df_freq_glm[col].cat.categories)-1
display(f"Total dimensions for feature space: {test_dim_feature_space}")
'''

# Dummy encode all categorical variable for GLM1:
X_glm1 = pd.get_dummies(df_freq_glm, columns=df_freq_glm.select_dtypes(include=['category']).columns,drop_first=True).drop(columns=["IDpol","ClaimNb","Exposure","DrivAge_Nr"])
X_glm1_learn = X_glm1[bool_in_learn]
X_glm1_test = X_glm1[bool_in_test]

# Create the new DrivAge (power and log) columns for GLM2:
columns_to_drop = [col for col in X_glm1.columns if col.startswith('DrivAge_')]
X_glm2 = X_glm1.drop(columns=columns_to_drop)
X_glm2["DrivAge_1"] = df_freq_glm["DrivAge_Nr"]
X_glm2["DrivAge_2"] = df_freq_glm["DrivAge_Nr"]**2
X_glm2["DrivAge_3"] = df_freq_glm["DrivAge_Nr"]**3
X_glm2["DrivAge_4"] = df_freq_glm["DrivAge_Nr"]**4
X_glm2["DrivAge_log"] = np.log(df_freq_glm["DrivAge_Nr"])
X_glm2_learn = X_glm2[bool_in_learn].reset_index(drop=True)
means_DrivAge_learn = X_glm2_learn[[col for col in X_glm2_learn.columns if col.startswith('DrivAge_')]].mean()
for col in X_glm2_learn.columns:
    if col.startswith('DrivAge_'):
        X_glm2[col] = np.array(X_glm2[col]/means_DrivAge_learn[col])
X_glm2_learn = X_glm2[bool_in_learn].reset_index(drop=True)
X_glm2_test = X_glm2[bool_in_test].reset_index(drop=True)

# Adding interaction columns to the data frame for GLM3:
# one has to be careful here since the dataframes before are reindexed:
X_glm3 = X_glm2.copy()
X_glm3["DrivAge_1_x_BonusMalus"] = list(df_freq_glm["BonusMalus"]*df_freq_glm["DrivAge_Nr"])
X_glm3["DrivAge_2_x_BonusMalus"] = list(df_freq_glm["BonusMalus"]*df_freq_glm["DrivAge_Nr"]**2)
X_glm3_learn = X_glm2_learn.copy()
X_glm3_learn["DrivAge_1_x_BonusMalus"] = list(df_freq_glm[bool_in_learn]["BonusMalus"]*df_freq_glm[bool_in_learn]["DrivAge_Nr"])
X_glm3_learn["DrivAge_2_x_BonusMalus"] = list(df_freq_glm[bool_in_learn]["BonusMalus"]*df_freq_glm[bool_in_learn]["DrivAge_Nr"]**2)
means_DrivAge_x_BonusMalus_learn = X_glm3_learn[["DrivAge_1_x_BonusMalus", "DrivAge_2_x_BonusMalus"]].mean()
for col in list(means_DrivAge_x_BonusMalus_learn.index):
    X_glm3[col] = np.array(X_glm3[col]/means_DrivAge_x_BonusMalus_learn[col])
X_glm3_learn = X_glm3[bool_in_learn].reset_index(drop=True)
X_glm3_test = X_glm3[bool_in_test].reset_index(drop=True)
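
The design matrices built above are meant for Poisson GLMs with an exposure. One common way to fit such a model with scikit-learn's `PoissonRegressor` is to regress the observed frequency (claims divided by exposure) and to pass the exposure as `sample_weight`. The sketch below shows this on synthetic data; the data, the coefficients and all variable names are assumptions for illustration, and it is not necessarily how the GLMs are fitted in this notebook.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
n = 2000

# Synthetic features, exposures and Poisson claim counts (illustrative only).
X = rng.normal(size=(n, 2))
exposure = rng.uniform(0.1, 1.0, size=n)
true_rate = np.exp(-2.0 + 0.3 * X[:, 0])
claims = rng.poisson(true_rate * exposure)

# Fit the frequency with exposure weights; alpha=0.0 switches off the L2 penalty.
glm = PoissonRegressor(alpha=0.0, max_iter=1000)
glm.fit(X, claims / exposure, sample_weight=exposure)

# Expected claim counts are the predicted frequency times the exposure.
predicted_counts = glm.predict(X) * exposure
print(glm.intercept_, glm.coef_)
```

With an intercept and without a penalty, the fitted expected counts should add up approximately to the observed number of claims, which is a quick sanity check for such a fit.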


## 2.6 Data Preparation as described in the LocalGLMnet Paper:

Regarding the data pre-processing, the LocalGLMnet paper states:

> We pre-process these components as follows: we center and normalize
to unit variance the six continuous and the binary components. We apply one-hot encoding to the
two categorical variables, we emphasize that we do not use dummy coding as it is usually done in
GLMs. Below, in Section 3.6, we are going to motivate this one-hot encoding choice (which does not
lead to full rank design matrices); for one-hot encoding vs. dummy coding we refer to formulas (5.21)
and (7.29) in Wüthrich & Merz (2021).


> As a control variable, we add two random feature components that are $i.i.d.$, centered and with unit
variance, the first one having a uniform distribution and the second one having a standard normal
distribution, we call these two additional feature components ‘RandU’ and ‘RandN’. We consider
two additional independent components to understand whether the distributional choice influences
the results of hypothesis testing using the empirical interval $I_\alpha$, see (16).
Altogether (and using one-hot encoding) we receive q = 42 dimensional tabular feature variables $x_i ∈ R^q$; this includes the two additional components RandU and RandN.



We now try to replicate this:

In [20]:
df_freq_prep_nn = df_freq.copy()
# change VehGas to binary:
df_freq_prep_nn["VehGas"] = df_freq_prep_nn["VehGas"].map({"Diesel":1,"Regular":0}).astype(int)

nr_col = ["Area", "VehPower", "VehAge", "DrivAge", "BonusMalus", "VehGas", "Density"]
cat_col = ["VehBrand", "Region"]

# Note: StandardScaler := (x - mean) / standard_deviation
# As is good practice, we fit the StandardScaler (mean and standard deviation) only on the learning data and apply it to the whole dataset (including the test data)
prep_standardscaler = StandardScaler()
prep_standardscaler.fit(df_freq_prep_nn[bool_in_learn][nr_col])

df_freq_prep_nn[nr_col] = prep_standardscaler.transform(df_freq_prep_nn[nr_col])
# add the dummy columns to the df_freq_prep_nn dataframe:
df_freq_prep_nn = pd.concat([df_freq_prep_nn.drop(columns=cat_col),
                             pd.get_dummies(df_freq_prep_nn[cat_col], columns=cat_col, drop_first=False).astype(int)
                             ], axis=1)
# add back the original categorical columns (dropped above) with the prefix "Cat_":
df_freq_prep_nn[list(map(lambda item: "Cat_" + item, cat_col))] = df_freq[cat_col]
cat_col = list(map(lambda item: "Cat_" + item, cat_col))
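
The difference between dummy coding (used for the GLMs) and one-hot encoding (used here) can be seen on a toy column. The sketch below uses a made-up frame with three brand levels; only `pd.get_dummies` and its `drop_first` argument are taken from the notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "VehBrand": pd.Categorical(["B1", "B2", "B12", "B1"], categories=["B1", "B2", "B12"])
})

# Dummy coding drops the first level (reference level): K - 1 columns.
dummy = pd.get_dummies(toy, columns=["VehBrand"], drop_first=True).astype(int)

# One-hot encoding keeps all levels: K columns, each row sums to 1.
one_hot = pd.get_dummies(toy, columns=["VehBrand"], drop_first=False).astype(int)

print(list(dummy.columns))
print(list(one_hot.columns))
```

With the 11 levels of VehBrand and the 22 levels of Region shown in the summary above, one-hot encoding yields 33 columns; together with the 7 standardized components and the two random components this gives the q = 42 features mentioned in the paper.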

Add two random features as columns: one that is normally distributed with mean 0 and variance 1, and one that is uniformly distributed with mean 0 and variance 1.

Note that the variance of the uniform distribution on $[a, b]$ is $\displaystyle{\frac{1}{12}}(b-a)^{2}$. Choosing $a=-b$ centers the distribution at 0, and $b=\frac{\sqrt{12}}{2}=\sqrt{3}$ then gives the variance $\frac{1}{12}(2\sqrt{3})^{2}=1$.

In [21]:
# create a random column that is centered around 0 and has a standard deviation of 1 and has uniform distribution:
df_freq_prep_nn["RandU"] = np.random.uniform(-np.sqrt(12)/2,np.sqrt(12)/2,len(df_freq_prep_nn))
# create a random column that is centered around 0 and has a standard deviation of 1 and has normal distribution:
df_freq_prep_nn["RandN"] = np.random.normal(0,1,len(df_freq_prep_nn))

column="RandU"
fig = px.histogram(df_freq_prep_nn[column])
fig.update_layout(
    title=f"Histogram {column}",
    xaxis_title="Values",
    yaxis_title="Frequency",
    showlegend=False
)
fig.show()

column="RandN"
fig = px.histogram(df_freq_prep_nn[column])
fig.update_layout(
    title=f"Histogram {column}",
    xaxis_title="Values",
    yaxis_title="Frequency",
    showlegend=False
)
fig.show()


Check if all numerical features now have the right mean and variance:

In [22]:
print(df_freq_prep_nn[nr_col + ["RandU","RandN"]].mean())
print(df_freq_prep_nn[nr_col + ["RandU","RandN"]].var())


Area         -0.000056
VehPower     -0.000085
VehAge       -0.000090
DrivAge       0.000082
BonusMalus   -0.000078
VehGas       -0.000212
Density      -0.000675
RandU         0.000565
RandN         0.000442
dtype: float64
Area          0.999080
VehPower      0.999729
VehAge        0.997462
DrivAge       1.000098
BonusMalus    1.000379
VehGas        0.999993
Density       0.996537
RandU         1.000032
RandN         1.001442
dtype: float64


Add integer encodings for the categorical features, in case we want to use embeddings later:

In [23]:
cat_encoder_all = {}
for col in ["VehBrand", "Region"]:
    cat_encoder = {}
    unique_cat = df_freq.dtypes[col].categories.to_list()
    for i in range(len(unique_cat)):
        cat_encoder[unique_cat[i]] = i
    cat_encoder_all[col]=cat_encoder # we save the encoder dict in case we need it later to back-transform the results.
    df_freq_prep_nn[f"NN_EMB_{col}"] = df_freq[col].map(cat_encoder_all[col]).astype(int)
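
For a pandas categorical column, the integer encoding built above coincides with the position of each value in the list of categories, which pandas also exposes as `.cat.codes`. A small sketch on a made-up series (the name `brands` and its values are illustrative):

```python
import pandas as pd

brands = pd.Series(
    pd.Categorical(["B12", "B1", "B2", "B12"], categories=["B1", "B2", "B12"])
)

# Explicit dictionary: category -> position in the category list.
encoder = {category: index for index, category in enumerate(brands.cat.categories)}
mapped = brands.map(encoder).astype(int)

# Built-in integer codes of the categorical column.
codes = brands.cat.codes

print(mapped.tolist(), codes.tolist())
```

Keeping the explicit dictionary is still useful for transforming predictions back to the original labels; note also that `.cat.codes` returns -1 for missing values.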

Creating the learning and test datasets for the neural network models:

Note that we do not create the dataset for every train/validation split here; instead, we create them when fitting the models,
so that we do not fill up the RAM.

In [24]:
# Note: we do not create the dataset for every train/validation split here; they are created when fitting the model,
# so that we do not fill up the RAM (objects bound to notebook variables stay in memory until they are deleted).

# Create Datasets for OHE FNN:
# ------------------------
col_x_fnn_ohe = nr_col + [col for col in df_freq_prep_nn.columns if col.startswith('VehBrand_') or col.startswith('Region_')]
def create_ffn_ohe_data(bool_list, exposure_name="Exposure", response_name="ClaimNb"):
    X_nn_ohe = np.array(df_freq_prep_nn[bool_list][col_x_fnn_ohe].values)
    exposure = np.array(df_freq[bool_list][exposure_name])
    y_true= np.array(df_freq[bool_list][response_name])
    return [X_nn_ohe, exposure], y_true


# Create Datasets for cat embedding FNN:
# ------------------------
def create_ffn_cat_emb_data(bool_list, exposure_name="Exposure", response_name="ClaimNb"):
    X_nn_just_nr = np.array(df_freq_prep_nn[bool_list][nr_col].values)
    Input_EMB_VehBrand = np.array(df_freq_prep_nn[bool_list]["NN_EMB_VehBrand"].values)
    Input_EMB_Region = np.array(df_freq_prep_nn[bool_list]["NN_EMB_Region"].values)
    exposure = np.array(df_freq_prep_nn[bool_list][exposure_name])
    y_true = np.array(df_freq_prep_nn[bool_list][response_name])
    return [X_nn_just_nr, Input_EMB_VehBrand, Input_EMB_Region, exposure], y_true



## 2.7 Data Preparation for Transformer models:

Since the FT-Transformer needs every feature as a separate tensor, we create a new tensor dataset:

Note that we do not create the dataset for every train/validation split here; instead, we create them when fitting the models,
so that we do not fill up the RAM. Objects bound to notebook variables stay in memory until they are deleted, so we try to be careful.

In [25]:
def df_to_tensor(df: pd.DataFrame, feature_cols: list, exposure: str=None, target: str=None, batch_size: int = 512, dummy_data_for_build=False):
    """
    transforms the pandas dataframe to a tensorflow dataset as input for the model

    Args:
        df (pd dataframe): the pandas dataframe that includes the features
        feature_cols (list): the list of feature columns that should be included in the model
        exposure (str): if given, the exposure column is added as a separate input (if None it is ignored)
        target (str): if given, the target column is returned alongside the features as the label (if None it is ignored)
        batch_size (int): the batch size for the tensorflow dataset
        dummy_data_for_build (bool): build a dummy dataset for the model (only for building the model) that is not prefetched (default: False)

    Returns:
        tensorflow Dataset (Prefetched and Batched)
    """
    if exposure:
        feature_cols = feature_cols+[exposure]
    temp_dict = {k.lower(): np.array(v).reshape(-1, 1).astype(np.float32, copy=False)
                            if v.dtype in ["float64","float32","int64","int32"] else
                            np.array(v).reshape(-1, 1) for k, v in df[feature_cols].items()}
    if target:
        temp_input = (temp_dict, np.array(df[target]))
    else:
        temp_input = (temp_dict)
    tf_dataset = tf.data.Dataset.from_tensor_slices(temp_input) # create the tf dataset
    tf_dataset = tf_dataset.batch(batch_size) # for parallelizing the calc
    if not dummy_data_for_build:
        tf_dataset = tf_dataset.prefetch(batch_size) # Prefetch the data for better performance (overlaps data preprocessing and model execution)
    return tf_dataset

cat_vocabulary = {}
for c in cat_col:
    cat_vocabulary[c] = df_freq_prep_nn.dtypes[c].categories.tolist()


## 2.8 Loss function definition:  

The paper states that the Poisson deviance loss is used:

> As loss function for parameter fitting and generalization analysis we choose the Poisson deviance loss, which is a distribution adapted and strictly consistent loss function for the mean within the Poisson model, for details we refer to Section 4.1.3 in Wüthrich & Merz (2021).

So we quickly create the loss function in this step.
* Note: one could also just use `mean_poisson_deviance` from `sklearn.metrics`.
* Note: the mean of $d(y, \mu) = 2\left(y \log \frac{y}{\mu} - y + \mu\right)$ is the same as formula (5.28) in the book (2023).
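
As a small worked example of the formula above, the following sketch evaluates the unit deviance on three made-up observations. The helper name `poisson_deviance` is an assumption for illustration only, and the convention $y \log y = 0$ for $y = 0$ is applied explicitly:

```python
import numpy as np

def poisson_deviance(y, mu):
    """Unit Poisson deviance d(y, mu); the y*log(y/mu) term is set to 0 for y = 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # np.where evaluates both branches, so the log is guarded against y = 0.
    safe_y = np.where(y > 0, y, 1.0)
    ylogy = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return 2.0 * (ylogy - y + mu)

y = np.array([0.0, 1.0, 2.0])
mu = np.array([0.1, 1.0, 1.0])
d = poisson_deviance(y, mu)
print(d)         # per-observation deviances
print(d.mean())  # mean deviance, the quantity used as loss
```

For $y = 0$ the deviance reduces to $2\mu$, and it is zero whenever the prediction matches the observation.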


In [26]:
# Loss-function (for numpy arrays)
# ----------------------
def poisson_deviance_loss(y_true, y_pred):
    with np.errstate(divide='ignore'):
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", category=RuntimeWarning)
            xlogy = np.where(y_true != 0, y_true * np.log(y_true / y_pred), 0)
            dev = 2 * (xlogy - y_true + y_pred)
    return dev.mean()

# Loss-Function
# ----------------------
# we use our own loss function here (because TensorFlow does not include the Poisson deviance in this form).
# Normally I would use the tf loss class here (via LossFunctionWrapper), but this does not work on Colab
# since there is no @keras_export() in the source code:
# @keras.saving.register_keras_serializable(package="my_package", name="poisson_loss_for_tf")
@tf.function()
def poisson_loss_for_tf(y_true, y_pred, mean=True):
    """Computes the Poisson loss between y_true and y_pred.

    The Poisson loss is the mean of the elements of the `Tensor`
    `2 * (y_true * log(y_true / y_pred) - y_true + y_pred)`.

    Args:
        y_true: A tensor of true values with shape (batch_size,).
        y_pred: A tensor of predicted values with shape (batch_size,).
        mean: If True, return the mean over the last axis; otherwise return the element-wise loss.

    Returns:
        The Poisson loss between y_true and y_pred.
    """
    # NOTE: the squeeze below is a pragmatic fix for a trailing dimension of size 1 in y_pred.
    # TODO: check whether this more general (commented-out) squeeze is needed:
    # if y_pred.shape != y_true.shape:
    #     if y_pred.ndim > y_true.ndim:
    #         y_pred = tf.squeeze(y_pred, [-1])
    #     elif y_pred.ndim < y_true.ndim:
    #         y_true = tf.squeeze(y_true, [-1])
    if y_pred.shape != y_true.shape:
        y_pred = tf.squeeze(y_pred, [-1])
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    loss = 2 * (y_true * tf.math.log((y_true + keras.backend.epsilon()) / (y_pred + keras.backend.epsilon())) - y_true + y_pred)
    if mean:
        return keras.backend.mean(loss, axis=-1)
    else:
        return loss

# Loss Function Wrapper
# ----------------------
class Poisson_loss_for_tf_Wrapped:
    def __init__(self, y_true=None, y_pred=None, name="poisson_loss_for_tf"):
        self.name = name
        self.y_true = y_true
        self.y_pred = y_pred
    def __call__(self, y_true, y_pred):
        return poisson_loss_for_tf(y_true, y_pred)


# Loss Metrics.
# ----------------------
# See here: https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Metric
class Poisson_Metric_for_tf(tf.keras.metrics.Metric):
    def __init__(self, name='mae', **kwargs):
        super(Poisson_Metric_for_tf, self).__init__(name=name, **kwargs)
        self.total = self.add_weight(name='total', initializer='zeros')
        self.count = self.add_weight(name='count', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        batch_poisson_loss = poisson_loss_for_tf(y_true, y_pred,mean=False)
        sum_batch_poisson_loss = tf.reduce_sum(batch_poisson_loss)
        num_samples = tf.cast(tf.size(y_true), tf.float32)
        if sample_weight is not None:
            raise ValueError('Code for sample_weight is not yet implemented')
        self.total.assign_add(sum_batch_poisson_loss)
        self.count.assign_add(num_samples)

    def result(self):
        return self.total / self.count

    def reset_states(self):
        self.total.assign(0)
        self.count.assign(0)


# note that tf.keras.losses.Poisson() is not the same as the poisson_deviance_loss function above.
# tf.keras.losses.Poisson() computes mean(y_pred - y_true * tf.math.log(y_pred + keras.backend.epsilon())),
# i.e. it drops the factor 2 and the terms that depend on y_true only.
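
The relation between the two losses can be checked numerically. The sketch below uses plain NumPy with made-up values and writes the Keras-style loss without the small epsilon inside the logarithm, so it is an illustration of the algebra rather than a call into TensorFlow:

```python
import numpy as np

y = np.array([0.0, 1.0, 2.0, 0.0])
mu = np.array([0.05, 0.8, 1.5, 0.2])

# Log-likelihood-style Poisson loss per observation (epsilon omitted for clarity).
nll_style = mu - y * np.log(mu)

# Poisson deviance per observation, with y*log(y) defined as 0 for y = 0.
ylogy = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0)), 0.0)
deviance = 2.0 * (ylogy - y * np.log(mu) - y + mu)

# The two differ by a factor 2 and an offset that depends on y only.
offset = 2.0 * (ylogy - y)
print(np.allclose(deviance, 2.0 * nll_style + offset))
```

Because the offset does not depend on the prediction, both losses have the same minimiser, but their values are not comparable, which is why the deviance is implemented separately here.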

## 2.9 Initialize container for results:
(dataframe/hash-table/functions that help to store results)

In [27]:
# init hash tables for results
y_pred = {}
y_pred["train"]={}
y_pred["test"]={}

y_true = {}
y_true["train"] = np.array(df_freq[bool_in_learn]["ClaimNb"])
y_true["test"] = np.array(df_freq[bool_in_test]["ClaimNb"])

exposure = {}
exposure["train"] = np.array(df_freq[bool_in_learn]["Exposure"])
exposure["test"] = np.array(df_freq[bool_in_test]["Exposure"])

log_exposure = {}
log_exposure["train"] = np.array(np.log(df_freq[bool_in_learn]["Exposure"]))
log_exposure["test"] = np.array(np.log(df_freq[bool_in_test]["Exposure"]))

epochs_and_time = {}

df_results = pd.DataFrame(columns=["model",
                                   "epochs",
                                   "run_time",
                                   "# parameters",
                                   "poisson deviance loss: train",
                                   "poisson deviance loss: test",
                                   f"pred-avg-freq: train (obs = {freq_learn: .2%})",
                                   f"pred-avg-freq: test (obs = {freq_test: .2%})"])

# create a python data class to store the results:
@dataclass
class Results:
    model: str
    epochs: int = field(default=None)
    run_time: float = field(default=None)
    nr_parameters: int = field(default=None)
    poisson_deviance_loss_train: float = field(default=None)
    poisson_deviance_loss_test: float = field(default=None)
    pred_avg_freq_train: float = field(default=None)
    pred_avg_freq_test: float = field(default=None)

# store the results in the dataframe (via pd.concat, since DataFrame.append was removed in pandas 2.0):
def store_results_in_df(results):
    global df_results
    # one-row dataframe for the current model run
    new_row = pd.DataFrame({"model": results.model,
                            "epochs": results.epochs,
                            "run_time": results.run_time,
                            "nr_parameters": results.nr_parameters,
                            "loss_train": results.poisson_deviance_loss_train,
                            "loss_test": results.poisson_deviance_loss_test,
                            "pred_avg_freq_train": results.pred_avg_freq_train,
                            "pred_avg_freq_test": results.pred_avg_freq_test},
                           index=[0])
    # keep all other runs; an existing entry with the same model name is replaced
    other_rows = df_results[df_results["model"] != results.model]
    if len(other_rows) == 0:
        # the initial empty dataframe uses different column names, so it is replaced rather than concatenated
        df_results = new_row
    else:
        df_results = pd.concat([other_rows, new_row], ignore_index=True)


def calc_avg_df(list_models):
    for i, model in enumerate(list_models):
        filtered_results = df_results[df_results['model'].str.startswith(model)]
        averages = pd.DataFrame(filtered_results.select_dtypes(include=['number']).mean()).T
        averages.insert(0, 'model', model)
        if i == 0:
            df_avg = averages
        else:
            df_avg = pd.concat([df_avg, averages], ignore_index=True)
    return df_avg


def calc_std_df(list_models):
    for i, model in enumerate(list_models):
        filtered_results = df_results[df_results['model'].str.startswith(model)]
        stds = pd.DataFrame(filtered_results.select_dtypes(include=['number']).std()).T
        stds.insert(0, 'model', model)
        if i == 0:
            df_std = stds
        else:
            df_std = pd.concat([df_std, stds], ignore_index=True)
    return df_std



# 3. Benchmark-Models:

Note: for many of the following models we use the same model architecture as described in the book by Wüthrich & Merz (2023).

## 3.1 Mean-Model:

Note: we run the code 15 times so that the runtimes can be averaged.

In [31]:
for run_index in range(15):
    start_time = time.time()
    constant_model=df_freq[bool_in_learn]['ClaimNb'].sum()/df_freq[bool_in_learn]['Exposure'].sum()
    end_time = time.time()
    execution_time_mean_model = end_time - start_time

    y_pred["train"]["homogeneous model"] = constant_model*exposure["train"]
    y_pred["test"]["homogeneous model"] = constant_model*exposure["test"]

    mean_model_results = Results(model=f"homogeneous model (run: {run_index})",
                                    epochs=0,
                                    run_time=execution_time_mean_model,
                                    nr_parameters=1,
                                    poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"]["homogeneous model"]),
                                    poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"]["homogeneous model"]),
                                    pred_avg_freq_train=constant_model,
                                    pred_avg_freq_test=constant_model)
    # store the results in the dataframe:
    store_results_in_df(mean_model_results)

# display(df_results)
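
The homogeneous model above predicts one common frequency, total claims divided by total exposure. As a rough check on made-up data, the sketch below confirms on a grid that this ratio minimises the mean Poisson deviance and that the predicted claim total matches the observed one. All names and numbers are illustrative assumptions:

```python
import numpy as np

claims = np.array([0, 1, 0, 2, 0])
exposure = np.array([1.0, 0.5, 0.25, 1.0, 0.75])

# Homogeneous frequency estimate: total claims divided by total exposure.
lam_hat = claims.sum() / exposure.sum()

def mean_deviance(lam):
    """Mean Poisson deviance of the counts for a common frequency lam."""
    mu = lam * exposure
    safe_claims = np.where(claims > 0, claims, 1)
    ylogy = np.where(claims > 0, claims * np.log(safe_claims / mu), 0.0)
    return float((2.0 * (ylogy - claims + mu)).mean())

# Brute-force search over a fine grid of candidate frequencies.
grid = np.linspace(0.1, 3.0, 2901)
best_on_grid = grid[np.argmin([mean_deviance(lam) for lam in grid])]
print(lam_hat, best_on_grid)
```

The second property (predicted total equals observed total) is why the columns `pred_avg_freq_train` and `pred_avg_freq_test` are simply `constant_model` for this benchmark.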

## 3.2 GLM results:

Because of the fact that:
> "GLM, which is currently the industry standard for non-life claim frequency prediction"

We replicate here the results for the GLM (GLM3) that are shown in the LocalGLMnet paper and the book by Wüthrich & Merz (2023); GLM1 and GLM2 are fitted in the same way.  

Note: we run the code 15 times so that the runtimes can be averaged.
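
The GLMs below are fitted on the frequency `ClaimNb / Exposure` with `sample_weight=Exposure`. The following sketch, with made-up numbers and a hypothetical helper `unit_deviance`, checks the identity behind this: the deviance of the counts against `exposure * lambda` equals the exposure-weighted deviance of the frequencies against `lambda`, observation by observation:

```python
import numpy as np

claims = np.array([0.0, 1.0, 2.0, 0.0])
exposure = np.array([1.0, 0.5, 1.0, 0.25])
lam = np.array([0.08, 0.9, 1.4, 0.3])  # hypothetical fitted frequencies

def unit_deviance(y, mu):
    """Unit Poisson deviance with y*log(y/mu) set to 0 for y = 0."""
    ylogy = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    return 2.0 * (ylogy - y + mu)

# Deviance on the count scale: N versus exposure * lambda.
dev_counts = unit_deviance(claims, exposure * lam)
# Deviance on the frequency scale, weighted by exposure: N / e versus lambda.
dev_freq_weighted = exposure * unit_deviance(claims / exposure, lam)
print(np.allclose(dev_counts, dev_freq_weighted))
```

Up to the normalisation of the weights, minimising one objective is therefore the same as minimising the other, and multiplying the predicted frequency by the exposure gives the expected claim count.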

In [32]:
# Recreating results GLM1:
# -------------------------
for run_index in range(15):
    start_time = time.time()
    poisson_glm1 = PoissonRegressor(alpha = 0,max_iter=1000, solver='newton-cholesky') # scikit-learn.org: alpha = 0 is equivalent to unpenalized GLMs
    poisson_glm1.fit(X_glm1_learn,y_true["train"]/exposure["train"],sample_weight=exposure["train"])
    end_time = time.time()
    execution_time_glm1 = end_time - start_time
    # Make predictions using the fitted model
    y_pred["train"]["GLM1"] = poisson_glm1.predict(X_glm1_learn)*exposure["train"]
    y_pred["test"]["GLM1"] = poisson_glm1.predict(X_glm1_test)*exposure["test"]
    # store the results in the results class:
    glm1_results = Results(model=f"GLM1 (run: {run_index})",
                            epochs=0,
                            run_time=execution_time_glm1,
                            nr_parameters=len(poisson_glm1.coef_)+len([poisson_glm1.intercept_]),
                            poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"]["GLM1"]),
                            poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"]["GLM1"]),
                            pred_avg_freq_train=y_pred["train"]["GLM1"].sum()/exposure["train"].sum(),
                            pred_avg_freq_test=y_pred["test"]["GLM1"].sum()/exposure["test"].sum())
    # store the results in the dataframe:
    store_results_in_df(glm1_results)


# Recreating results GLM2:
# -------------------------
for run_index in range(15):
    start_time = time.time()
    poisson_glm2 = PoissonRegressor(alpha = 0,max_iter=1000, solver='newton-cholesky') # scikit-learn.org: alpha = 0 is equivalent to unpenalized GLMs
    poisson_glm2.fit(X_glm2_learn,y_true["train"]/exposure["train"],sample_weight=exposure["train"])
    end_time = time.time()
    execution_time_glm2 = end_time - start_time
    # Make predictions using the fitted model
    y_pred["train"]["GLM2"] = poisson_glm2.predict(X_glm2_learn)*exposure["train"]
    y_pred["test"]["GLM2"] = poisson_glm2.predict(X_glm2_test)*exposure["test"]
    # store the results in the results class:
    glm2_results = Results(model=f"GLM2 (run: {run_index})",
                            epochs=0,
                            run_time=execution_time_glm2,
                            nr_parameters=len(poisson_glm2.coef_)+len([poisson_glm2.intercept_]),
                            poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"]["GLM2"]),
                            poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"]["GLM2"]),
                            pred_avg_freq_train=y_pred["train"]["GLM2"].sum()/exposure["train"].sum(),
                            pred_avg_freq_test=y_pred["test"]["GLM2"].sum()/exposure["test"].sum())
    # store the results in the dataframe:
    store_results_in_df(glm2_results)


# Recreating results GLM3:
# -------------------------
for run_index in range(15):
    start_time = time.time()
    poisson_glm3 = PoissonRegressor(alpha = 0,max_iter=1000, solver='newton-cholesky') # scikit-learn.org: alpha = 0 is equivalent to unpenalized GLMs
    poisson_glm3.fit(X_glm3_learn,y_true["train"]/exposure["train"],sample_weight=exposure["train"])
    end_time = time.time()
    execution_time_glm3 = end_time - start_time

    # Make predictions using the fitted model
    y_pred["train"]["GLM3"] = poisson_glm3.predict(X_glm3_learn)*exposure["train"]
    y_pred["test"]["GLM3"] = poisson_glm3.predict(X_glm3_test)*exposure["test"]

    # store the results in the results class:
    glm3_results = Results(model=f"GLM3 (run: {run_index})",
                            epochs=0,
                            run_time=execution_time_glm3,
                            nr_parameters=len(poisson_glm3.coef_)+len([poisson_glm3.intercept_]),
                            poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"]["GLM3"]),
                            poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"]["GLM3"]),
                            pred_avg_freq_train=y_pred["train"]["GLM3"].sum()/exposure["train"].sum(),
                            pred_avg_freq_test=y_pred["test"]["GLM3"].sum()/exposure["test"].sum())
    # store the results in the dataframe:
    store_results_in_df(glm3_results)
# display(df_results)

## 3.3 Feedforward Neural Network OHE:

Note: we run the code 15 times with different seeds (to calculate the average and standard deviation of runtime and results).
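
The output layer of the network below is initialised with a zero kernel and the bias `log(constant_model)`. A minimal NumPy sketch of that forward pass, with random stand-in activations and a made-up mean frequency, shows that the untrained network then reproduces the homogeneous model:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden3 = np.tanh(rng.normal(size=(4, 10)))  # stand-in activations of the last hidden layer
exposure = np.array([[1.0], [0.5], [0.25], [0.8]])
mean_frequency = 0.07                        # hypothetical homogeneous frequency

# Output layer: zero kernel, bias = log(mean frequency), exponential activation.
kernel = np.zeros((10, 1))
bias = np.array([np.log(mean_frequency)])
frequency = np.exp(hidden3 @ kernel + bias)

# Multiplying by the exposure gives the expected claim counts.
prediction = frequency * exposure
print(prediction.ravel())
```

Starting from the homogeneous model means the first gradient steps refine a sensible baseline instead of first having to learn the overall claim level.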

Create and Build the model:

In [33]:
# Create the dataframes needed for evaluation:
data_nn_ohe_learn, y_true_learn = create_ffn_ohe_data(bool_in_learn)
data_nn_ohe_test, y_true_test = create_ffn_ohe_data(bool_in_test)

for run_index in range(15):
    # Create the dataframes needed for training:
    data_nn_ohe_learn_train, y_true_learn_train = create_ffn_ohe_data(train_val_split[f"learn_train_{run_index}"])
    data_nn_ohe_learn_val, y_true_learn_val = create_ffn_ohe_data(train_val_split[f"learn_val_{run_index}"])

    print(f"Model: {run_index}")
    # Define FNN Model:
    # ----------------------
    # note we use here the functional API instead of model subclassing
    # to make the code more readable and easier to understand:
    # (for the transformer based models we will use model subclasses)
    def Create_Poisson_FFN_OHE(input_dim=42,mean_model_results=1):
        # set random seeds
        set_random_seeds(int(random_seeds[run_index]))
        # Build the network
        Input_Matrix_OHE = tf.keras.layers.Input(shape=(input_dim,), dtype='float32', name='Input_Matrix')
        Input_Exposure = tf.keras.layers.Input(shape=(1,), dtype='float32', name='Input_Exposure')
        hidden1 = tf.keras.layers.Dense(units=20, activation=tanh, name='hidden1')(Input_Matrix_OHE)
        hidden2 = tf.keras.layers.Dense(units=15, activation=tanh, name='hidden2')(hidden1)
        hidden3 = tf.keras.layers.Dense(units=10, activation=tanh, name='hidden3')(hidden2)
        Result_FFN1 = tf.keras.layers.Dense(units=1, activation='exponential', name='Result_FFN1',
                        weights=[np.zeros((10, 1)), np.array([np.log(mean_model_results)])],
                        trainable=True)(hidden3)
        Response = tf.keras.layers.Multiply(name='Result')([Result_FFN1, Input_Exposure])
        # Define and Return the model
        return tf.keras.models.Model(inputs=[Input_Matrix_OHE, Input_Exposure], outputs=[Response], name='Poisson_FFN_OHE')

    # create the model:
    # ----------------------
    FFN_OHE = Create_Poisson_FFN_OHE(input_dim=40,mean_model_results=constant_model)

    # Compile the models
    # ----------------------
    FFN_OHE.compile(optimizer='nadam', loss=poisson_loss_for_tf, metrics=[poisson_loss_for_tf])

    # model callbacks:
    # ----------------------
    early_stopping_callback = tf.keras.callbacks.EarlyStopping(patience=15, monitor='val_poisson_loss_for_tf', restore_best_weights=True)


    # model fitting:
    # ----------------------
    # model without RandU and RandN:
    start_time = time.time()
    epochs_OHE=500

    FFN_OHE_history = FFN_OHE.fit( x=data_nn_ohe_learn_train,
                                   y=y_true_learn_train,
                                   validation_data=(data_nn_ohe_learn_val, y_true_learn_val),
                                    epochs=epochs_OHE,
                                    batch_size=5000,
                                    verbose=0,
                                    callbacks=[early_stopping_callback]
                                    )
    end_time = time.time()
    execution_time_nn_ohe = end_time - start_time
    best_epoch_FFN_ohe = np.argmin(FFN_OHE_history.history['val_poisson_loss_for_tf'])+1

    # save models:
    # ----------------------
    FFN_OHE.save_weights(f'{storage_path}/saved_models/Poisson_FFN_OHE_{run_index}.weights.h5')


    # load the saved model weights:
    # ----------------------
    FFN_OHE.load_weights(f'{storage_path}/saved_models/Poisson_FFN_OHE_{run_index}.weights.h5')


    # predict with the models:
    # ----------------------
    y_pred["train"]["FFN_OHE"] = np.array([x for [x] in FFN_OHE.predict(data_nn_ohe_learn,
                                                                        batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
    y_pred["test"]["FFN_OHE"] = np.array([x for [x] in FFN_OHE.predict(data_nn_ohe_test,
                                                                        batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])

    # evaluate the models:
    # ----------------------
    # store the results in the results class:
    FFN_OHE_results = Results(model=f"FFN_OHE (run: {run_index})",
                                epochs=best_epoch_FFN_ohe,
                                run_time=execution_time_nn_ohe,
                                nr_parameters=[np.sum([np.prod(v.get_shape().as_list()) for v in FFN_OHE.trainable_weights])],
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"]["FFN_OHE"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"]["FFN_OHE"]),
                                pred_avg_freq_train=y_pred["train"]["FFN_OHE"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"]["FFN_OHE"].sum()/exposure["test"].sum())

    # store the results in the result-dataframe:
    store_results_in_df(FFN_OHE_results)
# display(df_results)
# the notebook keeps these global objects referenced, so we delete the data that is no longer needed:
del data_nn_ohe_learn, data_nn_ohe_test, data_nn_ohe_learn_train, data_nn_ohe_learn_val


Model: 0
Model: 1
Model: 2
Model: 3
Model: 4
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14


NOTE: in the case of FFN_OHE:
if I use tf.data datasets instead of the array inputs, the fit is 4-5 times slower and I get much worse results...



## 3.4 Feedforward Neural Network with Categorical Embeddings:


Note: we run the code 15 times with different seeds (to calculate the average and standard deviation of runtime and results).
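
Conceptually, an embedding layer is a trainable lookup table with one row per category level. The sketch below mimics this with NumPy indexing; the level counts, indices and random values are illustrative assumptions, while the 7 numeric inputs and the embedding dimension 2 follow the model defined below:

```python
import numpy as np

rng = np.random.default_rng(0)
n_brands, n_regions, emb_dim = 11, 22, 2  # level counts are illustrative assumptions

# One trainable row per category level.
emb_brand = rng.normal(loc=1.0, scale=0.05, size=(n_brands, emb_dim))
emb_region = rng.normal(loc=1.0, scale=0.05, size=(n_regions, emb_dim))

x_num = rng.normal(size=(3, 7))   # 7 numeric features per policy
brand_idx = np.array([0, 4, 10])  # integer-encoded categories
region_idx = np.array([21, 3, 3])

# The embedding lookup is row indexing; the result is concatenated with the numeric block.
x_concat = np.concatenate([x_num, emb_brand[brand_idx], emb_region[region_idx]], axis=1)
print(x_concat.shape)
```

Compared with one-hot encoding, each categorical feature contributes only `emb_dim` inputs to the first hidden layer, and policies that share a level share the same learned vector.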

In [34]:
# Create the dataframes needed for evaluation:
data_nn_emb_learn, y_true_learn = create_ffn_cat_emb_data(bool_in_learn)
data_nn_emb_test, y_true_test = create_ffn_cat_emb_data(bool_in_test)

for run_index in range(15):
    # Create the dataframes needed for training:
    data_nn_emb_learn_train, y_true_learn_train = create_ffn_cat_emb_data(train_val_split[f"learn_train_{run_index}"])
    data_nn_emb_learn_val, y_true_learn_val = create_ffn_cat_emb_data(train_val_split[f"learn_val_{run_index}"])

    # Define FNN with Cat. Embedding Model:
    # ----------------------
    # note we use here the functional API instead of model subclassing
    # to make the code more readable and easier to understand:
    # (for the transformer based models we will use model subclasses)
    print(f"Model: {run_index}")
    def Create_Poisson_FNN_CAT_EMB(input_nr_dim=7,emb_dim=1,mean_model_results=1):
        # set random seeds
        set_random_seeds(int(random_seeds[run_index]))

        Input_Matrix_Num = tf.keras.layers.Input(shape=(input_nr_dim,), dtype='float32', name='Input_Matrix_Num')
        Input_VehBrand = tf.keras.layers.Input(shape=(1,), name='Input_VehBrand')
        Input_Region = tf.keras.layers.Input(shape=(1,), name='Input_Region')
        Input_Exposure = tf.keras.layers.Input(shape=(1,), dtype='float32', name='Input_Exposure')

        All_Inputs = [Input_Matrix_Num,Input_VehBrand,Input_Region,Input_Exposure]

        Emb_VehBrand = tf.keras.layers.Embedding(input_dim=len(cat_encoder_all["VehBrand"].keys()),output_dim=emb_dim,
                                embeddings_initializer=keras.initializers.RandomNormal(mean=1.0, stddev=0.05 ),
                                name="Embedding_VehBrand")(Input_VehBrand)
        Emb_Region = tf.keras.layers.Embedding(input_dim=len(cat_encoder_all["Region"].keys()),output_dim=emb_dim,
                                embeddings_initializer=keras.initializers.RandomNormal(mean=1.0, stddev=0.05),
                            name="Embedding_Region")(Input_Region)

        Reshaped_Emb_VehBrand = tf.keras.layers.Reshape(target_shape=(emb_dim,),name="Reshaped_Embedding_VehBrand")(Emb_VehBrand)
        Reshaped_Emb_Region = tf.keras.layers.Reshape(target_shape=(emb_dim,),name="Reshaped_Embedding_Region")(Emb_Region)

        concatenation_layer = tf.keras.layers.Concatenate(name="concatenation_layer")([Input_Matrix_Num,Reshaped_Emb_VehBrand,Reshaped_Emb_Region])

        # Build the network
        hidden1 = tf.keras.layers.Dense(units=20, activation=tanh, name='hidden1')(concatenation_layer)
        hidden2 = tf.keras.layers.Dense(units=15, activation=tanh, name='hidden2')(hidden1)
        hidden3 = tf.keras.layers.Dense(units=10, activation=tanh, name='hidden3')(hidden2)
        Result_FFN1 = tf.keras.layers.Dense(units=1, activation='exponential', name='Result_FFN1',
                        weights=[np.zeros((10, 1)), np.array([np.log(mean_model_results)])],
                        trainable=True)(hidden3)

        Response = tf.keras.layers.Multiply(name='Result')([Result_FFN1, Input_Exposure])

        # Define the model
        return tf.keras.models.Model(inputs=All_Inputs, outputs=[Response], name='Poisson_CAT_EMB')

    # create the model:
    # ----------------------
    emb_dim=2
    FNN_CAT_EMB = Create_Poisson_FNN_CAT_EMB(input_nr_dim=7,emb_dim=emb_dim,mean_model_results=constant_model)

    # Compile the model
    # ----------------------
    FNN_CAT_EMB.compile(optimizer='nadam', loss=poisson_loss_for_tf, metrics=[poisson_loss_for_tf])

    # model callbacks:
    # ----------------------
    early_stopping_callback = tf.keras.callbacks.EarlyStopping(patience=15, monitor='val_poisson_loss_for_tf', restore_best_weights=True)

    # model fitting:
    # ----------------------
    start_time = time.time()
    epochs_CAT_EMB=500
    FNN_CAT_EMB_history = FNN_CAT_EMB.fit(x=data_nn_emb_learn_train,
                                    y=y_true_learn_train,
                                    validation_data=(data_nn_emb_learn_val, y_true_learn_val),
                                    epochs=epochs_CAT_EMB,
                                    batch_size=7000,
                                    verbose=0,
                                    callbacks=[early_stopping_callback]
                                    )

    end_time = time.time()
    execution_time_FNN_CAT_EMB = end_time - start_time
    best_epoch_FNN_CAT_EMB = np.argmin(FNN_CAT_EMB_history.history['val_poisson_loss_for_tf'])+1

    # save models:
    # ----------------------
    FNN_CAT_EMB.save_weights(f'{storage_path}/saved_models/Poisson_FNN_CAT_EMB_{run_index}.weights.h5')

    # load the saved model weights:
    # ----------------------
    FNN_CAT_EMB.load_weights(f'{storage_path}/saved_models/Poisson_FNN_CAT_EMB_{run_index}.weights.h5')

    # predict with the model:
    # ----------------------
    y_pred["train"]["FNN_CAT_EMB"] = np.array([x for [x] in FNN_CAT_EMB.predict(data_nn_emb_learn,
                                                                        batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
    y_pred["test"]["FNN_CAT_EMB"] = np.array([x for [x] in FNN_CAT_EMB.predict(data_nn_emb_test,
                                                                    batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
    # evaluate the model:
    # ----------------------
    FNN_CAT_EMB_results = Results(model=f"FNN_CAT_EMB (run: {run_index})",
                                epochs=best_epoch_FNN_CAT_EMB,
                                run_time=execution_time_FNN_CAT_EMB,
                                nr_parameters=[np.sum([np.prod(v.get_shape().as_list()) for v in FNN_CAT_EMB.trainable_weights])],
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"]["FNN_CAT_EMB"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"]["FNN_CAT_EMB"]),
                                pred_avg_freq_train=y_pred["train"]["FNN_CAT_EMB"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"]["FNN_CAT_EMB"].sum()/exposure["test"].sum())
    store_results_in_df(FNN_CAT_EMB_results)
# display(df_results)
# the notebook keeps these global objects referenced, so we delete the data that is no longer needed:
del data_nn_emb_learn, data_nn_emb_test, data_nn_emb_learn_train, data_nn_emb_learn_val

Model: 0
Model: 1
Model: 2
Model: 3
Model: 4
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14


## 3.5 CANN (GLM3 and FNN_CAT_EMB):

Note: we run the code 15 times with different seeds (to calculate the average and standard deviation of runtime and results).

Note: the code for the CANN below is basically the same as for the FNN with categorical embeddings. The only changes are:
* we use another exposure column (Exposure_x_GLM3_pred instead of Exposure)
* we set the initial weights and bias of the last layer to zero, so that the untrained network returns exactly the GLM3 prediction
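
The effect of these two changes can be sketched in NumPy. The GLM predictions and hidden activations below are made-up stand-ins; the point is only that a zero kernel and zero bias give a network factor of `exp(0) = 1`, so the CANN starts exactly at the GLM:

```python
import numpy as np

rng = np.random.default_rng(1)
exposure = np.array([[1.0], [0.5], [0.3]])
glm_frequency = np.array([[0.06], [0.11], [0.09]])  # hypothetical GLM3 frequency predictions
hidden3 = np.tanh(rng.normal(size=(3, 10)))         # stand-in last hidden layer activations

# CANN offset: exposure times GLM prediction replaces the plain exposure input.
offset = exposure * glm_frequency

# Zero kernel and zero bias: the network factor is exp(0) = 1 at initialisation.
kernel, bias = np.zeros((10, 1)), np.zeros(1)
network_factor = np.exp(hidden3 @ kernel + bias)
prediction = network_factor * offset
print(prediction.ravel())
```

During training the network factor moves away from 1 only where the data supports a correction of the GLM.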

In [36]:
# create the new exposure times GLM3_pred column for CANN models.
df_freq_prep_nn["Exposure_x_GLM3_pred"] = list(poisson_glm3.predict(X_glm3)*df_freq_prep_nn["Exposure"])

In [37]:
# Create the dataframes needed for evaluation:
data_nn_emb_learn, y_true_learn = create_ffn_cat_emb_data(bool_in_learn, exposure_name = "Exposure_x_GLM3_pred")
data_nn_emb_test, y_true_test = create_ffn_cat_emb_data(bool_in_test, exposure_name = "Exposure_x_GLM3_pred")

for run_index in range(15):
    # Create the dataframes needed for training:
    data_nn_emb_learn_train, y_true_learn_train = create_ffn_cat_emb_data(train_val_split[f"learn_train_{run_index}"],
                                                                          exposure_name = "Exposure_x_GLM3_pred")
    data_nn_emb_learn_val, y_true_learn_val = create_ffn_cat_emb_data(train_val_split[f"learn_val_{run_index}"],
                                                                      exposure_name = "Exposure_x_GLM3_pred")

    # Define the CANN model (FNN with Cat. Embedding; the GLM3 prediction enters via the exposure input):
    # ----------------------
    # note we use here the functional API instead of model subclassing
    # to make the code more readable and easier to understand:
    # (for the transformer based models we will use model subclasses)
    print(f"Model: {run_index}")
    def Create_Poisson_FNN_CAT_EMB(input_nr_dim=7,emb_dim=1,mean_model_results=1):
        # set random seeds
        set_random_seeds(int(random_seeds[run_index]))

        Input_Matrix_Num = tf.keras.layers.Input(shape=(input_nr_dim,), dtype='float32', name='Input_Matrix_Num')
        Input_VehBrand = tf.keras.layers.Input(shape=(1,), name='Input_VehBrand')
        Input_Region = tf.keras.layers.Input(shape=(1,), name='Input_Region')
        Input_Exposure = tf.keras.layers.Input(shape=(1,), dtype='float32', name='Input_Exposure')

        All_Inputs = [Input_Matrix_Num,Input_VehBrand,Input_Region,Input_Exposure]

        Emb_VehBrand = tf.keras.layers.Embedding(input_dim=len(cat_encoder_all["VehBrand"].keys()),output_dim=emb_dim,
                                embeddings_initializer=keras.initializers.RandomNormal(mean=1.0, stddev=0.05 ),
                                name="Embedding_VehBrand")(Input_VehBrand)
        Emb_Region = tf.keras.layers.Embedding(input_dim=len(cat_encoder_all["Region"].keys()),output_dim=emb_dim,
                                embeddings_initializer=keras.initializers.RandomNormal(mean=1.0, stddev=0.05),
                            name="Embedding_Region")(Input_Region)

        Reshaped_Emb_VehBrand = tf.keras.layers.Reshape(target_shape=(emb_dim,),name="Reshaped_Embedding_VehBrand")(Emb_VehBrand)
        Reshaped_Emb_Region = tf.keras.layers.Reshape(target_shape=(emb_dim,),name="Reshaped_Embedding_Region")(Emb_Region)

        concatenation_layer = tf.keras.layers.Concatenate(name="concatenation_layer")([Input_Matrix_Num,Reshaped_Emb_VehBrand,Reshaped_Emb_Region])

        # Build the network
        hidden1 = tf.keras.layers.Dense(units=20, activation=tanh, name='hidden1')(concatenation_layer)
        hidden2 = tf.keras.layers.Dense(units=15, activation=tanh, name='hidden2')(hidden1)
        hidden3 = tf.keras.layers.Dense(units=10, activation=tanh, name='hidden3')(hidden2)
        Result_FFN1 = tf.keras.layers.Dense(units=1, activation='exponential', name='Result_FFN1',
                        weights=[np.zeros((10, 1)), np.array([0])],
                        trainable=True)(hidden3)

        Response = tf.keras.layers.Multiply(name='Result')([Result_FFN1, Input_Exposure])

        # Define the model
        return tf.keras.models.Model(inputs=All_Inputs, outputs=[Response], name='Poisson_CAT_EMB')

    # create the model:
    # ----------------------
    emb_dim=2
    CANN = Create_Poisson_FNN_CAT_EMB(input_nr_dim=7,emb_dim=emb_dim,mean_model_results=constant_model)

    # Compile the model
    # ----------------------
    CANN.compile(optimizer='nadam', loss=poisson_loss_for_tf, metrics=[poisson_loss_for_tf])

    # model callbacks:
    # ----------------------
    early_stopping_callback = tf.keras.callbacks.EarlyStopping(patience=15, monitor='val_poisson_loss_for_tf', restore_best_weights=True)

    # model fitting:
    # ----------------------
    start_time = time.time()
    epochs_CAT_EMB=500

    CANN_history = CANN.fit(x=data_nn_emb_learn_train,
                                    y=y_true_learn_train,
                                    validation_data=[data_nn_emb_learn_val, y_true_learn_val],
                                    epochs=epochs_CAT_EMB,
                                    batch_size=7000,
                                    verbose=0,
                                    callbacks=[early_stopping_callback]
                                    )

    end_time = time.time()

    execution_time_CANN = end_time - start_time
    best_epoch_CANN = np.argmin(CANN_history.history['val_poisson_loss_for_tf'])+1

    # save models:
    # ----------------------
    CANN.save_weights(f'{storage_path}/saved_models/Poisson_CANN_{run_index}.weights.h5')

    # load the saved model weights:
    # ----------------------
    CANN.load_weights(f'{storage_path}/saved_models/Poisson_CANN_{run_index}.weights.h5')


    # predict with the model:
    # ----------------------
    y_pred["train"]["CANN"] = np.array([x for [x] in CANN.predict(data_nn_emb_learn,
                                                                        batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
    y_pred["test"]["CANN"] = np.array([x for [x] in CANN.predict(data_nn_emb_test,
                                                                    batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
    # evaluate the model:
    # ----------------------
    CANN_results = Results(model=f"CANN (run: {run_index})",
                                epochs=best_epoch_CANN,
                                run_time=execution_time_CANN,
                                nr_parameters=[np.sum([np.prod(v.get_shape().as_list()) for v in CANN.trainable_weights])],
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"]["CANN"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"]["CANN"]),
                                pred_avg_freq_train=y_pred["train"]["CANN"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"]["CANN"].sum()/exposure["test"].sum())
    store_results_in_df(CANN_results)
# display(df_results)
# objects stay referenced in the notebook's global namespace and are not freed automatically, so we delete the data that is no longer needed:
del data_nn_emb_learn, data_nn_emb_test, data_nn_emb_learn_train, data_nn_emb_learn_val


Model: 0
Model: 1
Model: 2
Model: 3
Model: 4
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14


## 3.6 LocalGLMnets OHE
(excl. Random Features):

Note: we run the code 15 times with different seeds (to calculate the average and standard deviation of runtime and results).

In [38]:
# Create the dataframes needed for evaluation:
data_nn_ohe_learn, y_true_learn = create_ffn_ohe_data(bool_in_learn)
data_nn_ohe_test, y_true_test = create_ffn_ohe_data(bool_in_test)

for run_index in range(15):
    # Create the dataframes needed for training:
    data_nn_ohe_learn_train, y_true_learn_train = create_ffn_ohe_data(train_val_split[f"learn_train_{run_index}"])
    data_nn_ohe_learn_val, y_true_learn_val = create_ffn_ohe_data(train_val_split[f"learn_val_{run_index}"])

    print(f"Model: {run_index}")
    # Define the LocalGLMnet (initialised from a dummy GLM):
    # ----------------------
    # note: we use the functional API here instead of model subclassing
    # to make the code more readable and easier to understand
    # (for the transformer-based models we will use model subclasses)

    # create dummy glm for initial weights
    # ----------------------
    poisson_glm_dummy = PoissonRegressor(alpha = 0,max_iter=1000) # scikit-learn.org: alpha = 0 is equivalent to unpenalized GLMs
    poisson_glm_dummy.fit(data_nn_ohe_learn[0],y_true_learn/data_nn_ohe_learn[1],sample_weight=data_nn_ohe_learn[1]) # note: data_nn_ohe_learn = [X_ohe,exposure]

    # Define LocalGLMnet:
    # ----------------------
    def Create_Poisson_LocalGLMnet(input_dim=40,initial_glm_bias=1, initial_glm_betas=None):
        # set random seeds
        set_random_seeds(int(random_seeds[run_index]))
        Input_Matrix_OHE = tf.keras.layers.Input(shape=(input_dim,), dtype='float32', name='Input_Matrix')
        Input_Exposure = tf.keras.layers.Input(shape=(1,), dtype='float32', name='Input_Exposure')
        # Build the network
        hidden1 = tf.keras.layers.Dense(units=20, activation=tanh, name='hidden1')(Input_Matrix_OHE)
        hidden2 = tf.keras.layers.Dense(units=15, activation=tanh, name='hidden2')(hidden1)
        hidden3 = tf.keras.layers.Dense(units=10, activation=tanh, name='hidden3')(hidden2)
        Attention = tf.keras.layers.Dense(units=input_dim, activation='linear', name='attention',
                        weights=[np.zeros((10, input_dim)), initial_glm_betas])(hidden3)
        # note that the kernel is set to 0 and the bias is set to the initial GLM betas
        # element-wise product of the attention weights (Attention) and the input matrix Input_Matrix_OHE
        # (Attention has the same dimension as Input_Matrix_OHE); the non-trainable 'scalar_product' layer
        # with all-ones weights then sums these contributions, which yields the scalar product:
        weighted_input = tf.keras.layers.Multiply(name='feature_contributions')([Attention, Input_Matrix_OHE])
        scalar_product = tf.keras.layers.Dense(units=1, activation='linear', name='scalar_product',
                            weights=[np.ones((input_dim, 1)), np.array([0])],
                            trainable=False)(weighted_input)
        # Note that we actually don't want to make the following weights trainable,
        # but to get the bias to be trainable we need to do so. see comment in Book Wüthrich & Merz (2023) page 500
        Result_LocalGLMnet_without_Exposure = tf.keras.layers.Dense(units=1, activation='exponential', name='Result_LocalGLMnet_without_Exposure',
                        weights=[np.ones((1, 1)), np.array([initial_glm_bias])],
                        trainable=True)(scalar_product)
        Response = tf.keras.layers.Multiply(name='Result')([Result_LocalGLMnet_without_Exposure, Input_Exposure])
        return tf.keras.models.Model(inputs=[Input_Matrix_OHE, Input_Exposure], outputs=[Response], name='Poisson_LocalGLMnet')

    # create the model:
    # ----------------------
    LocalGLMnet = Create_Poisson_LocalGLMnet(input_dim=40,initial_glm_bias=poisson_glm_dummy.intercept_,initial_glm_betas=poisson_glm_dummy.coef_)

    # Compile the model
    # ----------------------
    LocalGLMnet.compile(optimizer='nadam', loss=poisson_loss_for_tf, metrics=[poisson_loss_for_tf])

    # model callbacks:
    # ----------------------
    early_stopping_callback = tf.keras.callbacks.EarlyStopping(patience=15, monitor='val_poisson_loss_for_tf', restore_best_weights=True)

    # model fitting:
    # ----------------------
    start_time = time.time()
    epochs_OHE=500
    LocalGLMnet_history = LocalGLMnet.fit(x=data_nn_ohe_learn_train,
                                          y=y_true_learn_train,
                                          validation_data=[data_nn_ohe_learn_val, y_true_learn_val],
                                          epochs=epochs_OHE,
                                          batch_size=5000,
                                          verbose=0,
                                          callbacks=[early_stopping_callback]
                                          )
    end_time = time.time()
    execution_time_LocalGLMnet = end_time - start_time
    best_epoch_LocalGLMnet = np.argmin(LocalGLMnet_history.history['val_poisson_loss_for_tf'])+1

    # save models:
    # ----------------------
    LocalGLMnet.save_weights(f'{storage_path}/saved_models/Poisson_LocalGLMnet_{run_index}.weights.h5')

    # load the saved model weights:
    # ----------------------
    LocalGLMnet.load_weights(f'{storage_path}/saved_models/Poisson_LocalGLMnet_{run_index}.weights.h5')

    # predict with the model:
    # ----------------------
    y_pred["train"]["LocalGLMnet"] = np.array([x for [x] in LocalGLMnet.predict(data_nn_ohe_learn,
                                                                        batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
    y_pred["test"]["LocalGLMnet"] = np.array([x for [x] in LocalGLMnet.predict(data_nn_ohe_test,
                                                                    batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
    # evaluate the model:
    # ----------------------
    LocalGLMnet_results = Results(model=f"LocalGLMnet (run: {run_index})",
                                epochs=best_epoch_LocalGLMnet,
                                run_time=execution_time_LocalGLMnet,
                                nr_parameters=[np.sum([np.prod(v.get_shape().as_list()) for v in LocalGLMnet.trainable_weights])],
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"]["LocalGLMnet"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"]["LocalGLMnet"]),
                                pred_avg_freq_train=y_pred["train"]["LocalGLMnet"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"]["LocalGLMnet"].sum()/exposure["test"].sum())
    # store the results in the dataframe:
    store_results_in_df(LocalGLMnet_results)
# display(df_results)
# objects stay referenced in the notebook's global namespace and are not freed automatically, so we delete the data that is no longer needed:
del data_nn_ohe_learn, data_nn_ohe_test, data_nn_ohe_learn_train, data_nn_ohe_learn_val

Model: 0
Model: 1
Model: 2
Model: 3
Model: 4
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14
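
The initialisation in `Create_Poisson_LocalGLMnet` sets the kernel of the `attention` layer to zero and its bias to the GLM coefficients, so before any training step the network should reproduce the GLM prediction. The following NumPy sketch illustrates this for the forward pass only. The data, `betas`, `intercept` and the random hidden-layer weights are made-up stand-ins, not values from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n, input_dim = 5, 40

# Hypothetical inputs: feature matrix, exposures and GLM parameters.
X = rng.normal(size=(n, input_dim))
exposure = rng.uniform(0.1, 1.0, size=(n, 1))
betas = rng.normal(scale=0.1, size=input_dim)  # stand-in for poisson_glm_dummy.coef_
intercept = -2.5                               # stand-in for poisson_glm_dummy.intercept_

def dense(x, W, b, activation=None):
    """Plain dense layer: x @ W + b, optionally followed by an activation."""
    z = x @ W + b
    return activation(z) if activation else z

# Hidden layers with arbitrary weights (20-15-10 units, tanh), as in the notebook.
h = X
for units_in, units_out in [(input_dim, 20), (20, 15), (15, 10)]:
    h = dense(h, rng.normal(size=(units_in, units_out)), np.zeros(units_out), np.tanh)

# Attention layer: zero kernel, bias = GLM betas, so beta(x) equals the GLM betas for every x.
attention = dense(h, np.zeros((10, input_dim)), betas)

# Element-wise product followed by a sum (the fixed all-ones dense layer).
scalar_product = (attention * X).sum(axis=1, keepdims=True)

# Output layer: weight 1, bias = GLM intercept, exponential activation, times exposure.
localglmnet_pred = np.exp(scalar_product * 1.0 + intercept) * exposure

# Reference: Poisson GLM with log link and exposure as a multiplicative factor.
glm_pred = np.exp(X @ betas + intercept).reshape(-1, 1) * exposure

starts_at_glm = np.allclose(localglmnet_pred, glm_pred)
print("network starts at the GLM:", starts_at_glm)
```

Because the hidden layers only enter through a zero kernel, their initial weights do not affect the starting prediction; training then lets the attention weights deviate from the GLM betas.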


## 3.7 Compare Benchmark Results:
with those in the LocalGLMnet paper and the Wüthrich & Merz book (2023):

In [43]:
# # save the results:
# with open(f'{storage_path}/Data/df_results.pickle', 'wb') as handle:
#     pickle.dump(df_results, handle, protocol=pickle.HIGHEST_PROTOCOL)

# load the results:
with open(f'{storage_path}/Data/df_results.pickle', 'rb') as handle:
    df_results = pickle.load(handle)

In [40]:
# display(df_results)

In [44]:
print("Results Average:")
display(calc_avg_df(["homogeneous model","GLM1","GLM2","GLM3","FFN_OHE","FNN_CAT_EMB","CANN","LocalGLMnet"]))
print("Results Standard-Deviation:")
display(calc_std_df(["homogeneous model","GLM1","GLM2","GLM3","FFN_OHE","FNN_CAT_EMB","CANN","LocalGLMnet"]))

Results Average:


| | model | epochs | run_time | nr_parameters | loss_train | loss_test | pred_avg_freq_train | pred_avg_freq_test |
|---|-------|--------|----------|---------------|------------|-----------|---------------------|--------------------|
| 0 | homogeneous model | 0.0 | 0.054847 | 1.0 | 0.252132 | 0.254454 | 0.073631 | 0.073631 |
| 1 | GLM1 | 0.0 | 2.220158 | 49.0 | 0.241015 | 0.241463 | 0.073631 | 0.0739 |
| 2 | GLM2 | 0.0 | 2.752693 | 48.0 | 0.240911 | 0.241125 | 0.073631 | 0.073981 |
| 3 | GLM3 | 0.0 | 1.900497 | 50.0 | 0.240844 | 0.241022 | 0.073631 | 0.074048 |
| 4 | FFN_OHE | 42.2 | 37.80556 | 1306.0 | 0.237535 | 0.238652 | 0.073906 | 0.07431 |
| 5 | FNN_CAT_EMB | 72.933333 | 58.728892 | 792.0 | 0.237682 | 0.238267 | 0.073774 | 0.074238 |
| 6 | CANN | 90.333333 | 68.55912 | 792.0 | 0.23742 | 0.238102 | 0.074019 | 0.074438 |
| 7 | LocalGLMnet | 25.333333 | 29.720892 | 1737.0 | 0.237095 | 0.239211 | 0.073825 | 0.074267 |


Results Standard-Deviation:


| | model | epochs | run_time | nr_parameters | loss_train | loss_test | pred_avg_freq_train | pred_avg_freq_test |
|---|-------|--------|----------|---------------|------------|-----------|---------------------|--------------------|
| 0 | homogeneous model | 0.0 | 0.002512 | 0.0 | 0.0 | 5.74595e-17 | 2.872975e-17 | 2.872975e-17 |
| 1 | GLM1 | 0.0 | 0.473819 | 0.0 | 5.74595e-17 | 5.74595e-17 | 1.436488e-17 | 0.0 |
| 2 | GLM2 | 0.0 | 0.970156 | 0.0 | 5.74595e-17 | 2.872975e-17 | 0.0 | 1.436488e-17 |
| 3 | GLM3 | 0.0 | 0.408252 | 0.0 | 2.872975e-17 | 2.872975e-17 | 0.0 | 1.436488e-17 |
| 4 | FFN_OHE | 14.663853 | 8.624492 | 0.0 | 0.0003255191 | 0.0001570462 | 0.001223993 | 0.001209107 |
| 5 | FNN_CAT_EMB | 21.661245 | 13.907265 | 0.0 | 0.0001590947 | 0.0001514444 | 0.001071399 | 0.001088943 |
| 6 | CANN | 53.898935 | 33.162059 | 0.0 | 0.0006076588 | 0.0003253586 | 0.001111365 | 0.001103431 |
| 7 | LocalGLMnet | 7.622023 | 5.097405 | 0.0 | 0.000334063 | 0.0002176521 | 0.0008787938 | 0.0009078453 |
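
The averaging over runs can be sketched as follows. This is a hypothetical re-implementation, not the notebook's `calc_avg_df` or `calc_std_df`. It uses the `loss_test` values of LocalGLMnet runs 11 to 14 as they appear in the `df_results` printout later in this notebook, and it assumes the sample standard deviation (whether the notebook's helper does the same is not shown here).

```python
import re
from statistics import mean, stdev

# (model label, loss_test) for four of the 15 LocalGLMnet runs, copied from df_results.
rows = [
    ("LocalGLMnet (run: 11)", 0.238934),
    ("LocalGLMnet (run: 12)", 0.239430),
    ("LocalGLMnet (run: 13)", 0.239288),
    ("LocalGLMnet (run: 14)", 0.239005),
]

def base_name(label):
    """Strip the ' (run: i)' suffix so that all runs of one model share a key."""
    return re.sub(r"\s*\(run: \d+\)$", "", label)

# Collect the losses per model name.
grouped = {}
for label, loss in rows:
    grouped.setdefault(base_name(label), []).append(loss)

# Average and sample standard deviation per model.
summary = {name: (mean(values), stdev(values)) for name, values in grouped.items()}
for name, (avg, std) in summary.items():
    print(f"{name}: avg={avg:.6f}, std={std:.6f}")
```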


Note that for the homogeneous (mean) model and the GLMs we obtain the same results as reported in the book by Wüthrich and Merz (2023), where the losses are given in units of 10^-2:

| Model | In-sample | Out-of-sample |
|-------|-----------|---------------|
| Poisson null | 25.213 | 25.445 |
| Poisson GLM1 | 24.101 | 24.146 |
| Poisson GLM2 | 24.091 | 24.113 |
| Poisson GLM3 | 24.084 | 24.102 |
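
The table values appear to be in units of 10^-2, so our averaged losses have to be multiplied by 100 before comparing. The small check below copies our averages from the results table and the four rows of the table above, in order. The labels follow the models in our results table that each row is compared with, and the tolerance of 10^-3 is our own choice to absorb rounding.

```python
# (in-sample, out-of-sample) average deviance losses from our results table
ours = {
    "Poisson null": (0.252132, 0.254454),
    "Poisson GLM1": (0.241015, 0.241463),
    "Poisson GLM2": (0.240911, 0.241125),
    "Poisson GLM3": (0.240844, 0.241022),
}
# rows of the table above, in order (read as units of 10^-2)
book = {
    "Poisson null": (25.213, 25.445),
    "Poisson GLM1": (24.101, 24.146),
    "Poisson GLM2": (24.091, 24.113),
    "Poisson GLM3": (24.084, 24.102),
}

TOLERANCE = 1e-3  # assumed tolerance for rounding differences

# Largest absolute gap between our scaled losses and the book values.
max_abs_diff = max(
    abs(100 * our_value - book_value)
    for name in ours
    for our_value, book_value in zip(ours[name], book[name])
)
all_match = max_abs_diff < TOLERANCE
print(f"largest absolute difference: {max_abs_diff:.4f}")
print("all rows match within tolerance:", all_match)
```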

For the other models we obtain results very similar to those in the LocalGLMnet paper (2023).
Our results can be compared with rows (a), (b), (d), (e) and (f) of the following table, which is taken from the paper. We changed the validation share for the LocalGLMnet from 20% (in the paper) to 10%, but the result is still very similar to row (d) of the 2023 LocalGLMnet paper (see our standard-deviation analysis).

| Model | In-sample | Out-of-sample |
|-------|-----------|---------------|
| (a) null model | 25.213 | 25.445 |
| (b) FFN network | 23.764 | 23.873 |
| (c) LocalGLMnet | 23.728 | 23.945 |
| (d) reduced LocalGLMnet | 23.714 | 23.912 |
| (e) Poisson GLM3 | 24.084 | 24.102 |
| (f) Categorical Embedding network | 23.690 | 23.824 |
| (g) Nagging network | 23.691 | 23.783 |

# 4. Transformer Models

## 4.1 FT-Transformer-Model:

The FT-Transformer model was introduced by Gorishniy et al. (2021).

The code ideas are therefore based on the code from [gorishniy2021revisiting].

Please see the paper for more details: https://arxiv.org/abs/2106.11959

Their original code, written for PyTorch, can be found here: https://github.com/Yura52/rtdl

References:
        * [gorishniy2021revisiting]  Gorishniy, Rubachev, Khrulkov, Babenko "Revisiting Deep Learning Models for Tabular Data" 2021
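
The core idea of the feature tokenizer in Gorishniy et al. (2021) is that every feature, numerical or categorical, is mapped to its own `emb_dim`-dimensional token, and a learnable [CLS] token is added whose final representation feeds the prediction head. Below is a minimal NumPy sketch of this step with random stand-in parameters and made-up feature counts and vocabulary sizes. It is not the `EnhActuar.Feature_Tokenizer_Transformer` implementation from the helper folder, and it omits the bias terms of the categorical tokens for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim = 32          # as in the notebook
n_num = 7             # hypothetical number of numerical features
cat_vocab = [11, 22]  # hypothetical vocabulary sizes of two categorical features
batch = 4

# Stand-in parameters (randomly initialised here, trained in practice).
W_num = rng.normal(size=(n_num, emb_dim))  # one weight vector per numerical feature
b_num = rng.normal(size=(n_num, emb_dim))  # one bias vector per numerical feature
emb_tables = [rng.normal(size=(v, emb_dim)) for v in cat_vocab]
cls_token = rng.normal(size=(1, 1, emb_dim))

# Hypothetical input batch.
x_num = rng.normal(size=(batch, n_num))
x_cat = np.stack([rng.integers(0, v, size=batch) for v in cat_vocab], axis=1)

# Numerical token j: x_j * W_j + b_j, shape (batch, n_num, emb_dim)
num_tokens = x_num[:, :, None] * W_num[None, :, :] + b_num[None, :, :]

# Categorical token j: row lookup in the j-th embedding table, shape (batch, n_cat, emb_dim)
cat_tokens = np.stack(
    [emb_tables[j][x_cat[:, j]] for j in range(len(cat_vocab))], axis=1
)

# Token sequence including the [CLS] token (placed first here; the position is a choice of this sketch).
tokens = np.concatenate(
    [np.repeat(cls_token, batch, axis=0), num_tokens, cat_tokens], axis=1
)
print(tokens.shape)
```

The resulting tensor has one token per feature plus the [CLS] token, which is the sequence the transformer blocks then operate on.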


NOTE: I rewrote the original code substantially to suit our purpose; my object-oriented TensorFlow implementation can be found in the helper folder provided with this notebook.

NOTE: Below we use a custom training loop instead of the .fit function (to have a bit of extra freedom).
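
With a custom loop, early stopping has to be handled by hand. The training loop further down relies on `helper.Easy_ProgressTracker` for this; its implementation lives in the helper folder and is not shown here. The following class is only a hypothetical stand-in that mimics the attributes the loop uses (`progress`, `best_epoch`, `current_epoch`, `patience_over`), assuming that a lower validation score is better.

```python
class ProgressTrackerSketch:
    """Hypothetical stand-in for a patience-based early-stopping tracker."""

    def __init__(self, patience=15):
        self.patience = patience
        self.best_score = float("inf")
        self.best_epoch = 0
        self.current_epoch = 0
        self.progress = False       # True if the last epoch improved the best score
        self.patience_over = False  # True once `patience` epochs passed without improvement

    def __call__(self, current_epoch, current_score):
        self.current_epoch = current_epoch
        self.progress = current_score < self.best_score
        if self.progress:
            self.best_score = current_score
            self.best_epoch = current_epoch
        self.patience_over = (current_epoch - self.best_epoch) >= self.patience


# Usage with made-up validation losses and a short patience:
tracker = ProgressTrackerSketch(patience=2)
for epoch, val_loss in enumerate([0.2746, 0.2576, 0.2580, 0.2579, 0.2570]):
    tracker(current_epoch=epoch, current_score=val_loss)
    if tracker.patience_over:
        break
print(tracker.best_epoch, tracker.current_epoch)
```

In this toy run the loop stops before the last, better epoch is reached, which illustrates why the patience should not be chosen too small.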

NOTE: We run the code 15 times with different seeds (to calculate the average and standard deviation of runtime and results).

NOTE: It is very important that the learn/test split stays the same for all models, otherwise the results are not comparable!
But we can change up the learn-train/learn-val split for each model to get a better estimate of the generalization error.
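
A sketch of this setup with boolean masks: the learn/test mask is drawn once, while the learn-train/learn-val masks are redrawn per run with a run-specific seed. The sample size, the 90/10 learn/test ratio and the seeds are made up for illustration. The 10% validation share follows the note in section 3.7, and the dictionary keys mirror the `train_val_split` keys used in the code.

```python
import numpy as np

n = 1000                                 # made-up number of policies
n_runs = 15
random_seeds = range(100, 100 + n_runs)  # made-up seeds

# Fixed learn/test split, drawn once and shared by all models (ratio is an assumption).
rng_fixed = np.random.default_rng(42)
bool_in_learn = rng_fixed.random(n) < 0.9
bool_in_test = ~bool_in_learn

# Run-specific learn-train / learn-val split inside the learn set.
train_val_split = {}
learn_idx = np.flatnonzero(bool_in_learn)
for run_index, seed in enumerate(random_seeds):
    rng = np.random.default_rng(seed)
    val_idx = rng.choice(learn_idx, size=int(0.1 * learn_idx.size), replace=False)
    in_val = np.zeros(n, dtype=bool)
    in_val[val_idx] = True
    train_val_split[f"learn_val_{run_index}"] = in_val
    train_val_split[f"learn_train_{run_index}"] = bool_in_learn & ~in_val

# The test set must never overlap with any training or validation mask.
overlap = any((mask & bool_in_test).any() for mask in train_val_split.values())
print("test rows leaked into training/validation:", overlap)
```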

NOTE: As described in the data preparation section, we do not create all train and validation datasets at once but only when fitting the respective model, so that we do not fill up the RAM. Objects created in a notebook stay referenced in its global namespace and are therefore not freed automatically, so we try to be careful.

In [None]:
# Create the dataframes needed for evaluation:
# --------------------
# NOTE: in the paper by Gorishniy et al. (2021) the batch size differs between the datasets
# but is not tuned as a hyperparameter. For bigger datasets they used a batch size of 1024 and
# for smaller datasets a batch size of 256 or 512.
batch_size = 1024
learn_data = df_to_tensor(df_freq_prep_nn[bool_in_learn], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)
test_data = df_to_tensor(df_freq_prep_nn[bool_in_test], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)

# NOTE we use at first just a fraction of the data to test the code:
learn_train_dummy_data = df_to_tensor(df_freq_prep_nn[bool_in_learn_train_dummy], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size,
                                      dummy_data_for_build=True)

for run_index in range(15):
    # Create the dataframes needed for training:
    learn_train_data = df_to_tensor(df_freq_prep_nn[train_val_split[f"learn_train_{run_index}"]],
                                    feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)
    learn_val_data = df_to_tensor(df_freq_prep_nn[train_val_split[f"learn_val_{run_index}"]],
                                  feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)

    print(f"-------------------------------------------------")
    print(f"-------------------------------------------------")
    print(f"-------------------------------------------------")
    print(f"-------We are at Model: {str(run_index).zfill(2)}-----------------------")
    print(f"-------------------------------------------------")
    print(f"-------------------------------------------------")
    print(f"-------------------------------------------------")
    # Define FT-Transformer Models:
    # ----------------------
    # NOTE: we use here tensorflow/keras model subclasses (not the functional or sequential api)
    # NOTE: we use a custom training loop here instead of the .fit function

    # create the model:
    # ----------------------
    set_random_seeds(int(random_seeds[run_index]))

    FT_transformer = EnhActuar.Feature_Tokenizer_Transformer(
            emb_dim = 32, # NOTE: the default setting in Gorishniy et al. (2021) is emb_dim = 192 (but the number of parameters would go through the roof here, so we use something smaller)
            nr_features = nr_col,
            cat_features = cat_col,
            cat_vocabulary = cat_vocabulary,
            count_transformer_blocks = 3,
            attention_n_heads = 8,
            attention_dropout = 0.2,
            ffn_d_hidden = None, # NOTE: change to None if ReGLU should be used -> None uses default value (4/3*emb_dim), they write that they used 2*emb_dim if not ReGLU.
            ffn_activation_ReGLU = True, # NOTE: set True if ReGLU should be used
            ffn_dropout = 0.1,
            prenormalization = True,
            output_dim = 1,
            last_activation = 'exponential',
            exposure_name = "Exposure",
            seed_nr = int(random_seeds[run_index])
    )

    # See here regarding custom training loops: https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch

    # Instantiate an optimizer to train the model.
    # ----------------------
    # create an optimizer AdamW with learning rate 1e-4, weight decay 1e-5:
    optimizer = tf.keras.optimizers.AdamW(learning_rate=1e-4, weight_decay=1e-5)

    # Instantiate a loss function
    # ----------------------
    # we use our own loss function here
    # because it is not included in tensorflow in the same way (see section loss function for more details):
    loss_fn = Poisson_loss_for_tf_Wrapped()

    # Prepare the metrics.
    # ----------------------
    # we use a custom metric here (because it is not included in tensorflow in the same way):
    train_acc_metric = Poisson_Metric_for_tf()
    val_acc_metric = Poisson_Metric_for_tf()
    test_acc_metric = Poisson_Metric_for_tf()

    @tf.function
    def train_step(x, y):
        # Open a GradientTape to record the operations run during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:
            # Run the forward pass of the layer. The operations that the layer applies to its inputs are going to be recorded on the GradientTape.
            y_pred = FT_transformer(x, training=True)["output"]  # prediction for this minibatch
            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y, y_pred)
        # Use the gradient tape to automatically retrieve the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, FT_transformer.trainable_weights)
        # Run one step of gradient descent by updating the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, FT_transformer.trainable_weights))
        # Update training metric.
        train_acc_metric.update_state(y, y_pred)
        return loss_value

    @tf.function
    def val_step(x, y):
        # Run the forward pass of the layer.
        # (note: training=False is needed because the layers have different behavior during training versus inference (e.g. Dropout))
        y_pred = FT_transformer(x, training=False)["output"]
        # Update val metrics
        val_acc_metric.update_state(y, y_pred)

    @tf.function
    def test_step(x, y):
        # Run the forward pass of the layer.
        # (note: training=False is needed because the layers have different behavior during training versus inference (e.g. Dropout))
        y_pred = FT_transformer(x, training=False)["output"]
        # Update test metrics
        test_acc_metric.update_state(y, y_pred)

    # model fitting:
    # ----------------------
    start_time = time.time()
    Val_Progress = helper.Easy_ProgressTracker(patience=15)
    epochs = 500

    for epoch in range(epochs):
        # Iterate over the batches of the dataset.
        for step, (x_batch_train, y_batch_train) in enumerate(learn_train_data):
            loss_value = train_step(x_batch_train, y_batch_train)
            helper.costume_progress_bar(f"Ensemble: {str(run_index).zfill(2)}/{14} / Epoch: {epoch} / Batch: {step} / Train-Loss (Batch): {round(float(loss_value),4)}",step,len(learn_train_data), 30)

        # Display metrics at the end of each epoch.
        print_train_loss = train_acc_metric.result()
        # Reset training metrics at the end of each epoch
        train_acc_metric.reset_states()

        # Run a validation at the end of each epoch.
        for x_batch_val, y_batch_val in learn_val_data:
            val_step(x_batch_val, y_batch_val)
        print_val_loss = val_acc_metric.result()
        val_acc_metric.reset_states()
        for x_batch_test, y_batch_test in test_data:
            test_step(x_batch_test, y_batch_test)
        print_test_loss = test_acc_metric.result()
        test_acc_metric.reset_states()

        Val_Progress(current_epoch=epoch, current_score = print_val_loss)

        print(f"\nEnsemble: {str(run_index).zfill(2)}/{14} / Epoch: {epoch} / Train-Loss: %.4f / Val-Loss: %.4f / Test-Loss: %.4f / Time taken: %s / ---- Currently Best Val-Epoch: %d" % (
            # str(run_index).zfill(2),
            float(print_train_loss),
            float(print_val_loss),
            float(print_test_loss),
            datetime.timedelta(seconds=int(time.time() - start_time)),
            Val_Progress.best_epoch
            ), end = " ")
        if Val_Progress.progress == True:
            print("<------- Best VAL Epoch so far")
        else:
            print("\r")


        # Callback: save best model / early stopping:
        # ----------------------
        earliest_epoch2save = 10
        if Val_Progress.progress and Val_Progress.current_epoch >= earliest_epoch2save:
            # FT_transformer.save(storage_path +'/Poisson_FT_transformer')
            FT_transformer.save_weights(f'{storage_path}/saved_models/Poisson_FT_transformer_{run_index}.weights.h5')
        if Val_Progress.patience_over:
            break

    # create some metrics after the loop
    best_epoch_FT_transformer = Val_Progress.best_epoch
    execution_time_FT_transformer = time.time() - start_time

    # load the weights of the best saved model:
    # ----------------------
    # FT_transformer = keras.models.load_model(save_path +'/Poisson_FT_transformer')
    FT_transformer.load_weights(f'{storage_path}/saved_models/Poisson_FT_transformer_{run_index}.weights.h5')

    # predict with the model:
    # ----------------------
    y_pred["train"][f"FT_transformer"] = np.array([x for [x] in FT_transformer.predict(learn_data,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                )["output"]])
    y_pred["test"][f"FT_transformer"] = np.array([x for [x] in FT_transformer.predict(test_data,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                )["output"]])

    # evaluate the model:
    # ----------------------
    FT_transformer_results = Results(model=f"FT_transformer (run: {run_index})",
                                epochs=best_epoch_FT_transformer,
                                run_time=execution_time_FT_transformer,
                                nr_parameters=[np.sum([np.prod(v.get_shape().as_list()) for v in FT_transformer.trainable_weights])],
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"][f"FT_transformer"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"][f"FT_transformer"]),
                                pred_avg_freq_train=y_pred["train"][f"FT_transformer"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"][f"FT_transformer"].sum()/exposure["test"].sum())
    # store the results in the dataframe:
    store_results_in_df(FT_transformer_results)
    display(df_results)



-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 00-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 00/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0326  : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 0 / Train-Loss: 0.3631 / Val-Loss: 0.2746 / Test-Loss: 0.2746 / Time taken: 0:00:43 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0304  : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 1 / Train-Loss: 0.2621 / Val-Loss: 0.2576 / Test-Loss: 0.2570 / Time taken: 0:00:57 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.03    : [#####


The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Ensemble: 00/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0328 : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 11 / Train-Loss: 0.2390 / Val-Loss: 0.2483 / Test-Loss: 0.2447 / Time taken: 0:03:43 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 12 / Train-Loss: 0.2388 / Val-Loss: 0.2482 / Test-Loss: 0.2446 / Time taken: 0:03:56 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0326 : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 13 / Train-Loss: 0.2387 / Val-Loss: 0.2478 / Test-Loss: 0.2441 / Time taken: 0:04:13 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.033  : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
116,LocalGLMnet (run: 11),38,29.706788,1737,0.236772,0.238934,0.073772,0.074275
117,LocalGLMnet (run: 12),11,15.253076,1737,0.237896,0.239430,0.073266,0.073614
118,LocalGLMnet (run: 13),19,20.083682,1737,0.237316,0.239288,0.073202,0.073597
119,LocalGLMnet (run: 14),28,24.299453,1737,0.236903,0.239005,0.075655,0.076141


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 01-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 01/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0084  : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 0 / Train-Loss: 0.3070 / Val-Loss: 0.2569 / Test-Loss: 0.2548 / Time taken: 0:00:31 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 01/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.007   : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 1 / Train-Loss: 0.2504 / Val-Loss: 0.2551 / Test-Loss: 0.2524 / Time taken: 0:00:46 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 01/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0065  : [#####


The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Ensemble: 01/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0063 : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 13 / Train-Loss: 0.2384 / Val-Loss: 0.2489 / Test-Loss: 0.2440 / Time taken: 0:03:51 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 01/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.006  : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 14 / Train-Loss: 0.2385 / Val-Loss: 0.2492 / Test-Loss: 0.2441 / Time taken: 0:04:13 / ---- Currently Best Val-Epoch: 13 
Ensemble: 01/14 / Epoch: 15 / Batch: 536 / Train-Loss (Batch): 0.0062 : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 15 / Train-Loss: 0.2381 / Val-Loss: 0.2487 / Test-Loss: 0.2437 / Time taken: 0:04:35 / ---- Currently Best Val-Epoch: 15 <------- Best VAL Epoch so far
Ensemble: 01/14 / Epoch: 16 / Batch: 536 / Train-Loss (Batch): 0.0061 : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 16 / Train-Loss: 0.2381 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
117,LocalGLMnet (run: 12),11,15.253076,1737,0.237896,0.239430,0.073266,0.073614
118,LocalGLMnet (run: 13),19,20.083682,1737,0.237316,0.239288,0.073202,0.073597
119,LocalGLMnet (run: 14),28,24.299453,1737,0.236903,0.239005,0.075655,0.076141
120,FT_transformer (run: 0),82,1483.549221,27133,0.238317,0.239865,0.060856,0.061208


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 02-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 02/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0299  : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 0 / Train-Loss: 0.2769 / Val-Loss: 0.2522 / Test-Loss: 0.2546 / Time taken: 0:00:29 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 02/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0307  : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 1 / Train-Loss: 0.2496 / Val-Loss: 0.2493 / Test-Loss: 0.2516 / Time taken: 0:00:42 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 02/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.031   : [#####


The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Ensemble: 02/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0323 : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 11 / Train-Loss: 0.2393 / Val-Loss: 0.2431 / Test-Loss: 0.2444 / Time taken: 0:03:31 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 02/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0328 : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 12 / Train-Loss: 0.2392 / Val-Loss: 0.2429 / Test-Loss: 0.2441 / Time taken: 0:03:45 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 02/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 13 / Train-Loss: 0.2389 / Val-Loss: 0.2425 / Test-Loss: 0.2438 / Time taken: 0:03:59 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 02/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0321 : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
118,LocalGLMnet (run: 13),19,20.083682,1737,0.237316,0.239288,0.073202,0.073597
119,LocalGLMnet (run: 14),28,24.299453,1737,0.236903,0.239005,0.075655,0.076141
120,FT_transformer (run: 0),82,1483.549221,27133,0.238317,0.239865,0.060856,0.061208
121,FT_transformer (run: 1),89,1614.216463,27133,0.237163,0.238795,0.061112,0.061284


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 03-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 03/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0299  : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 0 / Train-Loss: 0.2761 / Val-Loss: 0.2597 / Test-Loss: 0.2550 / Time taken: 0:00:44 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 03/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0304  : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 1 / Train-Loss: 0.2509 / Val-Loss: 0.2575 / Test-Loss: 0.2524 / Time taken: 0:00:57 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 03/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0309  : [#####



Ensemble: 03/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0334 : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 11 / Train-Loss: 0.2388 / Val-Loss: 0.2503 / Test-Loss: 0.2439 / Time taken: 0:03:23 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 03/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0335 : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 12 / Train-Loss: 0.2385 / Val-Loss: 0.2501 / Test-Loss: 0.2437 / Time taken: 0:03:38 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 03/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0332 : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 13 / Train-Loss: 0.2384 / Val-Loss: 0.2497 / Test-Loss: 0.2433 / Time taken: 0:04:00 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 03/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
119,LocalGLMnet (run: 14),28,24.299453,1737,0.236903,0.239005,0.075655,0.076141
120,FT_transformer (run: 0),82,1483.549221,27133,0.238317,0.239865,0.060856,0.061208
121,FT_transformer (run: 1),89,1614.216463,27133,0.237163,0.238795,0.061112,0.061284
122,FT_transformer (run: 2),76,1453.342622,27133,0.238176,0.239629,0.060320,0.060438


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 04-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 04/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0311  : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 0 / Train-Loss: 0.3085 / Val-Loss: 0.2607 / Test-Loss: 0.2655 / Time taken: 0:00:33 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.03    : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 1 / Train-Loss: 0.2575 / Val-Loss: 0.2500 / Test-Loss: 0.2553 / Time taken: 0:00:56 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0303  : [#####



Ensemble: 04/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0334 : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 12 / Train-Loss: 0.2394 / Val-Loss: 0.2389 / Test-Loss: 0.2430 / Time taken: 0:03:52 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 13 / Train-Loss: 0.2393 / Val-Loss: 0.2385 / Test-Loss: 0.2426 / Time taken: 0:04:14 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0332 : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 14 / Train-Loss: 0.2392 / Val-Loss: 0.2385 / Test-Loss: 0.2425 / Time taken: 0:04:29 / ---- Currently Best Val-Epoch: 14 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 15 / Batch: 536 / Train-Loss (Batch): 0.0338 : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
120,FT_transformer (run: 0),82,1483.549221,27133,0.238317,0.239865,0.060856,0.061208
121,FT_transformer (run: 1),89,1614.216463,27133,0.237163,0.238795,0.061112,0.061284
122,FT_transformer (run: 2),76,1453.342622,27133,0.238176,0.239629,0.060320,0.060438
123,FT_transformer (run: 3),44,1021.084522,27133,0.238937,0.239780,0.059604,0.059712


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 05-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 05/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0304  : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 0 / Train-Loss: 0.2981 / Val-Loss: 0.2504 / Test-Loss: 0.2561 / Time taken: 0:00:33 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 05/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0305  : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 1 / Train-Loss: 0.2520 / Val-Loss: 0.2464 / Test-Loss: 0.2516 / Time taken: 0:00:48 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 05/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0305  : [#####



Ensemble: 05/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 11 / Train-Loss: 0.2391 / Val-Loss: 0.2415 / Test-Loss: 0.2444 / Time taken: 0:03:15 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 05/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 12 / Train-Loss: 0.2391 / Val-Loss: 0.2413 / Test-Loss: 0.2440 / Time taken: 0:03:29 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 05/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0338 : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 13 / Train-Loss: 0.2389 / Val-Loss: 0.2411 / Test-Loss: 0.2439 / Time taken: 0:03:47 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 05/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
121,FT_transformer (run: 1),89,1614.216463,27133,0.237163,0.238795,0.061112,0.061284
122,FT_transformer (run: 2),76,1453.342622,27133,0.238176,0.239629,0.060320,0.060438
123,FT_transformer (run: 3),44,1021.084522,27133,0.238937,0.239780,0.059604,0.059712
124,FT_transformer (run: 4),90,2107.885792,27133,0.236846,0.238730,0.063062,0.063192


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 06-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 06/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0306  : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 0 / Train-Loss: 0.2971 / Val-Loss: 0.2553 / Test-Loss: 0.2581 / Time taken: 0:00:27 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 06/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0303  : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 1 / Train-Loss: 0.2526 / Val-Loss: 0.2496 / Test-Loss: 0.2519 / Time taken: 0:00:49 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 06/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0307  : [#####



Ensemble: 06/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 11 / Train-Loss: 0.2391 / Val-Loss: 0.2438 / Test-Loss: 0.2444 / Time taken: 0:03:27 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 06/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 12 / Train-Loss: 0.2390 / Val-Loss: 0.2437 / Test-Loss: 0.2442 / Time taken: 0:03:41 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 06/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0332 : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 13 / Train-Loss: 0.2390 / Val-Loss: 0.2430 / Test-Loss: 0.2434 / Time taken: 0:03:55 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 06/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0325 : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
122,FT_transformer (run: 2),76,1453.342622,27133,0.238176,0.239629,0.060320,0.060438
123,FT_transformer (run: 3),44,1021.084522,27133,0.238937,0.239780,0.059604,0.059712
124,FT_transformer (run: 4),90,2107.885792,27133,0.236846,0.238730,0.063062,0.063192
125,FT_transformer (run: 5),71,1416.576316,27133,0.237981,0.239434,0.060926,0.061031


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 07-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 07/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.03    : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 0 / Train-Loss: 0.2648 / Val-Loss: 0.2553 / Test-Loss: 0.2536 / Time taken: 0:00:30 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 07/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0307  : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 1 / Train-Loss: 0.2458 / Val-Loss: 0.2519 / Test-Loss: 0.2489 / Time taken: 0:00:44 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 07/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0316  : [#####



Ensemble: 07/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0332 : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 11 / Train-Loss: 0.2387 / Val-Loss: 0.2476 / Test-Loss: 0.2431 / Time taken: 0:03:13 / ---- Currently Best Val-Epoch: 10 
Ensemble: 07/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0333 : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 12 / Train-Loss: 0.2386 / Val-Loss: 0.2471 / Test-Loss: 0.2427 / Time taken: 0:03:28 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 07/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0336 : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 13 / Train-Loss: 0.2386 / Val-Loss: 0.2468 / Test-Loss: 0.2423 / Time taken: 0:03:44 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 07/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0334 : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 14 / Train-Loss: 0.2383 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
123,FT_transformer (run: 3),44,1021.084522,27133,0.238937,0.239780,0.059604,0.059712
124,FT_transformer (run: 4),90,2107.885792,27133,0.236846,0.238730,0.063062,0.063192
125,FT_transformer (run: 5),71,1416.576316,27133,0.237981,0.239434,0.060926,0.061031
126,FT_transformer (run: 6),109,1999.786216,27133,0.236723,0.238745,0.061890,0.061992


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 08-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 08/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0325  : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 0 / Train-Loss: 0.3294 / Val-Loss: 0.2767 / Test-Loss: 0.2716 / Time taken: 0:00:29 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0305  : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 1 / Train-Loss: 0.2601 / Val-Loss: 0.2634 / Test-Loss: 0.2568 / Time taken: 0:00:51 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0301  : [#####



Ensemble: 08/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0335 : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 14 / Train-Loss: 0.2383 / Val-Loss: 0.2530 / Test-Loss: 0.2439 / Time taken: 0:04:17 / ---- Currently Best Val-Epoch: 14 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 15 / Batch: 536 / Train-Loss (Batch): 0.033  : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 15 / Train-Loss: 0.2382 / Val-Loss: 0.2527 / Test-Loss: 0.2436 / Time taken: 0:04:32 / ---- Currently Best Val-Epoch: 15 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 16 / Batch: 536 / Train-Loss (Batch): 0.0334 : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 16 / Train-Loss: 0.2381 / Val-Loss: 0.2528 / Test-Loss: 0.2437 / Time taken: 0:04:48 / ---- Currently Best Val-Epoch: 15 
Ensemble: 08/14 / Epoch: 17 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 17 / Train-Loss: 0.2381 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
124,FT_transformer (run: 4),90,2107.885792,27133,0.236846,0.238730,0.063062,0.063192
125,FT_transformer (run: 5),71,1416.576316,27133,0.237981,0.239434,0.060926,0.061031
126,FT_transformer (run: 6),109,1999.786216,27133,0.236723,0.238745,0.061890,0.061992
127,FT_transformer (run: 7),83,1588.023347,27133,0.236495,0.238729,0.064304,0.064473


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 09-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 09/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0102  : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 0 / Train-Loss: 0.3012 / Val-Loss: 0.2620 / Test-Loss: 0.2600 / Time taken: 0:00:43 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 09/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0076  : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 1 / Train-Loss: 0.2537 / Val-Loss: 0.2565 / Test-Loss: 0.2542 / Time taken: 0:00:57 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 09/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0065  : [#####



Ensemble: 09/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0061 : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 11 / Train-Loss: 0.2391 / Val-Loss: 0.2497 / Test-Loss: 0.2453 / Time taken: 0:03:22 / ---- Currently Best Val-Epoch: 10 
Ensemble: 09/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0061 : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 12 / Train-Loss: 0.2390 / Val-Loss: 0.2500 / Test-Loss: 0.2455 / Time taken: 0:03:37 / ---- Currently Best Val-Epoch: 10 
Ensemble: 09/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.006  : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 13 / Train-Loss: 0.2387 / Val-Loss: 0.2499 / Test-Loss: 0.2452 / Time taken: 0:03:51 / ---- Currently Best Val-Epoch: 10 
Ensemble: 09/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0061 : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 14 / Train-Loss: 0.2387 / Val-Loss: 0.2491 / Test-Loss: 0.2444 / Time taken: 0:04:06 / ---

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
125,FT_transformer (run: 5),71,1416.576316,27133,0.237981,0.239434,0.060926,0.061031
126,FT_transformer (run: 6),109,1999.786216,27133,0.236723,0.238745,0.061890,0.061992
127,FT_transformer (run: 7),83,1588.023347,27133,0.236495,0.238729,0.064304,0.064473
128,FT_transformer (run: 8),71,1392.158405,27133,0.237939,0.239110,0.060636,0.060748


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 10-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 10/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0297  : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 0 / Train-Loss: 0.3201 / Val-Loss: 0.2548 / Test-Loss: 0.2559 / Time taken: 0:00:28 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 10/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0301  : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 1 / Train-Loss: 0.2521 / Val-Loss: 0.2519 / Test-Loss: 0.2527 / Time taken: 0:00:50 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 10/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0308  : [#####



Ensemble: 10/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0325 : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 11 / Train-Loss: 0.2391 / Val-Loss: 0.2448 / Test-Loss: 0.2443 / Time taken: 0:03:28 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 10/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 12 / Train-Loss: 0.2391 / Val-Loss: 0.2444 / Test-Loss: 0.2439 / Time taken: 0:03:43 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 10/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 13 / Train-Loss: 0.2390 / Val-Loss: 0.2441 / Test-Loss: 0.2435 / Time taken: 0:03:58 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 10/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0325 : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
126,FT_transformer (run: 6),109,1999.786216,27133,0.236723,0.238745,0.061890,0.061992
127,FT_transformer (run: 7),83,1588.023347,27133,0.236495,0.238729,0.064304,0.064473
128,FT_transformer (run: 8),71,1392.158405,27133,0.237939,0.239110,0.060636,0.060748
129,FT_transformer (run: 9),70,1372.753220,27133,0.239179,0.240097,0.058446,0.058635


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 11-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 11/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0304  : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 0 / Train-Loss: 0.2791 / Val-Loss: 0.2444 / Test-Loss: 0.2541 / Time taken: 0:00:28 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0307  : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 1 / Train-Loss: 0.2491 / Val-Loss: 0.2402 / Test-Loss: 0.2497 / Time taken: 0:00:43 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0313  : [#####



Ensemble: 11/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0329 : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 11 / Train-Loss: 0.2398 / Val-Loss: 0.2338 / Test-Loss: 0.2426 / Time taken: 0:03:09 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 12 / Train-Loss: 0.2398 / Val-Loss: 0.2342 / Test-Loss: 0.2432 / Time taken: 0:03:24 / ---- Currently Best Val-Epoch: 11 
Ensemble: 11/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0337 : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 13 / Train-Loss: 0.2399 / Val-Loss: 0.2328 / Test-Loss: 0.2417 / Time taken: 0:03:38 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0339 : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 14 / Train-Loss: 0.2399 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
127,FT_transformer (run: 7),83,1588.023347,27133,0.236495,0.238729,0.064304,0.064473
128,FT_transformer (run: 8),71,1392.158405,27133,0.237939,0.239110,0.060636,0.060748
129,FT_transformer (run: 9),70,1372.753220,27133,0.239179,0.240097,0.058446,0.058635
130,FT_transformer (run: 10),55,1187.509606,27133,0.239155,0.239897,0.059857,0.060138


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 12-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 12/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0305  : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 0 / Train-Loss: 0.2906 / Val-Loss: 0.2516 / Test-Loss: 0.2541 / Time taken: 0:00:45 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 12/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0306  : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 1 / Train-Loss: 0.2482 / Val-Loss: 0.2485 / Test-Loss: 0.2505 / Time taken: 0:01:07 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 12/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0308  : [#####



Ensemble: 12/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0329 : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 13 / Train-Loss: 0.2388 / Val-Loss: 0.2441 / Test-Loss: 0.2434 / Time taken: 0:04:28 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 12/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0335 : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 14 / Train-Loss: 0.2386 / Val-Loss: 0.2440 / Test-Loss: 0.2433 / Time taken: 0:04:43 / ---- Currently Best Val-Epoch: 14 <------- Best VAL Epoch so far
Ensemble: 12/14 / Epoch: 15 / Batch: 536 / Train-Loss (Batch): 0.033  : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 15 / Train-Loss: 0.2384 / Val-Loss: 0.2438 / Test-Loss: 0.2429 / Time taken: 0:04:58 / ---- Currently Best Val-Epoch: 15 <------- Best VAL Epoch so far
Ensemble: 12/14 / Epoch: 16 / Batch: 536 / Train-Loss (Batch): 0.0329 : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
128,FT_transformer (run: 8),71,1392.158405,27133,0.237939,0.239110,0.060636,0.060748
129,FT_transformer (run: 9),70,1372.753220,27133,0.239179,0.240097,0.058446,0.058635
130,FT_transformer (run: 10),55,1187.509606,27133,0.239155,0.239897,0.059857,0.060138
131,FT_transformer (run: 11),86,1705.731414,27133,0.236645,0.238818,0.062893,0.063125


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 13-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 13/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0301  : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 0 / Train-Loss: 0.2791 / Val-Loss: 0.2501 / Test-Loss: 0.2547 / Time taken: 0:00:43 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0306  : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 1 / Train-Loss: 0.2497 / Val-Loss: 0.2465 / Test-Loss: 0.2504 / Time taken: 0:01:05 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0308  : [#####


The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Ensemble: 13/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0335 : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 12 / Train-Loss: 0.2392 / Val-Loss: 0.2430 / Test-Loss: 0.2444 / Time taken: 0:04:15 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0333 : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 13 / Train-Loss: 0.2391 / Val-Loss: 0.2429 / Test-Loss: 0.2443 / Time taken: 0:04:30 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.033  : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 14 / Train-Loss: 0.2390 / Val-Loss: 0.2429 / Test-Loss: 0.2443 / Time taken: 0:04:45 / ---- Currently Best Val-Epoch: 14 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 15 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
129,FT_transformer (run: 9),70,1372.753220,27133,0.239179,0.240097,0.058446,0.058635
130,FT_transformer (run: 10),55,1187.509606,27133,0.239155,0.239897,0.059857,0.060138
131,FT_transformer (run: 11),86,1705.731414,27133,0.236645,0.238818,0.062893,0.063125
132,FT_transformer (run: 12),68,1463.326831,27133,0.238008,0.239344,0.061057,0.061140


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 14-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 14/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0315  : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 0 / Train-Loss: 0.3222 / Val-Loss: 0.2598 / Test-Loss: 0.2640 / Time taken: 0:00:31 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 14/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.03    : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 1 / Train-Loss: 0.2565 / Val-Loss: 0.2505 / Test-Loss: 0.2549 / Time taken: 0:00:48 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 14/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0301  : [#####


The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Ensemble: 14/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0323 : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 12 / Train-Loss: 0.2394 / Val-Loss: 0.2425 / Test-Loss: 0.2448 / Time taken: 0:03:40 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 14/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 13 / Train-Loss: 0.2392 / Val-Loss: 0.2427 / Test-Loss: 0.2450 / Time taken: 0:03:55 / ---- Currently Best Val-Epoch: 12 
Ensemble: 14/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0329 : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 14 / Train-Loss: 0.2392 / Val-Loss: 0.2422 / Test-Loss: 0.2445 / Time taken: 0:04:10 / ---- Currently Best Val-Epoch: 14 <------- Best VAL Epoch so far
Ensemble: 14/14 / Epoch: 15 / Batch: 536 / Train-Loss (Batch): 0.0332 : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 15 / Train-Loss: 0.2391 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
130,FT_transformer (run: 10),55,1187.509606,27133,0.239155,0.239897,0.059857,0.060138
131,FT_transformer (run: 11),86,1705.731414,27133,0.236645,0.238818,0.062893,0.063125
132,FT_transformer (run: 12),68,1463.326831,27133,0.238008,0.239344,0.061057,0.061140
133,FT_transformer (run: 13),100,1992.376854,27133,0.237393,0.239739,0.060784,0.060886


In [None]:
# # save the results:
# with open(f'{storage_path}/Data/df_results.pickle', 'wb') as handle:
#     pickle.dump(df_results, handle, protocol=pickle.HIGHEST_PROTOCOL)

# load the results:
with open(f'{storage_path}/Data/df_results.pickle', 'rb') as handle:
    df_results = pickle.load(handle)

## 4.2 CANN-FT-Transformer:  

Note: we run the code 15 times with different seeds (to calculate the average and standard deviation of the runtime and the results).
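
The averages and standard deviations over the seeds mentioned above could be computed roughly as follows. This is only a sketch: `summarise_runs` is a hypothetical helper that is not part of this notebook, and the numbers in `df_demo` are made-up toy values that merely reuse the column names of `df_results`.

```python
import pandas as pd

# Toy stand-in for df_results (made-up values, same column names as in this notebook).
df_demo = pd.DataFrame({
    "model": ["FT_transformer (run: 0)", "FT_transformer (run: 1)",
              "CAFTT (run: 0)", "CAFTT (run: 1)"],
    "run_time": [1500.0, 1700.0, 900.0, 1100.0],
    "loss_test": [0.2390, 0.2400, 0.2380, 0.2382],
})

def summarise_runs(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and sample standard deviation per model family (hypothetical helper)."""
    # Strip the " (run: k)" suffix so that all seeds of one model share the same key.
    family = df["model"].str.replace(r"\s*\(run: \d+\)$", "", regex=True)
    return df.groupby(family)[["run_time", "loss_test"]].agg(["mean", "std"])

summary = summarise_runs(df_demo)
print(summary)
```

Grouping on the model name without its run suffix keeps the per-run rows in `df_results` untouched, so the raw results stay available for other evaluations.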

Note: the code for the CANN below is essentially the same as the code for the FT-Transformer. The only changes are:
* we use a different exposure column (Exposure_x_GLM3_pred instead of Exposure)
* we set the initial weights and bias of the last layer to zero
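
The effect of these two changes can be illustrated with a small NumPy sketch. It assumes that the model multiplies the exposure column with the exponential output of the last layer; this is an assumption about `EnhActuar.Feature_Tokenizer_Transformer` and is not shown in this section. All arrays are random toy data. Under this assumption the zero-initialised last layer yields a factor of exp(0) = 1, so the CANN starts exactly at the GLM3 prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 5, 32                           # 5 toy policies, 32-dimensional last hidden layer
exposure = rng.uniform(0.1, 1.0, n)    # toy exposures
glm_freq = rng.uniform(0.02, 0.15, n)  # toy GLM frequency predictions
hidden = rng.normal(size=(n, d))       # arbitrary output of the last hidden layer

# Last layer initialised with zeros, as in the CANN setting described above.
w = np.zeros(d)
b = 0.0

# Exponential output activation, scaled by the modified exposure column
# (assumed model structure, see the text above).
offset = exposure * glm_freq           # plays the role of Exposure_x_GLM3_pred
cann_pred = offset * np.exp(hidden @ w + b)

# At initialisation the network factor is exp(0) = 1, so the CANN reproduces the GLM.
assert np.allclose(cann_pred, offset)
```

Training then only has to learn a multiplicative correction to the GLM, whatever the hidden layers output at the start.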

In [None]:
# Create the new column "Exposure times GLM3 prediction", which replaces the exposure for the CANN models.
df_freq_prep_nn["Exposure_x_GLM3_pred"] = list(poisson_glm3.predict(X_glm3)*df_freq_prep_nn["Exposure"])

In [None]:
# Create the dataframes needed for evaluation:
# --------------------
# NOTE: in the 2021 paper by Gorishniy et al. the batch size differs between the datasets
# but is not tuned as a hyperparameter. For bigger datasets they used a batch size of 1024 and
# for smaller datasets a batch size of 256 or 512.
batch_size = 1024
learn_data = df_to_tensor(df_freq_prep_nn[bool_in_learn], feature_cols=nr_col+cat_col, exposure="Exposure_x_GLM3_pred", target="ClaimNb", batch_size=batch_size)
test_data = df_to_tensor(df_freq_prep_nn[bool_in_test], feature_cols=nr_col+cat_col, exposure="Exposure_x_GLM3_pred", target="ClaimNb", batch_size=batch_size)

# NOTE: at first we use just a fraction of the data to test the code:
learn_train_dummy_data = df_to_tensor(df_freq_prep_nn[bool_in_learn_train_dummy], feature_cols=nr_col+cat_col, exposure="Exposure_x_GLM3_pred", target="ClaimNb", batch_size=batch_size,
                                      dummy_data_for_build=True)

for run_index in range(15):
    # Create the dataframes needed for training:
    learn_train_data = df_to_tensor(df_freq_prep_nn[train_val_split[f"learn_train_{run_index}"]],
                                    feature_cols=nr_col+cat_col, exposure="Exposure_x_GLM3_pred", target="ClaimNb", batch_size=batch_size)
    learn_val_data = df_to_tensor(df_freq_prep_nn[train_val_split[f"learn_val_{run_index}"]],
                                  feature_cols=nr_col+cat_col, exposure="Exposure_x_GLM3_pred", target="ClaimNb", batch_size=batch_size)

    print(f"-------------------------------------------------")
    print(f"-------------------------------------------------")
    print(f"-------------------------------------------------")
    print(f"-------We are at Model: {str(run_index).zfill(2)}-----------------------")
    print(f"-------------------------------------------------")
    print(f"-------------------------------------------------")
    print(f"-------------------------------------------------")
    # Define FT-Transformer Models:
    # ----------------------
    # NOTE: we use TensorFlow/Keras model subclassing here (not the functional or sequential API)
    # NOTE: instead of the .fit function we use a custom training loop

    # create the model:
    # ----------------------
    set_random_seeds(int(random_seeds[run_index]))

    FT_transformer = EnhActuar.Feature_Tokenizer_Transformer(
            emb_dim = 32, # NOTE: the default setting in the 2021 paper by Gorishniy et al. is emb_dim = 192 (but the number of parameters would go through the roof here, so we use something smaller)
            nr_features = nr_col,
            cat_features = cat_col,
            cat_vocabulary = cat_vocabulary,
            count_transformer_blocks = 3,
            attention_n_heads = 8,
            attention_dropout = 0.2,
            ffn_d_hidden = None, # NOTE: set to None if ReGLU is used -> None selects the default value (4/3*emb_dim); the authors write that they used 2*emb_dim without ReGLU.
            ffn_activation_ReGLU = True, # NOTE: set True if ReGLU should be used
            ffn_dropout = 0.1,
            prenormalization = True,
            output_dim = 1,
            last_activation = 'exponential',
            exposure_name = "Exposure_x_GLM3_pred",
            last_layer_initial_weights = "zeros",
            last_layer_initial_bias = "zeros",
            seed_nr = int(random_seeds[run_index])
    )

    # See here regarding custom training loops: https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch

    # Instantiate an optimizer to train the model.
    # ----------------------
    # create an optimizer AdamW with learning rate 1e-4, weight decay 1e-5:
    optimizer = tf.keras.optimizers.AdamW(learning_rate=1e-4, weight_decay=1e-5)

    # Instantiate a loss function
    # ----------------------
    # we use our own loss function here
    # because it is not included in tensorflow in the same way (see section loss function for more details):
    loss_fn = Poisson_loss_for_tf_Wrapped()

    # Prepare the metrics.
    # ----------------------
    # we use a custom metric here (because it is not included in tensorflow in the same way):
    train_acc_metric = Poisson_Metric_for_tf()
    val_acc_metric = Poisson_Metric_for_tf()
    test_acc_metric = Poisson_Metric_for_tf()

    @tf.function
    def train_step(x, y):
        # Open a GradientTape to record the operations run during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:
            # Run the forward pass of the layer. The operations that the layer applies to its inputs are going to be recorded on the GradientTape.
            y_pred = FT_transformer(x, training=True)["output"]  # prediction for this minibatch
            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y, y_pred)
        # Use the gradient tape to automatically retrieve the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, FT_transformer.trainable_weights)
        # Run one step of gradient descent by updating the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, FT_transformer.trainable_weights))
        # Update training metric.
        train_acc_metric.update_state(y, y_pred)
        return loss_value

    @tf.function
    def val_step(x, y):
        # Run the forward pass of the layer.
        # (note: training=False is needed because the layers have different behavior during training versus inference (e.g. Dropout))
        y_pred = FT_transformer(x, training=False)["output"]
        # Update val metrics
        val_acc_metric.update_state(y, y_pred)

    @tf.function
    def test_step(x, y):
        # Run the forward pass of the layer.
        # (note: training=False is needed because the layers have different behavior during training versus inference (e.g. Dropout))
        y_pred = FT_transformer(x, training=False)["output"]
        # Update test metrics
        test_acc_metric.update_state(y, y_pred)

    # model fitting:
    # ----------------------
    start_time = time.time()
    Val_Progress = helper.Easy_ProgressTracker(patience=15)
    epochs = 500

    for epoch in range(epochs):
        # Iterate over the batches of the dataset.
        for step, (x_batch_train, y_batch_train) in enumerate(learn_train_data):
            loss_value = train_step(x_batch_train, y_batch_train)
            helper.costume_progress_bar(f"Ensemble: {str(run_index).zfill(2)}/{14} / Epoch: {epoch} / Batch: {step} / Train-Loss (Batch): {round(float(loss_value),4)}",step,len(learn_train_data), 30)

        # Display metrics at the end of each epoch.
        print_train_loss = train_acc_metric.result()
        # Reset training metrics at the end of each epoch
        train_acc_metric.reset_states()

        # Run a validation at the end of each epoch.
        for x_batch_val, y_batch_val in learn_val_data:
            val_step(x_batch_val, y_batch_val)
        print_val_loss = val_acc_metric.result()
        val_acc_metric.reset_states()
        for x_batch_test, y_batch_test in test_data:
            test_step(x_batch_test, y_batch_test)
        print_test_loss = test_acc_metric.result()
        test_acc_metric.reset_states()

        Val_Progress(current_epoch=epoch, current_score = print_val_loss)

        print(f"\nEnsemble: {str(run_index).zfill(2)}/{14} / Epoch: {epoch} / Train-Loss: %.4f / Val-Loss: %.4f / Test-Loss: %.4f / Time taken: %s / ---- Currently Best Val-Epoch: %d" % (
            # str(run_index).zfill(2),
            float(print_train_loss),
            float(print_val_loss),
            float(print_test_loss),
            datetime.timedelta(seconds=int(time.time() - start_time)),
            Val_Progress.best_epoch
            ), end = " ")
        if Val_Progress.progress:
            print("<------- Best VAL Epoch so far")
        else:
            print("\r")


        # Callback: save best model / early stopping:
        # ----------------------
        earliest_epoch2save = 10
        if Val_Progress.progress and Val_Progress.current_epoch >= earliest_epoch2save:
            FT_transformer.save_weights(f'{storage_path}/saved_models/Poisson_CAFTT_{run_index}.weights.h5')
        if Val_Progress.patience_over:
            break

    # create some metrics after the loop
    best_epoch_FT_transformer = Val_Progress.best_epoch
    execution_time_FT_transformer = time.time() - start_time

    # load the weights of the best saved model:
    # ----------------------
    FT_transformer.load_weights(f'{storage_path}/saved_models/Poisson_CAFTT_{run_index}.weights.h5')

    # predict with the model:
    # ----------------------
    y_pred["train"][f"CAFTT"] = np.array([x for [x] in FT_transformer.predict(learn_data,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                )["output"]])
    y_pred["test"][f"CAFTT"] = np.array([x for [x] in FT_transformer.predict(test_data,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                )["output"]])

    # evaluate the model:
    # ----------------------
    CAFTT_results = Results(model=f"CAFTT (run: {run_index})",
                                epochs=best_epoch_FT_transformer,
                                run_time=execution_time_FT_transformer,
                                nr_parameters=[np.sum([np.prod(v.get_shape().as_list()) for v in FT_transformer.trainable_weights])],
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"][f"CAFTT"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"][f"CAFTT"]),
                                pred_avg_freq_train=y_pred["train"][f"CAFTT"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"][f"CAFTT"].sum()/exposure["test"].sum())
    # store the results in the dataframe:
    store_results_in_df(CAFTT_results)
    display(df_results)
    # save the results:
    with open(f'{storage_path}/Data/df_results.pickle', 'wb') as handle:
        pickle.dump(df_results, handle, protocol=pickle.HIGHEST_PROTOCOL)


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 00-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 00/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0329  : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 0 / Train-Loss: 0.2403 / Val-Loss: 0.2449 / Test-Loss: 0.2414 / Time taken: 0:00:43 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0329  : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 1 / Train-Loss: 0.2405 / Val-Loss: 0.2449 / Test-Loss: 0.2413 / Time taken: 0:00:56 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0329  : [#####


The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Ensemble: 00/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0332 : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 11 / Train-Loss: 0.2388 / Val-Loss: 0.2436 / Test-Loss: 0.2397 / Time taken: 0:03:44 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0334 : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 12 / Train-Loss: 0.2387 / Val-Loss: 0.2434 / Test-Loss: 0.2395 / Time taken: 0:03:59 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0333 : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 13 / Train-Loss: 0.2386 / Val-Loss: 0.2432 / Test-Loss: 0.2394 / Time taken: 0:04:13 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
131,FT_transformer (run: 11),86,1705.731414,27133,0.236645,0.238818,0.062893,0.063125
132,FT_transformer (run: 12),68,1463.326831,27133,0.238008,0.239344,0.061057,0.061140
133,FT_transformer (run: 13),100,1992.376854,27133,0.237393,0.239739,0.060784,0.060886
134,FT_transformer (run: 14),89,1749.585324,27133,0.238085,0.240123,0.061349,0.061344


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 01-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 01/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.007   : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 0 / Train-Loss: 0.2402 / Val-Loss: 0.2458 / Test-Loss: 0.2413 / Time taken: 0:00:44 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 01/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.007   : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 1 / Train-Loss: 0.2404 / Val-Loss: 0.2457 / Test-Loss: 0.2413 / Time taken: 0:00:57 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 01/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0071  : [#####


The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Ensemble: 01/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0068 : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 11 / Train-Loss: 0.2385 / Val-Loss: 0.2441 / Test-Loss: 0.2395 / Time taken: 0:03:22 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 01/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0069 : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 12 / Train-Loss: 0.2384 / Val-Loss: 0.2438 / Test-Loss: 0.2393 / Time taken: 0:03:44 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 01/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0067 : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 13 / Train-Loss: 0.2383 / Val-Loss: 0.2440 / Test-Loss: 0.2394 / Time taken: 0:04:06 / ---- Currently Best Val-Epoch: 12 
Ensemble: 01/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0068 : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 14 / Train-Loss: 0.2383 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
132,FT_transformer (run: 12),68,1463.326831,27133,0.238008,0.239344,0.061057,0.061140
133,FT_transformer (run: 13),100,1992.376854,27133,0.237393,0.239739,0.060784,0.060886
134,FT_transformer (run: 14),89,1749.585324,27133,0.238085,0.240123,0.061349,0.061344
135,CAFTT (run: 0),42,930.150725,27133,0.237616,0.238094,0.066199,0.066450


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 02-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 02/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0331  : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 0 / Train-Loss: 0.2409 / Val-Loss: 0.2400 / Test-Loss: 0.2413 / Time taken: 0:00:30 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 02/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0331  : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 1 / Train-Loss: 0.2410 / Val-Loss: 0.2400 / Test-Loss: 0.2413 / Time taken: 0:00:43 / ---- Currently Best Val-Epoch: 0 
Ensemble: 02/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0332  : [##############################] 99.


The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Ensemble: 02/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0336 : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 11 / Train-Loss: 0.2391 / Val-Loss: 0.2388 / Test-Loss: 0.2396 / Time taken: 0:03:01 / ---- Currently Best Val-Epoch: 10 
Ensemble: 02/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0336 : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 12 / Train-Loss: 0.2390 / Val-Loss: 0.2389 / Test-Loss: 0.2397 / Time taken: 0:03:16 / ---- Currently Best Val-Epoch: 10 
Ensemble: 02/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0336 : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 13 / Train-Loss: 0.2388 / Val-Loss: 0.2387 / Test-Loss: 0.2394 / Time taken: 0:03:30 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 02/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 14 / Train-Loss: 0.2387 / Val-Loss: 0.2387 / Test-Loss: 0.23

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
133,FT_transformer (run: 13),100,1992.376854,27133,0.237393,0.239739,0.060784,0.060886
134,FT_transformer (run: 14),89,1749.585324,27133,0.238085,0.240123,0.061349,0.061344
135,CAFTT (run: 0),42,930.150725,27133,0.237616,0.238094,0.066199,0.066450
136,CAFTT (run: 1),49,1030.421487,27133,0.237010,0.237976,0.065635,0.065934


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 03-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 03/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0332  : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 0 / Train-Loss: 0.2401 / Val-Loss: 0.2468 / Test-Loss: 0.2413 / Time taken: 0:00:44 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 03/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0332  : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 1 / Train-Loss: 0.2402 / Val-Loss: 0.2469 / Test-Loss: 0.2414 / Time taken: 0:00:59 / ---- Currently Best Val-Epoch: 0 
Ensemble: 03/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0333  : [##############################] 99.


The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Ensemble: 03/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0341 : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 13 / Train-Loss: 0.2382 / Val-Loss: 0.2457 / Test-Loss: 0.2395 / Time taken: 0:04:11 / ---- Currently Best Val-Epoch: 12 
Ensemble: 03/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0341 : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 14 / Train-Loss: 0.2382 / Val-Loss: 0.2458 / Test-Loss: 0.2394 / Time taken: 0:04:25 / ---- Currently Best Val-Epoch: 12 
Ensemble: 03/14 / Epoch: 15 / Batch: 536 / Train-Loss (Batch): 0.0341 : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 15 / Train-Loss: 0.2382 / Val-Loss: 0.2456 / Test-Loss: 0.2393 / Time taken: 0:04:39 / ---- Currently Best Val-Epoch: 15 <------- Best VAL Epoch so far
Ensemble: 03/14 / Epoch: 16 / Batch: 536 / Train-Loss (Batch): 0.034  : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 16 / Train-Loss: 0.2380 / Val-Loss: 0.2457 / Test-Loss: 0.23

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
134,FT_transformer (run: 14),89,1749.585324,27133,0.238085,0.240123,0.061349,0.061344
135,CAFTT (run: 0),42,930.150725,27133,0.237616,0.238094,0.066199,0.066450
136,CAFTT (run: 1),49,1030.421487,27133,0.237010,0.237976,0.065635,0.065934
137,CAFTT (run: 2),55,1157.808053,27133,0.237262,0.238194,0.066101,0.066402


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 04-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 04/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0332  : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 0 / Train-Loss: 0.2412 / Val-Loss: 0.2374 / Test-Loss: 0.2412 / Time taken: 0:00:30 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0332  : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 1 / Train-Loss: 0.2413 / Val-Loss: 0.2373 / Test-Loss: 0.2412 / Time taken: 0:00:44 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0332  : [#####


The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Ensemble: 04/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0337 : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 11 / Train-Loss: 0.2396 / Val-Loss: 0.2358 / Test-Loss: 0.2392 / Time taken: 0:03:21 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0339 : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 12 / Train-Loss: 0.2394 / Val-Loss: 0.2357 / Test-Loss: 0.2390 / Time taken: 0:03:35 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0341 : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 13 / Train-Loss: 0.2393 / Val-Loss: 0.2358 / Test-Loss: 0.2392 / Time taken: 0:03:57 / ---- Currently Best Val-Epoch: 12 
Ensemble: 04/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.034  : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 14 / Train-Loss: 0.2391 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
135,CAFTT (run: 0),42,930.150725,27133,0.237616,0.238094,0.066199,0.066450
136,CAFTT (run: 1),49,1030.421487,27133,0.237010,0.237976,0.065635,0.065934
137,CAFTT (run: 2),55,1157.808053,27133,0.237262,0.238194,0.066101,0.066402
138,CAFTT (run: 3),76,1503.508472,27133,0.236652,0.237869,0.065459,0.065667


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 05-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 05/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0331  : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 0 / Train-Loss: 0.2410 / Val-Loss: 0.2387 / Test-Loss: 0.2413 / Time taken: 0:00:29 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 05/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0331  : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 1 / Train-Loss: 0.2411 / Val-Loss: 0.2387 / Test-Loss: 0.2413 / Time taken: 0:00:42 / ---- Currently Best Val-Epoch: 0 
Ensemble: 05/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0331  : [##############################] 99.



Ensemble: 05/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0335 : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 11 / Train-Loss: 0.2389 / Val-Loss: 0.2370 / Test-Loss: 0.2392 / Time taken: 0:03:25 / ---- Currently Best Val-Epoch: 10 
Ensemble: 05/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0335 : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 12 / Train-Loss: 0.2388 / Val-Loss: 0.2369 / Test-Loss: 0.2390 / Time taken: 0:03:47 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 05/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0336 : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 13 / Train-Loss: 0.2387 / Val-Loss: 0.2369 / Test-Loss: 0.2389 / Time taken: 0:04:01 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 05/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0338 : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 14 / Train-Loss: 0.2387 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
136,CAFTT (run: 1),49,1030.421487,27133,0.237010,0.237976,0.065635,0.065934
137,CAFTT (run: 2),55,1157.808053,27133,0.237262,0.238194,0.066101,0.066402
138,CAFTT (run: 3),76,1503.508472,27133,0.236652,0.237869,0.065459,0.065667
139,CAFTT (run: 4),68,1342.143729,27133,0.236980,0.238098,0.066915,0.067137


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 06-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 06/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0332  : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 0 / Train-Loss: 0.2408 / Val-Loss: 0.2409 / Test-Loss: 0.2413 / Time taken: 0:00:30 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 06/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0332  : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 1 / Train-Loss: 0.2409 / Val-Loss: 0.2408 / Test-Loss: 0.2413 / Time taken: 0:00:44 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 06/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0332  : [#####



Ensemble: 06/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0336 : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 11 / Train-Loss: 0.2392 / Val-Loss: 0.2392 / Test-Loss: 0.2396 / Time taken: 0:03:22 / ---- Currently Best Val-Epoch: 10 
Ensemble: 06/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0332 : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 12 / Train-Loss: 0.2390 / Val-Loss: 0.2389 / Test-Loss: 0.2393 / Time taken: 0:03:36 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 06/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 13 / Train-Loss: 0.2390 / Val-Loss: 0.2390 / Test-Loss: 0.2395 / Time taken: 0:03:59 / ---- Currently Best Val-Epoch: 12 
Ensemble: 06/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0339 : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 14 / Train-Loss: 0.2389 / Val-Loss: 0.2389 / Test-Loss: 0.23

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
137,CAFTT (run: 2),55,1157.808053,27133,0.237262,0.238194,0.066101,0.066402
138,CAFTT (run: 3),76,1503.508472,27133,0.236652,0.237869,0.065459,0.065667
139,CAFTT (run: 4),68,1342.143729,27133,0.236980,0.238098,0.066915,0.067137
140,CAFTT (run: 5),57,1170.775908,27133,0.237184,0.238024,0.066190,0.066403


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 07-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 07/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0328  : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 0 / Train-Loss: 0.2404 / Val-Loss: 0.2449 / Test-Loss: 0.2413 / Time taken: 0:00:33 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 07/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0328  : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 1 / Train-Loss: 0.2404 / Val-Loss: 0.2448 / Test-Loss: 0.2412 / Time taken: 0:00:49 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 07/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0328  : [#####



Ensemble: 07/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0336 : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 11 / Train-Loss: 0.2386 / Val-Loss: 0.2439 / Test-Loss: 0.2397 / Time taken: 0:03:25 / ---- Currently Best Val-Epoch: 10 
Ensemble: 07/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0334 : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 12 / Train-Loss: 0.2386 / Val-Loss: 0.2437 / Test-Loss: 0.2396 / Time taken: 0:03:39 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 07/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0335 : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 13 / Train-Loss: 0.2384 / Val-Loss: 0.2438 / Test-Loss: 0.2396 / Time taken: 0:03:53 / ---- Currently Best Val-Epoch: 12 
Ensemble: 07/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0337 : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 14 / Train-Loss: 0.2384 / Val-Loss: 0.2436 / Test-Loss: 0.23

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
138,CAFTT (run: 3),76,1503.508472,27133,0.236652,0.237869,0.065459,0.065667
139,CAFTT (run: 4),68,1342.143729,27133,0.236980,0.238098,0.066915,0.067137
140,CAFTT (run: 5),57,1170.775908,27133,0.237184,0.238024,0.066190,0.066403
141,CAFTT (run: 6),63,1285.190307,27133,0.237250,0.237963,0.065993,0.066259


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 08-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 08/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0333  : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 0 / Train-Loss: 0.2399 / Val-Loss: 0.2494 / Test-Loss: 0.2413 / Time taken: 0:00:44 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0333  : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 1 / Train-Loss: 0.2399 / Val-Loss: 0.2494 / Test-Loss: 0.2413 / Time taken: 0:01:06 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0334  : [#####



Ensemble: 08/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0341 : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 11 / Train-Loss: 0.2381 / Val-Loss: 0.2479 / Test-Loss: 0.2395 / Time taken: 0:03:28 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.034  : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 12 / Train-Loss: 0.2380 / Val-Loss: 0.2476 / Test-Loss: 0.2393 / Time taken: 0:03:42 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.034  : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 13 / Train-Loss: 0.2379 / Val-Loss: 0.2476 / Test-Loss: 0.2393 / Time taken: 0:03:56 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0342 : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
139,CAFTT (run: 4),68,1342.143729,27133,0.236980,0.238098,0.066915,0.067137
140,CAFTT (run: 5),57,1170.775908,27133,0.237184,0.238024,0.066190,0.066403
141,CAFTT (run: 6),63,1285.190307,27133,0.237250,0.237963,0.065993,0.066259
142,CAFTT (run: 7),82,1445.914973,27133,0.236267,0.238129,0.066620,0.066841


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 09-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 09/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0072  : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 0 / Train-Loss: 0.2402 / Val-Loss: 0.2459 / Test-Loss: 0.2413 / Time taken: 0:00:28 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 09/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0071  : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 1 / Train-Loss: 0.2404 / Val-Loss: 0.2460 / Test-Loss: 0.2414 / Time taken: 0:00:49 / ---- Currently Best Val-Epoch: 0 
Ensemble: 09/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0072  : [##############################] 99.



Ensemble: 09/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0067 : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 12 / Train-Loss: 0.2380 / Val-Loss: 0.2443 / Test-Loss: 0.2391 / Time taken: 0:03:42 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 09/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0068 : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 13 / Train-Loss: 0.2379 / Val-Loss: 0.2443 / Test-Loss: 0.2391 / Time taken: 0:03:56 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 09/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0066 : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 14 / Train-Loss: 0.2379 / Val-Loss: 0.2443 / Test-Loss: 0.2391 / Time taken: 0:04:10 / ---- Currently Best Val-Epoch: 13 
Ensemble: 09/14 / Epoch: 15 / Batch: 536 / Train-Loss (Batch): 0.0067 : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 15 / Train-Loss: 0.2378 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
140,CAFTT (run: 5),57,1170.775908,27133,0.237184,0.238024,0.066190,0.066403
141,CAFTT (run: 6),63,1285.190307,27133,0.237250,0.237963,0.065993,0.066259
142,CAFTT (run: 7),82,1445.914973,27133,0.236267,0.238129,0.066620,0.066841
143,CAFTT (run: 8),74,1397.400270,27133,0.236362,0.237732,0.066140,0.066352


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 10-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 10/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0328  : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 0 / Train-Loss: 0.2407 / Val-Loss: 0.2414 / Test-Loss: 0.2413 / Time taken: 0:00:28 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 10/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0328  : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 1 / Train-Loss: 0.2408 / Val-Loss: 0.2414 / Test-Loss: 0.2413 / Time taken: 0:00:43 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 10/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0328  : [#####



Ensemble: 10/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0333 : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 11 / Train-Loss: 0.2386 / Val-Loss: 0.2401 / Test-Loss: 0.2394 / Time taken: 0:03:12 / ---- Currently Best Val-Epoch: 10 
Ensemble: 10/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.033  : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 12 / Train-Loss: 0.2387 / Val-Loss: 0.2399 / Test-Loss: 0.2393 / Time taken: 0:03:34 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 10/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0334 : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 13 / Train-Loss: 0.2386 / Val-Loss: 0.2400 / Test-Loss: 0.2393 / Time taken: 0:03:49 / ---- Currently Best Val-Epoch: 12 
Ensemble: 10/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0337 : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 14 / Train-Loss: 0.2385 / Val-Loss: 0.2399 / Test-Loss: 0.23

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
141,CAFTT (run: 6),63,1285.190307,27133,0.237250,0.237963,0.065993,0.066259
142,CAFTT (run: 7),82,1445.914973,27133,0.236267,0.238129,0.066620,0.066841
143,CAFTT (run: 8),74,1397.400270,27133,0.236362,0.237732,0.066140,0.066352
144,CAFTT (run: 9),29,687.138017,27133,0.238057,0.238411,0.065254,0.065593


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 11-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 11/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0332  : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 0 / Train-Loss: 0.2417 / Val-Loss: 0.2327 / Test-Loss: 0.2413 / Time taken: 0:00:28 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0332  : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 1 / Train-Loss: 0.2418 / Val-Loss: 0.2326 / Test-Loss: 0.2412 / Time taken: 0:00:42 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0331  : [#####



Ensemble: 11/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0341 : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 14 / Train-Loss: 0.2397 / Val-Loss: 0.2305 / Test-Loss: 0.2394 / Time taken: 0:04:14 / ---- Currently Best Val-Epoch: 14 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 15 / Batch: 536 / Train-Loss (Batch): 0.0342 : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 15 / Train-Loss: 0.2397 / Val-Loss: 0.2304 / Test-Loss: 0.2393 / Time taken: 0:04:29 / ---- Currently Best Val-Epoch: 15 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 16 / Batch: 536 / Train-Loss (Batch): 0.034  : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 16 / Train-Loss: 0.2396 / Val-Loss: 0.2303 / Test-Loss: 0.2392 / Time taken: 0:04:44 / ---- Currently Best Val-Epoch: 16 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 17 / Batch: 536 / Train-Loss (Batch): 0.034  : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
142,CAFTT (run: 7),82,1445.914973,27133,0.236267,0.238129,0.066620,0.066841
143,CAFTT (run: 8),74,1397.400270,27133,0.236362,0.237732,0.066140,0.066352
144,CAFTT (run: 9),29,687.138017,27133,0.238057,0.238411,0.065254,0.065593
145,CAFTT (run: 10),58,1175.236680,27133,0.237128,0.238169,0.065283,0.065611


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 12-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 12/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0331  : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 0 / Train-Loss: 0.2407 / Val-Loss: 0.2418 / Test-Loss: 0.2413 / Time taken: 0:00:30 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 12/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0331  : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 1 / Train-Loss: 0.2408 / Val-Loss: 0.2418 / Test-Loss: 0.2413 / Time taken: 0:00:45 / ---- Currently Best Val-Epoch: 0 
Ensemble: 12/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0331  : [##############################] 99.



Ensemble: 12/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0337 : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 12 / Train-Loss: 0.2387 / Val-Loss: 0.2409 / Test-Loss: 0.2395 / Time taken: 0:03:43 / ---- Currently Best Val-Epoch: 11 
Ensemble: 12/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0337 : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 13 / Train-Loss: 0.2386 / Val-Loss: 0.2408 / Test-Loss: 0.2394 / Time taken: 0:04:04 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 12/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0341 : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 14 / Train-Loss: 0.2385 / Val-Loss: 0.2407 / Test-Loss: 0.2392 / Time taken: 0:04:19 / ---- Currently Best Val-Epoch: 14 <------- Best VAL Epoch so far
Ensemble: 12/14 / Epoch: 15 / Batch: 536 / Train-Loss (Batch): 0.0332 : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 15 / Train-Loss: 0.2385 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
143,CAFTT (run: 8),74,1397.400270,27133,0.236362,0.237732,0.066140,0.066352
144,CAFTT (run: 9),29,687.138017,27133,0.238057,0.238411,0.065254,0.065593
145,CAFTT (run: 10),58,1175.236680,27133,0.237128,0.238169,0.065283,0.065611
146,CAFTT (run: 11),59,1248.218261,27133,0.237067,0.238216,0.065540,0.065780


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 13-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 13/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0331  : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 0 / Train-Loss: 0.2409 / Val-Loss: 0.2400 / Test-Loss: 0.2413 / Time taken: 0:00:43 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0331  : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 1 / Train-Loss: 0.2410 / Val-Loss: 0.2400 / Test-Loss: 0.2412 / Time taken: 0:00:58 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0331  : [#####



Ensemble: 13/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 11 / Train-Loss: 0.2394 / Val-Loss: 0.2386 / Test-Loss: 0.2396 / Time taken: 0:03:29 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0336 : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 12 / Train-Loss: 0.2392 / Val-Loss: 0.2384 / Test-Loss: 0.2395 / Time taken: 0:03:43 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0335 : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 13 / Train-Loss: 0.2390 / Val-Loss: 0.2383 / Test-Loss: 0.2393 / Time taken: 0:03:58 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0332 : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
144,CAFTT (run: 9),29,687.138017,27133,0.238057,0.238411,0.065254,0.065593
145,CAFTT (run: 10),58,1175.236680,27133,0.237128,0.238169,0.065283,0.065611
146,CAFTT (run: 11),59,1248.218261,27133,0.237067,0.238216,0.065540,0.065780
147,CAFTT (run: 12),50,1096.780236,27133,0.237433,0.237963,0.065774,0.066086


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 14-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 14/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0334  : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 0 / Train-Loss: 0.2410 / Val-Loss: 0.2388 / Test-Loss: 0.2413 / Time taken: 0:00:28 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 14/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0335  : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 1 / Train-Loss: 0.2411 / Val-Loss: 0.2387 / Test-Loss: 0.2413 / Time taken: 0:00:43 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 14/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0334  : [#####



Ensemble: 14/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0342 : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 11 / Train-Loss: 0.2392 / Val-Loss: 0.2370 / Test-Loss: 0.2395 / Time taken: 0:03:31 / ---- Currently Best Val-Epoch: 10 
Ensemble: 14/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0342 : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 12 / Train-Loss: 0.2390 / Val-Loss: 0.2370 / Test-Loss: 0.2393 / Time taken: 0:03:45 / ---- Currently Best Val-Epoch: 10 
Ensemble: 14/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0343 : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 13 / Train-Loss: 0.2390 / Val-Loss: 0.2367 / Test-Loss: 0.2390 / Time taken: 0:03:59 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 14/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0343 : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 14 / Train-Loss: 0.2389 / Val-Loss: 0.2365 / Test-Loss: 0.23

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
145,CAFTT (run: 10),58,1175.236680,27133,0.237128,0.238169,0.065283,0.065611
146,CAFTT (run: 11),59,1248.218261,27133,0.237067,0.238216,0.065540,0.065780
147,CAFTT (run: 12),50,1096.780236,27133,0.237433,0.237963,0.065774,0.066086
148,CAFTT (run: 13),41,893.841639,27133,0.237690,0.238301,0.065890,0.066175


In [None]:
# # save the results:
# with open(f'{storage_path}/Data/df_results.pickle', 'wb') as handle:
#     pickle.dump(df_results, handle, protocol=pickle.HIGHEST_PROTOCOL)

# load the results:
with open(f'{storage_path}/Data/df_results.pickle', 'rb') as handle:
    df_results = pickle.load(handle)



## 4.3 LocalGLM-FT-Transformer:  

Note: We run the code 15 times with different seeds to compute the mean and standard deviation of the run time and the results.
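To make the purpose of the repeated runs concrete, the following minimal sketch shows one way to turn per-run rows into a mean and a standard deviation per model. It is an illustration, not the notebook's evaluation code: the helper `summarize_runs` is hypothetical, and the three example rows are the run times and test losses of the LocalGLMftt runs 0 to 2 as they appear in the results table printed in this section.

```python
import re
from collections import defaultdict
from statistics import mean, stdev

# (model label, run_time in seconds, loss_test) for three runs,
# copied from the results table printed in this section
rows = [
    ("LocalGLMftt (run: 0)", 886.447916, 0.238797),
    ("LocalGLMftt (run: 1)", 868.665794, 0.238771),
    ("LocalGLMftt (run: 2)", 842.569307, 0.238600),
]


def summarize_runs(rows):
    """Group rows by model name (run suffix stripped) and return (mean, std) per metric.

    Hypothetical helper for illustration; `stdev` is the sample standard deviation.
    """
    grouped = defaultdict(lambda: {"run_time": [], "loss_test": []})
    for label, run_time, loss_test in rows:
        # "LocalGLMftt (run: 3)" -> "LocalGLMftt"
        name = re.sub(r"\s*\(run: \d+\)$", "", label)
        grouped[name]["run_time"].append(run_time)
        grouped[name]["loss_test"].append(loss_test)

    summary = {}
    for name, metrics in grouped.items():
        summary[name] = {
            key: (mean(values), stdev(values) if len(values) > 1 else 0.0)
            for key, values in metrics.items()
        }
    return summary


summary = summarize_runs(rows)
for name, stats in summary.items():
    for key, (avg, std) in stats.items():
        print(f"{name} / {key}: mean={avg:.6f}, std={std:.6f}")
```

With the notebook's `df_results`, the same idea could be expressed by stripping the run suffix from the `model` column and grouping on the remaining name.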

In [None]:
# Create the one-hot-encoded data used to fit the GLM that provides the initial weights:
# --------------------
data_nn_ohe_learn, y_true_learn = create_ffn_ohe_data(bool_in_learn)

# Fit a dummy GLM to obtain the initial weights:
# ----------------------
poisson_glm_dummy = PoissonRegressor(alpha=0, max_iter=1000)  # scikit-learn.org: alpha = 0 is equivalent to unpenalized GLMs
# NOTE: data_nn_ohe_learn = [X_ohe, exposure]; the target is the claim frequency (claims / exposure), weighted by the exposure
poisson_glm_dummy.fit(data_nn_ohe_learn[0], y_true_learn / data_nn_ohe_learn[1], sample_weight=data_nn_ohe_learn[1])
# get the betas from the glm:
glm_nr_col_betas = poisson_glm_dummy.coef_[:len(nr_col)]
current_beta_index = len(nr_col)
glm_cat_col_betas = {}
for c in cat_vocabulary.keys():
    glm_cat_col_betas[c] = poisson_glm_dummy.coef_[current_beta_index:current_beta_index+len(cat_vocabulary[c])]
    current_beta_index += len(cat_vocabulary[c])
glm_intercept = poisson_glm_dummy.intercept_


# Create the dataframes needed for evaluation:
# --------------------
# NOTE: In the 2021 Gorishniy et al. paper the batch size differs between datasets
# but is not tuned as a hyperparameter. They used a batch size of 1024 for bigger datasets
# and a batch size of 256 or 512 for smaller datasets.
batch_size = 1024
learn_data = df_to_tensor(df_freq_prep_nn[bool_in_learn], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)
test_data = df_to_tensor(df_freq_prep_nn[bool_in_test], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)

for run_index in range(15):

    # Create the dataframes needed for training:
    learn_train_data = df_to_tensor(df_freq_prep_nn[train_val_split[f"learn_train_{run_index}"]],
                                    feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)
    learn_val_data = df_to_tensor(df_freq_prep_nn[train_val_split[f"learn_val_{run_index}"]],
                                  feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)

    print(f"-------------------------------------------------")
    print(f"-------------------------------------------------")
    print(f"-------------------------------------------------")
    print(f"-------We are at Model: {str(run_index).zfill(2)}-----------------------")
    print(f"-------------------------------------------------")
    print(f"-------------------------------------------------")
    print(f"-------------------------------------------------")
    # Define FT-Transformer Models:
    # ----------------------
    # NOTE: we use TensorFlow/Keras model subclassing here (not the functional or sequential API)
    # NOTE: we use a custom training loop here instead of the .fit function

    # create the model:
    # ----------------------
    set_random_seeds(int(random_seeds[run_index]))
    LocalGLMftt = EnhActuar.LocalGLM_FT_Transformer(
            emb_dim = 32, # NOTE: In the default setting of the 2021 Gorishniy et al. paper they used emb_dim = 192 (but the number of parameters would go through the roof here, so we use a smaller value)
            nr_features = nr_col,
            cat_features = cat_col,
            cat_vocabulary = cat_vocabulary,
            count_transformer_blocks = 3,
            attention_n_heads = 8,
            attention_dropout = 0.2,
            ffn_d_hidden = None, # NOTE: keep None if ReGLU is used -> None uses the default value (4/3*emb_dim); the paper states that 2*emb_dim is used without ReGLU.
            ffn_activation_ReGLU = True, # NOTE: set True if ReGLU should be used
            ffn_dropout = 0.1,
            prenormalization = True,
            output_dim = 1,
            last_activation = 'exponential',
            exposure_name = "Exposure",
            last_layer_initial_weights = "zeros",
            last_layer_initial_bias = "ones",
            init_glm_cat_col_weights = glm_cat_col_betas,
            init_glm_nr_col_weights = glm_nr_col_betas,
            init_glm_bias = glm_intercept,
            trainable_glm_emb = False,
            seed_nr = int(random_seeds[run_index])
    )

    # See here regarding custom training loops: https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch

    # Instantiate an optimizer to train the model.
    # ----------------------
    # create an optimizer AdamW with learning rate 1e-4, weight decay 1e-5:
    optimizer = tf.keras.optimizers.AdamW(learning_rate=1e-4, weight_decay=1e-5)

    # Instantiate a loss function
    # ----------------------
    # we use our own loss function here
    # because it is not included in tensorflow in the same way (see section loss function for more details):
    loss_fn = Poisson_loss_for_tf_Wrapped()

    # Prepare the metrics.
    # ----------------------
    # we use a custom metric here (because it is not included in tensorflow in the same way):
    train_acc_metric = Poisson_Metric_for_tf()
    val_acc_metric = Poisson_Metric_for_tf()
    test_acc_metric = Poisson_Metric_for_tf()

    @tf.function
    def train_step(x, y):
        # Open a GradientTape to record the operations run during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:
            # Run the forward pass of the layer. The operations that the layer applies to its inputs are going to be recorded on the GradientTape.
            y_pred = LocalGLMftt(x, training=True)["output"]  # prediction for this minibatch
            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y, y_pred)
        # Use the gradient tape to automatically retrieve the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, LocalGLMftt.trainable_weights)
        # Run one step of gradient descent by updating the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, LocalGLMftt.trainable_weights))
        # Update training metric.
        train_acc_metric.update_state(y, y_pred)
        return loss_value

    @tf.function
    def val_step(x, y):
        # Run the forward pass of the layer.
        # (note: training=False is needed because the layers have different behavior during training versus inference (e.g. Dropout))
        y_pred = LocalGLMftt(x, training=False)["output"]
        # Update val metrics
        val_acc_metric.update_state(y, y_pred)

    @tf.function
    def test_step(x, y):
        # Run the forward pass of the layer.
        # (note: training=False is needed because the layers have different behavior during training versus inference (e.g. Dropout))
        y_pred = LocalGLMftt(x, training=False)["output"]
        # Update test metrics
        test_acc_metric.update_state(y, y_pred)

    # model fitting:
    # ----------------------
    start_time = time.time()
    Val_Progress = helper.Easy_ProgressTracker(patience=15)
    epochs = 500

    for epoch in range(epochs):
        # Iterate over the batches of the dataset.
        for step, (x_batch_train, y_batch_train) in enumerate(learn_train_data):
            loss_value = train_step(x_batch_train, y_batch_train)
            helper.costume_progress_bar(f"Ensemble: {str(run_index).zfill(2)}/{14} / Epoch: {epoch} / Batch: {step} / Train-Loss (Batch): {round(float(loss_value),4)}",step,len(learn_train_data), 30)

        # Display metrics at the end of each epoch.
        print_train_loss = train_acc_metric.result()
        # Reset training metrics at the end of each epoch
        train_acc_metric.reset_states()

        # Run a validation at the end of each epoch.
        for x_batch_val, y_batch_val in learn_val_data:
            val_step(x_batch_val, y_batch_val)
        print_val_loss = val_acc_metric.result()
        val_acc_metric.reset_states()
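        # Also evaluate on the test data at the end of each epoch (the result is only printed, not used for model selection).
        for x_batch_test, y_batch_test in test_data: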
            test_step(x_batch_test, y_batch_test)
        print_test_loss = test_acc_metric.result()
        test_acc_metric.reset_states()

        Val_Progress(current_epoch=epoch, current_score = print_val_loss)

        print(f"\nEnsemble: {str(run_index).zfill(2)}/{14} / Epoch: {epoch} / Train-Loss: %.4f / Val-Loss: %.4f / Test-Loss: %.4f / Time taken: %s / ---- Currently Best Val-Epoch: %d" % (
            # str(run_index).zfill(2),
            float(print_train_loss),
            float(print_val_loss),
            float(print_test_loss),
            datetime.timedelta(seconds=int(time.time() - start_time)),
            Val_Progress.best_epoch
            ), end = " ")
        if Val_Progress.progress:
            print("<------- Best VAL Epoch so far")
        else:
            print("\r")


        # Callback: save best model / early stopping:
        # ----------------------
        earliest_epoch2save = 10
        if Val_Progress.progress and Val_Progress.current_epoch >= earliest_epoch2save:
            LocalGLMftt.save_weights(f'{storage_path}/saved_models/Poisson_LocalGLMftt_{run_index}.weights.h5')
        if Val_Progress.patience_over:
            break

    # create some metrics after the loop
    best_epoch_LocalGLMftt = Val_Progress.best_epoch
    execution_time_LocalGLMftt = time.time() - start_time

    # load the weights of the best saved model:
    # ----------------------
    LocalGLMftt.load_weights(f'{storage_path}/saved_models/Poisson_LocalGLMftt_{run_index}.weights.h5')

    # predict with the model:
    # ----------------------
    y_pred["train"][f"LocalGLMftt"] = np.array([x for [x] in LocalGLMftt.predict(learn_data,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                )["output"]])
    y_pred["test"][f"LocalGLMftt"] = np.array([x for [x] in LocalGLMftt.predict(test_data,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                )["output"]])

    # evaluate the model:
    # ----------------------
    LocalGLMftt_results = Results(model=f"LocalGLMftt (run: {run_index})",
                                epochs=best_epoch_LocalGLMftt,
                                run_time=execution_time_LocalGLMftt,
                                nr_parameters=[np.sum([np.prod(v.get_shape().as_list()) for v in LocalGLMftt.trainable_weights])],
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"][f"LocalGLMftt"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"][f"LocalGLMftt"]),
                                pred_avg_freq_train=y_pred["train"][f"LocalGLMftt"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"][f"LocalGLMftt"].sum()/exposure["test"].sum())
    # store the results in the dataframe:
    store_results_in_df(LocalGLMftt_results)
    display(df_results)
    # save the results:
    with open(f'{storage_path}/Data/df_results.pickle', 'wb') as handle:
        pickle.dump(df_results, handle, protocol=pickle.HIGHEST_PROTOCOL)
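
The coefficient slicing at the top of the cell above relies on the column order of the one-hot-encoded design matrix: numerical features first, then one block per categorical feature, each as wide as its vocabulary. The sketch below isolates that step with plain Python lists. The feature names, vocabularies, and coefficient values are made up for illustration, and `split_glm_coefficients` is a hypothetical helper rather than part of the notebook.

```python
def split_glm_coefficients(coefs, numeric_cols, cat_vocabulary):
    """Split a flat coefficient vector into a numeric block and one block per categorical feature.

    Assumes the columns are ordered as: numeric features, then the levels of each
    categorical feature in the iteration order of `cat_vocabulary`.
    """
    expected = len(numeric_cols) + sum(len(levels) for levels in cat_vocabulary.values())
    if len(coefs) != expected:
        raise ValueError(f"expected {expected} coefficients, got {len(coefs)}")

    numeric_betas = list(coefs[:len(numeric_cols)])
    cat_betas = {}
    start = len(numeric_cols)
    for name, levels in cat_vocabulary.items():
        stop = start + len(levels)
        cat_betas[name] = list(coefs[start:stop])
        start = stop
    return numeric_betas, cat_betas


# Illustrative (made-up) inputs: 2 numeric features, 2 categorical features
numeric_cols = ["num_a", "num_b"]
cat_vocabulary = {"cat_x": ["A", "B", "C"], "cat_y": ["yes", "no"]}
coefs = [0.10, -0.20, 0.01, 0.02, 0.03, 0.5, -0.5]

numeric_betas, cat_betas = split_glm_coefficients(coefs, numeric_cols, cat_vocabulary)
print(numeric_betas)
print(cat_betas)
```

The explicit length check is a design choice of the sketch: without it, a mismatch between the encoded columns and the vocabularies would shift every subsequent block without raising an error.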

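The callback at the end of the epoch loop combines patience-based early stopping with a minimum epoch before the first checkpoint is written. `helper.Easy_ProgressTracker` lives in the external `helper` module and is not reproduced here, so the class below is a hypothetical stand-in that only exposes the attributes used in the loop (`progress`, `best_epoch`, `current_epoch`, `patience_over`); the real helper may differ in its details, for example in how it counts patience.

```python
class ProgressTrackerSketch:
    """Tracks the best (lowest) validation score and signals when patience is exhausted."""

    def __init__(self, patience):
        self.patience = patience
        self.best_score = float("inf")
        self.best_epoch = None
        self.current_epoch = None
        self.progress = False       # True if the latest epoch improved the best score
        self.patience_over = False  # True once `patience` epochs passed without improvement

    def __call__(self, current_epoch, current_score):
        self.current_epoch = current_epoch
        # The first call always counts as progress so that best_epoch is defined.
        self.progress = self.best_epoch is None or current_score < self.best_score
        if self.progress:
            self.best_score = current_score
            self.best_epoch = current_epoch
        self.patience_over = (current_epoch - self.best_epoch) >= self.patience


# Toy validation losses (made up): the improvement stops after epoch 2
val_losses = [0.2455, 0.2454, 0.2450, 0.2451, 0.2452, 0.2453]
tracker = ProgressTrackerSketch(patience=3)
earliest_epoch2save = 1
saved_epochs = []

for epoch, val_loss in enumerate(val_losses):
    tracker(current_epoch=epoch, current_score=val_loss)
    if tracker.progress and tracker.current_epoch >= earliest_epoch2save:
        saved_epochs.append(epoch)  # stands in for model.save_weights(...)
    if tracker.patience_over:
        break

print(tracker.best_epoch, saved_epochs, epoch)
```

Under this logic a checkpoint is only written for improvements at or after `earliest_epoch2save`, so if the best validation epoch occurred earlier, the weights on disk would not correspond to `best_epoch`.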


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 00-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 00/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0333  : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 0 / Train-Loss: 0.2414 / Val-Loss: 0.2455 / Test-Loss: 0.2424 / Time taken: 0:00:44 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0329  : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 1 / Train-Loss: 0.2413 / Val-Loss: 0.2454 / Test-Loss: 0.2422 / Time taken: 0:00:58 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0328  : [#####



Ensemble: 00/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 11 / Train-Loss: 0.2397 / Val-Loss: 0.2441 / Test-Loss: 0.2402 / Time taken: 0:03:37 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0323 : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 12 / Train-Loss: 0.2395 / Val-Loss: 0.2439 / Test-Loss: 0.2399 / Time taken: 0:04:00 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0336 : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 13 / Train-Loss: 0.2393 / Val-Loss: 0.2437 / Test-Loss: 0.2397 / Time taken: 0:04:15 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 00/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 00/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
146,CAFTT (run: 11),59,1248.218261,27133,0.237067,0.238216,0.065540,0.065780
147,CAFTT (run: 12),50,1096.780236,27133,0.237433,0.237963,0.065774,0.066086
148,CAFTT (run: 13),41,893.841639,27133,0.237690,0.238301,0.065890,0.066175
149,CAFTT (run: 14),57,1187.882093,27133,0.237231,0.237943,0.066632,0.066834


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 01-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 01/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0075  : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 0 / Train-Loss: 0.2413 / Val-Loss: 0.2461 / Test-Loss: 0.2424 / Time taken: 0:00:30 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 01/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0071  : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 1 / Train-Loss: 0.2413 / Val-Loss: 0.2464 / Test-Loss: 0.2426 / Time taken: 0:00:43 / ---- Currently Best Val-Epoch: 0 
Ensemble: 01/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0068  : [##############################] 99.



Ensemble: 01/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0068 : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 11 / Train-Loss: 0.2399 / Val-Loss: 0.2450 / Test-Loss: 0.2404 / Time taken: 0:03:10 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 01/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0069 : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 12 / Train-Loss: 0.2398 / Val-Loss: 0.2448 / Test-Loss: 0.2402 / Time taken: 0:03:25 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 01/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0068 : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 13 / Train-Loss: 0.2396 / Val-Loss: 0.2447 / Test-Loss: 0.2400 / Time taken: 0:03:38 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 01/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0068 : [##############################] 99.8%
Ensemble: 01/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
147,CAFTT (run: 12),50,1096.780236,27133,0.237433,0.237963,0.065774,0.066086
148,CAFTT (run: 13),41,893.841639,27133,0.237690,0.238301,0.065890,0.066175
149,CAFTT (run: 14),57,1187.882093,27133,0.237231,0.237943,0.066632,0.066834
150,LocalGLMftt (run: 0),37,886.447916,27430,0.238011,0.238797,0.068197,0.068621


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 02-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 02/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0334  : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 0 / Train-Loss: 0.2419 / Val-Loss: 0.2407 / Test-Loss: 0.2426 / Time taken: 0:00:31 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 02/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0331  : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 1 / Train-Loss: 0.2418 / Val-Loss: 0.2405 / Test-Loss: 0.2422 / Time taken: 0:00:45 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 02/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0328  : [#####



Ensemble: 02/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0332 : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 11 / Train-Loss: 0.2399 / Val-Loss: 0.2387 / Test-Loss: 0.2399 / Time taken: 0:03:20 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 02/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0332 : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 12 / Train-Loss: 0.2398 / Val-Loss: 0.2385 / Test-Loss: 0.2398 / Time taken: 0:03:35 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 02/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0333 : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 13 / Train-Loss: 0.2395 / Val-Loss: 0.2383 / Test-Loss: 0.2396 / Time taken: 0:03:50 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 02/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0333 : [##############################] 99.8%
Ensemble: 02/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
148,CAFTT (run: 13),41,893.841639,27133,0.237690,0.238301,0.065890,0.066175
149,CAFTT (run: 14),57,1187.882093,27133,0.237231,0.237943,0.066632,0.066834
150,LocalGLMftt (run: 0),37,886.447916,27430,0.238011,0.238797,0.068197,0.068621
151,LocalGLMftt (run: 1),42,868.665794,27430,0.237444,0.238771,0.067885,0.068379


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 03-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 03/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0336  : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 0 / Train-Loss: 0.2411 / Val-Loss: 0.2476 / Test-Loss: 0.2424 / Time taken: 0:00:31 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 03/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0333  : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 1 / Train-Loss: 0.2410 / Val-Loss: 0.2477 / Test-Loss: 0.2420 / Time taken: 0:00:54 / ---- Currently Best Val-Epoch: 0 
Ensemble: 03/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.033   : [##############################] 99.



Ensemble: 03/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0321 : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 11 / Train-Loss: 0.2390 / Val-Loss: 0.2462 / Test-Loss: 0.2401 / Time taken: 0:03:29 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 03/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0321 : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 12 / Train-Loss: 0.2389 / Val-Loss: 0.2461 / Test-Loss: 0.2399 / Time taken: 0:03:51 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 03/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0325 : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 13 / Train-Loss: 0.2387 / Val-Loss: 0.2459 / Test-Loss: 0.2398 / Time taken: 0:04:08 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 03/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0319 : [##############################] 99.8%
Ensemble: 03/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
149,CAFTT (run: 14),57,1187.882093,27133,0.237231,0.237943,0.066632,0.066834
150,LocalGLMftt (run: 0),37,886.447916,27430,0.238011,0.238797,0.068197,0.068621
151,LocalGLMftt (run: 1),42,868.665794,27430,0.237444,0.238771,0.067885,0.068379
152,LocalGLMftt (run: 2),34,842.569307,27430,0.237727,0.238600,0.067902,0.068265


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 04-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 04/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0336  : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 0 / Train-Loss: 0.2422 / Val-Loss: 0.2384 / Test-Loss: 0.2424 / Time taken: 0:00:31 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0332  : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 1 / Train-Loss: 0.2421 / Val-Loss: 0.2382 / Test-Loss: 0.2420 / Time taken: 0:00:46 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0332  : [#####



Ensemble: 04/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0328 : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 11 / Train-Loss: 0.2410 / Val-Loss: 0.2370 / Test-Loss: 0.2406 / Time taken: 0:03:29 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0329 : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 12 / Train-Loss: 0.2408 / Val-Loss: 0.2369 / Test-Loss: 0.2405 / Time taken: 0:03:51 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0326 : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 13 / Train-Loss: 0.2407 / Val-Loss: 0.2367 / Test-Loss: 0.2403 / Time taken: 0:04:08 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 04/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0328 : [##############################] 99.8%
Ensemble: 04/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
150,LocalGLMftt (run: 0),37,886.447916,27430,0.238011,0.238797,0.068197,0.068621
151,LocalGLMftt (run: 1),42,868.665794,27430,0.237444,0.238771,0.067885,0.068379
152,LocalGLMftt (run: 2),34,842.569307,27430,0.237727,0.238600,0.067902,0.068265
153,LocalGLMftt (run: 3),62,1377.378268,27430,0.237049,0.239091,0.067171,0.067539


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 05-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 05/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0337  : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 0 / Train-Loss: 0.2420 / Val-Loss: 0.2395 / Test-Loss: 0.2423 / Time taken: 0:00:45 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 05/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0332  : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 1 / Train-Loss: 0.2419 / Val-Loss: 0.2394 / Test-Loss: 0.2420 / Time taken: 0:00:59 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 05/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.033   : [#####



Ensemble: 05/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 11 / Train-Loss: 0.2404 / Val-Loss: 0.2388 / Test-Loss: 0.2405 / Time taken: 0:03:47 / ---- Currently Best Val-Epoch: 10 
Ensemble: 05/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0328 : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 12 / Train-Loss: 0.2403 / Val-Loss: 0.2388 / Test-Loss: 0.2406 / Time taken: 0:04:02 / ---- Currently Best Val-Epoch: 10 
Ensemble: 05/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 13 / Train-Loss: 0.2402 / Val-Loss: 0.2386 / Test-Loss: 0.2402 / Time taken: 0:04:16 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 05/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.033  : [##############################] 99.8%
Ensemble: 05/14 / Epoch: 14 / Train-Loss: 0.2401 / Val-Loss: 0.2386 / Test-Loss: 0.24

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
151,LocalGLMftt (run: 1),42,868.665794,27430,0.237444,0.238771,0.067885,0.068379
152,LocalGLMftt (run: 2),34,842.569307,27430,0.237727,0.238600,0.067902,0.068265
153,LocalGLMftt (run: 3),62,1377.378268,27430,0.237049,0.239091,0.067171,0.067539
154,LocalGLMftt (run: 4),52,1172.448311,27430,0.237175,0.238773,0.067236,0.067723


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 06-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 06/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0338  : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 0 / Train-Loss: 0.2418 / Val-Loss: 0.2416 / Test-Loss: 0.2422 / Time taken: 0:00:34 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 06/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0334  : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 1 / Train-Loss: 0.2417 / Val-Loss: 0.2414 / Test-Loss: 0.2420 / Time taken: 0:00:49 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 06/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0331  : [#####





Ensemble: 06/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0329 : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 11 / Train-Loss: 0.2402 / Val-Loss: 0.2398 / Test-Loss: 0.2403 / Time taken: 0:03:51 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 06/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 12 / Train-Loss: 0.2401 / Val-Loss: 0.2399 / Test-Loss: 0.2403 / Time taken: 0:04:08 / ---- Currently Best Val-Epoch: 11 
Ensemble: 06/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.033  : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 13 / Train-Loss: 0.2399 / Val-Loss: 0.2398 / Test-Loss: 0.2401 / Time taken: 0:04:24 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 06/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0324 : [##############################] 99.8%
Ensemble: 06/14 / Epoch: 14 / Train-Loss: 0.2398 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
152,LocalGLMftt (run: 2),34,842.569307,27430,0.237727,0.238600,0.067902,0.068265
153,LocalGLMftt (run: 3),62,1377.378268,27430,0.237049,0.239091,0.067171,0.067539
154,LocalGLMftt (run: 4),52,1172.448311,27430,0.237175,0.238773,0.067236,0.067723
155,LocalGLMftt (run: 5),58,1289.168909,27430,0.237039,0.238749,0.068158,0.068611


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 07-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 07/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0334  : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 0 / Train-Loss: 0.2414 / Val-Loss: 0.2456 / Test-Loss: 0.2424 / Time taken: 0:00:34 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 07/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0328  : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 1 / Train-Loss: 0.2412 / Val-Loss: 0.2456 / Test-Loss: 0.2422 / Time taken: 0:00:49 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 07/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0327  : [#####





Ensemble: 07/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0333 : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 12 / Train-Loss: 0.2393 / Val-Loss: 0.2446 / Test-Loss: 0.2399 / Time taken: 0:04:05 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 07/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.033  : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 13 / Train-Loss: 0.2393 / Val-Loss: 0.2445 / Test-Loss: 0.2398 / Time taken: 0:04:20 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 07/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0329 : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 14 / Train-Loss: 0.2391 / Val-Loss: 0.2445 / Test-Loss: 0.2398 / Time taken: 0:04:34 / ---- Currently Best Val-Epoch: 14 <------- Best VAL Epoch so far
Ensemble: 07/14 / Epoch: 15 / Batch: 536 / Train-Loss (Batch): 0.033  : [##############################] 99.8%
Ensemble: 07/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
153,LocalGLMftt (run: 3),62,1377.378268,27430,0.237049,0.239091,0.067171,0.067539
154,LocalGLMftt (run: 4),52,1172.448311,27430,0.237175,0.238773,0.067236,0.067723
155,LocalGLMftt (run: 5),58,1289.168909,27430,0.237039,0.238749,0.068158,0.068611
156,LocalGLMftt (run: 6),81,1688.195204,27430,0.236139,0.238735,0.066555,0.066987


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 08-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 08/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.034   : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 0 / Train-Loss: 0.2409 / Val-Loss: 0.2500 / Test-Loss: 0.2423 / Time taken: 0:00:31 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0334  : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 1 / Train-Loss: 0.2408 / Val-Loss: 0.2500 / Test-Loss: 0.2420 / Time taken: 0:00:45 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0333  : [#####





Ensemble: 08/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0325 : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 11 / Train-Loss: 0.2389 / Val-Loss: 0.2482 / Test-Loss: 0.2398 / Time taken: 0:03:19 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0325 : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 12 / Train-Loss: 0.2388 / Val-Loss: 0.2481 / Test-Loss: 0.2398 / Time taken: 0:03:33 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0333 : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 13 / Train-Loss: 0.2386 / Val-Loss: 0.2478 / Test-Loss: 0.2395 / Time taken: 0:03:48 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 08/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 08/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
154,LocalGLMftt (run: 4),52,1172.448311,27430,0.237175,0.238773,0.067236,0.067723
155,LocalGLMftt (run: 5),58,1289.168909,27430,0.237039,0.238749,0.068158,0.068611
156,LocalGLMftt (run: 6),81,1688.195204,27430,0.236139,0.238735,0.066555,0.066987
157,LocalGLMftt (run: 7),39,938.974456,27430,0.237802,0.238847,0.068716,0.069133


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 09-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 09/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0075  : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 0 / Train-Loss: 0.2413 / Val-Loss: 0.2462 / Test-Loss: 0.2424 / Time taken: 0:00:30 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 09/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.007   : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 1 / Train-Loss: 0.2412 / Val-Loss: 0.2461 / Test-Loss: 0.2422 / Time taken: 0:00:44 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 09/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0069  : [#####





Ensemble: 09/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0068 : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 11 / Train-Loss: 0.2395 / Val-Loss: 0.2448 / Test-Loss: 0.2402 / Time taken: 0:03:23 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 09/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0069 : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 12 / Train-Loss: 0.2393 / Val-Loss: 0.2446 / Test-Loss: 0.2399 / Time taken: 0:03:38 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 09/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0068 : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 13 / Train-Loss: 0.2391 / Val-Loss: 0.2447 / Test-Loss: 0.2399 / Time taken: 0:03:53 / ---- Currently Best Val-Epoch: 12 
Ensemble: 09/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0068 : [##############################] 99.8%
Ensemble: 09/14 / Epoch: 14 / Train-Loss: 0.2390 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
155,LocalGLMftt (run: 5),58,1289.168909,27430,0.237039,0.238749,0.068158,0.068611
156,LocalGLMftt (run: 6),81,1688.195204,27430,0.236139,0.238735,0.066555,0.066987
157,LocalGLMftt (run: 7),39,938.974456,27430,0.237802,0.238847,0.068716,0.069133
158,LocalGLMftt (run: 8),42,951.956478,27430,0.237784,0.238886,0.068275,0.068710


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 10-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 10/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0334  : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 0 / Train-Loss: 0.2417 / Val-Loss: 0.2423 / Test-Loss: 0.2423 / Time taken: 0:00:31 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 10/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.033   : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 1 / Train-Loss: 0.2416 / Val-Loss: 0.2421 / Test-Loss: 0.2421 / Time taken: 0:00:46 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 10/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0328  : [#####





Ensemble: 10/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0329 : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 12 / Train-Loss: 0.2401 / Val-Loss: 0.2403 / Test-Loss: 0.2402 / Time taken: 0:03:36 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 10/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0332 : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 13 / Train-Loss: 0.2399 / Val-Loss: 0.2401 / Test-Loss: 0.2400 / Time taken: 0:03:51 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 10/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0329 : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 14 / Train-Loss: 0.2397 / Val-Loss: 0.2402 / Test-Loss: 0.2400 / Time taken: 0:04:06 / ---- Currently Best Val-Epoch: 13 
Ensemble: 10/14 / Epoch: 15 / Batch: 536 / Train-Loss (Batch): 0.0328 : [##############################] 99.8%
Ensemble: 10/14 / Epoch: 15 / Train-Loss: 0.2395 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
156,LocalGLMftt (run: 6),81,1688.195204,27430,0.236139,0.238735,0.066555,0.066987
157,LocalGLMftt (run: 7),39,938.974456,27430,0.237802,0.238847,0.068716,0.069133
158,LocalGLMftt (run: 8),42,951.956478,27430,0.237784,0.238886,0.068275,0.068710
159,LocalGLMftt (run: 9),72,1377.468203,27430,0.236593,0.239080,0.066974,0.067353


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 11-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 11/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0336  : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 0 / Train-Loss: 0.2427 / Val-Loss: 0.2339 / Test-Loss: 0.2424 / Time taken: 0:00:45 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0332  : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 1 / Train-Loss: 0.2426 / Val-Loss: 0.2336 / Test-Loss: 0.2421 / Time taken: 0:01:00 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0332  : [#####





Ensemble: 11/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.033  : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 11 / Train-Loss: 0.2405 / Val-Loss: 0.2317 / Test-Loss: 0.2398 / Time taken: 0:03:28 / ---- Currently Best Val-Epoch: 11 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 12 / Train-Loss: 0.2403 / Val-Loss: 0.2315 / Test-Loss: 0.2396 / Time taken: 0:03:43 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.033  : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 13 / Train-Loss: 0.2403 / Val-Loss: 0.2312 / Test-Loss: 0.2395 / Time taken: 0:03:59 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 11/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0331 : [##############################] 99.8%
Ensemble: 11/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
157,LocalGLMftt (run: 7),39,938.974456,27430,0.237802,0.238847,0.068716,0.069133
158,LocalGLMftt (run: 8),42,951.956478,27430,0.237784,0.238886,0.068275,0.068710
159,LocalGLMftt (run: 9),72,1377.468203,27430,0.236593,0.239080,0.066974,0.067353
160,LocalGLMftt (run: 10),40,923.518332,27430,0.237557,0.238749,0.068609,0.068993


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 12-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 12/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0335  : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 0 / Train-Loss: 0.2417 / Val-Loss: 0.2423 / Test-Loss: 0.2424 / Time taken: 0:00:31 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 12/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0331  : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 1 / Train-Loss: 0.2415 / Val-Loss: 0.2422 / Test-Loss: 0.2421 / Time taken: 0:00:47 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 12/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.033   : [#####





Ensemble: 12/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0321 : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 12 / Train-Loss: 0.2395 / Val-Loss: 0.2406 / Test-Loss: 0.2401 / Time taken: 0:03:59 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 12/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0326 : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 13 / Train-Loss: 0.2394 / Val-Loss: 0.2404 / Test-Loss: 0.2398 / Time taken: 0:04:22 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 12/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 14 / Train-Loss: 0.2392 / Val-Loss: 0.2404 / Test-Loss: 0.2397 / Time taken: 0:04:44 / ---- Currently Best Val-Epoch: 14 <------- Best VAL Epoch so far
Ensemble: 12/14 / Epoch: 15 / Batch: 536 / Train-Loss (Batch): 0.0322 : [##############################] 99.8%
Ensemble: 12/14 / Epoch: 

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
158,LocalGLMftt (run: 8),42,951.956478,27430,0.237784,0.238886,0.068275,0.068710
159,LocalGLMftt (run: 9),72,1377.468203,27430,0.236593,0.239080,0.066974,0.067353
160,LocalGLMftt (run: 10),40,923.518332,27430,0.237557,0.238749,0.068609,0.068993
161,LocalGLMftt (run: 11),40,986.134145,27430,0.237617,0.238476,0.068948,0.069450


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 13-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 13/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.034   : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 0 / Train-Loss: 0.2419 / Val-Loss: 0.2408 / Test-Loss: 0.2422 / Time taken: 0:00:43 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0333  : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 1 / Train-Loss: 0.2419 / Val-Loss: 0.2408 / Test-Loss: 0.2423 / Time taken: 0:00:59 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0332  : [#####





Ensemble: 13/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 11 / Train-Loss: 0.2402 / Val-Loss: 0.2398 / Test-Loss: 0.2405 / Time taken: 0:04:02 / ---- Currently Best Val-Epoch: 10 
Ensemble: 13/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 12 / Train-Loss: 0.2401 / Val-Loss: 0.2396 / Test-Loss: 0.2402 / Time taken: 0:04:18 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.0328 : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 13 / Train-Loss: 0.2400 / Val-Loss: 0.2396 / Test-Loss: 0.2401 / Time taken: 0:04:33 / ---- Currently Best Val-Epoch: 13 <------- Best VAL Epoch so far
Ensemble: 13/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0326 : [##############################] 99.8%
Ensemble: 13/14 / Epoch: 14 / Train-Loss: 0.2398 / Val-

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
159,LocalGLMftt (run: 9),72,1377.468203,27430,0.236593,0.239080,0.066974,0.067353
160,LocalGLMftt (run: 10),40,923.518332,27430,0.237557,0.238749,0.068609,0.068993
161,LocalGLMftt (run: 11),40,986.134145,27430,0.237617,0.238476,0.068948,0.069450
162,LocalGLMftt (run: 12),56,1313.066551,27430,0.237276,0.238862,0.067985,0.068371


-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
-------We are at Model: 14-----------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
Ensemble: 14/14 / Epoch: 0 / Batch: 536 / Train-Loss (Batch): 0.0341  : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 0 / Train-Loss: 0.2420 / Val-Loss: 0.2399 / Test-Loss: 0.2423 / Time taken: 0:00:34 / ---- Currently Best Val-Epoch: 0 <------- Best VAL Epoch so far
Ensemble: 14/14 / Epoch: 1 / Batch: 536 / Train-Loss (Batch): 0.0337  : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 1 / Train-Loss: 0.2419 / Val-Loss: 0.2396 / Test-Loss: 0.2421 / Time taken: 0:00:56 / ---- Currently Best Val-Epoch: 1 <------- Best VAL Epoch so far
Ensemble: 14/14 / Epoch: 2 / Batch: 536 / Train-Loss (Batch): 0.0335  : [#####





Ensemble: 14/14 / Epoch: 11 / Batch: 536 / Train-Loss (Batch): 0.0327 : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 11 / Train-Loss: 0.2400 / Val-Loss: 0.2380 / Test-Loss: 0.2404 / Time taken: 0:03:52 / ---- Currently Best Val-Epoch: 10 
Ensemble: 14/14 / Epoch: 12 / Batch: 536 / Train-Loss (Batch): 0.0329 : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 12 / Train-Loss: 0.2398 / Val-Loss: 0.2375 / Test-Loss: 0.2399 / Time taken: 0:04:14 / ---- Currently Best Val-Epoch: 12 <------- Best VAL Epoch so far
Ensemble: 14/14 / Epoch: 13 / Batch: 536 / Train-Loss (Batch): 0.033  : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 13 / Train-Loss: 0.2397 / Val-Loss: 0.2378 / Test-Loss: 0.2402 / Time taken: 0:04:31 / ---- Currently Best Val-Epoch: 12 
Ensemble: 14/14 / Epoch: 14 / Batch: 536 / Train-Loss (Batch): 0.0329 : [##############################] 99.8%
Ensemble: 14/14 / Epoch: 14 / Train-Loss: 0.2395 / Val-Loss: 0.2373 / Test-Loss: 0.23

Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model (run: 0),0,0.025613,1,0.252132,0.254454,0.073631,0.073631
1,homogeneous model (run: 1),0,0.023689,1,0.252132,0.254454,0.073631,0.073631
2,homogeneous model (run: 2),0,0.026122,1,0.252132,0.254454,0.073631,0.073631
3,homogeneous model (run: 3),0,0.023556,1,0.252132,0.254454,0.073631,0.073631
4,homogeneous model (run: 4),0,0.024540,1,0.252132,0.254454,0.073631,0.073631
...,...,...,...,...,...,...,...,...
160,LocalGLMftt (run: 10),40,923.518332,27430,0.237557,0.238749,0.068609,0.068993
161,LocalGLMftt (run: 11),40,986.134145,27430,0.237617,0.238476,0.068948,0.069450
162,LocalGLMftt (run: 12),56,1313.066551,27430,0.237276,0.238862,0.067985,0.068371
163,LocalGLMftt (run: 13),86,1851.208139,27430,0.236052,0.238693,0.066132,0.066383


In [None]:
# save the results:
# with open(f'{storage_path}/Data/df_results.pickle', 'wb') as handle:
#     pickle.dump(df_results, handle, protocol=pickle.HIGHEST_PROTOCOL)

# load the results:
with open(f'{storage_path}/Data/df_results.pickle', 'rb') as handle:
    df_results = pickle.load(handle)

In [45]:
# models to summarize, in display order:
summary_models = ["homogeneous model","GLM1","GLM2","GLM3","FFN_OHE","FNN_CAT_EMB","CANN","LocalGLMnet","FT_transformer","CAFTT","LocalGLMftt"]
print("Results Average:")
display(calc_avg_df(summary_models))
print("Results Standard-Deviation:")
display(calc_std_df(summary_models))

Results Average:


Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model,0.0,0.054847,1.0,0.252132,0.254454,0.073631,0.073631
1,GLM1,0.0,2.220158,49.0,0.241015,0.241463,0.073631,0.0739
2,GLM2,0.0,2.752693,48.0,0.240911,0.241125,0.073631,0.073981
3,GLM3,0.0,1.900497,50.0,0.240844,0.241022,0.073631,0.074048
4,FFN_OHE,42.2,37.80556,1306.0,0.237535,0.238652,0.073906,0.07431
5,FNN_CAT_EMB,72.933333,58.728892,792.0,0.237682,0.238267,0.073774,0.074238
6,CANN,90.333333,68.55912,792.0,0.23742,0.238102,0.074019,0.074438
7,LocalGLMnet,25.333333,29.720892,1737.0,0.237095,0.239211,0.073825,0.074267
8,FT_transformer,78.866667,1569.86041,27133.0,0.237803,0.239389,0.06114,0.06129
9,CAFTT,57.333333,1170.160723,27133.0,0.237146,0.238072,0.065975,0.066235


Results Standard-Deviation:


Unnamed: 0,model,epochs,run_time,nr_parameters,loss_train,loss_test,pred_avg_freq_train,pred_avg_freq_test
0,homogeneous model,0.0,0.002512,0.0,0.0,5.74595e-17,2.872975e-17,2.872975e-17
1,GLM1,0.0,0.473819,0.0,5.74595e-17,5.74595e-17,1.436488e-17,0.0
2,GLM2,0.0,0.970156,0.0,5.74595e-17,2.872975e-17,0.0,1.436488e-17
3,GLM3,0.0,0.408252,0.0,2.872975e-17,2.872975e-17,0.0,1.436488e-17
4,FFN_OHE,14.663853,8.624492,0.0,0.0003255191,0.0001570462,0.001223993,0.001209107
5,FNN_CAT_EMB,21.661245,13.907265,0.0,0.0001590947,0.0001514444,0.001071399,0.001088943
6,CANN,53.898935,33.162059,0.0,0.0006076588,0.0003253586,0.001111365,0.001103431
7,LocalGLMnet,7.622023,5.097405,0.0,0.000334063,0.0002176521,0.0008787938,0.0009078453
8,FT_transformer,16.638452,302.316624,0.0,0.000898291,0.0005281384,0.001459836,0.001458034
9,CAFTT,14.185841,220.449926,0.0,0.0004739992,0.0001743234,0.0004993066,0.0004707813
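
The two tables above condense the individual runs of each model into one row per model. `calc_avg_df` and `calc_std_df` are defined elsewhere in this notebook, so the following is only a minimal sketch of how such a per-model aggregation could look. It assumes that the run index is encoded in the `model` column as a ` (run: i)` suffix, as in the progress tables above. The function name `summarize_runs` and the toy numbers are hypothetical and not taken from the paper.

```python
import pandas as pd


def summarize_runs(df_runs, model_names, how="mean"):
    """Aggregate per-run rows such as 'GLM1 (run: 3)' into one row per model."""
    # strip the ' (run: i)' suffix to recover the base model name
    base_name = df_runs["model"].str.replace(r"\s*\(run: \d+\)$", "", regex=True)
    numeric_cols = df_runs.drop(columns=["model"])
    grouped = numeric_cols.groupby(base_name.to_numpy()).agg(how)
    # keep only the requested models, in the requested order
    order = [name for name in model_names if name in grouped.index]
    summary = grouped.loc[order]
    summary.index.name = "model"
    return summary.reset_index()


# toy data, made up for illustration only
toy_runs = pd.DataFrame({
    "model": ["GLM1 (run: 0)", "GLM1 (run: 1)", "CANN (run: 0)", "CANN (run: 1)"],
    "epochs": [0, 0, 80, 100],
    "loss_test": [0.2410, 0.2420, 0.2380, 0.2384],
})

avg_df = summarize_runs(toy_runs, ["GLM1", "CANN"], how="mean")
std_df = summarize_runs(toy_runs, ["GLM1", "CANN"], how="std")
```

Note that pandas computes the sample standard deviation (`ddof=1`) by default; whether the notebook's own helper uses the sample or the population version is not visible here.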


# 5. Ensemble Models

## 5.1 Ensembles FFN OHE:
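
The ensembles in this section average the predictions of several independently seeded networks and evaluate the averaged prediction with the Poisson deviance loss. As a minimal sketch of that idea on synthetic data: the helper names `mean_poisson_deviance` and `ensemble_mean` are hypothetical, and the notebook's own `poisson_deviance_loss` (defined elsewhere) may be scaled or weighted differently.

```python
import numpy as np


def mean_poisson_deviance(y_true, y_pred):
    """Average Poisson unit deviance 2 * (mu - y + y * log(y / mu))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # the term y * log(y / mu) is defined as 0 where y == 0
    safe_ratio = np.where(y_true > 0, y_true / y_pred, 1.0)
    unit_deviance = 2.0 * (y_pred - y_true + y_true * np.log(safe_ratio))
    return float(np.mean(unit_deviance))


def ensemble_mean(member_predictions):
    """Ensemble prediction as the element-wise mean over the member models."""
    return np.mean(np.stack(member_predictions), axis=0)


# synthetic example: five noisy "models" around a known expected claim count
rng = np.random.default_rng(0)
true_mean = rng.uniform(0.02, 0.2, size=1000)
claims = rng.poisson(true_mean)
members = [true_mean * rng.lognormal(0.0, 0.3, size=1000) for _ in range(5)]

ensemble = ensemble_mean(members)
member_losses = [mean_poisson_deviance(claims, m) for m in members]
ensemble_loss = mean_poisson_deviance(claims, ensemble)
```

Because the Poisson unit deviance is convex in the prediction, the deviance of the averaged prediction can never exceed the average deviance of the member models (Jensen's inequality), which is one reason why averaging over seeds tends to help.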

In [None]:
# Create the dataframes needed for evaluation:
data_nn_ohe_learn, y_true_learn = create_ffn_ohe_data(bool_in_learn)
data_nn_ohe_test, y_true_test = create_ffn_ohe_data(bool_in_test)

for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

        print(f"Model: {run_index}")
        # Define FFN Model:
        # ----------------------
        # Note: we use the Keras functional API here instead of model subclassing
        # to keep the code readable and easy to follow
        # (the transformer-based models use model subclasses instead).
        def Create_Poisson_FFN_OHE(input_dim=42,mean_model_results=1):
            # set random seeds
            set_random_seeds(int(random_seeds[run_index]))
            # Build the network
            Input_Matrix_OHE = tf.keras.layers.Input(shape=(input_dim,), dtype='float32', name='Input_Matrix')
            Input_Exposure = tf.keras.layers.Input(shape=(1,), dtype='float32', name='Input_Exposure')
            hidden1 = tf.keras.layers.Dense(units=20, activation=tanh, name='hidden1')(Input_Matrix_OHE)
            hidden2 = tf.keras.layers.Dense(units=15, activation=tanh, name='hidden2')(hidden1)
            hidden3 = tf.keras.layers.Dense(units=10, activation=tanh, name='hidden3')(hidden2)
            Result_FFN1 = tf.keras.layers.Dense(units=1, activation='exponential', name='Result_FFN1',
                            weights=[np.zeros((10, 1)), np.array([np.log(mean_model_results)])],
                            trainable=True)(hidden3)
            Response = tf.keras.layers.Multiply(name='Result')([Result_FFN1, Input_Exposure])
            # Define and Return the model
            return tf.keras.models.Model(inputs=[Input_Matrix_OHE, Input_Exposure], outputs=[Response], name='Poisson_FFN_OHE')

        # create the model:
        # ----------------------
        FFN_OHE = Create_Poisson_FFN_OHE(input_dim=40,mean_model_results=constant_model)

        # load the saved model weights:
        # ----------------------
        FFN_OHE.load_weights(f'{storage_path}/saved_models/Poisson_FFN_OHE_{run_index}.weights.h5')


        # predict with the models:
        # ----------------------
        y_pred["train"][f"FFN_OHE_{run_index}"] = np.array([x for [x] in FFN_OHE.predict(data_nn_ohe_learn, verbose=0,
                                                                            batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
        y_pred["test"][f"FFN_OHE_{run_index}"] = np.array([x for [x] in FFN_OHE.predict(data_nn_ohe_test, verbose=0,
                                                                            batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])


    # evaluate the models:
    # ----------------------
    # store the results in the results class:
    y_pred["train"][f"Ensemble_FFN_OHE_{index}"] = np.mean([y_pred["train"][f"FFN_OHE_{i}"] for i in ensemble_range], axis=0)
    y_pred["test"][f"Ensemble_FFN_OHE_{index}"] = np.mean([y_pred["test"][f"FFN_OHE_{i}"] for i in ensemble_range], axis=0)

    Ensemble_FFN_OHE_results = Results(model=f"Ensemble_FFN_OHE (run: {index})",
                                epochs=0,
                                run_time=0,
                                nr_parameters=0,
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"][f"Ensemble_FFN_OHE_{index}"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"][f"Ensemble_FFN_OHE_{index}"]),
                                pred_avg_freq_train=y_pred["train"][f"Ensemble_FFN_OHE_{index}"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"][f"Ensemble_FFN_OHE_{index}"].sum()/exposure["test"].sum())

    # store the results in the result dataframe:
    store_results_in_df(Ensemble_FFN_OHE_results)

# notebook-level variables are not freed automatically, so we delete the data that is no longer needed:
del data_nn_ohe_learn, data_nn_ohe_test

range(0, 5)
Model: 0
Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14
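
Each ensemble prediction above is the plain arithmetic mean of five individual model predictions, taken on the scale of expected claim counts. The sketch below illustrates this step on toy numbers. The function `poisson_deviance` is a stand-in written for this illustration (the notebook's own `poisson_deviance_loss` is defined earlier and may be scaled differently), and all arrays are made up.

```python
import numpy as np

def poisson_deviance(y_true, y_pred):
    """Mean Poisson deviance; the term y*log(y/mu) is taken as 0 when y == 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # avoid log(0): where y == 0 the ratio is replaced by 1, so the log term vanishes
    ratio = np.where(y_true > 0, y_true / y_pred, 1.0)
    return float(np.mean(2.0 * (y_pred - y_true + y_true * np.log(ratio))))

# toy data: 4 policies, 3 hypothetical models (not the notebook's data)
y = np.array([0.0, 1.0, 0.0, 2.0])
member_preds = [
    np.array([0.10, 0.60, 0.05, 1.20]),
    np.array([0.20, 0.90, 0.15, 1.80]),
    np.array([0.15, 0.75, 0.10, 1.50]),
]

# ensemble = element-wise mean over the models (axis=0), as in the cell above
ensemble_pred = np.mean(member_preds, axis=0)

member_losses = [poisson_deviance(y, p) for p in member_preds]
ensemble_loss = poisson_deviance(y, ensemble_pred)
```

Because the Poisson deviance is convex in the prediction, the deviance of the averaged prediction cannot exceed the average deviance of the individual models; how large the gain is on real data depends on how much the models disagree.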


## 5.2 Ensembles FFN CAT EMB:

In [None]:
# Create the dataframes needed for evaluation:
data_nn_emb_learn, y_true_learn = create_ffn_cat_emb_data(bool_in_learn)
data_nn_emb_test, y_true_test = create_ffn_cat_emb_data(bool_in_test)


for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

        # Define FNN with Cat. Embedding Model:
        # ----------------------
        # Note: we use the Keras functional API here rather than model subclassing
        # because it keeps the code easier to read
        # (the transformer-based models use model subclasses).
        print(f"Model: {run_index}")
        def Create_Poisson_FNN_CAT_EMB(input_nr_dim=7,emb_dim=1,mean_model_results=1):
            # set random seeds
            set_random_seeds(int(random_seeds[run_index]))

            Input_Matrix_Num = tf.keras.layers.Input(shape=(input_nr_dim,), dtype='float32', name='Input_Matrix_Num')
            Input_VehBrand = tf.keras.layers.Input(shape=(1,), name='Input_VehBrand')
            Input_Region = tf.keras.layers.Input(shape=(1,), name='Input_Region')
            Input_Exposure = tf.keras.layers.Input(shape=(1,), dtype='float32', name='Input_Exposure')

            All_Inputs = [Input_Matrix_Num,Input_VehBrand,Input_Region,Input_Exposure]

            Emb_VehBrand = tf.keras.layers.Embedding(input_dim=len(cat_encoder_all["VehBrand"].keys()),output_dim=emb_dim,
                                    embeddings_initializer=keras.initializers.RandomNormal(mean=1.0, stddev=0.05 ),
                                    name="Embedding_VehBrand")(Input_VehBrand)
            Emb_Region = tf.keras.layers.Embedding(input_dim=len(cat_encoder_all["Region"].keys()),output_dim=emb_dim,
                                    embeddings_initializer=keras.initializers.RandomNormal(mean=1.0, stddev=0.05),
                                name="Embedding_Region")(Input_Region)

            Reshaped_Emb_VehBrand = tf.keras.layers.Reshape(target_shape=(emb_dim,),name="Reshaped_Embedding_VehBrand")(Emb_VehBrand)
            Reshaped_Emb_Region = tf.keras.layers.Reshape(target_shape=(emb_dim,),name="Reshaped_Embedding_Region")(Emb_Region)

            concatenation_layer = tf.keras.layers.Concatenate(name="concatenation_layer")([Input_Matrix_Num,Reshaped_Emb_VehBrand,Reshaped_Emb_Region])

            # Build the network
            hidden1 = tf.keras.layers.Dense(units=20, activation=tanh, name='hidden1')(concatenation_layer)
            hidden2 = tf.keras.layers.Dense(units=15, activation=tanh, name='hidden2')(hidden1)
            hidden3 = tf.keras.layers.Dense(units=10, activation=tanh, name='hidden3')(hidden2)
            Result_FFN1 = tf.keras.layers.Dense(units=1, activation='exponential', name='Result_FFN1',
                            weights=[np.zeros((10, 1)), np.array([np.log(mean_model_results)])],
                            trainable=True)(hidden3)

            Response = tf.keras.layers.Multiply(name='Result')([Result_FFN1, Input_Exposure])

            # Define the model
            return tf.keras.models.Model(inputs=All_Inputs, outputs=[Response], name='Poisson_CAT_EMB')

        # create the model:
        # ----------------------
        emb_dim=2
        FNN_CAT_EMB = Create_Poisson_FNN_CAT_EMB(input_nr_dim=7,emb_dim=emb_dim,mean_model_results=constant_model)

        # load the saved model weights:
        # ----------------------
        FNN_CAT_EMB.load_weights(f'{storage_path}/saved_models/Poisson_FNN_CAT_EMB_{run_index}.weights.h5')

        # predict with the model:
        # ----------------------
        y_pred["train"][f"FNN_CAT_EMB_{run_index}"] = np.array([x for [x] in FNN_CAT_EMB.predict(data_nn_emb_learn, verbose=0,
                                                                            batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
        y_pred["test"][f"FNN_CAT_EMB_{run_index}"] = np.array([x for [x] in FNN_CAT_EMB.predict(data_nn_emb_test, verbose=0,
                                                                        batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])


    # evaluate the models:
    # ----------------------
    # store the results in the results class:
    y_pred["train"][f"Ensemble_FNN_CAT_EMB_{index}"] = np.mean([y_pred["train"][f"FNN_CAT_EMB_{i}"] for i in ensemble_range], axis=0)
    y_pred["test"][f"Ensemble_FNN_CAT_EMB_{index}"] = np.mean([y_pred["test"][f"FNN_CAT_EMB_{i}"] for i in ensemble_range], axis=0)

    Ensemble_FNN_CAT_EMB_results = Results(model=f"Ensemble_FNN_CAT_EMB (run: {index})",
                                epochs=0,
                                run_time=0,
                                nr_parameters=0,
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"][f"Ensemble_FNN_CAT_EMB_{index}"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"][f"Ensemble_FNN_CAT_EMB_{index}"]),
                                pred_avg_freq_train=y_pred["train"][f"Ensemble_FNN_CAT_EMB_{index}"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"][f"Ensemble_FNN_CAT_EMB_{index}"].sum()/exposure["test"].sum())

    # store the results in the result dataframe:
    store_results_in_df(Ensemble_FNN_CAT_EMB_results)

# notebook-level variables are not freed automatically, so we delete the data that is no longer needed:
del data_nn_emb_learn, data_nn_emb_test

range(0, 5)
Model: 0
Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14
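
In the embedding model above, each level of `VehBrand` and `Region` is mapped to a trainable vector of length `emb_dim`, and these vectors are concatenated with the seven numeric inputs before the first dense layer. A minimal NumPy sketch of that lookup-and-concatenate step follows. The table sizes (11 and 22 levels) and all values are illustrative assumptions and are not read from `cat_encoder_all`.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim = 2
n_brand_levels, n_region_levels = 11, 22   # assumed sizes, for illustration only

# embedding tables: one row per category level (initialised around 1.0 as in the cell above)
emb_brand = rng.normal(loc=1.0, scale=0.05, size=(n_brand_levels, emb_dim))
emb_region = rng.normal(loc=1.0, scale=0.05, size=(n_region_levels, emb_dim))

# a batch of 3 policies: 7 numeric features plus integer-encoded categories
x_num = rng.normal(size=(3, 7))
brand_idx = np.array([0, 4, 10])
region_idx = np.array([21, 3, 3])

# embedding lookup is plain row indexing; concatenation gives the dense-layer input
x_concat = np.concatenate([x_num, emb_brand[brand_idx], emb_region[region_idx]], axis=1)
```

With `emb_dim = 2` the first dense layer receives 7 + 2 + 2 = 11 inputs, compared with the 40 one-hot encoded columns used in Section 5.1.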


## 5.3 Ensembles CANN:

In [None]:
# create the new exposure times GLM3_pred column for CANN models.
df_freq_prep_nn["Exposure_x_GLM3_pred"] = list(poisson_glm3.predict(X_glm3)*df_freq_prep_nn["Exposure"])

# Create the dataframes needed for evaluation:
data_nn_emb_learn, y_true_learn = create_ffn_cat_emb_data(bool_in_learn, exposure_name = "Exposure_x_GLM3_pred")
data_nn_emb_test, y_true_test = create_ffn_cat_emb_data(bool_in_test, exposure_name = "Exposure_x_GLM3_pred")

for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

        # Define the CANN model (FFN with categorical embeddings, GLM3 prediction in the offset):
        # ----------------------
        # Note: we use the Keras functional API here rather than model subclassing
        # because it keeps the code easier to read
        # (the transformer-based models use model subclasses).
        print(f"Model: {run_index}")
        def Create_Poisson_FNN_CAT_EMB(input_nr_dim=7,emb_dim=1,mean_model_results=1):
            # set random seeds
            set_random_seeds(int(random_seeds[run_index]))

            Input_Matrix_Num = tf.keras.layers.Input(shape=(input_nr_dim,), dtype='float32', name='Input_Matrix_Num')
            Input_VehBrand = tf.keras.layers.Input(shape=(1,), name='Input_VehBrand')
            Input_Region = tf.keras.layers.Input(shape=(1,), name='Input_Region')
            Input_Exposure = tf.keras.layers.Input(shape=(1,), dtype='float32', name='Input_Exposure')

            All_Inputs = [Input_Matrix_Num,Input_VehBrand,Input_Region,Input_Exposure]

            Emb_VehBrand = tf.keras.layers.Embedding(input_dim=len(cat_encoder_all["VehBrand"].keys()),output_dim=emb_dim,
                                    embeddings_initializer=keras.initializers.RandomNormal(mean=1.0, stddev=0.05 ),
                                    name="Embedding_VehBrand")(Input_VehBrand)
            Emb_Region = tf.keras.layers.Embedding(input_dim=len(cat_encoder_all["Region"].keys()),output_dim=emb_dim,
                                    embeddings_initializer=keras.initializers.RandomNormal(mean=1.0, stddev=0.05),
                                name="Embedding_Region")(Input_Region)

            Reshaped_Emb_VehBrand = tf.keras.layers.Reshape(target_shape=(emb_dim,),name="Reshaped_Embedding_VehBrand")(Emb_VehBrand)
            Reshaped_Emb_Region = tf.keras.layers.Reshape(target_shape=(emb_dim,),name="Reshaped_Embedding_Region")(Emb_Region)

            concatenation_layer = tf.keras.layers.Concatenate(name="concatenation_layer")([Input_Matrix_Num,Reshaped_Emb_VehBrand,Reshaped_Emb_Region])

            # Build the network
            hidden1 = tf.keras.layers.Dense(units=20, activation=tanh, name='hidden1')(concatenation_layer)
            hidden2 = tf.keras.layers.Dense(units=15, activation=tanh, name='hidden2')(hidden1)
            hidden3 = tf.keras.layers.Dense(units=10, activation=tanh, name='hidden3')(hidden2)
            Result_FFN1 = tf.keras.layers.Dense(units=1, activation='exponential', name='Result_FFN1',
                            weights=[np.zeros((10, 1)), np.array([0])],
                            trainable=True)(hidden3)

            Response = tf.keras.layers.Multiply(name='Result')([Result_FFN1, Input_Exposure])

            # Define the model
            return tf.keras.models.Model(inputs=All_Inputs, outputs=[Response], name='Poisson_CAT_EMB')

        # create the model:
        # ----------------------
        emb_dim=2
        CANN = Create_Poisson_FNN_CAT_EMB(input_nr_dim=7,emb_dim=emb_dim,mean_model_results=constant_model)

        # load the saved model weights:
        # ----------------------
        CANN.load_weights(f'{storage_path}/saved_models/Poisson_CANN_{run_index}.weights.h5')


        # predict with the model:
        # ----------------------
        y_pred["train"][f"CANN_{run_index}"] = np.array([x for [x] in CANN.predict(data_nn_emb_learn, verbose=0,
                                                                            batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
        y_pred["test"][f"CANN_{run_index}"] = np.array([x for [x] in CANN.predict(data_nn_emb_test, verbose=0,
                                                                        batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])


    # evaluate the models:
    # ----------------------
    # store the results in the results class:
    y_pred["train"][f"Ensemble_CANN_{index}"] = np.mean([y_pred["train"][f"CANN_{i}"] for i in ensemble_range], axis=0)
    y_pred["test"][f"Ensemble_CANN_{index}"] = np.mean([y_pred["test"][f"CANN_{i}"] for i in ensemble_range], axis=0)

    Ensemble_CANN_results = Results(model=f"Ensemble_CANN (run: {index})",
                                epochs=0,
                                run_time=0,
                                nr_parameters=0,
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"][f"Ensemble_CANN_{index}"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"][f"Ensemble_CANN_{index}"]),
                                pred_avg_freq_train=y_pred["train"][f"Ensemble_CANN_{index}"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"][f"Ensemble_CANN_{index}"].sum()/exposure["test"].sum())

    # store the results in the result dataframe:
    store_results_in_df(Ensemble_CANN_results)

# notebook-level variables are not freed automatically, so we delete the data that is no longer needed:
del data_nn_emb_learn, data_nn_emb_test


range(0, 5)
Model: 0
Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14
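
The CANN cell above reuses the embedding network but passes `Exposure_x_GLM3_pred` in place of the exposure and initialises the output layer with zero weights and zero bias. Under those settings the untrained network reproduces the GLM prediction exactly, because `exp(0) = 1`. The sketch below checks this on made-up numbers; `glm_freq`, `hidden3` and the other names are hypothetical.

```python
import numpy as np

exposure = np.array([0.5, 1.0, 0.25])
glm_freq = np.array([0.08, 0.12, 0.05])       # hypothetical GLM frequencies
offset = exposure * glm_freq                  # plays the role of Exposure_x_GLM3_pred

rng = np.random.default_rng(1)
hidden3 = np.tanh(rng.normal(size=(3, 10)))   # arbitrary last-hidden-layer activations
w_out = np.zeros((10, 1))                     # zero-initialised output weights
b_out = np.zeros(1)                           # zero-initialised output bias

# output layer with exponential activation, then multiplication with the offset
nn_factor = np.exp(hidden3 @ w_out + b_out).ravel()
cann_pred = nn_factor * offset
```

Training therefore only has to learn a multiplicative correction to the GLM, starting from a factor of 1 for every policy.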


## 5.4 Ensembles LocalGLMnet:

In [None]:
# Create the dataframes needed for evaluation:
data_nn_ohe_learn, y_true_learn = create_ffn_ohe_data(bool_in_learn)
data_nn_ohe_test, y_true_test = create_ffn_ohe_data(bool_in_test)

for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

        print(f"Model: {run_index}")
        # LocalGLMnet:
        # ----------------------
        # Note: we use the Keras functional API here rather than model subclassing
        # because it keeps the code easier to read
        # (the transformer-based models use model subclasses).

        # create dummy glm for initial weights
        # ----------------------
        poisson_glm_dummy = PoissonRegressor(alpha = 0,max_iter=1000) # scikit-learn.org: alpha = 0 is equivalent to unpenalized GLMs
        poisson_glm_dummy.fit(data_nn_ohe_learn[0],y_true_learn/data_nn_ohe_learn[1],sample_weight=data_nn_ohe_learn[1]) # note: data_nn_ohe_learn = [X_ohe,exposure]

        # Define LocalGLMnet:
        # ----------------------
        def Create_Poisson_LocalGLMnet(input_dim=40,initial_glm_bias=1, initial_glm_betas=None):
            # set random seeds
            set_random_seeds(int(random_seeds[run_index]))
            Input_Matrix_OHE = tf.keras.layers.Input(shape=(input_dim,), dtype='float32', name='Input_Matrix')
            Input_Exposure = tf.keras.layers.Input(shape=(1,), dtype='float32', name='Input_Exposure')
            # Build the network
            hidden1 = tf.keras.layers.Dense(units=20, activation=tanh, name='hidden1')(Input_Matrix_OHE)
            hidden2 = tf.keras.layers.Dense(units=15, activation=tanh, name='hidden2')(hidden1)
            hidden3 = tf.keras.layers.Dense(units=10, activation=tanh, name='hidden3')(hidden2)
            Attention = tf.keras.layers.Dense(units=input_dim, activation='linear', name='attention',
                            weights=[np.zeros((10, input_dim)), initial_glm_betas])(hidden3)
            # note: the kernel is initialised with zeros and the bias with the GLM betas,
            # so the untrained attention layer returns the GLM betas for every input.
            # Multiply the attention weights element-wise with the input matrix Input_Matrix_OHE
            # (both have dimension input_dim); the next layer sums the products to the scalar product:
            weighted_input = tf.keras.layers.Multiply(name='feature_contributions')([Attention, Input_Matrix_OHE])
            scalar_product = tf.keras.layers.Dense(units=1, activation='linear', name='scalar_product',
                                weights=[np.ones((input_dim, 1)), np.array([0])],
                                trainable=False)(weighted_input)
            # Note: ideally the kernel of the following layer would stay fixed and only its bias would be trained,
            # but the whole layer has to be trainable for the bias to be learned; see Wüthrich & Merz (2023), page 500.
            Result_LocalGLMnet_without_Exposure = tf.keras.layers.Dense(units=1, activation='exponential', name='Result_LocalGLMnet_without_Exposure',
                            weights=[np.ones((1, 1)), np.array([initial_glm_bias])],
                            trainable=True)(scalar_product)
            Response = tf.keras.layers.Multiply(name='Result')([Result_LocalGLMnet_without_Exposure, Input_Exposure])
            return tf.keras.models.Model(inputs=[Input_Matrix_OHE, Input_Exposure], outputs=[Response], name='Poisson_LocalGLMnet')

        # create the model:
        # ----------------------
        LocalGLMnet = Create_Poisson_LocalGLMnet(input_dim=40,initial_glm_bias=poisson_glm_dummy.intercept_,initial_glm_betas=poisson_glm_dummy.coef_)

        # load the saved model weights:
        # ----------------------
        LocalGLMnet.load_weights(f'{storage_path}/saved_models/Poisson_LocalGLMnet_{run_index}.weights.h5')

        # predict with the model:
        # ----------------------
        y_pred["train"][f"LocalGLMnet_{run_index}"] = np.array([x for [x] in LocalGLMnet.predict(data_nn_ohe_learn, verbose=0,
                                                                            batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
        y_pred["test"][f"LocalGLMnet_{run_index}"] = np.array([x for [x] in LocalGLMnet.predict(data_nn_ohe_test, verbose=0,
                                                                        batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])


    # evaluate the models:
    # ----------------------
    # store the results in the results class:
    y_pred["train"][f"Ensemble_LocalGLMnet_{index}"] = np.mean([y_pred["train"][f"LocalGLMnet_{i}"] for i in ensemble_range], axis=0)
    y_pred["test"][f"Ensemble_LocalGLMnet_{index}"] = np.mean([y_pred["test"][f"LocalGLMnet_{i}"] for i in ensemble_range], axis=0)

    Ensemble_LocalGLMnet_results = Results(model=f"Ensemble_LocalGLMnet (run: {index})",
                                epochs=0,
                                run_time=0,
                                nr_parameters=0,
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"][f"Ensemble_LocalGLMnet_{index}"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"][f"Ensemble_LocalGLMnet_{index}"]),
                                pred_avg_freq_train=y_pred["train"][f"Ensemble_LocalGLMnet_{index}"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"][f"Ensemble_LocalGLMnet_{index}"].sum()/exposure["test"].sum())

    # store the results in the result dataframe:
    store_results_in_df(Ensemble_LocalGLMnet_results)

# notebook-level variables are not freed automatically, so we delete the data that is no longer needed:
del data_nn_ohe_learn, data_nn_ohe_test


range(0, 5)
Model: 0
Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14
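
The LocalGLMnet above lets a network output one regression coefficient per input column, multiplies these coefficients element-wise with the inputs, sums the products, adds a bias and applies the exponential link. Because the attention layer starts with zero weights and the GLM coefficients as bias, the initial network equals the GLM. A minimal sketch of this identity on made-up values follows; `betas_glm`, `bias_glm` and the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, input_dim = 4, 5
x = rng.normal(size=(n, input_dim))
exposure = np.array([1.0, 0.5, 0.75, 0.2])
betas_glm = np.array([0.3, -0.2, 0.0, 0.1, 0.05])  # hypothetical GLM coefficients
bias_glm = -2.0                                     # hypothetical GLM intercept

hidden3 = np.tanh(rng.normal(size=(n, 10)))         # arbitrary hidden activations
w_att = np.zeros((10, input_dim))                   # attention weights start at zero
attention = hidden3 @ w_att + betas_glm             # so attention == GLM betas for every row

# feature contributions, their sum, exponential link, exposure scaling
contributions = attention * x
local_glm_pred = np.exp(contributions.sum(axis=1) + bias_glm) * exposure

glm_pred = np.exp(x @ betas_glm + bias_glm) * exposure
```

Once the attention weights are trained, `attention` differs from row to row, and the columns of `contributions` can be inspected per feature.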


## 5.5 Ensembles FT-Transformer:

In [None]:
# Create the dataframes needed for evaluation:
# --------------------
# NOTE: Gorishniy et al. (2021) use different batch sizes for different datasets
# but do not tune the batch size as a hyperparameter: they use 1024 for the larger datasets
# and 256 or 512 for the smaller ones.
batch_size = 1024
learn_data = df_to_tensor(df_freq_prep_nn[bool_in_learn], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)
test_data = df_to_tensor(df_freq_prep_nn[bool_in_test], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)

# NOTE: a small dummy subset of the data, used below only to build the model before its weights are loaded:
learn_train_dummy_data = df_to_tensor(df_freq_prep_nn[bool_in_learn_train_dummy], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size,
                                      dummy_data_for_build=True)

for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

        print(f"Model: {run_index}")
        # Define FT-Transformer Models:
        # ----------------------
        # NOTE: the model is a TensorFlow/Keras model subclass (not the functional or sequential API)
        # NOTE: training used a custom training loop instead of the .fit function

        # create the model:
        # ----------------------
        set_random_seeds(int(random_seeds[run_index]))

        FT_transformer = EnhActuar.Feature_Tokenizer_Transformer(
                emb_dim = 32, # NOTE: the default setting in Gorishniy et al. (2021) is emb_dim = 192 (the parameter count would go through the roof here, so we use a smaller value)
                nr_features = nr_col,
                cat_features = cat_col,
                cat_vocabulary = cat_vocabulary,
                count_transformer_blocks = 3,
                attention_n_heads = 8,
                attention_dropout = 0.2,
                ffn_d_hidden = None, # NOTE: keep None when ReGLU is used -> None selects the default value (4/3*emb_dim); Gorishniy et al. write that they used 2*emb_dim without ReGLU.
                ffn_activation_ReGLU = True, # NOTE: set True if ReGLU should be used
                ffn_dropout = 0.1,
                prenormalization = True,
                output_dim = 1,
                last_activation = 'exponential',
                exposure_name = "Exposure",
                seed_nr = int(random_seeds[run_index])
        )

        # run one forward pass on the dummy data to build the model variables, so that the weights can be loaded:
        FT_transformer.predict(learn_train_dummy_data,verbose=0,batch_size=100000)

        # load the saved model weights:
        # ----------------------
        # FT_transformer = keras.models.load_model(save_path +'/Poisson_FT_transformer')
        FT_transformer.load_weights(f'{storage_path}/saved_models/Poisson_FT_transformer_{run_index}.weights.h5')

        # predict with the model:
        # ----------------------
        y_pred["train"][f"FT_transformer_{run_index}"] = np.array([x for [x] in FT_transformer.predict(learn_data,verbose=0,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                    )["output"]])
        y_pred["test"][f"FT_transformer_{run_index}"] = np.array([x for [x] in FT_transformer.predict(test_data,verbose=0,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                    )["output"]])


    # evaluate the models:
    # ----------------------
    # store the results in the results class:
    y_pred["train"][f"Ensemble_FT_transformer_{index}"] = np.mean([y_pred["train"][f"FT_transformer_{i}"] for i in ensemble_range], axis=0)
    y_pred["test"][f"Ensemble_FT_transformer_{index}"] = np.mean([y_pred["test"][f"FT_transformer_{i}"] for i in ensemble_range], axis=0)

    Ensemble_FT_transformer_results = Results(model=f"Ensemble_FT_transformer (run: {index})",
                                epochs=0,
                                run_time=0,
                                nr_parameters=0,
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"][f"Ensemble_FT_transformer_{index}"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"][f"Ensemble_FT_transformer_{index}"]),
                                pred_avg_freq_train=y_pred["train"][f"Ensemble_FT_transformer_{index}"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"][f"Ensemble_FT_transformer_{index}"].sum()/exposure["test"].sum())

    # store the results in the result dataframe:
    store_results_in_df(Ensemble_FT_transformer_results)

# notebook-level variables are not freed automatically, so we delete the data that is no longer needed:
del learn_data, test_data

range(0, 5)
Model: 0



The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5



Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10



Model: 11
Model: 12
Model: 13
Model: 14
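
The FT-Transformer configuration above enables ReGLU in the feed-forward blocks (`ffn_activation_ReGLU = True`). ReGLU gates one linear projection with the ReLU of another. The sketch below shows the activation in NumPy. The shapes and the hidden width are illustrative assumptions, and the code does not reproduce `EnhActuar.Feature_Tokenizer_Transformer`, whose internals are not shown in this notebook.

```python
import numpy as np

def reglu_ffn(x, w, v, w_out):
    """Feed-forward block with ReGLU: (relu(x @ w) * (x @ v)) @ w_out (biases omitted)."""
    gate = np.maximum(x @ w, 0.0)   # ReLU branch
    value = x @ v                   # linear branch
    return (gate * value) @ w_out

rng = np.random.default_rng(3)
emb_dim, d_hidden = 32, 42          # d_hidden is an assumed width, roughly 4/3 * emb_dim
tokens = rng.normal(size=(2, 10, emb_dim))   # (batch, tokens, emb_dim), toy sizes

w = rng.normal(size=(emb_dim, d_hidden))
v = rng.normal(size=(emb_dim, d_hidden))
w_out = rng.normal(size=(d_hidden, emb_dim))

out = reglu_ffn(tokens, w, v, w_out)
```

ReGLU needs two input projections instead of one, which is a plausible reason for pairing it with a smaller hidden width than a plain ReLU block.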


## 5.6 Ensembles CAFTT:

In [None]:
# create the exposure times GLM3 prediction column that the CAFTT models use in place of the exposure.
df_freq_prep_nn["Exposure_x_GLM3_pred"] = list(poisson_glm3.predict(X_glm3)*df_freq_prep_nn["Exposure"])

# Create the dataframes needed for evaluation:
# --------------------
# NOTE: Gorishniy et al. (2021) use different batch sizes for different datasets
# but do not tune the batch size as a hyperparameter: they use 1024 for the larger datasets
# and 256 or 512 for the smaller ones.
batch_size = 1024
learn_data = df_to_tensor(df_freq_prep_nn[bool_in_learn], feature_cols=nr_col+cat_col, exposure="Exposure_x_GLM3_pred", target="ClaimNb", batch_size=batch_size)
test_data = df_to_tensor(df_freq_prep_nn[bool_in_test], feature_cols=nr_col+cat_col, exposure="Exposure_x_GLM3_pred", target="ClaimNb", batch_size=batch_size)

# NOTE: a small dummy subset of the data, used below only to build the model before its weights are loaded:
learn_train_dummy_data = df_to_tensor(df_freq_prep_nn[bool_in_learn_train_dummy], feature_cols=nr_col+cat_col, exposure="Exposure_x_GLM3_pred", target="ClaimNb", batch_size=batch_size,
                                      dummy_data_for_build=True)

for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

        print(f"Model: {run_index}")
        # Define FT-Transformer Models:
        # ----------------------
        # NOTE: the model is a TensorFlow/Keras model subclass (not the functional or sequential API)
        # NOTE: training used a custom training loop instead of the .fit function

        # create the model:
        # ----------------------
        set_random_seeds(int(random_seeds[run_index]))

        FT_transformer = EnhActuar.Feature_Tokenizer_Transformer(
                emb_dim = 32, # NOTE: the default setting in Gorishniy et al. (2021) is emb_dim = 192 (the parameter count would go through the roof here, so we use a smaller value)
                nr_features = nr_col,
                cat_features = cat_col,
                cat_vocabulary = cat_vocabulary,
                count_transformer_blocks = 3,
                attention_n_heads = 8,
                attention_dropout = 0.2,
                ffn_d_hidden = None, # NOTE: keep None when ReGLU is used -> None selects the default value (4/3*emb_dim); Gorishniy et al. write that they used 2*emb_dim without ReGLU.
                ffn_activation_ReGLU = True, # NOTE: set True if ReGLU should be used
                ffn_dropout = 0.1,
                prenormalization = True,
                output_dim = 1,
                last_activation = 'exponential',
                exposure_name = "Exposure_x_GLM3_pred",
                last_layer_initial_weights = "zeros",
                last_layer_initial_bias = "zeros",
                seed_nr = int(random_seeds[run_index])
        )

        # run one forward pass on the dummy data to build the model variables, so that the weights can be loaded:
        FT_transformer.predict(learn_train_dummy_data,verbose=0,batch_size=100000)

        # load the saved model weights:
        # ----------------------
        FT_transformer.load_weights(f'{storage_path}/saved_models/Poisson_CAFTT_{run_index}.weights.h5')

        # predict with the model:
        # ----------------------
        y_pred["train"][f"CAFTT_{run_index}"] = np.array([x for [x] in FT_transformer.predict(learn_data,verbose=0,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                    )["output"]])
        y_pred["test"][f"CAFTT_{run_index}"] = np.array([x for [x] in FT_transformer.predict(test_data,verbose=0,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                    )["output"]])

    # evaluate the models:
    # ----------------------
    # store the results in the results class:
    y_pred["train"][f"Ensemble_CAFTT_{index}"] = np.mean([y_pred["train"][f"CAFTT_{i}"] for i in ensemble_range], axis=0)
    y_pred["test"][f"Ensemble_CAFTT_{index}"] = np.mean([y_pred["test"][f"CAFTT_{i}"] for i in ensemble_range], axis=0)

    Ensemble_CAFTT_results = Results(model=f"Ensemble_CAFTT (run: {index})",
                                epochs=0,
                                run_time=0,
                                nr_parameters=0,
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"][f"Ensemble_CAFTT_{index}"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"][f"Ensemble_CAFTT_{index}"]),
                                pred_avg_freq_train=y_pred["train"][f"Ensemble_CAFTT_{index}"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"][f"Ensemble_CAFTT_{index}"].sum()/exposure["test"].sum())

    # store the results in the result dataframe:
    store_results_in_df(Ensemble_CAFTT_results)

# notebook-level variables are not freed automatically, so we delete the data that is no longer needed:
del learn_data, test_data

range(0, 5)
Model: 0



The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14
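
The ensemble prediction used above is the element-wise mean over the member models. The following is a minimal sketch with made-up numbers. `mean_poisson_deviance` is a hypothetical stand-in for the notebook's `poisson_deviance_loss`, whose exact definition (for example its scaling) may differ.

```python
import numpy as np

def mean_poisson_deviance(y_true, y_pred):
    """Average Poisson unit deviance; y*log(y/mu) is taken as 0 for y == 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    log_term = np.zeros_like(y_true)
    positive = y_true > 0
    log_term[positive] = y_true[positive] * np.log(y_true[positive] / y_pred[positive])
    return float(np.mean(2.0 * (log_term - y_true + y_pred)))

# toy data: observed claim counts and the predictions of three hypothetical models
y_obs = np.array([0.0, 1.0, 0.0, 2.0])
member_preds = {
    "model_0": np.array([0.10, 0.60, 0.20, 1.50]),
    "model_1": np.array([0.30, 0.90, 0.10, 1.10]),
    "model_2": np.array([0.20, 0.90, 0.30, 1.00]),
}

# ensemble = element-wise mean across the members (axis=0 runs over the models)
ensemble_pred = np.mean([member_preds[f"model_{i}"] for i in range(3)], axis=0)

member_losses = [mean_poisson_deviance(y_obs, p) for p in member_preds.values()]
ensemble_loss = mean_poisson_deviance(y_obs, ensemble_pred)
```

Because the Poisson deviance is convex in the prediction, the loss of the averaged prediction cannot exceed the average loss of the members on the same data.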


## 5.7 Ensembles LocalGLMftt:

In [None]:
# Create the dataframes for creation of the glm-ohe-start model:
# --------------------
data_nn_ohe_learn, y_true_learn = create_ffn_ohe_data(bool_in_learn)

# create dummy glm for initial weights
# ----------------------
poisson_glm_dummy = PoissonRegressor(alpha = 0,max_iter=1000) # scikit-learn.org: alpha = 0 is equivalent to unpenalized GLMs
poisson_glm_dummy.fit(data_nn_ohe_learn[0],y_true_learn/data_nn_ohe_learn[1],sample_weight=data_nn_ohe_learn[1]) # note: data_nn_ohe_learn = [X_ohe,exposure]
# get the betas from the glm:
glm_nr_col_betas = poisson_glm_dummy.coef_[:len(nr_col)]
current_beta_index = len(nr_col)
glm_cat_col_betas = {}
for c in cat_vocabulary.keys():
    glm_cat_col_betas[c] = poisson_glm_dummy.coef_[current_beta_index:current_beta_index+len(cat_vocabulary[c])]
    current_beta_index += len(cat_vocabulary[c])
glm_intercept = poisson_glm_dummy.intercept_


# Create the dataframes needed for evaluation:
# --------------------
# NOTE: in the 2021 paper by Gorishniy et al. the batch size differs between the datasets
# but is not hyperparameter tuned: for bigger datasets they used a batch size of 1024 and
# for smaller datasets a batch size of 256 or 512.
batch_size = 1024
learn_data = df_to_tensor(df_freq_prep_nn[bool_in_learn], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)
test_data = df_to_tensor(df_freq_prep_nn[bool_in_test], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)

learn_train_dummy_data = df_to_tensor(df_freq_prep_nn[bool_in_learn_train_dummy], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size,
                                      dummy_data_for_build=True)

for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

        print(f"Model: {run_index}")
        # Define FT-Transformer Models:
        # ----------------------
        # NOTE: we use here tensorflow/keras model subclasses (not the functional or sequential API)
        # NOTE: we use a custom training loop instead of the .fit function

        # create the model:
        # ----------------------
        set_random_seeds(int(random_seeds[run_index]))
        LocalGLMftt = EnhActuar.LocalGLM_FT_Transformer(
                emb_dim = 32, # NOTE: the default setting in the 2021 Gorishniy et al. paper is emb_dim = 192 (here the number of parameters would go through the roof, so we use something smaller)
                nr_features = nr_col,
                cat_features = cat_col,
                cat_vocabulary = cat_vocabulary,
                count_transformer_blocks = 3,
                attention_n_heads = 8,
                attention_dropout = 0.2,
                ffn_d_hidden = None, # NOTE: None uses the default value (4/3*emb_dim), which is meant for ReGLU; they write that they used 2*emb_dim if not ReGLU.
                ffn_activation_ReGLU = True, # NOTE: set True if ReGLU should be used
                ffn_dropout = 0.1,
                prenormalization = True,
                output_dim = 1,
                last_activation = 'exponential',
                exposure_name = "Exposure",
                last_layer_initial_weights = "zeros",
                last_layer_initial_bias = "ones",
                init_glm_cat_col_weights = glm_cat_col_betas,
                init_glm_nr_col_weights = glm_nr_col_betas,
                init_glm_bias = glm_intercept,
                trainable_glm_emb = False,
                seed_nr = int(random_seeds[run_index])
        )

        # call the model once on the dummy data so that its weights are created before they are loaded:
        LocalGLMftt.predict(learn_train_dummy_data,verbose=0,batch_size=100000)

        # load the best saved model weights:
        # ----------------------
        LocalGLMftt.load_weights(f'{storage_path}/saved_models/Poisson_LocalGLMftt_{run_index}.weights.h5')

        # predict with the model:
        # ----------------------
        y_pred["train"][f"LocalGLMftt_{run_index}"] = np.array([x for [x] in LocalGLMftt.predict(learn_data,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                    )["output"]])
        y_pred["test"][f"LocalGLMftt_{run_index}"] = np.array([x for [x] in LocalGLMftt.predict(test_data,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                    )["output"]])


    # evaluate the models:
    # ----------------------
    # store the results in the results class:
    y_pred["train"][f"Ensemble_LocalGLMftt_{index}"] = np.mean([y_pred["train"][f"LocalGLMftt_{i}"] for i in ensemble_range], axis=0)
    y_pred["test"][f"Ensemble_LocalGLMftt_{index}"] = np.mean([y_pred["test"][f"LocalGLMftt_{i}"] for i in ensemble_range], axis=0)

    Ensemble_LocalGLMftt_results = Results(model=f"Ensemble_LocalGLMftt (run: {index})",
                                epochs=0,
                                run_time=0,
                                nr_parameters=0,
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"][f"Ensemble_LocalGLMftt_{index}"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"][f"Ensemble_LocalGLMftt_{index}"]),
                                pred_avg_freq_train=y_pred["train"][f"Ensemble_LocalGLMftt_{index}"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"][f"Ensemble_LocalGLMftt_{index}"].sum()/exposure["test"].sum())

    # store the results in the result-dataframe:
    store_results_in_df(Ensemble_LocalGLMftt_results)

# the notebook keeps global variables alive until they are deleted, so we free the unneeded data here:
del learn_data, test_data



range(0, 5)
Model: 0
Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14
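
The cell above splits the GLM coefficient vector into one block for the numerical features and one block per categorical feature, using the vocabulary sizes as block lengths. The following is a minimal sketch of that slicing with made-up coefficients. The names `numeric_cols` and `vocabulary` are hypothetical, and the sketch assumes that the one-hot columns appear in the same order as the vocabulary keys.

```python
import numpy as np

# hypothetical feature layout: 2 numerical columns, then one-hot blocks per categorical column
numeric_cols = ["num_a", "num_b"]
vocabulary = {"cat_x": ["x1", "x2", "x3"], "cat_y": ["y1", "y2"]}

# made-up coefficient vector in the same column order as the design matrix
coef = np.array([0.5, -0.2, 0.1, 0.0, -0.1, 0.3, -0.3])

# the first len(numeric_cols) entries belong to the numerical features
numeric_betas = coef[:len(numeric_cols)]

# walk through the remaining entries, one block per categorical feature
offset = len(numeric_cols)
categorical_betas = {}
for name, levels in vocabulary.items():
    categorical_betas[name] = coef[offset:offset + len(levels)]
    offset += len(levels)

# every coefficient must be consumed exactly once
assert offset == len(coef)
```

The final check is a cheap guard against a mismatch between the design matrix and the vocabulary, which would otherwise shift all later blocks silently.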


In [None]:
# # save the results:
# with open(f'{storage_path}/Data/df_results.pickle', 'wb') as handle:
#     pickle.dump(df_results, handle, protocol=pickle.HIGHEST_PROTOCOL)

# load the results:
with open(f'{storage_path}/Data/df_results.pickle', 'rb') as handle:
    df_results = pickle.load(handle)
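
The save and load pattern of the cell above can be checked with a small round trip. This is only a sketch: the dictionary and the file name are made up, and a temporary directory is used so that nothing in the storage folder is touched. Note that `pickle.load` should only be used on files from a trusted source.

```python
import os
import pickle
import tempfile

# made-up stand-in for the results object
results = {"model": "Ensemble_demo (run: 0)", "poisson_deviance_loss_test": 0.25}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "df_results_demo.pickle")  # hypothetical file name
    # save the object
    with open(path, "wb") as handle:
        pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL)
    # load it again
    with open(path, "rb") as handle:
        restored = pickle.load(handle)
```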


# 6. Rebase Models

In [None]:
def create_single_rebase_results(model_name,run_index):
    rebase_factor = np.mean(y_true["train"])/np.mean(y_pred["train"][f"{model_name}_{run_index}"])

    y_pred["train"][f"Rebase_{model_name}_{run_index}"] = y_pred["train"][f"{model_name}_{run_index}"] * rebase_factor
    y_pred["test"][f"Rebase_{model_name}_{run_index}"] = y_pred["test"][f"{model_name}_{run_index}"] * rebase_factor

    Rebase_results = Results(model=f"Rebase_{model_name} (run: {run_index})",
                                epochs=0,
                                run_time=0,
                                nr_parameters=0,
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"][f"Rebase_{model_name}_{run_index}"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"][f"Rebase_{model_name}_{run_index}"]),
                                pred_avg_freq_train=y_pred["train"][f"Rebase_{model_name}_{run_index}"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"][f"Rebase_{model_name}_{run_index}"].sum()/exposure["test"].sum())

    #  store the results in the result-dataframe:
    store_results_in_df(Rebase_results)

def create_ensemble_results(model_name,index):
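    # average the predictions of the ensemble members and evaluate the result:
    # ----------------------
    # NOTE: ensemble_range is read from the global scope, i.e. it is set by the loop that calls this function.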
    y_pred["train"][f"Ensemble_{model_name}_{index}"] = np.mean([y_pred["train"][f"{model_name}_{i}"] for i in ensemble_range], axis=0)
    y_pred["test"][f"Ensemble_{model_name}_{index}"] = np.mean([y_pred["test"][f"{model_name}_{i}"] for i in ensemble_range], axis=0)

    Ensemble_results = Results(model=f"Ensemble_{model_name} (run: {index})",
                                epochs=0,
                                run_time=0,
                                nr_parameters=0,
                                poisson_deviance_loss_train=poisson_deviance_loss(y_true["train"], y_pred["train"][f"Ensemble_{model_name}_{index}"]),
                                poisson_deviance_loss_test=poisson_deviance_loss(y_true["test"], y_pred["test"][f"Ensemble_{model_name}_{index}"]),
                                pred_avg_freq_train=y_pred["train"][f"Ensemble_{model_name}_{index}"].sum()/exposure["train"].sum(),
                                pred_avg_freq_test=y_pred["test"][f"Ensemble_{model_name}_{index}"].sum()/exposure["test"].sum())

    # store the results in the result-dataframe:
    store_results_in_df(Ensemble_results)
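
The rebasing in `create_single_rebase_results` multiplies all predictions by one factor so that the average prediction matches the average observation on the training data; the same factor is then applied to the test predictions. The following is a minimal sketch with made-up numbers.

```python
import numpy as np

# made-up observed claim counts and biased model predictions
y_train = np.array([0.0, 1.0, 0.0, 1.0])
pred_train = np.array([0.2, 0.4, 0.1, 0.3])
pred_test = np.array([0.3, 0.1])

# the factor is estimated on the training data only
rebase_factor = np.mean(y_train) / np.mean(pred_train)

# the same factor rescales the training and the test predictions
rebased_train = pred_train * rebase_factor
rebased_test = pred_test * rebase_factor
```

After rebasing, the predicted and the observed number of claims agree on the training data. On the test data they agree only approximately, because the factor is not re-estimated there.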

## 6.1 Rebase FFN OHE:

In [None]:
# Create the dataframes needed for evaluation:
data_nn_ohe_learn, y_true_learn = create_ffn_ohe_data(bool_in_learn)
data_nn_ohe_test, y_true_test = create_ffn_ohe_data(bool_in_test)

for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

        print(f"Model: {run_index}")
        # Define FFN Model:
        # ----------------------
        # note that we use the functional API here instead of model subclassing
        # to make the code more readable and easier to understand
        # (for the transformer-based models we will use model subclasses):
        def Create_Poisson_FFN_OHE(input_dim=42,mean_model_results=1):
            # set random seeds
            set_random_seeds(int(random_seeds[run_index]))
            # Build the network
            Input_Matrix_OHE = tf.keras.layers.Input(shape=(input_dim,), dtype='float32', name='Input_Matrix')
            Input_Exposure = tf.keras.layers.Input(shape=(1,), dtype='float32', name='Input_Exposure')
            hidden1 = tf.keras.layers.Dense(units=20, activation=tanh, name='hidden1')(Input_Matrix_OHE)
            hidden2 = tf.keras.layers.Dense(units=15, activation=tanh, name='hidden2')(hidden1)
            hidden3 = tf.keras.layers.Dense(units=10, activation=tanh, name='hidden3')(hidden2)
            Result_FFN1 = tf.keras.layers.Dense(units=1, activation='exponential', name='Result_FFN1',
                            weights=[np.zeros((10, 1)), np.array([np.log(mean_model_results)])],
                            trainable=True)(hidden3)
            Response = tf.keras.layers.Multiply(name='Result')([Result_FFN1, Input_Exposure])
            # Define and Return the model
            return tf.keras.models.Model(inputs=[Input_Matrix_OHE, Input_Exposure], outputs=[Response], name='Poisson_FFN_OHE')

        # create the model:
        # ----------------------
        FFN_OHE = Create_Poisson_FFN_OHE(input_dim=40,mean_model_results=constant_model)

        # load the saved model weights:
        # ----------------------
        FFN_OHE.load_weights(f'{storage_path}/saved_models/Poisson_FFN_OHE_{run_index}.weights.h5')


        # predict with the models:
        # ----------------------
        y_pred["train"][f"FFN_OHE_{run_index}"] = np.array([x for [x] in FFN_OHE.predict(data_nn_ohe_learn, verbose=0,
                                                                            batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
        y_pred["test"][f"FFN_OHE_{run_index}"] = np.array([x for [x] in FFN_OHE.predict(data_nn_ohe_test, verbose=0,
                                                                            batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])

        create_single_rebase_results("FFN_OHE",run_index)
    create_ensemble_results("Rebase_FFN_OHE",index)

# the notebook keeps global variables alive until they are deleted, so we free the unneeded data here:
del data_nn_ohe_learn, data_nn_ohe_test

range(0, 5)
Model: 0
Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14
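
The output layer of the FFN above starts with a zero kernel and the bias `log(mean_model_results)`. Before any training, the network therefore predicts the same frequency for every policy, whatever the hidden layers produce. The following NumPy sketch illustrates this with made-up numbers; it mimics only the last layer and does not use Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

mean_frequency = 0.07                    # made-up portfolio frequency
hidden3 = rng.normal(size=(4, 10))       # arbitrary activations of the last hidden layer
exposure = np.array([1.0, 0.5, 0.25, 1.0])

# output layer: zero kernel, bias = log(mean frequency)
kernel = np.zeros((10, 1))
bias = np.log(mean_frequency)

# exponential activation, then multiplication with the exposure
frequency = np.exp(hidden3 @ kernel + bias)[:, 0]
expected_claims = frequency * exposure
```

Starting from the constant model means that gradient descent begins at a sensible loss level instead of at a random prediction.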


## 6.2 Rebase FFN CAT EMB:

In [None]:
# Create the dataframes needed for evaluation:
data_nn_emb_learn, y_true_learn = create_ffn_cat_emb_data(bool_in_learn)
data_nn_emb_test, y_true_test = create_ffn_cat_emb_data(bool_in_test)


for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

        # Define FNN with Cat. Embedding Model:
        # ----------------------
        # note that we use the functional API here instead of model subclassing
        # to make the code more readable and easier to understand
        # (for the transformer-based models we will use model subclasses):
        print(f"Model: {run_index}")
        def Create_Poisson_FNN_CAT_EMB(input_nr_dim=7,emb_dim=1,mean_model_results=1):
            # set random seeds
            set_random_seeds(int(random_seeds[run_index]))

            Input_Matrix_Num = tf.keras.layers.Input(shape=(input_nr_dim,), dtype='float32', name='Input_Matrix_Num')
            Input_VehBrand = tf.keras.layers.Input(shape=(1,), name='Input_VehBrand')
            Input_Region = tf.keras.layers.Input(shape=(1,), name='Input_Region')
            Input_Exposure = tf.keras.layers.Input(shape=(1,), dtype='float32', name='Input_Exposure')

            All_Inputs = [Input_Matrix_Num,Input_VehBrand,Input_Region,Input_Exposure]

            Emb_VehBrand = tf.keras.layers.Embedding(input_dim=len(cat_encoder_all["VehBrand"].keys()),output_dim=emb_dim,
                                    embeddings_initializer=keras.initializers.RandomNormal(mean=1.0, stddev=0.05 ),
                                    name="Embedding_VehBrand")(Input_VehBrand)
            Emb_Region = tf.keras.layers.Embedding(input_dim=len(cat_encoder_all["Region"].keys()),output_dim=emb_dim,
                                    embeddings_initializer=keras.initializers.RandomNormal(mean=1.0, stddev=0.05),
                                name="Embedding_Region")(Input_Region)

            Reshaped_Emb_VehBrand = tf.keras.layers.Reshape(target_shape=(emb_dim,),name="Reshaped_Embedding_VehBrand")(Emb_VehBrand)
            Reshaped_Emb_Region = tf.keras.layers.Reshape(target_shape=(emb_dim,),name="Reshaped_Embedding_Region")(Emb_Region)

            concatenation_layer = tf.keras.layers.Concatenate(name="concatenation_layer")([Input_Matrix_Num,Reshaped_Emb_VehBrand,Reshaped_Emb_Region])

            # Build the network
            hidden1 = tf.keras.layers.Dense(units=20, activation=tanh, name='hidden1')(concatenation_layer)
            hidden2 = tf.keras.layers.Dense(units=15, activation=tanh, name='hidden2')(hidden1)
            hidden3 = tf.keras.layers.Dense(units=10, activation=tanh, name='hidden3')(hidden2)
            Result_FFN1 = tf.keras.layers.Dense(units=1, activation='exponential', name='Result_FFN1',
                            weights=[np.zeros((10, 1)), np.array([np.log(mean_model_results)])],
                            trainable=True)(hidden3)

            Response = tf.keras.layers.Multiply(name='Result')([Result_FFN1, Input_Exposure])

            # Define the model
            return tf.keras.models.Model(inputs=All_Inputs, outputs=[Response], name='Poisson_CAT_EMB')

        # create the model:
        # ----------------------
        emb_dim=2
        FNN_CAT_EMB = Create_Poisson_FNN_CAT_EMB(input_nr_dim=7,emb_dim=emb_dim,mean_model_results=constant_model)

        # load the saved model weights:
        # ----------------------
        FNN_CAT_EMB.load_weights(f'{storage_path}/saved_models/Poisson_FNN_CAT_EMB_{run_index}.weights.h5')

        # predict with the model:
        # ----------------------
        y_pred["train"][f"FNN_CAT_EMB_{run_index}"] = np.array([x for [x] in FNN_CAT_EMB.predict(data_nn_emb_learn, verbose=0,
                                                                            batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
        y_pred["test"][f"FNN_CAT_EMB_{run_index}"] = np.array([x for [x] in FNN_CAT_EMB.predict(data_nn_emb_test, verbose=0,
                                                                        batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])


        create_single_rebase_results("FNN_CAT_EMB",run_index)
    create_ensemble_results("Rebase_FNN_CAT_EMB",index)

# the notebook keeps global variables alive until they are deleted, so we free the unneeded data here:
del data_nn_emb_learn, data_nn_emb_test

range(0, 5)
Model: 0
Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14
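
In the model above each categorical feature is mapped to a small dense vector by an embedding layer, and the embeddings are concatenated with the numerical features before the first hidden layer. An embedding lookup is just row indexing into a table, as the following NumPy sketch shows. The vocabulary sizes and all values are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
emb_dim = 2
n_brands, n_regions = 3, 4   # hypothetical vocabulary sizes

# embedding tables: one row per category level (initialised around 1.0, as in the cell above)
emb_brand = rng.normal(loc=1.0, scale=0.05, size=(n_brands, emb_dim))
emb_region = rng.normal(loc=1.0, scale=0.05, size=(n_regions, emb_dim))

numeric = rng.normal(size=(5, 7))          # 5 policies, 7 numerical features
brand_idx = np.array([0, 2, 1, 1, 0])      # integer-encoded categories
region_idx = np.array([3, 0, 0, 2, 1])

# the lookup is plain row indexing; the result is concatenated with the numerical features
features = np.concatenate([numeric, emb_brand[brand_idx], emb_region[region_idx]], axis=1)
```

With `emb_dim = 2` the first hidden layer sees 7 + 2 + 2 = 11 inputs per policy, regardless of how many levels the categorical features have.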


## 6.3 Rebase CANN:

In [None]:
# create the new exposure times GLM3_pred column for CANN models.
df_freq_prep_nn["Exposure_x_GLM3_pred"] = list(poisson_glm3.predict(X_glm3)*df_freq_prep_nn["Exposure"])

# Create the dataframes needed for evaluation:
data_nn_emb_learn, y_true_learn = create_ffn_cat_emb_data(bool_in_learn, exposure_name = "Exposure_x_GLM3_pred")
data_nn_emb_test, y_true_test = create_ffn_cat_emb_data(bool_in_test, exposure_name = "Exposure_x_GLM3_pred")

for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

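        # Define the CANN Model (the FNN with cat. embeddings; the output bias starts at 0 and
        # Exposure_x_GLM3_pred is fed in as exposure, so the untrained model reproduces the GLM3 prediction):
        # ----------------------
        # note that we use the functional API here instead of model subclassing
        # to make the code more readable and easier to understand
        # (for the transformer-based models we will use model subclasses):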
        print(f"Model: {run_index}")
        def Create_Poisson_FNN_CAT_EMB(input_nr_dim=7,emb_dim=1,mean_model_results=1):
            # set random seeds
            set_random_seeds(int(random_seeds[run_index]))

            Input_Matrix_Num = tf.keras.layers.Input(shape=(input_nr_dim,), dtype='float32', name='Input_Matrix_Num')
            Input_VehBrand = tf.keras.layers.Input(shape=(1,), name='Input_VehBrand')
            Input_Region = tf.keras.layers.Input(shape=(1,), name='Input_Region')
            Input_Exposure = tf.keras.layers.Input(shape=(1,), dtype='float32', name='Input_Exposure')

            All_Inputs = [Input_Matrix_Num,Input_VehBrand,Input_Region,Input_Exposure]

            Emb_VehBrand = tf.keras.layers.Embedding(input_dim=len(cat_encoder_all["VehBrand"].keys()),output_dim=emb_dim,
                                    embeddings_initializer=keras.initializers.RandomNormal(mean=1.0, stddev=0.05 ),
                                    name="Embedding_VehBrand")(Input_VehBrand)
            Emb_Region = tf.keras.layers.Embedding(input_dim=len(cat_encoder_all["Region"].keys()),output_dim=emb_dim,
                                    embeddings_initializer=keras.initializers.RandomNormal(mean=1.0, stddev=0.05),
                                name="Embedding_Region")(Input_Region)

            Reshaped_Emb_VehBrand = tf.keras.layers.Reshape(target_shape=(emb_dim,),name="Reshaped_Embedding_VehBrand")(Emb_VehBrand)
            Reshaped_Emb_Region = tf.keras.layers.Reshape(target_shape=(emb_dim,),name="Reshaped_Embedding_Region")(Emb_Region)

            concatenation_layer = tf.keras.layers.Concatenate(name="concatenation_layer")([Input_Matrix_Num,Reshaped_Emb_VehBrand,Reshaped_Emb_Region])

            # Build the network
            hidden1 = tf.keras.layers.Dense(units=20, activation=tanh, name='hidden1')(concatenation_layer)
            hidden2 = tf.keras.layers.Dense(units=15, activation=tanh, name='hidden2')(hidden1)
            hidden3 = tf.keras.layers.Dense(units=10, activation=tanh, name='hidden3')(hidden2)
            Result_FFN1 = tf.keras.layers.Dense(units=1, activation='exponential', name='Result_FFN1',
                            weights=[np.zeros((10, 1)), np.array([0])],
                            trainable=True)(hidden3)

            Response = tf.keras.layers.Multiply(name='Result')([Result_FFN1, Input_Exposure])

            # Define the model
            return tf.keras.models.Model(inputs=All_Inputs, outputs=[Response], name='Poisson_CAT_EMB')

        # create the model:
        # ----------------------
        emb_dim=2
        CANN = Create_Poisson_FNN_CAT_EMB(input_nr_dim=7,emb_dim=emb_dim,mean_model_results=constant_model)

        # load the saved model weights:
        # ----------------------
        CANN.load_weights(f'{storage_path}/saved_models/Poisson_CANN_{run_index}.weights.h5')


        # predict with the model:
        # ----------------------
        y_pred["train"][f"CANN_{run_index}"] = np.array([x for [x] in CANN.predict(data_nn_emb_learn, verbose=0,
                                                                            batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
        y_pred["test"][f"CANN_{run_index}"] = np.array([x for [x] in CANN.predict(data_nn_emb_test, verbose=0,
                                                                        batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])

        create_single_rebase_results("CANN",run_index)
    create_ensemble_results("Rebase_CANN",index)

# the notebook keeps global variables alive until they are deleted, so we free the unneeded data here:
del data_nn_emb_learn, data_nn_emb_test


range(0, 5)
Model: 0
Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14
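
The CANN above feeds `Exposure_x_GLM3_pred` into the exposure input and starts the output layer with a zero kernel and a zero bias. The network factor is then exp(0) = 1 before training, so the untrained CANN returns exactly the GLM prediction and the network only learns a multiplicative correction. The following NumPy sketch shows this with made-up numbers.

```python
import numpy as np

rng = np.random.default_rng(2)

exposure = np.array([1.0, 0.5, 0.8])
glm_frequency = np.array([0.05, 0.10, 0.08])   # made-up GLM frequency predictions
offset = exposure * glm_frequency              # plays the role of the exposure input

hidden3 = rng.normal(size=(3, 10))             # arbitrary last hidden layer activations
kernel, bias = np.zeros((10, 1)), 0.0          # CANN start: zero kernel, zero bias

# exponential activation gives the multiplicative correction of the GLM
network_factor = np.exp(hidden3 @ kernel + bias)[:, 0]
prediction = network_factor * offset
```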


## 6.4 Rebase LocalGLMnet:

In [None]:
# Create the dataframes needed for evaluation:
data_nn_ohe_learn, y_true_learn = create_ffn_ohe_data(bool_in_learn)
data_nn_ohe_test, y_true_test = create_ffn_ohe_data(bool_in_test)

for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

        print(f"Model: {run_index}")
        # note that we use the functional API here instead of model subclassing
        # to make the code more readable and easier to understand
        # (for the transformer-based models we will use model subclasses).

        # create dummy glm for initial weights
        # ----------------------
        poisson_glm_dummy = PoissonRegressor(alpha = 0,max_iter=1000) # scikit-learn.org: alpha = 0 is equivalent to unpenalized GLMs
        poisson_glm_dummy.fit(data_nn_ohe_learn[0],y_true_learn/data_nn_ohe_learn[1],sample_weight=data_nn_ohe_learn[1]) # note: data_nn_ohe_learn = [X_ohe,exposure]

        # Define LocalGLMnet:
        # ----------------------
        def Create_Poisson_LocalGLMnet(input_dim=40,initial_glm_bias=1, initial_glm_betas=None):
            # set random seeds
            set_random_seeds(int(random_seeds[run_index]))
            Input_Matrix_OHE = tf.keras.layers.Input(shape=(input_dim,), dtype='float32', name='Input_Matrix')
            Input_Exposure = tf.keras.layers.Input(shape=(1,), dtype='float32', name='Input_Exposure')
            # Build the network
            hidden1 = tf.keras.layers.Dense(units=20, activation=tanh, name='hidden1')(Input_Matrix_OHE)
            hidden2 = tf.keras.layers.Dense(units=15, activation=tanh, name='hidden2')(hidden1)
            hidden3 = tf.keras.layers.Dense(units=10, activation=tanh, name='hidden3')(hidden2)
            Attention = tf.keras.layers.Dense(units=input_dim, activation='linear', name='attention',
                            weights=[np.zeros((10, input_dim)), initial_glm_betas])(hidden3)
            # note that the weights are set to 0 and the bias is set to the initial glm betas
            # multiply the attention weights (Attention) element-wise with the input matrix Input_Matrix_OHE
            # (Attention has the same dimension as Input_Matrix_OHE); the non-trainable dense layer below
            # sums these feature contributions, which gives the scalar product:
            weighted_input = tf.keras.layers.Multiply(name='feature_contributions')([Attention, Input_Matrix_OHE])
            scalar_product = tf.keras.layers.Dense(units=1, activation='linear', name='scalar_product',
                                weights=[np.ones((input_dim, 1)), np.array([0])],
                                trainable=False)(weighted_input)
            # Note that we actually do not want the following weights to be trainable,
            # but we have to allow it to get a trainable bias; see the comment in Wüthrich & Merz (2023), page 500
            Result_LocalGLMnet_without_Exposure = tf.keras.layers.Dense(units=1, activation='exponential', name='Result_LocalGLMnet_without_Exposure',
                            weights=[np.ones((1, 1)), np.array([initial_glm_bias])],
                            trainable=True)(scalar_product)
            Response = tf.keras.layers.Multiply(name='Result')([Result_LocalGLMnet_without_Exposure, Input_Exposure])
            return tf.keras.models.Model(inputs=[Input_Matrix_OHE, Input_Exposure], outputs=[Response], name='Poisson_LocalGLMnet')

        # create the model:
        # ----------------------
        LocalGLMnet = Create_Poisson_LocalGLMnet(input_dim=40,initial_glm_bias=poisson_glm_dummy.intercept_,initial_glm_betas=poisson_glm_dummy.coef_)

        # load the saved model weights:
        # ----------------------
        LocalGLMnet.load_weights(f'{storage_path}/saved_models/Poisson_LocalGLMnet_{run_index}.weights.h5')

        # predict with the model:
        # ----------------------
        y_pred["train"][f"LocalGLMnet_{run_index}"] = np.array([x for [x] in LocalGLMnet.predict(data_nn_ohe_learn, verbose=0,
                                                                            batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])
        y_pred["test"][f"LocalGLMnet_{run_index}"] = np.array([x for [x] in LocalGLMnet.predict(data_nn_ohe_test, verbose=0,
                                                                        batch_size=100000,use_multiprocessing=True, workers=os.cpu_count())])

        create_single_rebase_results("LocalGLMnet",run_index)
    create_ensemble_results("Rebase_LocalGLMnet",index)

# the notebook keeps global variables alive until they are deleted, so we free the unneeded data here:
del data_nn_ohe_learn, data_nn_ohe_test


range(0, 5)
Model: 0
Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14
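
The LocalGLMnet above produces one regression attention per feature and observation, multiplies it with the feature value and sums the contributions. Because the attention layer starts with a zero kernel and the GLM coefficients as bias, the untrained network reproduces the GLM. The following NumPy sketch illustrates this; all dimensions and values are made up, and the hidden layers are replaced by random activations.

```python
import numpy as np

rng = np.random.default_rng(3)
n, input_dim = 4, 6

x = rng.normal(size=(n, input_dim))
exposure = np.array([1.0, 0.5, 0.75, 1.0])
glm_betas = rng.normal(scale=0.1, size=input_dim)   # made-up GLM coefficients
glm_intercept = -2.5                                # made-up GLM intercept

hidden3 = np.tanh(rng.normal(size=(n, 10)))         # stand-in for the last hidden layer
attention_kernel = np.zeros((10, input_dim))        # zero kernel at initialisation

# regression attentions beta(x): one coefficient per feature and observation
attention = hidden3 @ attention_kernel + glm_betas
# element-wise product with the features, summed over the features
linear_predictor = glm_intercept + np.sum(attention * x, axis=1)
prediction = np.exp(linear_predictor) * exposure

# reference: the plain Poisson GLM with log link
glm_prediction = np.exp(glm_intercept + x @ glm_betas) * exposure
```

Once the kernel moves away from zero during training, the attentions depend on the observation, which is what makes the fitted model a locally varying GLM.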


## 6.5 Rebase FT-Transformer:

In [None]:
# Create the dataframes needed for evaluation:
# --------------------
# NOTE: in the 2021 paper by Gorishniy et al. the batch size differs between the datasets
# but is not hyperparameter tuned: for bigger datasets they used a batch size of 1024 and
# for smaller datasets a batch size of 256 or 512.
batch_size = 1024
learn_data = df_to_tensor(df_freq_prep_nn[bool_in_learn], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)
test_data = df_to_tensor(df_freq_prep_nn[bool_in_test], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)

# NOTE: we use just a fraction of the data here (it is only needed to build the model):
learn_train_dummy_data = df_to_tensor(df_freq_prep_nn[bool_in_learn_train_dummy], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size,
                                      dummy_data_for_build=True)

for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

        print(f"Model: {run_index}")
        # Define FT-Transformer Models:
        # ----------------------
        # NOTE: we use here tensorflow/keras model subclasses (not the functional or sequential API)
        # NOTE: we use a custom training loop instead of the .fit function

        # create the model:
        # ----------------------
        set_random_seeds(int(random_seeds[run_index]))

        FT_transformer = EnhActuar.Feature_Tokenizer_Transformer(
                emb_dim = 32, # NOTE: the default setting in the 2021 Gorishniy et al. paper is emb_dim = 192 (here the number of parameters would go through the roof, so we use something smaller)
                nr_features = nr_col,
                cat_features = cat_col,
                cat_vocabulary = cat_vocabulary,
                count_transformer_blocks = 3,
                attention_n_heads = 8,
                attention_dropout = 0.2,
                ffn_d_hidden = None, # NOTE: None uses the default value (4/3*emb_dim), which is meant for ReGLU; they write that they used 2*emb_dim if not ReGLU.
                ffn_activation_ReGLU = True, # NOTE: set True if ReGLU should be used
                ffn_dropout = 0.1,
                prenormalization = True,
                output_dim = 1,
                last_activation = 'exponential',
                exposure_name = "Exposure",
                seed_nr = int(random_seeds[run_index])
        )

        # call the model once on the dummy data so that its weights are created before they are loaded:
        FT_transformer.predict(learn_train_dummy_data,verbose=0,batch_size=100000)

        # load the best saved model weights:
        # ----------------------
        # FT_transformer = keras.models.load_model(save_path +'/Poisson_FT_transformer')
        FT_transformer.load_weights(f'{storage_path}/saved_models/Poisson_FT_transformer_{run_index}.weights.h5')

        # predict with the model:
        # ----------------------
        y_pred["train"][f"FT_transformer_{run_index}"] = np.array([x for [x] in FT_transformer.predict(learn_data,verbose=0,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                    )["output"]])
        y_pred["test"][f"FT_transformer_{run_index}"] = np.array([x for [x] in FT_transformer.predict(test_data,verbose=0,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                    )["output"]])

        create_single_rebase_results("FT_transformer",run_index)
    create_ensemble_results("Rebase_FT_transformer",index)

# the notebook keeps global variables alive until they are deleted, so we free the unneeded data here:
del learn_data, test_data

range(0, 5)
Model: 0



Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14



## 6.6 Rebase CAFTT:

In [None]:
# create the column Exposure * GLM3 prediction, which the CANN-style models (here CAFTT) use in place of the plain exposure.
df_freq_prep_nn["Exposure_x_GLM3_pred"] = list(poisson_glm3.predict(X_glm3)*df_freq_prep_nn["Exposure"])

# Create the dataframes needed for evaluation:
# --------------------
# NOTE: in the 2021 Gorishniy et al. paper the batch size differs between datasets
# but is not tuned as a hyperparameter. For bigger datasets they used a batch size of 1024 and
# for smaller datasets a batch size of 256 or 512.
batch_size = 1024
learn_data = df_to_tensor(df_freq_prep_nn[bool_in_learn], feature_cols=nr_col+cat_col, exposure="Exposure_x_GLM3_pred", target="ClaimNb", batch_size=batch_size)
test_data = df_to_tensor(df_freq_prep_nn[bool_in_test], feature_cols=nr_col+cat_col, exposure="Exposure_x_GLM3_pred", target="ClaimNb", batch_size=batch_size)

# NOTE: a small fraction of the data is used for a dummy forward pass that builds the model before the weights are loaded:
learn_train_dummy_data = df_to_tensor(df_freq_prep_nn[bool_in_learn_train_dummy], feature_cols=nr_col+cat_col, exposure="Exposure_x_GLM3_pred", target="ClaimNb", batch_size=batch_size,
                                      dummy_data_for_build=True)

for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

        print(f"Model: {run_index}")
        # Define FT-Transformer Models:
        # ----------------------
        # NOTE: we use TensorFlow/Keras model subclasses here (not the functional or sequential API)
        # NOTE: instead of the .fit function we use a custom training loop

        # create the model:
        # ----------------------
        set_random_seeds(int(random_seeds[run_index]))

        FT_transformer = EnhActuar.Feature_Tokenizer_Transformer(
                emb_dim = 32, # NOTE: the default setting in the 2021 Gorishniy et al. paper is emb_dim = 192 (here the number of parameters would go through the roof, so we use a smaller value)
                nr_features = nr_col,
                cat_features = cat_col,
                cat_vocabulary = cat_vocabulary,
                count_transformer_blocks = 3,
                attention_n_heads = 8,
                attention_dropout = 0.2,
                ffn_d_hidden = None, # NOTE: None selects the default hidden size (4/3*emb_dim) used with ReGLU; Gorishniy et al. (2021) write that they used 2*emb_dim without ReGLU.
                ffn_activation_ReGLU = True, # NOTE: set True if ReGLU should be used
                ffn_dropout = 0.1,
                prenormalization = True,
                output_dim = 1,
                last_activation = 'exponential',
                exposure_name = "Exposure_x_GLM3_pred",
                last_layer_initial_weights = "zeros",
                last_layer_initial_bias = "zeros",
                seed_nr = int(random_seeds[run_index])
        )

        FT_transformer.predict(learn_train_dummy_data,verbose=0,batch_size=100000)

        # load the saved weights of the best model:
        # ----------------------
        FT_transformer.load_weights(f'{storage_path}/saved_models/Poisson_CAFTT_{run_index}.weights.h5')

        # predict with the model:
        # ----------------------
        y_pred["train"][f"CAFTT_{run_index}"] = np.array([x for [x] in FT_transformer.predict(learn_data,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                    )["output"]])
        y_pred["test"][f"CAFTT_{run_index}"] = np.array([x for [x] in FT_transformer.predict(test_data,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                    )["output"]])

        create_single_rebase_results("CAFTT",run_index)
    create_ensemble_results("Rebase_CAFTT",index)

# the notebook keeps global variables alive, so we delete the data that is no longer needed to free memory:
del learn_data, test_data

range(0, 5)
Model: 0


The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14
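The CAFTT cell above passes `Exposure_x_GLM3_pred` as the exposure and initialises the last layer with zero weights and zero bias. A minimal sketch of why this setup starts the network exactly at the GLM3 prediction, assuming the model output has the form `exposure * exp(last_layer_output)` (an assumption about `EnhActuar.Feature_Tokenizer_Transformer`, whose code is not shown in this section). All arrays are made-up illustration values:

```python
import numpy as np

# Hypothetical illustration values (not taken from the data set).
exposure = np.array([0.5, 1.0, 0.25])     # policy exposure in years
glm_freq = np.array([0.08, 0.05, 0.12])   # GLM claim frequency per year

# CANN-style offset: the GLM prediction is folded into the exposure.
offset = exposure * glm_freq

# A last layer with zero weights and zero bias gives a pre-activation of 0
# for every row, whatever the hidden features look like.
hidden = np.random.default_rng(0).normal(size=(3, 4))  # arbitrary hidden features
last_w = np.zeros((4, 1))
last_b = np.zeros(1)
pre_activation = hidden @ last_w + last_b

# Assumed output form: offset * exp(pre_activation) = offset * 1.
pred_claims = offset * np.exp(pre_activation[:, 0])
print(pred_claims)
```

Under this assumption, training then only has to learn a multiplicative correction on top of the GLM.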





## 6.7 Rebase LocalGLMftt:

In [None]:
# Create the one-hot-encoded data used to fit the GLM that provides the initial weights:
# --------------------
data_nn_ohe_learn, y_true_learn = create_ffn_ohe_data(bool_in_learn)

# create dummy glm for initial weights
# ----------------------
poisson_glm_dummy = PoissonRegressor(alpha = 0,max_iter=1000) # scikit-learn.org: alpha = 0 is equivalent to unpenalized GLMs
poisson_glm_dummy.fit(data_nn_ohe_learn[0],y_true_learn/data_nn_ohe_learn[1],sample_weight=data_nn_ohe_learn[1]) # note: data_nn_ohe_learn = [X_ohe,exposure]
# get the betas from the glm:
glm_nr_col_betas = poisson_glm_dummy.coef_[:len(nr_col)]
current_beta_index = len(nr_col)
glm_cat_col_betas = {}
for c in cat_vocabulary.keys():
    glm_cat_col_betas[c] = poisson_glm_dummy.coef_[current_beta_index:current_beta_index+len(cat_vocabulary[c])]
    current_beta_index += len(cat_vocabulary[c])
glm_intercept = poisson_glm_dummy.intercept_


# Create the dataframes needed for evaluation:
# --------------------
# NOTE: in the 2021 Gorishniy et al. paper the batch size differs between datasets
# but is not tuned as a hyperparameter. For bigger datasets they used a batch size of 1024 and
# for smaller datasets a batch size of 256 or 512.
batch_size = 1024
learn_data = df_to_tensor(df_freq_prep_nn[bool_in_learn], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)
test_data = df_to_tensor(df_freq_prep_nn[bool_in_test], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size)

learn_train_dummy_data = df_to_tensor(df_freq_prep_nn[bool_in_learn_train_dummy], feature_cols=nr_col+cat_col, exposure="Exposure", target="ClaimNb", batch_size=batch_size,
                                      dummy_data_for_build=True)

for index, ensemble_range in enumerate([range(0,5),range(5,10),range(10,15)]):
    print(ensemble_range)
    for run_index in ensemble_range:

        print(f"Model: {run_index}")
        # Define FT-Transformer Models:
        # ----------------------
        # NOTE: we use TensorFlow/Keras model subclasses here (not the functional or sequential API)
        # NOTE: instead of the .fit function we use a custom training loop

        # create the model:
        # ----------------------
        set_random_seeds(int(random_seeds[run_index]))
        LocalGLMftt = EnhActuar.LocalGLM_FT_Transformer(
                emb_dim = 32, # NOTE: the default setting in the 2021 Gorishniy et al. paper is emb_dim = 192 (here the number of parameters would go through the roof, so we use a smaller value)
                nr_features = nr_col,
                cat_features = cat_col,
                cat_vocabulary = cat_vocabulary,
                count_transformer_blocks = 3,
                attention_n_heads = 8,
                attention_dropout = 0.2,
                ffn_d_hidden = None, # NOTE: None selects the default hidden size (4/3*emb_dim) used with ReGLU; Gorishniy et al. (2021) write that they used 2*emb_dim without ReGLU.
                ffn_activation_ReGLU = True, # NOTE: set True if ReGLU should be used
                ffn_dropout = 0.1,
                prenormalization = True,
                output_dim = 1,
                last_activation = 'exponential',
                exposure_name = "Exposure",
                last_layer_initial_weights = "zeros",
                last_layer_initial_bias = "ones",
                init_glm_cat_col_weights = glm_cat_col_betas,
                init_glm_nr_col_weights = glm_nr_col_betas,
                init_glm_bias = glm_intercept,
                trainable_glm_emb = False,
                seed_nr = int(random_seeds[run_index])
        )

        LocalGLMftt.predict(learn_train_dummy_data,verbose=0,batch_size=100000)

        # load the saved weights of the best model:
        # ----------------------
        LocalGLMftt.load_weights(f'{storage_path}/saved_models/Poisson_LocalGLMftt_{run_index}.weights.h5')

        # predict with the model:
        # ----------------------
        y_pred["train"][f"LocalGLMftt_{run_index}"] = np.array([x for [x] in LocalGLMftt.predict(learn_data,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                    )["output"]])
        y_pred["test"][f"LocalGLMftt_{run_index}"] = np.array([x for [x] in LocalGLMftt.predict(test_data,batch_size=100000,use_multiprocessing=True, workers=os.cpu_count()
                                                                                    )["output"]])

        create_single_rebase_results("LocalGLMftt",run_index)
    create_ensemble_results("Rebase_LocalGLMftt",index)

# the notebook keeps global variables alive, so we delete the data that is no longer needed to free memory:
del learn_data, test_data



range(0, 5)
Model: 0


The `keras.initializers.serialize()` API should only be used for objects of type `keras.initializers.Initializer`. Found an instance of type <class 'tensorflow.python.ops.init_ops_v2.RandomUniform'>, which may lead to improper serialization.



Model: 1
Model: 2
Model: 3
Model: 4
range(5, 10)
Model: 5
Model: 6
Model: 7
Model: 8
Model: 9
range(10, 15)
Model: 10
Model: 11
Model: 12
Model: 13
Model: 14
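The coefficient slicing in the LocalGLMftt cell relies on the one-hot design matrix listing the numerical columns first, followed by one block of columns per categorical feature in the order of `cat_vocabulary`. A small sketch of that bookkeeping with a made-up coefficient vector (the feature names, levels and values below are hypothetical and only stand in for `poisson_glm_dummy.coef_`):

```python
import numpy as np

# Hypothetical stand-ins for nr_col, cat_vocabulary and the fitted coefficients.
nr_col_demo = ["age", "power"]
cat_vocabulary_demo = {"brand": ["B1", "B2", "B3"], "region": ["R1", "R2"]}
coef_demo = np.arange(7, dtype=float)  # 2 numerical + 3 + 2 one-hot columns

# Numerical betas come first ...
nr_betas = coef_demo[:len(nr_col_demo)]

# ... then one contiguous slice per categorical feature.
cat_betas = {}
start = len(nr_col_demo)
for name, levels in cat_vocabulary_demo.items():
    cat_betas[name] = coef_demo[start:start + len(levels)]
    start += len(levels)

# Sanity check: every coefficient is consumed exactly once.
assert start == len(coef_demo)
print(nr_betas, cat_betas)
```

A check like the final `assert` would catch a mismatch between the column order of the one-hot data and the vocabulary, which would otherwise silently assign betas to the wrong levels.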





In [None]:
# # save the results:
# with open(f'{storage_path}/Data/df_results.pickle', 'wb') as handle:
#     pickle.dump(df_results, handle, protocol=pickle.HIGHEST_PROTOCOL)

# load the results:
with open(f'{storage_path}/Data/df_results.pickle', 'rb') as handle:
    df_results = pickle.load(handle)


# Results:

In [46]:
# display(df_results)

In [48]:
print("Results Average:")
display(calc_avg_df(["homogeneous model","GLM1","GLM2","GLM3","FFN_OHE","FNN_CAT_EMB","CANN","LocalGLMnet","FT_transformer","CAFTT","LocalGLMftt"]))
print("Results Standard-Deviation:")
display(calc_std_df(["homogeneous model","GLM1","GLM2","GLM3","FFN_OHE","FNN_CAT_EMB","CANN","LocalGLMnet","FT_transformer","CAFTT","LocalGLMftt"]))

Results Average:


| | model | epochs | run_time | nr_parameters | loss_train | loss_test | pred_avg_freq_train | pred_avg_freq_test |
|---|---|---|---|---|---|---|---|---|
| 0 | homogeneous model | 0.0 | 0.054847 | 1.0 | 0.252132 | 0.254454 | 0.073631 | 0.073631 |
| 1 | GLM1 | 0.0 | 2.220158 | 49.0 | 0.241015 | 0.241463 | 0.073631 | 0.0739 |
| 2 | GLM2 | 0.0 | 2.752693 | 48.0 | 0.240911 | 0.241125 | 0.073631 | 0.073981 |
| 3 | GLM3 | 0.0 | 1.900497 | 50.0 | 0.240844 | 0.241022 | 0.073631 | 0.074048 |
| 4 | FFN_OHE | 42.2 | 37.80556 | 1306.0 | 0.237535 | 0.238652 | 0.073906 | 0.07431 |
| 5 | FNN_CAT_EMB | 72.933333 | 58.728892 | 792.0 | 0.237682 | 0.238267 | 0.073774 | 0.074238 |
| 6 | CANN | 90.333333 | 68.55912 | 792.0 | 0.23742 | 0.238102 | 0.074019 | 0.074438 |
| 7 | LocalGLMnet | 25.333333 | 29.720892 | 1737.0 | 0.237095 | 0.239211 | 0.073825 | 0.074267 |
| 8 | FT_transformer | 78.866667 | 1569.86041 | 27133.0 | 0.237803 | 0.239389 | 0.06114 | 0.06129 |
| 9 | CAFTT | 57.333333 | 1170.160723 | 27133.0 | 0.237146 | 0.238072 | 0.065975 | 0.066235 |


Results Standard-Deviation:


| | model | epochs | run_time | nr_parameters | loss_train | loss_test | pred_avg_freq_train | pred_avg_freq_test |
|---|---|---|---|---|---|---|---|---|
| 0 | homogeneous model | 0.0 | 0.002512 | 0.0 | 0.0 | 5.74595e-17 | 2.872975e-17 | 2.872975e-17 |
| 1 | GLM1 | 0.0 | 0.473819 | 0.0 | 5.74595e-17 | 5.74595e-17 | 1.436488e-17 | 0.0 |
| 2 | GLM2 | 0.0 | 0.970156 | 0.0 | 5.74595e-17 | 2.872975e-17 | 0.0 | 1.436488e-17 |
| 3 | GLM3 | 0.0 | 0.408252 | 0.0 | 2.872975e-17 | 2.872975e-17 | 0.0 | 1.436488e-17 |
| 4 | FFN_OHE | 14.663853 | 8.624492 | 0.0 | 0.0003255191 | 0.0001570462 | 0.001223993 | 0.001209107 |
| 5 | FNN_CAT_EMB | 21.661245 | 13.907265 | 0.0 | 0.0001590947 | 0.0001514444 | 0.001071399 | 0.001088943 |
| 6 | CANN | 53.898935 | 33.162059 | 0.0 | 0.0006076588 | 0.0003253586 | 0.001111365 | 0.001103431 |
| 7 | LocalGLMnet | 7.622023 | 5.097405 | 0.0 | 0.000334063 | 0.0002176521 | 0.0008787938 | 0.0009078453 |
| 8 | FT_transformer | 16.638452 | 302.316624 | 0.0 | 0.000898291 | 0.0005281384 | 0.001459836 | 0.001458034 |
| 9 | CAFTT | 14.185841 | 220.449926 | 0.0 | 0.0004739992 | 0.0001743234 | 0.0004993066 | 0.0004707813 |
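The two tables above summarise several runs per model as a mean and a standard deviation. A sketch of how such summaries could be computed with pandas, assuming a long results table with one row per model and run. The toy frame, its values and the reduced column set are hypothetical, and the notebook's own `calc_avg_df` and `calc_std_df` helpers (defined elsewhere) may work differently:

```python
import pandas as pd

# Hypothetical long-format results: one row per (model, run).
toy_results = pd.DataFrame({
    "model": ["GLM1", "GLM1", "GLM1", "CANN", "CANN", "CANN"],
    "loss_test": [0.2415, 0.2415, 0.2415, 0.2378, 0.2381, 0.2384],
})

# Mean and sample standard deviation (ddof=1, the pandas default) per model,
# keeping the models in their order of first appearance.
avg_df = toy_results.groupby("model", sort=False).mean().reset_index()
std_df = toy_results.groupby("model", sort=False).std().reset_index()

print(avg_df)
print(std_df)
```

In this toy example the deterministic model has a spread of essentially zero. The entries of order 1e-17 in the GLM rows above are presumably the same effect, showing up as floating-point noise rather than real variation between runs.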


## Compare Single Model Results to Ensemble (not rebased) Results:

In [49]:
print("Results Average:")
display(calc_avg_df(["FFN_OHE","Ensemble_FFN_OHE",
                     "FNN_CAT_EMB","Ensemble_FNN_CAT_EMB",
                     "CANN","Ensemble_CANN",
                     "LocalGLMnet","Ensemble_LocalGLMnet",
                     "FT_transformer","Ensemble_FT_transformer",
                     "CAFTT", "Ensemble_CAFTT",
                     "LocalGLMftt", "Ensemble_LocalGLMftt"]))
print("Results Standard-Deviation:")
display(calc_std_df(["FFN_OHE","Ensemble_FFN_OHE",
                     "FNN_CAT_EMB","Ensemble_FNN_CAT_EMB",
                     "CANN","Ensemble_CANN",
                     "LocalGLMnet","Ensemble_LocalGLMnet",
                     "FT_transformer","Ensemble_FT_transformer",
                     "CAFTT", "Ensemble_CAFTT",
                     "LocalGLMftt", "Ensemble_LocalGLMftt"]))

Results Average:


| | model | epochs | run_time | nr_parameters | loss_train | loss_test | pred_avg_freq_train | pred_avg_freq_test |
|---|---|---|---|---|---|---|---|---|
| 0 | FFN_OHE | 42.2 | 37.80556 | 1306.0 | 0.237535 | 0.238652 | 0.073906 | 0.07431 |
| 1 | Ensemble_FFN_OHE | 0.0 | 0.0 | 0.0 | 0.237152 | 0.23826 | 0.073906 | 0.07431 |
| 2 | FNN_CAT_EMB | 72.933333 | 58.728892 | 792.0 | 0.237682 | 0.238267 | 0.073774 | 0.074238 |
| 3 | Ensemble_FNN_CAT_EMB | 0.0 | 0.0 | 0.0 | 0.237432 | 0.238009 | 0.073773 | 0.074238 |
| 4 | CANN | 90.333333 | 68.55912 | 792.0 | 0.23742 | 0.238102 | 0.074019 | 0.074438 |
| 5 | Ensemble_CANN | 0.0 | 0.0 | 0.0 | 0.237013 | 0.237699 | 0.074019 | 0.074438 |
| 6 | LocalGLMnet | 25.333333 | 29.720892 | 1737.0 | 0.237095 | 0.239211 | 0.073825 | 0.074267 |
| 7 | Ensemble_LocalGLMnet | 0.0 | 0.0 | 0.0 | 0.236635 | 0.238734 | 0.073825 | 0.074267 |
| 8 | FT_transformer | 78.866667 | 1569.86041 | 27133.0 | 0.237803 | 0.239389 | 0.06114 | 0.06129 |
| 9 | Ensemble_FT_transformer | 0.0 | 0.0 | 0.0 | 0.237176 | 0.238803 | 0.06114 | 0.06129 |


Results Standard-Deviation:


| | model | epochs | run_time | nr_parameters | loss_train | loss_test | pred_avg_freq_train | pred_avg_freq_test |
|---|---|---|---|---|---|---|---|---|
| 0 | FFN_OHE | 14.663853 | 8.624492 | 0.0 | 0.000326 | 0.000157 | 0.001224 | 0.001209 |
| 1 | Ensemble_FFN_OHE | 0.0 | 0.0 | 0.0 | 0.000224 | 9.8e-05 | 0.000648 | 0.000667 |
| 2 | FNN_CAT_EMB | 21.661245 | 13.907265 | 0.0 | 0.000159 | 0.000151 | 0.001071 | 0.001089 |
| 3 | Ensemble_FNN_CAT_EMB | 0.0 | 0.0 | 0.0 | 3.4e-05 | 0.000109 | 0.000603 | 0.00063 |
| 4 | CANN | 53.898935 | 33.162059 | 0.0 | 0.000608 | 0.000325 | 0.001111 | 0.001103 |
| 5 | Ensemble_CANN | 0.0 | 0.0 | 0.0 | 0.000434 | 0.000299 | 0.000732 | 0.000739 |
| 6 | LocalGLMnet | 7.622023 | 5.097405 | 0.0 | 0.000334 | 0.000218 | 0.000879 | 0.000908 |
| 7 | Ensemble_LocalGLMnet | 0.0 | 0.0 | 0.0 | 0.000133 | 1.7e-05 | 0.000105 | 0.000106 |
| 8 | FT_transformer | 16.638452 | 302.316624 | 0.0 | 0.000898 | 0.000528 | 0.00146 | 0.001458 |
| 9 | Ensemble_FT_transformer | 0.0 | 0.0 | 0.0 | 6.6e-05 | 0.000143 | 0.000132 | 0.000109 |
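In the averages above, every `Ensemble_*` row has a lower loss than the mean loss of the corresponding single models. This is what one would expect if the ensemble averages the predictions of its members and the loss is the Poisson deviance, because the deviance is convex in the prediction. Both points are assumptions here, since `create_ensemble_results` and the loss function are defined elsewhere in the notebook. A minimal sketch on simulated data (all names and values are hypothetical):

```python
import numpy as np

def poisson_deviance(y_true, y_pred):
    """Mean Poisson deviance; y_pred are expected claim counts (> 0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # The term y * log(y / mu) is taken as 0 where y == 0.
    ratio = np.where(y_true > 0, y_true / y_pred, 1.0)
    dev = 2.0 * (y_pred - y_true + y_true * np.log(ratio))
    return dev.mean()

rng = np.random.default_rng(0)
n_policies, n_runs = 1000, 5

# Simulated "true" expected claim counts and observed claims.
true_mean = rng.uniform(0.02, 0.2, size=n_policies)
y_true = rng.poisson(true_mean)

# Each run predicts the true mean times independent multiplicative noise.
run_preds = true_mean * np.exp(rng.normal(0.0, 0.3, size=(n_runs, n_policies)))

single_losses = [poisson_deviance(y_true, p) for p in run_preds]
ensemble_pred = run_preds.mean(axis=0)          # average the member predictions
ensemble_loss = poisson_deviance(y_true, ensemble_pred)

print(np.mean(single_losses), ensemble_loss)
```

By Jensen's inequality the ensemble loss can never exceed the average single-model loss under these assumptions. How large the gain is depends on how much the members disagree.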


## Rebased Results (FT_transformer, CAFTT, LocalGLMftt):

In [51]:
print("Results Average:")
display(calc_avg_df(["FT_transformer","Rebase_FT_transformer","Ensemble_Rebase_FT_transformer","CAFTT","Rebase_CAFTT","Ensemble_Rebase_CAFTT","LocalGLMftt","Rebase_LocalGLMftt","Ensemble_Rebase_LocalGLMftt"]))
print("Results Standard-Deviation:")
display(calc_std_df(["FT_transformer","Rebase_FT_transformer","Ensemble_Rebase_FT_transformer","CAFTT","Rebase_CAFTT","Ensemble_Rebase_CAFTT","LocalGLMftt","Rebase_LocalGLMftt","Ensemble_Rebase_LocalGLMftt"]))

Results Average:


| | model | epochs | run_time | nr_parameters | loss_train | loss_test | pred_avg_freq_train | pred_avg_freq_test |
|---|---|---|---|---|---|---|---|---|
| 0 | FT_transformer | 78.866667 | 1569.86041 | 27133.0 | 0.237803 | 0.239389 | 0.06114 | 0.06129 |
| 1 | Rebase_FT_transformer | 0.0 | 0.0 | 0.0 | 0.236517 | 0.238149 | 0.073631 | 0.073812 |
| 2 | Ensemble_Rebase_FT_transformer | 0.0 | 0.0 | 0.0 | 0.235922 | 0.237587 | 0.073631 | 0.073812 |
| 3 | CAFTT | 57.333333 | 1170.160723 | 27133.0 | 0.237146 | 0.238072 | 0.065975 | 0.066235 |
| 4 | Rebase_CAFTT | 0.0 | 0.0 | 0.0 | 0.236692 | 0.237659 | 0.073631 | 0.073921 |
| 5 | Ensemble_Rebase_CAFTT | 0.0 | 0.0 | 0.0 | 0.236296 | 0.237263 | 0.073631 | 0.073921 |
| 6 | LocalGLMftt | 53.2 | 1187.006637 | 27430.0 | 0.237214 | 0.238801 | 0.067904 | 0.068316 |
| 7 | Rebase_LocalGLMftt | 0.0 | 0.0 | 0.0 | 0.236958 | 0.238589 | 0.073631 | 0.074077 |
| 8 | Ensemble_Rebase_LocalGLMftt | 0.0 | 0.0 | 0.0 | 0.236448 | 0.238111 | 0.073631 | 0.074077 |


Results Standard-Deviation:


| | model | epochs | run_time | nr_parameters | loss_train | loss_test | pred_avg_freq_train | pred_avg_freq_test |
|---|---|---|---|---|---|---|---|---|
| 0 | FT_transformer | 16.638452 | 302.316624 | 0.0 | 0.000898 | 0.000528 | 0.001459836 | 0.001458 |
| 1 | Rebase_FT_transformer | 0.0 | 0.0 | 0.0 | 0.000636 | 0.00036 | 1.299345e-08 | 0.000106 |
| 2 | Ensemble_Rebase_FT_transformer | 0.0 | 0.0 | 0.0 | 3.5e-05 | 0.000144 | 1.947447e-08 | 2.7e-05 |
| 3 | CAFTT | 14.185841 | 220.449926 | 0.0 | 0.000474 | 0.000174 | 0.0004993066 | 0.000471 |
| 4 | Rebase_CAFTT | 0.0 | 0.0 | 0.0 | 0.000458 | 0.000171 | 1.597395e-08 | 5.5e-05 |
| 5 | Ensemble_Rebase_CAFTT | 0.0 | 0.0 | 0.0 | 0.000185 | 6e-05 | 3.497719e-09 | 1.4e-05 |
| 6 | LocalGLMftt | 16.384662 | 310.648783 | 0.0 | 0.000593 | 0.00016 | 0.0009643511 | 0.000992 |
| 7 | Rebase_LocalGLMftt | 0.0 | 0.0 | 0.0 | 0.000659 | 0.00017 | 1.735819e-08 | 6.7e-05 |
| 8 | Ensemble_Rebase_LocalGLMftt | 0.0 | 0.0 | 0.0 | 0.000338 | 9.9e-05 | 2.731805e-08 | 2.8e-05 |
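In the averages above, all `Rebase_*` rows share `pred_avg_freq_train` = 0.073631, while the un-rebased transformer models lie between roughly 0.061 and 0.068. This is consistent with a rebase that multiplies all predictions by one constant so that predicted and observed claim totals agree on the learning data. A minimal sketch under that assumption, using simulated data (`create_single_rebase_results` is defined elsewhere in the notebook and may work differently):

```python
import numpy as np

rng = np.random.default_rng(1)
n_policies = 500

# Simulated exposures and observed claim counts (true frequency 0.10).
exposure = rng.uniform(0.1, 1.0, size=n_policies)
y_true = rng.poisson(0.10 * exposure)

# Predicted claim counts that are biased low on purpose (frequency about 0.08).
y_pred = 0.08 * exposure * np.exp(rng.normal(0.0, 0.2, size=n_policies))

# One global factor that matches total predicted to total observed claims.
rebase_factor = y_true.sum() / y_pred.sum()
y_pred_rebased = rebase_factor * y_pred

avg_freq_true = y_true.sum() / exposure.sum()
avg_freq_before = y_pred.sum() / exposure.sum()
avg_freq_rebased = y_pred_rebased.sum() / exposure.sum()
print(avg_freq_true, avg_freq_before, avg_freq_rebased)
```

Because the factor is estimated on the learning data only, the average predicted frequency on the test data need not match the observed one exactly, which would fit the slightly different `pred_avg_freq_test` values in the table.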


# Information about the Packages and Environment:

In [54]:
#print python version
print("the python version is:")
print(sys.version)
print(sys.version_info)
print()
print("print pip list:")
!pip list
print()
print("print conda list:")
!conda list

the python version is:
3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
sys.version_info(major=3, minor=10, micro=12, releaselevel='final', serial=0)

print pip list:
Package                          Version
-------------------------------- ---------------------
absl-py                          1.4.0
aiohttp                          3.8.6
aiosignal                        1.3.1
alabaster                        0.7.13
albumentations                   1.3.1
altair                           4.2.2
anyio                            3.7.1
appdirs                          1.4.4
argon2-cffi                      23.1.0
argon2-cffi-bindings             21.2.0
array-record                     0.5.0
arviz                            0.15.1
astropy                          5.3.4
astunparse                       1.6.3
async-timeout                    4.0.3
atpublic                         4.0
attrs                            23.1.0
audioread                        3.0.1
autograd                      