# Student Performance from Game Play Using TensorFlow Decision Forests

---

This notebook walks through the steps needed to train a baseline Gradient Boosted Trees model with TensorFlow Decision Forests on the `Student Performance from Game Play` dataset made available for this competition, to predict whether players will answer questions correctly.
We will load the data from a CSV file. Roughly, the code looks like this:

```
import tensorflow_decision_forests as tfdf
import pandas as pd
  
dataset = pd.read_csv("project/dataset.csv")
tf_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(dataset, label="my_label")

model = tfdf.keras.GradientBoostedTreesModel()
model.fit(tf_dataset)
  
model.summary()  # prints the summary itself and returns None, so no print() is needed
```

We will also see how to read a large dataset efficiently, do some feature engineering and data visualization, and evaluate the results with the F1 score in addition to accuracy.


Decision Forests are a family of tree-based models including Random Forests and Gradient Boosted Trees. They are a good place to start when working with tabular data: they often outperform neural networks there, or at least provide a strong baseline before you begin experimenting with them.

One of the key aspects of TensorFlow Decision Forests that makes it even more suitable for this competition, particularly given the runtime limitations, is that it has been extensively tested for training and inference on CPUs, making it possible to train it on lower-end machines.

# Import the Required Libraries

In [2]:
import tensorflow as tf
import tensorflow_addons as tfa
import tensorflow_decision_forests as tfdf

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


import gc
import os

import warnings
import pickle
import polars as pl

from collections import defaultdict
from itertools import combinations
import pyarrow as pa

from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split


2023-06-07 12:02:12.643830: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-06-07 12:02:12.662815: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://gi

In [3]:
print("TensorFlow Decision Forests v" + tfdf.__version__)
print("TensorFlow Addons v" + tfa.__version__)
print("TensorFlow v" + tf.__version__)

TensorFlow Decision Forests v1.3.0
TensorFlow Addons v0.20.0
TensorFlow v2.12.0


# Load the Dataset

The dataset is large (about 26 million rows), so reading it from the CSV with default settings can cause out-of-memory errors. To avoid this, we will reduce the memory Pandas uses to load and store the dataset.


When Pandas loads a dataset, it infers the data type of each column by default.
Irrespective of the maximum value stored in a column, Pandas assigns `int64` to integer columns, `float64` to floating-point columns, `object` to string columns, and so on.


We can reduce the size of these columns in memory by downcasting numerical columns to smaller types (such as `int8`, `int32`, or `float32`) whenever their values fit and do not need the larger types (such as `int64` or `float64`).


Similarly, Pandas automatically detects string columns as `object` datatype. To reduce memory usage of string columns which store categorical data, we specify their datatype as `category`.


Many of the columns in this dataset can be downcast to smaller types.

We will pass a `dtypes` dict, mapping each column name to its type, to Pandas when reading the dataset.
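As a rough illustration of the savings, the sketch below builds a small synthetic frame and compares its memory use before and after casting. The values are made up; only the column names mirror this dataset, and the exact byte counts will vary with the pandas version.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a slice of the event log; all values are made up.
n = 10_000
raw = pd.DataFrame({
    "elapsed_time": np.arange(n, dtype=np.int64),
    "level": (np.arange(n) % 23).astype(np.int64),
    "room_coor_x": np.linspace(-500.0, 500.0, n),  # float64 by default
    "event_name": ["navigate_click", "person_click"] * (n // 2),
})

# Same idea as a `dtypes` dict: smaller numeric types, `category` for strings.
small = raw.astype({
    "elapsed_time": np.int32,
    "level": np.uint8,
    "room_coor_x": np.float32,
    "event_name": "category",
})

# deep=True also counts the Python string objects behind string columns.
raw_bytes = int(raw.memory_usage(deep=True).sum())
small_bytes = int(small.memory_usage(deep=True).sum())
print(f"before: {raw_bytes:,} bytes, after: {small_bytes:,} bytes")
```

Downcasting is only safe when every value fits the smaller type. `np.uint8`, for example, covers 0 to 255, which is enough for a level number but not for a timestamp in milliseconds.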

In [4]:
dtypes = {
    'session_id': 'category',
    'elapsed_time': np.int32,
    'index': np.int32,
    'event_name': 'category',
    'name': 'category',
    'level': np.uint8,
    'page': 'category',
    'room_coor_x': np.float32,
    'room_coor_y': np.float32,
    'screen_coor_x': np.float32,
    'screen_coor_y': np.float32,
    'hover_duration': np.float32,
    'text': 'category',
    'fqid': 'category',
    'room_fqid': 'category',
    'text_fqid': 'category',
    'fullscreen': 'category',
    'hq': 'category',
    'music': 'category',
    'level_group': 'category',
}
work_path = 'data/predict-student-performance-from-game-play/'
train_df = pd.read_csv(work_path + 'train.csv', dtype=dtypes)
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26296946 entries, 0 to 26296945
Data columns (total 20 columns):
 #   Column          Dtype   
---  ------          -----   
 0   session_id      category
 1   index           int32   
 2   elapsed_time    int32   
 3   event_name      category
 4   name            category
 5   level           uint8   
 6   page            category
 7   room_coor_x     float32 
 8   room_coor_y     float32 
 9   screen_coor_x   float32 
 10  screen_coor_y   float32 
 11  hover_duration  float32 
 12  text            category
 13  fqid            category
 14  room_fqid       category
 15  text_fqid       category
 16  fullscreen      category
 17  hq              category
 18  music           category
 19  level_group     category
dtypes: category(12), float32(5), int32(2), uint8(1)
memory usage: 1.1 GB


### Useful info

In [5]:
# CATS = ['event_name', 'name', 'fqid', 'room_fqid', 'text_fqid']
# NUMS = ['page', 'room_coor_x', 'room_coor_y', 'screen_coor_x', 'screen_coor_y',
#         'hover_duration', 'elapsed_time_diff']
# fqid_lists = ['worker', 'archivist', 'gramps', 'wells', 'toentry', 'confrontation', 'crane_ranger', 'groupconvo', 'flag_girl', 'tomap', 'tostacks', 'tobasement', 'archivist_glasses', 'boss', 'journals', 'seescratches', 'groupconvo_flag', 'cs', 'teddy', 'expert', 'businesscards', 'ch3start', 'tunic.historicalsociety', 'tofrontdesk', 'savedteddy', 'plaque', 'glasses', 'tunic.drycleaner', 'reader_flag', 'tunic.library', 'tracks', 'tunic.capitol_2', 'trigger_scarf', 'reader', 'directory', 'tunic.capitol_1', 'journals.pic_0.next', 'unlockdoor', 'tunic', 'what_happened', 'tunic.kohlcenter', 'tunic.humanecology', 'colorbook', 'logbook', 'businesscards.card_0.next', 'journals.hub.topics', 'logbook.page.bingo', 'journals.pic_1.next', 'journals_flag', 'reader.paper0.next', 'tracks.hub.deer', 'reader_flag.paper0.next', 'trigger_coffee', 'wellsbadge', 'journals.pic_2.next', 'tomicrofiche', 'journals_flag.pic_0.bingo', 'plaque.face.date', 'notebook', 'tocloset_dirty', 'businesscards.card_bingo.bingo', 'businesscards.card_1.next', 'tunic.wildlife', 'tunic.hub.slip', 'tocage', 'journals.pic_2.bingo', 'tocollectionflag', 'tocollection', 'chap4_finale_c', 'chap2_finale_c', 'lockeddoor', 'journals_flag.hub.topics', 'tunic.capitol_0', 'reader_flag.paper2.bingo', 'photo', 'tunic.flaghouse', 'reader.paper1.next', 'directory.closeup.archivist', 'intro', 'businesscards.card_bingo.next', 'reader.paper2.bingo', 'retirement_letter', 'remove_cup', 'journals_flag.pic_0.next', 'magnify', 'coffee', 'key', 'togrampa', 'reader_flag.paper1.next', 'janitor', 'tohallway', 'chap1_finale', 'report', 'outtolunch', 'journals_flag.hub.topics_old', 'journals_flag.pic_1.next', 'reader.paper2.next', 'chap1_finale_c', 'reader_flag.paper2.next', 'door_block_talk', 'journals_flag.pic_1.bingo', 'journals_flag.pic_2.next', 'journals_flag.pic_2.bingo', 'block_magnify', 'reader.paper0.prev', 'block', 'reader_flag.paper0.prev', 'block_0', 'door_block_clean', 'reader.paper2.prev', 'reader.paper1.prev', 
# 'doorblock', 'tocloset', 'reader_flag.paper2.prev', 'reader_flag.paper1.prev', 'block_tomap2', 'journals_flag.pic_0_old.next', 'journals_flag.pic_1_old.next', 'block_tocollection', 'block_nelson', 'journals_flag.pic_2_old.next', 'block_tomap1', 'block_badge', 'need_glasses', 'block_badge_2', 'fox', 'block_1']

# name_feature = ['basic', 'undefined', 'close', 'open', 'prev', 'next']
# event_name_feature = ['cutscene_click', 'person_click', 'navigate_click',
#        'observation_click', 'notification_click', 'object_click',
#        'object_hover', 'map_hover', 'map_click', 'checkpoint',
#        'notebook_click']
# text_lists = ['tunic.historicalsociety.cage.confrontation', 'tunic.wildlife.center.crane_ranger.crane', 'tunic.historicalsociety.frontdesk.archivist.newspaper', 'tunic.historicalsociety.entry.groupconvo', 'tunic.wildlife.center.wells.nodeer', 'tunic.historicalsociety.frontdesk.archivist.have_glass', 'tunic.drycleaner.frontdesk.worker.hub', 'tunic.historicalsociety.closet_dirty.gramps.news', 'tunic.humanecology.frontdesk.worker.intro', 'tunic.historicalsociety.frontdesk.archivist_glasses.confrontation', 'tunic.historicalsociety.basement.seescratches', 'tunic.historicalsociety.collection.cs', 'tunic.flaghouse.entry.flag_girl.hello', 'tunic.historicalsociety.collection.gramps.found', 'tunic.historicalsociety.basement.ch3start', 'tunic.historicalsociety.entry.groupconvo_flag', 'tunic.library.frontdesk.worker.hello', 'tunic.library.frontdesk.worker.wells', 'tunic.historicalsociety.collection_flag.gramps.flag', 'tunic.historicalsociety.basement.savedteddy', 'tunic.library.frontdesk.worker.nelson', 'tunic.wildlife.center.expert.removed_cup', 'tunic.library.frontdesk.worker.flag', 'tunic.historicalsociety.frontdesk.archivist.hello', 'tunic.historicalsociety.closet.gramps.intro_0_cs_0', 'tunic.historicalsociety.entry.boss.flag', 'tunic.flaghouse.entry.flag_girl.symbol', 'tunic.historicalsociety.closet_dirty.trigger_scarf', 'tunic.drycleaner.frontdesk.worker.done', 'tunic.historicalsociety.closet_dirty.what_happened', 'tunic.wildlife.center.wells.animals', 'tunic.historicalsociety.closet.teddy.intro_0_cs_0', 'tunic.historicalsociety.cage.glasses.afterteddy', 'tunic.historicalsociety.cage.teddy.trapped', 'tunic.historicalsociety.cage.unlockdoor', 'tunic.historicalsociety.stacks.journals.pic_2.bingo', 'tunic.historicalsociety.entry.wells.flag', 'tunic.humanecology.frontdesk.worker.badger', 'tunic.historicalsociety.stacks.journals_flag.pic_0.bingo', 'tunic.historicalsociety.closet.intro', 'tunic.historicalsociety.closet.retirement_letter.hub', 
# 'tunic.historicalsociety.entry.directory.closeup.archivist', 'tunic.historicalsociety.collection.tunic.slip', 'tunic.kohlcenter.halloffame.plaque.face.date', 'tunic.historicalsociety.closet_dirty.trigger_coffee', 'tunic.drycleaner.frontdesk.logbook.page.bingo', 'tunic.library.microfiche.reader.paper2.bingo', 'tunic.kohlcenter.halloffame.togrampa', 'tunic.capitol_2.hall.boss.haveyougotit', 'tunic.wildlife.center.wells.nodeer_recap', 'tunic.historicalsociety.cage.glasses.beforeteddy', 'tunic.historicalsociety.closet_dirty.gramps.helpclean', 'tunic.wildlife.center.expert.recap', 'tunic.historicalsociety.frontdesk.archivist.have_glass_recap', 'tunic.historicalsociety.stacks.journals_flag.pic_1.bingo', 'tunic.historicalsociety.cage.lockeddoor', 'tunic.historicalsociety.stacks.journals_flag.pic_2.bingo', 'tunic.historicalsociety.collection.gramps.lost', 'tunic.historicalsociety.closet.notebook', 'tunic.historicalsociety.frontdesk.magnify', 'tunic.humanecology.frontdesk.businesscards.card_bingo.bingo', 'tunic.wildlife.center.remove_cup', 'tunic.library.frontdesk.wellsbadge.hub', 'tunic.wildlife.center.tracks.hub.deer', 'tunic.historicalsociety.frontdesk.key', 'tunic.library.microfiche.reader_flag.paper2.bingo', 'tunic.flaghouse.entry.colorbook', 'tunic.wildlife.center.coffee', 'tunic.capitol_1.hall.boss.haveyougotit', 'tunic.historicalsociety.basement.janitor', 'tunic.historicalsociety.collection_flag.gramps.recap', 'tunic.wildlife.center.wells.animals2', 'tunic.flaghouse.entry.flag_girl.symbol_recap', 'tunic.historicalsociety.closet_dirty.photo', 'tunic.historicalsociety.stacks.outtolunch', 'tunic.library.frontdesk.worker.wells_recap', 'tunic.historicalsociety.frontdesk.archivist_glasses.confrontation_recap', 'tunic.capitol_0.hall.boss.talktogramps', 'tunic.historicalsociety.closet.photo', 'tunic.historicalsociety.collection.tunic', 'tunic.historicalsociety.closet.teddy.intro_0_cs_5', 'tunic.historicalsociety.closet_dirty.gramps.archivist', 
# 'tunic.historicalsociety.closet_dirty.door_block_talk', 'tunic.historicalsociety.entry.boss.flag_recap', 'tunic.historicalsociety.frontdesk.archivist.need_glass_0', 'tunic.historicalsociety.entry.wells.talktogramps', 'tunic.historicalsociety.frontdesk.block_magnify', 'tunic.historicalsociety.frontdesk.archivist.foundtheodora', 'tunic.historicalsociety.closet_dirty.gramps.nothing', 'tunic.historicalsociety.closet_dirty.door_block_clean', 'tunic.capitol_1.hall.boss.writeitup', 'tunic.library.frontdesk.worker.nelson_recap', 'tunic.library.frontdesk.worker.hello_short', 'tunic.historicalsociety.stacks.block', 'tunic.historicalsociety.frontdesk.archivist.need_glass_1', 'tunic.historicalsociety.entry.boss.talktogramps', 'tunic.historicalsociety.frontdesk.archivist.newspaper_recap', 'tunic.historicalsociety.entry.wells.flag_recap', 'tunic.drycleaner.frontdesk.worker.done2', 'tunic.library.frontdesk.worker.flag_recap', 'tunic.humanecology.frontdesk.block_0', 'tunic.library.frontdesk.worker.preflag', 'tunic.historicalsociety.basement.gramps.seeyalater', 'tunic.flaghouse.entry.flag_girl.hello_recap', 'tunic.historicalsociety.closet.doorblock', 'tunic.drycleaner.frontdesk.worker.takealook', 'tunic.historicalsociety.basement.gramps.whatdo', 'tunic.library.frontdesk.worker.droppedbadge', 'tunic.historicalsociety.entry.block_tomap2', 'tunic.library.frontdesk.block_nelson', 'tunic.library.microfiche.block_0', 'tunic.historicalsociety.entry.block_tocollection', 'tunic.historicalsociety.entry.block_tomap1', 'tunic.historicalsociety.collection.gramps.look_0', 'tunic.library.frontdesk.block_badge', 'tunic.historicalsociety.cage.need_glasses', 'tunic.library.frontdesk.block_badge_2', 'tunic.kohlcenter.halloffame.block_0', 'tunic.capitol_0.hall.chap1_finale_c', 'tunic.capitol_1.hall.chap2_finale_c', 'tunic.capitol_2.hall.chap4_finale_c', 'tunic.wildlife.center.fox.concern', 'tunic.drycleaner.frontdesk.block_0', 'tunic.historicalsociety.entry.gramps.hub', 
# 'tunic.humanecology.frontdesk.block_1', 'tunic.drycleaner.frontdesk.block_1']
# room_lists = ['tunic.historicalsociety.entry', 'tunic.wildlife.center', 'tunic.historicalsociety.cage', 'tunic.library.frontdesk', 'tunic.historicalsociety.frontdesk', 'tunic.historicalsociety.stacks', 'tunic.historicalsociety.closet_dirty', 'tunic.humanecology.frontdesk', 'tunic.historicalsociety.basement', 'tunic.kohlcenter.halloffame', 'tunic.library.microfiche', 'tunic.drycleaner.frontdesk', 'tunic.historicalsociety.collection', 'tunic.historicalsociety.closet', 'tunic.flaghouse.entry', 'tunic.historicalsociety.collection_flag', 'tunic.capitol_1.hall', 'tunic.capitol_0.hall', 'tunic.capitol_2.hall']

# LEVELS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
# level_groups = ["0-4", "5-12", "13-22"]

CATEGORICAL = [ 'event_name', 'name', 'text', 'fqid', 'room_fqid', 'text_fqid', 'text_value']
NUMERICAL = ['elapsed_time', 'room_coor_x', 'room_coor_y', 'screen_coor_x', 'screen_coor_y', 'hover_duration', 'time_diff', 'room_coor_x_diff', 'room_coor_y_diff', 'screen_coor_x_diff', 'screen_coor_y_diff']

In [6]:
def feature_engineer(df, gr):

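    # Select the rows for the requested level group, e.g. "0-4".
    df = df.query(f'level_group == "{gr}"')

    # Keep only the columns we need. `.copy()` gives an independent frame, so
    # the new columns below are not assigned to a slice of the input frame.
    df = df[['session_id', 'elapsed_time', 'event_name', 'name', 'level',
    'room_coor_x', 'room_coor_y', 'screen_coor_x', 'screen_coor_y',
    'hover_duration', 'text', 'fqid', 'room_fqid', 'text_fqid',
    'level_group']].copy()

    # Generate new columns: differences between consecutive events.
    # Note: shift(1) runs over the whole frame, so the first event of a session
    # is differenced against the last event of the previous session.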
    df['time_diff'] = df['elapsed_time'] - df['elapsed_time'].shift(1)
    df['room_coor_x_diff'] = df['room_coor_x'] - df['room_coor_x'].shift(1)
    df['room_coor_y_diff'] = df['room_coor_y'] - df['room_coor_y'].shift(1)
    df['screen_coor_x_diff'] = df['screen_coor_x'] - df['screen_coor_x'].shift(1)
    df['screen_coor_y_diff'] = df['screen_coor_y'] - df['screen_coor_y'].shift(1)

    # Flag rows with missing text: 1 if `text` is NaN, 0 otherwise
    df['text_value'] = df['text'].isna().astype('int')

    
    # Define aggregation operations for numerical and categorical columns
    agg_numerical = {num_col: ['mean', 'median', 'std', 'sum', 'min', 'max'] for num_col in NUMERICAL}
    agg_categorical = {cat_col: ['nunique','count'] for cat_col in CATEGORICAL}  # 'lambda x:x.value_counts().index[0] if x.nunique() else None' will compute mode

    agg_dict = {**agg_numerical, **agg_categorical}

    # Perform groupby operation for ['session_id', 'level']
    df_level = df.groupby(['session_id', 'level']).agg(agg_dict)
    df_level.columns = ['_'.join(col).strip() for col in df_level.columns.values]
    df_level = df_level.fillna(-1)
    df_level = df_level.unstack('level')
    df_level.columns = ['_'.join(map(str, col)) for col in df_level.columns]

    

    # Perform groupby operation for ['session_id'] only; `df` is already
    # restricted to a single level group, so this aggregates over the whole group
    df_level_group = df.groupby(['session_id']).agg(agg_dict)
    df_level_group.columns = ['_'.join(col).strip() for col in df_level_group.columns.values]
    df_level_group = df_level_group.fillna(-1)

    # Concatenate the two resulting dataframes
    df_final = pd.concat([df_level, df_level_group], axis=1)

    return df_final

In [7]:
#feature generation no split
df1_features = feature_engineer(train_df, "0-4" )
print(df1_features.shape)
df2_features = feature_engineer(train_df, "5-12" )
print(df2_features.shape)
df3_features = feature_engineer(train_df, "13-22")
print(df3_features.shape)

(23562, 480)
(23562, 720)
(23562, 880)
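
To make the reshaping inside `feature_engineer` easier to follow, here is a toy sketch of the same groupby, flatten, and unstack sequence. The event log and the shortened `agg_dict` are made up for illustration; the real function uses the `NUMERICAL` and `CATEGORICAL` lists defined above.

```python
import pandas as pd

# Toy event log: two sessions, two levels each; all values are made up.
events = pd.DataFrame({
    "session_id": ["a", "a", "a", "b", "b", "b"],
    "level": [0, 0, 1, 0, 1, 1],
    "elapsed_time": [0, 10, 30, 0, 5, 25],
    "event_name": ["click", "hover", "click", "click", "click", "hover"],
})

agg_dict = {"elapsed_time": ["mean", "max"], "event_name": ["nunique"]}

# One row per (session, level), with two-level column names such as
# ("elapsed_time", "mean"), flattened to "elapsed_time_mean".
per_level = events.groupby(["session_id", "level"]).agg(agg_dict)
per_level.columns = ["_".join(col) for col in per_level.columns]

# Move `level` from the row index into the columns: one row per session,
# one column per (statistic, level) pair.
wide = per_level.unstack("level")
wide.columns = ["_".join(map(str, col)) for col in wide.columns]

print(sorted(wide.columns))
print(wide.shape)
```

The same arithmetic explains the shapes printed above. Each level contributes 11 numerical columns x 6 statistics plus 7 categorical columns x 2 statistics, i.e. 80 features. Group `0-4` has 5 levels (400 features) plus 80 session-wide features, giving 480; groups `5-12` and `13-22` have 8 and 10 levels, giving 720 and 880.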


### Dataframe generation

# Model

In [8]:
# Fetch the unique list of user sessions. `session_id` is the index of the
# feature-engineered dataframes, so the unique index values give the list of
# users. Note that `df1_features` holds every session here, not only a
# validation split.
VALID_USER_LIST = df1_features.index.unique()

# Create a dataframe for storing the predictions of each question for all
# users in this list. Its shape is
# (number of users x number of questions), i.e. one column per question (18).
# All predicted values are initialized to zero, and the index is the user
# `session_id`.
prediction_df = pd.DataFrame(data=np.zeros((len(VALID_USER_LIST), 18)), index=VALID_USER_LIST)

# Create an empty dictionary to store the models created for each question.
models = {}

# Create an empty dictionary to store the evaluation score for each question.
evaluation_dict ={}

In [9]:

from tensorflow.keras.metrics import Precision, Recall

class F1Score(tf.keras.metrics.Metric):
    def __init__(self, name='f1_score', **kwargs):
        super(F1Score, self).__init__(name=name, **kwargs)
        self.precision = Precision()
        self.recall = Recall()

    def update_state(self, y_true, y_pred, sample_weight=None):
        self.precision.update_state(y_true, y_pred, sample_weight)
        self.recall.update_state(y_true, y_pred, sample_weight)

    def result(self):
        precision = self.precision.result()
        recall = self.recall.result()
        return 2 * ((precision * recall) / (precision + recall + tf.keras.backend.epsilon()))

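    def reset_state(self):
        # Keras calls `reset_state()` between epochs and evaluations;
        # `reset_states` is the deprecated spelling.
        self.precision.reset_state()
        self.recall.reset_state()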
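
For reference, the quantity this metric computes can be checked by hand. The sketch below is a plain-Python illustration, not part of the training code: the labels and probabilities are made up, the helper name `f1_from_counts` is hypothetical, and the 0.5 threshold and `1e-7` epsilon are assumed to match the Keras defaults.

```python
def f1_from_counts(tp: int, fp: int, fn: int, eps: float = 1e-7) -> float:
    """F1 as the harmonic mean of precision and recall.

    `eps` guards against division by zero when both are 0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * (precision * recall) / (precision + recall + eps)


# Made-up labels and predicted probabilities for six sessions.
y_true = [1, 1, 1, 0, 0, 1]
y_prob = [0.9, 0.4, 0.8, 0.7, 0.2, 0.6]

# A probability above 0.5 counts as a predicted "correct".
y_pred = [int(p > 0.5) for p in y_prob]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 3
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1

f1 = f1_from_counts(tp, fp, fn)
print(f"precision = recall = {tp / (tp + fp):.2f}, F1 = {f1:.2f}")
```

Here precision and recall are both 3/4, so F1 is also 0.75. Unlike accuracy, F1 ignores true negatives, which is why the two metrics can diverge on questions where one class dominates.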

In [10]:
#labels

work_dir = 'data/predict-student-performance-from-game-play/'
labels = pd.read_csv(work_dir + 'train_labels.csv')
labels['session'] = labels.session_id.apply(lambda x: int(x.split('_')[0]) )
labels['q'] = labels.session_id.apply(lambda x: int(x.split('_')[-1][1:]) )
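
The two `apply` calls above split each label ID into a session number and a question number. As a minimal sketch, assuming IDs of the form `<session>_q<number>` (the sample IDs below are made up, and `split_label_id` is a hypothetical helper), the parsing works like this:

```python
def split_label_id(label_id: str) -> tuple[int, int]:
    """Split an ID such as '123_q4' into (session, question_number)."""
    parts = label_id.split("_")
    session = int(parts[0])        # text before the first underscore
    question = int(parts[-1][1:])  # drop the leading 'q' of the last part
    return session, question


# Made-up sample IDs in the assumed format.
for sample in ["20090312431273200_q1", "20090312431273200_q18"]:
    print(sample, "->", split_label_id(sample))
```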

In [12]:
  

for q_no in range(1,19):
    # Pick the feature set for the level group this question belongs to:
    # questions 1-3 -> "0-4", 4-13 -> "5-12", 14-18 -> "13-22".
    if q_no<=3: 
        grp = '0-4'
        df = df1_features
        FEATURES = df1_features.columns

    elif q_no<=13: 
        grp = '5-12'
        df = df2_features
        FEATURES = df2_features.columns

    elif q_no<=18:  # questions 14-18; 13-22 is the level range, not the question range
        grp = '13-22'
        df = df3_features
        FEATURES = df3_features.columns
        
    print("### q_no", q_no, "grp", grp)
     
    # LABELS
    train_users = df.index.values.astype('int')
    y = labels.loc[labels.q==q_no].set_index('session').loc[train_users]['correct']
    #TRAIN DATA
    X = df[FEATURES].astype('float32')

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=7)
    
    batch_size = 1000

    train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
    train_ds = train_dataset.batch(batch_size)
        
    valid_dataset = tf.data.Dataset.from_tensor_slices((X_test, y_test))
    valid_ds = valid_dataset.batch(batch_size)

    gbtm = tfdf.keras.GradientBoostedTreesModel(verbose=0)
    gbtm.compile(metrics=["accuracy", F1Score()])

    # Train the model.
    gbtm.fit(x=train_ds)

    # Store the model
    models[f'{grp}_{q_no}'] = gbtm

    #Save the model
    

    # Evaluate the trained model on the held-out 20% split and store the
    # accuracy and F1 score in `evaluation_dict`.
    inspector = gbtm.make_inspector()
    inspector.evaluation()  # training-time self-evaluation; the result is not used
    evaluation = gbtm.evaluate(x=valid_ds,return_dict=True)
    evaluation_dict[q_no] = {"accuracy": evaluation["accuracy"], "f1_score": evaluation["f1_score"]}      

    # # Use the trained model to make predictions on the validation dataset and 
    # # store the predicted values in the `prediction_df` dataframe.
    # predict = gbtm.predict(x=valid_ds)
    # prediction_df.loc[valid_users, q_no-1] = predict.flatten()  


### q_no 1 grp 0-4


2023-06-07 12:04:16.011357: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]]
[INFO 23-06-07 12:04:24.0036 CEST kernel.cc:1242] Loading model from path /tmp/tmplbr3d6tz/model/ with prefix e8ebcd9a64fd4ba3
[INFO 23-06-07 12:04:24.0053 CEST abstract_model.cc:1311] Engine "GradientBoostedTreesQuickScorerExtended" built
[INFO 23-06-07 12:04:24.0054 CEST kernel.cc:1074] Use fast generic engine
2023-06-07 12:04:24.013579: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]

### q_no 2 grp 0-4


2023-06-07 12:04:25.553438: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]]
[INFO 23-06-07 12:04:34.6859 CEST kernel.cc:1242] Loading model from path /tmp/tmpn2_zym_b/model/ with prefix c004011243204600
[INFO 23-06-07 12:04:34.6873 CEST abstract_model.cc:1311] Engine "GradientBoostedTreesQuickScorerExtended" built
[INFO 23-06-07 12:04:34.6873 CEST kernel.cc:1074] Use fast generic engine
2023-06-07 12:04:34.695383: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]

### q_no 3 grp 0-4


2023-06-07 12:04:36.270453: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]]
[INFO 23-06-07 12:04:44.2268 CEST kernel.cc:1242] Loading model from path /tmp/tmp_oohhh1m/model/ with prefix cb330779a9b141de
[INFO 23-06-07 12:04:44.2283 CEST kernel.cc:1074] Use fast generic engine
2023-06-07 12:04:44.236431: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]]
2023-06-07 12:04:45.143958: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Exe

### q_no 4 grp 5-12


2023-06-07 12:04:45.690392: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]]




[INFO 23-06-07 12:05:09.3408 CEST kernel.cc:1242] Loading model from path /tmp/tmpn80otbbl/model/ with prefix 26ac1499ad74463b
[INFO 23-06-07 12:05:09.3450 CEST abstract_model.cc:1311] Engine "GradientBoostedTreesQuickScorerExtended" built
[INFO 23-06-07 12:05:09.3451 CEST kernel.cc:1074] Use fast generic engine
2023-06-07 12:05:09.357926: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]]




2023-06-07 12:05:10.822621: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [4713]
	 [[{{node Placeholder/_1}}]]


### q_no 5 grp 5-12


2023-06-07 12:05:11.667175: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]]




[INFO 23-06-07 12:05:27.2878 CEST kernel.cc:1242] Loading model from path /tmp/tmpvb93xq0o/model/ with prefix f088cc6bff93421a
[INFO 23-06-07 12:05:27.2907 CEST abstract_model.cc:1311] Engine "GradientBoostedTreesQuickScorerExtended" built
[INFO 23-06-07 12:05:27.2908 CEST kernel.cc:1074] Use fast generic engine
2023-06-07 12:05:27.302944: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]]




2023-06-07 12:05:28.799148: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [4713]
	 [[{{node Placeholder/_1}}]]


### q_no 6 grp 5-12


2023-06-07 12:05:29.669149: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]]
[INFO 23-06-07 12:05:47.7965 CEST kernel.cc:1242] Loading model from path /tmp/tmpow0f16p_/model/ with prefix db2f4297b5e0420e
[INFO 23-06-07 12:05:47.7995 CEST abstract_model.cc:1311] Engine "GradientBoostedTreesQuickScorerExtended" built
[INFO 23-06-07 12:05:47.7996 CEST kernel.cc:1074] Use fast generic engine
2023-06-07 12:05:47.811754: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]

### q_no 7 grp 5-12


2023-06-07 12:05:50.164535: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]]
[INFO 23-06-07 12:06:04.5948 CEST kernel.cc:1242] Loading model from path /tmp/tmpgj5n3gzi/model/ with prefix 07b450edde4e44f0
[INFO 23-06-07 12:06:04.5970 CEST abstract_model.cc:1311] Engine "GradientBoostedTreesQuickScorerExtended" built
[INFO 23-06-07 12:06:04.5971 CEST kernel.cc:1074] Use fast generic engine
2023-06-07 12:06:04.609258: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]

### q_no 8 grp 5-12


2023-06-07 12:06:07.002434: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]]
[INFO 23-06-07 12:06:16.8643 CEST kernel.cc:1242] Loading model from path /tmp/tmpx0iactck/model/ with prefix 074510ae26d844d6
[INFO 23-06-07 12:06:16.8658 CEST abstract_model.cc:1311] Engine "GradientBoostedTreesQuickScorerExtended" built
[INFO 23-06-07 12:06:16.8659 CEST kernel.cc:1074] Use fast generic engine
2023-06-07 12:06:16.877920: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]

### q_no 9 grp 5-12


2023-06-07 12:06:19.251786: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]]
[INFO 23-06-07 12:06:34.2553 CEST kernel.cc:1242] Loading model from path /tmp/tmplm86kwji/model/ with prefix dadb0a7eab194917
[INFO 23-06-07 12:06:34.2578 CEST abstract_model.cc:1311] Engine "GradientBoostedTreesQuickScorerExtended" built
[INFO 23-06-07 12:06:34.2579 CEST kernel.cc:1074] Use fast generic engine
2023-06-07 12:06:34.270054: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype int64 and shape [18849]
	 [[{{node Placeholder/_1}}]

### q_no 10 grp 5-12



### q_no 11 grp 5-12



### q_no 12 grp 5-12



### q_no 13 grp 5-12



### q_no 14 grp 13-22



### q_no 15 grp 13-22



### q_no 16 grp 13-22



### q_no 17 grp 13-22



### q_no 18 grp 13-22





In [25]:
import pickle


# Save the dictionary to a file in binary format
with open('models.pickle', 'wb') as f:
    pickle.dump(models, f)
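
To reuse the saved dictionary in a later session, it can be read back with `pickle.load`. The sketch below is a minimal round trip under stated assumptions: `toy_models` is a hypothetical stand-in whose keys follow the `'{grp}_{t}'` pattern used in the submission cell, and its values are plain strings rather than trained models. Whether trained TF-DF model objects pickle cleanly may depend on the installed versions, so treat this as an illustration of the file handling only.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the notebook's `models` dict.
# The real values are trained models; strings are used here so the sketch runs anywhere.
toy_models = {"0-4_1": "model for q1", "5-12_4": "model for q4"}

# Write to a temporary directory so the example does not clobber a real models.pickle.
path = os.path.join(tempfile.mkdtemp(), "models.pickle")
with open(path, "wb") as f:
    pickle.dump(toy_models, f)

# Read the dictionary back in binary mode.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(sorted(restored))
```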

In [13]:
evaluation_dict

{1: {'accuracy': 0.7475069165229797, 'f1_score': 0.8452132344245911},
 2: {'accuracy': 0.9766603112220764, 'f1_score': 0.9881897568702698},
 3: {'accuracy': 0.9325270652770996, 'f1_score': 0.9650625586509705},
 4: {'accuracy': 0.8190112709999084, 'f1_score': 0.8954015374183655},
 5: {'accuracy': 0.6301718354225159, 'f1_score': 0.6883603930473328},
 6: {'accuracy': 0.7733927369117737, 'f1_score': 0.8668992519378662},
 7: {'accuracy': 0.7358370423316956, 'f1_score': 0.8419047594070435},
 8: {'accuracy': 0.6257160902023315, 'f1_score': 0.759476363658905},
 9: {'accuracy': 0.7349883317947388, 'f1_score': 0.8398922681808472},
 10: {'accuracy': 0.6148949861526489, 'f1_score': 0.6341462731361389},
 11: {'accuracy': 0.6501166820526123, 'f1_score': 0.7674516439437866},
 12: {'accuracy': 0.8667515516281128, 'f1_score': 0.9285388588905334},
 13: {'accuracy': 0.7237428426742554, 'f1_score': 0.18318691849708557},
 14: {'accuracy': 0.7226819396018982, 'f1_score': 0.8274586200714111},
 15: {'accuracy

In [35]:
evaluation_dict

{1: {'accuracy': 0.7511139512062073, 'f1_score': 0.844984769821167},
 2: {'accuracy': 0.9755994081497192, 'f1_score': 0.9876462817192078},
 3: {'accuracy': 0.9321026802062988, 'f1_score': 0.9648274183273315},
 4: {'accuracy': 0.8251644372940063, 'f1_score': 0.8992910981178284},
 5: {'accuracy': 0.6528750061988831, 'f1_score': 0.7037304639816284},
 6: {'accuracy': 0.7848504185676575, 'f1_score': 0.8729322552680969},
 7: {'accuracy': 0.730320394039154, 'f1_score': 0.8368211388587952},
 8: {'accuracy': 0.6284744143486023, 'f1_score': 0.76347416639328},
 9: {'accuracy': 0.747082531452179, 'f1_score': 0.846431314945221},
 10: {'accuracy': 0.6244430541992188, 'f1_score': 0.6414099931716919},
 11: {'accuracy': 0.6558455228805542, 'f1_score': 0.7720628976821899},
 12: {'accuracy': 0.8650541305541992, 'f1_score': 0.9272809624671936},
 13: {'accuracy': 0.7307447195053101, 'f1_score': 0.29145723581314087},
 14: {'accuracy': 0.7281985878944397, 'f1_score': 0.8317351341247559},
 15: {'accuracy': 0.
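
Accuracy and F1 can diverge sharply in these results: question 13 reaches about 0.73 accuracy but stays under 0.30 F1 in both runs, which is what one would expect if most players answer it incorrectly and the model mostly predicts the majority class. A minimal sketch for spotting such questions follows. It uses a few values copied (and rounded) from the second output above; `evaluation_subset` and `F1_FLOOR` are hypothetical names introduced for illustration.

```python
# Values copied from the second `evaluation_dict` output, rounded to 4 decimals.
# Only three questions are included; the full dictionary is not reproduced here.
evaluation_subset = {
    12: {"accuracy": 0.8651, "f1_score": 0.9273},
    13: {"accuracy": 0.7307, "f1_score": 0.2915},
    14: {"accuracy": 0.7282, "f1_score": 0.8317},
}

F1_FLOOR = 0.5  # hypothetical cutoff for "needs attention"

# Questions whose F1 falls below the cutoff even though accuracy looks acceptable.
weak = [q for q, m in evaluation_subset.items() if m["f1_score"] < F1_FLOOR]

# Unweighted mean F1 across the listed questions.
mean_f1 = sum(m["f1_score"] for m in evaluation_subset.values()) / len(evaluation_subset)

print("questions below floor:", weak)
print("mean F1 over subset:", round(mean_f1, 4))
```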

 Let us take a look at the first 5 entries of `labels` using the following code:
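
A minimal sketch of that call, assuming `labels` is a pandas DataFrame with `session_id` and `correct` columns (the column names match those used in the submission cell; the rows below are made-up stand-ins, not competition data):

```python
import pandas as pd

# Made-up stand-in rows; the real `labels` frame is read from the competition files.
labels = pd.DataFrame({
    "session_id": [f"1000_q{q}" for q in range(1, 7)],
    "correct": [1, 1, 1, 0, 1, 0],
})

# head(5) returns the first five rows without modifying `labels`.
first_five = labels.head(5)
print(first_five)
```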

# How can I configure a tree-based model?

TensorFlow Decision Forests ships with good defaults (e.g., the top-ranking hyperparameters on the TF-DF benchmarks, slightly modified to run in reasonable time). If you want to configure the learning algorithm, there are many hyperparameters you can tune to improve accuracy.

You can select a template and/or set parameters as follows:
```
model = tfdf.keras.GradientBoostedTreesModel(hyperparameter_template="benchmark_rank1")
```

You can read more [here](https://www.tensorflow.org/decision_forests/api_docs/python/tfdf/keras/GradientBoostedTreesModel).
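
Setting individual parameters works the same way, through keyword arguments to the constructor. The sketch below is illustrative only: `gbt_params` is a hypothetical name and the values are not tuned for this competition. The import is guarded so the snippet still runs where TF-DF is not installed.

```python
# Hypothetical values for illustration; they are not tuned for this competition.
gbt_params = {
    "num_trees": 500,    # upper bound on the number of boosted trees
    "max_depth": 6,      # maximum depth of each tree
    "shrinkage": 0.05,   # learning rate applied to each tree's contribution
    "subsample": 0.8,    # fraction of examples sampled for each tree
}

try:
    import tensorflow_decision_forests as tfdf

    # Explicit keyword arguments replace the library defaults for these settings.
    model = tfdf.keras.GradientBoostedTreesModel(**gbt_params)
except ImportError:
    # TF-DF is not available here; the dict above still shows the shape of the call.
    model = None
```

Lower `shrinkage` values usually call for more trees, so the two are often adjusted together.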

# Helper functions

In [112]:
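def compute_class_weights(y_train):
    """Return one weight per sample so that both classes carry equal total weight."""
    num_samples_class_0 = np.sum(y_train == 0)
    num_samples_class_1 = np.sum(y_train == 1)
    total_samples = y_train.shape[0]

    # "Balanced" weighting: total / (n_classes * class_count), with n_classes = 2.
    # If a class is absent its count is 0 and the weight is undefined (division by zero).
    weight_for_class_0 = total_samples / (2 * num_samples_class_0)
    weight_for_class_1 = total_samples / (2 * num_samples_class_1)

    # Map the per-class weight onto every sample.
    class_weights = np.where(y_train == 1, weight_for_class_1, weight_for_class_0)
    print(f"weight for class 0: {weight_for_class_0}")
    print(f"weight for class 1: {weight_for_class_1}")
    return class_weights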
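
To see what this weighting does, here is a small self-contained sketch on made-up labels. `balanced_sample_weights` is a hypothetical helper that repeats the same formula without the prints, and `y_toy` is invented data with three positives and one negative.

```python
import numpy as np

def balanced_sample_weights(y):
    """Per-sample weights total / (2 * class_count), the same formula as above."""
    y = np.asarray(y)
    n0 = np.sum(y == 0)
    n1 = np.sum(y == 1)
    w0 = len(y) / (2 * n0)
    w1 = len(y) / (2 * n1)
    return np.where(y == 1, w1, w0)

y_toy = np.array([1, 1, 1, 0])  # made-up labels: 3 positives, 1 negative
weights = balanced_sample_weights(y_toy)
print(weights)

# After weighting, each class contributes the same total weight.
print(weights[y_toy == 0].sum(), weights[y_toy == 1].sum())
```

The rare class gets the larger per-sample weight, and the weights sum to the number of samples, so the overall scale of the loss is unchanged.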



In [122]:
# from collections import Counter
# from imblearn.over_sampling import SMOTE

# def apply_smote_if_imbalanced(X_train, y_train, imbalance_threshold=0.2):
#     """
#     Apply SMOTE to the training data if the dataset is imbalanced.

#     Parameters:
#     X_train (numpy array): Feature matrix of the training data.
#     y_train (numpy array): Label vector of the training data.
#     imbalance_threshold (float): Threshold for deciding if the dataset is imbalanced. Default is 0.2.

#     Returns:
#     X_train_resampled (numpy array): Resampled feature matrix of the training data.
#     y_train_resampled (numpy array): Resampled label vector of the training data.
#     """
    
#     # Calculate class distribution
#     class_counts = Counter(y_train)
#     majority_count = max(class_counts.values())
#     minority_count = min(class_counts.values())
#     total_count = sum(class_counts.values())

#     # Calculate the imbalance ratio
#     imbalance_ratio = minority_count / majority_count

#     # Apply SMOTE if the imbalance ratio is below the threshold
#     if imbalance_ratio < imbalance_threshold:
#         smote = SMOTE(sampling_strategy='minority', k_neighbors=5, random_state=42)
#         X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)
#         print('smote applied')
#     else:
#         X_train_resampled, y_train_resampled = X_train, y_train

#     return X_train_resampled, y_train_resampled

# Submission

Here you'll use the `best_threshold` calculated earlier to turn predicted probabilities into 0/1 answers.

In [26]:
# Reference
# https://www.kaggle.com/code/philculliton/basic-submission-demo
# https://www.kaggle.com/code/cdeotte/random-forest-baseline-0-664/notebook


import jo_wilder
env = jo_wilder.make_env()
iter_test = env.iter_test()

# Half-open question ranges per level group: questions 1-3, 4-13 and 14-18.
limits = {'0-4':(1,4), '5-12':(4,14), '13-22':(14,19)}

for (test, sample_submission) in iter_test:
    test_df = feature_engineer(test)
    grp = test_df.level_group.values[0]
    a,b = limits[grp]
    # The features are identical for every question in the group, so build the dataset once.
    test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_df.loc[:, test_df.columns != 'level_group'])
    for t in range(a,b):
        gbtm = models[f'{grp}_{t}']  # one model per (level group, question)
        predictions = gbtm.predict(test_ds)
        # Select the rows for question t; endswith avoids 'q1' also matching 'q10'-'q18'.
        mask = sample_submission.session_id.str.endswith(f'q{t}')
        n_predictions = (predictions > best_threshold).astype(int)
        sample_submission.loc[mask,'correct'] = n_predictions.flatten()
    
    env.predict(sample_submission)

ModuleNotFoundError: No module named 'jo_wilder'
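
Since `jo_wilder` is not importable here, the two pieces of logic in the loop can still be checked in isolation. The sketch below makes several assumptions: the probabilities are made up, `best_threshold = 0.6` is a hypothetical value, `questions_for` is a helper introduced for illustration, and the `(n, 1)` array shape is inferred from the `.flatten()` call in the cell above.

```python
import numpy as np

# Same half-open ranges as in the submission cell.
limits = {"0-4": (1, 4), "5-12": (4, 14), "13-22": (14, 19)}

def questions_for(level_group):
    """Return the question numbers answered after a given level group."""
    a, b = limits[level_group]
    return list(range(a, b))

best_threshold = 0.6  # hypothetical value; the notebook computes its own

# Made-up probabilities shaped (n, 1), as the .flatten() call suggests.
probs = np.array([[0.91], [0.58], [0.60], [0.12]])

# Strict comparison: a probability exactly equal to the threshold maps to 0.
hard = (probs > best_threshold).astype(int).flatten()

print(questions_for("0-4"))
print(hard)
```

Because the comparison is strict, raising `best_threshold` can only turn predicted 1s into 0s, which is the lever that trades recall for precision when tuning for F1.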

In [None]:
! head submission.csv