The ``TreeModel`` object can be instantiated without providing any parameters:

In [1]:
import aaanalysis as aa
tm = aa.TreeModel()

You can provide a list of tree-based models and their respective arguments using the ``list_model_classes`` and ``list_model_kwargs`` parameters:

In [2]:
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, GradientBoostingClassifier

# Tree-based model classes to use (one kwargs dict is needed per class)
list_model_classes = [RandomForestClassifier, ExtraTreesClassifier, GradientBoostingClassifier]
print("Default model arguments: ", tm._list_model_kwargs)

# Adjust default parameters
list_model_kwargs = [dict(n_estimators=64)] * 3
tm = aa.TreeModel(list_model_classes=list_model_classes, list_model_kwargs=list_model_kwargs)
print("New model arguments: ", tm._list_model_kwargs)

Default model arguments:  [{'random_state': None}, {'random_state': None}]
New model arguments:  [{'n_estimators': 64, 'random_state': None}, {'n_estimators': 64, 'random_state': None}, {'n_estimators': 64, 'random_state': None}]
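
One detail of the ``[dict(n_estimators=64)] * 3`` idiom is worth keeping in mind: list multiplication repeats a reference to the same dictionary rather than copying it. Whether ``TreeModel`` copies the dictionaries internally is not stated here, so the following sketch uses plain Python only and makes no claim about the library's behavior; ``shared`` and ``independent`` are illustrative names.

```python
# List multiplication repeats references to ONE dict object
shared = [dict(n_estimators=64)] * 3
shared[0]["max_depth"] = 5  # mutates the single underlying dict
shared_all_changed = all("max_depth" in kwargs for kwargs in shared)

# A list comprehension creates one independent dict per model
independent = [dict(n_estimators=64) for _ in range(3)]
independent[0]["max_depth"] = 5  # affects only the first dict
independent_others_unchanged = all(
    "max_depth" not in kwargs for kwargs in independent[1:]
)

print(shared_all_changed, independent_others_unchanged)
```

If you plan to tweak the arguments of a single model afterwards, the list-comprehension form is the safer choice.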


You can fix the random seed with ``random_state`` and control progress messages with ``verbose``:

In [3]:
# Set random seed and disable verbosity
tm = aa.TreeModel(random_state=42, verbose=False)
print("New model arguments: ", tm._list_model_kwargs)

New model arguments:  [{'random_state': 42}, {'random_state': 42}]
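
To illustrate what a fixed ``random_state`` provides, the following sketch uses scikit-learn directly rather than ``TreeModel``. It assumes that keyword dictionaries like the ones printed above are unpacked into the model classes, which is an assumption about the internals and not something shown here. The synthetic data from ``make_classification`` and the names ``X_toy``, ``y_toy``, ``rf_a``, and ``rf_b`` are purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic toy data (illustrative only, unrelated to DOM_GSEC)
X_toy, y_toy = make_classification(n_samples=60, n_features=8, random_state=0)

# Hypothetical kwargs dict in the same style as the printed model arguments
kwargs = {"n_estimators": 16, "random_state": 42}

# Two models built from the same kwargs, including the same seed
rf_a = RandomForestClassifier(**kwargs).fit(X_toy, y_toy)
rf_b = RandomForestClassifier(**kwargs).fit(X_toy, y_toy)

# With identical seeds and data, the fitted forests agree
same_importance = np.allclose(rf_a.feature_importances_, rf_b.feature_importances_)
print("Same importances with fixed seed:", same_importance)
```

With ``random_state=None``, repeated fits would generally differ, which is why a fixed seed is useful for reproducible feature importance.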


You can compare different feature pre-filtering strategies using the ``is_preselected`` parameter. We demonstrate this with the ``DOM_GSEC`` example dataset and its respective feature set (see [Breimann25a]_):

In [14]:
import numpy as np
aa.options["verbose"] = False # Disable verbosity

df_seq = aa.load_dataset(name="DOM_GSEC")
labels = df_seq["label"].to_list()
df_feat = aa.load_features(name="DOM_GSEC").head(100)

# Create feature matrix
sf = aa.SequenceFeature()
df_parts = sf.get_df_parts(df_seq=df_seq)
X = sf.feature_matrix(features=df_feat["feature"], df_parts=df_parts)

# Pre-select top 10 and top 50 features
mask_top10 = np.asarray(df_feat.index < 10)
mask_top50 = np.asarray(df_feat.index < 50)
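
The masks are plain boolean arrays with one entry per feature. The sketch here reproduces only that logic with NumPy, assuming ``df_feat.index`` is a default integer index from 0 to 99 after ``.head(100)``; ``n_features`` and ``index`` are stand-ins and no aaanalysis call is involved. It also shows that wrapping a 1D mask in a list yields a 2D array with a single row.

```python
import numpy as np

n_features = 100  # assumed to match .head(100) in the example above
index = np.arange(n_features)  # stand-in for df_feat.index (assumption)

# Boolean masks selecting the first 10 and first 50 features
mask_top10 = np.asarray(index < 10)
mask_top50 = np.asarray(index < 50)

# Wrapping a 1D mask in a list adds a leading axis of length 1
mask_2d = np.array([mask_top10])

print(mask_top10.sum(), mask_top50.sum(), mask_2d.shape)
```

Note that index-based masks like these select the top-ranked features only if the feature table is already sorted by rank.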

We can now compare the prediction performance for these preselected feature sets using the ``TreeModel().eval()`` method:

In [15]:
df_eval = tm.eval(X, labels=labels, list_is_selected=[np.array([mask_top10]), np.array([mask_top50])])
aa.display_df(df_eval)

   name   accuracy  precision  recall  f1
1  Set 1  0.7622    0.7699     0.7692  0.7626
2  Set 2  0.8422    0.8386     0.875   0.849
