# Training with a Searched Policy Example

In this example, we'll demonstrate how to train a model using a policy schedule.
 
We'll start by loading the data, one-hot encoding the labels in the format required by `PyBoost`, training a base classifier for initial weights, and initializing the model.

In [1]:
# import packages
from sklearn.preprocessing import OneHotEncoder
from py_boost import GradientBoosting
import numpy as np
import copy
from sklearn.preprocessing import PolynomialFeatures
from gcforest import *
from data.dataset import *
from sklearn.model_selection import KFold, StratifiedKFold
import random
import ray
from hyperparameters import *

## load data
We have encapsulated the calls to all the datasets used in our experiments for simplicity. Due to space constraints, we have included only two datasets: `adult` and `kdd`. You can easily call them like this:

In [2]:
# get data
X_train, y_train, X_test, y_test=get_data('adult')

## initialize model 
The initialization of the model consists of two steps. First, we perform one-hot encoding of the labels for use by the base classifier `PyBoost`. Then, we train a PyBoost model to obtain the necessary weight values for initialization.

In [3]:
# one-hot encoding
one_hot = OneHotEncoder(sparse_output=False).fit(y_train.reshape(-1,1))
# get weights
clf = GradientBoosting('bce', lr=0.3, colsample=0.2, verbose=-1)
clf.fit(X_train, one_hot.transform(y_train.reshape(-1,1)))
weights = clf.get_feature_importance()
weights = [w / sum(weights) for w in weights]
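
To make the two initialization steps concrete, here is a minimal standard-library sketch of what they produce: a one-hot label matrix and importance weights rescaled to sum to 1. The helper names `one_hot_encode` and `normalize` and the toy values are illustrative assumptions; the actual pipeline uses `OneHotEncoder` and `clf.get_feature_importance()` as shown above.

```python
def one_hot_encode(labels):
    """Map each label to a 0/1 row with one column per distinct class."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [[1.0 if index[y] == j else 0.0 for j in range(len(classes))]
            for y in labels]


def normalize(importances):
    """Rescale non-negative importances so that they sum to 1."""
    total = sum(importances)
    if total == 0:
        raise ValueError("all importances are zero; cannot normalize")
    return [w / total for w in importances]


toy_labels = [0, 1, 1, 0]            # hypothetical binary labels
toy_importances = [3.0, 1.0, 4.0]    # hypothetical raw feature importances

encoded = one_hot_encode(toy_labels)
weights_demo = normalize(toy_importances)
print(encoded)
print(weights_demo)
```

The zero-sum guard is a design choice of this sketch: dividing by `sum(weights)` directly, as in the cell above, would raise a `ZeroDivisionError` if every importance were zero.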

## train model
To train the model, we assume that the boosting parameters for each layer were already obtained during model initialization. We then call the `fit` function to train. Note that our `fit` function does not support an evaluation set.

In [4]:
# searched policy schedule
aug_policy_schedule = aug_schedule['adult']
# initialize gcForest with augmentation
df = gcForest(encoder=one_hot, weights=weights, classifier='sketch',
              means=np.mean(X_train, axis=0), std=np.std(X_train, axis=0),
              random_state=42, num_estimator=100, num_forests=1, max_features=0.2,
              num_classes=2, n_fold=5, max_layer=15,
              aug_policy_schedule=aug_policy_schedule, aug_type='cutmix')
# fit gcForest
df.fit(X_train, y_train)

layer index:0
val  acc:87.04278124136236 
layer index:1
val  acc:86.30570314179539 
layer index:2
val  acc:86.4531187617088 
layer index:3
val  acc:86.86465403396701 
layer index:4
val  acc:86.79401738275851 
layer index:5
val  acc:86.7295230490464 
layer index:6
val  acc:86.71416725530543 
layer index:7
val  acc:86.99978501888762 
layer index:8
val  acc:86.90457909769356 
layer index:9
val  acc:86.32720125303277 
layer index:10
val  acc:87.14412948005283 
layer index:11
val  acc:87.21476613126133 
layer index:12
val  acc:86.97521574890206 
layer index:13
val  acc:86.9506464789165 
layer index:14
val  acc:86.85851171647062 


## Making Predictions

You can use the model for predictions using the `predict` and `predict_proba` functions, similar to Scikit-Learn:

In [5]:
# predict 
y_pred=df.predict(X_test)
y_pred_proba=df.predict_proba(X_test)

## Evaluating the Model

To evaluate the model's performance, you can use the custom `score` function, which reports the accuracy of each layer and the overall ensemble accuracy. If you need other evaluation criteria, you can compute them from the raw outputs of the `predict_proba` function.


In [6]:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.8758675757017382
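
As one example of an alternative criterion, the sketch below computes log loss from per-class probability rows such as those returned by `predict_proba`. The function name `log_loss_from_proba` and the toy values are hypothetical. Each row is renormalized before use, because this sketch does not assume that the returned rows sum to exactly 1.

```python
import math


def log_loss_from_proba(y_true, proba, eps=1e-15):
    """Mean negative log-likelihood of the true class.

    y_true: iterable of integer class indices.
    proba:  iterable of per-class score rows (renormalized to sum to 1).
    """
    total = 0.0
    n = 0
    for label, row in zip(y_true, proba):
        p = row[label] / sum(row)          # renormalize the row
        p = min(max(p, eps), 1.0 - eps)    # clip to avoid log(0)
        total -= math.log(p)
        n += 1
    return total / n


toy_y = [0, 1, 1]                                   # hypothetical labels
toy_proba = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]    # hypothetical outputs
loss = log_loss_from_proba(toy_y, toy_proba)
print(f"log loss: {loss:.4f}")
```

With the trained model this would presumably be called as `log_loss_from_proba(y_test, df.predict_proba(X_test))`, assuming the labels are integer class indices that match the column order of the output.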


In [7]:
# or use score to see the detailed per-layer accuracy
df.score(X_test, y_test)

layer  0
test acc:  87.28579325594251
layer  1
test acc:  87.21822983846201
layer  2
test acc:  87.21822983846201
ensemble acc:  87.26122474049505
layer  3
test acc:  87.12609790553407
ensemble acc:  87.26736686935692
layer  4
test acc:  87.22437196732388
ensemble acc:  87.33493028683742
layer  5
test acc:  87.42706221976538
ensemble acc:  87.40249370431792
layer  6
test acc:  87.623610343345
ensemble acc:  87.44548860635096
layer  7
test acc:  87.60518395675942
ensemble acc:  87.41477796204164
layer  8
test acc:  87.48234137952214
ensemble acc:  87.45777286407468
layer  9
test acc:  87.48848350838401
ensemble acc:  87.5069098949696
layer  10
test acc:  87.63589460106873
ensemble acc:  87.54376266814077
layer  11
test acc:  87.46391499293655
ensemble acc:  87.54990479700264
layer  12
test acc:  87.53147841041705
ensemble acc:  87.54376266814077
layer  13
test acc:  87.53762053927892
ensemble acc:  87.56833118358823
layer  14
test acc:  87.6420367299306
ensemble acc:  87.58675757017383


array([0, 0, 0, ..., 1, 0, 1])

## grid search
The following demonstrates how to perform a `grid search` for each layer. Grid search finds the 'optimal' augmentation policy for the current layer. Note, however, that this greedy layer-by-layer search is very time-consuming and may not yield the best overall schedule.

We have implemented two logically equivalent versions that differ only in whether they use the parallel computing library `ray` to accelerate policy schedule learning (controlled by the `ray` parameter). The parallel version is experimental: we have observed that it may occasionally cause some threads to terminate prematurely, which affects the results. We warmly welcome contributions that improve the efficiency and stability of the parallel implementation.

In [8]:
# policy schedule learning with grid search for one layer (no parallel computation)
# time consumed: 2m 25.8s
df = gcForest(encoder=one_hot, weights=weights, classifier='sketch', means=np.mean(X_train, axis=0),std=np.std(X_train, axis=0), random_state=42, num_estimator=100, num_forests=1, max_features=0.2, num_classes=2, n_fold=5, max_layer=15, aug_type='cutmix')
df.load_data(X_train,y_train) 
df.get_best_policy(X_train,y_train)


--------------
layer 0,   X_train shape:(32561, 14)...
 
cutmix   0   0.1   87.23933540124689  
cutmix   0   0.2   87.23933540124689  
cutmix   0   0.3   87.23933540124689  
cutmix   0   0.4   87.23933540124689  
cutmix   0   0.5   87.23933540124689  
cutmix   0   0.6   87.23933540124689  
cutmix   0   0.7   87.23933540124689  
cutmix   0.05   0.2   87.20555265501673  
cutmix   0.05   0.3   87.16869875003839  
cutmix   0.05   0.4   87.20248149626855  
cutmix   0.05   0.5   87.18098338503117  
cutmix   0.05   0.6   87.24854887749149  
cutmix   0.05   0.7   87.23933540124689  
cutmix   0.1   0.2   87.0888486225853  
cutmix   0.1   0.3   87.14105832130463  
cutmix   0.1   0.4   87.29768741746261  
cutmix   0.1   0.5   87.19941033752035  
cutmix   0.1   0.6   87.21169497251313  
cutmix   0.1   0.7   87.23012192500231  
cutmix   0.2   0.4   87.12877368631185  
cutmix   0.2   0.5   87.18098338503117  
cutmix   0.2   0.6   87.03049660636958  
cutmix   0.2   0.7   87.13184484506003  
cutmix  

('cutmix', 0.1, 0.4)
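
The selection step behind a log like the one above can be sketched as a plain argmax over candidate parameter pairs. This is an illustrative sketch, not the library's implementation: `best_policy`, the placeholder parameter names `a` and `b`, and the toy scores are all assumptions, and the scoring callable stands in for whatever validation procedure the real search uses.

```python
def best_policy(aug_type, candidates, evaluate):
    """Return (aug_type, a, b) for the highest-scoring candidate pair.

    candidates: iterable of (a, b) parameter pairs.
    evaluate:   callable (aug_type, a, b) -> validation accuracy.
    """
    best, best_score = None, float("-inf")
    for a, b in candidates:
        score = evaluate(aug_type, a, b)
        print(aug_type, a, b, score)
        if score > best_score:        # ties keep the earliest candidate
            best, best_score = (aug_type, a, b), score
    return best


# Hypothetical scores standing in for validation accuracy.
toy_scores = {(0, 0.1): 87.24, (0.05, 0.6): 87.25,
              (0.1, 0.4): 87.30, (0.2, 0.5): 87.18}

chosen = best_policy("cutmix", toy_scores,
                     lambda _t, a, b: toy_scores[(a, b)])
print(chosen)
```

Because each layer keeps only its own best candidate, the search is greedy: a choice that is best for one layer is never revisited in light of later layers.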

In [9]:
# policy schedule learning with grid search for one layer (parallel computation with ray)
# time consumed: 2m 4.0s
df = gcForest(encoder=one_hot, weights=weights, classifier='sketch', means=np.mean(X_train, axis=0), std=np.std(X_train, axis=0), random_state=42, num_estimator=100, num_forests=1, max_features=0.2, num_classes=2, n_fold=5, max_layer=15, aug_type='cutmix',ray=True)
df.load_data(X_train,y_train)
df.get_best_policy(X_train,y_train)


--------------
layer 0,   X_train shape:(32561, 14)...
 


2023-10-10 11:16:09,646	INFO worker.py:1621 -- Started a local Ray instance.


cutmix   0   0.1   87.24240655999509  
cutmix   0   0.2   87.23933540124689  
cutmix   0   0.3   87.23933540124689  
cutmix   0   0.4   87.23933540124689  
cutmix   0   0.5   87.23933540124689  
cutmix   0   0.6   87.23933540124689  
cutmix   0   0.7   87.23933540124689  
cutmix   0.05   0.2   87.27004698872885  
cutmix   0.05   0.3   87.20248149626855  
cutmix   0.05   0.4   87.26697582998065  
cutmix   0.05   0.5   87.18405454377937  
cutmix   0.05   0.6   87.14720063880101  
cutmix   0.05   0.7   87.11956021006726  
cutmix   0.1   0.2   87.26083351248425  
cutmix   0.1   0.3   87.19941033752035  
cutmix   0.1   0.4   87.16562759129019  
cutmix   0.1   0.5   87.20248149626855  
cutmix   0.1   0.6   87.22397960750591  
cutmix   0.1   0.7   87.3007585762108  
cutmix   0.2   0.4   87.19941033752035  
cutmix   0.2   0.5   87.05506587635516  
cutmix   0.2   0.6   87.27311814747705  
cutmix   0.2   0.7   87.20862381376493  
cutmix   0.3   0.4   87.12877368631185  
cutmix   0.3   0.5   86.8

('cutmix', 0.1, 0.7)
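
Since prematurely terminated workers can silently change which policy wins, one defensive pattern is to check that every candidate actually returned a score before taking the maximum. The sketch below illustrates the idea with the standard library's `concurrent.futures` as a stand-in for `ray`; it does not use or reproduce the `ray` API, and `evaluate_all`, `toy_grid`, and `toy_eval` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def evaluate_all(candidates, evaluate, max_workers=4):
    """Score every candidate in parallel and fail loudly if any is missing."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(evaluate, c): c for c in candidates}
        for future in as_completed(futures):
            cand = futures[future]
            try:
                results[cand] = future.result()
            except Exception as exc:      # record the failure, keep going
                print(f"candidate {cand} failed: {exc!r}")
    missing = [c for c in candidates if c not in results]
    if missing:
        raise RuntimeError(f"no score for candidates: {missing}")
    return results


def toy_eval(cand):
    """Hypothetical scorer standing in for a validation run."""
    return 87.0 + cand[0] + cand[1] / 10


toy_grid = [(0, 0.1), (0.05, 0.6), (0.1, 0.7)]   # hypothetical pairs
scores = evaluate_all(toy_grid, toy_eval)
best = max(toy_grid, key=lambda c: scores[c])
print(best, scores[best])
```

Raising on missing scores trades convenience for safety: a partial grid would otherwise yield a "best" policy chosen from fewer candidates than intended.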

# augDF
The following code demonstrates how to easily invoke `augDF` for the complete training process.

In [14]:
from aug import aug_DF
use_ray=False
if use_ray:
    ray.init(ignore_reinit_error=True)
aug=aug_DF(classifier='sketch', num_estimator=100, max_features=0.2, num_classes=2, max_layer=15, gpu_id = 0, aug_type='cutmix', ray=use_ray)
aug_policy_schedule = aug.fit(X_train, y_train)
if use_ray:
    ray.shutdown()


--------------
layer 0,   X_train shape:(32561, 14)...
 
cutmix   0   0.1   87.23933540124689  
cutmix   0   0.2   87.23933540124689  
cutmix   0   0.3   87.23933540124689  
cutmix   0   0.4   87.23933540124689  
cutmix   0   0.5   87.23933540124689  
cutmix   0   0.6   87.23933540124689  
cutmix   0   0.7   87.23933540124689  
cutmix   0.05   0.2   87.20555265501673  
cutmix   0.05   0.3   87.17791222628297  
cutmix   0.05   0.4   87.15027179754922  
cutmix   0.05   0.5   87.23933540124689  
cutmix   0.05   0.6   87.29768741746261  
cutmix   0.05   0.7   87.24854887749149  
cutmix   0.1   0.2   87.18712570252757  
cutmix   0.1   0.3   87.19941033752035  
cutmix   0.1   0.4   87.31304321120359  
cutmix   0.1   0.5   87.24240655999509  
cutmix   0.1   0.6   87.27004698872885  
cutmix   0.1   0.7   87.16869875003839  
cutmix   0.2   0.4   86.83087128773687  
cutmix   0.2   0.5   87.10727557507447  
cutmix   0.2   0.6   87.06120819385154  
cutmix   0.2   0.7   87.11648905131906  
cutmix 

In [15]:
print(aug.predict_proba(X_test))

[[0.9309114  0.00242194]
 [0.72167576 0.21165758]
 [0.60229218 0.33104114]
 ...
 [0.18181314 0.75152021]
 [0.89387434 0.03945898]
 [0.28139586 0.65193746]]


In [16]:
aug.score(X_test, y_test)

layer  0
test acc:  87.11381364781033
layer  1
test acc:  87.34107241569929
layer  2
test acc:  87.47005712179842
ensemble acc:  87.51305202383146
layer  3
test acc:  87.50076776610773
ensemble acc:  87.53147841041705
layer  4
test acc:  87.47005712179842
ensemble acc:  87.5069098949696
layer  5
test acc:  87.51305202383146
ensemble acc:  87.50076776610773
layer  6
test acc:  87.54990479700264
ensemble acc:  87.56833118358823
layer  7
test acc:  87.40863583317979
ensemble acc:  87.58061544131196
layer  8
test acc:  87.53147841041705
ensemble acc:  87.56833118358823
layer  9
test acc:  87.47619925066029
ensemble acc:  87.58675757017383
layer  10
test acc:  87.53147841041705
ensemble acc:  87.53762053927892
layer  11
test acc:  87.46391499293655
ensemble acc:  87.5560469258645
layer  12
test acc:  87.47005712179842
ensemble acc:  87.5744733124501
layer  13
test acc:  87.56833118358823
ensemble acc:  87.5560469258645
layer  14
test acc:  87.2550826116332
ensemble acc:  87.54990479700264


array([0, 0, 0, ..., 1, 0, 1])

In [17]:
aug.get_searched_policy()

[('cutmix', 0, 0.4),
 ('cutmix', 0, 0.4),
 ('cutmix', 0, 0.2),
 ('cutmix', 0, 0.2),
 ('cutmix', 0, 0.2),
 ('cutmix', 0, 0.6),
 ('cutmix', 0.05, 0.7),
 ('cutmix', 0.05, 0.7),
 ('cutmix', 0.05, 0.3),
 ('cutmix', 0.05, 0.3),
 ('cutmix', 0.05, 0.3),
 ('cutmix', 0, 0.5),
 ('cutmix', 0, 0.5),
 ('cutmix', 0, 0.4),
 ('cutmix', 0, 0.2)]
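
The returned list holds one `(aug_type, value, value)` tuple per layer. A searched schedule could be kept for later runs in a mapping keyed by dataset name, mirroring the `aug_schedule['adult']` lookup used at the start of this example. The sketch below is an assumption about one convenient way to do that: `check_schedule` and `my_schedule` are hypothetical names, and whether `aug_schedule` in `hyperparameters` is a plain dict is not confirmed here.

```python
def check_schedule(schedule, max_layer):
    """Verify that a schedule has one 3-tuple per layer, then copy it."""
    if len(schedule) != max_layer:
        raise ValueError(f"expected {max_layer} entries, got {len(schedule)}")
    for layer, entry in enumerate(schedule):
        if len(entry) != 3 or not isinstance(entry[0], str):
            raise ValueError(f"layer {layer}: malformed entry {entry!r}")
    return list(schedule)


# First three entries of the searched policy above, used as a short example.
searched = [("cutmix", 0, 0.4), ("cutmix", 0, 0.4), ("cutmix", 0, 0.2)]

my_schedule = {"adult": check_schedule(searched, max_layer=3)}
print(my_schedule["adult"][0])
```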