First, we import the flavor we want to use (here, the online flavor: AutoProfileSemiSupervisedPdMExperiment).

Other options:
* Incremental (IncrementalSemiSupervisedPdMExperiment)
* Unsupervised (UnsupervisedPdMExperiment)
* Semisupervised with historical data (SemiSupervisedPdMExperiment)

Each flavor comes with flavor-dependent constraints (you can also define a custom constraint for your use case), which ensure that the hyperparameter optimization obeys the constraints of the real-world use case.

In [None]:
from experiment.batch.auto_profile_semi_supervised_experiment import AutoProfileSemiSupervisedPdMExperiment
from constraint_functions.constraint import auto_profile_max_wait_time_constraint
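To illustrate what a custom constraint could look like, here is a minimal sketch. It assumes the constraint is a callable that receives a batch of candidate configurations and returns one boolean per candidate; that shape is an assumption and is not confirmed by this notebook. The names `max_profile_size_constraint` and `max_profile_size` are hypothetical and are not part of the `constraint_functions` module.

```python
from typing import Callable, Dict, List

Config = Dict[str, object]


def max_profile_size_constraint(max_profile_size: int) -> Callable[[List[Config]], List[bool]]:
    """Build a hypothetical constraint that rejects profiles longer than a limit."""

    def constraint(candidates: List[Config]) -> List[bool]:
        # One boolean per candidate: True keeps the configuration, False discards it.
        return [int(c.get("profile_size", 0)) <= max_profile_size for c in candidates]

    return constraint


candidates = [{"profile_size": 50}, {"profile_size": 200}]
keep = max_profile_size_constraint(150)(candidates)
print(keep)
```

The closure pattern mirrors the way `auto_profile_max_wait_time_constraint(my_pipeline)` is called with an argument and passed on as the constraint function, but the rule applied here is purely illustrative.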

Then, we import the pipeline and the method we want to use (here we use OCSVM).

NOTE: The supervision level of the method must match the flavor chosen in the previous step (for example, some methods cannot be applied in an unsupervised way). Semi-supervised flavors (online, incremental, semi-supervised with historical data) expect a method that inherits from SemiSupervisedMethodInterface, while the unsupervised flavor expects a method that inherits from UnsupervisedMethodInterface.

In [2]:
from pipeline.pipeline import PdMPipeline
from method.ocsvm import OneClassSVM
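The matching rule in the note above can be sketched as a simple subclass check. The two interface classes below are local stand-ins that only reuse the names from the note; they are not the framework's classes, and `EXPECTED_INTERFACE`, `is_compatible` and `ToyOneClassSVM` are hypothetical helpers introduced for illustration.

```python
from abc import ABC


class SemiSupervisedMethodInterface(ABC):
    """Stand-in for the framework's semi-supervised interface (assumption)."""


class UnsupervisedMethodInterface(ABC):
    """Stand-in for the framework's unsupervised interface (assumption)."""


# Hypothetical mapping from flavor name to the interface it expects.
EXPECTED_INTERFACE = {
    "online": SemiSupervisedMethodInterface,
    "incremental": SemiSupervisedMethodInterface,
    "semi_with_historical_data": SemiSupervisedMethodInterface,
    "unsupervised": UnsupervisedMethodInterface,
}


def is_compatible(flavor: str, method_cls: type) -> bool:
    """Return True if the method class inherits the interface the flavor expects."""
    return issubclass(method_cls, EXPECTED_INTERFACE[flavor])


class ToyOneClassSVM(SemiSupervisedMethodInterface):
    """Toy method that declares itself semi-supervised."""


print(is_compatible("online", ToyOneClassSVM))
print(is_compatible("unsupervised", ToyOneClassSVM))
```

Running a check like this before building the pipeline would surface a flavor/method mismatch early, rather than partway through an optimization run.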

Next, we import the dataset of our choice using the loadDataset method from utils (we will provide another example for using your own dataset). 

Here, we will use the IMS dataset, which consists of 3 multivariate time series, each generated from a different source (i.e. machine).

In [3]:
from utils import loadDataset

dataset = loadDataset.get_dataset("ims")

For instantiating our pipeline, apart from the **method** and the **dataset** we also need a **preprocessor**, a **postprocessor** and a **thresholder**. The **preprocessor** applies some kind of transformation before the anomaly scores are generated, the **postprocessor** applies some kind of transformation on the anomaly scores the respective method produces and the **thresholder** applies a thresholding scheme to the postprocessed scores.

Here we use the defaults for the preprocessing and postprocessing steps, which leave their input unchanged (you can also remove them from the dictionary and they will be selected automatically), and the constant thresholder for the thresholding step.

In [4]:
from preprocessing.record_level.default import DefaultPreProcessor
from postprocessing.default import DefaultPostProcessor
from thresholding.constant import ConstantThresholder

my_pipeline = PdMPipeline(
    steps={
        'preprocessor': DefaultPreProcessor,
        'method': OneClassSVM,
        'postprocessor': DefaultPostProcessor,
        'thresholder': ConstantThresholder,
    },
    dataset=dataset,
    auc_resolution=100
)
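The order in which the four steps act on the data can be sketched with plain functions. This is a toy illustration of the preprocess, score, postprocess, threshold flow described above, not the internals of PdMPipeline; the scoring rule (distance from the mean) and all function names are hypothetical.

```python
from typing import List, Sequence


def default_preprocess(records: Sequence[float]) -> List[float]:
    # Identity transform, like a default preprocessor that changes nothing.
    return list(records)


def toy_score(records: Sequence[float]) -> List[float]:
    # Hypothetical anomaly score: absolute distance from the mean of the records.
    mean = sum(records) / len(records)
    return [abs(x - mean) for x in records]


def default_postprocess(scores: Sequence[float]) -> List[float]:
    # Identity transform on the anomaly scores.
    return list(scores)


def constant_threshold(scores: Sequence[float], threshold: float) -> List[bool]:
    # Raise an alarm wherever the score exceeds a fixed value.
    return [s > threshold for s in scores]


records = [1.0, 1.0, 1.0, 5.0]
scores = default_postprocess(toy_score(default_preprocess(records)))
alarms = constant_threshold(scores, threshold=2.0)
print(scores)
print(alarms)
```

Only the last record deviates enough from the mean to cross the constant threshold, so it is the only one flagged.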

Next, we define the parameter search space that the optimizer will explore. Each parameter that belongs to a specific step (preprocessor, method, postprocessor, thresholder) must carry the corresponding prefix ("preprocessor", "method", "postprocessor", "thresholder" respectively).

In [5]:
param_space = {
    'method_kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
    'method_nu': [0.01, 0.05, 0.1, 0.15, 0.2, 0.5],
    'method_gamma': ['scale', 'auto'],
    'method_degree': [2, 3, 4, 5],
}
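One way to picture the prefix convention is a helper that routes each key to its step and strips the prefix. This is a sketch under the assumption that a prefix is separated from the parameter name by an underscore, as in `method_kernel`; how the framework actually dispatches parameters is not shown here, and `split_by_step` and the `"flavor"` bucket are hypothetical.

```python
from typing import Any, Dict

STEP_PREFIXES = ("preprocessor", "method", "postprocessor", "thresholder")


def split_by_step(params: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Group parameters by step prefix; unprefixed keys go to a 'flavor' bucket."""
    grouped: Dict[str, Dict[str, Any]] = {prefix: {} for prefix in STEP_PREFIXES}
    grouped["flavor"] = {}
    for name, value in params.items():
        for prefix in STEP_PREFIXES:
            if name.startswith(prefix + "_"):
                # Drop "<prefix>_" so the step receives its native argument name.
                grouped[prefix][name[len(prefix) + 1:]] = value
                break
        else:
            # No step prefix matched, e.g. a flavor-level parameter.
            grouped["flavor"][name] = value
    return grouped


config = {"method_kernel": "rbf", "method_nu": 0.1, "profile_size": 100}
grouped = split_by_step(config)
print(grouped["method"])
print(grouped["flavor"])
```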

We also add the flavor parameters to be optimized: 'profile_size' for online, and 'initial_incremental_window_length', 'incremental_window_length' and 'incremental_slide' for incremental. The unsupervised flavor and the semi-supervised flavor with historical data do not expect any flavor-dependent parameters.

In [6]:
param_space['profile_size'] = [50, 100, 150, 200]
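To get a feel for the size of this search space, the number of distinct configurations in the full grid is the product of the number of options per parameter. The snippet below repeats the space defined above so that it runs on its own; it counts raw combinations only and ignores any configurations a constraint function might reject.

```python
from math import prod

# Same options as the search space defined in the notebook cells above.
param_space = {
    "method_kernel": ["linear", "rbf", "poly", "sigmoid"],
    "method_nu": [0.01, 0.05, 0.1, 0.15, 0.2, 0.5],
    "method_gamma": ["scale", "auto"],
    "method_degree": [2, 3, 4, 5],
    "profile_size": [50, 100, 150, 200],
}

# 4 * 6 * 2 * 4 * 4 combinations in total.
grid_size = prod(len(options) for options in param_space.values())
print(grid_size)
```

With several hundred combinations, an exhaustive grid search would be costly, which is the motivation for sampling the space with an optimizer instead.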

Finally, we define our experiment, which performs hyperparameter tuning using Bayesian optimization with Mango (https://github.com/ARM-software/mango). We give the experiment a name and pass the previously created parameter space and pipeline, together with the flavor constraint function. Optimizing with Mango requires four more parameters:

* optimization_param: the optimization objective used to select the hyperparameters.
* initial_random: the number of parameter configurations evaluated before Mango starts applying its Bayesian optimization algorithm.
* n_jobs: the number of jobs run in parallel.
* num_iteration: the number of optimization rounds.

The final number of configurations tested will be `n_jobs x num_iteration + initial_random` (although initial_random is only a suggestion to the optimizer, and in some cases more or fewer configurations will be tested).

Other options for optimization_param:
* AD1_AUC
* AD1_f1
* AD1_rcl
* AD2_AUC
* AD2_f1
* AD2_rcl
* AD3_AUC
* AD3_f1
* AD3_rcl
* VUS_AUC_PR
* VUS_AUC_ROC
* VUS_Affiliation_Precision
* VUS_Affiliation_Recall
* VUS_F
* VUS_Precision
* VUS_Precision_at_k
* VUS_RF
* VUS_R_AUC_PR
* VUS_R_AUC_ROC
* VUS_Recall
* VUS_Rprecision
* VUS_Rrecall
* VUS_VUS_PR
* VUS_VUS_ROC
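As a worked example of the evaluation budget formula, the snippet below plugs in the values used in the experiment cell that follows (num_iteration=4, n_jobs=4, initial_random=4). The helper name `nominal_budget` is hypothetical, and the result is the nominal count only, since the optimizer may test slightly more or fewer configurations.

```python
def nominal_budget(n_jobs: int, num_iteration: int, initial_random: int) -> int:
    """Nominal number of configurations: parallel jobs per round plus the random warm-up."""
    return n_jobs * num_iteration + initial_random


budget = nominal_budget(n_jobs=4, num_iteration=4, initial_random=4)
print(budget)
```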

In [7]:
my_experiment = AutoProfileSemiSupervisedPdMExperiment(
    experiment_name='my first experiment',
    pipeline=my_pipeline,
    param_space=param_space,
    num_iteration=4,
    n_jobs=4,
    initial_random=4,
    constraint_function=auto_profile_max_wait_time_constraint(my_pipeline),
    debug=True,
    optimization_param='VUS_AUC_PR'
)

rm -f main.o evaluator.o evaluate
c++ -fPIC -Wall -std=c++11 -O2 -g   -c -o main.o main.cpp
c++ -fPIC -Wall -std=c++11 -O2 -g   -c -o evaluator.o evaluator.cpp
c++ -fPIC -Wall -std=c++11 -O2 -g   -o evaluate main.o evaluator.o


In [8]:
best_params = my_experiment.execute()
print(best_params)



{'nu': 0.1, 'kernel': 'linear', 'gamma': 'auto', 'degree': 2, 'profile_size': 150}
{'nu': 0.05, 'kernel': 'linear', 'gamma': 'auto', 'degree': 2, 'profile_size': 150}
{'nu': 0.01, 'kernel': 'poly', 'gamma': 'scale', 'degree': 4, 'profile_size': 200}
{}
{}
{}
{'nu': 0.05, 'kernel': 'sigmoid', 'gamma': 'scale', 'degree': 3, 'profile_size': 150}
{}


  0%|          | 0/4 [00:00<?, ?it/s]

{'degree': 2, 'gamma': 'scale', 'kernel': 'poly', 'nu': 0.5, 'profile_size': 200}
{}
{'degree': 5, 'gamma': 'auto', 'kernel': 'rbf', 'nu': 0.05, 'profile_size': 50}
{}
{'degree': 5, 'gamma': 'scale', 'kernel': 'rbf', 'nu': 0.15, 'profile_size': 150}
{}
{'degree': 2, 'gamma': 'scale', 'kernel': 'linear', 'nu': 0.5, 'profile_size': 50}
{}


Best score: 0.28408692041229566:  25%|██▌       | 1/4 [00:08<00:26,  8.71s/it]

{'degree': 2, 'gamma': 'auto', 'kernel': 'rbf', 'nu': 0.01, 'profile_size': 100}
{}
{'degree': 4, 'gamma': 'scale', 'kernel': 'sigmoid', 'nu': 0.1, 'profile_size': 150}
{}
{'degree': 3, 'gamma': 'scale', 'kernel': 'linear', 'nu': 0.5, 'profile_size': 150}
{}
{'degree': 3, 'gamma': 'auto', 'kernel': 'rbf', 'nu': 0.2, 'profile_size': 150}
{}


Best score: 0.34775647653517366:  50%|█████     | 2/4 [00:17<00:17,  8.51s/it]

{'degree': 4, 'gamma': 'scale', 'kernel': 'sigmoid', 'nu': 0.01, 'profile_size': 150}
{}
{'degree': 2, 'gamma': 'scale', 'kernel': 'sigmoid', 'nu': 0.15, 'profile_size': 200}
{}
{'degree': 3, 'gamma': 'auto', 'kernel': 'sigmoid', 'nu': 0.15, 'profile_size': 50}
{}
{'degree': 2, 'gamma': 'auto', 'kernel': 'linear', 'nu': 0.01, 'profile_size': 200}
{}


Best score: 0.34775647653517366:  75%|███████▌  | 3/4 [00:25<00:08,  8.44s/it]

{'degree': 5, 'gamma': 'scale', 'kernel': 'sigmoid', 'nu': 0.15, 'profile_size': 200}
{'degree': 2, 'gamma': 'auto', 'kernel': 'rbf', 'nu': 0.1, 'profile_size': 150}
{}
{}
{'degree': 5, 'gamma': 'scale', 'kernel': 'sigmoid', 'nu': 0.15, 'profile_size': 100}
{}
{'degree': 3, 'gamma': 'auto', 'kernel': 'linear', 'nu': 0.5, 'profile_size': 50}
{}


Best score: 0.34775647653517366: 100%|██████████| 4/4 [00:33<00:00,  8.45s/it]

{'method_degree': 3, 'method_gamma': 'auto', 'method_kernel': 'rbf', 'method_nu': 0.2, 'profile_size': 150}



