## Creating a new classifier

* This notebook demonstrates how to create a new classifier.
* As a worked example, we implement Online Bagging (OzaBag).

**notebook last updated on 15/12/2023**

## 0. Auxiliary function for Online Bagging

In [1]:
import random
import math

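def poisson(lambd, random_generator):
    # Draw one Poisson(lambd) sample: inverse-CDF search for lambd < 100,
    # otherwise a Gaussian approximation floored to an integer and clipped at 0.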
    if lambd < 100.0:
        product = 1.0
        _sum = 1.0
        threshold = random_generator.random() * math.exp(lambd)
        i = 1
        max_val = max(100, 10 * math.ceil(lambd))
        while i < max_val and _sum <= threshold:
            product *= (lambd / i)
            _sum += product
            i += 1
        return i - 1
    x = lambd + math.sqrt(lambd) * random_generator.gauss(0, 1)
    if x < 0.0:
        return 0
    return int(math.floor(x))
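
Online Bagging uses this helper to decide how many times each ensemble member trains on an incoming instance: a draw from Poisson(1) stands in for the number of times the instance would appear in a bootstrap sample. The sketch below is a quick sanity check of that idea. It uses `poisson_knuth`, a hypothetical standard-library stand-in (Knuth's multiplication method) rather than the `poisson` function above, and checks that the average weight is close to 1 and that roughly exp(-1) of the draws are 0, meaning the member skips the instance.

```python
import math
import random


def poisson_knuth(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method.

    Hypothetical stand-in for the notebook's `poisson` helper; only
    suitable for small lam because exp(-lam) underflows for large values.
    """
    limit = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def summarize_weights(n_draws=100_000, lam=1.0, seed=1):
    """Return (sample mean, fraction of zero draws) over n_draws samples."""
    rng = random.Random(seed)
    draws = [poisson_knuth(lam, rng) for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    zero_fraction = draws.count(0) / n_draws
    return mean, zero_fraction


mean, zero_fraction = summarize_weights()
print(f"mean weight: {mean:.3f}")
print(f"fraction of draws equal to 0: {zero_fraction:.3f}")
print(f"exp(-1) for comparison: {math.exp(-1):.3f}")
```

Because each member draws its own weights, the members see differently resampled versions of the same stream, which is what gives the ensemble its diversity.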

## 1. Creating the classifier

* To create a classifier, you only need to implement the methods of ```Classifier``` from the ```learners``` module.
* In this example, the base learner is a MOA class; internally, each ensemble member is a ```MOAClassifier``` that wraps one instance of it.
* Methods to be implemented:
  * ```__init__(self, schema=None, random_seed=1, ...)```
  * ```train(self, instance)```
  * ```predict(self, instance)```
  * ```predict_proba(self, instance)```
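
To make the shape of these four methods concrete before the full ensemble, here is a minimal stand-alone sketch of a majority-class learner. It deliberately does not subclass the library's `Classifier`, so it runs without the JVM. `ToyInstance` and the `n_classes` argument are hypothetical stand-ins for a stream instance and for the schema; only the `y_index` attribute name is taken from the notebook.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToyInstance:
    """Hypothetical stand-in for a stream instance: features plus a class index."""
    x: tuple
    y_index: int


class MajorityClassSketch:
    """Predicts the most frequent class seen so far."""

    def __init__(self, n_classes, random_seed=1):
        self.n_classes = n_classes      # assumption: replaces the schema argument
        self.random_seed = random_seed  # unused here; kept to mirror the signature
        self.class_counts = Counter()

    def train(self, instance):
        # Learning is just counting how often each class index appears.
        self.class_counts[instance.y_index] += 1

    def predict(self, instance):
        if not self.class_counts:
            return 0  # arbitrary default before any training
        return self.class_counts.most_common(1)[0][0]

    def predict_proba(self, instance):
        total = sum(self.class_counts.values())
        if total == 0:
            return [1.0 / self.n_classes] * self.n_classes
        return [self.class_counts[c] / total for c in range(self.n_classes)]


learner = MajorityClassSketch(n_classes=2)
for label in (1, 1, 0):
    learner.train(ToyInstance(x=(0.0,), y_index=label))

probe = ToyInstance(x=(0.0,), y_index=0)
print(learner.predict(probe))
print(learner.predict_proba(probe))
```

After two instances of class 1 and one of class 0, the sketch predicts class 1 and reports class frequencies of one third and two thirds.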

In [2]:
from capymoa.learner.learners import Classifier, MOAClassifier
from moa.classifiers.trees import HoeffdingTree
from collections import Counter
import numpy as np

class SimpleOnlineBagging(Classifier):
    def __init__(self, schema=None, random_seed=1, ensemble_size=5, moa_base_learner_class=None):
        super().__init__(schema=schema, random_seed=random_seed)

        # Seed the generator so that runs with the same random_seed are reproducible
        self.random_generator = random.Random(random_seed)
        
        self.ensemble_size = ensemble_size
        self.moa_base_learner_class = moa_base_learner_class
        
        # Default base learner if None is specified
        if self.moa_base_learner_class is None:
            self.moa_base_learner_class = HoeffdingTree
        
        self.ensemble = []
        # Create several instances for the base_learners
        for i in range(self.ensemble_size): 
            self.ensemble.append(MOAClassifier(schema=self.schema, moa_learner=self.moa_base_learner_class()))
        
    def __str__(self):
        return 'SimpleOnlineBagging'

    def train(self, instance):
        for i in range(self.ensemble_size):
            k = poisson(1.0, self.random_generator)
            for _ in range(k):
                self.ensemble[i].train(instance)

    def predict(self, instance):
        predictions = []
        for i in range(self.ensemble_size):
            predictions.append(self.ensemble[i].predict(instance))
        majority_vote = Counter(predictions)
        prediction = majority_vote.most_common(1)[0][0]
        return prediction

    def predict_proba(self, instance):
        probabilities = []
        for i in range(self.ensemble_size):
            classifier_proba = self.ensemble[i].predict_proba(instance)
            classifier_proba = classifier_proba / np.sum(classifier_proba)
            probabilities.append(classifier_proba)
        avg_proba = np.mean(probabilities, axis=0)
        return avg_proba

capymoa_root: /home/antonlee/github.com/tachyonicClock/MOABridge/src/capymoa
MOA jar path location (config.ini): /home/antonlee/github.com/tachyonicClock/MOABridge/src/capymoa/jar/moa.jar
JVM Location (system): 
JAVA_HOME: /usr/lib/jvm/java-17-openjdk
JVM args: ['-Xmx8g', '-Xss10M']


Sucessfully started the JVM and added MOA jar to the class path
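
Note that `predict` and `predict_proba` in `SimpleOnlineBagging` combine the members differently: the first takes a majority vote over predicted labels, the second averages normalized probabilities. The two can disagree. The sketch below illustrates this with made-up probability vectors for three members, assuming each member's label is the argmax of its own probabilities (an assumption for illustration, not something the notebook states). `majority_vote` and `average_proba` are hypothetical helper names.

```python
from collections import Counter


def majority_vote(member_probas):
    """Each member votes for its argmax class; return the most common vote."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in member_probas]
    return Counter(votes).most_common(1)[0][0]


def average_proba(member_probas):
    """Normalize each member's scores to sum to 1, then average per class."""
    normalized = [[v / sum(p) for v in p] for p in member_probas]
    n_members = len(normalized)
    return [sum(column) / n_members for column in zip(*normalized)]


# Made-up outputs: two lukewarm votes for class 0, one confident vote for class 1.
member_probas = [
    [0.55, 0.45],
    [0.60, 0.40],
    [0.05, 0.95],
]

hard = majority_vote(member_probas)
soft = average_proba(member_probas)
print("majority vote:", hard)
print("averaged probabilities:", [round(p, 3) for p in soft])
```

Here the majority vote picks class 0, while the averaged probabilities favor class 1, so code that derives a label from `predict_proba` may not match `predict`.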


## 2. Using prequential evaluation

In [3]:
from capymoa.evaluation import prequential_evaluation
from capymoa.stream.stream import stream_from_file
from moa.classifiers.trees import HoeffdingAdaptiveTree

DATA_PATH = "../data/"

# Opening a file as a stream
elec_stream = stream_from_file(path_to_csv_or_arff=DATA_PATH+"electricity.csv")

# Creating a learner: using a Hoeffding Adaptive Tree as the base learner
NEW_OB = SimpleOnlineBagging(schema=elec_stream.get_schema(), ensemble_size=5, moa_base_learner_class=HoeffdingAdaptiveTree)

results_NEW_OB = prequential_evaluation(stream=elec_stream, learner=NEW_OB, window_size=4500)

results_NEW_OB['cumulative'].accuracy()

82.21001059322035
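
Prequential evaluation tests on each instance before training on it, so every instance serves as unseen test data exactly once. The sketch below illustrates the difference between a cumulative score and per-window scores, assuming `window_size` means accuracy is additionally reported over consecutive blocks of that many instances. It is a toy, not the library's implementation: `prequential_accuracy` is a hypothetical function, and the "learner" simply repeats the last label it saw.

```python
def prequential_accuracy(labels, window_size):
    """Test-then-train over a label sequence with a 'repeat the last label' learner.

    Returns (cumulative accuracy in %, list of per-window accuracies in %).
    """
    last_label = None        # the toy learner's entire state
    correct_total = 0
    correct_in_window = 0
    windowed = []
    for seen, label in enumerate(labels, start=1):
        hit = int(last_label == label)  # 1. test on the instance first
        last_label = label              # 2. then train on it
        correct_total += hit
        correct_in_window += hit
        if seen % window_size == 0:     # a full window has been observed
            windowed.append(100.0 * correct_in_window / window_size)
            correct_in_window = 0
    cumulative = 100.0 * correct_total / len(labels)
    return cumulative, windowed


# Stable labels in the first half, alternating labels in the second half.
labels = [0, 0, 0, 0, 1, 0, 1, 0]
cumulative, windowed = prequential_accuracy(labels, window_size=4)
print("cumulative:", cumulative)
print("windowed:", windowed)
```

The cumulative score is a single number for the whole stream, while the windowed scores show that performance collapses in the second half, which is the kind of change a single aggregate hides.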

## 3. Unpacking the train-test loop

In [4]:
%%time
from capymoa.stream.stream import stream_from_file
from capymoa.evaluation import ClassificationEvaluator
from moa.classifiers.trees import HoeffdingTree, HoeffdingAdaptiveTree

DATA_PATH = "../data/"

# Opening a file as a stream
elec_stream = stream_from_file(path_to_csv_or_arff=DATA_PATH+"electricity.csv")

# Creating a learner
NEW_OB = SimpleOnlineBagging(schema=elec_stream.get_schema(), ensemble_size=5, moa_base_learner_class=HoeffdingAdaptiveTree)

# Creating the evaluator
NEW_OB_evaluator = ClassificationEvaluator(schema=elec_stream.get_schema())

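# Optional cap on the number of instances; enable it by uncommenting the condition in the loop below
MAX_instances = 100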
i = 0
while elec_stream.has_more_instances(): # and i < MAX_instances:
    instance = elec_stream.next_instance()

    prediction = NEW_OB.predict(instance)
    NEW_OB_evaluator.update(instance.y_index, prediction)
    NEW_OB.train(instance)

    i+=1

print(f'NEW OB acc: {NEW_OB_evaluator.accuracy()}')

NEW OB acc: 82.27621822033898
CPU times: user 5.12 s, sys: 14.8 ms, total: 5.14 s
Wall time: 5.63 s
