# Learners API Examples

* ```MOAClassifier``` and ```MOARegressor```
* ```AdaptiveRandomForest```, ```OnlineBagging``` and ```AdaptiveRandomForestRegressor```
* ```ClassificationEvaluator```, ```ClassificationWindowedEvaluator```, ```RegressionEvaluator``` and ```RegressionWindowedEvaluator```

Some comments:

* We can use the ```MOAClassifier``` and ```MOARegressor``` to execute any MOA classifier and regressor, respectively.
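* We can create wrappers for individual learners, such as the ```AdaptiveRandomForest``` wrapper. If a wrapper and the MOA class of the same name are needed together, import one of them under an alias to avoid a name clash. This notebook imports the MOA class as ```MOA_ARF``` (```from moa.classifiers.meta import AdaptiveRandomForest as MOA_ARF```).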
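The following pure-Python sketch illustrates what such a wrapper does. It is not CapyMOA's implementation. `FakeMOALearner` and `SketchClassifierWrapper` are hypothetical names. The fake learner only mimics two MOA method names used later in this notebook, `getVotesForInstance` and `trainOnInstance`, and it uses simplified arguments. The wrapper turns the vote array into a label value in two steps: it takes the index of the largest vote, then looks that index up in a label list, which plays the role of the schema.

```python
from typing import List, Sequence


class FakeMOALearner:
    """Hypothetical stand-in for a MOA learner; it only counts labels."""

    def __init__(self, num_classes: int) -> None:
        self.counts = [0.0] * num_classes

    def getVotesForInstance(self, x: Sequence[float]) -> List[float]:
        # One vote per class index: here simply the label counts seen so far
        return list(self.counts)

    def trainOnInstance(self, x: Sequence[float], y_index: int) -> None:
        # Simplified signature: real MOA learners receive a labeled instance
        self.counts[y_index] += 1.0


class SketchClassifierWrapper:
    """Hypothetical wrapper in the spirit of MOAClassifier (not CapyMOA code)."""

    def __init__(self, labels: Sequence[str], moa_learner: FakeMOALearner) -> None:
        self.labels = list(labels)  # position = class index, entry = label value
        self.moa_learner = moa_learner

    def predict(self, x: Sequence[float]) -> str:
        votes = self.moa_learner.getVotesForInstance(x)
        # Index of the largest vote (ties go to the lowest index)
        best_index = max(range(len(votes)), key=votes.__getitem__)
        return self.labels[best_index]

    def train(self, x: Sequence[float], y: str) -> None:
        # Translate the label value back to the class index the learner expects
        self.moa_learner.trainOnInstance(x, self.labels.index(y))


clf = SketchClassifierWrapper(["class_a", "class_b"], FakeMOALearner(num_classes=2))
print(clf.predict([0.5]))  # class_a: no training yet, all votes tie at 0
clf.train([0.1], "class_b")
print(clf.predict([0.5]))  # class_b
```

The split between index and label value is the same one that appears later in the raw MOA example. There, `Utils.maxIndex` returns an index that has to be mapped to a label value before it reaches the evaluator.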

**Notebook last update: 17/02/2024**

```python
import pandas as pd

# CapyMOA code imports
from capymoa.evaluation import *
from capymoa.stream import *
from capymoa.learner.classifier import OnlineBagging, AdaptiveRandomForest
from capymoa.learner.regressor import AdaptiveRandomForestRegressor
from capymoa.learner import MOAClassifier, MOARegressor

# MOA imports
from moa.core import Example, Utils
from moa.streams.generators import RandomTreeGenerator as MOA_RTG
from moa.streams import ArffFileStream

# For regression tests
from moa.classifiers.trees import FIMTDD
from moa.evaluation import BasicRegressionPerformanceEvaluator

# For classification tests
from moa.classifiers.trees import HoeffdingAdaptiveTree

arff_fried_path = '../data/fried.arff'
```


```
capymoa_root: /home/antonlee/github.com/tachyonicClock/MOABridge/src/capymoa
MOA jar path location (config.ini): /home/antonlee/github.com/tachyonicClock/MOABridge/src/capymoa/jar/moa.jar
JVM Location (system): 
JAVA_HOME: /usr/lib/jvm/java-17-openjdk
JVM args: ['-Xmx8g', '-Xss10M']
Sucessfully started the JVM and added MOA jar to the class path
```


## MOARegressor

```python
maxInstancesToProcess = 50000
sampleFrequency = 10000
instancesProcessed = 1

# Alternative: wrap a MOA ArffFileStream directly
# stream = Stream(moa_stream=ArffFileStream(arff_fried_path, -1))
stream = ARFFStream(path=arff_fried_path)
learner = MOARegressor(schema=stream.get_schema(), moa_learner=FIMTDD())

# Cumulative (test-then-train) and windowed evaluators
evaluator_TTT = RegressionEvaluator(schema=stream.get_schema(), window_size=sampleFrequency)
evaluator_windowed = RegressionWindowedEvaluator(schema=stream.get_schema(), window_size=sampleFrequency)

while stream.has_more_instances() and instancesProcessed <= maxInstancesToProcess:
    instance = stream.next_instance()

    prediction = learner.predict(instance)
    evaluator_TTT.update(instance.y_value, prediction)
    evaluator_windowed.update(instance.y_value, prediction)
    learner.train(instance)

    instancesProcessed += 1

print(evaluator_TTT.adjusted_R2())
evaluator_windowed.metrics_per_window()
```

```
0.6996453058512179
```

| window | classified instances | mean absolute error | root mean squared error | relative mean absolute error | relative root mean squared error | coefficient of determination | adjusted coefficient of determination |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 10000.0 | 2.845125 | 3.872532 | 0.702732 | 0.77799 | 0.394731 | 0.394064 |
| 1 | 10000.0 | 1.891743 | 2.423392 | 0.465342 | 0.484313 | 0.765441 | 0.765183 |
| 2 | 10000.0 | 1.744441 | 2.227354 | 0.432927 | 0.449547 | 0.797907 | 0.797685 |
| 3 | 10000.0 | 1.649495 | 2.106839 | 0.40025 | 0.416741 | 0.826327 | 0.826136 |
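Every example in this notebook uses the same test-then-train (prequential) loop. For each incoming instance, the loop predicts, updates the evaluators, and only then trains. The sketch below spells this out in plain Python under a few assumptions:

* A hypothetical running-mean regressor stands in for FIMTDD.
* MAE is tracked both cumulatively (like `RegressionEvaluator`) and per window (like `RegressionWindowedEvaluator`).
* An incomplete final window is simply dropped. We have not checked how CapyMOA handles a partial last window.

It illustrates the loop only and is not CapyMOA's evaluator code.

```python
from typing import Iterable, List, Tuple


class RunningMeanRegressor:
    """Hypothetical toy learner: predicts the mean of the targets seen so far."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def predict(self, x: float) -> float:
        return self.total / self.count if self.count else 0.0

    def train(self, x: float, y: float) -> None:
        self.total += y
        self.count += 1


def prequential_mae(
    stream: Iterable[Tuple[float, float]], learner: RunningMeanRegressor, window_size: int
) -> Tuple[float, List[float]]:
    """Test-then-train loop returning the cumulative MAE and one MAE per full window."""
    total_error, n = 0.0, 0
    window_error, window_n = 0.0, 0
    per_window: List[float] = []
    for x, y in stream:
        prediction = learner.predict(x)  # 1. test on an instance the learner has not seen
        error = abs(y - prediction)
        total_error += error             # 2. update the cumulative metric ...
        n += 1
        window_error += error            # ... and the current window
        window_n += 1
        if window_n == window_size:      # window full: record it and start a new one
            per_window.append(window_error / window_n)
            window_error, window_n = 0.0, 0
        learner.train(x, y)              # 3. only then train on the instance
    return (total_error / n if n else 0.0), per_window


stream = [(0.0, 2.0), (1.0, 4.0), (2.0, 6.0), (3.0, 8.0)]
cumulative, windows = prequential_mae(stream, RunningMeanRegressor(), window_size=2)
print(cumulative, windows)  # 2.75 [2.0, 3.5]
```

Predicting before training ensures that every error is measured on an instance the model has not yet learned from.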


## AdaptiveRandomForest (ARF) wrapper example

```python
maxInstancesToProcess = 5000
sampleFrequency = 1000
instancesProcessed = 1

stream = RandomTreeGenerator()
learner = AdaptiveRandomForest(schema=stream.get_schema(), ensemble_size=5, max_features=0.6)

evaluator_TTT = ClassificationEvaluator(schema=stream.get_schema(), window_size=sampleFrequency)
evaluator_windowed = ClassificationWindowedEvaluator(schema=stream.get_schema(), window_size=sampleFrequency)

while stream.has_more_instances() and instancesProcessed <= maxInstancesToProcess:
    instance = stream.next_instance()

    prediction = learner.predict(instance)

    # Test first: both evaluators compare the true label with the prediction
    evaluator_TTT.update(instance.y_label, prediction)
    evaluator_windowed.update(instance.y_label, prediction)
    # ...then train on the same instance
    learner.train(instance)

    instancesProcessed += 1

print(evaluator_TTT.accuracy())
evaluator_windowed.metrics_per_window()
```

```
77.24
```

| window | classified instances | classifications correct (percent) | Kappa Statistic (percent) | Kappa Temporal Statistic (percent) | Kappa M Statistic (percent) | F1 Score (percent) | F1 Score for class 0 (percent) | F1 Score for class 1 (percent) | Precision (percent) | Precision for class 0 (percent) | Precision for class 1 (percent) | Recall (percent) | Recall for class 0 (percent) | Recall for class 1 (percent) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 1000.0 | 70.5 | 32.261768 | 38.669439 | 21.122995 | 67.002956 | 52.188006 | 78.669559 | 69.148368 | 66.528926 | 71.76781 | 64.986667 | 42.933333 | 87.04 |
| 1 | 2000.0 | 75.8 | 47.834035 | 44.748858 | 40.09901 | 74.415527 | 66.201117 | 81.152648 | 75.844141 | 75.961538 | 75.726744 | 73.039737 | 58.663366 | 87.416107 |
| 2 | 3000.0 | 78.5 | 54.505442 | 55.208333 | 47.303922 | 77.456566 | 71.673254 | 82.675262 | 78.268781 | 77.492877 | 79.044684 | 76.661036 | 66.666667 | 86.655405 |
| 3 | 4000.0 | 79.8 | 57.780685 | 57.651992 | 51.789976 | 79.064603 | 74.300254 | 83.360791 | 79.750421 | 79.564033 | 79.936809 | 78.39048 | 69.689737 | 87.091222 |
| 4 | 5000.0 | 81.6 | 60.958221 | 59.825328 | 54.114713 | 80.627444 | 75.661376 | 85.209003 | 81.366961 | 80.56338 | 82.170543 | 79.901249 | 71.321696 | 88.480801 |
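The windowed rows above are consistent with each row holding the accuracy of its own 1,000-instance window, even though the ```classified instances``` column grows cumulatively. Converting each window's accuracy back into a count of correct predictions and dividing by the total reproduces the 77.24 printed by ```evaluator_TTT.accuracy()```. A quick check, using only numbers copied from the output above:

```python
# Per-window accuracies (percent) copied from the windowed output above
window_accuracy = [70.5, 75.8, 78.5, 79.8, 81.6]
# Instances per window (window_size=sampleFrequency=1000 in the cell above)
window_sizes = [1000] * len(window_accuracy)

# Recover the number of correct predictions per window, then pool them
correct = sum(acc / 100 * size for acc, size in zip(window_accuracy, window_sizes))
cumulative_accuracy = 100 * correct / sum(window_sizes)
print(round(cumulative_accuracy, 2))  # 77.24
```

Weighting by window size keeps the calculation valid even if the last window is shorter than the others.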


## Using the ```ClassificationEvaluator``` and ```ClassificationWindowedEvaluator``` with the ```MOA AdaptiveRandomForest```
* This example uses a MOA generator directly: ```MOA_RTG()```, i.e. ```moa.streams.generators.RandomTreeGenerator```.

```python
from moa.classifiers.meta import AdaptiveRandomForest as MOA_ARF

maxInstancesToProcess = 10000
instancesProcessed = 1

learner = MOA_ARF()
# Setting parameters using setViaCLIString
learner.getOptions().setViaCLIString("-s 5")
learner.prepareForUse()

rtg = MOA_RTG()
# Setting parameters using setViaCLIString
rtg.getOptions().setViaCLIString("-c 3 -u 10 -o 0")
rtg.prepareForUse()

# The windowed evaluator is built from the known label values,
# the cumulative one from the generator's MOA header
evaluator = ClassificationWindowedEvaluator(schema=Schema(labels=[0, 1, 2]), window_size=100, recall_per_class=True)
evaluator_TTT = ClassificationEvaluator(schema=Schema(moa_header=rtg.getHeader()), recall_per_class=True)

# learner.setModelContext(rtg.getHeader())

while rtg.hasMoreInstances() and instancesProcessed <= maxInstancesToProcess:
    instance = rtg.nextInstance()

    # Raw MOA call: returns an array of class votes
    prediction = learner.getVotesForInstance(instance)

    # With labels [0, 1, 2] the label values coincide with the class indexes,
    # so the index of the top vote can be passed as-is
    evaluator.update(int(instance.getData().classValue()),
                     int(Utils.maxIndex(prediction)))

    # The evaluators are not designed to be fed raw MOA objects, hence the verbose call.
    # The RTG class labels are strings, and update() expects label values (not indexes),
    # so both indexes are first mapped to their values through the schema.
    evaluator_TTT.update(evaluator_TTT.schema.get_value_for_index(int(instance.getData().classValue())),
                         evaluator_TTT.schema.get_value_for_index(int(Utils.maxIndex(prediction))))

    learner.trainOnInstance(instance)
    instancesProcessed += 1

print(f"Test-Then-Train accuracy = {evaluator_TTT.accuracy()}")
evaluator.metrics_per_window()
```

```
Test-Then-Train accuracy = 90.38000000000001
```

| window | classified instances | classifications correct (percent) | Kappa Statistic (percent) | Kappa Temporal Statistic (percent) | Kappa M Statistic (percent) | F1 Score (percent) | F1 Score for class 0 (percent) | F1 Score for class 1 (percent) | F1 Score for class 2 (percent) | Precision (percent) | Precision for class 0 (percent) | Precision for class 1 (percent) | Precision for class 2 (percent) | Recall (percent) | Recall for class 0 (percent) | Recall for class 1 (percent) | Recall for class 2 (percent) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 100.0 | 75.0 | 51.399689 | 56.896552 | 34.210526 | 51.788526 | 81.666667 | 73.239437 |  | 51.011029 | 76.562500 | 76.470588 | 0.000000 | 52.590090 | 87.500000 | 70.270270 | 0.000000 |
| 1 | 200.0 | 77.0 | 56.340167 | 58.181818 | 52.083333 |  | 81.034483 | 80.000000 |  |  | 73.437500 | 83.333333 |  | 55.769231 | 90.384615 | 76.923077 | 0.000000 |
| 2 | 300.0 | 79.0 | 61.671838 | 67.187500 | 54.347826 | 71.516630 | 86.238532 | 78.481013 | 16.666667 | 85.303030 | 85.454545 | 70.454545 | 100.000000 | 61.566458 | 87.037037 | 88.571429 | 9.090909 |
| 3 | 400.0 | 85.0 | 73.661106 | 73.684211 | 71.698113 | 73.776414 | 90.909091 | 87.058824 | 37.500000 | 77.528324 | 86.538462 | 86.046512 | 60.000000 | 70.370882 | 95.744681 | 88.095238 | 27.272727 |
| 4 | 500.0 | 83.0 | 70.164970 | 73.846154 | 64.583333 | 75.074313 | 88.679245 | 82.666667 | 52.631579 | 77.038661 | 87.037037 | 81.578947 | 62.500000 | 73.207648 | 90.384615 | 83.783784 | 45.454545 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 9600.0 | 94.0 | 89.021043 | 87.755102 | 85.365854 | 91.333254 | 95.726496 | 93.750000 | 84.210526 | 92.116568 | 96.551724 | 90.909091 | 88.888889 | 90.563149 | 94.915254 | 96.774194 | 80.000000 |
| 96 | 9700.0 | 97.0 | 94.773519 | 94.642857 | 94.545455 | 93.130108 | 97.872340 | 98.876404 | 82.352941 | 94.444444 | 95.833333 | 100.000000 | 87.500000 | 91.851852 | 100.000000 | 97.777778 | 77.777778 |
| 97 | 9800.0 | 86.0 | 77.437550 | 79.411765 | 70.833333 | 83.024429 | 86.597938 | 95.522388 | 66.666667 | 82.212415 | 89.361702 | 94.117647 | 63.157895 | 83.852644 | 84.000000 | 96.969697 | 70.588235 |
| 98 | 9900.0 | 94.0 | 90.302247 | 90.163934 | 89.473684 | 92.131403 | 95.454545 | 96.296296 | 83.870968 | 93.770809 | 93.333333 | 95.121951 | 92.857143 | 90.548336 | 97.674419 | 97.500000 | 76.470588 |
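Besides accuracy, the evaluators report three kappa statistics. Under the definitions commonly used in the data-stream literature, each compares the observed accuracy p0 with a baseline accuracy pe via (p0 - pe) / (1 - pe). The baseline differs per statistic:

* Kappa Statistic: chance agreement.
* Kappa M: a classifier that always predicts the majority class.
* Kappa Temporal: a "no-change" classifier that repeats the previous true label.

The sketch below computes all three on a toy label sequence. It is not MOA's code. Details such as the baseline used for the very first instance (the ```initial_label``` parameter here) are our assumption. Note that the tables report these values as percentages.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Agreement beyond chance: the baseline is the expected agreement of two independent labelings."""
    n = len(y_true)
    p0 = accuracy(y_true, y_pred)
    true_freq, pred_freq = Counter(y_true), Counter(y_pred)
    pe = sum(true_freq[c] / n * pred_freq[c] / n for c in true_freq)
    return (p0 - pe) / (1 - pe)


def kappa_m(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Baseline: always predicting the most frequent true label."""
    p0 = accuracy(y_true, y_pred)
    pm = Counter(y_true).most_common(1)[0][1] / len(y_true)
    return (p0 - pm) / (1 - pm)


def kappa_temporal(y_true: Sequence[int], y_pred: Sequence[int], initial_label: int = 0) -> float:
    """Baseline: predicting the previous true label (initial_label for the first instance)."""
    previous = [initial_label] + list(y_true[:-1])
    p0 = accuracy(y_true, y_pred)
    pe = accuracy(y_true, previous)
    return (p0 - pe) / (1 - pe)


y_true = [0, 0, 1, 1, 0, 0, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
print(round(accuracy(y_true, y_pred), 4))        # 0.7
print(round(cohen_kappa(y_true, y_pred), 4))     # 0.4
print(round(kappa_m(y_true, y_pred), 4))         # 0.25
print(round(kappa_temporal(y_true, y_pred), 4))  # 0.25
```

A kappa value close to 0 means the learner does no better than its baseline. This explains why Kappa M and Kappa Temporal can be much lower than accuracy on imbalanced or autocorrelated streams. The functions divide by zero if the baseline is perfect (pe = 1), which a production version would need to guard against.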


## Using two MOA objects with the framework evaluators
* The ```MOAClassifier``` wrapper wraps the MOA learner ```HoeffdingAdaptiveTree()```.
* The ```_JavaLabeledInstance``` wrapper wraps each MOA instance returned by ```rtg.nextInstance()```.
* A ```Schema``` can be created in different ways, depending on what is known about the stream:
  * ```schema=Schema(labels=[0,1,2])```: when the class label values are known in advance (as in the previous example)
  * ```schema=Schema(moa_header=rtg.getHeader())```: when the data comes from a MOA generator (as in this example)

```python
from capymoa.stream.instance import _JavaLabeledInstance

rtg = MOA_RTG()
# Setting parameters using setViaCLIString
rtg.getOptions().setViaCLIString("-c 3 -u 10 -o 0")
rtg.prepareForUse()

# Build the schema once from the generator header and reuse it
schema = Schema(moa_header=rtg.getHeader())
learner = MOAClassifier(schema=schema, random_seed=1, moa_learner=HoeffdingAdaptiveTree())

# Print the CLI help, i.e. what each option character means
print(learner.CLI_help())

evaluator = ClassificationWindowedEvaluator(schema=schema, window_size=100, recall_per_class=True)
evaluator_TTT = ClassificationEvaluator(schema=schema, recall_per_class=True)

maxInstancesToProcess = 1000
instancesProcessed = 1
while rtg.hasMoreInstances() and instancesProcessed <= maxInstancesToProcess:
    # Wrap the raw MOA instance so the CapyMOA learner and evaluators can use it
    instance = _JavaLabeledInstance(schema, rtg.nextInstance())
    prediction = learner.predict(instance)

    evaluator.update(instance.y_label, prediction)
    evaluator_TTT.update(instance.y_label, prediction)

    learner.train(instance)
    instancesProcessed += 1

print(f"Test-Then-Train accuracy = {evaluator_TTT.accuracy()}")
evaluator.metrics_per_window()
```

```
-m maxByteSize (default: 33554432)
Maximum memory consumed by the tree.
-n numericEstimator (default: GaussianNumericAttributeClassObserver)
Numeric estimator to use.
-d nominalEstimator (default: NominalAttributeClassObserver)
Nominal estimator to use.
-e memoryEstimatePeriod (default: 1000000)
How many instances between memory consumption checks.
-g gracePeriod (default: 200)
The number of instances a leaf should observe between split attempts.
-s splitCriterion (default: InfoGainSplitCriterion)
Split criterion to use.
-c splitConfidence (default: 1.0E-7)
The allowable error in split decision, values closer to 0 will take longer to decide.
-t tieThreshold (default: 0.05)
Threshold below which a split will be forced to break ties.
-b binarySplits
Only allow binary splits.
-z stopMemManagement
Stop growing as soon as memory limit is hit.
-r removePoorAtts
Disable poor attributes.
-p noPrePrune
Disable pre-pruning.
-l leafprediction (default: NBAdaptive)
Leaf prediction to use.
-q nbThr
```

| window | classified instances | classifications correct (percent) | Kappa Statistic (percent) | Kappa Temporal Statistic (percent) | Kappa M Statistic (percent) | F1 Score (percent) | F1 Score for class 0 (percent) | F1 Score for class 1 (percent) | F1 Score for class 2 (percent) | Precision (percent) | Precision for class 0 (percent) | Precision for class 1 (percent) | Precision for class 2 (percent) | Recall (percent) | Recall for class 0 (percent) | Recall for class 1 (percent) | Recall for class 2 (percent) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 100.0 | 75.0 | 52.033768 | 56.896552 | 34.210526 | 66.106274 | 79.661017 | 72.222222 | 40.0 | 72.252944 | 75.806452 | 74.285714 | 66.666667 | 60.923423 | 83.928571 | 70.27027 | 28.571429 |
| 1 | 200.0 | 75.0 | 53.789279 | 54.545455 | 47.916667 | 59.686268 | 80.357143 | 76.315789 | 16.666667 | 62.237237 | 75.0 | 78.378378 | 33.333333 | 57.336182 | 86.538462 | 74.358974 | 11.111111 |
| 2 | 300.0 | 66.0 | 40.382255 | 46.875 | 26.086957 | 52.91628 | 75.247525 | 64.285714 | 13.333333 | 53.651035 | 80.851064 | 55.102041 | 25.0 | 52.201379 | 70.37037 | 77.142857 | 9.090909 |
| 3 | 400.0 | 76.0 | 57.431713 | 57.894737 | 54.716981 | 60.848458 | 81.632653 | 79.545455 | 14.285714 | 62.617221 | 78.431373 | 76.086957 | 33.333333 | 59.176875 | 85.106383 | 83.333333 | 9.090909 |
| 4 | 500.0 | 71.0 | 49.766153 | 55.384615 | 39.583333 | 54.984779 | 81.188119 | 72.5 | 10.526316 | 54.538443 | 83.673469 | 67.44186 | 12.5 | 55.43848 | 78.846154 | 78.378378 | 9.090909 |
| 5 | 600.0 | 78.0 | 62.749746 | 64.516129 | 56.0 | 66.948554 | 89.108911 | 77.333333 | 33.333333 | 67.531423 | 88.235294 | 74.358974 | 40.0 | 66.375661 | 90.0 | 80.555556 | 28.571429 |
| 6 | 700.0 | 77.0 | 60.413081 | 56.603774 | 57.407407 | 71.650475 | 82.608696 | 77.777778 | 44.444444 | 78.012422 | 82.608696 | 71.428571 | 80.0 | 66.247927 | 82.608696 | 85.365854 | 30.769231 |
| 7 | 800.0 | 76.0 | 59.404601 | 53.846154 | 50.0 | 71.863177 | 85.454545 | 66.666667 | 60.606061 | 74.4916 | 81.034483 | 65.517241 | 76.923077 | 69.413919 | 90.384615 | 67.857143 | 50.0 |
| 8 | 900.0 | 75.0 | 56.001408 | 50.980392 | 46.808511 | 60.758423 | 82.242991 | 78.378378 | 21.052632 | 60.932424 | 81.481481 | 76.315789 | 25.0 | 60.585414 | 83.018868 | 80.555556 | 18.181818 |
| 9 | 1000.0 | 78.0 | 60.55934 | 58.490566 | 52.173913 | 73.815377 | 84.482759 | 70.967742 | 63.636364 | 77.557368 | 79.032258 | 75.862069 | 77.777778 | 70.417854 | 90.740741 | 66.666667 | 53.846154 |


## Using the ```ClassificationEvaluator``` and ```ClassificationWindowedEvaluator``` with the ```AdaptiveRandomForest``` wrapper

```python
rtg = MOA_RTG()
# Setting parameters using setViaCLIString
rtg.getOptions().setViaCLIString("-c 3 -u 10 -o 0")
rtg.prepareForUse()

schema = Schema(moa_header=rtg.getHeader())
learner = AdaptiveRandomForest(schema=schema, ensemble_size=5)

evaluator = ClassificationWindowedEvaluator(schema=schema, window_size=100, recall_per_class=True)
evaluator_TTT = ClassificationEvaluator(schema=schema, recall_per_class=True)

maxInstancesToProcess = 1000
instancesProcessed = 1
while rtg.hasMoreInstances() and instancesProcessed <= maxInstancesToProcess:
    instance = _JavaLabeledInstance(schema, rtg.nextInstance())
    prediction = learner.predict(instance)

    evaluator.update(instance.y_label, prediction)
    evaluator_TTT.update(instance.y_label, prediction)

    learner.train(instance)
    instancesProcessed += 1

print(f"Test-Then-Train accuracy = {evaluator_TTT.accuracy()}")
evaluator.metrics_per_window()
```

```
Test-Then-Train accuracy = 82.0
```

| window | classified instances | classifications correct (percent) | Kappa Statistic (percent) | Kappa Temporal Statistic (percent) | Kappa M Statistic (percent) | F1 Score (percent) | F1 Score for class 0 (percent) | F1 Score for class 1 (percent) | F1 Score for class 2 (percent) | Precision (percent) | Precision for class 0 (percent) | Precision for class 1 (percent) | Precision for class 2 (percent) | Recall (percent) | Recall for class 0 (percent) | Recall for class 1 (percent) | Recall for class 2 (percent) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 100.0 | 75.0 | 51.399689 | 56.896552 | 34.210526 | 51.788526 | 81.666667 | 73.239437 |  | 51.011029 | 76.5625 | 76.470588 | 0.0 | 52.59009 | 87.5 | 70.27027 | 0.0 |
| 1 | 200.0 | 77.0 | 56.340167 | 58.181818 | 52.083333 |  | 81.034483 | 80.0 |  |  | 73.4375 | 83.333333 |  | 55.769231 | 90.384615 | 76.923077 | 0.0 |
| 2 | 300.0 | 79.0 | 61.671838 | 67.1875 | 54.347826 | 71.51663 | 86.238532 | 78.481013 | 16.666667 | 85.30303 | 85.454545 | 70.454545 | 100.0 | 61.566458 | 87.037037 | 88.571429 | 9.090909 |
| 3 | 400.0 | 85.0 | 73.661106 | 73.684211 | 71.698113 | 73.776414 | 90.909091 | 87.058824 | 37.5 | 77.528324 | 86.538462 | 86.046512 | 60.0 | 70.370882 | 95.744681 | 88.095238 | 27.272727 |
| 4 | 500.0 | 83.0 | 70.16497 | 73.846154 | 64.583333 | 75.074313 | 88.679245 | 82.666667 | 52.631579 | 77.038661 | 87.037037 | 81.578947 | 62.5 | 73.207648 | 90.384615 | 83.783784 | 45.454545 |
| 5 | 600.0 | 89.0 | 81.443995 | 82.258065 | 78.0 | 85.089737 | 93.069307 | 89.189189 | 72.0 | 86.93905 | 92.156863 | 86.842105 | 81.818182 | 83.31746 | 94.0 | 91.666667 | 64.285714 |
| 6 | 700.0 | 82.0 | 69.141094 | 66.037736 | 66.666667 | 73.60438 | 88.172043 | 84.090909 | 42.105263 | 77.541371 | 87.234043 | 78.723404 | 66.666667 | 70.047856 | 89.130435 | 90.243902 | 30.769231 |
| 7 | 800.0 | 83.0 | 72.039474 | 67.307692 | 64.583333 | 80.418547 | 92.156863 | 79.411765 | 60.0 | 83.833333 | 94.0 | 67.5 | 90.0 | 77.271062 | 90.384615 | 96.428571 | 45.0 |
| 8 | 900.0 | 84.0 | 72.096268 | 68.627451 | 65.957447 | 75.53901 | 91.089109 | 82.926829 | 47.058824 | 78.804348 | 95.833333 | 73.913043 | 66.666667 | 72.533511 | 86.792453 | 94.444444 | 36.363636 |
| 9 | 1000.0 | 83.0 | 70.613656 | 67.924528 | 63.043478 | 77.597758 | 89.908257 | 78.787879 | 64.0 | 78.181818 | 89.090909 | 78.787879 | 66.666667 | 77.02236 | 90.740741 | 78.787879 | 61.538462 |


## Using the ```ClassificationEvaluator``` and ```ClassificationWindowedEvaluator``` with the ```OnlineBagging``` wrapper

```python
from moa.classifiers.trees import HoeffdingTree

rtg = MOA_RTG()
# Setting parameters using setViaCLIString
rtg.getOptions().setViaCLIString("-c 3 -u 10 -o 0")
rtg.prepareForUse()

schema = Schema(moa_header=rtg.getHeader())
learner = OnlineBagging(schema=schema, ensemble_size=10, base_learner=HoeffdingTree)
# Alternative: configure the ensemble through a CLI string instead
# learner = OnlineBagging(schema=schema, CLI="-s 10")

evaluator = ClassificationWindowedEvaluator(schema=schema, window_size=100, recall_per_class=True)
evaluator_TTT = ClassificationEvaluator(schema=schema, recall_per_class=True)

maxInstancesToProcess = 1000
instancesProcessed = 1
while rtg.hasMoreInstances() and instancesProcessed <= maxInstancesToProcess:
    instance = _JavaLabeledInstance(schema, rtg.nextInstance())
    prediction = learner.predict(instance)

    evaluator.update(instance.y_label, prediction)
    evaluator_TTT.update(instance.y_label, prediction)

    learner.train(instance)
    instancesProcessed += 1

print(f"Test-Then-Train accuracy = {evaluator_TTT.accuracy()}")
evaluator.metrics_per_window()
```

```
Test-Then-Train accuracy = 75.6
```

| window | classified instances | classifications correct (percent) | Kappa Statistic (percent) | Kappa Temporal Statistic (percent) | Kappa M Statistic (percent) | F1 Score (percent) | F1 Score for class 0 (percent) | F1 Score for class 1 (percent) | F1 Score for class 2 (percent) | Precision (percent) | Precision for class 0 (percent) | Precision for class 1 (percent) | Precision for class 2 (percent) | Recall (percent) | Recall for class 0 (percent) | Recall for class 1 (percent) | Recall for class 2 (percent) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 100.0 | 74.0 | 49.455677 | 55.172414 | 31.578947 | 60.579742 | 80.0 | 70.422535 | 22.222222 | 66.176471 | 75.0 | 73.529412 | 50.0 | 55.855856 | 85.714286 | 67.567568 | 14.285714 |
| 1 | 200.0 | 77.0 | 57.177434 | 58.181818 | 52.083333 | 61.377883 | 81.73913 | 79.452055 | 16.666667 | 64.410209 | 74.603175 | 85.294118 | 33.333333 | 58.618234 | 90.384615 | 74.358974 | 11.111111 |
| 2 | 300.0 | 73.0 | 50.675923 | 57.8125 | 41.304348 |  | 81.481481 | 71.604938 |  |  | 81.481481 | 63.043478 |  | 54.779541 | 81.481481 | 82.857143 | 0.0 |
| 3 | 400.0 | 72.0 | 50.062422 | 50.877193 | 47.169811 | 60.296608 | 75.510204 | 76.404494 | 15.384615 | 64.963148 | 72.54902 | 72.340426 | 50.0 | 56.255565 | 78.723404 | 80.952381 | 9.090909 |
| 4 | 500.0 | 73.0 | 51.015965 | 58.461538 | 43.75 | 60.699042 | 79.62963 | 73.417722 | 15.384615 | 65.277778 | 76.785714 | 69.047619 | 50.0 | 56.720532 | 82.692308 | 78.378378 | 9.090909 |
| 5 | 600.0 | 78.0 | 60.70025 | 64.516129 | 56.0 | 68.818688 | 82.142857 | 84.507042 | 23.529412 | 75.524834 | 74.193548 | 85.714286 | 66.666667 | 63.206349 | 92.0 | 83.333333 | 14.285714 |
| 6 | 700.0 | 75.0 | 56.070989 | 52.830189 | 53.703704 | 62.858377 | 80.808081 | 79.069767 | 13.333333 | 67.009085 | 75.471698 | 75.555556 | 50.0 | 59.191886 | 86.956522 | 82.926829 | 7.692308 |
| 7 | 800.0 | 73.0 | 52.195467 | 48.076923 | 43.75 | 69.902068 | 83.050847 | 68.965517 | 33.333333 | 80.30303 | 74.242424 | 66.666667 | 100.0 | 61.886447 | 94.230769 | 71.428571 | 20.0 |
| 8 | 900.0 | 81.0 | 65.697779 | 62.745098 | 59.574468 | 74.349661 | 86.486486 | 79.452055 | 50.0 | 80.379 | 82.758621 | 78.378378 | 80.0 | 69.161743 | 90.566038 | 80.555556 | 36.363636 |
| 9 | 1000.0 | 80.0 | 63.728691 | 62.264151 | 56.521739 | 76.557412 | 86.666667 | 70.175439 | 69.565217 | 80.707071 | 78.787879 | 83.333333 | 80.0 | 72.813606 | 96.296296 | 60.606061 | 61.538462 |


## Using the ```ClassificationEvaluator``` and ```ClassificationWindowedEvaluator``` with the ```AdaptiveRandomForest``` wrapper configured via a CLI string

* This experiment shows how to configure the learner with a MOA CLI string instead of setting each parameter individually.

```python
rtg = MOA_RTG()
# Setting parameters using setViaCLIString
rtg.getOptions().setViaCLIString("-c 3 -u 10 -o 0")
rtg.prepareForUse()

# Build the schema from this generator instead of reusing one from a previous cell
schema = Schema(moa_header=rtg.getHeader())
learner = AdaptiveRandomForest(schema=schema, random_seed=1, CLI="-s 5")

# Shows the CLI help, i.e. what each option character means
# print(learner.CLI_help())

evaluator = ClassificationWindowedEvaluator(schema=schema, window_size=100, recall_per_class=True)
evaluator_TTT = ClassificationEvaluator(schema=schema, recall_per_class=True)

maxInstancesToProcess = 1000
instancesProcessed = 1
while rtg.hasMoreInstances() and instancesProcessed <= maxInstancesToProcess:
    instance = _JavaLabeledInstance(schema, rtg.nextInstance())
    prediction = learner.predict(instance)

    evaluator.update(instance.y_label, prediction)
    evaluator_TTT.update(instance.y_label, prediction)

    learner.train(instance)
    instancesProcessed += 1

print(f"Test-Then-Train accuracy = {evaluator_TTT.accuracy()}")
evaluator.metrics_per_window()
```

```
Test-Then-Train accuracy = 82.0
```

| window | classified instances | classifications correct (percent) | Kappa Statistic (percent) | Kappa Temporal Statistic (percent) | Kappa M Statistic (percent) | F1 Score (percent) | F1 Score for class 0 (percent) | F1 Score for class 1 (percent) | F1 Score for class 2 (percent) | Precision (percent) | Precision for class 0 (percent) | Precision for class 1 (percent) | Precision for class 2 (percent) | Recall (percent) | Recall for class 0 (percent) | Recall for class 1 (percent) | Recall for class 2 (percent) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 100.0 | 75.0 | 51.399689 | 56.896552 | 34.210526 | 51.788526 | 81.666667 | 73.239437 |  | 51.011029 | 76.5625 | 76.470588 | 0.0 | 52.59009 | 87.5 | 70.27027 | 0.0 |
| 1 | 200.0 | 77.0 | 56.340167 | 58.181818 | 52.083333 |  | 81.034483 | 80.0 |  |  | 73.4375 | 83.333333 |  | 55.769231 | 90.384615 | 76.923077 | 0.0 |
| 2 | 300.0 | 79.0 | 61.671838 | 67.1875 | 54.347826 | 71.51663 | 86.238532 | 78.481013 | 16.666667 | 85.30303 | 85.454545 | 70.454545 | 100.0 | 61.566458 | 87.037037 | 88.571429 | 9.090909 |
| 3 | 400.0 | 85.0 | 73.661106 | 73.684211 | 71.698113 | 73.776414 | 90.909091 | 87.058824 | 37.5 | 77.528324 | 86.538462 | 86.046512 | 60.0 | 70.370882 | 95.744681 | 88.095238 | 27.272727 |
| 4 | 500.0 | 83.0 | 70.16497 | 73.846154 | 64.583333 | 75.074313 | 88.679245 | 82.666667 | 52.631579 | 77.038661 | 87.037037 | 81.578947 | 62.5 | 73.207648 | 90.384615 | 83.783784 | 45.454545 |
| 5 | 600.0 | 89.0 | 81.443995 | 82.258065 | 78.0 | 85.089737 | 93.069307 | 89.189189 | 72.0 | 86.93905 | 92.156863 | 86.842105 | 81.818182 | 83.31746 | 94.0 | 91.666667 | 64.285714 |
| 6 | 700.0 | 82.0 | 69.141094 | 66.037736 | 66.666667 | 73.60438 | 88.172043 | 84.090909 | 42.105263 | 77.541371 | 87.234043 | 78.723404 | 66.666667 | 70.047856 | 89.130435 | 90.243902 | 30.769231 |
| 7 | 800.0 | 83.0 | 72.039474 | 67.307692 | 64.583333 | 80.418547 | 92.156863 | 79.411765 | 60.0 | 83.833333 | 94.0 | 67.5 | 90.0 | 77.271062 | 90.384615 | 96.428571 | 45.0 |
| 8 | 900.0 | 84.0 | 72.096268 | 68.627451 | 65.957447 | 75.53901 | 91.089109 | 82.926829 | 47.058824 | 78.804348 | 95.833333 | 73.913043 | 66.666667 | 72.533511 | 86.792453 | 94.444444 | 36.363636 |
| 9 | 1000.0 | 83.0 | 70.613656 | 67.924528 | 63.043478 | 77.597758 | 89.908257 | 78.787879 | 64.0 | 78.181818 | 89.090909 | 78.787879 | 66.666667 | 77.02236 | 90.740741 | 78.787879 | 61.538462 |
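The table above is identical to the one obtained earlier with ```ensemble_size=5```, which is consistent with ```-s 5``` setting the ensemble size. When several options are combined, it can help to build the string programmatically. The helper below is hypothetical and not part of CapyMOA or MOA. It writes each option as ```-flag value```, and writes a boolean ```True``` as a bare flag such as ```-b``` (binarySplits in the Hoeffding tree help printed earlier). It does no quoting and does not handle nested learner options.

```python
from typing import Dict, Union

Option = Union[int, float, str, bool]


def to_cli(options: Dict[str, Option]) -> str:
    """Build a MOA-style option string such as "-s 5" from a dict (hypothetical helper).

    True emits a bare flag, False omits the flag, and any other value
    is written after its flag. Dict order is preserved in the output.
    """
    parts = []
    for flag, value in options.items():
        if value is True:
            parts.append(f"-{flag}")
        elif value is False:
            continue
        else:
            parts.append(f"-{flag} {value}")
    return " ".join(parts)


print(to_cli({"s": 5}))                   # -s 5
print(to_cli({"c": 3, "u": 10, "o": 0}))  # -c 3 -u 10 -o 0 (the RandomTreeGenerator string used here)
print(to_cli({"g": 100, "b": True}))      # -g 100 -b
```

The resulting string can then be passed wherever this notebook passes a literal CLI string, e.g. ```CLI=to_cli({"s": 5})``` or ```getOptions().setViaCLIString(...)```.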


## MOARegressor tests

### Using the MOA regressor FIMTDD directly (without wrapping it with ```MOARegressor```)

```python
maxInstancesToProcess = 50000
sampleFrequency = 10000
instancesProcessed = 1

fried_arff = ArffFileStream(arff_fried_path, -1)
fried_arff.prepareForUse()

learner = FIMTDD()
learner.prepareForUse()
learner.setModelContext(fried_arff.getHeader())

# MOA's own regression evaluator, which accumulates metrics over the whole stream
evaluator = BasicRegressionPerformanceEvaluator()

# Metric names and one row of metric values per sampled checkpoint
data = []
performance_names = []

while fried_arff.hasMoreInstances() and instancesProcessed <= maxInstancesToProcess:
    instance = fried_arff.nextInstance()

    prediction = learner.getVotesForInstance(instance)

    evaluator.addResult(instance, prediction)
    learner.trainOnInstance(instance)

    # Read the metric names once, after the first result has been added
    if instancesProcessed == 1:
        performance_names = [str(measurement.getName()) for measurement in evaluator.getPerformanceMeasurements()]

    # Take a snapshot of the cumulative metrics every sampleFrequency instances
    if instancesProcessed % sampleFrequency == 0:
        performance_values = [measurement.getValue() for measurement in evaluator.getPerformanceMeasurements()]
        data.append(performance_values)

    instancesProcessed += 1

# Create a DataFrame from the collected snapshots
results_df = pd.DataFrame(data, columns=performance_names)
results_df
```

|  | classified instances | mean absolute error | root mean squared error | relative mean absolute error | relative root mean squared error | coefficient of determination | adjusted coefficient of determination |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 10000.0 | 2.845125 | 3.872532 | 0.702842 | 0.778102 | 0.394557 | 0.39389 |
| 1 | 20000.0 | 2.368434 | 3.230273 | 0.583849 | 0.647311 | 0.580988 | 0.580758 |
| 2 | 30000.0 | 2.160436 | 2.934305 | 0.533814 | 0.589412 | 0.652593 | 0.652466 |
| 3 | 40000.0 | 2.032701 | 2.750873 | 0.499972 | 0.550427 | 0.69703 | 0.696947 |
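These snapshots are cumulative. By contrast, the windowed output of the first ```MOARegressor``` example (same data, default FIMTDD) reports each 10,000-instance window on its own. The two outputs are consistent. With equal-sized windows, the cumulative MAE is the average of the window MAEs. RMSE, however, has to be combined on the squared scale: average the window MSEs, then take the square root. The check below uses only numbers copied from the two outputs. Any difference in the last digit would come from the six-decimal rounding of the printed values.

```python
import math

# Per-window metrics (10,000 instances each) copied from the windowed FIMTDD output
window_mae = [2.845125, 1.891743, 1.744441, 1.649495]
window_rmse = [3.872532, 2.423392, 2.227354, 2.106839]

cumulative_mae = []
cumulative_rmse = []
for k in range(1, len(window_mae) + 1):
    # MAE is a mean of absolute errors, so equal-sized windows can be averaged directly
    cumulative_mae.append(sum(window_mae[:k]) / k)
    # RMSE is not: average the squared values (the MSEs) first, then take the root
    cumulative_rmse.append(math.sqrt(sum(r * r for r in window_rmse[:k]) / k))
    print(f"{k * 10000} instances: MAE={cumulative_mae[-1]:.6f} RMSE={cumulative_rmse[-1]:.6f}")
```

Running it reproduces the MAE and RMSE columns of the snapshot table above. Averaging the window RMSEs directly would underestimate the cumulative RMSE.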


### Testing the ```MOARegressor``` wrapper with the ```RegressionEvaluator```

```python
from capymoa.evaluation import RegressionEvaluator
from capymoa.stream import ARFFStream

maxInstancesToProcess = 50000
sampleFrequency = 10000
instancesProcessed = 1

fried_stream = ARFFStream(path=arff_fried_path, class_index=-1)

learner = MOARegressor(schema=fried_stream.get_schema(), moa_learner=FIMTDD())
evaluator = RegressionEvaluator(schema=fried_stream.get_schema(), window_size=sampleFrequency)

while fried_stream.has_more_instances() and instancesProcessed <= maxInstancesToProcess:
    instance = fried_stream.next_instance()

    prediction = learner.predict(instance)

    evaluator.update(instance.y_value, prediction)
    learner.train(instance)

    instancesProcessed += 1

print(evaluator)
```

```
{'classified instances': 40768.0, 'mean absolute error': 2.024111913389299, 'root mean squared error': 2.739036408897116, 'relative mean absolute error': 0.4978639452538999, 'relative root mean squared error': 0.5479723082580009, 'coefficient of determination': 0.6997263493823984, 'adjusted coefficient of determination': 0.6996453058512179}
```


### Using the ```MOARegressor``` to wrap the FIMTDD learner and evaluating it using ```RegressionEvaluator``` and ```RegressionWindowedEvaluator```

```python
from capymoa.stream import stream_from_file

maxInstancesToProcess = 50000
sampleFrequency = 10000
instancesProcessed = 1

fried_arff = stream_from_file(path_to_csv_or_arff=arff_fried_path, class_index=-1)

learner = MOARegressor(schema=fried_arff.get_schema(), moa_learner=FIMTDD())

evaluator_TTT = RegressionEvaluator(schema=fried_arff.get_schema(), window_size=sampleFrequency)
evaluator_windowed = RegressionWindowedEvaluator(schema=fried_arff.get_schema(), window_size=sampleFrequency)

while fried_arff.has_more_instances() and instancesProcessed <= maxInstancesToProcess:
    instance = fried_arff.next_instance()

    prediction = learner.predict(instance)

    evaluator_TTT.update(instance.y_value, prediction)
    evaluator_windowed.update(instance.y_value, prediction)
    learner.train(instance)

    instancesProcessed += 1

print(evaluator_TTT.MAE())
print(evaluator_TTT.RMSE())
print(evaluator_TTT.RMAE())
print(evaluator_TTT.R2())
print(evaluator_TTT.adjusted_R2())
evaluator_windowed.metrics_per_window()
```

```
2.024111913389299
2.739036408897116
0.4978639452538999
0.6997263493823984
0.6996453058512179
```

| window | classified instances | mean absolute error | root mean squared error | relative mean absolute error | relative root mean squared error | coefficient of determination | adjusted coefficient of determination |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 10000.0 | 2.845125 | 3.872532 | 0.702732 | 0.77799 | 0.394731 | 0.394064 |
| 1 | 10000.0 | 1.891743 | 2.423392 | 0.465342 | 0.484313 | 0.765441 | 0.765183 |
| 2 | 10000.0 | 1.744441 | 2.227354 | 0.432927 | 0.449547 | 0.797907 | 0.797685 |
| 3 | 10000.0 | 1.649495 | 2.106839 | 0.40025 | 0.416741 | 0.826327 | 0.826136 |


### Testing the ```AdaptiveRandomForestRegressor``` wrapper

```python
maxInstancesToProcess = 50000
instancesProcessed = 1
sampleFrequency = 10000

fried_arff = stream_from_file(path_to_csv_or_arff=arff_fried_path, class_index=-1)

learner = AdaptiveRandomForestRegressor(schema=fried_arff.get_schema(), ensemble_size=5, max_features=0.3)

evaluator_TTT = RegressionEvaluator(schema=fried_arff.get_schema(), window_size=sampleFrequency)
evaluator_windowed = RegressionWindowedEvaluator(schema=fried_arff.get_schema(), window_size=sampleFrequency)

while fried_arff.has_more_instances() and instancesProcessed <= maxInstancesToProcess:
    instance = fried_arff.next_instance()

    prediction = learner.predict(instance)

    evaluator_TTT.update(instance.y_value, prediction)
    evaluator_windowed.update(instance.y_value, prediction)
    learner.train(instance)

    instancesProcessed += 1

print(evaluator_TTT.MAE())
print(evaluator_TTT.RMSE())
print(evaluator_TTT.RMAE())
print(evaluator_TTT.R2())
print(evaluator_TTT.adjusted_R2())
evaluator_windowed.metrics_per_window()
```

```
3.2786181111261965
4.228177670279516
0.8064305817226001
0.2844696324376049
0.28427651157090583
```

| window | classified instances | mean absolute error | root mean squared error | relative mean absolute error | relative root mean squared error | coefficient of determination | adjusted coefficient of determination |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 10000.0 | 3.516331 | 4.486692 | 0.868516 | 0.901375 | 0.187523 | 0.186628 |
| 1 | 10000.0 | 3.267398 | 4.207316 | 0.803733 | 0.840828 | 0.293008 | 0.292229 |
| 2 | 10000.0 | 3.201807 | 4.133389 | 0.794609 | 0.834242 | 0.30404 | 0.303274 |
| 3 | 10000.0 | 3.133928 | 4.080692 | 0.760447 | 0.807176 | 0.348467 | 0.347749 |
