# Testing the ```Classification Evaluators``` classes
* ```ClassificationEvaluator``` and ```ClassificationWindowedEvaluator```
* This notebook also includes **integration examples showing how to use ```MOA``` and ```River```** with these evaluators 

**Notebook last update: 17/02/2024**

In [1]:
from capymoa.stream.stream import Schema
from capymoa.evaluation import ClassificationEvaluator, ClassificationWindowedEvaluator
import pandas as pd

capymoa_root: /home/antonlee/github.com/tachyonicClock/MOABridge/src/capymoa
MOA jar path location (config.ini): /home/antonlee/github.com/tachyonicClock/MOABridge/src/capymoa/jar/moa.jar
JVM Location (system): 
JAVA_HOME: /usr/lib/jvm/java-17-openjdk
JVM args: ['-Xmx8g', '-Xss10M']


Sucessfully started the JVM and added MOA jar to the class path


## Test ClassificationEvaluator 0

* If the schema is not set, the evaluator cannot function. 

In [2]:
try: 
    evaluator = ClassificationEvaluator(schema=None)
except ValueError as e:
    print(f"Error while trying to create the evaluator. Exception: {e}")

Error while trying to create the evaluator. Exception: Schema is None, please define a proper Schema.


## Test ClassificationEvaluator 1

* The class label values (labels) match their corresponding indexes, so it does not matter whether the evaluator treats ```y``` and ```y_pred``` as values or as indexes.

In [3]:
schema = Schema(labels=[0,1])
evaluator = ClassificationEvaluator(schema=schema)

evaluator.update(1, 0)
evaluator.update(1, 1)
evaluator.update(0, 1)
evaluator.update(1, 1)

print(evaluator.metrics_header())
print(evaluator.metrics())
print(evaluator.accuracy())
print(evaluator.kappa())
print(evaluator.kappa_temporal())
print(evaluator.kappa_M())

['classified instances', 'classifications correct (percent)', 'Kappa Statistic (percent)', 'Kappa Temporal Statistic (percent)', 'Kappa M Statistic (percent)', 'F1 Score (percent)', 'F1 Score for class 0 (percent)', 'F1 Score for class 1 (percent)', 'Precision (percent)', 'Precision for class 0 (percent)', 'Precision for class 1 (percent)', 'Recall (percent)', 'Recall for class 0 (percent)', 'Recall for class 1 (percent)']
[4.0, 50.0, -33.33333333333333, 33.33333333333333, -100.0, 33.33333333333333, nan, 66.66666666666666, 33.33333333333333, 0.0, 66.66666666666666, 33.33333333333333, 0.0, 66.66666666666666]
50.0
-33.33333333333333
33.33333333333333
-100.0
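The accuracy, Kappa and Kappa M values printed above can be reproduced by hand. The following is a minimal sketch in plain Python, assuming the textbook definitions of Cohen's Kappa (agreement corrected for chance) and Kappa M (agreement corrected for a majority-class baseline); it does not call capymoa, and the variable names are illustrative only.

```python
from collections import Counter

# (y, y_pred) pairs in the same order as the update() calls above
pairs = [(1, 0), (1, 1), (0, 1), (1, 1)]
n = len(pairs)

# Observed accuracy p0
p0 = sum(y == y_pred for y, y_pred in pairs) / n

# Chance agreement pc: product of the true and predicted class
# frequencies, summed over all classes
true_counts = Counter(y for y, _ in pairs)
pred_counts = Counter(y_pred for _, y_pred in pairs)
classes = set(true_counts) | set(pred_counts)
pc = sum((true_counts[c] / n) * (pred_counts[c] / n) for c in classes)
kappa = (p0 - pc) / (1 - pc)

# Kappa M compares against always predicting the majority class
p_majority = max(true_counts.values()) / n
kappa_m = (p0 - p_majority) / (1 - p_majority)

print(f"accuracy: {100 * p0:.2f}")
print(f"kappa:    {100 * kappa:.2f}")
print(f"kappa M:  {100 * kappa_m:.2f}")
```

Here `p0 = 0.5` and `pc = 0.625`, so Kappa is negative: the predictions agree with the labels less often than chance would.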


## Test ClassificationEvaluator 2

* Adding more labels. The label values still match their corresponding indexes.

In [4]:
schema = Schema(labels=[0,1,2])
evaluator = ClassificationEvaluator(schema=schema, precision_per_class=True)

evaluator.update(0, 2)
evaluator.update(1, 1)
evaluator.update(1, 1)
evaluator.update(1, 0)
evaluator.update(2, 2)
evaluator.update(2, 0)

print(evaluator.metrics_header())
print(evaluator.metrics())

['classified instances', 'classifications correct (percent)', 'Kappa Statistic (percent)', 'Kappa Temporal Statistic (percent)', 'Kappa M Statistic (percent)', 'F1 Score (percent)', 'F1 Score for class 0 (percent)', 'F1 Score for class 1 (percent)', 'F1 Score for class 2 (percent)', 'Precision (percent)', 'Precision for class 0 (percent)', 'Precision for class 1 (percent)', 'Precision for class 2 (percent)', 'Recall (percent)', 'Recall for class 0 (percent)', 'Recall for class 1 (percent)', 'Recall for class 2 (percent)']
[6.0, 50.0, 25.0, -49.999999999999986, 0.0, 43.74999999999999, nan, 80.0, 50.0, 50.0, 0.0, 100.0, 50.0, 38.888888888888886, 0.0, 66.66666666666666, 50.0]


## Test ClassificationEvaluator 3

* The class label values are strings. The evaluator is used as expected: ```y``` and ```y_pred``` are provided as strings that match the label values in the schema.

In [5]:
schema = Schema(labels=["zero","one","two"])
evaluator = ClassificationEvaluator(schema=schema, recall_per_class=True)

evaluator.update("one", "one")
evaluator.update("zero", "zero")
evaluator.update("one", "one")
evaluator.update("two", "one")

print(evaluator.metrics_header())
print(evaluator.metrics())

['classified instances', 'classifications correct (percent)', 'Kappa Statistic (percent)', 'Kappa Temporal Statistic (percent)', 'Kappa M Statistic (percent)', 'F1 Score (percent)', 'F1 Score for class 0 (percent)', 'F1 Score for class 1 (percent)', 'F1 Score for class 2 (percent)', 'Precision (percent)', 'Precision for class 0 (percent)', 'Precision for class 1 (percent)', 'Precision for class 2 (percent)', 'Recall (percent)', 'Recall for class 0 (percent)', 'Recall for class 1 (percent)', 'Recall for class 2 (percent)']
[4.0, 75.0, 55.55555555555556, 75.0, 0.0, nan, 100.0, 80.0, nan, nan, 100.0, 66.66666666666666, nan, 66.66666666666666, 100.0, 100.0, 0.0]
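The two printed lists are easier to read when paired up. The sketch below copies the header and values shown above into plain Python, zips them into a dictionary, and lists the metrics that came back as `nan`. The variable names are illustrative. In this run the label "two" is never predicted, which is presumably why its precision, and every score derived from it, is undefined.

```python
import math

nan = float("nan")

# Copied from the output of metrics_header() and metrics() above
header = [
    'classified instances', 'classifications correct (percent)',
    'Kappa Statistic (percent)', 'Kappa Temporal Statistic (percent)',
    'Kappa M Statistic (percent)', 'F1 Score (percent)',
    'F1 Score for class 0 (percent)', 'F1 Score for class 1 (percent)',
    'F1 Score for class 2 (percent)', 'Precision (percent)',
    'Precision for class 0 (percent)', 'Precision for class 1 (percent)',
    'Precision for class 2 (percent)', 'Recall (percent)',
    'Recall for class 0 (percent)', 'Recall for class 1 (percent)',
    'Recall for class 2 (percent)',
]
values = [4.0, 75.0, 55.55555555555556, 75.0, 0.0, nan, 100.0, 80.0, nan,
          nan, 100.0, 66.66666666666666, nan, 66.66666666666666, 100.0,
          100.0, 0.0]

# Pair each metric name with its value
metrics = dict(zip(header, values))

# Names of the metrics that are undefined in this run
undefined = [name for name, value in metrics.items() if math.isnan(value)]
print(undefined)
```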


## Test ClassificationEvaluator 4

* Example using an index in ```y_pred``` while ```y``` is provided as a value corresponding to the original labels in the schema.
* It works as expected: since the value provided in ```y_pred``` is not a valid label value, it is interpreted as an index.

```evaluator.update("banana", 0)```

In [6]:
schema = Schema(labels=["banana","apple","plane"])
evaluator = ClassificationEvaluator(schema=schema, f1_per_class=True)

evaluator.update("plane", "banana")
evaluator.update("banana", 0) ## y="banana" and y_pred= index 0, surprisingly this works. 
evaluator.update("plane", "plane")
evaluator.update("apple", "apple")

print(evaluator.metrics_header())

print(evaluator.metrics())

['classified instances', 'classifications correct (percent)', 'Kappa Statistic (percent)', 'Kappa Temporal Statistic (percent)', 'Kappa M Statistic (percent)', 'F1 Score (percent)', 'F1 Score for class 0 (percent)', 'F1 Score for class 1 (percent)', 'F1 Score for class 2 (percent)', 'Precision (percent)', 'Precision for class 0 (percent)', 'Precision for class 1 (percent)', 'Precision for class 2 (percent)', 'Recall (percent)', 'Recall for class 0 (percent)', 'Recall for class 1 (percent)', 'Recall for class 2 (percent)']
[4.0, 75.0, 63.63636363636363, 75.0, 0.0, 83.33333333333334, 66.66666666666666, 100.0, 66.66666666666666, 83.33333333333334, 50.0, 100.0, 100.0, 83.33333333333334, 100.0, 100.0, 50.0]


## Test ClassificationEvaluator 5
* Example where we have the label values, but the ```y_pred``` and ```y``` are only provided as integers
* ~Working as expected, both ```y_pred``` and ```y``` cannot be found in the list of values ```["banana","apple","plane"]```, therefore the update function uses them as indexes.~
* This behavior changed as of 16/02/2024: the predictions and y values must be of the same type (i.e. match the original label values). The previous approach allowed users to mix indexes with actual values, which could lead to confusion.

In [7]:
schema = Schema(labels=["banana","apple","plane"])
evaluator = ClassificationEvaluator(schema=schema, f1_precision_recall=True)

# Old version, this used to work
# evaluator.update(2, 0)
# evaluator.update(0, 0)
# evaluator.update(2, 2)
# evaluator.update(1, 1)

# Must be like this now
evaluator.update("plane", "banana")
evaluator.update("banana", "banana")
evaluator.update("plane", "plane")
evaluator.update("apple", "apple")

print(evaluator.metrics_header())

print(evaluator.metrics())

['classified instances', 'classifications correct (percent)', 'Kappa Statistic (percent)', 'Kappa Temporal Statistic (percent)', 'Kappa M Statistic (percent)', 'F1 Score (percent)', 'F1 Score for class 0 (percent)', 'F1 Score for class 1 (percent)', 'F1 Score for class 2 (percent)', 'Precision (percent)', 'Precision for class 0 (percent)', 'Precision for class 1 (percent)', 'Precision for class 2 (percent)', 'Recall (percent)', 'Recall for class 0 (percent)', 'Recall for class 1 (percent)', 'Recall for class 2 (percent)']
[4.0, 75.0, 63.63636363636363, 75.0, 0.0, 83.33333333333334, 66.66666666666666, 100.0, 66.66666666666666, 83.33333333333334, 50.0, 100.0, 100.0, 83.33333333333334, 100.0, 100.0, 50.0]
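Code written against the old index-based behavior can be migrated by looking each index up in the label list before calling `update()`. This is a minimal sketch in plain Python using the labels from the cell above; `legacy_pairs` and `value_pairs` are illustrative names. Elsewhere this notebook uses `schema.get_value_for_index()` for the same kind of lookup.

```python
labels = ["banana", "apple", "plane"]

# The commented-out index-based calls from the cell above, as (y, y_pred)
legacy_pairs = [(2, 0), (0, 0), (2, 2), (1, 1)]

# Translate each index into its label value
value_pairs = [(labels[y], labels[y_pred]) for y, y_pred in legacy_pairs]

for y, y_pred in value_pairs:
    print(f'evaluator.update("{y}", "{y_pred}")')
```

The printed calls are the same four value-based `update()` calls used in the cell above.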


## Test ClassificationEvaluator 6
* The label values are 5, 3 and 1, ~which could have been interpreted as indexes, but first the evaluator checks if the value exists in the value list.~ The parameters to update() are **only** interpreted as values now. 
* This is an example of ```y``` and ```y_pred``` being correctly identified as values

In [8]:
schema = Schema(labels=[5,3,1])
evaluator = ClassificationEvaluator(schema=schema, f1_precision_recall=True)

evaluator.update(5, 1)
evaluator.update(5, 5)
evaluator.update(3, 3)
evaluator.update(1, 1)

print(evaluator.metrics_header())

print(evaluator.metrics())

['classified instances', 'classifications correct (percent)', 'Kappa Statistic (percent)', 'Kappa Temporal Statistic (percent)', 'Kappa M Statistic (percent)', 'F1 Score (percent)', 'F1 Score for class 0 (percent)', 'F1 Score for class 1 (percent)', 'F1 Score for class 2 (percent)', 'Precision (percent)', 'Precision for class 0 (percent)', 'Precision for class 1 (percent)', 'Precision for class 2 (percent)', 'Recall (percent)', 'Recall for class 0 (percent)', 'Recall for class 1 (percent)', 'Recall for class 2 (percent)']
[4.0, 75.0, 63.63636363636363, 50.0, 50.0, 83.33333333333334, 66.66666666666666, 100.0, 66.66666666666666, 83.33333333333334, 100.0, 100.0, 50.0, 83.33333333333334, 50.0, 100.0, 100.0]


## Test ClassificationEvaluator 7
* The label values are 5, 3 and 1.
* If the ```y``` value is not a valid class label, update() throws an exception.
* If ```y_pred``` is not a valid class label, update() continues, assuming a default class (whichever is at index 0). That is why the following code does not raise an error. Passing a ```y``` that does not correspond to a valid class value would raise one.
* ~What happens if we use integers that don't belong to the label values list but can be interpreted as valid indexes?~
* ~It works, but it is confusing...~
* ~**This is an example of how the ClassificationEvaluator shouldn't be used!**~

**This test was updated: update() always interprets the parameters ```y``` and ```y_pred``` as values**

In [9]:
# The label values are 5, 3 and 1. Here y_pred=0 is not a valid label value, so
# update() falls back to the default class at index 0 (label 5).
schema = Schema(labels=[5,3,1])
evaluator = ClassificationEvaluator(schema=schema, f1_precision_recall=True)

evaluator.update(5, 0)
evaluator.update(5, 0)
evaluator.update(3, 1) 
evaluator.update(1, 1)

print(evaluator.metrics_header())

print(evaluator.metrics())

['classified instances', 'classifications correct (percent)', 'Kappa Statistic (percent)', 'Kappa Temporal Statistic (percent)', 'Kappa M Statistic (percent)', 'F1 Score (percent)', 'F1 Score for class 0 (percent)', 'F1 Score for class 1 (percent)', 'F1 Score for class 2 (percent)', 'Precision (percent)', 'Precision for class 0 (percent)', 'Precision for class 1 (percent)', 'Precision for class 2 (percent)', 'Recall (percent)', 'Recall for class 0 (percent)', 'Recall for class 1 (percent)', 'Recall for class 2 (percent)']
[4.0, 75.0, 60.0, 50.0, 50.0, nan, 100.0, nan, 66.66666666666666, nan, 100.0, nan, 50.0, 66.66666666666666, 100.0, 0.0, 100.0]
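The 75% accuracy above follows from the fallback rule described in this section. The sketch below is a toy stand-in for that rule in plain Python, not capymoa's actual implementation: `to_index` is a hypothetical helper, and the `ValueError` it raises for an invalid `y` is an assumption about the exception type.

```python
labels = [5, 3, 1]

def to_index(value, labels, is_prediction):
    """Toy model of the lookup rule described above (hypothetical helper)."""
    if value in labels:
        return labels.index(value)
    if is_prediction:
        # Invalid prediction: fall back to the class at index 0
        return 0
    # Invalid true label: the exception type here is an assumption
    raise ValueError(f"{value!r} is not a valid class label")

# (y, y_pred) pairs in the same order as the update() calls above
pairs = [(5, 0), (5, 0), (3, 1), (1, 1)]
resolved = [
    (to_index(y, labels, False), to_index(y_pred, labels, True))
    for y, y_pred in pairs
]
accuracy = 100 * sum(t == p for t, p in resolved) / len(resolved)

print(resolved)
print(accuracy)
```

Under this rule the two predictions of `0` resolve to label 5 and count as correct, while `update(3, 1)` is a miss because `1` is a valid label at index 2.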


## Test ClassificationEvaluator 8
* Using the header from a MOA stream (in this case, a generator) to define the schema

In [10]:
from moa.streams.generators import RandomTreeGenerator

rtg = RandomTreeGenerator()
# Setting parameters using setViaCLIString
rtg.getOptions().setViaCLIString("-c 4 -u 10 -o 0")
rtg.prepareForUse()

schema = Schema(moa_header=rtg.getHeader())

print(schema.get_label_indexes())
print(schema.get_label_values())

[0, 1, 2, 3]
['class1', 'class2', 'class3', 'class4']


## Test ClassificationEvaluator 9
* Checking the behavior when the prediction is None

In [11]:
schema = Schema(labels=[0,1,2])
evaluator = ClassificationEvaluator(schema=schema, recall_per_class=True)

evaluator.update(0, None)
evaluator.update(1, None)
evaluator.update(1, 1)
evaluator.update(1, None)
evaluator.update(2, None)
evaluator.update(2, None)
evaluator.update(1, None)
evaluator.update(2, None)
evaluator.update(2, None)
evaluator.update(0, None)

print(evaluator.metrics_header())
print(evaluator.metrics())
print(evaluator.accuracy())

['classified instances', 'classifications correct (percent)', 'Kappa Statistic (percent)', 'Kappa Temporal Statistic (percent)', 'Kappa M Statistic (percent)', 'F1 Score (percent)', 'F1 Score for class 0 (percent)', 'F1 Score for class 1 (percent)', 'F1 Score for class 2 (percent)', 'Precision (percent)', 'Precision for class 0 (percent)', 'Precision for class 1 (percent)', 'Precision for class 2 (percent)', 'Recall (percent)', 'Recall for class 0 (percent)', 'Recall for class 1 (percent)', 'Recall for class 2 (percent)']
[10.0, 30.0, 10.25641025641025, -40.0, -16.666666666666675, nan, 36.36363636363636, 40.0, nan, nan, 22.22222222222222, 100.0, nan, 41.66666666666667, 100.0, 25.0, 0.0]
30.0


## Test ClassificationEvaluator 10

* Using the ClassificationWindowedEvaluator

In [12]:
schema = Schema(labels=[0,1])
evaluator = ClassificationWindowedEvaluator(schema=schema, window_size=3)

evaluator.update(1, 0)
evaluator.update(1, 1)
evaluator.update(0, 1)

evaluator.update(1, 1)
evaluator.update(1, 1)
evaluator.update(1, 1)

evaluator.update(1, 1)
evaluator.update(1, 1)
evaluator.update(1, 1)

evaluator.update(0, 1)
evaluator.update(0, 1)
evaluator.update(0, 0)


print(evaluator.metrics_header())
print(evaluator.metrics())
evaluator.metrics_per_window()

['classified instances', 'classifications correct (percent)', 'Kappa Statistic (percent)', 'Kappa Temporal Statistic (percent)', 'Kappa M Statistic (percent)', 'F1 Score (percent)', 'F1 Score for class 0 (percent)', 'F1 Score for class 1 (percent)', 'Precision (percent)', 'Precision for class 0 (percent)', 'Precision for class 1 (percent)', 'Recall (percent)', 'Recall for class 0 (percent)', 'Recall for class 1 (percent)']
[12.0, 33.33333333333333, 0.0, -99.99999999999999, -99.99999999999999, nan, 50.0, nan, 50.0, 100.0, 0.0, nan, 33.33333333333333, nan]


Unnamed: 0,classified instances,classifications correct (percent),Kappa Statistic (percent),Kappa Temporal Statistic (percent),Kappa M Statistic (percent),F1 Score (percent),F1 Score for class 0 (percent),F1 Score for class 1 (percent),Precision (percent),Precision for class 0 (percent),Precision for class 1 (percent),Recall (percent),Recall for class 0 (percent),Recall for class 1 (percent)
0,3.0,33.333333,-50.0,0.0,-100.0,25.0,,50.0,25.0,0.0,50.0,25.0,0.0,50.0
1,6.0,100.0,,100.0,,,,100.0,,,100.0,,,100.0
2,9.0,100.0,,,,,,100.0,,,100.0,,,100.0
3,12.0,33.333333,0.0,-100.0,-100.0,,50.0,,50.0,100.0,0.0,,33.333333,


## **IMPORTANT**: Window size does not evenly divide the number of instances processed

* In these cases, a user working with ```ClassificationWindowedEvaluator``` directly must know that the results for the last window (which is smaller than ```window_size```) are available through ```metrics()```, i.e. the last results observed for the stream.
* Users of ```windowed_evaluation``` or ```prequential_evaluation``` (and variants) do not need to worry about this.

In [13]:
schema = Schema(labels=[0,1])
evaluator = ClassificationWindowedEvaluator(schema=schema, window_size=5)

evaluator.update(1, 0)
evaluator.update(1, 1)
evaluator.update(0, 1)

evaluator.update(1, 1)
evaluator.update(1, 1)
evaluator.update(1, 1)

evaluator.update(1, 1)
evaluator.update(1, 1)
evaluator.update(1, 1)

evaluator.update(0, 1)
evaluator.update(0, 1)
evaluator.update(0, 0)


print(evaluator.metrics_header())
print(evaluator.metrics()) # Results for the last window.
evaluator.metrics_per_window()

['classified instances', 'classifications correct (percent)', 'Kappa Statistic (percent)', 'Kappa Temporal Statistic (percent)', 'Kappa M Statistic (percent)', 'F1 Score (percent)', 'F1 Score for class 0 (percent)', 'F1 Score for class 1 (percent)', 'Precision (percent)', 'Precision for class 0 (percent)', 'Precision for class 1 (percent)', 'Recall (percent)', 'Recall for class 0 (percent)', 'Recall for class 1 (percent)']
[12.0, 60.0, 28.57142857142856, -100.00000000000007, 0.0, 70.58823529411765, 50.0, 66.66666666666666, 75.0, 100.0, 50.0, 66.66666666666666, 33.33333333333333, 100.0]


Unnamed: 0,classified instances,classifications correct (percent),Kappa Statistic (percent),Kappa Temporal Statistic (percent),Kappa M Statistic (percent),F1 Score (percent),F1 Score for class 0 (percent),F1 Score for class 1 (percent),Precision (percent),Precision for class 0 (percent),Precision for class 1 (percent),Recall (percent),Recall for class 0 (percent),Recall for class 1 (percent)
0,5.0,60.0,-25.0,33.333333,-100.0,37.5,,75.0,37.5,0.0,75.0,37.5,0.0,75.0
1,10.0,80.0,0.0,0.0,0.0,,,88.888889,,,80.0,50.0,0.0,100.0
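The accuracy column above can be checked by hand. This plain-Python sketch splits the twelve `(y, y_pred)` pairs into complete windows of five and also computes accuracy over the most recent five instances; it does not call capymoa and the names are illustrative. The 60.0 printed by `metrics()` coincides with the accuracy over the last five instances rather than with the two leftover instances alone (which would be 50.0). That suggests `metrics()` reports over the most recent ```window_size``` instances, but this is an inference from the numbers, not something the sketch can confirm.

```python
# (y, y_pred) pairs in the same order as the update() calls above
pairs = [(1, 0), (1, 1), (0, 1),
         (1, 1), (1, 1), (1, 1),
         (1, 1), (1, 1), (1, 1),
         (0, 1), (0, 1), (0, 0)]
window_size = 5

def accuracy(chunk):
    """Percentage of pairs in the chunk where y equals y_pred."""
    return 100 * sum(y == y_pred for y, y_pred in chunk) / len(chunk)

# Complete, non-overlapping windows
full = len(pairs) // window_size
complete = [
    accuracy(pairs[i * window_size:(i + 1) * window_size])
    for i in range(full)
]

# Instances left over after the last complete window
leftover = pairs[full * window_size:]

# Accuracy over the most recent window_size instances
trailing = accuracy(pairs[-window_size:])

print(complete)
print(len(leftover), accuracy(leftover))
print(trailing)
```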


# Integration tests with ```MOA``` and ```River```


## Using a simple test-then-train loop with the MOA ARF class

* It is expected that the behavior of the ```ClassificationEvaluator``` mimics the ```BasicClassificationPerformanceEvaluator```
* This example does not use the ```ClassificationEvaluator```

In [14]:
from moa.classifiers.meta import AdaptiveRandomForest
from moa.classifiers.trees import HoeffdingTree
from moa.core import Example
from moa.evaluation import BasicClassificationPerformanceEvaluator
from moa.streams.generators import RandomTreeGenerator

maxInstancesToProcess = 10000
instancesProcessed = 1
sampleFrequency = 1000

learner = AdaptiveRandomForest()
# Setting parameters using setViaCLIString
learner.getOptions().setViaCLIString("-s 10")
learner.resetLearningImpl()
learner.prepareForUse()

rtg = RandomTreeGenerator()
# Setting parameters using setViaCLIString
rtg.getOptions().setViaCLIString("-c 3 -u 10 -o 0")
rtg.prepareForUse()

# Setting parameters using the option attribute directly
evaluator = BasicClassificationPerformanceEvaluator()
evaluator.recallPerClassOption.set()
evaluator.prepareForUse()

learner.setModelContext(rtg.getHeader())

# Create empty lists to store the data
data = []
performance_names = []
performance_values = []

while rtg.hasMoreInstances() and instancesProcessed <= maxInstancesToProcess:
    trainInst = rtg.nextInstance()
    testInst = trainInst

    prediction = learner.getVotesForInstance(testInst)

    evaluator.addResult(testInst, prediction)
    learner.trainOnInstance(trainInst)

    if instancesProcessed == 1:
        performance_measurements = evaluator.getPerformanceMeasurements()
        performance_names = ["".join(measurement.getName()) for measurement in performance_measurements]

    if instancesProcessed % sampleFrequency == 0:
        performance_values = [measurement.getValue() for measurement in evaluator.getPerformanceMeasurements()]
        data.append(performance_values)
    
    instancesProcessed += 1

# Create a DataFrame using collected data
results_df = pd.DataFrame(data, columns=performance_names)

# Print the DataFrame
results_df

Unnamed: 0,classified instances,classifications correct (percent),Kappa Statistic (percent),Kappa Temporal Statistic (percent),Kappa M Statistic (percent),Recall for class 0 (percent),Recall for class 1 (percent),Recall for class 2 (percent)
0,1000.0,80.4,65.084048,65.614035,58.995816,90.891473,81.868132,30.833333
1,2000.0,83.05,70.631094,70.926244,66.23506,91.111111,86.351706,40.725806
2,3000.0,86.2,76.072836,76.138329,72.691293,92.760487,90.55794,44.817927
3,4000.0,87.875,79.074292,79.273504,75.906607,93.437658,92.302674,51.234568
4,5000.0,88.86,80.862418,81.105834,77.861685,93.946731,93.022036,55.519481
5,6000.0,89.633333,82.262531,82.52809,79.599869,94.295416,93.771626,58.277254
6,7000.0,90.242857,83.349126,83.585677,80.884411,94.504531,94.350074,60.734788
7,8000.0,90.675,84.135309,84.354027,81.738066,94.755692,94.764228,62.598425
8,9000.0,90.944444,84.623638,84.791939,82.255606,94.910248,94.918699,63.982684
9,10000.0,91.16,85.010699,85.170273,82.615536,94.785089,95.129015,65.738592


## Using ```ClassificationEvaluator``` and a ```MOA learner``` (AdaptiveRandomForest)

In [15]:
from moa.classifiers.meta import AdaptiveRandomForest
from moa.classifiers.trees import HoeffdingTree
from moa.core import Example
from moa.core import Utils
from moa.evaluation import BasicClassificationPerformanceEvaluator
from moa.streams.generators import RandomTreeGenerator

maxInstancesToProcess = 10000
instancesProcessed = 1
sampleFrequency = 1000

learner = AdaptiveRandomForest()
# Setting parameters using setViaCLIString
learner.getOptions().setViaCLIString("-s 10")
learner.prepareForUse()

rtg = RandomTreeGenerator()
# Setting parameters using setViaCLIString
rtg.getOptions().setViaCLIString("-c 3 -u 10 -o 0")
rtg.prepareForUse()

# Creating the schema
schema = Schema(moa_header=rtg.getHeader())
# Enabling per-class recall through the constructor argument
evaluator = ClassificationEvaluator(schema=schema, recall_per_class=True)

# learner.setModelContext(rtg.getHeader())

# Create empty lists to store the data
data = []
performance_names = []
performance_values = []

while rtg.hasMoreInstances() and instancesProcessed <= maxInstancesToProcess:
    trainInst = rtg.nextInstance()
    testInst = trainInst

    prediction = learner.getVotesForInstance(testInst)

    # This unholy amount of conversions to get the original value is not needed when we use capymoa directly :)
    # This is an example of mixing a high-level function from capymoa (evaluator.update()) with raw usage of MOA objects. 
    evaluator.update(schema.get_value_for_index(int(testInst.getData().classValue())), 
                     schema.get_value_for_index(int(Utils.maxIndex(prediction))))
    learner.trainOnInstance(trainInst)
    
    instancesProcessed += 1

# Print the cumulative metrics for the whole stream
print(evaluator.metrics_header())
print(evaluator.metrics())

['classified instances', 'classifications correct (percent)', 'Kappa Statistic (percent)', 'Kappa Temporal Statistic (percent)', 'Kappa M Statistic (percent)', 'F1 Score (percent)', 'F1 Score for class 0 (percent)', 'F1 Score for class 1 (percent)', 'F1 Score for class 2 (percent)', 'Precision (percent)', 'Precision for class 0 (percent)', 'Precision for class 1 (percent)', 'Precision for class 2 (percent)', 'Recall (percent)', 'Recall for class 0 (percent)', 'Recall for class 1 (percent)', 'Recall for class 2 (percent)']
[10000.0, 91.16, 85.0106988680263, 85.170273444053, 82.61553588987218, 87.10219052615284, 93.47127360385696, 93.50414078674947, 73.40241796200345, 89.07205950104454, 92.19338220725183, 91.93384223918575, 83.088954056696, 85.21756543489148, 94.78508861275209, 95.12901527119536, 65.73859242072699]
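The expectation that ```ClassificationEvaluator``` mimics ```BasicClassificationPerformanceEvaluator``` can be checked against the numbers printed in this notebook. The sketch below copies the final row (10000 instances) of the MOA results table and the matching entries of the output above into plain Python and compares them. The tolerance of `1e-6` is an assumption that accounts for the DataFrame display rounding to six decimals.

```python
import math

# Final row (10000 instances) of the MOA evaluator table in this notebook
moa = {
    'classifications correct (percent)': 91.16,
    'Kappa Statistic (percent)': 85.010699,
    'Kappa Temporal Statistic (percent)': 85.170273,
    'Kappa M Statistic (percent)': 82.615536,
    'Recall for class 0 (percent)': 94.785089,
    'Recall for class 1 (percent)': 95.129015,
    'Recall for class 2 (percent)': 65.738592,
}

# Matching entries from the ClassificationEvaluator output above
capymoa = {
    'classifications correct (percent)': 91.16,
    'Kappa Statistic (percent)': 85.0106988680263,
    'Kappa Temporal Statistic (percent)': 85.170273444053,
    'Kappa M Statistic (percent)': 82.61553588987218,
    'Recall for class 0 (percent)': 94.78508861275209,
    'Recall for class 1 (percent)': 95.12901527119536,
    'Recall for class 2 (percent)': 65.73859242072699,
}

# Compare metric by metric, allowing for display rounding
agreement = {
    name: math.isclose(moa[name], capymoa[name], abs_tol=1e-6)
    for name in moa
}
print(all(agreement.values()))
```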


## Using ```ClassificationEvaluator``` and River
* Using ```metrics.Accuracy()``` from River and ```ClassificationEvaluator``` defined in the evaluation module

In [16]:
#TODO: This cell is skipped because `CSV` files are no longer downloaded. In
# the future this will need to be fixed.
## River imports
from river import stream
from river import metrics

from river.forest import ARFClassifier
import pandas as pd

maxInstances=1000
sampleFrequency=1000
rtg_2abrupt_path = '../data/RTG_2abrupt.csv'
dataset = pd.read_csv(rtg_2abrupt_path).to_numpy()

model = ARFClassifier(
    n_models=5,
    max_features=0.20, 
    seed=1
)

# Setting the class labels as floats since river only deals with floats. 
schema = Schema(labels=[1.0, 2.0, 3.0, 4.0, 5.0])
# schema = Schema(moa_header=rtg.getHeader())
# Enabling per-class recall through the constructor argument
evaluator = ClassificationEvaluator(schema=schema, recall_per_class=True)

instancesProcessed = 0
accuracy = metrics.Accuracy()
cm = metrics.ConfusionMatrix()

X, Y = dataset[:, :-1], dataset[:, -1]

data = []
performance_names = ['Classified instances', 'accuracy']
performance_values = []

ds = stream.iter_array(X, Y)

for (x, y) in ds:
    if instancesProcessed > maxInstances:
        break

    yp = model.predict_one(x)
    accuracy.update(y, yp)

    # Forces an error in the first prediction because it is None, the default value of 0 would cause it to correctly classify
    if yp is None:
        yp = 2.0
    evaluator.update(y, yp)
    
    if yp is not None:
        cm.update(y,yp)
    model.learn_one(x, y)

    if instancesProcessed % sampleFrequency == 0:
        performance_values = [instancesProcessed, accuracy.get()]
        data.append(performance_values)

    instancesProcessed += 1

    if instancesProcessed == 1:
        performance_names = evaluator.metrics_header()

    if instancesProcessed % sampleFrequency == 0:
        performance_values = evaluator.metrics()
        data.append(performance_values)

print(f"{model}, accuracy.get(): {accuracy.get():.6f}, evaluator.accuracy(): {evaluator.accuracy():.4f}")
cm