## Preprocessing

**Accessing the input data x()**
* The input data of an ```Instance``` can be accessed as an array of doubles through ```x()```.
* Instances are represented internally as MOA instances.
* This notebook includes an example of how preprocessing can be accomplished.
* ```x()``` is currently read-only, so instances cannot be preprocessed by modifying it in place.
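
Since the feature vector is read-only, one possible workaround is to transform a copy of it. The sketch below is a minimal illustration using plain NumPy. The read-only array is only a stand-in for the instance's feature vector, and the helper `scaled_copy` is a hypothetical name, not part of CapyMOA. Whether CapyMOA enforces read-only access through NumPy's write flag is an assumption.

```python
import numpy as np

# Stand-in for an instance's feature vector (values copied from the first
# electricity instance printed in this notebook). Marking it read-only is an
# assumption used here to mimic the behaviour described above.
x = np.array([0.0, 0.056443, 0.439155, 0.003467, 0.422915, 0.414912])
x.setflags(write=False)


def scaled_copy(features, factor):
    """Hypothetical helper: return a scaled copy, leaving the input untouched."""
    out = np.array(features, dtype=float, copy=True)  # new, writable array
    out *= factor
    return out


x_scaled = scaled_copy(x, 2.0)
print(x_scaled)
```

The original array stays unchanged, so the transformed copy would have to be passed on explicitly to whatever consumes it.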

**notebook last updated on 08/12/2023**

## 0. Reading data and accessing x()

In [1]:
from capymoa.stream import stream_from_file

DATA_PATH = "../data/"

## Opening a file as a stream
elec_stream = stream_from_file(path_to_csv_or_arff=DATA_PATH+"electricity.csv")

elec_stream.restart()
i = 0
while elec_stream.has_more_instances():
    instance = elec_stream.next_instance()
    if i < 20: # prevent printing all the instances
        print(f'x: {instance.x}, y: {instance.y_index}')
    i+=1

capymoa_root: /home/antonlee/github.com/tachyonicClock/MOABridge/src/capymoa
MOA jar path location (config.ini): /home/antonlee/github.com/tachyonicClock/MOABridge/src/capymoa/jar/moa.jar
JVM Location (system): 
JAVA_HOME: /usr/lib/jvm/java-17-openjdk
JVM args: ['-Xmx8g', '-Xss10M']


Sucessfully started the JVM and added MOA jar to the class path


x: [0.       0.056443 0.439155 0.003467 0.422915 0.414912], y: 1
x: [0.021277 0.051699 0.415055 0.003467 0.422915 0.414912], y: 1
x: [0.042553 0.051489 0.385004 0.003467 0.422915 0.414912], y: 1
x: [0.06383  0.045485 0.314639 0.003467 0.422915 0.414912], y: 1
x: [0.085106 0.042482 0.251116 0.003467 0.422915 0.414912], y: 0
x: [0.106383 0.041161 0.207528 0.003467 0.422915 0.414912], y: 0
x: [0.12766  0.041161 0.171824 0.003467 0.422915 0.414912], y: 0
x: [0.148936 0.041161 0.152782 0.003467 0.422915 0.414912], y: 0
x: [0.170213 0.041161 0.13493  0.003467 0.422915 0.414912], y: 0
x: [0.191489 0.041161 0.140583 0.003467 0.422915 0.414912], y: 0
x: [0.212766 0.044374 0.168997 0.003467 0.422915 0.414912], y: 1
x: [0.234043 0.049868 0.212437 0.003467 0.422915 0.414912], y: 1
x: [0.255319 0.051489 0.298721 0.003467 0.422915 0.414912], y: 1
x: [0.276596 0.042482 0.39036  0.003467 0.422915 0.414912], y: 0
x: [0.297872 0.040861 0.402261 0.003467 0.422915 0.414912], y: 0
x: [0.319149 0.040711 0.4

In [2]:
# Get extra information about the last instance read above through its MOA representation.
moa_instance = instance.java_instance.getData()

for i in range(0, moa_instance.numInputAttributes()):
    print(moa_instance.attribute(i))
    print(moa_instance.value(i))

@attribute period numeric
1.0
@attribute nswprice numeric
0.050679
@attribute nswdemand numeric
0.288753
@attribute vicprice numeric
0.003542
@attribute vicdemand numeric
0.355256
@attribute transfer numeric
0.23114


## 1. Preprocessing using MOA



### 1.1 Running Online Bagging without any preprocessing

In [3]:
## Test-then-train loop
from capymoa.learner.classifier import OnlineBagging
from capymoa.evaluation import ClassificationEvaluator

## Opening a file as a stream
elec_stream = stream_from_file(path_to_csv_or_arff=DATA_PATH+"electricity.csv")

# Creating a learner
ob_learner = OnlineBagging(schema=elec_stream.get_schema(), ensemble_size=5)

# Creating the evaluator
ob_evaluator = ClassificationEvaluator(schema=elec_stream.get_schema())

while elec_stream.has_more_instances():
    instance = elec_stream.next_instance()
    
    prediction = ob_learner.predict(instance)
    ob_evaluator.update(instance.y_index, prediction)
    ob_learner.train(instance)

ob_evaluator.accuracy()

79.05190677966102
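
The test-then-train loop above predicts each instance first, scores the prediction, and only then trains on it. A minimal, library-free sketch of the same pattern follows. The `MajorityClass` learner, the `prequential_accuracy` function and the toy stream are all hypothetical names and data introduced for illustration. They are not CapyMOA classes.

```python
from collections import Counter


class MajorityClass:
    """Hypothetical toy learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return None  # nothing seen yet, so no prediction
        return self.counts.most_common(1)[0][0]

    def train(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, learner):
    """Test-then-train: score each prediction before learning from the instance."""
    correct = total = 0
    for x, y in stream:
        if learner.predict(x) == y:  # test first
            correct += 1
        total += 1
        learner.train(x, y)  # then train
    return 100.0 * correct / total if total else 0.0


# Toy (features, label) pairs, made up for this example.
toy_stream = [([0.1], 1), ([0.2], 1), ([0.3], 0), ([0.4], 1)]
accuracy = prequential_accuracy(toy_stream, MajorityClass())
print(accuracy)
```

Because every instance is used for testing before training, no separate hold-out set is needed.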

### 1.2 Online Bagging using the preprocessing method from MOA
* The API is still a bit rough

In [4]:
# Shows the CLI creation string; the stream's __class__ must be passed as the argument to getCLICreationString.
# elec_stream.moa_stream.getCLICreationString(elec_stream.moa_stream.__class__)

In [5]:
from capymoa.stream import Stream
from moa.streams.filters import StandardisationFilter, NormalisationFilter
from moa.streams import FilteredStream

# Open the stream from an ARFF file
elec_stream = stream_from_file(path_to_csv_or_arff=DATA_PATH+"electricity.arff")
# Create a FilteredStream and use the NormalisationFilter
elec_stream_normalised = Stream(CLI=f"-s ({elec_stream.moa_stream.getCLICreationString(elec_stream.moa_stream.__class__)}) \
-f NormalisationFilter ", moa_stream=FilteredStream())

# Creating a learner (using the schema of the stream it is trained on)
ob_learner = OnlineBagging(schema=elec_stream_normalised.get_schema(), ensemble_size=5)

# Creating the evaluator
ob_evaluator = ClassificationEvaluator(schema=elec_stream_normalised.get_schema())

while elec_stream_normalised.has_more_instances():
    instance = elec_stream_normalised.next_instance()
    
    prediction = ob_learner.predict(instance)
    ob_evaluator.update(instance.y_index, prediction)
    ob_learner.train(instance)
    # print(instance.x)

ob_evaluator.accuracy()

79.69412076271186
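
To give an intuition for what normalising a stream involves, here is a minimal NumPy sketch of online min-max scaling, where each feature is rescaled using the minimum and maximum observed so far. This is an assumption about the general technique, not a reproduction of MOA's ```NormalisationFilter```, whose exact behaviour may differ. The class name `RunningMinMax` is hypothetical.

```python
import numpy as np


class RunningMinMax:
    """Hypothetical online min-max scaler (not MOA's implementation)."""

    def __init__(self, n_features):
        self.lo = np.full(n_features, np.inf)
        self.hi = np.full(n_features, -np.inf)

    def transform(self, x):
        """Update the running min/max with x, then scale x into [0, 1]."""
        x = np.asarray(x, dtype=float)
        self.lo = np.minimum(self.lo, x)
        self.hi = np.maximum(self.hi, x)
        span = self.hi - self.lo
        out = np.zeros_like(x)  # features with no spread yet map to 0
        mask = span > 0
        out[mask] = (x[mask] - self.lo[mask]) / span[mask]
        return out


scaler = RunningMinMax(n_features=2)
first = scaler.transform([1.0, 5.0])
second = scaler.transform([3.0, 5.0])
third = scaler.transform([2.0, 10.0])
print(first, second, third)
```

Unlike batch normalisation, the scaling of an early instance depends only on the data seen up to that point, so the same raw value can map to different outputs over time.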