## Pipelines and Transformers

* This notebook shows how preprocessing can be done with pipelines and transformers.
* Transformers transform an instance, e.g., by standardization or normalization.
* Pipelines bundle transformers with a learner and can themselves act as classifiers or regressors.
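To make the transformer idea concrete, here is a minimal plain-Python sketch of two incremental transformers that rescale numeric features as instances arrive. The class names `OnlineStandardizer` and `OnlineMinMaxNormalizer` are hypothetical and are not part of CapyMOA; MOA's own `NormalisationFilter` may compute its statistics differently.

```python
import math
from typing import List


class OnlineStandardizer:
    """Hypothetical transformer: z-score standardization using a running
    mean and variance (Welford's algorithm), updated one instance at a time."""

    def __init__(self, n_features: int) -> None:
        self.n = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features  # running sum of squared deviations

    def update(self, x: List[float]) -> None:
        self.n += 1
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])

    def transform(self, x: List[float]) -> List[float]:
        out = []
        for i, v in enumerate(x):
            var = self.m2[i] / self.n if self.n > 1 else 0.0
            std = math.sqrt(var)
            out.append((v - self.mean[i]) / std if std > 0 else 0.0)
        return out


class OnlineMinMaxNormalizer:
    """Hypothetical transformer: rescales each feature to [0, 1] using the
    minimum and maximum observed so far."""

    def __init__(self, n_features: int) -> None:
        self.lo = [math.inf] * n_features
        self.hi = [-math.inf] * n_features

    def update(self, x: List[float]) -> None:
        for i, v in enumerate(x):
            self.lo[i] = min(self.lo[i], v)
            self.hi[i] = max(self.hi[i], v)

    def transform(self, x: List[float]) -> List[float]:
        return [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(x, self.lo, self.hi)
        ]


stream = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
std = OnlineStandardizer(n_features=2)
mm = OnlineMinMaxNormalizer(n_features=2)
for x in stream:
    std.update(x)
    mm.update(x)

print(std.transform([2.0, 20.0]))  # running mean is (2, 20), so [0.0, 0.0]
print(mm.transform([3.0, 30.0]))   # the maximum seen so far, so [1.0, 1.0]
```

Both classes keep only constant-size state per feature, which is what allows them to run on an unbounded stream.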

**notebook last updated on 04/04/2024**

### 1. Running OnlineBagging without any preprocessing

```python
# Test-then-train loop
from capymoa.stream import stream_from_file
from capymoa.learner.classifier import OnlineBagging
from capymoa.evaluation import ClassificationEvaluator

# Opening a file as a stream
DATA_PATH = "../data/"
elec_stream = stream_from_file(path_to_csv_or_arff=DATA_PATH+"electricity.csv")

# Creating a learner
ob_learner = OnlineBagging(schema=elec_stream.get_schema(), ensemble_size=5)

# Creating the evaluator
ob_evaluator = ClassificationEvaluator(schema=elec_stream.get_schema())

while elec_stream.has_more_instances():
    instance = elec_stream.next_instance()

    prediction = ob_learner.predict(instance)
    ob_evaluator.update(instance.y_index, prediction)
    ob_learner.train(instance)

ob_evaluator.accuracy()
```

```
78.57079802259888
```
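The loop above follows the test-then-train (prequential) protocol: each instance is first used to test the model and only afterwards to train it, so every prediction is made on an instance the learner has not yet seen. The following plain-Python sketch mimics that loop with a hypothetical `MajorityClassLearner` and reports accuracy as a percentage, which appears to be the scale of the value printed above. It does not use CapyMOA and only illustrates the protocol.

```python
from collections import Counter
from typing import List, Optional


class MajorityClassLearner:
    """Hypothetical learner: always predicts the most frequent label seen so far."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def predict(self, x: List[float]) -> Optional[int]:
        if not self.counts:
            return None  # no prediction is possible before any training
        return self.counts.most_common(1)[0][0]

    def train(self, x: List[float], y: int) -> None:
        self.counts[y] += 1


def prequential_accuracy(stream, learner) -> float:
    """Test-then-train loop returning accuracy as a percentage."""
    correct = 0
    seen = 0
    for x, y in stream:
        prediction = learner.predict(x)  # 1) test on the not-yet-seen instance
        correct += int(prediction == y)
        seen += 1
        learner.train(x, y)              # 2) then train on that same instance
    return 100.0 * correct / seen


stream = [([0.1], 0), ([0.2], 0), ([0.3], 1), ([0.4], 0), ([0.5], 0)]
print(prequential_accuracy(stream, MajorityClassLearner()))  # 60.0
```

The first prediction is always counted as wrong because the learner has not been trained yet, which is one reason prequential accuracy is pessimistic early in a stream.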

### 2. Online Bagging using pipelines and transformers

Creating a pipeline consists of the following steps:
1. Create a stream instance
2. Initialize the transformers
3. Initialize the learner
4. Create the pipeline. Here, we use a `ClassifierPipeline`.
5. Use the pipeline the same way as any other learner.
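The steps above can be sketched in plain Python to show what a pipeline does conceptually: each incoming instance passes through the transformers in order, and the transformed instance is handed to the learner for both prediction and training. `ToyPipeline`, `ScaleTransformer`, and `ThresholdLearner` are hypothetical names, not CapyMOA classes; the real `ClassifierPipeline` may differ in details, for example in when transformers update their internal statistics.

```python
from typing import List, Sequence


class ScaleTransformer:
    """Hypothetical stateless transformer that multiplies every feature by a constant."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transform(self, x: List[float]) -> List[float]:
        return [v * self.factor for v in x]

    def __str__(self) -> str:
        return f"Transformer(Scale x{self.factor})"


class ThresholdLearner:
    """Hypothetical classifier: predicts 1 if the first feature exceeds a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, x: List[float]) -> int:
        return int(x[0] > self.threshold)

    def train(self, x: List[float], y: int) -> None:
        pass  # a real learner would update its model here

    def __str__(self) -> str:
        return "ThresholdLearner"


class ToyPipeline:
    """Chains transformers, then delegates to a learner, so it can be used like a learner."""

    def __init__(self, transformers: Sequence, learner) -> None:
        self.transformers = list(transformers)
        self.learner = learner

    def _apply(self, x: List[float]) -> List[float]:
        for t in self.transformers:
            x = t.transform(x)
        return x

    def predict(self, x: List[float]) -> int:
        return self.learner.predict(self._apply(x))

    def train(self, x: List[float], y: int) -> None:
        self.learner.train(self._apply(x), y)

    def __str__(self) -> str:
        return " | ".join(str(step) for step in [*self.transformers, self.learner])


pipe = ToyPipeline([ScaleTransformer(2.0), ScaleTransformer(0.5)], ThresholdLearner(1.0))
print(pipe.predict([1.5]))  # 1.5 * 2.0 * 0.5 = 1.5 > 1.0, so 1
print(str(pipe))  # Transformer(Scale x2.0) | Transformer(Scale x0.5) | ThresholdLearner
```

Because `ToyPipeline` exposes the same `predict`/`train` methods as the learner it wraps, a test-then-train loop does not need to know whether it is driving a bare learner or a pipeline.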

```python
from capymoa.stream.preprocessing import MOATransformer
from capymoa.stream.preprocessing import ClassifierPipeline
from capymoa.stream import Stream
from moa.streams.filters import AddNoiseFilter, NormalisationFilter
from moa.streams import FilteredStream

# Open the stream from an ARFF file
elec_stream = stream_from_file(path_to_csv_or_arff=DATA_PATH+"electricity.arff")

# Creating the transformers: normalisation first, then noise is added to the normalised data
normalisation_transformer = MOATransformer(schema=elec_stream.get_schema(), moa_filter=NormalisationFilter())
add_noise_transformer = MOATransformer(schema=normalisation_transformer.get_schema(), moa_filter=AddNoiseFilter())

# Creating a learner that uses the schema produced by the last transformer
ob_learner = OnlineBagging(schema=add_noise_transformer.get_schema(), ensemble_size=5)

# Creating and populating the pipeline
pipeline = ClassifierPipeline(transformers=[normalisation_transformer, add_noise_transformer],
                              learner=ob_learner)

# Alternative:
# pipeline = ClassifierPipeline()
# pipeline.add_transformer(normalisation_transformer)
# pipeline.add_transformer(add_noise_transformer)
# pipeline.set_learner(ob_learner)

# Creating the evaluator
# TODO: take the schema from the pipeline (e.g., pipeline.get_schema() or similar) instead of the raw stream
ob_evaluator = ClassificationEvaluator(schema=elec_stream.get_schema())

while elec_stream.has_more_instances():
    instance = elec_stream.next_instance()

    prediction = pipeline.predict(instance)
    ob_evaluator.update(instance.y_index, prediction)
    pipeline.train(instance)

ob_evaluator.accuracy()
```

```
74.70868644067797
```

We can also get a textual representation of the pipeline:

```python
str(pipeline)
```

```
'Transformer(NormalisationFilter ) | Transformer(AddNoiseFilter ) | OnlineBagging'
```