## Pipelines and Transformers

This notebook showcases the current version of data processing pipelines in CapyMOA. 

* It includes an example of how preprocessing can be accomplished via pipelines and transformers.
* Transformers transform an instance, e.g., using standardization or normalization.
* Pipelines bundle transformers and can also act as classifiers or regressors.
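
To make the idea of a transformer concrete, here is a minimal sketch of streaming standardization in plain Python. It does not use CapyMOA: the class `RunningStandardizer` and its `update`/`transform` methods are hypothetical names chosen for illustration, and Welford's running mean and variance is assumed as the update rule.

```python
import math


class RunningStandardizer:
    """Hypothetical streaming standardizer (illustration only, not a CapyMOA class)."""

    def __init__(self, n_features):
        self.count = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features  # running sum of squared deviations per feature

    def update(self, x):
        # Welford's update: a single pass, no past instances are stored.
        self.count += 1
        for i, value in enumerate(x):
            delta = value - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (value - self.mean[i])

    def transform(self, x):
        # Rescale each feature with the statistics seen so far
        # (population variance); zero-variance features map to 0.0.
        out = []
        for i, value in enumerate(x):
            var = self.m2[i] / self.count if self.count > 0 else 0.0
            std = math.sqrt(var)
            out.append((value - self.mean[i]) / std if std > 0 else 0.0)
        return out


scaler = RunningStandardizer(n_features=2)
for row in [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]:
    scaler.update(row)

standardized = scaler.transform([2.0, 30.0])
```

For the three rows above, the running means are 2.0 and 20.0, so the first feature of `[2.0, 30.0]` maps to 0.0 and the second to roughly 1.22.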

*Please note that this feature is still under development; some functionality might not yet be available or change in future releases.*

**notebook last updated on 28/05/2024**

### 1. Running Online Bagging without any preprocessing

First, let us look at a simple test-then-train classification example without pipelines. For each instance in the data stream, we:
- make a prediction,
- update the evaluator with the prediction and the true label,
- and then train the classifier on the instance.

In [1]:
# Test-then-train loop
from capymoa.stream import stream_from_file
from capymoa.classifier import OnlineBagging
from capymoa.evaluation import ClassificationEvaluator

# Opening a file as a stream
DATA_PATH = "../data/"
elec_stream = stream_from_file(path_to_csv_or_arff=DATA_PATH+"electricity.csv")

# Creating a learner
ob_learner = OnlineBagging(schema=elec_stream.get_schema(), ensemble_size=5)

# Creating the evaluator
ob_evaluator = ClassificationEvaluator(schema=elec_stream.get_schema())

while elec_stream.has_more_instances():
    instance = elec_stream.next_instance()
    
    prediction = ob_learner.predict(instance)
    ob_evaluator.update(instance.y_index, prediction)
    ob_learner.train(instance)

ob_evaluator.accuracy()

79.05190677966102

### 2. Online Bagging using pipelines and transformers

If we want to perform some preprocessing, such as normalization, feature transformation, or a combination of both, we can chain multiple `Transformer`s within a pipeline. The last step of a pipeline is a learner, such as a CapyMOA classifier or regressor.

Like classifiers and regressors, pipelines support `train` and `predict`. Hence, we can use them in the same way as any other CapyMOA learner. Internally, the pipeline object passes an incoming instance from one transformer to the next. It then returns the prediction of the classifier or regressor for the transformed instance.
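
The hand-off from one transformer to the next can be sketched in plain Python. This is an illustration only, not CapyMOA's implementation: `ToyPipeline`, `ScaleBy`, `Shift`, and `MajorityClassLearner` are hypothetical names, and the instances are plain lists of floats rather than CapyMOA instances.

```python
from collections import Counter


class ScaleBy:
    """Hypothetical transformer: multiplies every feature by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, x):
        return [v * self.factor for v in x]


class Shift:
    """Hypothetical transformer: adds a constant to every feature."""

    def __init__(self, offset):
        self.offset = offset

    def transform(self, x):
        return [v + self.offset for v in x]


class MajorityClassLearner:
    """Toy learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()
        self.last_input = None  # kept only to show what the learner received

    def predict(self, x):
        self.last_input = x
        return self.counts.most_common(1)[0][0] if self.counts else None

    def train(self, x, y):
        self.last_input = x
        self.counts[y] += 1


class ToyPipeline:
    """Applies the transformers in order, then delegates to the learner."""

    def __init__(self, transformers, learner):
        self.transformers = list(transformers)
        self.learner = learner

    def _apply(self, x):
        # Each transformer receives the output of the previous one.
        for transformer in self.transformers:
            x = transformer.transform(x)
        return x

    def predict(self, x):
        return self.learner.predict(self._apply(x))

    def train(self, x, y):
        self.learner.train(self._apply(x), y)

    def __str__(self):
        steps = [*self.transformers, self.learner]
        return " | ".join(type(step).__name__ for step in steps)


learner = MajorityClassLearner()
toy_pipeline = ToyPipeline([ScaleBy(2.0), Shift(1.0)], learner)

toy_pipeline.train([1.0, 2.0], y=1)
prediction = toy_pipeline.predict([1.0, 2.0])
seen_by_learner = learner.last_input
```

Because each transformer receives the output of the previous one, order matters: swapping `ScaleBy` and `Shift` in this sketch would hand the learner `[4.0, 6.0]` instead of `[3.0, 5.0]`.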

Creating a pipeline consists of the following steps:
1. Create a stream instance
2. Initialize the transformers
3. Initialize the learner
4. Create the pipeline. Here, we use a `ClassifierPipeline`.
5. Use the pipeline the same way as any other learner.

In [2]:
from capymoa.stream.preprocessing import MOATransformer
from capymoa.stream.preprocessing import ClassifierPipeline
from capymoa.stream import Stream
from moa.streams.filters import AddNoiseFilter, NormalisationFilter
from moa.streams import FilteredStream

# Open the stream from an ARFF file
elec_stream = stream_from_file(path_to_csv_or_arff=DATA_PATH+"electricity.arff")

# Creating the transformers
normalisation_transformer = MOATransformer(schema=elec_stream.get_schema(), moa_filter=NormalisationFilter())
add_noise_transformer = MOATransformer(schema=normalisation_transformer.get_schema(), moa_filter=AddNoiseFilter())

# Creating a learner
ob_learner = OnlineBagging(schema=add_noise_transformer.get_schema(), ensemble_size=5)

# Creating and populating the pipeline
pipeline = ClassifierPipeline(transformers=[add_noise_transformer, normalisation_transformer],
                              learner=ob_learner)

# Alternative:
# pipeline = ClassifierPipeline()
# pipeline.add_transformer(add_noise_transformer)
# pipeline.add_transformer(normalisation_transformer)
# pipeline.set_learner(ob_learner)

# Creating the evaluator
ob_evaluator = ClassificationEvaluator(schema=elec_stream.get_schema()) 

while elec_stream.has_more_instances():
    instance = elec_stream.next_instance()
    prediction = pipeline.predict(instance)
    ob_evaluator.update(instance.y_index, prediction)
    pipeline.train(instance)

ob_evaluator.accuracy()

77.59313206214689

Last, we can also get a textual representation of the pipeline:

In [3]:
str(pipeline)

'Transformer(AddNoiseFilter) | Transformer(NormalisationFilter) | OnlineBagging'