# 1. Introduction to MOA Machine Learning for Streams
https://moa.cms.waikato.ac.nz/  
"MOA is the most popular open source framework for data stream mining, with a very active growing community (blog). It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation. Related to the WEKA project, MOA is also written in Java, while scaling to more demanding problems."
**It is written in Java.** Unfortunately, I am more comfortable with Python & PySpark. In my opinion, Python (and R) libraries also tend to have much better documentation than Java ones.  

There are 3 ways to use MOA:
  1. GUI - download prebuilt MOA from the official webpage or build the most recent version from GitHub
    The GUI is easy to use - if you know what you are looking for :D
  2. command line - run experiments with commands (or scripts); the commands are also shown in the GUI and can be grouped into scripts, for examples see:  
    https://github.com/5uperpalo/Machine-Learning/blob/master/Machine-Learning-Tools/weka_moa_elki_pig.md
  3. Java code - run the code directly or from a Jupyter notebook
    (the approach used in this notebook)


# 2. How to run Java and MOA in Jupyter notebook

  1. A step-by-step procedure to install the Java kernel (IJava) for Jupyter Notebook is available at:  
    https://github.com/SpencerPark/IJava
    If you only have a JRE (like me), then you have to install a JDK (IJava requires JDK 9 or newer) and change/adjust the PATH variable in your OS to point to the new Java installation.
  2. Use 'Jupyter magic' to load MOA either online from Maven or offline from the downloaded MOA directory. A Jupyter magic starts with the symbol % and **MUST BE** on the 1st line of the cell. Comment lines can only come after the magic.  
    // online  
    %maven nz.ac.waikato.cms.moa:moa:2019.05.0  
    // offline  
    %jars C:/#CVUT/work_current/FIREMAN/moa-release-2019.05.0/lib/moa.jar

# 3. Example experiment with predefined tasks (classifier / clusterer evaluation)

## 3.1. Load local MOA library

In [2]:
%jars C:/#CVUT/GitHub/FIREMAN/moa-release-2019.05.0/lib/moa.jar

## 3.2. Load and prepare dataset

MOA uses different stream classes for classification and clustering. MOA works natively with ARFF files, while a CSV file stream (SimpleCSVStream) is available only for clustering. Loading the ARFF file for clustering kept raising a Java error.
=> I am using ARFF for classification and CSV for clustering
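Since the classification stream expects ARFF, a CSV export has to be converted first. Below is a minimal plain-Java sketch of such a conversion, assuming a header row, comma separators, purely numeric feature columns, and a nominal class label in the last column (which matches the class index -1 used for `ArffFileStream`). `CsvToArff` is a hypothetical helper, not part of MOA, and it does not quote attribute names that contain spaces or special characters.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper (not part of MOA): converts CSV lines with a header row into ARFF text.
 * Assumes every column is numeric except the last one, which becomes a nominal class attribute.
 */
public class CsvToArff {

    public static String convert(String relation, List<String> csvLines) {
        String[] header = csvLines.get(0).split(",");
        List<String[]> rows = new ArrayList<>();
        // Distinct class labels, kept in order of first appearance.
        Set<String> classLabels = new LinkedHashSet<>();
        for (int i = 1; i < csvLines.size(); i++) {
            String line = csvLines.get(i).trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = line.split(",");
            rows.add(values);
            classLabels.add(values[values.length - 1]);
        }

        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append('\n');
        // All columns except the last are declared numeric.
        for (int c = 0; c < header.length - 1; c++) {
            sb.append("@attribute ").append(header[c]).append(" numeric\n");
        }
        // The last column is declared nominal, listing every label that was seen.
        sb.append("@attribute ").append(header[header.length - 1])
          .append(" {").append(String.join(",", classLabels)).append("}\n");
        sb.append("@data\n");
        for (String[] values : rows) {
            sb.append(String.join(",", values)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> csv = List.of("x1,x2,class", "0.5,1.2,normal", "-0.3,0.8,fault");
        System.out.print(convert("demo", csv));
    }
}
```

The nominal class header is built from the labels actually present, so if some classes appear only later in a stream, the label list would have to be supplied explicitly instead.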

In [3]:
import moa.streams.ArffFileStream;
import moa.streams.clustering.FileStream;
import moa.streams.clustering.SimpleCSVStream;

String file_arff = "Tennessee_Event-Driven/datasets/dataset_standard_scaled.arff";
String file_csv = "Tennessee_Event-Driven/datasets/dataset_standard_scaled_moa.csv";
// classification
// specify last column as class column(-1)
ArffFileStream class_stream = new ArffFileStream(file_arff,-1); 
class_stream.prepareForUse();

// clustering
SimpleCSVStream clust_stream = new SimpleCSVStream();
clust_stream.csvFileOption.setValueViaCLIString(file_csv);
clust_stream.classIndexOption.setValueViaCLIString("-1");
clust_stream.prepareForUse();

## 3.3. Prepare learners

In [7]:
import moa.classifiers.meta.AdaptiveRandomForest;
//import moa.classifiers.lazy.kNN;
import moa.clusterers.clustree.ClusTree;

// kNN knn_classifier = new kNN();
AdaptiveRandomForest ARF = new AdaptiveRandomForest();
ClusTree clustree = new ClusTree();
// set some example options, see the IntOption API:
// https://www.cs.waikato.ac.nz/~abifet/MOA/API/classmoa_1_1options_1_1_int_option.html
clustree.horizonOption.setValue(100);
clustree.maxHeightOption.setValue(2);

## 3.4. Evaluate learners
There are 3 ways to find out which methods and options we can/must apply:
  1. GUI - check the options and the final command (visible below the tabs in the GUI), then find the corresponding methods in the code/documentation
  2. documentation - easiest to understand, but often not up to date, and some methods are not well documented
    https://www.cs.waikato.ac.nz/~abifet/MOA/API/classmoa_1_1tasks_1_1_evaluate_clustering.html#afb62647b811d912c9e985f173f17d9bc
  3. read comments in the code and figure out the options
    https://github.com/Waikato/moa/blob/master/moa/src/main/java/moa/tasks/EvaluateClustering.java

### 3.4.1. Evaluate classifier
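* create the task
* set the stream/learner
* set the performance evaluator (a sliding-window evaluator with a width of 200 instances; the ADWIN-based evaluator is left commented out)  
**Note:**  
The file sizeofag-1.0.4.jar is used to measure the memory size of models (e.g., for the "model cost (RAM-Hours)" column). In most cases this is not important, but MOA prints a WARNING if it is missing. The jar has to be passed to the JVM through the `-javaagent` command-line argument; at the moment I do not know how to do that for the Jupyter Java kernel.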
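The `WindowClassificationPerformanceEvaluator` configured in the next cell (`widthOption` = 200) reports metrics over the most recent instances instead of the whole stream. The plain-Java sketch below only illustrates that windowing idea for accuracy; it is not MOA's implementation, which also tracks kappa statistics and other measures. `WindowAccuracy` is a hypothetical class.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sliding-window accuracy tracker: keeps the outcome (correct or not)
 * of the last `width` predictions and reports accuracy over that window only.
 */
public class WindowAccuracy {
    private final int width;
    private final Deque<Boolean> window = new ArrayDeque<>();
    private int correctInWindow = 0;

    public WindowAccuracy(int width) {
        this.width = width;
    }

    /** Records one prediction outcome and evicts the oldest one once the window is full. */
    public void add(boolean correct) {
        window.addLast(correct);
        if (correct) {
            correctInWindow++;
        }
        if (window.size() > width && window.removeFirst()) {
            correctInWindow--; // the evicted outcome was a correct prediction
        }
    }

    /** Accuracy in percent over the current window (0 if nothing has been recorded yet). */
    public double accuracyPercent() {
        return window.isEmpty() ? 0.0 : 100.0 * correctInWindow / window.size();
    }

    public static void main(String[] args) {
        WindowAccuracy acc = new WindowAccuracy(200);
        // 200 correct predictions followed by 100 wrong ones.
        for (int i = 0; i < 200; i++) acc.add(true);
        for (int i = 0; i < 100; i++) acc.add(false);
        System.out.println(acc.accuracyPercent()); // prints 50.0
    }
}
```

Over the whole stream this example would report 200/300 = 66.7%, while the window of 200 reports 50%: the window forgets old behaviour, which is why it reacts faster to concept drift.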

In [11]:
import moa.tasks.EvaluatePrequentialCV;
// import moa.evaluation.AdwinClassificationPerformanceEvaluator;
import moa.evaluation.WindowClassificationPerformanceEvaluator;

EvaluatePrequentialCV eval = new EvaluatePrequentialCV();
eval.streamOption.setCurrentObject(class_stream);
//eval.learnerOption.setCurrentObject(knn_classifier);
eval.learnerOption.setCurrentObject(ARF);

//AdwinClassificationPerformanceEvaluator evalopt = new AdwinClassificationPerformanceEvaluator();
WindowClassificationPerformanceEvaluator evalopt = new WindowClassificationPerformanceEvaluator();
evalopt.widthOption.setValue(200);
eval.evaluatorOption.setCurrentObject(evalopt);
eval.sampleFrequencyOption.setValue(200);
eval.prepareForUse();
eval.dumpFileOption.setValueViaCLIString("C:/#CVUT/GitHub/FIREMAN/Tennessee_Event-Driven/results/moa_ARF_results.csv");
eval.doTask();

learning evaluation instances,evaluation time (cpu seconds),model cost (RAM-Hours),[avg] classified instances,[err] classified instances,[avg] classifications correct (percent),[err] classifications correct (percent),[avg] Kappa Statistic (percent),[err] Kappa Statistic (percent),[avg] Kappa Temporal Statistic (percent),[err] Kappa Temporal Statistic (percent),[avg] Kappa M Statistic (percent),[err] Kappa M Statistic (percent)
200.0,0.484375,0.0,200.0,0.0,100.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0
400.0,0.578125,0.0,400.0,0.0,100.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0
600.0,0.671875,0.0,600.0,0.0,100.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0
800.0,0.765625,0.0,800.0,0.0,100.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0
1000.0,0.84375,0.0,1000.0,0.0,100.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0
1200.0,0.9375,0.0,1200.0,0.0,100.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0
1400.0,1.078125,0.0,1400.0,0.0,100.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0
1600.0,1.46875,0.0,1600.0,0.0,95.85,0.6258327785172871,95.85,0.6258327785172871,-730.0,125.166555703457
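The output above reports three kappa variants besides accuracy. A minimal batch sketch of their formulas, assuming class labels are encoded as indices `0..numClasses-1`: Cohen's kappa compares accuracy $p_0$ with chance agreement $p_c$, Kappa M compares it with a majority-class classifier ($p_m$), and Kappa Temporal compares it with a "no-change" classifier that repeats the previous true label ($p'_e$), each as $(p_0 - p_x)/(1 - p_x)$. MOA's evaluators update these incrementally over a window or the whole stream and report them in percent; this `KappaStats` class is a hypothetical simplification that ignores the zero-denominator case.

```java
/**
 * Hypothetical batch computation of the kappa statistics reported by MOA,
 * for arrays of true and predicted class indices of equal length.
 */
public class KappaStats {

    /** p0: fraction of predictions that match the true labels. */
    public static double accuracy(int[] truth, int[] pred) {
        int correct = 0;
        for (int i = 0; i < truth.length; i++) {
            if (truth[i] == pred[i]) correct++;
        }
        return (double) correct / truth.length;
    }

    /** Cohen's kappa: p0 against the agreement pc expected by chance. */
    public static double kappa(int[] truth, int[] pred, int numClasses) {
        double[] trueFreq = new double[numClasses];
        double[] predFreq = new double[numClasses];
        for (int i = 0; i < truth.length; i++) {
            trueFreq[truth[i]]++;
            predFreq[pred[i]]++;
        }
        double pc = 0.0;
        for (int k = 0; k < numClasses; k++) {
            pc += (trueFreq[k] / truth.length) * (predFreq[k] / truth.length);
        }
        return (accuracy(truth, pred) - pc) / (1.0 - pc);
    }

    /** Kappa M: p0 against a classifier that always predicts the most frequent class. */
    public static double kappaM(int[] truth, int[] pred, int numClasses) {
        int[] counts = new int[numClasses];
        for (int t : truth) counts[t]++;
        int max = 0;
        for (int c : counts) max = Math.max(max, c);
        double pm = (double) max / truth.length;
        return (accuracy(truth, pred) - pm) / (1.0 - pm);
    }

    /** Kappa Temporal: p0 against a no-change classifier that repeats the previous true label. */
    public static double kappaTemporal(int[] truth, int[] pred) {
        int correctNoChange = 0;
        for (int i = 1; i < truth.length; i++) {
            if (truth[i] == truth[i - 1]) correctNoChange++;
        }
        // The no-change classifier has no prediction for the first instance; it counts as wrong.
        double pe = (double) correctNoChange / truth.length;
        return (accuracy(truth, pred) - pe) / (1.0 - pe);
    }

    public static void main(String[] args) {
        int[] truth = {0, 0, 0, 0, 1, 1, 1, 1, 0, 0};
        int[] pred  = {0, 0, 0, 1, 1, 1, 1, 0, 0, 0};
        System.out.println("accuracy       " + accuracy(truth, pred));       // approx. 0.8
        System.out.println("kappa          " + kappa(truth, pred, 2));       // approx. 0.583
        System.out.println("kappa M        " + kappaM(truth, pred, 2));      // approx. 0.5
        System.out.println("kappa temporal " + kappaTemporal(truth, pred));  // approx. 0.333
    }
}
```

Under these definitions, a strongly negative Kappa Temporal such as the -730% in the last row means the learner did worse than simply repeating the previous label, which is easy to get when labels change rarely.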

### 3.4.2. Evaluate clusterer
* create task
* set stream/learner
* set evaluation metric to F1

In [5]:
import moa.tasks.EvaluateClustering;
EvaluateClustering eval = new EvaluateClustering();
eval.streamOption.setCurrentObject(clust_stream);
eval.learnerOption.setCurrentObject(clustree);
//eval.instanceLimitOption.setValueViaCLIString("-1");
eval.f1Option.setValueViaCLIString("f");
eval.prepareForUse();
eval.doTask();

Unlike the command-line classification tasks, EvaluateClustering does not support redirecting its output to a custom file (> [filename]).
Check the dump file for the results instead (dumpClustering.csv by default, if you have not specified another one).

#### Read/print the dumpClustering.csv file

In [10]:
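import java.io.BufferedReader;
import java.io.FileReader;

// print the clustering dump file line by line
try (BufferedReader br = new BufferedReader(new FileReader("dumpClustering.csv"))) {
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
}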

Nr;Event;F1-P;F1-R;Purity;
0;;0.0;0.0;0.0;
1;;0.0;0.0;0.0;
2;;0.0;0.0;0.0;
3;;0.0;0.0;0.0;
4;;0.0030257186081694403;0.0015128593040847202;1.0;
5;;0.0;0.0;0.0;
6;;0.0;0.0;0.0;
7;;0.0;0.0;0.0;
8;;0.0;0.0;0.0;
9;;0.0;0.0;0.0;
10;;0.0;0.0;0.0;
11;;0.0;0.0;0.0;
12;;0.0;0.0;0.0;
13;;0.0;0.0;0.0;
14;;0.0;0.0;0.0;
15;;0.014184397163120567;0.0070921985815602835;1.0;
16;;0.0;0.0;0.0;
17;;0.0;0.0;0.0;
18;;0.0;0.0;0.0;
19;;0.0;0.0;0.0;
20;;0.0;0.0;0.0;
21;;0.005249343832020998;0.002624671916010499;1.0;
22;;0.0;0.0;0.0;
23;;0.0021253985122210413;0.0010626992561105207;1.0;
24;;0.0;0.0;0.0;
25;;0.0;0.0;0.0;
26;;0.0;0.0;0.0;
27;;0.0;0.0;0.0;
28;;0.011049723756906079;0.005524861878453039;1.0;
29;;0.0;0.0;0.0;
30;;0.0;0.0;0.0;
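Printing the dump shows mostly zeros, so a single summary number per metric can be easier to compare between runs. A minimal sketch that averages one column of the semicolon-separated dump, assuming the header layout shown above (`Nr;Event;F1-P;F1-R;Purity;`) and an empty `Event` field; `DumpSummary` is a hypothetical helper, and rows with no matched cluster (all zeros) are included in the mean.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper: averages one metric column of a MOA clustering dump file. */
public class DumpSummary {

    /** Returns the mean of the column named `metric`, or NaN if there are no data rows. */
    public static double meanOf(List<String> lines, String metric) {
        // split(";") drops the trailing empty field after "Purity;"
        String[] header = lines.get(0).split(";");
        int col = -1;
        for (int i = 0; i < header.length; i++) {
            if (header[i].equals(metric)) col = i;
        }
        if (col < 0) {
            throw new IllegalArgumentException("Unknown column: " + metric);
        }
        double sum = 0.0;
        int n = 0;
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) continue;
            // split(";", -1) keeps empty fields such as the blank "Event" column
            String[] fields = line.split(";", -1);
            sum += Double.parseDouble(fields[col]);
            n++;
        }
        return n == 0 ? Double.NaN : sum / n;
    }

    public static void main(String[] args) throws IOException {
        // Assumes dumpClustering.csv (the default dump file) is in the working directory.
        List<String> lines = Files.readAllLines(Paths.get("dumpClustering.csv"));
        System.out.println("mean F1-P:   " + meanOf(lines, "F1-P"));
        System.out.println("mean Purity: " + meanOf(lines, "Purity"));
    }
}
```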


# 4. Example implementation from Prof. Albert Bifet
https://github.com/abifet/moa-notebooks/blob/master/MOA-Prequential-Evaluation.ipynb  

Prequential Evaluation Example

Let’s run a very simple experiment: using a decision tree (Hoeffding Tree) with data generated from an artificial stream generator (RandomRBFGenerator).

We start by importing the classes we need and defining the stream and the learner.

In [1]:
%maven nz.ac.waikato.cms.moa:moa:2018.6.0

import moa.classifiers.trees.HoeffdingTree;
import moa.streams.generators.RandomRBFGenerator;

HoeffdingTree learner = new HoeffdingTree();
RandomRBFGenerator stream = new RandomRBFGenerator();

Now, we need to initialize the stream and the classifier:

In [2]:
stream.prepareForUse();
learner.setModelContext(stream.getHeader());
learner.prepareForUse();

And finally, let’s run a prequential evaluation, as in Tutorial 2 (Introduction to the API of MOA).  
**Note [Pavol]: The original example does not seem to work anymore; in my environment the cell below fails with `EvalException: null`.**
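The prequential (test-then-train) loop in the cell below tests each instance with `learner.correctlyClassifies(...)` before calling `learner.trainOnInstance(...)`. The dependency-free sketch below shows the same order of operations without MOA or XChart. The stream (uniform random `x`, label `x > 0.5`) and `MeanThresholdLearner` are hypothetical stand-ins, not MOA classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Dependency-free sketch of prequential (test-then-train) evaluation with toy components. */
public class PrequentialSketch {

    /** Hypothetical incremental learner: thresholds x at the midpoint of the two class means. */
    static class MeanThresholdLearner {
        private final double[] sum = new double[2];
        private final long[] count = new long[2];

        int predict(double x) {
            if (count[0] == 0 || count[1] == 0) return 0; // not enough data yet
            double mid = (sum[0] / count[0] + sum[1] / count[1]) / 2.0;
            return x > mid ? 1 : 0;
        }

        void train(double x, int label) {
            sum[label] += x;
            count[label]++;
        }
    }

    /** Runs test-then-train and returns the cumulative accuracy (percent) every sampleSize instances. */
    public static List<Double> run(int numInstances, int sampleSize, long seed) {
        Random rnd = new Random(seed);
        MeanThresholdLearner learner = new MeanThresholdLearner();
        List<Double> curve = new ArrayList<>();
        int correct = 0;
        for (int seen = 1; seen <= numInstances; seen++) {
            double x = rnd.nextDouble();
            int label = x > 0.5 ? 1 : 0; // hypothetical ground-truth rule
            // 1) test: predict before the learner has seen this label
            if (learner.predict(x) == label) correct++;
            // 2) train: only then update the model with the true label
            learner.train(x, label);
            // seen starts at 1, so the ratio is never 0/0
            if (seen % sampleSize == 0) {
                curve.add(100.0 * correct / seen);
            }
        }
        return curve;
    }

    public static void main(String[] args) {
        System.out.println(run(100_000, 10_000, 42L));
    }
}
```

Because every instance is used for testing before training, the accuracy curve is an honest estimate without a separate hold-out set; this is the same idea behind the `EvaluatePrequential` task used in the last cell.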

In [4]:
%maven org.knowm.xchart:xchart:3.5.2
import org.knowm.xchart.*;
import moa.core.TimingUtils;
import com.yahoo.labs.samoa.instances.Instance;

int numInstances = 1000000;
int sampleSize = 1000;
boolean isTesting = true;
double[] xData = new double[numInstances/sampleSize];
double[] yData = new double[numInstances/sampleSize];

int numberSamplesCorrect = 0;
int numberSamples = 0;
boolean preciseCPUTiming = TimingUtils.enablePreciseTiming();
long evaluateStartTime = TimingUtils.getNanoCPUTimeOfCurrentThread();
while (stream.hasMoreInstances() && numberSamples < numInstances) {
    Instance trainInst = stream.nextInstance().getData();
    if (isTesting) {
        if (learner.correctlyClassifies(trainInst)) {
            numberSamplesCorrect++;
        }
    }
    if (numberSamples % sampleSize == 0) {
        xData[numberSamples / sampleSize] = numberSamples / sampleSize;
        // guard the first sample (numberSamples == 0) against division by zero
        yData[numberSamples / sampleSize] = (numberSamples == 0) ? 0.0
                : 100.0 * (double) numberSamplesCorrect / (double) numberSamples;
    }
    numberSamples++;
    learner.trainOnInstance(trainInst);
}
double accuracy = 100.0 * (double) numberSamplesCorrect/ (double) numberSamples;
double time = TimingUtils.nanoTimeToSeconds(TimingUtils.getNanoCPUTimeOfCurrentThread()- evaluateStartTime);
System.out.println(numberSamples + " instances processed with " + accuracy + "% accuracy in "+time+" seconds.");

XYChart chart = QuickChart.getChart("Prequential Evaluation", "#Instances", "Accuracy", "y(x)", xData, yData);
BitmapEncoder.getBufferedImage(chart);

EvalException: null

In [5]:
import moa.DoTask;
DoTask.main("EvaluatePrequential -l trees.HoeffdingTree -i 1000000".split(" "));


{M}assive {O}nline {A}nalysis
Version:  18.06 June 2018
Copyright: (C) 2007-2018 University of Waikato, Hamilton, New Zealand
Web: http://moa.cms.waikato.ac.nz/

Can not access instrumentation environment...                                  
Please check if jar file containing SizeOfAgent class is 
specified in the java's "-javaagent" command line argument.
                                                                               
Task completed in 5.20s (CPU time)



learning evaluation instances,evaluation time (cpu seconds),model cost (RAM-Hours),classified instances,classifications correct (percent),Kappa Statistic (percent),Kappa Temporal Statistic (percent),Kappa M Statistic (percent),model training instances,model serialized size (bytes),tree size (nodes),tree size (leaves),active learning leaves,tree depth,active leaf byte size estimate,inactive leaf byte size estimate,byte size estimate overhead
100000.0,1.125,0.0,100000.0,92.10000000000001,84.09118369648397,82.93736501079914,82.63736263736264,100000.0,0.0,187.0,118.0,118.0,5.0,0.0,0.0,1.0
200000.0,1.71875,0.0,200000.0,93.2,86.13619960610498,85.15283842794761,84.29561200923789,200000.0,0.0,290.0,180.0,180.0,6.0,0.0,0.0,1.0
300000.0,2.0625,0.0,300000.0,93.7,87.0415165128104,86.76470588235296,85.14150943396228,300000.0,0.0,368.0,228.0,228.0,6.0,0.0,0.0,1.0
400000.0,2.40625,0.0,400000.0,95.1,90.00701548300785,90.18036072144288,88.57808857808857,400000.0,0.0,489.0,311.0,311.0,7.0,0.0,0.0,1.