A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework


License: GPL v3

This repository provides the source code, algorithms, experimental setup, and results for the experimental review on imbalanced data streams published in the journal Machine Learning. The manuscript preprint is available on arXiv.

This website provides interactive plots to display the metrics over time and result tables for each experiment, algorithm, and benchmark.

Experiments

The package src/main/java/experiments provides the scripts for binary and multi-class experiments. It comprises the following experiments:

| Binary class experiment | Script |
|---|---|
| Static imbalance ratio | binary/Static_Imbalance_Ratio |
| Increasing imbalance ratio | binary/Dynamic_Imbalance_Ratio_Increasing |
| Increasing then decreasing imbalance ratio | binary/Dynamic_Imbalance_Ratio_Increasing_Decreasing |
| Flipping imbalance ratio | binary/Dynamic_Imbalance_Ratio_Flipping |
| Flipping then reflipping imbalance ratio | binary/Dynamic_Imbalance_Ratio_Flipping_Reflipping |
| Instance-level difficulties | binary/Instance_Level_Difficulties |
| Concept drift and static imbalance ratio | binary/Concept_Drift_Static_Imbalance_Ratio |
| Concept drift and dynamic imbalance ratio | binary/Concept_Drift_Dynamic_Imbalance_Ratio_Increasing |
| Real-world imbalanced datasets | binary/Datasets |

| Multi-class experiment | Script |
|---|---|
| Static imbalance ratio | multiclass/Static_Imbalance_Ratio |
| Dynamic imbalance ratio | multiclass/Dynamic_Imbalance_Ratio |
| Concept drift and static imbalance ratio | multiclass/Concept_Drift_Static_Imbalance_Ratio |
| Concept drift and dynamic imbalance ratio | multiclass/Concept_Drift_Dynamic_Imbalance_Ratio |
| Real-world imbalanced datasets | multiclass/Datasets |
| Semi-synthetic imbalanced datasets | multiclass/Semisynthetic |

Algorithms

The package src/main/java/moa/classifiers contains 24 state-of-the-art algorithms for data streams, including those inherited from the MOA 2021.07 dependency in the pom.xml file.

| Algorithm | Script |
|---|---|
| IRL | meta.imbalanced.RebalanceStream |
| C-SMOTE | meta.imbalanced.CSMOTE |
| VFC-SMOTE | meta.imbalanced.VFCSMOTE |
| CSARF | meta.CSARF |
| GHVFDT | trees.GHVFDT |
| HDVFDT | trees.HDVFDT |
| ARF | meta.AdaptiveRandomForest |
| KUE | meta.KUE |
| LB | meta.LeveragingBag |
| OBA | meta.OzaBagAdwin |
| SRP | meta.StreamingRandomPatches |
| ESOS-ELM | ann.meta.ESOS_ELM |
| CALMID | active.CALMID |
| MICFOAL | active.MicFoal |
| ROSE | meta.imbalanced.ROSE |
| OADA | meta.imbalanced.OnlineAdaBoost |
| OADAC2 | meta.imbalanced.OnlineAdaC2 |
| ARFR | meta.imbalanced.AdaptiveRandomForestResampling |
| SMOTE-OB | meta.imbalanced.SMOTEOB |
| OSMOTE | meta.imbalanced.OnlineSMOTEBagging |
| OOB | meta.OOB |
| UOB | meta.UOB |
| ORUB | meta.imbalanced.OnlineRUSBoost |
| OUOB | meta.imbalanced.OnlineUnderOverBagging |

Evaluators

The package src/main/java/moa/evaluation contains the performance evaluators.

ImbalancedPerformanceEvaluator is used for binary class experiments and reports the G-Mean, AUC, and Kappa metrics.

MultiClassImbalancedPerformanceEvaluator is used for multi-class experiments and reports the G-Mean, PMAUC, and Kappa metrics. Both evaluators also report the runtime (seconds), memory consumption (RAM-Hours), and the complete confusion matrix for post-hoc analysis.
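As an illustration of the reported metrics, the multi-class G-Mean is commonly defined as the geometric mean of the per-class recalls computed from the confusion matrix. A minimal, self-contained sketch (not part of the repository; the class name is hypothetical):

```java
// Sketch: computing the multi-class G-Mean (geometric mean of per-class
// recalls) from a confusion matrix where rows are true classes and
// columns are predicted classes.
public class GMeanExample {

    public static double gMean(int[][] confusion) {
        double product = 1.0;
        for (int c = 0; c < confusion.length; c++) {
            int truePositives = confusion[c][c];
            int support = 0; // total instances whose true class is c
            for (int p = 0; p < confusion[c].length; p++) {
                support += confusion[c][p];
            }
            double recall = support == 0 ? 0.0 : (double) truePositives / support;
            product *= recall;
        }
        return Math.pow(product, 1.0 / confusion.length);
    }

    public static void main(String[] args) {
        // Imbalanced binary example: recalls are 0.9 and 0.5,
        // so the G-Mean is sqrt(0.45), roughly 0.67.
        int[][] confusion = { {90, 10}, {5, 5} };
        System.out.println(gMean(confusion));
    }
}
```

Note how a classifier that ignores the minority class gets a recall of 0 on it and therefore a G-Mean of 0, which is why the metric is favored for imbalanced streams over plain accuracy.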

Results


Complete CSV results (median over 5 seeds) for all experiments, algorithms, and benchmarks reported in the manuscript are available for download to support the transparency, reproducibility, and extensibility of the experimental study.

Complete CSV results are also provided for each of the 5 seeds individually.

ARFF files are available to download for binary class datasets, multi-class datasets, and semi-synthetic datasets.

- Binary class experiments: G-Mean vs Kappa
- Multi-class experiments: PMAUC vs Kappa
- Binary class experiments: spiral barplot
- Multi-class experiments: spiral barplot

How to add a new algorithm, generator, or evaluator in the framework

We use the MOA framework and its class hierarchy. Adding a new algorithm, generator, or evaluator is the same as adding it in MOA (see MOA documentation).

First, import the source code into your favorite IDE (Eclipse, VS Code, IntelliJ, etc.) using Git.

To add a new algorithm, e.g. MyAlgorithmName, create a new Java file at src/main/java/moa/classifiers/MyAlgorithmName.java. The class must extend the AbstractClassifier class and implement the public void trainOnInstanceImpl(Instance instance) and public double[] getVotesForInstance(Instance instance) methods.
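A minimal sketch of such a class, assuming the MOA 2021.07 API from the pom.xml (the class name and the majority-class logic are illustrative only, not part of the repository):

```java
package moa.classifiers;

import com.yahoo.labs.samoa.instances.Instance;
import moa.core.Measurement;

// Sketch of a minimal MOA classifier (hypothetical name MyAlgorithmName):
// predicts the majority class seen so far. Extends AbstractClassifier and
// implements the two required methods, plus MOA's remaining abstract methods.
public class MyAlgorithmName extends AbstractClassifier implements MultiClassClassifier {

    private static final long serialVersionUID = 1L;

    private int[] classCounts;

    @Override
    public void resetLearningImpl() {
        classCounts = null; // forget everything learned so far
    }

    @Override
    public void trainOnInstanceImpl(Instance instance) {
        if (classCounts == null) {
            classCounts = new int[instance.numClasses()];
        }
        classCounts[(int) instance.classValue()]++;
    }

    @Override
    public double[] getVotesForInstance(Instance instance) {
        if (classCounts == null) {
            return new double[instance.numClasses()]; // no training yet
        }
        double[] votes = new double[classCounts.length];
        for (int i = 0; i < votes.length; i++) {
            votes[i] = classCounts[i];
        }
        return votes;
    }

    @Override
    protected Measurement[] getModelMeasurementsImpl() { return null; }

    @Override
    public void getModelDescription(StringBuilder out, int indent) { }

    @Override
    public boolean isRandomizable() { return false; }
}
```

Implementing the MultiClassClassifier interface as well lets MOA list the learner as a valid option for classification tasks.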

To add a new generator, e.g. MyGeneratorName, create a new Java file at src/main/java/moa/streams/generators/MyGeneratorName.java. The class must implement the InstanceStream interface and the public InstanceExample nextInstance() method.
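A minimal sketch of a generator, again assuming the MOA 2021.07 API (the class name and the fixed 10% minority-class distribution are illustrative assumptions):

```java
package moa.streams.generators;

import com.yahoo.labs.samoa.instances.Attribute;
import com.yahoo.labs.samoa.instances.DenseInstance;
import com.yahoo.labs.samoa.instances.Instance;
import com.yahoo.labs.samoa.instances.Instances;
import com.yahoo.labs.samoa.instances.InstancesHeader;
import moa.core.InstanceExample;
import moa.core.ObjectRepository;
import moa.options.AbstractOptionHandler;
import moa.streams.InstanceStream;
import moa.tasks.TaskMonitor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Random;

// Sketch of a minimal stream generator (hypothetical name MyGeneratorName):
// emits one numeric attribute and a binary class with a fixed 9:1 imbalance.
public class MyGeneratorName extends AbstractOptionHandler implements InstanceStream {

    private static final long serialVersionUID = 1L;

    private InstancesHeader header;
    private Random random;

    @Override
    protected void prepareForUseImpl(TaskMonitor monitor, ObjectRepository repository) {
        ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(new Attribute("x"));
        attributes.add(new Attribute("class", Arrays.asList("minority", "majority")));
        header = new InstancesHeader(new Instances("MyGeneratorName", attributes, 0));
        header.setClassIndex(header.numAttributes() - 1);
        random = new Random(1);
    }

    @Override
    public InstanceExample nextInstance() {
        double[] values = new double[header.numAttributes()];
        int label = random.nextDouble() < 0.1 ? 0 : 1; // ~10% minority class
        values[0] = random.nextGaussian() + label;      // class-dependent feature
        values[1] = label;
        Instance instance = new DenseInstance(1.0, values);
        instance.setDataset(header);
        return new InstanceExample(instance);
    }

    @Override
    public InstancesHeader getHeader() { return header; }

    @Override
    public long estimatedRemainingInstances() { return -1; } // unbounded stream

    @Override
    public boolean hasMoreInstances() { return true; }

    @Override
    public boolean isRestartable() { return true; }

    @Override
    public void restart() { random = new Random(1); }

    @Override
    public void getDescription(StringBuilder sb, int indent) { }
}
```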

To add a new performance metric you can edit an existing evaluator (e.g. src/main/java/moa/evaluation/ImbalancedPerformanceEvaluator.java) to add the metric calculation. Alternatively, you can add a new evaluator, e.g. MyEvaluatorName. To do so, create a new Java file at src/main/java/moa/evaluation/MyEvaluatorName.java. The class must implement the ClassificationPerformanceEvaluator interface, and the public void addResult(Example<Instance> exampleInstance, double[] classVotes) and public Measurement[] getPerformanceMeasurements() methods.
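A minimal sketch of a custom evaluator, assuming the MOA 2021.07 API (the class name and the prequential-accuracy metric are illustrative; a real evaluator for this study would track imbalance-aware metrics such as G-Mean):

```java
package moa.evaluation;

import com.yahoo.labs.samoa.instances.Instance;
import com.yahoo.labs.samoa.instances.Prediction;
import moa.AbstractMOAObject;
import moa.core.Example;
import moa.core.Measurement;
import moa.core.Utils;

// Sketch of a minimal evaluator (hypothetical name MyEvaluatorName)
// that tracks prequential accuracy over the stream.
public class MyEvaluatorName extends AbstractMOAObject
        implements ClassificationPerformanceEvaluator {

    private static final long serialVersionUID = 1L;

    private double seen;
    private double correct;

    @Override
    public void reset() {
        seen = 0;
        correct = 0;
    }

    @Override
    public void addResult(Example<Instance> exampleInstance, double[] classVotes) {
        Instance instance = exampleInstance.getData();
        seen++;
        if (Utils.maxIndex(classVotes) == (int) instance.classValue()) {
            correct++;
        }
    }

    @Override
    public void addResult(Example<Instance> exampleInstance, Prediction prediction) {
        // Delegate to the votes-based overload used by these experiments.
        if (prediction != null) {
            addResult(exampleInstance, prediction.getVotes());
        }
    }

    @Override
    public Measurement[] getPerformanceMeasurements() {
        return new Measurement[] {
            new Measurement("classified instances", seen),
            new Measurement("accuracy (%)", seen > 0 ? 100.0 * correct / seen : 0.0)
        };
    }

    @Override
    public void getDescription(StringBuilder sb, int indent) { }
}
```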

The next step is to compile the source code using Maven (pom.xml file). Use the command mvn package or your IDE's build options to generate the jar file target/imbalanced-streams-1.0-jar-with-dependencies.jar.

Finally, use any of the scripts provided at src/main/java/experiments for the different groups of experiments and add your algorithm, generator, or evaluator. These scripts will generate the command lines used to run the experiments.
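For illustration, a generated command line typically invokes MOA's task runner from the packaged jar. The exact task options depend on the script and experiment; the learner, stream, evaluator, and output file below are placeholders, not commands taken from the repository:

```
java -cp target/imbalanced-streams-1.0-jar-with-dependencies.jar moa.DoTask \
  "EvaluatePrequential \
     -l moa.classifiers.meta.imbalanced.ROSE \
     -s generators.RandomRBFGenerator \
     -e moa.evaluation.ImbalancedPerformanceEvaluator \
     -d results.csv"
```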

Citation

@article{aguiar2024survey,
  author={Aguiar, Gabriel and Krawczyk, Bartosz and Cano, Alberto},
  title={A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework},
  journal={Machine Learning},
  volume={113},
  pages={4165-4243},
  year={2024}
}
