Python implementation of MNDO (Multivariate Normal Distribution based Oversampling).
Article about this implementation
- Anaconda / Python 3.6
- tqdm 4.31.1
- imbalanced-learn 0.4.3
If you use KEEL datasets, you can preprocess them with the following command.
python pre_dataset.py dataset_directory
- Preprocesses all files in the directory.
- Removes unnecessary lines and replaces class labels (positive class -> 1, negative class -> 0).
- Preprocessed data is saved to MNDO/Predataset/xxx.csv
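The preprocessing described above can be illustrated with a minimal sketch. It is not the code in pre_dataset.py. It assumes the usual KEEL layout, in which header lines start with `@`, the class is the last comma-separated column, and the labels are the literal strings `positive` and `negative`. The function name `preprocess_keel` is hypothetical.

```python
import csv
import io


def preprocess_keel(lines, positive_label="positive"):
    """Drop KEEL header lines and map the class column to 1 (positive) or 0."""
    rows = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and header lines such as @relation, @attribute, @data.
        if not line or line.startswith("@"):
            continue
        fields = [field.strip() for field in line.split(",")]
        # Assumption: the class label is the last column.
        label = "1" if fields[-1] == positive_label else "0"
        rows.append(fields[:-1] + [label])
    return rows


# Small in-memory example in KEEL style (made-up values).
sample = """@relation toy
@attribute a real [0.0, 1.0]
@attribute b real [0.0, 1.0]
@attribute class {positive, negative}
@data
0.1, 0.2, negative
0.9, 0.8, positive
""".splitlines()

rows = preprocess_keel(sample)

# Write the cleaned rows as CSV text (a file handle would work the same way).
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

In this sketch anything that is not the positive label is mapped to 0, so a file with differently named classes would need `positive_label` set accordingly.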
Resampled (generated) data is stored in ./pos_data.
python over-sampling.py data_path
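As a rough illustration of the idea behind the name (oversampling based on a multivariate normal distribution), the sketch below fits a mean vector and covariance matrix to the positive class and draws new positive samples from that distribution until the classes are balanced. This is a simplified reading, not the code in over-sampling.py, and the published method may differ in its details. The function name `mvn_oversample` and the balancing rule are assumptions.

```python
import numpy as np


def mvn_oversample(X, y, random_state=0):
    """Add positive samples drawn from a normal fitted to the positive class."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    X_pos = X[y == 1]
    # Assumption: generate enough samples to match the negative class size.
    n_new = int((y == 0).sum() - (y == 1).sum())
    if n_new <= 0:
        return X, y

    # Estimate the distribution parameters from the positive class only.
    mean = X_pos.mean(axis=0)
    cov = np.atleast_2d(np.cov(X_pos, rowvar=False))

    X_new = rng.multivariate_normal(mean, cov, size=n_new)
    y_new = np.ones(n_new, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Toy imbalanced data: 90 negative and 10 positive samples (synthetic).
demo_rng = np.random.default_rng(1)
X = np.vstack([
    demo_rng.normal(0.0, 1.0, size=(90, 2)),
    demo_rng.normal(3.0, 0.5, size=(10, 2)),
])
y = np.array([0] * 90 + [1] * 10)

X_res, y_res = mvn_oversample(X, y)
print(X_res.shape, int((y_res == 1).sum()), int((y_res == 0).sum()))
```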
python train.py data_path
train.py steps:
- Load data
- Over-sampling (MNDO, SMOTE, Borderline-SMOTE, ADASYN, SMOTE-ENN and SMOTE-Tomek Links)
- Scaling (Normalization or Standardization)
- Learning (SVM, Decision Tree and k-NN)
- Predict (results are saved in MNDO/output/xxx.csv)
If you want to train on all files, you can use this script:
./run.sh
- Provide as a Python library
- Kotaro Ambai, Hamido Fujita, MNDO: Multivariate Normal Distribution Based Over-Sampling for Binary Classification, Volume 303: New Trends in Intelligent Software Methodologies, Tools and Techniques, DOI: 10.3233/978-1-61499-900-3-425
- Study on improving prediction accuracy for imbalanced medical data using Multivariate Normal Distribution based Oversampling
Kotaro Ambai (baibai25)