dstarrago/mismote

MISMOTE

Class imbalance correction algorithm for multiple-instance data

In contrast to regular classification problems, in which each example has a single description, in multiple-instance classification (MIC) problems each example has many descriptions. Like regular classification problems, multiple-instance classification problems may suffer from class imbalance. A data set suffers from class imbalance when one or more of its classes are underrepresented, that is, when these classes are much smaller than the rest. Underrepresented classes are hard for classification algorithms to learn, and their examples are frequently misclassified in favor of the larger classes.
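To make the setting concrete, here is a minimal sketch of a multiple-instance example and of one simple way to quantify imbalance. The class and method names (`BagSketch`, `Bag`, `imbalanceRatio`) are hypothetical and are not taken from this repository; the sketch assumes binary labels 0 and 1 and measures class size in number of examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BagSketch {

    /** A multiple-instance example: one class label shared by several descriptions. */
    public static final class Bag {
        public final int label;                 // assumed binary: 0 or 1
        public final List<double[]> instances;  // the descriptions of this example

        public Bag(int label, double[]... instances) {
            this.label = label;
            this.instances = new ArrayList<>(Arrays.asList(instances));
        }
    }

    /** Size of the larger class divided by the size of the smaller one, counted in bags. */
    public static double imbalanceRatio(List<Bag> bags) {
        int[] counts = new int[2];
        for (Bag b : bags) {
            counts[b.label]++;
        }
        int max = Math.max(counts[0], counts[1]);
        int min = Math.min(counts[0], counts[1]);
        if (min == 0) {
            throw new IllegalArgumentException("both classes must be present");
        }
        return (double) max / min;
    }

    public static void main(String[] args) {
        List<Bag> bags = new ArrayList<>();
        // Three examples of class 0, each with its own set of descriptions.
        bags.add(new Bag(0, new double[]{0.1, 0.2}, new double[]{0.3, 0.1}));
        bags.add(new Bag(0, new double[]{0.2, 0.2}));
        bags.add(new Bag(0, new double[]{0.0, 0.4}, new double[]{0.1, 0.1}));
        // A single example of class 1: the underrepresented class.
        bags.add(new Bag(1, new double[]{0.9, 0.8}, new double[]{0.7, 0.9}));

        System.out.println("Imbalance ratio: " + imbalanceRatio(bags));
    }
}
```

A ratio well above 1 indicates that one class dominates the training set.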

SMOTE (synthetic minority over-sampling technique) is a successful method for dealing with class imbalance in regular classification. It generates synthetic examples in an underrepresented class by interpolating between training examples that belong to that class. MISMOTE brings SMOTE's idea to multiple-instance classification by creating synthetic bags (multiple-instance examples) in underrepresented classes.
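The interpolation step of standard SMOTE can be sketched as follows. This is only an illustration of the single-instance idea, assuming numeric feature vectors and a neighbor that has already been chosen from the same class; it is not MISMOTE's bag-level procedure, which is described in the report cited below. The names `SmoteSketch` and `interpolate` are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

public class SmoteSketch {

    /**
     * Returns a point on the line segment between x and neighbor:
     * x + gap * (neighbor - x), where gap is expected to lie in [0, 1].
     */
    public static double[] interpolate(double[] x, double[] neighbor, double gap) {
        if (x.length != neighbor.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double[] synthetic = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            synthetic[i] = x[i] + gap * (neighbor[i] - x[i]);
        }
        return synthetic;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};         // a minority-class example
        double[] neighbor = {3.0, 6.0};  // one of its same-class neighbors (assumed given)

        // SMOTE draws the gap uniformly at random from [0, 1).
        Random random = new Random();
        double gap = random.nextDouble();

        System.out.println(Arrays.toString(interpolate(x, neighbor, gap)));
    }
}
```

With `gap = 0` the synthetic example coincides with `x`, and as `gap` approaches 1 it approaches the neighbor, so every generated point stays between two real examples of the underrepresented class.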

You can find a complete description of MISMOTE, as well as its experimental results, in:

  • Sanchez Tarrago, D., Cornelis, C., Bello, R., Herrera, F.: MISMOTE: synthetic minority over-sampling technique for multiple instance learning with imbalanced data. Central University Marta Abreu de Las Villas (2014). (text)

Developed with:

  • Java 1.8
  • NetBeans IDE 8.2

Dependencies:

  • Weka 3.7