
Open-Set Semi-Supervised Learning for Long-Tailed Medical Datasets

Daniya Najiha¹, Jean Lahoud¹, Mustansar Fiaz², Amandeep Kumar³, Hisham Cholakkal¹
¹Mohamed Bin Zayed University of Artificial Intelligence, ²IBM Research, ³Johns Hopkins University

This is the official repository for our ISBI 2025 paper. [Paper] [Models and Logs]


Architecture Figure

Abstract: Many practical medical imaging scenarios include categories that are under-represented yet clinically crucial. The relevance of image recognition models to real-world applications lies in their ability to generalize to these rare classes as well as to unseen classes. Real-world generalization must account for several complexities. First, training data is highly imbalanced, which may lead to models exhibiting bias toward the more frequently represented classes. Moreover, real-world data may contain unseen classes that need to be identified, and model performance suffers under data scarcity. While medical image recognition has been extensively addressed in the literature, current methods do not account for all of these real-world intricacies. To this end, we propose an open-set learning method for highly imbalanced medical datasets using a semi-supervised approach. Recognizing the adverse impact of the long-tail distribution on inherent model characteristics, we implement a regularization strategy at the feature level complemented by a classifier normalization technique. We conduct extensive experiments on the publicly available ISIC2018, ISIC2019, and TissueMNIST datasets with varying numbers of labeled samples. Our analysis shows that addressing the impact of long-tailed data in classification significantly improves the overall performance of the network in terms of closed-set and open-set accuracy on all datasets.


Results

Result Figure

Preparation

Required Packages

We suggest first creating and activating a conda environment:

```shell
conda create --name openltr python=3.8
conda activate openltr
```

then use pip to install the required packages:

```shell
pip install -r requirements.txt
```

Datasets

Please put the datasets in the ./data folder (or create soft links) as follows:

```
OpenLTR
├── config
│   └── ...
├── data
│   ├── ISIC2018
│   │   └── ISIC2018_Dataset
│   │       ├── AK
│   │       ├── BCC
│   │       └── ...
│   └── ISIC2019
│       └── ISIC2019_Dataset
│           └── ....
├── semilearn
│   └── ...
└── ...
```
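If the datasets already live elsewhere on disk, soft links avoid copying. A minimal sketch for ISIC2018 (the source path below is a placeholder; substitute wherever you extracted the archive):

```shell
# Link an existing dataset directory into ./data instead of copying it.
mkdir -p data/ISIC2018
# Replace ~/datasets/ISIC2018_Dataset with your actual download location.
ln -s ~/datasets/ISIC2018_Dataset data/ISIC2018/ISIC2018_Dataset
```

The same pattern applies to ISIC2019 and TissueMNIST.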

For the train-test splits of ISIC2018, we follow ECL.

Training

Here is an example of training OpenLTR on ISIC2018 with 25% of the labels per seen class (i.e., 1311 labeled samples in total).

```shell
# seed = 1
CUDA_VISIBLE_DEVICES=0 python train.py --c config/openset_cv/openltr/isic2018.yaml
```

Evaluation

After training, the best checkpoints are saved in ./saved_models. Closed-set performance is reported in the training logs. For open-set evaluation, please see eval_io.py.
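For intuition, open-set accuracy in benchmarks of this kind (e.g., IOMatch) is typically computed by collapsing every unseen-class label into a single "unknown" bucket alongside the seen classes. The sketch below illustrates that idea only; it is not the exact metric implemented in eval_io.py:

```python
import numpy as np

def open_set_accuracy(y_true, y_pred, num_seen):
    """Accuracy where all unseen-class ids are mapped to one 'unknown' id.

    y_true / y_pred: integer class ids; ids >= num_seen count as unknown
    and are collapsed to the single id `num_seen`.
    """
    y_true = np.minimum(np.asarray(y_true), num_seen)
    y_pred = np.minimum(np.asarray(y_pred), num_seen)
    return float((y_true == y_pred).mean())
```

For example, with 3 seen classes, predicting any id >= 3 for an outlier sample counts as a correct "unknown" detection regardless of which unseen id was predicted.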

Acknowledgments

We sincerely thank the authors of IOMatch (ICCV'23) for creating such an awesome SSL benchmark.

Citation

@INPROCEEDINGS{10981231,
  author={Kareem, Daniya Najiha A. and Lahoud, Jean and Fiaz, Mustansar and Kumar, Amandeep and Cholakkal, Hisham},
  booktitle={2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI)}, 
  title={Open-Set Semi-Supervised Learning for Long-Tailed Medical Datasets}, 
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Ethics;Heavily-tailed distribution;Image recognition;Open Access;Conferences;Training data;Skin;Data models;Standards;Biomedical imaging},
  doi={10.1109/ISBI60581.2025.10981231}}
