Comparison between different Datasets #2457

Jingnan-Jia · 2021-06-25T19:42:50Z

First of all, thank you very much for developing MONAI. I love it!

I found that MONAI provides CacheDataset, LMDBDataset, PersistentDataset, and SmartCache to accelerate data loading and transforms.

My understanding of the four Datasets, assuming I have 1000 3D CT scans for training:

CacheDataset caches cache_num cases in memory, storing the output of the transforms that come before the random transforms. Data loading is therefore very fast for the first cache_num training samples and slower for the remaining 1000 - cache_num samples in each epoch. The disadvantage is that it requires a lot of CPU memory if we want to cache all 1000 3D CT scans at once.
SmartCache also caches cache_num cases in memory before the random transforms, but a fraction of those cases, set by replace_rate, is replaced before the next epoch. The data loading speed is therefore stable for each batch of training data. This one seems better if we cannot cache all 1000 3D CT scans at once.
PersistentDataset saves all cases to disk before the random transforms and loads them again when the random transforms are applied. It still loads data from disk, but the loading time should be shorter because loading tensors seems faster than loading medical images.
I know little about LMDBDataset because I have no experience with LMDB databases.
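The difference between a fixed partial cache and a sliding cache window, as described above, can be illustrated with a toy model in plain Python. This is only a sketch of the idea and not MONAI's implementation: the function names `partial_cache_split` and `smart_cache_epochs` are hypothetical, and rounding `cache_num * replace_rate` up with `ceil` is an assumption. It also assumes `cache_num` is smaller than the number of items.

```python
import math
from collections import deque


def partial_cache_split(num_items, cache_num):
    """Toy model of a fixed partial cache (hypothetical helper).

    Returns (cached, uncached): how many items per epoch are served from
    memory and how many must be loaded and transformed again.
    """
    cached = min(cache_num, num_items)
    return cached, num_items - cached


def smart_cache_epochs(items, cache_num, replace_rate, num_epochs):
    """Toy model of a sliding cache window (hypothetical helper).

    Each epoch sees only the cached items. Between epochs the oldest
    entries are dropped and the next items of the dataset are added,
    wrapping around at the end of the dataset.
    """
    # Assumption: the number of replaced items is rounded up.
    n_replace = math.ceil(cache_num * replace_rate)
    cache = deque(items[:cache_num])
    next_idx = cache_num % len(items)

    epochs = []
    for _ in range(num_epochs):
        epochs.append(list(cache))
        for _ in range(n_replace):
            cache.popleft()                 # drop the oldest cached item
            cache.append(items[next_idx])   # bring in the next dataset item
            next_idx = (next_idx + 1) % len(items)
    return epochs
```

In this model, `partial_cache_split(1000, 200)` gives `(200, 800)`, so 800 of the 1000 scans would still be loaded the slow way in every epoch. With `smart_cache_epochs([1, 2, 3, 4, 5], 4, 0.25, 3)`, the three epochs see `[1, 2, 3, 4]`, `[2, 3, 4, 5]`, and `[3, 4, 5, 1]`: every item in an epoch comes from the cache, but each epoch covers only part of the data.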

My question is:
Of the last three Datasets, which one would you recommend as the fastest: SmartCache, PersistentDataset, or LMDBDataset?

Nic-Ma · 2021-06-26T01:35:30Z

Maintainer

Hi @Jingnan-Jia,

Thanks for your interest here.
You can find a detailed comparison and tutorial in this notebook:
https://github.com/Project-MONAI/tutorials/blob/master/acceleration/dataset_type_performance.ipynb
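For a quick check on your own data, per-epoch loading time can also be measured directly. The following is a minimal sketch and is not taken from the notebook: `time_epochs` and `SlowDataset` are hypothetical names, and `SlowDataset` is only a stand-in that simulates slow loading with `time.sleep`. Any object that supports `len()` and integer indexing could be passed in its place.

```python
import time


def time_epochs(dataset, num_epochs=3):
    """Return the wall-clock seconds taken to read every item, per epoch."""
    timings = []
    for _ in range(num_epochs):
        start = time.perf_counter()
        for i in range(len(dataset)):
            _ = dataset[i]  # triggers loading (and any caching) for item i
        timings.append(time.perf_counter() - start)
    return timings


class SlowDataset:
    """Hypothetical stand-in for an uncached dataset with a fixed load delay."""

    def __init__(self, num_items, delay=0.001):
        self.num_items = num_items
        self.delay = delay

    def __len__(self):
        return self.num_items

    def __getitem__(self, index):
        time.sleep(self.delay)  # simulate reading and transforming one case
        return index
```

Comparing the first epoch with later epochs is usually the informative part, since disk-backed caches pay their cost the first time each item is read.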

Thanks.
