Comparison between different Datasets #2457
Jingnan-Jia
started this conversation in
General
Replies: 1 comment
Hi @Jingnan-Jia, thanks for your interest here. Thanks.
First of all, thank you very much for developing MONAI. I love it!
I found that MONAI provides CacheDataset, LMDBDataset, PersistentDataset, and SmartCache to accelerate data loading and transforms.

My understanding of the 4 Datasets, assuming I have 1000 3D CT scans for training:

1. CacheDataset caches cache_num cases in memory before the random transforms. Data loading is therefore very fast for the first cache_num training samples and slower for the remaining 1000 - cache_num samples in each epoch. The disadvantage is that it requires a lot of CPU memory if we want to cache all 1000 3D CT scans at once.
2. SmartCache also caches cache_num cases in memory before the random transforms, but part of the cached cases is replaced before the next epoch according to replace_rate. The data loading speed is therefore stable for each batch of training data. This one seems better if we cannot cache all 1000 3D CT scans at once.
3. PersistentDataset saves all cases to disk before the random transforms and loads them again when the random transforms are applied. It still loads data from disk, but the loading time should be shorter because loading tensors seems faster than loading medical images.
4. LMDBDataset: I cannot comment on this one because I have no experience with LMDB databases.

My question is:
Among the last 3 Datasets, which one do you recommend as the fastest: SmartCache, PersistentDataset, or LMDBDataset?
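To make the two in-memory strategies described in this post concrete, here is a minimal pure-Python sketch. It is a toy model of the behaviour as described above, not MONAI's implementation: the class names ToyCacheDataset and ToySmartCache, the slow_load function, and the sliding-window replacement order are all assumptions made for illustration.

```python
import math


def slow_load(case_id):
    """Hypothetical stand-in for reading one CT scan and running the
    deterministic (non-random) transforms on it."""
    return {"id": case_id, "voxels": [case_id] * 4}


class ToyCacheDataset:
    """Keeps the first `cache_num` cases in memory; every other case is
    loaded again on each access."""

    def __init__(self, ids, loader, cache_num):
        self.ids = list(ids)
        self.loader = loader
        self.cache = {i: loader(i) for i in self.ids[:cache_num]}
        self.misses = 0  # number of accesses that had to call the loader

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, index):
        case_id = self.ids[index]
        if case_id in self.cache:
            return self.cache[case_id]
        self.misses += 1
        return self.loader(case_id)


class ToySmartCache:
    """Keeps exactly `cache_num` cases in memory and swaps out a fraction
    `replace_rate` of them between epochs (assumes cache_num <= len(ids))."""

    def __init__(self, ids, loader, cache_num, replace_rate):
        self.ids = list(ids)
        self.loader = loader
        self.cache_num = cache_num
        self.replace_num = math.ceil(cache_num * replace_rate)
        self.start = 0  # position in `ids` of the oldest cached case
        self.cache = [loader(self.ids[k % len(self.ids)]) for k in range(cache_num)]

    def __len__(self):
        # One epoch only iterates over what is currently cached.
        return self.cache_num

    def __getitem__(self, index):
        return self.cache[index]

    def next_epoch(self):
        """Drop the oldest `replace_num` cases and load the next ones in order."""
        n = len(self.ids)
        first_new = self.start + self.cache_num
        new_ids = [self.ids[(first_new + k) % n] for k in range(self.replace_num)]
        self.cache = self.cache[self.replace_num:] + [self.loader(i) for i in new_ids]
        self.start = (self.start + self.replace_num) % n


# Usage: 10 cases, room for 4 in memory.
plain = ToyCacheDataset(range(10), slow_load, cache_num=4)
for i in range(len(plain)):
    plain[i]
print("uncached loads in one epoch:", plain.misses)

smart = ToySmartCache(range(10), slow_load, cache_num=4, replace_rate=0.5)
smart.next_epoch()
print("cached ids after one replacement:", [item["id"] for item in smart.cache])
```

In this toy model, every access inside an epoch of ToySmartCache is a cache hit, which is the "stable speed" property mentioned above; the cost is that each epoch only sees cache_num cases, and the replacement work has to happen somewhere between epochs.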
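The disk-based strategy described for PersistentDataset can be sketched in the same spirit. This is again only a toy model under stated assumptions: ToyPersistentDataset, preprocess, and the one-pickle-file-per-case layout are hypothetical and chosen for illustration; they are not how MONAI stores its cache.

```python
import pickle
import tempfile
from pathlib import Path


def preprocess(case_id):
    """Hypothetical stand-in for loading a scan and applying the
    deterministic transforms (the expensive part worth caching)."""
    return {"id": case_id, "voxels": [case_id] * 4}


class ToyPersistentDataset:
    """Stores each preprocessed case on disk the first time it is requested
    and reads the stored copy on every later access."""

    def __init__(self, ids, loader, cache_dir):
        self.ids = list(ids)
        self.loader = loader
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.loader_calls = 0  # how often the expensive path was taken

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, index):
        case_id = self.ids[index]
        path = self.cache_dir / f"{case_id}.pkl"
        if path.exists():
            # Cheap path: read the already-preprocessed case back from disk.
            with path.open("rb") as f:
                return pickle.load(f)
        # Expensive path: preprocess once, then persist the result.
        self.loader_calls += 1
        item = self.loader(case_id)
        with path.open("wb") as f:
            pickle.dump(item, f)
        return item


# Usage: two epochs over 5 cases; only the first epoch calls the loader.
with tempfile.TemporaryDirectory() as tmp:
    dataset = ToyPersistentDataset(range(5), preprocess, tmp)
    for _epoch in range(2):
        for i in range(len(dataset)):
            dataset[i]
    print("expensive loads over two epochs:", dataset.loader_calls)
```

Unlike the in-memory variants, memory use here does not grow with the number of cases, but every access still pays for a disk read, so the per-sample time depends on storage speed.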