Due to the changes introduced in PR #5630, using the shared cache has become very inconvenient and can lead to crashes.
Previously, in #5365, we introduced a shared cache (via ListProxy) that significantly speeds up training and validation workflows during multi-GPU training. It required minimal user changes: simply setting CacheDataset(runtime_cache=True).
With the changes merged in PR #5630, all that simplicity was removed, and shared cache allocation and management are left to the user, who must allocate the cache, synchronize it between processes, and even manually set it to the proper length.
Internally, CacheDataset sets self.cache_num to keep track of the number of cached elements. Since PR #5630, self.cache_num and self._cache are disconnected: they need not have the same length, and the user does not know self.cache_num in advance, so cannot allocate a cache of the proper length.
The new (and only) way to use the shared cache is CacheDataset(runtime_cache=list_proxy):
- bug: len(list_proxy) == 0 and self.cache_num > 0 causes a crash
- bug: len(list_proxy) < self.cache_num causes a crash
- potential bug: len(list_proxy) > self.cache_num leaves the two lengths out of sync, which can lead to unforeseen bugs in the future
- major inconvenience: the user needs to allocate the cache manually as Manager().list() in the master process and pass it to the children, OR allocate it in a child process and synchronize it by manual broadcasting, and then ensure it has the proper length. These steps are the same for all users and use cases, so there will now be much more redundant code every time someone wants to use shared-memory caching.
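For reference, the manual steps listed above look roughly like the sketch below. It uses only the standard library and does not call MONAI. The helper names (allocate_cache, check_cache_length, worker) and the string placeholders standing in for cached items are hypothetical and exist only for illustration. In real code the resulting proxy would be passed as CacheDataset(runtime_cache=cache), and cache_num would have to match the value CacheDataset computes internally, which is exactly what the user cannot easily know.

```python
from multiprocessing import Manager, Process


def allocate_cache(manager, cache_num):
    """Return a shared list (a ListProxy) pre-sized to cache_num empty slots."""
    return manager.list([None] * cache_num)


def check_cache_length(cache, cache_num):
    """Raise if the cache does not have exactly cache_num slots."""
    if len(cache) != cache_num:
        raise ValueError(f"cache has {len(cache)} slots, expected {cache_num}")


def worker(cache, rank, world_size):
    """Fill only the slots owned by this rank; writes are visible to all processes."""
    for i in range(rank, len(cache), world_size):
        cache[i] = f"item-{i}"  # placeholder for a real cached sample


if __name__ == "__main__":
    cache_num, world_size = 4, 2  # assumed values for the demo
    with Manager() as manager:
        # Step 1: allocate in the master process, with the proper length.
        cache = allocate_cache(manager, cache_num)
        check_cache_length(cache, cache_num)

        # Step 2: pass the same proxy to every child process.
        procs = [
            Process(target=worker, args=(cache, rank, world_size))
            for rank in range(world_size)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

        # Copy out of the proxy before the manager shuts down.
        print(list(cache))
```

All of this is boilerplate that runtime_cache=True used to hide, and every user of the shared cache now has to repeat it.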