Plaid Mode Eagerness

TensorDeserializer's plaid_mode parameter was supposed to imply lazy_load, but it didn't until 2e129b0. This has strange implications, because this mode was not expected to work without lazy loading enabled.
Lazy vs. Eager Loading
Tensorizer supports two deserialization modes, eager and lazy, determined by the lazy_load parameter to the TensorDeserializer() constructor.
Eager Loading Mode
In eager mode, all tensors are loaded from the disk or network to their destination device up-front, during the TensorDeserializer() constructor call, and all data accesses after the deserializer's instantiation reach in and take a cached tensor from the internal TensorDeserializer._cache instance variable.
Lazy Loading Mode
In lazy loading mode, no tensors are loaded during the constructor, and each is instead loaded on-demand when you attempt to access deserializer[<key>], or iterate through a deserializer's entries (either in a loop or implicitly through deserializer.load_into_module()).
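The difference between the two modes can be sketched with a toy model (ToyDeserializer and its string "tensors" are illustrative stand-ins, not part of Tensorizer's API):

```python
class ToyDeserializer:
    """Toy model of eager vs. lazy loading; not Tensorizer's actual code."""

    def __init__(self, names, lazy_load=False):
        self._names = list(names)
        self._cache = {}
        self._loads = 0  # count of actual load operations performed
        if not lazy_load:
            # Eager mode: load everything up-front, during construction.
            for name in self._names:
                self._cache[name] = self._load(name)

    def _load(self, name):
        self._loads += 1
        return f"tensor:{name}"  # stand-in for real tensor data

    def __getitem__(self, name):
        # Lazy mode: load on first access, then serve from the cache.
        if name not in self._cache:
            self._cache[name] = self._load(name)
        return self._cache[name]


eager = ToyDeserializer(["a", "b"], lazy_load=False)
assert eager._loads == 2  # everything was loaded during the constructor

lazy = ToyDeserializer(["a", "b"], lazy_load=True)
assert lazy._loads == 0   # nothing loaded yet
lazy["a"]
lazy["a"]
assert lazy._loads == 1   # loaded once on first access, then cached
```

In both modes, accesses after the first are served from the internal cache; the modes differ only in when the load work happens.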
Normal lazy loading mode retains all loaded tensors in the deserializer's cache, but it has been assumed that plaid_mode could not do this while maintaining correctness.
Plaid Mode
Whereas the standard loading mode pre-allocates enough memory to hold all of the tensors expected to be loaded simultaneously in RAM, the plaid_mode optimization shares a single CPU memory region for all tensors being loaded, only as large as the single largest tensor, and overwrites each previous tensor's data upon loading the next tensor. plaid_mode is only legal for loading GPU tensors.
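The shared-buffer idea can be illustrated with a minimal sketch (this models the concept only, not Tensorizer's implementation; the "upload" is simulated as a plain copy out of the staging buffer):

```python
# One staging buffer, sized to the largest tensor, is reused for every load;
# each tensor must be copied out (as to a GPU) before the next load
# overwrites the buffer.
tensors = {"a": b"aaaa", "b": b"bb", "c": b"cccccc"}
staging = bytearray(max(len(v) for v in tensors.values()))

device_copies = {}
for name, data in tensors.items():
    staging[: len(data)] = data  # overwrites the previous tensor's bytes
    # "Upload to the device" = copy the bytes out of the shared buffer.
    device_copies[name] = bytes(staging[: len(data)])

# The copies are detached from the staging buffer, so later overwrites
# cannot corrupt them.
assert device_copies == tensors
```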
It had been our understanding that it was not valid for multiple tensors loaded this way to exist simultaneously, and that it would lead to corruption of the internal cache if attempted. The rationale for this is that when plaid_mode=True, entries that are loaded later will overwrite the tensor data in the CPU buffer associated with still-in-use tensors, and thus the contents of older cache entries may be invalidated. plaid_mode (originally called oneshot) was then intended as an optional flag to enable sharing a buffer for such loads, with the restriction that tensors could only be streamed—i.e., read and used before continuing, and that you could not go backwards again to access older tensors.
This is challenged by the existence of this bug, as the actual behaviour of specifying plaid_mode=True along with lazy_load=False has been to load all tensors up-front during the constructor, first from disk/network into the shared CPU buffer, then onto the GPU, whereupon each is stored in the cache and made available for later cached accesses. All of these GPU tensors appear to work fine, none conflicting with the others' existence.
The logical explanation seems to be that each finished GPU tensor is no longer associated with the shared CPU buffer once it has been offloaded to the GPU (which happens immediately after loading, before continuing to the next tensor), so the tensors can't actually interfere with each other, and everything is fine.
This behaviour makes sense, but we aren't sure whether or not to trust it, because the aforementioned corruption of tensors by one another was reportedly observed at some point during development, and we don't have a solid reason why it should work correctly now; it simply appears to.
What went wrong with all the code ensuring deletion of older tensors?
The intended caching situation for a lazy-loading plaid_mode was as follows, keeping one item in the cache at a time:
deserializer = TensorDeserializer(..., plaid_mode=True)
# No tensors are loaded yet
_ = deserializer[<first key>]
# Now the <first key> tensor is loaded and temporarily cached
_ = deserializer[<first key>]
# The <first key> tensor is simply pulled from the cache
_ = deserializer[<second key>]
# The <first key> tensor is cleared from the cache, and then the <second key> tensor is loaded and then cached
_ = deserializer[<first key>]
# This raises an error, because the <first key> tensor has already been marked as unavailable in the cache
(Note that there is a distinction between keys that have never been loaded and ones that have already been loaded and then evicted.)
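The intended semantics above can be modeled with a toy one-item cache (StreamingCache and its sentinel are illustrative inventions, not Tensorizer's actual code); an evicted key is remembered so that going back raises an error, distinct from a key that was simply never loaded:

```python
_EVICTED = object()  # sentinel marking "loaded once, now gone"


class StreamingCache:
    """Toy model of the intended one-item, forward-only cache."""

    def __init__(self, names):
        self._names = set(names)
        self._cache = {}  # holds at most one live entry at a time

    def __getitem__(self, name):
        if name not in self._names:
            raise KeyError(name)  # never existed at all
        entry = self._cache.get(name)
        if entry is _EVICTED:
            raise RuntimeError(f"{name} was already evicted; cannot go back")
        if entry is None:
            # Evict the previous live entry before loading the next tensor.
            for key in list(self._cache):
                if self._cache[key] is not _EVICTED:
                    self._cache[key] = _EVICTED
            self._cache[name] = f"tensor:{name}"  # stand-in for a real load
        return self._cache[name]


cache = StreamingCache(["first", "second"])
cache["first"]   # loaded and temporarily cached
cache["first"]   # served from the cache
cache["second"]  # evicts "first", then loads "second"
try:
    cache["first"]  # going backwards raises
except RuntimeError:
    pass
```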
However, a bug caused the cache to be directly pre-populated even in plaid_mode during the constructor unless lazy_load=True was specified, skipping over the __getitem__ logic for cache deletions, so it instead did this:
deserializer = TensorDeserializer(..., plaid_mode=True)
# All tensors are loaded
_ = deserializer[<first key>]
# The <first key> tensor is pulled from the cache
_ = deserializer[<first key>]
# The <first key> tensor is pulled from the cache (again)
_ = deserializer[<second key>]
# The <first key> tensor is cleared from the cache, and <second key> is pulled from the cache
_ = deserializer[<first key>]
# This raises an error, because the <first key> tensor has already been marked as unavailable in the cache
It started with everything cached, then gradually cleared entries away. This meant that the behaviour of "not being able to go back" was enforced correctly, yet only artificially, because the tensors weren't actually being streamed.
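The buggy behaviour can be modeled the same way (BuggyPlaidCache is an illustrative toy, not the real code): the cache starts fully populated, and each access evicts every earlier key, so "no going back" held even though nothing was streamed.

```python
_EVICTED = object()  # sentinel marking "cached once, now cleared"


class BuggyPlaidCache:
    """Toy model of the buggy eager-plus-plaid behaviour."""

    def __init__(self, names):
        self._order = list(names)
        # Bug: the constructor pre-populates the cache even in plaid mode.
        self._cache = {name: f"tensor:{name}" for name in names}

    def __getitem__(self, name):
        if self._cache[name] is _EVICTED:
            raise RuntimeError(f"{name} was already cleared from the cache")
        # Every key earlier in load order is cleared on access.
        for earlier in self._order[: self._order.index(name)]:
            self._cache[earlier] = _EVICTED
        return self._cache[name]


cache = BuggyPlaidCache(["first", "second"])
cache["first"]   # pulled from the pre-populated cache
cache["first"]   # pulled from the cache again
cache["second"]  # clears "first", pulls "second" from the cache
try:
    cache["first"]  # raises: already marked unavailable
except RuntimeError:
    pass
```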
What Happens Now?
Since the release of Tensorizer v1.0.0 debuting plaid_mode, we have neither encountered reports of it corrupting loaded tensors, nor have any of the developers ever seen it occur when running the test suite. This seems to suggest that plaid_mode does not need a lazy_load restriction (though it does still need its GPU-only restriction). However, suspicion remains as to why it didn't work that way before, and seemingly does work now, and whether running that way is truly guaranteed to be correct.
For now, the bug is fixed in 2e129b0 and plaid_mode implies lazy_load as it was originally intended to do. This may be updated to allow disabling lazy_load later, if we have confidence that it works.
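The shape of the fix can be sketched as a hypothetical bit of constructor logic (resolve_modes is an invented name for illustration; it is not Tensorizer's actual source):

```python
def resolve_modes(plaid_mode=False, lazy_load=False):
    """Toy sketch: after the fix, plaid_mode forces lazy_load on."""
    if plaid_mode:
        lazy_load = True  # plaid_mode implies lazy_load
    return plaid_mode, lazy_load


assert resolve_modes(plaid_mode=True, lazy_load=False) == (True, True)
assert resolve_modes(plaid_mode=False, lazy_load=False) == (False, False)
```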