
Eager vs. Lazy Loading in Plaid Mode #30

Closed
Eta0 opened this issue Jun 7, 2023 · 0 comments · Fixed by #51
Eta0 commented Jun 7, 2023

Plaid Mode Eagerness

TensorDeserializer's plaid_mode parameter was supposed to imply lazy_load, but it didn't until 2e129b0. This has strange implications, because this mode was not expected to work without lazy loading enabled.

Lazy vs. Eager Loading

Tensorizer supports two deserialization modes, eager and lazy, determined by the lazy_load parameter to the TensorDeserializer() constructor.

Eager Loading Mode

In eager mode, all tensors are loaded from disk or the network to their destination device up-front, during the TensorDeserializer() constructor call. Every data access after the deserializer's instantiation reaches in and takes a cached tensor from the internal TensorDeserializer._cache instance variable.
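To illustrate the caching behaviour described above, here is a minimal, hypothetical stand-in (not the real TensorDeserializer API) that loads every entry during construction and serves all later lookups from its cache:

```python
# Hypothetical, simplified model of eager loading: every entry is
# read during __init__, and later lookups only touch the cache.
class EagerDeserializer:
    def __init__(self, entries):
        # Simulates loading all tensors up-front into _cache.
        self._cache = {key: load() for key, load in entries.items()}

    def __getitem__(self, key):
        # Accesses after construction never re-read the source.
        return self._cache[key]

d = EagerDeserializer({"weight": lambda: [1.0, 2.0]})
print(d["weight"])  # [1.0, 2.0]
```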

Lazy Loading Mode

In lazy loading mode, no tensors are loaded during the constructor; each is instead loaded on demand when you access deserializer[<key>] or iterate through the deserializer's entries (either in a loop or implicitly through deserializer.load_into_module()).
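A minimal sketch of this on-demand pattern, again using a hypothetical model rather than the real API, with a load counter added purely for illustration:

```python
# Hypothetical sketch of lazy loading: nothing is read at construction;
# each entry is loaded on first access and then served from the cache.
class LazyDeserializer:
    def __init__(self, entries):
        self._loaders = dict(entries)
        self._cache = {}
        self.loads = 0  # counts actual loads, for illustration only

    def __getitem__(self, key):
        if key not in self._cache:
            self._cache[key] = self._loaders[key]()
            self.loads += 1
        return self._cache[key]

d = LazyDeserializer({"a": lambda: [1.0], "b": lambda: [2.0]})
assert d.loads == 0  # the constructor loaded nothing
d["a"]
d["a"]
assert d.loads == 1  # the second access was a cache hit
```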

Normal lazy loading mode retains all loaded tensors in the deserializer's cache, but it has been assumed that plaid_mode could not do this while maintaining correctness.

Plaid Mode

Whereas the standard loading mode pre-allocates enough RAM to hold all of the tensors expected to be loaded simultaneously, the plaid_mode optimization shares a single CPU memory region, only as large as the single largest tensor, among all tensors being loaded, overwriting each previous tensor's data when loading the next. plaid_mode is only legal for loading GPU tensors.
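The shared-buffer scheme can be modelled in a few lines. This is a hypothetical sketch (plaid_load and its byte-string "tensors" are inventions for illustration), showing a single staging region sized to the largest tensor being reused for every load:

```python
# Hypothetical model of the plaid_mode buffer: one CPU staging region
# sized to the largest tensor, reused (and overwritten) for every load.
def plaid_load(serialized_tensors):
    largest = max(len(t) for t in serialized_tensors.values())
    buffer = bytearray(largest)  # the single shared CPU region
    for name, data in serialized_tensors.items():
        buffer[: len(data)] = data  # overwrites the previous tensor's bytes
        # ...here the real deserializer would copy buffer -> GPU...
        yield name, bytes(buffer[: len(data)])

tensors = {"a": b"\x01\x02\x03\x04", "b": b"\x05\x06"}
results = dict(plaid_load(tensors))
```

Note that loading "b" overwrites the first two bytes of "a" in the staging buffer, which is exactly why anything still referencing that buffer would see corrupted data.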

It had been our understanding that it was not valid for multiple tensors loaded this way to exist simultaneously, and that attempting it would corrupt the internal cache. The rationale is that when plaid_mode=True, entries loaded later will overwrite the tensor data in the CPU buffer associated with still-in-use tensors, so the contents of older cache entries may be invalidated. plaid_mode (originally called oneshot) was therefore intended as an optional flag to enable sharing a buffer for such loads, with the restriction that tensors could only be streamed, i.e., read and used before continuing, and that you could not go backwards again to access older tensors.

This is challenged by the existence of this bug, as the actual behaviour of specifying plaid_mode=True along with lazy_load=False has been to load all tensors up-front during the constructor: first from disk or the network into the shared CPU buffer, then onto the GPU, whereupon each is stored in the cache and made available for later cached accesses. All of these GPU tensors appear to work fine; none conflicts with the others' existence.

The likely reason is that each finished GPU tensor is no longer associated with the shared CPU buffer once it has been offloaded to the GPU (which happens immediately after loading, before continuing to the next tensor), so the tensors cannot actually interfere with each other, and everything is fine.
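That decoupling can be demonstrated without any GPU at all. In this hypothetical illustration, copying the staged bytes out of the shared buffer stands in for the CPU-to-GPU transfer; once copied, later overwrites of the staging region cannot touch the earlier tensor:

```python
# Hypothetical illustration: an independent copy made from the shared
# staging buffer (standing in for the CPU -> GPU transfer) is unaffected
# by later loads that overwrite the staging region.
buffer = bytearray(4)                # shared staging region
buffer[:] = b"\x01\x02\x03\x04"      # load the first tensor
first = bytes(buffer)                # "offload": an independent copy
buffer[:] = b"\x09\x09\x09\x09"      # loading the next tensor overwrites staging
assert first == b"\x01\x02\x03\x04"  # the earlier copy is unaffected
```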

This behaviour makes sense, but we aren't sure whether to trust it, because the aforementioned corruption of tensors was reportedly observed at some point during development, and we don't have a solid reason why it should be working fine like this now; it just is.

What went wrong with all the code ensuring deletion of older tensors?

The intended caching situation for a lazy-loading plaid_mode was as follows, keeping one item in the cache at a time:

deserializer = TensorDeserializer(..., plaid_mode=True)
# No tensors are loaded yet
_ = deserializer[<first key>]
# Now the <first key> tensor is loaded and temporarily cached
_ = deserializer[<first key>]
# The <first key> tensor is simply pulled from the cache
_ = deserializer[<second key>]
# The <first key> tensor is cleared from the cache, and then the <second key> tensor is loaded and then cached
_ = deserializer[<first key>]
# This raises an error, because the <first key> tensor has already been marked as unavailable in the cache

(Note that there is a distinction between keys that have never been loaded and ones that have already been loaded and then evicted.)
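The intended semantics above can be sketched as a small cache class. This is a hypothetical model (PlaidCache is an invention for illustration, not part of tensorizer), keeping at most one live entry and tracking evicted keys separately from never-loaded ones:

```python
# Hypothetical sketch of the intended lazy plaid_mode cache: at most one
# entry is live at a time, and re-requesting an evicted key is an error.
class PlaidCache:
    def __init__(self, loaders):
        self._loaders = dict(loaders)
        self._cache = {}       # holds at most one live entry
        self._evicted = set()  # keys loaded earlier, now invalidated

    def __getitem__(self, key):
        if key in self._cache:
            return self._cache[key]       # repeated access is fine
        if key in self._evicted:
            # Distinct from a KeyError for a never-loaded key.
            raise RuntimeError(f"{key!r} was already overwritten")
        self._evicted.update(self._cache)  # evict the previous entry
        self._cache = {key: self._loaders[key]()}
        return self._cache[key]

c = PlaidCache({"a": lambda: 1, "b": lambda: 2})
c["a"]
c["a"]  # cached access works
c["b"]  # loading "b" evicts "a"
try:
    c["a"]  # going backwards raises
except RuntimeError as e:
    print(e)
```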

However, a bug caused the cache to be directly pre-populated during the constructor even in plaid_mode unless lazy_load=True was specified, skipping the __getitem__ logic for cache deletions, so it instead did this:

deserializer = TensorDeserializer(..., plaid_mode=True)
# All tensors are loaded
_ = deserializer[<first key>]
# The <first key> tensor is pulled from the cache
_ = deserializer[<first key>]
# The <first key> tensor is pulled from the cache (again)
_ = deserializer[<second key>]
# The <first key> tensor is cleared from the cache, and <second key> is pulled from the cache
_ = deserializer[<first key>]
# This raises an error, because the <first key> tensor has already been marked as unavailable in the cache

In effect, it started with everything cached and then gradually cleared entries away. This meant that the behaviour of "not being able to go back" was enforced correctly, yet only artificially, because the tensors weren't actually being streamed.

What Happens Now?

Since the release of Tensorizer v1.0.0 debuting plaid_mode, we have neither encountered reports of it corrupting loaded tensors, nor have any of the developers ever seen it occur when running the test suite. This seems to suggest that plaid_mode does not need a lazy_load restriction (though it does still need its GPU-only restriction). However, suspicion remains as to why it didn't work that way before, and seemingly does work now, and whether running that way is truly guaranteed to be correct.

For now, the bug is fixed in 2e129b0 and plaid_mode implies lazy_load as it was originally intended to do. This may be updated to allow disabling lazy_load later, if we have confidence that it works.
