Add memory and storage awareness #289

Merged: jpsamaroo merged 13 commits into master from jps/storage on Jul 23, 2022

Conversation

@jpsamaroo (Member) commented on Oct 15, 2021

This PR adds awareness of memory and storage (disk, etc.) to Dagger, introducing a new "storage subsystem" similar to the existing "processor subsystem". The intention is that, by modeling storage resources explicitly (detecting their real-time capacities and free space, and providing methods to move data to and from storage), we can teach the scheduler to swap data to disk when memory is full, or to perform any other kind of capacity-protecting data movement or scheduling.

We will additionally begin tracking GC allocations at runtime, and use estimates of such allocations to limit scheduling when the scheduler knows that memory would otherwise become exhausted. This should make it easier to execute code over "big data", even when such data is too large for a single worker, or even all workers, to keep in memory at one time. This model should also be extensible to GPUs (which have their own memory space), so that GPU OOMs can be avoided.
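
To make the allocation-tracking idea concrete, here is a minimal sketch in plain Julia (not Dagger's actual implementation) of how per-signature allocation estimates could be gathered: measure each call's GC allocations with `@timed` and keep a running mean keyed by the function and argument types, skipping the first run so compilation allocations don't skew the estimate.

```julia
# Sketch only (not Dagger's implementation): per-signature allocation estimates.
# Maps (function type, argument types...) => (mean allocated bytes, sample count).
const ALLOC_ESTIMATES = Dict{Tuple,Tuple{Float64,Int}}()

function record_allocs!(f, args...)
    sig = (typeof(f), map(typeof, args)...)
    stats = @timed f(args...)   # stats.bytes = bytes allocated by this call
    if !haskey(ALLOC_ESTIMATES, sig)
        # The first run of a signature is dominated by compilation allocations,
        # so register the signature but ignore the measurement.
        ALLOC_ESTIMATES[sig] = (0.0, 0)
    else
        mean, n = ALLOC_ESTIMATES[sig]
        ALLOC_ESTIMATES[sig] = (mean + (stats.bytes - mean) / (n + 1), n + 1)
    end
    return stats.value
end
```

After a couple of runs, `ALLOC_ESTIMATES[(typeof(sum), Vector{Float64})]` would hold an estimate that a scheduler could compare against a worker's free memory before launching another task.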

Todo:

  • Estimate runtime memory allocation per-signature
  • Reintroduce capacity awareness to Sch (memory)
  • Estimate current storage resource utilization via returned thunk metadata
  • Implement a per-thunk option for specifying thunk memory utilization (a usage sketch follows this list)
  • Add dispatch-based mechanism for specifying thunk options
  • Track and ignore first run of signature in alloc calculation
  • Query the allocator before moving data or executing thunks to ensure that the ensuing allocations won't exceed the memory allocation limit (needs an API in MemPool, plus local tracking of thunks actively fetching/executing); thunks will be paused until there is space available for their data and estimated local allocations
  • Provide storage thunk option to indicate which MemPool StorageDevice to use (defaults to the global device)
  • DTable: Add tests for computations using more than physical RAM
  • Update docs for new automatic disk caching and memory awareness behavior
  • Add docs on Storage system design
  • (Optional) Estimate StorageDevice transfer times per-byte and compression amount (per-type?), and teach Sch to compute StorageDevice transfer costs in scheduling
  • (Optional) Add compressed RAM support via TranscodingStreams.jl (separate subpackage, DaggerZRAM.jl?)
  • (Optional) Implement a user-programmable interface for detecting thunk temporary memory allocations at runtime, and store this estimate per-signature
  • (Optional) Add memory wait costs to estimate_task_costs
  • (Optional) Re-enable capacity monitoring based on memory availability (need to implement a threshold for how many over-capacity thunks can be scheduled per-worker before pausing scheduling)
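
As a rough illustration of the per-thunk options above: the option names `alloc_util` and `storage` come from this PR's commits, but the surface syntax and value types shown here (e.g. whether utilization is keyed per processor type, mirroring the old `procutil` option) are assumptions, not the final API.

```julia
using Dagger, MemPool

# Hypothetical usage only: hint the scheduler about a thunk's expected
# allocations and choose which MemPool device should hold its result.
a = rand(10^6)
t = Dagger.@spawn alloc_util=Dict(Dagger.ThreadProc=>64*1024^2) storage=MemPool.GLOBAL_DEVICE[] sum(a)
fetch(t)
```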

@krynju (Member) commented on Oct 16, 2021

Hey, two general ideas:

  • A toggle for caching to disk, which I guess should be off by default
  • Explicitly caching chunks / suggesting a cache? In DTable, a lot of chunks can be marked as "free to cache", because it's known they are not needed in the next stages of processing

@jpsamaroo (Member, Author) commented on Jan 26, 2022

I ended up deciding to implement this logic in MemPool, since it's the most reasonable place to do this, and it has the greatest control over memory management: JuliaData/MemPool.jl#60. With that PR posted and basically ready to go, I'm slightly changing what we'll be implementing in Dagger:

  • (Optional) Provide allocator thunk option to indicate which MemPool StorageDevice to use (defaults to the global device)
  • Detect storage resource capacity for all allocator sub-devices upon first use
  • Implement a user-programmable interface for detecting thunk temporary memory allocations at runtime, and store this estimate per-signature
  • Query the allocator before moving data or executing thunks to ensure that the ensuing allocations won't exceed the memory allocation limit (needs an API in MemPool, plus local tracking of thunks actively fetching/executing); thunks will be paused until there is space available for their data and estimated local allocations
  • (Optional) Estimate storage device utilization via returned thunk metadata
  • (Optional) Add memory wait costs to estimate_task_costs
  • (Optional) Re-enable capacity monitoring based on memory availability (need to implement a threshold for how many over-capacity thunks can be scheduled per-worker before pausing scheduling)

The non-optional items in this list are the basics necessary to let Dagger handle "big data" problems; the MemPool PR also gives us swap-to-disk automatically, so we don't need to worry about that here. The optional items improve scheduling decisions, which is helpful but not strictly necessary (and will be partially obviated by future work-stealing).

Once I have a working alternative, I'll likely close this PR. PR updated!

@jpsamaroo (Member, Author) commented

Replying to @krynju:

> A toggle for caching to disk, which I guess should be off by default

By default we'll follow whatever MemPool.GLOBAL_DEVICE is set to, which defaults to memory-only.
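
For illustration, opting into disk-backed storage might look roughly like the sketch below once the MemPool PR lands. `GLOBAL_DEVICE` is the ref mentioned above; the specific allocator/device constructors and their arguments are assumptions based on JuliaData/MemPool.jl#60, not the confirmed API.

```julia
using MemPool

# Inspect the process-wide default storage device (memory-only by default).
MemPool.GLOBAL_DEVICE[]

# Illustrative only: swap in a device that spills least-recently-used chunks
# to disk once an in-memory limit is reached.
mem_limit  = 8  * 1024^3   # keep up to 8 GiB of chunk data in RAM
disk_limit = 32 * 1024^3   # allow up to 32 GiB to spill to disk
MemPool.GLOBAL_DEVICE[] = MemPool.SimpleRecencyAllocator(
    mem_limit,
    MemPool.SerializationFileDevice("/tmp/dagger-storage"),
    disk_limit,
    :LRU,   # evict least-recently-used chunks to disk first
)
```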

> Explicitly caching chunks / suggesting a cache? In DTable, a lot of chunks can be marked as "free to cache", because it's known they are not needed in the next stages of processing

This would be a decision for the MemPool allocator to make. I think I'd like to see how far we can get with basic allocation strategies (maybe MRU or similar), before we consider passing such information directly to the allocator. I'd prefer not to end up with an API like Linux's madvise, so I'll take some time to think on it.

Commits

  • Track worker storage resources and devices
  • Track thunk return value allocations
  • Expand procutil option to time_util and alloc_util
  • Add storage option for specifying MemPool storage device
  • Format bytes in debug logs
  • Add locking around CHUNK_CACHE
  • Move return value Chunks to MemPool device
  • Chunk: Update tochunk docstring
  • Walk data to determine serialization safety
  • Drop Julia 1.6 support
  • Split suites out into individual files
  • Provide usage info when run without BENCHMARK env var
  • Add option to save logs to output file
  • Add DTable CSV/Arrow reading suite

@jpsamaroo marked this pull request as ready for review on July 23, 2022, 18:23
@jpsamaroo merged commit 17e5b2e into master on Jul 23, 2022
@jpsamaroo deleted the jps/storage branch on July 23, 2022, 18:31