[CODE] miscellaneous small issues for later #11

Closed
5 tasks done
justheuristic opened this issue Jun 19, 2022 · 0 comments
justheuristic commented Jun 19, 2022

Things that can be done to improve the code, but were left out to launch the MVP faster:

  • server-side: connection_handler, backend, runtime
    • modify task pool to deal with cache handles as pure-python integers? (currently they are converted to tensors)
    • when running inference over multiple layers on the same server, avoid passing layer activations between CPU and GPU by storing them in MemoryCache
      - moved to Miscellaneous server-side improvements #68
    • optimize disk space. Right now, a server will eventually download all bloom blocks and store them in the HF cache. Check for free disk space in advance and/or figure out a cache eviction policy (see the disk-space sketch after this list).
  • server-side: MemoryCache
    • in allocate_cache, if there is not enough memory, wait for memory to be freed by existing tasks, up to a given timeout
      - note: this can be done using mp.Condition (see the sketch after this list)
    • allocate cache as one contiguous buffer to avoid fragmentation
      - note: this feature is active as of #779959bc; we will eventually switch back to the non-cached version. Rationale: we did not observe significant issues from fragmentation, but contiguous buffers did complicate the code
    • quantize cached values using bitsandbytes
      - wontfix (as of 2022.01.02): our current code relies on transformers' default bloom implementation, so we can't intervene in attention internals
    • LRU-offload cache from GPU to RAM?
      - moved to Miscellaneous server-side improvements #68
  • client-side: internals
    • make begin_inference_session into a contextmanager (see the context-manager sketch after this list)
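
A minimal sketch of the disk-space pre-flight check mentioned above; the function name and the idea of checking before each block download are assumptions for illustration, not existing code:

```python
import shutil

def ensure_free_disk_space(cache_dir: str, required_bytes: int) -> None:
    """Hypothetical pre-flight check: refuse to download another bloom block
    if the HF cache directory does not have enough free space left."""
    free_bytes = shutil.disk_usage(cache_dir).free
    if free_bytes < required_bytes:
        raise RuntimeError(
            f"Not enough disk space in {cache_dir}: "
            f"need {required_bytes} bytes, only {free_bytes} available"
        )
```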
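A rough sketch of the mp.Condition idea for allocate_cache; the class layout, attribute names (`_free_bytes`, `_memory_freed`), and the integer "handle" are hypothetical and only illustrate the wait-with-timeout pattern:

```python
import multiprocessing as mp
import time

class MemoryCacheSketch:
    """Hypothetical stand-in for MemoryCache, showing only the wait-for-memory logic."""

    def __init__(self, max_bytes: int):
        self._free_bytes = mp.Value("q", max_bytes)  # bytes currently available
        self._memory_freed = mp.Condition()          # notified whenever memory is released

    def allocate_cache(self, num_bytes: int, timeout: float = 10.0) -> int:
        deadline = time.monotonic() + timeout
        with self._memory_freed:
            # wait until existing tasks free enough memory, or give up after `timeout`
            while self._free_bytes.value < num_bytes:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise TimeoutError("not enough cache memory was freed in time")
                self._memory_freed.wait(remaining)
            self._free_bytes.value -= num_bytes
        return num_bytes  # a real implementation would return an actual cache handle

    def free_cache(self, num_bytes: int) -> None:
        with self._memory_freed:
            self._free_bytes.value += num_bytes
            self._memory_freed.notify_all()  # wake up allocators blocked in allocate_cache
```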
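A small sketch of wrapping begin_inference_session in a context manager; `session.close()` and the `step()` call in the usage note are assumed method names for illustration:

```python
from contextlib import contextmanager

@contextmanager
def inference_session(model):
    """Hypothetical wrapper: open an inference session and guarantee it is closed."""
    session = model.begin_inference_session()
    try:
        yield session
    finally:
        session.close()  # assumed cleanup method; runs even if the caller raises

# usage sketch:
# with inference_session(model) as session:
#     outputs = session.step(input_embeddings)
```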