Roadmap (tentative) #12

Open
28 of 32 tasks
justheuristic opened this issue Jun 20, 2022 · 4 comments
Comments

justheuristic (Collaborator) commented Jun 20, 2022

Current tasks:

End of December: cover more use cases

End of July/August: make it reliable, test with early adopters

End of June: build a proof-of-concept

  • agree on the user interface (see [DESIGN] user experience #5 (comment))
  • run simple (but correct!) inference with a smaller model (for generation)
  • do simple (but correct!) forward/backward with frozen layers (for prompt tuning; see the sketch after this list)
  • client can dynamically choose which remote servers to use for inference (by: @justheuristic)
  • create basic correctness tests for later
  • check if 8-bit compression is remotely feasible (by: @TimDettmers)
  • it's okay if the code is not super reliable for now
  • it's okay if servers have to be set up manually for now
  • begin investigating: quantized weights, quantized communication, automatic server allocation, "bloom points"
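
To make the "forward/backward with frozen layers" item concrete, here is a minimal plain-PyTorch sketch of the prompt-tuning flow we need to support (no Petals-specific API assumed; the model and sizes are placeholders): the backbone stays frozen and only a small set of prompt embeddings receives gradients.

```python
import torch
import torch.nn as nn

vocab_size, hidden, num_prompts, batch, seq = 100, 32, 4, 2, 8

# Stand-in for the frozen (eventually remote) backbone layers.
backbone = nn.Sequential(
    nn.Embedding(vocab_size, hidden),
    nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
    nn.Linear(hidden, vocab_size),
)
for p in backbone.parameters():
    p.requires_grad_(False)  # frozen: no gradients for backbone weights

# The only trainable parameters: a handful of "soft prompt" vectors.
prompts = nn.Parameter(torch.randn(1, num_prompts, hidden) * 0.02)
optimizer = torch.optim.Adam([prompts], lr=1e-3)

tokens = torch.randint(0, vocab_size, (batch, seq))        # dummy batch
token_embeds = backbone[0](tokens)                         # frozen embedding
inputs = torch.cat([prompts.expand(batch, -1, -1), token_embeds], dim=1)

hidden_states = backbone[1](inputs)                        # frozen transformer layer
logits = backbone[2](hidden_states)[:, num_prompts:]       # drop prompt positions

loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), tokens.reshape(-1))
loss.backward()                                            # gradients reach only `prompts`
optimizer.step()

assert all(p.grad is None for p in backbone.parameters())  # backbone untouched
assert prompts.grad is not None                            # prompts were trained
```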

Important, but not urgent:

justheuristic pinned this issue Jun 20, 2022
justheuristic (Collaborator, Author) commented Aug 18, 2022

[moved inference of prompt-tuned model and priorities from summer to current tasks]

bionicles commented:
Hey, how hard would it be to extend Petals to support training these models, in addition to fine-tuning?

mryab (Member) commented May 23, 2023

Hi @bionicles, Petals is a system designed specifically for inference of large models; however, it shares a lot of the underlying architecture with SWARM Parallelism (see https://github.com/yandex-research/swarm for a WIP implementation, which I hope to update in the coming weeks).

The short answer is "definitely possible", but please keep in mind that pretraining is out of scope for Petals. Hence, it might be more useful to continue the discussion elsewhere (e.g., in the SWARM repo or on our Discord server) if you have specific questions or suggestions.

borzunov unpinned this issue May 23, 2023
borzunov (Collaborator) commented May 23, 2023

Hi @bionicles,

A small addition to @mryab's response: while Petals does not support training from scratch, both Petals and SWARM are based on hivemind, our library for training over the Internet, which can be used for pre-training. Please see Q3 of the FAQ's "General" section for details.
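
For a rough picture of what hivemind-based training looks like, here is a sketch adapted from the hivemind quickstart; the model, `run_id`, and hyperparameters below are placeholders, and the exact `Optimizer` arguments may differ between hivemind versions.

```python
import torch
import hivemind

model = torch.nn.Linear(16, 2)                     # stand-in for a real model

dht = hivemind.DHT(start=True)                     # joins (or bootstraps) a DHT of peers
opt = hivemind.Optimizer(
    dht=dht,
    run_id="demo_run",                             # peers with the same run_id train together
    batch_size_per_step=32,                        # samples this peer contributes per step
    target_batch_size=4096,                        # global batch accumulated across all peers
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
    use_local_updates=True,
    matchmaking_time=3.0,
    averaging_timeout=10.0,
    verbose=True,
)

# The training loop itself is the usual PyTorch pattern;
# hivemind handles averaging with other peers behind the scenes.
for step in range(10):
    x = torch.randn(32, 16)
    y = torch.randint(0, 2, (32,))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
    opt.zero_grad()
```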
