[CODE] port HF bloom to hivemind #2

Closed
justheuristic opened this issue May 31, 2022 · 5 comments

justheuristic commented May 31, 2022

Why: so we can play with it in inference mode

Original bloom code: huggingface/transformers#17474

The quest is to:

  • implement the bloom transformer layer as a hivemind expert,
  • prepare a huggingface model that only has the bloom embeddings and logits, but runs all transformer layers via hivemind.RemoteExpert (see the sketch below).
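Roughly, the client-side model could look like this. A hedged sketch only: DistributedBloom, the uid/endpoint arguments, and the exact RemoteExpert constructor are illustrative and version-dependent, not a final API.

```python
import torch
import torch.nn as nn
import hivemind

class DistributedBloom(nn.Module):
    """Hypothetical client model: local embeddings/logits, remote transformer layers."""

    def __init__(self, config, expert_uids, endpoint):
        super().__init__()
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size)  # runs locally
        # each transformer layer lives on some server and is called over the network
        self.remote_layers = [hivemind.RemoteExpert(uid, endpoint) for uid in expert_uids]
        self.ln_f = nn.LayerNorm(config.hidden_size)  # final layernorm, also local

    def forward(self, input_ids):
        hidden = self.word_embeddings(input_ids)
        for layer in self.remote_layers:
            hidden = layer(hidden)  # one network round-trip per layer
        hidden = self.ln_f(hidden)
        # logits via tied embeddings, computed on the client
        return hidden @ self.word_embeddings.weight.T
```
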
justheuristic self-assigned this Jun 2, 2022

justheuristic commented Jun 2, 2022

On client-side computations

Since we plan to run embeddings/logits on the client side, we need to compute them efficiently.
Embedding computation is cheap AF, but the logits are more complicated.

Computing the final logits on a Colab CPU takes over a minute per token:
[screenshot: per-token CPU timing]

Solution 1: use fast KNN

  • the ~100 most likely tokens hold over 99% of the probability mass,
  • use HNSW to find the top-100 tokens with the highest dot product against the final hidden state,
  • FAISS or ScaNN for fast nearest neighbor search (see the sketch below).
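A minimal sketch of solution 1, assuming tied output embeddings of shape (vocab_size, hidden_size). The FAISS index types are real; the dimensions and variable names are illustrative.

```python
import numpy as np
import faiss  # ScaNN would work similarly

hidden_size, vocab_size = 14336, 250880  # BLOOM-176B dimensions
emb = np.random.randn(vocab_size, hidden_size).astype(np.float32)  # stand-in for output embeddings

# HNSW over the output embeddings; inner product with the hidden state = (unnormalized) logit
index = faiss.IndexHNSWFlat(hidden_size, 32, faiss.METRIC_INNER_PRODUCT)
index.add(emb)

def approx_top_tokens(hidden_state: np.ndarray, k: int = 100):
    """Approximate top-k token ids and their logits for one final hidden state."""
    logits, token_ids = index.search(hidden_state[None, :].astype(np.float32), k)
    return token_ids[0], logits[0]
```

The search is approximate, so the top-100 set can occasionally miss a token; for greedy decoding or top-k sampling that error should be tolerable.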

Solution 2: just use GPU (the naive matmul, sketched below)

  • Colab T4 in fp16: 30 ms per token (no longer a bottleneck)
  • kudesnik M40 (~2x a Colab K80): 67 ms, still very much acceptable
    [screenshot: per-token GPU timing]
  • GPUs might not always be available
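For reference, solution 2 is just the matmul that the timings above measure, in fp16 on GPU (dimensions illustrative):

```python
import torch

emb = torch.randn(250880, 14336, dtype=torch.float16, device="cuda")  # output embeddings
hidden = torch.randn(1, 14336, dtype=torch.float16, device="cuda")    # final hidden state
logits = hidden @ emb.T  # ~30 ms/token on a Colab T4 per the measurement above
```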

Current opinion: use GPUs, think about fast CPU mode later.

@justheuristic justheuristic transferred this issue from another repository Jun 12, 2022
justheuristic commented:

Copied the bloom version huggingface/transformers@ca2a55e from huggingface.

Their attention code is spectacularly bad, see #1


justheuristic commented Jun 12, 2022

Next steps:

  • push individual bloom layers to the huggingface hub
  • implement BloomBlock as a hivemind.moe.server.ExpertBackend (see the sketch below)
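A rough sketch of the second bullet. The exact ExpertBackend signature differs across hivemind versions, BloomBlock/config come from the copied HF code, and the schema shapes and expert name are placeholders.

```python
import torch
from hivemind.moe.server import ExpertBackend
from hivemind.utils import BatchTensorDescriptor

block = BloomBlock(config)  # one transformer layer from the copied HF code

backend = ExpertBackend(
    name="bloom.block0",  # hypothetical expert uid
    expert=block,
    optimizer=torch.optim.SGD(block.parameters(), lr=0.0),  # placeholder: inference-only
    args_schema=(BatchTensorDescriptor(config.hidden_size),),  # illustrative shapes
    outputs_schema=BatchTensorDescriptor(config.hidden_size),
    max_batch_size=16,
)
```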

justheuristic commented:

As of 8221469:

  • Pushing a model to the hub is handled via python -m cli.convert_model --many_args_here, see the README for a usage example
  • The server can run forward, backward and inference of bloom blocks, see the README for instructions on how to start a server (and the smoke-test sketch below)
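A hedged smoke test of that second point, assuming a server from the README is already running. The uid/endpoint are placeholders, and the RemoteExpert constructor differs between hivemind versions.

```python
import torch
import hivemind

block = hivemind.RemoteExpert("bloom.block0", "127.0.0.1:1337")  # hypothetical uid/endpoint
hidden = torch.randn(1, 8, 4096, requires_grad=True)  # (batch, seq_len, hidden_size), illustrative
out = block(hidden)       # forward pass runs on the server
out.sum().backward()      # backward is also executed server-side
print(hidden.grad.shape)  # gradients arrive back on the client
```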
