changed basic hf server to support quantization and streaming #2293
Merged
Changes from all commits (70 commits):
6afd4fa Starting multiple workers in inference image when multiple GPUs avail… (yk)
a0360fb added some echoes (yk)
8bd03fc added some echoes (yk)
1f2ebdc more entrypoint stuff (yk)
bf1b77d master port (yk)
7419753 adding sleep (yk)
ef4a4e0 more configs (yk)
9cb9075 Use LLaMA impl of Huggingface Transformers (#2263) (andreaskoepf)
cf161e9 Fix GPTNeoX-20B training (#2240) (dvruette)
4359c6a Updated Turkish language (#2270) (irfantogluk)
54892e2 Add loader for CodeAlpaca-20k & gpt4all_pruned dataset (#2273) (andreaskoepf)
01f94f7 Add support for Cerebras-GPT for training (#2276) (olliestanley)
3c1335e typo in parsing openai/summarize_from_feedback (#2268) (mikegarts)
7e05077 Add rng_seed parameter to trainers (#2254) (andreaskoepf)
1b72c07 Computing message queue positions (#2235) (yk)
e88efb7 Remove assigning eos token id (llama compatibility) (#2280) (andreaskoepf)
a167e10 Fix call-to-action responsiveness (#2290) (theopfr)
696889c Added max size to work queue and an error response if full when enque… (yk)
7d47021 Fix loading of Nebulous/gpt4all_pruned dataset (#2291) (andreaskoepf)
e6ad876 changed basic hf server to support quantization and streaming (yk)
9924bab updated main script (yk)
dcf7d43 ctrl c trap (yk)
ad12aa7 replacing llama config (yk)
90297b2 sleep param (yk)
f7ff758 removed dtypes (yk)
6e4e0b8 loading in thread (yk)
2b7e048 removed signal (yk)
417d8ff removed double start (yk)
18de6b7 bugfix (yk)
a6e7550 bugfix (yk)
8dd1866 exception handling in stream (yk)
b915ac4 bug handling (yk)
1ea2323 bug handling (yk)
710833d bugfix (yk)
57a7eec bugfix (yk)
b9cbe0c bugfix (yk)
b3a0599 bugfix (yk)
1ba8929 bugfix (yk)
7d0480a bugfix (yk)
c649c92 bugfix (yk)
ecf60b9 logging (yk)
743e409 bugfix (yk)
b66fd96 bugfix (yk)
3e0ea1f bugfix (yk)
7610f21 bugfix (yk)
0835395 bugfix (yk)
fd926b5 bugfix (yk)
e16a990 logging (yk)
9d35bfa logging (yk)
d0e4412 vocab size fix (yk)
5a44c93 vocab size fix (yk)
86ea260 vocab size fix (yk)
8139699 vocab size fix (yk)
b27b6c4 vocab size fix (yk)
072d1d7 vocab size fix (yk)
1b9d0c3 vocab size fix (yk)
9caf954 vocab size fix (yk)
e733abd vocab size fix (yk)
ed38c87 decode hack (yk)
17e43bd more fixes (yk)
6b5788e added back token hack (yk)
9701b0d removed logging (yk)
2ad7820 feedback (yk)
f517214 decode fix (yk)
000f12c warmup change (yk)
9047652 torch threads (yk)
82506ea model loading fix (yk)
d27e712 delaying tokens by 1 (yk)
54e9994 Merge branch 'main' into hf-worker-server-bnb (yk)
3195604 feedback (yk)
New file (@@ -0,0 +1,21 @@):

```python
import os
import signal
import sys
from pathlib import Path

import huggingface_hub


def terminate(signum, frame):
    print("Terminating...")
    sys.exit(0)


if __name__ == "__main__":
    signal.signal(signal.SIGINT, terminate)
    model_id = os.getenv("MODEL_ID")
    snapshot_dir = Path(huggingface_hub.snapshot_download(model_id))
    for file in snapshot_dir.rglob("*.json"):
        text = file.read_text()
        text = text.replace("LLaMA", "Llama")
        file.write_text(text)
```
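The script above rewrites the casing "LLaMA" to "Llama" in every JSON file of a downloaded snapshot, matching the class naming used by the Transformers LLaMA implementation this PR switches to. The rename step can be exercised in isolation on a throwaway config file; the config contents below are illustrative, not taken from any real model:

```python
import json
import tempfile
from pathlib import Path

# Create a throwaway directory with one old-style config (illustration only).
tmp = Path(tempfile.mkdtemp())
cfg = tmp / "config.json"
cfg.write_text(json.dumps({"architectures": ["LLaMAForCausalLM"]}))

# Same in-place patch as the snapshot script.
for file in tmp.rglob("*.json"):
    file.write_text(file.read_text().replace("LLaMA", "Llama"))

print(cfg.read_text())  # architectures now reads "LlamaForCausalLM"
```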
New file (@@ -0,0 +1,36 @@):

```python
import typing

import transformers
from loguru import logger


class Printer(typing.Protocol):
    def __call__(self, value: int) -> None:
        ...


def _unpack(value):
    if len(value.shape) > 1 and value.shape[0] > 1:
        raise ValueError("HFStreamer only supports batch size 1")
    elif len(value.shape) > 1:
        value = value[0]
    return value.cpu().tolist()


# based on HF text streamer
class HFStreamer(transformers.generation.streamers.BaseStreamer):
    def __init__(self, input_ids, printer: Printer):
        self.input_ids = _unpack(input_ids)[::-1]
        self.printer = printer

    def put(self, value):
        for token_id in _unpack(value):
            if self.input_ids:
                input_id = self.input_ids.pop()
                if input_id != token_id:
                    logger.warning(f"Input id {input_id} does not match output id {token_id}")
            else:
                self.printer(token_id)

    def end(self):
        pass
```
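The streamer works by storing the prompt's token ids (reversed so `pop()` yields them in order) and silently consuming them as `generate` echoes the prompt back; only ids beyond the prompt reach the `printer` callback. That skip-then-emit logic can be demonstrated without torch or transformers by using a list-backed stand-in for a 1-D tensor — everything here is an illustrative sketch, not code from the PR:

```python
class FakeTensor:
    """Minimal stand-in for a 1-D torch tensor (illustration only)."""
    def __init__(self, data):
        self.data = list(data)
        self.shape = (len(self.data),)

    def cpu(self):
        return self

    def tolist(self):
        return self.data


def unpack(value):
    # Mirrors _unpack for the 1-D case: batches larger than 1 are rejected.
    if len(value.shape) > 1 and value.shape[0] > 1:
        raise ValueError("only batch size 1 is supported")
    return value.cpu().tolist()


class Streamer:
    """Same put() logic as HFStreamer: drop prompt ids, emit new ids."""
    def __init__(self, input_ids, printer):
        self.input_ids = unpack(input_ids)[::-1]  # reversed so pop() yields in order
        self.printer = printer

    def put(self, value):
        for token_id in unpack(value):
            if self.input_ids:
                self.input_ids.pop()  # still consuming the echoed prompt
            else:
                self.printer(token_id)


emitted = []
s = Streamer(FakeTensor([1, 2, 3]), emitted.append)
s.put(FakeTensor([1, 2, 3]))  # prompt echo: swallowed
s.put(FakeTensor([4]))        # newly generated token: emitted
s.put(FakeTensor([5]))
print(emitted)  # → [4, 5]
```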
Updated requirements (@@ -1,3 +1,8 @@):

```
accelerate
bitsandbytes
fastapi
huggingface_hub
sse-starlette
torch
git+https://github.com/huggingface/transformers@main#egg=transformers
uvicorn
```
Review comment:
Should we have a more descriptive setting name here so people don't expect it to have an effect when not using the basic server?
Reply:
This needs to be called `quantize` because the hf-inference server also expects it under that name.
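A boolean `quantize` setting of this kind typically just toggles the bitsandbytes int8 path when the model is loaded. As a hedged sketch (the function and kwarg mapping below are assumptions for illustration, not the PR's actual code — though `load_in_8bit` and `device_map` are real `from_pretrained` options in transformers with bitsandbytes installed):

```python
def model_load_kwargs(quantize: bool) -> dict:
    """Map a hypothetical `quantize` flag onto from_pretrained kwargs.

    Whether the PR uses exactly this mapping is an assumption.
    """
    kwargs = {"torch_dtype": "auto"}
    if quantize:
        kwargs["load_in_8bit"] = True  # bitsandbytes int8 weights
        kwargs["device_map"] = "auto"  # place quantized layers on available devices
    return kwargs


print(model_load_kwargs(quantize=True))
```

The resulting dict would then be splatted into `AutoModelForCausalLM.from_pretrained(model_id, **kwargs)`.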