Use bnb==0.40.0.post4 to fix bias bug, use bfloat16 by default #341
Conversation
src/petals/server/server.py (outdated)

    if self.block_config.model_type == "llama" and torch_dtype == torch.bfloat16 and quant_type != QuantType.NF4:
        logger.warning(
            "LLaMA is loaded in bfloat16 for compatibility with --quant_type nf4 servers (default). "
            "If you use a private swarm without such servers, use --torch_dtype float16 to force the original float16 dtype"
        )
Suggested change:

    -            "If you use a private swarm without such servers, use --torch_dtype float16 to force the original float16 dtype"
    +            "If you want to run in float16, use --torch_dtype float16 to force the original float16 dtype"
src/petals/server/server.py (outdated)

    @@ -173,6 +173,12 @@ def __init__(
         self.quant_type = quant_type
         logger.info(f"Model weights are loaded in {get_dtype_name(torch_dtype, quant_type)} format")

    +    if self.block_config.model_type == "llama" and torch_dtype == torch.bfloat16 and quant_type != QuantType.NF4:
    +        logger.warning(
    +            "LLaMA is loaded in bfloat16 for compatibility with --quant_type nf4 servers (default). "
Suggested change:

    -            "LLaMA is loaded in bfloat16 for compatibility with --quant_type nf4 servers (default). "
    +            "LLaMA is loaded in bfloat16 for compatibility with Guanaco nf4 setup (default). "
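The dtype logic under review can be sketched in isolation. This is a hypothetical, simplified model of the PR's behavior, not the actual Petals code: `resolve_dtype` and its string-based arguments are made up for illustration, and only the decision described in the diff (default to bfloat16, warn for LLaMA when the quant type is not nf4) is reproduced.

```python
from typing import Optional, Tuple


def resolve_dtype(model_type: str, requested: Optional[str], quant_type: str) -> Tuple[str, bool]:
    """Return (effective_dtype, should_warn).

    Hypothetical sketch of the PR's logic: bfloat16 becomes the default
    dtype, and a warning fires when a LLaMA model ends up in bfloat16
    on a server whose quant type is not nf4 (the case the diff guards).
    """
    # bfloat16 is the new default when the user passes no explicit dtype
    dtype = requested or "bfloat16"
    # mirrors: model_type == "llama" and torch_dtype == torch.bfloat16
    #          and quant_type != QuantType.NF4
    warn = model_type == "llama" and dtype == "bfloat16" and quant_type != "nf4"
    return dtype, warn
```

Passing `--torch_dtype float16` explicitly (the workaround the warning suggests) both keeps the original dtype and suppresses the warning in this sketch.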
LGTM as long as we warn everyone
Force-pushed from 5d434a1 to f53c581
Transition to bfloat16 has been delayed due to bnb performance issues.
This PR: