-
Notifications
You must be signed in to change notification settings - Fork 479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch adapters slightly faster #353
Conversation
src/petals/server/backend.py
Outdated
@@ -17,6 +17,7 @@ | |||
from petals.server.memory_cache import MemoryCache | |||
from petals.server.task_pool import PrioritizedTaskPool | |||
from petals.utils.misc import is_dummy | |||
from petals.utils.peft import using_global_adapter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't import it here, this triggers bnb import
src/petals/server/handler.py
Outdated
@@ -142,6 +144,7 @@ async def rpc_inference( | |||
requested_backends = tuple(self.module_backends[uid] for uid in requested_uids) | |||
max_length = metadata.get("max_length") | |||
active_adapter = metadata.get("active_adapter", "") | |||
assert not active_adapter or active_adapter in self.adapters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a error message here, otherwise the client will receive P2PHandlerError("") and won't know what happened
src/petals/utils/peft.py
Outdated
ADAPTER_NOT_SET = "__ADAPTER_NOT_SET" | ||
GLOBAL_ACTIVE_ADAPTER = ADAPTER_NOT_SET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ADAPTER_NOT_SET = "__ADAPTER_NOT_SET" | |
GLOBAL_ACTIVE_ADAPTER = ADAPTER_NOT_SET | |
global_active_adapter = None |
GLOBAL_ACTIVE_ADAPTER
is not a constant, it should not be in caps- Why use a predefined string constant instead of just
None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ok
- None means no adapter, ADAPTER_NOT_SET means you didn't even bother entering the context and you should be chastized for that
Currently, each
TransformerBackend.inference_step
looks for adapters and sets the correct adapter type for each block. This is not very expensive, but it can measurably affect inference time:This pull request uses faster adapter switching with just one variable assignment, without iterating over block.modules().