Support libp2p relays for NAT traversal #186
Vahe1994 commented on Jan 8, 2023 (edited by borzunov)
- Added relay options to servers
- Enabled relay options by default
- Changed hivemind version to 1.1.5
- Moved the reachability check so that it runs after the blocks are loaded
src/petals/cli/run_server.py
@@ -127,7 +127,10 @@ def main():
parser.add_argument("--mean_balance_check_period", type=float, default=60,
help="Check the swarm's balance every N seconds (and rebalance it if necessary)")

parser.add_argument("--auto_relay", action='store_true', help="Enabling relay for NAT traversal")
Suggested change:
- parser.add_argument("--auto_relay", action='store_true', help="Enabling relay for NAT traversal")
+ parser.add_argument("--auto_relay", action='store_true', help="Enable relay for NAT traversal")
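For reference, `action='store_true'` makes a flag default to False and become True only when it is passed on the command line. A minimal sketch using the standard-library `argparse`; only the `--auto_relay` flag comes from the diff, the rest of the real parser is omitted:

```python
import argparse

# Trimmed-down parser with only the flag from the diff; run_server.py defines many more options.
parser = argparse.ArgumentParser()
parser.add_argument("--auto_relay", action="store_true", help="Enable relay for NAT traversal")

# Without the flag, store_true leaves the value at its implicit default of False.
args_default = parser.parse_args([])
# Passing the flag switches it on.
args_enabled = parser.parse_args(["--auto_relay"])

print(args_default.auto_relay, args_enabled.auto_relay)  # False True
```

So with `store_true` the CLI value is off unless requested. If an option should be on by default while still being switchable from the CLI, `argparse.BooleanOptionalAction` (Python 3.9+) is one standard-library option: it also generates a `--no-...` variant.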
@@ -78,6 +78,8 @@ def __init__(
load_in_8bit: Optional[bool] = None,
tensor_parallel_devices: Optional[Sequence[torch.device]] = None,
skip_reachability_check: bool = False,
use_relay: bool = True,
It's best to either remove the default argument values or remove these arguments completely: we might forget to change the defaults here in the future, and the required values will be passed via kwargs anyway.
I see your point, but I think we should keep it, since the convention in Petals is that all Server defaults match the defaults of run_server.py (whereas the hivemind default for use_auto_relay is different).
Everything you said would apply if the defaults here matched hivemind's.
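The convention described above (Server defaults mirror the run_server.py CLI defaults) could be checked mechanically with the standard-library `inspect` module. A minimal sketch, assuming a hypothetical, heavily trimmed `Server` class and parser (including a hypothetical `--use_relay` flag); the real Petals classes take many more arguments:

```python
import argparse
import inspect


class Server:
    # Hypothetical stand-in for the Petals Server; only two arguments are shown.
    def __init__(self, *, skip_reachability_check: bool = False, use_relay: bool = True, **kwargs):
        self.skip_reachability_check = skip_reachability_check
        self.use_relay = use_relay


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical trimmed-down CLI parser mirroring the Server arguments (Python 3.9+).
    parser = argparse.ArgumentParser()
    parser.add_argument("--skip_reachability_check", action="store_true")
    parser.add_argument("--use_relay", action=argparse.BooleanOptionalAction, default=True)
    return parser


def mismatched_defaults(cls, parser: argparse.ArgumentParser) -> dict:
    """Return {name: (cli_default, init_default)} for each CLI option whose default differs from __init__."""
    init_params = inspect.signature(cls.__init__).parameters
    cli_defaults = vars(parser.parse_args([]))
    mismatches = {}
    for name, cli_default in cli_defaults.items():
        param = init_params.get(name)
        # Only compare options that correspond to an __init__ argument with a default.
        if param is not None and param.default is not inspect.Parameter.empty and param.default != cli_default:
            mismatches[name] = (cli_default, param.default)
    return mismatches


print(mismatched_defaults(Server, build_parser()))  # {}
```

Run as a unit test, a check like this would flag drift between the two places instead of relying on reviewers to notice it.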
I agree with @borzunov here. My arguments:
- use_relay will not be passed from run_server, and we want it to be True by default
- it is nice to see all parameters that matter in the arguments
- usually, it is not a good idea to depend on default arguments from another library: they can change without notice and lead to strange behavior
I see your point about explicitly listing the arguments for creating a Server, though it somewhat contradicts the existence of **kwargs in __init__. My primary concern is that we should strive for consistent defaults across different locations: one error-proof way to do this would be to declare a common constant with the default value and use it in both locations. Besides, petals-cli is a part of Petals, so these files belong to the same library.
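The shared-constant approach suggested here might look like the sketch below. `DEFAULT_USE_RELAY`, the trimmed `Server` class, and the `--use_relay` flag are hypothetical names for illustration, not the actual Petals code; `argparse.BooleanOptionalAction` requires Python 3.9+:

```python
import argparse

# Single source of truth for the default (hypothetical constant name).
DEFAULT_USE_RELAY = True


class Server:
    # Hypothetical trimmed-down Server: its default comes from the shared constant.
    def __init__(self, *, use_relay: bool = DEFAULT_USE_RELAY, **kwargs):
        self.use_relay = use_relay


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # The CLI reuses the same constant, so the two defaults cannot drift apart.
    parser.add_argument("--use_relay", action=argparse.BooleanOptionalAction, default=DEFAULT_USE_RELAY)
    return parser


args = build_parser().parse_args([])
server = Server(**vars(args))
print(server.use_relay)  # True
```

The cost, as noted in the reply that follows, is one constant per defaulted argument.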
We would need to create constants for all defaults in this case (tens of them). I think this is a more general problem that should be addressed outside of this PR (maybe we should use something like reflection).
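One possible reading of the "reflection" idea: derive the CLI options and their defaults from the `Server.__init__` signature with the standard-library `inspect` module, so there is nothing to keep in sync by hand. A minimal sketch; the trimmed `Server` class and the `add_server_arguments` helper are hypothetical, and only bool and simple scalar defaults are handled (help strings and types like `Sequence[torch.device]` are not):

```python
import argparse
import inspect


class Server:
    # Hypothetical trimmed-down Server with a few argument names borrowed from the diff.
    def __init__(self, *, skip_reachability_check: bool = False, use_relay: bool = True,
                 mean_balance_check_period: float = 60.0, **kwargs):
        self.skip_reachability_check = skip_reachability_check
        self.use_relay = use_relay
        self.mean_balance_check_period = mean_balance_check_period


def add_server_arguments(parser: argparse.ArgumentParser, cls) -> None:
    """Add one CLI option per defaulted argument of cls.__init__, reusing its default value."""
    for name, param in inspect.signature(cls.__init__).parameters.items():
        # Skips self and **kwargs, which have no default.
        if param.default is inspect.Parameter.empty:
            continue
        if isinstance(param.default, bool):
            # --flag / --no-flag pair (Python 3.9+) with the default taken from the signature.
            parser.add_argument(f"--{name}", action=argparse.BooleanOptionalAction, default=param.default)
        else:
            parser.add_argument(f"--{name}", type=type(param.default), default=param.default)


parser = argparse.ArgumentParser()
add_server_arguments(parser, Server)
args = parser.parse_args(["--no-use_relay", "--mean_balance_check_period", "30"])
server = Server(**vars(args))
print(server.use_relay, server.mean_balance_check_period)  # False 30.0
```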
Thanks for the PR!
I have questions about relays; I need to talk to @justheuristic before we ship this code.
logger = get_logger(__file__)


def check_reachability(peer_id, wait_time: float = 7 * 60, retry_delay: float = 15) -> None:
Moved from server.py
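The `check_reachability` signature (`wait_time` of 7 minutes, `retry_delay` of 15 seconds) suggests a polling loop. The function body is not shown in this diff, so the following is only a sketch of that general pattern: the `poll_until_reachable` name, the `probe` callable (returning True once the peer is reachable), and raising `RuntimeError` on timeout are all assumptions, not the actual Petals implementation:

```python
import math
import time
from typing import Callable


def poll_until_reachable(probe: Callable[[], bool], wait_time: float = 7 * 60, retry_delay: float = 15,
                         sleep: Callable[[float], None] = time.sleep) -> None:
    """Call probe() every retry_delay seconds for up to wait_time seconds; raise if it never succeeds.

    Hypothetical helper: probe stands in for whatever actually tests reachability.
    """
    max_attempts = math.floor(wait_time / retry_delay) + 1
    for attempt in range(max_attempts):
        if probe():
            return
        # Do not sleep after the final failed attempt.
        if attempt < max_attempts - 1:
            sleep(retry_delay)
    raise RuntimeError(f"Peer is still unreachable after waiting {wait_time} seconds")


# Usage: a fake probe that succeeds on the third try, with sleep replaced by a recorder.
results = iter([False, False, True])
sleeps = []
poll_until_reachable(lambda: next(results), sleep=sleeps.append)
print(sleeps)  # [15, 15]
```

Injecting `sleep` keeps the loop testable without actually waiting several minutes.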
All urgent issues have been resolved.