Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix root node resolution for SLURM cluster with dash in host name #1954

Merged
merged 1 commit into from
May 28, 2020

Conversation

LoicGrobol
Copy link
Contributor

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
    • No need for a bugfix, right?
  • Did you write any new necessary tests?
    • Not sure if this need any new tests as the general behaviour of the function is already tested
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #1943

For now this is about the minimal change to fix the issue, but it might be better to completely rewrite the function as

def resolve_root_node_address(root_node):
    """Extract a root node from a SLURM node list."""
    m = re.match(r"(?P<name>.*?)\[(?P<number>\d+)(-\d+)?(,.*)?\]", root_node)
    if not m:
        return root_node
    if m.group("number") is None:
        return m.group("name")
    return m.group("name") + m.group("number")

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team May 26, 2020 10:17
@williamFalcon
Copy link
Contributor

Don't we have tests for this? Mind adding a test just to make sure we don't break the other cases?

@Borda Borda added the bug Something isn't working label May 27, 2020
@LoicGrobol
Copy link
Contributor Author

@williamFalcon Well it's already tested (but weirdly enough, it's in the amp tests so maybe it would be worth it to move these elsewhere) for a basic case (abc[23-24, 45-40, 40] and some variants). I don't mind adding test cases with a dash or other quirks but I am not sure it is really interesting so you tell me :-)

@williamFalcon williamFalcon merged commit c3cf33d into Lightning-AI:master May 28, 2020
@williamFalcon
Copy link
Contributor

williamFalcon commented May 28, 2020

@LoicGrobol great! Good contribution :)

@LoicGrobol
Copy link
Contributor Author

@williamFalcon Haha, just scratching my itch, thanks for merging :-)

justusschock pushed a commit that referenced this pull request Jun 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TrainerDDPMixin.resolve_root_node_address fails if the host name contains a dash
3 participants