CI: Re-Enable torchrun call in Zero to Thunder notebook #465

Open
t-vi opened this issue May 27, 2024 · 0 comments

Labels: bug, ci / tests

Comments

t-vi (Collaborator) commented May 27, 2024

Because the CI keeps running into flakiness problems with distributed runs, I am disabling the call to torchrun in #452. It would be neat to re-enable it once we know what is going on.

Based on an analysis of the build logs, the problem seems to be connected to lambda-server1, but it could also be something else that happens to be correlated with it. (This is based on the 186 runs I retrieved from the Azure API this morning.)
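The per-agent analysis described above can be sketched as a simple failure-rate tally. This is a minimal illustration, assuming the build records have already been fetched and reduced to (agent name, succeeded) pairs; the record format, the function name `failure_rate_by_agent`, and the agent name "other-agent" are hypothetical, and the sample data is made up rather than taken from the 186 real runs. A high failure rate on one agent only shows correlation, not cause, which matches the caveat above.

```python
from collections import defaultdict


def failure_rate_by_agent(runs):
    """Return {agent: (failures, total, rate)} from (agent_name, succeeded) pairs."""
    counts = defaultdict(lambda: [0, 0])  # agent -> [failures, total]
    for agent, succeeded in runs:
        counts[agent][1] += 1
        if not succeeded:
            counts[agent][0] += 1
    return {
        agent: (failures, total, failures / total)
        for agent, (failures, total) in counts.items()
    }


# Hypothetical sample records, NOT the real CI runs discussed in the issue.
sample_runs = [
    ("lambda-server1", False),
    ("lambda-server1", True),
    ("lambda-server1", False),
    ("other-agent", True),
    ("other-agent", True),
]

# Print agents ordered by failure rate, highest first.
for agent, (failures, total, rate) in sorted(
    failure_rate_by_agent(sample_runs).items(),
    key=lambda kv: kv[1][2],
    reverse=True,
):
    print(f"{agent}: {failures}/{total} failed ({rate:.0%})")
```

With the sample data this prints "lambda-server1: 2/3 failed (67%)" followed by "other-agent: 0/2 failed (0%)".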


cc @Borda

t-vi added the bug, ci, and ci / tests labels on May 27, 2024
t-vi removed the ci label on Jun 12, 2024