SUBMARINE-1021. Experiment Watcher #767
This is great, some comments below.
LOG.error("Experiment watch failed. " + e.getMessage(), e);
}

// client.setDebugging(true);
Remove?
I commented out this line because Watch in the Java k8s client library does not seem to support debugging mode; enabling it made the watcher fail.
I'll just remove it.
@@ -565,6 +584,79 @@ public ServeResponse deleteServe(ServeRequest spec)
}
}

public void watchExperiment() throws ApiException{
Could we add a unit test or integration test for it?
After discussing it in Thursday's meeting, we think this function is hard to test on its own because of its nature (it keeps watching the cluster), and that it should be tested together with the websocket that will be developed after this PR. We can write a test for it at that point.
okay
Force-pushed from 73e9e5b to 6571cbe.
}
} finally {
LOG.info("WATCH PytorchJob END");
throw new RuntimeException();
Suggested change
- throw new RuntimeException();
+ watchPytorch.close();
+ throw new RuntimeException();
}
} finally {
LOG.info("WATCH TFJob END");
throw new RuntimeException();
Suggested change
- throw new RuntimeException();
+ watchTF.close();
+ throw new RuntimeException();
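The suggested changes make sure the watch is closed before the `finally` block rethrows. A minimal sketch of the same idea using try-with-resources, which closes the handle on both the normal and the exceptional path. `FakeWatch` and `WatchLoop` are hypothetical stand-ins, not classes from the Kubernetes Java client; the sketch assumes the real watch handle is `AutoCloseable` (or `Closeable`) and can be iterated for events.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a watch handle: an iterable stream of events
// that keeps a connection open until close() is called.
class FakeWatch implements Iterable<String>, AutoCloseable {
    private final List<String> events;
    boolean closed = false;

    FakeWatch(List<String> events) {
        this.events = events;
    }

    @Override
    public Iterator<String> iterator() {
        return events.iterator();
    }

    @Override
    public void close() {
        closed = true; // a real watch would release its HTTP connection here
    }
}

class WatchLoop {
    // Consumes every event; try-with-resources guarantees close() runs
    // whether the loop ends normally or the handler throws.
    static List<String> consume(FakeWatch watch, String failOn) {
        List<String> seen = new ArrayList<>();
        try (FakeWatch w = watch) {
            for (String event : w) {
                if (event.equals(failOn)) {
                    throw new IllegalStateException("handler failed on " + event);
                }
                seen.add(event);
            }
        }
        return seen;
    }
}
```

With this shape there is no need to remember a separate `close()` call in every `finally` block.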
Force-pushed from 6571cbe to 9daa4af.
What is this PR for?
Use the k8s Java client to build watchers for TFJobs and PyTorchJobs that log the status whenever an experiment's status changes.
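To make "log only when the status changes" concrete, here is a minimal sketch written against plain Java collections rather than the Kubernetes client. `ExperimentEvent` and `StatusTracker` are hypothetical names, and the event types ("ADDED", "MODIFIED", "DELETED") are assumed to mirror the watch event types the watcher receives; the PR itself may track state differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical, simplified watch event: which experiment changed and the
// status currently reported for it (e.g. "Created", "Running").
final class ExperimentEvent {
    final String type; // assumed to be "ADDED", "MODIFIED" or "DELETED"
    final String name;
    final String status;

    ExperimentEvent(String type, String name, String status) {
        this.type = type;
        this.name = name;
        this.status = status;
    }
}

// Remembers the last status seen per experiment and reports only transitions,
// so repeated MODIFIED events with an unchanged status produce no message.
final class StatusTracker {
    private final Map<String, String> lastStatus = new HashMap<>();

    Optional<String> onEvent(ExperimentEvent e) {
        if ("DELETED".equals(e.type)) {
            lastStatus.remove(e.name);
            return Optional.of(e.name + " deleted");
        }
        String previous = lastStatus.put(e.name, e.status);
        if (Objects.equals(previous, e.status)) {
            return Optional.empty(); // status unchanged, nothing to log
        }
        return Optional.of(e.name + ": " + (previous == null ? "<none>" : previous)
                + " -> " + e.status);
    }
}
```

Each non-empty result would then be passed to the logger (for example `LOG.info`).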
Watcher examples
In follow-up PRs, we will create a websocket connection between the server and the workbench, and modify the workbench frontend logic accordingly.
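If the watcher is later going to feed a websocket, one option is to have it publish status changes to registered listeners instead of logging directly, so a logging listener and a websocket listener can coexist. The sketch below only assumes that design; `ExperimentStatusListener` and `StatusPublisher` are hypothetical and not part of Submarine.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback that a logging component, or later a websocket
// endpoint, could implement to receive status changes.
interface ExperimentStatusListener {
    void onStatusChange(String experimentName, String newStatus);
}

// Hypothetical publisher called by the watcher on every status change.
final class StatusPublisher {
    // CopyOnWriteArrayList lets listeners register from other threads
    // while the watcher thread iterates over them.
    private final List<ExperimentStatusListener> listeners = new CopyOnWriteArrayList<>();

    void register(ExperimentStatusListener listener) {
        listeners.add(listener);
    }

    void publish(String experimentName, String newStatus) {
        for (ExperimentStatusListener listener : listeners) {
            listener.onStatusChange(experimentName, newStatus);
        }
    }
}
```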
What type of PR is it?
[Feature]
Todos
None
What is the Jira issue?
https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-1021
How should this be tested?
The watcher starts when the k8s submitter is initialized and then keeps watching the experiments.
Whenever an experiment's status changes, the change appears in the log.
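Because the watcher is started during submitter initialization and then keeps running, a blocking watch call would normally need its own thread so that initialization is not held up. A minimal sketch, assuming the watch is wrapped as a blocking `BlockingWatch`; `WatcherSupervisor`, the retry delay, and the attempt limit are hypothetical, and the limit exists only to keep the example finite (a long-lived watcher would loop until interrupted).

```java
import java.util.concurrent.TimeUnit;

// Hypothetical supervisor: runs a blocking watch on a daemon thread so that
// submitter initialization is not blocked, and restarts the watch when it
// fails or its stream ends.
final class WatcherSupervisor {
    interface BlockingWatch {
        void run() throws Exception; // blocks until the stream ends or fails
    }

    static Thread start(BlockingWatch watch, long retryDelayMillis, int maxAttempts) {
        Thread t = new Thread(() -> {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    watch.run();
                } catch (Exception e) {
                    System.err.println("Experiment watch failed (attempt " + attempt + "): " + e.getMessage());
                }
                if (attempt < maxAttempts) {
                    try {
                        TimeUnit.MILLISECONDS.sleep(retryDelayMillis); // pause before reconnecting
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return; // stop watching on shutdown
                    }
                }
            }
        }, "experiment-watcher");
        t.setDaemon(true); // do not keep the JVM alive just for the watcher
        t.start();
        return t;
    }
}
```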
Screenshots (if appropriate)
2021-10-06.21.46.52.mp4
Questions: