-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds max running time termination criteria #474
Conversation
b765c0d
to
5aff9fd
Compare
Adds method to terminate experiment in task scheduler. Refactors running states handling to separate class. Adds morphia converter for Time.
bdd5661
to
798c420
Compare
@@ -127,6 +128,8 @@ public void run(BenchFlowTestManagerConfiguration configuration, Environment env | |||
// http://mongodb.github.io/mongo-java-driver/3.4/driver/getting-started/quick-start/ | |||
MongoClient mongoClient = configuration.getMongoDBFactory().build(); | |||
ExecutorService taskExecutor = configuration.getTaskExecutorFactory().build(environment); | |||
ScheduledThreadPoolExecutor timeOutScheduledThreadPoolExecutor = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should build it as you do for the other executor service we have: https://github.com/benchflow/benchflow/pull/474/files#diff-3ebdba5e745234429a46f7536fa3feb8R130
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? I don't see any need to put this in the configuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Postponed in #517
break; | ||
|
||
case VALIDATE_TERMINATION_CRITERIA: | ||
validateTerminationCriteria(testID); | ||
runningStatesHandler.validateTerminationCriteria(testID); | ||
break; | ||
|
||
case DERIVE_PREDICTION_FUNCTION: | ||
testModelDAO.setTestRunningState(testID, VALIDATE_PREDICTION_FUNCTION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why you do this here and not in the function called in the next line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually happens also happens in the function in the next line. Although this wasn't changed in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #474 (commits)
@@ -135,6 +135,11 @@ private synchronized void handleStartState(String testID) { | |||
// wait for task to complete | |||
future.get(); | |||
|
|||
if (isTerminated(testID)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you check this in the setTestState
method, so that is done only in one place?
Anyway this should not be possible, because once you get a timeout, you need to get exclusive access to the data structure where you keep running information and cleanup everything before other code gets executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The running tasks cannot be cancelled - the only way is to wait for them to terminate. Therefore I need to check if the test has been terminated in the meantime. It is also what we discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #474 (commits)
@@ -156,6 +161,18 @@ private synchronized void handleWaitingState(String testID) { | |||
// set state as ready | |||
testModelDAO.setTestState(testID, BenchFlowTestState.READY); | |||
|
|||
// update max running time timeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need the following code? Is it to restart the execution, and take into account of the time the test has already been executed? If yes, I think this should happen in the RUNNING
state when the test goes back to RUNNING
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to update the maxRunning time since that is read when the test goes back to RUNNING
. In the WAITING
state that is the only place I know how much time has elapsed since the TimeoutTask
was scheduled.
@@ -172,44 +189,51 @@ private synchronized void handleTestRunningState(String testID) { | |||
logger | |||
.info("handleTestRunningState for " + testID + " with state " + testRunningState.name()); | |||
|
|||
// set timeout if not already set | |||
if (!timeoutTasks.containsKey(testID)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the test has no timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the question fully
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #474 (commits)
try { | ||
|
||
// wait for task to complete | ||
future.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make task cancellable, then probably here you are going to get an exception and you follow the exceptional flow, so that you do not need to check all the time if something is terminated. When you apply the previous comment, then the handling can be delegated to a method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see previous comment about cancellable tasks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worked on this in #474 (commits). It is fine to have.
// replace with new task | ||
testTasks.put(testID, future); | ||
|
||
// we don't wait for the task to complete since the experiment-manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably at this stage, the waiting for this state should be exactly to wait for the wanted data from the experiment manager, that should be the details for each of the trials that actually gets executed up to the point in which the experiment manager declares the experiment as completed in any of the possible states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what happens - the EM informs when the experiment has completed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worked on this in #474 (commits).
|
||
try { | ||
|
||
// TODO - update: set next state as validate termination criteria |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You set it to REMOVE_NON_REACHABLE_EXPERIMENTS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the comment is about. This is a TODO
String experimentID = testModelDAO.getRunningExperiment(testID); | ||
|
||
if (experimentID != null) { | ||
experimentManagerService.abortBenchFlowExperiment(experimentID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to get and ACK
about the execution of the operation, because otherwise it is really complex to control the behaviour of the distributed system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is covered in issue #499
@@ -267,6 +270,60 @@ public void runLoadTest() throws Exception { | |||
|
|||
} | |||
|
|||
@Test | |||
public void runBenchFlowTestTimeoutTest() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need also some tests where you submit multiple tests, and check the results of some of them, so that you can experience what happens with concurrency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added that to the issue #499
as per title
fixes #337 (except extensions, that are now in #517)
related to #281
depends on PR #469
TODO