Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modified handling the job events for better coordination #430

Merged
merged 4 commits into from
Nov 10, 2016
Merged

Modified handling the job events for better coordination #430

merged 4 commits into from
Nov 10, 2016

Conversation

ajoymajumdar
Copy link
Contributor

  1. Modified to trigger the job launch when handling the JobScheduledEvent. Earlier there was a possibility of a race condition.
  2. Use one concurrent map of JobInfo to track the jobs on a local instance. Synchronize on one job instead of all the jobs.
  3. On job kill/interruption, stop running the rest of the tasks in the LocalJobRunner.

@ajoymajumdar ajoymajumdar added this to the 3.0.0 milestone Nov 3, 2016
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 88.7% when pulling 51e09b5 on ajoymajumdar:develop into 097af0f on Netflix:develop.

* @throws Exception If there is any problem.
*/
@Test
public void submitAndKillJob() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have a test for killing the job in the JobRestControllerIntegrationTests?

@@ -32,25 +36,37 @@
@Getter
public class JobScheduledEvent extends BaseJobEvent {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we talked about renaming this cause not it's more of a "schedule a job event" and that's something that might be better as a direct method call to a service than an event? The events are intended to be listened and acted on by multiple listeners that are idempotent. This seems more like a direct action that needs to be taken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I too wanted to have this directly called. Before this is called an init method should be called. So init, scheduled, started and finished methods. Should I make the change?

}
// Tell the system a new job has been scheduled so any actions can be taken
log.info("Publishing job scheduled event for job {}", jobId);
this.eventPublisher.publishEvent(new JobScheduledEvent(jobId, jobRequest,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens now if the scheduling causes an exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We update the job status to Failed but we do not raise JobFinishedEvent event.


// Automatically track the number of jobs running on this node
this.registry.mapSize("genie.jobs.running.gauge", this.jobMonitors);
this.registry.mapSize("genie.jobs.scheduled.gauge", this.scheduledJobs);
this.registry.mapSize("genie.jobs.running.gauge", this.jobs);
this.registry.methodValue("genie.jobs.active.gauge", this, "getNumActiveJobs");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's better to tag them with the status so that we can break them up that way?

public void onJobScheduled(final JobScheduledEvent event) {
final String jobId = event.getId();
jobs.put(jobId, new JobInfo());
handle(event.getId(), () -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh fancy functional programming haha

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 88.686% when pulling fea3d68 on ajoymajumdar:develop into 097af0f on Netflix:develop.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) to 88.723% when pulling aeab1df on ajoymajumdar:develop into 097af0f on Netflix:develop.

}
}
} catch (GenieException e) {
jobStateService.done(jobId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we don't have the persistence steps done within the state machine? It seems to me they're tightly coupled no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does makes sense but wanted to keep it separate to being with. First draft to be in-memory. This should actually have all the job related persistence steps done from within here.

@@ -781,7 +781,9 @@ private void copyRequestHeaders(final HttpServletRequest request, final ClientHt
private void copyResponseHeaders(final HttpServletResponse response, final ClientHttpResponse forwardResponse) {
final HttpHeaders headers = forwardResponse.getHeaders();
for (final Map.Entry<String, String> header : headers.toSingleValueMap().entrySet()) {
response.setHeader(header.getKey(), header.getValue());
if (!TRANSFER_ENCODING_HEADER.equalsIgnoreCase(header.getKey())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a comment here why you're doing this so that no one takes it out by accident going forward

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add.

…vent. Earlier there was a possibility of a race condition.

2. Use one concurrent map of JobInfo to track the jobs on a local instance. Synchronize on one job instead of all the jobs.
3.On job kill/interruption, stop running the rest of the tasks in the LocalJobRunner.
…vent. Earlier there was a possibility of a race condition.

2. Use one concurrent map of JobInfo to track the jobs on a local instance. Synchronize on one job instead of all the jobs.
3.On job kill/interruption, stop running the rest of the tasks in the LocalJobRunner.
…vent. Earlier there was a possibility of a race condition.

2. Use one concurrent map of JobInfo to track the jobs on a local instance. Synchronize on one job instead of all the jobs.
3.On job kill/interruption, stop running the rest of the tasks in the LocalJobRunner.
…vent. Earlier there was a possibility of a race condition.

2. Use one concurrent map of JobInfo to track the jobs on a local instance. Synchronize on one job instead of all the jobs.
3.On job kill/interruption, stop running the rest of the tasks in the LocalJobRunner.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 88.688% when pulling 8ebadf8 on ajoymajumdar:develop into 5727dfa on Netflix:develop.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 88.688% when pulling 8ebadf8 on ajoymajumdar:develop into 5727dfa on Netflix:develop.

@tgianos tgianos merged commit 8c01cb1 into Netflix:develop Nov 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants