Skip to content

Conversation

@jukzi
Copy link
Contributor

@jukzi jukzi commented Sep 7, 2022

Bug 574883 - Job.getJobManager().join(family) doesn't wait for a
re-scheduled job

  • Add and use a Job specific lock to make sure the notification that a
    job stopped is send before the job is reported to be started again
    (in another thread).

  • JobManager.schedule() - when called from endJob() - is called within
    the synchronized(lock)) instead of using a second (non atomic)
    synchronized(lock) afterwards. The notification is still send outside
    that synchronized(lock).

Previously it was possible that the Job notifications was send and
processed in different threads and - since not synchronized - in
arbitrary order. Especially a job
could restart in another worker thread before the notification was
processed that previous job execution was done.
Still the events are processed in different threads, but the
notification order is fixed.

=> The same Job will not start again before all listeners noticed that
the Job was finished. That is also wanted for other Listeneres then the
join() Listener since the Listeners should be able to decide if the Job
should start again (see JobTest.testCancelFromAboutToRun())

The benefit is proven by JUnit Test Bug_574883.testJoinLambdaOften().
Basic functionality is tested by JobTest.

@jukzi jukzi force-pushed the 574883_fix branch 2 times, most recently from 69f59bb to 82cccad Compare September 7, 2022 11:49
@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2022

Unit Test Results

     10 files       10 suites   10m 20s ⏱️
2 355 tests 2 354 ✔️ 1 💤 0
2 356 runs  2 355 ✔️ 1 💤 0

Results for commit 575db06.

♻️ This comment has been updated with latest results.

@jukzi jukzi marked this pull request as draft September 8, 2022 08:36
@jukzi
Copy link
Contributor Author

jukzi commented Sep 8, 2022

damn now notification can deadlock. as happens in org.eclipse.core.tests.runtime.jobs.JobGroupTest.testShouldCancel_4() where endJob notifies JObGroupTest which tries to cancel.
image

@laeubi
Copy link
Contributor

laeubi commented Sep 8, 2022

Don't know if such a thing exits, but for me it seems it would be useful to have a Flow-Chart of the Job API where one can see the (desired) valid flows, e.g when does a Job transitions to what state, what listeners are the called and if its allowed for a listener to reschedule a job.

It seems currently not all Workflows are visible and thus it makes it hard to decide how to properly synchronize.

@jukzi
Copy link
Contributor Author

jukzi commented Sep 8, 2022

Don't know if such a thing exits, but for me it seems it would be useful to have a Flow-Chart

I would prefer if the events happen in a intuitive order instead of this &/%&%" - as in the example the same Job is ending two times in parallel. Need to find a way that a job is not started again before all its events have been processed.

@laeubi
Copy link
Contributor

laeubi commented Sep 8, 2022

I would prefer if the events happen in a intuitive order instead of this &/%&%"

But what is intuitive order? That's what I mean, should one really expect some "order" of events, or are these actually only notifications?

as in the example the same Job is ending two times in parallel. Need to find a way that a job is not started again before all its events have been processed.

That's another point that needs to be decided is it valid to schedule a job twice? If not one could simply skip the schedule if the job is marked for scheduling already...

@iloveeclipse
Copy link
Member

That's another point that needs to be decided is it valid to schedule a job twice

No need to be decided anything, it is valid.

@laeubi
Copy link
Contributor

laeubi commented Sep 8, 2022

That's another point that needs to be decided is it valid to schedule a job twice

No need to be decided anything, it is valid.

Sorry I wanted to write 'describe' instead, but in that case it is valid what was tried to be fixed here and actually only the client code might decide if it should schedule the job?

@jukzi
Copy link
Contributor Author

jukzi commented Sep 8, 2022

Two hours ago the test did not fail on my local computer. now they reproducible fail. its driving me nuts.
org.eclipse.core.internal.jobs.JobManager.now()
"can only be used to compare it with another value returned from this function,"
but what it is compared against?
InternalJob.T_NONE,
delay in JobManager.schedule(I)
org.eclipse.core.internal.jobs.InternalJob.T_INFINITE

AAAAAAAAAAAAAAAAAAAAAAAAA

@jukzi jukzi marked this pull request as ready for review September 8, 2022 16:38
@jukzi
Copy link
Contributor Author

jukzi commented Sep 8, 2022

Please review. Bug_574883.testJoinLambdaOften still fails but in average after much more iterations on my computer

and add a more reliable failing example
Bug 574883 - Job.getJobManager().join(family) doesn't wait for a
re-scheduled job

1. Do not restart the Job (in another thread) before it's notifications
have been processed by the join listener. That is also wanted for other
Listeners since they should be able to decide if the Job
should start again (see JobTest.testCancelFromAboutToRun(),
JobGroupTest-testShouldCancel_4()) and could use the information about
how the previous job ended.
Previously it was possible that the Job notifications was send and
processed in different threads and - since not synchronized - in
arbitrary order. Especially a job
could restart in another worker thread before the notification was
processed that previous job execution was done.
Still the events are processed in different threads, but the
notification order is fixed.

2. Perform reschedule while holding lock to prevent unsynchronized
moments where Jobs that are rescheduled are neither running, sleeping or
waiting

3. Synchronize read/write access to InternalJob.flags to atomically
change bits.

The benefit is shown by JUnit Test Bug_574883.testJoinLambdaOften().
Basic functionality is tested by JobTest, JobGroupTest

The join() implementation is still not 100% bullet proof but the chance
of false joins are reduced by some reasons.
Copy link
Contributor

@laeubi laeubi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@jukzi jukzi merged commit f4d72a0 into eclipse-platform:master Sep 9, 2022
@jukzi jukzi deleted the 574883_fix branch September 9, 2022 07:36
/**
* This signal is used to synchronize Job listener notification
*/
volatile boolean waitForNotificationFinsished;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a type in the name of the field. Should be Finished rather than Finsished

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug 574883 - Job.getJobManager().join(family) doesn't wait for a re-scheduled job

4 participants