synchronize Job Notification to fix join() (fixes #148) #153

jukzi · 2022-09-07T10:20:41Z

Bug 574883 - Job.getJobManager().join(family) doesn't wait for a
re-scheduled job

Add and use a Job-specific lock to make sure the notification that a
job stopped is sent before the job is reported to be started again
(in another thread).
JobManager.schedule() - when called from endJob() - is now called within
the synchronized(lock) block instead of using a second (non-atomic)
synchronized(lock) block afterwards. The notification is still sent outside
that synchronized(lock) block.
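
A minimal sketch of the single critical section described above, in plain Java without the Eclipse classes. The class `RescheduleSketch`, its `State` enum and all method names are hypothetical and only model the idea; the real JobManager code is more involved.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of a job manager; not the real JobManager.
class RescheduleSketch {
    enum State { NONE, WAITING, RUNNING }

    private final Object lock = new Object();
    private State state = State.RUNNING; // guarded by lock

    // Ends the current run and, if requested, reschedules in the SAME critical
    // section, so no other thread can observe the job in state NONE in between.
    void endJob(boolean reschedule) {
        synchronized (lock) {
            state = State.NONE;
            if (reschedule) {
                state = State.WAITING;
            }
        }
        // Listeners would be notified here, outside the lock.
    }

    void startJob() {
        synchronized (lock) {
            state = State.RUNNING;
        }
    }

    State getState() {
        synchronized (lock) {
            return state;
        }
    }

    // Returns true if a concurrent observer never saw the rescheduled job in state NONE.
    static boolean rescheduledJobNeverIdle(int rounds) {
        RescheduleSketch job = new RescheduleSketch();
        AtomicBoolean sawNone = new AtomicBoolean();
        AtomicBoolean stop = new AtomicBoolean();
        Thread observer = new Thread(() -> {
            while (!stop.get()) {
                if (job.getState() == State.NONE) {
                    sawNone.set(true);
                }
            }
        });
        observer.start();
        for (int i = 0; i < rounds; i++) {
            job.endJob(true);
            job.startJob();
        }
        stop.set(true);
        try {
            observer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !sawNone.get();
    }

    public static void main(String[] args) {
        System.out.println(rescheduledJobNeverIdle(10_000));
    }
}
```

With two separate synchronized blocks (one to end the run, a second one to reschedule), the observer could catch the job between the two blocks, which is the kind of gap a join() on the family can misread as "finished".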

Previously it was possible that the Job notifications were sent and
processed in different threads and - since not synchronized - in
arbitrary order. In particular, a job could restart in another worker
thread before the notification that the previous job execution was done
had been processed.
The events are still processed in different threads, but the
notification order is now fixed.

=> The same Job will not start again before all listeners have noticed that
the Job was finished. That is also wanted for Listeners other than the
join() Listener, since the Listeners should be able to decide if the Job
should start again (see JobTest.testCancelFromAboutToRun()).

The benefit is proven by JUnit Test Bug_574883.testJoinLambdaOften().
Basic functionality is tested by JobTest.
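
A minimal sketch of the ordering guarantee described above, in plain Java and without the Eclipse classes. All names (`NotificationOrderSketch`, `markEnding`, `notifyDone`, `notifyRunning`) are hypothetical; the actual change in JobManager may be structured differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of one job; none of these names are the real Job API.
class NotificationOrderSketch {
    private final Object notificationLock = new Object(); // job-specific lock
    private boolean donePending;                          // guarded by notificationLock
    final List<String> events = Collections.synchronizedList(new ArrayList<>());

    // Called by the finishing worker before the job is rescheduled.
    void markEnding() {
        synchronized (notificationLock) {
            donePending = true;
        }
    }

    // Called by the finishing worker after rescheduling. The listener (here: the
    // events list) is notified outside the lock, then waiting starters are released.
    void notifyDone() {
        events.add("done");
        synchronized (notificationLock) {
            donePending = false;
            notificationLock.notifyAll();
        }
    }

    // Called by whichever worker picks the job up again: blocks until the
    // previous run's "done" notification has been delivered.
    void notifyRunning() throws InterruptedException {
        synchronized (notificationLock) {
            while (donePending) {
                notificationLock.wait();
            }
        }
        events.add("running");
    }

    // Simulates one end/restart race and returns the observed event order.
    static List<String> runOnce() {
        NotificationOrderSketch job = new NotificationOrderSketch();
        job.markEnding(); // the run ended and the job was rescheduled
        Thread restarter = new Thread(() -> {
            try {
                job.notifyRunning();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        restarter.start(); // another worker tries to start the job right away
        try {
            Thread.sleep(20); // the finishing worker is slow to notify
            job.notifyDone();
            restarter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(job.events);
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints [done, running]
    }
}
```

Without the `donePending` wait, the restarting thread would record "running" first, which is the arbitrary order a join() listener could misinterpret.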

github-actions · 2022-09-07T12:05:46Z

Unit Test Results

    10 files     10 suites 10m 20s ⏱️
2 355 tests 2 354 ✔️ 1 💤 0 ❌
2 356 runs 2 355 ✔️ 1 💤 0 ❌

Results for commit 575db06.

♻️ This comment has been updated with latest results.

jukzi · 2022-09-08T08:39:35Z

Damn, now the notification can deadlock, as happens in org.eclipse.core.tests.runtime.jobs.JobGroupTest.testShouldCancel_4(), where endJob notifies JobGroupTest, which then tries to cancel.

laeubi · 2022-09-08T08:50:35Z

Don't know if such a thing exists, but to me it seems it would be useful to have a flow chart of the Job API where one can see the (desired) valid flows, e.g. when a Job transitions to which state, which listeners are then called, and whether a listener is allowed to reschedule a job.

It seems that currently not all workflows are visible, which makes it hard to decide how to synchronize properly.

jukzi · 2022-09-08T08:57:06Z

> Don't know if such a thing exists, but to me it seems it would be useful to have a flow chart

I would prefer if the events happen in an intuitive order instead of this &/%&%" - as in the example the same Job is ending two times in parallel. Need to find a way that a job is not started again before all its events have been processed.

laeubi · 2022-09-08T09:05:06Z

> I would prefer if the events happen in an intuitive order instead of this &/%&%"

But what is an intuitive order? That's what I mean: should one really expect some "order" of events, or are these actually only notifications?

> as in the example the same Job is ending two times in parallel. Need to find a way that a job is not started again before all its events have been processed.

That's another point that needs to be decided: is it valid to schedule a job twice? If not, one could simply skip the schedule if the job is already marked for scheduling...

iloveeclipse · 2022-09-08T09:10:28Z

> That's another point that needs to be decided: is it valid to schedule a job twice

Nothing needs to be decided, it is valid.

laeubi · 2022-09-08T09:13:31Z

> > That's another point that needs to be decided: is it valid to schedule a job twice
>
> Nothing needs to be decided, it is valid.

Sorry, I meant to write 'described' instead. But in that case, the behavior this PR tries to fix is valid, and really only the client code can decide whether it should schedule the job?

jukzi · 2022-09-08T11:40:38Z

Two hours ago the tests did not fail on my local computer. Now they fail reproducibly. It's driving me nuts.
org.eclipse.core.internal.jobs.JobManager.now()
"can only be used to compare it with another value returned from this function,"
but what is it compared against?
InternalJob.T_NONE,
delay in JobManager.schedule(I)
org.eclipse.core.internal.jobs.InternalJob.T_INFINITE

AAAAAAAAAAAAAAAAAAAAAAAAA

jukzi · 2022-09-08T16:39:51Z

Please review. Bug_574883.testJoinLambdaOften still fails, but on average only after many more iterations on my computer.

and add a more reliable failing example

Bug 574883 - Job.getJobManager().join(family) doesn't wait for a re-scheduled job

1. Do not restart the Job (in another thread) before its notifications have been processed by the join listener. That is also wanted for other Listeners, since they should be able to decide if the Job should start again (see JobTest.testCancelFromAboutToRun(), JobGroupTest.testShouldCancel_4()) and could use the information about how the previous job ended. Previously it was possible that the Job notifications were sent and processed in different threads and - since not synchronized - in arbitrary order. In particular, a job could restart in another worker thread before the notification that the previous job execution was done had been processed. The events are still processed in different threads, but the notification order is now fixed.
2. Perform the reschedule while holding the lock to prevent unsynchronized moments where Jobs that are rescheduled are neither running, sleeping nor waiting.
3. Synchronize read/write access to InternalJob.flags to atomically change bits.

The benefit is shown by JUnit Test Bug_574883.testJoinLambdaOften(). Basic functionality is tested by JobTest and JobGroupTest.

The join() implementation is still not 100% bulletproof, but the chance of false joins is reduced for the reasons above.
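
Point 3 of the commit message can be illustrated with a small sketch. It assumes a plain `int` bit field guarded by the object's monitor; `FlagsSketch`, `FLAG_A`, `FLAG_B` and the method names are made up for illustration and are not the actual InternalJob members.

```java
// Hypothetical sketch; class, masks and methods are illustrative, not InternalJob's.
class FlagsSketch {
    static final int FLAG_A = 0x01; // illustrative bit masks
    static final int FLAG_B = 0x02;

    private int flags; // guarded by "this"

    // Read-modify-write of the shared int happens atomically under the monitor.
    synchronized void setFlag(int mask, boolean value) {
        flags = value ? (flags | mask) : (flags & ~mask);
    }

    synchronized boolean hasFlag(int mask) {
        return (flags & mask) != 0;
    }

    // Two threads toggle different bits of the same int; each ends by setting its bit.
    static boolean bothFlagsSurvive(int rounds) {
        FlagsSketch job = new FlagsSketch();
        Thread a = new Thread(() -> toggle(job, FLAG_A, rounds));
        Thread b = new Thread(() -> toggle(job, FLAG_B, rounds));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return job.hasFlag(FLAG_A) && job.hasFlag(FLAG_B);
    }

    private static void toggle(FlagsSketch job, int mask, int rounds) {
        for (int i = 0; i < rounds; i++) {
            job.setFlag(mask, i % 2 == 0);
        }
        job.setFlag(mask, true); // final write always sets the bit
    }

    public static void main(String[] args) {
        System.out.println(bothFlagsSurvive(100_000)); // prints true
    }
}
```

If `setFlag` were not synchronized, one thread's read-modify-write could overwrite the other thread's bit with a stale value, so a flag set by one thread could silently disappear.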

laeubi

Looks good to me.

szarnekow · 2022-09-09T11:52:22Z

runtime/bundles/org.eclipse.core.jobs/src/org/eclipse/core/internal/jobs/InternalJob.java

+	/**
+	 * This signal is used to synchronize Job listener notification
+	 */
+	volatile boolean waitForNotificationFinsished;


There is a typo in the name of the field. It should be Finished rather than Finsished.

jukzi force-pushed the 574883_fix branch from 0a477a6 to 480e78b Compare September 7, 2022 10:22

jukzi linked an issue Sep 7, 2022 that may be closed by this pull request

Bug 574883 - Job.getJobManager().join(family) doesn't wait for a re-scheduled job #148

Closed

jukzi force-pushed the 574883_fix branch 2 times, most recently from 69f59bb to 82cccad Compare September 7, 2022 11:49

jukzi force-pushed the 574883_fix branch from 82cccad to 9740795 Compare September 8, 2022 05:54

jukzi marked this pull request as draft September 8, 2022 08:36

jukzi force-pushed the 574883_fix branch from 9740795 to 75f3269 Compare September 8, 2022 09:51

jukzi force-pushed the 574883_fix branch from 75f3269 to 306b40d Compare September 8, 2022 15:58

jukzi marked this pull request as ready for review September 8, 2022 16:38

jukzi force-pushed the 574883_fix branch from 306b40d to 40ba5be Compare September 9, 2022 06:32

jukzi requested review from iloveeclipse and laeubi September 9, 2022 06:32

jukzi added 2 commits September 9, 2022 08:40

enable Junit Test Bug_574883

00cc6a1

and add a more reliable failing example

jukzi force-pushed the 574883_fix branch from 40ba5be to 575db06 Compare September 9, 2022 06:40

laeubi approved these changes Sep 9, 2022

jukzi merged commit f4d72a0 into eclipse-platform:master Sep 9, 2022

jukzi deleted the 574883_fix branch September 9, 2022 07:36

szarnekow reviewed Sep 9, 2022

iloveeclipse mentioned this pull request Sep 30, 2022

JobManager.join(family) or related code is broken in 4.26 M1 #193

Closed
