Enqueuing jobs one by one can cause jobs not to be executed #1365

@alirezazamani

Describe the bug

The issue stems from how Helix and ZooKeeper (ZK) propagate changes, and it can be reproduced when a user adds jobs to a queue in a loop. If this behavior is inherent, adding jobs in a batch is the better approach. Suppose a user adds job1, job2, and job3 one by one:
At T1: job1 and job2 are added, and the job DAG is updated.
At T2: The controller reads the children of the config path and sees the new configs of job1 and job2, along with the change in the DAG.
At T3: job3 is added and appended to the DAG.
At T4: The refresh starts. The controller sees the configs of job1 and job2 only, but the DAG it reads contains job1, job2, and job3.
In the pipeline, because job3 appears in the DAG but its config is not visible, the controller purges job3 and removes its config from ZK. As a result, job3 is never executed.
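
The timeline above can be sketched with a small toy model. This is not Helix code: `ToyStore`, `ToyController`, and their methods are hypothetical names, and the model only assumes what the description states, namely that job configs and the DAG are read at different times and that a job present in the DAG without a visible config gets purged.

```python
class ToyStore:
    """Hypothetical stand-in for ZK: job configs and the job DAG are separate records."""

    def __init__(self):
        self.configs = {}  # job name -> config
        self.dag = []      # job names currently in the DAG

    def enqueue(self, *jobs):
        # Write each job's config and add the job to the DAG.
        for job in jobs:
            self.configs[job] = {"name": job}
            self.dag.append(job)


class ToyController:
    """Hypothetical controller that reads configs first and the DAG later."""

    def __init__(self, store):
        self.store = store
        self.cached_configs = {}

    def read_configs(self):
        # T2: snapshot the configs that are visible right now.
        self.cached_configs = dict(self.store.configs)

    def run_pipeline(self):
        # T4: read the DAG fresh and purge any job whose config was not in the snapshot.
        purged = []
        for job in list(self.store.dag):
            if job not in self.cached_configs:
                self.store.dag.remove(job)
                self.store.configs.pop(job, None)
                purged.append(job)
        return purged


def one_by_one():
    store = ToyStore()
    controller = ToyController(store)
    store.enqueue("job1")
    store.enqueue("job2")             # T1
    controller.read_configs()         # T2
    store.enqueue("job3")             # T3
    return controller.run_pipeline()  # T4


def batched():
    store = ToyStore()
    controller = ToyController(store)
    store.enqueue("job1", "job2", "job3")  # all jobs land before the config read
    controller.read_configs()
    return controller.run_pipeline()


print(one_by_one())  # jobs purged when enqueued one by one
print(batched())     # jobs purged when enqueued as a batch
```

In this model the one-by-one sequence purges job3, while the batched sequence purges nothing, under the assumption that the whole batch is written before the controller reads the configs.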

To Reproduce

Add multiple jobs to a queue one by one in a loop.

Expected behavior

Jobs should be executed and such race conditions should be avoided.


Labels: bug