Failed load job is rescheduled when Leadership switch #18581

Open
liiuzq-xiaobai opened this issue Apr 18, 2024 · 0 comments
Labels
type-bug This issue is about a bug


Alluxio Version:
v2.9.3

Describe the bug
When a load job submitted through LoadV2 (alluxio fs load xxxx --submit) ends in failure, the failed job status is not persisted to the journal. If a master-slave switch happens afterwards, the new master reschedules the stale failed load job. In many production environments, rescheduling old failed jobs loads a batch of unnecessary data, which can seriously affect cluster stability.
Furthermore, the original design does not appear to expect failed jobs to be rescheduled (see "Additional context" for details), so this should be a bug.

To Reproduce
First, submit a load job with LoadV2 and let the load job fail.
image
Second, switch the master and check the status of the load job. The job has been rescheduled.
image

Expected behavior
After a master-slave switch, failed jobs should not be rescheduled.
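
The difference between the observed and the expected behavior can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Journal`, `Master`, `journal_failures`, and the path `/data/a` are hypothetical names invented for illustration and are not Alluxio classes or APIs. It assumes that a newly elected master rebuilds job state by replaying the journal and reschedules every job whose last journaled state is not terminal.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


# States after which a job should never be scheduled again.
TERMINAL_STATES = {JobState.SUCCEEDED, JobState.FAILED}


@dataclass
class Journal:
    """Hypothetical append-only journal shared by all masters."""
    entries: list = field(default_factory=list)

    def append(self, path, state):
        self.entries.append((path, state))


class Master:
    """Toy master. journal_failures=False models the behavior reported here."""

    def __init__(self, journal, journal_failures):
        self.journal = journal
        self.journal_failures = journal_failures
        self.jobs = {}

    def submit(self, path):
        self.jobs[path] = JobState.RUNNING
        self.journal.append(path, JobState.RUNNING)

    def fail(self, path):
        self.jobs[path] = JobState.FAILED
        # Assumption: the reported bug corresponds to skipping this journal write.
        if self.journal_failures:
            self.journal.append(path, JobState.FAILED)

    def become_leader(self):
        """Replay the journal and return the paths that would be rescheduled."""
        self.jobs = {}
        for path, state in self.journal.entries:
            self.jobs[path] = state  # the last journaled state wins
        return sorted(p for p, s in self.jobs.items() if s not in TERMINAL_STATES)


def jobs_rescheduled_after_failover(journal_failures):
    journal = Journal()
    old_master = Master(journal, journal_failures)
    old_master.submit("/data/a")
    old_master.fail("/data/a")
    # Leadership switch: a fresh master only sees what was journaled.
    new_master = Master(journal, journal_failures)
    return new_master.become_leader()
```

In this model, `jobs_rescheduled_after_failover(False)` returns the failed job's path, matching the behavior described above, while `jobs_rescheduled_after_failover(True)` returns an empty list, matching the expected behavior.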

Urgency
Affects cluster stability after master-slave switching

Are you planning to fix it
Yes

Additional context
WeCom screenshot_97e28ba5-51fb-49f2-8584-3c994ffb3c01
WeCom screenshot_bc1de378-908a-4315-b514-e8d6b0dd5a6d
To be clear, the original design of this feature intends that a job with a definite success or failure status is not scheduled again after a leadership switch.

liiuzq-xiaobai added the type-bug label on Apr 18, 2024