
refetch the backfill before updating, to avoid clobbering cancels #7094

Merged
prha merged 2 commits on Mar 17, 2022


@prha (Member) commented Mar 16, 2022

Summary

The part of the backfill daemon loop that can take a while is actually creating and submitting the runs after reading the partitions from the backfill job.

This diff re-fetches the backfill job just before writing state back. That minimizes the window in which the backfill could be canceled and then have that cancellation clobbered by the checkpoint write.
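The refetch-before-write idea can be illustrated with a toy, in-memory model. Everything below (`BackfillStore`, `Backfill`, `run_iteration`, the status values) is hypothetical and is not Dagster's storage API; the sketch only assumes that the backfill is stored as one serialized object that each update overwrites wholesale, so writing back a stale copy resets its status.

```python
from enum import Enum
from typing import Callable, Dict, List, NamedTuple, Optional


class BackfillStatus(Enum):
    REQUESTED = "REQUESTED"
    CANCELED = "CANCELED"


class Backfill(NamedTuple):
    """Hypothetical stand-in for a backfill job serialized into a single row."""

    backfill_id: str
    status: BackfillStatus
    last_submitted_partition: Optional[str]


class BackfillStore:
    """Hypothetical storage whose update overwrites the entire serialized object."""

    def __init__(self) -> None:
        self._rows: Dict[str, Backfill] = {}

    def get_backfill(self, backfill_id: str) -> Backfill:
        return self._rows[backfill_id]

    def update_backfill(self, backfill: Backfill) -> None:
        self._rows[backfill.backfill_id] = backfill


def run_iteration(
    store: BackfillStore,
    backfill_id: str,
    partitions: List[str],
    submit_run: Callable[[str], None],
    refetch: bool,
) -> None:
    backfill = store.get_backfill(backfill_id)
    if not partitions:
        return
    for partition in partitions:
        submit_run(partition)  # the slow step: a cancel may land while this runs
    if refetch:
        # Re-read just before writing, so a status change made during submission is kept.
        backfill = store.get_backfill(backfill_id)
    store.update_backfill(backfill._replace(last_submitted_partition=partitions[-1]))


def demo(refetch: bool) -> BackfillStatus:
    store = BackfillStore()
    store.update_backfill(Backfill("bf1", BackfillStatus.REQUESTED, None))

    def submit_run(partition: str) -> None:
        # Simulate a user canceling the backfill while runs are being submitted.
        if partition == "2022-03-02":
            current = store.get_backfill("bf1")
            store.update_backfill(current._replace(status=BackfillStatus.CANCELED))

    run_iteration(store, "bf1", ["2022-03-01", "2022-03-02"], submit_run, refetch)
    return store.get_backfill("bf1").status


print(demo(refetch=False))  # BackfillStatus.REQUESTED: the cancel was clobbered
print(demo(refetch=True))   # BackfillStatus.CANCELED
```

Even with the re-fetch, a cancel that lands between the re-read and the write can still be overwritten; the re-fetch only narrows that window.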

Helps address #7090

Test Plan

BK (Buildkite)


@prha prha requested review from sryza and yuhan March 16, 2022 21:55
@prha prha requested a review from gibsondan March 17, 2022 18:26
@gibsondan (Member) left a comment


The other angle here would be to change our schema so that we don't update the status unless we are specifically changing the status as part of the write, right? Otherwise there's still a potential race if the cancellation comes in between the read and the write (though it's much less frequent). That would probably require us to add more locking or abandon the serialized-namedtuple-in-the-db pattern.
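One way to picture the schema change suggested above is to keep `status` in its own column, outside the serialized body, so that checkpoint writes never touch it. The minimal sketch below uses Python's built-in `sqlite3`; the table layout and the function names are hypothetical and are not Dagster's actual schema.

```python
import json
import sqlite3
from typing import Optional, Tuple


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: status lives in its own column, separate from the serialized body.
    conn.execute(
        "CREATE TABLE backfills (id TEXT PRIMARY KEY, status TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def create_backfill(conn: sqlite3.Connection, backfill_id: str) -> None:
    conn.execute(
        "INSERT INTO backfills (id, status, body) VALUES (?, ?, ?)",
        (backfill_id, "REQUESTED", json.dumps({"last_submitted_partition": None})),
    )


def write_checkpoint(conn: sqlite3.Connection, backfill_id: str, partition: str) -> None:
    # Touches only the body column, so it cannot overwrite a concurrent status change.
    conn.execute(
        "UPDATE backfills SET body = ? WHERE id = ?",
        (json.dumps({"last_submitted_partition": partition}), backfill_id),
    )


def cancel_backfill(conn: sqlite3.Connection, backfill_id: str) -> None:
    # Only explicit status changes write to the status column.
    conn.execute("UPDATE backfills SET status = 'CANCELED' WHERE id = ?", (backfill_id,))


def get_backfill(conn: sqlite3.Connection, backfill_id: str) -> Tuple[str, dict]:
    row: Optional[tuple] = conn.execute(
        "SELECT status, body FROM backfills WHERE id = ?", (backfill_id,)
    ).fetchone()
    if row is None:
        raise KeyError(backfill_id)
    return row[0], json.loads(row[1])


conn = make_db()
create_backfill(conn, "bf1")
# Even if the cancel lands between the daemon's read and its checkpoint write,
# the checkpoint write leaves the status column alone.
cancel_backfill(conn, "bf1")
write_checkpoint(conn, "bf1", "2022-03-02")
print(get_backfill(conn, "bf1"))  # ('CANCELED', {'last_submitted_partition': '2022-03-02'})
```

The trade-off is the one raised in the comment: the status no longer lives inside the serialized object, so reads have to reassemble it from two columns.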

@prha (Member, Author) commented Mar 17, 2022

Yeah, I did consider that... but it seemed like a lot to pull the status out of the body and/or reconcile it. Changing the API for updating the object is still on the table, though.
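One possible shape for a changed update API (an assumption for illustration, not what this PR implemented) is an optimistic compare-and-set: each stored backfill carries a version number, and a write based on a stale read is rejected instead of silently overwriting a cancel. `VersionedBackfillStore` and `update_if_version` are hypothetical names.

```python
import threading
from typing import Dict, NamedTuple, Optional, Tuple


class Backfill(NamedTuple):
    backfill_id: str
    status: str
    last_submitted_partition: Optional[str]


class VersionedBackfillStore:
    """Hypothetical store whose update API rejects writes based on a stale read."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: Dict[str, Tuple[int, Backfill]] = {}  # id -> (version, backfill)

    def put_new(self, backfill: Backfill) -> None:
        with self._lock:
            self._rows[backfill.backfill_id] = (0, backfill)

    def get(self, backfill_id: str) -> Tuple[int, Backfill]:
        with self._lock:
            return self._rows[backfill_id]

    def update_if_version(self, backfill: Backfill, expected_version: int) -> bool:
        # Compare-and-set: write only if nobody else has written since our read.
        with self._lock:
            version, _ = self._rows[backfill.backfill_id]
            if version != expected_version:
                return False
            self._rows[backfill.backfill_id] = (version + 1, backfill)
            return True


store = VersionedBackfillStore()
store.put_new(Backfill("bf1", "REQUESTED", None))

version, backfill = store.get("bf1")  # the daemon reads the backfill

# A cancel lands between the daemon's read and its checkpoint write.
cancel_version, current = store.get("bf1")
store.update_if_version(current._replace(status="CANCELED"), cancel_version)

ok = store.update_if_version(backfill._replace(last_submitted_partition="2022-03-02"), version)
print(ok)                          # False: the stale checkpoint write is rejected
print(store.get("bf1")[1].status)  # CANCELED
```

With an API like this, the daemon would re-read and then retry or stop when `update_if_version` returns `False`, which closes the read/write race without pulling the status out of the serialized object.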

@prha prha merged commit ffd8318 into master Mar 17, 2022
@prha prha deleted the prha/backfill_refetch branch March 17, 2022 19:22