Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make update_execution() atomic #5358

Merged

Conversation

khushboobhatia01
Copy link
Contributor

Making update_execution() atomic to avoid concurrent updates causing inconsistency.

In the below screenshot two concurrent updates to the execution object resulted in inconsistent data where overall execution is set to running, but the log field depicts the execution had succeeded.
Screenshot 2021-09-08 at 1 20 13 PM

How did this happen?
Two interleaved update_execution() can cause this.

P1 P2
liveaction.status = running liveaction.status = succeeded
execution.status = running execution.status = running
if condition is not valid. Move forward. if condition is not valid. Move forward.
if condition is not valid. No log is pushed if condition is valid. succeeded log is pushed
Action update happens
Action update happens

After the above interleaved execution, P1 will overwrite overall execution status (succeeded set by P2) to running.

@pull-request-size pull-request-size bot added the size/M PR that changes 30-99 lines. Good size to review. label Sep 14, 2021
@CLAassistant
Copy link

CLAassistant commented Sep 15, 2021

CLA assistant check
All committers have signed the CLA.

@pull-request-size pull-request-size bot added size/L PR that changes 100-499 lines. Requires some effort to review. and removed size/M PR that changes 30-99 lines. Good size to review. labels Sep 15, 2021
Copy link
Member

@cognifloyd cognifloyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically, all you did was add one line to create the with ...get_lock(...): line, and then indent then block of code to protect it within the lock. And then add some more to the tests. Nice.

Why did adding 1 lock in 1 method increase the number of lock side effects so much in the tests?

with Timer(key="action.executions.calculate_result_size"):
result_size = len(
ActionExecutionDB.result._serialize_field_value(liveaction_db.result)
with coordination.get_coordinator().get_lock(liveaction_db.id):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a prefix to this to scope the change just to this block? Or are there other places already using this the live action id as the lock name that you also want to block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cognifloyd No, get_lock() is not being used anywhere else.
And to answer why tests are failing, https://github.com/StackStorm/st2/blob/master/st2actions/tests/unit/test_workflow_engine.py#L155-L159 mocks get_lock() to return ToozConnectionError (to test

with coord_svc.get_coordinator(start_heart=True).get_lock(wf_ex_id):
).

Because of the above mocking, action execution update calls will fail with my changes and cause the test assertions to fail. I'm working on the fix for this.

@khushboobhatia01
Copy link
Contributor Author

@cognifloyd All the tests are passing now. Could you please review and merge the request? Thanks

Copy link
Member

@cognifloyd cognifloyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM I'm not sure if this will get merged for 3.6.0 or if we'll wait till 3.7.0 to include it.

@cognifloyd cognifloyd added this to To Do in StackStorm v3.6.0 via automation Oct 5, 2021
@cognifloyd cognifloyd added this to the 3.6.0 milestone Oct 5, 2021
@arm4b
Copy link
Member

arm4b commented Oct 5, 2021

I remember there is another PR that is more involved related to changing the behavior in st2workflowengine #5367.
This one #5358 looks like an easy hotfix, tests are good and no blockers here. It would be really nice if we could include it for the v3.6.0.

@khushboobhatia01 Could you please include the Changelog record as well for this PR to make it complete?

@cognifloyd
Copy link
Member

I added a changelog entry

@arm4b
Copy link
Member

arm4b commented Oct 5, 2021

Thanks, looks we're all good ✔️ on this one!

@cognifloyd cognifloyd merged commit f4fcf13 into StackStorm:master Oct 5, 2021
StackStorm v3.6.0 automation moved this from To Do to Done Oct 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/L PR that changes 100-499 lines. Requires some effort to review.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants