Disabled removing lock for orphaned #4886

Open
wants to merge 1 commit into
base: master
from


@guzzijones
Contributor

guzzijones commented Mar 15, 2020

Fixes #4887

This change stopped my execution queue item from being requested more than once for long-running requests.
Is there a config setting for the timing of this?

I am not sure if this is needed since there is retry code in the code base.

@pull-request-size pull-request-size bot added the size/XS label Mar 15, 2020
@CLAassistant

CLAassistant commented Mar 15, 2020

CLA assistant check
All committers have signed the CLA.

@blag blag requested a review from m4dcoder Mar 17, 2020
@punkrokk punkrokk added this to the 3.2.0 milestone Mar 21, 2020
@punkrokk

Contributor

punkrokk commented Mar 21, 2020

@guzzijones Can you expand your explanation of why you commented out this line? I will ping @m4dcoder again for feedback; I think I saw a note that he was able to reproduce it. I just had it happen many times in one of my environments.

@armab armab removed this from the 3.2.0 milestone Mar 21, 2020
@guzzijones

Contributor Author

guzzijones commented Mar 22, 2020

It prevents the action from redeploying on exit. Large JSON payloads take a while to update.

@guzzijones

Contributor Author

guzzijones commented Mar 22, 2020

More info?

@blag blag added this to the 3.2.0 milestone Mar 24, 2020
@guzzijones

Contributor Author

guzzijones commented Mar 24, 2020

@blag mentioned we need a unit test for this PR. Before I spend a bunch of time on that, can someone please at least bless the approach here?
My PR basically negates this entire function.
It is probably better to figure out why we end up here in the first place if the action has still not finished writing its changes to the database.
I have the issue listed above if someone wants to verify that we should nuke this function.

@m4dcoder

Contributor

m4dcoder commented Mar 24, 2020

@punkrokk I don't know where you read that I was able to reproduce this. This is the first time I have looked at issue #4887 and the solution proposed here. I have not had a chance to review it and make the connection between the issue and how this solution fixes it.

Contributor

m4dcoder left a comment

The scheduler's garbage collection actively looks for action executions that have been locked for a long time and not released. This can happen when a scheduler terminates abnormally before releasing its locks. The time before GC releases such a lock is set by the constant EXECUTION_SCHEDUELING_TIMEOUT_THRESHOLD_MS, which is currently not configurable. Since this issue is specific to the end user (large input that delays scheduling), the solution should be to make EXECUTION_SCHEDUELING_TIMEOUT_THRESHOLD_MS configurable so the end user can adjust the value as needed.
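
For illustration only, here is a rough sketch of what a configurable threshold could look like. This is not st2's actual scheduler code: `QueueItem`, `find_orphaned`, `release_orphaned`, and the default value of 60 seconds are all hypothetical names and numbers chosen for the example. The only part taken from the comment above is the idea that an item locked for longer than the threshold has its lock released by GC.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical default. The real value lives in the constant
# EXECUTION_SCHEDUELING_TIMEOUT_THRESHOLD_MS and is not reproduced here.
DEFAULT_SCHEDULING_TIMEOUT_MS = 60 * 1000


@dataclass
class QueueItem:
    """Illustrative stand-in for an execution queue item (not the st2 model)."""
    id: str
    handling: bool               # True while a scheduler holds the lock
    locked_at_ms: Optional[int]  # when the lock was taken, in epoch milliseconds


def find_orphaned(items: List[QueueItem], now_ms: int,
                  timeout_ms: int = DEFAULT_SCHEDULING_TIMEOUT_MS) -> List[QueueItem]:
    """Return items that have been locked for longer than timeout_ms."""
    return [
        item for item in items
        if item.handling
        and item.locked_at_ms is not None
        and now_ms - item.locked_at_ms > timeout_ms
    ]


def release_orphaned(items: List[QueueItem], now_ms: int,
                     timeout_ms: int = DEFAULT_SCHEDULING_TIMEOUT_MS) -> List[QueueItem]:
    """Release the lock on every orphaned item and return the released items."""
    orphaned = find_orphaned(items, now_ms, timeout_ms)
    for item in orphaned:
        item.handling = False
        item.locked_at_ms = None
    return orphaned
```

With a parameter like `timeout_ms` read from configuration, a user with large inputs could raise the threshold so that a slow but healthy scheduler does not lose its lock.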

@m4dcoder

Contributor

m4dcoder commented Mar 24, 2020

The alternative solution is to add service discovery for all the st2 components and, instead of blindly releasing the lock, first check whether the scheduler that is processing the action execution is still healthy and alive. This takes more work, though.
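
As a rough sketch of that alternative, assuming some service discovery layer can answer "is this scheduler alive?": the `LockedItem` class, the `scheduler_id` field, and the `is_alive` callback below are hypothetical and stand in for whatever st2 would actually provide. The lock is released only when it is past the threshold and its owner is not reported healthy.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class LockedItem:
    """Illustrative queue item that records which scheduler holds its lock."""
    id: str
    scheduler_id: Optional[str]  # hypothetical owner field; None means unlocked
    locked_at_ms: int            # when the lock was taken, in epoch milliseconds


def release_if_owner_dead(items: Iterable[LockedItem], now_ms: int, timeout_ms: int,
                          is_alive: Callable[[str], bool]) -> List[LockedItem]:
    """Release stale locks only when the owning scheduler is not alive.

    is_alive is an assumed hook into a service discovery backend.
    """
    released = []
    for item in items:
        if item.scheduler_id is None:
            continue  # not locked, nothing to release
        if now_ms - item.locked_at_ms <= timeout_ms:
            continue  # lock is still within the allowed window
        if is_alive(item.scheduler_id):
            continue  # owner is slow but healthy, so leave the lock alone
        item.scheduler_id = None
        released.append(item)
    return released
```

The extra health check is what keeps a long-running but healthy scheduler from having its work rescheduled, at the cost of needing the discovery service in the first place.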

@guzzijones

Contributor Author

guzzijones commented Mar 24, 2020

Thanks @m4dcoder. I will redo this PR at some point soon, hopefully. That makes a lot more sense.

@guzzijones

Contributor Author

guzzijones commented Mar 25, 2020

Yes, this is also breaking the unit test for this method. Working on this today. I will probably have to close this request and point it to a new one, as I made this change in the GUI through github.com.

@blag

Contributor

blag commented Mar 25, 2020

You don't have to close this and create a new PR, you can just:

# Fetch changes from all branches from GitHub
git fetch --all
# Checkout the master branch
git checkout master
# Pull in all changes to your local master branch
git pull
# Switch back to your patch-5 branch
git checkout patch-5
# Rebase back on top of the master branch
git rebase master

And then you can just continue development in this branch as normal.

6 participants