Skip to content

fix concurrency#1120

Closed
lucafuji wants to merge 1 commit intoapache:masterfrom
lucafuji:concurrency_fix
Closed

fix concurrency#1120
lucafuji wants to merge 1 commit intoapache:masterfrom
lucafuji:concurrency_fix

Conversation

@lucafuji
Copy link

@lucafuji lucafuji commented Mar 4, 2016

Hi, I change the check from database to base executor to ensure each task_instance will get the consistent view of running instances in memory. Coding style may be not very good. But I will appreciate if you can review the logic, thanks a lot!

pass checks for python3
@mistercrunch
Copy link
Member

Sorry this doesn't work for us as the DB is the source of truth, the executor's state can get out of sync and ultimately we trust what is in the DB...

@lucafuji
Copy link
Author

@mistercrunch
Trusting DB as the source of truth is the root of cause of this issue actually .....
As I mentioned in #1085
If the concurrency is implemented by checking how many running task instances in database, which means it's possible all tasks can start running at the same time because when they query the database, they all get a count of 0.

And I'm wondering how the executor's state can get out of sync? By multiple schedulers?.

Maybe a better solution is using curator/zookeeper's semaphore to solve this problem(each of the task get a resource from the semaphore).

In summary, simply counting the number of running tasks can lead to concurrency issues when multiple task_instance is checking the database at the same time. We need to solve it using other mechanism

@mistercrunch
Copy link
Member

Right, it's a tricky situation as we want to support multiple schedulers in the future, and we certainly don't want to make zookeeper a hard requirement. Having support for multiple executors (and even more in the future), that only leaves the DB as the source of truth, and means we'd have to use it as a message queue almost, which isn't great.

@lucafuji
Copy link
Author

@mistercrunch Just curious how you are going to solve this issue using DB though? thanks

@mistercrunch
Copy link
Member

Yeah I think that's the idea, it may take a little while though. I'm going to write a design proposal and submit it out for review soon.

@landscape-bot
Copy link

Code Health
Repository health decreased by 0.05% when pulling cca807f on lucafuji:concurrency_fix into 0fd94d9 on airbnb:master.

@artwr
Copy link
Contributor

artwr commented Oct 16, 2016

@lucafuji I am not sure we have a clear design for this on our end, but if you still want to be involved feel free to reply. If we do not hear from you, we might close this PR for now as part of a cleanup effort for long standing PRs.

@lucafuji
Copy link
Author

@artwr Feel free to abandon this one, this PR also has race condition issue though.
Looking forward to your new design

@lucafuji lucafuji closed this Oct 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants