Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TINKERPOP-2217 #1114

Merged
merged 1 commit into from
May 20, 2019
Merged

TINKERPOP-2217 #1114

merged 1 commit into from
May 20, 2019

Conversation

danielcweber
Copy link
Contributor

https://issues.apache.org/jira/browse/TINKERPOP-2217

Fix potentially harmful timing issue: _writeInProgress could be observed by BeginSendingMessages to indicate that the loop in SendMessagesFromQueueAsync is still "in flight" while in reality, it has already exited.

Copy link
Member

@FlorianHockmann FlorianHockmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this, @danielcweber! LGTM in general, I only commented on 2 nits.

gremlin-dotnet/src/Gremlin.Net/Driver/Connection.cs Outdated Show resolved Hide resolved
SendMessagesFromQueueAsync().Forget();
_writeQueue.Enqueue(message);

if (Interlocked.Increment(ref _writeInProgress) == 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nitpick) I don't have a strong opinion on this, but an (integer) flag that can only have two values to represent the two states seems simpler to me than a counter that is only compared to 0 and 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only that it's wrong. The send-while-loop may exit, meanwhile the queuing observes the flag to be 1, not starting the loop. After that, the flag is reset, but it's too late. The work item may never be processed.

Copy link
Contributor Author

@danielcweber danielcweber May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't help IMO to include CompareExchange into the while loop. When it's decided to exit the loop, another queueing attempt may already have observed the queue to be running, and work may be lost.

That being said, I'm not particularly awesome at lock free programming. I think we could also just use ordinary locks in all those places and be good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, never mind. I though that Decrement could be replaced by the previous version with CompareExchange, but that would exit the while loop after a single iteration. Since you have to evaluate the state of _writeInProgress in every iteration, it really needs to be a counter instead of a flag.
It's really a bit confusing that _writeInProgress is now effectively a counter and a status flag at the same time, but I also don't know how we could solve that differently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(My answer actually only addressed your first comment here as I wrote it before seeing your second comment.)

That being said, I'm not particularly awesome at lock free programming. I think we could also just use ordinary locks in all those places and be good.

Me neither, but how could we implement this with ordinary locks without blocking multiple task waiting to get the lock? The current implementation ensures that a single task can send all messages in the queue and do that in the background.
But it would in general be good if @jorgebay could also review this as he definitely has more experience than I do in this area.

Copy link
Contributor Author

@danielcweber danielcweber May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not a status flag ever, it's only a counter. The comparison == 1 just asserts whether the increment (= work item queue) is the first one when idle, and thus has to start the loop. Same for the check on 0. We must ensure that we really drained the queue and decremented the counter to zero to be able to leave the loop.

What's non optimal is that we now have two sources of truth of the count of work items: The queue itself and the counter. EDIT: That's why I put the increment and enqueue together as a unit. There should now never be discrepancy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me neither, but how could we implement this with ordinary locks without blocking multiple task waiting to get the lock?

That's not a big problem, you don't have to go lock free to be async. Also, you don't have to block inside of locks, and still you can maintain async code. Stephen Cleary has a wonderful library of async synchronization primitives that could help a lot here, but I'm not sure whether you'd want to take a dependency on a 3rd party lib.

@jorgebay
Copy link
Contributor

Nice catch @danielcweber !

I think we can fix the race condition by peeking after yielding the _writeInProgress flag:

        private async Task SendMessagesFromQueueAsync()
        {
            // ...
            Interlocked.CompareExchange(ref _writeInProgress, 0, 1);

            // Since the loop ended and the write in progress was set to 0
            // a new item could have been added, write queue can contain items at this time
            if (!_writeQueue.IsEmpty && Interlocked.CompareExchange(ref _writeInProgress, 1, 0) == 1)
            {
                await SendMessagesFromQueueAsync().ConfigureAwait(false);
            }
        }

I personally would like this change to the approach proposed in this patch because it requires less effort to review 3 lines of code than 20+ in this sensitive part of the driver. Would you like to change your patch?

@danielcweber
Copy link
Contributor Author

Can be done. Also, you might wanna check out my alternative approach, because actually all we need is async locking, we don't need no send-queue at all.

@jorgebay
Copy link
Contributor

thanks @danielcweber!

VOTE +1

…ved by BeginSendingMessages to indicate that the loop in SendMessagesFromQueueAsync is still "in flight" while in reality, it has already exited.
@FlorianHockmann
Copy link
Member

VOTE +1

@FlorianHockmann FlorianHockmann merged commit 0d38dba into apache:master May 20, 2019
@FlorianHockmann
Copy link
Member

Merged via CTR.

@danielcweber danielcweber deleted the TINKERPOP-2217 branch May 20, 2019 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants