Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Use a separate gorutine to handle the logic of reconnect" #700

Merged
merged 2 commits into from
Jan 6, 2022

Conversation

wolfstudy
Copy link
Member

Reverts #691

xiaolong ran and others added 2 commits January 6, 2022 10:37
Signed-off-by: xiaolongran <xiaolongran@tencent.com>
@wolfstudy wolfstudy self-assigned this Jan 6, 2022
@wolfstudy wolfstudy added this to the v0.8.0 milestone Jan 6, 2022
@wolfstudy wolfstudy requested a review from zymap January 6, 2022 02:56
@bschofield
Copy link
Contributor

bschofield commented Jan 6, 2022

Just to be clear, this commit doesn't actually revert #691 but instead adds a close channel, right? If so it might be good to update the title.

I do have a separate concern with this approach, which might or might not be justified. It looks to me like operations which were previously serialized by virtue of the single goroutine in runEventsLoop() can now occur concurrently, because they are now running in separate goroutines.

For example: suppose the user calls p.Flush() on the producer, which causes a &flushRequest{} to be enqueued to p.eventsChan. The primary goroutine in runEventsLoop() will then call p.internalFlush() which will try to write to p.cnx. Suppose that at the same time, the broker closes the connection. and the new goroutine calls p.reconnectToBroker(), which tries to change p.cnx whilst p.internalFlush() is still using it.

Does it not matter that p.reconnectToBroker() can now be happening at the same time as p.internalSend() / p.internalFlush() / p.internalClose() / p.internalFlushCurrentBatch(), when previously the use of a single goroutine meant that these functions could not be executing simultaneously? Is there some other lock or feature of the code that means this is OK? Or does this introduce a race condition?

Also, does it matter that (because of goroutine scheduling) the order in which these operations execute can now be different to before?

Would really appreciate your thoughts @wolfstudy / @cckellogg. Apologies in advance if you already considered this possibility and there is no issue.

@bschofield
Copy link
Contributor

bschofield commented Jan 6, 2022

If the operations occurring simultaneously could be an issue then it would be easy to add a lock to prevent this.

However I don't think that would prevent the order of operations from being changed from the previous behaviour?

wolfstudy added a commit that referenced this pull request Jan 12, 2022
Signed-off-by: xiaolongran <rxl@apache.org>

### Motivation

In #700, we use a separate go rutine to handle the logic of reconnect, so here you may encounter the same data race problem as #535

### Modifications

Now, the conn field is read and written atomically; avoiding race conditions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants