Revert "Use a separate gorutine to handle the logic of reconnect" #700

wolfstudy · 2022-01-06T02:37:56Z

Reverts #691

This reverts commit 39e13ac.

Signed-off-by: xiaolongran <xiaolongran@tencent.com>

bschofield · 2022-01-06T12:59:50Z

Just to be clear, this commit doesn't actually revert #691 but instead adds a close channel, right? If so it might be good to update the title.

I do have a separate concern with this approach, which might or might not be justified. It looks to me like operations which were previously serialized by virtue of the single goroutine in runEventsLoop() can now occur concurrently, because they are now running in separate goroutines.

For example: suppose the user calls p.Flush() on the producer, which causes a &flushRequest{} to be enqueued to p.eventsChan. The primary goroutine in runEventsLoop() will then call p.internalFlush(), which will try to write to p.cnx. Suppose that at the same time the broker closes the connection, and the new goroutine calls p.reconnectToBroker(), which tries to change p.cnx whilst p.internalFlush() is still using it.

Does it not matter that p.reconnectToBroker() can now be happening at the same time as p.internalSend() / p.internalFlush() / p.internalClose() / p.internalFlushCurrentBatch(), when previously the use of a single goroutine meant that these functions could not be executing simultaneously? Is there some other lock or feature of the code that means this is OK? Or does this introduce a race condition?

Also, does it matter that (because of goroutine scheduling) the order in which these operations execute can now be different to before?

Would really appreciate your thoughts @wolfstudy / @cckellogg. Apologies in advance if you already considered this possibility and there is no issue.
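To make the serialization argument concrete, here is a minimal Go sketch of a single-goroutine event loop. The names (`loopProducer`, `flushRequest`, `reconnectRequest`, and the integer `cnx` counter) are hypothetical and only loosely modelled on `runEventsLoop()`; they are not the library's actual types. Because only the loop goroutine reads or writes `cnx`, a flush can never observe the connection mid-reconnect, and requests are handled in the order the loop receives them.

```go
package main

// Hypothetical request types; each carries a done channel that the loop
// closes once the request has been handled.
type flushRequest struct{ done chan struct{} }
type reconnectRequest struct{ done chan struct{} }

type loopProducer struct {
	eventsChan chan interface{}
	cnx        int      // hypothetical stand-in for p.cnx: bumped on each reconnect
	log        []string // records the order in which requests were handled
}

// runEventsLoop is the only goroutine that touches cnx and log, so flush and
// reconnect handling can never overlap.
func (p *loopProducer) runEventsLoop() {
	for ev := range p.eventsChan {
		switch e := ev.(type) {
		case *flushRequest:
			p.log = append(p.log, "flush") // placeholder for internalFlush()
			close(e.done)
		case *reconnectRequest:
			p.cnx++ // placeholder for reconnectToBroker() swapping the connection
			p.log = append(p.log, "reconnect")
			close(e.done)
		}
	}
}
```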

bschofield · 2022-01-06T13:24:44Z

If the operations occurring simultaneously could be an issue then it would be easy to add a lock to prevent this.

However, I don't think a lock would prevent the order of operations from changing relative to the previous behaviour?

Signed-off-by: xiaolongran <rxl@apache.org>

### Motivation

In #700, we use a separate goroutine to handle the reconnect logic, so we may encounter the same data race problem as #535.

### Modifications

The conn field is now read and written atomically, avoiding race conditions.
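As a rough illustration of reading and writing a connection field atomically, here is a sketch using `sync/atomic.Value`. The `atomicProducer`, `connection`, `setConn` and `getConn` names are hypothetical, and this is not necessarily how #701 implements the change.

```go
package main

import "sync/atomic"

// connection is a hypothetical stand-in for the broker connection.
type connection struct{ id int }

type atomicProducer struct {
	conn atomic.Value // always holds a *connection once set
}

// setConn publishes a new connection, e.g. from the reconnect goroutine.
func (p *atomicProducer) setConn(c *connection) { p.conn.Store(c) }

// getConn returns the current connection, or nil if none has been stored yet.
func (p *atomicProducer) getConn() *connection {
	c, _ := p.conn.Load().(*connection)
	return c
}
```

Atomic loads and stores remove the data race on the field itself, but a caller that has already loaded the old connection keeps using it, so this alone does not serialize a flush against a concurrent reconnect.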

xiaolong ran and others added 2 commits January 6, 2022 10:37

Revert "Use a separate gorutine to handle the logic of reconnect (#691)"

accf726

This reverts commit 39e13ac.

add closeCh for go rutine leak

536d882

Signed-off-by: xiaolongran <xiaolongran@tencent.com>
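For context on the close-channel pattern that commit 536d882 adds, here is a minimal Go sketch. The type and field names (`reconnector`, `connectClosedCh`, `closeCh`, `reconnects`) are hypothetical stand-ins, not the actual producer fields; the point is only that the reconnect goroutine also selects on a shutdown channel, so it can exit instead of blocking forever once the producer is closed.

```go
package main

import "sync"

// reconnector is a hypothetical stand-in for the producer's reconnect machinery.
type reconnector struct {
	connectClosedCh chan struct{} // signalled when the broker drops the connection (unbuffered for simplicity)
	closeCh         chan struct{} // closed exactly once when the producer shuts down
	wg              sync.WaitGroup
	reconnects      int // only touched by the reconnect goroutine until wg.Wait returns
}

func newReconnector() *reconnector {
	return &reconnector{
		connectClosedCh: make(chan struct{}),
		closeCh:         make(chan struct{}),
	}
}

// start launches the background goroutine that waits for either a
// connection-closed event or producer shutdown.
func (r *reconnector) start() {
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		for {
			select {
			case <-r.closeCh:
				// Without this case the goroutine would block forever on
				// connectClosedCh after the producer is closed: a leak.
				return
			case <-r.connectClosedCh:
				r.reconnects++ // placeholder for reconnectToBroker()
			}
		}
	}()
}

// stop closes closeCh and waits for the goroutine to exit.
func (r *reconnector) stop() {
	close(r.closeCh)
	r.wg.Wait()
}
```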

wolfstudy requested a review from cckellogg January 6, 2022 02:52

wolfstudy self-assigned this Jan 6, 2022

wolfstudy added this to the v0.8.0 milestone Jan 6, 2022

wolfstudy requested a review from zymap January 6, 2022 02:56

cckellogg approved these changes Jan 6, 2022


wolfstudy mentioned this pull request Jan 6, 2022

Reconnection blocked in producer by request timed out #697

Closed

wolfstudy merged commit ff7a962 into master Jan 6, 2022

wolfstudy mentioned this pull request Jan 11, 2022

Fix data race while accessing connection in partitionProducer #701

Merged
