THRIFT-3060: clear the offline queue when written #1777

razvanz · 2019-04-05T15:18:45Z

This PR fixes an issue in the Node.js module: because the offline_queue was never emptied, queued data was sent again every time the socket reconnected.

The fix clears the offline_queue once the socket has reconnected and the queued data has been written.
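The behavior can be illustrated with a minimal sketch. The `QueueingConnection` class and its `onConnect`/`onClose` methods are hypothetical stand-ins for illustration, not the actual Thrift connection code; only the `offline_queue` name comes from this PR.

```javascript
// Hypothetical stand-in for the connection; not the actual Thrift source.
class QueueingConnection {
  constructor() {
    this.connected = false;
    this.offline_queue = [];
    this.sent = []; // records what would have been written to the socket
  }

  write(data) {
    if (!this.connected) {
      this.offline_queue.push(data); // buffer while disconnected
      return;
    }
    this.sent.push(data);
  }

  // Called each time the socket (re)connects.
  onConnect() {
    this.connected = true;
    const pending = this.offline_queue;
    this.offline_queue = []; // the fix: empty the queue once it is flushed
    pending.forEach((data) => this.sent.push(data));
  }

  onClose() {
    this.connected = false;
  }
}

const conn = new QueueingConnection();
conn.write('a');
conn.write('b');
conn.onConnect();
conn.onClose();
conn.onConnect(); // a second reconnect must not resend 'a' and 'b'
console.log(conn.sent);
```

Without the line that resets `offline_queue`, the second `onConnect()` in this sketch would push both messages into `sent` again, which is the duplication described above.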

Fixes THRIFT-3060.

jeking3

The CI issues have been resolved, so I would suggest rebasing on master to get a clean run. The logic looks good to me for solving the described issue. The offline queue as a whole gives me some concerns, though:

1. What happens if a disconnect occurs during the base class write? Does the message get put into the offline queue? I don't see how it would, but since calling write() can detect the disconnect, wouldn't one want that message placed into the offline queue in that situation?
2. A successful call to write() does not mean the data was delivered. A disconnect after write() returns, but before the OS can push the data out the door, drops anything queued for outbound send. So if folks rely on this mechanism to guarantee delivery of a queued request, they are not getting it. This is one of the tough parts of asynchronous messaging that is usually overlooked.
3. I'm not sure this logic of tolerating disconnects and queueing up outbound requests has any limits. What happens if the client queues up 5000 queries for the server and slams it on connect? (Not a concern for your PR, but something to think about.) I would consider moving this logic to another transport layer that wraps the connection, such as a disconnect_tolerant_connection with a queue size limit.
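As a rough illustration of the last point, a wrapper with a bounded queue might look like the sketch below. The `DisconnectTolerantConnection` class, its `send` function argument, and the `maxQueueSize` parameter are all hypothetical; nothing like this exists in the Thrift codebase as far as this thread shows.

```javascript
// Hypothetical wrapper; the class name and maxQueueSize are illustrative only.
class DisconnectTolerantConnection {
  constructor(send, maxQueueSize) {
    this.send = send; // function that performs the real write
    this.maxQueueSize = maxQueueSize;
    this.connected = false;
    this.queue = [];
  }

  // Returns true if the request was sent or queued, false if it was rejected.
  write(data) {
    if (this.connected) {
      this.send(data);
      return true;
    }
    if (this.queue.length >= this.maxQueueSize) {
      return false; // queue is full: let the caller decide what to do
    }
    this.queue.push(data);
    return true;
  }

  // Flush the queue once, then forget it.
  onConnect() {
    this.connected = true;
    const pending = this.queue;
    this.queue = [];
    pending.forEach((data) => this.send(data));
  }

  onClose() {
    this.connected = false;
  }
}

const delivered = [];
const tolerant = new DisconnectTolerantConnection((d) => delivered.push(d), 2);
const accepted = ['q1', 'q2', 'q3'].map((q) => tolerant.write(q));
tolerant.onConnect();
console.log(accepted, delivered);
```

Rejecting writes once the limit is reached is only one possible policy; dropping the oldest entry or surfacing an error would be equally reasonable choices.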

razvanz · 2019-05-14T14:32:06Z

@jeking3 Having looked into the details, all three of your points seem to be valid issues. When write is called without the callback interface, the message is placed into internal queues. This means that if a disconnect happens, there is no visibility into what was delivered and what was left pending in those internal queues.

The solution is to use the callback interface of the Socket.write method, which gives visibility into what gets delivered and what does not. The offline_queue managed by this client adds yet another layer of queuing, which brings few benefits for the complexity it introduces, so it should be avoided. That is a separate discussion, however, and will require a bigger change.

This PR is meant to address a smaller issue with the current implementation, and I think it could be merged in the meantime, until a bigger redesign is ready. Let me know what you think 😃.
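A minimal sketch of that idea follows. It assumes the `write(data, callback)` shape of Node's `net.Socket`, but uses a hypothetical `StubSocket` so the example runs without a network; `TrackedWriter` is likewise an illustrative name, not part of Thrift. Note that the callback only signals that the data was handed off, not that the peer received it.

```javascript
// Stub that mimics the write(data, callback) shape of net.Socket.
class StubSocket {
  constructor() {
    this.callbacks = [];
  }

  write(data, callback) {
    this.callbacks.push(callback); // hold the callback until "flushed"
    return true;
  }

  // Simulate the first n pending writes being flushed.
  flush(n) {
    this.callbacks.splice(0, n).forEach((cb) => cb());
  }
}

// Hypothetical helper that tracks writes until their callback fires.
class TrackedWriter {
  constructor(socket) {
    this.socket = socket;
    this.inFlight = new Map(); // id -> data not yet confirmed as flushed
    this.nextId = 0;
  }

  write(data) {
    const id = this.nextId++;
    this.inFlight.set(id, data);
    this.socket.write(data, (err) => {
      if (!err) this.inFlight.delete(id); // confirmed: stop tracking
    });
  }

  // Messages whose write callback has not fired yet.
  unconfirmed() {
    return Array.from(this.inFlight.values());
  }
}

const socket = new StubSocket();
const writer = new TrackedWriter(socket);
['m1', 'm2', 'm3'].forEach((m) => writer.write(m));
socket.flush(2); // a disconnect at this point would leave 'm3' unconfirmed
console.log(writer.unconfirmed());
```

If a disconnect occurred after the flush in this sketch, the caller would know that only the entries returned by `unconfirmed()` still need attention.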

jeking3 · 2019-05-14T18:26:24Z

I agree with you that this is an improvement. My comments were really about the offline queue in general, and how it isn't guaranteeing anything. I'm not sure something like that belongs in the Thrift transport implementation at all. Given that clients are already asynchronous, they should implement their own request timeouts, re-submit requests, and design their systems around idempotent requests.
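One way to make re-submitted requests safe is to key them by a request ID on the receiving side. The sketch below is only an illustration of that general pattern; `IdempotentHandler` and the `req-1` ID are hypothetical and not something Thrift provides.

```javascript
// Hypothetical server-side handler that remembers results by request id.
class IdempotentHandler {
  constructor(apply) {
    this.apply = apply; // function that performs the actual side effect
    this.results = new Map(); // requestId -> result already produced
  }

  handle(requestId, payload) {
    if (this.results.has(requestId)) {
      return this.results.get(requestId); // duplicate: replay stored result
    }
    const result = this.apply(payload);
    this.results.set(requestId, result);
    return result;
  }
}

let balance = 0;
const handler = new IdempotentHandler((amount) => {
  balance += amount;
  return balance;
});

const first = handler.handle('req-1', 10);
const retry = handler.handle('req-1', 10); // client re-submits after a timeout
console.log(first, retry, balance);
```

With this arrangement a client that times out and re-sends the same request does not apply the side effect twice, so neither the transport nor an offline queue has to guarantee exactly-once delivery.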

jeking3 requested changes May 12, 2019

hotfix: clear the offline queue when once written

da47ce6

razvanz force-pushed the hotfix/offline_queue branch from 7aa5d99 to da47ce6 Compare May 14, 2019 13:38

jeking3 approved these changes May 14, 2019

jeking3 merged commit c035eca into apache:master May 14, 2019
