Feature: Connection status notifications and notifications drop handling #143
Conversation
Let's think about merging.
The test story here is bad - I just added a very basic one. Any thoughts about some kind of integration tests with a "real" broker? Especially for testing the connection state changes this is definitely more convenient than mocked unit tests...
One more note: I didn't increment the version, which is kinda needed because of the API change. This can be done when this branch is eventually merged.
src/client/connection.rs
Outdated
```rust
        Err(false) => break 'reconnection,
        Ok(_v) => continue 'reconnection,
    }
    Err(reconnect) => {
```
Do we need this handling? I intended to abstract all the reconnection behavior in the `mqtt_io` method, which tells whether to reconnect or not.
Did you observe any bug with that?
Ah. Probably not. I picked the commit from `pr-notifications` and didn't notice that this is not needed.
@flxo Thanks for working on that. We've just run into that issue and here is already a solution :-)
@tekjar Could we please merge this one and bump the crate version? This feature is quite important for our case.
Sorry for the delay on this. I'll try to close this today
```rust
    handle_notification(notification, &notification_tx);
    future::ok(reply)
    let mut mqtt_state = mqtt_state.borrow_mut();
    mqtt_state.handle_incoming_mqtt_packet(packet)
```
Any reason why you've moved a linear `and_then` to a nested `and_then`? IMO chaining promises one after the other is much more readable than nesting combinators, which can lead to callback-hell sort of code.
These two lines just remove the explicit `future::ok()`. The `Result<>` returned by `handle_incoming_mqtt_packet` can be used directly and converts because of the `IntoFuture` impl for `Result`. Not sure what you mean by linear and nested here.
Oh. Got it.
What I meant was that method 2 below will keep combinators like `and_then` one after the other instead of one inside the other:
```rust
.and_then(move |packet| {
    debug!("Incoming packet = {:?}", packet_info(&packet));
    let mut mqtt_state = mqtt_state.borrow_mut();
    mqtt_state.handle_incoming_mqtt_packet(packet)
        .and_then(|(notification, reply)| {
            handle_notification(notification);
            Ok(reply)
        })
})
.filter(should_forward_packet);
```
vs
```rust
.and_then(move |packet| {
    debug!("Incoming packet = {:?}", packet_info(&packet));
    let mut mqtt_state = mqtt_state.borrow_mut();
    let o = mqtt_state.handle_incoming_mqtt_packet(packet);
    future::result(o)
})
.map(|(notification, reply)| {
    handle_notification(notification);
    reply
})
.filter(should_forward_packet);
```
src/client/connection.rs
Outdated
```rust
    let mut mqtt_state = mqtt_state.borrow_mut();
    mqtt_state.handle_incoming_mqtt_packet(packet)
        .and_then(|(notification, reply)| {
            match notification {
```
Nit: Can we move this to a method? Helps in understanding the flow for newcomers as well as me when I visit the code after a break
Sure.
Ah. I got what you mean and updated the PR. I did not move it to a method since the signature would be quite weird, and with the non-nested `and_then` I think it's readable. I also remember why I did the nested `and_then`: to avoid the double clone of the `Rc<MqttState>`.
Feel free to refactor this.
Left some small nits. So AFAIU you are using a oneshot channel along with a crossbeam channel for synchronization. Given we anyway lose the convenience of crossbeam `select` with the wrapper, why not just use a futures channel?
Coming to your questions:

- We'll do this in the next PR.
- Integration tests with a real broker would be good, but I thought we should also simulate bad networks, disconnections and half-open connections to continuously test out corner cases. But we can start with any available broker and later work on rumqttd to simulate those conditions. Gave toxiproxy a shot but too many bugs.
- I'll increment when I merge this to master :)
I don't see any drawback to using the futures channel since in most of my projects I'm in a "future" context anyway. You can still use the
Cool. Let's merge this after those small changes. I can make those changes myself after merging if you are busy.
```rust
    match notification {
        Notification::None => (),
        // Ignore error on notification_tx send, since the receiver can be dropped at any time
        _ if mqtt_state.send_notifications() => drop(notification_tx.send(notification)),
```
Looks like this will block the event loop. What happens when the receiver is not dropped and the channel is full?
Yes. This blocks the loop. As far as I got it that's ok in order to apply backpressure on the broker.
An option would be to use an unbounded channel - but that is probably not what we want here.
I'm not fluent in the MQTT spec: what is a broker allowed to do with clients that apply backpressure?
Blocking the event loop is not ok just because the receiver is not able to catch up. Pings should still continue to happen to prevent the broker from disconnecting the client, and publishers should not be slowed down because the receiver is doing some heavy computation.
This is getting a little trickier than I anticipated. I'll give proper thought to this and ping you in a few days.
Hm. You're right. I forgot the pings...sorry. Will think about that too.
Cool :). Sorry for the back and forths on this
Options regarding the pings:

1. Accept that an overloaded client will stop sending pings. The broker will probably disconnect in such a situation (or has to, according to the spec?). The timeouts are relaxed enough to be fine with a slightly delayed ping. This is the behaviour on `notification_refactor` (this PR).
2. An unbounded notification queue: sorry - bad idea...
3. Discard notifications when the notification queue is full. Behaviour on `master`. I'd really prefer option 1 over 3 since at least you see what happened.
4. Another idea is to split the notification queue into several ones to give the client more flexibility in what to handle or which of the `Receiver`s to drop. E.g. `Notification::Puback` could be less interesting than `Notification::Connected` or `Notification::Disconnected(_)`. Here the decision whether a `Notification` can be discarded or not still has to be made.
5. Configuration options to choose between options 1 and 3. Should not be a `feature` but could be added to the config struct.

What do you think?
Hi @flxo. Sorry for the delay on this. Coming to the options that you've mentioned:

1. Blocking the event loop, in general, isn't a good idea. It's not just about pings and disconnects. If someone is doing publishes from a different thread and the receiver is doing heavy computations, the publishers won't progress. Just because the crossbeam channel is full, blocking the publisher doesn't make any sense. Also, the eventloop block will keep interfering with any future features we might need on the eventloop.
2. Agreed :)
3. Sounds good
Personally, I feel that we should provide an option for a futures channel along with the crossbeam channel for notifications. Futures channels won't have this problem of blocking the eventloop while doing a blocking send. Also we won't lose data silently. If people choose to use the crossbeam channel, they get the flexibility of `select` but might miss incoming data.
I'm open to discussion on this. Maybe in a new issue :)
Also, I'll be busy this month and won't be able to do a lot of rumqtt work. But we can try to reach a conclusion this month and implement it in the next.
Check channel state before sending notifications and distribute connection state changes. The notification `Receiver` is wrapped into a struct that contains a oneshot `Receiver` that allows the client to check whether the notification struct is dropped or not.
Might I inquire about any news on this? After skimming through this PR, it seems like it would give users such as me (with devices on bad networks) a chance to react to those network changes.
@TotalKrill Couldn't spend any time on this crate now. Will most probably resume the work in June
@TotalKrill As you might have noticed there are a couple of open points that are not super easy to solve or decide. I'm using this branch in an experimental project and it works so far in that specific scenario.
@flxo Alright, I might have to take a look into that. A semi-related question though: how are topic subscriptions handled during disconnect/connect? Right now I have noticed that long-running connections are not receiving subscription notifications after a while. I suspect this is due to a disconnect/reconnect. My preferred behavior would be to resubscribe to topics on reconnect as well.
As far as I know resubscription upon a reconnect is an open topic. See #85. Probably for now you have to do that on your own.
`clean_session(false)` seems to solve it as long as the broker doesn't die
I tried wrapping my head around this some. The problem is that the eventloop handles publish requests, and if we are handling other events during this time, a publish might stall since other events from the broker (such as received MQTT messages) might be in the way, and client-generated pings and events can get lost. And all these problems are due to the fact that the client is overloaded? Could we not return errors when trying to publish to an overloaded eventloop, or send an error notification when a ping failed to send due to overloading the eventloop? Or is this impossible due to the problem with blocking the eventloop?
Any updates on this subject? I am having trouble needing to resubscribe when the broker crashes.
@marcotuna I've implemented reconnection events in the master branch. But I'll have to add automatic resubscription to the eventloop (of course based on user configuration). You can use reconnection events to resubscribe manually, but the subscription will only happen after the publishes in the queue are transmitted by the eventloop
@flxo Reconnection notifications and notification drop handling are part of master now. Please feel free to comment on the issue if you don't feel the current behavior solves your usecase
@marcotuna Can you please head to issue #85. I'll expose the resubscription option today
This is a follow-up on #142 without any change to the reconnection behaviour.
Check channel state before sending notifications and distribute connection state changes. The notification `Receiver` is wrapped into a struct that contains a oneshot `Receiver` that allows the client to check whether the notification struct is dropped or not. Upon disconnections the corresponding `Error` is sent.