Timeout handling #472

Closed
philib opened this issue Nov 10, 2017 · 14 comments

Comments

@philib

philib commented Nov 10, 2017

Hey,

Is there a way to subscribe to timeout events on the server side?
I want to detect whether or not an ACK packet is sent as a response to an observation from the client.

Thanks in advance,
Philip

@boaks
Contributor

boaks commented Nov 10, 2017

I'm not sure what you expect that timeout to be related to.
If you send a notify as an ACK, it is just sent from the CoapServer; there is no timeout related to that transmission (it's not a CON, it's just an ACK).
To send that notify, the CoapServer ultimately calls CoapResource.handleGET(). So if you're only interested in whether sending the notify was attempted, just implement handleGET accordingly.

@philib
Author

philib commented Nov 10, 2017

I'm interested in whether the notify from the server is acknowledged by the client, so I can check that the client is still properly connected.

I'm also interested in handling reconnection on the client side. If a client observes a resource and, for example, the server crashes and restarts, the client will no longer get updates from the server. If the server doesn't notify the client within a given time, I'd like to check whether the server is still available; if it isn't, I want to keep requesting a new observation until the server is available again.

@boaks
Contributor

boaks commented Nov 10, 2017

For the coap-server (the observed):
There are two configuration parameters:
NOTIFICATION_CHECK_INTERVAL (time in milliseconds)
NOTIFICATION_CHECK_INTERVAL_COUNT

If either the count or the time is reached, the coap-server uses a CON notify to check whether the coap-client is still interested.

For the coap-client (the observer):
If you use the CoapClient, the CoapObserveRelation takes care of re-registering if it doesn't receive a notify. That's bound to the MAX_AGE option (default 60s).
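
The server-side rule above (send a CON notify once either the count or the time is reached) can be modelled in a few lines. This is only a plain-Java sketch of the described behaviour, not Californium's own code: the class NotificationCheckPolicy and its method are hypothetical names, and the exact counting (for example, whether the Nth or the N+1th notify becomes CON) is an assumption.

```java
/**
 * Plain-Java model of the "CON check" rule described above.
 * Hypothetical class, not part of Californium.
 */
final class NotificationCheckPolicy {
    private final long checkIntervalMillis; // models NOTIFICATION_CHECK_INTERVAL
    private final int checkIntervalCount;   // models NOTIFICATION_CHECK_INTERVAL_COUNT
    private long lastConMillis;             // time of the last CON notify
    private int nonSinceLastCon;            // NON notifies sent since then

    NotificationCheckPolicy(long checkIntervalMillis, int checkIntervalCount, long nowMillis) {
        this.checkIntervalMillis = checkIntervalMillis;
        this.checkIntervalCount = checkIntervalCount;
        this.lastConMillis = nowMillis;
    }

    /** Returns true if the next notify should be sent as CON, false for NON. */
    boolean nextIsConfirmable(long nowMillis) {
        boolean timeReached = nowMillis - lastConMillis >= checkIntervalMillis;
        boolean countReached = nonSinceLastCon + 1 >= checkIntervalCount;
        if (timeReached || countReached) {
            // Either limit is enough: send a CON and restart both measurements.
            lastConMillis = nowMillis;
            nonSinceLastCon = 0;
            return true;
        }
        nonSinceLastCon++;
        return false;
    }
}
```

With an interval of 1000 ms and a count of 3, this model sends every third notify as CON, and also sends a CON whenever a second has passed since the previous one.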

@philib
Author

philib commented Nov 13, 2017

> If either the count or the time is reached, the coap-server uses a CON notify to check whether the coap-client is still interested.

Is there any method that gets invoked after the notify is canceled? I want to handle those disconnects, but I only get console output:

Nov 13, 2017 8:39:29 AM org.eclipse.californium.core.network.stack.ObserveLayer$NotificationController onTimeout INFORMATION: Notification for token [5e1644e98fbb0b50] timed out. Canceling all relations with source [/127.0.0.1:51514]

> If you use the CoapClient, the CoapObserveRelation takes care of re-registering if it doesn't receive a notify. That's bound to the MAX_AGE option (default 60s).

On the client, the onError() on the CoapObserveRelation gets invoked, but it takes up to 2 min and doesn't result in a reconnect. Where do I have to set MAX_AGE to speed up the invocation?

@boaks
Contributor

boaks commented Nov 16, 2017

> Is there any method that gets invoked after the notify is canceled? I want to handle those disconnects, but I only get console output

Have a look at CoapResource.removeObserveRelation or the ResourceObserver

> On the client, the onError() on the CoapObserveRelation gets invoked

OK, if that gets invoked, then you're not using a commit from our repository! Otherwise, which commit are you using that provides a CoapObserveRelation.onError()?

> , but it takes up to 2 min and doesn't result in a reconnect. Where do I have to set MAX_AGE to speed up the invocation?

I'm not sure what you mean. Why should an "onError" (assuming you mean CoapHandler.onError()) schedule such a "refresh observation"? If you get an error, why do you think auto-repeat is a good approach? So I'm not sure what your plan is. That "refresh observation" is intended for cases where you have established an observation but have received neither an error nor a notify for a longer period. Only in that case is a new observation request sent.

@philib
Author

philib commented Nov 17, 2017

> Have a look at CoapResource.removeObserveRelation or the ResourceObserver

Thanks, that's what I was looking for.

> assuming you mean CoapHandler.onError()

Yes, my fault.

> So I'm not sure what your plan is

Assume a server that a client is connected to loses its connection and is offline for a longer period (e.g. 5 min). The client should recognize such a server-side disconnect immediately and then try to reconnect to the server until it is back again.

@boaks
Contributor

boaks commented Nov 17, 2017

> The client should recognize such a server-side disconnect immediately and then try to reconnect to the server until it is back again.

So, your coap-client was observing some resource on a coap-server.
Then that "notify timeout" occurred and triggered an "observe reregister",
which failed because the observed coap-server was offline,
and that is reported with "onError()".

If you want to do retries in that case, just trigger the retry in "onError()". I'm not sure which pattern would match your situation best, but I hope you know it and can therefore implement a retry strategy that suits you. Californium only implements the very basic (and safe) functionality for that.
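
One possible retry strategy, shown only as an illustration: a capped exponential backoff with jitter. This pattern is not something the thread prescribes. The class ReconnectBackoff and its methods are hypothetical and do not use any Californium API; the sketch only computes how long to wait before the next attempt.

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Capped exponential backoff with jitter for reconnect attempts.
 * Hypothetical helper, independent of Californium.
 */
final class ReconnectBackoff {
    private final long baseMillis; // delay cap after the first failure
    private final long maxMillis;  // upper limit for any delay
    private int failures;          // consecutive failures so far

    ReconnectBackoff(long baseMillis, long maxMillis) {
        if (baseMillis <= 0 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("need 0 < baseMillis <= maxMillis");
        }
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
    }

    /** Delay cap for the given number of consecutive failures: base, 2*base, 4*base, ... up to max. */
    static long cappedDelay(long baseMillis, long maxMillis, int failures) {
        long delay = baseMillis;
        for (int i = 1; i < failures && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Call after a failed attempt; returns a randomized wait in [cap/2, cap]. */
    long nextDelayMillis() {
        failures++;
        long cap = cappedDelay(baseMillis, maxMillis, failures);
        return ThreadLocalRandom.current().nextLong(cap / 2, cap + 1);
    }

    /** Call after a successful response or notify to start over. */
    void reset() {
        failures = 0;
    }
}
```

A client could call nextDelayMillis() from its onError() callback, schedule a new observe request after that delay, and call reset() once a response or notify arrives again.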

@philib
Author

philib commented Nov 17, 2017

> just trigger the retry in "onError()"

I wanted to do that, but it takes too much time for onError() to be invoked (up to 10 minutes). Is there a way to speed up this invocation by configuring the coap-client?
I couldn't find a suitable NetworkConfig setting to achieve this.

Thanks in advance

@boaks
Contributor

boaks commented Nov 17, 2017

Sorry, it's time for you to provide some Wireshark and Californium logs, so we can see what happens :-).
And, if possible, the branch/commit/tag you're using.
Unfortunately, I will not be able to work on this next week, but from 27.11. on I will have a look at your logs.

@philib
Author

philib commented Nov 24, 2017

Thanks for your patience!
I'm using the current version of the master branch.
I started the server and the client, then killed the server to simulate downtime.

Here is my CoAP Server
Here is my CoAP Client

That's the console output:
[image: coap]

That's the Wireshark log:
[image: unbenannt3]

As you can see from the console and the Wireshark logs, the client starts the first reconnect attempt after about 17 min.

@vikram919
Contributor

vikram919 commented Nov 29, 2017

@philib
From the log information you have provided, I can see that the Max-Age option is set to 1000 seconds.
One question: did you explicitly set the Max-Age option on the response in your implementation?

    @Override
    public void handleGET(CoapExchange exchange) {
        // Is Max-Age set explicitly like this? If so, set it to 0 instead.
        exchange.setMaxAge(1000);
        // respond to the request
        exchange.respond("Hello World!");
    }

Please read this comment:
#479 (comment)

@boaks
Contributor

boaks commented Nov 29, 2017

So some more details about the timings.

RFC 7641, section 3.3.1, page 11:

> To make sure it has a current representation and/or to re-register
> its interest in a resource, a client MAY issue a new GET request with
> the same token as the original at any time. All options MUST be
> identical to those in the original request except for the set of ETag
> Options. It is RECOMMENDED that the client does not issue the
> request while it still has a fresh notification/response for the
> resource in its cache. Additionally, the client SHOULD at least wait
> for a random amount of time between 5 and 15 seconds after Max-Age
> expired to reduce collisions with other clients.

Because of the recommendation at the end, the client waits until MAX_AGE has expired before it re-registers (in your log, 9:42:16 to 9:59:00). Then, because your server is still down, the client retries that re-register, which also takes 1 minute to fail. So how long it takes to detect that the server is no longer available depends on the settings you're using.
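
To make the timing concrete, here is a small sketch of the re-registration window that follows from the RFC text quoted above: Max-Age plus a random 5 to 15 seconds. The class ReregistrationDelay is a hypothetical helper, not Californium code, and it only does the arithmetic.

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Re-registration window per RFC 7641, section 3.3.1:
 * wait Max-Age plus a random 5..15 seconds after the last notify.
 * Hypothetical helper, not part of Californium.
 */
final class ReregistrationDelay {
    static final long MIN_BACKOFF_SECONDS = 5;
    static final long MAX_BACKOFF_SECONDS = 15;

    /** Earliest re-registration, in seconds after the last fresh notify. */
    static long earliestSeconds(long maxAgeSeconds) {
        return maxAgeSeconds + MIN_BACKOFF_SECONDS;
    }

    /** Latest re-registration, in seconds after the last fresh notify. */
    static long latestSeconds(long maxAgeSeconds) {
        return maxAgeSeconds + MAX_BACKOFF_SECONDS;
    }

    /** One randomly chosen delay inside the window (both ends inclusive). */
    static long randomSeconds(long maxAgeSeconds) {
        long backoff = ThreadLocalRandom.current()
                .nextLong(MIN_BACKOFF_SECONDS, MAX_BACKOFF_SECONDS + 1);
        return maxAgeSeconds + backoff;
    }
}
```

For a Max-Age of 1000 seconds the window is 1005 to 1015 seconds, roughly 17 minutes; for the default of 60 seconds it is 65 to 75 seconds.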

For most communication techniques, the only way to check "aliveness" is to exchange messages. Some techniques do this within the protocol, others outside of it.

If you need shorter detection times, you can design your communication to send notifies more frequently. If you send a notify every 30s (and adjust MAX_AGE accordingly), you will detect faster that the server is down, at the cost of more traffic. So you must find the best trade-off between detection time and traffic for your application.
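
A rough way to put numbers on this trade-off is sketched below. It assumes that the re-registration is sent as a CON request with the default transmission parameters from RFC 7252 (ACK_TIMEOUT 2 s, ACK_RANDOM_FACTOR 1.5, MAX_RETRANSMIT 4), and that MAX_AGE equals the notify interval. The class DetectionTimeEstimate is hypothetical and gives an estimate only, not a measurement of Californium's behaviour.

```java
/**
 * Rough estimate of detection time versus notify traffic.
 * Hypothetical helper; assumes the RFC 7252 default transmission parameters.
 */
final class DetectionTimeEstimate {
    static final double ACK_TIMEOUT_SECONDS = 2.0;
    static final double ACK_RANDOM_FACTOR = 1.5;
    static final int MAX_RETRANSMIT = 4;
    static final long MAX_REREGISTER_BACKOFF_SECONDS = 15; // RFC 7641, 3.3.1

    /** MAX_TRANSMIT_WAIT from RFC 7252, section 4.8.2 (93 s with the defaults). */
    static double maxTransmitWaitSeconds() {
        return ACK_TIMEOUT_SECONDS * ((1 << (MAX_RETRANSMIT + 1)) - 1) * ACK_RANDOM_FACTOR;
    }

    /**
     * Upper bound for noticing a dead server: Max-Age expires, the client waits
     * up to 15 s more, then the CON re-registration has to time out.
     */
    static double worstCaseDetectionSeconds(long maxAgeSeconds) {
        return maxAgeSeconds + MAX_REREGISTER_BACKOFF_SECONDS + maxTransmitWaitSeconds();
    }

    /** Notify traffic per observer for a given notify interval. */
    static double notificationsPerHour(long notifyIntervalSeconds) {
        return 3600.0 / notifyIntervalSeconds;
    }
}
```

Under these assumptions, a 30 s interval gives an upper bound of about 138 s at 120 notifies per hour, while a Max-Age of 1000 s gives about 1108 s at under 4 notifies per hour.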

@boaks
Contributor

boaks commented Dec 5, 2017

@philib

Do you still have issues related to the timeout handling?
If not, can we close this issue?

@philib
Author

philib commented Dec 5, 2017

> Because of the recommendation at the end, the client waits until MAX_AGE has expired before it re-registers (in your log, 9:42:16 to 9:59:00). Then, because your server is still down, the client retries that re-register, which also takes 1 minute to fail. So how long it takes to detect that the server is no longer available depends on the settings you're using.

Thanks a lot, reducing the MAX_AGE on the server side solved the problem 👍

@philib philib closed this as completed Dec 5, 2017

3 participants