Problem with client registration #751

oz117 · 2019-10-03T15:52:51Z

Hi everyone,
Lately I have been having an issue that I cannot fix, and I need your input on it.
We have a single instance of the Leshan server running inside a GCP cluster, behind an internal load balancer.
What I have been seeing lately is that a lot of our clients are not able to register past a certain point (which I have not been able to identify).
Here is a screenshot of the logs of the server

All the received 129-byte messages are clients trying to register with the server.
However, the server seems to do nothing with these messages.
Do you have any idea of what could be happening here?
For more details: all the clients use the U mode and have a dynamic IP that changes when they lose the connection.
Tell me if you need anything else 😄
Thanks for the help


sbernard31 · 2019-10-03T16:14:10Z

Are you using DTLS? Judging by your log (port 5683), I would say no.
Which version of Leshan are you using?
Could you make a tcpdump capture on the server side to see what happened?

oz117 · 2019-10-04T07:17:15Z

Hi, I am not currently using DTLS.
I will post a tcpdump as soon as possible

oz117 · 2019-10-13T11:31:59Z

Hi, so I finally found what was causing the issue, and it came from me not understanding something:
I was making synchronous calls inside the registered function of my registrationListener.
This caused the whole CoAP server to stop taking any more requests.
One thing I am not able to understand is why this happens. I would have thought that the listener would run on a separate thread that would have no impact on the CoAP server.
Do you have any idea?

sbernard31 · 2019-10-14T08:27:20Z

> One thing I am not able to understand is why this happens. I would have thought that the listener would run on a separate thread that would have no impact on the CoAP server.

This is not the case. Listeners are called by threads of the CoAP protocol stage thread pool.
The number of threads is limited, so if you block them all, there are no more threads available to handle new CoAP requests.
(Maybe here, some explanation of sync/async calls and CoAP thread pools.)
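
The effect described above can be reproduced without Leshan. The following is a minimal sketch, assuming a plain fixed-size `ExecutorService` as a stand-in for the CoAP thread pool; `PoolStarvationDemo` and `requestHandled` are hypothetical names for illustration, not Leshan or Californium API. It blocks some pool threads the way a synchronous call inside a listener would, then checks whether one more "request" still gets handled.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustration only: a plain fixed pool stands in for the CoAP thread pool.
// None of these names are Leshan or Californium API.
class PoolStarvationDemo {

    /**
     * Occupies `blockedListeners` pool threads with a task that waits (like a
     * synchronous call inside a listener), then submits one more "request" and
     * reports whether it ran within `waitMillis`.
     */
    static boolean requestHandled(int poolSize, int blockedListeners, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1); // keeps the listeners blocked
        CountDownLatch handled = new CountDownLatch(1); // opened when the request runs
        try {
            for (int i = 0; i < blockedListeners; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // simulated blocking call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            pool.execute(handled::countDown); // the new incoming request
            return handled.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let the blocked listeners finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("all threads blocked: handled=" + requestHandled(2, 2, 200));
        System.out.println("one thread free:     handled=" + requestHandled(2, 1, 2000));
    }
}
```

With every pool thread blocked, the extra request stays in the queue and the first call reports `false`; leaving one thread free lets the second call report `true`.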

Your underlying question is why we don't call listener methods in another thread. The answer is simply to let users do what they want. I feel that creating a new thread is sometimes not really useful (e.g. if you just want to add metrics).

But we should probably document this!

sbernard31 · 2019-10-17T16:20:49Z

I added some javadoc about that.

Should we close this issue now?

oz117 · 2019-10-21T16:16:18Z

Hi,
Thanks for the documentation.
I have one last question: in your opinion, if we have a lot of devices connecting at the same time, should we increase the number of stage threads or increase the timeout value of the clients?
As an example, right now we are having an issue where multiple devices connect at the same time (more than 8) and some of them keep registering and unregistering for a very long time.
After a few minutes those devices register properly.
What we see is that the device's register call times out; as a result the device sends an unregister, then tries to register again, and so on.

sbernard31 · 2019-10-22T12:21:08Z

It's pretty hard to advise anything without knowing the setup.
What is the current timeout on the device/client side? (A few minutes is OK.)

Increasing the number of threads could make sense if your threads are blocked waiting for some resources. But adding too many threads could decrease performance (because of context switching).
There are theoretical ways to calculate the right number of threads, but they need some variables you may not have :p.
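
One commonly cited rule of thumb (from the book "Java Concurrency in Practice"; it may or may not be the calculation meant above) sizes a pool as N_threads = N_cpu × U_cpu × (1 + W/C), where U_cpu is the target CPU utilization and W/C is the ratio of time a task spends waiting to time it spends computing. A minimal sketch, where `PoolSizing` and `suggestedThreads` are hypothetical names and the wait and compute times are values you would have to measure yourself:

```java
// Illustration of the N_cpu * U_cpu * (1 + W/C) sizing heuristic.
// The inputs (wait and compute time per task) must come from measurement.
class PoolSizing {

    static int suggestedThreads(int cpus, double targetUtilization,
                                double waitTime, double computeTime) {
        if (cpus < 1 || targetUtilization <= 0 || targetUtilization > 1
                || waitTime < 0 || computeTime <= 0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        double threads = cpus * targetUtilization * (1 + waitTime / computeTime);
        return Math.max(1, (int) Math.ceil(threads));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Hypothetical measurement: each task waits 90 ms and computes for 10 ms.
        System.out.println(suggestedThreads(cpus, 1.0, 90, 10));
    }
}
```

The "variables you may not have" are exactly `waitTime` and `computeTime`: without profiling, the result is only a guess, and a task that never waits gives a pool no larger than the number of CPUs.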

Did you implement your own RegistrationStore or SecurityStore?
Both implementations should answer quickly, as they are executed by the CoAP thread pool or the DTLS thread pool.

> As an example, right now we are having an issue where multiple devices connect at the same time (more than 8)

You mean that "more than 8" devices connecting at the same time causes the issue? (It should not.)

oz117 · 2019-10-23T08:53:46Z

Hi,

The ACK timeout on the client was set to 2 seconds; we have now increased it to 30 seconds.
I left the default RegistrationStore and SecurityStore.
The issue seems to occur when we have more than 10 devices connecting at the same time.
As for the setup, we will have up to 1000 devices connected to our server (not all at the same time).
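
To put the two ACK timeout values in perspective, here is a rough sketch of the retransmission arithmetic from RFC 7252 (section 4.8.2), assuming the register request is a confirmable message and that the client keeps the RFC defaults ACK_RANDOM_FACTOR = 1.5 and MAX_RETRANSMIT = 4. The thread does not say which client stack is used, so both are assumptions; `CoapRetransmission` and its methods are hypothetical names.

```java
// Derived CoAP transmission parameters as defined in RFC 7252, section 4.8.2.
// Assumes RFC defaults for everything except ACK_TIMEOUT.
class CoapRetransmission {

    /** Longest time from the first transmission to the last retransmission. */
    static double maxTransmitSpan(double ackTimeout, double ackRandomFactor, int maxRetransmit) {
        return ackTimeout * (Math.pow(2, maxRetransmit) - 1) * ackRandomFactor;
    }

    /** Longest time from the first transmission until the sender gives up. */
    static double maxTransmitWait(double ackTimeout, double ackRandomFactor, int maxRetransmit) {
        return ackTimeout * (Math.pow(2, maxRetransmit + 1) - 1) * ackRandomFactor;
    }

    public static void main(String[] args) {
        for (double ackTimeout : new double[] {2, 30}) {
            System.out.printf("ACK_TIMEOUT=%.0fs span=%.0fs wait=%.0fs%n",
                    ackTimeout,
                    maxTransmitSpan(ackTimeout, 1.5, 4),
                    maxTransmitWait(ackTimeout, 1.5, 4));
        }
    }
}
```

Under these assumptions, a 2 second ACK timeout means a client gives up after at most 93 seconds, while 30 seconds stretches that to 1395 seconds, so a longer timeout mostly delays the failure rather than removing its cause.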

When they register, we do multiple reads and observes on different resources/instances.
After that, each client only sends ObserverResponses unless it loses its network connection, in which case it performs a new register. From what I can see, this first triggers an unregister from the leshan-server, followed by a register; correct me if I am wrong :).

sbernard31 · 2019-10-23T11:39:34Z

10 devices at the "same time" should not be an issue at all. This means there is an issue somewhere.

Do you see anything in the logs or in a Wireshark capture? Do you have any blocked thread that could prevent the request from being handled in time?

> which from what I can see will first trigger an unregister from the leshan-server

If a device registers and there is already a registration for that client (a client is identified by its endpoint name), the previous registration is removed and an unregister event is raised.
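
The replacement behaviour described above can be sketched as a map keyed by endpoint name. This is only an illustration of the idea, not Leshan's RegistrationStore: `EndpointRegistry`, its event strings, and the `reg-N` identifiers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: one registration per endpoint name; a new register for
// a known endpoint removes the old registration and raises an unregister event.
class EndpointRegistry {
    private final Map<String, String> registrationIdByEndpoint = new HashMap<>();
    private final List<String> events = new ArrayList<>();
    private int nextId = 1;

    String register(String endpoint) {
        String newId = "reg-" + nextId++;
        // put() returns the previous registration for this endpoint, if any.
        String previousId = registrationIdByEndpoint.put(endpoint, newId);
        if (previousId != null) {
            events.add("unregistered " + endpoint + " " + previousId);
        }
        events.add("registered " + endpoint + " " + newId);
        return newId;
    }

    List<String> events() {
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        EndpointRegistry registry = new EndpointRegistry();
        registry.register("device-1");
        registry.register("device-1"); // same endpoint, e.g. after an IP change
        registry.events().forEach(System.out::println);
    }
}
```

Registering `device-1` twice yields a `registered` event, then an `unregistered` event for the first registration, then a second `registered` event, which matches the unregister-then-register sequence observed on the server.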

oz117 · 2019-11-04T15:11:22Z

I don't see what could be blocking the receiving threads. Whenever I receive a register request I use Rx to send a message to a separate thread that will in turn do a few read/observe requests.

Are the receiving and sending threads shared?

sbernard31 · 2019-11-04T15:14:35Z

At the CoAP level, the thread pool is shared between receiving and sending messages.

oz117 · 2019-11-05T15:17:35Z

OK, so the locked threads come from the read/observe requests I do on the devices... I have a better understanding of what is happening now.
Thanks a lot.

Do you have any advice on how best to handle this?

sbernard31 · 2019-11-05T15:58:27Z

I repeat myself :) :

Prefer async send over sync send; an async send should not block the thread.
If this is not enough, try to increase the number of threads in the protocol stage.
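
The difference between the two styles can be sketched with `CompletableFuture`. This does not use Leshan's API: `AsyncSendDemo`, `send`, and the single-thread `stage` executor are hypothetical stand-ins for a request to a device and for the protocol stage pool. The sketch counts how many listener invocations have returned before the device has replied.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: models sync vs async sends from inside a listener.
class AsyncSendDemo {

    /** Hypothetical request: the response exists only once `deviceReply` completes. */
    static CompletableFuture<String> send(String endpoint, CompletableFuture<Void> deviceReply) {
        return deviceReply.thenApply(ignored -> "value-from-" + endpoint);
    }

    /** Runs `listeners` listener calls on a one-thread pool and counts those that
     *  returned while the device had not yet replied. */
    static int listenersReturnedBeforeReply(int listeners, boolean async, long waitMillis) {
        ExecutorService stage = Executors.newSingleThreadExecutor();
        CompletableFuture<Void> deviceReply = new CompletableFuture<>();
        AtomicInteger returned = new AtomicInteger();
        CountDownLatch drained = new CountDownLatch(1);
        try {
            for (int i = 0; i < listeners; i++) {
                String endpoint = "device-" + i;
                stage.execute(() -> {
                    CompletableFuture<String> response = send(endpoint, deviceReply);
                    if (async) {
                        response.thenAccept(value -> { /* handle value later */ });
                    } else {
                        response.join(); // blocks the pool thread until the reply
                    }
                    returned.incrementAndGet();
                });
            }
            stage.execute(drained::countDown); // runs only if the pool is free
            drained.await(waitMillis, TimeUnit.MILLISECONDS);
            return returned.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return returned.get();
        } finally {
            deviceReply.complete(null); // the device finally answers
            stage.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("async: " + listenersReturnedBeforeReply(3, true, 300));
        System.out.println("sync:  " + listenersReturnedBeforeReply(3, false, 300));
    }
}
```

In the async case all three listeners return immediately and the pool stays free; in the sync case the first listener holds the only thread, so none have returned by the time the wait expires.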

sbernard31 · 2019-11-13T10:28:42Z

Could we close this issue?

oz117 · 2019-11-13T14:46:04Z

Yep. I will close this now.
Thanks for the help :)

sbernard31 added a commit that referenced this issue Oct 17, 2019

#751: Add some documentation about listener execution.

742669f

sbernard31 added the question (Any question about leshan) label Oct 17, 2019

oz117 closed this as completed Nov 13, 2019
