Problem with client registration #751

Closed
oz117 opened this issue Oct 3, 2019 · 15 comments
Labels
question Any question about leshan

Comments

@oz117

oz117 commented Oct 3, 2019

Hi everyone,
I have an issue that I have not been able to fix lately, and I need your input on it.
We have a single instance of the Leshan server running inside a GCP cluster, behind an internal load balancer.
What I have been seeing lately is that many of our clients can no longer register past a certain point (which I have not been able to identify).
Here is a screenshot of the server logs:
[image: screenshot of the server logs]
All the "129 bytes received" messages are clients trying to register with the server.
However, the server seems to do nothing with these messages.
Do you have any idea what could be happening here?
For more details: all the clients use the U (UDP) binding mode and have a dynamic IP address that changes when they lose the connection.
Tell me if you need anything else 😄
Thanks for the help

@sbernard31
Contributor

Are you using DTLS? Judging by your log (port 5683), I would say no.
Which version of Leshan are you using?
Could you capture a tcpdump on the server side to see what happens?

@oz117
Author

oz117 commented Oct 4, 2019

Hi, I am not currently using DTLS.
I will post a tcpdump as soon as possible

@oz117
Author

oz117 commented Oct 13, 2019

Hi, so I finally found what was causing the issue, and it came from me misunderstanding something:
I was making synchronous calls inside the registered function of my registrationListener.
This caused the whole CoAP server to stop accepting requests.
One thing I am not able to understand is why this happens. I would have thought that the listener would run on a separate thread that would have no impact on the CoAP server.
Do you have any idea?

@sbernard31
Contributor

One thing I am not able to understand is why this happens. I would have thought that the listener would run on a separate thread that would have no impact on the CoAP server.

This is not the case. Listeners are called by threads of the CoAP protocol stage thread pool.
The number of threads is limited, so if you block them all, no thread is left to handle new CoAP requests.
(Some explanation of sync/async calls and CoAP thread pools might help here.)

Your underlying question is why we don't call listener methods in another thread. The answer is simply to let users do what they want. I feel that creating a new thread is sometimes not really useful (e.g. if you just want to add metrics).

But we should probably document this !
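
The starvation described above can be reproduced without Leshan. The following is a minimal sketch in plain Java, not Leshan code: a two-thread pool stands in for the CoAP thread pool, and the `OnRegistered` interface, the pool sizes, and the timings are all hypothetical names and values chosen for illustration. A listener that does slow work inline keeps the pool busy, while a listener that hands the work to its own executor returns immediately.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class ListenerOffloadSketch {

    /** Hypothetical stand-in for a registration listener callback. */
    interface OnRegistered {
        void registered(String endpoint);
    }

    /**
     * Simulates `clients` registrations arriving on a small fixed pool that
     * stands in for the CoAP thread pool. Returns how many registrations the
     * pool finished handling within `waitMillis`.
     */
    static int handledWithin(int clients, OnRegistered listener, long waitMillis) {
        ExecutorService coapPool = Executors.newFixedThreadPool(2);
        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < clients; i++) {
            String endpoint = "device-" + i;
            coapPool.execute(() -> {
                listener.registered(endpoint); // runs on the simulated CoAP thread
                handled.incrementAndGet();
            });
        }
        pause(waitMillis);
        int result = handled.get(); // read before interrupting the pool
        coapPool.shutdownNow();
        return result;
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Simulated slow synchronous work, e.g. waiting for device responses. */
    static void slowWork(String endpoint) {
        pause(300);
    }

    /** Listener does the slow work inline and blocks the pool threads. */
    static int blockingHandled() {
        return handledWithin(6, endpoint -> slowWork(endpoint), 400);
    }

    /** Listener hands the slow work to its own executor and returns at once. */
    static int offloadedHandled() {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        try {
            return handledWithin(6, endpoint -> workers.execute(() -> slowWork(endpoint)), 400);
        } finally {
            workers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking listener handled:   " + blockingHandled());
        System.out.println("offloading listener handled: " + offloadedHandled());
    }
}
```

With the blocking listener, only some of the six registrations are handled in the time window; with the offloading listener, all six are. Whether a separate executor is worth it depends on what the listener does, as noted above for cheap work such as metrics.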

@sbernard31 sbernard31 added the question Any question about leshan label Oct 17, 2019
@sbernard31
Contributor

I added some javadoc about that.

Should we close this issue now ?

@oz117
Author

oz117 commented Oct 21, 2019

Hi,
Thanks for the documentation.
I have one last question: in your opinion, if we have a lot of devices connecting at the same time, should we increase the number of stage threads or increase the timeout value of the clients?
As an example, right now we are having an issue where multiple devices connect at the same time (more than 8) and some of them keep registering and unregistering for a very long time.
After a few minutes those devices register properly.
What we see is that the device's register call times out; as a result, the device sends an unregister, then tries to register again, and so on.

@sbernard31
Contributor

It's pretty hard to advise anything without knowing the setup.
What is the current timeout on the device/client side? (A few minutes is OK.)

Increasing the number of threads could make sense if your threads are blocked waiting for some resources. But adding too many threads could decrease performance (because of context switching).
There is a theoretical way to calculate a good number of threads, but it needs some variables you may not have :p.
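
The comment does not say which calculation is meant. One commonly cited heuristic (from "Java Concurrency in Practice") is threads = cores x target CPU utilization x (1 + wait time / compute time). A small sketch of it follows; the class and method names are hypothetical, and the wait and compute times are exactly the variables that are hard to measure in practice.

```java
final class PoolSizeSketch {

    /**
     * Heuristic pool size: cores * utilization * (1 + wait / compute).
     * waitMillis is the time a task spends blocked (I/O, locks);
     * computeMillis is the time it spends actually using the CPU.
     */
    static int suggestedThreads(int cores, double targetUtilization,
                                double waitMillis, double computeMillis) {
        if (cores < 1 || targetUtilization <= 0 || targetUtilization > 1
                || waitMillis < 0 || computeMillis <= 0) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        double raw = cores * targetUtilization * (1.0 + waitMillis / computeMillis);
        return Math.max(1, (int) Math.ceil(raw));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Illustrative numbers only: tasks that wait 50 ms for every 5 ms of CPU work.
        System.out.println(suggestedThreads(cores, 0.8, 50, 5));
    }
}
```

For purely CPU-bound tasks (no waiting) the formula collapses to roughly one thread per core, which matches the warning above about context switching.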

Did you implement your own RegistrationStore or SecurityStore?
Both implementations should answer quickly, as they are executed by the CoAP thread pool or the DTLS thread pool.

As an example, right now we are having an issue where multiple devices connect at the same time (more than 8)

You mean that "more than 8" devices at the same time cause issues? (This should not happen.)

@oz117
Author

oz117 commented Oct 23, 2019

Hi,

  1. The ack timeout on the client was set to 2 seconds; we increased it to 30 seconds.

  2. I left the default RegistrationStore and SecurityStore.

  3. The issue seems to occur when more than 10 devices connect at the same time.

  4. As for the setup, we will have up to 1000 devices connected to our server (not all at the same time).

When they register, we do multiple reads and observes on different resources/instances.
After that, each client only sends observe responses, unless it loses its network connection, in which case it performs a new register. From what I can see, this first triggers an unregister from the leshan-server, followed by a register. Correct me if I am wrong :).
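
For context on the ack timeout in point 1: if the client follows the standard CoAP retransmission scheme (an assumption, the thread does not say which client stack is used), RFC 7252 derives how long a confirmable message keeps being retried from ACK_TIMEOUT, MAX_RETRANSMIT (default 4) and ACK_RANDOM_FACTOR (default 1.5). The sketch below just evaluates those two RFC formulas; the class and method names are hypothetical.

```java
final class CoapTimingSketch {

    /** RFC 7252 MAX_TRANSMIT_SPAN: first transmission to last retransmission, in seconds. */
    static double maxTransmitSpan(double ackTimeout, int maxRetransmit, double ackRandomFactor) {
        return ackTimeout * ((1 << maxRetransmit) - 1) * ackRandomFactor;
    }

    /** RFC 7252 MAX_TRANSMIT_WAIT: first transmission until the sender gives up, in seconds. */
    static double maxTransmitWait(double ackTimeout, int maxRetransmit, double ackRandomFactor) {
        return ackTimeout * ((1 << (maxRetransmit + 1)) - 1) * ackRandomFactor;
    }

    public static void main(String[] args) {
        // RFC defaults with the 2 s ack timeout mentioned above.
        System.out.println(maxTransmitSpan(2, 4, 1.5));  // 45.0
        System.out.println(maxTransmitWait(2, 4, 1.5));  // 93.0
        // Same retransmission settings with a 30 s ack timeout.
        System.out.println(maxTransmitSpan(30, 4, 1.5)); // 675.0
        System.out.println(maxTransmitWait(30, 4, 1.5)); // 1395.0
    }
}
```

Under these assumptions a 2 second ack timeout already gives the server about a minute and a half to answer before the client gives up, so raising it mostly hides a server that is not answering rather than fixing it.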

@sbernard31
Contributor

10 devices at the "same time" should not be an issue at all. This means there is an issue somewhere.

Do you see anything in the logs or in a Wireshark capture? Do you have any blocked threads which could prevent requests from being handled in time?

which from what I can see will first trigger an unregister from the leshan-server

If a device registers and there is already a registration for that client (a client is identified by its endpoint name), the previous registration is removed and an unregister event is raised.
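
The replace-on-re-register behaviour described above can be pictured with a small in-memory registry. This is an illustrative sketch only, not Leshan's RegistrationStore: the class, the id format, and the event strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReRegistrationSketch {

    // One registration per endpoint name, as described in the comment above.
    private final Map<String, String> registrationIdByEndpoint = new HashMap<>();
    private final List<String> events = new ArrayList<>();
    private int nextId = 1;

    /** Registers an endpoint, replacing any previous registration with the same name. */
    String register(String endpoint) {
        String newId = "reg-" + nextId++;
        String previousId = registrationIdByEndpoint.put(endpoint, newId);
        if (previousId != null) {
            // The old registration is dropped, so listeners see an unregister first.
            events.add("unregistered " + endpoint + " (" + previousId + ")");
        }
        events.add("registered " + endpoint + " (" + newId + ")");
        return newId;
    }

    List<String> events() {
        return events;
    }

    public static void main(String[] args) {
        ReRegistrationSketch registry = new ReRegistrationSketch();
        registry.register("device-1");
        registry.register("device-1"); // e.g. after the device lost its connection
        registry.events().forEach(System.out::println);
    }
}
```

This is why a device that reconnects with the same endpoint name shows up as an unregister followed by a register, even though it never sent a deregistration itself.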

@oz117
Author

oz117 commented Nov 4, 2019

I don't see what could be blocking the receiving threads. Whenever I receive a register request I use Rx to send a message to a separate thread that will in turn do a few read/observe requests.

Are the receiving and sending threads shared ?

@sbernard31
Contributor

At the CoAP level, the thread pool is shared between receiving and sending messages.
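
Why a shared pool matters can be shown with a toy model, again in plain Java rather than Leshan or Californium code. Here a single-thread executor plays the shared pool, and the hypothetical `read` helper simulates a device response that is delivered on that same pool. A handler that blocks on the response occupies the only thread that could deliver it; a handler that registers a callback does not.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SharedPoolSketch {

    /** Simulated device read: the response is delivered later on the shared pool. */
    static CompletableFuture<String> read(ExecutorService sharedPool, String path) {
        CompletableFuture<String> response = new CompletableFuture<>();
        sharedPool.execute(() -> response.complete("value of " + path));
        return response;
    }

    /** Handler blocks on the response while holding the only pool thread. */
    static String blockingStyle() {
        ExecutorService sharedPool = Executors.newSingleThreadExecutor();
        try {
            Future<String> handler = sharedPool.submit(() -> {
                try {
                    return read(sharedPool, "/3/0/0").get(200, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    return "timeout"; // the response task never got a thread
                }
            });
            return handler.get();
        } catch (InterruptedException | ExecutionException e) {
            return "error";
        } finally {
            sharedPool.shutdownNow();
        }
    }

    /** Handler registers a callback and returns, freeing the pool thread. */
    static String asyncStyle() {
        ExecutorService sharedPool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> result = new CompletableFuture<>();
            sharedPool.execute(() ->
                    read(sharedPool, "/3/0/0").thenAccept(result::complete));
            return result.get(2, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            return "error";
        } finally {
            sharedPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking: " + blockingStyle());
        System.out.println("async:    " + asyncStyle());
    }
}
```

A real pool has more than one thread, so the effect only appears once enough handlers block at the same time, which fits the "more than 8 to 10 devices at once" symptom reported earlier in the thread.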

@oz117
Author

oz117 commented Nov 5, 2019

OK, so the blocked threads come from the read/observe requests I do on the devices... I now have a better understanding of what is happening.
Thanks a lot.

Do you have any advice on how best to handle this?

@sbernard31
Contributor

I repeat myself :) :

@sbernard31
Contributor

Could we close this issue ?

@oz117
Author

oz117 commented Nov 13, 2019

Yep. I will close this now.
Thanks for the help :)

@oz117 oz117 closed this as completed Nov 13, 2019