Use Leshan Client to launch thousands of LWM2M clients in the same VM #491
I'm writing a JMeter plugin to simulate thousands of LWM2M clients in one JVM, but as mentioned in this issue, too many threads are created for a single client. I use leshan-client-cf 1.0.0-M6, and if I call disableSecuredEndpoint(), the 5 threads below are still created for each client.
It seems that DTLSConnectionHandler and the CoAP server are still not using a common executor with leshan-client-cf 1.0.0-M6? |
You're right, it's still not implemented, but I'm glad to see there is interest in this enhancement. Did you try the workaround above (with the endpoint factory)? |
No, I don't know where to update the code... Can you give me more detailed info so I can test it? Thank you! |
This is where you create your LeshanClient. I suppose you are using a LeshanClientBuilder? In this code example, |
OK, I see. Let me try. |
Yes, it seems to work. Below is the code fragment; could you help verify it? Thank you. Below is the shared executor instance, a static variable that ensures all client instances use the same executor.
The following code is invoked from a multi-threaded context and sets the same static executor instance via the setExecutor method.
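(The original fragments did not survive this copy of the thread; what follows is a minimal sketch of the idea, assuming the Endpoint.setExecutor(ScheduledExecutorService) method mentioned above and access to each client's underlying CoapServer. Class, method, and pool-size choices are illustrative.)

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

import org.eclipse.californium.core.CoapServer;
import org.eclipse.californium.core.network.Endpoint;

public class SharedExecutorHolder {

    // The shared executor instance: a static field, so every simulated
    // client in this JVM reuses the same scheduled thread pool.
    public static final ScheduledExecutorService SHARED_EXECUTOR =
            Executors.newScheduledThreadPool(4);

    // Invoked from a multi-threaded context, once per simulated client,
    // to set the same static executor on all of that client's endpoints.
    public static void useSharedExecutor(CoapServer coapServer) {
        for (Endpoint endpoint : coapServer.getEndpoints()) {
            endpoint.setExecutor(SHARED_EXECUTOR);
        }
    }
}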
Below is the thread dump: with 100 simulated clients, only one CoapServer thread was created. |
It sounds OK. Personally, I would go for:
and
|
OK, thanks. I validated it and it works. Do you have more recommendations for optimizing thread usage? |
The remaining threads are:
Right?
...or create your own Deduplicator with a shared Executor. It may be possible to do this without code modification (I mean without providing a PR to Californium), but that would mean lots of code duplication... :/
|
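(As an aside: for a pure load simulator, the per-client deduplicator thread can also be avoided entirely through Californium's NetworkConfig instead of a custom Deduplicator. A minimal sketch, assuming the standard NetworkConfig keys; the class name is illustrative.)

import org.eclipse.californium.core.network.config.NetworkConfig;

public class NoDeduplicationConfig {

    // Build a NetworkConfig that disables CoAP deduplication entirely, so
    // no mark-and-sweep task has to run for each client. Only reasonable
    // for a simulator; real clients need deduplication for CON traffic.
    public static NetworkConfig create() {
        NetworkConfig coapConfig = new NetworkConfig();
        coapConfig.setString(NetworkConfig.Keys.DEDUPLICATOR, NetworkConfig.Keys.NO_DEDUPLICATOR);
        return coapConfig;
    }
}

The resulting config can then be handed to the client builder (via its setCoapConfig method, if your Leshan version exposes one).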
Simon, thanks for your quick, valuable response. Yes, so far we still have 4 threads for each client. I created a PR for the first one, and we'll investigate the remaining two recommendations. |
It sounds to be a good plan 👍 |
While browsing the issues I came across this one, and I can see some similarity with what we're trying to do. I'll try to explain the problem we're facing; please let me know @sbernard31 if you need more info or if I should open a separate issue for it.

We're running a modified version of leshan-client-demo, the modification being support for running multiple clients. We've implemented a number of thread-related optimizations (the ideas being similar to the ones described in this issue, plus different or additional changes for the 1.0.0-M12 version).

System description: 2 VMs, 1 gateway VM and 1 device-simulator VM. The gateway can be considered a "server", where all devices are registered. On the device-simulator VM we run our modified leshan-client-demo. After starting up and registering with the gateway, each device receives an observe command for 3 different resources (e.g. temperature, timer, addressable text display) and establishes observe relations.

Traffic pattern: excluding the initial start-up time, three types of messages are exchanged between the devices and the gateway.
1. Non-confirmable UDP messages (NON): the notify messages for each resource, sent by each device to the gateway every five minutes; no reply from the gateway is expected.
2. Confirmable UDP messages (CON): every 10th NON message is a CON (we have the NOTIFICATION_CHECK_INTERVAL_COUNT Californium parameter set to 10), meaning that after sending 9 NON messages, each device sends a CON message to the gateway. An ACK (acknowledgement) reply is expected from the gateway for each CON request. These are 75 bytes each.
3. Keepalive UDP messages: sent every 90 seconds by the gateway to each device, so with a large number of devices there is basically a constant flow of keepalive messages from the gateway to the device-simulator VM. These are 42 bytes each.

Problem: we have noticed that after running for some period of time (this depends on the number of devices we start, e.g. ~6 hours for 10k devices with the described traffic pattern, or ~50 hours for 1k devices) the devices are no longer able to receive the ACK messages; they retry a number of times and then cancel all observe relations. After investigating, we discovered that there are lots of UDP packet receive errors in the OS (we're using Ubuntu 16.04; you can monitor this with the netstat -su command). We checked tcpdump on both sides (gateway and device-simulator VM) and everything is as expected: ACK packets are sent by the gateway and received on the device-simulator VM. At first we thought this was some OS network-buffer issue (due to the heavy traffic between gateway and devices) and tried a number of optimizations recommended by network experts, but the issue remained the same. In the end, we came to the conclusion that the kernel fails to deliver the packets to the JVM process, and the issue lies inside the JVM process (meaning that it's no longer accepting packets from the kernel).

My question to you is why this might be happening. Is there some counter or limitation in Leshan/Californium that we are not aware of, such that after the limit is hit no more UDP messages are accepted, or something else breaks? The only other Californium parameter we have overridden (apart from NOTIFICATION_CHECK_INTERVAL_COUNT) is MAX_PEER_INACTIVITY_PERIOD, which we set to 30 hours.

I'm not sure if this is the right place for this issue (whether the problem lies in the Leshan, Californium or Scandium layer), so please let me know if I need to take it up on the Californium GitHub. |
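(For reference, a minimal sketch of how such Californium parameters can be overridden programmatically. The keys NOTIFICATION_CHECK_INTERVAL_COUNT and MAX_PEER_INACTIVITY_PERIOD come from the post above; UDP_CONNECTOR_RECEIVE_BUFFER and its value are my own assumption of a knob that can matter when the OS reports UDP receive errors. The class name is illustrative.)

import org.eclipse.californium.core.network.config.NetworkConfig;

public class SimulatorCoapConfig {

    public static NetworkConfig create() {
        NetworkConfig config = new NetworkConfig();
        // Every 10th notification is sent as a CON message (as described above).
        config.setInt(NetworkConfig.Keys.NOTIFICATION_CHECK_INTERVAL_COUNT, 10);
        // Keep peer state for 30 hours of inactivity (value in seconds).
        config.setLong(NetworkConfig.Keys.MAX_PEER_INACTIVITY_PERIOD, 30L * 60 * 60);
        // Assumption: a larger socket receive buffer can reduce kernel-level
        // UDP drops when the JVM reads more slowly than packets arrive.
        config.setInt(NetworkConfig.Keys.UDP_CONNECTOR_RECEIVE_BUFFER, 4 * 1024 * 1024);
        return config;
    }
}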
@nkvaratskhelia see #744 |
With #794, the remaining threads are the connector sender/receiver threads (see eclipse-californium/californium#1203 for more details). One remaining detail: the Leshan client sends requests with the sync API; this makes the code easier to read and write, but for simulation it can block threads uselessly... so this is not ideal. |
#794 is now integrated in master. You can now use:

// One scheduled thread pool shared by all clients.
ScheduledExecutorService sharedExecutor = Executors.newScheduledThreadPool(100,
        new NamedThreadFactory("shared executor"));

LeshanClient[] clients = new LeshanClient[3000];
for (int i = 0; i < clients.length; i++) {
    LeshanClientBuilder builder = new LeshanClientBuilder("myclient" + i);
    // Reuse the shared executor instead of per-client thread pools.
    builder.setSharedExecutor(sharedExecutor);
    clients[i] = builder.build();
} |
Currently, by default, a Leshan client uses 10 threads:
We can reduce that by removing 3 threads (UDP + 1 deduplicator, if you use only DTLS) or 5 threads (DTLS + 1 deduplicator, if you use only UDP).
To do that, use LeshanClientBuilder.disableUnsecuredEndpoint() or disableSecuredEndpoint(), as sketched below.
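(A minimal sketch of that builder call; the endpoint name "coap-only-client" is illustrative.)

import org.eclipse.leshan.client.californium.LeshanClient;
import org.eclipse.leshan.client.californium.LeshanClientBuilder;

public class CoapOnlyClient {
    public static LeshanClient create() {
        LeshanClientBuilder builder = new LeshanClientBuilder("coap-only-client");
        // Drop the DTLS endpoint (and its threads) if only plain UDP/CoAP is used.
        builder.disableSecuredEndpoint();
        return builder.build();
    }
}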
But this still leaves 5 or 7 threads per client; if you want to create a simulator that runs thousands of clients, that means thousands of threads, and there is no way that is a good design...
I suppose we should change the code a bit to be able to use only one thread pool executor for all the clients.
Currently this is partially done in Californium/Scandium: you can set a common executor for the DTLSConnectionHandler and the CoAP server, so you save 2 more threads per client.
With Leshan 1.0.0-M5, this would look like this:
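(The original example did not survive this copy of the thread. Below is a minimal sketch of the idea under the following assumptions: Endpoint.setExecutor(ScheduledExecutorService) as mentioned earlier in this thread, and a setter of a similar name on the Scandium connector, whose exact signature in that milestone I am not certain of. Class and helper names are illustrative.)

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

import org.eclipse.californium.core.CoapServer;
import org.eclipse.californium.core.network.Endpoint;
import org.eclipse.californium.scandium.DTLSConnector;

public class CommonExecutorWorkaround {

    // One executor shared by the CoAP side and the DTLS side of a client.
    private static final ScheduledExecutorService COMMON_EXECUTOR =
            Executors.newScheduledThreadPool(2);

    public static void apply(CoapServer coapServer, DTLSConnector dtlsConnector) {
        // CoAP side: set the common executor on every endpoint of the server.
        for (Endpoint endpoint : coapServer.getEndpoints()) {
            endpoint.setExecutor(COMMON_EXECUTOR);
        }
        // DTLS side: the discussion above says a common executor can be set
        // here too; the method name below is an assumption on my part.
        dtlsConnector.setExecutor(COMMON_EXECUTOR);
    }
}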