
Use Leshan Client to launch thousands of LWM2M clients in the same VM #491

Closed
sbernard31 opened this issue Mar 21, 2018 · 15 comments
Labels: client (Impact LWM2M client), new feature (New feature from LWM2M specification)

Comments

@sbernard31
Contributor

sbernard31 commented Mar 21, 2018

Currently, by default, a Leshan client uses 10 threads (see the snippet after this list to check the count yourself):

  • 1 UDP sender (from element-connector)
  • 1 UDP receiver (from element-connector)
  • 2 Deduplicator (from californium)
  • 1 Coap Server (from californium)
  • 1 DTLS sender (from scandium)
  • 1 DTLS receiver (from scandium)
  • 1 DTLS retransmit (from scandium)
  • 1 DTLS connectionHandler (from scandium)
  • 1 registration engine (from leshan)
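
If you want to verify that count on your own setup, here is a purely illustrative, JDK-only snippet (not part of Leshan) that dumps the live threads:

// Illustrative, JDK-only: list the live threads to see what each client costs.
for (Thread t : Thread.getAllStackTraces().keySet()) {
    System.out.println(t.getName());
}
// Approximate count of live threads in the current thread group.
System.out.println("Live threads: " + Thread.activeCount());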

We can reduce this by removing 3 threads (UDP + 1 deduplicator) if you use only DTLS, or 5 threads (DTLS + 1 deduplicator) if you use only UDP.
To do that, call LeshanClientBuilder.disableUnsecuredEndpoint() or disableSecuredEndpoint(), as in the sketch below.
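
For illustration, a minimal sketch of how those builder methods are called (the endpoint name "myclient" is a placeholder):

LeshanClientBuilder builder = new LeshanClientBuilder("myclient");
// Keep only the plain UDP endpoint: removes the 4 DTLS threads + 1 deduplicator.
builder.disableSecuredEndpoint();
// Or keep only the DTLS endpoint instead: removes the 2 UDP threads + 1 deduplicator.
// builder.disableUnsecuredEndpoint();
LeshanClient client = builder.build();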

But this is still 5 or 7 threads per client. If you want to create a simulator which would run thousands of clients, that means thousands of threads; there is no way this is a good design...

I suppose we should change the code a bit to be able to use only one thread pool executor for all the clients.
Currently this is partially possible in Californium/Scandium: you can set a common executor for the DTLSConnectionHandler and the CoAP server, which saves 2 more threads per client.

With Leshan 1.0.0-M5, this looks like this:

// One scheduler shared across clients.
final ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
builder.setEndpointFactory(new EndpointFactory() {
    @Override
    public CoapEndpoint createUnsecuredEndpoint(InetSocketAddress address, NetworkConfig coapConfig,
            ObservationStore store) {
        return new CoapEndpoint(address, coapConfig);
    }

    @Override
    public CoapEndpoint createSecuredEndpoint(DtlsConnectorConfig dtlsConfig, NetworkConfig coapConfig,
            ObservationStore store) {
        DTLSConnector dtlsConnector = new DTLSConnector(dtlsConfig);
        // Share the scheduler with the DTLS connection handler.
        dtlsConnector.setExecutor(new StripedExecutorService(executor));
        return new CoapEndpoint(dtlsConnector, coapConfig, null, null);
    }
});
client = builder.build();
// Share the same scheduler with the CoAP server.
client.getCoapServer().setExecutor(executor);
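
(Note: the StripedExecutorService wrapper presumably keeps tasks belonging to the same stripe, i.e. the same DTLS connection, executing sequentially even though the pool is shared. Also, with a pool size of 1 every client shares a single thread, so a larger pool is probably advisable when simulating many clients.)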
sbernard31 added the new feature and client labels on Mar 21, 2018
@jinfahua

jinfahua commented May 30, 2018

I'm writing a JMeter plugin to simulate thousands of LWM2M clients in one JVM, but as mentioned in this issue, too many threads are created for a single client. I use leshan-client-cf 1.0.0-M6, and if I call disableSecuredEndpoint(), the 5 threads below are created for each client.

  • CoapServer
  • Deduplicator1
  • RegistrationEngine#0
  • UDP-Receiver-0.0.0.0/0.0.0.0:0[0]
  • UDP-Sender-0.0.0.0/0.0.0.0:0[0]

It seems that DTLSConnectionHandler and CoAP server are still not using a common executor with leshan-client-cf 1.0.0-M6?

@sbernard31
Contributor Author

sbernard31 commented May 30, 2018

It seems that DTLSConnectionHandler and CoAP server are still not using a common executor with leshan-client-cf 1.0.0-M6?

You're right, it is still not implemented, but I'm glad to see there is interest in this enhancement.

Did you try the workaround above (with the EndpointFactory)?

@jinfahua

No, I don't know where to update the code... Can you give me more detailed info so I can test it? Thank you!

@sbernard31
Contributor Author

This is where you create your LeshanClient. I suppose you use a LeshanClientBuilder?

In this code example, builder is a LeshanClientBuilder.

@jinfahua

OK, I see. Let me try.

@jinfahua

jinfahua commented May 30, 2018

Yes, it seems to work. Below is a code fragment. Could you help verify it? Thank you.

Below is the shared executor instance, which is a static variable to make sure all client instances use the same executor.

private static ScheduledExecutorService executor = null;

private static synchronized ScheduledExecutorService getExecutor(LeshanClientBuilder builder) {
    if (executor == null) {
        executor = Executors.newScheduledThreadPool(1, new NamedThreadFactory("CoapServer#%d"));
        builder.setEndpointFactory(new EndpointFactory() {
            @Override
            public CoapEndpoint createUnsecuredEndpoint(InetSocketAddress address, NetworkConfig coapConfig, ObservationStore store) {
                return new CoapEndpoint(address, coapConfig);
            }

            @Override
            public CoapEndpoint createSecuredEndpoint(DtlsConnectorConfig dtlsConfig, NetworkConfig coapConfig, ObservationStore store) {
                DTLSConnector dtlsConnector = new DTLSConnector(dtlsConfig);
                dtlsConnector.setExecutor(new StripedExecutorService(executor));
                return new CoapEndpoint(dtlsConnector, coapConfig, null, null);
            }
        });
    }
    return executor;
}

The code below is invoked from a multi-threaded context; it sets the same static executor instance via setExecutor.

LeshanClientBuilder builder = new LeshanClientBuilder(context.getParameter(PARA_ENDPOINT));

try {
  ObjectsInitializer initializer = new ObjectsInitializer();
  int shortServerId = Integer.parseInt(context.getParameter(PARA_COAP_SHORT_SERV_ID));
  initializer.setInstancesForObject(...);
  builder.setObjects(initializer.create(LwM2mId.SECURITY, LwM2mId.SERVER, LwM2mId.DEVICE));

  LeshanClient client = builder.build();
  client.getCoapServer().setExecutor(getExecutor(builder));
  ...

Below is the thread-dump file: I simulated 100 clients, and only one CoapServer thread was created.

thread_w9.txt

@sbernard31
Contributor Author

It sounds ok.

Personally, I would go for:

private static final ScheduledExecutorService executor = Executors.newScheduledThreadPool(1, new NamedThreadFactory("CoapServer#%d"));

private static ScheduledExecutorService getExecutor() {
    return executor;
}

and

LeshanClientBuilder builder = new LeshanClientBuilder(context.getParameter(PARA_ENDPOINT));

try {
    ObjectsInitializer initializer = new ObjectsInitializer();
    int shortServerId = Integer.parseInt(context.getParameter(PARA_COAP_SHORT_SERV_ID));
    initializer.setInstancesForObject(...);
    builder.setObjects(initializer.create(LwM2mId.SECURITY, LwM2mId.SERVER, LwM2mId.DEVICE));
    builder.setEndpointFactory(new EndpointFactory() {
        @Override
        public CoapEndpoint createUnsecuredEndpoint(InetSocketAddress address, NetworkConfig coapConfig, ObservationStore store) {
            return new CoapEndpoint(address, coapConfig);
        }

        @Override
        public CoapEndpoint createSecuredEndpoint(DtlsConnectorConfig dtlsConfig, NetworkConfig coapConfig, ObservationStore store) {
            DTLSConnector dtlsConnector = new DTLSConnector(dtlsConfig);
            dtlsConnector.setExecutor(new StripedExecutorService(getExecutor()));
            return new CoapEndpoint(dtlsConnector, coapConfig, null, null);
        }
    });

    LeshanClient client = builder.build();
    client.getCoapServer().setExecutor(getExecutor());

@jinfahua

OK, thanks. I validated it and it works. Do you have more recommendations for optimizing the thread usage?

@sbernard31
Contributor Author

The remaining threads are:

  • Deduplicator1
  • RegistrationEngine#0
  • UDP-Receiver-0.0.0.0/0.0.0.0:0[0]
  • UDP-Sender-0.0.0.0/0.0.0.0:0[0]

Right?

  1. For the RegistrationEngine, we need to modify the Leshan client code (a PR is welcome ;)).
  2. For the Deduplicator, you can use a NoDeduplicator, but then you will not be able to detect duplicate messages:
NetworkConfig coapConfig = LeshanClientBuilder.createDefaultNetworkConfig();
coapConfig.set(NetworkConfig.Keys.DEDUPLICATOR, NetworkConfig.Keys.NO_DEDUPLICATOR);

builder.setCoapConfig(coapConfig);

Or create your own Deduplicator with a shared executor. This might be possible without code modification (I mean without providing a PR to Californium), but it would mean lots of code duplication... :/

  3. For UDP-Receiver/Sender, we need to create a new UDPConnector; there is no easy way to do that. As with 2, it may be possible with code duplication, but the best solution would be to provide a PR.

@jinfahua

jinfahua commented May 30, 2018

Simon, thanks for your quick and valuable response.

Yes, so far we still have 4 threads for each client. I'll create a PR for the 1st, and we'll investigate the other 2 recommendations.

@sbernard31
Contributor Author

That sounds like a good plan 👍

@nkvaratskhelia

While browsing around the issues I came across this one and I can see that there is some similarity with what we're trying to do. I'll try to explain the problem we're facing and please let me know @sbernard31 if you need more info or if I need to open a separate issue for that.

We're running a modified version of leshan-client-demo, the modifications being the support for running multiple clients. We've implemented a number of thread related optimizations (ideas being similar to the ones described in this issue, plus different or additional changes for the 1.0.0-M12 version).

System description: 2 VMs, 1 gateway VM and 1 device simulator VM. The gateway can be considered a "server", where all devices are registered. On the device simulator VM we run our modified leshan-client-demo. After starting up and registering with the gateway, each device receives an observe command for 3 different resources (e.g. temperature, timer, addressable text display) and establishes observe relations.

Traffic pattern: excluding the initial start-up time, there are three types of messages exchanged between the devices and the gateway:

  1. Non-confirmable UDP messages (NON) -- these are the notify messages for each resource, sent by each device to the gateway every five minutes; no reply from the gateway is expected.
  2. Confirmable UDP messages (CON) -- every 10th NON message is a CON (we have the NOTIFICATION_CHECK_INTERVAL_COUNT Californium parameter set to 10), meaning that after sending 9 NON messages, each device sends a CON message to the gateway. An ACK (acknowledgement) reply is expected from the gateway for each CON request. These are 75 bytes each.
  3. Keepalive UDP messages -- these are sent every 90 seconds by the gateway to each device, so with a large number of devices there is basically a constant flow of keepalive messages from the gateway to the device simulator VM. These are 42 bytes each.

Problem: we have noticed that after running for some period of time (this depends on the number of devices we start, e.g. ~6 hours for 10k devices with the described traffic pattern, or ~50 hours for 1k devices), the devices are no longer able to receive the ACK messages; they retry a number of times and then cancel all observe relations.

After investigating the problem, we discovered that there are lots of UDP packet receive errors in the OS (we're using Ubuntu 16.04; you can monitor this with the netstat -su command). We checked tcpdump on both sides (gateway and the device simulator VM) and everything is as expected: ACK packets are sent by the gateway and received on the device simulator VM. At first we thought this was some OS network buffer issue (due to the heavy traffic between gateway and devices) and tried a number of optimizations recommended by network experts, but the issue remains the same. In the end, we came to the conclusion that the kernel fails to deliver the packets to the JVM process and the issue lies inside the JVM process (meaning that it's no longer accepting packets from the kernel).

My question to you is why this might be happening. Is there some counter or limitation (after this limit is hit, no more udp messages are accepted or something else breaks) in leshan/californium that we are not aware of?

The only other Californium parameter we have overridden (apart from NOTIFICATION_CHECK_INTERVAL_COUNT) is MAX_PEER_INACTIVITY_PERIOD. We set this to 30 hours, as sketched below.
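
For reference, a minimal sketch of how those two overrides might look, assuming the standard Californium NetworkConfig keys and that MAX_PEER_INACTIVITY_PERIOD is expressed in seconds:

NetworkConfig coapConfig = LeshanClientBuilder.createDefaultNetworkConfig();
// Every 10th notification is sent as a CON instead of a NON.
coapConfig.setInt(NetworkConfig.Keys.NOTIFICATION_CHECK_INTERVAL_COUNT, 10);
// 30 hours, assumed to be in seconds.
coapConfig.setLong(NetworkConfig.Keys.MAX_PEER_INACTIVITY_PERIOD, 30 * 60 * 60);
builder.setCoapConfig(coapConfig);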

I'm not sure if this is the right place for this issue (whether the problem lies in the leshan, Californium or Scandium layer) so please let me know if I need to take it up with the Californium github.

@sbernard31
Contributor Author

@nkvaratskhelia see #744

@sbernard31
Contributor Author

With #794, the remaining threads are the connector sender/receiver threads (see eclipse-californium/californium#1203 for more details).

Remaining detail: the Leshan client sends requests with a sync API. This makes the code easier to read and write, but for a simulation it can block threads uselessly... so it is not ideal.

@sbernard31
Contributor Author

#794 is now integrated in master.

You can now use:

ScheduledExecutorService sharedExecutor = Executors.newScheduledThreadPool(100,
        new NamedThreadFactory("shared executor"));

LeshanClient[] clients = new LeshanClient[3000];
for (int i = 0; i < clients.length; i++) {
    LeshanClientBuilder builder = new LeshanClientBuilder("myclient" + i);
    builder.setSharedExecutor(sharedExecutor);
    clients[i] = builder.build();
}
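
As a hedged usage sketch around that loop (start() and destroy(boolean deregister) are the Leshan client life-cycle methods; shutting down the shared executor yourself is an assumption on my part, since the clients no longer own it):

// Start all clients; their registration engines now run on the shared pool.
for (LeshanClient c : clients) {
    c.start();
}

// ... run the simulation ...

// Tear-down: destroy clients (true = deregister first), then stop the shared pool.
for (LeshanClient c : clients) {
    c.destroy(true);
}
sharedExecutor.shutdown();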
