
How are Observations stored on client side? #1574

Open
Ozame opened this issue Jan 14, 2024 · 27 comments
Labels
question Any question about leshan

Comments

@Ozame

Ozame commented Jan 14, 2024

Question

We are using Leshan to simulate thousands of devices/clients. Since one of the limiting factors performance-wise is the use of threads, we use shared thread pools and stop clients while the devices are sleeping. This approach has some issues; one of them seems to be that when the LeshanClient is stopped, it loses all the observations set up by the server. When testing the client against the Leshan Server Demo, the observed values are updated nicely on the server as long as the client is not stopped. Once stopped and started again, no notifications are sent when the observed resources change, unless the server re-observes them.

My question is: how are those observations stored on the client side? Is there some way we could save them so that a client restart would not affect them?

@Ozame Ozame added the question Any question about leshan label Jan 14, 2024
@sbernard31
Contributor

Could I ask which version of Leshan you are using?

Since one of the limiting factors performance-wise is the use of threads, we use shared thread pools

Yep, there have been some improvements about that, but there are still some limitations. (The code was not initially designed with this use case in mind, neither in Leshan nor in Californium.)

See here for more details:

and stop clients while the devices are sleeping.

Just out of curiosity, what kind of gain do you get with stop()?

When testing the client against the Leshan Server Demo, the observed values are updated nicely on the server as long as the client is not stopped. Once stopped and started again, no notifications are sent when the observed resources change, unless the server re-observes them.

I guess this is because when you start the client it sends a new Register request.
And AS observations are tied to 1 registration, WHEN the server receives a new Register request THEN it creates a new registration and drops the previous one with all its tied observations.

My question is: how are those observations stored on the client side? Is there some way we could save them so that a client restart would not affect them?

If I'm right ☝️ this is not about saving the observations, it's about not sending a new registration on start.
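
For illustration, this side effect is visible from the server with a RegistrationListener (a sketch from memory of the Leshan server API; double-check the exact signatures against your version):

    server.getRegistrationService().addListener(new RegistrationListener() {

        @Override
        public void registered(Registration reg, Registration previousReg,
                Collection<Observation> previousObservations) {
            // A fresh Register from an already-known endpoint replaces previousReg;
            // previousObservations are the observations dropped together with it.
        }

        @Override
        public void updated(RegistrationUpdate update, Registration updatedReg,
                Registration previousReg) {
            // An Update request keeps the registration (and its observations) alive.
        }

        @Override
        public void unregistered(Registration reg, Collection<Observation> observations,
                boolean expired, Registration newReg) {
            // The observations tied to the registration are reported here when it goes away.
        }
    });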

@sbernard31
Contributor

Are you using coap or coaps?

@Ozame
Author

Ozame commented Jan 15, 2024

Thanks for the quick reply!
We are using COAPS, and the Leshan version is 2.0.0-M14.

Just out of curiosity, what kind of gain do you get with stop()?

With the thread pools we can control almost all threads, but the one remaining client-specific thread is the "DTLS-Receiver-0-/127.0.0.1:xxxxx". Stopping the client helps us get rid of this temporarily.

I guess this is because when you start the client it sends a new Register request.

We actually made a change in the RegistrationEngine class so that it checks the registeredServers field when start() is called, and if a server already exists there, we try to do an update first. This seems to have fixed the behavior as needed: the same registration is kept.

    @Override
    public void start() {
        stop(false); // Stop without de-register
        synchronized (this) {
            started = true;
            // Try factory bootstrap
            LwM2mServer dmServer = factoryBootstrap();

            if (dmServer == null) {
                // If it failed try client initiated bootstrap
                if (!scheduleClientInitiatedBootstrap(NOW))
                    throw new IllegalStateException("Unable to start client : No valid server available!");
            } else {
                // If there exists a registered server already, we try to send a registration update
                // Only one server is supported for now
                if (!registeredServers.isEmpty()) {
                    String registrationId = registeredServers.entrySet().iterator().next().getKey();
                    var updateTask = new CustomRegistrationEngine.UpdateRegistrationTask(dmServer, registrationId,
                            new RegistrationUpdate());
                    updateFuture = schedExecutor.submit(updateTask);
                } else {
                    registerFuture = schedExecutor.submit(new CustomRegistrationEngine.RegistrationTask(dmServer));
                }
            }
        }
    }

I might be thinking about this wrong, but I feel that if it is the client who sends the notifications on resource change, it should have knowledge of the observations that have been made on its resources. I was not able to pinpoint where this information could be, or if it exists at all on the client.

@sbernard31
Contributor

sbernard31 commented Jan 15, 2024

This seems to have fixed the behavior as needed: the same registration is kept.

And it works for you now?
Maybe we could add this to the Leshan client (maybe optionally)? 🤔

but I feel that if it is the client who sends the notifications on resource change, it should have knowledge of the observations that have been made on its resources.

Observation is mainly a CoAP feature, and it seems to be a nightmare to implement in CoAP (+ there are some specification issues in the RFC).
LWM2M is supposed to be based on that CoAP RFC 7641, but it twists the feature a bit, so it is also complicated for LWM2M developers to integrate it correctly (because you are supposed to reuse what is already developed in the CoAP library, but the API does not always fit the LWM2M use case...).

Anyway, just to say that the relation is mainly stored in the CoAP library code:
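
For example, in plain Californium (not Leshan-specific code) the observe relations hang off the resource itself; a minimal sketch:

    public class TemperatureResource extends CoapResource {

        private volatile double value = 21.5;

        public TemperatureResource() {
            super("temperature");
            setObservable(true); // accept RFC 7641 observe requests
        }

        @Override
        public void handleGET(CoapExchange exchange) {
            exchange.respond(ResponseCode.CONTENT, Double.toString(value));
        }

        public void update(double newValue) {
            value = newValue;
            changed(); // notifies every ObserveRelation stored on this resource
        }
    }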

@Ozame
Author

Ozame commented Jan 16, 2024

And it works for you now?
Yep.

Observation-wise, we noticed that there is an ObservationStore related to the CoapEndpoint, and tried to save it for re-use when the Endpoint is created in CoapsClientEndpointFactory's createCoapEndpoint. Not sure if it was something in our implementation, but it seemed like the Endpoints did not add anything there.
But I will check those classes you shared, and see if something can be done with this.

@sbernard31
Contributor

sbernard31 commented Jan 16, 2024

The ObservationStore is used on the CoAP client side (and so on the LWM2M server side, since for observe it is the LWM2M server that plays the CoAP client role).

@Ozame
Author

Ozame commented Jan 16, 2024

The ObservationStore is used on the CoAP client side (and so on the LWM2M server side)

So it is of no use on the LWM2M client side?

Relating to this, can you suggest where to look for a way to persist the DTLS session/connection ID on client restart? I noticed from the logs that InMemoryReadWriteLockConnectionStore is used for this, but the related SessionStore seems to need the session ID to be stored elsewhere. I am not quite sure where to snatch this.

@sbernard31
Contributor

So it is of no use on the LWM2M client side?

That's it!

Relating to this, can you suggest where to look for a way to persist the DTLS session/connection ID on client restart? I noticed from the logs that InMemoryReadWriteLockConnectionStore is used for this, but the related SessionStore seems to need the session ID to be stored elsewhere. I am not quite sure where to snatch this.

"DTLS sesssion/connection ID" , The ID behind connection confuse me a little. I don't know if you want to persist connection or just the connection ID value. I guess this is the former because the latter does not make so much sense ?

Anyway, Session and Connection are 2 very different concepts in (D)TLS.

In Californium,

  • about Session, I know there is a SessionStore but I don't know if it is used on the client side? 🤔
  • about Connection, maybe you could have a look at DTLSConnector.save() / DTLSConnector.load() from the PersistentComponent interface? 🤔
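
For the second bullet, a very rough sketch of what that could give (the save()/load() names follow the PersistentComponent methods mentioned above; the exact Californium signatures and parameter meanings are assumptions to verify):

    // Hedged sketch: persist DTLS connection state across a client stop/start.
    void saveDtlsState(DTLSConnector connector, Path file) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            connector.save(out, 60); // assumed parameter: max quiet period, in seconds
        }
    }

    void loadDtlsState(DTLSConnector connector, Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            connector.load(in, 0); // assumed parameter: time shift since save
        }
    }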

In any case, if this is a pure Californium question, it is maybe better to ask directly there: https://github.com/eclipse-californium/californium/issues

the related SessionStore seems to need the session ID to be stored elsewhere. I am not quite sure where to snatch this.

There are some examples here: #1395 (comment)

@Ozame
Author

Ozame commented Jan 18, 2024

Ok, so this was solved. It appears that Leshan and Californium actually handle the observations and sessions nicely, and stopping and starting the client did in fact keep the ObserveRelations in ObjectResource intact. Thanks for the tip on CoapResource; with that I was able to solve it by looking at how the resources are handled.

The culprit behind the sessions and observations being lost was in the RegistrationEngine, more specifically these lines in start():

    // Try factory bootstrap
    LwM2mServer dmServer = factoryBootstrap();

In factoryBootstrap, the endpoints are created again, so when the resource change is handled in ObjectResource, sending the notification fails because the server is not found. This was because the endpoint instance was different. Putting this behind an if similar to the update/registration check solved it.
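
(Roughly, the guard in our copied engine looks like this sketch:)

    LwM2mServer dmServer;
    if (registeredServers.isEmpty()) {
        // First start: factory bootstrap (re)creates the endpoints.
        dmServer = factoryBootstrap();
    } else {
        // Restart with an existing registration: reuse the current server,
        // so the endpoint instances (and their observe state) stay intact.
        dmServer = registeredServers.values().iterator().next();
    }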

Thanks @sbernard31 for all the help!

Regarding the changes I made on the RegistrationEngine, is this start/stop handling something you could want in the default one, behind a configuration option? I could maybe submit a PR if so?

@sbernard31
Contributor

Ok, so this was solved.

Glad to see you found a solution. 👍

Thanks @sbernard31 for all the help!

You're welcome. 🙂

Regarding the changes I made on the RegistrationEngine, is this start/stop handling something you could want in the default one, behind a configuration option? I could maybe submit a PR if so?

I have no clear idea whether this is something which should be integrated or not.
I'm not sure there is an obvious use case, but maybe this could be a step toward adding support for Queue/Offline mode? 🤔

Another point: I'm not sure it can work with all transport layers, e.g. the java-coap implementation? (start/stop/destroy is mainly a Californium concept; some libraries have just a start/destroy concept)

Perhaps you could create a PR mainly to share the code, but I can't guarantee that it will be integrated soon (or even at all).
Or if it is just a few lines of code, you could share it directly in a comment on this issue?

But if we don't integrate it, how will you reuse Leshan? By copying/pasting/modifying the code?
If yes, do you do that for some other reason, or just for this one?
I mean we can try to adapt the code to let you customize this behavior even if we don't provide something as simple as a configuration flag.

@Ozame
Author

Ozame commented Jan 19, 2024

Another point: I'm not sure it can work with all transport layers, e.g. the java-coap implementation? (start/stop/destroy is mainly a Californium concept; some libraries have just a start/destroy concept)

Yeah, good point.

I can maybe clean up the code a bit and create that PR. Whether or not it gets integrated, it could prove useful for someone.

But if we don't integrate it, how will you reuse Leshan? By copying/pasting/modifying the code?

Not sure I fully understand the question, but for our use case I just created a new class using the copied RegistrationEngine/Factory code with the changes made. Not ideal, but it works. Maintenance-wise, upgrading the Leshan version will require some effort if the implementation changes, so having this in the library code itself would have been ideal.

Regarding Queue mode, this kind of start/stop approach works for us, i.e. we handle the wake-up times and the data being sent on start-up at a higher level, outside Leshan. But I would guess the other way to support queue mode in the client would be to have the periodic communication and queued requests/responses within Leshan. I'm not so familiar with the LWM2M specs, so I don't know how much of this mode's behavior is specified.

The real device we are simulating here will be using queue mode, waking up only for the periodic communication. But at least in our case, the command queues and such will be mostly on the server side, and the client's queue-mode responsibilities are simpler in that regard.

@sbernard31
Contributor

sbernard31 commented Jan 19, 2024

I can maybe clean up the code a bit and create that PR. Whether or not it gets integrated, it could prove useful for someone.

👍

I just created a new class using the copied RegistrationEngine/Factory code with the changes made. Not ideal, but it works.

That's exactly my question.

Maintenance-wise, upgrading the Leshan version will require some effort if the implementation changes, so having this in the library code itself would have been ideal.

Yep, and my point was: IF we don't integrate your PR, THEN we can maybe find a solution where we adapt the DefaultRegistrationEngine in a way that it can be extended without too much modification for your use case? (but let's see your code first)

Regarding Queue mode, this kind of start/stop approach works for us, i.e. we handle the wake-up times and the data being sent on start-up at a higher level, outside Leshan.

Is it possible for you to share a link to the code? I'm curious to see how the API is used.

@Ozame
Author

Ozame commented Jan 19, 2024

Is it possible for you to share a link to the code? I'm curious to see how the API is used.

I cannot share too much of the code, but in short:

Client-usage-wise, we just call its start and stop methods. When the client is supposed to wake up, we:

  • call client.start()
  • trigger updates on the observed resources, along the lines of (see the sketch after this list for updateResources()):

        int objectId = X;
        LwM2mObjectEnabler objectEnabler = client.getObjectTree().getObjectEnabler(objectId);
        LwM2mInstanceEnabler enablerInstance = ((ObjectEnabler) objectEnabler).getInstance(0);
        var objectInstance = (ObjectXEnabler) enablerInstance;
        objectInstance.updateResources();

  • set a timer to eventually trigger a call to client.stop()
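
updateResources() above is our own helper on the instance enabler, not a Leshan API. Roughly, it looks like the following sketch (readSensorValue() is made up for illustration; check which fireResourcesChange variant your Leshan version's BaseInstanceEnabler offers):

    public class ObjectXEnabler extends BaseInstanceEnabler {

        private double currentValue;

        public void updateResources() {
            currentValue = readSensorValue(); // hypothetical data source
            // Tell Leshan that resource 0 changed, so active observations on it
            // produce notifications towards the server:
            fireResourcesChange(0);
        }

        private double readSensorValue() {
            return Math.random() * 100; // placeholder
        }
    }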

When building the client:

  • we add the mentioned custom registration engine by using LeshanClientBuilder.setRegistrationEngineFactory()
  • we create a CaliforniumClientEndpointsProvider with the builder, setting the relevant options.
  • For that provider, we have a CoapsClientProtocolProvider with createDefaultEndpointFactory overridden, so that we can override createSecuredConnector in there, to make the connector use our thread pool:

        @Override
        protected Connector createSecuredConnector(DtlsConnectorConfig dtlsConfig) {
            var con = new DTLSConnector(dtlsConfig);
            con.setExecutor(sharedExecutor);
            return con;
        }

I'll try to create the PR when I get the chance 👍

@sbernard31
Contributor

Thx for the details. 🙏

I'll wait for the PR, and then we will see what we can do to limit your code duplication around the RegistrationEngine. 🙂

@Ozame
Author

Ozame commented Feb 2, 2024

@sbernard31 I've added a draft PR for the registration engine. There is one other change I would be interested in making, but I will most likely create a separate issue/PR for that one, as it is unrelated to this feature.

@sbernard31
Contributor

There is one other change I would be interested in making, but I will most likely create a separate issue/PR for that one, as it is unrelated to this feature.

👍

@sbernard31
Contributor

sbernard31 commented Feb 6, 2024

About #1574 (comment),

I mean we can try to adapt the code to let you customize this behavior even if we don't provide something as simple as a configuration flag.

The modification in Leshan could look like: 4ddcce5

And so your code to modify the behavior would be something like:

private final DefaultRegistrationEngineFactory engineFactory = new DefaultRegistrationEngineFactory() {
    @Override
    protected RegistrationEngine doCreate(String endpoint, LwM2mObjectTree objectTree,
            EndpointsManager endpointsManager, UplinkRequestSender requestSender, BootstrapHandler bootstrapState,
            LwM2mClientObserver observer, java.util.Map<String, String> additionalAttributes,
            java.util.Map<String, String> bsAdditionalAttributes, ScheduledExecutorService executor,
            long requestTimeoutInMs, long deregistrationTimeoutInMs, int bootstrapSessionTimeoutInSec,
            int retryWaitingTimeInMs, Integer communicationPeriodInMs, boolean reconnectOnUpdate,
            boolean resumeOnConnect, boolean useQueueMode, ContentFormat preferredContentFormat,
            java.util.Set<ContentFormat> supportedContentFormats, LinkFormatHelper linkFormatHelper) {

        return new DefaultRegistrationEngine(endpoint, objectTree, endpointsManager, requestSender, bootstrapState,
                observer, additionalAttributes, bsAdditionalAttributes, executor, requestTimeoutInMs,
                deregistrationTimeoutInMs, bootstrapSessionTimeoutInSec, retryWaitingTimeInMs,
                communicationPeriodInMs, reconnectOnUpdate, resumeOnConnect, useQueueMode, preferredContentFormat,
                supportedContentFormats, linkFormatHelper) {
            @Override
            protected void onWakeUp(Map<String, LwM2mServer> registeredServers) {
                if (registeredServers.isEmpty()) {
                    super.onWakeUp(registeredServers);
                } else {
                    // TODO support multiple servers
                    triggerRegistrationUpdate(registeredServers.values().iterator().next());
                }
            }
        };
    }
};

This is not so elegant (too many arguments...) but it should save you a lot of code duplication.

(This could be a solution if we decide not to integrate #1579.)

@Ozame
Author

Ozame commented Feb 9, 2024

I think your suggestion looks good; it would definitely reduce the duplication in both the default engine and the factory. I think that kind of hook approach would work nicely, but one thing I'm concerned about is how to guarantee that the right fields from the engine are available in that onWakeUp (an onShutDown or similar would also be needed).

Concrete example: I recently had to change the executor logic in start and stop to create and kill the executor, respectively, when the methods are called. This was because I noticed that specifying a shared executor (with x threads) for the client would inevitably lead to its threads being blocked all the time, essentially failing the registration updates after a couple of days. This was visible with 100 clients already. With this approach we use 2 threads per client (registration engine, DTLS receiver), but the performance so far seems way better than with the pool.
In the code:

    public void start() {
        stop(false); // Stop without de-register
        synchronized (this) {
            started = true;
            if (attachedExecutor && scheduledExecutor.isShutdown()) {
                scheduledExecutor = createScheduledExecutor();
            }
            LwM2mServer dmServer;
            ....

and

    public void stop(boolean deregister) {
        ...
            cancelRegistrationTask();
            // we should manage the case where we stop in the middle of a bootstrap session ...
            cancelBootstrapTask();
            if (attachedExecutor) {
                try {
                    scheduledExecutor.shutdownNow();
                    scheduledExecutor.awaitTermination(bootstrapSessionTimeoutInSec, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
        ...

So this kind of use would require that the executor can be accessed/replaced from the onWakeUp and onShutDown hooks.

But, like you said, I'm also not sure how useful this will be for users without actual Queue mode support. Is that something that's in the plans for the future?

Overall, I think Leshan's been great to work with, despite our use case of a massive number of clients not being so well supported. I think most headaches have been caused by the thread usage, optimization of which has been lots of trial and error 😄 But I think we're almost there.

@sbernard31
Contributor

one thing I'm concerned about is how to guarantee that the right fields from the engine are available in that onWakeUp (an onShutDown or similar would also be needed).

Yep, I understand. A non-exhaustive list of solutions:

  • add new parameters to the method each time someone has a new need.
  • remove the parameters and change the visibility of the needed attributes (from private to protected).
  • remove the parameters and change the visibility of all attributes (and optionally all private methods too). This increases extensibility, but also makes future modifications harder when trying to respect semantic versioning.

Note that Leshan 2.0.0 is currently in development and the API can change/break between 2 milestone releases. (Then, when a stable version is released, we will try to respect semantic versioning.)

But let's understand your new needs better before making this kind of decision.

This was because I noticed that specifying a shared executor (with x threads) for the client would inevitably lead to its threads being blocked all the time,

I'm not sure I get why? Because we use the sync API to send requests in DefaultRegistrationEngine? Or?

With this approach we use 2 threads per client (registration engine, DTLS receiver), but the performance so far seems way better than with the pool.

And so you kill the executor on stop to release threads? Like you stop() the connector to release its thread?

But, like you said, I'm also not sure how useful this will be for users without actual Queue mode support. Is that something that's in the plans for the future?

This is in the scope of the project, as it is part of the LWM2M specification.
Until now nobody has expressed this kind of need, so there is no plan in the short/mid term.
But this is a community-driven project, so plans can change.

Overall, I think Leshan's been great to work with, despite our use case of a massive number of clients not being so well supported...
I think most headaches have been caused by the thread usage, optimization of which has been lots of trial and error 😄 But I think we're almost there.

Improving the Leshan client to be able to simulate lots of clients would be great. I appreciate your work and I would be happy if we could make it better together.

Maybe we should create a new issue where we clearly identify each problem, then try to define what the really good solution for each would be, and then also a workaround/short-term solution.

By the way, a long time ago I did this: https://github.com/sbernard31/benchmark-clients
but it is based on Leshan 1.x.

@sbernard31
Contributor

Did you try to use the virtual threads of Java 21? Then maybe you can use many more threads, and there would be no need to stop/start all these schedulers?

See : https://docs.oracle.com/en/java/javase/21/core/virtual-threads.html#GUID-C0FEE349-D998-4C9D-B032-E01D06BE55F2

(I never tested it)
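
For reference, a minimal Java 21 sketch (plain JDK API, untested with Leshan):

    // One cheap virtual thread per task: blocking sync sends no longer hog
    // scarce platform threads.
    ExecutorService perTaskExecutor = Executors.newVirtualThreadPerTaskExecutor();

    // Where a ScheduledExecutorService is required (e.g. setSharedExecutor),
    // a scheduled pool can at least create its workers as virtual threads:
    ScheduledExecutorService scheduled =
            Executors.newScheduledThreadPool(1, Thread.ofVirtual().factory());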

@Ozame
Author

Ozame commented Feb 9, 2024

I'm not sure I get why? Because we use the sync API to send requests in DefaultRegistrationEngine? Or?

This behavior was indeed very weird. I profiled the whole thing with JProfiler, and it seemed like every time there were too many simultaneous connections, all the threads in the pool (set with clientBuilder.setSharedExecutor) would start blocking, as if they couldn't complete the call in

 RegisterResponse response = sender.send(server, request, requestTimeoutInMs);

After we stopped the client, we would get "Registration/Update task interrupted" in the logs, which would indicate the send was stuck until that point. As for why, I could not pinpoint the reason.

I actually tried to make an async version of the RegistrationEngine, but the only change was that the same behavior now happened in the async thread. The clients would communicate, but the number of "Leshan Async Request timeout" threads would stay equal to the number of clients. That's where I got the idea to just use the default approach of one RegistrationEngine#%d thread per client, but with manual start and stop.

Here you can see that while the DTLS receiver and timer were nicely shut down on client.stop(), the async thread kept on going. I could not save the stack trace from it, but IIRC it was always stuck on that sender.send.

[screenshot: JProfiler thread view showing the async thread still running after client.stop()]

BTW I think it should be possible to reproduce this: just have ~100 clients connecting to a server, starting and stopping on timers, with a very limited thread pool (size ~2).

And so you kill the executor on stop to release threads? Like you stop() the connector to release its thread?

Yep, the executor that basically just runs that RegistrationEngine thread.

Did you try to use the virtual threads of Java 21

Sadly not, we are still using Java 17 in this project :( But it would be interesting to see how that would affect things.

Maybe we should create a new issue where we clearly identify each problem, then try to define what the really good solution for each would be, and then also a workaround/short-term solution.

That sounds like a good idea.

@sbernard31
Contributor

This behavior was indeed very weird.

Thx for your explanation. For now it's hard for me to really understand the issue. 🤔

As the DefaultRegistrationEngine uses the sync send API, a thread is blocked when sending a request until it gets a response.

If all RegistrationEngines use the same thread pool as the CoAP and/or DTLS stack, then we can imagine a situation where every client uses 1 thread of the pool to send a request, and there is no thread left to really do the job at the CoAP/DTLS level (sending/receiving/handling data).
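
A toy demonstration of that self-deadlock (plain Java, not Leshan code):

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolStarvationDemo {
        public static void main(String[] args) {
            ExecutorService pool = Executors.newFixedThreadPool(2);
            CountDownLatch responses = new CountDownLatch(2);
            for (int i = 0; i < 2; i++) {
                pool.submit(() -> {
                    // "sync send": queue the "response handling" on the same pool...
                    pool.submit(responses::countDown);
                    try {
                        // ...then block waiting for it. Both workers end up here,
                        // so the queued "response" tasks never get a thread.
                        responses.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The JVM now hangs: a classic thread-pool self-deadlock.
        }
    }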

But using an async send should solve the issue, because the RegistrationEngine would not block threads anymore... 🤔

So I'm confused 😕

Did you try to use several shared thread pools? I mean 1 thread pool for all DefaultRegistrationEngines and another one for the other tasks (like CoAP/DTLS)?

Yep, the executor that basically just runs that RegistrationEngine thread.

So at least there is a kind of workaround 😬

But it would be interesting to see how that would affect things.

Let me know if you test that.

@sbernard31
Contributor

I created an issue to summarize everything about simulating several clients with Leshan: #1585

@Ozame
Author

Ozame commented Feb 14, 2024

Did you try to use several shared thread pools? I mean 1 thread pool for all DefaultRegistrationEngines and another one for the other tasks (like CoAP/DTLS)?

Yep, I tried having a separate pool for the DTLSConnector (USE 1 below); based on my observations this basically affected the DTLS timers. BUT later on I realized that we also passed the other pool to the provider, and not just to the client. Not sure if this makes any difference, but I didn't happen to test it.

So: one pool for this:

USE 1

    @Override
    protected Connector createSecuredConnector(DtlsConnectorConfig dtlsConfig) {
        var con = new DTLSConnector(dtlsConfig);
        con.setExecutor(sharedExecutor2);
        return con;
    }

and another pool for these:

USE 2

    clientBuilder.setSharedExecutor(sharedExecutor)

USE 3

        var endpointsProvider = (CaliforniumClientEndpointsProvider) client.getEndpointsProvider().toArray()[0];
        endpointsProvider.getCoapServer().setExecutors(sharedExecutor, sharedExecutor, true);

then there is no thread left to really do the job at the CoAP/DTLS level.

This was our initial guess: some kind of deadlock-type situation where all threads are in use and can't be released.

@sbernard31
Contributor

Maybe I don't get you correctly, so let me know if I misunderstood you.
I understand you have a dedicated pool for all DTLSConnectors AND another pool shared between all RegistrationEngines and CoapServers.

My point is that you should try a dedicated pool for all RegistrationEngines and a dedicated pool for all CoapServers (so 3 different thread pools).

AFAIK :

  • CoapServer is not supposed to block threads.
  • The RegistrationEngine blocks one because it uses the sync API to send requests.

Let me know if, with these 3 different thread pools, you still need to destroy/recreate the RegistrationEngine scheduler on stop/start.
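
Concretely, reusing the USE 1/2/3 hook points from above, that would be something like this sketch (pool sizes arbitrary, not tested):

    // Three separate pools, one per concern:
    ScheduledExecutorService enginePool = Executors.newScheduledThreadPool(8); // RegistrationEngines (may block in sync sends)
    ScheduledExecutorService coapPool = Executors.newScheduledThreadPool(8);   // CoapServer protocol work
    ExecutorService dtlsPool = Executors.newFixedThreadPool(8);                // DTLSConnectors

    clientBuilder.setSharedExecutor(enginePool);                               // USE 2
    endpointsProvider.getCoapServer().setExecutors(coapPool, coapPool, true);  // USE 3
    // USE 1: con.setExecutor(dtlsPool) inside createSecuredConnector(...)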

@Ozame
Author

Ozame commented Mar 1, 2024

I decided to try this out with the Java 21 virtual threads, since it turned out we could maybe change the Java version after all. With three separate executors provided for these three uses, each using virtual threads, we were able to get much better performance, as expected, since we can have way more threads running in each of these cases. When providing this external executor, the RegistrationEngine executor is not touched on stop/start.

However, I feel this is more of a band-aid solution, and there might be some logical error in the way we are handling all of this.
I might try and see if we can approach this from a different angle, perhaps using an asynchronous version of the registration engine and not stopping the clients so forcefully.

@sbernard31
Contributor

Thx for sharing that.
Good news that the Java 21 virtual threads seem to help.

However, I feel this is more of a band-aid solution

Yep, surely an async RegistrationEngine in Leshan and a DTLSConnector in Californium based on Netty would be the right moves, but this is probably a lot of work.

For Leshan: if you get a not-so-bad async implementation of the registration engine, please share the code (by opening a PR?).

Let me know if you need help.
