Is Leshan server ready for production in industry ? #1614

Open
EmbGangsta opened this issue May 2, 2024 · 11 comments
Labels
question Any question about leshan

Comments

@EmbGangsta

Question

Hi,

Our company (www.cls.fr) is very interested in using LwM2M, and we would like to add an instance of the Leshan server to our infrastructure, but I wonder whether Leshan is ready today for industrial use in production?

We need to support LwM2M 1.1 with queue mode, as our devices will be sleeping most of the time and will use dynamic IP addresses (they use cellular modems).

Also, 10,000+ beacons could connect at the same time. Is Leshan able to cope with this?

What licensing model applies?

Regards

@EmbGangsta added the "question" label on May 2, 2024
@sbernard31
Contributor

Hi,

I wonder if today leshan is ready to be used in industrial in production mode ?

At least, we try to be. (The exceptions are the demos and some experimental features, which are not production-ready. The client is perhaps a bit less production-ready; I think it is mainly used for testing.)
CoAP and CoAP over DTLS 1.2, based on Californium/Scandium, are currently the most production-ready transport layers.

I can also say that Leshan 1.x is currently used in production at Semtech (SierraWireless).
I'm not totally sure, but I understand that Orange is currently using Leshan 2.0.0-Mx in production.

You can also see :

Leshan 1.x is the stable release (stable API). It implements LWM2M v1.0.x only and is based on Californium/Scandium 2.x.
Leshan 2.0.x is the in-development version (no stable API yet; you could face API changes between two releases). It implements LWM2M v1.1.x only and is based on Californium/Scandium 3.x (or 4.x, if there is a release before the stable release?).

See more details at : https://github.com/eclipse-leshan/leshan/wiki/Roadmap

We need to support LwM2M 1.1 with queue mode as our devices will be sleeping most of the time, will use dynamic IP addresses (devices are using cellular modems).

This is a classic use case and should be supported; see: https://github.com/eclipse-leshan/leshan/wiki/LWM2M-Devices-with-Dynamic-IP

Then 10000+ beacons could connect at the same time, is Leshan able to cope with this ?

It's very hard to answer this kind of question because it depends on so many different factors, but I guess this should be OK 🤔
But whatever answer you get to this question, I advise you to run some performance tests anyway.

What would be the licensing model in use ?

Leshan is dual-licensed, so you can choose to use one license or the other.
More details at :

If all of this information is not enough, you can contact license@eclipse.org directly.

@sbernard31
Contributor

@JaroslawLegierski, @jvermillard, @cyril2maq, @gcx-seb maybe you have some experience to share about your usage of Leshan ?

@jvermillard
Contributor

I use Leshan 2.0.x in production for some customers (I'm a freelancer), and even load-tested it way above 10k devices, so if you need 2.0.x features, it's totally doable to run it in prod; but you need to be careful when upgrading milestones.

IMO, 10k devices is doable on a single machine. If you target more than 50~100k+, that's where you need more work to build a multi-node setup plus Redis to manage the sessions.

@boaks

boaks commented May 2, 2024

Then 10000+ beacons could connect at the same time, is Leshan able to cope with this ?

In my experience (CoAP/DTLS, Californium), it's not only about the number of devices; it also depends on the number of messages expected. 1,000,000 devices each sending once an hour results in less than 300 msg/s. But 1,000 webcams at 30 msg/s each will result in 30,000 msg/s. So, how frequently are your beacons expected to send messages?
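The two figures above follow from multiplying the device count by the per-device message rate. A minimal sketch of that arithmetic, assuming traffic is spread evenly over time (the function name is illustrative only, not part of Leshan or Californium):

```python
def messages_per_second(devices: int, per_device_rate_hz: float) -> float:
    """Aggregate message rate at the server, assuming evenly spread traffic."""
    return devices * per_device_rate_hz


# 1,000,000 devices, each sending one message per hour.
hourly_fleet = messages_per_second(1_000_000, 1 / 3600)

# 1,000 webcams, each sending 30 messages per second.
webcams = messages_per_second(1_000, 30)

print(f"hourly fleet: {hourly_fleet:.0f} msg/s")  # about 278 msg/s
print(f"webcams:      {webcams:.0f} msg/s")       # 30000 msg/s
```

Real fleets are rarely this evenly spread, so peaks can be well above the average computed here.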

I have also frequently found that the performance bottleneck is not the CoAP device frontend; quite often it's the application backend. So please also verify that your backend is able to process the load.

@EmbGangsta
Author

OK, actually our beacons are not very verbose: we generally push around 16 bytes every minute (SEND method, counting only the useful data, without the SenML CBOR overhead). Exceptionally, a beacon could output more data if it was out of cellular coverage for a long time (datalogger feature); in this case it may output more than 1500 bytes x 50 times in the same connection, but that's all. So the average uplink content is really small, and we will use CBOR and opaque data as much as possible.

@boaks

boaks commented May 2, 2024

(devices are using cellular modems)

we are generally sending around 16 bytes every 1 minutes

That's less than 200 msgs/s, so it should not be an issue.

Just to mention: 16 bytes of application data will require about 100 additional bytes for IP, UDP, DTLS, and CoAP. That will be about 5 MB per month per device, and it will drain the battery a lot.
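The 5 MB estimate can be reproduced as follows. This is a rough sketch that assumes one message per minute, a 30-day month, and the approximate 100-byte overhead quoted above; retransmissions, handshakes, and acknowledgements are not counted.

```python
PAYLOAD_BYTES = 16       # application data per message (from the thread)
OVERHEAD_BYTES = 100     # approximate IP + UDP + DTLS + CoAP overhead (from the thread)
MESSAGES_PER_DAY = 24 * 60   # one message per minute
DAYS_PER_MONTH = 30          # assumption for the estimate


def monthly_bytes(payload: int, overhead: int, per_day: int, days: int) -> int:
    """Total bytes on the wire per device per month."""
    return (payload + overhead) * per_day * days


total = monthly_bytes(PAYLOAD_BYTES, OVERHEAD_BYTES, MESSAGES_PER_DAY, DAYS_PER_MONTH)
overhead_share = OVERHEAD_BYTES / (PAYLOAD_BYTES + OVERHEAD_BYTES)

print(f"{total} bytes per month (about {total / 1e6:.2f} MB)")
print(f"overhead share: {overhead_share:.0%}")
```

Under these assumptions most of the monthly volume is protocol overhead rather than payload, which is why batching several readings into one message reduces both data usage and radio-on time.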

@EmbGangsta
Author

We have different ways to trade off data throughput vs. battery. When a beacon generates a lot of data (short connection period, < 5 min), it is generally plugged in over USB; otherwise we have configs to deliver data in chunks to limit the overhead of connections and transmissions. That's also why I am pushing to use LwM2M today instead of MQTT!

@cyril2maq

At Orange we do use Leshan 2.x in production, with heterogeneous types of devices and LwM2M SDKs connected to our servers, and I can confirm it is very robust. Even more so in your case, where it seems that you manage both the device and the server code.

And as @jvermillard and @sbernard31 mentioned, the API can still change (with breaking changes), so you need to anticipate this in your project.

On our side, given our backend application, we currently aim for about 5k devices per Leshan instance. So we ran load tests with 5k devices on 1 instance under heavy usage (bootstrap, firmware updates, observation...). As @boaks mentioned, performance limitations will probably come not from Leshan but from your backend application. And FYI, to handle multiple instances, you will need some work on specific implementations (using Redis to share sessions).

@EmbGangsta
Author

Thanks.
I now have to present this solution to our infra architect, and I will come back with other questions.

Regards

@PadmabushanReddy

Apart from Leshan itself, most of the "production readiness" depends on the backend integrations (mainly the patterns used to integrate) with your own frameworks, and on managing client connect, disconnect, and bootstrapping cycles (load balancing and avoiding thundering herds).
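The comment does not say how thundering herds were avoided. One common client-side technique is randomized ("full jitter") exponential backoff before re-registering, so that devices that lose connectivity together do not all reconnect in the same instant. A minimal sketch, where the function name and the default delays are purely illustrative assumptions:

```python
import random


def reconnect_delay(attempt: int, base_s: float = 5.0, cap_s: float = 600.0,
                    rng=random) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based).

    The upper bound doubles with each attempt up to `cap_s`, and the actual
    delay is drawn uniformly from [0, bound] to spread clients over time.
    """
    # Clamp the exponent so very large attempt counts cannot overflow.
    bound = min(cap_s, base_s * (2 ** min(attempt, 16)))
    return rng.uniform(0.0, bound)


# Example: delays for the first few attempts of one hypothetical device.
for attempt in range(4):
    print(f"attempt {attempt}: wait {reconnect_delay(attempt):.1f} s")
```

On the server side, the same concern is usually addressed with load balancing and rate limiting in front of the LwM2M servers, which this sketch does not cover.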
We have Leshan v1.x running in a clustered setup, with the backend integrated with C* and Kafka (while keeping Redis for auth), with 2.5 million actively connected devices (hubs) sending us events (using observe-notify) in the range of 500-600 events/day/device. This load is served by 25 instances of Leshan (staying at 100k per server, since Californium/Scandium had a max connection value of 150k).
As your scale increases (beyond, say, 1M active/live connections), you need to reinvent dynamic load balancing (bootstrap-based or otherwise) yourself. Building in observability would help a lot in understanding the application and load pattern.
We haven't tried Leshan 2.0 at scale yet. I'm hoping it can hold 4-5x more connections than v1.x.
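For a sense of scale, the deployment figures in this comment can be turned into average rates. A back-of-the-envelope sketch, assuming events are spread evenly over the day and devices evenly over instances (the helper names are illustrative):

```python
import math

SECONDS_PER_DAY = 86_400


def fleet_events_per_second(devices: int, events_per_device_per_day: float) -> float:
    """Average event rate across the whole fleet."""
    return devices * events_per_device_per_day / SECONDS_PER_DAY


def instances_needed(devices: int, devices_per_instance: int) -> int:
    """Number of server instances for a given per-instance device budget."""
    return math.ceil(devices / devices_per_instance)


DEVICES = 2_500_000
low = fleet_events_per_second(DEVICES, 500)
high = fleet_events_per_second(DEVICES, 600)
instances = instances_needed(DEVICES, 100_000)

print(f"fleet: {low:,.0f} to {high:,.0f} events/s on average")
print(f"instances at 100k devices each: {instances}")
print(f"per instance: {low / instances:,.0f} to {high / instances:,.0f} events/s")
```

These are averages only; reconnect storms or synchronized reporting can push the instantaneous rate much higher.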

@boaks

boaks commented May 6, 2024

@PadmabushanReddy

Thanks a lot for letting us know!

since the Californium/scandium had a max connection value at 150k

150k is the default. With Cf 3.11, you can adapt that in

Californium3.properties:

# Maximum number of active peers.
# Default: 150000
COAP.MAX_ACTIVE_PEERS=1000000

# DTLS maximum connections.
# Default: 150000
DTLS.MAX_CONNECTIONS=1000000

With Cf 2.x, it depends on

Californium.properties:

MAX_ACTIVE_PEERS=1000000

and which value is passed to

DtlsConnectorConfig.Builder.setMaxConnections(int)

You may need to check here in the Leshan project what is supported.
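Before a load test, it can help to check that the limits in a Californium3.properties file are at least as large as the expected device count. The sketch below is a standalone helper, not part of Californium or Leshan; it only understands simple key=value lines (not the full Java properties syntax) and assumes the 150000 default mentioned above when a key is missing.

```python
# Keys and default taken from the Cf 3.x example in this comment.
LIMIT_KEYS = ("COAP.MAX_ACTIVE_PEERS", "DTLS.MAX_CONNECTIONS")
DEFAULT_LIMIT = 150_000


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def undersized_limits(props: dict[str, str], expected_devices: int) -> dict[str, int]:
    """Return the limits that are smaller than the expected device count."""
    result = {}
    for key in LIMIT_KEYS:
        limit = int(props.get(key, DEFAULT_LIMIT))
        if limit < expected_devices:
            result[key] = limit
    return result


SAMPLE = """
# Maximum number of active peers.
COAP.MAX_ACTIVE_PEERS=1000000
# DTLS maximum connections.
DTLS.MAX_CONNECTIONS=1000000
"""

print(undersized_limits(parse_properties(SAMPLE), 500_000))  # no undersized limits
print(undersized_limits({}, 500_000))  # both keys fall back to the default
```

Whether Leshan passes these values through unchanged depends on the Leshan version, so the effective limits are still worth verifying on a running server.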

6 participants