Add metrics to keep track of the total time the devices are connected to Hono #1287

Open
kaniyan opened this issue Jun 13, 2019 · 5 comments
@kaniyan kaniyan commented Jun 13, 2019

Add metrics to keep track of the total time the devices are connected to Hono per tenant.

Hono currently provides metrics that track the number of simultaneous device connections. In addition, it would be valuable to track how long the devices are connected to Hono per tenant. This information could be used to manage resources effectively by setting resource limits, and it could also be shown in the dashboard.

@calohmn calohmn added the Metrics label Jul 16, 2019
@sophokles73 sophokles73 added this to the 1.1.0 milestone Oct 19, 2019
@sophokles73 sophokles73 added this to To do in 1.1.0 via automation Oct 19, 2019
@kaniyan kaniyan commented Nov 11, 2019

The MQTT and AMQP protocol adapters, which maintain connection state, already provide a metric named hono.connections that tracks the number of connected devices per tenant.

Now we would also like to find out the total time the devices are connected per tenant. After taking a closer look at the existing metrics, I think the hono.connections metric could also be used to compute the total connection time per tenant. Currently this metric is recorded as a Gauge, scraped at regular intervals and stored as a time series in the Prometheus server. Summing the number of device connections reported at every scrape over the given time period and multiplying that sum by the scrape interval should give the total device connection time per tenant.

A sample PromQL query to get the total connection time of a tenant for the last one hour is given below:

sum(sum_over_time(hono_connections_authenticated{tenant="DEFAULT_TENANT"}[1h])) * <scrape interval>

Applications such as resource limit checks that want to calculate the total connection time also need to know the scrape interval.

In this approach, the accuracy depends on the scrape interval. For example, if the scrape interval is 5 seconds, a connection that is established and closed within those 5 seconds may not be recorded by the hono.connections metric, and so its connection time is not counted either.
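
To make the arithmetic behind the query above concrete, here is a minimal Python sketch. The sample values and the 5-second interval are made up for illustration, and `estimate_connection_seconds` is a hypothetical helper, not part of Hono or Prometheus.

```python
def estimate_connection_seconds(gauge_samples, scrape_interval_s):
    """Approximate the total connection time of a tenant.

    Each scraped gauge value is assumed to hold for one full scrape interval.
    """
    return sum(gauge_samples) * scrape_interval_s


# Hypothetical values of the connection gauge for one tenant, one per scrape.
samples = [3, 3, 4, 2, 0, 1]
print(estimate_connection_seconds(samples, scrape_interval_s=5))  # 65
```

A connection that opens and closes between two scrapes never appears in `samples`, so it contributes nothing to the estimate.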

@sophokles73 sophokles73 commented Nov 20, 2019

I am a little concerned that the usefulness of this approach is so closely tied to the definition of an appropriate scraping interval. When using a short interval, the accuracy of the measured values might be sufficient. However, I am not sure if 5 seconds is a reasonable interval length in all cases.
Using a longer interval might in fact lead to a connection time of 0. For example, when using a 20 second scrape interval, if all devices connect, send data and disconnect every 20 seconds, we might end up in a situation where the devices always connect in between the scrapes, right? FMPOV this example isn't very far-fetched. Even if only, say, half of the devices connect in between the scrapes, the resulting accuracy of the measurement won't be sufficient, will it?
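
The 20 second example can be simulated with a small Python sketch. Everything in it is an assumption made for illustration: a single device that is connected for 10 seconds out of every 20, a 10 minute observation window, and the helper names `gauge_at` and `estimated_seconds`.

```python
def gauge_at(t, period_s, connected_from_s, connected_for_s):
    """Return 1 if a device with a periodic connect/disconnect cycle is connected at time t."""
    phase = t % period_s
    return 1 if connected_from_s <= phase < connected_from_s + connected_for_s else 0


def estimated_seconds(duration_s, scrape_interval_s, **cycle):
    """Sum the scraped gauge values and multiply by the scrape interval."""
    scrapes = range(0, duration_s, scrape_interval_s)
    return sum(gauge_at(t, **cycle) for t in scrapes) * scrape_interval_s


# One device, connected 10 s out of every 20 s, observed for 10 minutes.
true_seconds = 600 // 20 * 10

# Device connects 5 s after each scrape: every scrape sees 0 connections.
missed = estimated_seconds(600, 20, period_s=20, connected_from_s=5, connected_for_s=10)

# Device connects exactly at each scrape: every scrape sees 1 connection.
doubled = estimated_seconds(600, 20, period_s=20, connected_from_s=0, connected_for_s=10)

print(true_seconds, missed, doubled)  # 300 0 600
```

Depending only on when the device connects relative to the scrapes, the same behaviour is reported as either no connection time at all or twice the real value.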

@kaniyan kaniyan commented Nov 21, 2019

Yes, the longer the scrape interval and the more quickly the devices connect and disconnect, the less accurate the connection time computed from the hono.connections metric becomes. I think then it makes sense to make use of a new metric to record the connection time.

@sophokles73 sophokles73 commented Nov 22, 2019

> I think then it makes sense to make use of a new metric to record the connection time.

I think so too.

@kaniyan kaniyan commented Dec 11, 2019

A straightforward solution that comes to mind is to start a timer immediately after a device successfully connects, stop it once the device disconnects, and record the duration using a metric of type Summary or Counter. After giving it some thought, I think this is not very useful, as the data is stored in Prometheus only after a device disconnects. With MQTT or AMQP, devices may stay connected for a long time: weeks, months or even longer. If resource-limit checks are to use this data to enforce a limit based on the device connection time, the data will not be accurate and may not be of much use.
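
The drawback can be illustrated with a minimal Python sketch of the record-on-disconnect idea. This is not Hono code: `DisconnectTimeRecorder`, its methods and the injectable `clock` are hypothetical names, and a plain dictionary stands in for the Summary or Counter metric.

```python
import time
from collections import defaultdict


class DisconnectTimeRecorder:
    """Adds a connection's duration to a per-tenant total only when it ends."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._connected_at = {}                  # (tenant, device) -> start time
        self.total_seconds = defaultdict(float)  # tenant -> recorded seconds

    def on_connect(self, tenant, device):
        self._connected_at[(tenant, device)] = self._clock()

    def on_disconnect(self, tenant, device):
        started = self._connected_at.pop((tenant, device))
        self.total_seconds[tenant] += self._clock() - started


# A fake clock makes the example deterministic.
now = [0.0]
recorder = DisconnectTimeRecorder(clock=lambda: now[0])
recorder.on_connect("DEFAULT_TENANT", "device-1")

now[0] = 3600.0
before = recorder.total_seconds["DEFAULT_TENANT"]  # still 0.0 after an hour online

recorder.on_disconnect("DEFAULT_TENANT", "device-1")
after = recorder.total_seconds["DEFAULT_TENANT"]   # 3600.0 only after the disconnect
print(before, after)
```

A limit check that reads the total while the device is still connected sees none of the hour it has already spent online.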

As an alternative, how about using a timer per tenant that periodically, let's say every 10 seconds, records the connection time of all devices of this tenant (i.e. the number of devices connected at the start of the timer multiplied by the timer interval, recorded using a metric of type Counter)? There will be devices that connect or disconnect after the timer has started. I think this can be handled by stopping the timer whenever the number of connected devices changes, recording the connection time accumulated so far, and then restarting the timer with the new device count. WDYT? Of course, other suggestions are welcome :)
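
A rough Python sketch of this per-tenant idea follows. It is only one possible reading of the proposal, not Hono's implementation: `TenantConnectionTime`, its method names and the injectable `clock` are hypothetical, and a plain float stands in for the Counter metric.

```python
import time


class TenantConnectionTime:
    """Accumulates device-seconds for one tenant as an ever-increasing counter."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._connected = 0
        self._last_update = clock()
        self.counter_seconds = 0.0  # would back a Counter-type metric

    def record(self):
        """Add (connected devices x elapsed time) since the last update.

        Meant to be called by a periodic timer, e.g. every 10 seconds.
        """
        now = self._clock()
        self.counter_seconds += self._connected * (now - self._last_update)
        self._last_update = now

    def on_connect(self):
        self.record()  # close the running interval with the old device count
        self._connected += 1

    def on_disconnect(self):
        self.record()
        self._connected -= 1


# A fake clock makes the example deterministic.
now = [0.0]
tenant = TenantConnectionTime(clock=lambda: now[0])

tenant.on_connect()        # t=0: 1 device
now[0] = 4.0
tenant.on_connect()        # t=4: 2 devices, 4 device-seconds recorded so far
now[0] = 10.0
tenant.record()            # periodic tick: + 2 * 6
now[0] = 13.0
tenant.on_disconnect()     # t=13: back to 1 device, + 2 * 3
now[0] = 20.0
tenant.record()            # periodic tick: + 1 * 7
print(tenant.counter_seconds)  # 29.0
```

Because the counter is updated on every tick and on every change in the device count, long-lived connections show up continuously, and the scrape interval no longer affects accuracy.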

kaniyan added a commit to bosch-io/hono that referenced this issue Dec 18, 2019
Signed-off-by: Kartheeswaran Kalidass <Kartheeswaran.Kalidass@bosch-si.com>
kaniyan added a commit to bosch-io/hono that referenced this issue Dec 20, 2019
Signed-off-by: Kartheeswaran Kalidass <Kartheeswaran.Kalidass@bosch-si.com>
@sophokles73 sophokles73 moved this from To do to In progress in 1.1.0 Jan 16, 2020
kaniyan added a commit to bosch-io/hono that referenced this issue Jan 20, 2020
Signed-off-by: Kartheeswaran Kalidass <Kartheeswaran.Kalidass@bosch-si.com>
kaniyan added a commit to bosch-io/hono that referenced this issue Jan 21, 2020
Signed-off-by: Kartheeswaran Kalidass <Kartheeswaran.Kalidass@bosch-si.com>
kaniyan added a commit to bosch-io/hono that referenced this issue Jan 21, 2020
Signed-off-by: Kartheeswaran Kalidass <Kartheeswaran.Kalidass@bosch-si.com>