Support payload normalization #508

johanstokking · 2022-08-19T17:51:18Z

Summary

References #395

Changes

This adds initial support for normalized payload as discussed in the referenced issue. There's currently only support for temperature, wind speed and wind direction, just to start making things work end-to-end.

The idea is that the schema provides reusable building blocks. ~~This PR includes a "number" with the current value but one may also put statistics of multiple values in there, even any percentile.~~ This is demonstrated with temperature, direction and speed.

Notes for Reviewers

Since JavaScript is loosely typed, the uplink normalizer can return a singular object or an array of objects. Most of the time, there's a single measurement in the data, so I went with this. It might be confusing, however. Also strongly typed runtimes may have to do some work to detect the return type
The uplink normalizer belongs to the decoder. It must be declared in the same file, and examples are declared as part of the decoder. This means that the uplink normalizer is not really a first class codec, but I think it won't be, as it's so obvious that it runs after decoding, unlike other codecs

Looking forward to hearing your thoughts @pablojimpas

johanstokking

CI is only failing because of the old Go version in Actions still

tools/runscript/main.go

pablojimpas · 2022-08-20T18:51:59Z

This PR includes a "number" with the current value but one may also put statistics of multiple values in there, even any percentile.

Are there currently end devices that send this type of aggregated statistics? Our integration for now stores raw time series data and then calculates this kind of statistics as the user wish. However, having these values provided by the end devices themselves will be useful to reduce the number of uplinks without losing too much insight.

Looking forward to hearing your thoughts @pablojimpas

I will check the schema changes in more detail over this weekend and test it with some sample data to see how it looks. I'll let you know if I have further comments, but so far is looking good! Thanks for putting in the work, once we have this first one ready I'm willing to contribute with other types of measurements and writing normalizers for some devices.

johanstokking · 2022-08-22T14:21:28Z

Are there currently end devices that send this type of aggregated statistics?

Short answer is no, not really.

I gave this another thought, including the implications, and I think we shouldn't do this for most dimensions. Where it might make sense is things that change very often, more often than the sensible time to send a raw data point. What comes to mind is distance and light measurements.

johanstokking · 2022-08-22T16:02:45Z

schema.json

+                "pressure": {
+                  "type": "number",
+                  "description": "Atmospheric pressure (hPa)"
+                }


Even though sane values are [900..1100] otherwise we're seeing world records, I think we should limit min/max values to unit validity and allow room for wrong measurements.

The wind cannot come from 470°, the relative humidity cannot be 120%, the wind speed cannot be negative. Likewise, a temperature of -500℃ doesn't exist either.

But.... What happens when something is off with the temperature sensor though? Is it an acceptable consequence to not consider the result normal payload, and not process it as such? Or would we leave an outlier of -280℃ to the application layer to handle?

I gave this another thought; I think it's up to the normalizer function to validate stuff. Naively converting a voltage in a temperature without bounds checking is just wrong. We have errors and warnings output for this as well.

pablojimpas · 2022-08-23T22:16:18Z

Short answer is no, not really.

I gave this another thought, including the implications, and I think we shouldn't do this for most dimensions. Where it might make sense is things that change very often, more often than the sensible time to send a raw data point. What comes to mind is distance and light measurements.

I can speak from my experience with agricultural sensors and I agree with your thought, if the uplink interval is set to, let's say 1 hour, having min, max, avg, std_dev…might be crucial for data analysis. Temperature and humidity, for example, change pretty quickly from one hour to the next one during sunrise or under certain conditions. On the other hand, if the device has an uplink interval of 10 or 15 minutes, statistics during that interval are almost useless for these measurements that don't change too fast, the raw data point is enough in most cases.

The interval at which statistics begin to be useful is different for each measurement/use-case, but it's definitely a nice thing to have as an option. I think it would be good if this “stats aggregation” could become a good practice for device makers in the long term, increasing the uplink interval to improve the scalability of the network.

pablojimpas · 2022-08-23T23:42:50Z

I was doing some testing with mock data to check the validation, and I'm not sure why it doesn't work for me, this website does not find errors in the validation process: https://www.jsonschemavalidator.net/s/OEHRvXwj

johanstokking · 2022-08-24T11:58:15Z

I can speak from my experience with agricultural sensors and I agree with your thought, if the uplink interval is set to, let's say 1 hour, having min, max, avg, std_dev…might be crucial for data analysis. Temperature and humidity, for example, change pretty quickly from one hour to the next one during sunrise or under certain conditions. On the other hand, if the device has an uplink interval of 10 or 15 minutes, statistics during that interval are almost useless for these measurements that don't change too fast, the raw data point is enough in most cases.

Right. How about adding an object for stats then?

{
  "air": {
    "temperature": 21.5,
    "temperatureStats": {
      "min": 21.1,
      "max": 21.9,
      "count": 5,
      "median": 21.6,
      "stdDev": 0.2
    }
  }
}

This way, the sensor can choose what to send.

I was doing some testing with mock data to check the validation, and I'm not sure why it doesn't work for me, this website does not find errors in the validation process: https://www.jsonschemavalidator.net/s/OEHRvXwj

This is because schema.json is full of definitions, and it's context-specific which definition is used.

Here you go: https://www.jsonschemavalidator.net/s/z5oQYMYU (notice oneOf in the bottom)

adriansmares

The tooling changes look good to me.

My only concern with this implementation in which the readings are provided as an array of objects is that IoT Central will not generate dashboards for them. I'm not aware of other platforms with this limitation, but I would not be surprised to find that more of them have this issue.

In my experience telemetry readings are heavily tied to the path to the value (readings.temperature_sensor.value) - and the path is hard to reconcile when arrays are considered - if readings[0].temperature_sensor.value is the temperature of sensor 1, but every 10 uplinks we actually have two readings and the same temperature sensor is found at readings[1].temperature_sensor.value, how do you keep up with this as a consumer ?

The discussion in the issue hinted at a 'source' parameter, but I don't see that one in this schema. I think the source should be the key of a reading, instead of the array index. Consider the Dragino LHT65 - it has two temperature outputs, which in the current schema seem to encode ambiguously.

johanstokking · 2022-08-24T21:12:09Z

My only concern with this implementation in which the readings are provided as an array of objects is that IoT Central will not generate dashboards for them. I'm not aware of other platforms with this limitation, but I would not be surprised to find that more of them have this issue.

Yeah we discussed this at large in #395 (comment) and further.

What I want to do is declare a new application upstream message type with normalized payload (and some metadata). If there are two measurements, AS would publish that message type twice. So consumers would only deal with normalized payload and get each normalized payload individually without having to deal with arrays.

adriansmares · 2022-08-24T21:15:35Z

Yeah we discussed this at large in #395 (comment) and further.

What I want to do is declare a new application upstream message type with normalized payload (and some metadata). If there are two measurements, AS would publish that message type twice. So consumers would only deal with normalized payload and get each normalized payload individually without having to deal with arrays.

Ok, that should work.

lib/payload.json

johanstokking added this to the 2022 Q3 milestone Aug 19, 2022

johanstokking requested a review from adriansmares August 19, 2022 17:51

johanstokking self-assigned this Aug 19, 2022

johanstokking changed the title ~~Feature/normalized payload~~ Support payload normalization Aug 19, 2022

johanstokking commented Aug 19, 2022

View reviewed changes

tools/runscript/main.go Outdated Show resolved Hide resolved

johanstokking force-pushed the feature/normalized-payload branch 2 times, most recently from b36aa36 to 3f7078b Compare August 22, 2022 15:57

johanstokking commented Aug 22, 2022

View reviewed changes

johanstokking marked this pull request as ready for review August 22, 2022 16:03

johanstokking requested a review from Jaime-Trinidad as a code owner August 22, 2022 16:03

johanstokking force-pushed the feature/normalized-payload branch 2 times, most recently from 66a77c8 to 789c055 Compare August 22, 2022 16:11

johanstokking force-pushed the feature/normalized-payload branch from 53ce68c to d2ceff2 Compare August 24, 2022 12:20

adriansmares reviewed Aug 24, 2022

View reviewed changes

pablojimpas suggested changes Aug 29, 2022

View reviewed changes

lib/payload.json Outdated Show resolved Hide resolved

lib/payload.json Show resolved Hide resolved

lib/payload.json Outdated Show resolved Hide resolved

johanstokking added 8 commits August 29, 2022 16:14

Fix link

82a49d3

Fix downlink decoder examples

479ea9c

Update dependencies

f5006d4

Initial support for payload normalization

134ad32

Upgrade to Go 1.18

dbf20ae

Add validation for atmospheric pressure

e47d78f

Extract normalized payload to own schema file

c2b287f

Use percentage for relative humidity

6f897d6

Refer to TTS documentation for codecs

f8ea10e

johanstokking force-pushed the feature/normalized-payload branch from 7d5a6a1 to f8ea10e Compare August 29, 2022 14:14

johanstokking merged commit d810556 into master Aug 29, 2022

johanstokking deleted the feature/normalized-payload branch August 29, 2022 14:44

johanstokking mentioned this pull request Aug 29, 2022

Enforce a standard format for decoded sensor data #395

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support payload normalization #508

Support payload normalization #508

johanstokking commented Aug 19, 2022 •

edited

johanstokking left a comment

pablojimpas commented Aug 20, 2022

johanstokking commented Aug 22, 2022

johanstokking Aug 22, 2022

johanstokking Aug 23, 2022

pablojimpas commented Aug 23, 2022 •

edited

pablojimpas commented Aug 23, 2022

johanstokking commented Aug 24, 2022 •

edited

adriansmares left a comment •

edited

johanstokking commented Aug 24, 2022

adriansmares commented Aug 24, 2022

Support payload normalization #508

Support payload normalization #508

Conversation

johanstokking commented Aug 19, 2022 • edited

Summary

Changes

Notes for Reviewers

johanstokking left a comment

Choose a reason for hiding this comment

pablojimpas commented Aug 20, 2022

johanstokking commented Aug 22, 2022

johanstokking Aug 22, 2022

Choose a reason for hiding this comment

johanstokking Aug 23, 2022

Choose a reason for hiding this comment

pablojimpas commented Aug 23, 2022 • edited

pablojimpas commented Aug 23, 2022

johanstokking commented Aug 24, 2022 • edited

adriansmares left a comment • edited

Choose a reason for hiding this comment

johanstokking commented Aug 24, 2022

adriansmares commented Aug 24, 2022

johanstokking commented Aug 19, 2022 •

edited

pablojimpas commented Aug 23, 2022 •

edited

johanstokking commented Aug 24, 2022 •

edited

adriansmares left a comment •

edited