Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support payload normalization #508

Merged
merged 9 commits into from Aug 29, 2022
Merged

Conversation

johanstokking
Copy link
Member

@johanstokking johanstokking commented Aug 19, 2022

Summary

References #395

Changes

This adds initial support for normalized payload as discussed in the referenced issue. There's currently only support for temperature, wind speed and wind direction, just to start making things work end-to-end.

The idea is that the schema provides reusable building blocks. This PR includes a "number" with the current value but one may also put statistics of multiple values in there, even any percentile. This is demonstrated with temperature, direction and speed.

Notes for Reviewers

  • Since JavaScript is loosely typed, the uplink normalizer can return a singular object or an array of objects. Most of the time, there's a single measurement in the data, so I went with this. It might be confusing, however. Also strongly typed runtimes may have to do some work to detect the return type
  • The uplink normalizer belongs to the decoder. It must be declared in the same file, and examples are declared as part of the decoder. This means that the uplink normalizer is not really a first class codec, but I think it won't be, as it's so obvious that it runs after decoding, unlike other codecs

Looking forward to hearing your thoughts @pablojimpas

@johanstokking johanstokking added this to the 2022 Q3 milestone Aug 19, 2022
@johanstokking johanstokking self-assigned this Aug 19, 2022
@johanstokking johanstokking changed the title Feature/normalized payload Support payload normalization Aug 19, 2022
Copy link
Member Author

@johanstokking johanstokking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is only failing because of the old Go version in Actions still

tools/runscript/main.go Outdated Show resolved Hide resolved
@pablojimpas
Copy link
Contributor

This PR includes a "number" with the current value but one may also put statistics of multiple values in there, even any percentile.

Are there currently end devices that send this type of aggregated statistics? Our integration for now stores raw time series data and then calculates this kind of statistics as the user wish. However, having these values provided by the end devices themselves will be useful to reduce the number of uplinks without losing too much insight.

Looking forward to hearing your thoughts @pablojimpas

I will check the schema changes in more detail over this weekend and test it with some sample data to see how it looks. I'll let you know if I have further comments, but so far is looking good! Thanks for putting in the work, once we have this first one ready I'm willing to contribute with other types of measurements and writing normalizers for some devices.

@johanstokking
Copy link
Member Author

Are there currently end devices that send this type of aggregated statistics?

Short answer is no, not really.

I gave this another thought, including the implications, and I think we shouldn't do this for most dimensions. Where it might make sense is things that change very often, more often than the sensible time to send a raw data point. What comes to mind is distance and light measurements.

@johanstokking johanstokking force-pushed the feature/normalized-payload branch 2 times, most recently from b36aa36 to 3f7078b Compare August 22, 2022 15:57
schema.json Outdated
Comment on lines 272 to 277
"pressure": {
"type": "number",
"description": "Atmospheric pressure (hPa)"
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though sane values are [900..1100] otherwise we're seeing world records, I think we should limit min/max values to unit validity and allow room for wrong measurements.

The wind cannot come from 470°, the relative humidity cannot be 120%, the wind speed cannot be negative. Likewise, a temperature of -500℃ doesn't exist either.

But.... What happens when something is off with the temperature sensor though? Is it an acceptable consequence to not consider the result normal payload, and not process it as such? Or would we leave an outlier of -280℃ to the application layer to handle?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this another thought; I think it's up to the normalizer function to validate stuff. Naively converting a voltage in a temperature without bounds checking is just wrong. We have errors and warnings output for this as well.

@johanstokking johanstokking marked this pull request as ready for review August 22, 2022 16:03
@johanstokking johanstokking force-pushed the feature/normalized-payload branch 2 times, most recently from 66a77c8 to 789c055 Compare August 22, 2022 16:11
@pablojimpas
Copy link
Contributor

pablojimpas commented Aug 23, 2022

Short answer is no, not really.

I gave this another thought, including the implications, and I think we shouldn't do this for most dimensions. Where it might make sense is things that change very often, more often than the sensible time to send a raw data point. What comes to mind is distance and light measurements.

I can speak from my experience with agricultural sensors and I agree with your thought, if the uplink interval is set to, let's say 1 hour, having min, max, avg, std_dev…might be crucial for data analysis. Temperature and humidity, for example, change pretty quickly from one hour to the next one during sunrise or under certain conditions. On the other hand, if the device has an uplink interval of 10 or 15 minutes, statistics during that interval are almost useless for these measurements that don't change too fast, the raw data point is enough in most cases.

The interval at which statistics begin to be useful is different for each measurement/use-case, but it's definitely a nice thing to have as an option. I think it would be good if this “stats aggregation” could become a good practice for device makers in the long term, increasing the uplink interval to improve the scalability of the network.

@pablojimpas
Copy link
Contributor

I was doing some testing with mock data to check the validation, and I'm not sure why it doesn't work for me, this website does not find errors in the validation process: https://www.jsonschemavalidator.net/s/OEHRvXwj

@johanstokking
Copy link
Member Author

johanstokking commented Aug 24, 2022

I can speak from my experience with agricultural sensors and I agree with your thought, if the uplink interval is set to, let's say 1 hour, having min, max, avg, std_dev…might be crucial for data analysis. Temperature and humidity, for example, change pretty quickly from one hour to the next one during sunrise or under certain conditions. On the other hand, if the device has an uplink interval of 10 or 15 minutes, statistics during that interval are almost useless for these measurements that don't change too fast, the raw data point is enough in most cases.

Right. How about adding an object for stats then?

{
  "air": {
    "temperature": 21.5,
    "temperatureStats": {
      "min": 21.1,
      "max": 21.9,
      "count": 5,
      "median": 21.6,
      "stdDev": 0.2
    }
  }
}

This way, the sensor can choose what to send.

I was doing some testing with mock data to check the validation, and I'm not sure why it doesn't work for me, this website does not find errors in the validation process: https://www.jsonschemavalidator.net/s/OEHRvXwj

This is because schema.json is full of definitions, and it's context-specific which definition is used.

Here you go: https://www.jsonschemavalidator.net/s/z5oQYMYU (notice oneOf in the bottom)

Copy link
Contributor

@adriansmares adriansmares left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tooling changes look good to me.

My only concern with this implementation in which the readings are provided as an array of objects is that IoT Central will not generate dashboards for them. I'm not aware of other platforms with this limitation, but I would not be surprised to find that more of them have this issue.

In my experience telemetry readings are heavily tied to the path to the value (readings.temperature_sensor.value) - and the path is hard to reconcile when arrays are considered - if readings[0].temperature_sensor.value is the temperature of sensor 1, but every 10 uplinks we actually have two readings and the same temperature sensor is found at readings[1].temperature_sensor.value, how do you keep up with this as a consumer ?

The discussion in the issue hinted at a 'source' parameter, but I don't see that one in this schema. I think the source should be the key of a reading, instead of the array index. Consider the Dragino LHT65 - it has two temperature outputs, which in the current schema seem to encode ambiguously.

@johanstokking
Copy link
Member Author

My only concern with this implementation in which the readings are provided as an array of objects is that IoT Central will not generate dashboards for them. I'm not aware of other platforms with this limitation, but I would not be surprised to find that more of them have this issue.

Yeah we discussed this at large in #395 (comment) and further.

What I want to do is declare a new application upstream message type with normalized payload (and some metadata). If there are two measurements, AS would publish that message type twice. So consumers would only deal with normalized payload and get each normalized payload individually without having to deal with arrays.

@adriansmares
Copy link
Contributor

Yeah we discussed this at large in #395 (comment) and further.

What I want to do is declare a new application upstream message type with normalized payload (and some metadata). If there are two measurements, AS would publish that message type twice. So consumers would only deal with normalized payload and get each normalized payload individually without having to deal with arrays.

Ok, that should work.

lib/payload.json Outdated Show resolved Hide resolved
lib/payload.json Show resolved Hide resolved
lib/payload.json Outdated Show resolved Hide resolved
@johanstokking johanstokking merged commit d810556 into master Aug 29, 2022
@johanstokking johanstokking deleted the feature/normalized-payload branch August 29, 2022 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants