Proposal to reduce the verbosity of the datamodel #172

Closed
faceless2 opened this issue Feb 15, 2016 · 5 comments

Comments

@faceless2

The model rightly stores timestamps and sources for most of the data fields, as this is useful for locally generated data and required when that data may come from multiple sources. For instance, instead of storing "navigation.headingTrue" as a single value we store it as

"vessels": {
  "123123123": {
    "navigation": {
      "headingTrue": {
        "value": 123,
        "$source": "someurn",
        "timestamp": "2016-02-02 12:34:56"
      }
    }
  }
}

However, this can get pretty wordy. In particular, the model makes no allowance for data received from other vessels over AIS, which will all come from the same source with the same timestamp.

I'd like to propose two optimizations.

First, allow the source and timestamp to be inherited from the parent node in the model. So in the snippet above, if the consumer needed to identify the source for the "vessels.123123123.navigation.headingTrue" data field, it would check "vessels.123123123.navigation.headingTrue.$source". If it wasn't found, it would go on to check "vessels.123123123.navigation.$source" and "vessels.123123123.$source" (and, theoretically, on up the tree) until it was found.
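
The lookup described above can be sketched roughly as follows. This is only an illustration of the proposal, not part of any Signal K implementation; the function name `resolve_inherited` and the dotted-path convention are assumptions made for the example.

```python
from typing import Any, Optional


def resolve_inherited(model: dict, path: str, key: str) -> Optional[Any]:
    """Return `key` from the node at `path`, or from its nearest ancestor.

    Hypothetical helper: `path` is a dot-separated path into `model`.
    Raises KeyError if the path itself does not exist.
    """
    # Collect every node from the root down to the addressed field.
    chain = [model]
    node: Any = model
    for part in path.split("."):
        node = node[part]
        chain.append(node)
    # Walk back up the tree: the most specific node defining `key` wins.
    for candidate in reversed(chain):
        if isinstance(candidate, dict) and key in candidate:
            return candidate[key]
    return None


model = {
    "vessels": {
        "123123123": {
            "$source": "someurn",
            "timestamp": "2016-02-02 12:34:56",
            "navigation": {"headingTrue": 123},
        }
    }
}

source = resolve_inherited(
    model, "vessels.123123123.navigation.headingTrue", "$source"
)
```

Here `headingTrue` carries no `$source` of its own, so the lookup falls through `navigation` and finds the one set on the vessel.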

Second, this change allows us to make a further optimization. If a field has a single value and will always have a single value, its value can be stored directly instead of in the "value" property.

For example, instead of the above you could have:

"vessels": {
  "123123123": {
    "$source": "someurn",
    "timestamp": "2016-02-02 12:34:56",
    "navigation": {
      "headingTrue": 123
    }
  }
}

I'm not trying to do away with the "value" and "values" fields, which I think are important to manage multiple values. But there are many situations in the model where they are overkill, redundant, redundant and redundant. Allowing the source and timestamp to be "promoted" up the tree like this allows you to store the same data in less space with no loss of information. This would have consequent knock-on improvements for the delta and full formats on the wire as well.

To underline my point, here's a hypothetical excerpt from the current model which might represent the data received over AIS from another vessel:

{
    "mmsi": "123456789",
    "name": "Ship name",
    "navigation": {
        "position": {
            "latitude": 12,
            "longitude": 123,
            "$source": "ais",
            "timestamp": "2016-02-15 13:01:00"
        },
        "headingTrue": {
            "value": 25,
            "$source": "ais",
            "timestamp": "2016-02-15 13:01:00"
        },
        "courseOverGround": {
            "value": 25,
            "$source": "ais",
            "timestamp": "2016-02-15 13:01:00"
        },
        "speedOverGround": {
            "value": 4,
            "$source": "ais",
            "timestamp": "2016-02-15 13:01:00"
        },
        "state": {
            "value": "motoring",
            "$source": "ais",
            "timestamp": "2016-02-15 13:01:00"
        },
        "rateOfTurn": {
            "value": 5,
            "$source": "ais",
            "timestamp": "2016-02-15 13:01:00"
        },
        "destination": {
            "name": "Ulan Bator",
            "eta": "2099-01-01 23:59:59",
            "$source": "ais",
            "timestamp": "2016-02-15 13:01:00"
        }
    },
    "communication": {
        "aisclass": "a",
        "callsignvhf": "callsignhere"
    },
    "registrations": {
        "imo": "imohere"
    },
    "design": {
        "draft": {
            "minimum": 10,
            "maximum": 10,
            "$source": "ais",
            "timestamp": "2016-02-15 13:01:00"
        },
        "length": {
            "overall": 30,
            "$source": "ais",
            "timestamp": "2016-02-15 13:01:00"
        }
    }
}

And here's exactly the same data stored with my proposed change.

{
    "mmsi": "123456789",
    "name": "Ship name",
    "$source": "ais",
    "timestamp": "2016-02-15 13:01:00",
    "navigation": {
        "position": {
            "latitude": 12,
            "longitude": 123
        },
        "headingTrue": 25,
        "courseOverGround": 25,
        "speedOverGround": 4,
        "state": "motoring",
        "rateOfTurn": 5,
        "destination": {
            "name": "Ulan Bator",
            "eta": "2099-01-01 23:59:59"
        }
    },
    "communication": {
        "aisclass": "a",
        "callsignvhf": "callsignhere"
    },
    "registrations": {
        "imo": "imohere"
    },
    "design": {
        "draft": {
            "minimum": 10,
            "maximum": 10
        },
        "length": {
            "overall": 30
        }
    }
}

The benefits here aren't just limited to AIS-sourced data. The current model requires me to store source and timestamp for many fields which may be static - draft and length in the above example, which (on many boats) will never change. I could set a default "$source" in the vessel map, and override it where I have more specific data; for example:

{
    "$source": "static",
    "timestamp": "1970-01-01 00:00:00",
    "navigation": {
        "position": {
            "latitude": 123,
            "longitude": 123,
            "$source": "gpsurn",
            "timestamp": "2016-02-15 13:01:00"
        }
    },
    "design": {
        "draft": {
            "minimum": 10,
            "maximum": 10
        },
        "length": {
            "overall": 30
        }
    }
}

The source for the position is specified, but the source for the values under draft and length is inherited.

Consumers would need only minimal changes to handle this - pseudo-code to retrieve the value would be:

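// The field may hold a bare value or a map with "value", "$source" and "timestamp".
var value = vessel.get("navigation.headingTrue");
if (value instanceof Map) {
    // Wrapped form: unwrap the "value" property.
    value = value.get("value");
}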
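
For anyone who wants to try it, the pseudo-code above might translate into something like the following runnable sketch. The helper name `read_value` and the dotted-path lookup are assumptions for illustration; a real consumer would use whatever accessor its own model provides.

```python
from typing import Any


def read_value(vessel: dict, path: str) -> Any:
    """Return the value at `path`, whether stored bare or under "value".

    Hypothetical helper: `path` is a dot-separated path within one vessel.
    """
    node: Any = vessel
    for part in path.split("."):
        node = node[part]  # KeyError if the field is absent
    # Wrapped form: {"value": ..., "$source": ..., "timestamp": ...}
    if isinstance(node, dict) and "value" in node:
        return node["value"]
    # Bare form, or an object such as position that has no "value" key.
    return node


verbose = {"navigation": {"headingTrue": {"value": 123, "$source": "someurn"}}}
compact = {"$source": "someurn", "navigation": {"headingTrue": 123}}

heading_verbose = read_value(verbose, "navigation.headingTrue")
heading_compact = read_value(compact, "navigation.headingTrue")
```

Both calls yield the same heading, so a consumer written this way would not need to care which form the producer chose.
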
@sumps
Contributor

sumps commented Feb 15, 2016

There are some good points in here, but please have a read of closed issue #92, which covers many of the same points.

@sumps
Contributor

sumps commented Feb 15, 2016

It would also be good to know what you think of the data timeouts and invalid date in deltas issues, #93 and #94.

@tkurki
Member

tkurki commented Feb 15, 2016

I understand your concern. However, you are making a few assumptions that may not be true.

You are referring to the data model and HTTP api response as a single JSON document implemented one to one in a server's memory. This is mostly true for the current version of Node server, but I think (haven't checked) for example timestamps are actually shared - it's the same String in memory, stemming from the incoming delta message.

As far as I know, the Java server stores the whole model as a key-value list, and the single JSON response is generated on the fly when requested. In fact I've been thinking of the same thing as an optimization: less work to just keep a single "latest deltas" map, especially with multiple values.
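
To make the "latest deltas" idea concrete, here is a minimal sketch of generating the nested document from a flat map on demand. It is not how the Java or Node server is actually written; the `build_tree` name and the shape of the `latest` map are assumptions for the example.

```python
def build_tree(latest: dict) -> dict:
    """Expand a flat {dotted path: entry} map into a nested document.

    Hypothetical helper: each entry is the most recent value received for
    that path, together with its "$source" and "timestamp".
    """
    root: dict = {}
    for path, entry in latest.items():
        parts = path.split(".")
        node = root
        # Create intermediate objects as needed.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # Copy the entry so the generated document does not alias the store.
        node[parts[-1]] = dict(entry)
    return root


latest = {
    "navigation.headingTrue": {
        "value": 25, "$source": "ais", "timestamp": "2016-02-15 13:01:00",
    },
    "navigation.speedOverGround": {
        "value": 4, "$source": "ais", "timestamp": "2016-02-15 13:01:00",
    },
}

document = build_tree(latest)
```

With a store like this, updating a field is a single map write, and the verbose tree only exists for as long as a response is being generated.
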

Even for AIS data you still get the static data and the dynamic data with different intervals & timestamps.

Furthermore, the data model also acts as a REST-type HTTP api spec: you can request a single data item like http://demo.signalk.org/signalk/v1/api/vessels/urn:mrn:signalk:uuid:c0d79334-4e25-4245-8892-54e8ccc8021d/navigation/speedThroughWater. So sometimes the timestamp & source should be there and sometimes not.

The logic for sharing timestamps in the hierarchy is also complex. First you have the timestamp under navigation for speedThroughWater, then somebody turns on your MFD: you start getting navigation.position and the same timestamp must be pushed down to speedThroughWater.
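
A rough sketch of that push-down step, to show where the complexity comes from. This is a simplified illustration only: it handles a single group node and timestamps but not sources, and the `update_shared` name is an assumption.

```python
from typing import Any

META = ("$source", "timestamp")


def update_shared(group: dict, key: str, value: Any, timestamp: str) -> None:
    """Write `value` into `group`, sharing the timestamp on the parent if possible."""
    siblings = [k for k in group if k not in META and k != key]
    shared = group.get("timestamp")
    if shared is None and not siblings:
        # First field in the group: the timestamp can live on the parent.
        group["timestamp"] = timestamp
        group[key] = value
    elif shared == timestamp:
        # Same timestamp as the rest of the group: keep sharing it.
        group[key] = value
    else:
        if shared is not None:
            # Timestamps diverge: push the old shared one down to each sibling.
            for name in siblings:
                child = group[name]
                if isinstance(child, dict):
                    child.setdefault("timestamp", shared)
                else:
                    group[name] = {"value": child, "timestamp": shared}
            del group["timestamp"]
        group[key] = {"value": value, "timestamp": timestamp}


navigation: dict = {}
update_shared(navigation, "speedThroughWater", 3.2, "2016-02-15 13:01:00")
after_first = dict(navigation)
update_shared(navigation, "headingTrue", 25, "2016-02-15 13:01:05")
```

After the first update the timestamp sits on `navigation`; after the second, both fields carry their own timestamp and the shared one is gone. Every writer would need logic of this kind, and readers would see fields change shape between updates.
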

If you feel strongly that this is something that really, really needs to be addressed, how about making the behavior where timestamps are shared optional? The API user can request whichever. Or add a query parameter to suppress sources and/or timestamps altogether.
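
As an illustration of the second option, suppressing sources and timestamps could be a simple filter applied to the response before it is serialized. No such query parameter is defined here; the `strip_metadata` helper and its `drop` argument are hypothetical.

```python
from typing import Any, Iterable


def strip_metadata(node: Any, drop: Iterable[str] = ("$source", "timestamp")) -> Any:
    """Return a copy of `node` with the keys named in `drop` removed at every level."""
    drop = tuple(drop)
    if isinstance(node, dict):
        return {k: strip_metadata(v, drop) for k, v in node.items() if k not in drop}
    if isinstance(node, list):
        return [strip_metadata(v, drop) for v in node]
    return node


full = {
    "navigation": {
        "headingTrue": {
            "value": 25, "$source": "ais", "timestamp": "2016-02-15 13:01:00",
        }
    }
}

slim = strip_metadata(full)
no_sources = strip_metadata(full, drop=("$source",))
```

The stored model stays untouched and consistent; only the view handed to the client changes.
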

I also place a lot of value in the consistency in the data model. For somebody accessing the data your proposal is definitely not the principle of least surprise.

One more and in fact a major point: if you count the bytes a Signal K server serves over its lifetime, most of them will be from the streaming api (delta messages), not REST.

@faceless2
Author

@tkurki - several fair points there. In fact that was such a complete answer I feel like it might have come up before!

Yes, I am assuming there's a single model as that's how I've implemented it, but that's not really what's driving this suggestion, which was a result of what I saw as a lot of duplicated information in the model. It wasn't so much the storage, it was the additional time taken to traverse and update the model as a result of these fields. However, I appreciate that it's there because you're viewing the model as a canonical data store, rather than a JSON object to be sent over the wire verbatim (I have a feeling our respective implementations take a slightly different view on this point, which I personally don't see as a bad thing. I'll touch on this in #94).

And yes, it does introduce some complexity WRT the timestamps, as you point out, and I hadn't fully considered this.

Worth floating the idea, but I'm happy to close this one then. I agree a better solution, if one is required, is to define a query format that will present this sort of view on the model rather than adjusting the model itself.

@sumps will pitch in on those once I've had a better read through them.

@tkurki
Member

tkurki commented Feb 16, 2016

Yep, we should have articulated ages ago the different viewpoints:

  • canonical data model
  • a single JSON document with schema
  • REST api
  • delta streaming api


3 participants