New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hypercore Header DEP #34

Merged
merged 7 commits into from Jul 6, 2018

Conversation

Projects
None yet
5 participants
@pfrazee
Member

pfrazee commented Jun 26, 2018

Standardizes a HypercoreHeader message which can be used as the first entry in hypercores to declare the datastructure along with custom fields.

@RangerMauve

I really like the MyCustomHeader example! Didn't know you could do something like that. :D

@bnewbold

I still kind of wish we could deal with nested schemas like: hyperdb-v2/hyperdrive-v4, but I think the discussion consensus was to just keep it simple.

I'm really excited that this came out backwards compatible with existing
hyperdrive!

I don't remember that much about the wire representation of protobuf, but we
will implicitly still be trying to parse a lot of random bytestrings to see if
they "match" this schema. Should we mention that folks should specifically
avoid having a protobuf message that has a string field as the first entry,
because that could accidentally get interpreted as a "type" string? Do we need
to set a sane max version length (1024 bytes?) to prevent gnarly/weird parsing?

# Rationale and alternatives
[alternatives]: #alternatives
This standard is backwards compatible with v1 of hyperdrive, which used an `Index` message as the first entry in the metadata hypercore.

This comment has been minimized.

@bnewbold

bnewbold Jun 27, 2018

Contributor

Oh, :party:!

}
```
Data structures should not add additional fields to the `HypercoreHeader`. They should put any additional data into the `extension`. This is easy to do with protobuf's nested messages. For example:

This comment has been minimized.

@bnewbold

bnewbold Jun 27, 2018

Contributor

Should implementations fail fast on additional fields, or should they ignore additional fields? The later is future-compatible, the former is not.

This comment has been minimized.

@pfrazee

pfrazee Jun 27, 2018

Member

Ignore, probably. I'll add a note.

}
```
The `type` string should be unique to the data structure (refer to existing DEPs to avoid conflicts). When a breaking change is made to the data-structure, it should be given a new `type` string. For example, the `"hyperdrive"` type string will be updated to `"hyperdrive-v2"` for its second version.

This comment has been minimized.

@bnewbold

bnewbold Jun 27, 2018

Contributor

Does this imply that the - character is special or reserved? Should successor versions use the -vNNN syntax?
To clarify my bikeshedding, I think encoding the version as an unstructured string is a reasonable choice, but I think a default delimiter between "schema" and "version/variant" should be chosen.

Some common aesthetically pleasing separators would be -, /, ., _. How do you read one-two_three-four? As two identifiers (one-two followed by three-four), or three (one, two_three, four)?

To propose something concrete, lets say dash (-) is the separator, and identifiers should have dashes in them. Let's also say that the version is optional but recommended ("hyperdrive-v0"), and that the x- or X- prefixes are interpreted as "experimental" (aka, no DEP exists yet).

This comment has been minimized.

@pfrazee

pfrazee Jun 27, 2018

Member

I'm okay with some conventions but I'm not sure I want to standardize a syntax. Just not sure it's worth it. Reason being:

Let's also say that the version is optional but recommended ("hyperdrive-v0"), and that the x- or X- prefixes are interpreted as "experimental" (aka, no DEP exists yet)

This is totally reasonable, but you then need to interpret "hyperdrive-v0" == "hyperdrive", and I just don't know if we need that level of detail for this.

A program that is reading a hypercore will examine the `type` to determine how to process the hypercore.
```js

This comment has been minimized.

@bnewbold

bnewbold Jun 27, 2018

Contributor

I have a soft preference for using pseudo code rather than literal javascript in the DEPs. In additional to being more implementation-agnostic, an argument against using actual code is that it comes with an implication that this exact API is stable, or that this exact code is the canonical code that people should always copy/paste into their applications even years down the line.

This comment has been minimized.

@pfrazee

pfrazee Jun 27, 2018

Member

Okay I'll rewrite with that in mind.

@pfrazee

This comment has been minimized.

Member

pfrazee commented Jun 27, 2018

I still kind of wish we could deal with nested schemas

I'm not against revisiting this. It should pretty much always be layered (one structure on top of another) so all we need is a standard separator. The downside is that it adds some complexity (multiple types emitted instead of just 1).

I don't remember that much about the wire representation of protobuf, but we
will implicitly still be trying to parse a lot of random bytestrings to see if
they "match" this schema.

Hmm, yeah that's true.

Should we mention that folks should specifically avoid having a protobuf message that has a string field as the first entry, because that could accidentally get interpreted as a "type" string?

Tricky. If it ends up possible that non-header messages have strings at tag 1, then we have to interpret an unknown type string in the same way we interpret no header at all. We could add some magic numbers to the schema, except that wouldn't be backwards compatible. Might not be a huge deal.

Do we need to set a sane max version length (1024 bytes?) to prevent gnarly/weird parsing?

Is there a way to put a max string length on protobuf fields? @mafintosh

```protobuf
message HypercoreHeader {
required string type = 1;

This comment has been minimized.

@pfrazee

pfrazee Jul 2, 2018

Member

I'm starting to wonder if we should call this dataStructure instead of type, because "type" is a really generic word that could be used elsewhere in the stack while "data structure" is very specific to what this is.

This comment has been minimized.

@mafintosh

mafintosh Jul 6, 2018

Noting that this can be updated without any backwards compat issues because of protobuf

This comment has been minimized.

@bnewbold

bnewbold Jul 6, 2018

Contributor

type is very generic. I'd bikeshed to contentType or contentSchema, but loosely held opinion (dataStructure works for me).

A program that is reading a hypercore will examine the `type` to determine how to process the hypercore.
```

This comment has been minimized.

@yoshuawuyts

yoshuawuyts Jul 2, 2018

Contributor

file type here might be helpful to read the rendered version

This comment has been minimized.

@pfrazee

pfrazee Jul 2, 2018

Member

We intentionally are trying to write pseudo code

At time of writing, Dat's standard data structures are being expanded upon (hyperdb) while existing data structures are being updated with breaking changes (hyperdrive). In order for these changes to be deployed smoothly to the network, it's important that the following needs be met:
1. Programs should correctly identify the data structure type and version within a hypercore. This will enable the program to use the correct handling code (eg hyperdb, hyperdrive-v1, hyperdrive-v2).

This comment has been minimized.

@yoshuawuyts

yoshuawuyts Jul 2, 2018

Contributor

the example of hyperdrive-v1 is somewhat at odds with the example of hyperdrive throughout the rest of the article. Might be best to pick one and stick with it?

This comment has been minimized.

@pfrazee

pfrazee Jul 2, 2018

Member

Agree

This comment has been minimized.

@mafintosh

mafintosh Jul 6, 2018

Agreed that we should change it to hyperdrive

@yoshuawuyts

This comment has been minimized.

Contributor

yoshuawuyts commented Jul 2, 2018

Are the proposed version names for hyperdrive / hyperdb also part of the DEP, or do they only serve as an example? If they only serve as an example, it might be good to make it extra clear that the names aren't being specified in this DEP.

edit: typo

@pfrazee

This comment has been minimized.

Member

pfrazee commented Jul 2, 2018

Are the proposed version names for hyperdrive / hyperdb also part of the DEP, or do they only serve as an example? If they only serve as an example, it might be good to make it extra clear that the names aren't being specified in this DEP.

Agree. They're just examples

@pfrazee pfrazee referenced this pull request Jul 6, 2018

Closed

Upcoming Meeting Agenda - 4th July 2018 #25

0 of 6 tasks complete
@bnewbold

This comment has been minimized.

Contributor

bnewbold commented Jul 6, 2018

To summarize the current state of this in my eyes:

  • needs a couple tweaks to naming and examples in the DEP text, but I would approve to merge trusting that Paul will make the changes first
  • I'd like to move forward, but with the expectation that we should code this up ("working code") and make sure that it allows beaker/dat to disambiguate and handle both old and new hyperdrive before 100% commiting to this scheme (aka, move to Draft ASAP, also prove this out in code to get to Active status ASAP, as getting this correct for the long-term is crucial for the ecosystem)
@pfrazee

This comment has been minimized.

Member

pfrazee commented Jul 6, 2018

Just fixed that "hypercore-v1" thing and added a note that the strings will be in the data structures' DEPs.

I renamed type to dataStructureType. I'm open to discussion on this. I have some code that's trying to handle dats generally in preparation for more types. In it, I referred to each general item as a "dat" and the instance of hypercore/hyperdrive/hyperdb as that dat's "data structure". So:

var dat = {
  key: '...',
  dataStructureType: 'hyperdrive',
  dataStructure: hyperdrive(...)
}

The word dataStructure is a little clunky but it does seem to work alright.

@pfrazee pfrazee merged commit 61681ac into datprotocol:master Jul 6, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment