Application data schemas & how to manage decentralized development #820

Open
pfrazee opened this Issue Jan 14, 2018 · 67 comments

@pfrazee
Member

pfrazee commented Jan 14, 2018

Yesterday, @0x0ade and @neauoire brought up how to deal with competing schema features from multiple applications.

@neauoire and I wanted to implement special event posts in Rotonde. Those would end up cluttering Fritter, which made us come up with a way to set a post's visibility. Here's our idea, awaiting feedback :)

They proposed a small spec for Fritter and Rotonde interop:

  • Add client field to posts, associating the post with a client identifier. F.e. "rotonde", "fritter".
    • For past messages not containing a "client" field, assume that the client is matching.
    • The client can store its version in the ID, but it's the client's responsibility to handle it. F.e. ClientA will handle "clientb-1.0.0" and "clientb-1.0.1" as two completely separate clients. Likewise, ClientB will handle "clienta:1337" and "clienta:1338" as two separate clients. ClientA will read the version number from "clienta:" IDs and ClientB will read the version number from "clientb-" IDs, though.
  • Add optional visibility field, allowing the strings "public" (default if none given), "whisper" or "client"
    • "public": The post will be rendered in your feed.
    • "whisper": The post will only be rendered in your feed if you're the author of the post, or if your archive URL is listed in the array stored under target.
      • Note: "private" would be a lie, as the post itself is publicly shared, just not publicly rendered.
    • "client": The client automatically generated this post, f.e. "followed Person". The post will only be rendered by a compatible client. The client will gather the information as for how (or if) to render the post from any non-standard properties (f.e. action: "follow"). Clients will handle future non-standard vs standard property conflicts on their own, on a case-by-case basis.
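
The visibility rules in the proposal above could be sketched roughly as follows. This is only an illustration: `shouldRender`, `viewer.archiveUrl` and `viewer.clientId` are hypothetical names, and "compatible client" is assumed here to mean "same client identifier".

```javascript
// Hypothetical helper, not part of Fritter or Rotonde.
// post.client, post.visibility and post.target follow the proposal above.
function shouldRender(post, authorUrl, viewer) {
  const visibility = post.visibility || "public"; // "public" is the default
  // Posts without a "client" field are assumed to come from a matching client.
  const sameClient = post.client === undefined || post.client === viewer.clientId;

  switch (visibility) {
    case "public":
      return true;
    case "whisper": {
      // Rendered only for the author or for archives listed under "target".
      const targets = Array.isArray(post.target) ? post.target : [];
      return authorUrl === viewer.archiveUrl || targets.includes(viewer.archiveUrl);
    }
    case "client":
      // Assumption: "compatible" means the client identifiers are equal.
      return sameClient;
    default:
      // Unknown values are not covered by the proposal; hide them to be safe.
      return false;
  }
}
```

Note that a "whisper" post is still publicly stored; the function only decides whether a cooperating client displays it.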

I tweeted a bit about the challenge of decentralized development. This has definitely been on my mind.

I want to use this issue to discuss broader approaches. We can use details of @0x0ade's proposal as an example; it's very helpful in that regard!

@pfrazee pfrazee added the discussion label Jan 14, 2018

@neauoire

neauoire commented Jan 14, 2018

This came about when we were thinking of adding new types of messages. There is a new feed entry type that would be something along the lines of X started following Y; we started to think about platform specific entries and how to indicate to the other platforms how to parse these uncommon entries.

@pfrazee

Member

pfrazee commented Jan 14, 2018

This issue is exactly the problem we encountered when we first worked on SSB/Patchwork. There wasn't a sufficient framework for making changes that wouldn't accidentally introduce noise into the system. This manifests with a lot of different questions:

  • How do I add new features and stay compatible with other apps?
  • How do I add new features and not disrupt other apps?
  • How can I predict the effects of my features on the larger network?
  • Others I'm sure

My first thought is that we need to switch over to JSON-LD. For those not familiar, here's a short introduction:

JSON-LD adds global specificity to schemas: each key in an object expands to a URL. For example:
{
  "@context": "http://schema.org/",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Professor",
  "telephone": "(425) 123-4567",
  "url": "http://www.janedoe.com"
}

This "expands" to mean:

{
  "@type": "http://schema.org/Person",
  "http://schema.org/jobTitle": "Professor",
  "http://schema.org/name": "Jane Doe",
  "http://schema.org/telephone": "(425) 123-4567",
  "http://schema.org/url": "http://www.janedoe.com"
}

So, not only is that helpful for removing ambiguity and adding documentation, but it also provides a mechanism for adding new attributes without creating ambiguity. For instance:

{
  "@context": ["http://schema.org", {
    "ical": "http://www.w3.org/2002/12/cal/ical#"
  }],
  "@type": "Event",
  "name": "Lady Gaga: Live!",
  "ical:summary": "Lady Gaga Concert",
  "ical:location": "New Orleans Arena, New Orleans, Louisiana, USA",
  "ical:dtstart": "2011-04-09T20:00Z"
}

In that case, we're able to add "ical" attributes to a "schema.org" schema.

This should stop new features from disrupting each other. Basically, if you have a usage that differs from the documented schema usage, you'll need to add your own attribute(s).
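
To make the "expansion" idea concrete, here is a toy sketch of it. It is deliberately not the real JSON-LD expansion algorithm: it assumes a string entry in `@context` is a vocabulary base to prepend to plain keys, and that an object entry is a prefix map. `expandToy` is a hypothetical name.

```javascript
// Toy expansion, NOT the real JSON-LD algorithm.
// Assumption: a string in @context is the default vocabulary base,
// an object in @context maps prefixes (like "ical") to base URLs.
function expandToy(doc) {
  const contexts = [].concat(doc["@context"] || []);
  let vocab = "";
  const prefixes = {};
  for (const entry of contexts) {
    if (typeof entry === "string") {
      vocab = entry.endsWith("/") || entry.endsWith("#") ? entry : entry + "/";
    } else if (entry && typeof entry === "object") {
      Object.assign(prefixes, entry);
    }
  }

  const expandTerm = (term) => {
    if (term.includes("://")) return term; // already a full URL
    const colon = term.indexOf(":");
    if (colon !== -1) {
      const prefix = term.slice(0, colon);
      if (prefixes[prefix] !== undefined) {
        return prefixes[prefix] + term.slice(colon + 1);
      }
    }
    return vocab + term;
  };

  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key === "@context") continue;
    if (key === "@type") out["@type"] = expandTerm(value);
    else out[expandTerm(key)] = value;
  }
  return out;
}
```

With the event example above, `"ical:summary"` would become `"http://www.w3.org/2002/12/cal/ical#summary"` while `"name"` would become `"http://schema.org/name"`, so the two vocabularies cannot collide.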

There's some helpful documentation at https://json-ld.org/ and https://json-ld.org/spec/latest/json-ld-api-best-practices/

@pfrazee

Member

pfrazee commented Jan 14, 2018

In my opinion, there are two downsides to using JSON-LD:

  1. It requires some javascript to deal with the contexts (namespaces).
  2. You need to publish a spec for any context you create.

For 1, I'd like to see a Javascript library which helps you normalize the schemas into an expected form. That shouldn't be too hard to get. I'm still evaluating what's available.

For 2, seeing as we have a toolset to help with Web publishing, one option is to use dat. You'll almost certainly want a shortname for your spec though, and it's more overhead for development than I'd prefer, because you become obligated to maintain the spec. I'm not sure that's a fun way to do development. I personally tend to prototype quite a bit before I'm ready to publish a spec, and I'd be annoyed if I had to spend mental energy to manage namespace URLs.

One thing @taravancil and I have been talking about is a non-universal, search-based URL scheme called the "thing" scheme:

Basically something like thing://Whatever_I_Want_to_Write_Here. The idea is that the browser would handle that URL by opening the configured search app (google even) with the query "Whatever I Want to Write Here." The thing url wouldn't refer to a specific resource; it'd refer to a search.
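
As a rough sketch of how a browser might turn such a URL into a search, assuming underscores stand in for spaces as in the example above. The function names and the `searchBase` parameter are hypothetical; nothing like this exists in Beaker as far as this thread says.

```javascript
// Hypothetical: turn a thing:// URL into a plain search query.
function thingUrlToQuery(url) {
  const prefix = "thing://";
  if (!url.startsWith(prefix)) {
    throw new Error("not a thing:// URL: " + url);
  }
  // Assumption: underscores are the only encoding for spaces.
  return url.slice(prefix.length).replace(/_/g, " ");
}

// Hypothetical: build the URL of the configured search app for that query.
// searchBase is whatever the user configured, e.g. "https://search.example/?q=".
function searchUrlFor(thingUrl, searchBase) {
  return searchBase + encodeURIComponent(thingUrlToQuery(thingUrl));
}
```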

For namespaces, the idea is that you'd use a thing url, and then publish your schema with that url. Those schemas would then be "conventionally" unique instead of "universally" unique. That is, it wouldn't be guaranteed to be unique, but the url would be unique enough that it's unlikely to collide by accident. The search app would then have to help us figure out the right hit based on network signals and trust.

Example:

{
  "@context": "thing://paul_frazees_fritter/schema",
  "@type": "Taco",
  "category": "breakfast",
  "ingredients": ["eggs", "bacon"]
}

Eventually, I'd publish a document describing the schemas under thing://paul_frazees_fritter/schema and so if somebody wanted to find the docs for my attributes, they could.

It may be a really bad idea in the end, but I like the usability, and I think it's worth considering.

@neauoire

neauoire commented Jan 14, 2018

I've had to work with JSON-LD in the past and it quickly becomes hard to maintain, as the overhead in the code is pretty high. The barrier to entry for making new clients also suddenly becomes very high. If it's a library, then that means everyone has to carry this extra bit of code and the network pays for it.

I might not fully understand the ramifications of this, but I was thinking: maybe we could just have a sort of "consortium for communication standards" and host public conversations on how the syntax of new patterns could be implemented? In line with how RSS was at first?

I really wish for this to remain as lean as possible. We're basically just building RSS readers for communication feeds. Where/how did it break with Scuttlebutt? I don't want us to find ourselves making the same mistakes.

@pfrazee

Member

pfrazee commented Jan 14, 2018

@neauoire I generally agree with the sentiment that JSON-LD adds more overhead than I prefer. My original idea was to just duck-type JSON schemas. That might work. It's very convention focused and I think that's actually good for our community -- dev UX is more important to me than "doing it right," so long as we don't do things SO WRONG that things fall apart.
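
For readers unfamiliar with the term, duck-typing a schema could look something like this minimal sketch. The field names `text` and `createdAt` appear elsewhere in this thread; their types, and the `looksLikePost` helper itself, are assumptions for illustration.

```javascript
// Duck-typing sketch: accept any object that has the fields this renderer needs,
// regardless of which app wrote it or what extra fields it carries.
function looksLikePost(obj) {
  return (
    obj !== null &&
    typeof obj === "object" &&
    typeof obj.text === "string" &&
    // Assumption: createdAt is either a numeric timestamp or a date string.
    (typeof obj.createdAt === "number" || typeof obj.createdAt === "string")
  );
}
```

The appeal is that no namespace or spec lookup is needed; the cost is that two apps can use the same field name with different meanings and nothing will notice.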

Like I said in my tweet, developers have a kind of obligation to the users not to f*** up their experiences. I think that's an interesting way to pose the situation.

A central place to discuss the schemas is kind of centralizing but 🤷‍♂️ it's informal enough that I'm not too bothered by that. There's no friction to the community doing something else.

JSON-LD is one of many thoughts I have. (This issue is basically now my alternative to the blogpost I had brewing.) I'll follow up with some other ideas.

@pfrazee

Member

pfrazee commented Jan 14, 2018

Client identifiers are an interesting idea. I think they might be too opaque, maybe? Because there's no meaning to them, even with versions, unless you look them up.

It might make more sense to use feature identifiers, which is basically an informal way to do JSON-LD contexts. For instance, suppose we declare which features are being used in the post, and then declare which of them are required for the post to be usable:

{
  "uses": ["fritter-social-feed", "rotonde-whispers"],
  "requires": ["rotonde-whispers"], // if you dont support this, dont render the post
  "text": "Hi, bob!", // from 'fritter-social-feed'
  "visibility": "dat://bob.com" // from 'rotonde-whispers'
}
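
A client-side check for the `requires` idea above might look like this sketch. `canRender` and the `supported` set are hypothetical names, and treating a missing `requires` field as "nothing required" is an assumption.

```javascript
// Hypothetical check: render a post only if every required feature is supported.
// `supported` is a Set of feature identifiers this client implements.
function canRender(post, supported) {
  // Assumption: posts without a "requires" array require nothing.
  const required = Array.isArray(post.requires) ? post.requires : [];
  return required.every((feature) => supported.has(feature));
}

// Example: a client that only knows the basic social feed.
const basicClient = new Set(["fritter-social-feed"]);
const whisper = {
  uses: ["fritter-social-feed", "rotonde-whispers"],
  requires: ["rotonde-whispers"],
  text: "Hi, bob!",
  visibility: "dat://bob.com",
};
// canRender(whisper, basicClient) is false, so this client skips the post.
```

Features listed only under `uses` are informational here: a client that does not know them still renders the post and ignores the extra fields.
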
@neauoire

neauoire commented Jan 14, 2018

developers have a kind of obligation to the users not to f*** up their experiences

Amen to that.

Well, let's start with a few concrete examples then. How are you planning on handling whisper type messages? That's a good scenario here, since our direct messages are visible when looked at from Fritter. How would you like us to express that a message should not be made public by default:

  • We both implement type=whisper target='dat://' as direct messages and not make them visible.
  • Refactor our architectures to handle JSON-LD schemes.
  • If fritter is not planning on having direct messages, then we can keep this platform centric with a visibility tag. I prefer the target="" approach.
@pfrazee

Member

pfrazee commented Jan 14, 2018

@neauoire right, adding JSON-LD doesn't automatically solve the question of "how to handle incompatibility." For instance, if we use JSON-LD and you add a Rotonde namespace with support for target, you really need other clients not to render it if they don't support that feature.

We can solve that in this case by coming to agreement about the specs that Fritter and Rotonde use, but is there a more general solution to this variety of problem? That's what I'm trying to address with the requires idea.

@csarven

csarven commented Jan 15, 2018

I think you're on the right track.. just some high-level suggestions:

The general rule of thumb is to re-use existing vocabs if/where applicable, and then define/publish new terms for the remainder. Needless to say, publishing/maintaining new stuff is expensive. A nice thing to do is to also make relations to similar terms out there. That could be an alias, a specialisation or a generalisation and so forth. See briefly: https://www.w3.org/TR/dwbp/#dataVocabularies

I think both Rotonde and Fritter should consider starting off with the ActivityStreams 2 vocabulary: https://www.w3.org/TR/activitystreams-vocabulary/ and see how much mileage that gives. Expand to include other vocabularies to solve your respective problems eg. relatively speaking, use a general purpose vocab like schema.org to fill in the gaps. You can always publish your own terms - anyone can say anything about anything on the Web applies. If there is information that both of you would like to express but there are no existing terms for it out there, go ahead and define it. If you want to take it up a notch, approach other applications out there and see if there are terms you both agree can be covered under a common/shared/public namespace; for instance via https://w3id.org/ ( https://github.com/perma-id/w3id.org ). There really is a lot of stuff out there you can borrow from. So, you may want to look things up at http://lov.okfn.org/dataset/lov/ to get a feel of what's out there and borrow ideas from (or chase down some reasoning behind the vocab designs).

I personally wouldn't approach the problem here with "requires" client. The payload's description with shared or resolvable vocabs in place is expressive enough to signal to the consumer whether they want to use it and how. If you still want to communicate that client info, you may want to consider it from the other direction, for example along the lines of, payload generated or rendered by client ( https://www.w3.org/TR/annotation-vocab/#renderedvia , https://www.w3.org/TR/prov-o/#wasGeneratedBy ... ).

There is a lot more to all this than I can type out here in one go, so I hope this is of some use.

@kevinmarks

kevinmarks commented Jan 15, 2018

Fritter's use case is very close to what Activity Streams was designed around, so if you're committed to going with JSON-LD, adopting that makes sense. The field names have been thought through and based on converging multiple social networks over time - the name/summary/content split is a very useful pattern that makes things clearer, and twitter's lack of that has led them into unfortunately convoluted methods.

There are, as you say, deeper problems with having variable support for vocabulary and interpretation that namespaces don't really solve, especially if you want forward compatibility. While you can just render arbitrary JSON, it needs interpretation of the fields to be shown usefully - a Fritter post's threadRoot, threadParent and createdAt are all very opaque when presented directly.

Although JSON aligns well with data structures, it is lacking the underlying distinction that HTML preserves between textual data for presentation to users and explanatory structure and metadata for apps and parsers. Having a well defined default behaviour for unknown elements (show the text contents) and attributes (ignore them) makes this kind of extension and interop easier.

Another reinvention of this approach is showing up in the static site generators, which combine key/value front matter with markdown body text as a way to split the two.

In practice, any successful format is going to have heterogeneous implementations and support, with overlapping interpretations of the data. The microformats approach has been to make peace with this, and use the metadata available in HTML to indicate which elements represent the useful content for other applications, and to converge the field and structure names on this basis.

This is not an exclusive approach - that is the point; you can add microformats to your generated HTML without affecting your internal data structures - h-entry would be the natural one for Fritter.

Then you can add other formats as desired to the content of the posts. You can use post type discovery as a heuristic to distinguish the kinds of post.

@pfrazee

Member

pfrazee commented Jan 15, 2018

@csarven @kevinmarks thanks for the helpful links and thoughts. I like the idea that we can have a library of vocabs to pull from, and that you can pull at-will from multiple vocabs. Tara and I have had the Activity Streams work on our minds throughout.

I think "well-defined fallback behavior" may be a key requirement here. With microformats, the HTML tags provide fallbacks. With our JSON-backed world, this may require some kind of standard protocol (akin to the requires field I suggested) or it may require a case-by-case protocol, such as a standard grammar for all social-feed messages that includes attributes to use when nothing is renderable. (This all reminds me of the Robustness Principle: "Be conservative in what you send, be liberal in what you accept.")
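
One possible shape for such a fallback protocol, sketched under heavy assumptions: every message carries a `type` and, optionally, a plain-text `fallbackText` that any client can show when it has no renderer for that type. Both attribute names and the `renderPost` helper are hypothetical, not part of any existing Fritter or Rotonde schema.

```javascript
// Hypothetical fallback protocol for a JSON-backed feed.
// `renderers` is a Map from message type to a render function.
function renderPost(post, renderers) {
  const render = renderers.get(post.type);
  if (typeof render === "function") {
    return render(post); // the client understands this type
  }
  if (typeof post.fallbackText === "string") {
    return post.fallbackText; // degrade to plain text supplied by the author's app
  }
  return null; // nothing renderable: skip the post rather than show raw JSON
}
```

This mirrors the HTML rule of "show the text contents of unknown elements": the sending app stays conservative by always supplying a fallback, and the receiving app stays liberal by never failing on an unknown type.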

In case you're curious @kevinmarks why we're not exploring non-JSON formats, another key requirement is convenient developer ergonomics. The code must be accessible, even to non-professional programmers, and we're assuming JS is the environment. Even with something like JSX, the HTML syntax doesn't map very cleanly to JS objects, and JS devs just want to work with objects! This isn't to harsh on microformats at all -- we just have different requirements for Beaker's apps.

And this is part of my concern with JSON-LD; I don't want to introduce anything that feels like "needless boilerplate" (which suggests it's disconnected from the dev's sense of getting things done) nor do I want to make objects less convenient to work with (foo.relationship is preferable to foo["foaf:relationship"]). So far I haven't seen a library that gets me excited. The best I've seen is https://github.com/simplerdf/simplerdf, and even that's a bit kludgey to me. This might just be something I have to think through, but if anybody has examples of JSON-LD being used in Javascript that they think is very clean, please share.

@soyuka

soyuka commented Jan 15, 2018

In ApiPlatform we're mostly focused on using JSON-LD in combination with Hydra (backend and frontend). I know that these formats have been really helpful in terms of SEO and interoperability between systems.
Relying on standards and RFCs helps to process and extract data in a generic manner.

However, JSON-LD stays JSON and there's nothing that tells you not to use foo.relationship instead of foo["foaf:relationship"] (ie: not using any namespace). In fact, I'd add that, in combination with a given schema (which need not come from Schema.org), one is free to do anything they want. There are some key parts of the specs that should be considered (for example @context, @type and @id).

Client-side work related to these formats:

@csarven

csarven commented Jan 16, 2018

As already encouraged, do remain on course with JSON-LD. Some comments re HTML:

If representing and exchanging structured data in non-JSON(-LD) syntaxes were up for consideration, there is the W3C Recommendation RDFa, which can be used with any markup, eg. in HTML, SVG.

Transformations between RDF syntaxes keep the information lossless. Moreover, anyone can define and publish their own vocabulary and relate it with the others', all meanwhile reusing the same RDF model to automate the consuming and decision making process. With RDF, there won't be any collisions in term usage across data because they have globally unique identifiers (using "http") as opposed to some arbitrary string.

So, JSON-LD/ActivityStreams2 to RDFa/ActivityStreams2 is lossless in that, all of, and the same semantics can be preserved. In contrast to alternative approaches, there is no additional complexity introduced by having the system learn a new out of band vocabulary which happens to be lossy and only covers a portion of AS2. If multiple vocabularies are used in JSON-LD/RDFa, everything works as expected out of the box.

Going from RDFa to JSON-LD is exactly the same. Information with its semantics will be all intact.

@csarven

csarven commented Jan 16, 2018

@pfrazee re SimpleRDF, if foo is your graph, then foo.relationship will work if you give it a context where relationship is defined like:

  • Single value: "relationship": { "@id": "http://example.org/relationship", "@type": "@id" }
  • Multiple values: "relationship": { "@id": "http://example.org/relationship", "@type": "@id", "@array": true }

SimpleRDF wraps rdf-ext, so perhaps look under https://github.com/rdf-ext/ as well but that'd be lower-level stuff. You can pick and choose your parsers and serializers, but I think you'll mostly need to work with rdf-parser-jsonld and maybe rdf-serializer-jsonld.

And as @soyuka says, if you don't want to deal with that, you can just treat JSON-LD as plain JSON ... Needless to say, that bakes in the knowledge about what to expect and deal with in the data to your application. Nothing "wrong" with that. As long as your code is consistent in how it handles and generates data in JSON-LD, you're okay.
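
To illustrate the general idea of getting `foo.relationship` from fully qualified keys, here is a library-free sketch. It does not use or reproduce the SimpleRDF API; `compactWithAliases` and the alias map are hypothetical, and the map plays the role that a context plays in the bullets above.

```javascript
// Hypothetical helper: rename full-URL keys to short aliases so that
// application code can write foo.relationship instead of foo["http://..."].
function compactWithAliases(expanded, aliases) {
  // Invert the alias map: full URL -> short name.
  const byUrl = new Map(
    Object.entries(aliases).map(([shortName, url]) => [url, shortName])
  );
  const out = {};
  for (const [key, value] of Object.entries(expanded)) {
    // Keys without an alias are kept as-is rather than dropped.
    out[byUrl.get(key) || key] = value;
  }
  return out;
}
```

For example, with the alias map `{ relationship: "http://example.org/relationship" }`, an object keyed by `"http://example.org/relationship"` comes back with a plain `relationship` property.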

@soyuka

soyuka commented Jan 16, 2018

Nothing "wrong" with that. As long as your code is consistent in how it handles and generates data in JSON-LD, you're okay.

Especially if you associate your own json schemas to describe the spec, validate or whatever.

@kevinmarks

kevinmarks commented Jan 16, 2018

Your assumption that JSON is more developer accessible than HTML is a bit of a tricky one - as you have to add more conventions to overcome the visibility issues, you may find that the initial obviousness is no longer true - this reminds me of 'markdown is simpler than html' when I always have to look up the link syntax in markdown.

The 'needless boilerplate' was a concern for Activity Streams, so the spec does say you can use the properties without the context (implied by MIME type).

When a JSON-LD enabled Activity Streams 2.0 implementation encounters a JSON document identified using the "application/activity+json" MIME media type, and that document does not contain a @context property whose value includes a reference to the normative Activity Streams 2.0 JSON-LD @context definition, the implementation must assume that the normative @context definition still applies.
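
In code, a consumer could apply that rule along these lines. This is a simplified sketch: `withImpliedContext` is a hypothetical helper, and prepending the context is an assumption made for illustration (in real JSON-LD processing the order of contexts affects which term definitions win).

```javascript
// The normative Activity Streams 2.0 context URL.
const AS2_CONTEXT = "https://www.w3.org/ns/activitystreams";

// Hypothetical helper: make the implied AS2 context explicit before processing.
function withImpliedContext(doc, mediaType) {
  if (mediaType !== "application/activity+json") return doc;
  const contexts = [].concat(doc["@context"] || []);
  if (contexts.includes(AS2_CONTEXT)) return doc; // already references it
  // Assumption: prepend, so that any document-supplied contexts come later.
  return { ...doc, "@context": [AS2_CONTEXT, ...contexts] };
}
```

A producer that relies on this rule can publish `{ "type": "Note", "content": "..." }` with no `@context` at all, which is the "no needless boilerplate" case discussed above.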

RDF is its own model - there are people such as @csarven who find it helpful in their worldview, and others who find it more complex than they need. As far as schema.org goes, the way that combines object inheritance with RDF makes it much harder to understand.

As for 'just building RSS readers', RSS in practice has a huge variation of markup and choices - any tech that takes off does tend to accrete more over time. Ultimately you are going to extract the bits you understand and skip the rest, but following some uniform naming of fields as far as possible is good, and the activity streams one has done a fair bit of analysis to do that.

@cwebber

cwebber commented Jan 16, 2018

Hi @pfrazee, really happy to read this thread. As you probably know, ActivityPub (of which I'm co-editor) uses ActivityStreams, and ActivityPub is used by a growing number of social networking applications, including Mastodon. It would be great to have Dat/Beaker join us.

I've been pushing for the idea of a peer to peer ActivityPub application, and maybe Dat/Beaker is actually the right way to go there! (We intentionally made certain that ActivityPub does not require specifically the https:// uri scheme, so maybe dat:// will work very well?)

Re: RDF'ness of json-ld: the main reason for json-ld is to allow for the expressiveness of linked data without requiring people go "full RDF". It's totally possible to write an application using ActivityStreams using more naive json tooling, and we made sure that this was a requirement when we worked on ActivityStreams. In fact Mastodon is an example of an application that does support ActivityStreams as valid json-ld, but internally operates on it as more naive json. So this is more than possible.

I'd love to talk more about this with you if you'd like to. Maybe the Social Community Group would be a good place to discuss?

@AlbertoElias

AlbertoElias commented Jan 16, 2018

My only thought is that if we do go with JSON-LD, where I understand the developer friction, there should definitely be a Web API to handle it, as requiring a parser library for something that would become so common might even be what stops it from being widely used.

I like the schema flexibility and that it's supported, and I do believe that, if we do go down this route, we will converge in schemas, so it won't be a pain for newcomers, while also allowing for innovation to happen.

There's also ActivityPub built on ActivityStreams which Mastodon uses. Currently it's thought out to work nicely in the Client-Server model, but I think it adapts nicely to the serverless decentralized model.

@0x0ade

0x0ade commented Jan 16, 2018

As a newcomer, it's inspiring and fascinating to see the ongoing discussion. I just wanted to share a few things on my mind. Please ignore me if I'm being too naive / dumb, as I don't want to disrupt the ongoing discussion.

  • I can't see how or where I would use microformats in a P2P social feed webapp*.
    dat:// websites are static and the data is stored separately - Rotonde and Fritter thus render the posts at runtime. Embedding data in the rendered HTML for SEO or similar purposes shouldn't be our concern for now.
    Personally, it makes more sense to use a format that was meant for data storage and data transfer in the first place, and JSON is practically a JavaScript-native format with parsers for almost every other language.

    • *: A highly unrealistic case where I would use microformats is if we were to store all posts pre-rendered in index.html and their respective post pages, making the user's feed readable in a truly static context, but this brings many other issues.
  • I'm currently going through the peer to peer ActivityPub documentation. It helps me in understanding ActivityStreams a lot, but...

    ActivityPub was written intentionally to be layerable on any protocol that can support
    HTTP GET and POST verbs.

    As far as I know, the dat:// protocol only supports GET (DatArchive.readFile (doc) and standard resource fetching) across the network and POST (DatArchive.writeFile (doc)) into your own archives. This would be analogous to POSTing to your own outbox.

    - You can POST to someone's inbox to send them a message
    + You can POST to your own outbox and hope that the receiver is fetching your outbox

    Due to that limitation, we don't even have an "inbox". Instead, we fully rely on the client to fetch any outboxes to build our feed and find any mentions. As far as I know, in Fritter, your "inbox" is composed of the people you're following. (I don't know if / how it changes for notifications.) In Rotonde, your "inbox" is composed of the people you're following and the people they're following (if you enabled "discovery").

    - You can GET from your inbox to read your latest messages
    + You can GET from the outboxes you fetch (basic case: people you follow) to read your latest messages

    I'd happily adopt a dat-compatible peer to peer version of ActivityPub in the future.

  • I'm still looking at which JSON-LD schemas / contexts to possibly support in Rotonde. ActivityStreams 2 seems like the most obvious choice, but it initially "shocked" me - I initially didn't even understand whether a Create is enough or if I need to Add it to my feed, and I'd prefer to f.e. delete a post from my WebDB instance instead of checking if it got Deleted. It adds a lot of bloat which I (as a naive newcomer) would prefer to avoid: I just want to read and write my entries, not what I'm doing with it. The dat versioning history is already there for that.
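
The outbox-polling model from the second bullet above (no inbox; the client fetches the outboxes it follows) could be sketched like this. The sketch assumes the posts have already been read from each followed archive and that `createdAt` is a numeric timestamp; `buildFeed` and the shape of `outboxes` are hypothetical.

```javascript
// Hypothetical feed builder: merge the posts of every followed archive.
// `outboxes` maps an archive URL to the array of posts read from that archive.
function buildFeed(outboxes) {
  const feed = [];
  for (const [authorUrl, posts] of Object.entries(outboxes)) {
    for (const post of posts) {
      // Record where the post came from; the archive URL is the author identity.
      feed.push({ ...post, author: authorUrl });
    }
  }
  // Assumption: createdAt is a number, so newest-first is a numeric sort.
  return feed.sort((a, b) => b.createdAt - a.createdAt);
}
```

Widening the "inbox" (for example Rotonde's discovery of people followed by the people you follow) only changes which archives end up in `outboxes`, not the merge itself.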

I'm sorry if anyone is now grabbing and shaking their head, but I just wanted to get this out.

@pfrazee

Member

pfrazee commented Jan 16, 2018

I'm reading up on the ActivityPub vocab and JSON-LD dev ergonomics. I'm hopeful that I can make them work for us, but I'm not optimistic yet. @0x0ade your second and third bullets about the inbox/outbox divide and the superfluity of Activities like Create are something I just verbalized myself in the W3 social IRC. If there's an opportunity, it's probably to only use the Object-based schemas, and not the Activity-based schemas.

If I had to rank my requirements, they are:

  1. Developer experience (clarity, ease of use, enjoyability)
  2. User experience (good behaviors in all cases, good fallbacks, minimal "missing messages")
  3. Compatibility with any existing ecosystem

Those are all requirements, so compat does matter to us, but it's less of a priority than DX and UX are.

As the ActivityPub folks have pointed out, fairly minimal conformance is all it takes to make compatibility possible. But I'd also mention that solving these questions with "Use an existing vocab" is actually not solving the question of decentralized development; it's dodging it. Vocab choice is a concern for the developer of a specific application, rather than a framework for thinking about how to design our schemas.

I'll keep considering JSON-LD and ActivityPub, but I'm primarily interested in the techniques we're going to use for managing fallback behaviors in the face of unexpected schema differences.

@csarven

csarven commented Jan 17, 2018

That's it! Start with Object, and if/when you want to express an Activity about an Object, you can extend. Activity will refer to the Object.
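
As a small illustration of that layering, using ActivityStreams 2.0 terms (`Note`, `Create`, `actor`, `object`, `content`): the object can be published on its own, and an activity that refers to it can be added later. The dat:// URLs and the `wrapInCreate` helper are made up for the example.

```javascript
// Step 1: publish just the Object. This is already useful to a feed reader.
const note = {
  type: "Note",
  id: "dat://alice.example/posts/1.json", // hypothetical URL
  content: "Hello from Beaker",
};

// Step 2 (optional, later): describe an Activity about that Object.
// Hypothetical helper; the Activity refers to the Object by its id.
function wrapInCreate(object, actorUrl) {
  return {
    type: "Create",
    actor: actorUrl,
    object: object.id,
  };
}
```

Clients that only care about reading entries can ignore the `Create` wrapper entirely and work with the `Note`.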

I agree that choosing the "right" vocabs goes a long way but there is no one size fits all, and certainly not indefinitely. That holds true not only for your own application and the data it generates/consumes, but also for what you are trying to achieve here across applications. Things change - data changes. Embrace that as one of the core attributes of your system, instead of getting blocked by the idea that your applications may not end up using the right vocabs, or be perfectly understood by another. There is plenty of space for interop, just as well as things getting missed.

Different applications are going to go at it differently, so you have two general approaches: 1) try to define something that will work for everyone, or 2) use the best you can for your own application and cross your fingers for interop. I think 1 doesn't really work even if we have everyone at the table. Moreover, there are applications that don't exist yet which will have different requirements than we can foresee. 2 is pragmatic, and if different applications generate data that's 80% meaningful to another application out of the box, then that's fairly successful. For concepts that are not immediately meaningful to an application, it is possible to direct the application to investigate further to see if and in what other ways they could become meaningful - a follow-your-nose type of exploration.

In practice, applications cluster around vocabularies organically - due to trends, usefulness, evolvability, and so forth. That's something to keep in mind.

@pfrazee

Member

pfrazee commented Jan 18, 2018

A few reads that are worth including in this discussion:

I think Robin makes an extremely good point here:

It should therefore be a core tenet of linked data that publishers should not have to think about interoperability through existing vocabularies (unless they are specifically taking part in an existing, relatively predictable data community). If the system is predicated on people thinking about reuse before they can even start publishing, then it will largely fail — especially in reaching the vast amounts of “small data” that exist in the wild.

This also very closely fits my thinking:

One of the greatest values in publishing reusable data is that you know neither who will want to use it nor how. Because of that, unless it is obvious that you're targeting a given community, the chances are that it is not worth thinking about how to fit your model into a shared one. The first order of business is to do a good job getting the data out there, and the best way of doing that is likely to simply expose something close to your own internal model (which isn't to say that you shouldn't learn from how things like yours are commonly modelled). The odds are very high that conversion will be needed no matter what, for at least some of your users (and not unlikely for most). A resilient linked data ecosystem needs to treat data conversion as a natural, core, and common part of everyday life. Munging happens. It just always does.
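
A concrete case of "munging happens": converting an app's internal post shape into a shared vocabulary after the fact. The sketch below maps a Fritter-like post to an ActivityStreams 2.0 `Note`. The input fields `text`, `createdAt` and `threadParent` are mentioned in this thread, but their exact types are assumptions (in particular, `createdAt` is assumed to be a millisecond timestamp), and `fritterPostToNote` is a hypothetical name.

```javascript
// Hypothetical converter from an internal post model to an AS2 Note.
function fritterPostToNote(post, authorUrl) {
  const note = {
    "@context": "https://www.w3.org/ns/activitystreams",
    type: "Note",
    attributedTo: authorUrl,
    content: post.text,
    // Assumption: createdAt is a millisecond timestamp.
    published: new Date(post.createdAt).toISOString(),
  };
  if (post.threadParent) {
    note.inReplyTo = post.threadParent; // replies point at their parent post
  }
  return note;
}
```

The point is that the app publishes its own model first, and a converter like this can be written later by whoever needs the interop.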

@pfrazee

Member

pfrazee commented Jan 22, 2018

I spent the weekend preparing a solution to this issue that I'd like to propose.

My goal was to make something that won't generate too much frustration. That's a pretty high bar to clear in an opinionated space, so I hope this gets close, and I apologize if I've come up short! I'm also trying to find something which the un-opinionated developer finds palatable, and that should explain the pragmatic approach of this spec.

My proposal is called JSON Lazy (JSON-LZ). Links: README.md, DESIGN.md.

I chose that name for two reasons:

  1. As a playful nod to the JSON-LD community, with whom I want to remain compatible.
  2. To evoke the core philosophy that compatibility should be solvable as an afterthought.

(We can rename if we're concerned it's too confusingly similar to JSON-LD.)

JSON-LD advocates should read the design doc to get a clear understanding of why I'm proposing an alternative to JLD. Please feel free to argue the points and propose alternatives. I took multiple passes at using JLD's schemas for this proposal, but didn't feel like I was getting intuitive results. My strategy instead is to remain compatible by avoiding conflicts, and that means that a JSON document can use both JSON-LD and Lazy. I think we can find some simple strategies for Lazy's tooling to understand JSON-LD too, but complete support is a non-starter.

The discussion in this thread about whether to adopt ActivityStream's vocabulary has been valuable for helping to clarify this task, and I thank everybody for pitching in their thoughts. However, I think it's important to say that we should not solve this problem by adopting a single application-schema proposal. What we're attempting to solve is the process of application dev in a decentralized network, not the particular schema needs of social media applications.

Therefore, rather than asking, "Should we use ActivityStreams?" we should instead ask, "How can we make it easy for devs to add ActivityStream down the road?"

I hope to incorporate your feedback, so please let me know your thoughts and concerns.

@cwebber

cwebber commented Jan 22, 2018

Arg... well, if you decide to create a new mechanism for extensible JSON, effectively that's your decision... however I'll say that I think that the JSON-LD community has worked pretty hard to sort out a lot of these things already. I think there's a big advantage to being able to share interoperability with groups like schema.org, ActivityStreams, and to be able to take advantage of tooling like linked data signatures. I'd strongly encourage you to reconsider fragmenting this space and help us collaborate to bring unity here instead!

@cwebber

cwebber commented Jan 22, 2018

Perhaps it's not a bad idea to consider joining the json-ld Community Group and raise your thoughts/concerns there? Great folks, and IME very open to discussing things.

@pfrazee

Member

pfrazee commented Jan 22, 2018

@cwebber My proposal isn't a final decision, but it should anchor our discussion from here on out. Lazy is an example of what I see as decent dev ergonomics. I really hate disappointing the LD community, so if we can fix the ergonomics of JSON-LD, then I'd certainly be happier.

That said, fragmentation in schemas must be expected in a decentralized network, so I'm not very compelled by your argument that we must adopt JSON-LD to avoid fragmentation. If JLD-based software fails to work when JLD metadata isn't present, then we should dismiss JLD as a solution.

Something to be aware of: While this thread has represented a lot of folks from the LD world (and that has been great!) I've had a parallel thread on the SSB network going, and I've talked in private with the active Beaker/Dat devs. These are the actual users of the system, and the response from them has been, "I don't like JSON-LD and I probably won't use it." I'm doing my best to square that circle.

Lazy supports the schemas from schema.org and ActivityStreams, so I'm not concerned about losing access to those bodies of work. Note: Schema.org has usage examples for Microdata, RDFa, and JSON-LD.

@neauoire

neauoire commented Jan 22, 2018

Well, as expected I am in favour of the JSON-LZ format as it's the realization of my original suggestion. I am guessing that @0x0ade will equally be in favour to give this a try :)

@kevinmarks

kevinmarks commented Jan 22, 2018

I think you have addressed an imaginary problem to avoid dealing with a real one.

Currently you have a handful of field names that don't need namespaces. You are going to have to cope with people adding arbitrary new fields to your json, because it is user editable in Beaker, and is on their filesystem to edit too.

So, debating the format of how to deal with a theoretical field name collision is architecture astronautics.

In practice, what people will likely do to embed a different format would be to add a new field name and put an object in it, rather than mix fields into your top level object. LZ adding a way to document that this has happened is somewhat plausible; relying on it being done accurately is less so. The nuances of different namespace proposals are not something I am going to comment on further.

However, your comment on ActivityStreams I do think is worth responding to, as this is a real problem: naming things in social networks to maximise interoperability and decrease confusion is important.

The discussion in this thread about whether to adopt ActivityStream's vocabulary has been valuable for helping to clarify this task, and I thank everybody for pitching in their thoughts. However, I think it's important to say that we should not solve this problem by adopting a single application-schema proposal. What we're attempting to solve is the process of application dev in a decentralized network, not the particular schema needs of social media applications.

Good. So adopt the schema for social media applications that has 10 years of real-world experience behind it, instead of making another one ab initio.

Activity Streams grew out of the OpenSocial convergence of existing social network schemas. It was implemented by dozens of social networks, including two of Google's, and both MySpace and Facebook (both by Monica, in a virtuoso bit of coding and career trajectory). It's the standard format for Gnip's unified stream API, and Granary will convert silos into it. ActivityPub uses it too. You don't need to pick up any JSON-LD baggage to use the structures.

Therefore, rather than asking, "Should we use ActivityStreams?" we should instead ask, "How can we make it easy for devs to add ActivityStream down the road?"

If this isn't a problem you are that engaged by, why not adopt it and move on?

@pfrazee

Member

pfrazee commented Jan 22, 2018

"Just use X schema" is quite simply the antipattern. Our role as the Beaker team is not to tell devs what schemas to use; it's to develop the tools that devs need to make their independent decisions work together.

Field name collision is always theoretical until it happens -- but that's not really what Lazy is about anyway. It's about providing optional tools, which can be used in addition to duck-typing, and which can assist with munging and help avoid common errors. There may be some better techniques and more interesting tools that we can use in/instead-of Lazy, so let's make the discussion about that. Andre Staltz and I had a productive DM discussion about it; I may reproduce some pieces from that in this issue.

As an aside -- Tara and I might still adopt ActivityStreams for the Fritter app, because that'll be our choice as the devs of that app. That's orthogonal to this broader discussion though, so let's please move past it.

@neuroplastic

neuroplastic commented Jan 30, 2018

@BigBlueHat is right: critical knowledge communities already use RDF-based formats. The Indieweb and JSON-LD teams are right: please no more fragmentation. But Beaker's back-and-forth with Indieweb and JSON-LD teams seems to be missing something:

We're all following the @timbl of 1993. Why don't we follow the @timbl of 2017?

SOLID addresses the Indie and decentralised web missions, plus re-establishes the open semantic web. Before Beaker brought back that old NCSA Mosaic tingle, @csarven's Dokieli did.

However, SOLID, like Indieweb, is still stuck at a developer-level UX. But Beaker's decentralised profiles -> pods can solve this.

I do not want to see Beaker being held back by the very things holding SOLID and Indieweb back. I think Beaker should be able to try new formats as needed, IF it results in being able to do SOLID without a server.

I want the openness of the early web, WITH the convergent evolution of RDF, ML, and node. I suspect the solution to the present debate is to move the LD part of JSON-LD into JS. HTML is moving from index.html to index.js, so let index.js handle the RDFa. (RFC @yoshuawuyts ?)

I want to see Beaker being pulled forward by SOLID, not pushed forward by Rotonde.

@BigBlueHat

BigBlueHat commented Jan 30, 2018

@neuroplastic excellent thoughts. It's probably past time we all got back to work on http://rdf.js.org/ and friends. Thanks for the push!

@yoshuawuyts

yoshuawuyts commented Feb 1, 2018

@neuroplastic sorry, I don't think I'm following. What is it you're asking?

@neuroplastic

neuroplastic commented Feb 1, 2018

@yoshuawuyts, @pfrazee and @taravancil are making a push right now against the UX barrier that has kept the original vision of the ReadWriteWeb (and the semantic web) from succeeding. In your, @dominictarr's, et alia's work, we see a modular approach to handling HTML from within node.js javascript. As e.g. @jondashkyle demonstrates, this can be a simpler UX than direct HTML wrangling.

@pfrazee is identifying JSON-LD as a blocker for usability of the p2p RWW. But JSON-LD is a critical piece of the puzzle to enable scientific and general data integration in an open web. This issue is an argument that can potentially make or break JSON-LD, and hence, SOLID. We can't let LD go, not when @timbl is this close with SOLID. (If it helps, young hurried devs, think of the LD as JSON-Leibniz&Diderot).

The elements in JSON-LD @pfrazee is calling attention to - @context and type - are so epistemically freighted, no wonder there's been no usable solution, hence the tension in this issue's thread. The web needs a breakthrough here, and it needs it now.

@msporny, @kevinmarks, @timbl, @csarven, @RubenVerborgh: are you ready to agree there is no declarative solution to the usability problem of RDF formats?

If so, then the solution must be that an application is already a set of contexts and types. The same JSON must be ingestible by different apps able to apply different LD contexts and types. This is where I see @pfrazee leading, but, critically, IndieWeb and SOLID are stuck in the old app paradigm. For SOLID and the RWW to work, JSON-LD needs new imperative/functional tooling, and the radical simplection happening in choo, dat, and e.g. hyperscript is the place.

@yoshuawuyts, can we rebuild SOLID using choo, Dat, and Beaker?

@BigBlueHat

BigBlueHat commented Feb 1, 2018

@neuroplastic there's loads of great thoughts (and concerns) tucked in both your comments...but I'm not sure they're addressable in a single issue on the GitHubz. Maybe you could kick off a few more narrowly targeted and/or actionable requests to various mailing lists and projects?

Most of the people here are working toward what you seem to be hoping for, but it is (as ever) going to take time, effort, and (above all) collaboration.

It looks like you've got enough knowledge of the space (and the "actors" in it) to write up some pretty focused proposals. Maybe toss up some GitHub repos, Gists, or wiki pages some place, and pass those around to various WGs. File clearly closable issues (where possible), and kick off some experiments. Any and all of these things could help move the world forward.

Cheers!
🎩

@neuroplastic

neuroplastic commented Feb 2, 2018

@BigBlueHat, sorry to thread-crash. I do have FOSS contribs. along these lines in the works. But Dat, Beaker, and choo are moving orthogonally to established patterns in web standards, which is creating an opening for the kind of unexpected move with which @timbl started all this ca. 1993. But this thread also brings up the fear that XKCD#927 ('How standards proliferate') will happen AGAIN, for the ten-thousandth time since Dan Connolly brought SGML to the W3C in 1995.

I spoke out of turn, but I wanted to throw a spotlight on how close @timbl's vision is, IF Indieweb and SOLID will support the lateral step Dat, Beaker, and choo are taking, and IF they in turn will support the lower-case semweb. Don't let all that is SOLID melt into air.

@BigBlueHat, thanks for being a kind and helpful gatekeeper.

@pfrazee

Member

pfrazee commented Feb 3, 2018

Just to update on this, we've got a phone-call scheduled with @msporny to discuss this more. We'll keep everyone updated!

@dominictarr

dominictarr commented Feb 7, 2018

think of the LD as JSON-Leibniz&Diderot

@neuroplastic I am now more confused

@msporny

msporny commented Feb 7, 2018

The call is at 3pm ET today, dial-in details are here. Anyone is welcome to join, the call is open to the public.

https://lists.w3.org/Archives/Public/public-linked-json/2018Feb/0003.html

@neuroplastic

neuroplastic commented Feb 8, 2018

@dominictarr Here's the spec for JSON-Leibniz&Diderot:

https://en.wikipedia.org/wiki/Characteristica_universalis
https://en.wikipedia.org/wiki/Denis_Diderot#Encyclop%C3%A9die

Nailing JSON-LD for Beaker and p2p may be the difference between freedom from Facebook alone and freedom from both Facebook and an academic publishing and ranking system that needs to go just as badly.

@webdesserts

Contributor

webdesserts commented Feb 8, 2018

For interested parties, here are the meeting notes for the above call.

@pfrazee

Member

pfrazee commented Feb 8, 2018

I've found one notable difference between how I was thinking with Lazy and how LD works.

  • In LD, the goal is to expand every key into a full URL. So, eg, the compact form {"name": "Paul"} can expand to {"https://schema.org/name": "Paul"}.
  • In Lazy, the goal is to categorize every key under a vocab ID. So, eg, we know the name in {"name": "Paul"} belongs to the schema.org vocab.

So, Lazy focuses on vocabularies as a group, while LD focuses on individual attributes.

Why does this matter? Lazy has 2 goals: to help you detect schema incompatibilities, and to help you transform between schemas. By thinking in terms of vocabs, Lazy is able to do a sort of "support detection exchange," where the JSON object declares its vocab IDs and which ones are required, and the app declares the same (but for itself) and we walk away knowing about the support. (See Detecting schema support.)
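As a rough illustration of that exchange, here is a minimal Python sketch. It assumes the `schema` array shape used in the Lazy examples in this thread (bare strings, or objects with `name`, `attrs`, and an optional `required` flag). The function names `declared_vocabs` and `detect_support` are hypothetical and are not taken from any JSON-LZ library, and treating bare-string entries as optional is also an assumption.

```python
from typing import Iterable


def declared_vocabs(doc: dict) -> dict[str, bool]:
    """Map each vocab ID declared in doc["schema"] to whether it is required."""
    vocabs: dict[str, bool] = {}
    for entry in doc.get("schema", []):
        if isinstance(entry, str):
            # Assumption: a bare string carries no "required" flag, so it is optional.
            vocabs.setdefault(entry, False)
        else:
            name = entry["name"]
            required = bool(entry.get("required", False))
            vocabs[name] = vocabs.get(name, False) or required
    return vocabs


def detect_support(doc: dict, app_vocabs: Iterable[str]) -> dict[str, list[str]]:
    """Compare the vocabs a document declares with the vocabs an app understands."""
    known = set(app_vocabs)
    declared = declared_vocabs(doc)
    return {
        "supported": [v for v in declared if v in known],
        "unsupported_optional": [
            v for v, req in declared.items() if v not in known and not req
        ],
        "unsupported_required": [
            v for v, req in declared.items() if v not in known and req
        ],
    }


event = {
    "schema": [
        "alice-allisons-calendar-app",
        {"name": "alice-date-ranges", "attrs": ["startDate", "endDate"]},
        {"name": "bob-bunsens-rsvps", "attrs": "rsvp.*", "required": True},
    ],
    "type": "event",
    "name": "JSON-LZ Working Group Meeting",
}

# An app that knows the calendar vocabs but not the RSVP one.
report = detect_support(event, ["alice-allisons-calendar-app", "alice-date-ranges"])
can_render = not report["unsupported_required"]
```

Here the app would learn that a required vocab is missing before it tries to render the object, which is the "fatal ambiguity" case the exchange is meant to catch.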

If we're going to make the same concept work for LD, we'll need to find a way to use the common root in attribute IRIs as vocab IDs. For example, consider the following object:

{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {
      "@id": "http://xmlns.com/foaf/0.1/homepage",
      "@type": "@id"
    }
  },
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/"
}

We'd need to somehow determine that "http://xmlns.com/foaf/0.1/" is the vocab ID.
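One naive way to do that, sketched in Python under stated assumptions: treat everything up to and including the last `#` or `/` of each term IRI as its vocab ID, then count how often each root appears. The helper names (`vocab_root`, `term_iris`, `infer_vocab_ids`) are made up for illustration, and the sketch only handles flat term mappings with absolute IRIs; it ignores prefixes, compact IRIs, and nested contexts.

```python
from collections import Counter


def vocab_root(iri: str) -> str:
    """Return the IRI up to and including its last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut >= 0 else iri


def term_iris(context: dict) -> list[str]:
    """Collect the IRI each term maps to, from string or {"@id": ...} definitions."""
    iris = []
    for term, definition in context.items():
        if term.startswith("@"):
            continue  # keywords such as @vocab are not term mappings
        if isinstance(definition, str):
            iris.append(definition)
        elif isinstance(definition, dict) and "@id" in definition:
            iris.append(definition["@id"])
    return iris


def infer_vocab_ids(context: dict) -> Counter:
    """Count how many terms fall under each inferred vocab root."""
    return Counter(vocab_root(iri) for iri in term_iris(context))


doc = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": {
            "@id": "http://xmlns.com/foaf/0.1/homepage",
            "@type": "@id",
        },
    },
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
}

vocab_ids = infer_vocab_ids(doc["@context"])
```

For the object above, both terms share the root `http://xmlns.com/foaf/0.1/`. Whether the last-separator rule holds for every vocabulary in the wild is an open question.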

@msporny

msporny commented Feb 9, 2018

In Lazy, the goal is to categorize every key under a vocab ID.

Why can't you just do something like this:

{
  "@context": { "@vocab": "http://xmlns.com/foaf/0.1/" },
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/"
}

Doesn't that accomplish the Lazy goal of categorizing every key under a vocab ID? With JSON-LD 1.1, you can even switch the vocab ID based on duck typing. So, if someone says something is of type "Foobar", the JSON-LD processor will go "oh, I know FooBar, switching to @context XYZ" so that all keys are associated with vocab ID XYZ. This enables vocabularies as a group, right?

@pfrazee

Member

pfrazee commented Feb 9, 2018

@msporny Yes but there's no way to specify additional vocabs, right? That's what I'd really like to have. If not, then I think we'd need an algorithm that infers vocab IDs by looking at all the IRIs and extracting common roots. (Kind of like the reverse of prefixes.)

Could you point to the spec section on the duck typing? I'm not sure what the mechanism is you're referring to.

(In your example I think you intended to use { "@vocab": "http://xmlns.com/foaf/0.1" } no /name at the end.)

@msporny

msporny commented Feb 9, 2018

@msporny There's no way to specify additional vocabs, right?

Don't quite know what you mean by "additional vocabs". You can specify vocabs out of thin air, they don't have to be dereferenceable, for example this is valid (albeit, a bit strange):

{
  "@context": { "@vocab": "urn:beaker:totally-made-up-vocab#" },
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/"
}

Could you point to the spec section on the duck typing? I'm not sure what the mechanism is you're referring to.

Sure, duck typing is a standard feature of JSON-LD 1.0:

https://json-ld.org/spec/latest/json-ld/#specifying-the-type

... and setting a "scoped context" when a duck type is specified is a proposed JSON-LD 1.1 feature (see example 55):

https://json-ld.org/spec/latest/json-ld/#scoped-contexts

(In your example I think you intended to use { "@vocab": "http://xmlns.com/foaf/0.1" } no /name at the end.)

You're right, corrected for future readers of the thread.

@pfrazee

Member

pfrazee commented Feb 9, 2018

Don't quite know what you mean by "additional vocabs". You can specify vocabs out of thin air, they don't have to be dereferenceable, for example this is valid (albeit, a bit strange)

The @vocab sets the default vocab but sometimes we need more than one vocab on the object. All the additional have to be set by term mappings (and may use prefixes), right? In which case we'll need an algorithm for extracting the common root URI.

Your urn:beaker:totally-made-up-vocab# may answer my question about "Lax" IRIs. I wouldn't use beaker as the namespace of course.

@dlongley

dlongley commented Feb 9, 2018

@pfrazee,

All the additional have to be set by term mappings (and may use prefixes), right?

You can specify additional vocabs by using the scoped context feature:

https://json-ld.org/spec/latest/json-ld/#scoped-contexts

@pfrazee

Member

pfrazee commented Feb 9, 2018

@dlongley thanks.

Hmm. This mostly works but it's very syntax heavy. Even after reducing the example in that spec section to its bare essentials, it's not very clear:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "interest": {
      "@context": {"@vocab": "http://xmlns.com/foaf/0.1/"}
    }
  },
  "name": "Manu Sporny",
  "interest": {
    "name": "JSON-LD",
    "topic": "Linking Data"
  }
}

For comparison, in Lazy this looks like:

{
  "schema": [
    "http://schema.org/",
    {"name": "http://xmlns.com/foaf/0.1/", "attrs": "interest.topic"}
  ],
  "name": "Manu Sporny",
  "interest": {
    "name": "JSON-LD",
    "topic": "Linking Data"
  }
}

The internal model of LD is almost certainly more thorough and consistent, but the Lazy example is much clearer to read.

@dlongley

dlongley commented Feb 9, 2018

@pfrazee,

I'm having a little trouble understanding the meaning of the Lazy example. Particularly "attrs": "interest.topic" ... is that a typo? Or does it mean that only "topic" is in the foaf vocabulary? What about "interest.name"? In the JSON-LD example both "name" and "topic" are in the foaf vocabulary.

@dlongley

dlongley commented Feb 9, 2018

@pfrazee,

If there's some other syntactic magic that would significantly improve the readability in the JSON-LD case I'm sure the community would consider it. Using an array for @vocab, for example, could be a starting point to aligning with what you're doing there with schema.

Though, I am concerned that it's hard to know, without sufficient examples, whether the proposed Lazy schema approach actually looks better (or covers enough use cases).

@pfrazee

Member

pfrazee commented Feb 9, 2018

In the JSON-LD example both "name" and "topic" are in the foaf vocabulary.

Not a typo. Only topic is in FOAF. The JSON-LD example expands to this:

[{
  "http://schema.org/name": [{"@value": "Manu Sporny"}],
  "http://xmlns.com/foaf/0.1/interest": [{
    "@id": "https://www.w3.org/TR/json-ld/",
    "http://schema.org/name": [{"@value": "JSON-LD"}],
    "http://xmlns.com/foaf/0.1/topic": [{"@value": "Linking Data"}]
  }]
}]

If there's some other syntactic magic that would significantly improve the readability in the JSON-LD case I'm sure the community would consider it. Using an array for @vocab, for example, could be a starting point to aligning with what you're doing there with schema.

That's roughly what I'm trying to figure out. Almost certainly it's harder because Lazy is speculative. If you look at the readme, the two main tools I'm trying to get are attribute iteration by vocab which is useful for transforming between schemas, and support detection which is useful for detecting fatal ambiguity. These are the tools I know I need now, but I might need more later.

Syntactic clarity is always subjective but I know {"@context": {"@vocab": "..."}} isn't ideal.

@pfrazee

Member

pfrazee commented Feb 9, 2018

(Tara and I joke that a mob of Linked Data engineers is going to arrive at our door with pitchforks some day. We appreciate y'all tolerating us -- I hope I'm making useful observations and not just wasting your time. I'm doing my best to bring experience from working on SSB and Beaker/Dat.)

@cwebber

cwebber commented Feb 9, 2018

I think if we can figure out how to converge and avoid further fragmentation of this space, especially if we can find out how to make things cleaner for current and future users of json-ld (which may mean adding things, or it may just mean better explaining what we have), then that's time well spent :)

@dlongley

dlongley commented Feb 9, 2018

@pfrazee,

If only topic needs to be in the foaf vocabulary then the JSON-LD example could be:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "topic": "http://xmlns.com/foaf/0.1/topic"
  },
  "name": "Manu Sporny",
  "interest": {
    "name": "JSON-LD",
    "topic": "Linking Data"
  }
}

I find the JSON-LD here more clear than the comparable "Lazy":

{
  "schema": [
    "http://schema.org/",
    {"name": "http://xmlns.com/foaf/0.1/", "attrs": "interest.topic"}
  ],
  "name": "Manu Sporny",
  "interest": {
    "name": "JSON-LD",
    "topic": "Linking Data"
  }
}

If you only want topic to use foaf within interest in JSON-LD it would be:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "interest": {
      "@context": {"topic": "http://xmlns.com/foaf/0.1/topic"}
    }
  },
  "name": "Manu Sporny",
  "interest": {
    "name": "JSON-LD",
    "topic": "Linking Data"
  }
}

I don't personally feel there's anything more than a marginal difference between the two syntaxes here, but others are obviously free to disagree. I do, however, imagine we could find a variety of competing examples where it's more clear in one syntax than in the other (and vice versa). Perhaps the question would fall to which syntax better handles the common case -- or if we can easily support different kinds of syntactic sugar depending on the case.

If you could provide a list of concrete use cases where the JSON-LD approach just seems unwieldy and perhaps even propose some syntactic sugar to help mitigate, then I think the JSON-LD community would be quite interested in making some changes. We always want to make things better and easier to use.

@pfrazee

Member

pfrazee commented Feb 9, 2018

@dlongley Thanks for the examples and thoughts. My goal is to make sure I totally understand LD and explain my thought process as I go. Then I'm going to take another pass at a toolset that uses LD as-is. If I walk away saying "I really need some syntax changes" then I'll propose them.

My chief criticism of the LD syntax here is, I don't think the rules are self-explanatory. You really need to internalize LD's mechanics to read the data.

Looking at this example:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "topic": "http://xmlns.com/foaf/0.1/topic"
  },
  "name": "Manu Sporny",
  "interest": {
    "name": "JSON-LD",
    "topic": "Linking Data"
  }
}

I can imagine a number of questions for somebody new:

  • What's a @context? What's a @vocab? When do I need @vocab vs @context?
  • Is this saying the vocabulary for the @context object is "http://schema.org/"?
  • It's not -- then why isn't @vocab in the root object then?
  • Is "http://xmlns.com/foaf/0.1/topic" a vocabulary too?
  • What's the difference between what @vocab is doing and what topic is doing?

Speaking from personal experience, I did not realize for a long time that the purpose of LD's context was to expand each attribute into an IRI. My original assumption was that LD attached a "vocabulary ID" to each attribute. I had visited the json-ld playground and spec about five times before I realized that.
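To make that expansion step concrete, here is a deliberately simplified Python sketch of what the context in the example above does to each key. It is not a JSON-LD processor: it only understands `@vocab` and plain string term mappings, it ignores scoped contexts, prefixes, and typed values, and its output is not in the array-and-`@value` shape that real expansion produces. The function name `expand_keys` is hypothetical.

```python
def expand_keys(node, context: dict):
    """Rewrite every key to a full IRI using term mappings first, then @vocab."""
    vocab = context.get("@vocab", "")
    if isinstance(node, list):
        return [expand_keys(item, context) for item in node]
    if not isinstance(node, dict):
        return node  # plain values are left untouched in this sketch
    expanded = {}
    for key, value in node.items():
        if key.startswith("@"):
            continue  # drop keywords such as @context from the output
        mapping = context.get(key)
        # An explicit term mapping wins; otherwise the key is appended to @vocab.
        iri = mapping if isinstance(mapping, str) else vocab + key
        expanded[iri] = expand_keys(value, context)
    return expanded


doc = {
    "@context": {
        "@vocab": "http://schema.org/",
        "topic": "http://xmlns.com/foaf/0.1/topic",
    },
    "name": "Manu Sporny",
    "interest": {
        "name": "JSON-LD",
        "topic": "Linking Data",
    },
}

expanded = expand_keys(doc, doc["@context"])
```

Seen this way, `@vocab` is a default prefix for keys, while the `topic` entry is a per-key override, which is roughly the distinction the questions above are circling.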

For comparison, here's a complex example of Lazy:

{
  "schema": [
    "alice-allisons-calendar-app",
    {"name": "alice-date-ranges", "attrs": ["startDate", "endDate"]},
    {"name": "bob-bunsens-rsvps", "attrs": "rsvp.*", "required": true},
    {"name": "bob-bunsens-rsvps2", "attrs": "rsvp.deadlineDate", "required": true}
  ],
  "type": "event",
  "name": "JSON-LZ Working Group Meeting",
  "startDate": "2018-01-21T19:30:00.000Z",
  "endDate": "2018-01-21T20:30:00.000Z",
  "rsvp": {
    "requested": true,
    "deadlineDate": "2018-01-18T19:30:00.000Z"
  }
}

This is neither perfect nor revelatory, but I do think it's very clear for somebody who's just sat down to it. It's declaring schemas, it's specifying which attributes belong to the schemas, and apparently some schemas are required. The mechanics are very direct.
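A sketch of how a tool might group that object's attributes by schema, in Python. Several things here are assumptions rather than anything the Lazy README is known to specify: the first bare string in `schema` is treated as the default vocab, an exact `attrs` path beats a wildcard such as `rsvp.*`, and wildcards are matched with shell-style globbing. The names `leaf_paths` and `attrs_by_vocab` are hypothetical.

```python
from fnmatch import fnmatchcase


def leaf_paths(node: dict, prefix: str = ""):
    """Yield (dotted path, value) pairs for every non-object value."""
    for key, value in node.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from leaf_paths(value, path + ".")
        else:
            yield path, value


def attrs_by_vocab(doc: dict) -> dict:
    """Group attribute paths under the vocab that claims them."""
    entries = doc.get("schema", [])
    # Assumption: the first bare string is the default vocab for unclaimed attributes.
    default = next((e for e in entries if isinstance(e, str)), None)
    exact, wildcard = {}, []
    for entry in entries:
        if isinstance(entry, str):
            continue
        attrs = entry["attrs"]
        for pattern in [attrs] if isinstance(attrs, str) else attrs:
            if "*" in pattern:
                wildcard.append((pattern, entry["name"]))
            else:
                exact[pattern] = entry["name"]
    body = {k: v for k, v in doc.items() if k != "schema"}
    grouped: dict = {}
    for path, value in leaf_paths(body):
        # Assumption: exact paths take precedence over wildcard patterns.
        vocab = exact.get(path)
        if vocab is None:
            vocab = next(
                (name for pat, name in wildcard if fnmatchcase(path, pat)), default
            )
        grouped.setdefault(vocab, {})[path] = value
    return grouped


event = {
    "schema": [
        "alice-allisons-calendar-app",
        {"name": "alice-date-ranges", "attrs": ["startDate", "endDate"]},
        {"name": "bob-bunsens-rsvps", "attrs": "rsvp.*", "required": True},
        {"name": "bob-bunsens-rsvps2", "attrs": "rsvp.deadlineDate", "required": True},
    ],
    "type": "event",
    "name": "JSON-LZ Working Group Meeting",
    "startDate": "2018-01-21T19:30:00.000Z",
    "endDate": "2018-01-21T20:30:00.000Z",
    "rsvp": {
        "requested": True,
        "deadlineDate": "2018-01-18T19:30:00.000Z",
    },
}

groups = attrs_by_vocab(event)
```

With a grouping like this in hand, a transform between schemas could be written per vocab rather than per attribute.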

@webdesserts

Contributor

webdesserts commented Feb 9, 2018

As a pretty average dev who is still trying to grasp what LD can do I can vouch for the fact that @context and @vocab mean nothing to me. I'm not attached to the LZ spec, but at the very least @schema means something and I can start to picture what some of the props are doing.

@scotttrinh

scotttrinh commented Feb 9, 2018

As a pretty average dev, if I encountered metadata on an object like schema or @context I most certainly would do some googling. IMHO, neither syntax is immediately transparent and revelatory in-and-of-itself, so I would tend to go with LD just since it has so much existing documentation, examples, community, etc. If the difference between LZ and LD is simply naming conventions, I'd say we're underestimating the developer community here. If you're building decentralized applications already, you've already gone through a certain amount of 🤔 to have arrived at an understanding of what beaker, dat, ssb, etc. are, and this doesn't seem like an impossible hurdle. Having said that, I understand that we want the decentralized web to go mainstream and we have to account for a (hopefully) lower barrier of entry into writing/forking/extending these apps.

I listened through the call audio and I really appreciate @pfrazee's desire to make the process of working on these apps as enjoyable and frictionless for the developer as possible, being one of those in the target demographic. But dropping LD just because @context/@vocab is (arguably) more opaque than schema seems at least a little like we're worrying too much about a documentation issue.

Just my $0.02 as a JS dev interested in building decentralized apps!

@dlongley

dlongley commented Feb 9, 2018

@pfrazee,

I would have nearly the same set of questions when looking at Lazy's approach to the problem as well.

  • Why is schema a list? Why isn't it just a map where I can list each property as a key?
  • What is "attrs"? What is that strange value in the "name" field and how does it work?
  • I see "required" ... is this like json-schema or not? Why didn't it reuse it?
  • So on...

Any developer that is interested in what's going on in @context or in schema is going to have to read about it. We just need to make sure that we give them the simplest, quickest (least cognitive load) answer when they do.

Developers like to know "what is this thing and why is it useful?". For people who just want to write code and get things done -- we need a good, concise answer. Perhaps more importantly, we want devs to walk away feeling like we've helped them solve a problem or that we've made whatever they are building more useful for others.

Edited to add some extra emphasis on that last part ... helping devs make things that are more useful for others is a key part of LD. It's about getting more use out of the same data (and vocabularies) and writing fewer applications that do the same thing because of it.

@tantek

tantek commented Jun 14, 2018

Have been watching this thread, and want to also encourage vocabulary re-use wherever possible, as it greatly helps sharing content and bridging across various heterogeneous systems.

<https://github.com/kevinmarks> mentioned h-entry already, which builds on all the experience and expertise from RSS and Atom.

Similarly, consider <https://microformats.org/wiki/h-card> (based on the IETF vCard standard, and adapted for the web and JSON-friendly) for references and descriptions of people.

Most recently I saw the calendar-app event example in a previous comment, which could re-use <https://microformats.org/wiki/h-event> as well (similarly based on the IETF iCalendar standard). Happy to present an example using LZ syntax etc with h-calendar vocabulary if that would be helpful.

All of these vocabularies have been both implemented and successfully interoperably deployed peer-to-peer across numerous websites, publishing and parsing / consuming. (stats and examples available at indiemap.org)

Happy to help answer any questions about how these vocabularies are developed and how they’re community maintained, either here or chat.indieweb.org.

(Originally published at: http://tantek.com/2018/164/t2/vocabulary-sharing-content-bridging-systems)
