Need use-cases section #8

Closed
tantek opened this issue Jun 22, 2016 · 4 comments

Comments

@tantek
Member

tantek commented Jun 22, 2016

Would be great to see a use-cases section that lists (even briefly) the reasons that motivated a simplified JSON representation, and any additional use-cases for which jf2 can be used.

@dissolve
Member

Updated the ED with current usage descriptions. Does this make sense to you?

@aaronpk
Member

aaronpk commented Apr 25, 2017

Since beginning the jf2 spec, I've continued developing XRay, and its format has diverged from the original jf2. Tonight I spent a while trying to reconcile the changes to submit a PR to the spec. I was unable to come up with a short PR, and instead got drawn into thinking about the motivations behind a simpler mf2 JSON format to begin with.

I use XRay in a number of projects for various purposes.

  • My website runs every external URL through XRay to handle consuming the Microformats on the page, converting it to a simplified form. This is used whenever I reply to a post to display the reply context, as well as to fetch the post contents when I make a repost.
  • Loqi uses XRay to create a one-line summary of URLs pasted into IRC.
  • webmention.io uses XRay to parse the source URL of webmentions to extract useful data about the webmention, and makes this data available via an API.
  • IndieNews uses XRay to parse submitted URLs to display the name and author of the posts.
  • Quill uses XRay to show a preview of in-reply-to URLs.
  • My rudimentary reader uses XRay to extract the h-entry data from posts to display in my reader.

There are a number of things that XRay does when extracting the mf2 data.

  • Finds the author of a post following the authorship algorithm
  • Follows the comments presentation algorithm to remove the name property if it's a duplicate of the content.
  • Figures out the primary object on the page, or whether the page represents a list of posts, which is sometimes tricky. (some discussion on representative object)
  • Is vocabulary-aware, so always returns a consistent set of properties, and doesn't return unknown properties. e.g. published is always a single string, and category is always an array.
  • Sanitizes all HTML, allowing only a small subset of HTML tags and Microformats classes on the HTML elements.
  • For any values that might be embedded objects, e.g. a person-tag or in-reply-to property, always returns the URL in the value and moves the embedded object to a refs object, making it easier to consume.
  • The author property is a simplified h-card containing only name/photo/url properties that are single values.
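As an illustration of the refs handling described in the list above, here is a minimal Python sketch. It is not XRay's actual code: the function name `extract_refs`, the default `ref_keys` tuple, and the sample data are assumptions made up for this example. The input is assumed to be the `properties` object of a parsed Microformats JSON item.

```python
def extract_refs(properties, ref_keys=("in-reply-to", "repost-of", "like-of", "category")):
    """Replace embedded objects with their URL and collect them in a refs dict.

    Hypothetical helper for illustration only; not taken from XRay.
    """
    refs = {}
    result = {}
    for key, values in properties.items():
        if key not in ref_keys:
            result[key] = values
            continue
        urls = []
        for value in values:
            if isinstance(value, dict):
                # Embedded mf2 object: identify it by its first url property,
                # falling back to its plain "value" if no url is present.
                url_values = value.get("properties", {}).get("url") or [value.get("value")]
                url = url_values[0]
                if url is None:
                    continue  # assumption: drop objects that have no usable URL
                refs[url] = value
                urls.append(url)
            else:
                urls.append(value)
        result[key] = urls
    return result, refs


# Made-up sample input in Microformats JSON "properties" form.
entry = {
    "in-reply-to": [
        {
            "type": ["h-cite"],
            "properties": {
                "name": ["Example post"],
                "url": ["https://example.com/post"],
            },
            "value": "https://example.com/post",
        }
    ],
    "content": ["Nice post!"],
}

simplified, refs = extract_refs(entry)
```

After this runs, `simplified["in-reply-to"]` holds only the URL string, and the embedded h-cite is available under that URL in `refs`, so a consumer never has to check whether a value is a string or an object.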

As you can see, a lot of what XRay is doing is cleaning up some of the "messy" parts of Microformats JSON. Not necessarily the specific JSON format, but more the overall structure, such as how the author of a post can be in many different places in a parsed Microformats JSON object. This is not to place blame on Microformats, since what it's doing is creating a JSON representation of the original HTML, and allowing authors flexibility in how they publish HTML rather than prescribing specific formats is a core principle.

What this means is XRay is actually acting more as an interpreter of the Microformats JSON, in order to deliver a cleaned-up version to consumers. Most of my projects that use XRay could actually be considered "clients", such as how I use XRay to parse posts for my reader, whether that's output to me in IRC or re-rendered as a post on IndieNews.

My primary need for an alternative Microformats JSON format is actually a client-to-server serialization, where the client is getting a cleaned-up version of external posts, and can assume that the server it's talking to is responsible for taking the messy data and normalizing it to something the client expects. In this sense, the use case of jf2 is a client-to-server serialization, whereas the Microformats JSON is a server-to-server serialization. This would then be a core building block for Microsub, a spec that provides a standardized way for clients to consume and interact with feeds collected by a server.

The main current challenge in defining a spec for this use case is how tied to specific vocabularies it should be. For example, Microformats JSON says that every value should always be an array. However, there are a few properties for which it never makes sense to have multiple values, e.g. published, uid, and location, and wrapping them in arrays creates additional complexity in consuming them. It's easier to consume these when the values can be relied upon to always be a single value. The author of an h-entry may be an object or a string, and that variation makes it more complicated to consume, so XRay's format always returns a consistent value. However, this is tied to the h-entry vocabulary, since other Microformats vocabularies don't have an author property. In general, the success I've had with XRay's format is due to the fact that it makes hard decisions about what properties it returns, and is consistent about whether those properties are single- or multi-valued, in order to provide a consistent API to consumers.
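To make the single- versus multi-valued point concrete, here is a minimal Python sketch of vocabulary-aware normalization. It is not XRay's implementation: the names `SINGLE_VALUED`, `normalize_author`, and `normalize_entry` are hypothetical, the set of single-valued properties is just the three named above, the URL-prefix check for string authors is an assumed heuristic, and the sample values are made up.

```python
# Properties treated as single-valued (assumption: only the three named above).
SINGLE_VALUED = ("published", "uid", "location")
# The simplified author keeps only these keys, each as a single value.
AUTHOR_KEYS = ("name", "photo", "url")


def first(values):
    """Return the first item of a list, or None if the list is empty."""
    return values[0] if values else None


def normalize_author(value):
    """Turn an author given as a string or an embedded h-card into one shape."""
    author = {key: None for key in AUTHOR_KEYS}
    if isinstance(value, dict):
        props = value.get("properties", {})
        for key in AUTHOR_KEYS:
            author[key] = first(props.get(key, []))
    elif isinstance(value, str):
        # Assumed heuristic: a bare string is a URL if it looks like one,
        # otherwise it is treated as the author's name.
        if value.startswith(("http://", "https://")):
            author["url"] = value
        else:
            author["name"] = value
    return author


def normalize_entry(properties):
    """Flatten mf2 'properties' so each key has a predictable type."""
    entry = {}
    for key, values in properties.items():
        if key == "author":
            entry[key] = normalize_author(first(values))
        elif key in SINGLE_VALUED:
            entry[key] = first(values)
        else:
            entry[key] = list(values)  # everything else stays an array
    return entry


# Made-up sample input in Microformats JSON "properties" form.
mf2_properties = {
    "published": ["2017-04-24T15:00:00-07:00"],
    "category": ["indieweb"],
    "author": ["https://example.com/"],
}

normalized = normalize_entry(mf2_properties)
```

Here `normalized["published"]` is a plain string, `normalized["category"]` is still an array, and `normalized["author"]` is always an object with name, photo, and url keys, whichever form the publisher used. The trade-off is the one described above: the list of single-valued properties and the special handling of author are hard-coded knowledge of the h-entry vocabulary.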

I am just not sure how to balance wanting to provide that simplicity for consuming clients while also allowing flexibility in publishing, while also not hard-coding too much into a spec that might be obsoleted later.

(originally published at https://aaronparecki.com/2017/04/24/15/jf2)

@dissolve
Member

As of the last check, your usage matches the current JF2 spec. If it does not, you can always open issues to that effect.

@dissolve
Member

As that is a separate issue and the original question was dealt with some time ago, going to close as commenter timeout.
