An RDFa 1.1 query engine implemented in pure, extension-free XSLT 1.0

RDFa 1.1 in XSLT 1.0

This is a (burgeoning) implementation of RDFa 1.1 in XSLT 1.0.

Rationale

It's almost 2017: why are you doing anything in XSLT?

I have been using XSLT (1.0, to boot) since sometime between late 2000 and early 2001. XSLT is:

  • A DSL that concentrates on one thing: schlepping markup
  • An open standard
  • Backed by lightning-fast implementations (usually)
  • Present in literally every Web browser going back to MSIE 5.0, save for a few Android builds in between
  • Much quicker to write for most needs than any DOM API or other imperative and/or object-oriented framework
  • Able to consume anything that can be incarnated as XML
  • Incapable of producing XML markup that is not well-formed (unless you explicitly want to produce text that isn't XML)
  • Therefore much easier to validate the results of (note "well-formed" and "valid" are two different concepts in markup-land)
  • Usable for Web markup on the command line, in a browser, in a filter on an origin server, or in a reverse proxy
  • Equipped with inclusion functionality, which means code reuse and division of labour

As General-Purpose Web Template Processor

From mid-2002 through 2005, I designed, implemented, and maintained a workflow that enabled over a dozen translators to create fine-grained internationalized content, and a room full of visual designers to dress it up. For this task, we used DocBook and XSLT. I created, in effect, a library, with notches in it that the visual designers could fill in with their changes to things like navigation and chrome. They didn't need to know how anything worked, just that they needed to sandwich their HTML in between a template named this or that and they would get the results they wanted.

As Lazy Man's CMS

I had an idea around 2007-2008, and I am somewhat embarrassed to admit that I didn't have it sooner: Use XSLT to turn (X)HTML into (X)HTML. Produce bare-bones markup on the server side containing just the content, however you see fit, and then use XSLT to tack the ancillary stuff on top in a separate process.

Better yet, use embedded metadata to signal resources which can be transcluded, along with XSLT's built-in document() function to haul them in. Use this method to recycle Atom/RSS feeds (as I do on my own site) or generate SVG data visualizations (as I did on the client project which inspired me to write this library).

I hereby reiterate that this technique can happen in the browser, in a reverse proxy, or in a filter on the origin server, completely separate from the application. The application can thus be any mix of technical platforms, because all it has to produce is well-formed (X)HTML (and if it doesn't, you can get a filter for that, too).

...in a box, with a fox, in the rain, on a train, on a boat, with a goat...

Embedded Metadata?

For my own site, and similar experiments, I simply piggybacked off the default terms you'd find in rel attributes in <link> or <a> elements. If I was going to develop the technique at all, I'd need something a lot more sophisticated.

One of the biggest problems of metadata is defining what all the terms mean. And then maintaining those definitions, making sure they get used properly, resolving ambiguities and conflicts, collisions between terms, etc. How, or even if, you handle this is ultimately a philosophical position. For me, the answer is RDF, which I will hold distinct from two other very closely-related concepts, linked data and the Semantic Web.

It is important to recognize that RDF is not a syntax, like JSON is. RDF has multiple equivalent syntaxes, including at least one in JSON.

RDF solves a number of problems with term management by making everything a URI, which means the same term can live with two different authorities and mean two different things. (It also means some far-off PhD can spend years developing an e-commerce vocabulary so I don't have to, with the bonus that if I use it, anybody else who understands that vocabulary can automatically understand my data.) By making those URIs dereferenceable URLs, you can put the documentation for those terms, both human- and machine-readable, a click or tap away, in an application of what we call linked data. On top of that you can add all the eggheaded logic, inferencing, smart agents and AI stuff, which is what we call the Semantic Web.

So what I see when I look at RDF is not just an overengineered description framework for metadata terms aimed at a pipe dream, but a natural and legitimate method for describing data structures, not just for interchange but also for internal use, and therefore an extremely practical way to organize the vagaries of everyday Web development.

You cannot tell me the JSON people don't have trouble managing things like the names of object keys, locations of API endpoints, etc., yadda yadda.

So What?

Stepping back to mid-2006, I was toying with the idea of creating an RDF-based Web framework. The idea was that hitting a given URL would disgorge a glob of data which was the content of that URL, which would contain, among other things, other URLs, connected via well-understood attributes, which my (XSLT) templates would know to convert into a link, or an image, or an embedded piece of text, or whatever I wanted. This is a technique known by the impossibly-bad acronym HATEOAS, which, as those of you who actually read Roy Fielding's dissertation know, stands for Hypermedia as the Engine of Application State, or a really groovy way to make websites (if you can pull it off).

In 2006, RDFa wasn't invented yet, so I was using plain-Jane RDF/XML syntax and trying to glean some structure from that. It turns out this is a non-starter, because RDF/XML is just a bundle of statements, and there is no acceptable way (to me at least) to signal which ones belong to "the document" you just requested.

Enter RDFa

In 2008, long after I had back-burnered my aforementioned attempt, we get RDFa. Initially, this is an extension to XHTML that enables RDF data to be embedded into a document. Later on it becomes a generic set of attributes which can be tacked on to HTML(5) or any XML vocabulary.

One interesting side effect of RDFa is that it solves the ambiguity problem: Unless otherwise specified, embedded RDF statements are assumed to be about the document in question. Or to straddle both RDF and HTTP terminology, the subject is the Request-URI.
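
To make the default-subject rule concrete, here is a deliberately naive sketch in Python (not part of this library, which is XSLT). It looks only at the `about` and `property` attributes with full-URI predicates and ignores prefixes, `resource`, `typeof`, chaining, and the rest of RDFa 1.1. The function name `naive_triples` and the `example.com` URIs are invented for illustration.

```python
import xml.etree.ElementTree as ET

XHTML = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title property="http://purl.org/dc/terms/title">Hello</title></head>
  <body>
    <p about="http://example.com/other"
       property="http://purl.org/dc/terms/title">Elsewhere</p>
  </body>
</html>
"""

def naive_triples(markup, request_uri):
    """Return (subject, predicate, literal) for each element with @property.

    Hypothetical helper: the subject is the nearest @about on the element
    or an ancestor, falling back to the URI the document was requested from.
    """
    root = ET.fromstring(markup)

    def walk(elem, subject):
        # An explicit @about overrides the inherited subject.
        subject = elem.get("about", subject)
        predicate = elem.get("property")
        if predicate is not None:
            yield (subject, predicate, "".join(elem.itertext()))
        for child in elem:
            yield from walk(child, subject)

    return list(walk(root, request_uri))

triples = naive_triples(XHTML, "http://example.com/doc")
```

The `<title>` carries no `about`, so its statement lands on the request URI; the `<p>` names a different subject explicitly.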

What This Means

I'm looking to RDFa to create machine-readable data objects that also happen to be human-readable Web pages. Moreover, using content-negotiation techniques, I can say something like "the resource at the given URI always has the same meaning, irrespective of its syntax, whether HTML, JSON, Turtle, or RDF/XML."
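
As a rough illustration of the content-negotiation idea, the Python sketch below picks one serialization for a resource from an HTTP `Accept` header. It is an assumption-laden toy, not anything this library does: `pick_syntax` is a made-up name, the list of available media types is invented, and partial wildcards such as `text/*` are ignored.

```python
# Serializations a hypothetical server could offer for the same resource.
AVAILABLE = ["text/html", "application/ld+json", "text/turtle", "application/rdf+xml"]

def pick_syntax(accept_header, available=AVAILABLE):
    """Return the available media type with the highest q-value, or None."""
    prefs = []
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((media_type, q))

    best, best_q = None, 0.0
    for candidate in available:
        for media_type, q in prefs:
            # Only exact matches and the full wildcard are considered.
            if media_type in (candidate, "*/*") and q > best_q:
                best, best_q = candidate, q
    return best
```

Whichever syntax comes back, the URI and the statements it carries stay the same; only the wire format changes.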

This enables me to reorient my development targets in terms of discrete resources, or functions that generate discrete resources, and then those resources can be consumed downstream by literally anything, including my own applications. Then the site's user interface can be considered just another application.

This XSLT library, therefore, provides one particular way to implement the application known as the given website's user interface.

Programming Interface

rdfa:object-resources

Given one or more subjects and a predicate, return the object resources (URIs or blank nodes).

rdfa:subject-resources

Same thing, but given an object resource (or bnode), return the subject(s).
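
The semantics of these two lookups can be modelled in a few lines of Python over a set of (subject, predicate, object) statements. This is only a model of what the queries mean, under the assumption that the graph has already been extracted; it says nothing about how the XSLT templates are invoked or what parameters they take. All names and URIs below are invented.

```python
# A toy graph of statements; every URI here is made up for illustration.
DOC = "http://example.com/doc"
OTHER = "http://example.com/other"
ALICE = "http://example.com/people/alice"
CREATOR = "http://purl.org/dc/terms/creator"
REFERENCES = "http://purl.org/dc/terms/references"

GRAPH = {
    (DOC, CREATOR, ALICE),
    (DOC, REFERENCES, OTHER),
    (OTHER, CREATOR, ALICE),
    (OTHER, CREATOR, "_:b0"),  # a blank node as object
}

def object_resources(graph, subjects, predicate):
    """Objects of statements with one of `subjects` and the given predicate."""
    return sorted({o for s, p, o in graph if s in subjects and p == predicate})

def subject_resources(graph, obj, predicate):
    """Subjects of statements with the given object and predicate."""
    return sorted({s for s, p, o in graph if o == obj and p == predicate})
```

For example, `object_resources(GRAPH, {DOC, OTHER}, CREATOR)` collects the blank node and Alice, while `subject_resources(GRAPH, ALICE, CREATOR)` walks the same predicate backwards to both documents.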

rdfa:object-literals

TODO: This one is actually going to be tricky because vanilla XSLT 1.0 can only return result sets or strings.

(I'm still not quite sure how this one is going to work yet but it's almost certainly going to involve fishing the values out of a string.)

rdfa:object-literal-quick

Given a subject and predicate (and optional language/datatype), and assuming that you already know through some other mechanism that there is only one statement to this effect, and that the literal isn't an XMLLiteral, return that literal.
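
A Python model of the intended behaviour might look like the following. The `Literal` type, the function signature, and the choice to return an empty string when nothing matches are all assumptions made for this sketch, not the library's actual interface.

```python
from typing import NamedTuple, Optional

class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal."""
    value: str
    language: Optional[str] = None
    datatype: Optional[str] = None

DOC = "http://example.com/doc"
TITLE = "http://purl.org/dc/terms/title"

GRAPH = [
    (DOC, TITLE, Literal("Hello", language="en")),
    (DOC, TITLE, Literal("Bonjour", language="fr")),
]

def object_literal_quick(graph, subject, predicate, language=None, datatype=None):
    """Return the first matching literal's value, or "" if there is none."""
    for s, p, o in graph:
        if (s, p) != (subject, predicate) or not isinstance(o, Literal):
            continue
        if language is not None and o.language != language:
            continue
        if datatype is not None and o.datatype != datatype:
            continue
        return o.value  # the caller has promised there is only one
    return ""
```

Because the caller vouches that only one statement matches, the function can stop at the first hit and hand back a plain string.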

(This is so you don't have to fish a single value out of a weirdly-delimited string.)

rdfa:subjects-for-literal

Given a literal (and optional language/datatype) and predicate, return all associated subjects.
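
The reverse lookup can be modelled the same way. In this Python sketch, literals are plain (value, language, datatype) tuples, and the function name, signature, and data are invented for illustration only.

```python
NAME = "http://xmlns.com/foaf/0.1/name"
ALICE = "http://example.com/people/alice"
ALICIA = "http://example.com/people/alicia"
BOB = "http://example.com/people/bob"

# Objects are (value, language, datatype) tuples standing in for literals.
GRAPH = [
    (ALICE, NAME, ("Alice", "en", None)),
    (ALICIA, NAME, ("Alice", "es", None)),
    (BOB, NAME, ("Bob", "en", None)),
]

def subjects_for_literal(graph, value, predicate, language=None, datatype=None):
    """Subjects that have `value` as a literal object of `predicate`."""
    found = set()
    for s, p, (v, lang, dt) in graph:
        if p != predicate or v != value:
            continue
        if language is not None and lang != language:
            continue
        if datatype is not None and dt != datatype:
            continue
        found.add(s)
    return sorted(found)
```

Leaving out the language matches both "Alice" statements; passing `language="es"` narrows the result to one subject.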

rdfa:has-predicate

Given a set of URIs, either subject or object, return the subset of URIs that test positive for a given predicate.
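
One possible reading of this filter, sketched in Python: keep every URI that appears as the subject or the object of at least one statement with the given predicate. That interpretation, along with the names and data, is an assumption of this sketch rather than a description of the XSLT template.

```python
DOC = "http://example.com/doc"
OTHER = "http://example.com/other"
ALICE = "http://example.com/people/alice"
BOB = "http://example.com/people/bob"
CREATOR = "http://purl.org/dc/terms/creator"
REFERENCES = "http://purl.org/dc/terms/references"

GRAPH = {
    (DOC, CREATOR, ALICE),
    (DOC, REFERENCES, OTHER),
}

def has_predicate(graph, uris, predicate):
    """Subset of `uris` that occur on either end of a `predicate` statement."""
    touched = set()
    for s, p, o in graph:
        if p == predicate:
            touched.update((s, o))
    return sorted(set(uris) & touched)

matches = has_predicate(GRAPH, [DOC, ALICE, BOB, OTHER], CREATOR)
```

Here `matches` keeps the document and Alice and drops the two URIs that never appear next to the creator predicate.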

Status/Road map

  • Proof of concept to establish whether the damn thing can be made to work at all,
  • Once it is made to work, try to make it reasonably fast,
  • Sort out an interface which is amenable to the idiosyncrasies of XSLT,
    • Do something smart with lists and other collections
  • Test cases!
  • Documentation!

Scope & Limitations

  • This query engine is written in XSLT 1.0 (the only version of XSLT supported by Web browsers), processes only RDFa 1.1, and (for now) only in (X)HTML.
  • XHTML input must have its <base> set to an absolute URI, and all relative URIs have to be relative to that address. It also helps if that address is the same as the Request-URI.
  • This thing cannot handle arbitrary relative URIs. This is a necessary tradeoff for making the thing usably fast. It currently checks minimal relative URIs per RFC 3986 and RFC 2396, as well as absolute path/query/fragment references. Arbitrary ./ and ../ components are a no-go.
  • You are probably never going to see any kind of inferencing or reasoning with this thing, not without server-side help.
  • Web browsers do not cross domains with XSLT (which is funny when you consider that you can do a lot more damage with JavaScript), so there's that too.
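
To show what the relative-URI limitation is about, this Python sketch runs a few reference forms through a full RFC 3986 resolver (the standard library's `urljoin`). It does not reproduce this library's checks; the base URI and the grouping into "simple" and "dot-segment" references are assumptions made for illustration.

```python
from urllib.parse import urljoin

# Stands in for the absolute URI the document declares in <base>.
BASE = "http://example.com/a/b"

# Fragment, query, and absolute-path references: resolving these only
# needs pieces of the base URI to be swapped out.
SIMPLE = ["#frag", "?q=1", "/c/d"]

# References with ./ and ../ components: a resolver has to walk and
# collapse path segments, which is the kind of work ruled out above.
DOT_SEGMENTS = ["./c", "../c"]

resolved = {ref: urljoin(BASE, ref) for ref in SIMPLE + DOT_SEGMENTS}

for ref, absolute in resolved.items():
    print(f"{ref!r:10} -> {absolute}")
```

The first group only replaces the fragment, query, or path of the base; the second requires segment-by-segment path manipulation before the URIs can be compared.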

Dependencies

This file relies (for now) on a handful of routines from XSLTSL, which is a useful thing in general.

Copyright & License

Copyright 2016 Dorian Taylor

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
