ShEx support #23

Open
ericprud opened this issue Mar 12, 2019 · 6 comments

@ericprud

Feel like working with me on ShEx support?

@drobilla
Owner

Sure, though I'm not really familiar with ShEx at all. I will read up and get a better idea of how this might fit into serd. Were you just thinking of syntax support, or...?

@ericprud
Author

ericprud commented Mar 14, 2019

I was thinking of a full implementation. It's not a ton of code, and I already have the yacc grammar. Though I guess you don't already have yacc and bison in your build dependencies.

@drobilla
Owner

Yeah, there are no dependencies at all. Some of the unique things about serd stem from it being a hand-hacked parser, which is pretty tedious to write and maintain but lets me control everything.

Are you thinking ShExC? That would probably be quite a bit of work, but ShExJ would be easier since I already have JSON reading code lying around (even though JSON-LD isn't in master yet; that one has been a herculean effort, and serd could only ever support a subset since the spec doesn't allow streaming. Oh well.)

@drobilla
Owner

Although that makes me realize an important question: can ShEx be parsed as a stream (i.e. emitted as a sequence of statement(s, p, o) calls in the same order they are found in the document) without significant readahead? Serd is fundamentally based on this; things that can't stream don't really fit.

(Sorry if this is obvious, I haven't found the time to read the spec in detail yet)
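As a rough illustration of the streaming model described above (not serd's actual API): the reader hands each triple to a callback as soon as it has parsed it and never builds a model of the document. `TripleSink`, `StatementFunc`, `read_stream`, and the toy input format (one `subject predicate object` per line, separated by single spaces, with no spaces inside terms) are all hypothetical, chosen only to keep this C sketch short.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sink callback: receives one triple at a time, in document
   order. A non-zero return value aborts reading. */
typedef int (*StatementFunc)(void* handle,
                             const char* s, const char* p, const char* o);

typedef struct {
  void*         handle;
  StatementFunc statement;
} TripleSink;

/* Hypothetical toy reader: each line is "subject predicate object".
   A triple is passed to the sink as soon as its line is parsed;
   no model of the document is ever built. */
static int read_stream(const char* text, const TripleSink* sink)
{
  assert(text && sink && sink->statement);
  char line[1024];
  while (*text) {
    size_t len = strcspn(text, "\n");
    if (len >= sizeof(line)) {
      return -1; /* line too long for this sketch */
    }
    memcpy(line, text, len);
    line[len] = '\0';
    text += len + (text[len] == '\n'); /* step past the newline, if any */

    char* s = strtok(line, " ");
    if (!s) {
      continue; /* blank line */
    }
    char* p = strtok(NULL, " ");
    char* o = p ? strtok(NULL, " ") : NULL;
    if (!o) {
      return -1; /* malformed line: fewer than three terms */
    }
    const int st = sink->statement(sink->handle, s, p, o);
    if (st) {
      return st; /* the sink asked to stop */
    }
  }
  return 0;
}

/* Example sink: counts statements and records subjects in arrival order. */
typedef struct {
  int  count;
  char subjects[256];
} Recorder;

static int record_statement(void* handle,
                            const char* s, const char* p, const char* o)
{
  Recorder* r = (Recorder*)handle;
  (void)p;
  (void)o;
  ++r->count;
  strncat(r->subjects, s, sizeof(r->subjects) - strlen(r->subjects) - 1);
  return 0;
}
```

Because each statement reaches the sink in document order with no lookahead past the current line, a syntax fits this model only if every triple can be completed from what has already been read.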

@ericprud
Author

ShExJ makes a lot of sense. There are plenty of tools to convert between ShExC and ShExJ if folks want to work in ShExC.

Re streaming, I guess everything is streamable if you are willing to buffer enough. I believe @iovka and Jérémie Dusart are working on something related to this. The challenge is that validation is typically top-down, e.g. you start by validating <Obs1> as <ObservationShape>. In the process, you must then validate <Patient2>@<PatientShape>. The big challenge is: at what point do you decide you've seen all of the triples related to <Obs1> or <Patient2>?

This is similar to the problem of serialization: at some point you decide that you're not waiting for more triples from some node, and you go ahead and write a `.` or `]`. (Making a bad call doesn't seem as dire in serialization because you can always write a node out again, but that's not true of an anonymous blank node.) While we can construct screw cases, I expect we can address a lot of bulk-validation use cases with some heuristics that say when to assume we have all arcs out of a node. Particularly easy would be nested anonymous BNodes such as those you see in FHIR/RDF.
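One way to read the anonymous-BNode point: in Turtle, a node written as `[ ... ]` has no label, so once its closing `]` is read, no later triple can use it as a subject and all of its arcs are known. This C sketch assumes a hypothetical event interface (`anon_begin` at `[`, `anon_arc` for each predicate/object pair of the innermost open node, `anon_end` at `]`) and reports each node as complete at its `]`, innermost first. It does not handle labelled nodes, which is where heuristics would still be needed.

```c
#include <assert.h>

#define MAX_DEPTH 32
#define MAX_COMPLETIONS 64

/* Hypothetical callback, invoked once an anonymous node's arcs are all known. */
typedef void (*CompleteFunc)(void* handle, int node_id, int n_arcs);

typedef struct {
  int node_id; /* id assigned to the anonymous node by this sketch */
  int n_arcs;  /* arcs seen so far with this node as subject */
} Frame;

typedef struct {
  Frame        stack[MAX_DEPTH]; /* currently open [ ... ] nodes */
  int          depth;
  int          next_id;
  CompleteFunc on_complete;
  void*        handle;
} AnonTracker;

/* At '[': open a new anonymous node nested inside the current one. */
static int anon_begin(AnonTracker* t)
{
  assert(t->depth < MAX_DEPTH);
  Frame* f   = &t->stack[t->depth++];
  f->node_id = t->next_id++;
  f->n_arcs  = 0;
  return f->node_id;
}

/* For each predicate/object pair whose subject is the innermost open node. */
static void anon_arc(AnonTracker* t)
{
  assert(t->depth > 0);
  ++t->stack[t->depth - 1].n_arcs;
}

/* At ']': the node has no label, so no later triple can use it as a subject.
   Its arcs are complete and it could be validated immediately. */
static void anon_end(AnonTracker* t)
{
  assert(t->depth > 0);
  const Frame f = t->stack[--t->depth];
  t->on_complete(t->handle, f.node_id, f.n_arcs);
}

/* Example callback: records completed nodes in completion order. */
typedef struct {
  int ids[MAX_COMPLETIONS];
  int arcs[MAX_COMPLETIONS];
  int n;
} Completions;

static void record_complete(void* handle, int node_id, int n_arcs)
{
  Completions* c = (Completions*)handle;
  assert(c->n < MAX_COMPLETIONS);
  c->ids[c->n]  = node_id;
  c->arcs[c->n] = n_arcs;
  ++c->n;
}
```

For `<Obs1> :subject [ :name "x" ; :contact [ :phone "1" ] ] .`, the inner node completes with one arc before the outer node completes with two, so each could be checked as soon as it closes.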

@drobilla
Owner

I think needing a model for validation itself is fine, and assumed that'd be the case, though streaming validation would be awesome if possible.

To support reading ShEx* and writing the corresponding Turtle (or building a model out of it), though, that would need to be streamable. Essentially, in order to parse a file, serd needs to be able to spit it out as triples as it goes. Seems like this should be possible here (maybe with some restrictions on key order, as with JSON-LD, but I'm not sure in this case).
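A sketch of why key order matters when turning JSON into triples: no triple can be emitted until its subject is known. Assuming, for illustration only, that a ShExJ object's subject comes solely from its `id` key, the hypothetical helper `keys_buffered_before_subject` counts how many of one object's keys a streaming reader would have to hold back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns how many of an object's keys, given in document order, a streaming
   reader would have to buffer before it knows the object's subject.
   Assumption: the subject comes only from an "id" key, so
   - "id" first: nothing is buffered,
   - "id" later: every key before it is buffered,
   - no "id":    the whole object is buffered until '}' shows that the
                 subject must be a fresh blank node. */
static size_t keys_buffered_before_subject(const char* const* keys,
                                           size_t             n_keys)
{
  assert(keys || n_keys == 0);
  for (size_t i = 0; i < n_keys; ++i) {
    if (strcmp(keys[i], "id") == 0) {
      return i;
    }
  }
  return n_keys;
}
```

So an object whose first key is `id` streams with no buffering, one with `type` before `id` needs one key held back, and one with no `id` at all must be held until its closing `}`. Requiring or preferring `id` first would be the kind of key-order restriction mentioned for JSON-LD.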

I imagine it would look something like parsing the ShEx file into a model, then having a function that takes that, and a data model, and validates one against the other (or, alternatively, just mash them all in the same model if that makes sense for ShEx).
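A minimal sketch of the "validate one model against another" idea, using the <Obs1>@<ObservationShape> and <Patient2>@<PatientShape> example from earlier in the thread. This is deliberately not ShEx semantics: the hypothetical `Constraint` and `Shape` types only express per-predicate cardinality plus an optional reference to another shape, the data model is a flat triple array, and recursion is cut off at a fixed depth instead of handling cyclic references properly. It does show the top-down pattern, where validating one node pulls in validation of the nodes it points to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
  const char* s;
  const char* p;
  const char* o;
} Triple;

typedef struct {
  const Triple* triples;
  size_t        n_triples;
} Model;

/* Hypothetical constraint: the node must have between min and max arcs with
   this predicate, and if value_shape >= 0, each object must conform to
   shapes[value_shape]. */
typedef struct {
  const char* predicate;
  unsigned    min;
  unsigned    max;
  int         value_shape;
} Constraint;

typedef struct {
  const Constraint* constraints;
  size_t            n_constraints;
} Shape;

#define MAX_NESTING 16

static bool validate_node(const Model* data, const Shape* shapes, int shape,
                          const char* node, unsigned depth)
{
  assert(data && shapes && node);
  if (depth > MAX_NESTING) {
    return false; /* this sketch gives up on deep or cyclic references */
  }
  const Shape* sh = &shapes[shape];
  for (size_t c = 0; c < sh->n_constraints; ++c) {
    const Constraint* k     = &sh->constraints[c];
    unsigned          count = 0;
    for (size_t t = 0; t < data->n_triples; ++t) {
      const Triple* tr = &data->triples[t];
      if (strcmp(tr->s, node) != 0 || strcmp(tr->p, k->predicate) != 0) {
        continue;
      }
      ++count;
      /* Top-down: validating this node requires validating its object. */
      if (k->value_shape >= 0 &&
          !validate_node(data, shapes, k->value_shape, tr->o, depth + 1)) {
        return false;
      }
    }
    if (count < k->min || count > k->max) {
      return false;
    }
  }
  return true;
}
```

With `shapes[0]` requiring exactly one `:subject` whose object must match `shapes[1]` (exactly one `:name`), `<Obs1>` validates only if `<Patient2>` has a `:name` in the data model.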
