Add RDFJS support (and update N3.js) #35

Closed
mielvds opened this issue Dec 7, 2018 · 22 comments

@mielvds
Collaborator

mielvds commented Dec 7, 2018

What's your take on this?

I guess this would mean changing the C++ code to return a different object structure and changing addTriples to addQuads in bin/hdt, but not much more.

@RubenVerborgh
Member

All for it.

Not sure if we want to do it from C++ or JavaScript. I think the latter, so we can pass in different libs.

@mielvds
Collaborator Author

mielvds commented Dec 7, 2018

True. But this would mean that, in JS, we have to loop over the result and convert the triples with, for example, rdf-string. Or is that not the case?

@RubenVerborgh
Member

Yeah we want to avoid that. I think the C code should emit monomorphic structured objects of the shape { termType: int, value: string, type: string, language: string }.

@mielvds
Collaborator Author

mielvds commented Jan 25, 2019

@rubensworks has made a valid remark about the equals() method. How can we create this one without looping?

@rubensworks
Member

> @rubensworks has made a valid remark about the equals() method. How can we create this one without looping?

We could achieve this by injecting a Prototype.

However, we may want to consider looping after all, as this would allow us to use custom Data Factories.

@mielvds
Collaborator Author

mielvds commented Jan 25, 2019

maybe something along these lines: https://github.com/nodejs/nan/blob/master/doc/methods.md#api_nan_set_prototype_method?
See: https://github.com/nodejs/nan/blob/master/doc/object_wrappers.md

Of course, we don't want to call into C++ every time equals is called.

@RubenVerborgh
Member

RubenVerborgh commented Jan 25, 2019

@mielvds That's not right, that would be
https://github.com/nodejs/nan/blob/master/doc/maybe_types.md#api_nan_set_prototype

So we just pass in the prototype (maybe on init), and then set it on created objects.
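
To make the idea concrete, here is a rough pure-JavaScript analogue, not the addon code itself. It assumes the native side produces plain data objects and that a shared prototype carrying `equals` is attached to each of them; `termPrototype` and `adoptTerm` are hypothetical names, and in the addon the attaching step would presumably be the Nan call linked above rather than `Object.setPrototypeOf`.

```javascript
// Hypothetical shared prototype: holds the methods, so the data objects
// created elsewhere only need to carry plain fields.
const termPrototype = {
  equals(other) {
    return !!other &&
      this.termType === other.termType &&
      this.value === other.value;
  },
};

// Stand-in for what the native side would do to each object it creates.
function adoptTerm(raw) {
  return Object.setPrototypeOf(raw, termPrototype);
}

const a = adoptTerm({ termType: 'NamedNode', value: 'http://example.org/a' });
const b = adoptTerm({ termType: 'NamedNode', value: 'http://example.org/a' });
console.log(a.equals(b)); // true
```

Because `equals` lives on the prototype, it is defined once in JS and calling it does not cross into C++.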

@mielvds
Collaborator Author

mielvds commented Jan 25, 2019

yes, that's it!

@LaurensRietveld
Collaborator

@mielvds @rubensworks @RubenVerborgh
We'd like to take a stab at this. Just a summary of the above + some questions:

> All for it.
>
> Not sure if we want to do it from C++ or JavaScript. I think the latter, so we can pass in different libs.

I've got the impression we've reached consensus about doing this in C++ to avoid looping over the results. I assume that means that we won't support passing different datafactory libs?

> Yeah we want to avoid that. I think the C code should emit monomorphic structured objects of the shape { termType: int, value: string, type: string, language: string }.

I might be missing something here. The above is not RDF/JS, right? Shouldn't termType be NamedNode, BlankNode or Literal?

Related to this: we'll have to do some term parsing in C++ to extract the language and type of literals. I'm assuming that using a full-fledged parser (such as serd) is overkill and would only add an extra HDT-Node dependency. Instead, we should parse the term ourselves, similar to how we do that here (only simpler, considering we can assume the term is valid).
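
As an illustration of how small such a hand-rolled parser can be, here is a sketch in JavaScript (the C++ version would follow the same steps). It assumes literals arrive as N-Triples-like strings such as `"hello"@en` or `"5"^^<datatype IRI>`, that the input is valid, and that no unescaping is needed; `parseLiteral` is a hypothetical helper, not an existing function.

```javascript
const XSD_STRING = 'http://www.w3.org/2001/XMLSchema#string';
const RDF_LANG_STRING = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#langString';

// Splits a literal string into value, language and datatype IRI.
// Returns null when the term is not a literal.
function parseLiteral(term) {
  const end = term.lastIndexOf('"');
  if (term[0] !== '"' || end <= 0) return null;
  const value = term.slice(1, end);    // text between the outer quotes
  const suffix = term.slice(end + 1);  // '', '@lang' or '^^datatype'
  if (suffix.startsWith('@')) {
    return { value, language: suffix.slice(1), type: RDF_LANG_STRING };
  }
  if (suffix.startsWith('^^')) {
    // Angle brackets around the datatype IRI are optional in this sketch.
    return { value, language: '', type: suffix.slice(2).replace(/^<|>$/g, '') };
  }
  return { value, language: '', type: XSD_STRING };
}

console.log(parseLiteral('"hello"@en'));
```

Searching for the last quote rather than the first keeps values that themselves contain quotes intact, which is safe only because the term is assumed to be valid.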

> @mielvds That's not right, that would be
> https://github.com/nodejs/nan/blob/master/doc/maybe_types.md#api_nan_set_prototype
>
> So we just pass in the prototype (maybe on init), and then set it on created objects.

Nice, good to see this as a reference. In our case, the equals method for terms should look something like this:

function equals(lhs, rhs) {
    // Terms are equal when type, value, language and datatype all match.
    return !!rhs && lhs.termType === rhs.termType && lhs.value === rhs.value &&
        lhs.language === rhs.language && lhs.type === rhs.type;
}

Considering we're returning quads, we'd need to take care of returning RDFJS quads as well.
I.e., the Term prototype should also include a DefaultGraph term next to the discussed equality function.

Additionally, we need to pass a prototype for the quad itself, with its own equality function.
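
A minimal sketch of what those two prototypes could look like, assuming terms keep the `{ termType, value, type, language }` fields discussed above. The names `termPrototype`, `quadPrototype`, `term` and `quad` are illustrative only, and this is not a complete RDF/JS implementation.

```javascript
// Prototype shared by all terms, including the default graph.
const termPrototype = {
  equals(other) {
    return !!other &&
      this.termType === other.termType &&
      this.value === other.value &&
      this.language === other.language &&
      this.type === other.type;
  },
};

// Prototype for quads: equality delegates to the four terms.
const quadPrototype = {
  equals(other) {
    return !!other &&
      this.subject.equals(other.subject) &&
      this.predicate.equals(other.predicate) &&
      this.object.equals(other.object) &&
      this.graph.equals(other.graph);
  },
};

function term(fields) {
  return Object.assign(Object.create(termPrototype), fields);
}

// A single shared DefaultGraph term; HDT results have no named graphs.
const defaultGraph = term({ termType: 'DefaultGraph', value: '' });

function quad(subject, predicate, object, graph = defaultGraph) {
  return Object.assign(Object.create(quadPrototype), { subject, predicate, object, graph });
}

const s = term({ termType: 'NamedNode', value: 'http://example.org/s' });
const p = term({ termType: 'NamedNode', value: 'http://example.org/p' });
const o = term({ termType: 'Literal', value: 'hi', language: 'en', type: '' });
console.log(quad(s, p, o).equals(quad(s, p, o))); // true
```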

@RubenVerborgh
Member

> I might be missing something here. The above is not RDF/JS, right? Shouldn't termType be NamedNode, BlankNode or Literal?

It's not; the idea was to communicate an array of primitive types from C to JS, such that we don't repeatedly need to invoke JS constructors from C.

@LaurensRietveld
Collaborator

> > I might be missing something here. The above is not RDF/JS, right? Shouldn't termType be NamedNode, BlankNode or Literal?
>
> It's not; the idea was to communicate an array of primitive types from C to JS, such that we don't repeatedly need to invoke JS constructors from C.

What would be the argument against having an interface like this? { termType: string, value: string, type: string, language: string } (where termType is either the string 'NamedNode', 'BlankNode' or 'Literal')

@rubensworks
Member

> I've got the impression we've reached consensus about doing this in C++ to avoid looping over the results. I assume that means that we won't support passing different datafactory libs?

👍

If consumers still want a custom datafactory, they can always just loop afterwards themselves.

@RubenVerborgh
Member

Yeah, but the idea is not to pass the returned C object directly to consumers. It's not a valid RDF/JS object currently. Rather, the values returned from C would be used in JS-land to properly instantiate objects with the right factory. And for that, an int is faster.
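
A sketch of that JS-land step, under stated assumptions: the integer codes below are made up for illustration, `toTerm` is a hypothetical helper, and the small `factory` object only stands in for whichever RDF/JS DataFactory the consumer passes in.

```javascript
// Hypothetical integer codes the native side could emit for termType.
const NAMED_NODE = 0;
const BLANK_NODE = 1;
const LITERAL = 2;

// Turns one raw { termType: int, value, type, language } record into a term
// using the given factory; the switch on an int avoids string comparisons.
function toTerm(raw, factory) {
  switch (raw.termType) {
    case NAMED_NODE:
      return factory.namedNode(raw.value);
    case BLANK_NODE:
      return factory.blankNode(raw.value);
    case LITERAL:
      if (raw.language) return factory.literal(raw.value, raw.language);
      return factory.literal(raw.value, raw.type ? factory.namedNode(raw.type) : undefined);
    default:
      throw new TypeError(`Unknown term type code: ${raw.termType}`);
  }
}

// Minimal stand-in for an RDF/JS DataFactory, just enough for this example.
const factory = {
  namedNode: (value) => ({ termType: 'NamedNode', value }),
  blankNode: (value) => ({ termType: 'BlankNode', value }),
  literal(value, languageOrDatatype) {
    if (typeof languageOrDatatype === 'string') {
      const datatype = this.namedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#langString');
      return { termType: 'Literal', value, language: languageOrDatatype, datatype };
    }
    const datatype = languageOrDatatype || this.namedNode('http://www.w3.org/2001/XMLSchema#string');
    return { termType: 'Literal', value, language: '', datatype };
  },
};

console.log(toTerm({ termType: LITERAL, value: 'hi', type: '', language: 'en' }, factory));
```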

@mielvds
Collaborator Author

mielvds commented Dec 16, 2019

Can nan do this instantiation (via, among others, set_prototype), or should we use some kind of proxy object?

@LaurensRietveld
Collaborator

> Yeah, but the idea is not to pass the returned C object directly to consumers. It's not a valid RDF/JS object currently.

What is the downside of returning an RDF/JS object from C instead? (given that the equals function is defined in JS)

> Rather, the values returned from C would be used in JS-land to properly instantiate objects with the right factory. And for that, an int is faster.

Were you thinking about something in particular for instantiating these objects?
Looping is out of the question I assume. Alternatively, would defining getters for subject/predicate/object work, where the getters use a factory of the user's choosing, and pass these getters on to C so that they can be bound to the returned triples?

@RubenVerborgh
Member

> What is the downside of returning an RDF/JS object from C instead? (given that the equals function is defined in JS)

How will we attach that equals function?

> Were you thinking about something in particular for instantiating these objects?
> Looping is out of the question I assume.

Looping actually; should not be that expensive given that the objects are perfectly predictable. Perhaps an array of 4 elements could even be better; we'd need performance measurements (should performance be an issue).

> pass these getters on to C

It's the JS/C crossings that are expensive and should be avoided.

@LaurensRietveld
Collaborator

> > What is the downside of returning an RDF/JS object from C instead? (given that the equals function is defined in JS)
>
> How will we attach that equals function?

Isn't that what you discussed here? #35 (comment)

@RubenVerborgh
Member

That might do it indeed; but I have concerns about the performance of mixed C/JS-instantiated objects. But this could be the easiest route to try.

That said, it would not be compatible with arbitrary RDF/JS factories. But good enough to start, and see what baseline performance this affords us.

@mielvds
Collaborator Author

mielvds commented Dec 18, 2019

Not sure if the following is useful, but I mention it here just in case.
By using Node.js Buffers, C++ and V8 can share the same memory and transfer object ownership to each other. This saves the memory and performance hit of having to copy data from C++ memory cells to V8 memory cells.

@LaurensRietveld
Collaborator

> That might do it indeed; but I have concerns about the performance of mixed C/JS-instantiated objects. But this could be the easiest route to try.
>
> That said, it would not be compatible with arbitrary RDF/JS factories. But good enough to start, and see what baseline performance this affords us.

If what you're looking for is a baseline, and you suspect looping is more efficient than creating these objects in C, wouldn't the easiest baseline be an implementation where we only touch the JS part? I.e., create the terms and quads by looping over the results, and extract the literal datatype/language tags in JS?
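
For reference, a JS-only baseline along these lines could look roughly like the sketch below. It assumes the native side keeps returning triples as objects with string `subject`, `predicate` and `object` fields, with blank nodes prefixed by `_:` and literals in N-Triples-like quoted form. `termFromString`, `toQuads` and the small `factory` object are hypothetical stand-ins, not the code that was eventually merged.

```javascript
// Minimal stand-in for an RDF/JS DataFactory; any compliant factory could be passed instead.
const factory = {
  namedNode: (value) => ({ termType: 'NamedNode', value }),
  blankNode: (value) => ({ termType: 'BlankNode', value }),
  defaultGraph: () => ({ termType: 'DefaultGraph', value: '' }),
  literal(value, languageOrDatatype) {
    const isLang = typeof languageOrDatatype === 'string';
    const fallback = isLang
      ? 'http://www.w3.org/1999/02/22-rdf-syntax-ns#langString'
      : 'http://www.w3.org/2001/XMLSchema#string';
    return {
      termType: 'Literal',
      value,
      language: isLang ? languageOrDatatype : '',
      datatype: (!isLang && languageOrDatatype) || this.namedNode(fallback),
    };
  },
  quad: (subject, predicate, object, graph) => ({ subject, predicate, object, graph }),
};

// "value", "value"@lang, or "value"^^datatype (angle brackets optional).
const LITERAL = /^"([\s\S]*)"(?:@([^"]+)|\^\^<?([^">]+)>?)?$/;

function termFromString(str, f) {
  if (str.startsWith('_:')) return f.blankNode(str.slice(2));
  const match = LITERAL.exec(str);
  if (!match) return f.namedNode(str);
  const [, value, language, datatype] = match;
  return f.literal(value, language || (datatype ? f.namedNode(datatype) : undefined));
}

// Single pass over the results; one default graph term is shared by all quads.
function toQuads(triples, f) {
  const graph = f.defaultGraph();
  const result = new Array(triples.length);
  for (let i = 0; i < triples.length; i++) {
    const t = triples[i];
    result[i] = f.quad(
      termFromString(t.subject, f),
      termFromString(t.predicate, f),
      termFromString(t.object, f),
      graph,
    );
  }
  return result;
}

const quads = toQuads([
  { subject: 'http://example.org/s', predicate: 'http://example.org/p', object: '"hello"@en' },
], factory);
console.log(quads[0].object.language); // en
```

Since the factory is just a parameter here, this route also keeps custom data factories possible, at the cost of one loop in JS.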

@RubenVerborgh
Member

Actually… yes 🙂

@rubensworks
Member

I guess this issue can be closed now that #44 has been merged?

4 participants