Expose AST of Turtle file #3

Open
leostera opened this issue Apr 23, 2019 · 3 comments

Comments

@leostera

Hello again!

I'm playing around with the API, trying to parse a small turtle file and get my hands on the AST to do some code generation.

However, I can't seem to extract it from the store after the lagra_parser_turtle_parser has written to it. Do you have any pointers for me to hack on?

Thanks again for the work on lagra 🙌

@darkling
Owner

There's no AST built for Turtle or N-Triples/N-Quads. Lagra de-serialises those directly to a triple store, where you can use lagra:find_all_t/2 (or lagra:find_all_q/2) to get triples (or quads) from it. (I'm going to add some prettier helper functions at some point, but those two should be ugly but sufficient for now.)

The choice of not generating a complete AST is quite deliberate -- in using most RDF tools, I've found that they have a tendency to load a whole serialised document into RAM, which tends to cause major issues when dealing with large datasets. Part of the design choice is to make it possible to load very large RDF documents and dump the triples in an external data store without having to have insane amounts of RAM. To that end, the Turtle and N-Triples/N-Quads parsers are completely hand-coded and fully incremental parsers, which generate triples one at a time as they are identified, without storing the whole state of the input file as an AST.
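To illustrate the incremental approach described above, here is a minimal sketch in Python rather than Erlang. It is not lagra's code or API: `iter_triples` and `load` are hypothetical names, and the regular expression only handles a deliberately simplified subset of N-Triples (IRIs and plain double-quoted literals). The point is only that each triple is handed to a sink as soon as it is recognised, so no document-wide structure is ever built.

```python
import io
import re
from typing import Callable, Iterator, TextIO, Tuple

Triple = Tuple[str, str, str]

# Simplified N-Triples statement (an assumption for this sketch): two IRIs,
# then an IRI or a plain double-quoted literal, terminated by "."
_STATEMENT = re.compile(
    r'^\s*(<[^>]*>)\s+(<[^>]*>)\s+(<[^>]*>|"[^"]*")\s*\.\s*$'
)


def iter_triples(stream: TextIO) -> Iterator[Triple]:
    """Yield one triple per statement without keeping earlier input."""
    for line in stream:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = _STATEMENT.match(line)
        if match is None:
            raise ValueError(f"unsupported statement: {line!r}")
        yield match.group(1), match.group(2), match.group(3)


def load(stream: TextIO, sink: Callable[[Triple], None]) -> int:
    """Pass each triple to `sink` as soon as it is parsed; return the count."""
    count = 0
    for triple in iter_triples(stream):
        sink(triple)
        count += 1
    return count


# Usage: the sink here is a list, but it could equally write to an external store.
doc = io.StringIO(
    "# a comment\n"
    '<http://example.org/a> <http://example.org/p> "x" .\n'
    "<http://example.org/a> <http://example.org/q> <http://example.org/b> .\n"
)
stored = []
loaded = load(doc, stored.append)
```

Because the parser is a generator, memory use depends on the size of one statement and on whatever the sink chooses to keep, not on the size of the input file.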

At the moment, of course, the only store is the trivial one, which is all in-memory and very inefficient, so I'd expect it to become useless as a result of poor performance well before it becomes useless as a result of insufficient RAM...

@leostera
Author

Right! I understand. Do you have any interest in reusing the lexing/parsing to build an AST? I could help contribute that.

But perhaps I am wrong. I am not looking to get an AST describing particular entities, but rather the core entity classes and their relationships. Compared with the size of the data an ontology describes, the ontology itself should be reasonably small.

@darkling
Owner

I think I'm confused about what you're trying to do, here.

RDF itself has no concept of an AST. There's a fundamental data model, which is the graph. The graph is typically represented (and manipulated) as a set of triples. There are many different renderings of a graph possible -- multiple different serializations, and each of those can render a graph in many different ways. It's just not useful to talk about the AST of one of those specific serialization formats, unless you're implementing a deserializer.

It sounds more like you want to be able to extract the resources of type rdfs:Class and do stuff with those? Possibly looking at the rdfs:subClassOf relationships between them? You don't need an AST for that -- it's just querying the triples with find_all_t/2...
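As a rough sketch of what that kind of query looks like, here is a Python illustration over a plain list of triples. It does not use lagra: the `find_all` helper and its `None`-as-wildcard pattern are assumptions made for this example and say nothing about the actual arguments of find_all_t/2. Only the RDF and RDFS IRIs are standard.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def find_all(triples: Iterable[Triple], pattern: Pattern) -> List[Triple]:
    """Return triples matching `pattern`; None matches any term (hypothetical helper)."""
    return [
        triple
        for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


def class_hierarchy(triples: Iterable[Triple]) -> Dict[str, List[str]]:
    """Map each declared class to a sorted list of its direct superclasses."""
    triples = list(triples)
    hierarchy: Dict[str, List[str]] = {
        subject: [] for subject, _, _ in find_all(triples, (None, RDF_TYPE, RDFS_CLASS))
    }
    for subject, _, parent in find_all(triples, (None, RDFS_SUBCLASS_OF, None)):
        hierarchy.setdefault(subject, []).append(parent)
    return {cls: sorted(parents) for cls, parents in hierarchy.items()}


# Usage with a tiny hand-written ontology (example.org names are made up).
ontology = [
    ("http://example.org/Animal", RDF_TYPE, RDFS_CLASS),
    ("http://example.org/Dog", RDF_TYPE, RDFS_CLASS),
    ("http://example.org/Dog", RDFS_SUBCLASS_OF, "http://example.org/Animal"),
    ("http://example.org/rex", RDF_TYPE, "http://example.org/Dog"),
]
hierarchy = class_hierarchy(ontology)
```

The resulting mapping is the sort of structure a code generator could walk, and it is built purely from triple queries, independent of whether the input was Turtle, N-Triples, or another serialization.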
