Reading RDF with dotNetRDF

One of the main things you'll want to do when working with RDF is read it from files, URIs and other sources so that you can work with it in dotNetRDF. All the classes related to this are contained within the VDS.RDF.Parsing namespace, so when you want to read RDF you'll need the following statements at the start of your code file:

using VDS.RDF;
using VDS.RDF.Parsing;

dotNetRDF currently supports reading RDF files in all of the following RDF serialisations:

  • NTriples
  • Turtle
  • Notation 3
  • RDF/XML
  • RDF/JSON (Talis Specification)
  • RDFa 1.0 (Limited RDFa 1.1 support)
  • TriG (Turtle with Named Graphs)
  • TriX (Named Graphs in XML)
  • NQuads (NTriples plus Context)
  • JSON-LD (1.0 and 1.1)

Several of these serialisations have multiple variants with differing syntax rules. For a complete summary of the formats supported by dotNetRDF see Formats Supported By dotNetRDF.

Graph Parsers

Graph Parsers implement the IRdfReader interface, which defines a Load(…) method that takes an IGraph or an IRdfHandler and either a TextReader, a StreamReader or a String containing a filename. Basic usage is as follows:

IGraph g = new Graph();
IGraph h = new Graph();
TurtleParser ttlparser = new TurtleParser();

//Load using a Filename
ttlparser.Load(g, "Example.ttl");

//Load using a StreamReader
ttlparser.Load(h, new StreamReader("Example.ttl"));

While the above is a slightly contrived example, you'll note that parsers are reusable: once instantiated you can use them as many times as you need. Another useful feature is that parsers are designed to be thread safe, so multiple threads can use the same instance of a parser to parse different inputs simultaneously without interfering with each other.
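
The reuse and thread-safety described above can be sketched as follows. This is a minimal illustration rather than a recommended pattern: the two in-memory NTriples inputs are made up for the example, and it assumes C# top-level statements. Each task parses into its own graph, so only the parser instance is shared.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using VDS.RDF;
using VDS.RDF.Parsing;

// One parser instance shared by every task
NTriplesParser sharedParser = new NTriplesParser();

// Hypothetical in-memory inputs standing in for separate files
string[] inputs =
{
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .",
    "<http://example.org/c> <http://example.org/p> <http://example.org/d> ."
};

// Each task gets its own graph; only the parser is shared between threads
IGraph[] graphs = new IGraph[inputs.Length];
Parallel.For(0, inputs.Length, i =>
{
    IGraph target = new Graph();
    sharedParser.Load(target, new StringReader(inputs[i]));
    graphs[i] = target;
});

foreach (IGraph graph in graphs)
{
    Console.WriteLine(graph.Triples.Count + " Triple(s)");
}
```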

Parsers are typically capable of throwing RdfParseException and RdfException so you should always use try/catch blocks around Parser usage e.g.

try 
{
  IGraph g = new Graph();
  NTriplesParser ntparser = new NTriplesParser();

  //Load using Filename
  ntparser.Load(g, "Example.nt");
} 
catch (RdfParseException parseEx) 
{
  //This indicates a parser error e.g. unexpected character, premature end of input, invalid syntax etc.
  Console.WriteLine("Parser Error");
  Console.WriteLine(parseEx.Message);
} 
catch (RdfException rdfEx)
{
  //This represents an RDF error e.g. illegal triple for the given syntax, undefined namespace
  Console.WriteLine("RDF Error");
  Console.WriteLine(rdfEx.Message);
}

Reading RDF from Common Sources

Often it is not necessary to invoke a parser directly, since you can use a helper class to achieve the same effect without having to create the appropriate parser yourself. The following subsections detail the available helper classes for reading RDF.

Note that several of the sources detailed here also have helper Extension Methods that can be used to further simplify the code examples shown here.

Reading RDF from Files

If you just want to quickly read RDF from a file without having to decide which parser you need you can use the static FileLoader class which provides a Load(IGraph g, String file) method:

IGraph g = new Graph();
FileLoader.Load(g, "somefile.rdf");

The FileLoader will try to select the correct Parser based on the file extension, provided it corresponds to a standard file extension. If this is not possible it will use the StringParser class, which attempts to detect the format using simple heuristics.

You can also force the loader to use a specific parser by using the 3 argument form Load(IGraph g, String file, IRdfReader parser).
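
As a small sketch of the 3 argument form, the snippet below writes a Turtle document to a file whose extension gives no hint about the format, then names the parser explicitly. The file name and its contents are invented for the example, and it assumes C# top-level statements.

```csharp
using System;
using System.IO;
using VDS.RDF;
using VDS.RDF.Parsing;

// Write a small Turtle document to a file with a non-standard extension
// (the file name and data are illustrative only)
string path = Path.Combine(Path.GetTempPath(), "example-data.txt");
File.WriteAllText(path, "@prefix ex: <http://example.org/> .\nex:a ex:b ex:c .\n");

// The extension does not identify the format, so pass the parser explicitly
IGraph g = new Graph();
FileLoader.Load(g, path, new TurtleParser());

Console.WriteLine(g.Triples.Count + " Triple(s)");
```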

Reading RDF from URIs

Often you will want to read some RDF from a URI. To do this we provide the Loader class, which has a LoadGraph(IGraph g, Uri u) method.

IGraph g = new Graph();
Loader loader = new Loader();
loader.LoadGraph(g, new Uri("http://dbpedia.org/resource/Barack_Obama"));

The Loader class uses an HttpClient instance to make web requests, and provides a variant constructor that allows you to pass in the configured HttpClient instance to use.

The LoadGraph method will automatically select the correct Parser to use based on the Content-Type header of the HTTP response. In addition to the normal errors thrown by parsers, the Loader may also throw an RdfException if the input URI is not valid, or an HttpRequestException if an error occurs in retrieving the URI using HTTP.

You can also force the loader to use a specific parser by using the 3 argument form LoadGraph(IGraph g, Uri u, IRdfReader parser). The class also provides async variants of these methods.

By default both the .NET HttpClient and the dotNetRDF Loader class follow HTTP redirects. Because of restrictions and cross-platform differences in when the .NET HttpClient will automatically follow a redirect, the Loader class implements its own redirect handling in addition to that of the HttpClient. This additional redirect handling can be disabled by setting the FollowRedirects property to false. To completely disable all automatic redirects, you must also pass in an HttpClient instance that is configured not to follow redirects, as follows:

// Create an HttpClient configured to not follow redirects
HttpClient noRedirectClient = new HttpClient(
  new HttpClientHandler(){ AllowAutoRedirect = false});
// Create a Loader also configured to not follow redirects.
Loader loader = new Loader(noRedirectClient) { FollowRedirects = false };

Warning

Prior to dotNetRDF 3.0, this functionality was provided by the static UriLoader class, which was implemented using the older System.Net.HttpWebRequest API. This class has been retained with the 3.0 release, but is now considered obsolete and code should be updated to use the Loader class instead.

Reading RDF from Embedded Resources

If you choose to embed RDF files in your assemblies you can read RDF from these using the static EmbeddedResourceLoader class which provides a Load(IGraph g, String resource) method.

Graph g = new Graph();
EmbeddedResourceLoader.Load(g, "Your.Namespace.EmbeddedFile.n3, YourAssembly");

Note that the Resource Name must be an assembly qualified name. Like the other loaders this attempts to select the correct Parser based on the resource name.

You can also force the loader to use a specific parser by using the 3 argument form Load(IGraph g, String resourceName, IRdfReader parser).

Reading RDF from Strings

Occasionally you may have a fragment of RDF in a string which you wish to parse. To do this you can use the static StringParser class and its Parse(IGraph g, String data) method.

Graph g = new Graph();
StringParser.Parse(g, "<http://example.org/a> <http://example.org/b> <http://example.org/c>.");

The StringParser uses some simple heuristics to try to determine the format of the RDF fragment which is passed to it.
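
If you already know the format and would rather not rely on the heuristics, a sketch like the following may help. It assumes that StringParser offers a three-argument Parse(IGraph g, String data, IRdfReader reader) overload, by analogy with the other loaders; check the API documentation for your version before relying on it. The Turtle data is made up for the example.

```csharp
using System;
using VDS.RDF;
using VDS.RDF.Parsing;

// Illustrative Turtle fragment
string data = "@prefix ex: <http://example.org/> .\nex:a ex:b ex:c .";

// Assumption: a Parse overload taking an explicit IRdfReader is available,
// which bypasses format detection
Graph g = new Graph();
StringParser.Parse(g, data, new TurtleParser());

Console.WriteLine(g.Triples.Count + " Triple(s)");
```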

Reading RDF from String (Alternate Method)

Alternatively, since you can read RDF from any TextReader, you can simply invoke a parser on a String directly using a StringReader e.g.

Graph g = new Graph();
NTriplesParser parser = new NTriplesParser();
parser.Load(g, new StringReader("<http://example.org/a> <http://example.org/b> <http://example.org/c>."));

This is roughly equivalent to how the StringParser works internally except this method requires you to know the format of the RDF in advance.

Reading RDF as a stream

Sometimes you may wish to read RDF in a stream oriented fashion. Please see the Advanced Parsing section of this page for how to do that.

Serialization Variants

Several of the supported RDF serialisations have multiple variants of them with differing syntax rules.
Where multiple variants are supported dotNetRDF will default to accepting the most recent supported variant for input but will use the most conservative format for output (this is often, though not always the oldest variant).

However, in some cases you may want to decide directly which syntax variant is used. In this case you typically construct a parser and provide a value from the relevant syntax enumeration e.g.

// Create an NTriples parser that uses the older stricter syntax
NTriplesParser parser = new NTriplesParser(NTriplesSyntax.Original);

Consult the documentation for a specific parser to see if multiple serialisation variants are supported.

Parser Configuration

Some Parsers have additional configuration which can be used to change their behaviour. For example if a Parser implements the ITraceableTokeniser interface then the TraceTokeniser property can be used to ask for Tokeniser Trace to be output to the Console. Similarly if it implements ITraceableParser then the TraceParsing property can be used to ask for Parsing Trace to be output. These features are often useful when debugging to discover why an RDF document is failing to parse since you can see how the input is being tokenised and parsed.
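
A minimal sketch of switching on tracing is shown below. It checks for each interface at runtime rather than assuming a particular parser implements it; the choice of TurtleParser and the sample data are illustrative, and it assumes C# top-level statements.

```csharp
using System;
using System.IO;
using VDS.RDF;
using VDS.RDF.Parsing;

IRdfReader parser = new TurtleParser();

// Enable whichever trace options this parser happens to support
if (parser is ITraceableTokeniser traceableTokeniser)
{
    traceableTokeniser.TraceTokeniser = true;
}
if (parser is ITraceableParser traceableParser)
{
    traceableParser.TraceParsing = true;
}

// Parse a small illustrative document; any trace output goes to the Console
IGraph g = new Graph();
parser.Load(g, new StringReader("@prefix ex: <http://example.org/> .\nex:a ex:b ex:c ."));
```

Because the checks are made against the interfaces, the same code can be pointed at any IRdfReader and will simply skip options that parser does not offer.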

Additionally some Parsers allow you to instantiate them with a TokenQueueMode. This controls the type of queue used in the tokeniser process and can potentially affect the speed of parsing (though in most cases there is minimal difference). The available modes are:

| Queue Mode | Queue Behaviour |
|---|---|
| TokenQueueMode.QueueAllBeforeParsing | The entire file is tokenised before parsing commences. |
| TokenQueueMode.SynchronousBufferDuringParsing | The file is tokenised as parsing proceeds; a limited number of Tokens are generated and buffered each time the parser asks for a Token. |
| TokenQueueMode.AsynchronousBufferDuringParsing | The file is tokenised in the background while parsing proceeds; if the parser asks for a Token and the tokeniser has yet to produce enough Tokens then the parser must wait for a Token to become available. |
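
As a rough sketch, a queue mode can be passed when constructing a parser that supports it. The example below assumes that NTriplesParser has a constructor taking a TokenQueueMode and that the enumeration lives in the VDS.RDF.Parsing.Tokens namespace; the input data is made up.

```csharp
using System;
using System.IO;
using VDS.RDF;
using VDS.RDF.Parsing;
using VDS.RDF.Parsing.Tokens;

// Assumption: this constructor overload is available on NTriplesParser
NTriplesParser parser = new NTriplesParser(TokenQueueMode.SynchronousBufferDuringParsing);

IGraph g = new Graph();
parser.Load(g, new StringReader("<http://example.org/a> <http://example.org/b> <http://example.org/c> ."));

Console.WriteLine(g.Triples.Count + " Triple(s)");
```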

Store Parsers

Store Parsers differ from Graph Parsers in that the input they parse may contain multiple Graphs and so their output is actually a Triple Store rather than a single Graph. You will often see Store Parsers referred to as RDF Dataset parsers.

Store Parsers implement the IStoreReader interface which, similarly to the IRdfReader interface, defines a Load(…) method that takes an ITripleStore or an IRdfHandler and either a TextReader, a StreamReader or a String containing a filename. A Store Parser can be used as follows:

TripleStore store = new TripleStore();
TriGParser trigparser = new TriGParser();

//Load the Store
trigparser.Load(store, "Example.trig");

As with Graph Parsers various exceptions may be thrown.
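
The following sketch combines a Store Parser with the same exception handling shown earlier for Graph Parsers, then walks over the graphs that were loaded. The TriG document is invented for the example and is read from a StringReader so the snippet does not depend on a file; it assumes C# top-level statements.

```csharp
using System;
using System.IO;
using VDS.RDF;
using VDS.RDF.Parsing;

// Illustrative TriG document containing two named graphs
string trig = @"
<http://example.org/g1> { <http://example.org/a> <http://example.org/p> <http://example.org/b> . }
<http://example.org/g2> { <http://example.org/c> <http://example.org/p> <http://example.org/d> . }
";

TripleStore store = new TripleStore();
try
{
    TriGParser trigparser = new TriGParser();
    trigparser.Load(store, new StringReader(trig));
}
catch (RdfParseException parseEx)
{
    Console.WriteLine("Parser Error: " + parseEx.Message);
}
catch (RdfException rdfEx)
{
    Console.WriteLine("RDF Error: " + rdfEx.Message);
}

// Each graph parsed from the input is available through the store
int total = 0;
foreach (IGraph g in store.Graphs)
{
    total += g.Triples.Count;
}
Console.WriteLine(total + " Triple(s) in total");
```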

Reading RDF Datasets from Common Sources

Similarly to Graph Parsers the Store Parsers can all be invoked indirectly with various methods of the EmbeddedResourceLoader, FileLoader, StringParser and Loader classes. See the API documentation for those classes for the relevant overloads.

Advanced Parsing

The examples we've shown so far all use an abstracted parsing model where you parse directly to an IGraph or ITripleStore. The downside of this is that you have to wait for the entire parsing operation to complete before you can work with the parsed data, and for very large inputs this can either take a substantial amount of time or exhaust available memory.

Behind the scenes our parser subsystem is actually fully stream based and is exposed to you via the IRdfHandler based overloads of relevant methods. This allows you much greater control over what is done with parsed data such as processing it in a stream oriented fashion.

In the examples given so far you have only been able to parse RDF into either an IGraph or an ITripleStore instance, but this is not your only option. You can use an IRdfHandler to control explicitly what happens with the RDF you are parsing. Using any of the included implementations will require you to add the following using statement:

using VDS.RDF.Parsing.Handlers;

For example you may only want to count the Triples and not care about the actual values so you could use a CountHandler to do this:

//Create a Handler and use it for parsing
CountHandler handler = new CountHandler();
TurtleParser parser = new TurtleParser();
parser.Load(handler, "example.ttl");

//Print the resulting count
Console.WriteLine(handler.Count + " Triple(s)");

There are a variety of included implementations which do various things like redirecting Triples directly to a file, native Triple Store etc. You can also implement your own either entirely from scratch or just derive from BaseRdfHandler like our own implementations do to get most of the implementation for free.
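
Handlers can also be combined. The sketch below assumes the included PagingHandler and GraphHandler classes, with a PagingHandler(IRdfHandler handler, int limit) constructor and a GraphHandler(IGraph g) constructor; check the Handlers API for the exact signatures in your version. It keeps only the first two triples of a made-up three-triple input.

```csharp
using System;
using System.IO;
using VDS.RDF;
using VDS.RDF.Parsing;
using VDS.RDF.Parsing.Handlers;

// Illustrative NTriples input with three triples
string data =
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n" +
    "<http://example.org/c> <http://example.org/p> <http://example.org/d> .\n" +
    "<http://example.org/e> <http://example.org/p> <http://example.org/f> .\n";

// Assumption: PagingHandler passes at most `limit` triples to the inner handler,
// and GraphHandler adds whatever it receives to the given graph
IGraph g = new Graph();
PagingHandler handler = new PagingHandler(new GraphHandler(g), 2);

NTriplesParser parser = new NTriplesParser();
parser.Load(handler, new StringReader(data));

Console.WriteLine(g.Triples.Count + " Triple(s)");
```

Limiting the input this way can be handy for sampling a very large file without loading all of it.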

Take a look at the Handlers API for more discussion on this topic.

Parser Behaviour

All the Graph Parsers provided in the library behave as follows:

  • File/Stream Management:
    • In the event of an error during Parsing the file/stream being Parsed will be closed
    • On successful completion of Parsing the file/stream being Parsed will be closed
  • If Parsing fails the Graph will not contain any of the Triples successfully parsed prior to the failure
  • If a Parser is asked to parse into a non-empty Graph then the Parser will first parse into an Empty Graph and then Merge that Graph with the provided Graph.
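
The merge behaviour in the last point can be seen with a short sketch: the same graph is loaded twice from two made-up NTriples fragments, and the second load adds to the triples already present rather than replacing them.

```csharp
using System;
using System.IO;
using VDS.RDF;
using VDS.RDF.Parsing;

IGraph g = new Graph();
NTriplesParser parser = new NTriplesParser();

// First load goes into an empty graph
parser.Load(g, new StringReader("<http://example.org/a> <http://example.org/p> <http://example.org/b> ."));
int afterFirst = g.Triples.Count;

// Second load targets a non-empty graph, so the new triples are merged in
parser.Load(g, new StringReader("<http://example.org/c> <http://example.org/p> <http://example.org/d> ."));
int afterSecond = g.Triples.Count;

Console.WriteLine(afterFirst + " Triple(s) after the first load");
Console.WriteLine(afterSecond + " Triple(s) after the second load");
```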

All the Store Parsers provided in the library behave as follows:

  • File/Stream Management as Graph Parsers
  • If the Parser produces a Graph which already exists in the destination Store then an error may occur depending on how that Store behaves when a Graph already exists

Parser Classes

These are the standard parser classes contained in the Library:

| Parser Class | Supported Input |
|---|---|
| JsonLdParser | JSON-LD |
| NTriplesParser | NTriples |
| Notation3Parser | Notation 3, Turtle, NTriples, some forms of TriG |
| NQuadsParser | NQuads, NTriples |
| RdfAParser | RDFa 1.0 embedded in (X)HTML, some RDFa 1.1 support |
| RdfJsonParser | RDF/JSON (Talis specification) |
| RdfXmlParser | RDF/XML |
| TriGParser | TriG |
| TriXParser | TriX |
| TurtleParser | Turtle, NTriples |