michaeldiamond edited this page May 23, 2012 · 1 revision

What is ActiveRDF, and why should I care?

What is ActiveRDF?

ActiveRDF is a library that exposes arbitrary RDF data as Ruby objects. You, as a programmer, manipulate Ruby objects by invoking methods on them, and in the background ActiveRDF maps these method calls to queries and updates on the RDF data.

Why should I use it?

Most programmers are used to object-oriented programming. ActiveRDF objects are natural to work with: you don’t have to think about the fact that you are dealing with RDF. ActiveRDF objects also expose many useful helper functions to look up or manipulate data.

Can I see some examples?

Sure, the following piece of code opens a connection to a SPARQL datasource on the web which contains some people. It looks up all people, and for each person it prints out their age.

require 'active_rdf'
ConnectionPool.add_data_source :type => :sparql, :url => "", :results => :sparql_xml 

# we register a short-hand notation for the namespace used in this test data 
Namespace.register :test, ''

all_people = TEST::Person.find_all

# print all the people, and their ages
all_people.each do |person|
  puts "#{person} is #{person.age} years old"
end

How do I get started?

Check out our GettingStartedGuide, which shows you:

  • how to install ActiveRDF and its Adapters
  • how to read and write to RDF triple stores
  • how to create a Ruby on Rails application that gets its data from a triple store

When should I use it?

Whenever you use Ruby and want to manipulate RDF data. For example, you might be building a Semantic Web application, or you want to play around a bit with RDF, or you have a large RDF file and need to find all properties mentioned in it. Any scenario that involves Ruby and RDF is suitable for ActiveRDF (at least, that’s what we believe).

Is ActiveRDF slow?

Not inherently; it depends on your usage. ActiveRDF uses a completely transparent proxy pattern: it only translates your method calls into database queries, and each method call usually involves only one or two database queries. If you do a great many object manipulations in a row, that will take some time, but it would also take time to do those manipulations on the database directly. In most cases, the overhead of translating method calls to queries is minimal.
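To make the proxy idea concrete, here is a minimal sketch in plain Ruby. It is not ActiveRDF's actual implementation: `FakeStore` and `ResourceProxy` are hypothetical names, the store is an in-memory stand-in that only counts the queries it receives, and the data is made up.

```ruby
# Hypothetical in-memory store: answers (subject, predicate) lookups
# and counts how many queries it has received.
class FakeStore
  attr_reader :query_count

  def initialize(triples)
    @triples = triples
    @query_count = 0
  end

  # Return all objects for the given subject and predicate.
  def objects(subject, predicate)
    @query_count += 1
    @triples.select { |s, pred, _| s == subject && pred == predicate }
            .map { |triple| triple[2] }
  end
end

# Hypothetical proxy: holds only a URI and forwards every unknown,
# argument-less method call to the store as exactly one query.
class ResourceProxy
  def initialize(uri, store)
    @uri = uri
    @store = store
  end

  def method_missing(name, *args)
    return super unless args.empty?

    values = @store.objects(@uri, name.to_s)
    values.length == 1 ? values.first : values
  end
end

# Made-up example data.
store = FakeStore.new([
  ["ex:eyal", "age", 27],
  ["ex:eyal", "knows", "ex:alice"],
  ["ex:eyal", "knows", "ex:bob"]
])
eyal = ResourceProxy.new("ex:eyal", store)

puts eyal.age            # first query
puts eyal.knows.inspect  # second query
puts store.query_count
```

In this sketch the cost of a program is simply the number of method calls it makes on proxies, which is why long chains of manipulations are where the time goes.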

Can I use ActiveRDF today?

Yes, definitely. Previous versions of ActiveRDF were a bit unpredictable because the code was very large and complicated. We have now spent considerable time reducing the complexity of the code, which resulted in a fairly minimal framework. And less code usually means fewer bugs.

Does it work on Windows?

Yes. All of ActiveRDF works on Windows. Some datastores may not be easy to get running on Windows (e.g. the Redland Windows port lags a bit behind the main development), but those are not part of ActiveRDF.

How is ActiveRDF distributed?

ActiveRDF is distributed as a gem. This means that you can install it on any Ruby system that has rubygems by running “gem install activerdf”. rdflite (our own lightweight RDF store) is also distributed as a gem, more specifically as a gem-plugin to ActiveRDF. If you install it, ActiveRDF will find it automatically and you will be able to use it.

Which datastore should I use?

That depends on your use case. If you need only read access, you can use any store that supports SPARQL. For write support, most stores (except YARS) have no HTTP interface, so we need a native bridge to their API (e.g. rjb or swig). Such native bridges are more involved to set up.

Several questions are relevant when deciding on an RDF store:

  • if you need simple installation on any Ruby system, use rdflite.
  • if you need online reasoning support use Sesame2 or Jena.
  • if you want reasoning, but you can do offline processing, you can compute the transitive closure of your data (with e.g. Sesame2) and load the extended data into a datastore without reasoning.
  • if you need fast query results with very large amounts of data, use YARS or Jars2.
  • if you need full-text search, use YARS or rdflite.
  • if you need a lightweight in-process library, use Redland or rdflite.
  • if you need a server that other people/processes can access as well, use Jena (with Joseki), Sesame2, or YARS.
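
The offline-reasoning option in the list above amounts to materialising inferred triples before loading them. As a rough illustration in plain Ruby (not how Sesame2 does it), the sketch below computes the transitive closure of a single relation over a small set of triples. The `transitive_closure` helper and the data are hypothetical, and only the chosen predicate is closed; no other inference is performed.

```ruby
require 'set'

# Repeatedly add (a, predicate, c) whenever (a, predicate, b) and
# (b, predicate, c) are both present, until no new triple appears.
def transitive_closure(triples, predicate)
  closed = Set.new(triples)
  loop do
    # Snapshot of the current edges, so we can add to `closed` while iterating.
    edges = closed.select { |_, pred, _| pred == predicate }.to_a
    added = false
    edges.each do |a, _, b|
      edges.each do |b2, _, c|
        next unless b == b2

        added = true if closed.add?([a, predicate, c])
      end
    end
    break unless added
  end
  closed
end

# Made-up example data.
triples = [
  ["ex:Student", "rdfs:subClassOf", "ex:Person"],
  ["ex:Person",  "rdfs:subClassOf", "ex:Agent"],
  ["ex:eyal",    "rdf:type",        "ex:Student"]
]

extended = transitive_closure(triples, "rdfs:subClassOf")

# Print only the triples that were inferred.
(extended - triples.to_set).each { |triple| puts triple.join(" ") }
```

The extended set can then be loaded into a store that does no reasoning of its own, at the price of recomputing the closure whenever the data changes.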

Does ActiveRDF support OWL?

That depends on the datastore used. ActiveRDF does not do any reasoning itself, neither RDFS nor OWL (see below for our reasons) but instead relies on the reasoning of the underlying datastore. If your datastore does not support OWL reasoning you can of course still access your data (it’s all RDF), but you won’t get all expected inferences. Bear in mind that ActiveRDF does not enforce any OWL constraints, such as cardinality constraints. If you want that, write an ActiveRDF extension.

How about Rails?

ActiveRDF works great with Rails. You put some lines of code in your environment.rb to load ActiveRDF and set up some connections. Then you can let ActiveRDF automatically construct classes from the RDF data and use those classes as models in Rails. We are happily using ActiveRDF with Rails in several of our projects, but we are also still actively working on a smoother integration (especially allowing existing Rails/ActiveRecord plugins to work with ActiveRDF where that makes sense). If you have ideas for improvement, please let us know.

Why do you and why don’t you …

Why don’t you support reasoning?

That is a conscious decision. We see ActiveRDF as an access layer on top of a database. The database should answer all queries; we only transform its answers into something that you can use easily (i.e. Ruby objects). We think it is inefficient to reason at our level, and that reasoning should take place inside the database, namely when query answers are computed. We could implement backward-chaining reasoning using query rewriting (before we send the query to the database), but that is not so easy.

Why don’t you support blank nodes?

Blank nodes are very difficult to handle, because they have no identity. Since they have no identity, there is no standard way to address them, or to “refind” them. An ActiveRDF object that represents an ordinary RDF resource rewrites method calls into queries using the object URI. For example, eyal.age could be rewritten into “select ?o where deri:eyal foaf:age ?o.”.

However, if eyal is a blank node, it does not have a URI, so query rewriting does not work as easily. The only way to get to the blank node (and then ask for the target of the outgoing arc “age”) is through some path to it, and which path is available depends on the blank node. If an OWL inverse functional property (IFP) for the blank node is known, that IFP gives us the path. For example, if we know the email address of eyal, we can rewrite “eyal.age” into “select ?o where _:1 foaf:email eyal:email . _:1 foaf:age ?o”.

But we do not always know such OWL IFPs, and then the access path is not guaranteed to lead to a single node. Take for example “eyal.knows”: it is an access path to a set of blank nodes, but there is no way (if we lack an IFP) to specify the path to one specific node in this set.
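
The three cases above can be sketched as a small query-rewriting function. This is an illustration in plain Ruby, not ActiveRDF's code: the `rewrite` function, the `Node` struct and its `ifp` field are hypothetical, and the query strings follow the informal notation used in this answer rather than complete SPARQL.

```ruby
# Hypothetical node: either has a URI, or is a blank node that may carry
# one known inverse functional property as a [predicate, value] pair.
Node = Struct.new(:uri, :ifp)

# Rewrite "node.property" into a query string.
# Returns nil when the node cannot be addressed at all.
def rewrite(node, property)
  if node.uri
    # Ordinary resource: address it directly by its URI.
    "select ?o where #{node.uri} #{property} ?o ."
  elsif node.ifp
    # Blank node with a known IFP: reach it through that property.
    pred, value = node.ifp
    "select ?o where _:1 #{pred} #{value} . _:1 #{property} ?o ."
  end
end

named     = Node.new("deri:eyal", nil)
with_ifp  = Node.new(nil, ["foaf:email", "eyal:email"])
anonymous = Node.new(nil, nil)

puts rewrite(named, "foaf:age")
puts rewrite(with_ifp, "foaf:age")
puts rewrite(anonymous, "foaf:age").inspect
```

The last call has nothing to build a path from, which is the situation that makes general blank node support hard.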

Why do you use Ruby?

Because it is a great and very dynamic language. Ruby metaprogramming makes most of ActiveRDF quite straightforward (although we change the behaviour of the builtin type system quite radically, which would be much harder in other languages). Plus, with Rails we have a great framework for web development.

Other questions

Are there any similar frameworks out there?

RDF.rb is another Ruby implementation for RDF. There is also an object mapper similar to ActiveRDF called Spira for the RDF.rb library.

Do ActiveRDF objects conform to RDFS semantics?

Yes. And when they don’t, that’s a bug, and we’ll fix it. What we currently do: objects can have multiple types (or multiple classes, in Ruby). So when you ask “eyal.class”, it might return “[RDFS::Resource, FOAF::Person, FOAF::Agent, ...]”. Relatedly, when you ask “eyal.is_a? FOAF::Person”, the answer is “true”.

Furthermore, classes can sit in a multi-line hierarchy of superclasses. If you invoke a method on an instance, we will look for that method in all its superclasses (according to the datastore, so including the superclasses that are not in a linear inheritance hierarchy). The only (open) problem is what to do when we find the method in several classes that do not have a superclass relation to each other. Normally one takes the implementation in the most specific class, but in a multi-line hierarchy “most specific” does not always exist: is Person more specific than Book?
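
The open problem can be made concrete with a small sketch in plain Ruby. It is not ActiveRDF's lookup code: the `SUPERCLASSES` and `DEFINES` tables are made-up data, and `ancestors_of` and `most_specific` are hypothetical helpers. The sketch collects every class that defines a method, then discards any class that is an ancestor of another candidate; more than one survivor means there is no single most specific implementation.

```ruby
# Made-up class graph: each class maps to its direct superclasses.
SUPERCLASSES = {
  "Student"  => ["Person"],
  "Person"   => ["Agent"],
  "Book"     => ["Document"],
  "Agent"    => [],
  "Document" => []
}.freeze

# Made-up table of which classes define which methods.
DEFINES = {
  "name"  => ["Agent", "Person"],
  "title" => ["Person", "Book"]
}.freeze

# All direct and indirect superclasses of a class.
def ancestors_of(klass)
  direct = SUPERCLASSES.fetch(klass, [])
  (direct + direct.flat_map { |c| ancestors_of(c) }).uniq
end

# Classes defining `method` among the instance's types and their
# ancestors, minus any that are an ancestor of another candidate.
def most_specific(types, method)
  reachable = (types + types.flat_map { |t| ancestors_of(t) }).uniq
  candidates = reachable & DEFINES.fetch(method, [])
  candidates.reject do |c|
    candidates.any? { |other| ancestors_of(other).include?(c) }
  end
end

# One survivor: Person overrides Agent.
puts most_specific(["Student"], "name").inspect

# Two survivors: Person and Book are unrelated, so the choice is ambiguous.
puts most_specific(["Student", "Book"], "title").inspect
```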

If you find other RDFS things that we don’t support, tell us.

What should I do if my question isn’t answered?

  • if you know the answer, add your question and answer to this list: it’s a wiki!
  • if you don’t know the answer, ask it on the mailing list.