jbrekle edited this page Jan 10, 2012 · 2 revisions

In order to get closer to the performance of relational database-backed Web applications, we developed an approach for improving the performance of triple stores by caching query results and even complete application objects. The selective invalidation of cache objects, following updates of the underlying knowledge bases, is based on analysing the graph patterns of cached SPARQL queries in order to obtain information about what kind of updates will change the query result.
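The invalidation idea can be illustrated with a deliberately simplified sketch. It assumes that each cached query is stored together with its triple patterns and that an update consists of a single changed triple. The function names `patternMatches` and `affectedQueries`, the array layout, and the example data are hypothetical and are not taken from the Erfurt code base.

```php
<?php
// Hypothetical helper: does a triple pattern match a concrete triple?
// A null component stands for a SPARQL variable and matches anything.
function patternMatches(array $pattern, array $triple)
{
    foreach (array('s', 'p', 'o') as $position) {
        if ($pattern[$position] !== null && $pattern[$position] !== $triple[$position]) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: ids of all cached queries that a changed triple may affect.
function affectedQueries(array $cachedPatterns, array $changedTriple)
{
    $affected = array();
    foreach ($cachedPatterns as $queryId => $patterns) {
        foreach ($patterns as $pattern) {
            if (patternMatches($pattern, $changedTriple)) {
                $affected[] = $queryId;
                break; // one matching pattern is enough to invalidate the query
            }
        }
    }
    return $affected;
}

// Two cached queries, each reduced to its triple patterns (assumed example data).
$cachedPatterns = array(
    'q1' => array(array('s' => null, 'p' => 'rdf:type', 'o' => 'foaf:Person')),
    'q2' => array(array('s' => 'ex:alice', 'p' => 'foaf:name', 'o' => null)),
);

// A newly added triple: only q1 can be affected by it.
$update = array('s' => 'ex:bob', 'p' => 'rdf:type', 'o' => 'foaf:Person');
$invalidated = affectedQueries($cachedPatterns, $update);
```

In this sketch only `q1` is selected for invalidation, while the cached result of `q2` stays valid. The real analysis of graph patterns is likely more involved than this per-triple comparison.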

Link to the Project page

The PHP implementation of the cache is integrated into the Erfurt layer of OntoWiki. The caching component is part of the latest release and is also used by other web applications built on the Erfurt middleware. The implementation also supports application-specific object caching with SPARQL dependencies.

Location of the Cache Frontends

/owRoot/libraries/Erfurt/Erfurt/Cache/Frontend/

Query Cache

For developers who want to use the cache directly, here is a short example that uses only the Query Cache:

$query = "SELECT * WHERE {?s ?p ?o .}";

$queryCache = Erfurt_App::getInstance()->getQueryCache();
$sparqlResult = $queryCache->load( (string) $query, 'plain');
if ($sparqlResult === Erfurt_Cache_Frontend_QueryCache::ERFURT_CACHE_NO_HIT) {
    $startTime = microtime(true);
    $sparqlResult = $this->_sparqlQuery($query);
    $duration = microtime(true) - $startTime;
    $queryCache->save( (string) $query, 'plain' , $sparqlResult, $duration);
}

// do whatever you want with your SPARQL result

If you use the OntoWiki and Erfurt SPARQL interfaces, the cache is used by default.

Object Cache

The more interesting part for OntoWiki and extension developers is the Object Cache. If application objects that are built from SPARQL query results have to be cached, it is recommended to use this cache, because it automatically keeps object cache entries up to date. When a SPARQL query result is invalidated because of store changes, the Object Cache is notified and also invalidates every object cache entry that was created from the invalidated query. This works only if the data is stored in the cache in the way illustrated in the following diagram.

Diagram

The following example shows how to create a set of application objects with the help of the cache, for instance to build a resource list. Done this way, each resource is created only once. After creation, it is served from the cache until it is invalidated by a store change.

$erfurt = Erfurt_App::getInstance();
$queryCache = $erfurt->getQueryCache();
$objectCache = $erfurt->getCache();
$collection = array();
foreach ($uris as $uri) {
    if ($resource = $objectCache->load(md5($uri))) {
        // cache hit: reuse the stored object
        $collection[$uri] = $resource;
    } else {
        // from here on, SPARQL queries are associated with this object
        $queryCache->startTransaction(md5($uri));

        // this call does a lot behind the scenes: creating the object from
        // a URI alone triggers many SPARQL queries
        $resource = new Model_Resource($uri);

        $collection[$uri] = $resource;
        $objectCache->save($resource, md5($uri));
        $queryCache->endTransaction(md5($uri));
    }
}

Two lines of this example are essential:

$queryCache->startTransaction(md5($uri));
...
$queryCache->endTransaction(md5($uri));

These two lines create the association between the object cache entry and all SPARQL queries used to build it.

Calls to startTransaction and endTransaction can also be nested, for example to associate multiple cached application objects with the same query cache entry.
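To make the effect of nesting more concrete, here is a toy model of the bookkeeping. It is a sketch under one assumption: a query saved while several transactions are open is linked to every one of them. The class `ToyQueryCache` and its `save` and `dependentObjects` methods are hypothetical and do not reproduce the Erfurt implementation.

```php
<?php
// Toy stand-in for the query cache; NOT the Erfurt implementation.
class ToyQueryCache
{
    private $open = array();          // ids of the currently open transactions
    private $dependencies = array();  // query string => list of object cache ids

    public function startTransaction($id)
    {
        $this->open[] = $id;
    }

    public function endTransaction($id)
    {
        $index = array_search($id, $this->open, true);
        if ($index !== false) {
            array_splice($this->open, $index, 1);
        }
    }

    // Record a query and link it to every transaction that is currently open.
    public function save($query)
    {
        foreach ($this->open as $id) {
            $this->dependencies[$query][] = $id;
        }
    }

    // Object cache ids that have to be dropped when $query is invalidated.
    public function dependentObjects($query)
    {
        return isset($this->dependencies[$query]) ? $this->dependencies[$query] : array();
    }
}

$cache = new ToyQueryCache();
$cache->startTransaction('list');          // outer object, e.g. a resource list
$cache->startTransaction('resourceA');     // inner object, one list entry
$cache->save('SELECT ?p ?o WHERE { <A> ?p ?o }');
$cache->endTransaction('resourceA');
$cache->save('SELECT ?s WHERE { ?s a <Person> }');
$cache->endTransaction('list');
```

Under this assumption, invalidating the first query would drop both the list and resourceA, while invalidating the second query would drop only the list.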

Configuration

The OntoWiki default.ini, located in the folder owRoot/application/config/, contains a section with the default values of the cache configuration. Copy these settings into your personal OntoWiki config.ini.

;; Erfurt Query Cache
cache.query.enable = true
;; logging is not recommended (performance)
;cache.query.logging = 0
cache.query.type = database ; only database caching at the moment

;; Erfurt Object Cache
cache.enable            = true      ; clear the cache if you switch from 0 to 1!
cache.type             = database   ; database, sqllite

If you want to count how often a cache entry is hit or invalidated, enable the cache.query.logging option.

No hit

The Query Cache Frontend returns a constant if no cache hit was found:

const ERFURT_CACHE_NO_HIT = "fae0b27c451c728867a567e8c1bb4e53";

If a SPARQL query has a result whose hash equals this value, that query is not cached. TODO: does something like "SELECT 123" return the value itself?
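One plausible reason for a dedicated constant, rather than `false` or an empty value, is that falsy results can then be cached as well. The sketch below illustrates this with a toy lookup. The function `toyLoad`, the constant name `TOY_NO_HIT`, and the `$store` array are hypothetical; only the constant's value is taken from the text above.

```php
<?php
// Sentinel value copied from ERFURT_CACHE_NO_HIT; the name TOY_NO_HIT is made up.
const TOY_NO_HIT = 'fae0b27c451c728867a567e8c1bb4e53';

// Hypothetical lookup: return the stored result, or the sentinel on a miss.
function toyLoad(array $store, $query)
{
    $key = md5($query);
    return array_key_exists($key, $store) ? $store[$key] : TOY_NO_HIT;
}

// Toy cache content with two falsy but perfectly valid results.
$store = array(
    md5('ASK { ?s ?p ?o }')           => false,    // cached boolean result
    md5('SELECT * WHERE { ?s a ?c }') => array(),  // cached empty result set
);

$askResult   = toyLoad($store, 'ASK { ?s ?p ?o }');
$emptyResult = toyLoad($store, 'SELECT * WHERE { ?s a ?c }');
$missing     = toyLoad($store, 'SELECT * WHERE { ?s ?p ?o }');

// Strict comparison against the sentinel keeps falsy hits apart from a miss.
$askIsHit   = ($askResult !== TOY_NO_HIT);
$emptyIsHit = ($emptyResult !== TOY_NO_HIT);
$missIsHit  = ($missing !== TOY_NO_HIT);
```

The flip side is visible in the same sketch: a stored result that is identical to the sentinel string could not be told apart from a miss.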