Skip to content
This repository has been archived by the owner on May 24, 2022. It is now read-only.

Versioning

Bo Ferri edited this page Jul 27, 2015 · 3 revisions

Graph Matching

Graph matching algorithms and related topics are an issue that is analysed, evaluated and handled since ages in research and industry. This mendeley group gives the current state overview about our research re. (RDF) graph matching (and related topics). The following papers build (especially) the foundation of our graph delta algorithm:

Here are our pre-conditions:

  • a GDM graph consists of a set of Concise Bounded Description (RDF Molecules, etc.) - these are our resources (i.e. records etc.)
  • a CBD of resource consists of a set of statements, i.e., all statements belong to this resource (however, the subject of the statement doesn't always need to be the resource node, i.e., this is only the case for non-hierarchical resources) and this CBD belongs to certain provenance (i.e. sub graph)
  • each resource is identified by an URI (i.e. there are no root nodes in a CBD that are bnodes)
  • each CBD of a resource is a tree (not a graph)
  • some structure information is already available explicitly, e.g., order property at each statement or node type label at each node (e.g. RESOURCE or LITERAL)

General approach:

  • we should compute the delta on basis of an abstract model (instead of a concrete representation (serialisation))
  • we should primarily choose a structure-based approach (to be able to identify matching bnodes (sub graphs) in a deeper hierarchy of the CBD)
  • we could utilise graph signatures (digests, hashes, ...) as pre-computing step to avoid unnecessary calculations
  • we could utilise object identifier information (or other schema information) to address sub objects (i.e. requires the implementation of the 'key definition' feature)
  • we could utilise hierarchy level information from the nodes

Concrete proposal:

  • we can implement a graph delta algorithm that is based on the signature-based algorithm of (4) (see also BNodeLand) and improve and adapt this algorithm to our GDM and conditions
    • the bnode signature algorithm builds the core component
    • we can add a property for the hierarchy of a statement in a CBD to reduce the size of the comparison set
    • we only need to compare resource by resource disc(i.e. retrieve the existing resource for a concrete provenance from GDBMS and compare it with the new one)

Versioning handling (simplified):

  • add new statements
  • mark removed statement as deleted
  • re-assign versioning pointer (e.g. HEAD)

Implementation

Currently, versioning is divided into 2 parts:

  • one algorithm deals with calculating the delta between two resources in our GDM (see Delta Algorithm)
  • and the other algorithm deals with writing these deltas back to the graph, i.e., do the versioning handling (see Versioning Algorithm)
Clone this wiki locally