The Property Graph Model

ahzf edited this page Jun 26, 2011 · 9 revisions

[Graphs](http://en.wikipedia.org/wiki/Graph_(mathematics\)) are useful not only for modeling [relations](http://en.wikipedia.org/wiki/Relation_(mathematics\)) between entities, but also for storing additional data, called properties, within the vertices and edges. The property graph model is not a fundamentally new data model, but a higher-order graph model combined with a collection of best practices for modeling complex related data. It thereby aims to simplify the often very difficult task of mapping complex relations within an application domain onto a database model. Apart from minor variations, this model has become the de facto standard for the NoSQL-based graph databases available today.

A Property Graph is a directed, multi-relational and labeled graph consisting of vertices and edges. Both graph elements (the vertices and edges) are first-class citizens of the property graph and can contain properties for storing additional data. The most common characteristics of such a graph are:

  • The properties within the vertices and edges are key-value pairs of type <String, Object>, e.g. "name": "Alice" or "Age": 20. The keys and values can be schemaless or constrained by a vertex or edge schema. Some implementations, such as the Boost Graph Library or Blueprints.NET, also support generic property graphs, in which the keys and values of the properties can be defined according to application-specific data types.

  • For simplicity and agile development, vertices and edges are commonly schemaless. Nevertheless, there are good reasons for an application to define the type(s) of vertices and edges: to describe their semantic meaning, to let this meaning be interpreted automatically, and to provide consistency criteria and indices for the data, the relations and even complex subgraphs. If a vertex or edge has a schema or type, it is common to store this information under the reserved property key "Type" or "_Type". Especially within the Semantic Web, vertices and edges are not limited to a single schema; some graph databases therefore support a collection of types for each graph element.

  • Edges are directed, labeled and sometimes typed connections between two vertices. Some graph databases allow a schema to restrict which vertex types may be linked by which edge types (comparable to OWL restrictions in RDF graphs). More than one edge between the same two vertices (multi-edges or parallel edges, as found in multigraphs) is often prohibited unless the edge types differ. The term "edge label" is very often used interchangeably with the term "edge type", although there are some minor differences. Undirected edges are commonly modeled as two associated directed edges.

  • Every vertex and every edge has a unique identifier. While it is common to use the property key "Id" or "_Id" to refer to the identity of a graph element, the type of the values varies widely: many graph databases use Int32, some use Int64 or strings, and some even allow generic data types. Databases that support the Semantic Web also allow Uniform Resource Identifiers (URIs) as identifiers.

  • Some graph databases support an explicit revision identifier for every change to a vertex or edge. The property key "RevisionId" or "_RevisionId" may be used for this. Again, the type of the values varies widely: most databases use integers, but some also support timestamps or vector clocks to provide safe version control within a distributed system.

  • Some graph databases permit modeling the value of a property as an explicit collection of values. These so-called multi-value properties simplify the modeling of complex application domains, but are still not a common feature.
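
The characteristics listed above can be illustrated with a minimal in-memory sketch. It is not the API of Blueprints.NET, the Boost Graph Library or any particular graph database; the names `Vertex`, `Edge`, `PropertyGraph`, `add_vertex`, `add_edge`, `add_undirected_edge` and `out_edges` are hypothetical and chosen only for this example. The sketch assumes integer identifiers, schemaless string-keyed properties, the reserved key "_Type" for type information, and plain lists for multi-value properties.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Vertex:
    id: int                                   # unique identifier (an int here by assumption)
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int                                   # edges are first-class and have their own identifier
    label: str                                # edge label, e.g. "knows"
    source: int                               # id of the vertex the edge starts at
    target: int                               # id of the vertex the edge points to
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical directed, labeled, schemaless property graph."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: Dict[int, Edge] = {}
        self._next_id = 1

    def _new_id(self) -> int:
        new_id = self._next_id
        self._next_id += 1
        return new_id

    def add_vertex(self, properties: Optional[Dict[str, Any]] = None) -> Vertex:
        vertex = Vertex(self._new_id(), dict(properties or {}))
        self.vertices[vertex.id] = vertex
        return vertex

    def add_edge(self, source: Vertex, label: str, target: Vertex,
                 properties: Optional[Dict[str, Any]] = None) -> Edge:
        edge = Edge(self._new_id(), label, source.id, target.id, dict(properties or {}))
        self.edges[edge.id] = edge
        return edge

    def add_undirected_edge(self, a: Vertex, label: str, b: Vertex) -> Tuple[Edge, Edge]:
        # An undirected edge is modeled as two associated directed edges.
        return self.add_edge(a, label, b), self.add_edge(b, label, a)

    def out_edges(self, vertex: Vertex, label: Optional[str] = None) -> List[Edge]:
        # All edges leaving the vertex, optionally filtered by label.
        return [e for e in self.edges.values()
                if e.source == vertex.id and (label is None or e.label == label)]


graph = PropertyGraph()

# Properties are key-value pairs; "_Type" holds the (optional) type information.
alice = graph.add_vertex({"name": "Alice", "Age": 20, "_Type": "Person"})

# A collection of types and a multi-value property, both stored as lists.
bob = graph.add_vertex({"name": "Bob", "_Type": ["Person", "Employee"],
                        "nicknames": ["Bobby", "Rob"]})

graph.add_edge(alice, "knows", bob, {"since": 2011})
graph.add_undirected_edge(alice, "friend_of", bob)
```

Keeping identifiers and edge labels as dedicated fields rather than as ordinary properties is a design choice of this sketch; storing them under reserved keys such as "Id" would follow the convention described above equally well.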