
Property graphs

Joshua Shinavier edited this page Apr 6, 2023 · 3 revisions

Property graphs in Hydra

Hydra traces its origins to Algebraic Property Graphs (APG), and the Dragon implementation of APG. There is significant potential to apply the APG approach to TinkerPop now that there is open-source tooling available.

APG, like Hydra's formal model, is a hypergraph data model which generalizes property graphs from loosely-typed vertices and binary edges to strongly-typed vertices and hyperedges (i.e. edges which may have other projections apart from "out" and "in"). This helps it bridge the gap between typical record-shaped enterprise data (expressed in languages like SQL, Avro, or Protobuf) and graphs with a minimum of domain-specific mapping.

However, in order to support typical property graphs including those supported by the current version of Apache TinkerPop, we do need a mapping language; n-ary records need to be mapped down to binary edges. Such a mapping language is currently being developed for Hydra, and is captured in hydra/langs/tinkerpop/mapping. The main design considerations have been that:

  • Mappings are expressed as annotations on source schemas
  • Elements in the source shall be treated as property graph elements in the target. That is, we avoid complicated n-to-n mappings of source types to target types, and instead treat the source data as if it were already a property graph, as in APG.

An example of these annotations in the context of an Avro schema can be seen here, and example JSON data is here.

Annotation keys

There are several annotation keys, and their names are configurable. The default names are as follows:

  • @label (vertex): provides the label for the annotated element as a vertex
  • @label (edge): provides the label for the annotated element as an edge. When the vertex and edge @label keys have the same name, as they do by default, the presence of other annotations (such as @outVertex and @inVertex) is used to distinguish edges from vertices.
  • @id (vertex): provides a value specification (see below) for the id of the annotated element as a vertex
  • @id (edge): provides a value specification (see below) for the id of the annotated element as an edge
  • @key: provides the key for an annotated field as a property
  • @value: provides a value specification for an annotated field as a property
  • @outVertex: provides an incident vertex specification (see below) for the out-vertex of the annotated element as an edge
  • @inVertex: provides an incident vertex specification for the in-vertex of the annotated element as an edge
  • @outEdge: provides an incident edge specification for an out-edge of the annotated element as a vertex
  • @inEdge: provides an incident edge specification for an in-edge of the annotated element as a vertex
  • @outVertexLabel: provides a label for the out-vertex of an edge where this cannot be inferred
  • @inVertexLabel: provides a label for the in-vertex of an edge where this cannot be inferred
  • @outEdgeLabel: provides a label for an out-edge from a vertex
  • @inEdgeLabel: provides a label for an in-edge to a vertex
  • @ignore: specifies that an annotated field should be ignored, rather than treated as a property
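Because the key names are configurable, an application might keep them together in one small record. The following Python sketch is purely illustrative: the AnnotationKeys class, its field names, and the annotations_of helper are hypothetical and not part of Hydra's API. It only shows how Avro-style schema elements (modelled as dicts) could carry these keys as extra attributes, in the "@outVertex": "${}" style used in the examples on this page.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AnnotationKeys:
    """Hypothetical holder for the configurable annotation key names, using the defaults listed above."""
    vertex_label: str = "@label"
    edge_label: str = "@label"
    vertex_id: str = "@id"
    edge_id: str = "@id"
    property_key: str = "@key"
    property_value: str = "@value"
    out_vertex: str = "@outVertex"
    in_vertex: str = "@inVertex"
    out_edge: str = "@outEdge"
    in_edge: str = "@inEdge"
    out_vertex_label: str = "@outVertexLabel"
    in_vertex_label: str = "@inVertexLabel"
    out_edge_label: str = "@outEdgeLabel"
    in_edge_label: str = "@inEdgeLabel"
    ignore: str = "@ignore"

    def all_names(self) -> set:
        # By default the vertex and edge variants of @label and @id share one name.
        return {getattr(self, f.name) for f in fields(self)}


def annotations_of(element: dict, keys: AnnotationKeys) -> dict:
    """Keep only those attributes of an Avro-style schema element that are mapping annotations."""
    names = keys.all_names()
    return {k: v for k, v in element.items() if k in names}


# An Avro field carrying a mapping annotation next to its ordinary Avro attributes.
airport_id = {"name": "airportId", "type": "string", "@outVertex": "${}"}
print(annotations_of(airport_id, AnnotationKeys()))  # {'@outVertex': '${}'}

# An application that renames one key recognizes a different set of annotations.
renamed = AnnotationKeys(vertex_id="@vertexId")
print(annotations_of({"name": "code", "@vertexId": "${}"}, renamed))  # {'@vertexId': '${}'}
```

Avro permits extra, non-standard attributes on schemas and fields as metadata, which is what makes this annotation style possible without changing the schema language.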

Annotation format

Values

Vertex and edge ids are given by patterns like the following:

  • ${}: the value of the annotated field
  • ${path/to/value}: a simple path pattern, the steps of which are field names
  • prefix[pattern]suffix: a literal prefix and/or suffix placed around another pattern

For example, if a string-valued Avro field airportId is annotated with "@outVertex": "${}", this means that the entire value of airportId for a given record, like "KSJC", is the out-vertex id. If the field is annotated with "@outVertex": "airport-${}", then the out-id is "airport-KSJC". If the field is annotated with "@outVertex": "airport-${info/identifiers/icao}", then we expect there to be a field named info under the annotated field, whose type has a field named identifiers, whose type has a field named icao, which we use to construct the out-vertex id.

The ${} pattern produces a native Hydra term according to the source schema, which might be a string value, an integer value, or even a record or other complex term. The property graph schema (see below) which you supply in your application determines how these terms are converted to the appropriate data type for the application. When a pattern includes a prefix or suffix, the resulting term is always a string, and the terms produced by ${...} are converted to strings in a canonical way, based on JSON serialization.
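The pattern rules above can be sketched in Python as follows. This is an illustration, not Hydra's implementation: source records are modelled as plain dicts, a pattern is assumed to contain at most one ${...} placeholder, and the "canonical" string form is approximated with compact, key-sorted JSON, except that strings are inserted without quotes (matching the airport-KSJC example).

```python
import json
import re

# An optional literal prefix, one ${...} placeholder, and an optional literal suffix.
# Assumption: a pattern contains at most one placeholder.
PATTERN = re.compile(r"^(?P<prefix>[^$]*)\$\{(?P<path>[^}]*)\}(?P<suffix>.*)$")


def resolve_path(term, path: str):
    """Follow a slash-separated path of field names into a nested record (here, a dict)."""
    for step in filter(None, path.split("/")):
        term = term[step]
    return term


def to_canonical_string(term) -> str:
    # Assumption: strings pass through unchanged; other terms use compact, sorted JSON.
    if isinstance(term, str):
        return term
    return json.dumps(term, separators=(",", ":"), sort_keys=True)


def evaluate(pattern: str, term):
    """Evaluate a value pattern against the value of the annotated field."""
    m = PATTERN.match(pattern)
    if m is None:
        raise ValueError(f"not a value pattern: {pattern!r}")
    value = resolve_path(term, m["path"])
    if not m["prefix"] and not m["suffix"]:
        return value  # plain pattern: keep the native term (string, integer, record, ...)
    return m["prefix"] + to_canonical_string(value) + m["suffix"]


record = {"info": {"identifiers": {"icao": "KSJC"}}}
print(evaluate("${}", "KSJC"))                                  # KSJC
print(evaluate("airport-${}", "KSJC"))                          # airport-KSJC
print(evaluate("airport-${info/identifiers/icao}", record))     # airport-KSJC
print(repr(evaluate("${}", 42)), repr(evaluate("id-${}", 42)))  # 42 'id-42'
```

The last line shows the type distinction described above: a plain ${} keeps the integer 42, while adding a prefix forces a string.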

Vertices and edges

Any record type annotated with @id is taken to be the source type for a vertex or edge type. The TinkerPop coder looks for additional annotations (indicating incident vertices or edges) to determine whether the records of that type should be treated as vertices or edges. While @id is mandatory, @label may be omitted, in which case the vertex or edge label is taken from the name of the source type.
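A minimal Python sketch of this classification step, assuming Avro-style record schemas written as dicts and the default key names. The classify function is hypothetical, and so is the choice to look for @outVertex/@inVertex on both the record and its fields (the examples on this page place them on fields).

```python
from typing import Optional, Tuple

EDGE_KEYS = ("@outVertex", "@inVertex")


def classify(record: dict) -> Optional[Tuple[str, str]]:
    """Return ("vertex" | "edge", label) for a record schema, or None if it is not an element type."""
    if "@id" not in record:
        return None  # only types annotated with @id become vertex or edge types
    # Assumption: incident-vertex annotations may sit on the record itself or on any of its fields.
    annotated = [record] + record.get("fields", [])
    is_edge = any(key in element for element in annotated for key in EDGE_KEYS)
    label = record.get("@label", record["name"])  # default label: the source type's name
    return ("edge" if is_edge else "vertex", label)


airport = {"type": "record", "name": "Airport", "@id": "${code}",
           "fields": [{"name": "code", "type": "string"}]}
flight = {"type": "record", "name": "Flight", "@id": "${number}", "@label": "flightTo",
          "fields": [{"name": "number", "type": "string"},
                     {"name": "origin", "type": "string", "@outVertex": "${}"},
                     {"name": "destination", "type": "string", "@inVertex": "${}"}]}
print(classify(airport))  # ('vertex', 'Airport')
print(classify(flight))   # ('edge', 'flightTo')
```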

Incident vertices

Incident vertex (out-vertex or in-vertex) specifications are simply value specifications for the vertex ids. If a plain path (like ${} or ${nearestCity}) is provided, Hydra will resolve the path and check whether the type it resolves to is annotated as a vertex (with @id or @label). If so, it will pick up the id and label from those annotations. If not, it will treat the entire resolved value as the id, and look for @outVertexLabel or @inVertexLabel to supply the label of the incident vertex.
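The resolution logic above might look roughly like this in Python. This is a hypothetical sketch, not Hydra's code: schemas and terms are dicts, only plain path patterns are handled, and @outVertexLabel/@inVertexLabel are assumed to sit on the same field as the @outVertex/@inVertex annotation.

```python
import re


def plain_path(pattern: str) -> list:
    # Assumption: only plain patterns like "${}" or "${a/b}" (no prefix or suffix) are handled.
    m = re.fullmatch(r"\$\{([^}]*)\}", pattern)
    if m is None:
        raise ValueError(f"expected a plain path pattern, got {pattern!r}")
    return [step for step in m.group(1).split("/") if step]


def resolve(schema_type, term, steps):
    """Walk field names through a record schema and its term in parallel."""
    for step in steps:
        field = next(f for f in schema_type["fields"] if f["name"] == step)
        schema_type, term = field["type"], term[step]
    return schema_type, term


def incident_vertex(edge_field: dict, term, direction: str):
    """Return (vertex_id, vertex_label) for a field annotated with @outVertex or @inVertex."""
    key = "@outVertex" if direction == "out" else "@inVertex"
    label_key = "@outVertexLabel" if direction == "out" else "@inVertexLabel"
    vtype, vterm = resolve(edge_field["type"], term, plain_path(edge_field[key]))
    if isinstance(vtype, dict) and ("@id" in vtype or "@label" in vtype):
        # The target type is annotated as a vertex: reuse its id and label annotations.
        steps = plain_path(vtype["@id"]) if "@id" in vtype else []
        _, vid = resolve(vtype, vterm, steps)
        return vid, vtype.get("@label", vtype["name"])
    # Otherwise the resolved value itself is the id, and the label comes from the annotation.
    return vterm, edge_field.get(label_key)


airport_type = {"type": "record", "name": "Airport", "@id": "${code}",
                "fields": [{"name": "code", "type": "string"}]}
origin = {"name": "origin", "type": airport_type, "@outVertex": "${}"}
dest = {"name": "destination", "type": "string", "@inVertex": "${}", "@inVertexLabel": "Airport"}
print(incident_vertex(origin, {"code": "KSJC"}, "out"))  # ('KSJC', 'Airport')
print(incident_vertex(dest, "KSFO", "in"))               # ('KSFO', 'Airport')
```

The two calls exercise both branches: an embedded, annotated Airport record, and a bare string id whose label must be supplied explicitly.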

Incident edges

Incident edge (out-edge or in-edge) specifications are handled similarly to incident vertex specifications. An out-edge specification starts with a known out-vertex and parses the value specification for the in-vertex, picking up the @id and @label annotations from the source type of the vertex if they are available, or looking for @inVertexLabel (for an out-edge) or @outVertexLabel (for an in-edge) if the value resolves to an id value rather than an annotated record type. There is currently no specification for properties of such edges.
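As a hypothetical Python sketch of the out-edge case: the out-vertex id is already known, and the field's value specification is resolved to find the in-vertex. The out_edge function, the tuple it returns, and the dict-based schemas are illustrative assumptions; only plain path patterns are handled, and, as noted above, the resulting edge carries no properties.

```python
import re


def path_steps(pattern: str) -> list:
    # Assumption: plain path patterns only, e.g. "${}" or "${nearestCity}".
    m = re.fullmatch(r"\$\{([^}]*)\}", pattern)
    if m is None:
        raise ValueError(f"expected a plain path pattern, got {pattern!r}")
    return [step for step in m.group(1).split("/") if step]


def out_edge(vertex_id, vertex_field: dict, term):
    """Build (edge_label, out_id, in_id, in_label) for a vertex field annotated with @outEdge."""
    ftype = vertex_field["type"]
    for step in path_steps(vertex_field["@outEdge"]):
        ftype = next(f["type"] for f in ftype["fields"] if f["name"] == step)
        term = term[step]
    if isinstance(ftype, dict) and "@id" in ftype:
        # Annotated vertex type: take the in-vertex id and label from its annotations.
        in_id = term
        for step in path_steps(ftype["@id"]):
            in_id = in_id[step]
        in_label = ftype.get("@label", ftype["name"])
    else:
        # Bare id value: for an out-edge, the in-vertex label comes from @inVertexLabel.
        in_id, in_label = term, vertex_field.get("@inVertexLabel")
    return vertex_field.get("@outEdgeLabel"), vertex_id, in_id, in_label


city = {"type": "record", "name": "City", "@id": "${name}",
        "fields": [{"name": "name", "type": "string"}]}
near = {"name": "nearestCity", "type": city, "@outEdge": "${}", "@outEdgeLabel": "near"}
print(out_edge("KSJC", near, {"name": "San Jose"}))  # ('near', 'KSJC', 'San Jose', 'City')
```

An in-edge version would mirror this, swapping the roles of the two vertices and reading @outVertexLabel instead.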

Properties

By default, all fields of a record type that are not annotated with the keys given above are assumed to be vertex or edge properties. Just as vertex and edge labels default to the name of the type unless @label is provided, the property key defaults to the name of the field unless @key is provided explicitly. For example, a field named airportCode might be annotated with "@key": "icao" to indicate that the resulting property should be called icao, not airportCode.

The @value annotation allows the value of a field to be turned into a property value in the same way that identity-giving fields are turned into ids (using a value specification).

The @ignore annotation (with any value) tells the coder to ignore a given field, rather than treating it as a property.
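The property rules above can be sketched in Python as follows. The properties function, the STRUCTURAL tuple (which keys count as "the keys given above"), and the restriction of @value to plain path patterns are all assumptions made for illustration.

```python
import re

# Assumption: default keys that mark a field as structural (id or incident element), not a property.
STRUCTURAL = ("@id", "@outVertex", "@inVertex", "@outEdge", "@inEdge")


def apply_value_spec(pattern: str, term):
    # Assumption: only plain path patterns such as "${}" or "${a/b}" are supported here.
    m = re.fullmatch(r"\$\{([^}]*)\}", pattern)
    if m is None:
        raise ValueError(f"unsupported value specification: {pattern!r}")
    for step in filter(None, m.group(1).split("/")):
        term = term[step]
    return term


def properties(record_type: dict, term: dict) -> dict:
    """Derive a property map from a record term, honoring @ignore, @key and @value."""
    props = {}
    for field in record_type["fields"]:
        if "@ignore" in field or any(key in field for key in STRUCTURAL):
            continue  # ignored and structural fields do not become properties
        key = field.get("@key", field["name"])  # property key defaults to the field name
        value = term[field["name"]]
        if "@value" in field:
            value = apply_value_spec(field["@value"], value)
        props[key] = value
    return props


airport = {"type": "record", "name": "Airport", "@id": "${code}", "fields": [
    {"name": "code", "type": "string"},
    {"name": "airportCode", "type": "string", "@key": "icao"},
    {"name": "location", "type": "Location", "@value": "${city}"},
    {"name": "notes", "type": "string", "@ignore": True},
]}
term = {"code": "SJC", "airportCode": "KSJC",
        "location": {"city": "San Jose", "state": "CA"}, "notes": "internal"}
print(properties(airport, term))  # {'code': 'SJC', 'icao': 'KSJC', 'location': 'San Jose'}
```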

Schemas

A schema, in this limited context of property graph mappings, is a helper object which provides the following information:

  • Application-specific names for each of the annotation keys described above
  • A coder (bidirectional encoder/decoder) for vertex ids, mapping Hydra terms to and from application-specific ids
  • A coder for edge ids (analogous to vertex ids)
  • A coder for property types, mapping Hydra types to and from application-specific property types
  • A coder for property values, mapping Hydra terms to and from application-specific property values
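One way to picture such a helper object is the following Python sketch. It is not the Schema type from the TinkerPop mappings module: the Coder and Schema classes, their field names, and the example type names are hypothetical, and error handling (for terms a coder cannot map) is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Coder(Generic[A, B]):
    """A bidirectional mapping: encode from the Hydra side, decode back from the application side."""
    encode: Callable[[A], B]
    decode: Callable[[B], A]


@dataclass(frozen=True)
class Schema:
    annotation_keys: dict  # application-specific names for each annotation key
    vertex_ids: Coder
    edge_ids: Coder
    property_types: Coder
    property_values: Coder


# Hypothetical application that stores ids as integers while the source holds decimal strings.
int_ids = Coder(encode=int, decode=str)
# Hypothetical type names on both sides, mapped one-to-one.
TYPES = {"string": "String", "int32": "Integer"}
type_coder = Coder(encode=TYPES.__getitem__,
                   decode={v: k for k, v in TYPES.items()}.__getitem__)
identity = Coder(encode=lambda t: t, decode=lambda v: v)

schema = Schema(annotation_keys={"vertexId": "@id", "ignore": "@ignore"},
                vertex_ids=int_ids, edge_ids=int_ids,
                property_types=type_coder, property_values=identity)
print(schema.vertex_ids.encode("42"))         # 42
print(schema.property_types.encode("int32"))  # Integer
```

Keeping encode and decode together in one value makes it natural to check round trips, such as decoding an encoded id back to the original term.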

The definition of the Schema type can be seen in the TinkerPop mappings module.