Glossary

Bo Ferri edited this page Sep 25, 2015 · 36 revisions

Attribute

An attribute is a property or a type of a Relationship, e.g., “name”, “knows”. Attributes can be chained into an Attribute Path. Attributes can be part of a Vocabulary (or a Schema (via Attribute Paths)). In a simple (natural language) sentence (Statement) that consists of three parts (subject, predicate, object), e.g., “Walter knows Claire.”, attributes are predicates. In a Property Graph attributes are properties or Edges of a Node.

  • Hint: In non-hierarchical data structures like CSV files, the attribute corresponds to the attribute path
  • alternative names: property, relationship type, edge type, edge label, predicate

Attribute Path

An attribute path is an identifier of, or a way to, certain elements or values (e.g., Literals) of a Record in a Data Model. An attribute path is an ordered list of Attributes. A bunch of attribute paths can be composed into a Schema. A Mapping can refer to certain attribute paths.

  • This might be a column header in a CSV file or a certain node in a nested XML structure
  • Examples: /dc:creator/foaf:family_name (to address attributes in an RDF graph), /autor/name (to address attributes in an XML file) or /name (to address “flat” attributes – the column headers – in a CSV file)
  • Note: attribute paths are not always a unique identifier of, or way to, certain elements or values; sometimes more complex Filters are necessary to address elements or values uniquely (or more precisely).
  • alternative names: property path
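
As an illustration of how an attribute path addresses values, here is a minimal Python sketch. It assumes a record represented as nested dictionaries and lists; the helper names `parse_attribute_path` and `resolve`, the sample record and the family names are hypothetical and not part of d:swarm.

```python
from typing import Any, List


def parse_attribute_path(path: str) -> List[str]:
    """Split '/a/b' into the ordered list of attributes ['a', 'b']."""
    return [part for part in path.split("/") if part]


def resolve(record: Any, path: str) -> List[Any]:
    """Follow the attributes of `path` through nested dicts and lists.

    Every value reached is returned, because an attribute path may
    address several elements rather than exactly one.
    """
    current = [record]
    for attribute in parse_attribute_path(path):
        reached = []
        for node in current:
            if isinstance(node, dict) and attribute in node:
                value = node[attribute]
                # A repeated element is modelled as a list of values.
                reached.extend(value if isinstance(value, list) else [value])
        current = reached
    return current


# Hypothetical record with two creators (names are made up).
record = {
    "dc:creator": [
        {"foaf:family_name": "Smith"},
        {"foaf:family_name": "Doe"},
    ]
}
family_names = resolve(record, "/dc:creator/foaf:family_name")
```

The path resolves to both family names here, which shows why a Filter may be needed when a single value has to be addressed.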

Blank Node

A blank node or bnode is an anonymous resource that belongs to another resource and has no identifier of its own, e.g., a MABXML ‘Datensatz’ consists of parts that are called ‘Feld’; those parts can be represented as blank nodes. Blank nodes can be part of Statements within Records.

Back Office

The web-based UI of the DMP is called the back office. It is intended for knowledge workers, e.g., system librarians, who know the data formats and the business logic that needs to be applied to transform data from one format into another. Currently, the Data Perspective and the Modelling Perspective, among others, are part of this UI. See http://demo.dswarm.org for a testing version.

Class

A class is a type or universal to specify or categorize a concrete object or particular, e.g., a Record class (as part of a Schema) can be book or document. Classes can be part of a Vocabulary.

Component

A component is a mechanism to describe the instantiation/utilisation of a Function or Transformation, i.e., a mapping of variables or static values to the parameters of the utilised function or transformation. A component can be part of a Transformation or Mapping that makes use of existing functions or transformations in a concrete context.

Configuration

A configuration is a bunch of settings or parameters that are necessary to interpret the data of a Data Resource. A CSV file, for example, can be parsed with a given column delimiter, row delimiter, content encoding, etc. Those settings are required to turn the plain, uninterpreted data into useful pieces of information. The result of applying a configuration to a data resource is a Data Model.
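
The following Python sketch illustrates the idea of applying a configuration to a data resource. The setting names (`column_delimiter`, `row_delimiter`, `encoding`), the function `apply_configuration` and the sample data are assumptions made for this example; they are not the configuration keys used by d:swarm.

```python
import csv
import io

# Hypothetical configuration; the setting names are illustrative only.
configuration = {
    "column_delimiter": ";",
    "row_delimiter": "\n",
    "encoding": "utf-8",
}

# The data resource: plain bytes without any interpretation.
data_resource = "name;year\nFaust;1808\nEffi Briest;1895\n".encode("utf-8")


def apply_configuration(raw: bytes, config: dict) -> list:
    """Interpret raw bytes as CSV and return a list of records (dicts)."""
    text = raw.decode(config["encoding"])
    # Normalise the configured row delimiter so csv sees plain newlines.
    text = text.replace(config["row_delimiter"], "\n")
    reader = csv.DictReader(
        io.StringIO(text, newline=""),
        delimiter=config["column_delimiter"],
    )
    return [dict(row) for row in reader]


# The result plays the role of a (very small) data model.
data_model = apply_configuration(data_resource, configuration)
```

The same bytes parsed with a different column delimiter would yield different records, which is why the data model refers to both its data resource and its configuration.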

Content Schema

A content schema is a schema that builds on top of a structure schema (meta data model), e.g., 'MABxml' is a structure Schema for 'MAB', which is a content schema. Usually, content schemata follow a key/value structure. Therefore, a content schema refers to a (possibly) ordered list of Attribute Paths that hold the parts of the key, i.e., their values form the key; in MABxml these are 'nr' and 'ind'. Concrete instances of those key/value pairs are called Key Definitions. Furthermore, a content schema knows the attribute path where the values can be found with regard to the structure schema; in MABxml this is 'feld > value'. Finally, a content schema can hold further details, e.g., the attribute path that points to a legacy record identifier.
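
A minimal Python sketch of the key/value idea follows. It assumes that each 'feld' element has already been read into a dictionary with the entries 'nr', 'ind' and 'value'; the dictionary layout, the names `content_schema` and `to_key_value_pairs`, and the field values are hypothetical.

```python
# Hypothetical content schema: the key is formed from the values of
# 'nr' and 'ind' (in that order); the value is found under 'value'.
content_schema = {
    "key_attribute_paths": ["nr", "ind"],
    "value_attribute_path": "value",
}

# Hypothetical 'feld' elements of one record (contents are made up).
felder = [
    {"nr": "102", "ind": "b", "value": "first value"},
    {"nr": "100", "ind": "a", "value": "second value"},
]


def to_key_value_pairs(fields, schema):
    """Join the key parts in schema order and pair them with the value."""
    pairs = []
    for field in fields:
        key = "".join(field[path] for path in schema["key_attribute_paths"])
        pairs.append((key, field[schema["value_attribute_path"]]))
    return pairs


pairs = to_key_value_pairs(felder, content_schema)
```

Because the key attribute paths form an ordered list, swapping 'nr' and 'ind' in the schema would produce different keys for the same fields.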

Data Hub

The data hub is the central Data Storage for content within the DMP. It stores all available/registered data, i.e., all (normalized) content (Records) from Data Resources (represented/partitioned as Data Models). Data models can be viewed as subgraphs.

  • hint: The data describing the processing workflows is stored in the Metadata Repository
  • alternative name: internal data storage

Data Model

A data model is an interpretation of a Data Resource through a Configuration (e.g. a csv file can be parsed by a given column delimiter, row delimiter, the encoding of the content, etc.). A data model consists of a bunch of Records. Each input data resource is processed/loaded via a configuration into the Data Hub of the Data Management Platform. The result is a data model that refers to its data resource, configuration and Schema of the data (records).

  • note: data model is sometimes utilised as replacement for Meta Data Model (which has a different meaning)

Data Perspective

In the web-based UI (Back Office), the data perspective enables the user to perform all data uploads (Data Resource creation/insertion), Configurations, etc., in order to produce the correct Data Model, which can be used for modelling purposes (i.e., via a Project).

  • alternative name: data view

Data Resource

A data resource refers to or denotes the actual data (“bits and bytes”) that is contained in a source or a target without any interpretation. A data resource (the data of a data resource) can be interpreted by a Configuration. The result of this interpretation/processing is a Data Model consisting of Records. A data resource can have a certain type.

  • alternative names: raw data, data source (for input data resource), resource, data
  • note: a data resource can be on both ends, i.e., there are input data resources and output data resources

Data Resource Type

A data resource type refers to or denotes a certain type of data source system (e.g., csv, xml, mysql, ...). It can be compared to media type/content type (see Internet media type @ Wikipedia or RFC 2046: MIME type), e.g., text/xml (as an example of a generic media type) or application/vnd.openxmlformats-officedocument.presentationml.presentation (as an example of a rather specific media type (on top of a more generic type)).

Data Storage

A data storage is a system for storing and maintaining data. It can be, e.g., a file system or database. Currently, the DMP consists of two data storages: one for storing all processing metadata (the Metadata Repository) and another one for storing all content (data), the Data Hub.

Data Type

A data type specifies the value type of a Literal (a simple value of an Attribute/Attribute Path), e.g., xsd:string, xsd:dateTime.

DMP

Acronym for Data Management Platform. Also known as d:swarm.

Domain Model

An application or data has a domain model that describes all entities and relationships of this application or data. In a data processing tool (e.g. d:swarm) the domain model of the application is not the domain model of the data that can be processed with this application, i.e., the domain model of the application is on another higher (meta) level. d:swarm, for example, is designed for primarily working with bibliographic Records, i.e., the data domain model is library (with Entities such as resource (document, book), person (author, contributor), etc. (stored in the Data Hub)); however, the application domain model of the DMP is data processing (with entities such as Mapping, Job, Attribute Path (stored in the Metadata Repository)).

Draft (Mode)

While mappings, filters or transformations are being created, updated or deleted in the Modelling Perspective and the work has not yet been saved, it is stored locally in the browser.

Edge

See Relationship.

Entity

An entity is the hypernym for describing all terms, parts, classes or types of a Domain Model. In our application domain model, these are for example Attribute Paths, Mappings, Transformations. They are stored in the Metadata Repository.

  • alternative name: resource

Execution Environment

The execution environment is an application where Tasks can be executed, i.e., where the modelled business logic is applied to (ingested) data (input Data Model) and the result (output data model) is generated (and, optionally, the Data Hub is updated).

  • alternative names: runtime environment, execution engine, execution system, runtime architecture

Filter

A filter is business logic (e.g., a graph pattern such as a Cypher query) that reduces or specifies the data that should be addressed/processed via a Mapping. Filters can be defined on both ends of a mapping, i.e., there can be input filters (in Mapping Inputs) and output filters (in Mapping Outputs).

Function

A function is a data processing method or operation. It consists of a bunch of (input) parameters, an output (parameter) and a certain transformation logic that will be applied to the data in a data stream of a data processing workflow (Job). A simple function is a kind of atomic operation. Extended or composed functions are Transformations. Functions (or transformations) can be (re-)utilised in transformations or Mappings (via Components).

  • Example: A replacement function replaces a certain value <a> with a defined value <b>.
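
The replacement example can be sketched in Python as follows. The function name `replace_value`, its parameter names and the sample data stream are illustrative assumptions; this is not the function as implemented in d:swarm.

```python
def replace_value(value: str, old: str, new: str) -> str:
    """Atomic function: swap every occurrence of `old` for `new`.

    `value` is the input parameter fed from the data stream, `old` and
    `new` are static parameters, and the return value is the output.
    """
    return value.replace(old, new)


# Hypothetical data stream of simple values.
stream = ["a-b", "c", "a"]
result = [replace_value(value, "a", "b") for value in stream]
```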

Graph

The data of the Data Models of d:swarm is stored in a graph (the Data Hub). Currently, this graph relies on a Property Graph-based representation.

Graph Data Model

A graph data model is a data model or Meta Data Model to describe the knowledge representation structure in a Graph. An example of an abstract graph data model is the Property Graph. (Graph) data models can be layered, e.g., the property graph data model can act as a meta data model for another data model, e.g., RDF (which is the case for the Graph Data Model applied in the Data Hub of d:swarm).

Job

A job is a collection of Mappings that can be executed with an input Data Model that contains data (Records) that make use of the utilised input Attribute Paths of the mappings. Mappings from a job are independent from an input data model and an output data model. A job can be configured to a concrete Task with an input data model and an output data model, i.e., a job can be instantiated/utilised by a bunch of tasks.

  • alternative names: data processing workflow, data transformation workflow
  • note: job is also used in more abstract descriptions (context), where a job refers to both the job and the task(s) that utilise the job

Key Definition

A key definition is a specific Filter that can be offered and utilised when a Schema contains a Content Schema. This filter then refers to a certain key in a content schema, e.g., "102b" in the MAB format.
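
As a sketch of how a key definition can act as a filter, the following Python example keeps only the fields whose key equals "102b". It assumes fields represented as dictionaries with the entries 'nr', 'ind' and 'value'; this layout, the name `key_definition` and the field values are hypothetical.

```python
from typing import Callable, Dict, List

Field = Dict[str, str]


def key_definition(key: str) -> Callable[[Field], bool]:
    """Build a filter that matches fields whose 'nr' + 'ind' equals `key`."""
    def matches(field: Field) -> bool:
        return field["nr"] + field["ind"] == key
    return matches


# Hypothetical fields of one record; the values are made up.
fields: List[Field] = [
    {"nr": "102", "ind": "b", "value": "kept"},
    {"nr": "102", "ind": "a", "value": "dropped"},
]
is_102b = key_definition("102b")
selected = [field["value"] for field in fields if is_102b(field)]
```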

Literal

A literal represents a value of an Attribute within a Statement in a Record, e.g., in “The book has been published in 2011” the value of the attribute ‘published (in)’ is ‘2011’. A value can have a Data Type, e.g., string or integer.

Mapping

A mapping encapsulates a Function (or Transformation). It can refer to multiple Mapping Inputs and one Mapping Output. The mapping inputs are mapped to the parameters of the function and the mapping output is mapped to the output of the function. The instantiation/utilisation of transformations by mappings is realized in the same way transformations utilise (other) functions - by means of Components: Each mapping can have a (transformation) component that describes the parameter mappings and refers to the utilised function.
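
The relation between a mapping, its component and the utilised function can be sketched in Python as below. The classes `Component` and `Mapping`, the function `concat`, the attribute paths and the sample record are hypothetical names chosen for this example; they do not mirror the d:swarm domain model classes.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def concat(first: str, second: str, delimiter: str) -> str:
    """A simple function with three parameters and one output."""
    return f"{first}{delimiter}{second}"


@dataclass
class Component:
    """Binds input attribute paths and static values to parameters."""
    function: Callable[..., str]
    parameter_mapping: Dict[str, str]  # parameter name -> input attribute path
    static_values: Dict[str, str]      # parameter name -> fixed value


@dataclass
class Mapping:
    """Several mapping inputs, one mapping output, one component."""
    component: Component
    output_attribute_path: str

    def apply(self, record: Dict[str, str]) -> Dict[str, str]:
        component = self.component
        arguments = {
            parameter: record[path]
            for parameter, path in component.parameter_mapping.items()
        }
        arguments.update(component.static_values)
        return {self.output_attribute_path: component.function(**arguments)}


mapping = Mapping(
    component=Component(
        function=concat,
        parameter_mapping={"first": "/first_name", "second": "/family_name"},
        static_values={"delimiter": " "},
    ),
    output_attribute_path="/name",
)
# Hypothetical flat input record keyed by attribute path.
output = mapping.apply({"/first_name": "Walter", "/family_name": "Smith"})
```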

Mapping Input

A Mapping Input is an Attribute Path applied in a concrete Mapping on the input Data Model. A mapping input can include an (optional) Filter to reduce the amount of data for a certain attribute path.

Mapping Output

A Mapping Output is an Attribute Path applied in a concrete Mapping on the output Data Model. A mapping output can include an (optional) Filter to reduce the amount of data for a certain attribute path.

Metadata Repository

A Data Storage that contains all available metadata for describing the data processing logic etc. (application domain model; via (inter-connected) Entities, e.g. Schema, Task or Data Model) utilised in d:swarm.

Meta Data Model

A meta data model or simply data model is a knowledge representation structure to describe something, e.g., graphs. Examples of meta data models are Property Graph or RDF.

Modelling Perspective

The modelling perspective is the central working space for knowledge workers who would like to work with the DMP, e.g., for designing, testing and debugging transformation logic (creating and maintaining Projects). It is part of the Back Office application of d:swarm.

Node

A node or vertex is a fundamental part in a Graph Data Model (e.g. Property Graph) of a graph database. A node can have attributes (key-value pairs), i.e., properties. A node can have labels or types to categorise the node.

Perspectives

Perspectives are individual sites in the web-based UI (Back Office), which serve a certain scope.

Project

A project refers to all related parts that are relevant to design a whole data processing workflow, i.e., especially a (sample) input Data Model, a bunch of Mappings (that can selectively be exported as a Job) and optionally self-defined Functions (or Transformations).

Property Graph

A Property Graph is a Graph Data Model that consists of Nodes and Relationships. The relationships are directed and binary, i.e., each relationship always has one source and one target node. A Property Graph represents/is a Graph. Currently, the Graph Data Model utilised for the Data Hub in d:swarm makes use of a Property Graph.

Qualified Attribute

Qualified Attributes are properties that describe a Statement in more detail, i.e., they qualify or contextualize the statement, e.g., provenance, version, evidence or weighting.

RDF

The Resource Description Framework (RDF) is a (meta) data model that consists of Statements, which are simple sentences of the structure subject, predicate, object. Subjects can be resources (cf. Records) with a URI as identifier or Blank Nodes. Predicates (Attributes) are usually represented by a URI as well. Objects can be a resource with a URI, a blank node or a Literal. Currently, the Graph Data Model utilised for the Data Hub in d:swarm makes use of RDF on top of a Property Graph.

A similar (meta) data model is EAV (entity-attribute-value).

Record

A record is a resource description or object (cf. Entity) that describes a certain thing, e.g., a book. A book, for example, has an author, a publishing date, a publisher, etc. Records are the main unit of work of the Data Management Platform. A Data Model (within the Data Hub) consists of a bunch of records (of certain record Classes). A single record, in turn, consists of a bunch of Statements. Records can follow a certain Schema.

  • alternative names: bibliographic record, Datensatz
  • see: Bibliographic Record @ Wikipedia

Relationship

A relationship or Edge is a fundamental part in a Graph Data Model of a graph database. A relationship can connect two or more Nodes to each other. A relationship can be directed or undirected. A relationship can have a type or label to categorise it. A relationship can have properties to qualify it. A directed, binary relationship (incl. its start and end node) can also be viewed as simple subject-predicate-object Statement (Node-Edge-Node).

Schema

A schema is a collection of Attribute Paths, a (generic) Class for Records and (optionally) a Content Schema. Each Data Model refers to a schema. All pieces of information in a data model should be addressable via the attribute paths of its schema. The schema of a CSV file (Data Resource), for example, consists of the names (or positions) of the headers, i.e., each header is an Attribute (the Attribute Paths are thereby equal to those attributes, since CSV files have a flat hierarchy). A schema can refer to certain sub-schemata or make use of other existing schemata to describe records (resources) at specific attribute paths, e.g., a dcterms:creator attribute (path) can refer to a specific schema for describing persons etc. Hence, a schema can be a composition of schemata.
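
For the CSV case, deriving a schema from the header row can be sketched in Python like this. The function `schema_from_csv`, the dictionary layout of the schema and the record class name "book" are assumptions made for illustration only.

```python
import csv
import io


def schema_from_csv(text: str, record_class: str, delimiter: str = ",") -> dict:
    """Build a flat schema: one single-attribute path per column header."""
    headers = next(csv.reader(io.StringIO(text), delimiter=delimiter))
    return {
        "record_class": record_class,
        # CSV is flat, so each attribute path consists of one attribute.
        "attribute_paths": ["/" + header for header in headers],
    }


# Hypothetical CSV content.
schema = schema_from_csv("name,year\nFaust,1808\n", record_class="book")
```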

Statement

A statement (a simple sentence with the structure subject, predicate, object) is the fundamental knowledge representation structure of the Graph Data Model in our Data Hub. Each subject and object is represented by a Node and the predicate is always a Relationship (Edge). Hence, a statement made of subject, predicate, object is represented by a node-edge-node structure. Properties on relationships make it possible to add fine-grained (external) context information (Qualified Attributes), e.g., provenance, version or evidence, at the level of statements, or to qualify a statement more precisely, e.g., by weighting. A Record consists of a bunch of statements.
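
The node-edge-node structure with qualified attributes can be sketched in Python as follows, using the sentence “Walter knows Claire.” as the statement. The classes `Node` and `Relationship` and the property values are hypothetical; the sketch does not reflect how the Data Hub stores statements internally.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Node:
    """Subject or object of a statement."""
    identifier: str


@dataclass
class Relationship:
    """Directed, binary edge; its type plays the role of the predicate."""
    source: Node
    type: str
    target: Node
    # Qualified attributes: context information about the statement.
    properties: Dict[str, str] = field(default_factory=dict)

    def as_statement(self) -> Tuple[str, str, str]:
        """Return the (subject, predicate, object) view of this edge."""
        return (self.source.identifier, self.type, self.target.identifier)


knows = Relationship(
    source=Node("Walter"),
    type="knows",
    target=Node("Claire"),
    properties={"provenance": "example-source", "version": "1"},
)
statement = knows.as_statement()
```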

Task

A task is a concrete execution unit of a Job, i.e., besides the Mappings from a job, a task defines an input Data Model and an output data model on which the job can be executed, i.e., it refers to the concrete data source and target of the data processing workflow. A task is the central data processing unit in d:swarm.
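
The distinction between a job and a task can be sketched in Python as below: the job only holds mappings, while each task binds that job to concrete data models. The classes `Job` and `Task`, the mapping function `copy_name` and the sample records are hypothetical and simplified; here a mapping is modelled as a plain function from an input record to a partial output record.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]
MappingFn = Callable[[Record], Record]


@dataclass
class Job:
    """A collection of mappings, independent of concrete data models."""
    mappings: List[MappingFn]


@dataclass
class Task:
    """A job bound to a concrete input and output data model."""
    job: Job
    input_data_model: List[Record]
    output_data_model: List[Record] = field(default_factory=list)

    def execute(self) -> List[Record]:
        for record in self.input_data_model:
            result: Record = {}
            for mapping in self.job.mappings:
                result.update(mapping(record))
            self.output_data_model.append(result)
        return self.output_data_model


def copy_name(record: Record) -> Record:
    """Hypothetical mapping: copy '/name' to '/title'."""
    return {"/title": record["/name"]}


job = Job(mappings=[copy_name])
# The same job is utilised by two tasks with different input data.
task_a = Task(job, [{"/name": "Faust"}])
task_b = Task(job, [{"/name": "Effi Briest"}])
```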

Transformation

A transformation is an extended Function, i.e., it is a composition of existing functions or transformations that are utilised/instantiated by Components of the transformation. The data processing workflow of a transformation is the ordered list or sequence of its components, i.e., components can be connected to each other and thereby describe the processing pipeline of the workflow. The detailed workflow of a transformation can be viewed and manipulated in the transformation logic widget. Transformations or functions can be instantiated/utilised by a Mapping, i.e., the (transformation) component of the mapping instantiates a transformation or function via its parameter mapping.

  • alternative names: transformation process, transformation pipeline, transformation logic
  • note: transformation is also used in more abstract descriptions (context), where transformation refers to the transformation pipeline or logic of a mapping
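
A transformation as an ordered sequence of components can be sketched in Python as a simple function pipeline. The functions `trim`, `lowercase` and `replace_spaces` and the helper `compose` are hypothetical; the sketch covers only linear pipelines of single-value functions and is not the d:swarm transformation engine.

```python
from functools import reduce
from typing import Callable, List

Function = Callable[[str], str]


def trim(value: str) -> str:
    return value.strip()


def lowercase(value: str) -> str:
    return value.lower()


def replace_spaces(value: str) -> str:
    return value.replace(" ", "_")


def compose(components: List[Function]) -> Function:
    """Build a transformation that runs its components in list order."""
    def transformation(value: str) -> str:
        return reduce(
            lambda current, component: component(current), components, value
        )
    return transformation


normalise = compose([trim, lowercase, replace_spaces])
# A transformation can itself be utilised as a component of another one.
normalise_twice = compose([normalise, normalise])
```

The order of the components matters: running `replace_spaces` before `trim` would turn the surrounding blanks into underscores instead of removing them.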

Transformation Engine

The part of the data management application that executes the Tasks. See also Execution Environment.

URI

A URI is a uniform resource identifier for denoting something, e.g., resources (cf. Records), properties (cf. Attributes) or resource types (cf. Classes).

See also URI @ Wikipedia

Vocabulary

A vocabulary or ontology is a set of terms (universals). Terms can be Attributes (properties) or Classes (types).

Widgets

Every box in the Modelling Perspective in the web-based UI (Back Office) is called a widget. The following widgets are available:

  • Source Data Widget
  • Mapping Area
  • Target Data Widget
  • Configuration Widget
  • Transformation Logic Widget
  • Function List Widget