Available Transformation Functions

Bo Ferri edited this page Nov 21, 2017 · 52 revisions
Note: d:swarm uses functions of the Metafacture framework for data transformation. You can find more detailed information about these functions at https://github.com/culturegraph/metafacture-core/wiki/Metamorph-functions and https://github.com/metafacture/metafacture-core/wiki/Metamorph-collectors.

Function Description Parameter Explanation Example
case Letter characters are transformed to lower or upper case. language locale en (for English)
upper lower case is converted to upper case SLUB DRESDEN
lower upper case is converted to lower case slub dresden
compose Wraps the value in a prefix and postfix. Prefixing a mapping value “swarm” with “d:” will result in “d:swarm”. prefix prefix string d:
postfix postfix string
concat

Combines the values of several attributes into one element, adding optional prefix and postfix strings, and passes the result to output.

Value 1: “SLUB”

Value 2: “Dresden”

delimiter: “-”

prefix: “Pre”

postfix: “Post”

Result: “PreSLUB-DresdenPost”

delimiter delimiter used to separate concatenated values
prefix prefix string
postfix postfix string
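The concat example above can be sketched in Python. This only illustrates the described behaviour and is not d:swarm or Metafacture code; the function name `concat` and its signature are assumptions made for this sketch.

```python
def concat(values, delimiter="", prefix="", postfix=""):
    """Join the values with the delimiter, then wrap the result in prefix/postfix."""
    return prefix + delimiter.join(values) + postfix


# Values and parameters taken from the example above.
print(concat(["SLUB", "Dresden"], delimiter="-", prefix="Pre", postfix="Post"))
```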
constant Replaces the value with a constant string. value replace value
count Counts occurrences of an attribute and passes result to output. no parameter
equals Filtering based on equality of the input attribute and the function parameter. If they are the same, the input attribute is passed to output. string comparison value
htmlanchor Creates an HTML anchor tag with the following pattern (without "+" and spaces):

<a href=" + prefix + value + postfix + ">title</a>

Example to be mapped: "slub-dresden"
Result: <a href="http://www.slub-dresden.de/">Homepage SLUB Dresden</a>

prefix prefix string http://www.
postfix postfix string .de
title link text Homepage SLUB Dresden
isbn ISBN cleaning, check digit verification and transformation between ISBN 10 and ISBN 13. Non-digit characters can be eliminated. ISBN can be validated. isbn13 transformation to ISBN 13
isbn10 transformation to ISBN 10
clean elimination of non-digit characters
verifyCheckDigit validation
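A minimal Python sketch of what cleaning, check digit verification and ISBN 10 to ISBN 13 conversion involve. It follows the standard ISBN check digit rules and is not the d:swarm implementation. The helper names are hypothetical, and keeping a trailing "X" during cleaning is an assumption of this sketch.

```python
import re


def clean(isbn):
    """Remove separators; keeps digits and the ISBN 10 check character X (assumption)."""
    return re.sub(r"[^0-9Xx]", "", isbn).upper()


def verify_check_digit(isbn):
    """Return True if the cleaned ISBN 10 or ISBN 13 has a valid check digit."""
    isbn = clean(isbn)
    if re.fullmatch(r"\d{9}[\dX]", isbn):
        # ISBN 10: weights 10..1, X counts as 10, sum must be divisible by 11.
        total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(isbn))
        return total % 11 == 0
    if re.fullmatch(r"\d{13}", isbn):
        # ISBN 13: alternating weights 1 and 3, sum must be divisible by 10.
        total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(isbn))
        return total % 10 == 0
    return False


def to_isbn13(isbn10):
    """Convert an ISBN 10 to ISBN 13 by prefixing 978 and recomputing the check digit."""
    body = "978" + clean(isbn10)[:9]
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(body))
    return body + str((10 - total % 10) % 10)


print(to_isbn13("0-306-40615-2"))
```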
normalize-utf8 UTF-8 normalization. Transforms umlauts into canonical form. no parameter
not-equals Filtering based on inequality. If unequal, the attribute value is passed to output. string comparison value
occurence

Filtering based on occurrence.

Values to be mapped (e.g. result of split): “SLUB” “Dresden” “d:swarm” “DMP”

only: “moreThen 2”

sameEntity: “True”

Result: “d:swarm” “DMP”

only Position of element

moreThen 2

3

lessThen

sameEntity

True

False
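The position filter from the example above, sketched in Python. It assumes positions are counted from 1 and ignores the sameEntity parameter; the function name and signature are hypothetical.

```python
def occurrence(values, only):
    """Keep values by 1-based position: 'moreThen N', 'lessThen N' or plain 'N'."""
    op, _, number = only.partition(" ")
    if not number:
        # A bare number selects exactly that position.
        position = int(op)
        return [v for i, v in enumerate(values, 1) if i == position]
    n = int(number)
    if op == "moreThen":
        return [v for i, v in enumerate(values, 1) if i > n]
    if op == "lessThen":
        return [v for i, v in enumerate(values, 1) if i < n]
    raise ValueError("unsupported filter: " + only)


values = ["SLUB", "Dresden", "d:swarm", "DMP"]
print(occurrence(values, "moreThen 2"))
```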

regexp Regexp matching returning the first occurrence of a pattern. The pattern is a Java regex pattern. format order of the capturing groups ${1}
match regex pattern ^isbn\d\d\-(\d{10,13})
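How match and format interact can be sketched in Python. The input value "isbn13-9780306406157" is a made-up example, and Python's `re` module stands in for Java regex here; the two dialects agree for this pattern but differ in other details. The function name is hypothetical.

```python
import re


def regexp(value, match, fmt="${0}"):
    """Find the first match and fill ${n} placeholders in fmt with capturing group n."""
    found = re.search(match, value)
    if found is None:
        return None  # no match, nothing is passed on
    return re.sub(r"\$\{(\d+)\}", lambda g: found.group(int(g.group(1))) or "", fmt)


print(regexp("isbn13-9780306406157", r"^isbn\d\d\-(\d{10,13})", "${1}"))
```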
replace Replaces a pattern with a string. The pattern is a Java regex pattern. pattern regex pattern ^isbn\d\d\-(\d{10,13})
with replace value
split

Splitting based on a regexp.

Value to be split: “SLUB-Dresden”

delimiter: “-”

Result: “SLUB” and “Dresden” are passed to output

delimiter regex pattern
substring

Extracts a substring.

value: “SLUB Dresden” start=0, end=7, returns “SLUB Dr”

end index position of the last character
start index position of the first character
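The substring example above corresponds to a zero-based slice in which the end index itself is not included. A short Python sketch, with a hypothetical function name:

```python
def substring(value, start=0, end=None):
    """Return the characters from index start up to, but not including, index end."""
    return value[start:end]


print(substring("SLUB Dresden", 0, 7))
```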
trim Trims all white spaces at the beginning and at the end of the attribute value. no parameter
urlencode Transforms all characters not allowed in a URL into URL-compatible characters. no parameter
regexlookup Performs a table lookup where keys may be regexes. lookupString A map or uploaded file that contains key/value pairs.
default Value used if no corresponding key is found.
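A lookup with regex keys can be pictured as follows. The table contents are invented for illustration, and requiring the key to match the whole value is an assumption of this sketch rather than documented behaviour.

```python
import re


def regex_lookup(value, table, default=None):
    """Return the entry of the first key pattern that matches the whole value."""
    for pattern, result in table.items():
        if re.fullmatch(pattern, value):
            return result
    return default


# Hypothetical lookup table: language codes to language names.
table = {"ger|deu": "German", "en.*": "English"}
print(regex_lookup("deu", table, default="unknown"))
```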
dewey Dewey conversion and verification. precision A decimal number (represented in string format) showing the desired precision of the returned number; i.e. 100 to round to nearest hundred, 10 to round to nearest ten, 0.1 to round to nearest tenth, etc.
addLeadingZeros Add leading zeros to a Dewey number (if not present).
errorString Error string that should be written as value, if the input string is not a valid Dewey number.
http-api-request HTTP API GET request with the input value as URI. Note: the URI should probably be composed in a previous component, i.e., the http-api-request function expects a valid URI. Note: the response needs to be processed in a further component, e.g., parse-json. acceptType The accept type of the HTTP API request.
errorString Error string that should be written as value, if the HTTP API request fails for some reason.
parse-json Parses the input value with help of the given JSONPath. jsonPath The JSONPath to extract values from the given input JSON value. Note: the JSONPath must conform to http://goessner.net/articles/JsonPath/.
errorString Error string that should be written as value, if the JSON parsing fails for some reason.
collect Collects all received values and concatenates them on record end. Useful for values of a field that occurs multiple times in a record. delimiter delimiter used to separate concatenated values
prefix prefix string
postfix postfix string
multi-collect Collects all received values and concatenates them on record end. delimiter delimiter used to separate concatenated values
prefix prefix string
postfix postfix string
numfilter Extract data based on matching a numeric filter. Syntax is ">" for greater than, "<" for less than, "==" for equals, ">=" for greater than or equals and "<=" for less than or equals. Note: all '<' and '>' signs should be encoded in attributes, like '&lt;' and '&gt;'. expression numeric filter expression
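A sketch of how such a filter expression could be evaluated, in Python. The parsing shown here (one operator followed by a number) is an assumption, as is dropping non-numeric input; the function name is hypothetical.

```python
import operator

# Longer operators come first so that ">=" is not read as ">".
OPERATORS = [(">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
             (">", operator.gt), ("<", operator.lt)]


def numfilter(value, expression):
    """Return value if it satisfies the numeric expression, otherwise None."""
    expression = expression.strip()
    for symbol, compare in OPERATORS:
        if expression.startswith(symbol):
            threshold = float(expression[len(symbol):])
            try:
                number = float(value)
            except ValueError:
                return None  # non-numeric input is dropped in this sketch
            return value if compare(number, threshold) else None
    raise ValueError("unsupported expression: " + expression)


print(numfilter("2017", ">2000"))
```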
issn ISSN conversion and verification. format Formats/normalizes the given ISSN with a hyphen after the 4th digit (+ upper-cases the last character)
check Check the given ISSN with help of the checksum character at the end of the ISSN (default = true)
errorString Error string that should be written as value, if the input string is not a valid ISSN
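What format and check involve, sketched in Python using the standard ISSN checksum (weights 8 to 2, modulo 11). This is an illustration and not the d:swarm code; the function names are hypothetical.

```python
import re


def format_issn(issn):
    """Normalise to NNNN-NNNC with an upper-case check character."""
    compact = re.sub(r"[^0-9Xx]", "", issn).upper()
    return compact[:4] + "-" + compact[4:]


def check_issn(issn):
    """Return True if the last character matches the computed checksum."""
    compact = re.sub(r"[^0-9Xx]", "", issn).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return False
    # The first seven digits are weighted 8, 7, ..., 2.
    total = sum(int(digit) * weight for digit, weight in zip(compact[:7], range(8, 1, -1)))
    expected = (11 - total % 11) % 11
    return compact[7] == ("X" if expected == 10 else str(expected))


print(format_issn("2434561x"), check_issn("2434561x"))
```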
convert-value Convert a value to a certain value type, e.g., RESOURCE, LITERAL or BNODE. format One of RESOURCE (value will be interpreted as resource URI), LITERAL (string literal) or BNODE (value will be interpreted as bnode identifier).
errorString Error string that should be written as value, if the input string is not a valid value for the value type (e.g. not a URI if it's a RESOURCE).
sqlmap Performs a SQL lookup for a given key with a given SQL query. databaseType default = mysql
host the host of the database, e.g., localhost (default = localhost)
port the port of the database, e.g., 3306 (default = 3306)
database the name of the database
login the username that should be utilised to connect to the database
password the password that should be utilised to connect to the database
query the prepared SQL statement that should be utilised to retrieve the value for a given key, e.g. 'SELECT value FROM mytable WHERE key = ?'
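The prepared-statement lookup can be tried out with a small Python sketch. It uses an in-memory SQLite database instead of MySQL so that it runs without a server. The table contents are invented, and the function name and the `default` argument are assumptions of this sketch.

```python
import sqlite3

# Hypothetical lookup table, filled with sample data.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE mytable (key TEXT PRIMARY KEY, value TEXT)")
connection.execute("INSERT INTO mytable VALUES (?, ?)", ("ger", "German"))


def sqlmap(connection, key, query="SELECT value FROM mytable WHERE key = ?", default=None):
    """Bind the key to the '?' placeholder and return the single matching value."""
    row = connection.execute(query, (key,)).fetchone()
    return row[0] if row else default


print(sqlmap(connection, "ger"))
```

Because the key is passed as a bound parameter and not pasted into the query string, the input value cannot alter the SQL statement.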
all Outputs an unnamed literal with "true" as value if all contained statements fire. This is essentially a conjunction (logical and-operation) of all contained statements. The name and value generated by the all-statement can be customised. value value that should be emitted, if this condition has been matched.
any Outputs an unnamed literal with "true" as value if any of the contained statements fires. This is essentially a disjunction (logical or-operation) of all contained statements. The name and value generated by the any-statement can be customised. value value that should be emitted, if this condition has been matched.
choose Collects all received values and emits the most preferred one on record end. no parameter
combine Collects all received values and combines them. value A template describing how the different values should be combined in a string.
dateformat Format a date in a specific format. Note: language code needs to be lowercase. inputformat Syntax corresponds to Java Date Format, default is dd.MM.yyyy
outputformat Syntax corresponds to Java Date Format, default is LONG
language 2-letter language code (lowercase!)
lookup Performs a table lookup. lookupString A map or uploaded file that contains key/value pairs.
default Value used if no corresponding key is found.
none Outputs an unnamed literal with "true" as value if none of the contained statements fires. This is essentially a logical not operation. The value generated by the none-statement can be customised. value value that should be emitted, if this condition has been matched.
setreplace Replaces strings based on a replacement table. lookupString A map or uploaded file that contains key/value pairs.
timestamp Current timestamp/time. format Syntax corresponds to Java Date Format (e.g. yyyy-MM-dd HH:mm), default is unix timestamp
timezone Default timezone is UTC, allowed example is Europe/Berlin
language 2-letter language code (lowercase!)
sql-db-request Executes a prepared SQL statement against a SQL database for a given key. This function can return multiple values for one key (as opposed to the sqlmap function, which returns a single value per key, since it is a map). databaseType default = mysql mysql
host the host of the database (default = localhost) localhost
port the port of the database (default = 3306) 3306
database the name of the database
login the username that should be utilised to connect to the database
password the password that should be utilised to connect to the database
query the prepared SQL statement that should be utilised to retrieve the value for a given key. SELECT value FROM mytable WHERE key = ?
driver the JDBC driver that should be utilised to connect to the database (default = com.mysql.jdbc.Driver; note: the database driver needs to be part of the classpath of the execution environment) com.mysql.jdbc.Driver
ifelse Allows choosing between two different inputs. If the if-input emits, then the else-input will be ignored (and vice versa). if the first part
else the second part (alternative)
siphash SipHash hashing. no parameter
base64 Base64 encoding. no parameter
tail Emit all received literals except the first one. no parameter
head Emit only the first literal (all others are dropped). no parameter
