HOP-3927 DimensionLookup: update documentation according to new dialog layout
sramazzina committed May 17, 2022
1 parent 495ec97 commit 6317a08e5b410911bcfb686958f0d9eecf5926bb
Showing 14 changed files with 92 additions and 74 deletions.
@@ -39,7 +39,7 @@ You can use it to avoid the natural concurrency (parallelism) that exists betwee
[width="90%",options="header"]
|===
|Option|Description
|transform name|The name of the transform to wait for.
|Transform name|The name of the transform to wait for.
|CopyNr|The (0-based) copy number of the transform.
If the named transform has an explicit setting for "Change number of copies to start", and you want to wait for all copies to finish, you'll need to enter one row in the grid for each copy, and use this column to specify which copy of the transform to wait for.
For the default number of copies (1), the CopyNr is always 0.
@@ -55,7 +55,7 @@ To define and use multiple parameters, list the fields in order you want them to
[width="90%",options="header"]
|===
|Option|Description
|transform name|Name of the transform; This name has to be unique in a single pipeline
|Transform name|Name of the transform; this name has to be unique in a single pipeline
|Connection|The database connection to use for the query.
|SQL|SQL query to form the join; use question marks as parameter placeholders
|Number of rows to return|Zero (0) returns all rows; any other number limits the number of rows returned.
@@ -31,7 +31,7 @@ Use this transform if you deliberately want to slow down your pipeline.
[width="90%",options="header"]
|===
|Option|Description
|transform name|Name of the transform.
|Transform name|Name of the transform.
Note: This name has to be unique in a single pipeline.
|Timeout|The timeout value in seconds, minutes or hours
|===
@@ -29,18 +29,88 @@ This transform can be used to populate a dimension table or to look up values in

== Options

=== Common fields

|===
|Option|Description
|Transform name|Name of the transform.
|Update the dimension?|Enable to update the dimension based on the information in the input stream; if not enabled, the dimension only performs lookups and adds the technical key field to the streams.
|Connection|Name of the database connection on which the dimension table resides.
|Target schema|This allows you to specify a schema name.
|Target table|Name of the dimension table.
|Commit size|Define the commit size, e.g. setting commit size to 10 generates a commit every 10 inserts or updates.
|Caching a|
* Enable the cache?
Enable this option to turn on data caching in this transform. In previous versions, caching was controlled through the cache size: a value >= 0 enabled it and -1 disabled it.
* Pre-load cache?
You can enhance performance by reading the complete contents of a dimension table prior to performing lookups.
Performance is increased by the elimination of the round trips to the database and by the sorted list lookup algorithm.
* Cache size in rows: The cache size in number of rows that will be held in memory to speed up lookups by reducing the number of round trips to the database.
|Get Fields button|Fills in all the available fields on the input stream, except for the keys you specified.
|SQL button|Generates the SQL to build the dimension and allows you to execute this SQL.
|===

=== Keys tab
Specify the names of the keys in the stream and in the dimension table.
This will enable the transform to perform the lookup.
[width="90%",options="header"]
|===
|Option|Description
|Dimension field|Key field used in the source system. For example: customer numbers, product id, etc.
|Stream field|Stream field containing the value taken from the source system key field.
|===

=== Fields tab
For each of the fields you must have in the dimension, you can specify whether you want the values to be updated (for all versions, this is a Type I operation) or you want to have the values inserted into the dimension as a new version.
In the example we used in the screenshot the birth date is something that's not variable in time, so if the birth date changes, it means that it was wrong in previous versions.
It's only logical then, that the previous values are corrected in all versions of the dimension entry.
[width="90%",options="header"]
|===
|Option|Description
|Dimension field|Fields containing the actual information of a dimension.
|Stream field to compare with|Stream field containing the incoming value to assign to that table's field.
|Type of dimension update|Type of update applied (see details below in the section Update of this document).
|===
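
The difference between the two kinds of update described above can be sketched in a few lines of Python. This is only an illustration, not the transform's implementation: the row layout, the field names (`tk`, `key`, `version`, `date_from`, `date_to`, `birth_date`, `city`) and both helper functions are hypothetical, and the open-ended end date is assumed to be a far-future value such as 2199-12-31.

```python
from datetime import datetime

# Assumed open-ended "valid until" date for the current version.
FAR_FUTURE = datetime(2199, 12, 31, 23, 59, 59)


def update_all_versions(rows, natural_key, field, value):
    """Type I style update: overwrite the field in every version of the entry."""
    for row in rows:
        if row["key"] == natural_key:
            row[field] = value


def insert_new_version(rows, natural_key, field, value, change_date):
    """Close the current version and append a new one carrying the changed value."""
    current = max(
        (r for r in rows if r["key"] == natural_key),
        key=lambda r: r["version"],
    )
    current["date_to"] = change_date
    new_row = dict(
        current,
        tk=max(r["tk"] for r in rows) + 1,
        version=current["version"] + 1,
        date_from=change_date,
        date_to=FAR_FUTURE,
    )
    new_row[field] = value
    rows.append(new_row)
    return new_row


# Hypothetical dimension content: one customer, one version.
rows = [{
    "tk": 1, "key": "C-001", "version": 1,
    "date_from": datetime(1900, 1, 1), "date_to": FAR_FUTURE,
    "birth_date": "1980-01-01", "city": "Ghent",
}]

# A move is a real change over time, so it becomes a new version.
insert_new_version(rows, "C-001", "city", "Antwerp", datetime(2022, 5, 17))

# A corrected birth date was wrong all along, so every version is fixed.
update_all_versions(rows, "C-001", "birth_date", "1981-02-03")
```

After both calls the dimension holds two versions of the customer: the city differs between them, while the corrected birth date is the same in both.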

=== Technical key tab
This tab contains the details related to the creation of the surrogate key of the dimension's record.
[width="90%",options="header"]
|===
|Option|Description
|Technical key|This is the primary key of the dimension.
|Version field|Shows the version of the dimension entry (a revision number).
|Start of date range|This is the field name containing the validity starting date.
|End of date range|This is the field name containing the validity ending date.
|Keys|These are the keys used in your source systems.
For example: customer numbers, product id, etc.
|Fields|These fields contain the actual information of a dimension.
|Technical key field|The primary key of the dimension; also referred to as Surrogate Key.
Use the new name option to rename the technical key after a lookup.
For example, if you need to look up different types of products like ORIGINAL_PRODUCT_TK, REPLACEMENT_PRODUCT_TK, ...
|Creation of technical key a|Indicates how the technical key is generated; options that are not available for your connection type will be grayed out:

* Use table maximum + 1: A new technical key will be created from the maximum key in the table.
Note that the new maximum is always cached, so that the maximum does not need to be calculated for each new row.
* Use sequence: Specify the sequence name if you want to use a database sequence on the table connection to generate the technical key (typical for Oracle e.g.).
* Use auto increment field: Use an auto increment field in the database table to generate the technical key (supported e.g. by DB2).
|===
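
The "Use table maximum + 1" option with its cached maximum can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the transform's code: the `MaxPlusOneKeyGenerator` class, the `dim_customer` table and its columns are made up for illustration, and an in-memory SQLite database stands in for the real connection.

```python
import sqlite3


class MaxPlusOneKeyGenerator:
    """Hands out technical keys as 'table maximum + 1', caching the maximum."""

    def __init__(self, conn, table, key_column):
        # Query the table once; later keys come from the cached value.
        # Table and column names are interpolated directly, which is
        # acceptable only because they are fixed, trusted identifiers here.
        row = conn.execute(f"SELECT MAX({key_column}) FROM {table}").fetchone()
        self._max = row[0] if row[0] is not None else 0

    def next_key(self):
        # No database round trip: increment the cached maximum.
        self._max += 1
        return self._max


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_tk INTEGER, customer_nr TEXT)")
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?)",
    [(1, "C-001"), (2, "C-002")],
)

keys = MaxPlusOneKeyGenerator(conn, "dim_customer", "customer_tk")
first_key = keys.next_key()
second_key = keys.next_key()
```

Because the maximum is read only once, a scheme like this is safe only while a single writer is generating keys for the table; a database sequence or auto increment field avoids that restriction.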

=== Versioning tab
This tab defines how the record's version is generated.
[width="90%",options="header"]
|===
|Option|Description
|Version field|The name of the field in which to store the version (revision number).
|Stream Datefield|If you have the date at which the dimension entry was last changed, you can specify the name of that field here.
This allows the date range of the dimension entry to be set accurately.
If you don't have such a date, the system date is used.
When dimension entries are looked up (Update the dimension is not selected), the value of the stream date field is used to select the appropriate dimension version, based on the start and end of the date range in the dimension record.
|Date range start field|The name of the dimension field that holds the start of the date range.
|Use an alternative start date? a|When enabled, you can choose an alternative to the "Min. Year"/01/01 00:00:00 date that is used.
You can use any of the following:

* System date: Use the system date as a variable date/time
* Start date of pipeline: Use the system date, taken at start of the pipeline for the start date
* Empty (null) value
* Column value: Select a column from which to take the value.

|Table date range end|The name of the dimension field that holds the end of the date range.
|===
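
To illustrate how the stream date field selects a dimension version during a lookup, here is a small Python sketch. It is an assumption-laden illustration rather than the transform's logic: the row layout, the `lookup_technical_key` function and the sample data are hypothetical, the date range is assumed to be inclusive at the start and exclusive at the end, and 0 is assumed as the technical key of the "not found" entry.

```python
from datetime import datetime

# Hypothetical dimension rows:
# (technical key, natural key, version, date_from, date_to)
DIM_ROWS = [
    (1, "C-001", 1, datetime(1900, 1, 1), datetime(2020, 6, 1)),
    (2, "C-001", 2, datetime(2020, 6, 1), datetime(2199, 12, 31, 23, 59, 59)),
]

# Assumed technical key of the "not found" dimension entry.
NOT_FOUND_TK = 0


def lookup_technical_key(natural_key, stream_date, rows=DIM_ROWS):
    """Return the technical key of the version valid at stream_date."""
    for tk, key, _version, date_from, date_to in rows:
        # Assumed half-open range: date_from <= stream_date < date_to.
        if key == natural_key and date_from <= stream_date < date_to:
            return tk
    return NOT_FOUND_TK


old_version_tk = lookup_technical_key("C-001", datetime(2019, 1, 1))
new_version_tk = lookup_technical_key("C-001", datetime(2021, 1, 1))
missing_tk = lookup_technical_key("C-999", datetime(2021, 1, 1))
```

The same natural key resolves to different technical keys depending on the date carried in the stream, which is what lets fact rows point at the version of the dimension entry that was valid at the time.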

== General considerations
As a result of the lookup or update operation of this transform type, a field is added to the stream containing the technical key of the dimension.
If the dimension entry is not found, the technical key of the "not found" entry (0 or 1, depending on the type of database) is returned.

@@ -58,7 +128,7 @@ These are the optional fields:

As the name of the transform suggests, the functionality of the transform falls into 2 categories, Lookup and Update...

== Lookup
=== Lookup

In read-only mode (update option is disabled), the transform only performs lookups in a slowly changing dimension.
The transform will perform a lookup in the dimension table on the specified database connection and in the specified schema.
@@ -90,7 +160,7 @@ Stay away from those.
We recommend you use sane data types like Integer or long integers.
Stay away from Double, Decimal or catch-all data types like Oracle's Number (without length or precision; it implicitly uses precision 38 causing us to use the slower BigNumber data type).

== Update
=== Update

In update mode (update option is enabled) the transform first performs a lookup of the dimension entry as described in the "Lookup" section above.
The result of the lookup is different though.
@@ -112,58 +182,6 @@ The result can be one of the following situations:
** select min(date_from) from dim_table where date_to = "2199-12-31 23:59:59.999"
** It is important to ensure that the incoming rows are sorted by the "Stream date field"

== Options

|===
|Option|Description
|transform name|Name of the transform.
|Update the dimension?|Enable to update the dimension based on the information in the input stream; if not enabled, the dimension only performs lookups and adds the technical key field to the streams.
|Connection|Name of the database connection on which the dimension table resides.
|Target schema|This allows you to specify a schema name.
|Target table|Name of the dimension table.
|Commit size|Define the commit size, e.g. setting commit size to 10 generates a commit every 10 inserts or updates.
|Caching a|
* Enable the cache?
Enable this option if you want to enable data caching in this transform; set a cache size of >=0 in previous versions or -1 to disable caching.
* Pre-load cache?
You can enhance performance by reading the complete contents of a dimension table prior to performing lookups.
Performance is increased by the elimination of the round trips to the database and by the sorted list lookup algorithm.
* Cache size in rows: The cache size in number of rows that will be held in memory to speed up lookups by reducing the number of round trips to the database.
|Keys tab|Specify the names of the keys in the stream and in the dimension table.
This will enable the transform to perform the lookup.
|Fields tab|For each of the fields you must have in the dimension, you can specify whether you want the values to be updated (for all versions, this is a Type I operation) or you want to have the values inserted into the dimension as a new version.
In the example we used in the screenshot the birth date is something that's not variable in time, so if the birth date changes, it means that it was wrong in previous versions.
It's only logical then, that the previous values are corrected in all versions of the dimension entry.
|Technical key field|The primary key of the dimension; also referred to as Surrogate Key.
Use the new name option to rename the technical key after a lookup.
For example, if you need to lookup different types of products like ORIGINAL_PRODUCT_TK, REPLACEMENT_PRODUCT_TK, ...
|Creation of technical key a|Indicates how the technical key is generated, options that are not available for your connection type will be grayed out:

* Use table maximum + 1: A new technical key will be created from the maximum key in the table.
Note that the new maximum is always cached, so that the maximum does not need to be calculated for each new row.
* Use sequence: Specify the sequence name if you want to use a database sequence on the table connection to generate the technical key (typical for Oracle e.g.).
* Use auto increment field: Use an auto increment field in the database table to generate the technical key (supported e.g. by DB2).
|Version field|The name of the field in which to store the version (revision number).
|Stream Datefield|If you have the date at which the dimension entry was last changed, you can specify the name of that field here.
It allows the dimension entry to be accurately described for what the date range concerns.
If you don't have such a date, the system date will be taken.
When the dimension entries are looked up (Update the dimension is not selected) the date field entered into the stream datefield is used to select the appropriate dimension version based on the date from and date to dates in the dimension record.
|Date range start field|Specify the names of the dimension entries start range.
|Use an alternative start date? a|When enabled, you can choose an alternative to the "Min.
Year"/01/01 00:00:00 date that is used.
You can use any of the following:

* System date: Use the system date as a variable date/time
* Start date of pipeline: Use the system date, taken at start of the pipeline for the start date
* Empty (null) value
* Column value: Select a column from which to take the value.
\\\\

|Table date range end|The names of the dimension entries end range
|Get Fields button|Fills in all the available fields on the input stream, except for the keys you specified.
|SQL button|Generates the SQL to build the dimension and allows you to execute this SQL.
|===

== Metadata Injection Support

All fields of this transform support metadata injection.
@@ -29,7 +29,7 @@ The Execute Unit Tests transform fetches and executes the available xref:pipelin
[width="90%",options="header"]
|===
|Option|Description
|transform name|name for this transform
|Transform name|name for this transform
|Test name input field|name of a field to get the unit test name from to determine which transforms to execute.
This option is only available when the transform receives input.
|Type of tests to run|Development or Unit Test
@@ -39,7 +39,7 @@ Using 0 rows for 'limit scanned rows' is a way to make sure the entire file is s
[width="90%",options="header"]
|===
|Option|Description
|transform name|the name for this transform
|Transform name|the name for this transform
|filename|the filename to scan for metadata
|limit scanned rows|the number of rows to limit the scan to (default 10,000).
Use 0 rows to scan the entire file.
@@ -31,7 +31,7 @@ Every time a file gets processed, used or created in a pipeline or a workflow, t
[width="90%",options="header"]
|===
|Option|Description
|transform name|The unique transform name within the pipeline
|Transform name|The unique transform name within the pipeline
|===

== Output fields
@@ -37,7 +37,7 @@ WARNING: the Files To Result is available for historical reasons. Check the xref
[width="90%",options="header"]
|===
|Option|Description
|transform name|The name of this transform as it appears in the pipeline workspace.
|Transform name|The name of this transform as it appears in the pipeline workspace.
|Filename field|Field that contains the filenames of the files to copy.
|Type of file to|Select the type of file to set in results.
|===
@@ -37,7 +37,7 @@ TIP: Lists also works on numeric values like integers. In this case, the list of
[width="90%",options="header"]
|===
|Option|Description
|transform name|Optionally, you can change the name of this transform to fit your needs.
|Transform name|Optionally, you can change the name of this transform to fit your needs.
|Send 'true' data to transform|The rows for which the condition specified is true are sent to this transform
|Send 'false' data to transform|The rows for which the condition specified is false are sent to this transform
|The Condition|
@@ -33,7 +33,7 @@ This transform returns matching values as a separated list as specified by user-
[width="90%",options="header"]
|===
|Option|Description
|transform name|Name of this transform as it appears in the pipeline workspace
|Transform name|Name of this transform as it appears in the pipeline workspace
|Lookup transform|Identifies the transform that contains the fields to match
|Lookup field|Identifies the field to match
|Main stream field|Identifies the primary stream to match the Lookup field with
@@ -41,7 +41,7 @@ The Get Files Row Count transform counts the number of rows in a file or set of
|Rows|
|Selected
|Show|
|transform name|
|Transform name|
|===

=== Content tab
@@ -60,7 +60,7 @@ The last value returned plus the size of the range (increment) are stored in a d
[width="90%",options="header"]
|===
|Option|Description
|transform name|The name of this transform as it appears in the pipeline workspace.
|Transform name|The name of this transform as it appears in the pipeline workspace.
|Name of value|The name of the (Integer type) output field (sequence or ID)
|Slave server|The hop server to get the unique ID range from.
This can be specified using a variable
@@ -47,7 +47,7 @@ Use the provided button in the graph model editor to generate the Neo4j Index an

|===
|Option |Description
|transform name|the name for this transform in the pipeline
|Transform name|the name for this transform in the pipeline
|Neo4j connection|the Neo4j connection to write the graph to
|Graph model|the xref:metadata-types/neo4j/neo4j-graphmodel.adoc[Neo4j graph model] to use
|Batch size (rows)|batch size to use for writing data to Neo4j
@@ -29,7 +29,7 @@ The Row Flattener transform allows you to flatten data sequentially.
[width="90%",options="header"]
|===
|Option|Description
|transform name|Name of the transform; this name has to be unique in a single pipeline
|Transform name|Name of the transform; this name has to be unique in a single pipeline
|The field to flatten|The field that must be flattened into different target fields
|Target fields|The name of the target field to which the field is flattened
|===
