1 change: 1 addition & 0 deletions docs/hop-user-manual/modules/ROOT/nav.adoc
@@ -284,6 +284,7 @@ under the License.
*** xref:pipeline/transforms/xmlinputstream.adoc[XML Input Stream (StAX)]
*** xref:pipeline/transforms/xmljoin.adoc[XML Join]
*** xref:pipeline/transforms/xmloutput.adoc[XML Output]
*** xref:pipeline/transforms/xmloutputadvanced.adoc[XML Output (Advanced)]
*** xref:pipeline/transforms/xsdvalidator.adoc[XSD Validator]
*** xref:pipeline/transforms/xslt.adoc[XSL Transformation]
*** xref:pipeline/transforms/yamlinput.adoc[Yaml Input]
@@ -242,6 +242,7 @@ The pages nested under this topic contain information on how to use the transfor
* xref:pipeline/transforms/xmlinputstream.adoc[XML Input Stream (StAX)]
* xref:pipeline/transforms/xmljoin.adoc[XML Join]
* xref:pipeline/transforms/xmloutput.adoc[XML Output]
* xref:pipeline/transforms/xmloutputadvanced.adoc[XML Output (Advanced)]
* xref:pipeline/transforms/xsdvalidator.adoc[XSD Validator]
* xref:pipeline/transforms/xslt.adoc[XSL Transformation]
* xref:pipeline/transforms/yamlinput.adoc[Yaml Input]
@@ -0,0 +1,186 @@
////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
////
:documentationPath: /pipeline/transforms/
:language: en_US
:description: The XML Output (Advanced) transform builds hierarchical XML from input rows, with optional write-to-file, XML-as-field output, splits, and XSD generation.

= image:transforms/icons/AXO.svg[XML Output (Advanced) transform Icon, role="image-doc-icon"] XML Output (Advanced)

[%noheader,cols="3a,1a", role="table-no-borders" ]
|===
|
== Description

The XML Output (Advanced) transform builds XML from input rows using a hierarchical, user-defined tree. You can *write to file*, *append the document as a string field* (for use by a later transform), or *both*. File-oriented options apply only when a file is written.

The XML tree is a recursive structure of elements, attributes and document-fragment nodes. Exactly one element in the tree must be marked as the row-*loop*: each input row produces one occurrence of that element with its full subtree. Optionally, ancestors of the loop can be marked as *group-by*: consecutive input rows that share the same group key are emitted under a single occurrence of the group element.

This transform complements the simpler `XML Output` transform. Use `XML Output` for a flat document of repeating rows; use `XML Output (Advanced)` when you need a deeper, custom-shaped XML structure (loops nested inside groups, attributes at any level, document fragments, namespaces, schema generation).

|
== Supported Engines
[%noheader,cols="2,1a",frame=none, role="table-supported-engines"]
!===
!Hop Engine! image:check_mark.svg[Supported, 24]
!Spark! image:cross.svg[Not Supported, 24]
!Flink! image:cross.svg[Not Supported, 24]
!Dataflow! image:cross.svg[Not Supported, 24]
!===
|===

== Options

The dialog is organized into three tabs: *File*, *Content* and *XML Tree*.

=== File tab

[options="header"]
|===
|Option|Description
|Transform name|Name of the transform.
|Output|Where to send the XML: *Write to file*, *Output XML as field*, or *Write to file and output XML as field* (both). Stored in pipeline XML as codes `writetofile`, `outputvalue`, and `both`.
|XML output field|Name of the field that receives the completed XML document (one value per split when splitting is enabled). Used when *Output* is *Output XML as field* or *both*.
|Include input fields in output|When *Output* includes an XML field: if enabled (default), each emitted row contains all input fields plus the XML field; if disabled, only the XML field is emitted, which gives a narrow stream that is convenient for chaining.
|Filename|Base name of the output XML file (without extension). VFS URIs are supported. Required when *Output* writes to a file.
|Extension|File extension (without the leading dot). Defaults to `xml`.
|Encoding|Character encoding for the output file. Defaults to `UTF-8`.
|Include transform copy number in filename|Append the transform copy number to the filename.
|Include date in filename|Append the system date (`yyyyMMdd`) to the filename.
|Include time in filename|Append the system time (`HHmmss`) to the filename.
|Specify custom date/time format|Use a custom date/time pattern instead of the date/time toggles above.
|Date/time format|Java `SimpleDateFormat` pattern, used when the custom format toggle is on.
|Split every N rows|Maximum number of rows per output file before rolling over to a new file. When *Output* includes an XML field, it is also the maximum number of rows per completed XML document in that field. `0` disables splitting.
|Zip output file|Wrap each output file in a zip archive (one entry per file). Generated XSDs are written next to the archive, not inside it.
|Do not open new file at start|Defer file creation until the first input row is received.
|Do not create file if no rows|Delete the output file at the end of the run if no rows were ever written.
|Add filename to result|Add the produced file(s) to the pipeline's result file list (only after at least one row is written).
|Show file name(s) ...|Opens a list of sample filenames built from the current settings.
|===
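
The *Split every N rows* rule can be pictured as simple chunking of the row stream. The Python sketch below is illustrative only and is not the transform's implementation; `split_rows` and its parameter names are hypothetical.

```python
from itertools import islice


def split_rows(rows, split_every):
    """Yield lists of rows, each holding at most `split_every` rows.

    Hypothetical helper: a value of 0 (or less) means "no splitting",
    so all rows end up in a single chunk.
    """
    it = iter(rows)
    if split_every <= 0:
        yield list(it)
        return
    while True:
        chunk = list(islice(it, split_every))
        if not chunk:
            return
        yield chunk


chunks = list(split_rows(range(5), 2))  # three chunks: two full, one partial
```

In this picture, each chunk corresponds to one output file, or to one value of the XML output field.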

=== Content tab

[options="header"]
|===
|Option|Description
|Compact|Suppress whitespace and EOL between elements; useful for byte-size-sensitive output.
|Blank line after XML declaration|Add a blank line right after the `<?xml ?>` declaration.
|Emit empty elements|Emit an open/close tag pair for an element that has no value and no children.
|Emit attribute when value is null|Emit an attribute even when its source value is `null`.
|Emit attribute when no field is mapped|Emit an attribute that has no mapped field, using its default value.
|Trim leading/trailing whitespace|Trim text values before emitting them.
|Default decimal separator|Default decimal separator for numeric values; per-node settings still take precedence.
|Default grouping separator|Default grouping separator for numeric values; per-node settings still take precedence.
|Generate sibling XSD file|Write a sibling `.xsd` schema next to each output file (or each split). The schema is derived from the configured XML tree and the upstream row metadata.
|DOCTYPE root element / system / public identifier|Emit a `<!DOCTYPE ...>` declaration between the XML declaration and the root element.
|XSL stylesheet href / type|Emit an `<?xml-stylesheet ?>` processing instruction. Type defaults to `text/xsl` when blank.
|===

=== XML Tree tab

The XML Tree tab is the visual designer for the output structure. The left pane lists the input fields received from the previous transform; the right pane is split between the target tree (top) and the property pane (bottom) for the currently selected node.

==== Working with the tree

* Click *Get fields* to (re)load the input fields from the previous transform.
* Drag a field from the left pane and drop it onto an element in the tree. A new child element is created with that field name and `mappedField` pre-filled.
* Use the toolbar above the tree (or the right-click menu) to:
** *+ Element* / *+ Attribute* / *+ Fragment*: add a child node of the chosen kind under the selected element.
** *Delete*: remove the selected node and its descendants (the root cannot be deleted).
** *Up* / *Down*: reorder the selected node among its siblings.
** *Loop*: toggle the loop flag. Exactly one element in the tree must carry it; turning the flag on for a different node automatically clears it from the node that had it.
** *Group-by*: toggle the group-by flag on an ancestor of the loop element.
* Selecting a node populates the *Properties* form below the tree. Edits propagate to the model immediately.

==== Node properties

[options="header"]
|===
|Property|Description
|Name|Local name of the element or attribute.
|Namespace URI|Optional XML namespace URI. When set on the root element, it becomes the default namespace and is also written into the generated XSD as the `targetNamespace`.
|Kind|`Element`, `Attribute`, or `DocumentFragment`. The latter parses the source field's value and inserts it as XML nodes rather than escaped text.
|Mapped field|Input field whose value provides this node's content. For attributes and elements it sets the value; for nodes flagged `Group-by`, it identifies the group key only.
|Default value|Static text used when `Mapped field` is empty (or its value is `null`).
|Format / Length / Precision / Currency / Decimal / Grouping|Per-node value-meta overrides used when converting the field value to a string. Per-node settings take precedence over the global *Default decimal/grouping separator*.
|Loop|Marks this element as the row-loop element. Exactly one element must carry the flag.
|Group-by|Marks this element as a group-by ancestor of the loop. Consecutive rows with equal `Mapped field` values share a single occurrence.
|Force create|Output this node even when the value is `null` (uses the default value when set).
|Remove outer wrapper (duplicate parent tag)|For `DocumentFragment` nodes only: when the fragment's root element repeats the parent element name, strip that outer wrapper so the inner XML is inserted without a duplicated wrapper (for example when feeding XML from an upstream XML Output (Advanced) into a child fragment node).
|===

== Chaining and output-to-field

When *Output* is *Output XML as field* or *both*, the transform adds the configured *XML output field* to the stream for each completed document (or each split). A second XML Output (Advanced) transform can map that field with a *DocumentFragment* node. Use *Remove outer wrapper* on the fragment if the inner XML already has a root tag that would duplicate the parent element in the target tree.
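
The effect of *Remove outer wrapper* can be sketched in Python with `xml.etree.ElementTree`. This is a minimal illustration under assumptions, not the transform's code: `insert_fragment` is a hypothetical helper, and the sketch simply drops any attributes or text on the stripped wrapper.

```python
import xml.etree.ElementTree as ET


def insert_fragment(parent, fragment_xml, remove_outer_wrapper=False):
    """Parse `fragment_xml` and insert it under `parent` as XML nodes.

    Assumption for this sketch: when the fragment's root tag repeats the
    parent's tag and the option is on, only the fragment's children are
    inserted, so the wrapper is not duplicated.
    """
    fragment_root = ET.fromstring(fragment_xml)
    if remove_outer_wrapper and fragment_root.tag == parent.tag:
        parent.extend(list(fragment_root))  # children only, wrapper dropped
    else:
        parent.append(fragment_root)        # whole fragment, wrapper kept
    return parent


fragment = "<order><item>foo</item></order>"

stripped = insert_fragment(ET.Element("order"), fragment, remove_outer_wrapper=True)
nested = insert_fragment(ET.Element("order"), fragment)

stripped_xml = ET.tostring(stripped, encoding="unicode")
nested_xml = ET.tostring(nested, encoding="unicode")
```

With the option on, `stripped_xml` contains a single `order` element; with it off, `nested_xml` contains an `order` nested inside another `order`.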

== Group-by behaviour

For the group-by mechanism to collapse correctly, *the input rows must already be sorted by the group-by key(s)*. Use a Sort Rows transform upstream if needed. When the key changes, the open group element is closed and a new one is opened with the new key.
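
Why sorting matters can be seen in a small Python sketch. `itertools.groupby` also groups only consecutive equal keys, so it stands in here for the behaviour described above; `count_groups` is a hypothetical helper, not part of Hop.

```python
from itertools import groupby


def count_groups(rows, key_index=0):
    """Count runs of consecutive rows that share the same key."""
    return sum(1 for _key, _run in groupby(rows, key=lambda row: row[key_index]))


sorted_rows = [(1, "foo"), (1, "bar"), (2, "baz")]
unsorted_rows = [(1, "foo"), (2, "baz"), (1, "bar")]

sorted_groups = count_groups(sorted_rows)      # 2 groups: key 1, then key 2
unsorted_groups = count_groups(unsorted_rows)  # 3 groups: key 1 is opened twice
```

With unsorted input, the same key produces more than one group element, which is usually not the intended document shape.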

== XSD generation

When *Generate sibling XSD file* is enabled, the transform writes a `.xsd` schema next to each output file (or split). The schema:

* declares one global element matching the root of the configured tree;
* nests complex types corresponding to elements with children or attributes;
* sets `maxOccurs="unbounded"` on the loop element and on every group-by ancestor;
* renders attributes as `xs:attribute` declarations (with `use="required"` when the source node has *Force create* enabled);
* renders document-fragment nodes as `<xs:any processContents="skip"/>` placeholders;
* maps Hop value types to XSD built-ins as follows: integer → `xs:long`, number/big-number → `xs:decimal`, date/timestamp → `xs:dateTime`, boolean → `xs:boolean`, binary → `xs:base64Binary`, everything else → `xs:string`;
* uses the root node's namespace as the schema's `targetNamespace` (and `elementFormDefault="qualified"`) when set.

The XSD is written outside zip archives and is added to the pipeline's result file list when *Add filename to result* is enabled.
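
The type mapping listed above can be written as a small lookup table. The Python sketch below only restates that list; the lower-case type names used as keys and the `xsd_type` helper are assumptions made for illustration.

```python
# Mapping taken from the list above. The key spelling is an assumption.
XSD_TYPES = {
    "integer": "xs:long",
    "number": "xs:decimal",
    "bignumber": "xs:decimal",
    "date": "xs:dateTime",
    "timestamp": "xs:dateTime",
    "boolean": "xs:boolean",
    "binary": "xs:base64Binary",
}


def xsd_type(hop_type):
    """Return the XSD built-in for a Hop value type name, defaulting to xs:string."""
    return XSD_TYPES.get(hop_type.lower(), "xs:string")


integer_type = xsd_type("Integer")
fallback_type = xsd_type("String")
```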

== Memory profile

The transform uses StAX streaming and only buffers the XML state of the currently open path of group elements. Memory use therefore scales with the largest single group, O(largest group), rather than with the whole document, O(document).
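
The streaming idea can be sketched in Python with the standard `xml.sax.saxutils.XMLGenerator`, which writes each tag as soon as it is emitted. This is a simplified analogue, not the transform's StAX code: `stream_orders` is hypothetical, it keeps only the key of the open group, and it approximates the `0.00` format with Python's `.2f`.

```python
import io
from xml.sax.saxutils import XMLGenerator


def stream_orders(rows, out):
    """Write (orderId, itemName, price) rows, sorted by orderId, as XML."""
    gen = XMLGenerator(out, encoding="UTF-8")
    gen.startDocument()
    gen.startElement("orders", {})
    open_key = None  # the only per-group state this sketch keeps
    for order_id, item_name, price in rows:
        if order_id != open_key:
            if open_key is not None:
                gen.endElement("order")  # key changed: close the open group
            gen.startElement("order", {"id": str(order_id)})
            open_key = order_id
        gen.startElement("item", {})
        gen.startElement("name", {})
        gen.characters(item_name)
        gen.endElement("name")
        gen.startElement("price", {})
        gen.characters(f"{price:.2f}")  # stand-in for the 0.00 format
        gen.endElement("price")
        gen.endElement("item")
    if open_key is not None:
        gen.endElement("order")  # close the last group
    gen.endElement("orders")
    gen.endDocument()


buf = io.StringIO()
stream_orders([(1, "foo", 1.5), (1, "bar", 2.0), (2, "baz", 3.25)], buf)
xml_text = buf.getvalue()
```

Rows that have already been written are never revisited, so nothing outside the open group has to stay in memory.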

== Example: orders with grouped items

Input rows (already sorted by `orderId`):

[options="header"]
|===
|orderId|itemName|price
|1|foo|1.50
|1|bar|2.00
|2|baz|3.25
|===

Tree:

* `orders` (root, element)
** `order` (element, group-by, mapped field = `orderId`)
*** `id` (attribute, mapped field = `orderId`)
*** `item` (element, **loop**)
**** `name` (element, mapped field = `itemName`)
**** `price` (element, mapped field = `price`, format = `0.00`)

Output:

[source,xml]
----
<?xml version="1.0" encoding="UTF-8"?>
<orders>
  <order id="1">
    <item><name>foo</name><price>1.50</price></item>
    <item><name>bar</name><price>2.00</price></item>
  </order>
  <order id="2">
    <item><name>baz</name><price>3.25</price></item>
  </order>
</orders>
----