apache · hansva · May 18, 2026 · May 8, 2026 · May 8, 2026 · May 8, 2026
diff --git a/docs/hop-user-manual/modules/ROOT/nav.adoc b/docs/hop-user-manual/modules/ROOT/nav.adoc
@@ -284,6 +284,7 @@ under the License.
 *** xref:pipeline/transforms/xmlinputstream.adoc[XML Input Stream (StAX)]
 *** xref:pipeline/transforms/xmljoin.adoc[XML Join]
 *** xref:pipeline/transforms/xmloutput.adoc[XML Output]
+*** xref:pipeline/transforms/xmloutputadvanced.adoc[XML Output (Advanced)]
 *** xref:pipeline/transforms/xsdvalidator.adoc[XSD Validator]
 *** xref:pipeline/transforms/xslt.adoc[XSL Transformation]
 *** xref:pipeline/transforms/yamlinput.adoc[Yaml Input]

diff --git a/docs/hop-user-manual/modules/ROOT/pages/pipeline/transforms.adoc b/docs/hop-user-manual/modules/ROOT/pages/pipeline/transforms.adoc
@@ -242,6 +242,7 @@ The pages nested under this topic contain information on how to use the transfor
 * xref:pipeline/transforms/xmlinputstream.adoc[XML Input Stream (StAX)]
 * xref:pipeline/transforms/xmljoin.adoc[XML Join]
 * xref:pipeline/transforms/xmloutput.adoc[XML Output]
+* xref:pipeline/transforms/xmloutputadvanced.adoc[XML Output (Advanced)]
 * xref:pipeline/transforms/xsdvalidator.adoc[XSD Validator]
 * xref:pipeline/transforms/xslt.adoc[XSL Transformation]
 * xref:pipeline/transforms/yamlinput.adoc[Yaml Input]

diff --git a/docs/hop-user-manual/modules/ROOT/pages/pipeline/transforms/xmloutputadvanced.adoc b/docs/hop-user-manual/modules/ROOT/pages/pipeline/transforms/xmloutputadvanced.adoc
@@ -0,0 +1,186 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+////
+:documentationPath: /pipeline/transforms/
+:language: en_US
+:description: The XML Output (Advanced) transform builds hierarchical XML from input rows, with optional write-to-file, XML-as-field output, splits, and XSD generation.
+
+= image:transforms/icons/AXO.svg[XML Output (Advanced) transform Icon, role="image-doc-icon"] XML Output (Advanced)
+
+[%noheader,cols="3a,1a", role="table-no-borders" ]
+|===
+|
+== Description
+
+The XML Output (Advanced) transform builds XML from input rows using a hierarchical, user-defined tree. You can *write to file*, *append the document as a string field* (for use by a later transform), or *both*. File-oriented options apply only when a file is written.
+
+The XML tree is a recursive structure of elements, attributes and document-fragment nodes. Exactly one element in the tree must be marked as the *loop* element: each input row produces one occurrence of that element, together with its full subtree. Optionally, ancestors of the loop element can be marked as *group-by*: consecutive input rows that share the same group key are emitted under a single occurrence of the group element.
+
+This transform complements the simpler `XML Output` transform. Use `XML Output` for a flat document of repeating rows; use `XML Output (Advanced)` when you need a deeper, custom-shaped XML structure (loops nested inside groups, attributes at any level, document fragments, namespaces, schema generation).
+
+|
+== Supported Engines
+[%noheader,cols="2,1a",frame=none, role="table-supported-engines"]
+!===
+!Hop Engine! image:check_mark.svg[Supported, 24]
+!Spark! image:cross.svg[Not Supported, 24]
+!Flink! image:cross.svg[Not Supported, 24]
+!Dataflow! image:cross.svg[Not Supported, 24]
+!===
+|===
+
+== Options
+
+The dialog is organized into three tabs: *File*, *Content* and *XML Tree*.
+
+=== File tab
+
+[options="header"]
+|===
+|Option|Description
+|Transform name|Name of the transform.
+|Output|Where to send the XML: *Write to file*, *Output XML as field*, or *Write to file and output XML as field* (both). Stored in pipeline XML as codes `writetofile`, `outputvalue`, and `both`.
+|XML output field|Name of the field that receives the completed XML document (one value per split when splitting is enabled). Used when *Output* is *Output XML as field* or *both*.
+|Include input fields in output|When *Output* includes an XML field: if enabled (default), each emitted row contains all input fields plus the XML field; if disabled, only the XML field is emitted (a narrow stream, useful for chaining).
+|Filename|Base name of the output XML file (without extension). VFS URIs are supported. Required when *Output* writes to a file.
+|Extension|File extension (without the leading dot). Defaults to `xml`.
+|Encoding|Character encoding for the output file. Defaults to `UTF-8`.
+|Include transform copy number in filename|Append the transform copy number to the filename.
+|Include date in filename|Append the system date (`yyyyMMdd`) to the filename.
+|Include time in filename|Append the system time (`HHmmss`) to the filename.
+|Specify custom date/time format|Use a custom date/time pattern instead of the date/time toggles above.
+|Date/time format|Java `SimpleDateFormat` pattern, used when the custom format toggle is on.
+|Split every N rows|Maximum number of rows per output file before a new split file is started. When *Output* includes an XML field, it also limits the number of rows in each XML document emitted to that field. `0` disables splitting.
+|Zip output file|Wrap each output file in a zip archive (one entry per file). Generated XSDs are written next to the archive, not inside it.
+|Do not open new file at start|Defer file creation until the first input row is received.
+|Do not create file if no rows|Delete the output file at the end of the run if no rows were ever written.
+|Add filename to result|Add the produced file(s) to the pipeline's result file list (only after at least one row is written).
+|Show file name(s) ...|Shows a list of sample filenames built from the current settings.
+|===
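+
+The following Java sketch illustrates how the filename options above could combine into one file name. It is not the transform's code: the class `FilenameSketch`, its parameters, the order of the parts, the `_` separators and the zero-based split numbering are assumptions made for illustration. Only the default `yyyyMMdd` and `HHmmss` patterns and the `xml` default extension come from the table.
+
+[source,java]
+----
+import java.text.SimpleDateFormat;
+import java.util.Date;
+
+// Hypothetical sketch: part order, "_" separators and zero-based split numbers are assumptions.
+final class FilenameSketch {
+
+  static String buildFilename(String base, String extension, int copyNr, boolean includeCopyNr,
+      boolean includeDate, boolean includeTime, String customFormat, int splitEvery, int splitNr,
+      Date now) {
+    StringBuilder name = new StringBuilder(base);
+    if (customFormat != null && !customFormat.isEmpty()) {
+      // A custom SimpleDateFormat pattern replaces the separate date and time toggles.
+      name.append('_').append(new SimpleDateFormat(customFormat).format(now));
+    } else {
+      if (includeDate) {
+        name.append('_').append(new SimpleDateFormat("yyyyMMdd").format(now));
+      }
+      if (includeTime) {
+        name.append('_').append(new SimpleDateFormat("HHmmss").format(now));
+      }
+    }
+    if (includeCopyNr) {
+      name.append('_').append(copyNr);
+    }
+    if (splitEvery > 0) {
+      name.append('_').append(splitNr); // one number per split file
+    }
+    String ext = (extension == null || extension.isEmpty()) ? "xml" : extension; // default "xml"
+    return name.append('.').append(ext).toString();
+  }
+
+  /** Zero-based split number for a zero-based row index; always 0 when splitting is off. */
+  static long splitNumber(long rowIndex, int splitEvery) {
+    return splitEvery > 0 ? rowIndex / splitEvery : 0;
+  }
+
+  public static void main(String[] args) {
+    Date now = new Date();
+    long split = splitNumber(2500, 1000); // row 2500 falls into split number 2
+    System.out.println(
+        buildFilename("orders", "xml", 0, true, true, false, null, 1000, (int) split, now));
+  }
+}
+----
+
+Passing the date in as a parameter keeps the sketch deterministic; a real run would use the system date, as the table describes.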
+
+=== Content tab
+
+[options="header"]
+|===
+|Option|Description
+|Compact|Suppress whitespace and line breaks between elements; useful when output size matters.
+|Blank line after XML declaration|Add a blank line right after the `<?xml ?>` declaration.
+|Emit empty elements|Emit an open/close tag pair for an element that has no value and no children.
+|Emit attribute when value is null|Emit an attribute even when its source value is `null`.
+|Emit attribute when no field is mapped|Emit an attribute that has no mapped field, using its default value.
+|Trim leading/trailing whitespace|Trim text values before emitting them.
+|Default decimal separator|Default decimal separator for numeric values; per-node settings still take precedence.
+|Default grouping separator|Default grouping separator for numeric values; per-node settings still take precedence.
+|Generate sibling XSD file|Write a sibling `.xsd` schema next to each output file (or each split). The schema is derived from the configured XML tree and the upstream row metadata.
+|DOCTYPE root element / system / public identifier|Emit a `<!DOCTYPE ...>` declaration between the XML declaration and the root element.
+|XSL stylesheet href / type|Emit an `<?xml-stylesheet ?>` processing instruction. Type defaults to `text/xsl` when blank.
+|===
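+
+As a hedged illustration of the DOCTYPE and stylesheet options, the sketch below builds both declarations and writes them with the JDK's StAX `XMLStreamWriter` (`writeDTD` and `writeProcessingInstruction`). The helper names, the `orders.dtd` and `orders.xsl` values, and the placement of the stylesheet instruction after the DOCTYPE are hypothetical; the transform may assemble its prolog differently.
+
+[source,java]
+----
+import java.io.StringWriter;
+import javax.xml.stream.XMLOutputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.XMLStreamWriter;
+
+// Hypothetical sketch of the prolog options; not the transform's implementation.
+final class PrologSketch {
+
+  /** Builds a DOCTYPE; XML requires a system literal after a public identifier. */
+  static String doctype(String rootElement, String systemId, String publicId) {
+    StringBuilder sb = new StringBuilder("<!DOCTYPE ").append(rootElement);
+    if (publicId != null && !publicId.isEmpty()) {
+      sb.append(" PUBLIC \"").append(publicId).append("\" \"")
+          .append(systemId == null ? "" : systemId).append('"');
+    } else if (systemId != null && !systemId.isEmpty()) {
+      sb.append(" SYSTEM \"").append(systemId).append('"');
+    }
+    return sb.append('>').toString();
+  }
+
+  /** Pseudo-attributes for xml-stylesheet; the type falls back to text/xsl when blank. */
+  static String stylesheetData(String href, String type) {
+    String effectiveType = (type == null || type.trim().isEmpty()) ? "text/xsl" : type;
+    return "type=\"" + effectiveType + "\" href=\"" + href + "\"";
+  }
+
+  public static void main(String[] args) throws XMLStreamException {
+    StringWriter out = new StringWriter();
+    XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
+    w.writeStartDocument("UTF-8", "1.0");
+    w.writeDTD(doctype("orders", "orders.dtd", null)); // written verbatim by StAX
+    w.writeProcessingInstruction("xml-stylesheet", stylesheetData("orders.xsl", ""));
+    w.writeStartElement("orders");
+    w.writeEndElement();
+    w.writeEndDocument();
+    w.flush();
+    System.out.println(out);
+  }
+}
+----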
+
+=== XML Tree tab
+
+The XML Tree tab is the visual designer for the output structure. The left pane lists the input fields received from the previous transform; the right pane is split between the target tree (top) and the property pane (bottom) for the currently selected node.
+
+==== Working with the tree
+
+* Click *Get fields* to (re)load the input fields from the previous transform.
+* Drag a field from the left pane and drop it onto an element in the tree. A new child element is created with that field's name and its *Mapped field* property pre-filled.
+* Use the toolbar above the tree (or the right-click menu) to:
+** *+ Element* / *+ Attribute* / *+ Fragment*: add a child node of the chosen kind under the selected element.
+** *Delete*: remove the selected node and its descendants (the root cannot be deleted).
+** *Up* / *Down*: reorder the selected node among its siblings.
+** *Loop*: toggle the loop flag on the selected element. Exactly one element in the tree must carry it; setting the flag on another element automatically clears it from the previous one.
+** *Group-by*: toggle the group-by flag on an ancestor of the loop element.
+* Selecting a node populates the *Properties* form below the tree. Edits propagate to the model immediately.
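+
+The loop and group-by rules above can be expressed as a small validation pass. This Java sketch uses a hypothetical `TreeSketch.Node` model (not Hop's metadata classes) and checks that exactly one loop element exists and that every group-by node is an element on the path from the root to the loop.
+
+[source,java]
+----
+import java.util.ArrayList;
+import java.util.List;
+
+// Hypothetical tree model used only to illustrate the loop/group-by rules.
+final class TreeSketch {
+  enum Kind { ELEMENT, ATTRIBUTE, DOCUMENT_FRAGMENT }
+
+  static final class Node {
+    final String name;
+    final Kind kind;
+    boolean loop;
+    boolean groupBy;
+    final List<Node> children = new ArrayList<>();
+
+    Node(String name, Kind kind) {
+      this.name = name;
+      this.kind = kind;
+    }
+
+    /** Appends a child and returns it, so trees can be built fluently. */
+    Node add(Node child) {
+      children.add(child);
+      return child;
+    }
+  }
+
+  /** Returns the problems found; an empty list means the loop/group-by rules hold. */
+  static List<String> validate(Node root) {
+    List<String> problems = new ArrayList<>();
+    List<List<Node>> loopPaths = new ArrayList<>();
+    collectLoopPaths(root, new ArrayList<>(), loopPaths);
+    if (loopPaths.size() != 1) {
+      problems.add("expected exactly one loop element, found " + loopPaths.size());
+      return problems;
+    }
+    List<Node> path = loopPaths.get(0);
+    Node loop = path.get(path.size() - 1);
+    if (loop.kind != Kind.ELEMENT) {
+      problems.add("loop node '" + loop.name + "' is not an element");
+    }
+    // Every node above the loop on the root-to-loop path is an ancestor of the loop.
+    checkGroupBy(root, path.subList(0, path.size() - 1), problems);
+    return problems;
+  }
+
+  private static void collectLoopPaths(Node node, List<Node> path, List<List<Node>> out) {
+    path.add(node);
+    if (node.loop) {
+      out.add(new ArrayList<>(path));
+    }
+    for (Node child : node.children) {
+      collectLoopPaths(child, path, out);
+    }
+    path.remove(path.size() - 1);
+  }
+
+  private static void checkGroupBy(Node node, List<Node> loopAncestors, List<String> problems) {
+    if (node.groupBy && (node.kind != Kind.ELEMENT || !loopAncestors.contains(node))) {
+      problems.add("group-by node '" + node.name + "' is not an ancestor element of the loop");
+    }
+    for (Node child : node.children) {
+      checkGroupBy(child, loopAncestors, problems);
+    }
+  }
+
+  public static void main(String[] args) {
+    Node orders = new Node("orders", Kind.ELEMENT); // mirrors the orders example on this page
+    Node order = orders.add(new Node("order", Kind.ELEMENT));
+    order.groupBy = true;
+    order.add(new Node("id", Kind.ATTRIBUTE));
+    Node item = order.add(new Node("item", Kind.ELEMENT));
+    item.loop = true;
+    System.out.println(validate(orders)); // prints []
+  }
+}
+----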
+
+==== Node properties
+
+[options="header"]
+|===
+|Property|Description
+|Name|Local name of the element or attribute.
+|Namespace URI|Optional XML namespace URI. When set on the root element, it becomes the default namespace and is also written into the generated XSD as the `targetNamespace`.
+|Kind|`Element`, `Attribute`, or `DocumentFragment`. A `DocumentFragment` node parses the source field's value and inserts it as XML nodes rather than as escaped text.
+|Mapped field|Input field whose value provides this node's content. For attributes and elements it sets the value; for nodes flagged `Group-by`, it identifies the group key only.
+|Default value|Static text used when `Mapped field` is empty (or its value is `null`).
+|Format / Length / Precision / Currency / Decimal / Grouping|Per-node value-meta overrides used when converting the field value to a string. Per-node settings take precedence over the global *Default decimal/grouping separator*.
+|Loop|Marks this element as the row-loop element. Exactly one element must carry the flag.
+|Group-by|Marks this element as a group-by ancestor of the loop. Consecutive rows with equal `Mapped field` values share a single occurrence.
+|Force create|Output this node even when the value is `null` (uses the default value when set).
+|Remove outer wrapper (duplicate parent tag)|For `DocumentFragment` nodes only: when the fragment's root element repeats the parent element name, strip that outer wrapper so the inner XML is inserted without a duplicated wrapper (for example when feeding XML from an upstream XML Output (Advanced) into a child fragment node).
+|===
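+
+The precedence between per-node and global separators can be sketched with `java.text.DecimalFormat`. This is an assumption about how the settings interact, based on the table above; the `NumberFormatSketch` class and its parameters are hypothetical, and Hop's own value-meta conversion handles more cases (currency, length, precision).
+
+[source,java]
+----
+import java.text.DecimalFormat;
+import java.text.DecimalFormatSymbols;
+import java.util.Locale;
+
+// Hypothetical sketch of separator precedence; not Hop's value-meta code.
+final class NumberFormatSketch {
+
+  /** Formats a number; per-node separators win over the global defaults when set. */
+  static String format(double value, String pattern, String nodeDecimal, String nodeGrouping,
+      String defaultDecimal, String defaultGrouping) {
+    DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.ROOT); // '.' and ','
+    String decimal = firstNonEmpty(nodeDecimal, defaultDecimal);
+    String grouping = firstNonEmpty(nodeGrouping, defaultGrouping);
+    if (decimal != null) {
+      symbols.setDecimalSeparator(decimal.charAt(0));
+    }
+    if (grouping != null) {
+      symbols.setGroupingSeparator(grouping.charAt(0));
+    }
+    // The pattern always uses '.' and ','; the symbols decide which characters are printed.
+    return new DecimalFormat(pattern, symbols).format(value);
+  }
+
+  private static String firstNonEmpty(String preferred, String fallback) {
+    if (preferred != null && !preferred.isEmpty()) {
+      return preferred;
+    }
+    return (fallback != null && !fallback.isEmpty()) ? fallback : null;
+  }
+
+  public static void main(String[] args) {
+    // The node sets no separators, so the global defaults apply: prints 1.234,50
+    System.out.println(format(1234.5, "#,##0.00", null, null, ",", "."));
+    // Per-node separators override the global defaults: prints 1,234.50
+    System.out.println(format(1234.5, "#,##0.00", ".", ",", ",", "."));
+  }
+}
+----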
+
+== Chaining and output-to-field
+
+When *Output* is *Output XML as field* or *both*, the transform adds the configured *XML output field* to the stream for each completed document (or each split). A second XML Output (Advanced) transform can map that field with a *DocumentFragment* node. Use *Remove outer wrapper* on the fragment if the inner XML already has a root tag that would duplicate the parent element in the target tree.
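+
+A minimal sketch of the *Remove outer wrapper* check, assuming the fragment value is a well-formed XML document parsed with the JDK DOM parser: if the fragment's root element has the same local name as the parent element in the target tree, its child nodes are used instead of the root itself. The `FragmentUnwrapSketch` class is hypothetical and ignores details such as namespace URIs and whitespace-only text nodes.
+
+[source,java]
+----
+import java.io.StringReader;
+import java.util.ArrayList;
+import java.util.List;
+import javax.xml.parsers.DocumentBuilderFactory;
+import org.w3c.dom.Element;
+import org.w3c.dom.Node;
+import org.w3c.dom.NodeList;
+import org.xml.sax.InputSource;
+
+// Hypothetical sketch of wrapper removal; compares local names only.
+final class FragmentUnwrapSketch {
+
+  /** Returns the nodes to insert under the parent element, unwrapping a duplicated root. */
+  static List<Node> fragmentNodes(String fragmentXml, String parentName,
+      boolean removeOuterWrapper) {
+    Element root;
+    try {
+      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
+      factory.setNamespaceAware(true);
+      root = factory.newDocumentBuilder()
+          .parse(new InputSource(new StringReader(fragmentXml)))
+          .getDocumentElement();
+    } catch (Exception e) {
+      throw new IllegalArgumentException("fragment is not well-formed XML", e);
+    }
+    List<Node> nodes = new ArrayList<>();
+    if (removeOuterWrapper && parentName.equals(root.getLocalName())) {
+      // Same name as the parent: keep only the children (text nodes included).
+      NodeList children = root.getChildNodes();
+      for (int i = 0; i < children.getLength(); i++) {
+        nodes.add(children.item(i));
+      }
+    } else {
+      nodes.add(root);
+    }
+    return nodes;
+  }
+
+  public static void main(String[] args) {
+    String upstream = "<order><item/><item/></order>";
+    System.out.println(fragmentNodes(upstream, "order", true).size());  // prints 2
+    System.out.println(fragmentNodes(upstream, "order", false).size()); // prints 1
+  }
+}
+----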
+
+== Group-by behaviour
+
+For the group-by mechanism to collapse rows correctly, *the input rows must already be sorted by the group-by key(s)*; use a Sort Rows transform upstream if needed. When the key changes, the open group element is closed and a new one is opened for the new key. Because only consecutive rows are merged, unsorted input produces several separate occurrences of the same group element.
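+
+The key-change logic can be sketched as a single pass over the keys in row order. In this hypothetical Java example the returned strings stand in for the XML events a writer would produce; it shows why sorting matters, since the unsorted keys `1, 2, 1` open a group for key `1` twice.
+
+[source,java]
+----
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+
+// Hypothetical sketch: strings stand in for the open/close events of the group element.
+final class GroupBySketch {
+
+  /** A new group opens only when the key differs from the previous row's key. */
+  static List<String> events(List<String> keysInRowOrder) {
+    List<String> events = new ArrayList<>();
+    boolean groupOpen = false;
+    String openKey = null;
+    for (String key : keysInRowOrder) {
+      if (!groupOpen || !Objects.equals(openKey, key)) {
+        if (groupOpen) {
+          events.add("close " + openKey); // key changed: close the current group element
+        }
+        events.add("open " + key);
+        openKey = key;
+        groupOpen = true;
+      }
+      events.add("row"); // one loop-element occurrence per input row
+    }
+    if (groupOpen) {
+      events.add("close " + openKey);
+    }
+    return events;
+  }
+
+  public static void main(String[] args) {
+    System.out.println(events(Arrays.asList("1", "1", "2"))); // sorted: two groups
+    System.out.println(events(Arrays.asList("1", "2", "1"))); // unsorted: group "1" opens twice
+  }
+}
+----
+
+Only the key of the open group is kept between rows, which is what lets a streaming writer avoid holding the whole document.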
+
+== XSD generation
+
+When *Generate sibling XSD file* is enabled, the transform writes a `.xsd` schema next to each output file (or split). The schema:
+
+* declares one global element matching the root of the configured tree;
+* nests complex types corresponding to elements with children or attributes;
+* sets `maxOccurs="unbounded"` on the loop element and on every group-by ancestor;
+* renders attributes as `xs:attribute` declarations (with `use="required"` when the source node is `Force create`);
+* renders document-fragment nodes as `<xs:any processContents="skip"/>` placeholders;
+* maps Hop value types to XSD built-ins as follows: integer → `xs:long`, number/big-number → `xs:decimal`, date/timestamp → `xs:dateTime`, boolean → `xs:boolean`, binary → `xs:base64Binary`, everything else → `xs:string`;
+* uses the root node's namespace as the schema's `targetNamespace` (and `elementFormDefault="qualified"`) when set.
+
+The XSD is written outside zip archives and is added to the pipeline's result file list when *Add filename to result* is enabled.
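+
+The type mapping listed above can be written as a lookup on Hop value-type names. The sketch assumes the type descriptions `Integer`, `Number`, `BigNumber`, `Date`, `Timestamp`, `Boolean` and `Binary` as input keys; the `XsdTypeSketch` class and the exact attribute-declaration text are illustrative, not the transform's XSD generator.
+
+[source,java]
+----
+import java.util.Locale;
+
+// Hypothetical sketch of the documented Hop type -> XSD built-in mapping.
+final class XsdTypeSketch {
+
+  static String xsdTypeFor(String hopTypeDesc) {
+    if (hopTypeDesc == null) {
+      return "xs:string";
+    }
+    switch (hopTypeDesc.toLowerCase(Locale.ROOT)) {
+      case "integer":
+        return "xs:long";
+      case "number":
+      case "bignumber":
+        return "xs:decimal";
+      case "date":
+      case "timestamp":
+        return "xs:dateTime";
+      case "boolean":
+        return "xs:boolean";
+      case "binary":
+        return "xs:base64Binary";
+      default:
+        return "xs:string"; // everything else
+    }
+  }
+
+  /** Attribute declaration; nodes flagged Force create become use="required". */
+  static String attributeDecl(String name, String hopTypeDesc, boolean forceCreate) {
+    return "<xs:attribute name=\"" + name + "\" type=\"" + xsdTypeFor(hopTypeDesc) + "\""
+        + (forceCreate ? " use=\"required\"" : "") + "/>";
+  }
+
+  public static void main(String[] args) {
+    // prints <xs:attribute name="id" type="xs:long" use="required"/>
+    System.out.println(attributeDecl("id", "Integer", true));
+  }
+}
+----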
+
+== Memory profile
+
+The transform uses StAX streaming and only buffers the XML state of the currently open path of group elements. Memory use is therefore O(largest group) rather than O(document).
+
+== Example: orders with grouped items
+
+Input rows (already sorted by `orderId`):
+
+[options="header"]
+|===
+|orderId|itemName|price
+|1|foo|1.50
+|1|bar|2.00
+|2|baz|3.25
+|===
+
+Tree:
+
+* `orders` (root, element)
+** `order` (element, group-by, mapped field = `orderId`)
+*** `id` (attribute, mapped field = `orderId`)
+*** `item` (element, **loop**)
+**** `name` (element, mapped field = `itemName`)
+**** `price` (element, mapped field = `price`, format = `0.00`)
+
+Output:
+
+[source,xml]
+----
+<?xml version="1.0" encoding="UTF-8"?>
+<orders>
+  <order id="1">
+    <item><name>foo</name><price>1.50</price></item>
+    <item><name>bar</name><price>2.00</price></item>
+  </order>
+  <order id="2">
+    <item><name>baz</name><price>3.25</price></item>
+  </order>
+</orders>
+----
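+
+To make the example concrete, here is a hedged Java sketch that produces the same structure with the JDK's StAX `XMLStreamWriter`. The `OrdersExampleSketch` class and its `Row` type are hypothetical, and the writer emits compact output without indentation, so its text differs from the indented listing above only in whitespace.
+
+[source,java]
+----
+import java.io.StringWriter;
+import java.text.DecimalFormat;
+import java.text.DecimalFormatSymbols;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import javax.xml.stream.XMLOutputFactory;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.XMLStreamWriter;
+
+// Hypothetical re-creation of the example; not the transform's writer.
+final class OrdersExampleSketch {
+
+  static final class Row {
+    final long orderId;
+    final String itemName;
+    final double price;
+
+    Row(long orderId, String itemName, double price) {
+      this.orderId = orderId;
+      this.itemName = itemName;
+      this.price = price;
+    }
+  }
+
+  /** Rows must be sorted by orderId: order is the group-by element, item the loop. */
+  static String write(List<Row> sortedRows) {
+    DecimalFormat priceFormat =
+        new DecimalFormat("0.00", DecimalFormatSymbols.getInstance(Locale.ROOT));
+    StringWriter out = new StringWriter();
+    try {
+      XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
+      w.writeStartDocument("UTF-8", "1.0");
+      w.writeStartElement("orders");
+      Long openOrderId = null; // key of the currently open order element, if any
+      for (Row row : sortedRows) {
+        if (openOrderId == null || openOrderId.longValue() != row.orderId) {
+          if (openOrderId != null) {
+            w.writeEndElement(); // key changed: close the previous order
+          }
+          w.writeStartElement("order");
+          w.writeAttribute("id", Long.toString(row.orderId));
+          openOrderId = row.orderId;
+        }
+        w.writeStartElement("item"); // loop element: one per input row
+        w.writeStartElement("name");
+        w.writeCharacters(row.itemName);
+        w.writeEndElement();
+        w.writeStartElement("price");
+        w.writeCharacters(priceFormat.format(row.price)); // format = 0.00
+        w.writeEndElement();
+        w.writeEndElement(); // closes item
+      }
+      if (openOrderId != null) {
+        w.writeEndElement(); // close the last order
+      }
+      w.writeEndElement(); // closes orders
+      w.writeEndDocument();
+      w.flush();
+      w.close();
+    } catch (XMLStreamException e) {
+      throw new IllegalStateException(e);
+    }
+    return out.toString();
+  }
+
+  public static void main(String[] args) {
+    System.out.println(write(Arrays.asList(
+        new Row(1, "foo", 1.50), new Row(1, "bar", 2.00), new Row(2, "baz", 3.25))));
+  }
+}
+----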