Streaming XML parser for the JVM. Annotate a Kotlin data class (KSP → Flow<T>) or a Java record (annotation processor → Stream<T>); the parser is generated at compile time. Built on Aalto StAX + KotlinPoet / JavaPoet.
- Why the name?
- Premise
- Install — KSP (Kotlin)
- Install — APT (Java)
- Path syntax
- Annotation surface
- Usage patterns
- Architecture
- Error model
- Build
- License
Artifacts are published to Maven Central under site.asm0dey.xmlfluss.
Generates a Kotlin object exposing parse(InputStream, ignoreNamespace: Boolean = false): Flow<T> per annotated data class.
plugins {
kotlin("jvm")
id("com.google.devtools.ksp") version "2.3.7"
}
dependencies {
implementation("site.asm0dey.xmlfluss:xml-fluss-runtime:0.1.0")
ksp("site.asm0dey.xmlfluss:xml-fluss-ksp:0.1.0")
}
Maven
KSP itself ships only a Gradle plugin, but the community-maintained me.kpavlov.ksp.maven:ksp-maven-plugin wires KSP2 processors into a Maven build. Add xml-fluss-ksp as a plugin dependency (KSP processors are plugin deps, not project deps):
<dependencies>
<dependency>
<groupId>site.asm0dey.xmlfluss</groupId>
<artifactId>xml-fluss-runtime</artifactId>
<version>0.1.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>me.kpavlov.ksp.maven</groupId>
<artifactId>ksp-maven-plugin</artifactId>
<version><!-- latest from Maven Central --></version>
<extensions>true</extensions>
<executions>
<execution>
<goals>
<goal>process</goal>
<goal>process-test</goal>
</goals>
</execution>
</executions>
<dependencies>
<dependency>
<groupId>site.asm0dey.xmlfluss</groupId>
<artifactId>xml-fluss-ksp</artifactId>
<version>0.1.0</version>
</dependency>
</dependencies>
</plugin>
<!-- plus the standard kotlin-maven-plugin to compile your sources -->
</plugins>
</build>
Requires Maven 3.6.0+, JDK 11+, Kotlin 2.2+. xml-fluss does not test this path on every release; file an issue if you hit a regression.
Generates a Java final class exposing parse(InputStream): java.util.stream.Stream<T> per annotated record. Java callers pass annotation members by name: @XmlAttr(name = "id"), @XmlChild(path = "atom:title").
plugins {
`java-library`
}
dependencies {
implementation("site.asm0dey.xmlfluss:xml-fluss-runtime:0.1.0")
annotationProcessor("site.asm0dey.xmlfluss:xml-fluss-apt:0.1.0")
}
Maven
<dependencies>
<dependency>
<groupId>site.asm0dey.xmlfluss</groupId>
<artifactId>xml-fluss-runtime</artifactId>
<version>0.1.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.13.0</version>
<configuration>
<release>17</release>
<annotationProcessorPaths>
<path>
<groupId>site.asm0dey.xmlfluss</groupId>
<artifactId>xml-fluss-apt</artifactId>
<version>0.1.0</version>
</path>
</annotationProcessorPaths>
</configuration>
</plugin>
</plugins>
</build>
The KSP and APT processors share the same xmlfluss.* annotation surface: @XmlRecord, @XmlAttr, @XmlChild, @XmlText, @XmlMap, @XmlNs, @XmlFormat, @XmlConverter, @XmlPolymorphic, @XmlSubtype. Examples in the rest of this README are written in Kotlin; the Java mapping is mechanical (named members + record instead of data class).
xml-fluss combines the technical focus with a catchy German-English pun:
- Fluss is the German word for Flow, highlighting that this library is built around Kotlin Coroutines Flow.
- It reflects the streaming nature of the parser: data flows through it like a river (Fluss), never buffering more than one record at a time.
Hand-rolling StAX/SAX code is tedious and error-prone. JAXB-style binders force you to mirror the entire document tree. We want neither. The goal: declare the shape of the data you care about — at any depth, anywhere in the document — and let the compiler emit a fast, streaming parser.
Use cases: ETL pipelines reading multi-GB XML feeds, scraping a few records out of a deeply nested document, log processing, dataset import.
Mini-XPath subset, parsed by xmlfluss.path.PathParser:
| Syntax | Meaning | Where allowed |
|---|---|---|
| //author | descendant axis: match anywhere | @XmlRecord, @XmlChild |
| /library/section/author | absolute path from document root | @XmlRecord |
| authors/author | relative path (auto-prepended //) | @XmlRecord |
| //author[@role='main'] | predicate filter | @XmlRecord |
| {uri}local | namespaced via Clark notation | @XmlRecord |
| atom:entry, atom:title | namespaced via @XmlNs prefix | @XmlRecord, @XmlChild, @XmlMap |
| //{*}author | any namespace | @XmlRecord |
| wrapper/leaf | multi-segment direct sub-path | @XmlChild, @XmlMap key/value |
| leaf/@attr | attribute on a nested child | @XmlChild, @XmlMap key/value |
| leaf/@xml:lang | namespaced attr leaf | same as above |
| @id, @xml:lang | attribute on the enclosing element | @XmlAttr, @XmlMap key/value |
| //head/.../@attr | descendant axis followed by direct sub-path / attr leaf | @XmlChild, @XmlMap key/value |
Path grammar:
path := ('//' | '/')? step ( ('/' | '//') step )* ('/@' qname)?
step := qname ('[' predicate ']')?
qname := (ncname ':')? ncname
predicate := term (('and'|'or') term)*
term := '@' qname ('=' | '!=') quoted-string | integer
@XmlRecord("//atom:entry")
@XmlNs("atom", "http://www.w3.org/2005/Atom") // repeatable
@XmlNs("xml", "http://www.w3.org/XML/1998/namespace")
data class Entry(
@XmlAttr("id") val id: Int,
@XmlAttr("xml:lang") val lang: String?, // namespaced attr
@XmlChild("atom:title") val title: String, // missing → throw
@XmlChild("atom:summary") val summary: String?, // missing → null
@XmlChild("bio/@lang") val bioLang: String?, // attr on nested child
@XmlChild("//book") val books: List<Book>, // descendant + nested
@XmlMap(entry = "score", key = "@author", value = "v")
val scores: Map<String, Double>,
@XmlText val raw: String,
@XmlFormat("yyyy-MM-dd") val published: java.time.LocalDate,
@XmlConverter(MoneyConv::class) val price: Money,
)
Generated artifact: ${Record}Parser object with:
public fun parse(input: InputStream, ignoreNamespace: Boolean = false): Flow<T>
Pass ignoreNamespace = true to drop every element/attribute namespace at the cursor. This is useful when a producer omits the declared namespace (or uses a different one) and you don't want to fork the data class. The flag flows through the PathMatcher (record-path element + attribute-predicate matching), the recordAttr / childAttr lookups, and the codegen-emitted child when arms.
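As a rough illustration of what ignoreNamespace changes at match time, the sketch below compares an expected name with the name found at the cursor, with and without the namespace component. ElementName and nameMatches are hypothetical names invented for this example; they are not part of the xml-fluss API, and the real matcher may be structured differently.

```kotlin
// Hypothetical stand-in for an element or attribute name; not an xml-fluss type.
data class ElementName(val namespace: String?, val local: String)

// Compare the name a path expects with the name found at the cursor.
// With ignoreNamespace = true only the local part has to agree.
fun nameMatches(expected: ElementName, actual: ElementName, ignoreNamespace: Boolean): Boolean {
    if (expected.local != actual.local) return false
    if (ignoreNamespace) return true
    // Treat null and "" as the same "no namespace" value.
    return expected.namespace.orEmpty() == actual.namespace.orEmpty()
}
```

With the flag off, an atom:entry path does not match a bare entry element; with the flag on, it does.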
Each pattern below is a working snippet. The matching XML the snippet expects is shown alongside.
@XmlRecord("//author")
data class Author(
@XmlAttr val id: Int, // attr name = field name "id"
@XmlChild val name: String, // child element <name>
@XmlChild val bio: String?,
)
@XmlAttr and @XmlChild default their name/path to the field's Kotlin identifier when the argument is omitted. Use the explicit form when the XML name differs (@XmlAttr("xml:lang") val lang) or the path is non-trivial (multi-segment, descendant, attribute leaf, namespaced).
<library><section><author id="1"><name>Ada</name></author></section></library>
@XmlRecord("/library/section/author") matches only at the absolute path. authors/author is auto-prepended with // and matches //authors/author (relative descendant).
Predicates allow filtering elements based on their attributes or their position among siblings. They are evaluated at START_ELEMENT time, making them extremely efficient for streaming as no element content needs to be buffered to decide whether to enter a record or sub-path.
Filter by attribute presence and value:
@XmlRecord("//author[@role='main']") // Matches <author role="main">
@XmlRecord("//author[@active!='false']") // Matches if active is present and not 'false'
@XmlRecord("//author[@xml:lang='en']") // Namespaced attribute
Filter by the one-based index of the element among its same-named siblings:
@XmlRecord("//entry[1]") // First <entry> in any container
@XmlRecord("//entry[2]") // Second <entry>
Combine terms using 'and' or 'or'. Parentheses are not currently supported, and 'and' has higher precedence than 'or':
@XmlRecord("//book[@featured='true' and @lang='en']")
@XmlRecord("//item[@id='100' or @id='200']")
| Feature | Syntax | Example |
|---|---|---|
| Equality | @attr='val' | [@role='admin'] |
| Inequality | @attr!='val' | [@status!='deprecated'] |
| Position | integer | [1] |
| Logical AND | and | [@a='1' and @b='2'] |
| Logical OR | or | [@a='1' or @a='2'] |
| Namespaces | prefix:attr | [@xml:lang='en'] |
Since predicates are evaluated at the moment the parser encounters the opening tag (START_ELEMENT):
- Only attributes of the current element can be used in predicates.
- Text content or nested child elements cannot be used as filters (e.g., author[name='Ada'] is NOT supported).
- Position [N] refers to the count of siblings with the same name encountered so far under the current parent.
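The rules above can be sketched with a small evaluator that only sees what is available at START_ELEMENT: the element's attributes and its one-based position among same-named siblings. Term, AttrTerm, PositionTerm, evaluate and SiblingCounter are hypothetical names for this illustration; the library's own path AST and matcher may look quite different.

```kotlin
// Hypothetical model of a predicate term; xml-fluss's own AST may differ.
sealed interface Term
data class AttrTerm(val attr: String, val value: String, val negated: Boolean) : Term
data class PositionTerm(val index: Int) : Term

// "a and b or c" is stored as [[a, b], [c]]: an OR over AND-groups,
// which encodes "and binds tighter than or" without parentheses.
fun evaluate(predicate: List<List<Term>>, attrs: Map<String, String>, position: Int): Boolean =
    predicate.any { group ->
        group.all { term ->
            when (term) {
                is AttrTerm -> {
                    val actual = attrs[term.attr]
                    // Both = and != require the attribute to be present.
                    actual != null && ((actual == term.value) != term.negated)
                }
                is PositionTerm -> position == term.index
            }
        }
    }

// Tracks one-based positions of same-named siblings, one counter map per open parent.
class SiblingCounter {
    private val stack = ArrayDeque<MutableMap<String, Int>>().apply { addLast(mutableMapOf()) }

    // Call at START_ELEMENT; returns the element's position under its parent.
    fun enter(name: String): Int {
        val counts = stack.last()
        val position = (counts[name] ?: 0) + 1
        counts[name] = position
        stack.addLast(mutableMapOf()) // fresh counters for this element's children
        return position
    }

    // Call at END_ELEMENT.
    fun leave() {
        stack.removeLast()
    }
}
```

Because neither function looks at text or child elements, the decision to enter a record can be made without buffering any content.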
@XmlChild("count") val count: Int,
@XmlChild("active") val active: Boolean,
@XmlChild("ratio") val ratio: Double,
@XmlFormat("yyyy-MM-dd")
@XmlChild("published") val published: LocalDate,
@XmlFormat("#,##0.00")
@XmlChild("price") val price: BigDecimal,
Supported types: String, Int, Long, Double, Boolean, LocalDate, LocalDateTime, Instant, BigDecimal, all with T? variants.
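This README does not say which JDK classes interpret an @XmlFormat pattern. The sketch below assumes date patterns follow java.time.format.DateTimeFormatter and numeric patterns follow java.text.DecimalFormat, which is what patterns such as yyyy-MM-dd and #,##0.00 suggest. parseDate and parseDecimal are hypothetical helpers, not library functions.

```kotlin
import java.math.BigDecimal
import java.text.DecimalFormat
import java.text.DecimalFormatSymbols
import java.text.ParsePosition
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// Assumption: a date pattern such as "yyyy-MM-dd" is read as a DateTimeFormatter pattern.
fun parseDate(raw: String, pattern: String): LocalDate =
    LocalDate.parse(raw.trim(), DateTimeFormatter.ofPattern(pattern))

// Assumption: a numeric pattern such as "#,##0.00" is read as a DecimalFormat pattern.
// Locale.ROOT pins "," as the grouping separator and "." as the decimal separator.
fun parseDecimal(raw: String, pattern: String): BigDecimal {
    val format = DecimalFormat(pattern, DecimalFormatSymbols(Locale.ROOT))
    format.isParseBigDecimal = true // parse() then returns BigDecimal rather than Double/Long
    val text = raw.trim()
    val position = ParsePosition(0)
    val parsed = format.parse(text, position)
    // Reject partial parses such as "12abc".
    require(parsed != null && position.index == text.length) {
        "Not a number for pattern $pattern: $raw"
    }
    return parsed as BigDecimal
}
```

Pinning the locale matters for feeds: without it, the same pattern would parse differently on machines with different default locales.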
Boolean literals: In addition to true/false, the parser accepts 1/0 and yes/no (case-insensitive, surrounding whitespace ignored) as boolean values.
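A minimal sketch of that literal table, for readers who want to mirror it in their own converters. parseXmlBoolean is a hypothetical helper, not the library's coercion code, and it assumes case-insensitivity applies to every literal, including true/false.

```kotlin
// Hypothetical helper mirroring the literals listed above.
// Assumption: case-insensitivity is applied to all literals, not only yes/no.
fun parseXmlBoolean(raw: String): Boolean =
    when (raw.trim().lowercase()) {
        "true", "1", "yes" -> true
        "false", "0", "no" -> false
        else -> throw IllegalArgumentException("Not a boolean literal: '$raw'")
    }
```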
T? field → missing element/attr stores null. Non-null T → missing throws XmlParseException.Missing with Location.
class MoneyConv : Converter<Money> {
override fun convert(raw: String, loc: Location): Money = Money.parse(raw)
}
@XmlConverter(MoneyConv::class) @XmlChild("price") val price: Money
The converter is instantiated once per parser object and receives the raw string plus its Location.
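Money is not defined anywhere in this README. For completeness, here is one hypothetical shape it could take so that Money.parse(raw) in the converter above has something to call. The "<amount> <currency>" input format is an assumption made for the example only.

```kotlin
import java.math.BigDecimal

// Hypothetical value type; MoneyConv only needs Money.parse(String) to exist.
data class Money(val amount: BigDecimal, val currency: String) {
    companion object {
        // Assumed input shape: "<amount> <currency code>", e.g. "19.99 EUR".
        fun parse(raw: String): Money {
            val parts = raw.trim().split(Regex("\\s+"))
            require(parts.size == 2) { "Expected '<amount> <currency>', got '$raw'" }
            return Money(BigDecimal(parts[0]), parts[1].uppercase())
        }
    }
}
```

Keeping the parsing logic on the value type leaves the converter a one-liner, which suits a class that is created once and reused for every record.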
@XmlChild("tag") val tags: List<String> // every <tag> under the record
Repeated direct children are collected in document order. An empty list means no matches; the list itself is never null.
@XmlChild("meta/issn") val issn: String?, // <meta><issn>...</issn></meta>
@XmlChild("bio/@lang") val bioLang: String?, // <bio lang="en">...</bio>
@XmlChild("link/@xml:lang") val linkLang: String?,
@XmlChild("//book") val books: List<Book>
Every <book> at any depth inside the record is collected; undeclared wrappers (<works>, <archive>) are skipped automatically. Only a single segment is allowed after // (or one trailing @attr).
@XmlText val raw: String // trimmed
@XmlText(preserveWhitespace = true) val verbatim: String
One @XmlText field per class. It reads the character data of the enclosing element.
@XmlRecord("//atom:entry")
@XmlNs("atom", "http://www.w3.org/2005/Atom")
data class Entry(
@XmlAttr("id") val id: String,
@XmlChild("atom:title") val title: String,
)
Default namespace via @XmlNs("", uri): bare element segments resolve to uri. Attributes are always null-NS unless written as prefix:local. Clark notation {uri}local works in @XmlRecord paths.
@XmlNs("xml", "http://www.w3.org/XML/1998/namespace")
data class Link(
@XmlAttr("xml:lang") val lang: String,
@XmlChild("link/@xml:lang") val linkLang: String?,
)
A bare @XmlAttr("name") is always null-NS: the class default namespace does not apply to attributes (XML spec).
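The namespace rules above (prefix lookup, default namespace for bare elements only, Clark notation) can be summarised in one resolution function. This is a sketch under assumptions: Resolved and resolve are hypothetical names, the {*} wildcard is not treated specially, and the function does not enforce which annotations accept which notation.

```kotlin
// Hypothetical result type: a namespace URI (null = no namespace) plus a local name.
data class Resolved(val namespace: String?, val local: String)

// Resolve one path segment against the prefix -> URI map declared with @XmlNs.
fun resolve(segment: String, prefixes: Map<String, String>, isAttribute: Boolean): Resolved {
    // Clark notation: {uri}local
    if (segment.startsWith("{")) {
        val close = segment.indexOf('}')
        require(close > 0) { "Unterminated Clark notation: $segment" }
        return Resolved(segment.substring(1, close), segment.substring(close + 1))
    }
    // Prefixed name: prefix:local
    val colon = segment.indexOf(':')
    if (colon >= 0) {
        val prefix = segment.substring(0, colon)
        val uri = requireNotNull(prefixes[prefix]) { "Undeclared prefix: $prefix" }
        return Resolved(uri, segment.substring(colon + 1))
    }
    // Bare name: elements pick up the default namespace (prefix ""), attributes never do.
    return Resolved(if (isAttribute) null else prefixes[""], segment)
}
```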
data class Book(
@XmlAttr("isbn") val isbn: String,
@XmlChild("title") val title: String,
)
@XmlRecord("//author")
data class Author(
@XmlChild("books/book") val books: List<Book>,
@XmlChild("favorite") val favorite: Book?,
)
Nested classes don't need @XmlRecord. They inherit the enclosing record's @XmlNs map; redeclaring a prefix is allowed only with the same URI (different URI → build error).
@XmlMap(entry = "score", key = "@author", value = "v")
val scores: Map<String, Double>
@XmlMap(entry = "trans", key = "@xml:lang", value = "@x:val")
val translations: Map<String, String>
@XmlMap(entry = "row", key = "@id", value = "cell")
val rows: Map<Int, List<String>> // multimap: append per-entry
@XmlMap(entry = "rec", key = "tag", value = "data")
val grouped: Map<List<String>, NestedData> // composite key (list eq.)
K and V are each independently a scalar, temporal, BigDecimal, nested data class, T?, or List<T>. Map<K, V>? is allowed and stays null until the first entry. See the XmlMap KDoc for the full aggregation table.
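The "multimap: append per-entry" case can be pictured as the following aggregation over (key, value) pairs in document order. collectMultimap is a hypothetical helper for illustration; the authoritative aggregation rules are the ones in the XmlMap KDoc, and preserving first-seen key order is an assumption of this sketch.

```kotlin
// Collect (key, value) pairs in document order into a Map<K, List<V>>,
// appending to the existing list whenever a key repeats.
fun <K, V> collectMultimap(entries: List<Pair<K, V>>): Map<K, List<V>> {
    val result = LinkedHashMap<K, MutableList<V>>() // keeps first-seen key order
    for ((key, value) in entries) {
        result.getOrPut(key) { mutableListOf() }.add(value)
    }
    return result
}
```

The same mechanics cover composite keys: a List<String> key works because Kotlin lists compare by content.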
Annotate the sealed parent with @XmlPolymorphic and each data class variant with @XmlSubtype("name"). Two dispatch modes — chosen by the discriminator argument.
Tag-name mode (default — discriminator = ""): each subtype matches a distinct child element directly under the enclosing record/subrecord. The field's @XmlChild carries no path; the union of @XmlSubtype tags drives the match. At most one tag-mode field per scope; subtype tags must not clash with sibling @XmlChild / @XmlMap keys.
@XmlPolymorphic
sealed interface Shape {
@XmlSubtype("circle") data class Circle(@XmlAttr val r: Double) : Shape
@XmlSubtype("square") data class Square(@XmlAttr val side: Double) : Shape
@XmlSubtype("triangle") data class Triangle(
@XmlAttr val base: Double,
@XmlAttr val height: Double,
@XmlChild val label: String?,
) : Shape
}
@XmlRecord("//drawing")
data class Drawing(
@XmlChild val shapes: List<Shape>,
)
<drawing>
<circle r="2.5"/>
<square side="3.0"/>
<triangle base="4.0" height="5.0"><label>tri</label></triangle>
</drawing>
Attribute mode (discriminator = "@local" or "@prefix:local"): all subtypes share one wrapping element. @XmlChild names that wrapping tag (single direct-child segment); the discriminator attribute on each match selects the variant. Elements with an unknown attribute value are skipped.
@XmlPolymorphic(discriminator = "@type")
sealed interface Event {
@XmlSubtype("login") data class Login(@XmlAttr val user: String, @XmlText val msg: String) : Event
@XmlSubtype("logout") data class Logout(@XmlAttr val user: String) : Event
}
@XmlRecord("//log")
data class Log(
@XmlChild("event") val events: List<Event>,
@XmlChild("highlight") val highlight: Event?,
)
<log>
<event type="login" user="alice">welcome</event>
<event type="logout" user="alice"/>
<highlight type="login" user="bob">featured</highlight>
</log>
Subtypes are nested data classes: they don't need @XmlRecord and inherit the enclosing record's @XmlNs map (same rules as @XmlChild nested data classes). Cardinality covers T, T?, and List<T>. @XmlFormat / @XmlConverter are not honoured on polymorphic fields.
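Conceptually, attribute-mode dispatch boils down to a lookup on the discriminator value. The sketch below uses plain, unannotated classes and a hypothetical RawElement holder to show the idea, including the "unknown value skips the element" rule. It is not the code the processor generates.

```kotlin
// Plain stand-ins for the annotated Event hierarchy shown above.
sealed interface Event
data class Login(val user: String, val msg: String) : Event
data class Logout(val user: String) : Event

// Hypothetical, simplified view of one matched element: its attributes and trimmed text.
data class RawElement(val attrs: Map<String, String>, val text: String)

// Pick the variant from the discriminator attribute.
// Unknown or missing discriminator values yield null, i.e. the element is skipped.
fun dispatch(element: RawElement, discriminator: String = "type"): Event? =
    when (element.attrs[discriminator]) {
        "login" -> Login(element.attrs.getValue("user"), element.text)
        "logout" -> Logout(element.attrs.getValue("user"))
        else -> null
    }
```

For a List<Event> field, skipped elements simply never reach the list; how a skipped element interacts with a non-null single-valued field is not covered by this sketch.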
xml-fluss-runtime annotations, exceptions, Converter SPI, path AST + matcher, Aalto cursor, Coercions
xml-fluss-ksp KSP processor: scans @XmlRecord data classes, emits parsers via KotlinPoet
xml-fluss-apt javac annotation processor: scans @XmlRecord Java records, emits parsers via JavaPoet
xml-fluss-test sample records / data classes + JUnit5 tests covering both processors
Generated parsers carry @javax.annotation.processing.Generated(value = "xml-fluss-{ksp|apt}", date = "...") so IDE inspections and coverage tools can recognise them.
Generated Kotlin parsers drive XmlReadCursor directly. Generated Java parsers go through JavaCursorAdapter, which exposes the same cursor surface to javac-friendly types and converts the resulting Flow<T> into java.util.stream.Stream<T>.
XmlReadCursor API:
- findNextRecord(): advances the Aalto reader until the path matches at a START_ELEMENT
- recordAttr(ns, name) / recordLocation(): read record attributes / position
- forEachRecordChild { ln, ns -> ... }: iterate direct children; the body must call exactly one of childText() / skipChild() to consume the child subtree
- childText(preserveWhitespace) / childAttr(ns, name): read child content
- Closing the cursor releases the Aalto reader
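To see why each child body has to consume exactly one subtree, here is what the same discipline looks like on a raw StAX reader. The sketch uses the JDK's built-in javax.xml.stream API rather than Aalto or XmlReadCursor, and skipSubtree / directNames are hypothetical helpers written for this example.

```kotlin
import java.io.ByteArrayInputStream
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants
import javax.xml.stream.XMLStreamReader

// Precondition: the reader is positioned on a START_ELEMENT.
// Postcondition: the reader is positioned on the matching END_ELEMENT.
fun skipSubtree(reader: XMLStreamReader) {
    var depth = 1
    while (depth > 0) {
        when (reader.next()) {
            XMLStreamConstants.START_ELEMENT -> depth++
            XMLStreamConstants.END_ELEMENT -> depth--
        }
    }
}

// Returns the text of every direct <name> child of the root element.
// Every other child is skipped as a whole, so nested <name> elements are never seen.
fun directNames(xml: String): List<String> {
    val reader = XMLInputFactory.newInstance()
        .createXMLStreamReader(ByteArrayInputStream(xml.toByteArray()))
    val names = mutableListOf<String>()
    try {
        // Advance to the root element.
        while (reader.next() != XMLStreamConstants.START_ELEMENT) {
            // skip prolog events
        }
        children@ while (reader.hasNext()) {
            when (reader.next()) {
                XMLStreamConstants.START_ELEMENT -> {
                    if (reader.localName == "name") {
                        names += reader.elementText // consumes up to </name>
                    } else {
                        skipSubtree(reader) // consumes up to the child's end tag
                    }
                }
                XMLStreamConstants.END_ELEMENT -> break@children // end of the root
            }
        }
    } finally {
        reader.close()
    }
    return names
}
```

If a branch consumed nothing, or consumed twice, the loop would lose track of which END_ELEMENT closes the root. That is the invariant the one-call rule protects.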
PathMatcher is a small NFA over compiled steps. State stack tracks active path positions; descendant axis (//) keeps states alive across deeper elements; predicates evaluate against captured attributes at START_ELEMENT.
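A stripped-down version of such a matcher, to make the state-stack idea concrete. Step and SimplePathMatcher are hypothetical and deliberately ignore predicates and namespaces; the real PathMatcher is the one in xml-fluss-runtime.

```kotlin
// Hypothetical compiled step: a name plus whether it was introduced by "//".
data class Step(val name: String, val descendant: Boolean)

// State i means "the first i steps are matched by the currently open ancestors".
class SimplePathMatcher(private val steps: List<Step>) {
    // One state set per open element; the bottom entry stands for the document root.
    private val stack = ArrayDeque<Set<Int>>().apply { addLast(setOf(0)) }

    // Call at START_ELEMENT; returns true when the full path matches this element.
    fun startElement(name: String): Boolean {
        val next = mutableSetOf<Int>()
        for (state in stack.last()) {
            if (state == steps.size) continue // a completed match does not extend further
            val step = steps[state]
            if (step.name == name) next += state + 1
            // A descendant step may still match deeper down, so keep its state alive.
            if (step.descendant) next += state
        }
        stack.addLast(next)
        return steps.size in next
    }

    // Call at END_ELEMENT.
    fun endElement() {
        stack.removeLast()
    }
}
```

An absolute path such as /library/section/author compiles to three non-descendant steps, so its states die as soon as an unexpected element appears. //author compiles to a single descendant step whose state 0 survives at every depth.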
sealed class XmlParseException(message: String, cause: Throwable? = null) : RuntimeException(...)
class Missing(val field: String, val loc: Location)
class Coercion(val field: String, val raw: String, val type: String, val loc: Location, cause: Throwable)
class Malformed(message: String, val loc: Location, cause: Throwable? = null)
Location(line, col, path) is carried on every throw. path is the live element-stack breadcrumb (e.g. /library/section/shelf/author).
Per-error precision:
| Error site | Reported Location |
|---|---|
| Missing required field (attr / child / text) | record (or nested) element — no offending node exists |
| Coercion on a @XmlAttr (record-level attribute) | record element (the attribute lives on it) |
| Coercion on a @XmlChild element value | the child element itself (line, column, full path including the child) |
| Coercion via a custom @XmlConverter | the child element the converter ran on |
| @XmlText coercion | the enclosing element |
Aalto well-formedness errors (truncated input, mismatched tags, illegal XML) currently propagate as Aalto WFCException rather than Malformed; the underlying message still carries [row,col].
./gradlew build # compile, run KSP, run tests
./gradlew test # tests only
Stack: Kotlin 2.3.21, KSP 2.3.7, KotlinPoet 2.3.0, Palantir JavaPoet 0.14.0, Aalto-XML 1.3.4, kotlinx-coroutines 1.10.2, jspecify 1.0.0, JUnit Jupiter 6.0.3. JDK toolchain 17.
Shared publishing config (Maven Central + GitHub Packages, POM, Dokka javadoc) lives in the xml-fluss-publish precompiled convention plugin under buildSrc/; per-module build scripts only set artifactName / artifactDescription / inceptionYear.
Apache License 2.0 — see LICENSE.