XMLReader
XMLReader is
- an object mode Transform stream
- consuming XMLLexer's output (distinct XML tags and text fragments),
- transforming incoming strings into XMLNode objects,
- adjusting data (see Data transformation below)
- optionally transforming it in place with the provided map function,
- and, finally:
  - either pushing results to the Readable's output,
  - or emitting them as SAX events (if there is at least one subscriber).
const {XMLReader} = require ('xml-toolkit')
const reader = new XMLReader ({...options}).process (readableStreamOrStringOrBuffer)
// scanning through the content:
for await (const e of reader) console.log (e)
// getting just one node:
const theNode = await reader.findFirst () // `null` unless found
Name | Default | Description |
---|---|---|
useEntities | true | If true, the EntityResolver is in use; otherwise &...; may occur in output |
useNamespaces | true | If true, all element attributes are scanned for xmlns... prefixes |
filter | (xmlNode) => true | If set, this function is called for each XMLNode before pushing it out. Unless it returns a true value, the push is skipped. Think Array.filter |
filterElements | | Same as filter, but adds the condition type='EndElement', so it filters only element nodes with children already parsed. Can be set as a string instead of a function: in that case, it acts as a filter on localname |
stripSpace | true if filterElements is set, otherwise false | If true, text fragments are trimmed |
collect | (xmlNode) => false | If set, this function is called on each StartElement except for self enclosed tags. If it returns true, the children array will be created for this node and all its subtree. When filterElements is set, an omitted collect is assumed to have the same value. |
map | (xmlNode) => any | If set, this function is called for each XMLNode, transforming it in place before pushing it out. Think Array.map |
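The way filter and map shape the output can be pictured with a plain Node.js object mode Transform. This is only a sketch, not xml-toolkit's code: the FilterMapStage class, the runStage helper and the plain objects standing in for XMLNodes are hypothetical, and the order (filter first, then map) is an assumption.

```javascript
const { Readable, Transform } = require('node:stream')

// Hypothetical stand-in for the last stage of XMLReader; not the library's code.
class FilterMapStage extends Transform {
  constructor ({ filter = () => true, map = node => node } = {}) {
    super({ objectMode: true })
    this.filter = filter
    this.map = map
  }

  _transform (node, _encoding, callback) {
    // Assumption: the filter is checked first, then map is applied to survivors.
    if (this.filter(node)) this.push(this.map(node))
    callback()
  }
}

// Helper: feed an array of objects through the stage and gather the output.
async function runStage (nodes, options) {
  const result = []
  const stage = Readable.from(nodes).pipe(new FilterMapStage(options))
  for await (const node of stage) result.push(node)
  return result
}

// Plain objects used instead of real XMLNodes (field names are assumptions).
const sampleNodes = [
  { type: 'StartElement', localName: 'item' },
  { type: 'EndElement', localName: 'item' }
]

runStage(sampleNodes, {
  filter: node => node.type === 'EndElement', // think Array.filter
  map: node => node.localName                 // think Array.map
}).then(result => console.log(result))
```

With the sample input, only the EndElement object passes the filter, and map reduces it to its name.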
Name | Type | Description |
---|---|---|
isSAX | Boolean | true if and only if there is at least one subscriber for some SAXEvent type. In that case, SAX events are emitted instead of the Readable's data events. |
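A check of this kind can be sketched with the standard EventEmitter API. This is an illustration only, not how XMLReader computes isSAX: the hasSaxSubscriber function is hypothetical, and the list of event names is limited to those mentioned on this page.

```javascript
const { EventEmitter } = require('node:events')

// Event type names taken from this page; the real SAXEvent list may be longer.
const SAX_EVENT_TYPES = ['StartElement', 'EndElement', 'Characters', 'EndDocument']

// Hypothetical helper: true if some SAX event type has at least one listener.
function hasSaxSubscriber (emitter) {
  return SAX_EVENT_TYPES.some(type => emitter.listenerCount(type) > 0)
}

const emitter = new EventEmitter()
emitter.on('data', () => {})        // an ordinary stream listener does not count
console.log(hasSaxSubscriber(emitter))

emitter.on('EndElement', () => {})  // a SAX listener switches the mode
console.log(hasSaxSubscriber(emitter))
```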
Create an XMLLexer with lexerOptions, pipe it to this XMLReader and parse the provided src.
Name, Params | Type | Description |
---|---|---|
src | Buffer, String or Readable | XML to parse |
lexerOptions | Object | See XMLLexer#options |
Return value: the XMLReader object (for chaining).
Returns the first XMLNode that occurs, destroying the stream.
Using this method makes sense mostly with the filterElements or filter option set. Without them, it returns the StartElement for the document root, with attributes but without any children.
Returns null if the end of XML (the EndDocument event) is reached without any node being found.
Throws an error if a listener for some SAX event type was set prior to calling findFirst.
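The "first node or null" contract can be modelled on any object mode Readable. The findFirstSketch function below is hypothetical and is not the library's implementation; it relies on the documented Node.js behaviour that leaving a for await loop early destroys the stream.

```javascript
const { Readable } = require('node:stream')

// Hypothetical model of the findFirst contract; not xml-toolkit's code.
async function findFirstSketch (readable) {
  for await (const node of readable) {
    // Returning from inside the loop makes Node.js destroy the stream.
    return node
  }
  // The stream ended without yielding anything.
  return null
}

findFirstSketch(Readable.from([{ name: 'first' }, { name: 'second' }]))
  .then(node => console.log(node))

findFirstSketch(Readable.from([]))
  .then(node => console.log(node))
```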
- on stream end, the 'EndDocument' event with an empty string as src and xml is published;
- for EndElement tags, a copy of the StartElement XMLNode is published, with the same attributes but an altered type field;
  - for self enclosed elements, too, XMLNodes are published twice: as StartElement and as EndElement.
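The element rules above can be pictured with a small self-contained sketch. It is not xml-toolkit's code: the expandTags function and the {kind, name, attributes} tag objects are hypothetical stand-ins for the lexer's output, and only the StartElement / EndElement pairing is modelled.

```javascript
// Hypothetical model of the element rules; not the library's implementation.
// Each input tag is an assumed {kind, name, attributes} object,
// where kind is 'open', 'close' or 'selfEnclosed'.
function expandTags (tags) {
  const open = []       // StartElement nodes still waiting for their closing tag
  const published = []

  for (const tag of tags) {
    if (tag.kind === 'close') {
      // EndElement is a copy of the matching StartElement with an altered type
      published.push({ ...open.pop(), type: 'EndElement' })
      continue
    }
    const start = { type: 'StartElement', name: tag.name, attributes: tag.attributes }
    published.push(start)
    if (tag.kind === 'selfEnclosed') {
      // self enclosed elements are published twice
      published.push({ ...start, type: 'EndElement' })
    } else {
      open.push(start)
    }
  }
  return published
}

const publishedNodes = expandTags([
  { kind: 'open', name: 'list', attributes: { id: '1' } },
  { kind: 'selfEnclosed', name: 'item', attributes: {} },
  { kind: 'close', name: 'list' }
])
console.log(publishedNodes.map(n => `${n.type}:${n.name}`).join(' '))
// prints "StartElement:list StartElement:item EndElement:item EndElement:list"
```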
Sequences of text/CDATA fragments are reported as atomic Characters events.
If the useEntities option is on (the default), Characters fragments are transformed by EntityResolver (CDATA sections never are).
To drop insignificant whitespace, use the stripSpace option. When it's set to true, every aggregated text fragment is trimmed, and fragments left empty are ignored completely. So, for example, <foo/>\n\n<bar/> yields no Characters at all, but for a <![CDATA[cdata]]> section spaces are left in place.
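The aggregation and trimming described above can be sketched as a pure function. This is a simplified model, not the library's code: aggregateCharacters and the {kind, text} token objects are hypothetical, and CDATA handling and entity resolution are left out.

```javascript
// Hypothetical model of text aggregation with stripSpace; not the library's code.
// Tokens are assumed {kind, text} objects, where kind is 'tag' or 'text'.
function aggregateCharacters (tokens, { stripSpace = false } = {}) {
  const events = []
  let parts = []        // adjacent text fragments collected so far

  const flush = () => {
    if (parts.length === 0) return
    const joined = parts.join('')
    parts = []
    const text = stripSpace ? joined.trim() : joined
    // a fragment that is empty after trimming is ignored completely
    if (text.length > 0) events.push({ type: 'Characters', text })
  }

  for (const token of tokens) {
    if (token.kind === 'text') {
      parts.push(token.text)
    } else {
      flush()           // a tag ends the current run of text
      events.push({ type: 'Tag', text: token.text })
    }
  }
  flush()
  return events
}

const sampleTokens = [
  { kind: 'tag', text: '<foo/>' },
  { kind: 'text', text: '\n\n' },
  { kind: 'tag', text: '<bar/>' }
]
// With stripSpace, the whitespace between the two tags yields no Characters event.
console.log(aggregateCharacters(sampleTokens, { stripSpace: true }).map(e => e.type))
// Without it, the '\n\n' fragment is reported.
console.log(aggregateCharacters(sampleTokens).map(e => e.type))
```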
By default, XMLReader works in a manner similar to SAX parsers, emitting atomic objects one by one. They are XMLNodes, not just SAXEvents, but, once again, by default each XMLNode is only aware of its parent, not its children (the children member is null, not an empty array).
To start collecting children, the application developer can explicitly mark the necessary set of nodes with the function passed as the collect option. For example, for the complete DOM tree one can use
collect: e => true,
for direct children of the root element (which may make sense with huge linear XML):
collect: e => e.level === 1,
and so on.
Each XMLNode conforming to the collect criterion will have the default children = [], and each XMLNode having it as parent:
- will be added to that list;
- if it's an element, it will collect its own children.
The complete children content is available at the EndElement event.
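The collecting rules can be traced with a short model working on a flat list of events. It is a sketch under assumptions, not the library's algorithm: buildTree and the {type, name} event objects are hypothetical, and level is taken to be 0 for the root element, in line with the e.level === 1 example above.

```javascript
// Hypothetical model of children collection; not xml-toolkit's code.
// Events are assumed {type, name} objects with type 'StartElement' or 'EndElement'.
function buildTree (events, collect) {
  const stack = []      // currently open elements, root first
  const finished = []   // nodes in the order of their EndElement

  for (const event of events) {
    if (event.type === 'StartElement') {
      const parent = stack.length > 0 ? stack[stack.length - 1] : null
      const node = { name: event.name, level: stack.length, parent, children: null }
      const parentCollects = parent !== null && parent.children !== null
      // a node collects if it matches the criterion or sits inside a collected subtree
      if (collect(node) || parentCollects) node.children = []
      if (parentCollects) parent.children.push(node)
      stack.push(node)
    } else if (event.type === 'EndElement') {
      // at this point the children list of the node is complete
      finished.push(stack.pop())
    }
  }
  return finished
}

// Events for <root><item><a/></item><item/></root>
const sampleEvents = [
  { type: 'StartElement', name: 'root' },
  { type: 'StartElement', name: 'item' },
  { type: 'StartElement', name: 'a' },
  { type: 'EndElement', name: 'a' },
  { type: 'EndElement', name: 'item' },
  { type: 'StartElement', name: 'item' },
  { type: 'EndElement', name: 'item' },
  { type: 'EndElement', name: 'root' }
]

for (const node of buildTree(sampleEvents, e => e.level === 1)) {
  const kids = node.children === null ? null : node.children.map(c => c.name)
  console.log(node.name, node.level, kids)
}
```

In this run the root keeps children as null, while each item gets its own list, so memory is released once an item has been handled.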
XMLParser and XMLReader are both high level XML parsers producing XMLNodes. But:
Name | Proto | XML Source | Pro | Contra |
---|---|---|---|---|
XMLReader | Transform | Readable | allows scanning huge XML with a limited memory footprint | asynchronous by nature |
XMLParser | none | String | can be used in synchronous contexts, e.g. in object constructors | limited size XML only |
So, XMLReader vs. XMLParser is basically like fs.createReadStream vs. fs.readFileSync.