serialization, more ast tests, buggos
bathos committed Feb 15, 2017
1 parent 97444b1 commit ccf38af
Showing 26 changed files with 1,097 additions and 188 deletions.
143 changes: 131 additions & 12 deletions ast.md
@@ -20,7 +20,21 @@ behaviors shared by all nodes.
- [`ASTNode.prototype.findDeep\(\)`](#astnodeprototypefinddeep)
- [`ASTNode.prototype.filterDeep\(\)`](#astnodeprototypefilterdeep)
- [`ASTNode.prototype.remove\(\)`](#astnodeprototyperemove)
- [`ASTNode.prototype.serialize\(\)`](#astnodeprototypeserialize)
- [`ASTNode.prototype.serialize\(opts\)`](#astnodeprototypeserializeopts)
- [`serializationOpts.attrInlineMax`](#serializationoptsattrinlinemax)
- [`serializationOpts.attrSort`](#serializationoptsattrsort)
- [`serializationOpts.comments`](#serializationoptscomments)
- [`serializationOpts.depth`](#serializationoptsdepth)
- [`serializationOpts.dtd`](#serializationoptsdtd)
- [`serializationOpts.formatCDATA`](#serializationoptsformatcdata)
- [`serializationOpts.formatComment`](#serializationoptsformatcomment)
- [`serializationOpts.indent`](#serializationoptsindent)
- [`serializationOpts.minWidth`](#serializationoptsminwidth)
- [`serializationOpts.pis`](#serializationoptspis)
- [`serializationOpts.preferSingle`](#serializationoptsprefersingle)
- [`serializationOpts.selfClose`](#serializationoptsselfclose)
- [`serializationOpts.wrapColumn`](#serializationoptswrapcolumn)
- [`serializationOpts.xmlDecl`](#serializationoptsxmldecl)
- [`ASTNode.prototype.toJSON\(\)`](#astnodeprototypetojson)
- [`ASTNode.prototype.validate\(\)`](#astnodeprototypevalidate)
- [Additional methods from `Array.prototype`](#additional-methods-from-arrayprototype)
@@ -176,19 +190,124 @@ node
.forEach(node => node.remove());
```

### `ASTNode.prototype.serialize()`
### `ASTNode.prototype.serialize(opts)`

Returns an XML string. This may not be the same as the original source text. In
addition to normalizing formatting and whitespace, XML is, in some sense, a
lossy format in that entity references cannot be ‘restored’ (you could pull it
off with general entities maybe, but parameter entities are especially
problematic when you have a mutable AST).

If called at the `Document` level, the xml declaration will always specify its
encoding as "UTF8", its version as "1.0" (this is an XML 1.0 processor) and its
standalone status as "yes", regardless of the original value, because the DTD
will be rendered in its ‘synthesized’ form. I expect to refine this further in
the future by accepting an options object.
addition to the more obvious cases of normalizing whitespace and formatting
markup, there’s also the fact that entity references cannot be restored after
parsing. There are likely approaches one could take to achieve this, but they
are far from trivial, and I imagine they would increase the complexity of the
processor by an order of magnitude. Rather, I think it is reasonable to say that,
in a sense, XML is a lossy format: not in terms of document content, but in
terms of the specific ways that document content is delivered. Another example
of this is that we do not retain knowledge of which attributes were supplied as
defaults and which were explicitly included in the source text.

The formatted output aims to be readable, and a number of options control its
appearance. Any option that is not supplied falls back to the default given below.

#### `serializationOpts.attrInlineMax`

Number >= 0 (`Infinity` is allowed). An element with more attributes than this
threshold is guaranteed to have each attribute on a line of its own. Defaults to 1.

In other words, given element `foo` with attribute `bar` and element `baz` with
attributes `qux` and `quux`, the following will occur:

```
// attrInlineMax: 0
<foo
  bar="true"/>
<baz
  quux="true"
  qux="true"/>
// attrInlineMax: 1
<foo bar="true"/>
<baz
  quux="true"
  qux="true"/>
// attrInlineMax: Infinity
<foo bar="true"/>
<baz quux="true" qux="true"/>
```
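The threshold rule is easy to express in code. The sketch below is illustrative only. `formatEmptyElement` is a hypothetical helper, not the library's code. It assumes attributes arrive as already-escaped `[name, value]` pairs in their final (already sorted) order. Each wrapped attribute is indented one level deeper than its element, at `(depth + 1) * indent` spaces. It ignores `wrapColumn` entirely.

```javascript
// Hypothetical helper illustrating the attrInlineMax rule; not the library's code.
// Assumes `attrs` holds already-escaped [name, value] pairs in final order and
// renders an empty, self-closing element. wrapColumn is ignored here.
function formatEmptyElement(name, attrs, { attrInlineMax = 1, indent = 2, depth = 0 } = {}) {
  const rendered = attrs.map(([ key, value ]) => `${ key }="${ value }"`);

  if (rendered.length === 0) {
    return `<${ name }/>`;
  }

  // More attributes than the threshold: one attribute per line,
  // indented one level deeper than the element.
  if (rendered.length > attrInlineMax) {
    const pad = ' '.repeat((depth + 1) * indent);
    return [ `<${ name }`, ...rendered.map(attr => pad + attr) ].join('\n') + '/>';
  }

  // At or below the threshold: attributes share the tag's line.
  return `<${ name } ${ rendered.join(' ') }/>`;
}

const bazAttrs = [ [ 'quux', 'true' ], [ 'qux', 'true' ] ];

console.log(formatEmptyElement('baz', bazAttrs));
console.log(formatEmptyElement('baz', bazAttrs, { attrInlineMax: Infinity }));
```

With the default threshold of 1, the first call prints the three-line form of `baz`. The second call keeps both attributes on one line.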

#### `serializationOpts.attrSort`

If true (default), attributes are sorted alphabetically by key.

#### `serializationOpts.comments`

If true (default), comment nodes are included.

#### `serializationOpts.depth`

Integer >= 0 indicating the current indentation depth. Mainly intended for
internal use as serialization propagates downward. Begins at 0 by default.

#### `serializationOpts.dtd`

If true (default), a doctype declaration is included if present.

#### `serializationOpts.formatCDATA`

If true (default), character data (text content) is formatted for clean
multiline presentation in the output. This means whitespace is normalized and
line breaks may be inserted, so turn this off if whitespace in character data
should be considered significant. Two exceptions are built in:

- The content of explicit CDATA sections is always left as it was found.
- The nearest ancestor's `xml:space` attribute is honored; if its value is
  "preserve", formatting is not applied.
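The two built-in exceptions amount to a short decision procedure, sketched below. The node shape (`isCDATASection`, `attrs`, `parent`) and the `shouldFormatText` name are hypothetical and are not the library's AST. The sketch shows only the lookup order. Explicit CDATA sections are never touched. Otherwise, the nearest ancestor that declares `xml:space` decides, so an inner `xml:space="default"` re-enables formatting inside an outer `"preserve"`.

```javascript
// Hypothetical node shape for illustration only:
//   { isCDATASection?: boolean, attrs?: { [name]: string }, parent?: node }
function shouldFormatText(textNode, { formatCDATA = true } = {}) {
  if (!formatCDATA || textNode.isCDATASection) {
    return false;
  }

  // Walk up to the nearest ancestor that declares xml:space; it alone decides.
  for (let node = textNode.parent; node; node = node.parent) {
    const space = node.attrs && node.attrs['xml:space'];

    if (space !== undefined) {
      return space !== 'preserve';
    }
  }

  // No ancestor declares xml:space: formatting is allowed.
  return true;
}

const pre = { attrs: { 'xml:space': 'preserve' } };
const inner = { attrs: { 'xml:space': 'default' }, parent: pre };

console.log(shouldFormatText({ parent: pre }));   // false
console.log(shouldFormatText({ parent: inner })); // true: nearest xml:space is "default"
```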

#### `serializationOpts.formatComment`

If true (default), comment content is formatted with linebreaks if needed.

#### `serializationOpts.indent`

Integer >= 0 specifying the number of spaces to use per indent. Defaults to 2.

#### `serializationOpts.minWidth`

Integer >= 0 specifying the minimum number of characters that must remain
available on a line after its indentation. Defaults to 30. See `wrapColumn` for
how the two interact.

#### `serializationOpts.pis`

If true (default), processing instruction nodes are included.

#### `serializationOpts.preferSingle`

If true, single quotes are preferred as the delimiter for literals, e.g.
attribute values and external IDs. Default is false.

The quote style is only ‘preferred’ because system and public ID literals have no
way to escape a quote character, so their content may demand one delimiter or
the other.
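One way to implement the ‘preferred’ rule is sketched below. It is an illustration under assumptions, not the library's code; `quoteLiteral` and its `kind` parameter are hypothetical. Attribute values can always keep the preferred delimiter by escaping it as `&quot;` or `&apos;`. System and public ID literals cannot contain references, so they fall back to the other delimiter when their content requires it.

```javascript
// Hypothetical helper illustrating "preferred, not guaranteed" quoting.
// kind: 'attr' for attribute values, 'literal' for system/public ID literals.
function quoteLiteral(value, { preferSingle = false, kind = 'attr' } = {}) {
  const preferred = preferSingle ? "'" : '"';
  const other = preferSingle ? '"' : "'";

  if (kind === 'attr') {
    // Attribute values may contain references, so the preferred delimiter
    // can always be kept by escaping it. (Other escaping, such as & and <,
    // is assumed to have happened already.)
    const entity = preferred === '"' ? '&quot;' : '&apos;';
    return preferred + value.split(preferred).join(entity) + preferred;
  }

  // System and public ID literals allow no references: content decides.
  if (!value.includes(preferred)) {
    return preferred + value + preferred;
  }

  if (!value.includes(other)) {
    return other + value + other;
  }

  throw new Error('A literal cannot contain both quote characters');
}

console.log(quoteLiteral("it's", { preferSingle: true }));                  // 'it&apos;s'
console.log(quoteLiteral("it's", { preferSingle: true, kind: 'literal' })); // "it's"
```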

#### `serializationOpts.selfClose`

If true (default), empty elements will be represented using self-closing tags.

#### `serializationOpts.wrapColumn`

Integer >= 0 specifying the target max line length. Defaults to 80.

The wrap column is not applied strictly. In documents with deep nesting,
enforcing it with only an allowance for single tokens that cannot be split could
produce very awkward results. The `minWidth` option complements `wrapColumn` to
address this.

For example, with the default options (`wrapColumn` 80, `minWidth` 30), if the
indentation at the current depth is 60 characters wide, the effective wrap column
becomes 90, so that there are still at least 30 characters of width available to
format within.
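Read literally, the example above takes the larger of two values: `wrapColumn`, or the indentation width plus `minWidth`. Below is a sketch of that reading. It assumes indentation is `depth * indent` spaces, and `effectiveWrapColumn` is a hypothetical name.

```javascript
// Sketch of the effective wrap column, assuming indentation is depth * indent spaces.
function effectiveWrapColumn({ depth = 0, indent = 2, wrapColumn = 80, minWidth = 30 } = {}) {
  const indentWidth = depth * indent;

  // Never leave fewer than minWidth columns after the indentation.
  return Math.max(wrapColumn, indentWidth + minWidth);
}

console.log(effectiveWrapColumn({ depth: 30 })); // 90 (60 columns of indent + 30)
console.log(effectiveWrapColumn({ depth: 5 }));  // 80 (plenty of room, limit unchanged)
```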

#### `serializationOpts.xmlDecl`

If true (default), an XML declaration is included at the start of the document.
This declaration will not specify an encoding.

### `ASTNode.prototype.toJSON()`

5 changes: 4 additions & 1 deletion package.json
@@ -38,5 +38,8 @@
    "test": "npm run build-cjs && tap test/**/test-*.js",
    "test-local": "npm run build-cjs && TAP_RCFILE=./.taprc tap test/**/test-*.js"
  },
  "version": "1.0.0"
  "version": "1.0.0",
  "dependencies": {
    "string.prototype.padend": "3.0.0"
  }
}
2 changes: 1 addition & 1 deletion rollup.config.js
@@ -2,7 +2,7 @@ import rollupNodeResolve from 'rollup-plugin-node-resolve';

export default {
  entry: 'src/index.js',
  external: [ 'assert', 'stream', 'url' ],
  external: [ 'assert', 'stream', 'string.prototype.padend', 'url' ],
  plugins: [ rollupNodeResolve() ],
  sourceMap: true
};
138 changes: 92 additions & 46 deletions src/ast/ast-node.js
@@ -2,6 +2,23 @@ const CHILD_PARENT_MAPPING = new WeakMap();

import { isArrayIndex } from './ast-util';

const SERIALIZATION_DEFAULTS = {
  attrInlineMax: 1,
  attrSort: true,
  comments: true,
  depth: 0,
  dtd: true,
  formatCDATA: true,
  formatComment: true,
  indent: 2,
  minWidth: 30,
  pis: true,
  preferSingle: false,
  selfClose: true,
  wrapColumn: 80,
  xmlDecl: true
};
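`serialize(opts)` merges caller options over these defaults with `Object.assign({}, SERIALIZATION_DEFAULTS, opts)`. The sketch below isolates that pattern. `DEFAULTS` is an abbreviated, hypothetical stand-in, and `_indentUnit` is a hypothetical derived field. Because the merge writes into a fresh object, neither the defaults nor the caller's object is mutated, and derived settings belong on the merged copy. One caveat: `Object.assign` copies own properties whose value is `undefined`. Passing `{ indent: undefined }` therefore replaces the default instead of falling back to it.

```javascript
// Standalone sketch of the defaults merge; DEFAULTS is an abbreviated,
// hypothetical copy of SERIALIZATION_DEFAULTS.
const DEFAULTS = Object.freeze({ attrInlineMax: 1, indent: 2, wrapColumn: 80 });

function resolveOpts(opts = {}) {
  // Later sources win, and the target is a fresh object, so neither
  // DEFAULTS nor opts is modified.
  const $opts = Object.assign({}, DEFAULTS, opts);

  // Hypothetical derived setting, stored on the merged copy only.
  $opts._indentUnit = ' '.repeat($opts.indent);

  return $opts;
}

const callerOpts = { indent: 4 };
const resolved = resolveOpts(callerOpts);

console.log(resolved.indent, resolved.wrapColumn); // 4 80
console.log('_indentUnit' in callerOpts);          // false
```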

export default
class ASTNode extends Array {
constructor() {
@@ -36,9 +53,16 @@ class ASTNode extends Array {
return Reflect.get(nonArrayShadow, key);
}

if (nonArray && key === 'length') {
return 0;
}

return Reflect.get(target, key, receiver);
},

has: (target, key) =>
Reflect.has(target, key) || Reflect.has(nonArrayShadow, key),

ownKeys: target => {
const keys = Reflect.ownKeys(target);

@@ -73,7 +97,7 @@ class ASTNode extends Array {
}

if (isIndex && nonArray) {
Reflect.set(nonArrayShadow, key, value);
return Reflect.set(nonArrayShadow, key, value);
}

CHILD_PARENT_MAPPING.delete(receiver[key]);
@@ -100,44 +124,6 @@ class ASTNode extends Array {
}
});
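The proxy's `get`, `has`, and `set` traps let a node that is not an array node present no elements (`length` reads as 0), while indexed writes land on `nonArrayShadow`. Below is a reduced, standalone sketch of that pattern. `hideArrayness`, `isIndexKey`, and `shadow` are hypothetical stand-ins; the library's own `isArrayIndex` from `./ast-util` is not shown here. The real handler also does more, such as parent/child bookkeeping.

```javascript
// Reduced sketch of a proxy that makes an array look empty while storing
// indexed writes on a side object. All names are hypothetical.
function isIndexKey(key) {
  // Canonical array index: "0", "1", ..., up to 2 ** 32 - 2.
  return typeof key === 'string' &&
    key !== '4294967295' &&
    String(Number(key) >>> 0) === key;
}

function hideArrayness(array) {
  const shadow = Object.create(null);

  return new Proxy(array, {
    get(target, key, receiver) {
      if (isIndexKey(key)) {
        return shadow[key];
      }

      if (key === 'length') {
        return 0; // array methods see no elements
      }

      return Reflect.get(target, key, receiver);
    },

    has: (target, key) => Reflect.has(target, key) || Reflect.has(shadow, key),

    set(target, key, value, receiver) {
      // A set trap must report success; a falsy result makes the
      // assignment throw a TypeError in strict-mode code.
      if (isIndexKey(key)) {
        return Reflect.set(shadow, key, value);
      }

      return Reflect.set(target, key, value, receiver);
    }
  });
}

const node = hideArrayness([]);
node[0] = 'x';

console.log(node.length);      // 0
console.log(node[0]);          // x
console.log(0 in node);        // true
console.log(node.map(v => v)); // []
```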

// Additionally, splice() demands special handling because it is ‘atomic’;
// enforcing sparseness after each assignment could make it go wonky. We
// simply remove fill() since it will never make sense here.

const target = this;

Object.defineProperties(proxy, {
fill: {
value: undefined
},
splice: {
value(index, count) {
if (this !== proxy) {
return Reflect.apply(Array.prototype.splice, this, arguments);
}

const oldMembers = target.slice(index, index + count);
const sentinel = Symbol();

oldMembers
.filter(oldMember => oldMember instanceof ASTNode)
.forEach(oldMember => {
const { key } = CHILD_PARENT_MAPPING.get(oldMember);
CHILD_PARENT_MAPPING.delete(oldMember);
target[key] = sentinel;
});

Reflect.apply(Array.prototype.splice, proxy, arguments);

while (target.includes(sentinel)) {
target.splice(target.indexOf(sentinel), 1);
}

return oldMembers;
}
}
});

return proxy;
}

@@ -161,17 +147,17 @@ class ASTNode extends Array {
return new Set();
}

static get [Symbol.species]() {
return Array;
}

// The typeName property provides an alternative way to introspect node type
// without using instanceof.

get typeName() {
throw new Error('Not implemented');
}

get [Symbol.species]() {
return Array;
}
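When `Symbol.species` returns `Array`, methods that derive a new array (`map`, `filter`, `slice`, and so on) produce plain arrays rather than new AST nodes. Per the ECMAScript `ArraySpeciesCreate` algorithm, the lookup is `this.constructor[Symbol.species]`. Only a static getter takes effect; an instance getter with the same key is never consulted. Here is a minimal sketch with hypothetical classes:

```javascript
// Hypothetical classes; only the static getter affects derived arrays.
class SpeciesNode extends Array {
  static get [Symbol.species]() {
    return Array;
  }
}

class PlainSubclass extends Array {}

const items = SpeciesNode.from([ 1, 2, 3 ]);
const doubled = items.map(n => n * 2);

console.log(items instanceof SpeciesNode);   // true
console.log(doubled instanceof SpeciesNode); // false: map() built a plain Array
console.log(PlainSubclass.from([ 1 ]).map(n => n) instanceof PlainSubclass); // true
```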

// Hierarchical accessors

get index() {
@@ -213,6 +199,38 @@ class ASTNode extends Array {
return (this.document || {}).root;
}

  // Mutative (inserting) array method intercept

  push() {
    if (this.constructor.isArrayNode) {
      return super.push(...arguments);
    }
  }

  splice() {
    if (!this.constructor.isArrayNode) {
      return;
    }

    // Not efficient ... but not painful & mysterious. Maybe I will revisit this
    // later, but splice is really overloaded and we have a lot of behavior we
    // need to follow it (atomically).

    const newMembership = [ ...this ];
    const oldMembership = newMembership.splice(...arguments);

    this.length = 0;
    this.push(...newMembership);

    return oldMembership;
  }

  unshift() {
    if (this.constructor.isArrayNode) {
      return super.unshift(...arguments);
    }
  }
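The new `splice()` avoids reimplementing splice's overloaded argument handling. It splices a plain copy, empties the node, and re-inserts the survivors through `push()`, so any insertion behavior attached to `push()` runs for every member. The standalone sketch below mirrors that strategy on a hypothetical `Rebuilt` class. A counter (`Rebuilt.pushed`) stands in for per-insertion bookkeeping.

```javascript
// Hypothetical class mirroring the copy-splice-rebuild strategy.
class Rebuilt extends Array {
  push(...items) {
    Rebuilt.pushed += items.length; // stand-in for per-insertion bookkeeping
    return super.push(...items);
  }

  splice(...args) {
    const next = [ ...this ];             // work on a plain copy
    const removed = next.splice(...args); // let Array handle the overloads

    this.length = 0;                      // clear, then re-insert via push()
    this.push(...next);

    return removed;
  }
}

Rebuilt.pushed = 0;

const list = Rebuilt.from([ 'a', 'b', 'c', 'd' ]);
const removed = list.splice(1, 2, 'x');

console.log(removed);     // [ 'b', 'c' ]
console.log([ ...list ]); // [ 'a', 'x', 'd' ]
```

The trade-off, as the comment in the diff admits, is that every splice re-pushes all surviving members, no matter how few elements actually change.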

// The clone() method creates a new (orphaned) node with the same properties
// and children (also cloned). Note that an orphaned element has no definition
// until it is reinserted into a document that has a DTD — thus validation
@@ -273,12 +291,37 @@ class ASTNode extends Array {
// Transformation operations. These are augmented or overwritten in subclasses
// for more specific behavior.

  serialize() {
    return this.map(node => node.serialize());
  serialize(opts={}) {
    const $opts = Object.assign({}, SERIALIZATION_DEFAULTS, opts);

    const dtd = this.doctype;

    if (dtd) {
      const attlists = dtd
        .getAll()
        .filter(node => node.typeName === '#attlistDecl');

      if (attlists.every(attlist => attlist.length < 2)) {
        $opts.attdefLone = true;
      } else {
        const attdefs = attlists
          .reduce((acc, node) => [ ...acc, ...node ], []);

        $opts.attdefCols = [
          Math.max(0, ...attdefs.map(attdef => attdef.name.length)),
          Math.max(0, ...attdefs.map(attdef => attdef._attTypeCol)),
          Math.max(0, ...attdefs.map(attdef => attdef._defaultTypeCol))
        ];
      }
    }

    $opts._formatCDATA = $opts.formatCDATA;

    return this._serialize($opts);
  }
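When any ATTLIST declaration has two or more attribute definitions, `serialize()` computes the widest name, type, and default column across all of them (`attdefCols`). This is presumably so the definitions can be printed in aligned columns; otherwise it only sets `attdefLone`. The sketch below illustrates the alignment idea under simplifying assumptions. Each attdef is a hypothetical `{ name, type, dflt }` object of already-serialized strings. The real nodes instead expose `_attTypeCol` and `_defaultTypeCol`, whose exact meaning isn't shown in this diff. The sketch uses native `String.prototype.padEnd` (ES2017), which the new `string.prototype.padend` dependency also provides as a package.

```javascript
// Sketch: align ATTLIST attribute definitions into columns.
// Each attdef is assumed to be { name, type, dflt } with serialized strings.
function alignAttdefs(attlists) {
  const attdefs = attlists.reduce((acc, list) => [ ...acc, ...list ], []);

  // Widest entry per padded column; Math.max(0, ...[]) is 0 when empty.
  const cols = [
    Math.max(0, ...attdefs.map(def => def.name.length)),
    Math.max(0, ...attdefs.map(def => def.type.length))
  ];

  // The default is the last column, so it needs no padding.
  return attdefs.map(def =>
    [ def.name.padEnd(cols[0]), def.type.padEnd(cols[1]), def.dflt ].join(' ')
  );
}

const lines = alignAttdefs([
  [ { name: 'id', type: 'ID', dflt: '#REQUIRED' } ],
  [ { name: 'xml:space', type: '(default|preserve)', dflt: "'preserve'" } ]
]);

console.log(lines.join('\n'));
```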

toJSON() {
const nodeType = this.constructor.typeName;
const nodeType = this.typeName;
const children = this.map(node => node.toJSON());

if (this.constructor.isArrayNode) {
@@ -295,3 +338,6 @@ class ASTNode extends Array {
this.forEach(node => node.validate());
}
}

ASTNode.prototype.copyWithin = undefined;
ASTNode.prototype.fill = undefined;
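Assigning `undefined` to `copyWithin` and `fill` on `ASTNode.prototype` shadows the versions inherited from `Array.prototype`. Calling them on a node therefore throws a `TypeError` instead of silently writing elements. Below is a minimal sketch with a hypothetical `Guarded` class. Explicitly borrowing the method through `Array.prototype.fill.call` still bypasses the guard.

```javascript
class Guarded extends Array {}

// Shadow the inherited methods; lookups now find undefined on Guarded.prototype.
Guarded.prototype.fill = undefined;
Guarded.prototype.copyWithin = undefined;

const g = Guarded.from([ 1, 2, 3 ]);

let error = null;
try {
  g.fill(0);
} catch (err) {
  error = err;
}

console.log(error instanceof TypeError); // true: g.fill is not a function
console.log([ ...g ]);                   // [ 1, 2, 3 ], unchanged

// Explicit borrowing is not blocked by the shadowing.
Array.prototype.fill.call(g, 0);
```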