ImmutableVariables

Phil Shafer edited this page Jul 9, 2013 · 2 revisions

Mutable Variables

XSLT has immutable variables. This was done to support various optimizations and advanced streaming functionality. But it remains one of the most painful parts of XSLT. We use SLAX in JUNOS and provide the ability to perform XML-based RPCs to local and remote JUNOS boxes. One RPC allows the script to store and retrieve values in an SNMP MIB (the jnxUtility MIB). We have users using this to "fake" mutable variables, so for our environment, any theoretical arguments against the value of mutable variables are lost. They are happening, and the question becomes whether we want to force script writers into mental anguish to allow them.

Yes, exactly. That was an apologetical defense of the following code, which implements mutable variables. Dio, abbi pietà della mia anima.

The rest of this page contains mind-numbing comments on the implementation and inner workings of mutable variables. For the typical scriptor, the important implications are:

  • Non-standard feature: mutable variables are not available outside the libslax environment. This will significantly affect the portability of your scripts. Avoid mutable variables if you want to use your scripts in other XSLT implementations or without libslax.

  • Memory Overhead: Due to the lifespan of XML elements and RTFs inside libxslt, mutable variables must retain copies of their previous values (when non-scalar values are used) to avoid dangling references. This means that heavy use of mutable variables will significantly affect memory overhead, until the mutable variables fall out of scope.

  • Axis Implications: Since values for mutable variables are copied (see above), the operations of axes will be affected. This is a relatively minor issue, but should be noted.

Memory Issues

libxslt gives two ways to track memory WRT garbage collection:

  • contexts

    • as RTF/RVT (type XPATH_XSLT_TREE)
  • variables

    • strings (simple; forget I even mentioned them)
    • node sets (type XPATH_NODESET)
      • via the nodesetval field
      • does not track nodes, but references nodes in other trees

The key is that by having node sets refer to nodes "in situ", where they reside in other documents, the idea of referring to nodes in the input document is preserved. Node sets don't require an additional memory hook or a reference count.

The key functions here are xmlXPathNewValueTree() and xmlXPathNewNodeSet(). Both return a fresh xmlXPathObject, but xmlXPathNewValueTree will set the xmlXPathObject's "boolval" to 1, which tells xmlXPathFreeObject() to free the nodes contained in a nodeset, not just the nodeTab that holds the references.

Also note that if one of the nodes in the node set is a document (type XML_DOCUMENT_NODE) then xmlFreeDoc() is called to free the document. For RTFs, the only member of the nodeset is the root of the document, so freeing that node will free the entire document.
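The ownership rule described above can be sketched with a toy model. This is not the libxml2 API: the `Node` and `XPathObject` classes and the `new_node_set`, `new_value_tree`, and `free_object` helpers are hypothetical stand-ins that only mimic the role of "boolval" as described in the text.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy node; `freed` marks memory that has been released."""
    name: str
    freed: bool = False


@dataclass
class XPathObject:
    """Toy stand-in for xmlXPathObject: a node table plus an ownership flag."""
    node_tab: list
    boolval: int = 0  # non-zero means "this object owns its nodes"


def new_node_set(nodes):
    # Loosely models xmlXPathNewNodeSet(): references only, no ownership.
    return XPathObject(list(nodes), boolval=0)


def new_value_tree(doc):
    # Loosely models xmlXPathNewValueTree(): the object owns the tree.
    return XPathObject([doc], boolval=1)


def free_object(obj):
    # Models the decision the text attributes to xmlXPathFreeObject().
    if obj.boolval:
        for node in obj.node_tab:
            node.freed = True  # owned nodes are released
    obj.node_tab = []          # the table of references always goes away


input_nodes = [Node("x1"), Node("x2")]
node_set = new_node_set(input_nodes)
free_object(node_set)          # input document nodes must survive

rtf_doc = Node("rtf-root")
tree = new_value_tree(rtf_doc)
free_object(tree)              # the "fake" document is released
```

In this model, freeing the plain node set leaves the input nodes alone, while freeing the value tree releases the document it owns.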

All this works well for immutable objects and RTFs, but does not allow my mutable variables to work cleanly. This is quite annoying.

I need to allow a variable to hold a nodeset, a document, or a scalar value, without caring about the previous value. But I need to hold on to the previous values to allow others to refer to them without dangling references.

Dangling References

Consider the following input document:

<top>
    <x1/> <x2/> <x3/>
</top>

The following code makes a nodeset (type XPATH_NODESET) whose nodeTab array points into the input document:

var $x = top/*[starts-with(name(), "x")];

The following code makes an RTF/RVT (type XPATH_XSLT_TREE), whose "fake" document has a root node (type XML_DOCUMENT_NODE) that contains the "top" element node:

var $y = <top> {
    <x1>;
    <x2>;
    <x3>;
}

The following code makes a nodeset (type XPATH_NODESET) that refers to nodes in the "fake" document under $y:

var $z = $y/*[starts-with(name(), "x")];

Now consider the following code:

mvar $z = $y/*[starts-with(name(), "x")];
var $a = $z[1];             /* refers to nodes in "fake" $y doc */
if ($a) {
    set $z = <rvt> "value"; /* RVT */
    var $b = $z[1];         /* refers to node in <rvt> RVT */
    set $z = <next> "one";  /* RVT */
    var $c = $z[1];         /* refers to node in <next> RVT */
    <a> $a;
    <b> $b;
    <c> $c;
}

In this chunk of code, the changing value of $z cannot change the nodes recorded as the values of $a, $b, or $c. Since I can't count on either the context or the variable garbage collection, my only choice is to roll my own. This is quite annoying.

My Own Memory Issues

The only means of retaining arbitrary previous values of a mutable variable is to have a complete history of previous values.

The "overhead" for an mvar must contain all previous values for the mvar, so references to the nodes in the mvar (from other variables) don't become dangling when those values are freed. This does not apply to scalar values, which do not set the nodesetval field.
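To see why the history is needed, here is a toy comparison, assuming a hypothetical `NaiveMvar` that frees the old value on reassignment and a `HistoryMvar` that keeps it. Neither class exists in libslax; they only illustrate the dangling-reference problem for the `$b` case above.

```python
class Node:
    """Toy XML node; `freed` marks memory that has been released."""
    def __init__(self, name, text=""):
        self.name = name
        self.text = text
        self.freed = False


class NaiveMvar:
    """Hypothetical design that frees the old tree on every reassignment."""
    def __init__(self):
        self.value = []

    def set(self, nodes):
        for node in self.value:
            node.freed = True       # old value released immediately
        self.value = nodes


class HistoryMvar:
    """Keeps every tree ever assigned until the variable is released."""
    def __init__(self):
        self.value = []
        self.history = []

    def set(self, nodes):
        self.history.append(nodes)  # old values stay alive
        self.value = nodes

    def release(self):
        # Models the variable going out of scope.
        for nodes in self.history:
            for node in nodes:
                node.freed = True
        self.history = []
        self.value = []


naive = NaiveMvar()
naive.set([Node("rvt", "value")])
b_naive = naive.value[0]            # like: var $b = $z[1];
naive.set([Node("next", "one")])    # like: set $z = <next> "one";

kept = HistoryMvar()
kept.set([Node("rvt", "value")])
b_kept = kept.value[0]
kept.set([Node("next", "one")])
```

After the second assignment, `b_naive` points at a freed node, while `b_kept` stays valid until `kept.release()` is called.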

Yes, this is pretty much as ugly as it sounds. After a variable has been made, it cannot be changed without risking an impact on existing references to it.

So a mutable variable needs to make two things: a real variable, whose value can be munged at will, and a hook to handle memory management.

Rules

  • Assigning a scalar value to an mvar just sets the variable's value (var->value).

  • Assigning a non-scalar value to an mvar means making a deep copy and keeping this copy in "overhead".
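The two rules can be sketched as a small dispatch. This is a minimal model, not the libslax code: the `Mvar` class, its `assign` method, and the use of Python dictionaries for trees are all assumptions made for illustration.

```python
import copy

# Assumption: these Python types play the role of XPath scalar values.
SCALAR_TYPES = (str, int, float, bool)


class Mvar:
    def __init__(self):
        self.value = None    # stands in for var->value
        self.overhead = []   # deep copies of every non-scalar value

    def assign(self, new_value):
        if isinstance(new_value, SCALAR_TYPES):
            # Rule 1: scalars are stored directly, nothing is retained.
            self.value = new_value
        else:
            # Rule 2: non-scalars are deep-copied and the copy is retained.
            kept = copy.deepcopy(new_value)
            self.overhead.append(kept)
            self.value = kept


x = Mvar()
x.assign(4)                                   # scalar: overhead untouched
source_tree = {"next": ["one"]}
x.assign(source_tree)                         # non-scalar: copied
source_tree["next"].append("changed later")   # does not affect the mvar
```

Because the mvar holds its own copy, later changes to (or the release of) the source tree cannot affect its value.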

But where does the "overhead" live?

Overhead

In classic SLAX style, the overhead is kept in a shadow variable. The shadow variable (svar) holds an RTF/RVT that contains all the nodes ever assigned to the variable, a living history of all values of the variable.

We don't need to record scalar values, so:

mvar $x = 4;

becomes:

<xsl:variable name="slax-x"/>
<xsl:variable name="x" select="4"/>

But for RTFs, the content must be preserved, so:

mvar $x = <next> "one";

becomes:

<xsl:variable name="slax-x">
    <next>one</next>
</xsl:variable>
<xsl:variable name="x" select="slax:mvar-init($slax-x)"/>

where slax:mvar-init() is an extension function that returns the value of another variable, either as a straight value or as a nodeset.

If an mvar is only ever assigned scalar values, the svar will not be touched. When a non-scalar value is assigned to an mvar, the content is copied to the svar and the mvar is given a nodeset that refers to the content inside the svar. Appending to an mvar means adding that content to the svar and then appending the node pointers to the mvar.
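The relationship between the two variables for non-scalar values can be sketched as follows. The `ShadowedMvar` and `Element` classes are hypothetical; in this toy model `svar` is a list of copied nodes and `mvar` is a list of references to some of those same nodes. Mixing scalars with appends is not modeled.

```python
import copy


class Element:
    """Toy element node."""
    def __init__(self, name, text=""):
        self.name = name
        self.text = text


class ShadowedMvar:
    """Toy model: `svar` holds copies, `mvar` holds references into svar."""
    def __init__(self):
        self.svar = []   # every node ever assigned, in order
        self.mvar = []   # current value: references to nodes held in svar

    def _store(self, nodes):
        # Copy the content into the shadow variable and return the copies.
        copies = [copy.deepcopy(node) for node in nodes]
        self.svar.extend(copies)
        return copies

    def set(self, nodes):
        # Replace the current node set; older copies remain in svar.
        self.mvar = self._store(nodes)

    def append(self, nodes):
        # Extend the current node set with pointers to the new copies.
        self.mvar = self.mvar + self._store(nodes)


z = ShadowedMvar()
z.set([Element("rvt", "value")])
z.append([Element("next", "one")])
value_after_append = [e.name for e in z.mvar]
z.set([Element("last", "two")])
```

After the final `set`, the mvar refers only to the `last` element, but the svar still holds `rvt`, `next`, and `last`, so earlier references remain valid.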

If the mvar has a scalar value, appending discards that value. If the appended value is a scalar value, then the value is simply assigned to the mvar. This will be hopelessly confusing, but there's little that can be done, since appending an RTF to a number or a number to an RTF makes little sense. We will raise an error for this condition, to let the scriptor know what's going on.

Memory

When the mvar is freed, its "boolval" is zero, so the nodes are not touched but the nodesetval/nodeTab are freed. When the svar is freed, its "boolval" is non-zero, so xmlXPathFreeObject will free the nodes referenced in the nodesetval's nodeTab. The only node there will be the root document of a "fake" RTF document, which will contain all the historical values of the mvar. In short, the normal libxslt memory management will wipe up after us.

Implications

The chief implications are:

  • memory utilization -- mvar assignments are very sticky and only released when the mvar (and its svar) go out of scope

  • axis -- since the document that contains the mvar contents is a living document, code cannot depend on an axis staying unchanged. I'm not sure of what this means yet, but following::foo is a nodeset that may change over time, though it won't change once fetched (e.g. into a specific variable).
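One possible reading of the axis implication, sketched as a toy model: the `LivingDocument` class and its `following` method are hypothetical and only imitate a document that grows with each assignment. They are not how libxslt evaluates axes.

```python
class LivingDocument:
    """Toy shadow document whose child list only ever grows."""
    def __init__(self):
        self.children = []

    def add(self, name):
        self.children.append(name)
        return len(self.children) - 1   # position of the new node

    def following(self, position):
        # Evaluated on demand; returns a new list (a "fetched" node set).
        return list(self.children[position + 1:])


doc = LivingDocument()
first = doc.add("foo")
doc.add("foo")
fetched = doc.following(first)     # like storing following::foo in a variable
doc.add("foo")                     # a later assignment grows the document
refetched = doc.following(first)   # evaluating the axis again sees more nodes
```

Under these assumptions, the already-fetched result keeps its one node, while a fresh evaluation of the same axis returns two.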