<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet exclude-result-prefixes="#all"
xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
xmlns:arch="http://expath.org/ns/archive"
xmlns:bin="http://expath.org/ns/binary"
xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcmitype="http://purl.org/dc/dcmitype/"
xmlns:dcterms="http://purl.org/dc/terms/" xmlns:file="http://expath.org/ns/file"
xmlns:html="http://www.w3.org/1999/xhtml"
xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:mo="http://schemas.microsoft.com/office/mac/office/2008/main"
xmlns:mv="urn:schemas-microsoft-com:mac:vml" xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:prop="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:rel="http://schemas.openxmlformats.org/package/2006/relationships"
xmlns:ssh="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:tan="tag:textalign.net,2015:ns" xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"
xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main"
xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
<!-- Most of the namespaces above are unnecessary; they are here to provide a convenient reference to the prefixes and
namespaces frequently encountered in docx and xlsx files. Some of the abbreviations (prop, ssh, tan) are my coinage. -->
<!-- open-and-save-docx.xsl, aka xslt-for-docx -->
<!-- Written October 2019 by Joel Kalvesmaki, released under GNU General Public License 3.0, https://opensource.org/licenses/GPL-3.0 -->
<!-- Developed in oXygen 21.1 using Saxon HE and PE 9.8.0.12, applied to examples generated by the following software:
- Windows 10 Enterprise (v. 1709)
- Microsoft Office 2013 (15.0 64-bit)
- LibreOffice 6.3.2.2 (x64)
- OpenOffice 4.1.7 -->
<!-- These XSLT functions allow one to extract the component parts of an archive and to rearchive and save those
component parts, perhaps after some transformation.
I wrote these functions partly because I needed workflows that convert to and from
Word and Excel, and partly because I was frustrated with some limitations of Microsoft Word and Excel. It is
known mainly among specialists that Word (docx) and Excel (xlsx) files are merely compressed archives. If you
take a docx file and change its extension to zip, you can decompress it like any other zip file, and look at its
contents. So XSLT functions that deal with archives are ideal for the tasks I had in mind. I wanted to work
exclusively in XSLT, not XProc or Ant, because I wanted to be able to tightly incorporate the functions with
other XSLT and XPath code.
As I developed the application, I soon realized that the work could be generalized, to apply to any kind of
compressed archives, not merely docx and xlsx. But my prime use cases have remained centered around the
Word and Excel formats, illustrated by the companion example subfolders, which present a few representative
tasks that I wanted to accomplish. Hopefully those examples will catalyze XSLT developers into devising other
interesting applications. -->
<!-- The functions are most effective when used with Saxon PE or EE, which open up three modules of extended
functions: expath-archive (prefix arch), expath-binary (prefix bin), and expath-file (prefix file). PE/EE will allow
you to work with archives of any type (see example 1).
Saxon HE users on the other hand will be able to work only with docx and xlsx files. There's nothing barring
you from extending the functions to be able to handle with equal flair the compressed format of your choice, but
that's up to you. Further, Saxon HE allows you to open and save only text/XML components, not any embedded
images, audio, or other items that are binary files. When you open Saxon HE results in Word or Excel, you might
be warned that the file is corrupt, and you will be given the option of trying to recover the file. If you say yes, the
file should open fine, but merely without the binary files that couldn't be retrieved. -->
<!-- To see how these functions can be used, see the examples in the subdirectories. Try them out yourself. If you find
results that differ from mine, let me know on GitHub, @Arithmeticus. -->
<!-- GLOBAL VARIABLES AND PARAMETERS -->
<!-- To see if advanced functions are available, we use a representative function from each of the three expath
modules. We assume that if one is available, then all of them from that module are. -->
<xsl:variable name="advanced-functions-available" static="yes"
select="function-available('file:read-binary') and function-available('arch:extract-map') and function-available('bin:encode-string')"/>
<!-- As components are saved (e.g., to a docx or xlsx), should there be a message? -->
<xsl:param name="comment-on-saved-archives" as="xs:boolean" select="true()"/>
<xsl:output indent="no" name="open-and-save-docx"/>
<!-- ANCILLARY UTILITY FUNCTIONS -->
<xsl:function name="tan:map-to-xml" as="element()*">
<!-- Input: any items -->
<!-- Output: any maps in each item serialized as XML elements -->
<!-- For those accustomed to handling ordinary XML nodes, maps can be frustrating to work with.
This function allows one to change a map to XML, and do fun things with it, without requiring
map functions. -->
<xsl:param name="items-to-convert" as="item()*"/>
<xsl:apply-templates select="$items-to-convert" mode="map-to-xml"/>
</xsl:function>
<xsl:template match=".[. instance of map(*)]" mode="map-to-xml">
<xsl:variable name="this-map" select="." as="map(*)"/>
<map>
<xsl:for-each select="map:keys(.)">
<entry>
<key>
<xsl:copy-of select="."/>
</key>
<value>
<xsl:apply-templates select="$this-map(.)" mode="#current"/>
</value>
</entry>
</xsl:for-each>
</map>
</xsl:template>
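<!-- A minimal illustration of tan:map-to-xml(), offered as a sketch rather than a tested example. The input map
is hypothetical. A call such as
    tan:map-to-xml(map{'encoding': 'UTF-8'})
should, given the template above, yield
    <map>
        <entry>
            <key>encoding</key>
            <value>UTF-8</value>
        </entry>
    </map>
with nested maps converted recursively inside <value>. The order of the <entry> elements follows map:keys(),
which is implementation-dependent. -->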
<xsl:function name="tan:extract-map" as="map(xs:string,map(xs:string,item()?))"
use-when="$advanced-functions-available">
<!-- Input: an archive in xs:base64Binary, entries as maps -->
<!-- Output: the entries, each supplemented with a 'content' entry holding binary or decoded string data for the corresponding entry in the archive. -->
<!-- This function is a surrogate for arch:extract-map(), which never made it beyond the editorial draft, and has not
been completely supported by Saxon. See https://saxonica.plan.io/issues/4361 and https://saxonica.plan.io/boards/3/topics/7642 -->
<!-- Compare the code below to the editor draft XPath code for arch:extract-map(), http://expath.org/spec/archive/editor#fn.extractmap:
map:new(for $k in map:keys($entries)
return
let $a := $entries($k),
$text := map:contains($a,'encoding'),
$encoding := ($a('encoding'),'UTF-8')[1],
$data := arch:extract-binary($archive,$k) // error if not found
return
map:entry($k,
map:new(($a,
map:entry('content',if($text) bin:decode-string($data,$encoding) else $data)
))
)
-->
<xsl:param name="archive" as="xs:base64Binary"/>
<xsl:param name="entries" as="map(xs:string,map(xs:string,item()?))"/>
<xsl:variable name="entry-keys" select="map:keys($entries)"/>
<xsl:map>
<xsl:for-each select="$entry-keys">
<!-- the five primary XPath variables from the editor's specs, above -->
<xsl:variable name="this-key" select="." as="xs:string"/><!-- = $k -->
<xsl:variable name="this-entry" select="$entries($this-key)"
as="map(xs:string,item()?)"/><!-- = $a -->
<xsl:variable name="this-data" select="arch:extract-binary($archive, $this-key)"
as="xs:base64Binary?"/><!-- = $data -->
<xsl:variable name="this-entry-content-is-plain-text"
select="map:contains($this-entry, 'encoding')" as="xs:boolean"/><!-- = $text -->
<xsl:variable name="this-encoding" as="xs:string"
select="($this-entry('encoding'), 'UTF-8')[1]"/><!-- = $encoding -->
<!-- added variables -->
<!-- Has the content been marked as compressed? -->
<xsl:variable name="this-entry-content-is-marked-as-compressed" as="xs:boolean"
select="map:contains($this-entry, 'compression')"/>
<!-- Can the content be converted to a map of entries? -->
<xsl:variable name="these-data-entries" select="arch:entries-map($this-data)"
as="map(xs:string,map(xs:string,item()*))"/>
<!-- If the map of entries has keys, it seems to be an archive -->
<xsl:variable name="this-entry-content-is-itself-an-archive"
select="exists(map:keys($these-data-entries))"/>
<!-- Let's see what happens if we decode the string. -->
<xsl:variable name="this-data-as-decoded-string" as="xs:string?">
<xsl:try
select="
if ($this-entry-content-is-itself-an-archive) then
()
else
bin:decode-string($this-data, $this-encoding)">
<xsl:catch/>
</xsl:try>
</xsl:variable>
<!-- So if the content isn't an archive, and the xs:base64Binary can be safely changed to an xs:string, then we can (I hope) assume that if marked as compressed the text should be decoded. -->
<xsl:variable name="this-entry-content-is-encoded-text" as="xs:boolean"
select="$this-entry-content-is-marked-as-compressed and not($this-entry-content-is-itself-an-archive) and exists($this-data-as-decoded-string)"
/>
<!-- This is a complicated function, and developers may want to see what is happening under the hood. If you
want to see how the variables above work, make the next variable true. If you want feedback on only select
maps, define the condition for $local-diagnostics-on, e.g., select="$this-key = 'word/document.xml'" -->
<xsl:variable name="local-diagnostics-on" select="false()"/>
<xsl:map-entry key="$this-key">
<xsl:map>
<xsl:if test="$local-diagnostics-on">
<xsl:map-entry key="'diagnostics'">
<diagnostics>
<diag-key><xsl:copy-of select="$this-key"/></diag-key>
<diag-entry><xsl:copy-of select="tan:map-to-xml($this-entry)"/></diag-entry>
<diag-data><xsl:copy-of select="$this-data"/></diag-data>
<diag-is-plain-text><xsl:value-of select="$this-entry-content-is-plain-text"/></diag-is-plain-text>
<diag-encoding><xsl:value-of select="$this-encoding"/></diag-encoding>
<diag-is-marked-as-compressed><xsl:value-of select="$this-entry-content-is-marked-as-compressed"/></diag-is-marked-as-compressed>
<diag-data-entries><xsl:copy-of select="tan:map-to-xml($these-data-entries)"/></diag-data-entries>
<diag-is-itself-an-archive><xsl:value-of select="$this-entry-content-is-itself-an-archive"/></diag-is-itself-an-archive>
<diag-data-as-string><xsl:value-of select="$this-data-as-decoded-string"/></diag-data-as-string>
<diag-is-encoded-text><xsl:value-of select="$this-entry-content-is-encoded-text"/></diag-is-encoded-text>
</diagnostics>
</xsl:map-entry>
</xsl:if>
<xsl:sequence select="$this-entry"/>
<!-- We add a new map entry, content type, to more reliably convert the map to xml -->
<xsl:map-entry key="'content-type'">
<xsl:choose>
<xsl:when test="$this-entry-content-is-plain-text or $this-entry-content-is-encoded-text">
<xsl:value-of select="'text'"/>
</xsl:when>
<xsl:when test="not($this-entry-content-is-itself-an-archive)">
<xsl:value-of select="'binary'"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="'archive'"/>
</xsl:otherwise>
</xsl:choose>
</xsl:map-entry>
<xsl:map-entry key="'content'">
<xsl:choose>
<xsl:when test="$this-entry-content-is-plain-text">
<xsl:sequence
select="bin:decode-string($this-data, $this-encoding)"/>
</xsl:when>
<xsl:when test="$this-entry-content-is-encoded-text">
<xsl:sequence select="$this-data-as-decoded-string"/>
</xsl:when>
<xsl:when test="not($this-entry-content-is-itself-an-archive)">
<xsl:sequence select="$this-data"/>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="tan:extract-map($this-data, $these-data-entries)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:map-entry>
</xsl:map>
</xsl:map-entry>
</xsl:for-each>
</xsl:map>
</xsl:function>
<xsl:function name="tan:get-uri-from-item" as="xs:string?">
<!-- Input: an item that has string content pointing to a resolved URI -->
<!-- Output: the resolved URI -->
<!-- A user may want to be flexible in what gets sent to a function, e.g., an element, <a href=""/>, or
a string, or the value of an element. -->
<xsl:param name="node-with-resolved-uri" as="item()?"/>
<xsl:sequence
select="
if (($node-with-resolved-uri instance of node()) and exists($node-with-resolved-uri/@href)) then
string($node-with-resolved-uri/@href)
else
string($node-with-resolved-uri)"
/>
</xsl:function>
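<!-- A hedged sketch of the three kinds of input tan:get-uri-from-item() is meant to accept. The URI is a
hypothetical placeholder; each expression should return the string 'file:/c:/docs/report.docx':
    tan:get-uri-from-item('file:/c:/docs/report.docx')
    tan:get-uri-from-item($link), where $link is the element <a href="file:/c:/docs/report.docx"/>
    tan:get-uri-from-item($loc), where $loc is the element <location>file:/c:/docs/report.docx</location>
An empty sequence returns a zero-length string, which the functions below treat as "no file". -->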
<!-- CORE FUNCTIONS -->
<!-- CHECKING DOCX/XLSX/ARCHIVES -->
<xsl:function name="tan:xlsx-file-available" as="xs:boolean">
<!-- Alias for tan:archive-available(), below -->
<xsl:param name="element-with-attr-href-or-string-with-resolved-uri" as="item()?"/>
<xsl:copy-of select="tan:archive-available($element-with-attr-href-or-string-with-resolved-uri)"/>
</xsl:function>
<xsl:function name="tan:docx-file-available" as="xs:boolean">
<!-- Alias for the function below -->
<xsl:param name="element-with-attr-href-or-string-with-resolved-uri" as="item()?"/>
<xsl:copy-of select="tan:archive-available($element-with-attr-href-or-string-with-resolved-uri)"/>
</xsl:function>
<xsl:function name="tan:archive-available" as="xs:boolean"
use-when="$advanced-functions-available">
<!-- Input: any element with an @href, or a string value (or something castable to a string) -->
<!-- Output: a boolean indicating whether the document is available -->
<!-- The input url must be resolved. -->
<xsl:param name="element-with-attr-href-or-string-with-resolved-uri" as="item()?"/>
<xsl:variable name="source-uri" as="xs:string"
select="tan:get-uri-from-item($element-with-attr-href-or-string-with-resolved-uri)"/>
<xsl:variable name="file-exists" select="file:exists($source-uri)"/>
<xsl:choose>
<xsl:when test="string-length($source-uri) lt 1 or not($file-exists)">
<xsl:sequence select="false()"/>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="file-as-base64Binary" as="xs:base64Binary?"
select="file:read-binary($source-uri)"/>
<xsl:variable name="archive-options-map"
select="arch:options-map($file-as-base64Binary)" as="map(xs:string,item()?)?"/>
<xsl:sequence select="exists($archive-options-map)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
<xsl:function name="tan:archive-available" as="xs:boolean" use-when="not($advanced-functions-available)">
<!-- Input: any element with an @href, or a string value (or something castable to a string) -->
<!-- Output: a boolean indicating whether the document is available -->
<!-- Note, this version of the function, i.e., the one without advanced functions, cannot fetch a uri collection from
an archive, so the algorithm has to be told what particular component to find. Because it was written for docx
and xlsx files, this function looks only for the signature _rels/.rels. -->
<!-- The input url must be resolved. -->
<xsl:param name="element-with-attr-href-or-string-with-resolved-uri" as="item()?"/>
<xsl:variable name="source-uri" as="xs:string"
select="tan:get-uri-from-item($element-with-attr-href-or-string-with-resolved-uri)"/>
<xsl:variable name="source-archive-uri" select="concat('zip:', $source-uri, '!/')"/>
<xsl:variable name="source-root" select="concat($source-archive-uri, '_rels/.rels')"/>
<xsl:copy-of select="doc-available($source-root)"/>
</xsl:function>
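<!-- A usage sketch, assuming a hypothetical stylesheet parameter $input-uri that holds a resolved URI.
Testing availability first avoids passing a missing or invalid file to tan:open-docx():
    <xsl:variable name="docx-parts" as="document-node()*"
        select="if (tan:docx-file-available($input-uri)) then tan:open-docx($input-uri) else ()"/>
Without the advanced functions, the test is made against the URI zip:[input-uri]!/_rels/.rels, so it succeeds
only for archives that carry that component, such as docx and xlsx files. -->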
<!-- OPENING DOCX/XLSX/ARCHIVES -->
<xsl:function name="tan:open-raw-archive" use-when="$advanced-functions-available" as="xs:base64Binary?">
<!-- Input: an item pointing to a URI -->
<!-- Output: the contents of the target archive as base 64 binary-->
<!-- This function is basically a padded alternative to file:read-binary() -->
<xsl:param name="element-with-attr-href-or-string-with-resolved-uri" as="item()?"/>
<xsl:variable name="source-uri" as="xs:string?"
select="tan:get-uri-from-item($element-with-attr-href-or-string-with-resolved-uri)"/>
<xsl:choose>
<xsl:when test="(string-length($source-uri) lt 1) or not(file:exists($source-uri))"/>
<xsl:otherwise>
<xsl:variable name="archive-as-base64Binary" as="xs:base64Binary"
select="file:read-binary($source-uri)"/>
<xsl:copy-of select="$archive-as-base64Binary"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
<xsl:function name="tan:entries-map" use-when="$advanced-functions-available"
as="map(xs:string,map(xs:string,item()*))?">
<!-- Input: an item pointing to a resolved uri -->
<!-- Output: a map of entries for the archive (if it exists). -->
<!-- This function is essentially an expedited way to get from a url to the results of arch:entries-map() -->
<xsl:param name="element-with-attr-href-or-string-with-resolved-uri" as="item()?"/>
<xsl:variable name="source-uri" as="xs:string"
select="tan:get-uri-from-item($element-with-attr-href-or-string-with-resolved-uri)"/>
<xsl:choose>
<xsl:when test="string-length($source-uri) lt 1 or not(file:exists($source-uri))"/>
<xsl:otherwise>
<xsl:variable name="this-archive" as="xs:base64Binary?"
select="tan:open-raw-archive($source-uri)"/>
<xsl:variable name="this-archive-entries-map" as="map(xs:string,map(xs:string,item()*))"
select="arch:entries-map($this-archive)"/>
<xsl:sequence select="$this-archive-entries-map"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
<xsl:function name="tan:archive-map-to-xml" as="document-node()*"
use-when="$advanced-functions-available">
<!-- Input: an archive map, created by tan:extract-map(); a uri specifying the archive's base uri -->
<!-- Output: each entry's content converted to a document, with the base uri and the component's local path bound
in the root element to @xml:base and @_archive-path respectively; the process is applied recursively to
contents that are themselves maps. -->
<xsl:param name="archive-map" as="map(xs:string,map(xs:string,item()?))"/>
<xsl:param name="archive-base-uri" as="xs:string"/>
<xsl:param name="local-archive-directory" as="xs:string?"/>
<xsl:variable name="these-archive-keys" select="map:keys($archive-map)"/>
<xsl:for-each select="$these-archive-keys">
<xsl:variable name="this-component-path" select="."/>
<xsl:variable name="this-archive-path-attr" as="attribute()">
<xsl:attribute name="_archive-path" select="$local-archive-directory || $this-component-path"/>
</xsl:variable>
<xsl:variable name="this-xml-base-attr" as="attribute()">
<xsl:attribute name="xml:base" select="$archive-base-uri"/>
</xsl:variable>
<xsl:variable name="this-component-map" select="$archive-map(.)"
as="map(xs:string,item()?)"/>
<xsl:variable name="this-component-content" select="$this-component-map('content')"
as="item()?"/>
<xsl:variable name="this-component-content-type" select="$this-component-map('content-type')"
as="item()?"/>
<xsl:variable name="this-component-content-is-map"
select="$this-component-content instance of map(xs:string,map(xs:string,item()?))"/>
<xsl:choose>
<xsl:when test="$this-component-content-type = 'binary'">
<xsl:document>
<_base64Binary>
<xsl:copy-of select="$this-archive-path-attr, $this-xml-base-attr"/>
<xsl:value-of select="$this-component-content"/>
</_base64Binary>
</xsl:document>
</xsl:when>
<xsl:when test="$this-component-content-type = 'text'">
<xsl:variable name="this-content-parsed" as="document-node()?">
<xsl:choose>
<xsl:when test="(string-length($this-component-content) lt 1) and (ends-with($this-component-path, '/'))">
<xsl:document>
<_directory/>
</xsl:document>
</xsl:when>
<xsl:when test="string-length($this-component-content) lt 1">
<xsl:document>
<_text/>
</xsl:document>
</xsl:when>
<xsl:otherwise>
<xsl:try select="parse-xml($this-component-content)">
<xsl:catch>
<xsl:document>
<_text>
<xsl:value-of select="$this-component-content"/>
</_text>
</xsl:document>
</xsl:catch>
</xsl:try>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:for-each select="$this-content-parsed">
<xsl:document>
<xsl:for-each select="node()">
<xsl:choose>
<xsl:when test=". instance of element()">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:copy-of select="$this-archive-path-attr, $this-xml-base-attr"/>
<xsl:copy-of select="node()"/>
</xsl:copy>
</xsl:when>
<xsl:otherwise>
<xsl:copy-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:document>
</xsl:for-each>
</xsl:when>
<xsl:when test="not($this-component-content-type = 'archive') and not($this-component-content-is-map)">
<xsl:message select="'The component at ' || $this-component-path || ' is not one of the predefined content types (text, binary, archive), and it is not a map. Please fix.'"/>
</xsl:when>
<xsl:otherwise>
<xsl:copy-of select="tan:archive-map-to-xml($this-component-content, $archive-base-uri, $local-archive-directory || $this-component-path || '/' )"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:function>
<xsl:function name="tan:open-docx" as="document-node()*">
<!-- alias for tan:open-archive() -->
<xsl:param name="element-with-attr-href-or-string-with-resolved-uri" as="item()?"/>
<xsl:sequence select="tan:open-archive($element-with-attr-href-or-string-with-resolved-uri)"
/>
</xsl:function>
<xsl:function name="tan:open-xlsx" as="document-node()*">
<!-- alias for tan:open-archive() -->
<xsl:param name="element-with-attr-href-or-string-with-resolved-uri" as="item()?"/>
<xsl:sequence select="tan:open-archive($element-with-attr-href-or-string-with-resolved-uri)"
/>
</xsl:function>
<xsl:function name="tan:open-archive" as="document-node()*"
use-when="$advanced-functions-available">
<!-- Input: any element with an @href, or a string value (or something castable to a string) -->
<!-- Output: the components of the target docx or xlsx file as a sequence of XML documents (the main .rels file first, then the document .rels, then the source content types, then every file ending in .xml). To facilitate the reconstruction of the Word file, every extracted document will be stamped with @_archive-path, with the local path and name of the component. -->
<xsl:param name="element-with-attr-href-or-string-with-resolved-uri" as="item()?"/>
<xsl:variable name="source-uri" as="xs:string"
select="tan:get-uri-from-item($element-with-attr-href-or-string-with-resolved-uri)"/>
<xsl:variable name="source-archive-uri" select="'zip:' || $source-uri || '!/'"/>
<xsl:variable name="this-archive" as="xs:base64Binary?"
select="tan:open-raw-archive($source-uri)"/>
<xsl:variable name="this-archive-entries-map" as="map(xs:string,map(xs:string,item()*))?"
select="
if (exists($this-archive)) then
arch:entries-map($this-archive)
else
()"
/>
<xsl:variable name="this-archive-map-with-content"
as="map(xs:string,map(xs:string,item()?))?"
select="
if (exists($this-archive)) then
tan:extract-map($this-archive, $this-archive-entries-map)
else
()"
/>
<xsl:variable name="these-archive-keys" select="map:keys($this-archive-map-with-content)"/>
<xsl:choose>
<xsl:when test="string-length($source-uri) lt 1 or not(file:exists($source-uri))"/>
<xsl:when test="not(exists($these-archive-keys))">
<xsl:message select="$source-uri || ' appears not to be a valid archive.'"/>
</xsl:when>
<xsl:otherwise>
<xsl:copy-of select="tan:archive-map-to-xml($this-archive-map-with-content, $source-archive-uri, ())"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
<xsl:function name="tan:open-archive" as="document-node()*" use-when="not($advanced-functions-available)">
<!-- Input: any element with an @href, or a string value (or something castable to a string) -->
<!-- Output: the components of the target docx or xlsx file as a sequence of XML documents (the main .rels file first, then the document .rels, then the source content types, then every file ending in .xml). To facilitate the reconstruction of the Word file, every extracted document will be stamped with @_archive-path, with the local path and name of the component. -->
<xsl:param name="element-with-attr-href-or-string-with-resolved-uri" as="item()?"/>
<xsl:variable name="source-uri" as="xs:string"
select="tan:get-uri-from-item($element-with-attr-href-or-string-with-resolved-uri)"/>
<xsl:variable name="source-archive-uri" select="'zip:' || $source-uri || '!/'"/>
<xsl:variable name="archive-content-types" as="document-node()?"
select="tan:extract-archive-component($source-archive-uri, '[Content_Types].xml')"/>
<xsl:variable name="archive-main-rels" as="document-node()?"
select="tan:extract-archive-component($source-archive-uri, '_rels/.rels')"/>
<xsl:variable name="first-target-attributes" select="$archive-main-rels//@Target"/>
<xsl:choose>
<xsl:when test="not(exists($archive-content-types)) or not(exists($archive-main-rels))">
<xsl:message select="'Critical components for ' || $source-archive-uri || ' are missing. The file is not a supported file type (docx, xlsx), or it is irretrievably corrupt.'"/>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="$archive-content-types, $archive-main-rels"/>
<xsl:copy-of select="tan:open-archive-loop($source-archive-uri, $first-target-attributes)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
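<!-- A sketch of how the output of tan:open-docx() might be consumed; $input-uri is a hypothetical parameter
holding a resolved URI. Because every extracted document is stamped with @_archive-path on its root element,
a single component can be picked out by path:
    <xsl:variable name="docx-parts" select="tan:open-docx($input-uri)"/>
    <xsl:variable name="main-document" select="$docx-parts[*/@_archive-path = 'word/document.xml']"/>
    <xsl:variable name="paragraphs" select="$main-document//w:p"/>
The path word/document.xml is where Word normally stores the main text; files from other producers may differ. -->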
<xsl:function name="tan:open-archive-loop" as="document-node()*" use-when="not($advanced-functions-available)">
<!-- Input: a base uri for an archive; a sequence of strings of uris relative to the base uri -->
<!-- Output: each component found converted to a document; for each component found, a search will be made for _rels/[NAME].rels
and the function will be run recursively against any @Target that is found-->
<xsl:param name="archive-base-uri" as="xs:string"/>
<xsl:param name="component-relative-uris" as="xs:string*"/>
<xsl:for-each select="$component-relative-uris">
<xsl:variable name="this-component-relative-uri" select="."/>
<xsl:variable name="this-component" as="document-node()?"
select="tan:extract-archive-component($archive-base-uri, $this-component-relative-uri)"/>
<xsl:variable name="this-local-archive-subdirectory" select="replace($this-component-relative-uri, '[^/]+$', '')"/>
<xsl:variable name="this-rels-relative-uri" select="replace($this-component-relative-uri, '([^/]+)$', '_rels/$1.rels')"/>
<xsl:variable name="this-rels-component" select="tan:extract-archive-component($archive-base-uri, $this-rels-relative-uri)"/>
<xsl:variable name="these-rels-target-attributes" select="$this-rels-component//@Target"/>
<xsl:if test="not(exists($this-component))">
<xsl:message
select="'Target component ' || $this-component-relative-uri || ' is either a binary file (in which case it will be skipped, because advanced functions are not available) or is missing.'"/>
</xsl:if>
<xsl:sequence select="$this-component, $this-rels-component"/>
<xsl:choose>
<xsl:when test="not(exists($these-rels-target-attributes))"/>
<xsl:otherwise>
<xsl:variable name="fake-url" select="'http://fake.url/'"/>
<xsl:variable name="this-context-directory-falsely-resolved"
select="$fake-url || $this-local-archive-subdirectory"
/>
<xsl:variable name="these-hrefs-resolved"
select="
for $i in $these-rels-target-attributes
return
resolve-uri($i, $this-context-directory-falsely-resolved)"
/>
<xsl:variable name="new-relative-uris"
select="
for $i in $these-hrefs-resolved
return
replace($i, $fake-url, '', 'q')"
/>
<xsl:copy-of select="tan:open-archive-loop($archive-base-uri, $new-relative-uris)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:function>
<xsl:function name="tan:extract-archive-component" as="document-node()?"
use-when="not($advanced-functions-available)">
<!-- Input: the base archive uri for a Word/Excel document; a path to a component part of a Word document -->
<!-- Output: the XML document itself, but with @_archive-path stamped into the root element -->
<xsl:param name="source-archive-uri" as="xs:string"/>
<xsl:param name="component-path" as="xs:string"/>
<xsl:variable name="this-path" select="$source-archive-uri || $component-path"/>
<xsl:choose>
<xsl:when test="doc-available($this-path)">
<xsl:variable name="this-doc" select="doc($this-path)"/>
<xsl:document>
<xsl:copy-of select="$this-doc/(node() except *)"/>
<xsl:for-each select="$this-doc/*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:attribute name="_archive-path" select="$component-path"/>
<xsl:attribute name="xml:base" select="$source-archive-uri"/>
<xsl:copy-of select="node()" copy-namespaces="no"/>
</xsl:copy>
</xsl:for-each>
</xsl:document>
</xsl:when>
<xsl:when test="unparsed-text-available($this-path)">
<xsl:document>
<unparsed-text>
<xsl:attribute name="xml:base" select="$this-path"/>
<xsl:value-of select="unparsed-text($this-path)"/>
</unparsed-text>
</xsl:document>
</xsl:when>
</xsl:choose>
</xsl:function>
<!-- SAVING DOCX/XLSX DOCUMENT FILES -->
<!-- The initial approach to saving was based upon xsl:result-document, which for security
reasons was not allowed to be a component of a function, only of a template. So the original
version of open-and-save-docx.xsl did not have a tan:save-docx(). This version does, but
only for the Saxon PE/EE versions, and as a pass-through to the named template. -->
<xsl:function name="tan:save-docx" use-when="$advanced-functions-available">
<!-- Alias for tan:save-archive(), below -->
<xsl:param name="archive-components" as="document-node()*"/>
<xsl:param name="resolved-uri" as="xs:string"/>
<xsl:sequence select="tan:save-archive($archive-components, $resolved-uri)"/>
</xsl:function>
<xsl:function name="tan:save-xlsx" use-when="$advanced-functions-available">
<!-- Alias for the function below -->
<xsl:param name="archive-components" as="document-node()*"/>
<xsl:param name="resolved-uri" as="xs:string"/>
<xsl:sequence select="tan:save-archive($archive-components, $resolved-uri)"/>
</xsl:function>
<xsl:function name="tan:save-archive" use-when="$advanced-functions-available">
<!-- Alias for the named template, below -->
<xsl:param name="archive-components" as="document-node()*"/>
<xsl:param name="resolved-uri" as="xs:string"/>
<xsl:call-template name="tan:save-archive">
<xsl:with-param name="archive-components" select="$archive-components"/>
<xsl:with-param name="resolved-uri" select="$resolved-uri"/>
</xsl:call-template>
</xsl:function>
<xsl:template name="tan:save-docx">
<xsl:param name="docx-components" as="document-node()*"/>
<xsl:param name="resolved-uri" as="xs:string"/>
<xsl:call-template name="tan:save-archive">
<xsl:with-param name="archive-components" select="$docx-components"/>
<xsl:with-param name="resolved-uri" select="$resolved-uri"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="tan:save-xlsx">
<xsl:param name="xlsx-components" as="document-node()*"/>
<xsl:param name="resolved-uri" as="xs:string"/>
<xsl:call-template name="tan:save-archive">
<xsl:with-param name="archive-components" select="$xlsx-components"/>
<xsl:with-param name="resolved-uri" select="$resolved-uri"/>
</xsl:call-template>
</xsl:template>
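<!-- A round-trip sketch under stated assumptions: $input-uri and $output-uri are hypothetical parameters holding
resolved URIs, and the mode name revise-docx is invented for illustration.
    <xsl:template name="xsl:initial-template">
        <xsl:variable name="docx-parts" select="tan:open-docx($input-uri)"/>
        <xsl:variable name="docx-parts-revised" as="document-node()*">
            <xsl:apply-templates select="$docx-parts" mode="revise-docx"/>
        </xsl:variable>
        <xsl:call-template name="tan:save-docx">
            <xsl:with-param name="docx-components" select="$docx-parts-revised"/>
            <xsl:with-param name="resolved-uri" select="$output-uri"/>
        </xsl:call-template>
    </xsl:template>
The revise-docx mode would need to return document nodes that keep @_archive-path on each root element, since
saving relies on that attribute to place each component in the archive. -->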
<xsl:template name="tan:save-archive" use-when="$advanced-functions-available">
<!-- Input: a sequence of documents that each have @_archive-path stamped in the root element (e.g., the result of tan:open-docx());
a resolved uri for the new archive (e.g., a .docx or .xlsx file) -->
<!-- Output: an archive saved at the URL specified by the second parameter -->
<xsl:param name="archive-components" as="document-node()*"/>
<xsl:param name="resolved-uri" as="xs:string"/>
<!-- Remove any prefix/suffix that has been set up for tan:save-archive without advanced functions -->
<xsl:variable name="uri-prepped-for-direct-archiving"
select="replace($resolved-uri, '^zip:|[/!]+$', '')"/>
<xsl:variable name="this-target-subdirectory"
select="replace($uri-prepped-for-direct-archiving, '/[^/]+$', '/')"/>
<xsl:variable name="number-of-components" select="count($archive-components)"/>
<xsl:variable name="constructed-archive" as="xs:base64Binary?">
<xsl:iterate select="$archive-components">
<xsl:param name="archive-so-far" as="xs:base64Binary" select="xs:base64Binary('')"/>
<xsl:variable name="this-archive-path" select="*/@_archive-path"/>
<xsl:variable name="this-is-text" select="exists(_text)"/>
<!-- We don't need to do anything if it's merely a subdirectory, but this variable is left as a reminder
that that's a possibility -->
<xsl:variable name="this-is-a-directory" select="exists(_directory)"/>
<xsl:variable name="this-is-binary" select="exists(_base64Binary)"/>
<xsl:variable name="new-content" as="xs:base64Binary?">
<xsl:choose>
<xsl:when test="$this-is-text">
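<!-- xsl:sequence returns the xs:base64Binary value directly, without a round trip through a text node -->
<xsl:sequence select="bin:encode-string(xs:string(.))"/>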
</xsl:when>
<xsl:when test="$this-is-binary">
<xsl:sequence select="xs:base64Binary(xs:string(.))"/>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="this-doc-cleaned-up" as="document-node()">
<xsl:document>
<xsl:apply-templates select="." mode="clean-up-archive"/>
</xsl:document>
</xsl:variable>
<xsl:variable name="this-doc-serialized" select="serialize($this-doc-cleaned-up)"/>
<xsl:sequence select="bin:encode-string($this-doc-serialized)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:if test="$comment-on-saved-archives">
<xsl:message select="'Attempting to add ' || $this-archive-path || ' to archive ' || $uri-prepped-for-direct-archiving"/>
</xsl:if>
<xsl:variable name="next-archive" select="arch:update($archive-so-far, $this-archive-path, $new-content)"/>
<xsl:if test="position() = $number-of-components">
<xsl:sequence select="$next-archive"/>
</xsl:if>
<xsl:next-iteration>
<xsl:with-param name="archive-so-far" select="$next-archive"/>
</xsl:next-iteration>
</xsl:iterate>
</xsl:variable>
<xsl:if test="not(file:exists($this-target-subdirectory))">
<xsl:message select="'Creating directory at ' || $this-target-subdirectory"/>
<xsl:sequence select="file:create-dir($this-target-subdirectory)"/>
</xsl:if>
<xsl:sequence select="file:write-binary($uri-prepped-for-direct-archiving, $constructed-archive)"/>
</xsl:template>
<xsl:template name="tan:save-archive" use-when="not($advanced-functions-available)">
<!-- Input: a sequence of documents that each have @_archive-path stamped in the root element (e.g., the result of tan:open-docx());
a resolved uri for the new archive (e.g., a .docx or .xlsx file) -->
<!-- Output: an archive saved at the URL specified by the second parameter -->
<!-- Ordinarily, this template would be a function, but <xsl:result-document> raises a dynamic error whenever it is evaluated inside a function body (temporary output state), so a named template is used instead. -->
<!-- In this template, the target subdirectory for the archive must already exist, or else you might get an error. -->
<xsl:param name="archive-components" as="document-node()*"/>
<xsl:param name="resolved-uri" as="xs:string"/>
<!-- We prep the URI to have zip: at the front and the !/ at the end (elements that might already be present) -->
<xsl:variable name="this-prefix"
select="
if (matches($resolved-uri, '^zip:')) then
()
else
'zip:'"
/>
<xsl:variable name="uri-prepped-for-zipping"
select="$this-prefix || replace($resolved-uri, '(.+?)[!/]*$', '$1!/')"/>
<xsl:variable name="binary-archive-components" select="$archive-components[_base64Binary]"/>
<xsl:for-each select="$archive-components/*[@_archive-path][not(self::_base64Binary)]">
<xsl:if test="$comment-on-saved-archives">
<xsl:message select="'Adding component ' || @_archive-path || ' to ' || $uri-prepped-for-zipping"/>
</xsl:if>
<xsl:result-document href="{$uri-prepped-for-zipping || @_archive-path}"
format="open-and-save-docx">
<xsl:document>
<xsl:apply-templates select="." mode="clean-up-archive"/>
</xsl:document>
</xsl:result-document>
</xsl:for-each>
<xsl:for-each select="$binary-archive-components">
<xsl:message select="*/@_archive-path || ' is being omitted from output, because advanced functions for saving binary files are not available.'"/>
</xsl:for-each>
</xsl:template>
<!-- DEFAULT BEHAVIOR FOR RESERVED TEMPLATE MODES -->
<!-- Important: if you import this stylesheet, be sure to include the following template in the importing
stylesheet, uncommented (whether @priority is needed, and what value it takes, depends upon what is
happening in the importing XSLT): -->
<!--<xsl:template match="document-node() | node() | @*" priority="2"
mode="clean-up-archive map-to-xml">
<xsl:apply-imports/>
</xsl:template>-->
<xsl:template match="document-node()" mode="clean-up-archive map-to-xml">
<xsl:document>
<xsl:apply-templates mode="#current"/>
</xsl:document>
</xsl:template>
<xsl:template match="node() | @*" mode="clean-up-archive map-to-xml">
<xsl:copy>
<xsl:apply-templates select="node() | @*" mode="#current"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@_archive-path | @xml:base" mode="clean-up-archive"/>
</xsl:stylesheet>