Description
I think this is the root cause behind #1242.
In short, TemplateProcessor with PclZip seems to produce malformed docx files: they open fine in LibreOffice, but Microsoft Word fails to open them, saying "The file cannot be opened because there are problems with the contents" and requiring a repair first. This seems to be caused by duplicate XML files in the archive.
Here are two files produced in exactly the same way, except that one was built with PclZip and the other with ZipArchive. Note the size difference.
zipinfo shows that the PclZip version has two instances of word/document.xml, unlike the ZipArchive version, which only has one.
Archive: env-PclZip-2019-04-18-124542.docx
Zip file size: 37967 bytes, number of entries: 14
-rw---- 4.5 fat 1483 b- defS 80-Jan-01 00:00 [Content_Types].xml
-rw---- 4.5 fat 735 b- defS 80-Jan-01 00:00 _rels/.rels
-rw---- 4.5 fat 979 b- defS 80-Jan-01 00:00 word/_rels/document.xml.rels
-rw---- 4.5 fat 16390 b- defS 80-Jan-01 00:00 word/document.xml
-rw---- 4.5 fat 6797 b- defS 80-Jan-01 00:00 word/theme/theme1.xml
-rw---- 4.5 fat 15556 b- stor 80-Jan-01 00:00 docProps/thumbnail.jpeg
-rw---- 4.5 fat 2803 b- defS 80-Jan-01 00:00 word/settings.xml
-rw---- 4.5 fat 7681 b- defS 80-Jan-01 00:00 word/printerSettings/printerSettings1.bin
-rw---- 4.5 fat 529 b- defS 80-Jan-01 00:00 word/webSettings.xml
-rw---- 4.5 fat 751 b- defS 80-Jan-01 00:00 docProps/core.xml
-rw---- 4.5 fat 29435 b- defS 80-Jan-01 00:00 word/styles.xml
-rw---- 4.5 fat 1525 b- defS 80-Jan-01 00:00 word/fontTable.xml
-rw---- 4.5 fat 966 b- defS 80-Jan-01 00:00 docProps/app.xml
-rw---- 2.0 fat 14230 b- defN 19-Apr-18 12:45 word/document.xml
14 files, 99860 bytes uncompressed, 34489 bytes compressed: 65.5%
Archive: env-ZipArchive-2019-04-18-124536.docx
Zip file size: 32713 bytes, number of entries: 13
-rw---- 4.5 fat 1483 b- defN 80-Jan-01 00:00 [Content_Types].xml
-rw---- 4.5 fat 735 b- defN 80-Jan-01 00:00 _rels/.rels
-rw---- 4.5 fat 979 b- defN 80-Jan-01 00:00 word/_rels/document.xml.rels
-rw---- 4.5 fat 6797 b- defN 80-Jan-01 00:00 word/theme/theme1.xml
-rw---- 4.5 fat 15556 b- stor 80-Jan-01 00:00 docProps/thumbnail.jpeg
-rw---- 4.5 fat 2803 b- defN 80-Jan-01 00:00 word/settings.xml
-rw---- 4.5 fat 7681 b- defN 80-Jan-01 00:00 word/printerSettings/printerSettings1.bin
-rw---- 4.5 fat 529 b- defN 80-Jan-01 00:00 word/webSettings.xml
-rw---- 4.5 fat 751 b- defN 80-Jan-01 00:00 docProps/core.xml
-rw---- 4.5 fat 29435 b- defN 80-Jan-01 00:00 word/styles.xml
-rw---- 4.5 fat 1525 b- defN 80-Jan-01 00:00 word/fontTable.xml
-rw---- 4.5 fat 966 b- defN 80-Jan-01 00:00 docProps/app.xml
-rw-rw-rw- 2.0 unx 14230 b- defN 19-Apr-18 12:45 word/document.xml
13 files, 83470 bytes uncompressed, 29345 bytes compressed: 64.8%
How to Reproduce
<?php
// Assumed Composer layout; adjust the autoloader path to your project.
require __DIR__ . '/../vendor/autoload.php';

use PhpOffice\PhpWord\Settings;
use PhpOffice\PhpWord\TemplateProcessor;

function storage_path($fileName) {
    return __DIR__ . '/../storage/' . $fileName;
}

function resource_path($fileName) {
    return __DIR__ . '/../resources/' . $fileName;
}

// Pass the template through TemplateProcessor unchanged, using the given zip backend.
function build($zipClass, $template, $outdir)
{
    assert($zipClass === Settings::ZIPARCHIVE || $zipClass === Settings::PCLZIP);
    Settings::setZipClass($zipClass);
    $builder = new TemplateProcessor($template);
    // rtrim avoids a double slash when $outdir already ends in '/'
    $path = rtrim($outdir, '/') . '/env-' . $zipClass . '-' . date('Y-m-d-His') . '.docx';
    $builder->saveAs($path);
    return $path;
}

echo build(Settings::PCLZIP, resource_path('views/docx/EnvelopeTemplate_narrow.docx'), storage_path('envelopes/')) . "\n";
Example file (from #1242): EnvelopeTemplate_narrow.docx
Swap out Settings::PCLZIP for Settings::ZIPARCHIVE on the last line as needed.
Details
To recap, PHPWord's TemplateProcessor uses a ZipArchive wrapper that delegates zip operations to either PclZip or PHP's native ZipArchive behind the scenes. PHPWord seems to use ZipArchive by default (as configured in Settings::getZipClass).
When TemplateProcessor does its replacements, it extracts various XML files from the zip file, modifies these XML files, and then adds them back to the archive. PclZip yields an archive with duplicate XML files. ZipArchive does not.
Duplicate XML files in the resulting docx file are incorrect per OOXML file format rule M3.3: "Package implementers shall create item names that are unique within a given archive". LibreOffice probably opens the file anyway because it is more lenient.
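For anyone who wants to check a generated docx without zipinfo, here is a minimal sketch of a uniqueness check in PHP. The function names findDuplicateEntries and listEntryNames are made up for this example and are not part of PHPWord. It assumes the zip extension is loaded and that it will open an archive containing duplicate names, which I have not verified on every libzip version.

```php
<?php

/**
 * Return each entry name that occurs more than once, mapped to its count.
 * (Hypothetical helper, not part of PHPWord.)
 *
 * @param string[] $names entry names in archive order
 * @return array<string,int>
 */
function findDuplicateEntries(array $names)
{
    // array_count_values() tallies occurrences; keep only names seen more than once.
    return array_filter(array_count_values($names), function ($count) {
        return $count > 1;
    });
}

/**
 * List entry names by index so that repeated names are not collapsed.
 * (Hypothetical helper; assumes ext-zip is available.)
 *
 * @return string[]
 */
function listEntryNames($path)
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path");
    }
    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if ($name !== false) {
            $names[] = $name;
        }
    }
    $zip->close();
    return $names;
}
```

Calling findDuplicateEntries(listEntryNames($path)) on the PclZip output above should report word/document.xml with a count of 2, and an empty array for the ZipArchive output.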
The key difference between PclZip and ZipArchive seems to be in the addFromString behaviour:
- When adding to a zip archive, ZipArchive overwrites existing files: "Note that this function overwrites existing files of the same name."
- However, PclZip does not: "If a file already exist in an archive it is added at the end of the archive, but not automatically replaced."
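To make the difference concrete, here is a small model of the two behaviours using plain PHP arrays instead of real archives. This is only an illustration of the semantics quoted above, not the actual code of either library; addAppending and addOverwriting are names invented for this sketch.

```php
<?php

/**
 * Append-only add, modelling the quoted PclZip behaviour:
 * an existing name is not replaced, so a second entry is created.
 */
function addAppending(array $entries, $name, $data)
{
    $entries[] = ['name' => $name, 'data' => $data];
    return $entries;
}

/**
 * Overwriting add, modelling the quoted ZipArchive behaviour:
 * any existing entry with the same name is dropped before the new one is added.
 */
function addOverwriting(array $entries, $name, $data)
{
    $kept = array_values(array_filter($entries, function ($entry) use ($name) {
        return $entry['name'] !== $name;
    }));
    $kept[] = ['name' => $name, 'data' => $data];
    return $kept;
}

// A two-entry stand-in for the template archive.
$template = [
    ['name' => '[Content_Types].xml', 'data' => '...'],
    ['name' => 'word/document.xml', 'data' => 'original'],
];

$viaPclZip = addAppending($template, 'word/document.xml', 'modified');
$viaZipArchive = addOverwriting($template, 'word/document.xml', 'modified');

echo count($viaPclZip), "\n";     // 3 entries: word/document.xml appears twice
echo count($viaZipArchive), "\n"; // 2 entries: word/document.xml appears once
```

In both cases the modified word/document.xml ends up last, which matches the zipinfo listings above. If the PclZip path is meant to behave like ZipArchive, the wrapper would presumably have to remove the old entry before adding the new one (PclZip has a delete() method that accepts PCLZIP_OPT_BY_NAME), but I have not tested that.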
Context
- PHP version: 7.2.15
- PHPWord version: 0.16