Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempting to load empty / invalid files throws PHP notices/warnings #1718

Closed
dregad opened this issue Nov 17, 2020 · 10 comments
Closed

Attempting to load empty / invalid files throws PHP notices/warnings #1718

dregad opened this issue Nov 17, 2020 · 10 comments

Comments

@dregad
Copy link

dregad commented Nov 17, 2020

This is:

- [x] a bug report
- [ ] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

Reader should check return value of failing functions (e.g. ZipArchive::open, simplexml_load_string) and throw some meaningful exception, instead of continuing.

What is the current behavior?

Example with empty xlsx file:

PHP Notice:  Trying to get property 'Relationship' of non-object in /tmp/vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Xlsx.php on line 342
PHP Warning:  Invalid argument supplied for foreach() in /tmp/vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Xlsx.php on line 342
... etc ...

What are the steps to reproduce?

Please provide a Minimal, Complete, and Verifiable example of code that exhibits the issue without relying on an external Excel file or a web server:

<?php
include 'vendor/autoload.php';
$xlReader = new PhpOffice\PhpSpreadsheet\Reader\Xlsx();
$xlReader->load('test.xlsx');

Generate an empty or invalid Excel file, e.g. with touch test.xlsx or echo 'hello world' >test.xlsx and run the above script.

Which versions of PhpSpreadsheet and PHP are affected?

Tested on PHP 7.4.3 with PhpSpreadsheet 1.15.0.

@jonnylink
Copy link
Contributor

jonnylink commented Nov 23, 2020

Also bumping into this issue—except at a slightly different angle. For whatever reason the xlsx file we've been handed has a slightly different file structure, where the _rels dir is at the base level instead of the in the xl dir where it should(?) be. Don't know enough about the spec yet, but this seems not to bother other xlsx readers. PHPSpreadsheet dies on it for the same reason it dies on an empty xlsx file—trying to do a foreach on null.

vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Xlsx.php

        $wbRels = simplexml_load_string(
            $this->securityScanner->scan($this->getFromZipArchive($zip, "xl/_rels/${workbookBasename}.rels")),
            'SimpleXMLElement',
            Settings::getLibXmlLoaderOptions()
        );
        foreach ($wbRels->Relationship as $rel) {...};

The return of getFromZipArchive can be empty. Haven't spent enough time in the codebase to see the proper way to handle this, but for sure it would be a check before the foreach

@php-force
Copy link

Have the same issue while reading just a regular xlsx file created with office 365 that has data in it, has anyone come up with the fix yet ? majority of the urls doesn't seem to be working anymore: Xlsx reader breaks with Relationship not a property error. Majority of urls used in Xlsx reader seem to be incorrect, for example instead http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument should be http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument...

@dregad
Copy link
Author

dregad commented Apr 26, 2021

My original problem (which was caused by attempting to load a file which was actually not an Excel document) can easily be fixed by the following patch:

diff --git a/src/PhpSpreadsheet/Reader/Xlsx.php b/src/PhpSpreadsheet/Reader/Xlsx.php
index f07ac008..85af39da 100644
--- a/src/PhpSpreadsheet/Reader/Xlsx.php
+++ b/src/PhpSpreadsheet/Reader/Xlsx.php
@@ -328,8 +328,12 @@ class Xlsx extends BaseReader
         $unparsedLoadedData = [];
 
         $zip = new ZipArchive();
-        $zip->open($pFilename);
+        $result = $zip->open($pFilename);
+        if ($result !== true) {
+            // This does NOT catch the empty archive case - ZipArchive::open()
+            // returns true (with Deprecated notice on PHP >= 8.0)
+            throw new Exception('File "' . $pFilename . '" could not be opened (code ' . $result . ').');
+        }

         //    Read the theme first, because we need the colour scheme when reading the styles
         //~ http://schemas.openxmlformats.org/package/2006/relationships"

This does not handle the empty file case though, and I'm not sure about the right way to fix this.

PHP documentation states that libzip >= 1.6.0 treats empty archive as invalid, but even though have libzip 1.7.3, ZipArchive::open('emptyfile') returns true (PHP >= 8.0 does throw a deprecated notice) so we can't rely on that.

Should load() method throw an exception given an empty file, or return a valid, empty spreadsheet object ?

I have no idea how files missing (some of) the _rels directories should be handled. Maybe someone with knowledge of the spec can provide some hints ?

@stale
Copy link

stale bot commented Jun 26, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If this is still an issue for you, please try to help by debugging it further and sharing your results.
Thank you for your contributions.

@stale stale bot added the stale label Jun 26, 2021
@jonnylink
Copy link
Contributor

I'd forgotten about this. We had to adopt a much more annoying tool to read files. Would be nice to see this fixed.

@stale stale bot removed the stale label Jun 27, 2021
@oleibman
Copy link
Collaborator

We have just committed a change to master which is the first phase of a project to improve namespace handling. I doubt that it will fix the empty file or not-a-zip problems; I will certainly add those requirements to the next phase. I think it should handle, at least in part, the different namespace problem reported by @php-force on April 26; please test if possible. It may or may not handle the _rels-in-the-wrong-place problem reported by @jonnylink on November 23. If it doesn't, or if you can't test against master, is it possible to upload a copy of that file so that I can analyze what is needed for it?

@jonnylink
Copy link
Contributor

I'll look for one of the files and report back next week. To be honest, I'd forgotten why we couldn't use PHPOffice for certain files until the stalebot reminded me 😬

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue Jul 9, 2021
This started as a response to issue PHPOffice#1718, for which it is a partial (not complete) solution. The following changes are made:
- canRead can currently throw an exception. This seems wrong. It should just return true/false.
- Breaking change of sorts. When AssertFile encounters a non-existent or unreadable file, it throws InvalidArgumentException. This does not make sense. I have changed it to throw PhpSpreadsheet/Reader/Exception.
- Since the previous bullet item required changing of most of the Reader files anyhow, this is a good time to add explicit typing for canRead in the function signature rather than the DocBlock. Since all the canRead functions inherit from an abstract version in IReader, they all have to be changed simulatneously. Except for Xlsx and Ods, most of the Reader files are otherwise unchanged.
- AssertFile is changed to add an optional "zip member" parameter. It will check for the existence of an appropriate member in what is supposed to be a zip file. It is used by Xlsx and Ods.
- Verifying that a given file is a valid zip ought to be a feature of ZipArchive. Thanks to a particularly nasty bug in php/libzip (see https://bugs.php.net/bug.php?id=81222), it is unsafe to attempt to open a zero-length file as a zip archive. There is a solution, but it does not apply to all the PHP releases which we support, and isn't even necessarily supported on all the point versions of the PHP versions which we do support. I have coded up a manual test for "valid zip", with a comment pointing to the spec.
- In theory, tests now cover 100% of the code in Shared/File. In practice ... One of the tests require that chmod works properly, which is not quite true on Windows systems, so that test is skipped on Windows. Another test requires that php.ini uses a non-default value for upload_temp_dir (can't be overridden in application code), which is probably not the case when Github runs the unit tests, so that test is skipped when appropriate. I have run tests for both on systems where they are not skipped.
oleibman added a commit that referenced this issue Jul 25, 2021
* Tweaks to Input File Validation

This started as a response to issue #1718, for which it is a partial (not complete) solution. The following changes are made:
- canRead can currently throw an exception. This seems wrong. It should just return true/false.
- Breaking change of sorts. When AssertFile encounters a non-existent or unreadable file, it throws InvalidArgumentException. This does not make sense. I have changed it to throw PhpSpreadsheet/Reader/Exception.
- Since the previous bullet item required changing of most of the Reader files anyhow, this is a good time to add explicit typing for canRead in the function signature rather than the DocBlock. Since all the canRead functions inherit from an abstract version in IReader, they all have to be changed simulatneously. Except for Xlsx and Ods, most of the Reader files are otherwise unchanged.
- AssertFile is changed to add an optional "zip member" parameter. It will check for the existence of an appropriate member in what is supposed to be a zip file. It is used by Xlsx and Ods.
- Verifying that a given file is a valid zip ought to be a feature of ZipArchive. Thanks to a particularly nasty bug in php/libzip (see https://bugs.php.net/bug.php?id=81222), it is unsafe to attempt to open a zero-length file as a zip archive. There is a solution, but it does not apply to all the PHP releases which we support, and isn't even necessarily supported on all the point versions of the PHP versions which we do support. I have coded up a manual test for "valid zip", with a comment pointing to the spec.
- In theory, tests now cover 100% of the code in Shared/File. In practice ... One of the tests require that chmod works properly, which is not quite true on Windows systems, so that test is skipped on Windows. Another test requires that php.ini uses a non-default value for upload_temp_dir (can't be overridden in application code), which is probably not the case when Github runs the unit tests, so that test is skipped when appropriate. I have run tests for both on systems where they are not skipped.

* Update File.php

* Scrutinizer Timeout

It's not actually timing out, it's just waiting for something to finish that finished ages ago. Making a meaningless comment change in hopes that will clear the jam. Not particularly hopeful.
@dregad
Copy link
Author

dregad commented Sep 10, 2021

I doubt that it will fix the empty file or not-a-zip problems

@oleibman I can confirm that attempting to read an empty file now throws PhpOffice\PhpSpreadsheet\Reader\Exception: Could not find zip member zip://test.xlsx#_rels/.rels instead of continuing with PHP warnings as it did until 1.18.0 included (tested today, with master branch at bc9234e), which fixes my main issue.

The invalid file case throws the same Reader exception (instead of ValueError: Invalid or uninitialized Zip object as it did before), which is good enough for me.

Looking forward to seeing this in the next official release. Thanks !

@oleibman
Copy link
Collaborator

Thank you for reporting the problem and confirming the fix.

@PowerKiKi
Copy link
Member

Fixed in 1.19.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants