Zip files larger than 4GB fail on Windows #12

Open
tuxan11 opened this issue May 5, 2015 · 27 comments
@tuxan11

tuxan11 commented May 5, 2015

Hello,
Thanks for this great library. I was able to integrate it into a project that I am working on. The library creates downloadable zips perfectly when the entire zip file is under 4GB. However, once the zip file exceeds 4GB, the downloaded zip is no longer usable. Unzip utilities complain that it is a corrupted zip.
Some more details of the issue:
Format used: Zip
Number of files used: 10 (each file averaging about 500MB)
Total zip size: about 4.6 GB
Unzip utilities tried (and failed): Windows default, WinZip and ALZip

Can you please let me know if I am missing something, or if this is a known limitation of the library?

@johnmaguire
Contributor

Hi @tuxan11, can you please make sure that you're up-to-date with the master branch of this repository? In the past, there have been issues with files over 4GB, but they should be corrected at this point. If that doesn't help you, I'd need to see how you're using the library (code), and the resulting zip (you can email it to jmaguire@barracuda.com). Thanks!

@tuxan11
Author

tuxan11 commented May 5, 2015

Hello @johnmaguire,
Thank you for the quick response.
I tried with ArchiveStream-php-master.zip from a couple of days ago, and I still see the same issue.
Here is the code I am using.

$opt = array();
$opt['comment'] = 'Hello World';

// I am forcing a Zip download. We don't want tar.
$zip = new \ArchiveStream_Zip('download.zip');

// $fh is the file stream; $filePath is the path to store the file at.
$stat = fstat($fh);
$zip->init_file_stream_transfer($filePath, $stat['size'], $opt);
while (!feof($fh)) {
    $zip->stream_file_part(fread($fh, 1048576));
}
$zip->complete_file_stream();
$zip->finish();

BTW, I will upload the zip created with this code to a fileshare and send you the link.

@johnmaguire
Contributor

@tuxan11 Thanks for the ZIP, I'll be looking into it as soon as possible. Just a couple more questions: You're not seeing any PHP errors or warnings in your log? Do you have output buffering turned off? Try adding this before your Zipstream code:

while (ob_get_level() > 0) {
    ob_end_clean();
}

This will turn off output buffering if it's turned on. If you're using a framework like Kohana, Laravel, or CodeIgniter, it's quite likely that it starts buffering prior to your code getting executed. This can cause out of memory issues.

Thanks!

@tuxan11
Author

tuxan11 commented May 5, 2015

Hello John,
I don't see any errors or warnings while using the zip library.
I will add the code you suggested to turn off output buffering and will get back to you with the results. Thanks.

@johnmaguire
Contributor

@tuxan11 There's one more thing I'll ask you to try if that doesn't work for you... if you look at this issue you can see that @ianvonholt had some issues using the streaming capabilities of the library after we switched our code from using fgets to fread. I haven't been able to reproduce this, but in your example, you use fread (which theoretically should be correct). I would ask however that you try switching it to fgets and we can see if that fixes the issue.

If so, I can target a single bug instead of two. :) Thanks!

@tuxan11
Author

tuxan11 commented May 6, 2015

@johnmaguire I did try that solution yesterday while browsing through the issue list, but the result was the same.
I've started the same zip operation with the code change you suggested. Will let you know how it goes.

@tuxan11
Author

tuxan11 commented May 6, 2015

@johnmaguire I tried with output buffering explicitly turned off and still see the same issue. Please let me know if you want me to try anything different.

@johnmaguire
Contributor

@tuxan11 Thanks for giving it a shot. I should have asked sooner -- can I get the PHP version, distro version, and web server version? Also, does this issue occur with files full of just null characters (or just 0s for that matter?) If so, I can try to reproduce this on my side.

@tuxan11
Author

tuxan11 commented May 6, 2015

@johnmaguire Here is the info:
Distro: Windows 8,
XAMPP :1.8.3
Apache: 2.4.9
PHP Version 5.5.15

This problem occurs with any files. I just created a bunch of files using a Linux command and zipped them up.

@tuxan11
Author

tuxan11 commented May 8, 2015

@johnmaguire, were you able to find any issues with the zip file that I uploaded ?

@johnmaguire
Contributor

@tuxan11 It definitely doesn't look correct. :) I'm hoping to get a tool from a colleague of mine that analyzes ZIP files to see exactly what's off. I'm also creating a test to reproduce this:

-> % cat issue_12.php
<?php
// Github issue: https://github.com/barracudanetworks/ArchiveStream-php/issues/12
// ZIPs with a size exceeding 4 GB are corrupt

require_once './ArchiveStream-php/stream.php';
require_once './ArchiveStream-php/zipstream.php';

$files = array(
        'A.bin', // 2.5GB
        'B.bin', // 2.5GB
);

$opt = array(
        'comment' => 'Hello world',
);

$zip = new \ArchiveStream_Zip('download.zip');
foreach ($files as $file)
{
        $fh = fopen($file, 'r');
        $stat = fstat($fh, $stat['size'], $opt);

        while (!feof($fh))
        {
                $zip->init_file_stream_transfer(fread($fh, 1024 * 1024));
        }
}

$zip->complete_file_stream();
$zip->finish();

If I can't reproduce with this script, I'll attempt one using a single 5GB file to see if that creates a corrupt ZIP.

Also, did you have a comment on your ZIP file that reads "Generated by FileCloud"? I don't see this in the sample code you provided. If so, I think the code that adds a comment to the ZIP file may be a little messed up. If not, you're outputting more to the page than simply the ZIP file, and that could be part of your issue.
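
One quick way to check for the last possibility (extra output ahead of the archive) is to look at the first bytes of the downloaded file. A minimal sketch, assuming a ZIP with at least one entry, which per the ZIP format begins with the local file header signature `PK\x03\x04`. The helper name `starts_with_zip_signature` is hypothetical and not part of ArchiveStream:

```php
<?php
// Hypothetical helper (not part of ArchiveStream): returns true when the file
// begins with the ZIP local file header signature "PK\x03\x04".
// Assumption: the archive has at least one entry; an empty ZIP starts with
// the end-of-central-directory signature "PK\x05\x06" instead.
function starts_with_zip_signature($path)
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return false;
    }
    $magic = fread($fh, 4);
    fclose($fh);

    return $magic === "PK\x03\x04";
}
```

If this returns false for a downloaded archive, something (a warning, HTML, whitespace) was probably sent before the ZIP data.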

@johnmaguire
Contributor

Looks like that reproduced it. Will look into this after lunch hopefully.

-> % 7z x issue_12.zip

7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=utf8,Utf16=on,HugeFiles=on,8 CPUs)

Processing archive: issue_12.zip

Error: Can not open file as archive

@johnmaguire
Contributor

Err, scratch that, my script was totally messed up, haha. Going to give it another go with this:

-> % cat issue_12.php
<?php
// Github issue: https://github.com/barracudanetworks/ArchiveStream-php/issues/12
// ZIPs with a size exceeding 4 GB are corrupt

ini_set('max_execution_time', 0);

require_once './ArchiveStream-php/stream.php';
require_once './ArchiveStream-php/zipstream.php';

$files = array(
        'A.bin',
        'B.bin',
);

$opt = array(
        'comment' => 'Hello world',
);

$zip = new \ArchiveStream_Zip('issue_12.zip');

foreach ($files as $file)
{
        // Note: Using stat instead of fstat
        $stat = stat($file);
        $fh = fopen($file, 'r');

        $zip->init_file_stream_transfer($file, $stat['size'], $opt);
        while (!feof($fh))
        {
                $zip->stream_file_part(fread($fh, 1024 * 1024));
        }
        $zip->complete_file_stream();
        fclose($fh);
}

$zip->finish();

@johnmaguire
Contributor

That seemed to work just fine:

-> % 7z x issue_12.zip

7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=utf8,Utf16=on,HugeFiles=on,8 CPUs)

Processing archive: issue_12.zip

Extracting  A.bin
Extracting  B.bin

Everything is Ok

Files: 2
Size:       5242880000
Compressed: 5242880432

Some info about the server used to produce the zip file:

john@dib [12:42:16] [~/www/dib.leftforliving.com/files/archivestream]
-> % uname -a
Linux dib.leftforliving.com 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2+deb7u2 x86_64 GNU/Linux
john@dib [12:42:19] [~/www/dib.leftforliving.com/files/archivestream]
-> % sudo nginx -v
nginx version: nginx/1.6.2
john@dib [12:42:20] [~/www/dib.leftforliving.com/files/archivestream]
-> % php --version
PHP 5.6.7-1 (cli) (built: Mar 24 2015 12:30:15)
Copyright (c) 1997-2015 The PHP Group
Zend Engine v2.6.0, Copyright (c) 1998-2015 Zend Technologies
    with Zend OPcache v7.0.4-dev, Copyright (c) 1999-2015, by Zend Technologies

Could you try running the script I posted above? To generate A.bin and B.bin, I used the following commands on Linux:

dd if=/dev/urandom of=A.bin bs=1M count=2500
dd if=/dev/urandom of=B.bin bs=1M count=2500

Any data should do.
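
On a machine without dd (a Windows box, for example), a short PHP script can produce equivalent test files. A rough sketch: the function name `write_test_file` is hypothetical, and the filler is one repeated block rather than /dev/urandom output, which should not matter given that any data should do:

```php
<?php
// Hypothetical helper: writes $mebibytes MiB of filler data to $path,
// one 1 MiB chunk at a time so memory use stays small.
function write_test_file($path, $mebibytes)
{
    $fh = fopen($path, 'wb');
    if ($fh === false) {
        return false;
    }

    // A raw MD5 digest is 16 bytes; 16 * 65536 = 1 MiB.
    $block = str_repeat(md5(uniqid('', true), true), 65536);

    for ($i = 0; $i < $mebibytes; $i++) {
        if (fwrite($fh, $block) !== strlen($block)) {
            fclose($fh);
            return false;
        }
    }
    fclose($fh);

    return true;
}

// Usage, matching the dd commands above (2500 MiB each):
// write_test_file('A.bin', 2500);
// write_test_file('B.bin', 2500);
```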

@johnmaguire
Contributor

One last note for now... I added $opt into the ArchiveStream_Zip constructor, in order to make sure ZIP file-level comments work. No problems:

jmaguire@ZimsBase [01:01:57] [~/Downloads]
-> % 7z x issue_12\(1\).zip

7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=utf8,Utf16=on,HugeFiles=on,8 CPUs)

Processing archive: issue_12(1).zip

file A.bin
already exists. Overwrite with
A.bin?
(Y)es / (N)o / (A)lways / (S)kip all / A(u)to rename all / (Q)uit? A
Extracting  A.bin
Extracting  B.bin

Everything is Ok

Files: 2
Size:       5242880000
Compressed: 5242880432

@tuxan11
Author

tuxan11 commented May 13, 2015

@johnmaguire, thanks for the info. I will try to run the script and get back to you.

@tuxan11
Author

tuxan11 commented May 14, 2015

@johnmaguire I tried your code on my Windows box and it resulted in a corrupted zip, same as before.
Then I tried the same code on Linux, and there it works. It looks like the problem only occurs with PHP on Windows. Can anything be done on the Windows platform?
Please let me know if you need any more information.

@johnmaguire
Contributor

Hi tuxan11. I don’t have any Windows machines currently, nor do we use any here at Barracuda Networks. If I get some time over the next week or so, I’ll try to set up a VM to test this in. No guarantees. :(

Just to verify, do you have the GMP PHP extension installed correctly?

John


@johnmaguire
Contributor

Also, when testing on Linux, did you use Apache? If so, the version used there would be great. Thanks!

@tuxan11
Author

tuxan11 commented May 15, 2015

@johnmaguire, on Linux I used Apache version 2.2.15.
On the Windows machine, I did install the GMP PHP extension. Without it, some PHP calls such as gmp_init were failing.

@johnmaguire
Contributor

Thanks! I'm able to reproduce this in a Windows VM with the latest XAMPP using PHP 5.5. I'll try to debug this soon. :)

@filerun

filerun commented Jun 24, 2015

Any news about this? I have been trying all day to make this work with files larger than 4GB.
I'm running all 64-bit software with PHP 5.6.10, but on Windows (8). The archive is always corrupt, whether it is Zip or Tar. The expected archive size seems to be correct. WinRAR lists the archive contents but cannot extract anything, and it reports the large file's size as around 2GB instead of 4.5GB, so I am pretty sure there is an integer limitation somewhere. I do get this warning from PHP:
PHP Warning: ArchiveStream::int64_split(): Unable to convert variable to GMP - wrong type in stream.php on line 361
But I'm not good enough with numbers and bits to troubleshoot this...
Also, for other people trying to run this on Windows, do note that PHP, even the latest 64-bit version, reports an incorrect number from "filesize()". I am using COM to get the real file size and that works every time.
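
For readers wondering what a COM-based size lookup might look like, here is a minimal sketch. It is not necessarily the code used above: it assumes the PHP COM extension is enabled on Windows and uses the Windows `Scripting.FileSystemObject` object, and the helper name `real_file_size` is hypothetical. The exact type COM hands back for `Size` may vary, so the result is cast to a string:

```php
<?php
// Hypothetical helper: returns the file size as a string so that values
// too large for a 32-bit PHP integer are not wrapped.
// Assumptions: on Windows the COM extension is enabled and
// Scripting.FileSystemObject is available; elsewhere filesize() is used.
function real_file_size($path)
{
    if (class_exists('COM')) {
        $fso  = new COM('Scripting.FileSystemObject');
        $file = $fso->GetFile(realpath($path));

        return (string) $file->Size;
    }

    // Fallback for platforms where PHP integers are wide enough.
    return (string) filesize($path);
}
```

Returning a string keeps the value usable with GMP functions such as gmp_init() without passing through a native integer.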

@johnmaguire
Contributor

Are you saying that if you switch from filesize() to COM the archives are no longer corrupt, or the filesizes are correct?

Found this on php.net/filesize: Note: Because PHP's integer type is signed and many platforms use 32bit integers, some filesystem functions may return unexpected results for files which are larger than 2GB.
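
To illustrate the kind of result that note describes, the sketch below reinterprets a byte count as a signed 32-bit integer. It assumes it is run on a PHP build with 64-bit integers (PHP_INT_SIZE of 8) so the true value can be represented in the first place; `as_signed_32` is a hypothetical helper, not a PHP or ArchiveStream function:

```php
<?php
// Hypothetical helper: shows what a byte count looks like after being
// squeezed into a signed 32-bit integer. Requires 64-bit PHP integers.
function as_signed_32($bytes)
{
    $low = $bytes & 0xFFFFFFFF;   // keep only the low 32 bits
    if ($low >= 0x80000000) {     // top bit set: the value reads as negative
        $low -= 0x100000000;
    }

    return $low;
}

// 2500 MiB, the size of each test file in the reproduction script.
var_dump(as_signed_32(2500 * 1024 * 1024)); // int(-1673527296)
```

A negative or otherwise wrapped size passed into the archive headers would be enough to make extraction tools reject the file.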

Definitely sounds like it could be related. Thank you for the information. I haven't tracked down the problem yet, but the PHP warning and that info is definitely helpful!

Also, are you using an up-to-date version of the ArchiveStream library? Line 361 of stream.php doesn't seem to be a gmp_*() call.

@filerun

filerun commented Jun 24, 2015

On Windows, the sizes of large files (>2GB on 32-bit systems and >4GB on 64-bit) will always be reported incorrectly by "filesize()". Even after fixing this problem, the library still produces corrupt archives.

I know that line doesn't specifically use any GMP functions, but the error always points to the "$low = $value & $right;" line. I have found, however, that this error occurs only with PHP 5.6, so I will skip that version for now.

@johnmaguire
Contributor

Interesting. Thanks again for all the info.


@FabryB

FabryB commented Nov 27, 2015

Note: on Windows, PHP versions before 7 have no support for 64-bit integers, even when running 64-bit PHP. For this reason the filesize() function reports a negative value for files larger than 2GB.

// Check if PHP is running with 64 bit integer support
if (PHP_INT_SIZE < 8) {
    // NOT 64 bit
}

@johnmaguire
Contributor

@FabryB I believe that is what @vvllaadd was saying above. He switched out the filesize() call for a COM call, and was still unable to get the library working. We store most numbers using GMP within the library, in order to get around 32-bit limitations. However, the line he said the error occurred on ($low = $value & $right;) does not use GMP, so I wonder if it could suffer from a bug if you don't have 64-bit ints. I'd have to look closer at what we're storing in those vars, and what we're aiming to do.
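
For reference, splitting a 64-bit value into two 32-bit halves using only GMP calls could look like the sketch below. This is not the library's int64_split() implementation; `split_64_with_gmp` is a hypothetical name, and the sketch assumes the GMP extension is loaded:

```php
<?php
// Hypothetical sketch, not ArchiveStream's int64_split(): splits a
// non-negative integer (given as an int or decimal string) into low and
// high 32-bit halves using only GMP functions, so no native 64-bit
// integer is needed.
function split_64_with_gmp($value)
{
    $value = gmp_init((string) $value, 10);
    $mask  = gmp_init('4294967295', 10);   // 0xFFFFFFFF
    $base  = gmp_init('4294967296', 10);   // 2^32

    $low  = gmp_and($value, $mask);        // low 32 bits
    $high = gmp_div_q($value, $base);      // remaining high bits

    // Decimal strings stay exact even on builds with 32-bit integers.
    return array(gmp_strval($low), gmp_strval($high));
}

// Usage: list($low, $high) = split_64_with_gmp('5242880000');
// 5242880000 = 1 * 2^32 + 947912704, so $low is '947912704' and $high is '1'.
```

Keeping both operands as GMP values avoids mixing GMP numbers with native integers in a plain `&` expression, which is where the warning reported above points.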

Thank you for the information.

johnmaguire changed the title from "Zip files larger than 4GB" to "Zip files larger than 4GB fail on Windows" on Dec 28, 2015
johnmaguire mentioned this issue on Sep 1, 2016
nickvergessen pushed a commit to nextcloud-deps/TarStreamer that referenced this issue Aug 17, 2022
[0.2.0] Add PHP 7.3 to the matrix, drop PHP 5.4, 5.5, 5.6, 7.0