The archive is corrupt #1

Closed · hchautrung opened this issue Nov 7, 2013 · 18 comments

Comments

@hchautrung

Dear,

I downloaded the latest version and ran index.php in the test folder, but when I open the downloaded archive with WinRAR I get the error "The archive is corrupt".

Thank you,
Hugo

@johnmaguire (Contributor)

@hugohuynh I know this bug is very old, but can you give a few more details about the problem? Was it a .tar file or a .zip file? How large was the archive, and how large were the files within it? What error did you get?

@johnmaguire (Contributor)

Closing this issue due to lack of information. The problem has likely been fixed in the interim.

@ianvonholt

I have recently run into the same issue, but only when my archive streams above 670 MB in size.

I have a script that pulls all of the files a user selects and streams the archive on the fly. Anything under 670 MB and the archive is completely fine. Anything over, and the file is corrupt.

What information would you need to help debug?

@johnmaguire (Contributor)

Was this a tar or a zip?

@ianvonholt

It is a zip file.

Windows will flat out refuse to open the corrupt zip, which is expected. WinRAR will open it and show one huge file with a large, but incorrect, file size. Selecting additional items to add to the zip archive still increases the size of the corrupt archive, but you still end up with a single-file listing.

[Screenshot: corrupt zip listing]

Again, as soon as the selected archive is under 670 MB, it displays correctly.

[Screenshot: correct zip listing]

Additionally, php.ini has some rather high memory limits and execution times on this server. Could this be an issue with Zip64 or GMP settings?

@johnmaguire (Contributor)

Hmm. It's rather odd, since we haven't seen this in our environment, and our customers are downloading ZIPs every day. Have you made sure that there are no PHP errors occurring during execution? Is there any chance you might be able to get a broken ZIP to me somehow so that I can analyze it? If you'd like to email me a link to it privately, you can email jmaguire@barracuda.com.

@ianvonholt

It is pretty odd. The download setup has been working well for a couple of months; however, we never really hit the file-size limitation until now.

There have been no errors generated by PHP within the environment.

I've e-mailed you a link to a corrupt zip file.

@johnmaguire (Contributor)

I'm seeing that the ZIP is lacking an end-of-central-directory signature. This is created when finish() is called on the object after all files have been added. Are you making sure that finish() is called at the end of execution?

Also, do you know if this issue affects tar files as well? (You can test by passing anything into the instance_by_useragent method that doesn't contain the term "windows".)
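The missing end-of-central-directory record is straightforward to check for without WinRAR. A minimal diagnostic sketch in Python (independent of ArchiveStream, which is PHP; the function name is illustrative):

```python
# Diagnostic sketch (not part of ArchiveStream): check whether a file ends
# with a ZIP end-of-central-directory (EOCD) record. A streamed ZIP that
# was cut off before finish() was called will be missing this record.

def has_eocd(path, max_comment=65535):
    """Return True if the EOCD signature PK\\x05\\x06 appears in the tail
    of the file, where the ZIP format requires it to live."""
    with open(path, 'rb') as fh:
        fh.seek(0, 2)                      # seek to end to learn the size
        size = fh.tell()
        # The EOCD record is 22 bytes plus an optional comment of up to
        # 65535 bytes, so it must begin within the last 22 + 65535 bytes.
        window = min(size, 22 + max_comment)
        fh.seek(size - window)
        tail = fh.read(window)
    return tail.rfind(b'PK\x05\x06') != -1
```

An archive where this returns False was truncated before the central directory was flushed, which matches the "finish() never completed" theory above.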

@johnmaguire johnmaguire reopened this Apr 10, 2015
@johnmaguire (Contributor)

Err, sorry, please verify that both complete_file_stream() and finish() are being called, if you're using the stream methods.

@ianvonholt

The archive is indeed failing to complete.

The call is pretty basic using the ArchiveStream from the example.

$zip = ArchiveStream::instance_by_useragent( 'CPF_WebFTP_Download_' . date('Y_m_d-H:i:s') );

foreach ($pickedFiles as $file) {
    $zip->add_file_from_path($file['internal'], $file['path']);
}

$zip->finish();

The add_file_from_path call is correctly reaching the add_large_file function, initializing init_file_stream_transfer, and then dying on an fread call whenever the archive streams anything above the previously mentioned 670 MB. So it seems that a stream_file_part is being missed.

Switching to fgets seems to solve the issue in my current environment. However, this seems like a bad idea due to fgets returning false if it hits a newline.
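One way to confirm a dropped chunk, assuming the failure mode really is a missed stream_file_part, is to count the bytes handed off and compare against the file size on disk. A hedged sketch in Python (`checked_stream` and `emit` are illustrative names, not ArchiveStream API):

```python
# Diagnostic sketch, not ArchiveStream code: stream a file in fixed-size
# chunks and verify that the total handed off matches the size on disk,
# to catch a silently dropped chunk. `emit` stands in for whatever writes
# each part into the archive stream.
import os

def checked_stream(path, emit, block_size=1048576):
    sent = 0
    with open(path, 'rb') as fh:
        while True:
            chunk = fh.read(block_size)
            if chunk == b'':          # explicit EOF test, not truthiness
                break
            emit(chunk)
            sent += len(chunk)
    expected = os.path.getsize(path)
    if sent != expected:
        raise RuntimeError('short stream: sent %d of %d bytes' % (sent, expected))
    return sent
```

If the equivalent accounting in the PHP loop comes up short only past 670 MB, that would localize the bug to the read loop rather than to the header/descriptor writing.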

@johnmaguire (Contributor)

The switch to fread() is quite new. Previously we were using fgets(), and in production here at Barracuda we're still using the older version with fgets() (I was actually just about to update).

As such, I'm not surprised there's some unexpected behavior.

While I'm ironing out the issue with fread(), there shouldn't be concerns about fgets() and newlines. For example:

jmaguire@ZimsBase [02:41:36] [~/Repositories/cuda/backup] [release/6.0.08 *]
-> % cat > asdf
test
123
jmaguire@ZimsBase [02:55:26] [~/Repositories/cuda/backup] [release/6.0.08 *]
-> % php -a
Interactive shell

php > $fh = fopen('asdf', 'r');
php > while ($data = fgets($fh)) { echo $data; }
test
123

@johnmaguire (Contributor)

A little shot in the dark: Is the file you're adding to the ZIP being read from over a socket?

@ianvonholt

Nope.

All the files are local to the code-base. I looked into the possibility of a socket_timeout occurring, for some weird reason, but there was no indication that this was happening.

I'll do a bit more testing to see if I can track down what exactly is causing fread to fail, but for me switching back to fgets fixed my problem.

A bit more about the server:
CentOS release 6.6 (Final)
PHP 5.4.29
Apache/2.2.27

@johnmaguire (Contributor)

Interesting. I'll try doing some tests locally using the add-from-file call, as we mainly use this library for creating the file on the fly while streaming from multiple parts. Thanks a lot for the report. :)

@johnmaguire (Contributor)

Can't reproduce using a 5-byte text file, a 750 MB text file, and a 750 MB binary file... script below:

<?php

// Created for issue: https://github.com/barracudanetworks/ArchiveStream-php/issues/1
// Debugging problem that caused re-open (reported by ianvonholt)

// Just in case
ini_set('max_execution_time', 600);

// Switch to false for a tar file
define('ZIP_FILE', true);

include_once('stream.php');

// Hack to get a zip file even on Linux and vice-versa
if (ZIP_FILE) { $_SERVER['HTTP_USER_AGENT'] = 'windows'; } else { $_SERVER['HTTP_USER_AGENT'] = 'linux'; }

$files = [
    '5B.txt' => 'files/5B.txt',
    '750M.txt' => 'files/500M.txt',
    '750M.bin' => 'files/750M.bin',
];

$zip = ArchiveStream::instance_by_useragent('fread');
foreach ($files as $file => $path)
{
    $zip->add_file_from_path($file, $path);
}

$zip->finish();

@johnmaguire (Contributor)

It almost seems like WinRAR isn't respecting the third bit being set in the general purpose flag of the local file header (this is what says to ignore the placeholder CRC and data-length values there and to read them from the trailing data descriptor instead). It's odd that switching back to fgets would fix the problem, though.

Just to clarify, is the only difference fread -> fgets? Or did you checkout an earlier commit that used fgets?
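That flag bit can be read straight off the archive itself. A small inspection sketch in Python (the function name is illustrative; bit 3 of the general purpose flag is defined in PKWARE's ZIP APPNOTE):

```python
# Inspection sketch (Python, independent of ArchiveStream): read the
# general purpose bit flag out of the first local file header and report
# whether bit 3 (0x0008, "CRC/sizes deferred to a trailing data
# descriptor") is set, as it is for archives written by streaming zippers.
import struct

def streamed_flag_set(path):
    with open(path, 'rb') as fh:
        header = fh.read(30)               # fixed part of a local file header
    if header[:4] != b'PK\x03\x04':
        raise ValueError('not a ZIP local file header')
    (flags,) = struct.unpack_from('<H', header, 6)   # flag field at offset 6
    return bool(flags & 0x0008)
```

Comparing this flag between a broken archive and one produced by a non-streaming zipper would show whether the reader is choking on the data-descriptor convention.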

@johnmaguire (Contributor)

@ianvonholt I've tried reproducing this in house a few times, and can't seem to do it.

Could you verify that output buffering is turned off? You can do this with the following snippet:

while (ob_get_level() > 0) {
    ob_end_clean();
}

Many frameworks turn this on by default.

@johnmaguire (Contributor)

@ianvonholt If you're still interested in getting this fixed, please try running the script I provided and let me know whether you get a corrupt ZIP. Otherwise, I'll close this issue as CNR (could not reproduce).

nickvergessen referenced this issue in nextcloud-deps/TarStreamer Aug 17, 2022