
Progress reporting makes large jumps, doesn't make it to 100%, on large multi-volume archives #67

Closed
skito opened this issue Dec 5, 2017 · 12 comments
skito commented Dec 5, 2017

Hi there,

There is an issue when observing progress on multi-volume archives. For example:
override func observeValue(forKeyPath keyPath: String?, of object: Any?, change: [NSKeyValueChangeKey: Any]?, context: UnsafeMutableRawPointer?) {
    print("Extracting: \(progress.fractionCompleted)")
}
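
The observer itself is attached roughly like this (a simplified sketch, not my exact project code; it relies on the implicit NSProgress parent/child mechanism and UnrarKit's extractFilesTo:overwrite:error: call, and destinationPath is a placeholder):

// Simplified setup sketch; `self` is the object containing the
// observeValue(forKeyPath:...) override shown above.
let archive = try URKArchive(url: archiveURL)

// While this parent Progress is "current", the library's own progress
// attaches to it as a child, so fractionCompleted can be observed via KVO.
let progress = Progress(totalUnitCount: 1)
progress.addObserver(self, forKeyPath: "fractionCompleted", options: [.new], context: nil)

progress.becomeCurrent(withPendingUnitCount: 1)
try archive.extractFiles(to: destinationPath, overwrite: true)
progress.resignCurrent()

progress.removeObserver(self, forKeyPath: "fractionCompleted")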

On a single-volume archive of around 50 MB, this produces:

Extracting: 8.08055588766906
Extracting: 15.2171524119645
Extracting: 25.2250117802612
Extracting: 32.4267074345756
Extracting: 40.5386480479931
Extracting: 49.5652731965651
Extracting: 57.9420499061096
Extracting: 65.1048882809258
Extracting: 75.3979794214661
Extracting: 82.5477628053197
Extracting: 90.9352648273049
Extracting: 100.0

which is perfect, but on a 2.4 GB multi-volume archive (3 volumes × 800 MB each) the result is only:

Extracting: 26.7179072845146
Extracting: 37.7852841529133

which causes a long progress freeze followed by an abrupt finish.

Could you check it, please?

P.S. I'm on v2.9-beta8.

Thanks,
Dimitar

@abbeycode (Owner)

@skito Hmm, interesting. Are you able to share a Dropbox link (or equivalent) to the archive for me to test with? I don't have any archives like that. You can DM me with it on Twitter (@DovFrankel).

abbeycode self-assigned this Dec 7, 2017
abbeycode added the bug label Dec 7, 2017
skito commented Dec 8, 2017

@abbeycode Sure. The data in the archive above is confidential, but here is a similar archive that behaves the same way:
https://my.pcloud.com/publink/show?code=kZJOyJ7Z1xpp9zAH5tkGuTQTITqLhfKDF727

My output:
Start extraction
Extracting: 7.02339954843241e-05
Extracting: 7.09789539958747e-05
Extracting: 0.0770643468305657
Extracting: 0.0771007285253158 - immediately after start
Extracting: 25.0103128331547
Extracting: 25.0182603104251 - 2 minutes after start (approximately)
Extracting: 50.0425241028979 - 4 minutes after start (approximately)
Extraction finished - 8-10 minutes after start (approximately)

Let me know if you have difficulties replicating the issue.

skito commented Dec 15, 2017

Also, the .uncompressedSize property of URKArchive doesn't seem to be correct on multi-volume archives.

A multi-volume archive whose uncompressed contents total 3.18 GB is reported as 7.7 GB when accessing .uncompressedSize before starting the extraction.

@abbeycode (Owner)

@skito The uncompressedSize is another interesting one. Would you mind reporting it as a separate issue?

abbeycode added a commit that referenced this issue Dec 24, 2017
@abbeycode (Owner)

Alright, so I was able to reproduce with the archive you linked to. I added some additional diagnostic logging. These are the two things I notice:

  1. The jumps in progress you mentioned
  2. Progress ends at 50% or so, rather than 100%

The first item is expected behavior. When you extract files, the progress increments as each file completes. The archive you sent has a couple of large files, and the rest are tiny. That's why you see those jumps. If it's important to you to see more granular updates, you can extract the files yourself by listing them, extracting each to a buffer, and writing to disk in your own code, so you'll see progress as each block completes.

The second one isn't expected, but as far as I can tell, the archive header is reporting inaccurate metadata (a total uncompressed archive size of 5.38 GB instead of the 2.89 GB it actually is). That's the same issue you mentioned above, but I'm not yet sure what's causing it. I'll have to see which header(s) the total uncompressed size comes from, and check whether the archive you sent has correct data stored. If not, then I'll have to fix it on my end.

abbeycode added a commit that referenced this issue Dec 26, 2017
…archives, which resulted in incorrect total uncompressed size being reported. It looks like the file that spans multiple archive parts would get reported for each of the parts that contained it (Issue #67)
@abbeycode (Owner)

Looking into it, there was definitely a bug that resulted in an incorrect total uncompressed size being reported: files that spanned multiple volumes would get listed twice by -listFileInfo:. I added uniquing of the paths reported by -listFileInfo:, which resolves the second issue above, so no need to report it separately. As for the sparse progress updates, I don't think there's anything for me to do there, as explained above. I'll include this update in the next beta.
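
In the meantime, if you need an accurate total before the next beta ships, you can approximate the same uniquing on the calling side. Here's a rough, untested sketch (it assumes the filename and uncompressedSize properties on the URKFileInfo objects returned by listFileInfo()):

// Client-side approximation of the fix: de-duplicate the listing by path
// before summing, since a file spanning several volumes gets reported once
// for each volume that contains it.
let fileInfos = (try? archive.listFileInfo()) ?? []

var seenPaths = Set<String>()
var totalUncompressed: Int64 = 0

for info in fileInfos where seenPaths.insert(info.filename).inserted {
    totalUncompressed += info.uncompressedSize
}

print("Total uncompressed size: \(totalUncompressed) bytes")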

skito commented Dec 26, 2017

Thanks for the diagnostics!

  1. About the progress - is there some sort of callback function that I can pass from Swift to handle the progress manually, as you suggested? If so, could you provide a very basic example of it?

  2. About the uncompressedSize property - it's possible that the headers are wrong, but I noticed this issue with several multi-volume archives compressed with different software on both Mac and Windows. All of them reported about twice their actual size. Maybe it's due to double counting somewhere while reading the headers? Do you still want me to report it as a separate issue?

skito commented Dec 26, 2017

Oh, alright then. Could you please provide a basic example of handling the progress blocks?

abbeycode added a commit that referenced this issue Dec 26, 2017
skito commented Dec 26, 2017

I just saw that you already provided the example in the README:

BOOL success = [archive extractBufferedDataFromFile:@"a file in the archive.jpg"
                                              error:&error
                                             action:^(NSData *dataChunk, CGFloat percentDecompressed) {
    NSLog(@"Decompressed: %f%%", percentDecompressed);
    // Do something with the NSData chunk
}];

Sorry for the silly request :)

@abbeycode (Owner)

No problem :) This is what I had in mind. It compiles, but I haven't tested it:

let archiveURL: URL = // URL of the archive to extract
let outputDirURL: URL = // directory to extract into

guard let archive = try? URKArchive(url: archiveURL) else {
    return
}

guard let fileInfos = try? archive.listFileInfo() else {
    return
}

let totalArchiveSize = archive.uncompressedSize!.int64Value
var totalExtracted = Int64(0)

for fileInfo in fileInfos {
    let fileURL = outputDirURL.appendingPathComponent(fileInfo.filename)

    // FileHandle(forWritingTo:) requires the file to exist, so create it first
    guard FileManager.default.createFile(atPath: fileURL.path, contents: nil),
          let fileHandle = try? FileHandle(forWritingTo: fileURL) else {
        continue
    }
    
    defer {
        fileHandle.closeFile()
    }
    
    do {
        // Extract this file in chunks, writing each chunk to disk as it arrives
        try archive.extractBufferedData(fromFile: fileInfo.filename) { (data, progress) in
            fileHandle.write(data)
            totalExtracted += Int64(data.count)
            
            NSLog("%f%% done with \(fileInfo.filename)", progress)
            NSLog("%f%% done with archive", 100 * Double(totalExtracted) / Double(totalArchiveSize))
        }
    } catch let extractError {
        NSLog("Error extracting \(fileInfo.filename): \(extractError)")
        continue
    }
}

abbeycode changed the title from "Strange NSProgress observe behaviour on multivolume archives" to "Progress reporting makes large jumps, doesn't make it to 100%, on large multi-volume archives" Dec 26, 2017
skito commented Dec 26, 2017

Awesome. Thanks!

@abbeycode (Owner)

I'm closing this, since the fix was merged in for the v2.9 release. Look for it in the next beta!
