Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large file support #4

Closed
fasterthanlime opened this issue Jan 27, 2019 · 16 comments
Closed

Large file support #4

fasterthanlime opened this issue Jan 27, 2019 · 16 comments

Comments

@fasterthanlime
Copy link
Contributor

14/15 of my .rar files (taken from the top downloaded @itchio .rar uploads) now decompress normally.

The winner is... https://contragon.itch.io/contra-2028 which requires large file support:

 big ツ dunrar CONTRA\ 2028.rar
2019/01/27 22:23:19 Opening file...
2019/01/27 22:23:19 Opening as RAR archive...
2019/01/27 22:23:19 Listing contents...
2019/01/27 22:23:19 dmc_unrar_file_is_supported: error 25: Unsupported large file
github.com/itchio/dmcunrar-go/dmcunrar.checkError
        /home/amos/Dev/go/src/github.com/itchio/dmcunrar-go/dmcunrar/glue.go:366
github.com/itchio/dmcunrar-go/dmcunrar.(*Archive).FileIsSupported
        /home/amos/Dev/go/src/github.com/itchio/dmcunrar-go/dmcunrar/glue.go:176
main.main
        /home/amos/Dev/go/src/github.com/itchio/dmcunrar-go/cmd/dunrar/main.go:70
runtime.main
        /usr/local/go/src/runtime/proc.go:201
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1333
for file CONTRA 2028/ContraReboot/Content/Paks/ContraReboot-WindowsNoEditor.pak
2019/01/27 22:23:19 Extracting 55 files, 2.65 GiB uncompressed
@DrMcCoy
Copy link
Owner

DrMcCoy commented Jan 27, 2019

The problem with large file support is that it may need non-standard extensions, especially when compiling for 32-bit. I.e. fseek()/ftell() will return 32-bit values, size_t might not be large enough, etc.

I don't have any experience with this, so I was playing it safe.

I would appreciate some help here, to be honest. :)

@fasterthanlime
Copy link
Contributor Author

fasterthanlime commented Jan 27, 2019

Well, I just read https://stackoverflow.com/questions/30867831/portable-support-for-large-files and I understand why you dread this particular feature... this bit in particular:

image

shivers

I suppose one (simple) way to fix it for 97% of users is to only support large files on 64-bit?

@DrMcCoy
Copy link
Owner

DrMcCoy commented Jan 27, 2019

Well, 97% of users on Linux and macOS. MSVC doesn't have fseeko/ftello.

@DrMcCoy
Copy link
Owner

DrMcCoy commented Jan 27, 2019

Oh, and while we're at it: dmc_unrar_extract_file_to_path(), since it uses fopen(), will fail on Windows with non-ASCII paths. I.e. if you use the UTF-8 encoded file names dmc_unrar returns, and just throw them into dmc_unrar_extract_file_to_path() (like the example.c does), you'll get garbage on Windows.

I should probably add a comment for that onto the function.

@fasterthanlime
Copy link
Contributor Author

Oh, and while we're at it: dmc_unrar_extract_file_to_path(), since it uses fopen(), will fail on Windows with non-ASCII paths.

No worries there, I'm using callbacks both for reading and writing files, so, that's not an issue. Would large file be easier to support if it was only supported for "callbacks-style I/O"? (as opposed to FILE* I/O).

@fasterthanlime
Copy link
Contributor Author

Well, 97% of users on Linux and macOS. MSVC doesn't have fseeko/ftello.

From what I can see in the SO thread, the only solution there is to switch from fopen, fseek, ftell (*FILE) to open, lseek, ltell (fd), which is bad news since you lose buffering :/

@DrMcCoy
Copy link
Owner

DrMcCoy commented Jan 27, 2019

Yeah, it's unfortunately all horrible. (Which is why I'm so glad https://github.com/xoreos/xoreos/ doesn't need any of this because all data files of the targeted games are deliberately below 2GB :P)

Only supporting it for callback-style I/O and memory on 64-bit would be the easiest, yeah. So not supporting it in dmc_unrar_extract_file_to_file() and dmc_unrar_extract_file_to_path(), and not for 32-bit. So I'm going to check for sizeof(size_t) >= 8, I guess, when determining whether it's supported.

@DrMcCoy
Copy link
Owner

DrMcCoy commented Jan 27, 2019

Oh, and of course only for input archives that come from callback or memory, not file

@fasterthanlime
Copy link
Contributor Author

Only supporting it for callback-style I/O and memory on 64-bit would be the easiest, yeah. So not supporting it in dmc_unrar_extract_file_to_file() and dmc_unrar_extract_file_to_path(), and not for 32-bit. So I'm going to check for sizeof(size_t) >= 8, I guess, when determining whether it's supported.

Well... LFS with callback-style I/O on 32-bit would also be relatively easy - you just can't use size_t anymore (more #ifdefs, woo!)

@DrMcCoy DrMcCoy mentioned this issue Jun 30, 2020
@DrMcCoy
Copy link
Owner

DrMcCoy commented Jun 30, 2020

Tagging in itchio/butler#210, because the problem is large file support that is being tracked here.

As a little status update, @clone2727 and I have been working a bit on large file support. It's still WIP, but we're getting there.

Things that currently work:

  • Large files on Linux 64-bit
  • Large files on Windows 64-bit

That includes both extraction of large files from small archives and extraction of large archive, both with RAR4 and RAR5 archives.

Things that currently don't work:

  • Large files on Linux 32-bit
  • Large files on Windows 32-bit
  • Large files on macOS

For the 32-bit builds, we're going to need to rip out my usage of size_t everywhere and just go with uint64_t [1]. On Windows, it should then work out of the box due to 397a391. On Linux, _FILE_OFFSET_BITS will need to be set to 64.

For macOS, @clone2727 is looking at how to do that as well. He's got access to macOS X 10.6 IIRC.

Another thing that's needed is some kind of check for large file support, forceable either way with a #define. That'll use the Windows check for Windows (if so, LFS is there), for Linux _FILE_OFFSET_BITS can be checked if it's a 32-bit build. Still need a way to check if it's a Linux 64-bit build, and no idea about macOS either yet.

In either case, the automatic check should probably rather err on being cautious. I'd rather throw an error instead of creating broken output files.

(Note that currently, the check in https://github.com/DrMcCoy/dmc_unrar/blob/master/dmc_unrar.c#L3855 still rejects large files for now, as it's all still WIP. That check needs to be removed if you wanted to test it.)

[1] Are there performance implications here?

@DrMcCoy
Copy link
Owner

DrMcCoy commented Jun 30, 2020

Or maybe I should say "glibc" instead of Linux. It probably covers GNU Hurd as well, I'd assume.

Conversely, it might not cover builds using the musl libc. Or any other libc like dietlibc, μClinux, klibc, ... I'm not going to worry too much about those, but I should probably document that in dmc_unrar.c.

@clone2727
Copy link
Contributor

In theory, any solution that uses fseeko and ftello should be workable in any POSIX system (as long as off_t is 64-bit). So if that works, then that can just be used in all the other libc's.

@fasterthanlime
Copy link
Contributor Author

The direction looks good to me, I'm probably not going to have time to test a wip version well enough though, I'll be following this thread to monitor progress!

@DrMcCoy
Copy link
Owner

DrMcCoy commented Jun 30, 2020

WIP here: https://github.com/DrMcCoy/dmc_unrar/tree/size_t

Current rough roadmap:

  • Replace size_t
  • Add a wrapper to dmc_unrar_archive_open_file() for Windows
  • Generic case works for linux32 (glibc), linux64 (glibc), win32, win64
    • _FILE_OFFSET_BITS needs to be set to 64 on linux32 glibc
  • macOS testing
  • Test all RAR features with LFS
    • Solid archives (RAR4, RAR5) (but see Bug in RAR5 solid files #7)
    • Delta filter (RAR4, RAR5)
    • 32-bit x86 filter (RAR4, RAR5)
    • Audio filter (RAR4)
    • RGB filter (RAR4)
    • Itanium filter (RAR4)
    • PPMd (RAR4)
    • Archive comments (RAR4, RAR5)
    • File comments (RAR4)
    • SFX (RAR4, RAR5)
    • Store (RAR4, RAR5)
  • Autodetect macro whether LFS is available
  • Add a paragraph at the top describing the configuration flags for LFS

@DrMcCoy
Copy link
Owner

DrMcCoy commented Jul 13, 2020

Giving a bit of a status update, summing up the edits I did in my previous comment: a WIP branch implementing large file support is here: https://github.com/DrMcCoy/dmc_unrar/tree/size_t .

I've tested it on Linux 32-bit, Linux 64-bit, Windows 32-bit and Windows 64-bit, and it works with all the features I threw at it.

I did, however, found another bug: #7 . Basically, RAR5 solid archives (i.e. multiple files compressed together as one chunk) are mostly broken in dmc_unrar and have always been. It's just that my previous test files didn't trigger the bug. I'm tracking this independently of large file support, though.

Unlike me, @clone2727 has access to a Mac, so he's currently testing stuff on Mac OS X (10.6, if I'm not mistaken).

Also thanks to @clone2727, dmc_unrar_is_rar_path() and dmc_unrar_archive_open_path() actually support full UTF-8 paths on Windows now, when using the WinAPI wrappers he wrote (by defaults automaitically used on Windows, but can be overruled by defining DMC_UNRAR_DISABLE_WIN32 to either 1 (disable) or 0 (force enable)).

We're still missing a similar wrapper for dmc_unrar_extract_file_to_path(), but that should be do-able. This would then make my comment #4 (comment) obsolete once we have that. :)
EDIT: This is now live :)

Roadmap right now is that I'm waiting for @clone2727 to give me a thumbs-up on Mac OS X, in the meanwhile I'll work on bug #7. If I get that thumbs-up before fixing bug #7, I'll release a new version with LFS and do a patch-release for #7 later. Otherwise, the bug fix goes into that release as well. I'll probably also want the dmc_unrar_extract_file_to_path() wrapper in that release; should be quick.

Well, unless of course it doesn't work on Mac OS X, then we'll have to fix that first. :P

Repository owner deleted a comment from BaseMax Jul 27, 2020
@DrMcCoy
Copy link
Owner

DrMcCoy commented Aug 20, 2020

Okay, this has now all been tested on Linux, Windows and Mac OS X (thanks @clone2727!) and it all seems to work well.

So I've just released dmc_unrar 1.7.0 which includes full large file support: https://github.com/DrMcCoy/dmc_unrar/releases/tag/v1.7.0

Unfortunately, I didn't have any luck so far tracking down bug #7, or rather find an easy way to fix it. I fear I have to redo solid files from scratch, because I was basing it on partially wrong assumptions.

@fasterthanlime, feel free to pass this along to the rest of the itch.io people and/or whoever will take over your duties there. And again good luck with your future endeavours, of course! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants