Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compress, decompress, compress results in lots of mismatch errors #62

Open
gingerbeardman opened this issue Apr 22, 2023 · 19 comments
Open

Comments

@gingerbeardman
Copy link
Contributor

Repro

  1. afsctool -c /Applications/PlayOnMac.app (no errors)
  2. afsctool -d /Applications/PlayOnMac.app (no errors)
  3. afsctool -c -T LZFSE /Applications/PlayOnMac.app (errors!)

Comparison

  1. download new version of the app here
  2. afsctool -c -T LZFSE /Applications/PlayOnMac.app (no errors)

Errors

	size mismatch=0 read=57405 failure=0 content mismatch=1 (Undefined error: 0)
/Applications/PlayOnMac.app/Contents/Resources/unix/wine/lib32on64/wine/ext-ms-win-ntuser-draw-l1-1-0.dll: Compressed file check failed, reverting file changes
	size mismatch=0 read=52611 failure=0 content mismatch=1 (Undefined error: 0)
/Applications/PlayOnMac.app/Contents/Resources/unix/wine/lib32on64/wine/api-ms-win-core-libraryloader-l1-1-0.dll: Compressed file check failed, reverting file changes
	size mismatch=0 read=57535 failure=0 content mismatch=1 (Undefined error: 0)
/Applications/PlayOnMac.app/Contents/Resources/unix/wine/lib32on64/wine/ext-ms-win-security-credui-l1-1-0.dll: Compressed file check failed, reverting file changes
	size mismatch=0 read=57414 failure=0 content mismatch=1 (Undefined error: 0)
/Applications/PlayOnMac.app/Contents/Resources/unix/wine/lib32on64/wine/api-ms-win-security-lsalookup-l2-1-1.dll: Compressed file check failed, reverting file changes
	size mismatch=0 read=57513 failure=0 content mismatch=1 (Undefined error: 0)
/Applications/PlayOnMac.app/Contents/Resources/unix/wine/lib32on64/wine/ext-ms-win-gdi-render-l1-1-0.dll: Compressed file check failed, reverting file changes
	size mismatch=0 read=57411 failure=0 content mismatch=1 (Undefined error: 0)
@Dr-Emann
Copy link
Contributor

Dr-Emann commented Apr 25, 2023

This reproduces consistently for me with version of playonmac 4.4.3 which is the current most recent version for me.

@Dr-Emann
Copy link
Contributor

Set a breakpoint on the message output: before afsctool reverts the compression, the file ends up containing all zeros

@Dr-Emann
Copy link
Contributor

Focusing on one of the files which show this behavior: Contents/Resources/unix/wine/lib64/wine/api-ms-win-core-registry-l2-2-0.dll

The file fits in one block. It seems that when compressed with zlib, the file is compressed into the resource fork (it isn't compressed enough to fit in the decmpfs xattr). However, when compressed with lzfse, it does compress enough to fit into the decmpfs xattr.

It's not clear why this is a problem though. I've verified that the decmpfs xattr written is exactly the same when it's compressing a fresh version of the dll and when compressing after it's already been compressed and decompressed, it's not clear what could be carrying over to change things: with the same decmpfs xattr contents, enabling the compressed flag in one case works, and in the other case results in a file which reports to contain all zeros.

It really feels like a macos bug, I don't see any differences I can observe in the file before/after a compress+decompress cycle, but we're getting different behavior.

@Dr-Emann
Copy link
Contributor

I believe this is a macos bug: if a file has been compressed into a resource fork, uncompressing the file, and recompressing directly into the decmpfs xattr, the file will appear as all zeros (of the correct len).

I've created a minimal reproduction: https://gist.github.com/Dr-Emann/3d0f272d08c68a889fa368597a10cea7. If SKIP_RFORK_COMPRESSION_FINISH is defined (with -DSKIP_RFORK_COMPRESSION_FINISH=1 when compiling the c code), the file will be correct. Without the define, the file will instead be filled with zeros.

@RJVB
Copy link
Owner

RJVB commented Apr 26, 2023 via email

@gingerbeardman
Copy link
Contributor Author

gingerbeardman commented Apr 26, 2023

  • PlayOnMac.app approx 1.75 GB
  • ZLIB -9 compresses to 674.9 MB in 2:26.47
  • LZFSE compresses to 621.7 MB in 1:05.79

Results are compressing an app copied fresh out of the DMG.

❯ time afsctool -c -T ZLIB -9 PlayOnMac.app
/Users/matt/Downloads/2023-04-26/PlayOnMac.app:
Number of HFS+/APFS compressed files: 4561
================
CPU	98%
TIME	2:26.47
❯ time afsctool -c -T LZFSE PlayOnMac.app
/Users/matt/Downloads/2023-04-26/PlayOnMac.app:
Number of HFS+/APFS compressed files: 4570
================
CPU	97%
TIME	1:05.79

I'm using a MacBook Pro (M1 Pro CPU, 16GB RAM).

@RJVB
Copy link
Owner

RJVB commented Apr 26, 2023 via email

@gingerbeardman
Copy link
Contributor Author

No worries! It's an eye-opener, for sure.

@RJVB
Copy link
Owner

RJVB commented Apr 26, 2023 via email

@gingerbeardman
Copy link
Contributor Author

Please do, I love a good benchmark.

I wonder how much ARM optimisation Apple have done to these algos. I'm betting "a lot"!

@kapitainsky
Copy link

LZFSE is highly optimized for ARM so not big surprise

@RJVB
Copy link
Owner

RJVB commented Apr 26, 2023 via email

@kapitainsky
Copy link

on Intel mac:

❯ time afsctool -c -T ZLIB -9 PlayOnMac.app
2m52s
650 MB

❯ time afsctool -c -T LZFSE PlayOnMac.app
1m40s
599 MB

so looks like LZFSE is just well done:)

@Dr-Emann
Copy link
Contributor

Dr-Emann commented Apr 26, 2023

I think there's some confusion.

After a file has had compression turned on with a resource fork, compressing the same file into the decmpfs header and turning the compression flag on results in a file that appears to be filled with zeros. The verification step DOES catch this (the original content is not all zeros), and correctly reverts compression.

I verified that the same decmpfs header content works if the file has not already been compressed to a resource fork in the past, but fails if the file has been compressed to a resource fork in the past.

There's nothing compressor specific: it should be possible to find a file which compresses to the resource fork at zlib -1, and fits in the decmpfs header at zlib -9, and it would exhibit the same behavior, compressing with zlib -1, uncompressing, then compressing with zlib -9 would end up with a file containing all zeros (and validation would revert, if enabled).

I'm running macos 13.3.1, the newest version. Feedback reported through the feedback assistant as FB12146617

@RJVB
Copy link
Owner

RJVB commented Apr 26, 2023 via email

@Dr-Emann
Copy link
Contributor

Yes, those steps reproduce the issue. I can confirm that skipping step two does not reproduce the problem.

Replacing step 2 with cat $file1 > $file2, then compressing $file2 with lzfse does not reproduce the problem. Inserting it as a step between 2 and 3 (or using cp between step 2 and 3) also does not reproduce the problem: The sha256sum of the file is the same after each step and it seems to be in some way actually tied to the file identity.

@gingerbeardman
Copy link
Contributor Author

gingerbeardman commented Apr 27, 2023

sha256sum of the file is the same after each step and it seems to be in some way actually tied to the file identity.

This smells of APFS copy in place. I'll try to test on an HFS Extended partition.

@RJVB
Copy link
Owner

RJVB commented Apr 27, 2023

This smells of APFS copy in place.

Yes, but it could also hint at my hunch that it is somehow tied to something to do with the resource fork, possibly something as stupid as not cleaning it up entirely. Except that removeResourceFork does seem to get set when it should, so the xattr and the resourcefork bits always get removed when needed. Do you know of tools to inspect those aspects of files? The xattr command on my OS doesn't list any attributes on files that I know to be compressed...

RJVB added a commit that referenced this issue Apr 27, 2023
It currently compresses on HFS with ZLIB compression <= 8 which
make the compressed content too large for storing in the XATTR.
With ZLIB 9 (afsctool -cfvv 9) it will fail because of a content
mismatch during the verification step. See issue #62.

As far as I can tell having had a file compressed into the resource
fork AT A GIVEN INODE will make compression into the xattr fail at
a later stage. It doesn't even have to be the same file. For instance:
> cat /usr/lib/libSystem.B.dylib > kkk
> afsctool -cfvv -S -L kkk
$PWD/kkk:
Compression type: ZLIB in resource fork (4)
File size (uncompressed; reported size by Mac OS 10.6+ Finder): 59088 bytes / 59 KB (kilobytes, base-10)
File size (compressed): 12288 bytes / 12 KiB
Compression savings: 79.2%
Number of extended attributes: 0
Total size of extended attribute data: 0 bytes
Approximate overhead of extended attributes: 536 bytes
Uncompressed file size reported in compressed header: 59088 bytes
> date > kkk
> afsctool -cfvv -S -L kkk
size mismatch=0 read=30 failure=0 content mismatch=1 (Undefined error: 0)
/Volumes/Debian/Users/bertin/work/src/new/Qt/kkk: Compressed file check failed, reverting file changes
/Volumes/Debian/Users/bertin/work/src/new/Qt/kkk:
Unable to compress file.
File content type: public.data
File data fork size (reported size by Mac OS X Finder): 30 bytes / 4 KB (kilobytes, base-10)
Number of extended attributes: 0
Total size of extended attribute data: 0 bytes
Approximate overhead of extended HFS+ attributes: 0 bytes
Approximate total file size (data fork + resource fork + EA + EA overhead + file overhead): 4344 bytes / 4 KiB
> rm kkk
> date > kkk
> afsctool -cfvv -S -L kkk
/Volumes/Debian/Users/bertin/work/src/new/Qt/kkk:
Compression type: ZLIB in decmpfs xattr (3)
File size (uncompressed; reported size by Mac OS 10.6+ Finder): 30 bytes / 0 KB (kilobytes, base-10)
File size (compressed): 0 bytes
Compression savings: 100.0%
Number of extended attributes: 0
Total size of extended attribute data: 0 bytes
Approximate overhead of extended attributes: 268 bytes
Uncompressed file size reported in compressed header: 30 bytes
@RJVB
Copy link
Owner

RJVB commented Apr 27, 2023

See the commit message of #8b89d8f20e8dbb3efddbbca1f9f7852318727d87

It looks like inodes have some kind of memory?!?!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants