cowwoc commented Oct 5, 2025

Fixes #214

Problem

When using attachedOutputs, executable file permissions are lost during cache restoration. This is because the standard java.util.zip classes do not preserve Unix file permissions.

Solution

Switch from java.util.zip to Apache Commons Compress's ZipArchive classes, which support Unix permissions via the setUnixMode() and getUnixMode() methods.

Changes

  1. Add Apache Commons Compress 1.28.0 dependency (previously a transitive test dependency, now a direct compile dependency)

  2. Replace java.util.zip with Commons Compress in CacheUtils:

    • ZipEntry → ZipArchiveEntry
    • ZipOutputStream → ZipArchiveOutputStream
    • ZipInputStream → ZipArchiveInputStream
  3. Preserve permissions during zip:

    • Read POSIX permissions with Files.getPosixFilePermissions()
    • Convert to Unix mode integer (e.g., 0755)
    • Store via zipEntry.setUnixMode()
  4. Restore permissions during unzip:

    • Read Unix mode from zipEntry.getUnixMode()
    • Convert to POSIX permissions
    • Apply via Files.setPosixFilePermissions()
  5. Platform safety:

    • Wrap permission operations in try-catch for UnsupportedOperationException
    • Gracefully handle non-POSIX filesystems (Windows, FAT32, etc.)
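Steps 3 and 4 hinge on converting between a `Set<PosixFilePermission>` and an octal mode. The sketch below shows one JDK-only way to do that. A later commit on this PR mentions a `permissionsToMode()` helper, but the body here is our assumption, and `modeToPermissions` is a hypothetical name. This version returns the nine permission bits only; the value would then be handed to `setUnixMode()` or taken from `getUnixMode()`.

```java
import static java.nio.file.attribute.PosixFilePermission.*;

import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

public class UnixModeConversion {

    // Permission for each mode bit, from 0400 (owner read) down to 0001 (others execute).
    private static final PosixFilePermission[] BITS = {
        OWNER_READ, OWNER_WRITE, OWNER_EXECUTE,
        GROUP_READ, GROUP_WRITE, GROUP_EXECUTE,
        OTHERS_READ, OTHERS_WRITE, OTHERS_EXECUTE
    };

    /** Zip side: POSIX permission set -> octal permission bits (e.g. 0755). */
    static int permissionsToMode(Set<PosixFilePermission> permissions) {
        int mode = 0;
        for (int i = 0; i < BITS.length; i++) {
            if (permissions.contains(BITS[i])) {
                mode |= 0400 >> i;
            }
        }
        return mode;
    }

    /** Unzip side: the low nine bits of a Unix mode -> POSIX permission set. */
    static Set<PosixFilePermission> modeToPermissions(int mode) {
        Set<PosixFilePermission> permissions = EnumSet.noneOf(PosixFilePermission.class);
        for (int i = 0; i < BITS.length; i++) {
            if ((mode & (0400 >> i)) != 0) {
                permissions.add(BITS[i]);
            }
        }
        return permissions;
    }

    public static void main(String[] args) {
        int mode = permissionsToMode(PosixFilePermissions.fromString("rwxr-xr-x"));
        System.out.println(Integer.toOctalString(mode));                            // 755
        System.out.println(PosixFilePermissions.toString(modeToPermissions(mode))); // rwxr-xr-x
    }
}
```

Because `modeToPermissions` only looks at the low nine bits, it also accepts modes that carry extra file-type bits.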

Why Apache Commons Compress?

Apache Commons Compress was already a transitive test dependency. Using it provides a clean, simple solution compared to alternatives:

With Commons Compress (this PR):

  • ✅ 2 lines to preserve permissions: entry.setUnixMode() / entry.getUnixMode()
  • ✅ Well-tested, maintained by Apache
  • ✅ Handles all edge cases and platform differences
  • ✅ Same Apache 2.0 license

Without Commons Compress (JDK-only approach):

  • ❌ Would require manually encoding Unix permissions in each entry's external file attributes, which ZipEntry does not expose
  • ❌ Low-level binary format (ZIP central directory headers, following Info-ZIP's Unix conventions)
  • ❌ Error-prone and hard to maintain
  • ❌ Platform-specific quirks
  • ❌ No standard API - would need reflection or custom binary encoding

The standard java.util.zip.ZipEntry class provides no methods to access or modify external file attributes that store Unix permissions. Attempting this with JDK-only code would require:

  1. Manual binary encoding of the Unix mode into the external file attributes (the Info-ZIP convention)
  2. Platform-specific logic to handle different ZIP implementations
  3. Extensive testing across different operating systems
  4. Maintenance burden to keep up with ZIP format changes
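To give a sense of what the manual encoding involves: by the convention Info-ZIP established, an entry's Unix mode sits in the upper 16 bits of its 32-bit "external file attributes" field in the central directory, and the upper byte of "version made by" is set to 3 (Unix) so readers know how to interpret it; the low byte keeps MS-DOS attribute flags. The sketch below shows only that bit packing, with hypothetical helper names. A JDK-only implementation would still need some way to write and read those header fields, which `java.util.zip.ZipEntry` does not offer. We believe this is the layout `ZipArchiveEntry.setUnixMode()` / `getUnixMode()` manage for you, but treat that as an assumption.

```java
public class ZipUnixAttributes {

    /** Upper byte of the "version made by" field for entries created on Unix. */
    static final int HOST_UNIX = 3;

    /** Packs a Unix mode into the 32-bit external file attributes value. */
    static long toExternalAttributes(int unixMode, boolean directory) {
        long attributes = ((long) (unixMode & 0xFFFF)) << 16; // Unix mode: high 16 bits
        if ((unixMode & 0200) == 0) {
            attributes |= 0x01; // MS-DOS read-only flag when the owner cannot write
        }
        if (directory) {
            attributes |= 0x10; // MS-DOS directory flag
        }
        return attributes;
    }

    /** Recovers the Unix mode; entries not made on Unix report 0 ("no mode recorded"). */
    static int toUnixMode(long externalAttributes, int hostSystem) {
        if (hostSystem != HOST_UNIX) {
            return 0;
        }
        return (int) ((externalAttributes >>> 16) & 0xFFFF);
    }

    public static void main(String[] args) {
        long attributes = toExternalAttributes(0100755, false);
        System.out.println(Long.toHexString(attributes));                              // 81ed0000
        System.out.println(Integer.toOctalString(toUnixMode(attributes, HOST_UNIX))); // 100755
        System.out.println(toUnixMode(attributes, 0));                                // 0 (MS-DOS host)
    }
}
```

Reporting 0 when no Unix mode was recorded lines up with the backward-compatibility rule in the Testing section: an entry with mode=0 leaves the extracted file unchanged.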

In contrast, Apache Commons Compress provides a battle-tested, well-documented API that handles all this complexity for us.

Testing

The changes preserve backward compatibility:

  • Files without Unix mode (mode=0) are unchanged
  • Non-POSIX systems gracefully skip permission operations
  • Existing zip files without permission data work as before

Tested scenarios:

  • ✅ Executable shell scripts (rwxr-xr-x / 0755)
  • ✅ Read-only files (r--r--r-- / 0444)
  • ✅ Regular files (rw-r--r-- / 0644)
  • ✅ Windows filesystems (permissions skipped gracefully)

Related Issues

AlexanderAshitkin (Contributor) left a comment

I wonder why the permissions on machine X should match those of the build on machine Y. This could cause problems with cache portability, since file metadata is not included in the file checksum; only the file's content is. The cache doesn't track permissions in checksums, and changing permissions doesn't affect caching. While I understand the relevance of timestamps, the significance of permissions is less clear to me. Perhaps permissions should only be synced when the build explicitly sets them on outputs; otherwise this becomes replication of the environment, which is a slightly different concern.

cowwoc (Author) commented Oct 6, 2025


  1. I know you designed this extension for CI, but it sounds like most of the users reporting problems are actually using it for local development.
  2. Permissions lost when using attachedOutputs #214 discusses one scenario where the current behavior causes problems.
  3. In the context of preserving permissions across machines, perhaps we should model this after Git? Git preserves the executable bit across machines (it records regular files as mode 100644 or 100755). Are you okay with its approach?

pn-santos commented Oct 6, 2025

To add a bit of context (since I opened #214 to begin with): I was using the maven-build-cache-extension in a project that generated a Docker image (using https://github.com/fabric8io/docker-maven-plugin). In practice, this meant assembling various files under target/docker, which docker-maven-plugin then used to build the image locally.

The issue arose because some resource files (bash scripts) had exec permissions, and the Dockerfile was written expecting those files to be executable. On a clean build it worked, but once target/docker was restored from the cache it no longer did, because the files that had exec permissions lost them.

We worked around this by using --chmod with COPY in the Dockerfile itself (probably how it should be done anyway), but I still thought it was worth reporting, since this behaviour makes the output of a "build-from-cache" differ from that of a "build-from-scratch".

cowwoc added a commit to cowwoc/maven-build-cache-extension that referenced this pull request Oct 6, 2025
When preservePermissions=true (default), Unix file permissions are stored in
ZIP entry headers, making them part of the ZIP file's binary content. This
ensures that hashing the ZIP file (for cache keys) includes permission
information, providing proper cache invalidation when file permissions change.

This addresses the architectural correctness concern raised in PR apache#392 review:
permissions should be in the checksum if they're being preserved, similar to
how Git includes file mode in tree hashes.

Changes:
- Added preservePermissions boolean parameter to CacheUtils.zip() and unzip()
- Added preservePermissions config option to build-cache-config.mdo (default: true)
- Added isPreservePermissions() method to CacheConfig interface and implementation
- Updated CacheControllerImpl to pass preservePermissions config to zip/unzip
- Made permission preservation conditional based on the config flag
- Fixed permissionsToMode() to include file type bits (0100000 for regular files)
- Added comprehensive tests:
  * testPermissionsAffectFileHashWhenEnabled() - Verifies different permissions
    produce different ZIP hashes when preservation is enabled
  * testPermissionsDoNotAffectHashWhenDisabled() - Verifies permissions are NOT
    preserved when the flag is disabled
- Added JavaDoc explaining that permissions become part of cache checksum
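The permissionsToMode() fix can be illustrated with plain octal arithmetic. In stat(2) terms, a Unix mode combines a file-type field (S_IFREG = 0100000 for regular files, S_IFDIR = 0040000 for directories) with the permission bits, so 0755 is stored as 0100755. A minimal sketch; the constant and method names are hypothetical, not taken from the PR:

```java
public class FileTypeBits {

    static final int S_IFREG = 0100000;      // file type: regular file
    static final int S_IFDIR = 0040000;      // file type: directory
    static final int PERMISSION_MASK = 0777; // rwx bits for owner, group, others

    /** Mode to store: file-type bits plus the nine permission bits. */
    static int withFileType(int permissionBits, boolean directory) {
        return (directory ? S_IFDIR : S_IFREG) | (permissionBits & PERMISSION_MASK);
    }

    /** Mode to restore: drop the file-type bits before building a PosixFilePermission set. */
    static int permissionBits(int unixMode) {
        return unixMode & PERMISSION_MASK;
    }

    public static void main(String[] args) {
        int stored = withFileType(0755, false);
        System.out.println(Integer.toOctalString(stored));                 // 100755
        System.out.println(Integer.toOctalString(permissionBits(stored))); // 755
    }
}
```

Masking with 0777 on the way back keeps only what `PosixFilePermission` can represent; the setuid, setgid and sticky bits (07000) have no counterpart there.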

Behavior:
- preservePermissions=true: Permissions stored in ZIP, affect cache key (default)
- preservePermissions=false: Permissions not preserved, files use system umask

This is analogous to Git's treatment of file mode in tree hashes, ensuring
that metadata included in cache restoration is also part of cache key computation.
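The Git analogy can be made concrete. In Git's default SHA-1 object format, a tree is named by hashing its entries, and each entry is `<mode> <name>\0` followed by the raw 20-byte id of the blob, so identical content under mode 100644 and 100755 yields different tree ids. The sketch below computes those ids with `MessageDigest`; it illustrates Git's object format only, is not code from the extension, and its class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class GitModeInHash {

    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Git object id: SHA-1 over "<type> <payload length>\0" followed by the payload. */
    static byte[] objectId(String type, byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] header = (type + " " + payload.length + "\0").getBytes(StandardCharsets.US_ASCII);
        buffer.write(header, 0, header.length);
        buffer.write(payload, 0, payload.length);
        return sha1(buffer.toByteArray());
    }

    /** Id of a tree holding one file: the entry is "<mode> <name>\0" + 20-byte blob id. */
    static String singleFileTreeId(String mode, String name, byte[] content) {
        byte[] blobId = objectId("blob", content);
        ByteArrayOutputStream entry = new ByteArrayOutputStream();
        byte[] prefix = (mode + " " + name + "\0").getBytes(StandardCharsets.UTF_8);
        entry.write(prefix, 0, prefix.length);
        entry.write(blobId, 0, blobId.length);
        return hex(objectId("tree", entry.toByteArray()));
    }

    public static void main(String[] args) {
        byte[] script = "#!/bin/sh\necho hello\n".getBytes(StandardCharsets.UTF_8);
        String regular = singleFileTreeId("100644", "run.sh", script);
        String executable = singleFileTreeId("100755", "run.sh", script);
        // Same content, different mode: the tree ids differ.
        System.out.println(regular.equals(executable)); // false
    }
}
```

Git records only 100644 and 100755 for regular files, so what it carries across machines is the executable bit rather than the full permission set.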
cowwoc added a commit to cowwoc/maven-build-cache-extension that referenced this pull request Oct 6, 2025

Optimize filesystem capability check to avoid repeated exceptions

Addresses reviewer feedback: check once if filesystem supports POSIX file
attributes instead of throwing and catching UnsupportedOperationException
for every file during zip/unzip operations.

Changes:
- Added single filesystem capability check at the beginning of zip() and unzip()
- Removed try-catch blocks from inside the file iteration loops
- Improved performance on non-POSIX filesystems (e.g., Windows)

Before: UnsupportedOperationException thrown and caught for every file
After: Single supportedFileAttributeViews() check per zip/unzip operation

This is more efficient and cleaner, and behavior is unchanged.
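A minimal sketch of the "check once" approach, assuming permissions have already been decoded per extracted file. `supportsPosix`, `applyPermissions` and the map of decoded permissions are hypothetical stand-ins for however the extension actually iterates over entries; `FileSystem.supportedFileAttributeViews()` and `Files.setPosixFilePermissions()` are standard JDK calls.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Map;
import java.util.Set;

public class PosixCapabilityCheck {

    /** One capability check: does the file system holding {@code dir} offer the "posix" view? */
    static boolean supportsPosix(Path dir) {
        return dir.getFileSystem().supportedFileAttributeViews().contains("posix");
    }

    /**
     * Applies already-decoded permissions to extracted files. The capability check runs
     * once per call, so non-POSIX systems skip the loop entirely instead of throwing and
     * catching UnsupportedOperationException for every file.
     *
     * @return number of files whose permissions were set
     */
    static int applyPermissions(Path outputDir, Map<Path, Set<PosixFilePermission>> permissionsByFile)
            throws IOException {
        if (!supportsPosix(outputDir)) {
            return 0; // e.g. Windows: keep whatever permissions the platform assigned
        }
        int applied = 0;
        for (Map.Entry<Path, Set<PosixFilePermission>> entry : permissionsByFile.entrySet()) {
            Files.setPosixFilePermissions(entry.getKey(), entry.getValue());
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("restore");
        System.out.println("POSIX attributes supported: " + supportsPosix(dir));
    }
}
```

The result of `supportsPosix` could also be computed once per zip/unzip call and passed down, which is closer to the "single check per operation" described above.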
cowwoc force-pushed the fix-preserve-unix-permissions branch from 3b7e662 to 76ed09c on October 6, 2025 22:22
cowwoc force-pushed the fix-preserve-unix-permissions branch from 76ed09c to 3d534ee on October 6, 2025 22:27
cowwoc (Author) commented Oct 6, 2025

@AlexanderAshitkin The updated commit is ready for your review.

cowwoc force-pushed the fix-preserve-unix-permissions branch 3 times, most recently from 6c41ae5 to 63cf5a8, on October 8, 2025 04:02
Development

Successfully merging this pull request may close these issues.

Permissions lost when using attachedOutputs
3 participants