Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Add property to ZipArchiveEntry to indicate if the entry is encrypted #68897

Closed
jeffhandley opened this issue May 5, 2022 · 6 comments · Fixed by #70036
Closed
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.IO.Compression
Milestone

Comments

@jeffhandley
Copy link
Member

jeffhandley commented May 5, 2022

Background and motivation

By specification, Zip archives support "weak" and "strong" entry-level encryption. Weak encryption of an entry (via the ZipCrypto algorithm) is indicated in the entry-level General Purpose Bit Flag value by setting bit 0 to 1. Strong encryption (e.g. AES-*) is indicated with both bits 0 and 6 (PKZIP) or by setting bit 0 and using an extra data field (WinZip). With any encryption scheme, General Purpose Bit Flag bit 0 will be set if an entry is encrypted.

Our APIs do not currently expose any indicators for whether an archive contains any weak- or strong-encrypted entries, nor do we expose the General Purpose Bit Flag for advanced scenarios that require checking bits within that value. The end result is that calling code cannot determine if an entry is encrypted. Furthermore, when attempting to extract an encrypted entry from a ZipArchive, two different behaviors are observed:

  1. For weak (ZipCrypto) encryption, the encrypted data is received as the contents of the entry, and there is no means for identifying that the entry was encrypted. This can lead to invalid or seemingly corrupt data to be passed to the next subsystem.
  2. For strong encryption, an InvalidDataException is thrown when attempting to extract the entry. This exception type is also thrown in other circumstances, and there is no discriminating information to indicate the reason for the exception; thus, there is no means for identifying that the entry was encrypted.

Issue #57197 requests a means for detecting if a ZipArchiveEntry is encrypted while iterating over the entries in the archive.

API Proposal

namespace System.IO.Compression
{
    public class ZipArchiveEntry
    {
        public bool IsEncrypted { get; }
    }
}

API Usage

try
{
    if (!entry.IsEncrypted)
    {
        using (var entryStream = entry.Open())
        {
            entryStream.CopyTo(documentStream);
        }
    }
    else
    {
        // App-specific logic for how to handle/log that the zip contained encrypted entries
    }
}
catch (InvalidDataException)
{
    // This will no longer occur for encrypted entries
}

Additional APIs to Consider

Expose an Archive-Level Property

We could additionally include an archive-level property to indicate if any of the entries are encrypted. However, since the encryption information is stored at the entry-level "General Purpose Bit Flag" value, populating this property would require iterating over the entries as is shown in the usage example above. If demand for this archive-level property exists, it could be added to ZipArchive alongside the existing Entries and Mode properties. Because ZipArchive currently contains no other aggregate information about the entries though, I recommend we do not add this property unless sufficient requests are received after adding the entry-level property.

using System.Collections.ObjectModel;

namespace System.IO.Compression
{
    public class ZipArchive
    {
        public ReadOnlyCollection<ZipArchiveEntry> Entries { get; }
        public ZipArchiveMode Mode { get; }
        public bool HasEncryptedEntries { get; } // Access would ensure the entries have been enumerated internally
    }
}

Specify Encryption Handling Behavior to ZipFile Accelerators

The ZipFile class provides static accelerator methods, including Open, OpenRead, and ExtractToDirectory. The Open and OpenRead methods simply return a ZipArchive without taking any other action and they succeed with weak- or strong-encrypted entries present. The ExtractToDirectory method exhibits the same behavior as described above, with an InvalidDataException being thrown for strong-encrypted entries and silently extracting encrypted entries for weak encryption.

We could add overloads to ZipFile.ExtractToDirectory to specify how encrypted entries should be handled. I recommend we do not pursue this unless sufficient demand emerges. At that time, we could learn more about the scenarios to determine if we need an enum to define different behaviors or if we merely need to allow a boolean to indicate encrypted entries should be skipped. Without these overloads, a caller can instead use ZipFile.Open[Read] and iterate over the entries.

Alternative Designs

Exposing the Entry Encryption Algorithm

An alternative design would use an enum to indicate the encryption algorithm used for an entry:

namespace System.IO.Compression
{
    public class ZipArchiveEntry
    {
        public ZipEntryEncryption Encryption { get; }
    }

    public enum ZipEntryEncryption 
    {
        None,
        Weak, // ZipCrypto
        Aes128,
        Aes192,
        Aes256,
        Unknown = 0xFFFF,
    }
}

For Add password to ZipArchive · Issue #1545 · dotnet/runtime, an enum along these lines is very likely to be needed. Without fully designing the APIs for those features though, it's possible that the enum illustrated above might not be the right design for the full feature set. Since the enum is not strictly necessary for detecting if there are password-protected/encrypted entries, I recommend we stick with the simple boolean for now. If #1545 is implemented and we end up adding an Encryption property that uses an enum similar to what's shown here, then the IsEncrypted property would become a convenience method over checking if Encryption is ZipEntryEncryption.None or another value. For the initial implementation though, IsEncrypted would only need to be based on General Purpose Bit 0 being set.

Exposing the General Purpose Bit Flag Value

We could optionally expose the General Purpose Bit Flag value on ZipArchiveEntry, either in addition to or instead of IsEncrypted. Exposing this value would allow callers to check other bits to glean more about the archive entry than our APIs support.

namespace System.IO.Compression
{
    public class ZipArchiveEntry
    {
        public ushort GeneralPurposeBits { get; }
    }
}

From there, we could also ostensibly introduce an enum that names each of the bits per the specification. There has not been demand for this low-level data to be exposed however, so I recommend we do not pursue this approach unless demand emerges.

netstandard support via Exception.Data

Some of the requests for this functionality came with the desire for netstandard support. Without adding new API surface, a failed extraction could include data on the exception instance that indicates why the InvalidDataException was thrown. For instance, Exception.Data could include a key/value pair of "IsEncrypted" and the boolean indicating if the entry was encrypted. However, this would rely on exception handling to control flow, which is an anti-pattern.

Risks

The Zip file format also includes the ability to encrypt some of the central directory metadata, including file names. This API design does not take that concept into account. Popular tools such as WinZip, 7-Zip, and WinRAR do not support this capability either though. More typically, a password is set for the zip such that it is applied to each of the entries individually, and the central directory metadata remains unencrypted.

Based on the specification for how central directory metadata encryption is implemented, the entry count values are to be obfuscated if the central directory is encrypted (reference). That expectation would likely introduce other issues with the current ZipArchive implementation if we sought to support encryption of the central directory. However, the file entry header General Purpose Bit Flag would remain the same, with bit 0 still indicating that the entry is also encrypted; thus the proposed design would still apply.

@jeffhandley jeffhandley added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO.Compression labels May 5, 2022
@jeffhandley jeffhandley added this to the 7.0.0 milestone May 5, 2022
@ghost
Copy link

ghost commented May 5, 2022

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

We have received requests to be able to detect if a ZipArchive contains password-protected entries. Without this information, the consumer of a zip file cannot determine if the contents can be extracted ahead of time. During extraction of the entire archive or a single entry, an InvalidDataException is thrown if an entry is encrypted, but that exception can be thrown in other circumstances as well.

By adding a property to indicate if an entry is encrypted, the entries could be inspected ahead of time and/or skipped during extraction.

See #57197 for more background on the requests.

API Proposal

namespace System.IO.Compression;

public class ZipArchiveEntry
{
    public bool IsEncrypted { get; }
}

API Usage

private bool ArchiveContainsEncryptedEntries(ZipArchive archice)
{
    foreach (var entry in archive.Entries)
    {
        if (entry.IsEncrypted)
        {
            return true;
        }
    }

    return false;
}

Alternative Designs

Exposing an Archive-level Property

We could consider additionally including an archive-level property to indicate if any of the entries are encrypted. However, populating this property would require iterating over the entries as is shown in the usage example above. Until such an API request is received, I recommend we limit the new API surface to the entry-level property.

Exposing the Encryption Level

Zip archives contain "weak" and "strong" encryption. Weak encryption (via the ZipCrypto algorithm) is indicated on a file with a single bit set within the General Purpose Bit Flags (bit 0). Strong encryption (e.g. AES-*) is indicated with both bits 0 and 6 being set in the General Purpose Bit Flags.

An alternative design would use an enum to indicate the encryption level of an entry:

  • None
  • Weak
  • Strong

Unless we intend to offer APIs in the future that would enable the extraction of encrypted entries, the weak vs. strong indicator is unnecessary. With that consideration, I recommend we use a simple boolean to indicate if the entry is encrypted, regardless of the encryption level or algorithm. And with the boolean approach, we only need to check General Purpose Bit 0, since that bit must be set for both weak and strong encryption.

netstandard support via Exception.Data

Some of the requests for this functionality came with the desire for netstandard support. Without adding new API surface, a failed extraction could include data on the exception instance that indicates why the InvalidDataException was thrown. For instance, Exception.Data could include a key/value pair of "IsEncrypted" and the boolean indicating if the entry was encrypted. However, this would rely on exception handling to control flow, which is an anti-pattern.

Risks

The Zip file format also includes the ability to encrypt some of the central directory metadata, including file names. This API design does not take that concept into account. Popular tools such as WinZip, 7-Zip, and WinRAR do not support this capability either though. More typically, a password is set for the zip such that it is applied to each of the entries individually, and the central directory metadata remains unencrypted.

Based on the specification for how central directory metadata encryption is implemented, the entry count values are to be obfuscated if the central directory is encrypted (reference). That expectation would likely introduce other issues with the current ZipArchive implementation if we sought to support encryption of the central directory. However, the file entry header General Purpose Bit Flags would remain the same, with bit 0 still indicating that the entry is also encrypted; thus the proposed design would still apply.

Author: jeffhandley
Assignees: -
Labels:

api-suggestion, area-System.IO.Compression

Milestone: 7.0.0

@jozkee
Copy link
Member

jozkee commented May 13, 2022

For the record, PKWare strong encryption requires a license, see #1545 (comment)

The PKZIP encryption seems to require a license to implement.
The WinZip encryption scheme does not.

WinZIP encryption doesn't use the bit 6, it uses an extra data field with header 0x9901.

Now, API-wise, I think bool IsEncrypted can be a convenient API to just skip the encrypted entries for now, but once we address reading and creation of encrypted entries (which is very likely given the popularity of the issue), we may need another field to describe the AES strength (128, 192, or 256 bit).

My point being that once we extend the encryption support, bool IsEncrypted field will be redundant (and somewhat obsolete) compared to an enum that tells you more about the encryption information of the entry.

Yet another alternative, similar to your "Exposing the Entry Encryption Level" design, could be:

namespace System.IO.Compression
{
    public class ZipArchiveEntry
    {
+        public ZipEntryEncryption Encryption { get; }
    }

+    public enum ZipEntryEncryption 
+    {
+        None,
+        Weak, // ZipCrypto
+        Aes128,
+        Aes192,
+        Aes256,
+        Unknown = 0xFFFF,
+    }
}

@jeffhandley
Copy link
Member Author

jeffhandley commented May 13, 2022

Thanks, @jozkee; I had not read the WinZip documentation you linked to.

With the enum approach, callers that needs to determine if an entry is encrypted would use if (entry.Encryption != ZipEntryEncryption.None), which is slightly less ergonomic than if (entry.IsEncrypted), but it's still reasonable. We could also consider adding IsEncrypted as a bool, only storing that bool for now; then when more encryption functionality is added, use IsEncrypted as a convenience over Encryption != ZipEntryEncryption.None.

@jozkee jozkee added blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels May 13, 2022
@bartonjs
Copy link
Member

bartonjs commented May 19, 2022

Video

Looks good as proposed. It seems like there's some missing user story around what to do when it's true, but it does provide incremental value.

namespace System.IO.Compression
{
    public class ZipArchiveEntry
    {
        public bool IsEncrypted { get; }
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels May 19, 2022
@jeffhandley jeffhandley self-assigned this May 21, 2022
@deeprobin
Copy link
Contributor

// This will no longer occur for encrypted entries

Should we mark this as a breaking-change? If the InvalidDataException is no longer thrown isn't that one?

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 31, 2022
@jeffhandley
Copy link
Member Author

// This will no longer occur for encrypted entries

Should we mark this as a breaking-change? If the InvalidDataException is no longer thrown isn't that one?

Sorry; that code comment was slightly misleading. This isn't a breaking change; ZipArchiveEntry will still throw InvalidDataException, but the code sample shown above will not make the call that would have resulted in that exception. Previously, there was no way for the calling code to know whether or not the contents could be copied out; now the caller can.

Thanks for noticing that detail though and calling it out, @deeprobin.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 20, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.IO.Compression
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants