
Tar: configurable symbolic link handling #126404

@rzikm

Continues from #74404.

The original issue asked for hardlink detection and for an option to control whether hardlinks are stored as regular files (copies), with a similar choice for extraction. This was implemented in #123874.

Similar functionality can be envisioned for symbolic links, and the following API shape was approved in #74404 (comment):

namespace System.Formats.Tar
{
    public enum TarSymbolicLinkMode
    {
        PreserveLink,
        CopyContents,
        Skip,
    }

    public sealed partial class TarWriterOptions
    {
        public TarSymbolicLinkMode SymbolicLinkMode { get; set; } = TarSymbolicLinkMode.PreserveLink;
    }

    public sealed partial class TarExtractOptions
    {
        public TarSymbolicLinkMode SymbolicLinkMode { get; set; } = TarSymbolicLinkMode.PreserveLink;
    }
}

However, some problems and uncertainties arose during implementation; see #74404 (comment).

For files, it is pretty straightforward; the headache starts with directory symlinks and CopyContents.

GNU tar will traverse directory symlinks as if they were normal directories (with the -h flag), but that is not straightforward to replicate in .NET because:

- TarSymbolicLinkMode is set at the TarWriter level, so it should apply during a call to TarWriter.WriteEntry(fileName, entryName).
  - For non-symlinks, when we pass a directory to TarWriter.WriteEntry, we emit one entry (which results in an empty directory during extraction). Recursing here and writing multiple entries for a directory symlink introduces an inconsistency, while writing only a directory entry does not round-trip with respect to users' expectations when using TarFile.CreateFromDirectory.
  - The alternative is to implement this link traversal in TarFile.CreateFromDirectory, but then TarSymbolicLinkMode would belong on "TarCreationOptions" (which we decided to omit from the proposal).
- What's more, during extraction, when we encounter a symlink entry, the target file or directory does not necessarily exist yet (it may appear later in the archive), so we don't know which contents to copy. We would probably need to postpone these entries until all other entries are extracted.
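
To make the TarWriter.WriteEntry point concrete, here is a minimal sketch that writes a non-empty directory with the existing API and counts the entries that end up in the archive. The scratch paths and variable names are illustrative assumptions, not part of the proposal; the intent is only to show that the file inside the directory is not added by this call.

```csharp
using System;
using System.Formats.Tar;
using System.IO;

// Build a scratch directory containing one file (paths are illustrative).
string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
string dir = Path.Combine(root, "data");
Directory.CreateDirectory(dir);
File.WriteAllText(Path.Combine(dir, "a.txt"), "hello");

using var archive = new MemoryStream();
using (var writer = new TarWriter(archive, TarEntryFormat.Pax, leaveOpen: true))
{
    // Passing a directory does not recurse into it; a.txt is not written here.
    writer.WriteEntry(dir, "data");
}

// Read the archive back and count what was actually written.
archive.Position = 0;
int entryCount = 0;
TarEntryType firstType = default;
using (var reader = new TarReader(archive, leaveOpen: true))
{
    while (reader.GetNextEntry() is TarEntry entry)
    {
        if (entryCount == 0) firstType = entry.EntryType;
        entryCount++;
    }
}

Console.WriteLine($"entries: {entryCount}, first entry type: {firstType}");
Directory.Delete(root, recursive: true);
```

If CopyContents recursed for a directory symlink inside WriteEntry, the same call shape would produce several entries for a link but only one for a real directory, which is the inconsistency described above.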

And don't even get me started on loops created by symbolic links. Detecting these during creation is not complicated, but doing it during extraction is another level of difficulty (preventing infinite recursion vs. finding a loop in a directed graph).
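
For the creation side, one possible way to prevent infinite recursion is to track the resolved locations of the directories currently being traversed and to skip any link that points back into that set. The sketch below is only an illustration of that idea: `Walk` and its parameters are hypothetical names, it lists files instead of writing tar entries, and it does not canonicalize symlinked parent components of a link target. Creating symbolic links may require elevated privileges on Windows.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: collects files under logicalDir, following directory
// symlinks, and records links that would re-enter a directory already on the
// current traversal path instead of recursing into them.
static void Walk(string logicalDir, string realDir, HashSet<string> active,
                 List<string> files, List<string> loops)
{
    realDir = Path.TrimEndingDirectorySeparator(Path.GetFullPath(realDir));
    if (!active.Add(realDir))
    {
        loops.Add(logicalDir); // target is an ancestor on the current path
        return;
    }

    foreach (string path in Directory.EnumerateFileSystemEntries(logicalDir))
    {
        // Directory.Exists is false for files and for dangling links;
        // both are simply listed here.
        if (!Directory.Exists(path)) { files.Add(path); continue; }

        var info = new DirectoryInfo(path);
        string childReal = info.LinkTarget is null
            ? Path.Combine(realDir, info.Name)                           // real directory
            : info.ResolveLinkTarget(returnFinalTarget: true)!.FullName; // directory symlink
        Walk(path, childReal, active, files, loops);
    }

    active.Remove(realDir); // leaving this directory; siblings may link to it
}

// Demo: root/sub/loop is a symlink pointing back at root.
string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(Path.Combine(root, "sub"));
File.WriteAllText(Path.Combine(root, "sub", "a.txt"), "hello");
Directory.CreateSymbolicLink(Path.Combine(root, "sub", "loop"), root);

var files = new List<string>();
var loops = new List<string>();
Walk(root, root, new HashSet<string>(StringComparer.Ordinal), files, loops);
Console.WriteLine($"files: {files.Count}, loops skipped: {loops.Count}");
Directory.Delete(root, recursive: true);
```

Because the set only holds directories on the current path, the same target can still be copied more than once through different links; it is only re-entry into an ancestor that is cut off. During extraction no such traversal stack exists, since link entries can reference each other in any order within the archive, which is why the problem becomes cycle detection in a graph.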
