Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix symlink handling on a source data match #4265

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

dferreyra
Copy link
Contributor

Change Duplicati to also honor the symlink policy when the symlink is
included because of a source data match.

Without these changes, Duplicati doesn't always correctly handle symlinks when explicitly chosen for a backup. For example, let's say you explicitly choose a directory symlink to back up. If the symlink policy is set to "Ignore", Duplicati does the following:

  1. Stores no entry for the directory symlink, like what happens with a symlink policy of "Ignore"
  2. Stores files and directories from the symlink target, like what happens with a symlink policy of "Follow"

If the symlink policy is set to "Store", Duplicati does the following:

  1. Stores a "Symlink" entry for the directory symlink, like what happens with a symlink policy of "Store"
  2. Stores files and directories from the symlink target, like what happens with a symlink policy of "Follow"

If the symlink policy is set to "Follow", Duplicati behaves correctly.

Fix how Duplicati handles the backup of a symlink when the symlink is
included because of a source data match.
@warwickmm
Copy link
Member

I can see how the current behavior is inconsistent with the symlink policy. However, I wonder if most users would expect the "follow" behavior when a symlink is selected as a source directory. Otherwise, what's the purpose of explicitly selecting a directory symlink as a backup source if it's not going to be followed? I guess if the policy is "store", then the symlink will be recorded, but that's about it.

In other words, I can see an argument for why source directory symlinks might be treated differently. Perhaps others might have some thoughts on this.

/cc @drwtsn32x, @ts678, @kenkendk

@kenkendk
Copy link
Member

kenkendk commented Aug 7, 2020

I can see why this is inconsistent and confusing as @dferreyra points out.

My initial thought was more along the lines of what @warwickmm mentions, namely that including a symlink would mean that you want to include the contents, not the link itself. I think this still holds somewhat, as there is very little value in recording a single symlink (at least it is simple to recover manually).

If we change this, there might be users who rely on the inconsisten behavior, whose backups will contain fewer files than what is expected. My default reaction to something like this is usually "why not add another option?", but this seems a bit misplaced here. I think we can "fix" this a bit by simply choosing to ignore the symlink policy for the source folders and always apply the "follow" policy here, which would make it at least a bit more consistent.

I think a similar scenario is where the user would like to request a backup of a single folder, and then add symlinks to this folder that points to the real data, removing the need to update the backup job. This requires a policy of "follow", but then applies that policy to all subfolders as well, which is not always desirable.

To fix this, we would essentially need to add something similar to the filter system, so the user can selectively choose policy based on paths. If we do this, I suggest we use a default policy of "Follow", including for the source folders, and then allow the user to selectively change the policy.

@dferreyra
Copy link
Contributor Author

@kenkendk, regarding:

I think we can "fix" this a bit by simply choosing to ignore the symlink policy for the source folders and always apply the "follow" policy here, which would make it at least a bit more consistent.

If you'd like me to submit a commit that implements this behavior, let me know.

@warwickmm
Copy link
Member

I think we can "fix" this a bit by simply choosing to ignore the symlink policy for the source folders and always apply the "follow" policy here, which would make it at least a bit more consistent.

If you'd like me to submit a commit that implements this behavior, let me know.

I think that makes sense.

Change Duplicati to use the "follow" policy for symlinks that are
explicitly selected for backup.  Other symlinks continue to be backed
up according to the configured symlink policy.
@dferreyra
Copy link
Contributor Author

I've added a commit that essentially undoes the previous commit, and applies changes to support a "follow" policy for direct matches against source data filters.

Because FileEnumerationProcess determines the filter matches and MetadataPreProcess follows the symlink policy for what to store for a symlink, the source filter match information needs to be conveyed between the processes. This is done by changing the channel between them from string to a new FileEnumerationEntry class. FileEnumerationEntry contains a path (i.e., the original data that was being communicated) plus a bool to say whether it was a result of a direct source filter match.

Also, since the match information needs to come out of calls to AttributeFilter(), I changed EnumerationFilterDelegate to be a function that returns EnumerationFilterResult instead of bool. EnumerationFilterResult contains a bool for whether to continue recursing (i.e., the original return value) plus a bool to say whether the path it was given resulted in a direct source filter match.

These changes were a bit invasive. In all cases I tried to mechanically preserve the sematics of the existing code. Maybe there's a less invasive way to implement this. That being said, maybe this can ge seen as a small step toward what @kenkendk suggested:

...we would essentially need to add something similar to the filter system, so the user can selectively choose policy based on paths.

I've done some basic tests, including backing up symlinks by referencing them directly, using the three symlink policies and expecting the results to always be a "follow" policy. And I tried backing them up by selecting a parent directory instead, in this case expecting the symlinks to be backed up per each configured symlink policy.

I also tried command line backups, including using the --changed-files option to exercise that particular code path since it was changed.

Add readonly to EnumerationFilterResult.True and
EnumerationFilterResult.False.
@warwickmm
Copy link
Member

Sorry @dferreyra for the lack of attention here. This touches some parts of the code that I'm not very familiar with. Perhaps others (@kenkendk) might be able to provide some input when they have time.

In the mean time, do you want to merge in master so that the symlink tests from pull request #4311 are available in this branch?

Add unit tests to confirm that "follow" policy is used
when symlinks are explicitly selected for backup.
@dferreyra
Copy link
Contributor Author

@warwickmm, no problem. I've pushed the merge, plus an additional test to confirm that with the changes for this ticket we get a "follow" policy if a symlink was explicitly selected.

@dferreyra
Copy link
Contributor Author

dferreyra commented Oct 11, 2020

By the way, I was always wondering what would happen in the released Duplicati (I mean, without these changes) if someone were to try to restore files from within the directory symlink to a new location. My worry was that since the directory symlink was recorded by Duplicati as a symlink, it might end up overwriting files in the symlink target, instead. I was imagining Duplicati doing something like this during the restore:

  1. Create the symlink in the new location
  2. Restore the files into the symlink, thereby actually overwriting files in the symlink target instead

But that's not what happened, at least not in my simple experiment. Duplicati did this:

  1. Create the directory symlink as a regular directory in the new location
  2. Restore the files into that directory
  3. Try to create the symlink, but since there's already a regular directory there, Duplicati gives an error instead. In my experiment, I was restoring files from a directory symlink named "symlink" into a new location, "c:\restore":
2020-10-03 16:39:55 -07 - [Warning-Duplicati.Library.Main.Operation.RestoreHandler-MetadataWriteFailed]: Failed to apply metadata to file: "c:\restore\symlink\", message: File already exists: c:\restore\symlink
System.IO.IOException: File already exists: c:\restore\symlink
   at Duplicati.Library.Common.IO.SystemIOWindows.CreateSymlink(String symlinkfile, String target, Boolean asDir)
   at Duplicati.Library.Main.Operation.RestoreHandler.ApplyMetadata(String path, Stream stream, Boolean restorePermissions, Boolean restoreSymlinkMetadata, Boolean dryrun)
   at Duplicati.Library.Main.Operation.RestoreHandler.ApplyStoredMetadata(Options options, RestoreHandlerMetadataStorage metadatastorage)

Except for the warning, the current released Duplicati ends up behaving like a "follow" policy anyway, despite storing the symlink as a symlink.

@duplicatibot
Copy link

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/configuration-filters-include-and-symlink-policy-store-or-ignore/11608/11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants