Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize paths during archive creation and extraction #7108

Merged
merged 4 commits into from Jun 10, 2023

Conversation

pgerber
Copy link
Contributor

@pgerber pgerber commented Oct 30, 2022

Paths are not always normalized when creating an archive and, more importantly, never when extracting one. The following example shows how this can be used to attempt to write a file outside the extraction directory:

$ echo abcdef | borg create -r ~/borg/a --stdin-name x/../../../../../etc/shadow archive-1 - $ borg list -r ~/borg/a archive-1
-rw-rw---- root   root          7 Sun, 2022-10-23 19:14:27  x/../../../../../etc/shadow
$ mkdir borg/target
$ cd borg/target
$ borg extract -r ~/borg/a archive-1
x/../../../../../etc/shadow: makedirs: [Errno 13] Permission denied: '/home/user/borg/target/x/../../../../../etc'

Note that borg tries to extract the file to /etc/shadow and the permission error is a result of the user not having access.

This patch ensures file names are normalized before archiving. As for files extracted from the archive, paths are normalized by making all paths relative, removing './', and removing superfluous slashes (as in '//'). '..' elments, however, are rejected outright. The reasoning here is that it is easy to start a path with './' or insert a '//' by accident (e.g. via --stdin-name or import-tar). '..', however, seem unlikely to be the result of an accident and could indicate a tampered repository.

With paths being normalized as they are being read, this "errors" will be corrected during the borg transfer required when upgrading to Borg 2. Hence, the normalization, when reading the archive, can be removed once support for reading v1 repositories is dropped. v2 repository will not contain non-normalized paths.

I recommend treating this as a security issue. I see the following cases where extracting a file outside the extraction path could constitute a security risk:

a) When extraction is done as a different user than archive creation. The user that created the archive may be able to get a file overwritten as a different user.
b) When the archive is created on one host and extracted on another. The user that created the archive may be able to get a file overwritten on another host.
c) When an archive is created and extracted after a OS reinstall. When a host is suspected compromised, it is common to reinstall (or set up a new machine), extract the backups and then evaluate their integrity. A user that manipulates the archive before such a reinstall may be able to get a file overwritten outside the extraction path and may evade integrity checks.

Notably absent is the creation and extraction on the same host as the same user. In such case, an adversary must be assumed to be able to replace any file directly.

This also (partially) fixes #7099.

@codecov-commenter
Copy link

codecov-commenter commented Oct 30, 2022

Codecov Report

Merging #7108 (518c4fb) into master (9341cd0) will increase coverage by 0.05%.
The diff coverage is 92.00%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##           master    #7108      +/-   ##
==========================================
+ Coverage   83.84%   83.89%   +0.05%     
==========================================
  Files          66       66              
  Lines       11802    11832      +30     
  Branches     2140     2144       +4     
==========================================
+ Hits         9895     9927      +32     
+ Misses       1343     1339       -4     
- Partials      564      566       +2     
Impacted Files Coverage Δ
src/borg/helpers/fs.py 83.89% <82.60%> (-0.32%) ⬇️
src/borg/archive.py 84.25% <100.00%> (-0.02%) ⬇️
src/borg/archiver/_common.py 90.65% <100.00%> (-0.40%) ⬇️
src/borg/archiver/create_cmd.py 78.57% <100.00%> (+0.06%) ⬆️
src/borg/helpers/__init__.py 100.00% <100.00%> (ø)
src/borg/helpers/parseformat.py 91.22% <100.00%> (+0.39%) ⬆️

... and 1 file with indirect coverage changes

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Nov 3, 2022

@pgerber good catch, thanks for finding/fixing this.

i agree that there are some potential security issues related to this, but I think they are low severity (compared to tar, zip, etc.) because borg repos/archives usually do not come from unknown/untrusted sources and also borg is not commonly used for data transfer. usually, you either made the repo/archive by yourself or it was made by some fellow admin of yours.

but, as you pointed out, some risks remain.

btw, i noticed quite some typos in your commit comment, can you proofread it / fix them?

@ThomasWaldmann
Copy link
Member

With paths being normalized as they are being read, this "errors" will be corrected during
the borg trasfer required when upgrading to Borg 2. Hence, the normalization, when reading
the archive, can be removed once support for reading v1 repositories is dropped. v2 repository
will not contain non-normalized paths.

Guess normalisation could be only removed if we would always trust the archive contents.
But considering the misc. security-related scenarios you pointed out, we should never trust.

Copy link
Member

@ThomasWaldmann ThomasWaldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments

src/borg/archive.py Show resolved Hide resolved
src/borg/helpers/fs.py Outdated Show resolved Hide resolved
src/borg/helpers/fs.py Show resolved Hide resolved
src/borg/helpers/fs.py Outdated Show resolved Hide resolved
src/borg/helpers/fs.py Outdated Show resolved Hide resolved
src/borg/helpers/msgpack.py Outdated Show resolved Hide resolved
src/borg/testsuite/archive.py Outdated Show resolved Hide resolved
src/borg/testsuite/helpers.py Show resolved Hide resolved
src/borg/helpers/msgpack.py Outdated Show resolved Hide resolved
src/borg/helpers/msgpack.py Outdated Show resolved Hide resolved
@pgerber
Copy link
Contributor Author

pgerber commented Nov 6, 2022

With paths being normalized as they are being read, this "errors" will be corrected during
the borg trasfer required when upgrading to Borg 2. Hence, the normalization, when reading
the archive, can be removed once support for reading v1 repositories is dropped. v2 repository
will not contain non-normalized paths.

Guess normalisation could be only removed if we would always trust the archive contents. But considering the misc. security-related scenarios you pointed out, we should never trust.

I wasn't very clear. I meant normalization could be removed and replaced by strict checking. In other words, we can switch from to_normalized_path() to assert_normalized_path().

@pgerber
Copy link
Contributor Author

pgerber commented Nov 6, 2022

@pgerber good catch, thanks for finding/fixing this.

i agree that there are some potential security issues related to this, but I think they are low severity (compared to tar, zip, etc.) because borg repos/archives usually do not come from unknown/untrusted sources and also borg is not commonly used for data transfer. usually, you either made the repo/archive by yourself or it was made by some fellow admin of yours.

but, as you pointed out, some risks remain.

Agreed. This is also something that isn't covered by the current Attack Model. I wonder, though, if it should be covered (for Borg 2). Even if a minor issue, having it in there would remind developers not to blindly trust archive/repository content.

(I'm going ignoring the fact there is a --encryption {none|authenticated} and the Attack Model should probably mention that such repositories aren't cover by it.)

@pgerber pgerber force-pushed the dotdot branch 2 times, most recently from e39a299 to 64fc2e1 Compare November 13, 2022 17:19
@pgerber
Copy link
Contributor Author

pgerber commented Jan 4, 2023

btw, as we also use os.path.normpath (which implements "normalizing a path"), the names here can be a bit confusing.

I moved from "normalize" to "sanitize". This avoids the conflict in the naming and, I believe, also more clearly expresses what's being done.

Let me know if there is anything else that needs to be done for this to get merged.

@pgerber pgerber changed the title Normalize paths during archive creation and extraction Sanitize paths during archive creation and extraction Jan 4, 2023
Copy link
Member

@ThomasWaldmann ThomasWaldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another review.

btw, please make sure that all places that need to be removed or edited when removing borg1 archive reading or other legacy, mention legacy in a comment.

Comment on lines +196 to +252
path = _dotdot_re.sub("", path)
if path in ["", ".."]:
return "."
return path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess the main issue remaining here is that the code removes arbitrary amounts of leading ../.
So we basically archive something and lose information about its original path.

We won't remember whether it was ../foo or ../../foo or whatever other upwards directory level.

The original code before your PR also had this issue, but guess if we clean it up, we should do it right.

I guess the only context where this could make sense is if we want "no warnings / no errors" borg1 archive transfer and accept that information loss (preferring it over introducing some security issue into borg2 archives).

So, in case this is only meant for borg transfer, maybe it could be put there so it won't stay here forever?

For borg2 "borg create", I guess we rather want to reject if something starts with ../.

src/borg/helpers/fs.py Outdated Show resolved Hide resolved
src/borg/helpers/fs.py Outdated Show resolved Hide resolved
src/borg/helpers/fs.py Outdated Show resolved Hide resolved
src/borg/item.pyx Show resolved Hide resolved
src/borg/testsuite/archiver/tar_cmds.py Outdated Show resolved Hide resolved
src/borg/testsuite/archiver/tar_cmds.py Outdated Show resolved Hide resolved
Comment on lines +354 to +406
self.assert_equal(remove_dotdot_prefixes("/foo/bar"), "foo/bar")
self.assert_equal(remove_dotdot_prefixes("../foo/bar"), "foo/bar")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stripping "./" and "/" is the usual thing borg does.

When listing (or extracting), we'll know it either refers to "/" or to "." (which is already ambiguous, but guess it is what tar does and attic/borg copied that behaviour).

If we just strip arbitrary amounts of "../", we basically have no clue any more about the directory level on which it was.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but I think this doesn't just apply to ../../../. Even for a simple ., we don't know were it belongs. For relative paths, users are expected to know where the files belong.

I do use relative path in some places like for DB dumps. Those you have to restore into the DB and their "origin" location doesn't matter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When listing (or extracting), we'll know it either refers to "/" or to "." (which is already ambiguous, but guess it is what tar does and attic/borg copied that behaviour).

Talking of tar, GNU tar prints a warning when it strips anything from the path. The same could be considered for borg except that we shouldn't complain when a single / at the beginning of a path is removed.

user@dev-borg:~/src/borg$ tar cf tmp.tar ../borg-upstream/
tar: Removing leading `../' from member names

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if you assume that the cwd at the time of the borg invocation is known (because it is part of the backup script), we know what . referred to.

But if borg gets fed all sorts of ../*N (e.g. via stdin) and it removes them all, then we can't reconstruct the location any more. In case of db dumps it might not matter, but in many other cases it might.

BTW, macOS tar:

tw@mba2020 ~ % tar cvf foo.tar ../tw/foo.txt ../../etc/hosts
a ../tw/foo.txt
a ../../etc/hosts
tw@mba2020 ~ % tar tvf foo.tar 
-rw-r--r--  0 tw     staff     361 Jul 21  2021 ../tw/foo.txt
-rw-r--r--  0 root   wheel     264 Dec 26 18:59 ../../etc/hosts
tw@mba2020 ~ % tar xvf foo.tar 
x ../tw/foo.txt: Path contains '..'
x ../../etc/hosts: Path contains '..'
tar: Error exit delayed from previous errors.
# note: it did not extract these files.

src/borg/testsuite/helpers.py Show resolved Hide resolved
src/borg/testsuite/helpers.py Show resolved Hide resolved
@pgerber
Copy link
Contributor Author

pgerber commented Jan 13, 2023

Well, if you assume that the cwd at the time of the borg invocation is known (because it is part of the backup script), we know what . referred to.

But if borg gets fed all sorts of ../*N (e.g. via stdin) and it removes them all, then we can't reconstruct the location any more. In case of db dumps it might not matter, but in many other cases it might.

True, I did not consider the possibility of multiple ../* paths. So, the following combination would be possible:

../../db_backup/ ../vm_snapshots/ ../../documents/

We'd end up with db_backup, vm_snapshots and documents in the backup and we can tell their source.

../a/etc ../b/*

Not ideal as b could contain another etc. But everything else we know where it comes from.

../a/etc ../b/etc

We'd end up with two etc directories and content from both directories being intermixed and some files might even exist twice.

../a/* ../b/*

All bets are off.

Still, this examples look very contrived to me. One would have to end up being in a CWD where you'd have to use ../ to reach a backup location and decide not to use an absolute path or simply change the CWD. Then, once you settled for using ../, you'd have to use a wildcard or have two files/directory with the same name. Additionally, you'd probably have to neglect looking at the resulting backup (e.g. via borg list). This should be easy to spot. And, finally, you'd likely not try if you can restore the backup

I see how one ends up not doing the last item in the list but everything else I find hardly probable. I still think a simple warning would be sufficient for ../. Not sure if we should do anything about the possibility of a file existing twice in the same archive. Playing around with Borg, the fact that borg extract silently overwrites existing files or the fact that a permission denied is only considered a warning are far more real concerns, IMO. In fact, I do ignore all warning as I can't silent some innocuous warning that a files content changed while reading.

w@mba2020 ~ % tar cvf foo.tar ../tw/foo.txt ../../etc/hosts
a ../tw/foo.txt
a ../../etc/hosts
tw@mba2020 ~ % tar tvf foo.tar 
-rw-r--r--  0 tw     staff     361 Jul 21  2021 ../tw/foo.txt
-rw-r--r--  0 root   wheel     264 Dec 26 18:59 ../../etc/hosts

This is interesting I guess this would mean that importing a tar file could break as well when ../ were to be banned. On GNU tar, you need to specifically request this behavior.

@pgerber pgerber force-pushed the dotdot branch 2 times, most recently from f24d201 to df5f895 Compare January 13, 2023 22:21
@pgerber pgerber force-pushed the dotdot branch 2 times, most recently from 65afe29 to 7b1f15a Compare February 1, 2023 21:23
@pgerber
Copy link
Contributor Author

pgerber commented Feb 1, 2023

Rebased. I had to disable test_create_read_special_symlink() locally though. Looking at that test there are ways this test can hang indefinitely. Working on fixing it …

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented May 10, 2023

Rebased on current master, squashed commits, no other changes.

I also removed some minor / not relevant any more comments from here.

@borgbackup borgbackup deleted a comment from pgerber May 11, 2023
@ThomasWaldmann
Copy link
Member

rebased on master again, no other changes (yet).

Paths are not always sanitized when creating an archive and,
more importantly, never when extracting one. The following example
shows how this can be used to attempt to write a file outside the
extraction directory:

$ echo abcdef | borg create -r ~/borg/a --stdin-name x/../../../../../etc/shadow archive-1 -
$ borg list -r ~/borg/a archive-1
-rw-rw---- root   root          7 Sun, 2022-10-23 19:14:27  x/../../../../../etc/shadow
$ mkdir borg/target
$ cd borg/target
$ borg extract -r ~/borg/a archive-1
x/../../../../../etc/shadow: makedirs: [Errno 13] Permission denied: '/home/user/borg/target/x/../../../../../etc'

Note that Borg tries to extract the file to /etc/shadow and the
permission error is a result of the user not having access.

This patch ensures file names are sanitized before archiving.
As for files extracted from the archive, paths are sanitized
by making all paths relative, removing '.' elements, and removing
superfluous slashes (as in '//'). '..' elements, however, are
rejected outright. The reasoning here is that it is easy to start
a path with './' or insert a '//' by accident (e.g. via --stdin-name
or import-tar). '..', however, seem unlikely to be the result
of an accident and could indicate a tampered repository.

With paths being sanitized as they are being read, this "errors"
will be corrected during the `borg transfer` required when upgrading
to Borg 2. Hence, the sanitation, when reading the archive,
can be removed once support for reading v1 repositories is dropped.
V2 repository will not contain non-sanitized paths. Of course,
a check for absolute paths and '..' elements needs to kept in
place to detect tempered archives.

I recommend treating this as a security issue. I see the following
cases where extracting a file outside the extraction path could
constitute a security risk:

a) When extraction is done as a different user than archive
creation. The user that created the archive may be able to
get a file overwritten as a different user.
b) When the archive is created on one host and extracted on
another. The user that created the archive may be able to
get a file overwritten on another host.
c) When an archive is created and extracted after a OS reinstall.
When a host is suspected compromised, it is common to reinstall
(or set up a new machine), extract the backups and then evaluate
their integrity. A user that manipulates the archive before such
a reinstall may be able to get a file overwritten outside the
extraction path and may evade integrity checks.

Notably absent is the creation and extraction on the same host as
the same user. In such case, an adversary must be assumed to be able
to replace any file directly.

This also (partially) fixes borgbackup#7099.
on windows, we also want slashes, not backslashes.
Copy link
Member

@ThomasWaldmann ThomasWaldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this, took long enough.

In case we find issues in beta testing, they can be addressed later.

@ThomasWaldmann ThomasWaldmann merged commit aca2021 into borgbackup:master Jun 10, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

borg import-tar does not strip leading ./ from filenames, making mount access impossible
3 participants