New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sanitize paths during archive creation and extraction #7108
Conversation
Codecov Report
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. @@ Coverage Diff @@
## master #7108 +/- ##
==========================================
+ Coverage 83.84% 83.89% +0.05%
==========================================
Files 66 66
Lines 11802 11832 +30
Branches 2140 2144 +4
==========================================
+ Hits 9895 9927 +32
+ Misses 1343 1339 -4
- Partials 564 566 +2
|
@pgerber good catch, thanks for finding/fixing this. i agree that there are some potential security issues related to this, but I think they are low severity (compared to tar, zip, etc.) because borg repos/archives usually do not come from unknown/untrusted sources and also borg is not commonly used for data transfer. usually, you either made the repo/archive by yourself or it was made by some fellow admin of yours. but, as you pointed out, some risks remain. btw, i noticed quite some typos in your commit comment, can you proofread it / fix them? |
Guess normalisation could be only removed if we would always trust the archive contents. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments
I wasn't very clear. I meant normalization could be removed and replaced by strict checking. In other words, we can switch from |
Agreed. This is also something that isn't covered by the current Attack Model. I wonder, though, if it should be covered (for Borg 2). Even if a minor issue, having it in there would remind developers not to blindly trust archive/repository content. (I'm going ignoring the fact there is a |
e39a299
to
64fc2e1
Compare
I moved from "normalize" to "sanitize". This avoids the conflict in the naming and, I believe, also more clearly expresses what's being done. Let me know if there is anything else that needs to be done for this to get merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another review.
btw, please make sure that all places that need to be removed or edited when removing borg1 archive reading or other legacy, mention legacy in a comment.
path = _dotdot_re.sub("", path) | ||
if path in ["", ".."]: | ||
return "." | ||
return path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess the main issue remaining here is that the code removes arbitrary amounts of leading ../
.
So we basically archive something and lose information about its original path.
We won't remember whether it was ../foo
or ../../foo
or whatever other upwards directory level.
The original code before your PR also had this issue, but guess if we clean it up, we should do it right.
I guess the only context where this could make sense is if we want "no warnings / no errors" borg1 archive transfer and accept that information loss (preferring it over introducing some security issue into borg2 archives).
So, in case this is only meant for borg transfer
, maybe it could be put there so it won't stay here forever?
For borg2 "borg create", I guess we rather want to reject if something starts with ../
.
self.assert_equal(remove_dotdot_prefixes("/foo/bar"), "foo/bar") | ||
self.assert_equal(remove_dotdot_prefixes("../foo/bar"), "foo/bar") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stripping "./" and "/" is the usual thing borg does.
When listing (or extracting), we'll know it either refers to "/" or to "." (which is already ambiguous, but guess it is what tar does and attic/borg copied that behaviour).
If we just strip arbitrary amounts of "../", we basically have no clue any more about the directory level on which it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but I think this doesn't just apply to ../../../. Even for a simple ., we don't know were it belongs. For relative paths, users are expected to know where the files belong.
I do use relative path in some places like for DB dumps. Those you have to restore into the DB and their "origin" location doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When listing (or extracting), we'll know it either refers to "/" or to "." (which is already ambiguous, but guess it is what tar does and attic/borg copied that behaviour).
Talking of tar, GNU tar prints a warning when it strips anything from the path. The same could be considered for borg except that we shouldn't complain when a single / at the beginning of a path is removed.
user@dev-borg:~/src/borg$ tar cf tmp.tar ../borg-upstream/
tar: Removing leading `../' from member names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if you assume that the cwd at the time of the borg invocation is known (because it is part of the backup script), we know what .
referred to.
But if borg gets fed all sorts of ../
*N (e.g. via stdin) and it removes them all, then we can't reconstruct the location any more. In case of db dumps it might not matter, but in many other cases it might.
BTW, macOS tar:
tw@mba2020 ~ % tar cvf foo.tar ../tw/foo.txt ../../etc/hosts
a ../tw/foo.txt
a ../../etc/hosts
tw@mba2020 ~ % tar tvf foo.tar
-rw-r--r-- 0 tw staff 361 Jul 21 2021 ../tw/foo.txt
-rw-r--r-- 0 root wheel 264 Dec 26 18:59 ../../etc/hosts
tw@mba2020 ~ % tar xvf foo.tar
x ../tw/foo.txt: Path contains '..'
x ../../etc/hosts: Path contains '..'
tar: Error exit delayed from previous errors.
# note: it did not extract these files.
True, I did not consider the possibility of multiple ../* paths. So, the following combination would be possible:
We'd end up with db_backup, vm_snapshots and documents in the backup and we can tell their source.
Not ideal as b could contain another etc. But everything else we know where it comes from.
We'd end up with two etc directories and content from both directories being intermixed and some files might even exist twice.
All bets are off. Still, this examples look very contrived to me. One would have to end up being in a CWD where you'd have to use ../ to reach a backup location and decide not to use an absolute path or simply change the CWD. Then, once you settled for using ../, you'd have to use a wildcard or have two files/directory with the same name. Additionally, you'd probably have to neglect looking at the resulting backup (e.g. via I see how one ends up not doing the last item in the list but everything else I find hardly probable. I still think a simple warning would be sufficient for ../. Not sure if we should do anything about the possibility of a file existing twice in the same archive. Playing around with Borg, the fact that
This is interesting I guess this would mean that importing a tar file could break as well when ../ were to be banned. On GNU tar, you need to specifically request this behavior. |
f24d201
to
df5f895
Compare
65afe29
to
7b1f15a
Compare
Rebased. I had to disable test_create_read_special_symlink() locally though. Looking at that test there are ways this test can hang indefinitely. Working on fixing it … |
fc06fa5
to
0702add
Compare
Rebased on current master, squashed commits, no other changes. I also removed some minor / not relevant any more comments from here. |
rebased on master again, no other changes (yet). |
1fb0cd4
to
12dcd37
Compare
Paths are not always sanitized when creating an archive and, more importantly, never when extracting one. The following example shows how this can be used to attempt to write a file outside the extraction directory: $ echo abcdef | borg create -r ~/borg/a --stdin-name x/../../../../../etc/shadow archive-1 - $ borg list -r ~/borg/a archive-1 -rw-rw---- root root 7 Sun, 2022-10-23 19:14:27 x/../../../../../etc/shadow $ mkdir borg/target $ cd borg/target $ borg extract -r ~/borg/a archive-1 x/../../../../../etc/shadow: makedirs: [Errno 13] Permission denied: '/home/user/borg/target/x/../../../../../etc' Note that Borg tries to extract the file to /etc/shadow and the permission error is a result of the user not having access. This patch ensures file names are sanitized before archiving. As for files extracted from the archive, paths are sanitized by making all paths relative, removing '.' elements, and removing superfluous slashes (as in '//'). '..' elements, however, are rejected outright. The reasoning here is that it is easy to start a path with './' or insert a '//' by accident (e.g. via --stdin-name or import-tar). '..', however, seem unlikely to be the result of an accident and could indicate a tampered repository. With paths being sanitized as they are being read, this "errors" will be corrected during the `borg transfer` required when upgrading to Borg 2. Hence, the sanitation, when reading the archive, can be removed once support for reading v1 repositories is dropped. V2 repository will not contain non-sanitized paths. Of course, a check for absolute paths and '..' elements needs to kept in place to detect tempered archives. I recommend treating this as a security issue. I see the following cases where extracting a file outside the extraction path could constitute a security risk: a) When extraction is done as a different user than archive creation. The user that created the archive may be able to get a file overwritten as a different user. b) When the archive is created on one host and extracted on another. The user that created the archive may be able to get a file overwritten on another host. c) When an archive is created and extracted after a OS reinstall. When a host is suspected compromised, it is common to reinstall (or set up a new machine), extract the backups and then evaluate their integrity. A user that manipulates the archive before such a reinstall may be able to get a file overwritten outside the extraction path and may evade integrity checks. Notably absent is the creation and extraction on the same host as the same user. In such case, an adversary must be assumed to be able to replace any file directly. This also (partially) fixes borgbackup#7099.
on windows, we also want slashes, not backslashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge this, took long enough.
In case we find issues in beta testing, they can be addressed later.
Paths are not always normalized when creating an archive and, more importantly, never when extracting one. The following example shows how this can be used to attempt to write a file outside the extraction directory:
Note that borg tries to extract the file to /etc/shadow and the permission error is a result of the user not having access.
This patch ensures file names are normalized before archiving. As for files extracted from the archive, paths are normalized by making all paths relative, removing './', and removing superfluous slashes (as in '//'). '..' elments, however, are rejected outright. The reasoning here is that it is easy to start a path with './' or insert a '//' by accident (e.g. via --stdin-name or import-tar). '..', however, seem unlikely to be the result of an accident and could indicate a tampered repository.
With paths being normalized as they are being read, this "errors" will be corrected during the
borg transfer
required when upgrading to Borg 2. Hence, the normalization, when reading the archive, can be removed once support for reading v1 repositories is dropped. v2 repository will not contain non-normalized paths.I recommend treating this as a security issue. I see the following cases where extracting a file outside the extraction path could constitute a security risk:
a) When extraction is done as a different user than archive creation. The user that created the archive may be able to get a file overwritten as a different user.
b) When the archive is created on one host and extracted on another. The user that created the archive may be able to get a file overwritten on another host.
c) When an archive is created and extracted after a OS reinstall. When a host is suspected compromised, it is common to reinstall (or set up a new machine), extract the backups and then evaluate their integrity. A user that manipulates the archive before such a reinstall may be able to get a file overwritten outside the extraction path and may evade integrity checks.
Notably absent is the creation and extraction on the same host as the same user. In such case, an adversary must be assumed to be able to replace any file directly.
This also (partially) fixes #7099.