Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX #6413 pip install <url> allow directory traversal #6418

Merged
merged 1 commit into from Jun 11, 2019

Conversation

gzpan123
Copy link
Contributor

@gzpan123 gzpan123 commented Apr 17, 2019

Fixes #6413.

@cjerdonek cjerdonek added C: download About fetching data from PyPI and other sources type: bugfix type: security Has potential security implications labels Apr 17, 2019
Copy link
Member

@cjerdonek cjerdonek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the report and patch! Some comments.

news/6413.bugfix Outdated
@@ -0,0 +1 @@
Fixed pip install <url> allow directory traversal when a malicious server (or a network MitM if downloading over HTTP) send a Content-Disposition header with filename which contains "../ or ..\".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in the imperative tone (so "Fix pip install <url> ..."): https://pip.pypa.io/en/stable/development/contributing/#contents-of-a-news-entry

You should also wrap lines to 80 characters and surround words that should be code with double backticks before and after (since it is reStructuredText).

_filename = params.get('filename')
if _filename:
_filename = os.path.basename(_filename)
filename = _filename or filename
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To increase testability, I would define a parse_content_disposition(headers, default_filename) function that returns the file name to use, and then add a couple unit tests of the function using @pytest.mark.parametrize.

}
session.get.return_value = resp

temp_dir = mkdtemp()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the pytest tmpdir fixture so you won't need to manually create and delete a temp dir.

temp_dir = mkdtemp()
temp_sub_dir = mkdtemp(dir=temp_dir, prefix="sub-dir-")
hashes = None
progress_bar = "on"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can pass these two variables as keyword argument to save two lines.

@@ -157,6 +157,43 @@ def test_unpack_http_url_bad_downloaded_checksum(mock_unpack_file):
rmtree(download_dir)


def test_download_http_url_with_directory_traversal(data):
"""
It should download file to temp_sub_dir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be more explicit in the docstring and say something like, "Test downloading a file when the content-disposition header contains a filename with a ".." path part."

type, params = cgi.parse_header(content_disposition)
filename = params.get('filename')
if filename:
filename = os.path.basename(filename).split("\\")[-1].rstrip(".")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this implementation, it might actually be better to add a second function (called something like sanitize_filename()) and put the bulk of your unit tests in the unit tests of that function. Also, why is it necessary to split on \ after calling basename(), and why splitting only with that character as opposed to others? Some code comments are probably warranted here explaining what we are protecting against. Lastly, what about left-stripping ., too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added .split("\\")[-1] because os.path.basename("dir\\file") return "dir\file" in linux.
Later I found out that "dir \ file" is a valid filename in linux. so .split("\\")[-1] can be removed.

I added .rstrip(".") because os.path.basename("dir/..") return ".." and ".." is not a valid filename,
But only ".." can not lead to directory traversal. so ``.rstrip(".")``` can also be removed.

So only os.path.basename(filename) could fix this security issue already, so I decided to remove .split("\\")[-1] and .rstrip("."), is this ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't something like that still needed for Windows, though?

Copy link
Contributor Author

@gzpan123 gzpan123 Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In windows both os.path.basename("dir\\file") and os.path.basename("dir/file") return "file".
There is no other thing need to do in Windows, I think.

Copy link
Member

@cjerdonek cjerdonek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your updates (and I agree with using os.path.basename()). Some more comments.

src/pip/_internal/download.py Outdated Show resolved Hide resolved
news/6413.bugfix Outdated Show resolved Hide resolved
src/pip/_internal/download.py Outdated Show resolved Hide resolved
tests/unit/test_download.py Outdated Show resolved Hide resolved
src/pip/_internal/download.py Outdated Show resolved Hide resolved
Copy link
Member

@cjerdonek cjerdonek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some final(?) comments. Thanks for your work on this.

src/pip/_internal/download.py Outdated Show resolved Hide resolved
src/pip/_internal/download.py Outdated Show resolved Hide resolved
src/pip/_internal/download.py Outdated Show resolved Hide resolved
tests/unit/test_download.py Outdated Show resolved Hide resolved
src/pip/_internal/download.py Show resolved Hide resolved
tests/unit/test_download.py Outdated Show resolved Hide resolved
@gzpan123
Copy link
Contributor Author

gzpan123 commented May 4, 2019

any problems?

@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label May 10, 2019
@cjerdonek
Copy link
Member

any problems?

No, just haven't gotten to this. But looks like you need to do a rebase in the meantime..

@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label May 11, 2019
@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Jun 4, 2019
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Jun 4, 2019
@gzpan123 gzpan123 force-pushed the master branch 2 times, most recently from a2ea2a2 to 42dfa4f Compare June 9, 2019 10:00
@cjerdonek
Copy link
Member

@gzpan123 I made some formatting and wording tweaks before planning to merge this, if that's okay.

@cjerdonek
Copy link
Member

Thanks so much for all your work on this, @gzpan123, and for sticking with it! 👍

@cjerdonek
Copy link
Member

@gzpan123 In the course of reviewing your PR, I noticed a related issue that looks like it could use fixing. It seems like the "default" filename (from the Link object) could also result in directory traversal. For example:

>>> from pip._internal.models.link import Link
>>> link = Link('https://example.com/..%2Ffoo.txt')
>>> link.filename
'../foo.txt'

This is because Link.filename unquotes its basename before returning:

@property
def filename(self):
# type: () -> str
_, netloc, path, _, _ = urllib_parse.urlsplit(self.url)
name = posixpath.basename(path.rstrip('/')) or netloc
name = urllib_parse.unquote(name)
assert name, ('URL %r produced no filename' % self.url)
return name

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jul 11, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jul 11, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation C: download About fetching data from PyPI and other sources type: security Has potential security implications
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pip install <url> allow directory traversal, leading to arbitrary file write
4 participants