Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URLInputSource can be abused to retrieve arbitrary documents if used naïvely #1844

Closed
aucampia opened this issue Apr 17, 2022 Discussed in #1543 · 23 comments · Fixed by #2270 · May be fixed by #1385
Closed

URLInputSource can be abused to retrieve arbitrary documents if used naïvely #1844

aucampia opened this issue Apr 17, 2022 Discussed in #1543 · 23 comments · Fixed by #2270 · May be fixed by #1385
Assignees
Labels

Comments

@aucampia
Copy link
Member

Discussed in #1543

Originally posted by alexdutton July 20, 2021
This is mostly related to rdflib-jsonld, but the dereferencing implementation is in rdflib, hence raising it here.

Scenario

If a web service takes POSTed JSON-LD data, e.g. as part of a Linked Data Notifications implementation, rdflib will attempt to resolve any URL in the @context. This can lead to:

  • attackers being able to probe internal networks, by having rdflib request potential non-public URLs
  • reflection attacks, if the same or slightly-different URLs are repeated multiple times in the @context
  • resource exhaustion, as the entire remote file is loaded into memory before JSON parsing is attempted (admittedly an rdflib-jsonld issue)
  • denial of service, if web or task workers are tied up waiting for extended periods for HTTP requests to complete
  • attackers being able to probe the local filesystem using file:// URLs

Problem

rdflib provides no way to control how external references are resolved, nor a way to implement caching of external resources.

An implementor should be able to:

  • add URLs to a safelist, if e.g. they only expect certain JSON-LD contexts to be used
  • provide local copies of remote resources, to obviate needing to make HTTP requests
  • hook in a caching mechanism

These things should either be possible directly, or there should be an obvious way to hook them in.

Resolution

A new Resolver base class should be added that takes responsibility for resolving external references and returning InputSource instances, probably encapsulating the create_input_source() behaviour in a resolve() method. There should be a default implementation that resolves everything called e.g. DefaultResolver. Maybe this resolver has an instantiation parameter like resolve_schemes=('file', 'http', 'https') so it's easy to turn off dereferencing.

An optional resolver argument should be added to Graph.parse(), so that implementors can override the default behaviour. This is then passed down to the Parser.parse() plugin implementation, defaulting to an instance of DefaultResolver if not specified.

Finally, rdflib-jsonld can be updated to use the resolver instead of create_input_source directly.

Maybe there should also be a way to install a global default resolver to easily implement these protections without having to track down every Graph.parse() call.

Happy to put together a PR if/when an approach is agreed.

@aucampia
Copy link
Member Author

aucampia commented Apr 17, 2022

Converted back to an issue from discussion, this is tracked here: https://security.snyk.io/vuln/SNYK-PYTHON-RDFLIB-1324490

We should resolve this some way or another, either by making a change to rdflib or disputing the vulnerability.

@aucampia aucampia added bug Something isn't working security and removed 6.0.x bug Something isn't working labels Apr 17, 2022
@aucampia
Copy link
Member Author

aucampia commented Apr 17, 2022

"critical" label transfers from the original issue.

@aucampia aucampia linked a pull request Apr 18, 2022 that will close this issue
@ghost
Copy link

ghost commented Apr 18, 2022

FTR, Alex Dutton had worked up an implementation that seems to follow your intention: https://github.com/alexdutton/rdflib/commits/fix/1369-custom-resolver, dunno if rebasing it to current master will save some time.

@RichardWallis
Copy link
Contributor

Any progress on this?

Getting pushback as to the advisability of using rdflib, based upon the lack of resolution to this issue.

"We should resolve this some way or another, either by making a change to rdflib or disputing the vulnerability." - agree.

@aucampia
Copy link
Member Author

Nobody is working on this at the moment. A draft PR was made for this, but it is not in a mergable state and also reduces the overall quality of RDFLib. We are open to PRs to fix this though and I understand it is a priority, but right now the best I can say is that I will look into it when I have time, and will try get it fixed before the next release which will be in october.

@RichardWallis
Copy link
Contributor

Thanks @aucampia for this update.

I would have had a go at a PR myself but I'm not familiar enough with inner workings and idiosyncrasies of rdflib to be able to quickly dive in.

Reading the snyk Issue, the fix to me would look like the ability to initialise rdflib [or at least the JSON-LD bits of it] in a secure mode that would not connect externally to URLs in any form; with ability to set one or more whitelist URLs that it would connect to.

I'm wondering if the original PR, although attempting something similar, was trying to cover too much ground.

@aucampia
Copy link
Member Author

For reference:

The PR did try to do a lot, but a lot has to be done, through it has to be done in a way that does not deteriorate the codebase or create liabilities, and there are some considerations outside of what the PR addressed that should also be addressed.

@aucampia
Copy link
Member Author

What would be very helpful for resolving this issue is some prior art of how this is addressed in other RDF processors, I will try and make a bit of a survey.

But I also think we should have an ADR for this before a published PR.

@nileshpatra
Copy link

nileshpatra commented Aug 19, 2022

Hi, this is triggering a number of testing-autoremovals in debian, as there are a number of packages that depend on rdflib and this is a rather serious issue. I know work you on rdflib in 'volunteer' time, but would it be possible to give any sort of ETA for this?

I'm asking for it since I'm a bit concerned if it'd take too much time then it'd end up affecting a bunch of reverse-dependencies in next debian release, to the point that they'd end up being excluded, and hence the ping.

@aucampia
Copy link
Member Author

aucampia commented Aug 19, 2022

I know work you on rdflib in 'volunteer' time

The concept of "volunteer time" may be a bit ambiguous, but I work on RDFLib exclusively during personal time, not sure about other maintainers.

but would it be possible to give any sort of ETA for this?

I'm asking for it since I'm a bit concerned if it'd take too much time then it'd end up affecting a bunch of reverse-dependencies in next debian release, to the point that they'd end up being excluded, and hence the ping.

I will do my best to fix it before the next release and that is planned for middle of October, but I'm not sure what Debian's release schedule is, so would be good to understand if this timeline is very problematic.

@nileshpatra
Copy link

nileshpatra commented Aug 19, 2022 via email

@aucampia aucampia added the networking Related to networking. label Aug 21, 2022
CasperWA added a commit to EMMC-ASBL/oteapi-dlite that referenced this issue Aug 22, 2022
TEAM4-0 pushed a commit to EMMC-ASBL/oteapi-dlite that referenced this issue Aug 22, 2022
* Update mkdocs-material requirement from ~=8.3 to ~=8.4

Updates the requirements on [mkdocs-material](https://github.com/squidfunk/mkdocs-material) to permit the latest version.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](squidfunk/mkdocs-material@8.3.0...8.4.0)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Ignore local package import for pylint tests

* Ignore 48547 in safety check

See RDFLib/rdflib#1844 for more information.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Casper Welzel Andersen <casper.w.andersen@sintef.no>
CasperWA added a commit to EMMC-ASBL/oteapi-dlite that referenced this issue Aug 24, 2022
Update dependencies:

* Update mkdocs-material requirement from ~=8.3 to ~=8.4 (#65)

Ignore local package import for pylint tests.

Ignore 48547 in safety check.
See RDFLib/rdflib#1844 for more information.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Casper Welzel Andersen <casper.w.andersen@sintef.no>
@aucampia aucampia self-assigned this Sep 7, 2022
@tillea
Copy link

tillea commented Sep 29, 2022

Currently freeze is scheduled from next year Jan. Basically, if it isn't fixed before this years christmas, it'd be too late.

However, rdflib with RC bugs creates a lot of noise in the Debian package pool and removes heaps of packages from Debian testing. So it would be great if this could be fixed rather sooner than later. So if you might have a patch or some pre-release we might be interested.
Thanks a lot for working on rdflib, Andreas.

@aucampia
Copy link
Member Author

I have started working on this and I do expect I will be finished around the middle of October which is the target for the next release. I have been a bit busy with other things so I have had limited time to dedicate to rdflib maintenance.

@aucampia
Copy link
Member Author

quoting #2038 (comment)

I'm on leave next week (2022-W41), I'm going to finish whatever is needed for #1844 and try and fix as many bugs as possible, and then make the release on 2022-10-23 if all goes well.

The release will be 6.3.0 as I won't include any breaking changes.

@nileshpatra
Copy link

I'm on leave next week (2022-W41), I'm going to finish whatever is needed for #1844 and try and fix as many bugs as possible, and then make the release on 2022-10-23 if all goes well.

Hi, Is there still under your radar? I am asking this since we are a bit over 23-10-2022 now.

@aucampia
Copy link
Member Author

I'm on leave next week (2022-W41), I'm going to finish whatever is needed for #1844 and try and fix as many bugs as possible, and then make the release on 2022-10-23 if all goes well.

Hi, Is there still under your radar? I am asking this since we are a bit over 23-10-2022 now.

Hi, workload at my day job is quite high, I'm trying but I really have very little spare capacity for now I can't really give you any good answers.

@tillea
Copy link

tillea commented Oct 28, 2022 via email

@aucampia aucampia mentioned this issue Jan 16, 2023
8 tasks
@aucampia
Copy link
Member Author

I have added a bunch of tests now, and have written a URI mapper and filter, but also, while trying to find the optimal place to hook these things in, I have found urllib.request.install_opener(opener)

Install an OpenerDirector instance as the default global opener. Installing an opener is only necessary if you want urlopen to use that opener; otherwise, simply call OpenerDirector.open() instead of urlopen(). The code does not check for a real OpenerDirector, and any class with the appropriate interface will work.

Given this I actually don't think this is a valid vulnerability, and instead is just a feature request, as anyone can easily install an opener that prevents everything reported in the vulnerability:

If a web service takes POSTed JSON-LD data, rdflib will attempt to resolve any URL in the @context. This can lead to:

  • attackers being able to probe internal networks, by having rdflib request potential non-public URLs
  • reflection attacks, if the same or slightly-different URLs are repeated multiple times in the @context
  • resource exhaustion, as the entire remote file is loaded into memory before JSON parsing is attempted
  • Denial of Service, if web or task workers are tied up waiting for extended periods for HTTP requests to complete
  • attackers being able to probe the local filesystem using file:// URLs

I will still see how to hook in the remapping and filtering in a way that does not interfere with python's builtin mechanisms for customizing URL loading, but given standard documented python functional can be used to avoid all listed problems I will dispute the vulnerability.

aucampia added a commit to aucampia/rdflib that referenced this issue Feb 20, 2023
Add type hints to `rdflib/plugins/parser/*.py` and JSON-LD utils.

This is mainly because the work I'm doing to fix
<RDFLib#1844> is touching some of this
parser stuff and the type hints are useful to avoid mistakes.
aucampia added a commit to aucampia/rdflib that referenced this issue Feb 20, 2023
Add type hints to `rdflib/plugins/parser/*.py` and JSON-LD utils.

This is mainly because the work I'm doing to fix
<RDFLib#1844> is touching some of this
parser stuff and the type hints are useful to avoid mistakes.
aucampia added a commit to aucampia/rdflib that referenced this issue Feb 20, 2023
Add type hints to:
- `rdflib/parser.py`
- `rdflib/plugins/parser/*.py`
- some JSON-LD utils
- `rdflib/exceptions.py`.

This is mainly because the work I'm doing to fix
<RDFLib#1844> is touching some of
this parser stuff and the type hints are useful to avoid mistakes.

No runtime changes are included in this PR.
aucampia added a commit to aucampia/rdflib that referenced this issue Feb 20, 2023
Add type hints to:
- `rdflib/parser.py`
- `rdflib/plugins/parser/*.py`
- some JSON-LD utils
- `rdflib/exceptions.py`.

This is mainly because the work I'm doing to fix
<RDFLib#1844> is touching some of
this parser stuff and the type hints are useful to avoid mistakes.

No runtime changes are included in this PR.
aucampia added a commit to aucampia/rdflib that referenced this issue Feb 20, 2023
Add type hints to:
- `rdflib/parser.py`
- `rdflib/plugins/parser/*.py`
- some JSON-LD utils
- `rdflib/exceptions.py`.

This is mainly because the work I'm doing to fix
<RDFLib#1844> is touching some of
this parser stuff and the type hints are useful to avoid mistakes.

No runtime changes are included in this PR.
@aucampia
Copy link
Member Author

aucampia commented Mar 5, 2023

I took this up with SYNK:

We will change the wording of the advisory to reflect the findings of the maintainer (the one who opened this ticket) regarding the built-in behavior of Python, and the recommendation of how to remediate the issues described by using the library in a safe way. We’ve decided to keep the advisory in place, since the RDFlib does provide URL parsing functionality and it is therefore valuable to users to be aware of the possible risk of using it naively. We will be happy to include a reference to a warning in the relevant documentation if they’d like to add one, or to the type hints that have been added to the library, which also steer the user in the direction of safe usage.

A bit oddly phrased, I guess they want us to include a warning. I will do that, we may consider functionality for URL re-direction, but with a combination of urllib.request.install_opener and sys.addaudithook users can already prevent opening of malicious URLs. Furthermore, if you are actually running untrusted inputs, things like firejail/docker and normal firewalling should be used instead.

If the available options are not sufficent we would need a specific use case so we can fix the specific problem.

@alexdutton as you originally opened this I would appreciate if you can coordinate communication with SYNK further, it really is an incredibly poor experience as I have to raise a support request against their commercial product and then play broken telephone via their commercial 1st line support.

@aucampia
Copy link
Member Author

aucampia commented Mar 5, 2023

@alexdutton could urllib.request.urlopen not also be used to "be abused to retrieve arbitrary documents if used naïvely", and should you not raise similar advisories against python? I would say both URLInputSource and urlopen retrieve arbitrary documents by design, so probably the issues should be more specific to JSON-LD context handling.

aucampia added a commit that referenced this issue Mar 5, 2023
Add type hints to:
- `rdflib/parser.py`
- `rdflib/plugins/parser/*.py`
- some JSON-LD utils
- `rdflib/exceptions.py`.

This is mainly because the work I'm doing to fix
<#1844> is touching some of
this parser stuff and the type hints are useful to avoid mistakes.

No runtime changes are included in this PR.
aucampia added a commit to aucampia/rdflib that referenced this issue Mar 12, 2023
Several security measures can be used to mitigate risk when processing
potentially malicious input.

This change adds documentation about available security measures and
examples and tests that illustrate their usage.

- Closes <RDFLib#1844>.
@aucampia
Copy link
Member Author

aucampia commented Mar 12, 2023

Further communication from SYNK:

We are still discussing this further with our Security team and shall keep you posted accordingly.

Stay tuned !

Really not a great experience, I get they have no incentive to offer a good experience to library maintainers, so I would appreciate any help you can offer here @alexdutton.

I'm going to merge #2270 soon, which I will consider as closing this matter. All that does is add documentation.

aucampia added a commit to aucampia/rdflib that referenced this issue Mar 12, 2023
Several security measures can be used to mitigate risk when processing
potentially malicious input.

This change adds documentation about available security measures and
examples and tests that illustrate their usage.

- Closes <RDFLib#1844>.
aucampia added a commit to aucampia/rdflib that referenced this issue Mar 12, 2023
Several security measures can be used to mitigate risk when processing
potentially malicious input.

This change adds documentation about available security measures and
examples and tests that illustrate their usage.

- Closes <RDFLib#1844>.
aucampia added a commit to aucampia/rdflib that referenced this issue Mar 12, 2023
Several security measures can be used to mitigate risk when processing
potentially malicious input.

This change adds documentation about available security measures and
examples and tests that illustrate their usage.

- Closes <RDFLib#1844>.
aucampia added a commit to aucampia/rdflib that referenced this issue Mar 13, 2023
Several security measures can be used to mitigate risk when processing
potentially malicious input.

This change adds documentation about available security measures and
examples and tests that illustrate their usage.

- Closes <RDFLib#1844>.
aucampia added a commit to aucampia/rdflib that referenced this issue Mar 14, 2023
Several security measures can be used to mitigate risk when processing
potentially malicious input.

This change adds documentation about available security measures and
examples and tests that illustrate their usage.

- Closes <RDFLib#1844>.
@aucampia
Copy link
Member Author

Okay, https://security.snyk.io/vuln/SNYK-PYTHON-RDFLIB-1324490 was updated

Amendment
This was deemed not a vulnerability.

@RichardWallis
Copy link
Contributor

Great news!

Thanks to all that persevered to bring this to a conclusion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants