URLInputSource can be abused to retrieve arbitrary documents if used naïvely #1844
Comments
Converted back to an issue from discussion, this is tracked here: https://security.snyk.io/vuln/SNYK-PYTHON-RDFLIB-1324490 We should resolve this some way or another, either by making a change to rdflib or disputing the vulnerability. |
"critical" label transfers from the original issue. |
FTR, Alex Dutton had worked up an implementation that seems to follow your intention: https://github.com/alexdutton/rdflib/commits/fix/1369-custom-resolver, dunno if rebasing it to current master will save some time. |
Any progress on this? Getting pushback as to the advisability of using rdflib, based upon the lack of resolution to this issue. "We should resolve this some way or another, either by making a change to rdflib or disputing the vulnerability." - agree. |
Nobody is working on this at the moment. A draft PR was made for this, but it is not in a mergeable state and also reduces the overall quality of RDFLib. We are open to PRs to fix this though and I understand it is a priority, but right now the best I can say is that I will look into it when I have time, and will try to get it fixed before the next release, which will be in October. |
Thanks @aucampia for this update. I would have had a go at a PR myself but I'm not familiar enough with inner workings and idiosyncrasies of rdflib to be able to quickly dive in. Reading the snyk Issue, the fix to me would look like the ability to initialise rdflib [or at least the JSON-LD bits of it] in a secure mode that would not connect externally to URLs in any form; with ability to set one or more whitelist URLs that it would connect to. I'm wondering if the original PR, although attempting something similar, was trying to cover too much ground. |
For reference:
The PR did try to do a lot, but a lot has to be done, though it has to be done in a way that does not deteriorate the codebase or create liabilities, and there are some considerations outside of what the PR addressed that should also be addressed. |
What would be very helpful for resolving this issue is some prior art of how this is addressed in other RDF processors, I will try and make a bit of a survey. But I also think we should have an ADR for this before a published PR. |
Hi, this is triggering a number of testing-autoremovals in Debian, as there are a number of packages that depend on rdflib and this is a rather serious issue. I know you work on rdflib in 'volunteer' time, but would it be possible to give any sort of ETA for this? I'm asking because I'm a bit concerned that if it takes too much time it will end up affecting a bunch of reverse-dependencies in the next Debian release, to the point that they'd end up being excluded, hence the ping. |
The concept of "volunteer time" may be a bit ambiguous, but I work on RDFLib exclusively during personal time, not sure about other maintainers.
I will do my best to fix it before the next release and that is planned for middle of October, but I'm not sure what Debian's release schedule is, so would be good to understand if this timeline is very problematic. |
On Fri, Aug 19, 2022 at 07:12:18AM -0700, Iwan Aucamp wrote:
> I'm asking for it since I'm a bit concerned if it'd take too much time then it'd end up affecting a bunch of reverse-dependencies in next debian release, to the point that they'd end up being excluded, and hence the ping.
I will do my best to fix it before the next release and that is planned for middle of October,
Thank you :)
but I'm not sure what Debian's release schedule is, so would be good to understand if this timeline is very problematic.
Currently the freeze is scheduled from January next year. Basically, if it isn't fixed before this year's Christmas, it'd be too late.
|
See RDFLib/rdflib#1844 for more information.
* Update mkdocs-material requirement from ~=8.3 to ~=8.4 Updates the requirements on [mkdocs-material](https://github.com/squidfunk/mkdocs-material) to permit the latest version. - [Release notes](https://github.com/squidfunk/mkdocs-material/releases) - [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG) - [Commits](squidfunk/mkdocs-material@8.3.0...8.4.0) --- updated-dependencies: - dependency-name: mkdocs-material dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * Ignore local package import for pylint tests * Ignore 48547 in safety check See RDFLib/rdflib#1844 for more information. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Casper Welzel Andersen <casper.w.andersen@sintef.no>
Update dependencies: * Update mkdocs-material requirement from ~=8.3 to ~=8.4 (#65) Ignore local package import for pylint tests. Ignore 48547 in safety check. See RDFLib/rdflib#1844 for more information. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Casper Welzel Andersen <casper.w.andersen@sintef.no>
However, rdflib with RC bugs creates a lot of noise in the Debian package pool and removes heaps of packages from Debian testing. So it would be great if this could be fixed sooner rather than later. If you have a patch or some pre-release, we would be interested. |
I have started working on this and I do expect I will be finished around the middle of October which is the target for the next release. I have been a bit busy with other things so I have had limited time to dedicate to rdflib maintenance. |
quoting #2038 (comment)
|
Hi, is this still on your radar? I am asking since we are a bit past 23-10-2022 now. |
Hi, workload at my day job is quite high. I'm trying, but I really have very little spare capacity for now, so I can't really give you any good answers. |
On Thu, Oct 27, 2022 at 03:37:07PM -0700, Iwan Aucamp wrote:
Hi, workload at my day job is quite high, I'm trying but I really have very little spare capacity for now I can't really give you any good answers.
I know we are all volunteers but I'd like to stress the urgency of the problem. Can you get any help from other team members of rdflib?
Kind regards, Andreas.
|
I have added a bunch of tests now, and have written a URI mapper and filter, but also, while trying to find the optimal place to hook these things in, I have found
Given this I actually don't think this is a valid vulnerability, and instead is just a feature request, as anyone can easily install an opener that prevents everything reported in the vulnerability:
I will still see how to hook in the remapping and filtering in a way that does not interfere with Python's builtin mechanisms for customizing URL loading, but given that standard, documented Python functionality can be used to avoid all listed problems, I will dispute the vulnerability. |
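For readers following along, here is a minimal sketch of the kind of opener described above. It is not necessarily the snippet from the original comment. The names `ALLOWED_HOSTS` and `AllowlistHandler` and the host in the allowlist are assumptions made for illustration. It only affects requests that go through `urllib.request.urlopen`, so anything read by other means (for example a direct `open()` call on a local path) is not covered.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts your service actually trusts.
ALLOWED_HOSTS = frozenset({"www.w3.org"})


class AllowlistHandler(urllib.request.BaseHandler):
    # Lower than the default of 500, so this handler is consulted first.
    handler_order = 100

    def default_open(self, req):
        parts = urlsplit(req.full_url)
        if parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS:
            # Returning None lets the regular protocol handlers take over.
            return None
        raise urllib.error.URLError(f"blocked by allowlist: {req.full_url}")


# Build an opener that includes the filter and make it the process-wide default.
opener = urllib.request.build_opener(AllowlistHandler)
urllib.request.install_opener(opener)
```

Because redirects are followed by calling the opener again, a redirect to a host outside the allowlist is rejected by the same check.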
Add type hints to `rdflib/plugins/parser/*.py` and JSON-LD utils. This is mainly because the work I'm doing to fix <RDFLib#1844> is touching some of this parser stuff and the type hints are useful to avoid mistakes.
Add type hints to: - `rdflib/parser.py` - `rdflib/plugins/parser/*.py` - some JSON-LD utils - `rdflib/exceptions.py`. This is mainly because the work I'm doing to fix <RDFLib#1844> is touching some of this parser stuff and the type hints are useful to avoid mistakes. No runtime changes are included in this PR.
I took this up with Snyk:
A bit oddly phrased, I guess they want us to include a warning. I will do that. We may consider functionality for URL redirection, but with a combination of `urllib.request.install_opener` and `sys.addaudithook` users can already prevent opening of malicious URLs. Furthermore, if you are actually running untrusted inputs, things like firejail/docker and normal firewalling should be used instead. If the available options are not sufficient we would need a specific use case so we can fix the specific problem. @alexdutton as you originally opened this I would appreciate it if you could coordinate communication with Snyk further. It really is an incredibly poor experience, as I have to raise a support request against their commercial product and then play broken telephone via their commercial 1st line support. |
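As a rough illustration of the `sys.addaudithook` option mentioned above, the sketch below rejects any `urllib.Request` audit event whose URL is not on an allowlist. The names `ALLOWED_HOSTS` and `block_untrusted_urls` and the listed host are assumptions, not part of rdflib. Note that audit hooks cannot be removed once added, and that other documented events such as `open` and `socket.connect` could be filtered in the same way.

```python
import sys
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts that may be contacted.
ALLOWED_HOSTS = frozenset({"www.w3.org"})


def block_untrusted_urls(event, args):
    # Only inspect the event raised when urllib opens a request.
    if event != "urllib.Request":
        return
    url = args[0]  # first argument of this event is the full URL
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or parts.hostname not in ALLOWED_HOSTS:
        raise PermissionError(f"blocked by audit hook: {url}")


# Hooks stay active for the lifetime of the interpreter.
sys.addaudithook(block_untrusted_urls)
```

The hook runs before any connection is attempted, so a rejected URL never reaches the network.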
@alexdutton could urllib.request.urlopen not also be used to "be abused to retrieve arbitrary documents if used naïvely", and should you not raise similar advisories against python? I would say both URLInputSource and urlopen retrieve arbitrary documents by design, so probably the issues should be more specific to JSON-LD context handling. |
Several security measures can be used to mitigate risk when processing potentially malicious input. This change adds documentation about available security measures and examples and tests that illustrate their usage. - Closes <RDFLib#1844>.
Further communication from Snyk:
Really not a great experience, I get they have no incentive to offer a good experience to library maintainers, so I would appreciate any help you can offer here @alexdutton. I'm going to merge #2270 soon, which I will consider as closing this matter. All that does is add documentation. |
Okay, https://security.snyk.io/vuln/SNYK-PYTHON-RDFLIB-1324490 was updated
|
Great news! Thanks to all that persevered to bring this to a conclusion |
Discussed in #1543
Originally posted by alexdutton July 20, 2021
This is mostly related to rdflib-jsonld, but the dereferencing implementation is in rdflib, hence raising it here.
Scenario
If a web service takes POSTed JSON-LD data, e.g. as part of a Linked Data Notifications implementation, rdflib will attempt to resolve any URL in the `@context`. This can lead to: [...] `@context` [...] `file://` URLs
Problem
rdflib provides no way to control how external references are resolved, nor a way to implement caching of external resources.
An implementor should be able to:
These things should either be possible directly, or there should be an obvious way to hook them in.
Resolution
A new `Resolver` base class should be added that takes responsibility for resolving external references and returning `InputSource` instances, probably encapsulating the `create_input_source()` behaviour in a `resolve()` method. There should be a default implementation that resolves everything called e.g. `DefaultResolver`. Maybe this resolver has an instantiation parameter like `resolve_schemes=('file', 'http', 'https')` so it's easy to turn off dereferencing.
An optional `resolver` argument should be added to `Graph.parse()`, so that implementors can override the default behaviour. This is then passed down to the `Parser.parse()` plugin implementation, defaulting to an instance of `DefaultResolver` if not specified.
Finally, rdflib-jsonld can be updated to use the `resolver` instead of `create_input_source` directly.
Maybe there should also be a way to install a global default resolver to easily implement these protections without having to track down every `Graph.parse()` call.
Happy to put together a PR if/when an approach is agreed.
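To make the proposal concrete, here is a small sketch of what such a resolver could look like. None of this exists in rdflib: `Resolver`, `DefaultResolver`, `ResolutionError`, and the `create_input_source` stand-in are hypothetical, and the stand-in simply returns the location where a real implementation would return an `InputSource`.

```python
from urllib.parse import urlsplit


class ResolutionError(Exception):
    """Raised when a resolver refuses to dereference a location."""


class Resolver:
    """Hypothetical base class: turns a location into an input source."""

    def resolve(self, location):
        raise NotImplementedError


class DefaultResolver(Resolver):
    def __init__(self, resolve_schemes=("file", "http", "https"),
                 create_input_source=None):
        self.resolve_schemes = frozenset(resolve_schemes)
        # Stand-in for rdflib's create_input_source(); returns the location.
        self._create = create_input_source or (lambda location: location)

    def resolve(self, location):
        scheme = urlsplit(location).scheme
        if scheme not in self.resolve_schemes:
            raise ResolutionError(f"scheme {scheme!r} is not allowed: {location}")
        return self._create(location)


# Example: only allow https, so file:// and http:// references are rejected.
https_only = DefaultResolver(resolve_schemes=("https",))
```

Passing an empty tuple for `resolve_schemes` would turn off dereferencing entirely, which matches the "easy to turn off" goal stated in the proposal.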