URLInputSource can be abused to retrieve arbitrary documents if used naïvely #1844

aucampia · 2022-04-17T20:55:47Z

Discussed in #1543

^{Originally posted by alexdutton July 20, 2021}
This is mostly related to rdflib-jsonld, but the dereferencing implementation is in rdflib, hence raising it here.

Scenario

If a web service takes POSTed JSON-LD data, e.g. as part of a Linked Data Notifications implementation, rdflib will attempt to resolve any URL in the @context. This can lead to:

attackers being able to probe internal networks, by having rdflib request potential non-public URLs
reflection attacks, if the same or slightly-different URLs are repeated multiple times in the @context
resource exhaustion, as the entire remote file is loaded into memory before JSON parsing is attempted (admittedly an rdflib-jsonld issue)
denial of service, if web or task workers are tied up waiting for extended periods for HTTP requests to complete
attackers being able to probe the local filesystem using file:// URLs

Problem

rdflib provides no way to control how external references are resolved, nor a way to implement caching of external resources.

An implementor should be able to:

add URLs to a safelist, if e.g. they only expect certain JSON-LD contexts to be used
provide local copies of remote resources, to obviate needing to make HTTP requests
hook in a caching mechanism

These things should either be possible directly, or there should be an obvious way to hook them in.

Resolution

A new Resolver base class should be added that takes responsibility for resolving external references and returning InputSource instances, probably encapsulating the create_input_source() behaviour in a resolve() method. There should be a default implementation that resolves everything called e.g. DefaultResolver. Maybe this resolver has an instantiation parameter like resolve_schemes=('file', 'http', 'https') so it's easy to turn off dereferencing.

An optional resolver argument should be added to Graph.parse(), so that implementors can override the default behaviour. This is then passed down to the Parser.parse() plugin implementation, defaulting to an instance of DefaultResolver if not specified.

Finally, rdflib-jsonld can be updated to use the resolver instead of create_input_source directly.

Maybe there should also be a way to install a global default resolver to easily implement these protections without having to track down every Graph.parse() call.

Happy to put together a PR if/when an approach is agreed.

The text was updated successfully, but these errors were encountered:

aucampia · 2022-04-17T20:56:34Z

Converted back to an issue from discussion, this is tracked here: https://security.snyk.io/vuln/SNYK-PYTHON-RDFLIB-1324490

We should resolve this some way or another, either by making a change to rdflib or disputing the vulnerability.

aucampia · 2022-04-17T21:04:00Z

"critical" label transfers from the original issue.

ghost · 2022-04-18T10:17:38Z

FTR, Alex Dutton had worked up an implementation that seems to follow your intention: https://github.com/alexdutton/rdflib/commits/fix/1369-custom-resolver, dunno if rebasing it to current master will save some time.

RichardWallis · 2022-07-25T15:17:46Z

Any progress on this?

Getting pushback as to the advisability of using rdflib, based upon the lack of resolution to this issue.

"We should resolve this some way or another, either by making a change to rdflib or disputing the vulnerability." - agree.

aucampia · 2022-07-26T10:47:44Z

Nobody is working on this at the moment. A draft PR was made for this, but it is not in a mergable state and also reduces the overall quality of RDFLib. We are open to PRs to fix this though and I understand it is a priority, but right now the best I can say is that I will look into it when I have time, and will try get it fixed before the next release which will be in october.

RichardWallis · 2022-07-28T08:21:07Z

Thanks @aucampia for this update.

I would have had a go at a PR myself but I'm not familiar enough with inner workings and idiosyncrasies of rdflib to be able to quickly dive in.

Reading the snyk Issue, the fix to me would look like the ability to initialise rdflib [or at least the JSON-LD bits of it] in a secure mode that would not connect externally to URLs in any form; with ability to set one or more whitelist URLs that it would connect to.

I'm wondering if the original PR, although attempting something similar, was trying to cover too much ground.

aucampia · 2022-07-29T02:02:41Z

For reference:

The original PR was Fix/1369 custom resolver #1385 - but it never went past draft and was essentially abandoned.
This PR was rebased in rebase of Alex Dutton's Draft PR #1385 #1995 but rejected because of issues listed there

The PR did try to do a lot, but a lot has to be done, through it has to be done in a way that does not deteriorate the codebase or create liabilities, and there are some considerations outside of what the PR addressed that should also be addressed.

aucampia · 2022-07-29T02:19:14Z

What would be very helpful for resolving this issue is some prior art of how this is addressed in other RDF processors, I will try and make a bit of a survey.

But I also think we should have an ADR for this before a published PR.

nileshpatra · 2022-08-19T12:20:28Z

Hi, this is triggering a number of testing-autoremovals in debian, as there are a number of packages that depend on rdflib and this is a rather serious issue. I know work you on rdflib in 'volunteer' time, but would it be possible to give any sort of ETA for this?

I'm asking for it since I'm a bit concerned if it'd take too much time then it'd end up affecting a bunch of reverse-dependencies in next debian release, to the point that they'd end up being excluded, and hence the ping.

aucampia · 2022-08-19T14:12:06Z

I know work you on rdflib in 'volunteer' time

The concept of "volunteer time" may be a bit ambiguous, but I work on RDFLib exclusively during personal time, not sure about other maintainers.

but would it be possible to give any sort of ETA for this?

I'm asking for it since I'm a bit concerned if it'd take too much time then it'd end up affecting a bunch of reverse-dependencies in next debian release, to the point that they'd end up being excluded, and hence the ping.

I will do my best to fix it before the next release and that is planned for middle of October, but I'm not sure what Debian's release schedule is, so would be good to understand if this timeline is very problematic.

nileshpatra · 2022-08-19T14:37:36Z

On Fri, Aug 19, 2022 at 07:12:18AM -0700, Iwan Aucamp wrote: > I'm asking for it since I'm a bit concerned if it'd take too much time then it'd end up affecting a bunch of reverse-dependencies in next debian release, to the point that they'd end up being excluded, and hence the ping. I will do my best to fix it before the next release and that is planned for middle of October,

Thank you :)

but I'm not sure what Debian's release schedule is, so would be good to understand if this timeline is very problematic.

Currently freeze is scheduled from next year Jan. Basically, if it isn't fixed before this years christmas, it'd be too late.

See RDFLib/rdflib#1844 for more information.

* Update mkdocs-material requirement from ~=8.3 to ~=8.4 Updates the requirements on [mkdocs-material](https://github.com/squidfunk/mkdocs-material) to permit the latest version. - [Release notes](https://github.com/squidfunk/mkdocs-material/releases) - [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG) - [Commits](squidfunk/mkdocs-material@8.3.0...8.4.0) --- updated-dependencies: - dependency-name: mkdocs-material dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * Ignore local package import for pylint tests * Ignore 48547 in safety check See RDFLib/rdflib#1844 for more information. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Casper Welzel Andersen <casper.w.andersen@sintef.no>

Update dependencies: * Update mkdocs-material requirement from ~=8.3 to ~=8.4 (#65) Ignore local package import for pylint tests. Ignore 48547 in safety check. See RDFLib/rdflib#1844 for more information. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Casper Welzel Andersen <casper.w.andersen@sintef.no>

tillea · 2022-09-29T13:11:18Z

Currently freeze is scheduled from next year Jan. Basically, if it isn't fixed before this years christmas, it'd be too late.

However, rdflib with RC bugs creates a lot of noise in the Debian package pool and removes heaps of packages from Debian testing. So it would be great if this could be fixed rather sooner than later. So if you might have a patch or some pre-release we might be interested.
Thanks a lot for working on rdflib, Andreas.

aucampia · 2022-09-29T18:49:46Z

I have started working on this and I do expect I will be finished around the middle of October which is the target for the next release. I have been a bit busy with other things so I have had limited time to dedicate to rdflib maintenance.

aucampia · 2022-10-13T18:34:50Z

quoting #2038 (comment)

I'm on leave next week (2022-W41), I'm going to finish whatever is needed for #1844 and try and fix as many bugs as possible, and then make the release on 2022-10-23 if all goes well.

The release will be 6.3.0 as I won't include any breaking changes.

nileshpatra · 2022-10-27T19:14:11Z

I'm on leave next week (2022-W41), I'm going to finish whatever is needed for #1844 and try and fix as many bugs as possible, and then make the release on 2022-10-23 if all goes well.

Hi, Is there still under your radar? I am asking this since we are a bit over 23-10-2022 now.

aucampia · 2022-10-27T22:36:56Z

I'm on leave next week (2022-W41), I'm going to finish whatever is needed for #1844 and try and fix as many bugs as possible, and then make the release on 2022-10-23 if all goes well.

Hi, Is there still under your radar? I am asking this since we are a bit over 23-10-2022 now.

Hi, workload at my day job is quite high, I'm trying but I really have very little spare capacity for now I can't really give you any good answers.

tillea · 2022-10-28T04:53:57Z

Am Thu, Oct 27, 2022 at 03:37:07PM -0700 schrieb Iwan Aucamp:

Hi, workload at my day job is quite high, I'm trying but I really have very little spare capacity for now I can't really give you any good answers.

I know we are all volunteers but I'd like to stress the urgency of the problem. Can you get any help from other team members of rdflib? Kind regards, Andreas.

aucampia · 2023-02-18T16:06:28Z

I have added a bunch of tests now, and have written a URI mapper and filter, but also, while trying to find the optimal place to hook these things in, I have found urllib.request.install_opener(opener)

Install an OpenerDirector instance as the default global opener. Installing an opener is only necessary if you want urlopen to use that opener; otherwise, simply call OpenerDirector.open() instead of urlopen(). The code does not check for a real OpenerDirector, and any class with the appropriate interface will work.

Given this I actually don't think this is a valid vulnerability, and instead is just a feature request, as anyone can easily install an opener that prevents everything reported in the vulnerability:

If a web service takes POSTed JSON-LD data, rdflib will attempt to resolve any URL in the @context. This can lead to:

attackers being able to probe internal networks, by having rdflib request potential non-public URLs

reflection attacks, if the same or slightly-different URLs are repeated multiple times in the @context

resource exhaustion, as the entire remote file is loaded into memory before JSON parsing is attempted

Denial of Service, if web or task workers are tied up waiting for extended periods for HTTP requests to complete

attackers being able to probe the local filesystem using file:// URLs

I will still see how to hook in the remapping and filtering in a way that does not interfere with python's builtin mechanisms for customizing URL loading, but given standard documented python functional can be used to avoid all listed problems I will dispute the vulnerability.

Add type hints to `rdflib/plugins/parser/*.py` and JSON-LD utils. This is mainly because the work I'm doing to fix <RDFLib#1844> is touching some of this parser stuff and the type hints are useful to avoid mistakes.

Add type hints to: - `rdflib/parser.py` - `rdflib/plugins/parser/*.py` - some JSON-LD utils - `rdflib/exceptions.py`. This is mainly because the work I'm doing to fix <RDFLib#1844> is touching some of this parser stuff and the type hints are useful to avoid mistakes. No runtime changes are included in this PR.

aucampia · 2023-03-05T11:28:07Z

I took this up with SYNK:

We will change the wording of the advisory to reflect the findings of the maintainer (the one who opened this ticket) regarding the built-in behavior of Python, and the recommendation of how to remediate the issues described by using the library in a safe way. We’ve decided to keep the advisory in place, since the RDFlib does provide URL parsing functionality and it is therefore valuable to users to be aware of the possible risk of using it naively. We will be happy to include a reference to a warning in the relevant documentation if they’d like to add one, or to the type hints that have been added to the library, which also steer the user in the direction of safe usage.

A bit oddly phrased, I guess they want us to include a warning. I will do that, we may consider functionality for URL re-direction, but with a combination of urllib.request.install_opener and sys.addaudithook users can already prevent opening of malicious URLs. Furthermore, if you are actually running untrusted inputs, things like firejail/docker and normal firewalling should be used instead.

If the available options are not sufficent we would need a specific use case so we can fix the specific problem.

@alexdutton as you originally opened this I would appreciate if you can coordinate communication with SYNK further, it really is an incredibly poor experience as I have to raise a support request against their commercial product and then play broken telephone via their commercial 1st line support.

aucampia · 2023-03-05T11:30:32Z

@alexdutton could urllib.request.urlopen not also be used to "be abused to retrieve arbitrary documents if used naïvely", and should you not raise similar advisories against python? I would say both URLInputSource and urlopen retrieve arbitrary documents by design, so probably the issues should be more specific to JSON-LD context handling.

Add type hints to: - `rdflib/parser.py` - `rdflib/plugins/parser/*.py` - some JSON-LD utils - `rdflib/exceptions.py`. This is mainly because the work I'm doing to fix <#1844> is touching some of this parser stuff and the type hints are useful to avoid mistakes. No runtime changes are included in this PR.

Several security measures can be used to mitigate risk when processing potentially malicious input. This change adds documentation about available security measures and examples and tests that illustrate their usage. - Closes <RDFLib#1844>.

aucampia · 2023-03-12T22:50:31Z

Further communication from SYNK:

We are still discussing this further with our Security team and shall keep you posted accordingly.

Stay tuned !

Really not a great experience, I get they have no incentive to offer a good experience to library maintainers, so I would appreciate any help you can offer here @alexdutton.

I'm going to merge #2270 soon, which I will consider as closing this matter. All that does is add documentation.

Several security measures can be used to mitigate risk when processing potentially malicious input. This change adds documentation about available security measures and examples and tests that illustrate their usage. - Closes <RDFLib#1844>.

aucampia · 2023-03-21T21:56:47Z

Okay, https://security.snyk.io/vuln/SNYK-PYTHON-RDFLIB-1324490 was updated

Amendment
This was deemed not a vulnerability.

RichardWallis · 2023-03-22T10:25:36Z

Great news!

Thanks to all that persevered to bring this to a conclusion

aucampia added critical 6.0.x labels Apr 17, 2022

aucampia added bug Something isn't working security and removed 6.0.x bug Something isn't working labels Apr 17, 2022

aucampia linked a pull request Apr 18, 2022 that will close this issue

Fix/1369 custom resolver #1385

Draft

github-actions bot mentioned this issue Jun 3, 2022

parse dataset series Informasjonsforvaltning/fdk-rdf-parser#154

Merged

This was referenced Jun 7, 2022

RDFLib vulnerability CasperWA/turtle-canon#71

Closed

RDFLib vulnerability emmo-repo/CIF-ontology#109

Closed

RDFLib vulnerability emmo-repo/EMMOntoPy#424

Open

ghost mentioned this issue Jun 20, 2022

rebase of Alex Dutton's Draft PR #1385 #1995

Closed

8 tasks

aucampia mentioned this issue Jul 29, 2022

Load a remote context via an unsupported network protocol #1650

Closed

aucampia added the networking Related to networking. label Aug 21, 2022

CasperWA added a commit to EMMC-ASBL/oteapi-dlite that referenced this issue Aug 22, 2022

Ignore 48547 in safety check

9213712

See RDFLib/rdflib#1844 for more information.

aucampia self-assigned this Sep 7, 2022

aucampia mentioned this issue Jan 16, 2023

Customisable resolver #2199

Closed

8 tasks

aucampia mentioned this issue Feb 20, 2023

feat: add parser type hints #2232

Merged

3 tasks

aucampia mentioned this issue Mar 8, 2023

fix: small InputSource related issues #2255

Merged

4 tasks

aucampia mentioned this issue Mar 12, 2023

docs: document avaiable security measures #2270

Merged

3 tasks

aucampia closed this as completed in #2270 Mar 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

URLInputSource can be abused to retrieve arbitrary documents if used naïvely #1844

URLInputSource can be abused to retrieve arbitrary documents if used naïvely #1844

aucampia commented Apr 17, 2022

Scenario

Problem

Resolution

aucampia commented Apr 17, 2022 •

edited

aucampia commented Apr 17, 2022 •

edited

ghost commented Apr 18, 2022

RichardWallis commented Jul 25, 2022

aucampia commented Jul 26, 2022

RichardWallis commented Jul 28, 2022

aucampia commented Jul 29, 2022

aucampia commented Jul 29, 2022

nileshpatra commented Aug 19, 2022 •

edited

aucampia commented Aug 19, 2022 •

edited

nileshpatra commented Aug 19, 2022 via email

tillea commented Sep 29, 2022

aucampia commented Sep 29, 2022

aucampia commented Oct 13, 2022

nileshpatra commented Oct 27, 2022

aucampia commented Oct 27, 2022

tillea commented Oct 28, 2022 via email

aucampia commented Feb 18, 2023

aucampia commented Mar 5, 2023 •

edited

aucampia commented Mar 5, 2023 •

edited

aucampia commented Mar 12, 2023 •

edited

aucampia commented Mar 21, 2023

RichardWallis commented Mar 22, 2023

URLInputSource can be abused to retrieve arbitrary documents if used naïvely #1844

URLInputSource can be abused to retrieve arbitrary documents if used naïvely #1844

Comments

aucampia commented Apr 17, 2022

Discussed in #1543

Scenario

Problem

Resolution

aucampia commented Apr 17, 2022 • edited

aucampia commented Apr 17, 2022 • edited

ghost commented Apr 18, 2022

RichardWallis commented Jul 25, 2022

aucampia commented Jul 26, 2022

RichardWallis commented Jul 28, 2022

aucampia commented Jul 29, 2022

aucampia commented Jul 29, 2022

nileshpatra commented Aug 19, 2022 • edited

aucampia commented Aug 19, 2022 • edited

nileshpatra commented Aug 19, 2022 via email

tillea commented Sep 29, 2022

aucampia commented Sep 29, 2022

aucampia commented Oct 13, 2022

nileshpatra commented Oct 27, 2022

aucampia commented Oct 27, 2022

tillea commented Oct 28, 2022 via email

aucampia commented Feb 18, 2023

aucampia commented Mar 5, 2023 • edited

aucampia commented Mar 5, 2023 • edited

aucampia commented Mar 12, 2023 • edited

aucampia commented Mar 21, 2023

RichardWallis commented Mar 22, 2023

aucampia commented Apr 17, 2022 •

edited

aucampia commented Apr 17, 2022 •

edited

nileshpatra commented Aug 19, 2022 •

edited

aucampia commented Aug 19, 2022 •

edited

aucampia commented Mar 5, 2023 •

edited

aucampia commented Mar 5, 2023 •

edited

aucampia commented Mar 12, 2023 •

edited