Deserialization vulnerability (CVE-2019-12760) #75
Comments
|
Thanks for your research. I have looked into this before. It looks like it's not that easy to fix, because I don't really know of good and fast Python serializers. Probably need to look into However speed for this is paramount and I won't merge anything that's not about as fast. Note that you can disable caching by just using parso without |
|
I think it would make a lot of sense to serialize the DFA to a Python module that can be shipped with parso. It's obviously not going to be as fast as pickle, but it should probably be close enough, depending on the size of the DFA. |
|
But do you think that generating the dfa is too slow in any way? I just feel like that was never really a problem. |
|
When python-rope had a pickle deserialization vulnerability a while back, they addressed it with signature verification. Also, I haven't used Cerealizer myself, but it allows you to register explicitly which classes are allowed to be deserialized. |
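For readers unfamiliar with the signature-verification approach mentioned above, here is a minimal sketch of the general idea using only the standard library. It is not rope's or parso's actual code; the helper names `dump_signed` and `load_signed` and the layout (tag followed by payload) are assumptions made for illustration.

```python
import hashlib
import hmac
import pickle

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def dump_signed(obj, key: bytes) -> bytes:
    """Pickle obj and prepend an HMAC-SHA256 tag computed over the payload."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def load_signed(blob: bytes, key: bytes):
    """Check the tag before unpickling; raise ValueError if it does not match."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking where the tags differ via timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cache entry failed signature verification")
    return pickle.loads(payload)
```

The check only helps if the key is kept somewhere the attacker cannot read, which is the objection raised later in this thread for an on-disk cache.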
Oh, sorry, I was misinterpreting what the cache was used for. I assumed it was just used to cache the language grammar (including the DFA), since that's what lib2to3 does. I didn't realize that you were caching the full syntax trees of modules. |
|
@bgw It's quite a bit faster to cache full syntax trees instead of parsing them. |
|
... and we're getting alerts every day. |
|
I kinda think it was silly that this was opened in the first place. Anything that uses pickle is vulnerable to this issue. If you can't trust your own FS then you're already running into problems. Unless you're able to write your pickled files to a non-writable location, you are vulnerable. Should people just not use pickle at all? Sphinx uses pickle in the same(?)/similar way.

@davidhalter marshal looks like it carries the same/similar warning: https://docs.python.org/3.7/library/marshal.html

Rope was able to use a pre-shared key because it's generated while the rope server is running and is passed to the "blessed" clients. Using pickle for caching does not allow this, because there is nowhere to store the key for subsequent invocations without that key also being vulnerable. If an attacker only has write access to the FS and can't read files, using a key would fix the issue, but at that point it feels like splitting hairs.

Because this library deals only with the AST, it might be possible to define a JSON or other data-only format, but the risk of using pickle seems so small that it doesn't make a lot of sense unless the author really cares. I personally would probably just put a note in the documentation saying that if you're trying to use this for a service or in some sort of server code, you should restrict the cache directory.

I haven't looked at how pickle is being used in this code, but if restricting the globals makes sense then it wouldn't hurt to add that. (Not sure if you'd be able to limit it to the AST classes.) Sphinx may be doing that, or may need to add that too. |
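"Restricting the globals" refers to the technique described in the Python `pickle` documentation: subclass `pickle.Unpickler` and override `find_class` so that only an explicit allowlist of classes can be resolved. A minimal sketch follows; the `ALLOWED` set and the `restricted_loads` helper are hypothetical, and the allowlist that parso's tree classes would actually need is not worked out here.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) pairs that may be loaded.
ALLOWED = {
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve only globals that were explicitly allowed.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses any global outside ALLOWED."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers, strings, and numbers still load because they need no global lookup; anything that references a callable or class outside the allowlist raises `pickle.UnpicklingError`.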
|
@dhondta Thinking about how pickle works, and about the gist: this also implies that the attacker can modify the Python source, right? Pickle only encodes the full path to an object, not the code, so the malicious code would need to be shipped to the target, to somewhere that is on the path and importable from the cached AST. I believe this makes a real-world attack even more unlikely (I am not saying impossible). The attacker would then also need to know the FS layout a little better, and also know the current directory of the interpreter (and this is before you have any code running, so it seems like you'd have to spray the FS or know how the target lays things out and is run). I think just adding some documentation saying that, if you are worried about malicious code, it's safer to disable the cache, as @davidhalter mentioned, fixes the issue. I personally have already marked this vulnerability, which comes via IPython, as tolerable to our project. |
Serialization with Pickle is inherently weak (that's not a bug, that's a feature) and should be handled with caution in implementations. The vulnerability I reported is analogous to CVE-2019-6446 (which relies on a kind of shortcut provided by Numpy to
That's actually the case on Web servers that use (unsafe) file upload features.
In the current implementation of I don't think the present vulnerability could lead to a full compromise in itself, however it could be used in a combination of vulnerabilities to lead to a RCE.
Definitely not, but pickling must be addressed with caution, i.e. not relying on any user-controllable value (which is the case here).
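As background on why unpickling data from a user-controllable location calls for caution: a pickle stream can name any importable callable together with its arguments, and that callable is invoked while loading. The following harmless sketch shows the mechanism with `operator.add`; the `Payload` class is hypothetical and is not taken from the proof-of-concept discussed in this thread.

```python
import operator
import pickle


class Payload:
    """Hypothetical object whose pickled form means 'call operator.add(2, 3)'."""

    def __reduce__(self):
        # Instructs pickle to rebuild this object by calling operator.add(2, 3).
        return (operator.add, (2, 3))


data = pickle.dumps(Payload())

# Loading the bytes calls operator.add(2, 3); no Payload instance is created.
result = pickle.loads(data)
```

Here `result` is the integer 5, and the byte string `data` refers only to the standard-library callable, not to the `Payload` class that produced it.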
No, it just requires the following, given that caching is enabled:
Note: The proof-of-concept script is structured with functions to be tuned (see the |
It is precisely one of the requirements for the vulnerability to be exploitable. As I mentioned before, I think this is very unlikely to be exploitable in the wild. However, this is indeed a vulnerability in itself, because of the dependency on user input leading to pickle loading. Removing or disabling the cache feature is probably the easiest mitigation. However, does removing caching impact performance that much?

The main problem: your cache feature loads a Python grammar from a pickle whose path is guessable. So, a normal cached grammar is the serialization of something relying on static data (e.g.

Proposed mitigation: given that, I think you should consider modifying your cache mechanism to add integrity checking on the valid pickled grammars (a kind of grammar whitelisting), i.e.:
|
|
I don't think this is worth a CVE or even a 'mitigation'. Yes, a webserver that writes uploads to arbitrary paths is broken. No, things affected by uploaded malicious code put in a specific location are not broken. The 'exploit' loads from a specified path.
This is a broken webserver.
How? Python is not vulnerable simply because it's possible to add a user-controlled directory to Assuming that parso has any responsibility here at all, it seems like the simplest fix is to not have an argument for the cache file. If a user can change files in As far as I can tell, parso doesn't search around for files to load, but has exactly one well-specified place that it looks if the path argument is not given. ed: to be clear, I despise pickle and would prefer that no project use it for any serialization (and further clarification: marshal is even worse) but this isn't a security issue with pickle or parso. |
|
@dhondta I can generally understand that this is a problem. However isn't a much bigger problem that you can just modify Just wondering, because in that case it's a clear feature that makes this already possible without any work (like this CVE).
It impacts performance for some modules quite a bit.
I agree. I just fail to see any fast alternatives in the Python world (especially in the stdlib). |
|
Thinking about this further, I think it's less and less of a problem. For all normal users, writing files to
So I would be really interested if people really think this is an issue, because it seems like it's pretty normal to gain code execution upon having arbitrary write access. PS: And no I don't think pickle is a good choice, I just want to point out that this is not bad enough to alert all users of 30'000 github projects that parso has a bad security issue. |
I think it's an issue, even if it means a reduction in speed and some temporary inconvenience for those that depend on I think it would also demonstrate good form on the part of the maintainers that they're willing to fix security issues regardless of how small and unimpactful they personally consider the issue to be. |
|
On the contrary, I think it would be good for maintainers to be able to push back and close CVEs as invalid. People are already getting false positives with "this release of parso is vulnerable". It would be absurd to file a CVE against python with this as the proof of concept:

    with open('/tmp/outfile', 'w') as outfile:
        outfile.write('''import subprocess; subprocess.run(['ls'])''')
    with open('/tmp/outfile', 'r') as infile:
        exec(infile.read())  # vulnerable line

The CVE is filed as having a network attack vector. What network does |
|
@habnabit How do I push back? The more I think about it, the more I think this is more like a normal issue and not a security issue.
Obviously nothing. |
|
I was told you can ask for the CVE to be rejected by using the form on this site: https://cveform.mitre.org |
Given the elements I gave, MITRE indeed attributed a Network attack vector.
This question demonstrates that you did not get it at all... Of course, Note however that it could be discussed if the vulnerability should not be attributed to the application itself instead of
Of course, you can dispute this CVE if you want but note that MITRE analyzed this and estimated it was worth publishing. Also, I do not think this issue requires that much effort to be fixed, surely less than obstinately trying to dispute the CVE. @davidhalter it's up to you... |
Then it isn't a vulnerability in edit: @dhondta if you are interested in explaining that this is a |
Where did you ever see any reference to |
|
@dhondta it's the same issue: code that isn't in |
|
@habnabit Definitely not. In this issue, there is no discussion of writing files especially in user's folder (and, in particular PS: @davidhalter These thoughts also apply to your previous response about writing files to |
|
@dhondta what is relevant to this vulnerability? Application code has to go out of its way to both:
In what case would the application code author do this, intentionally or not? A web server might write untrusted uploaded files somewhere by default, but again, a user of |
@dhondta Please look at the implementation. It writes cache to |
|
@habnabit Obviously, if the developer handles its cache folder in application's workspace and this is reachable by end-users (which could frequently be the case in the wild), it's then vulnerable. Of course, it could be mitigated by extra hardening measures but it's always better to fix such an issue at the source instead of relying on measures that could potentially be applied. @szuliq Again, Also, remember that, as @davidhalter mentioned about the performance, even if caching is not the default behavior, any developer wanting to leverage this will start calling feature's functions, i.e. |
|
@dhondta is your assertion that an app developer would consciously, intentionally choose to make the |
|
@habnabit Not considering only Web servers, yes, definitely, for sure if he/she's a developer who is not as skilled as you (given the quality of your repositories, that I think seems to be great). |
|
Hey there
This isn't how CVSS is supposed to work, and I believe MITRE has made an error in judgement here. CVSS is for known factors, not hypotheticals. If a downstream project allows a user to provide a malicious pickle over the network, then that is considered a network access vector. But since parso has no network access, then the parso CVE should be considered "local" (which reduces CVSS from 7.5 to 7.0). @dhondta do you consider all uses of Pickle to be vulnerabilities? If not, what is unique about this case? Any application that allows an adversary to write to uncontrolled paths is going to have a bevy of security ramifications. If the adversary can write to IMO, the proper course of action here is to add a warning in the API docs, request the CVE to be invalidated, and ask GitHub to rescind the thousands of security alerts that have been sent out. |
|
Thanks everybody for shedding light on this. I will definitely add warnings to the API docs. I will also add an issue to parso to replace pickle with a better serialization option, but that could take a long time. I will then try to get the CVE rejected and hope that GitHub stops notifying those 30'000 users. I personally don't think of this as a CVE as long as it's documented. It's totally fine to have a cache with pickle that has to be activated by using

@mehaase Thanks for this response, you raised a few interesting points!

@dhondta If it was easy to fix, I would have done so long ago, when you wrote me in February. It's not, and that's the problem. There are no good and fast serialization options in Python AFAIK. It's also not at all comparable to the numpy vulnerability you listed. It's really something different in how people use it |
I hope these arguments clarify things... |
|
@jtrakk wrote:
@dhondta, @davidhalter, Why hasn't signature verification been discussed/considered? I'm wondering why a similar approach is not viable :-) eg: All existing users of
If a dependent package enables |
|
P.S. The objective is automatic mitigation for existing projects |
|
@sten0 the |
@mehaase In order to illustrate further what I told before, you can try this Google search and look at the results : |
|
@sten0 Where would you store the signature? If the signature is somewhere in your filesystem, it doesn't really change anything. Note however that I don't think that this is a vulnerability. I have documented in 19de3eb that parso uses pickle files when you enable the cache, which is totally fine and with that I consider this issue solved. Opened #79 to replace pickle, but this will probably not happen very soon. |
Addressing security issue davidhalter/parso#75

Vulnerability Description : See CVE-2019-12760
Note : Let us be honest, this should be very unlikely to be exploitable in the wild.