Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserialization vulnerability (CVE-2019-12760) #75

Closed
dhondta opened this issue Jun 11, 2019 · 37 comments
Closed

Deserialization vulnerability (CVE-2019-12760) #75

dhondta opened this issue Jun 11, 2019 · 37 comments

Comments

@dhondta
Copy link

dhondta commented Jun 11, 2019

Vulnerability Description : See CVE-2019-12760

Note : Let us be honest, this should be very unlikely to be exploitable in the wild.

@davidhalter
Copy link
Owner

Thanks for your research.

I have looked into this before. It looks like it's not that easy to fix, because I don't really know of good and fast Python serializers. Probably need to look into marshal for a bit.

However speed for this is paramount and I won't merge anything that's not about as fast.

Note that you can disable caching by just using parso without cache=True, which is how you avoid this vulnerability altogether.

@bgw
Copy link
Contributor

bgw commented Jun 11, 2019

I think it would make a lot of sense to serialize the DFA to a python module that can be shipped with parso.

It's obviously not going to be fast as pickle, but it probably should be close enough, depending on the size of the DFA.

@davidhalter
Copy link
Owner

But do you think that generating the dfa is too slow in any way? I just feel like that was never really a problem.

@jtrakk
Copy link

jtrakk commented Jun 13, 2019

When python-rope had a pickle deserialization vulnerability a while back, they addressed it with signature verification.

Also, I haven't used Cerealizer myself, but it allows you to register explicitly which classes are allowed to be deserialized.

@bgw
Copy link
Contributor

bgw commented Jun 14, 2019

I think it would make a lot of sense to serialize the DFA to a python module that can be shipped with parso.

But do you think that generating the dfa is too slow in any way? I just feel like that was never really a problem.

Oh, sorry, I was misinterpreting what the cache was used for. I assumed it was just used to cache the language grammar (including the DFA), since that's what lib2to3 does. I didn't realize that you were caching the full syntax trees of modules.

@davidhalter
Copy link
Owner

@bgw It's quite a bit faster to cache full syntax trees instead of parsing them.

@max-arnold
Copy link

FYI: Github is creating automatic security alerts for projects that have parso listed in a requirements.txt

2019-06-17 at 12 16

@lsh-0
Copy link

lsh-0 commented Jun 18, 2019

... and we're getting alerts every day.

@terencehonles
Copy link

terencehonles commented Jun 18, 2019

I kinda think this was silly this was opened in the first place. Anything that uses pickle is vulnerable to this issue. If you can't trust your own FS then you're already running into problems. Unless you're able to write your pickled files to a non writable location you are vulnerable.

Should people just not use pickle at all?

Sphinx uses pickle in the same(?)/similar way.

@davidhalter marshal looks like carries the same/similar warning: https://docs.python.org/3.7/library/marshal.html

Rope was able to use a pre-shared key because it's generated while the rope server is running and is passed to the "blessed" clients. Using pickle for caching does not allow this to happen because there is nowhere to store the key for subsequent invocations without that key also being vulnerable. If an attacker only has write access to the FS and can't read files using a key would fix the issue, but at that point it feels like splitting hairs.

Because this library deals only with the AST it might be possible to define a JSON or other only data format, but it seems like the risk of using pickle is so small that it doesn't make a lot of sense unless the author really cares. I personally would probably just put a note in the documentation that if you're trying to use this for a service or in some sort of server code that you restrict the cache directory.

I haven't looked at how pickle is being used in this code, but if restricting the globals makes sense then it wouldn't hurt to add that. (Not sure if you'd be able to limit to the AST classes). Sphinx may be doing that, or may need to add that too.

@terencehonles
Copy link

terencehonles commented Jun 18, 2019

@dhondta Thinking about how pickle works, and the gist this does also imply the attacker can also modify the Python source, right?

Pickle only encodes the full path to an object not the code, so the malicious code would need to be shipped to the target to somewhere that would be on the path and importable from the cached AST. I believe this makes a real world attack even more unlikely (I am not saying impossible). The attacker would then also need to know the FS layout a little better and also know what is the current directory of the interpreter (and this is before you have any code running, so it seems like you'd have to spray the FS or know how the target lays things out and is run).

I think just adding some documentation and saying that if you are worried about malicious code it's safer just to disable the cache as @davidhalter mentioned fixes the issue.

I personally have already marked this vulnerability which comes via iPython as tolerable to out project.

@dhondta
Copy link
Author

dhondta commented Jun 18, 2019

@terencehonles

Anything that uses pickle is vulnerable to this issue.

Serialization with Pickle is inherently weak (that's not a bug, that's a feature) and should be handled with caution in implementations.

The vulnerability I reported is analog to CVE-2019-6446 (which relies on a kind of shortcut provided by Numpy to pickle.load ; note that the exploitability of this is somewhat trickier if you consider what we can found in the wild, i.e. on Web servers) but exploitable through a different avenue of approach which could eventually be encountered on systems providing a way to trigger grammar parsing with parso and using its cache feature. Again, I think this should be very unlikely to be usable in the wild, yet it is a vulnerability as it propagates a user-controllable value up to a pickle.load.

If you can't trust your own FS then you're already running into problems.

That's actually the case on Web servers that use (unsafe) file upload features.

Unless you're able to write your pickled files to a non writable location you are vulnerable.

In the current implementation of parso, the knowledge of the caching folder implies knowing where a user-forged file (e.g. uploaded through a file upload vulnerability) will trigger the creation of pickle cache file. If this folder is writable, an evil pickle is also very likely to be writable, which means that this user-forged pickle will be deserialized.

I don't think the present vulnerability could lead to a full compromise in itself, however it could be used in a combination of vulnerabilities to lead to a RCE.

Should people just not use pickle at all?

Definitely not, but pickling must be addressed with caution, i.e. not relying on any user-controllable value (which is the case here).

marshal looks like carries the same/similar warning

marshal is to any implementation using it what pickle is to parso, a dependency for providing serialization.

Thinking about how pickle works, and the gist this does also imply the attacker can also modify the Python source, right?

No, it just requires the followings, given that caching is enabled :

  1. Be able to write in the cache folder (e.g. using a file upload feature) OR modify the cache folder path to a user-controlled one.
  2. Trigger grammar loading and parsing.

Note: The proof-of-concept script is structured with functions to be tuned (see the TODO's) to automate the exploit in a pseudo-shell. I added a simpler exploit for the sake of simplicity on the related Gist.

@dhondta
Copy link
Author

dhondta commented Jun 18, 2019

@davidhalter

Note that you can disable caching by just using parso without cache=True, which is how you avoid this vulnerability altogether.

It is precisely one of the requirements for the vulnerability to be exploitable. As I mentioned before, I think this is very unlikely to be exploitable in the wild. However, this is indeed a vulnerability in itself because of the dependency to a user input leading to pickle loading.

Removing/disabling the cache feature is probably the easiest mitigation. However, does removing caching impact the performance that much ?

The main problem : Your cache feature loads a grammar of Python from a Pickle whose path is guessable. So, a normal cached grammar is the serialization of something relying on static data (e.g. /usr/local/lib/python2.7/dist-packages/parso/python/grammar34.txt or [...]/grammar36.txt) and is stored in a potentially dangerous place that could be overwritten by an evil Pickle.

Proposed mitigation : Given that, I think you should consider modifying your cache mechanism to add integrity checking on the valid pickled grammars (like a kind of grammar whitelisting), i.e. :

  • At parso loading, when calling cache.py, compute a list of all valid pickled grammars using e.g. SHA256 (with the hashlib module).
  • When loading a grammar through .load_grammar(), before reaching the pickle.load, compute the SHA256 of the opened file and if it is not part of the aforementioned list, just raise/log a security alert.

@habnabit
Copy link

habnabit commented Jun 18, 2019

I don't think this is worth a CVE or even a 'mitigation'. Yes, a webserver that writes uploads to arbitrary paths is broken. No, things affected by uploaded malicious code put in a specific location are not broken. The 'exploit' loads from a specified path.

Be able to write in the cache folder (e.g. using a file upload feature)

This is a broken webserver.

OR modify the cache folder path to a user-controlled one.

How? Python is not vulnerable simply because it's possible to add a user-controlled directory to sys.path. The mitigation is: don't allow users to specify this directory. That isn't an issue in parso.


Assuming that parso has any responsibility here at all, it seems like the simplest fix is to not have an argument for the cache file. If a user can change files in .../python2.7/dist-packages/ then the user can do much worse than making a pickle file malicious.

As far as I can tell, parso doesn't search around for files to load, but has exactly one well-specified place that it looks if the path argument is not given.


ed: to be clear, I despise pickle and would prefer that no project use it for any serialization (and further clarification: marshal is even worse) but this isn't a security issue with pickle or parso.

@davidhalter
Copy link
Owner

@dhondta I can generally understand that this is a problem. However isn't a much bigger problem that you can just modify ~/.local/lib/python3.6/site-packages/usercustomize.py for code execution in the same vein? Arguably that's much worse and happens for every start of a Python executable (I know about the -S switch, but pretty much nobody uses it).

Just wondering, because in that case it's a clear feature that makes this already possible without any work (like this CVE).

Removing/disabling the cache feature is probably the easiest mitigation. However, does removing caching impact the performance that much ?

It impacts performance for some modules quite a bit.

to be clear, I despise pickle and would prefer that no project use it for any serialization

I agree. I just fail to see any fast alternatives in the Python world (especially in the stdlib).

@davidhalter
Copy link
Owner

Thinking about this further, I think it's less and less of a problem. For all normal users, writing files to $HOME is pretty much arbitrary code execution anyway, through one of the following:

  • Any executable that people eventually use (like shell scripts/Python scripts/etc.)
  • ~/.local/lib/python3.6/site-packages/usercustomize.py is executed when Python is started
  • Modifying .bashrc is probably the most successful thing.

So I would be really interested if people really think this is an issue, because it seems like it's pretty normal to gain code execution upon having arbitrary write access.

PS: And no I don't think pickle is a good choice, I just want to point out that this is not bad enough to alert all users of 30'000 github projects that parso has a bad security issue.

@lsh-0
Copy link

lsh-0 commented Jun 24, 2019

So I would be really interested if people really think this is an issue,

I think it's an issue, even if it means a reduction in speed and some temporary inconvenience for those that depend on parso using pickle.

I think it would also demonstrate good form on the part of the maintainers that they're willing to fix security issues regardless of how small and unimpactful they personally consider the issue to be.

@habnabit
Copy link

On the contrary, I think it would be good for maintainers to be able to push back and close CVEs as invalid. People are already getting false positives with "this release of parso is vulnerable". It would be absurd to file a CVE against python with this as the proof of concept:

with open('/tmp/outfile', 'w') as outfile:
    outfile.write('''import subprocess; subprocess.run(['ls'])''')

with open('/tmp/outfile', 'r') as infile:
    exec(infile.read())  # vulnerable line

The CVE is filed as having a network attack vector. What network does parso connect to?

@davidhalter
Copy link
Owner

@habnabit How do I push back? The more I think about it, the more I think this is more like a normal issue and not a security issue.

What network does parso connect to?

Obviously nothing.

@habnabit
Copy link

I was told you can ask for the CVE to be rejected by using the form on this site: https://cveform.mitre.org

@dhondta
Copy link
Author

dhondta commented Jun 25, 2019

@habnabit

The CVE is filed as having a network attack vector.

Given the elements I gave, MITRE indeed attributed a Network attack vector.

What network does parso connect to ?

This question demonstrates that you did not get it at all...

Of course, parso does not connect to any network, there is no source code pointing to any network-related library. But what if parso is used as a library in a project that exposes an interface to remote users ? Then the conditions could be met for the application to be vulnerable.

Note however that it could be discussed if the vulnerability should not be attributed to the application itself instead of parso. But I think that the way the cache is managed could be simply modified to prevent any bad use of the library as a dependency in an application.

I was told you can ask for the CVE to be rejected by using the form on this site.

Of course, you can dispute this CVE if you want but note that MITRE analyzed this and estimated it was worth publishing. Also, I do not think this issue requires that much effort to be fixed, surely less than obstinately trying to dispute the CVE. @davidhalter it's up to you...

@habnabit
Copy link

habnabit commented Jun 25, 2019

Of course, parso does not connect to any network, there is no source code pointing to any network-related library.

Then it isn't a vulnerability in parso. Simple as that. Please close this issue and CVE.


edit: @dhondta if you are interested in explaining that this is a parso issue, please first explain why you haven't filed a CVE against python for the same behavior wrt. usercustomize.py.

@dhondta
Copy link
Author

dhondta commented Jun 25, 2019

@habnabit

[...] please first explain why you haven't filed a CVE against python for the same behavior wrt. usercustomize.py.

Where did you ever see any reference to usercustomize.py in my statements ? This item is completely out-of-scope...

@habnabit
Copy link

@dhondta it's the same issue: code that isn't in parso writes to a file inside the python's site-packages. In this case, usercustomize.py instead of a pickled grammar. This has a much broader scope, too; the malicious code is run every time python starts.

@dhondta
Copy link
Author

dhondta commented Jun 25, 2019

@habnabit Definitely not. In this issue, there is no discussion of writing files especially in user's folder (and, in particular ~/.local/[...]), but well in application's workspace (e.g. a Web server using parso in its backend), which is a different thing. Indeed, if you let users write to a system user folder, it implies other interactions with the OS, therefore other issues. But this is simply irrelevant in this vulnerability.

PS: @davidhalter These thoughts also apply to your previous response about writing files to $HOME. It is simply irrelevant in this case.

@habnabit
Copy link

@dhondta what is relevant to this vulnerability? Application code has to go out of its way to both:

  • make a path writeable to untrusted actors
  • load parso grammars from that same path

In what case would the application code author do this, intentionally or not? A web server might write untrusted uploaded files somewhere by default, but again, a user of parso would have to go out of their way to be loading cached grammars from that same path. It isn't the default behavior.

@szuliq
Copy link

szuliq commented Jun 25, 2019

In this issue, there is no discussion of writing files especially in user's folder (and, in particular ~/.local/[...]), but well in application's workspace (e.g. a Web server using parso in its backend) which is a different thing.

@dhondta Please look at the implementation. It writes cache to ~/.cache/parso on Linux. Any Python interpreter started by the user the web server is running with will also use usercustomize.py from the same $HOME directory. Correct me if I'm wrong.

@dhondta
Copy link
Author

dhondta commented Jun 25, 2019

@habnabit Obviously, if the developer handles its cache folder in application's workspace and this is reachable by end-users (which could frequently be the case in the wild), it's then vulnerable. Of course, it could be mitigated by extra hardening measures but it's always better to fix such an issue at the source instead of relying on measures that could potentially be applied.

@szuliq Again, $HOME is irrelevant here. We're talking about application's workspace, whatever the application and under the control of its sysadmin. The vulnerability I pointed out is on the side of the end-user, who has no access to the Python interpreter and no write access to $HOME.

Also, remember that, as @davidhalter mentioned about the performance, even if caching is not the default behavior, any developer wanting to leverage this will start calling feature's functions, i.e. grammar.parse(path=[...], cache=True, cache_path=[...]) (overwriting the default path you mentioned) and could collide both paths, which combined with other aforementioned conditions will leave the application vulnerable.

@habnabit
Copy link

@dhondta is your assertion that an app developer would consciously, intentionally choose to make the parso cache directory the same directory that the web server puts uploaded files?

@dhondta
Copy link
Author

dhondta commented Jun 25, 2019

@habnabit Not considering only Web servers, yes, definitely, for sure if he/she's a developer who is not as skilled as you (given the quality of your repositories, that I think seems to be great).

@mehaase
Copy link

mehaase commented Jun 27, 2019

Hey there 👋 I'm a huge infosec nerd and generally in favor of taking vulnerabilities seriously. I also got a GitHub alert on one of my repos which uses parso as a dependency of a dependency of a dependency—and a dev dependency at that! I don't see a convincing case that this is a legitimate vulnerability, let alone a 7.5/10 CVE (or as GitHub is labeling it, "high severity").

Of course, parso does not connect to any network, there is no source code pointing to any network-related library. But what if parso is used as a library in a project that exposes an interface to remote users ?

This isn't how CVSS is supposed to work, and I believe MITRE has made an error in judgement here. CVSS is for known factors, not hypotheticals. If a downstream project allows a user to provide a malicious pickle over the network, then that is considered a network access vector. But since parso has no network access, then the parso CVE should be considered "local" (which reduces CVSS from 7.5 to 7.0).

@dhondta do you consider all uses of Pickle to be vulnerabilities? If not, what is unique about this case?

Any application that allows an adversary to write to uncontrolled paths is going to have a bevy of security ramifications. If the adversary can write to $HOME/.cache/parso, they can probably write to $HOME/.bashrc or any number of other sensitive files to gain arbitrary code execution. It would make more sense to look for CVEs in projects that directly depend on parso (which I assume are relatively few, JEDI and ... ?).

IMO, the proper course of action here is to add a warning in the API docs, request the CVE to be invalidated, and ask GitHub to rescind the thousands of security alerts that have been sent out.

@davidhalter
Copy link
Owner

davidhalter commented Jun 28, 2019

Thanks everybody for shedding light onto this.

I will definitely add warnings to the API docs. I will also add an issue to parso to replace pickle with a better serialization option, but that could take a long time. I will then try to get the CVE rejected and hope that Github stops notifying those 30'000 users. I personally don't think of this as a CVE as long as it's documented. It's totally fine to have a cache with pickle that has to be activated by using cache=True.


@mehaase Thanks for this response, you raised a few interesting points!

@dhondta If it was easy to fix, I would have done so long ago, when you wrote me in February. It's not and that's the problem. There's no good and fast serialization options in Python AFAIK.

It's also not at all comparable to the numpy vulnerability you listed. It's really something different in how people use it numpy.load is something you would expose to pretty much anyone, so I get that this is an issue there, definitely not worth a CVE, but anyway...

@dhondta
Copy link
Author

dhondta commented Jul 8, 2019

@mehaase

  • Like almost everybody, you mix up vulnerability and risk ; while a vulnerability could be present, it does not mean that using the vulnerable library is at risk, you have to consider its likelihood of exploitation and its impact in the balance (cfr formula risk = likelihood x impact, where impact is related to the vulnerability). In this case, as I mentioned before, the likelihood of exploitation should be very (very...) low while not null, giving a risk that is surely very low but not null. This means you can safely use parso in your application if you do not make the mistake of gathering the conditions for the exploitation of the vulnerability.
  • That's a choice of GitHub to propagate an alert for every CVE associated to a repo. I apologize for the inconvenience but I do not step in this decision. Please note however that you probably get this alert because of a (very) poorly processed requirements.txt file in your dependency that loosely results from a (maybe badly filtered ?) pip freeze (keeping parso in its list of required modules while it is even not imported), unless I'm mistaken and that most of Python projects effectively use a parser (which would be very surprising)... I think this is a bad development practice but we can find lots of projects made this way.
  • Also, MITRE is free to assess its metrics related to any reported issue. Once again, I do not step in this process. I provided a scenario, mentioned it is very rare to find in the wild (but yet not impossible), and I provided a proof-of-concept. That's all. The rest is their business.
  • To get back to Pickle (again and again), no, it's not a bug, it's a feature, that's the way it works. And the vulnerability is not in the use of Pickle itself, but the computation of parameters that lead to a pickle.load and that depend on values that could eventually be left for tuning to end-users by some developers using parso. For instance, let us assume that a Web application uses a well-known CMS and that a SQLi is discovered, using parameters that are left for tuning to the "developer" of the Web app using the CMS ; you fix neither SQL nor the Web app, but well the vulnerable module of the CMS. It's the same here.
  • Moreover, regarding $HOME, please (re)read (again) the previous messages of this conversation. Your remark, as by others before, is not that relevant, I think.

@davidhalter

  • For the API docs, that's a possible reaction ; you can simply accept the risk and tell the developers using parso to use the related feature with caution (that's namely the case in logging where a Python code injection vulnerability exists, but that won't be fixed as the lead developer estimates it is sufficient to put a warning in the documentation). However, it does not remove the vulnerability. But that's a possible choice.
  • Also, as a possible improvement for GitHub, it could be considered to generate alerts based on the risk and not the vulnerability itself. In this case, as I mentioned multiple times before, this CVE would give a very (very...) low risk whatever its impact calculated by MITRE given its estimated likelihood, which could then not be worth an alert on every related repository.

I hope these arguments clarify things...

@sten0
Copy link

sten0 commented Jul 9, 2019

@jtrakk wrote:

When python-rope had a pickle deserialization vulnerability a while back, they addressed it with signature verification.

@dhondta, @davidhalter, Why hasn't signature verification been discussed/considered? I'm wondering why a similar approach is not viable :-) eg: All existing users of cache=True get a new implicit default of verified_cache=True. If that impacts performance too much, the onus is on dependent packages to audit their code to verify that their use of the cache is safe (via the proposed API docs), and after the audit they can enable verified_cache=False. This supports what @dhondta wrote:

This means you can safely use parso in your application if you do not make the mistake of gathering the conditions for the exploitation of the vulnerability.

If a dependant package enables verified_cache=False under unsafe conditions that wouldn't in any way be Parso's fault, and it seems to me that any potential CVEs would be for those projects and not Parso.

@sten0
Copy link

sten0 commented Jul 9, 2019

P.S. The objective is automatic mitigation for existing projects

@habnabit
Copy link

habnabit commented Jul 9, 2019

@sten0 the python-rope vulnerability was because pickled data was sent over the network, by python-rope. I don't think its mitigation is relevant (or useful in this case..), especially because it was fundamentally a different issue.

@dhondta
Copy link
Author

dhondta commented Jul 10, 2019

[...] Please note however that you probably get this alert because of a (very) poorly processed requirements.txt file in your dependency that loosely results from a [...] pip freeze [...]

@mehaase In order to illustrate further what I told before, you can try this Google search and look at the results : parso intitle:requirements.txt site:github.com...

@davidhalter
Copy link
Owner

davidhalter commented Jul 10, 2019

@sten0 Where would you store the signature? If the signature is somewhere in your filesystem, it doesn't really change anything. Note however that I don't think that this is a vulnerability.

I have documented in 19de3eb that parso uses pickle files when you enable the cache, which is totally fine and with that I consider this issue solved.

Opened #79 to replace pickle, but this will probably not happen very soon.

loeiten added a commit to loeiten/coursera_advanced_machine_learning that referenced this issue Aug 19, 2019
loeiten added a commit to loeiten/coursera_advanced_machine_learning that referenced this issue Aug 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests