font_manager.py takes multiple seconds to import #4756

Closed
ssanderson opened this issue Jul 22, 2015 · 11 comments

@ssanderson

This is a brutal amount of latency when trying to provide a smooth user experience from an application that uses matplotlib internally.

In [1]: %prun -r -s cumtime -l 20 import matplotlib.pyplot

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002    2.461    2.461 <string>:1(<module>)
        1    0.001    0.001    2.413    2.413 pyplot.py:17(<module>)
        1    0.001    0.001    2.392    2.392 colorbar.py:20(<module>)
        1    0.001    0.001    2.380    2.380 collections.py:10(<module>)
        1    0.002    0.002    2.374    2.374 backend_bases.py:28(<module>)
        1    0.001    0.001    2.365    2.365 textpath.py:3(<module>)
        1    0.001    0.001    2.362    2.362 font_manager.py:21(<module>)
        1    0.000    0.000    2.361    2.361 font_manager.py:952(pickle_load)
        1    0.003    0.003    2.361    2.361 {cPickle.load}
        1    0.001    0.001    2.358    2.358 font_manager.py:1013(__init__)
        2    0.098    0.049    2.275    1.137 font_manager.py:544(createFontList)
      215    0.000    0.000    2.149    0.010 afm.py:341(__init__)
      215    0.001    0.000    2.148    0.010 afm.py:323(parse_afm)
      215    0.001    0.000    1.443    0.007 afm.py:292(_parse_optional)
      182    0.979    0.005    1.434    0.008 afm.py:220(_parse_kern_pairs)
      215    0.374    0.002    0.690    0.003 afm.py:180(_parse_char_metrics)
  1298020    0.229    0.000    0.229    0.000 {method 'split' of 'str' objects}
   839978    0.157    0.000    0.157    0.000 {method 'readline' of 'file' objects}
   114276    0.087    0.000    0.107    0.000 afm.py:72(_to_list_of_floats)
   843180    0.092    0.000    0.092    0.000 {method 'startswith' of 'str' objects}
        4    0.000    0.000    0.082    0.021 font_manager.py:294(findSystemFonts)
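
The same measurement can be reproduced outside IPython with the standard-library profiler. This is a minimal sketch: `profile_import` is a hypothetical helper name, and the numbers only mean something for a first import in a fresh interpreter, since a module already in `sys.modules` returns almost immediately.

```python
import cProfile
import importlib
import io
import pstats


def profile_import(module_name, limit=20):
    """Import `module_name` under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    importlib.import_module(module_name)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Same ordering and row limit as `%prun -s cumtime -l 20`.
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```

Calling `print(profile_import("matplotlib.pyplot"))` in a fresh interpreter should give a table shaped like the one above.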
@ssanderson
Author

It's also worrying that just importing matplotlib.pyplot causes a read of a pickle file from the filesystem. This would make me nervous about using matplotlib in a security-sensitive environment.

@tacaswell
Member

That is the font cache which is there to save even more time from having to search the system for fonts.

tacaswell added this to the "proposed next point release" milestone on Jul 22, 2015
@WeatherGod
Member

He is right, though... if someone were to swap out that font cache pickle
with a malicious pickle, matplotlib becomes a vector for security issues.
Also, why is the font manager still going through a font-listing exercise
if it has the cache?
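
To make the concern about a swapped pickle concrete, here is a harmless sketch of why loading one is equivalent to running code chosen by whoever wrote the file. `Payload` is a made-up class for illustration; a real attack would substitute something far less friendly than `print`.

```python
import pickle


class Payload:
    def __reduce__(self):
        # pickle records this (callable, args) pair and calls it at load time.
        return (print, ("this ran inside pickle.loads",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # prints the message as a side effect

# What comes back is the callable's return value, not a Payload instance.
print(type(result).__name__)
```

The reader of the file never names the callable; it is stored in the pickle itself, which is why the format is unsafe for files that someone else may have written.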


@ssanderson
Author

Would it be possible to use a less general serialization format (e.g. JSON) for font caching?

@tacaswell
Member

I am also confused by that. I thought loading from a pickle side-stepped the normal __init__ path and just set the dictionary of the new object.

And yes, we probably should move to a different serialization (what I would have called a more general one) that also stashes more of what is being parsed out of the AFM files, which would be a win on all counts.
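
The expectation that unpickling skips `__init__` can be checked with a small stand-in class. This sketch uses a hypothetical `FontList` and only shows pickle's default behaviour for an ordinary class; it does not explain why `__init__` appears under `cPickle.load` in the profile above.

```python
import pickle


class FontList:
    init_calls = 0  # counts how often the constructor actually runs

    def __init__(self, names):
        FontList.init_calls += 1
        self.names = list(names)


original = FontList(["DejaVu Sans"])
# By default pickle creates the instance without calling __init__ and then
# restores the saved attribute dictionary onto it.
restored = pickle.loads(pickle.dumps(original))

print(FontList.init_calls)  # 1: only the explicit constructor call above
print(restored.names)
```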

@ssanderson
Author

I meant less general in the sense that matplotlib has to provide more information about how to actually serialize the objects in question, rather than relying on pickle to serialize arbitrary python objects.
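
A sketch of what "less general" could look like in practice, assuming a made-up `FontRecord` entry (not matplotlib's actual class) with illustrative fields. The application spells out how each object becomes plain data and how it is rebuilt.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FontRecord:
    """Hypothetical cache entry; the field names are illustrative only."""
    path: str
    name: str
    weight: int = 400


def dumps_records(records):
    # Each object is converted to plain dicts, strings and numbers
    # explicitly; nothing else can end up in the file.
    return json.dumps([asdict(record) for record in records])


def loads_records(text):
    # Decoding yields only plain data. The objects are rebuilt by code in
    # this module rather than by instructions stored in the file.
    return [FontRecord(**item) for item in json.loads(text)]
```

The cost is that every new field has to be handled by hand, which is the trade-off being described: more work for the library, less power for the file.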

@pelson
Member

pelson commented Aug 27, 2015

It's also worrying that just importing matplotlib.pyplot causes a read of a pickle file from the filesystem. This would make me nervous about using matplotlib in a security-sensitive environment.

If a malicious user has access to the filesystem, then importing anything is a security concern...
The cache isn't something that users pass around to one another - it is a thing created by mpl for each individual user, so there isn't really a "use this cache" sharing type vector. Essentially, what I'm trying to say is that I agree, pickle files are insecure, but if you can do:

cat malicious_pickle.pkl > ~/.cache/matplotlib

You can also do:

echo "import os; os.system('echo bad things')" > ~/.local/lib/python2.7/site-packages/os.py

There may well be a strong argument for a custom serialisation though - particularly if there is a standard form for font caching that we can make use of.

@ssanderson
Author

but if you can do:
cat malicious_pickle.pkl > ~/.cache/matplotlib
You can also do:
echo "import os; os.system('echo bad things')" > ~/.local/lib/python2.7/site-packages/os.py

@pelson I agree with this assessment for users who are running Python on their local machine and administering their own environments. I'd add, however, that one important distinction between writing to ~/.cache/matplotlib vs site-packages is that, more or less by definition, the user has write access to their home directory, whereas they may not have write access to their python distribution.

In particular, if you're deploying a server on the public internet that uses python, or if you're administering a shared machine with untrusted users, it might be reasonable security practice to enforce that your site-packages are read-only for users who are actually executing code, precisely for the reason you describe.

@tacaswell
Member

Will mpl try to write the font cache to system level site-packages? mpl is deployed to google app engine and they run in a very locked down environment. I wonder how they deal with this.

I think two things need to be done here:

  1. make sure mpl will look for a read-only cache as a fallback
  2. use something other than a pickle for storing the cache

Given the other issues we seem to have with the font cache not going smoothly across updates, this should probably be reasonably high priority for the next point release.
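
The two points above could be combined roughly as follows. This is a sketch under assumptions: `load_first_cache` is a hypothetical helper, the caller supplies the candidate locations (for example a per-user cache first and a read-only one last), and the cache is assumed to be JSON.

```python
import json
from pathlib import Path


def load_first_cache(candidates):
    """Return (path, data) for the first readable JSON cache, else None."""
    for candidate in candidates:
        path = Path(candidate)
        try:
            with path.open(encoding="utf-8") as handle:
                return path, json.load(handle)
        except (OSError, ValueError):
            # Missing, unreadable, or corrupt: try the next location.
            continue
    return None
```

A `None` result would mean that no usable cache exists and the font list has to be rebuilt.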

@mdboom
Member

mdboom commented Aug 31, 2015

Will mpl try to write the font cache to system level site-packages?

No. Caches never go in Python source directories.

mpl is deployed to google app engine and they run in a very locked down environment. I wonder how they deal with this.

They just don't use a cache, but regenerate the font directory each time. This is less of an issue on GAE because very few fonts other than the built-in matplotlib ones are available. See #1824.

make sure mpl will look for a read-only cache as a fallback

The issue there is that the font cache is user environment-specific. We can't really provide a read-only one unless we limit the set of fonts to the ones that ship with matplotlib.

use something other than a pickle for storing the cache

It's a really simple data structure -- JSON would work just fine for this purpose. (The matplotlib font cache predates JSON itself, let alone its inclusion in the Python stdlib, so that wasn't an "easy" option at the time).

It will probably still require some versioning, but JSON at least avoids most of the security concerns (outside of exploiting the occasional bug in FreeType when it opens malformed font files).
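
One way the versioning could work, as a sketch only: `CACHE_VERSION`, `dump_cache` and `load_cache` are hypothetical names, and `rebuild` stands for whatever function rescans the system fonts.

```python
import json

CACHE_VERSION = 1  # hypothetical; bump whenever the cache layout changes


def dump_cache(fonts):
    """Serialize a list of plain-data font entries with a version tag."""
    return json.dumps({"version": CACHE_VERSION, "fonts": fonts})


def load_cache(text, rebuild):
    """Return the cached fonts, or call `rebuild()` if the cache is unusable."""
    try:
        data = json.loads(text)
    except ValueError:
        return rebuild()  # not valid JSON at all
    if (
        not isinstance(data, dict)
        or data.get("version") != CACHE_VERSION
        or "fonts" not in data
    ):
        return rebuild()  # written by another version, or malformed
    return data["fonts"]
```

A stale or damaged cache then costs one rebuild instead of an error, which may also help with caches not surviving updates cleanly.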

@jenshnielsen
Member

We are now warning about the regeneration and using a JSON cache. I think this can be closed.
