-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TTFont() constructor always loads the entire file into memory #482
Comments
Later I realized it's not just sniffing I'm worried about: I wrote scripts that go through an entire library and finds and reports "problems" in light-weight tables, say, name table, head, OS/2. I'm pretty sure that would be noticably slower with today's fonttools. |
Hi Just, You must be referring to this commit: The reason we did that was to be able to save a TTFont to the same input file (overwrite): Sure, we could skip loading the whole font in memory when lazy==True, like @behdad suggests. I'm not sure I understood the point about of the lack of sniffing api which would force us to try loading the entire font in memory. Are you perhaps referring to the https://github.com/behdad/fonttools/blob/master/Lib/fontTools/ttLib/__init__.py#L149 That occurs even before we wrap the file in a BytesIO stream. Of course, I understand why we need a sniffing api in general (we could build on |
The connection to sniffing was that I (perhaps naively) sometimes just attempt to create a TTFont() instance to see if a particular file is indeed a file TTFont can handle. But don't worry, the new 'lazy' behavior solves those needs and more. |
As discussed here: #580 (comment) Before: $ python -m timeit 'from fontTools.ttLib import TTFont; TTFont("sazanami-gothic.ttf")' 10 loops, best of 3: 66.9 msec per loop After: $ python -m timeit 'from fontTools.ttLib import TTFont; TTFont("sazanami-gothic.ttf")' 10000 loops, best of 3: 110 usec per loop That's a 600x speedup! Fixes #482 HOWEVER, it reintroduces #302 Or worse, we'll crash when overwriting: $ cp Lobster.ttf t.ttf $ ./ttx -o ./t.ttf ./t.ttf Dumping "./t.ttf" to "./t.ttf"... Dumping 'GlyphOrder' table... Bus error (core dumped) IMO we should fix this by changing both XML and font output routines to, instead of calling open, use os.tmpfile(), create output in a new file and then rename it to the final destination. It has the benefit of not leaving a half-written output file behind if an exception occurs. The tempfile.NamedTemporaryFile also comes handy. While checking those out, tempfile.SpooledTemporaryFile also comes handy, when it's available, to replace BytesIO in the following part of TTFont.save(): # write to a temporary stream to allow saving to unseekable streams tmp = BytesIO() Looks like we need to start misc.fileTools to abstract the details away.
As discussed here: #580 (comment) Before: $ python -m timeit 'from fontTools.ttLib import TTFont; TTFont("sazanami-gothic.ttf")' 10 loops, best of 3: 66.9 msec per loop After: $ python -m timeit 'from fontTools.ttLib import TTFont; TTFont("sazanami-gothic.ttf")' 10000 loops, best of 3: 110 usec per loop That's a 600x speedup! Fixes #482 HOWEVER, it reintroduces #302 Or worse, we'll crash when overwriting: $ cp Lobster.ttf t.ttf $ ./ttx -o ./t.ttf ./t.ttf Dumping "./t.ttf" to "./t.ttf"... Dumping 'GlyphOrder' table... Bus error (core dumped) IMO we should fix this by changing both XML and font output routines to, instead of calling open, use os.tmpfile(), create output in a new file and then rename it to the final destination. It has the benefit of not leaving a half-written output file behind if an exception occurs. The tempfile.NamedTemporaryFile also comes handy. While checking those out, tempfile.SpooledTemporaryFile also comes handy, when it's available, to replace BytesIO in the following part of TTFont.save(): # write to a temporary stream to allow saving to unseekable streams tmp = BytesIO() Looks like we need to start misc.fileTools to abstract the details away.
Closing in favor of #2252 |
@justvanrossum pointed out today, that reading the entire font into memory can be visibly inefficient, specially since we don't have any sniffing api, so trying to load the font is the only way.
Indeed, we knew that we want to fix this later when we have proper options infrastructure, but just filing here so we keep track. Maybe we can skip the read if lazy=True at least?
Independently, we should have sniffing API that returns font-type of a file.
The text was updated successfully, but these errors were encountered: