TTFont() constructor always loads the entire file into memory #482

behdad · 2016-01-23T23:36:17Z

@justvanrossum pointed out today that reading the entire font into memory can be visibly inefficient, especially since we don't have any sniffing API, so trying to load the font is the only way to find out whether a file is a font.

Indeed, we knew we wanted to fix this later, once we have proper options infrastructure, but I'm filing it here so we keep track. Maybe we can skip the read if lazy=True at least?

Independently, we should have a sniffing API that returns the font type of a file.
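
For illustration, here is a minimal sketch of what such a sniffing function could look like. It is not fontTools API: the name `sniff_font_type` and the returned labels are hypothetical. It only inspects the first four bytes of the file, using the signatures defined by the OpenType, TrueType and WOFF formats, and it does not attempt to handle Mac resource-fork fonts.

```python
# Hypothetical helper, not part of fontTools: map the 4-byte file
# signature to a short label without reading the rest of the file.
_SIGNATURES = {
    b"\x00\x01\x00\x00": "TTF",   # TrueType outlines
    b"true": "TTF",               # legacy Apple TrueType
    b"OTTO": "OTF",               # CFF outlines
    b"ttcf": "TTC",               # font collection
    b"wOFF": "WOFF",
    b"wOF2": "WOFF2",
}


def sniff_font_type(path):
    """Return a label for the font container, or None if unrecognised."""
    with open(path, "rb") as f:
        signature = f.read(4)  # only four bytes are ever read
    return _SIGNATURES.get(signature)
```

Because only four bytes are read, the cost is independent of the font's size, which is the property the issue is asking for.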

justvanrossum · 2016-01-24T07:38:45Z

Later I realized it's not just sniffing I'm worried about: I wrote scripts that go through an entire library and find and report "problems" in light-weight tables, say, the name table, head, OS/2. I'm pretty sure that would be noticeably slower with today's fonttools.
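
To illustrate why such scripts can be cheap when the file is not slurped up front, here is a standard-library sketch that reads a single table by following the sfnt table directory. This is not how fontTools implements it; `read_table` is a hypothetical name, and the sketch assumes a plain single-font sfnt file (no WOFF, WOFF2 or TTC).

```python
import struct


def read_table(path, tag):
    """Return the raw bytes of one table from an sfnt file, or None.

    Only the 12-byte header, the table directory and the requested
    table are read; the rest of the file is never touched.
    """
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) < 12:
            raise ValueError("file too short to be an sfnt font")
        # sfntVersion (4 bytes) followed by numTables (uint16, big-endian)
        _version, num_tables = struct.unpack(">4sH", header[:6])
        directory = f.read(16 * num_tables)
        if len(directory) < 16 * num_tables:
            raise ValueError("truncated table directory")
        for i in range(num_tables):
            record = directory[16 * i:16 * i + 16]
            # Each record: tag, checksum, offset, length
            rec_tag, _checksum, offset, length = struct.unpack(">4sLLL", record)
            if rec_tag == tag:
                f.seek(offset)
                return f.read(length)
    return None
```

A library-wide check would call something like `read_table(path, b"head")` per file, so the I/O per font stays proportional to the small tables inspected rather than to the file size.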

anthrotype · 2016-01-25T09:39:17Z

Hi Just,

You must be referring to this commit:
356c923

The reason we did that was to be able to save a TTFont to the same input file (overwrite):
#302

Sure, we could skip loading the whole font in memory when lazy==True, like @behdad suggests.

I'm not sure I understood the point about the lack of a sniffing API forcing us to try loading the entire font in memory. Are you perhaps referring to macUtils.getSFNTResIndices?

https://github.com/behdad/fonttools/blob/master/Lib/fontTools/ttLib/__init__.py#L149

That occurs even before we wrap the file in a BytesIO stream.
Or maybe you mean in fontTools.ttLib.sfnt.SFNTReader class, where we do "sniff" the first four bytes in order to return either a generic SFNT or a WOFF2 reader?
https://github.com/behdad/fonttools/blob/master/Lib/fontTools/ttLib/sfnt.py#L31

Of course, I understand why we need a sniffing API in general (we could build on the guessFileType function in the ttx module). It's just that I didn't get the connection with the loading-input-file-in-memory thing.

justvanrossum · 2016-01-25T10:54:51Z

The connection to sniffing was that I (perhaps naively) sometimes just attempt to create a TTFont() instance to see if a particular file is indeed a file TTFont can handle. But don't worry, the new 'lazy' behavior solves those needs and more.

As discussed here: #580 (comment)

Before:

    $ python -m timeit 'from fontTools.ttLib import TTFont; TTFont("sazanami-gothic.ttf")'
    10 loops, best of 3: 66.9 msec per loop

After:

    $ python -m timeit 'from fontTools.ttLib import TTFont; TTFont("sazanami-gothic.ttf")'
    10000 loops, best of 3: 110 usec per loop

That's a 600x speedup! Fixes #482

HOWEVER, it reintroduces #302. Or worse, we'll crash when overwriting:

    $ cp Lobster.ttf t.ttf
    $ ./ttx -o ./t.ttf ./t.ttf
    Dumping "./t.ttf" to "./t.ttf"...
    Dumping 'GlyphOrder' table...
    Bus error (core dumped)

IMO we should fix this by changing both the XML and font output routines so that, instead of calling open, they use os.tmpfile(), create the output in a new file and then rename it to the final destination. This has the benefit of not leaving a half-written output file behind if an exception occurs. tempfile.NamedTemporaryFile also comes in handy.

While checking those out, tempfile.SpooledTemporaryFile also comes in handy, when it's available, to replace BytesIO in the following part of TTFont.save():

    # write to a temporary stream to allow saving to unseekable streams
    tmp = BytesIO()

Looks like we need to start misc.fileTools to abstract the details away.
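
The write-then-rename idea can be sketched as follows. This is an illustration only, not the fontTools implementation, and `atomic_write` is a hypothetical name. It uses tempfile.NamedTemporaryFile rather than os.tmpfile(), since a rename needs a file name (and os.tmpfile() only exists on Python 2), and it assumes Python 3.3+ for os.replace. The temporary file is created next to the destination so that the rename stays on one filesystem.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_write(path, mode="wb"):
    """Yield a temporary file; rename it over `path` only on success."""
    directory = os.path.dirname(os.path.abspath(path))
    # delete=False: closing the file must not remove it before the rename.
    tmp = tempfile.NamedTemporaryFile(mode=mode, dir=directory, delete=False)
    try:
        with tmp:  # closes the temporary file when the body finishes
            yield tmp
        os.replace(tmp.name, path)  # swap the finished file into place
    except BaseException:
        # The body failed: drop the partial output and leave any
        # existing file at `path` untouched.
        if os.path.exists(tmp.name):
            os.unlink(tmp.name)
        raise
```

With this, `with atomic_write("t.ttf") as f: f.write(data)` never truncates t.ttf while it may still be open for lazy reading, and an exception midway leaves the original file as it was. Note that the new file gets the temporary file's permissions, which a real implementation would probably want to adjust.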

behdad · 2022-08-19T19:26:21Z

Closing in favor of #2252

anthrotype pushed a commit to anthrotype/fonttools that referenced this issue Jan 25, 2016

[ttLib] skip reading the whole file in memory if lazy == True

3f7c67e

As discussed in fonttools#482.

anthrotype mentioned this issue Jan 25, 2016

[ttLib] don't load whole input file in memory if 'lazy' is True #486

Merged

behdad mentioned this issue Apr 17, 2016

[WIP] TTFont: Use mmap if possible #581

Closed

behdad closed this as completed Aug 19, 2022
