Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TTFont() constructor always loads the entire file into memory #482

Closed
behdad opened this issue Jan 23, 2016 · 4 comments
Closed

TTFont() constructor always loads the entire file into memory #482

behdad opened this issue Jan 23, 2016 · 4 comments

Comments

@behdad
Copy link
Member

behdad commented Jan 23, 2016

@justvanrossum pointed out today, that reading the entire font into memory can be visibly inefficient, specially since we don't have any sniffing api, so trying to load the font is the only way.

Indeed, we knew that we want to fix this later when we have proper options infrastructure, but just filing here so we keep track. Maybe we can skip the read if lazy=True at least?

Independently, we should have sniffing API that returns font-type of a file.

@justvanrossum
Copy link
Collaborator

Later I realized it's not just sniffing I'm worried about: I wrote scripts that go through an entire library and finds and reports "problems" in light-weight tables, say, name table, head, OS/2. I'm pretty sure that would be noticably slower with today's fonttools.

@anthrotype
Copy link
Member

Hi Just,

You must be referring to this commit:
356c923

The reason we did that was to be able to save a TTFont to the same input file (overwrite):
#302

Sure, we could skip loading the whole font in memory when lazy==True, like @behdad suggests.

I'm not sure I understood the point about of the lack of sniffing api which would force us to try loading the entire font in memory. Are you perhaps referring to the macUtils.getSFNTResIndices?

https://github.com/behdad/fonttools/blob/master/Lib/fontTools/ttLib/__init__.py#L149

That occurs even before we wrap the file in a BytesIO stream.
Or maybe you mean in fontTools.ttLib.sfnt.SFNTReader class, where we do "sniff" the first four bytes in order to return either a generic SFNT or a WOFF2 reader?
https://github.com/behdad/fonttools/blob/master/Lib/fontTools/ttLib/sfnt.py#L31

Of course, I understand why we need a sniffing api in general (we could build on guessFileType function in ttx module). It's just I didn't get the connection with the loading-input-file-in-memory thing.

@justvanrossum
Copy link
Collaborator

The connection to sniffing was that I (perhaps naively) sometimes just attempt to create a TTFont() instance to see if a particular file is indeed a file TTFont can handle. But don't worry, the new 'lazy' behavior solves those needs and more.

behdad added a commit that referenced this issue Apr 17, 2016
As discussed here:
#580 (comment)

Before:
$ python -m timeit 'from fontTools.ttLib import TTFont; TTFont("sazanami-gothic.ttf")'
10 loops, best of 3: 66.9 msec per loop

After:
$ python -m timeit 'from fontTools.ttLib import TTFont; TTFont("sazanami-gothic.ttf")'
10000 loops, best of 3: 110 usec per loop

That's a 600x speedup!

Fixes #482

HOWEVER, it reintroduces #302
Or worse, we'll crash when overwriting:

$ cp Lobster.ttf t.ttf
$ ./ttx -o ./t.ttf ./t.ttf
Dumping "./t.ttf" to "./t.ttf"...
Dumping 'GlyphOrder' table...
Bus error (core dumped)

IMO we should fix this by changing both XML and font output routines to,
instead of calling open, use os.tmpfile(), create output in a new file
and then rename it to the final destination.  It has the benefit of not
leaving a half-written output file behind if an exception occurs.  The
tempfile.NamedTemporaryFile also comes handy.

While checking those out, tempfile.SpooledTemporaryFile also comes handy,
when it's available, to replace BytesIO in the following part of TTFont.save():

	# write to a temporary stream to allow saving to unseekable streams
	tmp = BytesIO()

Looks like we need to start misc.fileTools to abstract the details away.
brawer pushed a commit that referenced this issue Feb 13, 2017
As discussed here:
#580 (comment)

Before:
$ python -m timeit 'from fontTools.ttLib import TTFont; TTFont("sazanami-gothic.ttf")'
10 loops, best of 3: 66.9 msec per loop

After:
$ python -m timeit 'from fontTools.ttLib import TTFont; TTFont("sazanami-gothic.ttf")'
10000 loops, best of 3: 110 usec per loop

That's a 600x speedup!

Fixes #482

HOWEVER, it reintroduces #302
Or worse, we'll crash when overwriting:

$ cp Lobster.ttf t.ttf
$ ./ttx -o ./t.ttf ./t.ttf
Dumping "./t.ttf" to "./t.ttf"...
Dumping 'GlyphOrder' table...
Bus error (core dumped)

IMO we should fix this by changing both XML and font output routines to,
instead of calling open, use os.tmpfile(), create output in a new file
and then rename it to the final destination.  It has the benefit of not
leaving a half-written output file behind if an exception occurs.  The
tempfile.NamedTemporaryFile also comes handy.

While checking those out, tempfile.SpooledTemporaryFile also comes handy,
when it's available, to replace BytesIO in the following part of TTFont.save():

	# write to a temporary stream to allow saving to unseekable streams
	tmp = BytesIO()

Looks like we need to start misc.fileTools to abstract the details away.
@behdad
Copy link
Member Author

behdad commented Aug 19, 2022

Closing in favor of #2252

@behdad behdad closed this as completed Aug 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants