Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ttx] Option to preserve PUSHB/PUSHW etc #92

Closed
behdad opened this issue Jan 19, 2014 · 24 comments
Closed

[ttx] Option to preserve PUSHB/PUSHW etc #92

behdad opened this issue Jan 19, 2014 · 24 comments

Comments

@behdad
Copy link
Member

behdad commented Jan 19, 2014

From Werner:

"""
[FontTools-2.4-696-gfea81ee]

I'm not happy that the TrueType instruction dump in ttx no longer
shows the difference between PUSHB and PUSHW. I agree that this
information isn't relevant to the functionality, however, it's a
valuable information while debugging and comparing dumps.

Similarly, it is quite helpful to see the formats of tables (Coverage,
GlyphClassDef, etc.)

Can you revert that change? Or maybe add an option to ttx which
increases the verbosity of table dumps in general.
"""

behdad added a commit that referenced this issue Jan 19, 2014
And avoid merging consequent pushes.  No option to control this
externally right now, but the logic is there.

Part of #92
@avpanov
Copy link

avpanov commented Dec 2, 2014

Please, revert the behavior of FontTools-2.4. Translation of the PUSHB, PUSHW, NPUSHB and NPUSHW instructions to non-existing PUSH one makes ttx incompatible with xgridfit which utilizes fonttols in some cases.

@behdad
Copy link
Member Author

behdad commented Dec 2, 2014

Can you please explain exactly how xgridfit uses FontTools?

@avpanov
Copy link

avpanov commented Dec 2, 2014

@schriftgestalt
Copy link
Contributor

I agree that this information isn't relevant to the functionality,

In the case there is a JMPR command, the difference is indeed vital to the fictionality of the code. Compiling the instruction again will change the length of the code and thous invalidates the jump address. This will break all TTF generated with Glyphs.

@schriftgestalt
Copy link
Contributor

And an option to switch of the optimisation is not sufficient as everybody is using it to import/export the fonts with ttx will get broken fonts and will not know to set that flag. Maybe the other way around, add an option to optimise (and check if there is a jump command).

@lemzwerg
Copy link

lemzwerg commented Mar 9, 2016

Indeed, I haven't thought of JMPR.

On the other hand: Making ttx squeeze a sequence of PUSHB and PUSHW as much as possible would be a good thing as an optimization, in case it can also properly adjust JMPR offsets.

@schriftgestalt
Copy link
Contributor

If someone figures out how to fix the jump offsets I’m more than happy with the optimisation.

@behdad
Copy link
Member Author

behdad commented Mar 10, 2016

Ok I definitely had not thought about JMPR instruction.

Note that the original "optimize at write time" was done before my time. What I did was change the bytecode to reflect that. Ie. before, bytecode was being corrupted silently at write time!

I think we should disable this feature and enable it as an optimization option somehow.

Fixing the jump offset is NOT impossible, but a lot more work than this feature entails. I suppose maybe we can still do this for the burst of push operations at the beginning of each bytecode string but never in the middle?

@lemzwerg
Copy link

Yes. It shouldn't be too difficult to scan for JMPR instructions to find out the negative offset where PUSHX optimization has to stop. Or, as a starter, simply refrain from PUSHX optimization as soon as a JMPR instruction is found.

@justvanrossum
Copy link
Collaborator

Wow, it seems FontTools has indeed been broken for long time... (I never thought of JMPR, either.) And indeed, disabling the optimization when a JMPR instruction is there makes sense.

@schriftgestalt
Copy link
Contributor

Or, as a starter, simply refrain from PUSHX optimization as soon as a JMPR instruction is found.

I suppose maybe we can still do this for the burst of push operations at the beginning of each bytecode string but never in the middle?

You need to disable optimization before the jump target. The jump address is an absolute code address. It is only safe after the the biggest jump address. Because that is not known at the beginning of the parsing, the optimization (and potentially the offset corrections) need to be done in a second step. First parse with preserve=True and then run the optimization in a second step. For now, it could just disable the optimization run if a jump was found. The jump will most likely be used in the prep or fpm but not in the instructions per glyph. So "lost" space saving is minimal.

@lemzwerg
Copy link

Well, addresses for jumps might be stored in registers or in CVT. In other words, you need a real bytecode compiler that considers all branches before optimization is really possible.

@anthrotype
Copy link
Member

thanks @schriftgestalt for catching this bug!
If any of you have an idea how to keep the PUSH optimization without breaking the JMPR instructions, please send a pull request. In the meantime, I think it'd safer to just disable the optimization altogether.
Shall we set preserve=True by default for now?

@justvanrossum
Copy link
Collaborator

Btw. I'm not happy with the optimization happening in the decompile phase. That way you'll never see what changed if you round-trip through TTX. Yes, earlier, FontTools was already broken with its previous PUSHB and PUSHW optimization, but at least one could see if a font had changed when recompiled. So the current situation is worse in that it silently breaks things and makes it impossible to detect.

This is somewhat similar to the recent changes with the Format thing in the OTL subtables. Sure, the compiler may decide with format to generate, but by omitting the Format field in the TTX output one can no longer see if something has changed since recompilation. Which was a main purpose of TTX to begin with.

@justvanrossum
Copy link
Collaborator

I think the differences in our views can be summarized as follows:

  1. TTX is an instruction for how to compile a font (Behdad)
  2. TTX is a truthful dump from an existing font (Just)
    Of course it's a bit of both, but I think we're losing important (debugging/analysis) functionality if we move away too much from 2 through abstraction.

@justvanrossum
Copy link
Collaborator

I want people to be able to use TTX to debug other tools. What did font editor XYZ emit? PUSHB or PUSHW? Which OTL subtable format did it generate? That has always been a core feature of TTX and I'd hate to lose that. These subtleties need not always be round-trippable by FontTools (because of optimizations or other reasons) but I want to be able to detect them. Sure, FontTools has not always been consistent with that view (eg. the 'loca' offsets have never been dumped to TTX), but I think the PUSHB/PUSHW abstraction takes TTX to an abstraction level that is too high.

@behdad
Copy link
Member Author

behdad commented Mar 15, 2016

The jump address is an absolute code address.

I just checked. The jump address is a relative offset.

@twardoch
Copy link
Contributor

I am with Werner and Just on the “philosophical” level. TTX has been an important debugging tool in the past.

When Adobe, Apple, Mcrosoft, Tiro and myself were, in 2001, working on an alternative XML representation of SFNT during a two-day sprint in Redmond (one that got fully specified with an XSD schema but never really got any tools implemented apart from some portions of Apple's ftx tools), we had a big debate on whether fields in SFNT tables like head or OS/2 should be stored as attributes of a big XML element or as separate XML elements.

Microsoft’s Paul Nelson pushed for attributes while I argued for elements (actually I proposed just using TTX). I was overruled. Of course that notation never got properly, widely adopted while TTX has, even in AFDKO (despite Adobe’s Eric Muller being at that meeting back then). One of the “I told you” moments for me.

The reason I argued for elements was that the TTX notation stored the sequence of the fields which was useful because it reflected the SFNT structure more closely, so it was more low-level and allowed a person to follow the SFNT more closely, be educated about SFNT, and find problems easier. Code that was to be written could emit a partial table on decompilation and then fail, and you as a person could find the place of the flaw easier. And elements could hold their own optional attributes including those from other namespaces, which would be more future-proof. I argued all those points.

Paul backed the solution where any unique fields only needed to be expressed as attributes of a larger element (think <head version="1.0" unitsPerEm="2048" xMax="986" macStyle="0"> etc.). We had a very heated debate on this :) He named certain advantages of the approach. He mostly made the point that the structure was for authoring, and as such, the order of the fields within a table was not significant in the source notation, and that a tool should take care of it.

Basically, Paul wanted a smarter tool and a simpler notation, I wanted a more detailed notation and a dumber tool.

In the end, I think TTX prevailed not only because it was already written and was implemented in Python (rather than some C#), but also because Just’s original design of the format was the “just right” compromise between the “low-level debugginess” and “portability” of the TTX format.

Just’s format sits between the needs of super-clever engineers like Behdad who may prefer a simpler format because they can do super-clever tools, and of designers/coders like myself who can do copy-paste and learn simple programming techniques but often need simple “serialized” dumps of data to make sense of what’s going on. I think Just designed TTX just right because he’s “not a rocket scientist” if you know what I mean.

Also — after >10 years of the virtual domination of FontLab 4.x/Studio 5 as a font-building tool, where most .otfs and .ttfs were built by just one app, we now have a multitude of libraries for font building and manipulation, i.e. a multitude of potential binary outputs from seemingly the same “sources”. The debugging functionality of TTX is just as useful now as it was useful when Just originally wrote it.

@justvanrossum
Copy link
Collaborator

Any progress on this? I actually think the PUSH optimizations should be removed completely (possibly reimplemented as a separate tool), and that the instruction disassembly should again show exactly what's in the bytecode.

@schriftgestalt
Copy link
Contributor

I have send a pull request.

@justvanrossum
Copy link
Collaborator

Merged. Thanks Georg. Leaving the issue open as the preserve=False route is still broken.

@justvanrossum
Copy link
Collaborator

Also: the issue of the Format attribute of OTL subtables has not yet been addressed. I propose to delete all lines

del self.Format # Don't need this anymore

in otTables.py. I can work on a pull request.

@behdad
Copy link
Member Author

behdad commented Mar 21, 2016

A pull request to removw the del format change lgtm, so feel free to merge.

I really want to expand on the design choices we make on this, but I've been busy and then on vacation. Will try to get my perspective in soon but definitely don't want to block anyone.

justvanrossum added a commit that referenced this issue Mar 21, 2016
…the compiler but is useful for debugging. Part of fixing #92.
@justvanrossum
Copy link
Collaborator

Thanks, committed as ad386ee. I was a little surprised how little tests needed to be fixed (just one), I hope I didn't overlook anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants