Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve fragile macro support (textcase.sty) #932

Closed
Grief opened this issue Jan 17, 2018 · 16 comments

Comments

@Grief
Copy link

@Grief Grief commented Jan 17, 2018

When textcase.sty is included with --includestyles option, conversion fails with the following errors:

Error:expected:{ Expected <general text> here
	at textcase.sty; line 49 col 59
	Next token is T_OTHER[1]
	In Core::Definition::Expandable[\upperca... from TeX.pool.ltxml line 2097
	 <= Core::Gullet[@0x55e35ac94910] <= Core::Definition::Primitive[\edef Ski... <= Core::Stomach[@0x55e35a917868] <= ...

Fatal:misdefined:\MakeTextUppercase Expansion of '\MakeTextUppercase' has unbalanced {}
	at textcase.sty; line 49 col 59
	Expansion is \begingroup\def\i{I}\def\j{J}\def\({$}\let\)\(\def#1{\NoCaseChange{#1}}
	\def\label#1{\label{#1}}\def\ref#1{\ref{#1}}\def\protect\@ensuremath#1{\protect\@ensuremath{#1}}\def\@@cite[cite]{[\@@bibref{Refnum}{#}{}{}]}1#{\toks@{\cite#1}\@citex}\def\@citex#1{{#1}}\def\reserved@a#1#2{\let#1#2\reserved@a}\reserved@a\oe\OE\o\O\ae\AE\dh\DH\dj\DJ\l\L\ng\NG\ss\SS\th\TH\reserved@b{\reserved@b\let\@@protect\protect\let\protect\@unexpandable@protect\afterassignment\let\protect\@@protect\edef\reserved@a{\endgroup\@skipmath#1$\valign$}\reserved@a
	In Core::Definition::Primitive[\edef Ski... from TeX.pool.ltxml line 1142
	 <= Core::Stomach[@0x55e35a917868] <= Core::Gullet[@0x55e35ac94910] <= Core::Definition::Constructor[\Requir... <= Core::Stomach[@0x55e35a917868] <= Core::Gullet[@0x55e35ac94910] <= Core::Definition::Constructor[\docume... <= Core::Stomach[@0x55e35a917868]
1 error; 1 fatal error
@dginev dginev added this to the LaTeXML-0.8.4 milestone Jan 17, 2018
@dginev

This comment has been minimized.

Copy link
Collaborator

@dginev dginev commented Jan 17, 2018

Should I assume this is a request for us to add support for textcase.sty?

A quick look at the error you've pasted may suggest that the \uppercase macro binding in TeX.pool isn't playing well with the raw latex definitions

@Grief

This comment has been minimized.

Copy link
Author

@Grief Grief commented Jan 17, 2018

I might be wrong but as far as I understand, --includestyles does some parsing of additional .cls and .sty files and applies definitions from these files the same as if they were ltxml bindings. At least some of styles and classes work that way. I think that there are tons of different classes and styles in the world and adding all of them via bindings is too huge task to implement. For me, it would be ok if these styles and classes won't lead to fatal errors at least. Even if the macro won't work (I'd be glad if it is) it should provide some output at least.

@dginev

This comment has been minimized.

Copy link
Collaborator

@dginev dginev commented Jan 17, 2018

Yes, we are completely on the same page here. The viable way to cover as big a subsection of LaTeX as possible would be to improve the native raw TeX coverage to the point we can ingest all latex packages and classes without any issues - or as you say at least without errors.

For now we also add an explicit .ltxml bindings to LaTeX packages we have tested, where the binding itself loads the TeX directly, so that we mark them as "supported" in a more obvious way. See for example t1enc.sty.ltxml

So I am still leaning to interpret this issue as "improve latexml so that textcase.sty works"

@Grief

This comment has been minimized.

Copy link
Author

@Grief Grief commented Jan 17, 2018

"improve latexml so that textcase.sty works" is fine. I'd prefer to see it works without the binding but if it's not possible then ok, bound solution would be good enough too. Sorry, I'm into neither TeX nor Perl so LaTeXML is mostly just a magic black box for me. What can I add is that we have more than a thousand custom classes and styles (including some hard legacy specimens) most of them doesn't work automatically which is very very sad. I tried to implement some binding but for me it appears to be too hard. During my frustration period I found the class which is most actively used in the documents I have and raised the current issue against the problem with it because I cannot do much here. You did a lot of work guys, but a huge amount of work still remains. Hope you'll not abandon the development. Cheers!

@dginev

This comment has been minimized.

Copy link
Collaborator

@dginev dginev commented Jan 17, 2018

We are not abandoning anything, to the contrary! 👍

Feel free to share other hard to interpret cases, we will try to add support as we can, very valuable to have isolated examples that fail, as the usual situation with latex packages is that you need to unravel a very substantial execution chain to find the exact breaking point (especially if we are talking about a badly implemented primitive in latexml).

And thanks for the details of your use case!

@dginev

This comment has been minimized.

Copy link
Collaborator

@dginev dginev commented Feb 26, 2018

Making some progress in diagnozing the issue here, this is actually a very well-contained package for debugging.

Just wanted to remark that a binding will likely be a lot more reliable than using the native implementation, because it is quite beastly. As one example that made me think of Rube Goldberg machines:

The $\valign$ is just a fake math expression used to terminate the parsing done by @skipmath.

textcase.pdf docs

I am looking for a generic interpretation fix though.

@dginev

This comment has been minimized.

Copy link
Collaborator

@dginev dginev commented Feb 26, 2018

There are two sticking points here after debugging:

  1. The GeneralText parameter type is too restrictive for this use of uppercase and lowercase. When wrapped around, the braces are often not readily available, and moreover single token invocations do work, such as \uppercase s. Let's say I can fix this by making the parameter type a bit more permissive.

  2. More mysteriously, there is a use of edef I can't quite understand.

\protected@edef\MakeTextUppercase#1{\MakeTextUppercase{#1}}

pdflatex is more than happy to interpret this, and go ahead, but latexml tries to expand the inner #1 and crashes (again in the GenericText parameter type). I need to read up on this type of edef use, but if anyone has tips, they are most welcome.

Edit: It is not a generic deficiency, I think this is specific to this macro.
Edit 2: It's really related to the recursive invocations via \reserved@a and \reserved@b that are terminated by \@gobble, and I'm trying to carefully figure out the details without drowning now.

@dginev

This comment has been minimized.

Copy link
Collaborator

@dginev dginev commented Feb 26, 2018

Some more diagnostic progress, there seems to be an interesting expansion bug, in arguably the most complex line of the textcase style:

      \expandafter\reserved@a\@uclclist\reserved@b{\reserved@b\@gobble}%

In particular, and I have verified this, there is an active invocation of \@gobble on the closing brace as its first argument. This is quite peculiar, and has to be due to the expansion machinery specific to these recursive unwindings, as an outright direct \@gobble{}} invocation returns the obvious error right away.

Will inspect further.

@brucemiller

This comment has been minimized.

Copy link
Owner

@brucemiller brucemiller commented Feb 26, 2018

@dginev

This comment has been minimized.

Copy link
Collaborator

@dginev dginev commented Feb 26, 2018

Thanks for the tip! Reading into it some more, it seems in latexml we currently have \protect to always map to \relax, while the SO thread you linked to claims the definition tends to vary with execution context.

That said, I added a few \show\protect calls to the .sty file and reran my test with pdflatex, and it prints \relax each time. Maybe there is something more mischievous going on, but the basic protect-related infrastructure seems to be matching the latex definitions. I will carefully follow the execution chain, see where it diverges.

@brucemiller

This comment has been minimized.

Copy link
Owner

@brucemiller brucemiller commented Feb 26, 2018

Actually, \protected@edef redefines \protect, so within the expansion of the definition, it should be a more protected \protect. Was just rereading the stackexchange page, and the referenced one about moving arguments. I'm trying to figure out how to create a test for LaTeXML that would correctly distinguis between fragile, robust, moving, etc. Clearly Expand() does the expansion without the execution (until later).

@dginev

This comment has been minimized.

Copy link
Collaborator

@dginev dginev commented Feb 26, 2018

It's quite remarkable the amount of rope we need to do nothing, but keep the rest of the boilerplate unharmed.

I can mostly follow what you're looking into, but it sounds like we need a machinery upgrade rather than a simple fix, which is what I was hoping for when I took a sneak-peek here. Good to have it all hashed out!

@dginev dginev changed the title textcase.sty is unusable with --includestyles Improve fragile macro support (textcase.sty) Apr 14, 2018
@dginev

This comment has been minimized.

Copy link
Collaborator

@dginev dginev commented Apr 14, 2018

This was a very illuminating demonstration of the concept, leaving the link here - also nice for test cases:
https://tex.stackexchange.com/a/4747/78008

@dginev

This comment has been minimized.

Copy link
Collaborator

@dginev dginev commented Apr 14, 2018

I ended up reading a bit more into this, since it is such a minimal and elegant issue. I think that LaTeXML may have most of the relevant latex.ltx machinery already available in the latex.pool file, and the big omission is really the declaration of robust macros. Namely, we have:

# Need to figure out exactly what `robust' means to LaTeXML...
DefPrimitive('\DeclareRobustCommand OptionalMatch:* DefToken [Number][]{}', sub {
    my ($stomach, $star, $cs, $nargs, $opt, $body) = @_;
    DefMacroI($cs, convertLaTeXArgs($nargs, $opt), $body); });

which needs to be upgraded to a version that is closer to the native definition, and specifically a version that uses the \protect macro as part of the inner expansion. The raw version from latex.ltx boils down to:

\def\declare@robustcommand#1{%
   \ifx#1\@undefined\else\ifx#1\relax\else
      \@latex@info{Redefining \string#1}%
   \fi\fi
   \edef\reserved@a{\string#1}%
   \def\reserved@b{#1}%
   \edef\reserved@b{\expandafter\strip@prefix\meaning\reserved@b}%
   \edef#1{%
      \ifx\reserved@a\reserved@b
         \noexpand\x@protect
         \noexpand#1%
      \fi
      \noexpand\protect
      \expandafter\noexpand\csname
         \expandafter\@gobble\string#1 \endcsname
   }%
   \let\@ifdefinable\@rc@ifdefinable
   \expandafter\new@command\csname
      \expandafter\@gobble\string#1 \endcsname
}
@dginev

This comment has been minimized.

Copy link
Collaborator

@dginev dginev commented Apr 26, 2018

I did some more investigating into the raw interpretation of etoolbox.sty (per #478 ), and I think the protected declarations also play a part in the definitions there not getting correctly interpreted by latexml.

So I am tempted to claim that a proper solution here would be a prerequisite to enable etoolbox.

@brucemiller

This comment has been minimized.

Copy link
Owner

@brucemiller brucemiller commented Apr 26, 2018

Not to spoil your fun, but it ends up boiling down to two issues. The first is doing \DeclareRobustCommand, correctly. What's hidden away in that TeX code above is that it first defines a macro whose cs is the requested one plus a space! Then it defines the cs to call the weird one prepended with `\protect'. Actually kind of simple, once you see it.

But it turns out that \uppercase and '\lowercase`, although they read and "return" tokens, are not in fact expandable; they're primitives. And that affects when they get processed as well.

With those two patched, we can interpret textcase; I went ahead and made a binding anyway.

Aside: it's probably true that these fixes would help interpreting etoolbox, but there are some tools in there that would still need changes/support from LaTeXML; for example, adding various hooks to document and other environments.

@dginev dginev modified the milestones: LaTeXML-0.8.4, LaTeXML-0.8.3 May 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.