Advanced macro support and magic \dots #794

edemaine · 2017-08-13T23:17:18Z

This is a re-implementation of #599 (\dots support) but extending the MacroExpander as necessary to build it as a macro instead of a function, following @gagern's outline.

Along the way, we get some nice new macro features:

Macros (e.g. given by defineMacro) can now be JavaScript functions, with this bound to the MacroExpander. (Note the slightly different interface from functions, which get a complex context object -- up for discussion!) This lets us extend the MacroExpander in more powerful ways, and use the following features:
future() returns the next token without expansion (similar to \futurelet but without the "let" aspect of assigning to a variable)
expandAfterFuture() returns the next token after one level of expansion (equivalent to a careful use of \expandafter and \futurelet)
\relax is now implemented! It stops expansion, but doesn't actually get returned from the MacroExpander.
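As a rough illustration of the features listed above, here is a toy model, not KaTeX's actual MacroExpander. It assumes tokens are plain strings and that expansions are arrays of tokens; the class name ToyExpander and the \plus macro are made up for this sketch.

```javascript
// Toy model only: tokens are strings, and the stack keeps the NEXT token
// at the end of the array.
class ToyExpander {
    constructor(tokens, macros) {
        this.stack = tokens.slice().reverse();
        this.macros = macros;
    }

    // Peek at the next token without expanding or removing it.
    future() {
        return this.stack[this.stack.length - 1];
    }

    // Expand the next token one level if it names a macro; otherwise
    // leave it on the stack and return it.
    expandOnce() {
        const top = this.stack.pop();
        if (top === undefined) {
            return undefined; // nothing left to expand
        }
        if (!Object.prototype.hasOwnProperty.call(this.macros, top)) {
            this.stack.push(top);
            return top;
        }
        let expansion = this.macros[top];
        if (typeof expansion === "function") {
            expansion = expansion.call(this); // `this` is the expander
        }
        // Push in reverse so the first expansion token ends up on top.
        this.stack.push(...expansion.slice().reverse());
        return expansion;
    }

    // One level of expansion, then peek.
    expandAfterFuture() {
        this.expandOnce();
        return this.future();
    }
}

// Hypothetical function macro: pick an ellipsis by looking ahead one token,
// after expanding that token once.
const macros = {
    "\\dots": function() {
        return this.expandAfterFuture() === "+" ? ["\\cdots"] : ["\\ldots"];
    },
    "\\plus": ["+"],
};

const demo = new ToyExpander(["\\dots", "\\plus"], macros);
demo.expandOnce();
console.log(demo.stack.slice().reverse().join(" ")); // \cdots +
```

Because the function macro runs with this bound to the expander, it can look past its own name: here \plus is expanded one level to + before the macro decides which ellipsis to produce.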

We also get fixes to old KaTeX bugs (with new tests to confirm, some of which used to fail):

\text{\textellipsis !} used to include a space in the middle. The trouble is that we were only eating spaces when in math mode, but in text mode we need to be more careful to eat the spaces that make up part of the command name. These spaces are now consumed during macro expansion, even though skipping them is not technically a macro expansion (\textellipsis is an unexpandable character).
Ditto for \text{\foo } where \foo is defined to something -- we got a space, but shouldn't have.
Macros did not handle command arguments correctly. \def\foo#1{(#1)}\foo\bar should expand to (\bar) and then expand \bar, whereas the old macro expander would first expand \bar and then use its tokens for arguments. For example, this didn't work if \def\bar{} or \def\bar{ } (especially the latter because of bad space ignoring). Also, if \foo is a multiargument function, \bar should constitute one argument instead of one per token. (I did a bunch of testing in LaTeX to confirm this behavior.)
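The argument rule described above can be sketched with a toy expander. This is an illustration under simplifying assumptions, not the code in this PR: tokens are plain strings, a macro is a hypothetical {numArgs, body} object, and the input is assumed to be well formed.

```javascript
// Toy model: body tokens "#1", "#2", ... are argument placeholders.
function readArg(tokens) {
    // Skip spaces before an undelimited argument, as TeX does.
    while (tokens[0] === " ") {
        tokens.shift();
    }
    const first = tokens.shift();
    if (first === undefined) {
        throw new Error("missing macro argument");
    }
    if (first !== "{") {
        return [first]; // a single UNEXPANDED token, e.g. \bar
    }
    const arg = [];
    let depth = 1;
    while (tokens.length > 0) {
        const tok = tokens.shift();
        if (tok === "{") {
            depth++;
        }
        if (tok === "}" && --depth === 0) {
            break; // closing brace of the argument itself
        }
        arg.push(tok);
    }
    return arg;
}

function expandAll(input, macros) {
    const tokens = input.slice();
    const output = [];
    while (tokens.length > 0) {
        const tok = tokens.shift();
        if (!Object.prototype.hasOwnProperty.call(macros, tok)) {
            output.push(tok);
            continue;
        }
        const macro = macros[tok];
        // Collect arguments first, without expanding them.
        const args = [];
        for (let i = 0; i < macro.numArgs; i++) {
            args.push(readArg(tokens));
        }
        // Substitute, then put the result back for further expansion.
        const body = [];
        for (const b of macro.body) {
            if (/^#[1-9]$/.test(b)) {
                body.push(...args[Number(b[1]) - 1]);
            } else {
                body.push(b);
            }
        }
        tokens.unshift(...body);
    }
    return output;
}

const demoMacros = {
    "\\foo": {numArgs: 1, body: ["(", "#1", ")"]},
    "\\pair": {numArgs: 2, body: ["#1", "-", "#2"]},
    "\\empty": {numArgs: 0, body: []},
    "\\ab": {numArgs: 0, body: ["a", "b"]},
};

console.log(expandAll(["\\foo", "\\empty"], demoMacros).join("")); // ()
console.log(expandAll(["\\pair", "\\ab", "x"], demoMacros).join("")); // ab-x
```

In the second call \ab counts as one argument of the hypothetical \pair even though it later expands to two tokens, which is the multi-argument behavior described above.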

A few other miscellaneous changes:

Lexer.js optionally exports Token, so that I can do instanceof tests.
consumeSpaces() irreversibly consumes space tokens. This is used to fix the first bug listed above.
nextToken() got split into expandOnce() and expandNextToken(). So, e.g., expandAfterFuture() is equivalent to expandOnce() and then future(). (This makes for some awkward diffs -- sorry!)
The cdots symbol is now \cdots@, because amsmath defines a more complex \cdots in terms of this symbol.
@ is now considered a valid character for commands, as if \makeatletter is in effect. I don't think we currently define \@ but if we did this change would be annoying.
A rewrite of #599 (Implement AMSMath's \dots) to handle \dots, \cdots, etc. correctly.

Here's a texcmp confirming I got at least the tested cases right:

kevinbarabash · 2017-08-14T01:11:17Z

@edemaine this is awesome. I don't have a very good understanding of macros so hopefully @gagern and others can help out with the review.

kohler · 2017-08-14T14:02:01Z

src/Lexer.js

@@ -75,14 +75,14 @@ const tokenRegex = new RegExp(
    "([ \r\n\t]+)|" +                                 // whitespace
    "([!-\\[\\]-\u2027\u202A-\uD7FF\uF900-\uFFFF]" +  // single codepoint
    "|[\uD800-\uDBFF][\uDC00-\uDFFF]" +               // surrogate pair
-    "|\\\\(?:[a-zA-Z]+|[^\uD800-\uDFFF])" +           // function name
+    "|\\\\(?:[a-zA-Z@]+|[^\uD800-\uDFFF])" +          // function name


This is going to be dangerous. @ is not actually a valid character for macro names in default TeX. It only works after \makeatletter, and is intended for internal macros that users don't type.

Overall it is probably safe to include @ by default, or wait for complaints before trying a complete fix; the \@ macro is rare in math.

Good to have at least one negative reaction -- I was curious what others would think. I see three alternatives:

Add a Lexer option for whether to include @ as a letter. Then we could actually support \makeatletter and \makeatother (presumably this would need to be carried in the Settings object...).

Have the cdots macro return manually parsed Tokens instead of strings. Then we don't need to touch the Lexer.

Rename \@cdots to something else like \cdotsINTERNAL or \latexcdots.

Thoughts/preferences?

Incidentally, the regex change won't affect @{...} support in tabular (the only use I know of @ as an active char), nor will it affect the most typical usage of \@, namely, \@. (not that KaTeX currently supports \@). What it would break is if you used \@ immediately followed by a letter, which ... I can't imagine doing. So the other option is to leave this change as is for now.

That is where I came down and for a similar reason: keep @ as is in this commit.
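To make the trade-off in this thread concrete, the sketch below isolates just the command-name alternative of the tokenizer regex from the diff, before and after the change, and checks the cases mentioned above. The helper firstCommand is made up for the illustration.

```javascript
// Only the control-sequence alternative of the regex, old and new.
const oldCommand = /\\(?:[a-zA-Z]+|[^\uD800-\uDFFF])/;
const newCommand = /\\(?:[a-zA-Z@]+|[^\uD800-\uDFFF])/;

// Return the first command matched in the input, or null if none.
function firstCommand(regex, input) {
    const match = regex.exec(input);
    return match ? match[0] : null;
}

// An internal name containing @ is only one token with the new regex.
console.log(firstCommand(oldCommand, "\\cdots@ x")); // \cdots
console.log(firstCommand(newCommand, "\\cdots@ x")); // \cdots@

// The typical "\@." usage is unaffected.
console.log(firstCommand(oldCommand, "\\@.")); // \@
console.log(firstCommand(newCommand, "\\@.")); // \@

// The one behavior change: \@ immediately followed by a letter.
console.log(firstCommand(oldCommand, "\\@x")); // \@
console.log(firstCommand(newCommand, "\\@x")); // \@x
```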

kohler · 2017-08-14T14:03:51Z

src/macros.js

+// \let\DOTSX\relax
+defineMacro("\\DOTSI", "\\relax");
+defineMacro("\\DOTSB", "\\relax");
+defineMacro("\\DOTSX", "\\relax");


amsmath defines these "null macros" for other macros to indicate the behavior of \dots preceding them. Indeed, I should have added use of these macros to \iff, \implies, and \impliedby. See new commit.

kevinbarabash · 2017-08-22T11:42:18Z

I'm going to try to get this reviewed over the next couple of days.

edemaine · 2017-08-22T12:44:36Z

Thanks @kevinbarabash! I just rebased to fix a conflict with the CJK tweak on \dots commands.

kevinbarabash · 2017-08-22T11:07:32Z

test/katex-spec.js

+
+    it("should consume spaces after macro", function() {
+        compareParseTree("\\text{\\foo }", "\\text{x}", {"\\foo": "x"});
+    });


Nice tests. I like how these call out differences in behavior between math mode and text mode.

kevinbarabash · 2017-08-22T11:09:22Z

test/katex-spec.js

+        compareParseTree("\\text{\\foo 1 2}", "\\text{12end}", {"\\foo": "#1#2end"});
+        compareParseTree("\\text{\\foo {1} {2}}", "\\text{12end}", {"\\foo": "#1#2end"});
+    });
+
    it("should allow for multiple expansion", function() {
        compareParseTree("1\\foo2", "1aa2", {
            "\\foo": "\\bar\\bar",
            "\\bar": "a",
        });
    });


Not necessary for this PR, but it might be good to have some tests that verify things about multiple expansions involving arguments.

For fun, I added such a test in 56d8900

kevinbarabash · 2017-08-24T04:24:41Z

src/MacroExpander.js

-     * Recursively expand first token, then return first non-expandable token.
+     * Return the next unexpanded token without removing anything from the
+     * stack.  Similar in behavior to TeX's `\futurelet`.
+     */


I find the term unexpanded token a little confusing, because it's possible that future gets called sometime after expandOnce gets called in which case there may be tokens on the stack that were the result of expanding a token. Maybe stating this in the following way:

Returns the topmost token on the stack, without expanding it.

Nice rewrite. Implemented.

kevinbarabash · 2017-08-24T04:34:01Z

src/MacroExpander.js

+
+    /**
+     * Expand next token only once, and leave it on the stack.
+     * Returns the token or its expansion.


This comment could be a bit clearer, maybe:

Expand the next token only once if possible. If the token is expanded, the resulting tokens will be pushed onto the stack in reverse order and will be returned as an array, also in reverse order. If not, the next token will be returned without removing it from the stack. As a result, as long as there are tokens on the stack, the next token is at the top of the stack.

Feel free to wordsmith this more.

Revised accordingly.

kevinbarabash · 2017-08-24T04:43:36Z

src/MacroExpander.js

+    expandOnce() {
+        const topToken = this.popToken();
+        const name = topToken.text;
+        const macro = (name.charAt(0) === "\\");


Rename macro => isMacro to indicate more clearly that it's a boolean.

kevinbarabash · 2017-08-24T04:48:39Z

src/MacroExpander.js

+        let expansion = this.macros[name];
+        if (typeof expansion === "function") {
+            expansion = expansion.call(this);
+        }


If expansion is a function then it must always return a string with valid TeX code which can contain argument placeholders, is that correct?

Technically, it could also return an Array that's a reverse-ordered list of tokens. This was functionality from before that I'm preserving, though I'm not sure we'd want to use it exactly that way. (In particular, the reverse-order aspect is weird.) But it is a nice way for a macro to be able to force strange parsing.
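The two return shapes discussed here (a string of TeX, or an already reverse-ordered array of tokens) could be normalized along the following lines. This is only a sketch: tokenize is a deliberately simplified stand-in for the real Lexer, and resolveExpansion is a hypothetical helper name, not a function in this PR.

```javascript
// Simplified tokenizer: control words, control symbols, single characters.
function tokenize(tex) {
    return tex.match(/\\[a-zA-Z@]+|\\[^]|[^]/g) || [];
}

// Turn a macro definition into tokens in stack order (next token last).
function resolveExpansion(expander, definition) {
    let expansion = definition;
    if (typeof expansion === "function") {
        // Function macros run with `this` bound to the expander.
        expansion = expansion.call(expander);
    }
    if (typeof expansion === "string") {
        // Strings are in reading order, so reverse them for the stack.
        return tokenize(expansion).reverse();
    }
    // Arrays are assumed to be reverse-ordered already.
    return expansion;
}

const fakeExpander = {name: "stand-in for a MacroExpander"};

console.log(resolveExpansion(fakeExpander, "\\ldots,"));
// [ ',', '\\ldots' ]
console.log(resolveExpansion(fakeExpander, function() {
    return ["b", "a"]; // reverse order: "a" is the next token
}));
// [ 'b', 'a' ]
```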

kevinbarabash · 2017-08-24T05:03:20Z

src/MacroExpander.js

+        if (!(macro && this.macros.hasOwnProperty(name))) {
+            // Fully expanded
+            this.stack.push(topToken);
+            return topToken;


This is a little weird to pop topToken at the start just to push it back on. I can't think of a demonstrably better way to do this though.

I agree, unfortunately on both counts. If I didn't have to consumeSpaces, I could use future() and then popToken after the fully expanded case. Maybe there's another place to put the consumeSpaces... but at least this works.

kevinbarabash · 2017-08-24T05:07:55Z

src/MacroExpander.js

+    /**
+     * Expand the next token once (ignoring initial spaces like `get`),
+     * without removing anything from the stack, and return the top token
+     * on the stack.  Similar in behavior to TeX's `\expandafter\futurelet`.


How does it ignore initial spaces if it calls expandOnce right away which only ignores trailing spaces?

That comment was out-of-date. Updated.

kevinbarabash · 2017-08-24T05:09:22Z

src/MacroExpander.js

+            // expandOnce returns Token if and only if it's fully expanded.
+            if (expanded instanceof Token) {
+                // \relax stops the expansion, but shouldn't get returned (a
+                // null return value couldn't get implemented as a function).


I'm not sure what a null return value couldn't get implemented as a function means. Could you give an example?

Functions (as defined by defineFunction) return a ParseNode or an object that gets converted into a ParseNode by the parser. It's not possible for a function to return null that would just disappear in the final parse tree. Hence it would be impossible to implement a \relax that expands to nothing (which is different to how {} expands).

So if we return null the parser would think that it's an object and convert it to a ParseNode. I wonder when we would ever want to write a function that returns null?

Long-term, I vaguely hope we can transition functions over to macros, as that's what TeX really does. So maybe not too critical... though certainly could be done by tweaking group parsing.
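A toy version of the loop under discussion may help show why \relax is easy to handle at the macro level. The sketch assumes string tokens, array expansions in reading order, and a made-up "EOF" marker for the end of input; the class name RelaxDemo is hypothetical.

```javascript
class RelaxDemo {
    constructor(tokens, macros) {
        this.stack = tokens.slice().reverse(); // next token is last
        this.macros = macros;
    }

    // Returns a string if the next token is fully expanded (and leaves it
    // on the stack); otherwise expands one level and returns the array.
    expandOnce() {
        if (this.stack.length === 0) {
            this.stack.push("EOF");
        }
        const top = this.stack.pop();
        if (!Object.prototype.hasOwnProperty.call(this.macros, top)) {
            this.stack.push(top);
            return top;
        }
        const expansion = this.macros[top];
        this.stack.push(...expansion.slice().reverse());
        return expansion;
    }

    // Keep expanding until an unexpandable token surfaces. \relax is
    // unexpandable, so it ends expansion, but it is dropped, not returned.
    expandNextToken() {
        for (;;) {
            const expanded = this.expandOnce();
            if (typeof expanded === "string") {
                this.stack.pop();
                if (expanded !== "\\relax") {
                    return expanded;
                }
            }
        }
    }
}

const relaxDemo = new RelaxDemo(["\\DOTSB", "a"], {"\\DOTSB": ["\\relax"]});
console.log(relaxDemo.expandNextToken()); // a
console.log(relaxDemo.expandNextToken()); // EOF
```

The caller never sees \relax, so nothing has to represent "no node" in the parse tree.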

kevinbarabash · 2017-08-24T05:13:05Z

src/macros.js

@@ -3,6 +3,9 @@
 * This can be used to define some commands in terms of others.
 */

+import symbols from "./symbols";
+import utils from "./utils";
+
 // This function might one day accept additional argument and do more things.
 function defineMacro(name, body) {


Once we get flow in place we'll be able to type this as:

function defineMacro(name: string, body: string | () => string) {

edemaine · 2017-09-04T20:38:27Z

Sorry for the delay. Finally went through and implemented all the comments. Thanks for the review!!

Incidentally, this PR also fixes a bug in master with \text{\v x}. Currently, because of incorrect space handling, this renders as an accent (over a space) followed by an x. With the PR, it renders correctly (same as \text{\v{x}} does now).
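The space handling behind this fix can be sketched as follows. It is a simplified illustration, assuming string tokens on a stack whose last element is the next token; popCommand and controlWord are names invented for the sketch, and only consumeSpaces corresponds to a method named in this PR.

```javascript
// A control word is a backslash followed by letters (or @, per this PR).
const controlWord = /^\\[a-zA-Z@]+$/;

// Irreversibly drop space tokens from the top of the stack.
function consumeSpaces(stack) {
    while (stack.length > 0 && stack[stack.length - 1] === " ") {
        stack.pop();
    }
}

// Pop the next token; if it is a control word, the spaces after it are
// part of the command name's termination and are eaten too.
function popCommand(stack) {
    const top = stack.pop();
    if (top !== undefined && controlWord.test(top)) {
        consumeSpaces(stack);
    }
    return top;
}

// Contents of \text{\v x}: the accent should apply to x, not to a space.
const accentStack = ["\\v", " ", "x"].reverse();
popCommand(accentStack);
console.log(accentStack[accentStack.length - 1]); // x

// A control symbol such as \, does not swallow the following space.
const symbolStack = ["\\,", " ", "x"].reverse();
popCommand(symbolStack);
console.log(JSON.stringify(symbolStack[symbolStack.length - 1])); // " "
```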

marcianx · 2017-09-04T23:13:32Z

src/macros.js

 // This function might one day accept additional argument and do more things.
-function defineMacro(name, body) {
+function defineMacro(name: string, body: string | () => string) {


Cool! Mind adding // @flow at the top of this file?

Oops, indeed! Clearly still a flow noob...

Me too, for all that's worth. ;)

edemaine mentioned this pull request Aug 13, 2017

Implement AMSMath's \dots #599

Closed

kohler reviewed Aug 14, 2017

edemaine added 3 commits August 22, 2017 08:43

Advanced macro support and magic \dots

47c4cbf

Fix \relax behavior

6da7ced

Use \DOTSB in \iff, \implies, \impliedby

f49263d

edemaine force-pushed the macrofun branch from 122dfb4 to f49263d Compare August 22, 2017 12:44

kevinbarabash reviewed Aug 24, 2017

edemaine added 6 commits September 4, 2017 13:06

Add multiple expansion test

56d8900

Implement some of @kevinbarash's comments

11432c7

More @kevinbarabash comments

8de17b2

Merge remote-tracking branch 'upstream/master' into macrofun

de1f64b

Token moved from merge

7aebcf9

Add type to defineMacro

ca640ee

edemaine force-pushed the macrofun branch from 72f5171 to ca640ee Compare September 4, 2017 20:33

This was referenced Sep 4, 2017

To @flow: Token, Lexer, ParseError, and ParseNode. #839

Merged

Port MacroExpander to @flow #842

Closed

edemaine assigned kevinbarabash Sep 4, 2017

marcianx reviewed Sep 4, 2017

@flow

9063856

kevinbarabash approved these changes Sep 5, 2017

kevinbarabash merged commit 6857689 into KaTeX:master Sep 5, 2017

ronkok mentioned this pull request Sep 6, 2017

mhchem extension support #50

Closed

kevinbarabash mentioned this pull request Sep 15, 2017

Support \dots #528

Closed

This was referenced Jan 19, 2018

Made up LaTeX2e/AMS Mathjax aliases/symbols #130

Closed

Support Reaction Arrows #1078

Merged
