Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Matrices, arrays, environments #246

Merged
merged 2 commits into from Jun 18, 2015

Conversation

@gagern
Copy link
Collaborator

gagern commented Jun 12, 2015

This is a first step towards support for matrices, and it addresses issues raised in

  • #206 – support for \begin\end environments
  • #47\begin{array} (with argument) in particular
  • #43 – various matrix flavours like matrix, bmatrix

In particular, I've implemented parser support for environments, made environments available for array, matrix, pmatrix, bmatrix, vmatrix, Vmatrix, and added a proof-of-concept renderer for these.

This branch is really just a first step, one day of work put into this. This pull request is not meant as “please pull this now” but rather as a platform where future development can be discussed and coordinated, to result in a pull eventually.

My branch doesn't make much effort to do spacing the same way TeX does, mainly because I don't own a copy of Knuth's TeX Book. It doesn't work well with small font sizes. For really large numbers of rows, the delimiters appear too small in my opinion. But before I continue on these details, I'd like to know whether others have already put in some effort here. Perhaps someone has all the spacing calculations already worked out, and was just waiting for parser support to hook this up? I'd also like to know whether the way I tackled the problem, extending the parser in particular, is going in the right direction.

The way I see it, KaTeX has some knowledge of how high its boxes are, both above and below the baseline, but none about how wide they are. Is this correct? Seeing this, I decided that the most feasible way to implement HTML output for matrices would be column-wise: each column is a vlist, with items positioned according to the overall height and depth of the corresponding matrix row. Perhaps this will even allow matrices to line-wrap in certain environments. I've got some doubts whether the same approach would be reasonable for align* and similar environments. Should we stick to this approach, or should aligned equations and/or matrices be encoded to a <table> instead?

I can't make any guarantees how far I'll be able to follow this up. I'm doing this for a project at TU München, and once it satisfies the requirements we have there, continued development would likely have to be in my spare time, which is a rare commodity. How feature-complete does the thing have to be before it can get merged into the master?

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 12, 2015

@gagern Thank you for this pull request. It looks like you have some lint errors. Please have a look at those. You should also write some automated tests as well as image comparison tests. Please have a look the /dockers/Screenshotter folder for instructions. Also, you'll need to sign a CLA. See https://github.com/Khan/KaTeX/blob/master/CONTRIBUTING.md#cla for more details. All of these things are required before merging.

We would like the output of KaTeX to match TeX as much as possible. Here's a link to the TeXBook: http://www.ctex.org/documents/shredder/src/texbook.pdf.

@xymostech should be able to answer your question about whether to output tables column-wise vs. row-wise. As for knowing about the height and/or width of its boxes, it should have enough information to calculate both because we have glpyh metrics for everything.

@sophiebits

This comment has been minimized.

Copy link
Contributor

sophiebits commented Jun 12, 2015

@xymostech and I came to the same conclusion as you: that each column should be a vlist.

function array() { }
array.prototype = new EnvironmentBase();
array.prototype.numArgs = 1;
array.prototype.end = function() {

This comment has been minimized.

Copy link
@kevinbarabash

kevinbarabash Jun 12, 2015

Member

Putting props on the prototype is going to mess up nested environments. These should be member variables that are defined in the constructors.

This comment has been minimized.

Copy link
@gagern

gagern Jun 12, 2015

Author Collaborator

The properties in question are read-only, and even if some instance were to assign to them, those assignments would be local to the instance, since the values are immutable. I thought about having them as properties of the constructor instead of the prototype, but elements of the instance allow for two similar environments to share one implementation even if their number of arguments differs. Have I convinced you that this use here is all right?

This comment has been minimized.

Copy link
@kevinbarabash

kevinbarabash Jun 12, 2015

Member

I read through this a little to quickly on the first take. I see that you're creating new EnvironmentBase() objects so this should be okay.

* @param {ParseResult} body the overall parse result to be decomposed
* @param {Array} decomposed data in row major mode
*/
Parser.prototype.decomposeArray = function(mode, body) {

This comment has been minimized.

Copy link
@kevinbarabash

kevinbarabash Jun 12, 2015

Member

This is only used in environments, maybe it should go in that file.

This comment has been minimized.

Copy link
@gagern

gagern Jun 12, 2015

Author Collaborator

That is where I had them first, but it needs the ParseNode constructor, which I hadn't made part of Parser's prototype at that time. Since I later on needed ParseNode for the leftright group, and moving that to the parser felt even more wrong, I can move this decomposition to the environments. I guess it would be cleaner to export ParseNode without reference to the instance, preferably by moving it to a separate file.

This comment has been minimized.

Copy link
@kevinbarabash

kevinbarabash Jun 12, 2015

Member

I think it make sense to have ParseNode in a separate file.

@gagern

This comment has been minimized.

Copy link
Collaborator Author

gagern commented Jun 12, 2015

I wrote some parser test cases for environment parsing. A more specific test case for array doesn't fit in well with what's currently in katex-spec.js. I wonder whether it would be a good idea to test input string against resulting parse trees, saved in a big JSON list. Of course noone writes down the expected parse tree manually, but one may well verify it before saving the file. What do you think?

Screenshot tests seem a bit premature. Since I know that I'll probably be modifying the layout, I'd rather postpone that till shortly before merging. Regarding CLA I'll have to check with my university.

env.lexer = this.lexer;
env.mode = mode;
env.beginNode = begin;
env.args = [];

This comment has been minimized.

Copy link
@kevinbarabash

kevinbarabash Jun 12, 2015

Member

If all of these things are needed by an environment, why not just pass them as arguments to the constructor. That way people won't forget to set these values.

This comment has been minimized.

Copy link
@gagern

gagern Jun 12, 2015

Author Collaborator

Not needed by most environments, but offered to them. The most important of these are args, argPositions and of course body. These can't be passed to the constructor. But even if they could, I'd rather not do this. There are potentially many environments, but there is only one place where instances of these are created. So having that code just in this one place avoids code duplication. Particularly since environments, the way I implemented them, don't call a base class constructor for common functionality.

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 12, 2015

@gagern I think it's a good start for the katex-specs. Can you add tests for nested matrices as well as embedded fractions using \over within the matrices? Parse tree comparisons are a bit finicky because if something changes then it's a lot of work to redo them. There's a few tests that look at some of the trees. I don't think anything more comprehensive is worth the effort.

The nice thing about the screenshot tests is that if you change things in the tree but it still renders the same then it still passes. Only create screenshot tests when you're happy with the layout, although I wouldn't mind seeing a couple screenshots here in this thread.

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 13, 2015

@gagern I'm going to do a more thorough review this weekend. In the meantime, if you have time, focus on the layout.

@gagern

This comment has been minimized.

Copy link
Collaborator Author

gagern commented Jun 13, 2015

Up till today I had assumed that delimiter parentheses would scale smoothly between different sizes. Now I know that they grow in increments, both in LaTeX and in KaTeX. However, the points at which they grow seem to be very different:

Screenshot comparing KaTeX with LaTeX

This is simply $$\begin{pmatrix}1\\[1.52ex]2\end{pmatrix}$$ rendered but with different values plugged into that extra row space argument. How much effort should we put into delimiters to get the sizes just as LaTeX uses them?

Apart from this, the spacing of my arrays should be pretty close to what LaTeX does now, since I've put in some effort to reproduce those algorithms. But if delimiters are going to change (and I believe that we should be switching to bigger delimiters a bit earlier), then I'd rather see that handled before creating screenshots for automated testing.

@gagern

This comment has been minimized.

Copy link
Collaborator Author

gagern commented Jun 13, 2015

I can't work out why the delimiters are smaller than they are in LaTeX. The computation from Rule 19, with \delimiterfactor=901 and \delimitershortfall=5pt is reproduced in makeLeftRightDelim. The \left\right documentation on page 292 doesn't seem to add any space either. (Note: page 359 lists more delimiters than KaTeX supports.) Page 442 suggests

See the METAFONT manual or the system documentation of tfm files for further information about successors and extensible characters.

I don't have that manual, and I don't know what file this tfm documentation could be. Perhaps either of these contains some more details on how extensible characters should be composed from their parts? Or perhaps the font we use doesn't exactly match what LaTeX uses? Right now I don't know how to proceed.

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 13, 2015

Please have a look at xymostech@422c77b, in particular buildHTML.js and environment.js. The reason why we're not merging that branch in is because we discovered a bad interaction with infix operators such as \over and @xymostech hasn't had time to work on it.

That's unfortunate that the TeXBook isn't more elucidating in this situation. There's a reference in the commit to page 153 of "TeX for the Impatient" that may be more helpful. Here's a link that book (hopefully it's the same version): http://www.gnu.org/software/teximpatient/.

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 13, 2015

I just saw your screenshots (I should've scrolled up sooner). We definitely want to match the delimiter sizes.

@gagern

This comment has been minimized.

Copy link
Collaborator Author

gagern commented Jun 13, 2015

My code won't work with infix operators either (so far). I haven't yet decided how to tackle this. I could either have parseImplicitGroup return null for every &, \\ or \cr, and have some outer function to collect the implicit groups of the alignment. Or I could try to parse the list after the fact, including handling for the alignment commands in handleInfixNodes. The latter looks closer to what I already have, but I fear that other stuff is also affected by the implicit grouping, e.g. style commands. \small a & b must get parsed as {\small a} & b not \small {a & b}. So I tend towards the former.

Page 153 of TeX for the Impatient describes things like \lineskiplimit. But as far as I can tell, that only affects text mode vboxes. In $$\begin{matrix}\dfrac88\\\dfrac88\end{matrix}$$ the two matrix elements will always overlap, no even if \lineskiplimit and \lineskip are set to 100pt each. So I'd say that was a bit too impatient here.

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 14, 2015

@gagern I cloned your repo as well as the other repo and have the two branches in separate working folders so I can compare them and see if I can help figure out the cause of the layout differences.

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 14, 2015

screen shot 2015-06-13 at 8 27 17 pm

The correct rendering is on the right. On the left, the vlist is too high and there's not enough space between the 1 and the 2. As for actually positions, using the default font size the vlist child containing the 1 should be at top: -0.52778em; as opposed to top: -0.61em; and the 2 should be at top: 0.67222em; instead of top: 0.59em;.

Another interesting thing that probably doesn't matter is that the order that vlist children appear in the DOM is reverse.

@gagern

This comment has been minimized.

Copy link
Collaborator Author

gagern commented Jun 14, 2015

There is how LaTeX sets this \begin{pmatrix}1\\2\end{pmatrix}. Display style or not makes no difference.

.\mathon
.\hbox(14.5001+9.50012)x19.7223
..\hbox(0.39998+23.60025)x7.36115, shifted -14.10013
...\OMX/cmex/m/n/10 ^^R
..\glue -5.0
..\vbox(14.5+9.5)x15.00002
...\hbox(8.39996+3.60004)x15.00002
....\glue(\tabskip) 0.0
....\hbox(8.39996+3.60004)x15.00002
.....\rule(8.39996+3.60004)x0.0
.....\glue 5.0
.....\glue 0.0 plus 1.0fil
.....\mathon
.....\OT1/cmr/m/n/10 1
.....\mathoff
.....\mathon
.....\hbox(0.0+0.0)x0.0
.....\mathoff
.....\glue 0.0 plus 1.0fil
.....\glue 5.0
....\glue(\tabskip) 0.0
...\glue(\lineskip) 0.0
...\hbox(8.39996+3.60004)x15.00002
....\glue(\tabskip) 0.0
....\hbox(8.39996+3.60004)x15.00002
.....\rule(8.39996+3.60004)x0.0
.....\glue 5.0
.....\glue 0.0 plus 1.0fil
.....\mathon
.....\OT1/cmr/m/n/10 2
.....\mathoff
.....\glue 0.0 plus 1.0fil
.....\glue 5.0
....\glue(\tabskip) 0.0
..\glue -5.0
..\hbox(0.39998+23.60025)x7.36115, shifted -14.10013
...\OMX/cmex/m/n/10 ^^S
.\mathoff

So each line has a height of 8.39996pt+3.60004pt=12pt, with zero space between them. And with ptPerEm=10 this makes 12pt=1.2em=0.61em+0.59em. So I'd say that my rendering is more correct here. It also agrees with what LaTeX does:

Comparison of three two-row vectors

I'm not worries about the internal spacing of my array box, I have much confidence in that. What I am worries about is the delimiters, since as you can see in my previous screenshot, they don't agree with what LaTeX does. And the individual parts of the delimiters are not represented in the box structure either, so we can't compare at that level.

What we can see is that the size of the delimiters, 0.39998+23.60025=24.00023, is slightly bigger than the total height of the vbox, 14.5+9.5=24.0. This leads to 14.5001+9.50012=24.00022 as the height of the whole box with delimiters. If anyone can explain where one of these two bigger sums comes from, then we're making progress here.

@gagern

This comment has been minimized.

Copy link
Collaborator Author

gagern commented Jun 14, 2015

I asked for help on the TeX Stack Exchange. Perhaps someone there knows where these numbers come from.

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 14, 2015

@gagern I'm surprised that LaTeX's rendering isn't centered. I just assumed that is should be centered.

I was able to modify your branch to reproduce @xymostech's layout. Getting the delimiters to be the right size is accomplished by increasing the height and depth of the span returned by buildHTML::array(array, group, options) by adding baselineSkip - 1.0 to each.

It's interesting that LaTeX overlaps the \dfrac{8}{8}s. Personally, I feel like increasing the gap would make for better rendering, but let's stick with replicating LaTeX's behavior. I'll make a branch with the contents centered and the other spacing (for posterity).

Here are a couple of screenshots with the increased delimiter size (1.52ex, and 2.17ex):
screen shot 2015-06-13 at 10 12 44 pm screen shot 2015-06-13 at 10 12 52 pm

@gagern

This comment has been minimized.

Copy link
Collaborator Author

gagern commented Jun 14, 2015

Now that I think of it, perhaps the difference is simply due to the way the next suitable delimiter size gets chosen. But it's not that easy. With $\begin{pmatrix}1\\[2.16ex]2\end{pmatrix}$ I get

.\hbox(19.15+14.14998)x20.83339
..\hbox(0.39998+29.60031)x7.91669, shifted -17.10016
...\OMX/cmex/m/n/10  
..\glue -5.0
..\vbox(19.15+14.14998)x15.00002

with 19.15+14.14998=33.29998 outer and inner size but 0.39998+29.60031=30.00029 delimiter size. So the assumption “the delimiter is no smaller than the content” is incorrect. And I was wrong: one can see the parts of a big delimiter, the examples so far just weren't big enough. With $\begin{pmatrix}1\\[2.17ex]2\end{pmatrix}$ LaTeX suddenly produces

.\hbox(20.50017+15.50017)x22.50005
..\vbox(0.39998+35.60036)x8.75002, shifted -20.10019
...\hbox(0.39998+17.60019)x8.75002
....\OMX/cmex/m/n/10 0
...\hbox(0.39998+17.60019)x8.75002
....\OMX/cmex/m/n/10 @
..\glue -5.0
..\vbox(19.17151+14.17151)x15.00002

i.e. a paren split in two parts. It takes 3.1ex till KaTeX uses two parts there. At that point, it reports delimiters with a size of 1.155+0.64502=1.80002em, which is a good match for the 0.39998+17.60019=18.00017pt in LaTeX. So it knows how big its delimiters are. But what about the content? Still with 3.1ex, makeLeftRightDelim sees 19.2249+12.5805pt while LaTeX reports
21.1736+16.1736 for the inner vbox. So the outside height of that vbox is wrong.

@gagern

This comment has been minimized.

Copy link
Collaborator Author

gagern commented Jun 14, 2015

Found my mistake. Now I'll only have to get docker up and running, and then I'll be able to take some snapshots. But not today.

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 14, 2015

I pulled the baselineSkip - 1.0 number from environments.js on xymostech's branch. Note: this should only be added for the matrix environment and not the array or align environment. We'd also like to support the align environment because it's used heavily at Khan Academy, but I wouldn't worry about it b/c there's enough other stuff going on in this pull request.

As for the issue of infix operators, I think maybe parseExpression could be modified to accept multiple tokens to break on. Also, it seems like instead of calling parseExpression after we see \begin{array} we should probably have a parseArray function so that it can recursively call parseExpression using & and \\ as tokens to break on as opposed to processing the array later in decomposeArray.

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 14, 2015

Found my mistake. Now I'll only have to get docker up and running, and then I'll be able to take some snapshots. But not today.

Excellent!

@gagern

This comment has been minimized.

Copy link
Collaborator Author

gagern commented Jun 14, 2015

I think maybe parseExpression could be modified to accept multiple tokens to break on.

Do we need an argument list for this, or can we hard-wire this? Is there ever a situation where we want parseExpression to continue past one of }, \right, \end, &, \\, \cr, \egroup? I can think of no use case where this would make sense, so I'd terminate on all of these, and the caller can then ensure that the current lexer token is the one (or among those) it expects.

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 14, 2015

The mode and environment dictate which tokens to break on, e.g. when inside \text we shouldn't break on & and when we're in "math" mode but not inside an "array" environment & should cause an error.

Hard coding this inside parseExpression() would require us to put logic to deal with a particular environment and/or mode inside parseExpression(). Changing breakOnToken to breakOnTokens gets us what we need with minimal modifications to the rest of the codebase.

@gagern

This comment has been minimized.

Copy link
Collaborator Author

gagern commented Jun 14, 2015

When we are inside \text, we break on all the things I mentioned, and report an error if it's not }. After all, a lone & or \end inside a \text is an error, and expect can be used to report that. So instead of letting parseExpression decide whether to break or error, I'd let it break in all cases and leave generating the error to the caller.

@gagern

This comment has been minimized.

Copy link
Collaborator Author

gagern commented Jun 14, 2015

Come to think of it, we probably need this behavior in any case. When parseImplicitGroup has to handle a size or style function, that implicit body has to end at & if we are inside a matrix, and has to end at \end if we are inside an environment, and so on. So if we don't want to pass the list of break symbols all the way down to the parseImplicitGroup call, then we have to make sure that we don't need such a list in the first place.

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 14, 2015

I tried hardcoding it, but there's a few errors that appear in the tests. Here's the code that I used:

Parser.prototype.parseExpression = function(pos, mode, breakOnInfix) {
    var body = [];
    while (true) {
        var lex = this.lexer.lex(pos, mode);
        if (["}", "]", "&", "\\right", "\\end"].indexOf(lex.text) !== -1) {
            break;
        }
        var atom = this.parseAtom(pos, mode);
        if (!atom) {
            break;
        }
        if (breakOnInfix && atom.result.type === "infix") {
            break;
        }
        body.push(atom.result);
        pos = atom.position;
    }
    return new ParseResult(this.handleInfixNodes(body, mode), pos);
};

Here are the resulting errors which I haven't had time to look into:

Failures:

  1) A close parser should not fail
    Message:
      ')]?!' failed parsing with error: KaTeX parse error: Expected 'EOF', got ']' at position 2: )]̲?!


    Stacktrace:
      Error: ')]?!' failed parsing with error: KaTeX parse error: Expected 'EOF', got ']' at position 2: )]̲?!
    at Object.<anonymous> (/Users/kevin/KaTeX/test/katex-spec.js:229:28)


  2) A close parser should build a list of closes
    Message:
      ')]?!' failed parsing with error: KaTeX parse error: Expected 'EOF', got ']' at position 2: )]̲?!


    Stacktrace:
      Error: ')]?!' failed parsing with error: KaTeX parse error: Expected 'EOF', got ']' at position 2: )]̲?!
    at getParsed (/Users/kevin/KaTeX/test/katex-spec.js:26:18)
    at Object.<anonymous> (/Users/kevin/KaTeX/test/katex-spec.js:233:21)


  3) A close parser should build a list of closes
    Message:
      ParseError: KaTeX parse error: Expected 'EOF', got ']' at position 2: )]̲?!


    Stacktrace:
      ParseError: KaTeX parse error: Expected 'EOF', got ']' at position 2: )]̲?!
  at new ParseError (/Users/kevin/KaTeX/src/ParseError.js:29:16)
  at Parser.expect (/Users/kevin/KaTeX/src/Parser.js:86:15)
  at Parser.parseInput (/Users/kevin/KaTeX/src/Parser.js:112:10)
  at Parser.parse (/Users/kevin/KaTeX/src/Parser.js:100:22)
  at parseTree (/Users/kevin/KaTeX/src/parseTree.js:14:19)
  at getParsed (/Users/kevin/KaTeX/test/katex-spec.js:28:12)
  at Object.<anonymous> (/Users/kevin/KaTeX/test/katex-spec.js:233:21)
  at Timer.listOnTimeout [as ontimeout] (timers.js:112:15)



  4) An optional argument parser should not fail
    Message:
      '\frac[1]{2}{3}' failed parsing with error: KaTeX parse error: Expected 'EOF', got ']' at position 8: \frac[1]̲{2}{3}


    Stacktrace:
      Error: '\frac[1]{2}{3}' failed parsing with error: KaTeX parse error: Expected 'EOF', got ']' at position 8: \frac[1]̲{2}{3}
    at Object.<anonymous> (/Users/kevin/KaTeX/test/katex-spec.js:1343:35)

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 14, 2015

@gagern please note, these failures are for the code without your changes. I'm on my personal computer and I don't have a copy of your repo, but I assume that the failures would be similar.

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 14, 2015

Using parseExpression might be more difficult that than I originally thought. The problem is that we we want each cell in the array to be parsed as a separate expression. These are delimited by & (and rows by \\). At the same time we want parseExpression to return when it it hits a \end. The only problem is that we can't tell the difference between ParseResults in the middle of the array and ParseResults at the end. In order to get parseExpression working correctly in this situation we'd have to have it return which token it's halting on.

I suppose another approach would be to modify handleInfix to handle the & and \\ when inside certain environments. It would be nice if we could call decomposeArray array first, then we could call handleInfix on each of the cells.

@gagern

This comment has been minimized.

Copy link
Collaborator Author

gagern commented Jun 15, 2015

@KevinB7

if (["}", "]", "&", "\\right", "\\end"].indexOf(lex.text) !== -1) {

] is a special case, since it's an ordinary symbol except when parsing an optional argument. So we should keep the breakOnToken approach for that, in addition to the hard-wired expression endings.

In order to get parseExpression working correctly in this situation we'd have to have it return which token it's halting on.

I've let parseExpression return the token it's halting on as a peek property. But that's just for the sake of performance. Theoretically we could lex again starting at the returned position to see what token comes next. I think tat on the whole we might be lexing a lot of stuff more than once. Might be worthwhile to investigate whether this is really the case, and if so decide on how to mitigate the issue. But that would be a separate ticket.

@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 18, 2015

@gagern I agree with a lot of what you're saying here. It might be good to revisit some of these design decisions. Could you open a separate issue so that we can continue this discussion there?

Everything looks good to go. We'd like to have a clean history. Can you squash this into a single commit?

Thanks so much for this pull request.

var lex = this.lexer.lex(pos, mode);
if (breakOnToken != null && lex.text === breakOnToken) {
lex = this.lexer.lex(pos, mode);
if (["}", "\\end", "\\right", "&", "\\\\", "\\cr"]

This comment has been minimized.

Copy link
@xymostech

xymostech Jun 18, 2015

Contributor

Could you move this list out into a constant somewhere?

.indexOf(lex.text) !== -1) {
break;
}
if (breakOnToken !== null && lex.text === breakOnToken) {

This comment has been minimized.

Copy link
@xymostech

xymostech Jun 18, 2015

Contributor

If you make this breakOnToken != null, then you can just make the last argument optional instead of having to pass null in everywhere.

@xymostech

This comment has been minimized.

Copy link
Contributor

xymostech commented Jun 18, 2015

Hi! Thank you so much for this! Everything looks great! I'm just going through and making some small javascript nits, but nothing major. Thanks!

@@ -231,28 +231,24 @@ var makeStackedDelim = function(delim, heightTotal, center, options, mode) {
var repeatHeightTotal = repeatMetrics.height + repeatMetrics.depth;
var bottomMetrics = getMetrics(bottom, font);
var bottomHeightTotal = bottomMetrics.height + bottomMetrics.depth;
var middleMetrics, middleHeightTotal;
var middleMetrics = 0, middleHeightTotal = 0, middleFactor = 1;

This comment has been minimized.

Copy link
@xymostech

xymostech Jun 18, 2015

Contributor

middleMetrics isn't a number, it's an object, so it probably shouldn't start out initialized to 0. Also, when we initialize and declare variables at the same time, we like to put them on separate var lines, so

var middleMetrics;
var middleHeightTotal = 0;
...
this.lexer, positions[1]);
}
var name = "";
for (i = 0; i < nameGroup.value.length; ++i) {

This comment has been minimized.

Copy link
@xymostech

xymostech Jun 18, 2015

Contributor

this should be var i


it("should parse a simple environment", function() {
expect("\\begin{matrix}a&b\\\\c&d\\end{matrix}").toParse();
});

This comment has been minimized.

Copy link
@xymostech

xymostech Jun 18, 2015

Contributor

Can you add a test that makes sure the \cr command also works?

@gagern gagern force-pushed the gagern:matrices branch from d7fc29c to 9e852a2 Jun 18, 2015
@gagern

This comment has been minimized.

Copy link
Collaborator Author

gagern commented Jun 18, 2015

I squashed everything into one commit, except the improvements to the stacked delimiters, since these are actually a separate thing and not directly linked to all the rest. I had worked on that since I had assumed the reason for the spacing discrepancies somewhere in there, but in the end it was somewhere else. Hope that's OK for you.

gagern added 2 commits Jun 12, 2015
This commit introduces environments, and implements the parser
infrastructure to handle them, even including arguments after the
“\begin{name}” construct.  It also offers a way to turn array-like data
structures, i.e. delimited by “&” and “\\”, into nested arrays of groups.
Environments are essentially functions which call back to the parser to
parse their body.  It is their responsibility to stop at the next “\end”,
while the parser takes care of verifing that the names match between
“\begin” and “\end”.  The environment has to return a ParseResult, to
provide the position that goes with the resulting node.

One application of this is the “array” environment.  So far, it supports
column alignment, but no column separators, and no multi-column shorthands
using “*{…}”.  Building on the same infrastructure, there are “matrix”,
“pmatrix”, “bmatrix”, “vmatrix” and “Vmatrix” environments.  Internally
these are just “\left..\right” wrapped around an array with no margins at
its ends.  Spacing for arrays and matrices was derived from the LaTeX
sources, and comments indicate the appropriate references.

Now we have hard-wired breaks in parseExpression, to always break on “}”,
“\end”, “\right”, “&”, “\\” and “\cr”.  This means that these symbols are
never PART of an expression, at least not without some nesting.  They may
follow AFTER an expression, and the caller of parseExpression should be
expecting them.  The implicit groups for sizing or styling don't care what
ended the expression, which is all right for them.  We still have support
for breakOnToken, but now it is only used for “]” since that MAY be used to
terminate an optional argument, but otherwise it's an ordinary symbol.
Using a loop to determine the number of symbols we need is intuitive but
hardly efficient.  A Math.ceil in this situation is much better.

After that ceil or the loop that it replaced, the total height should be
equal to the minimal height plus an integral number times the height of the
repeat symbol.  That integer equals (half) number of loop iterations in the
original code, and the repeatCount variable in my new code.  So later on,
the quotient (repeatHeight / repeatHeightTotal) should be an integer, at
least up to numeric errors.  Applying ceil instead of round to that is
asking for these numeric errors to seriously break stuff.  Just reusing the
repeatCount is much simpler, shorter and more elegant.

Having distinct topSymbolCount and bottomSymbolCount seems pointless, since
the only reason why these could ever be different is due to the fact that
bottomRepeatHeight was computed using topHeightTotal, which looks like a
bug.  The old loop and new ceil assume a symmetric repeatCount.
@gagern gagern force-pushed the gagern:matrices branch from 9e852a2 to 397dcb3 Jun 18, 2015
@kevinbarabash

This comment has been minimized.

Copy link
Member

kevinbarabash commented Jun 18, 2015

@gagern Thank you for addressing all of @xymostech's concerns. I think it makes sense to separate the commits like you've done. LGTM.

kevinbarabash added a commit that referenced this pull request Jun 18, 2015
Matrices, arrays, environments
@kevinbarabash kevinbarabash merged commit 32e8ffe into KaTeX:master Jun 18, 2015
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@xymostech

This comment has been minimized.

Copy link
Contributor

xymostech commented Jun 18, 2015

+1 on keeping the stacked delimiters stuff separate. Thanks for cleaning that code up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.