Matrices, arrays, environments #246
@gagern Thank you for this pull request. It looks like you have some lint errors. Please have a look at those. You should also write some automated tests as well as image comparison tests. Please have a look at the /dockers/Screenshotter folder for instructions. Also, you'll need to sign a CLA. See https://github.com/Khan/KaTeX/blob/master/CONTRIBUTING.md#cla for more details. All of these things are required before merging. We would like the output of KaTeX to match TeX as much as possible. Here's a link to the TeXBook: http://www.ctex.org/documents/shredder/src/texbook.pdf. @xymostech should be able to answer your question about whether to output tables column-wise vs. row-wise. As for knowing about the height and/or width of its boxes, it should have enough information to calculate both because we have glyph metrics for everything.
@xymostech and I came to the same conclusion as you: that each column should be a vlist.
function array() { }
array.prototype = new EnvironmentBase();
array.prototype.numArgs = 1;
array.prototype.end = function() {
Putting props on the prototype is going to mess up nested environments. These should be member variables that are defined in the constructors.
The properties in question are read-only, and even if some instance were to assign to them, those assignments would be local to the instance, since the values are immutable. I thought about having them as properties of the constructor instead of the prototype, but elements of the instance allow for two similar environments to share one implementation even if their number of arguments differs. Have I convinced you that this use here is all right?
I read through this a little too quickly on the first take. I see that you're creating new EnvironmentBase() objects so this should be okay.
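The point made in this exchange can be checked with a minimal sketch. It assumes stand-in definitions of `EnvironmentBase` and `array` (not the PR's actual classes) and only shows that assigning to an instance shadows a primitive value stored on the prototype instead of changing it for other instances:

```javascript
// Hypothetical stand-ins for the classes discussed above; not the PR's code.
function EnvironmentBase() { }
EnvironmentBase.prototype.numArgs = 0;

function array() { }
array.prototype = new EnvironmentBase();
array.prototype.numArgs = 1; // shared, read-only default for every instance

var outer = new array();
var inner = new array();

// Assigning on an instance creates an own property that shadows the
// prototype value...
inner.numArgs = 2;

// ...so the prototype and all other instances are unaffected.
console.log(outer.numArgs);           // 1
console.log(inner.numArgs);           // 2
console.log(array.prototype.numArgs); // 1
```

This only holds for immutable values such as numbers. A mutable object or array stored on the prototype would still be shared between nested environments if it were modified in place.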
I wrote some parser test cases for environment parsing. A more specific test case for array doesn't fit in well with what's currently in Screenshot tests seem a bit premature. Since I know that I'll probably be modifying the layout, I'd rather postpone that till shortly before merging. Regarding the CLA, I'll have to check with my university.
env.lexer = this.lexer;
env.mode = mode;
env.beginNode = begin;
env.args = [];
If all of these things are needed by an environment, why not just pass them as arguments to the constructor. That way people won't forget to set these values.
Not needed by most environments, but offered to them. The most important of these are `args`, `argPositions` and of course `body`. These can't be passed to the constructor. But even if they could, I'd rather not do this. There are potentially many environments, but there is only one place where instances of these are created. So having that code just in this one place avoids code duplication. Particularly since environments, the way I implemented them, don't call a base class constructor for common functionality.
@gagern I think it's a good start for the katex-specs. Can you add tests for nested matrices as well as embedded fractions using The nice thing about the screenshot tests is that if you change things in the tree but it still renders the same then it still passes. Only create screenshot tests when you're happy with the layout, although I wouldn't mind seeing a couple screenshots here in this thread.
@gagern I'm going to do a more thorough review this weekend. In the meantime, if you have time, focus on the layout.
Up till today I had assumed that delimiter parentheses would scale smoothly between different sizes. Now I know that they grow in increments, both in LaTeX and in KaTeX. However, the points at which they grow seem to be very different: This is simply Apart from this, the spacing of my arrays should be pretty close to what LaTeX does now, since I've put in some effort to reproduce those algorithms. But if delimiters are going to change (and I believe that we should be switching to bigger delimiters a bit earlier), then I'd rather see that handled before creating screenshots for automated testing.
I can't work out why the delimiters are smaller than they are in LaTeX. The computation from Rule 19, with
I don't have that manual, and I don't know what file this tfm documentation could be. Perhaps either of these contains some more details on how extensible characters should be composed from their parts? Or perhaps the font we use doesn't exactly match what LaTeX uses? Right now I don't know how to proceed.
Please have a look at xymostech@422c77b, in particular buildHTML.js and environment.js. The reason why we're not merging that branch in is because we discovered a bad interaction with infix operators such as That's unfortunate that the TeXBook isn't more elucidating in this situation. There's a reference in the commit to page 153 of "TeX for the Impatient" that may be more helpful. Here's a link to that book (hopefully it's the same version): http://www.gnu.org/software/teximpatient/.
I just saw your screenshots (I should've scrolled up sooner). We definitely want to match the delimiter sizes.
My code won't work with infix operators either (so far). I haven't yet decided how to tackle this. I could either have Page 153 of TeX for the Impatient describes things like
@gagern I cloned your repo as well as the other repo and have the two branches in separate working folders so I can compare them and see if I can help figure out the cause of the layout differences.
The correct rendering is on the right. On the left, the vlist is too high and there's not enough space between the 1 and the 2. As for the actual positions, using the default font size the vlist child containing the Another interesting thing that probably doesn't matter is that the order in which vlist children appear in the DOM is reversed.
This is how LaTeX sets this
So each line has a height of 8.39996pt+3.60004pt=12pt, with zero space between them. And with ptPerEm=10 this makes 12pt=1.2em=0.61em+0.59em. So I'd say that my rendering is more correct here. It also agrees with what LaTeX does: I'm not worried about the internal spacing of my array box, I have much confidence in that. What I am worried about is the delimiters, since as you can see in my previous screenshot, they don't agree with what LaTeX does. And the individual parts of the delimiters are not represented in the box structure either, so we can't compare at that level. What we can see is that the size of the delimiters, 0.39998+23.60025=24.00023, is slightly bigger than the total height of the vbox, 14.5+9.5=24.0. This leads to 14.5001+9.50012=24.00022 as the height of the whole box with delimiters. If anyone can explain where one of these two bigger sums comes from, then we're making progress here.
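For readers who want to re-check the arithmetic in this comment, here is a small sketch. The numbers are the ones quoted above. The helper `ptToEm` is a hypothetical name, and `ptPerEm = 10` is the value stated in the comment:

```javascript
var ptPerEm = 10; // value quoted in the comment above

// Hypothetical helper: convert a TeX length in pt to em.
function ptToEm(pt) {
    return pt / ptPerEm;
}

// Height plus depth of one line, as reported by LaTeX.
var lineHeightPt = 8.39996 + 3.60004;
console.log(ptToEm(lineHeightPt).toFixed(4)); // "1.2000"
console.log((0.61 + 0.59).toFixed(4));        // "1.2000"

// The delimiters are slightly taller than the vbox they enclose.
var delimiterPt = 0.39998 + 23.60025;
var vboxPt = 14.5 + 9.5;
console.log((delimiterPt - vboxPt).toFixed(5)); // "0.00023"
```

The results are rounded with `toFixed` because sums of decimal fractions are not exact in floating point.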
I asked for help on the TeX Stack Exchange. Perhaps someone there knows where these numbers come from.
@gagern I'm surprised that LaTeX's rendering isn't centered. I just assumed that it should be centered. I was able to modify your branch to reproduce @xymostech's layout. Getting the delimiters to be the right size is accomplished by increasing the height and depth of the span returned by It's interesting that LaTeX overlaps the Here are a couple of screenshots with the increased delimiter size (1.52ex, and 2.17ex):
Now that I think of it, perhaps the difference is simply due to the way the next suitable delimiter size gets chosen. But it's not that easy. With
with 19.15+14.14998=33.29998 outer and inner size but 0.39998+29.60031=30.00029 delimiter size. So the assumption “the delimiter is no smaller than the content” is incorrect. And I was wrong: one can see the parts of a big delimiter, the examples so far just weren't big enough. With
i.e. a paren split in two parts. It takes 3.1ex till KaTeX uses two parts there. At that point, it reports delimiters with a size of 1.155+0.64502=1.80002em, which is a good match for the 0.39998+17.60019=18.00017pt in LaTeX. So it knows how big its delimiters are. But what about the content? Still with 3.1ex,
Found my mistake. Now I'll only have to get docker up and running, and then I'll be able to take some snapshots. But not today.
I pulled the As for the issue of infix operators, I think maybe
Excellent!
Do we need an argument list for this, or can we hard-wire this? Is there ever a situation where we want
The mode and environment dictate which tokens to break on, e.g. when inside Hard coding this inside
When we are inside
Come to think of it, we probably need this behavior in any case. When
I tried hardcoding it, but there are a few errors that appear in the tests. Here's the code that I used:
Here are the resulting errors which I haven't had time to look into:
@gagern please note, these failures are for the code without your changes. I'm on my personal computer and I don't have a copy of your repo, but I assume that the failures would be similar.
Using I suppose another approach would be to modify
I've let
@gagern I agree with a lot of what you're saying here. It might be good to revisit some of these design decisions. Could you open a separate issue so that we can continue this discussion there? Everything looks good to go. We'd like to have a clean history. Can you squash this into a single commit? Thanks so much for this pull request.
var lex = this.lexer.lex(pos, mode);
if (breakOnToken != null && lex.text === breakOnToken) {
lex = this.lexer.lex(pos, mode);
if (["}", "\\end", "\\right", "&", "\\\\", "\\cr"]
Could you move this list out into a constant somewhere?
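One way this request could be met is sketched below. The token list is copied from the diff above, while the names `END_OF_EXPRESSION_TOKENS` and `endsExpression` are hypothetical and not taken from the PR:

```javascript
// Hypothetical constant name; the token list is copied from the diff above.
var END_OF_EXPRESSION_TOKENS = ["}", "\\end", "\\right", "&", "\\\\", "\\cr"];

// Hypothetical helper: returns true if `text` always terminates an
// expression, or if it matches the optional caller-supplied `breakOnToken`.
function endsExpression(text, breakOnToken) {
    if (END_OF_EXPRESSION_TOKENS.indexOf(text) !== -1) {
        return true;
    }
    return breakOnToken != null && text === breakOnToken;
}

console.log(endsExpression("&"));      // true
console.log(endsExpression("]"));      // false
console.log(endsExpression("]", "]")); // true
```

Keeping the list in one named constant means the set of hard-wired break tokens is declared once and not rebuilt on every call.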
Hi! Thank you so much for this! Everything looks great! I'm just going through and making some small javascript nits, but nothing major. Thanks!
@@ -231,28 +231,24 @@ var makeStackedDelim = function(delim, heightTotal, center, options, mode) {
var repeatHeightTotal = repeatMetrics.height + repeatMetrics.depth;
var bottomMetrics = getMetrics(bottom, font);
var bottomHeightTotal = bottomMetrics.height + bottomMetrics.depth;
var middleMetrics, middleHeightTotal;
var middleMetrics = 0, middleHeightTotal = 0, middleFactor = 1;
`middleMetrics` isn't a number, it's an object, so it probably shouldn't start out initialized to `0`. Also, when we initialize and declare variables at the same time, we like to put them on separate `var` lines, so
var middleMetrics;
var middleHeightTotal = 0;
...
I squashed everything into one commit, except the improvements to the stacked delimiters, since these are actually a separate thing and not directly linked to all the rest. I had worked on that since I had assumed the reason for the spacing discrepancies was somewhere in there, but in the end it was somewhere else. Hope that's OK for you.
This commit introduces environments, and implements the parser infrastructure to handle them, even including arguments after the “\begin{name}” construct. It also offers a way to turn array-like data structures, i.e. delimited by “&” and “\\”, into nested arrays of groups. Environments are essentially functions which call back to the parser to parse their body. It is their responsibility to stop at the next “\end”, while the parser takes care of verifying that the names match between “\begin” and “\end”. The environment has to return a ParseResult, to provide the position that goes with the resulting node.
One application of this is the “array” environment. So far, it supports column alignment, but no column separators, and no multi-column shorthands using “*{…}”. Building on the same infrastructure, there are “matrix”, “pmatrix”, “bmatrix”, “vmatrix” and “Vmatrix” environments. Internally these are just “\left..\right” wrapped around an array with no margins at its ends. Spacing for arrays and matrices was derived from the LaTeX sources, and comments indicate the appropriate references.
Now we have hard-wired breaks in parseExpression, to always break on “}”, “\end”, “\right”, “&”, “\\” and “\cr”. This means that these symbols are never PART of an expression, at least not without some nesting. They may follow AFTER an expression, and the caller of parseExpression should be expecting them. The implicit groups for sizing or styling don't care what ended the expression, which is all right for them. We still have support for breakOnToken, but now it is only used for “]” since that MAY be used to terminate an optional argument, but otherwise it's an ordinary symbol.
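The idea of turning “&”- and “\\”-delimited content into nested arrays can be illustrated with a simplified sketch. It works on a flat list of token strings, not on the parser's real groups, and it ignores nesting inside braces. The function name `splitIntoRows` is hypothetical:

```javascript
// Hypothetical helper: `tokens` is a flat list of token strings, standing in
// for the groups the real parser would produce.
function splitIntoRows(tokens) {
    var rows = [];
    var row = [];
    var cell = [];
    for (var i = 0; i < tokens.length; i++) {
        var tok = tokens[i];
        if (tok === "&") {
            // Column separator: close the current cell.
            row.push(cell);
            cell = [];
        } else if (tok === "\\\\" || tok === "\\cr") {
            // Row separator: close the current cell and the current row.
            row.push(cell);
            rows.push(row);
            row = [];
            cell = [];
        } else {
            cell.push(tok);
        }
    }
    // Close whatever is left after the last separator. A trailing row
    // separator therefore yields one extra row holding a single empty cell.
    row.push(cell);
    rows.push(row);
    return rows;
}

var body = ["a", "&", "b", "\\\\", "c", "&", "d"];
console.log(JSON.stringify(splitIntoRows(body)));
// [[["a"],["b"]],[["c"],["d"]]]
```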
Using a loop to determine the number of symbols we need is intuitive but hardly efficient. A Math.ceil in this situation is much better.
After that ceil or the loop that it replaced, the total height should be equal to the minimal height plus an integral number times the height of the repeat symbol. That integer equals (half) the number of loop iterations in the original code, and the repeatCount variable in my new code. So later on, the quotient (repeatHeight / repeatHeightTotal) should be an integer, at least up to numeric errors. Applying ceil instead of round to that is asking for these numeric errors to seriously break stuff. Just reusing the repeatCount is much simpler, shorter and more elegant.
Having distinct topSymbolCount and bottomSymbolCount seems pointless, since the only reason why these could ever be different is due to the fact that bottomRepeatHeight was computed using topHeightTotal, which looks like a bug. The old loop and new ceil assume a symmetric repeatCount.
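The loop-versus-ceil argument can be sketched as follows. This is a simplified model and not the code from the commit: the function names are hypothetical, and `middleFactor` is assumed to be 1 for delimiters without a middle piece and 2 when repeat symbols are added symmetrically above and below a middle piece.

```javascript
// Loop version (simplified): keep adding repeat symbols until the stacked
// delimiter is at least as tall as requested.
function repeatCountByLoop(heightTotal, minHeight, repeatHeightTotal,
                           middleFactor) {
    var count = 0;
    var total = minHeight;
    while (total < heightTotal) {
        total += middleFactor * repeatHeightTotal;
        count++;
    }
    return count;
}

// Closed form: the same count from a single division.
function repeatCountByCeil(heightTotal, minHeight, repeatHeightTotal,
                           middleFactor) {
    return Math.max(0, Math.ceil(
        (heightTotal - minHeight) / (middleFactor * repeatHeightTotal)));
}

// The resulting height is the minimal height plus an integral number of
// repeat symbols, so repeatCount can be reused directly later on.
function realHeightTotal(minHeight, repeatCount, repeatHeightTotal,
                         middleFactor) {
    return minHeight + repeatCount * middleFactor * repeatHeightTotal;
}

var count = repeatCountByCeil(3.1, 1, 0.5, 1);
console.log(count);                             // 5
console.log(repeatCountByLoop(3.1, 1, 0.5, 1)); // 5
console.log(realHeightTotal(1, count, 0.5, 1)); // 3.5
```

Reusing the integer count avoids recomputing it from a floating-point quotient, where a tiny rounding error followed by `Math.ceil` could add a spurious extra symbol.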
@gagern Thank you for addressing all of @xymostech's concerns. I think it makes sense to separate the commits like you've done. LGTM.
+1 on keeping the stacked delimiters stuff separate. Thanks for cleaning that code up!
This is a first step towards support for matrices, and it addresses issues raised in
- `\begin`…`\end` environments
- `\begin{array}` (with argument) in particular
- `matrix`, `bmatrix`…
In particular, I've implemented parser support for environments, made environments available for `array`, `matrix`, `pmatrix`, `bmatrix`, `vmatrix`, `Vmatrix`, and added a proof-of-concept renderer for these.
This branch is really just a first step, one day of work put into this. This pull request is not meant as “please pull this now” but rather as a platform where future development can be discussed and coordinated, to result in a pull eventually.
My branch doesn't make much effort to do spacing the same way TeX does, mainly because I don't own a copy of Knuth's TeX Book. It doesn't work well with small font sizes. For really large numbers of rows, the delimiters appear too small in my opinion. But before I continue on these details, I'd like to know whether others have already put in some effort here. Perhaps someone has all the spacing calculations already worked out, and was just waiting for parser support to hook this up? I'd also like to know whether the way I tackled the problem, extending the parser in particular, is going in the right direction.
The way I see it, KaTeX has some knowledge of how high its boxes are, both above and below the baseline, but none about how wide they are. Is this correct? Seeing this, I decided that the most feasible way to implement HTML output for matrices would be column-wise: each column is a vlist, with items positioned according to the overall height and depth of the corresponding matrix row. Perhaps this will even allow matrices to line-wrap in certain environments. I've got some doubts whether the same approach would be reasonable for `align*` and similar environments. Should we stick to this approach, or should aligned equations and/or matrices be encoded to a `<table>` instead?
I can't make any guarantees how far I'll be able to follow this up. I'm doing this for a project at TU München, and once it satisfies the requirements we have there, continued development would likely have to be in my spare time, which is a rare commodity. How feature-complete does the thing have to be before it can get merged into the master?
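The column-wise layout described above can be sketched under a simplified, hypothetical data model in which each cell only carries a `height` and a `depth` in em. The sketch leaves out struts, row stretching and inter-row spacing, and the function names are not from the PR:

```javascript
// Hypothetical data model: body[r][c] = {height: ..., depth: ...} in em.
// Each row is as tall and as deep as its tallest and deepest cell.
function rowMetrics(body) {
    return body.map(function(row) {
        var height = 0;
        var depth = 0;
        row.forEach(function(cell) {
            height = Math.max(height, cell.height);
            depth = Math.max(depth, cell.depth);
        });
        return {height: height, depth: depth};
    });
}

// Distance from the top of the array to the baseline of each row. Every
// column's vlist places its items at these same offsets, which keeps the
// rows aligned across columns without knowing any widths.
function baselineOffsets(rows) {
    var offsets = [];
    var y = 0;
    rows.forEach(function(row) {
        y += row.height;
        offsets.push(y);
        y += row.depth;
    });
    return offsets;
}

var body = [
    [{height: 0.75, depth: 0.25}, {height: 0.5, depth: 0.5}],
    [{height: 1, depth: 0}, {height: 0.5, depth: 0.25}]
];
console.log(baselineOffsets(rowMetrics(body))); // [ 0.75, 2.25 ]
```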