Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement strict mode (replacing unicodeTextInMathMode) #1278

Merged
merged 15 commits into from May 13, 2018

Conversation

edemaine
Copy link
Member

Add new "strict" setting (default value false) that can take a boolean (whether to throw an error or silently ignore), string ("ignore", "warn", or "error"), or a function possibly returning such a value. This enables a variety of ways of handling or ignoring transgressions from "true" LaTeX behavior, making KaTeX easier to use while still providing the ability for strict LaTeX adherance.

Resolve #1226, implementing that spec, for two existing transgressions from regular LaTeX:

  • src/functions/kern.js had some errors and warnings about use of (units in) math vs. text mode commands.
  • The former setting unicodeTextInMathMode (not in any released version) needed to be set to true to enable Unicode text symbols in math mode.

Now these are controlled by the strict setting. By default, KaTeX is now very permissive, but if desired, the user can request warnings or errors.

@codecov
Copy link

codecov bot commented Apr 30, 2018

Codecov Report

Merging #1278 into master will decrease coverage by 0.11%.
The diff coverage is 78.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1278      +/-   ##
==========================================
- Coverage   83.99%   83.87%   -0.12%     
==========================================
  Files          62       64       +2     
  Lines        4017     4081      +64     
  Branches      670      689      +19     
==========================================
+ Hits         3374     3423      +49     
- Misses        546      559      +13     
- Partials       97       99       +2
Impacted Files Coverage Δ
src/ParseNode.js 100% <ø> (ø) ⬆️
src/MacroExpander.js 92.39% <ø> (-0.25%) ⬇️
test/Warning.js 100% <100%> (ø)
src/Lexer.js 100% <100%> (ø) ⬆️
src/environments/array.js 43.63% <100%> (+0.77%) ⬆️
src/macros.js 85.09% <100%> (+0.09%) ⬆️
src/Parser.js 92.95% <100%> (+0.07%) ⬆️
src/functions/kern.js 70.58% <50%> (-10.67%) ⬇️
src/Settings.js 71.87% <64%> (-28.13%) ⬇️
test/setup.js 75% <75%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4801ab8...ca212c8. Read the comment docs.

Add new "strict" setting (default value false) that can take a boolean
(whether to throw an error or silently ignore), string ("ignore",
"warn", or "error"), or a function possibly returning such a value.
This enables a variety of ways of handling or ignoring transgressions
from "true" LaTeX behavior, making KaTeX easier to use while still
providing the ability for strict LaTeX adherance.

Resolve KaTeX#1226, implementing that spec, for two existing
transgressions from regular LaTeX:

* src/functions/kern.js had some errors and warnings about use of
  (units in) math vs. text mode commands.
* The former setting unicodeTextInMathMode (not in any released version)
  needed to be set to true to enable Unicode text symbols in math mode.

Now these are controlled by the strict setting.  By default, KaTeX is now
very permissive, but if desired, the user can request warnings or errors.
README.md Outdated
- `maxSize`: `number`. If non-zero, all user-specified sizes, e.g. in `\rule{500em}{500em}`, will be capped to `maxSize` ems. Otherwise, users can make elements and spaces arbitrarily large (the default behavior).
- `strict`: `boolean` or `string` or `function` (default: `false`). If `false` or `"ignore`", ignore features that make writing LaTeX convenient but are not actually supported by (Xe)LaTeX (similar to MathJax). If `true` or `"error"` (LaTeX compatibility mode), throw an error for any such transgressions. If `"warn"`, warn about such behavior via `console.warn`. Provide a custom function `handler(errorCode, errorMsg, token)` to default behavior depending on the type of transgression (summarized by the string code `errorCode` and detailed in `errorMsg`). A list of such features and their `errorCode`s:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't compatibility mode usually mean non-strict mode?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, "compatibility" maybe isn't the right term. Perhaps "adherence" or "faithfulness"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a rewording of this paragraph that in particular uses "faithfulness" instead of "compatibility".

Copy link
Member

@kevinbarabash kevinbarabash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for aggregating various settings into strict. Looking forward to getting this merged in so that we can continue to build out functionality while have a consistent way to provide a strict mode to users who need it.

src/Settings.js Outdated
* Report nonstrict (non-LaTeX-compatible) input.
* Can safely not be called if `this.strict` is false in JavaScript.
*/
nonstrict(errorCode: string, errorMsg: string, token?: Token | ParseNode) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. Maybe rename token to tokenOrParseNode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was matching ParseError's constructor. If I change one, I should probably change both... I could also see something like where or context being a good name.

"LaTeX-incompatible input and strict mode is set to 'error': " +
`${errorMsg} [${errorCode}]`, token);
} else if (strict === "warn") {
typeof console !== "undefined" && console.warn(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is console not defined?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of such an environment, but presumably they might exist (or the global console could be overwritten?). In any case, I'm following existing code from the KaTeX codebase, in particular:

        typeof console !== "undefined" && console.warn(
            "No character metrics for '" + value + "' in style '" +
                fontName + "'");

It might be good to define a katex.warn or something that can be overwritten by the user, but defaults to console.warn.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, I didn't realize we were doing that elsewhere.

if (query.strict.match(/^(1|t|y|e)/)) {
options.strict = "error";
} if (query.strict && query.strict.match(/^(w)/)) {
options.strict = "warn";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplicity of this.

src/Settings.js Outdated

import type { MacroMap } from "./macros";

export type StrictFunction =
(errorCode: string, errorMsg: string, token?: Token | ParseNode) =>
?(boolean | string);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing this level of customizability to users seems like overkill.

Copy link
Member Author

@edemaine edemaine May 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This level of customizability lets users allow some nonstrict behavior and not others. @ylemkimon at least seemed to think this would be useful, and it's not exactly hard to support. Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point for allowing a function is that it enables custom warning functionality, if console.warn isn't the desired warning output. For example, this would make our own testing easier, as jest doesn't really handle warnings well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's nice to have customizability as long as we have simpler options to set them. We may add lots of features to the non-strict mode, and strict option may become too generic.

src/Settings.js Outdated
export type SettingsOptions = {
displayMode?: boolean;
throwOnError?: boolean;
errorColor?: string;
macros?: MacroMap;
colorIsTextColor?: boolean;
unicodeTextInMathMode?: boolean;
strict?: boolean | string | StrictFunction;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's sufficient if strict?: boolean | "ignore" | "warn" | "error".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced string with "ignore" | "warn" | "error" but left StrictFunction for now.

@edemaine
Copy link
Member Author

edemaine commented May 8, 2018

If possible (in particular, if it doesn't slow things down much), I think it would be good to merge this before the next release, because it makes behavior more permissive than the current release. So far we haven't released with the unicodeTextInMathMode setting, which on master has a default value of false, which is less permissive than the current release.

In other words, if we release now, we might break some code. If we merge this first, we shouldn't, because of more permissive defaults. (Alternatively, we could change the default unicodeTextInMathMode to true.)

Also, this PR removes the unicodeTextInMathMode setting, so it'd be nice not to add a setting and then remove it in the next release, if we can avoid it.

@edemaine
Copy link
Member Author

edemaine commented May 8, 2018

By the way, we might also discuss whether the right default for strict is

  1. false/"ignore", as it is currently in this PR; or
  2. "warn", which feels a little "safer", so at least people learn about the idea of strictness and then can make a decision to leave it or set to true or false?

this.settings.nonstrict("unicodeTextInMathMode",
`Unicode text character "${text[0]}" used in math mode`,
nucleus);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about allowing all unicodes(non-ASCII) on non-strict mode? (#1217, allowAllSymbols)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about that, but planning to deal with it when we return to discussing #1217. allowAllSymbols feels like more of a hack, not to be encouraged -- the rendering will probably be poor in general, and it's really a sign that we need to add symbols to the fonts. So I was thinking of keeping that a separate option... but I'm open to suggestions. (And maybe it is worth discussing here, in case we want to tweak the strict functionality to deal with it...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's where the customizability should come in. Normal users would intend used symbols to be shown, but advanced users or who worries about the layout would want to know unsupported symbol has been used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I agree. I think this is more motivation to set the default strict value to "warn", so at least people get a sense of why the rendering is funny in these cases.

@edemaine
Copy link
Member Author

I've gone ahead and changed the default setting for strict to "warn".

I also added functionality to check for warnings in Jest, turning them into a special type of Error. These changes are also a first step toward #1126.

@edemaine
Copy link
Member Author

@ylemkimon Thanks for the review!
@kevinbarabash Are you OK with this being merged, with the following decisions?

  • Default strict setting is "warn"
  • Allow functions to be specified in strict argument

@edemaine
Copy link
Member Author

edemaine commented May 13, 2018

OK, I propose merging this first, then I can add nonstrict behavior to \\. @kevinbarabash , are you OK with leaving function support in?

@edemaine
Copy link
Member Author

The changes to make strict mode control \newline behavior are extensive, so I'm going to put them in a separate PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants