Symbol unicode replacement doesn’t work #243

flying-sheep · 2015-06-09T09:09:03Z

the trivial katex.renderToString('σ') throws ParseError: KaTeX parse error: Unexpected character: 'σ' at position 0: ̲σ


sophiebits · 2015-06-09T23:09:39Z

I thought we had a tracking issue for this already, but I guess not. Brief discussion on #73 and #59.

flying-sheep · 2015-06-10T07:46:16Z

well, i didn’t even try to understand this code, but wouldn’t it be possible to loop over symbols and create an index for all literals that maps back to the relevant part of the structure?

let unicodeSymbols = {}
for (let mode of ['math', 'text']) {
    unicodeSymbols[mode] = {}
    for (let macro in symbols[mode]) {
        let spec = symbols[mode][macro]
        if (spec.replace) {
            unicodeSymbols[mode][spec.replace] = { font: spec.font, group: spec.group }
        }
    }
}

and then, using this, we could search for the symbol before the end of the lexer tests:

if (input[pos] in unicodeSymbols.text || input[pos] in unicodeSymbols.math)
    return new Token(input[pos], ...)
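A minimal, self-contained sketch of that lexer check, assuming a reverse index shaped like `unicodeSymbols` above. The `Token` class, the `lexUnicodeSymbol` function and the two-entry table are hypothetical stand-ins, not KaTeX's actual lexer API.

```javascript
// Hypothetical stand-in for KaTeX's Token class: the matched text and its position.
class Token {
  constructor(text, pos) {
    this.text = text;
    this.pos = pos;
  }
}

// Tiny hypothetical reverse index (unicode character -> spec), per mode.
const unicodeSymbols = {
  math: {
    "\u03c3": { font: "main", group: "mathord" }, // σ
    "\u2264": { font: "main", group: "rel" },     // ≤
  },
  text: {},
};

// Lex one character at `pos`: emit a token if it is a known unicode symbol
// in either mode, otherwise fail like the error reported in this issue.
function lexUnicodeSymbol(input, pos) {
  if (pos >= input.length) return null; // end of input
  // codePointAt/fromCodePoint keep characters outside the BMP in one piece.
  const ch = String.fromCodePoint(input.codePointAt(pos));
  if (ch in unicodeSymbols.text || ch in unicodeSymbols.math) {
    return new Token(ch, pos);
  }
  throw new Error(`Unexpected character: '${ch}' at position ${pos}`);
}

console.log(lexUnicodeSymbol("\u03c3", 0).text); // σ
```

Checking both modes in the lexer keeps it mode-agnostic; deciding whether the symbol is valid in the current mode is left to the parser.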

and later we make sure that parseSymbol is called for those tokens (if it isn’t already) and extend the lines

} else if (symbols[mode][nucleus.text]) {
...
    new ParseNode(symbols[mode][nucleus.text].group, nucleus.text, mode),

to

let spec = null
...
} else if ((spec = symbols[mode][nucleus.text]) || (spec = unicodeSymbols[mode][nucleus.text])) {
...
    new ParseNode(spec.group, nucleus.text, mode),
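Put together, the index and the parser-side fallback could look like the sketch below. The `symbols` excerpt is hypothetical (only shaped like KaTeX's table: mode, then key, then a spec with `font`, `group` and an optional `replace`), and `lookupSymbol` is an illustrative helper, not an existing KaTeX function.

```javascript
// Hypothetical excerpt shaped like KaTeX's symbols table: mode -> key -> spec.
const symbols = {
  math: {
    "\\sigma": { font: "main", group: "mathord", replace: "\u03c3" }, // σ
    "\\leq": { font: "main", group: "rel", replace: "\u2264" },       // ≤
    "-": { font: "main", group: "bin", replace: "\u2212" },           // − (minus)
  },
  text: {},
};

// Reverse index as proposed above: replace character -> { font, group }.
const unicodeSymbols = {};
for (const mode of ["math", "text"]) {
  unicodeSymbols[mode] = {};
  for (const macro in symbols[mode]) {
    const spec = symbols[mode][macro];
    if (spec.replace) {
      unicodeSymbols[mode][spec.replace] = { font: spec.font, group: spec.group };
    }
  }
}

// Parser-side lookup: the existing table first, then the unicode index.
function lookupSymbol(mode, text) {
  return symbols[mode][text] || unicodeSymbols[mode][text] || null;
}

console.log(lookupSymbol("math", "\u03c3").group); // mathord
```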

as an alternative to unicodeSymbols, we could simply alias the elements in symbols to be keyed both by the macro (e.g. "\\sigma") and by their “.replace” (e.g. "σ"):

let symbols = {
    ...
    "\\sigma": { ... },
    "σ": { ...(same as above)... },
    ...
}
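Instead of writing the aliases by hand, they could be generated from the existing `replace` fields. A minimal sketch, assuming the table shape above; `addUnicodeAliases` is a hypothetical helper, and letting existing keys win is a design choice, not something settled in this thread.

```javascript
// Add an alias keyed by each entry's `replace` character, so the table is
// keyed by both the macro (e.g. "\\sigma") and the character (e.g. "σ").
// Existing keys win: an alias never overwrites a hand-written entry.
function addUnicodeAliases(symbols) {
  for (const mode of Object.keys(symbols)) {
    const table = symbols[mode];
    // Snapshot the keys first, because aliases are added while looping.
    for (const key of Object.keys(table)) {
      const spec = table[key];
      if (spec.replace && !(spec.replace in table)) {
        table[spec.replace] = spec; // same spec object, "(same as above)"
      }
    }
  }
  return symbols;
}

const table = addUnicodeAliases({
  math: { "\\sigma": { font: "main", group: "mathord", replace: "\u03c3" } },
  text: {},
});
console.log(table.math["\u03c3"] === table.math["\\sigma"]); // true
```

Sharing the spec object (rather than copying it) keeps the alias and the macro entry from drifting apart if one of them is edited later.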

sophiebits · 2015-06-10T17:32:51Z

It's not quite that simple because (I believe) the character used in the math fonts we use might not necessarily align with the character that's appropriate in the input string.

flying-sheep · 2015-06-10T22:08:14Z

you believe? how to be sure?

qbolec · 2015-06-13T10:20:34Z

I have problems with an existing codebase, which contains the following symbols:
'−'
'⟨'
'⋅'
'≤'
'≥'
'α'

kevinbarabash · 2015-06-13T17:51:27Z

@flying-sheep I like the idea of auto-generating symbols based on existing "replace" symbols and just adding them to the symbols dictionary. It doesn't cover every unicode char, but it's a good start. Would you be able to create a pull request that automatically adds entries to symbols based on existing entries? I think it's more maintainable than adding aliases manually, and people don't have to grab the unicode character when adding a new entry.

flying-sheep · 2015-06-13T20:13:24Z

you’ll first have to check @spicyj’s claim that it may not always be a bidirectional mapping

kevinbarabash · 2015-06-13T21:26:38Z

I created a jsfiddle that displays the glyphs for all of the "replace" symbols using the unicode character specified in symbols.js: https://jsfiddle.net/047yzexz/1/. It seems like most are the correct character, but a few are showing up as boxes which is probably because the default font doesn't support those.

flying-sheep · 2015-06-14T08:49:25Z

umm, that’s called textContent, not innerText…

also i added some names… is there some better unicodedata out there that has everything in JSON or so?

kevinbarabash · 2015-06-14T15:10:27Z

http://www.unicode.org/Public/UCD/latest/ucd/Index.txt should contain all of the names for every unicode entry. Did you want the data to verify those glyphs are being displayed as missing? I'm confident that everything will check out, but it's good to be sure.

flying-sheep · 2015-06-16T09:11:09Z

Did you want the data to verify those glyphs are being displayed as missing

yeah, to check @spicyj’s claim. and i’m already using that list.

flying-sheep · 2015-06-16T11:59:17Z

about @qbolec’s case: we have all those symbols, except “−”, which is the real mathematical minus, and is encoded in TeX as the “-” aka hyphen-minus, the thing everyone has on the keyboard.

kevinbarabash · 2015-06-16T13:36:37Z

@flying-sheep the hyphen is in the jsfiddle and maps correctly to the minus sign. In terms of verifying that everything maps correctly, including the missing glyphs, it might be easier to just add the code to do the mapping and then programmatically create KaTeX layouts for each glyph in that list, add them to a page, and check that each pair has identical symbols.

flying-sheep · 2015-06-16T14:30:12Z

seems that i overlooked it. great!

gagern · 2015-06-20T14:53:15Z

The tool I recently committed in gagern@5e127ba can be used to display KaTeX fonts in the browser, together with the corresponding rendering in system default fonts. As far as I could tell from skimming the lists, the symbols all match up except for symbols from the private use area. There are a few of these in the Size1 and Size4 fonts, apparently for horizontal braces or something like that. And the Typewriter font has typographic single quotation marks at \u07E2 and \u07e3, which is incorrect.

gagern · 2015-06-20T15:53:18Z

@spicyj wrote:

I thought we had a tracking issue for this already, but I guess not.

Were you perhaps referring to #16? That bug has no discussion to it, but it does have an assignee.

In trying to sort through the various unicode bugs, this one here seems to have the most momentum to it just now. I'm conducting a short survey, trying to see what else might be useful.

Apparently pull requests “Add unicode support” #59 and “unicode support (including ∑∏∐ ∫ and √)” #73 were closed by the contributor, perhaps due to lack of positive feedback? There the approach was to list unicode symbols with common characteristics (same font, same group type) in a string literal, which is an alternative approach to the auto-generation discussed above. I guess they might still be used to cross-check anything that's been done here, and identify contradictions to be investigated further.
“Incorporate existing symbol mappings” #49 suggests a tool which maps unicode to LaTeX. We might use that to check any results here for completeness, although I think it would be easier to check whether there are any symbols, particularly in the Main and AMS fonts, which are not accessible by one of our symbol definitions.
I just opened “Accept unicode from mathematical alphabets on input” #260, to cover unicode symbols which we map to latin letters from some special font.
“problems with unicode” #65 mentions accented characters in the original statement. I believe that there might be a distinction there for text vs. math. For text it could have been handled by allowing unicode in text fragments, as “Support unicode in text” #15 suggests. For math mode we'd probably need more information, and a font to render it, so that's way more complicated.


flying-sheep · 2015-06-20T17:42:29Z

fixed in #261

kevinbarabash · 2015-06-20T17:54:00Z

I think we just need to exclude those from the list because they have special meaning. If we come across a _ or ^ it should not be parsed as a symbol.
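A sketch of that exclusion, assuming aliases are generated from `replace` characters as discussed above. Only `_` and `^` come from this comment; the other entries in `RESERVED` and the `aliasCandidates` helper are hypothetical.

```javascript
// Characters with syntactic meaning in TeX input must never become symbol
// aliases. "_" and "^" come from the discussion; the rest are an assumed,
// broader safety net.
const RESERVED = new Set(["^", "_", "{", "}", "\\", "&", "$", "%", "#", "~"]);

// Return [character, spec] pairs that may safely be added as unicode aliases.
function aliasCandidates(table) {
  const out = [];
  for (const spec of Object.values(table)) {
    if (spec.replace && !RESERVED.has(spec.replace)) {
      out.push([spec.replace, spec]);
    }
  }
  return out;
}

// Hypothetical entries; only σ survives the filter.
const candidates = aliasCandidates({
  "\\sigma": { group: "mathord", replace: "\u03c3" },
  "\\hypotheticalCaret": { group: "textord", replace: "^" },
});
console.log(candidates.map(([ch]) => ch).join(", ")); // σ
```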

flying-sheep · 2015-06-20T17:57:25Z

done. at first i was confused since ^ didn’t appear to be in the symbols, but then i remembered that my text editor’s search interprets things as regex, so ofc it was there!

sophiebits · 2015-06-20T23:19:47Z

Were you perhaps referring to #16?

@gagern Yes, thanks.


qbolec · 2015-10-13T16:43:01Z

The problem still exists. It also concerns angle brackets: '⟩' and '⟨' .

kevinbarabash · 2015-10-13T22:56:22Z

@qbolec I was going to create a pull request containing those symbols which appear in only a single family, but I haven't got around to it. Unfortunately '⋅' appears in several; see #261 (comment) for details.
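Finding the characters that cannot be mapped back to a single entry can be automated. This sketch treats a character as ambiguous when it occurs with more than one font/group combination; that criterion is an assumption about what "family" means here, and the entries below are illustrative, not copied from symbols.js.

```javascript
// Group entries by `replace` character and report characters that occur with
// more than one font/group combination; those cannot be mapped back to a
// single entry automatically.
function findAmbiguousReplacements(table) {
  const byChar = new Map();
  for (const [key, spec] of Object.entries(table)) {
    if (!spec.replace) continue;
    if (!byChar.has(spec.replace)) byChar.set(spec.replace, []);
    byChar.get(spec.replace).push({ key, kind: `${spec.font}/${spec.group}` });
  }
  const ambiguous = new Map();
  for (const [ch, entries] of byChar) {
    if (new Set(entries.map((e) => e.kind)).size > 1) {
      ambiguous.set(ch, entries.map((e) => e.key));
    }
  }
  return ambiguous;
}

// Illustrative entries, not copied from symbols.js.
const report = findAmbiguousReplacements({
  "\\cdot": { font: "main", group: "bin", replace: "\u22c5" },
  "\\cdotp": { font: "main", group: "punct", replace: "\u22c5" },
  "\\sigma": { font: "main", group: "mathord", replace: "\u03c3" },
});
console.log(report.get("\u22c5").join(" ")); // \cdot \cdotp
```

Characters that do not show up in the report are the safe ones to alias automatically; the rest need a human decision.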

gagern · 2015-10-14T13:21:46Z

@kevinbarabash Adding the extra symbols would be easy, now that we have d423bec to get them past the lexer without hassle.

What has me worried is the opt-in nature of this feature, which we agreed upon in response to #261 (comment). We need that configuration information available and taken into account in all the places where we make use of the symbols table, and I haven't yet decided on the most elegant way to achieve that.
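One possible shape for this, sketched under assumptions: build a single lookup function from the settings and pass it to every place that consults the symbols table, so no call site can forget the opt-in. The `unicodeSymbols` setting name and the `makeLookup` helper are hypothetical, not KaTeX options.

```javascript
// Build the lookup once from the settings and hand it to every consumer of
// the symbols table, so the opt-in is honored consistently.
// `unicodeSymbols` as a setting name is hypothetical.
function makeLookup(symbols, unicodeIndex, settings) {
  const useUnicode = Boolean(settings && settings.unicodeSymbols);
  return function lookupSymbol(mode, text) {
    const direct = symbols[mode][text];
    if (direct) return direct;
    // Fall back to unicode aliases only when the feature is opted into.
    return useUnicode ? (unicodeIndex[mode][text] || null) : null;
  };
}

const symbols = { math: { "\\sigma": { font: "main", group: "mathord", replace: "\u03c3" } } };
const index = { math: { "\u03c3": { font: "main", group: "mathord" } } };
const strict = makeLookup(symbols, index, { unicodeSymbols: false });
const relaxed = makeLookup(symbols, index, { unicodeSymbols: true });
console.log(strict("math", "\u03c3"));        // null
console.log(relaxed("math", "\u03c3").group); // mathord
```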

whykushal93 · 2017-05-02T23:59:11Z

I found one resource containing the mappings between Unicode characters and the corresponding Macros, probably could be useful. http://ctan.math.washington.edu/tex-archive/macros/latex/contrib/unicode-math/unimath-symbols.pdf

kalvdans · 2024-05-02T13:17:01Z

I'd like to reopen this issue since simple Greek letters like µ generate two warnings:

LaTeX-incompatible input and strict mode is set to 'warn': Unrecognized Unicode character "µ" (181) [unknownSymbol] katex.min.js:1:5587
No character metrics for 'µ' in style 'Main-Regular' and mode 'text'

According to the documentation at https://katex.org/docs/supported.html#letters-and-unicode , unicode versions of greek letters "will render properly in any KaTeX rendering mode".

edemaine · 2024-05-02T14:00:09Z

This sounds like a documentation issue. I believe they are only supported (and supposed to work) in math mode. I believe this is how LaTeX behaves as well.

kalvdans · 2024-05-02T18:35:25Z

@edemaine thanks for your reply. I dug deeper and found out that I've used µ U+00B5 MICRO SIGN, instead of μ U+03BC GREEK SMALL LETTER MU that KaTeX handles just fine. Mystery solved! Stupid Unicode with its duplicates.
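Since the two characters look identical, the mix-up is easy to check for in code. This is a sketch using the standard `String.prototype.normalize`; whether KaTeX should normalize its input was not decided in this thread, and the `foldDuplicates` helper is hypothetical.

```javascript
// U+00B5 MICRO SIGN has a compatibility decomposition to U+03BC GREEK SMALL
// LETTER MU, so NFKC normalization folds one into the other.
const micro = "\u00b5";
const mu = "\u03bc";
console.log(micro === mu);                   // false
console.log(micro.normalize("NFKC") === mu); // true

// Caveat: NFKC folds many other characters too, e.g. SUPERSCRIPT TWO -> "2",
// which changes the meaning of math input.
console.log("\u00b2".normalize("NFKC"));     // 2

// A narrower, hypothetical alternative: fold only hand-picked duplicates.
const DUPLICATES = new Map([[micro, mu]]);
function foldDuplicates(s) {
  return Array.from(s, (ch) => DUPLICATES.get(ch) || ch).join("");
}
console.log(foldDuplicates("5\u00b5m") === "5\u03bcm"); // true
```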

flying-sheep · 2024-05-03T08:37:18Z

Stupid Unicode trying to encode all of human written language in an interoperable way 😉

gagern mentioned this issue Jun 20, 2015

Accept unicode from mathematical alphabets on input #260

Closed

flying-sheep added a commit to flying-sheep/KaTeX that referenced this issue Jun 20, 2015

first attempt of adding all unicode stuff

3d6e5ea

fixes KaTeX#243

flying-sheep mentioned this issue Jun 20, 2015

Add unicode symbols to symbols table #261

Closed

sophiebits mentioned this issue Jun 20, 2015

Handle unicode symbols #16

Closed

flying-sheep added a commit to flying-sheep/KaTeX that referenced this issue Jun 21, 2015

Added unicode symbols

de2b853

fixes KaTeX#243

kevinbarabash added the Unicode label Jun 22, 2015

gagern mentioned this issue Oct 8, 2015

Support unicode in text #15

Closed

jonathanolson mentioned this issue Nov 14, 2015

Library needs for math sims phetsims/scenery#457

Closed

sophiebits mentioned this issue Apr 16, 2016

Is there any versioning of KaTeX? #462

Closed

gagern mentioned this issue Nov 1, 2016

unicode support (including ∑∏∐ ∫ and √) #73

Closed

kevinbarabash self-assigned this Jan 31, 2018

kevinbarabash mentioned this issue Jan 31, 2018

add support for unicode angle brackets, single vertical bar, double vertical bar #1123

Merged

kevinbarabash closed this as completed in #1123 Feb 1, 2018
