Symbol unicode replacement doesn’t work #243
well, i didn’t even try to understand this code, but wouldn’t it be possible to loop symbols, and create an index for all literals that maps back to the relevant part of the structure?

    let unicodeSymbols = {}
    for (let mode of ['math', 'text']) {
      unicodeSymbols[mode] = {}
      for (let macro in symbols[mode]) {
        let spec = symbols[mode][macro]
        if (spec.replace) {
          unicodeSymbols[mode][spec.replace] = { font: spec.font, group: spec.group }
        }
      }
    }

and then, using this, we could search for the symbol before the end of the lexer tests:

    if (input[pos] in unicodeSymbols.text || input[pos] in unicodeSymbols.math)
      return new Token(input[pos], ...)

and later we make sure that parseSymbol is called for those tokens (if it wouldn’t already) and extend the lines

    } else if (symbols[mode][nucleus.text]) {
      ...
      new ParseNode(symbols[mode][nucleus.text].group, nucleus.text, mode),

to

    let spec = null
    ...
    } else if ((spec = symbols[mode][nucleus.text]) || (spec = unicodeSymbols[mode][nucleus.text])) {
      ...
      new ParseNode(spec.group, nucleus.text, mode),

as alternative to

    let symbols = {
      ...
      "\\sigma": { ... }
      "σ": { ...(same as above)... }
      ...
    } |
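For anyone who wants to try the reverse-index idea outside the KaTeX codebase, here is a minimal runnable sketch. The `symbols` table below is a toy stand-in whose entries and field values are assumptions, not copied from symbols.js, and `buildUnicodeIndex` and `lookupSymbol` are hypothetical names introduced only for this illustration.

```javascript
// Toy stand-in for the symbols table; entries are illustrative assumptions.
const symbols = {
  math: {
    "\\sigma": { font: "main", group: "mathord", replace: "\u03c3" },
    "\\alpha": { font: "main", group: "mathord", replace: "\u03b1" },
    "\\example": { font: "main", group: "mathord" }, // no `replace`, so not indexed
  },
  text: {
    "\\ss": { font: "main", group: "textord", replace: "\u00df" },
  },
};

// Build mode -> character -> { macro, font, group } from the `replace` fields.
function buildUnicodeIndex(table) {
  const index = {};
  for (const mode of Object.keys(table)) {
    index[mode] = {};
    for (const [macro, spec] of Object.entries(table[mode])) {
      if (spec.replace) {
        index[mode][spec.replace] = { macro, font: spec.font, group: spec.group };
      }
    }
  }
  return index;
}

const unicodeSymbols = buildUnicodeIndex(symbols);

// Try the text as a macro name first, then as a literal Unicode character.
function lookupSymbol(mode, text) {
  return symbols[mode][text] || unicodeSymbols[mode][text] || null;
}
```

With this table, `lookupSymbol("math", "σ")` finds the entry derived from `\sigma`, while the same character in text mode finds nothing because the toy table has no text-mode entry for it.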
It's not quite that simple because (I believe) the character used in the math fonts we use might not necessarily align with the character that's appropriate in the input string. |
you believe? how to be sure? |
I have problems with an existing codebase, which contains the following symbols: |
@flying-sheep I like the idea of auto-generating symbols based on existing "replace" symbols and just adding them to the |
you’ll first have to check @spicyj’s claim that it may not always be a bidirectional mapping |
I created a jsfiddle that displays the glyphs for all of the "replace" symbols using the unicode character specified in symbols.js: https://jsfiddle.net/047yzexz/1/. It seems like most are the correct character, but a few are showing up as boxes which is probably because the default font doesn't support those. |
umm, that’s called also i added some names… is there some better unicodedata out there that has everything in JSON or so? |
http://www.unicode.org/Public/UCD/latest/ucd/Index.txt should contain all of the names for every unicode entry. Did you want the data to verify those glyphs are being displayed as missing? I'm confident that everything will check out, but it's good to be sure. |
yeah, to check @spicyj’s claim. and i’m already using that list. |
about @qbolec’s case: we have all those symbols, except “−”, which is the real mathematical minus, and is encoded in TeX as the “-” aka hyphen-minus, the thing everyone has on the keyboard. |
@flying-sheep the hyphen is in the jsfiddle and maps correctly to the minus sign. In terms of verifying that everything maps correctly, including the missing glyphs, it might be easier to just add the code to do the mapping and then programmatically create KaTeX layouts for each glyph in that list, add them to a page, and check that each pair has identical symbols. |
seems that i overlooked it. great! |
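The programmatic check suggested above (render each macro and its Unicode character, then compare the pair) could look roughly like the sketch below. `findMismatches` and `stubRender` are hypothetical names; the stub stands in for a real renderer so the code runs without KaTeX. Passing `katex.renderToString` as `render` is the intended use, but comparing its output strings for equality is an assumption of this sketch, not something the thread confirms.

```javascript
// `render` is any function from a TeX string to a comparable result.
function findMismatches(pairs, render) {
  const mismatches = [];
  for (const [macro, character] of pairs) {
    let viaMacro;
    let viaCharacter;
    try {
      viaMacro = render(macro);
      viaCharacter = render(character);
    } catch (err) {
      // A parse error on either side counts as a mismatch worth reporting.
      mismatches.push({ macro, character, reason: String(err.message) });
      continue;
    }
    if (viaMacro !== viaCharacter) {
      mismatches.push({ macro, character, reason: "different output" });
    }
  }
  return mismatches;
}

// Stub renderer (assumption): knows two macros and one literal character,
// and throws on anything else.
const known = { "\\sigma": "<sigma>", "\u03c3": "<sigma>", "\\alpha": "<alpha>" };
function stubRender(tex) {
  if (!Object.prototype.hasOwnProperty.call(known, tex)) {
    throw new Error("Unexpected character: " + tex);
  }
  return known[tex];
}

const report = findMismatches(
  [["\\sigma", "\u03c3"], ["\\alpha", "\u03b1"]],
  stubRender
);
```

Here `report` contains a single entry for `\alpha`, because the stub does not accept the literal α; the σ pair passes.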
The tool I recently committed in gagern@5e127ba can be used to display KaTeX fonts in the browser, together with the corresponding rendering in system default fonts. As far as I could tell from skimming the lists, the symbols all match up except for symbols from the private use area. There are a few of these in the Size1 and Size4 fonts, apparently for horizontal braces or something like that. And the Typewriter font has typographic single quotation marks at \u07E2 and \u07e3 which is incorrect. |
Were you perhaps referring to #16? That bug has no discussion to it, but it does have an assignee. In trying to sort through the various unicode bugs, this one here seems to have the most momentum to it just now. I'm conducting a short survey, trying to see what else might be useful.
|
fixed in #261 |
I think we just need to exclude those from the list because they have special meaning. If we come across a |
done. at first i was confused since ^ didn’t appear to be in the symbols, but then i remembered that my texteditor’s search interprets things as regex, so ofc it was there! |
The problem still exists. It also concerns angle brackets: '⟩' and '⟨'. |
@qbolec I was going to create a pull request containing those symbols which appear in only a single family, but I haven't got around to it. Unfortunately '⋅' appears in a few, see #261 (comment) for details. |
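One way to separate the safe characters from the ambiguous ones is to group the table by `replace` character and count how many distinct families each character lands in. The sketch below takes "family" to mean the `group` field, which is an assumption, and the three entries are illustrative rather than copied from symbols.js; `partitionByAmbiguity` is a hypothetical helper name.

```javascript
// Flattened toy entries (assumption): two macros share the character U+22C5.
const entries = [
  { macro: "\\cdot", mode: "math", group: "bin", replace: "\u22c5" },
  { macro: "\\cdotp", mode: "math", group: "punct", replace: "\u22c5" },
  { macro: "\\sigma", mode: "math", group: "mathord", replace: "\u03c3" },
];

// Split "mode:character" keys into those with one group and those with several.
function partitionByAmbiguity(list) {
  const groupsByChar = new Map();
  for (const { mode, group, replace } of list) {
    if (!replace) continue;
    const key = mode + ":" + replace;
    if (!groupsByChar.has(key)) groupsByChar.set(key, new Set());
    groupsByChar.get(key).add(group);
  }
  const unambiguous = [];
  const ambiguous = [];
  for (const [key, groups] of groupsByChar) {
    (groups.size === 1 ? unambiguous : ambiguous).push(key);
  }
  return { unambiguous, ambiguous };
}

const { unambiguous, ambiguous } = partitionByAmbiguity(entries);
```

Only the keys in `unambiguous` could be added automatically; anything in `ambiguous` needs a deliberate choice of which macro the literal character should stand for.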
@kevinbarabash Adding the extra symbols would be easy, now that we have d423bec to get them past the lexer without hassle. What has me worried is the opt-in nature of this feature, which we agreed upon in response to #261 (comment). We need that configuration information available and taken into account in all the places where we make use of the symbols table, and I haven't yet decided on the most elegant way to achieve that. |
I found one resource containing the mappings between Unicode characters and the corresponding macros, which could be useful. http://ctan.math.washington.edu/tex-archive/macros/latex/contrib/unicode-math/unimath-symbols.pdf |
I'd like to reopen this issue since simple Greek letters like µ generate two warnings:
According to the documentation at https://katex.org/docs/supported.html#letters-and-unicode , unicode versions of greek letters "will render properly in any KaTeX rendering mode". |
This sounds like a documentation issue. I believe they are only supported (and supposed to work) in math mode. I believe this is how LaTeX behaves as well. |
@edemaine thanks for your reply. I dug deeper and found out that I'd used µ U+00B5 MICRO SIGN instead of μ U+03BC GREEK SMALL LETTER MU, which KaTeX handles just fine. Mystery solved! Stupid Unicode with its duplicates. |
Stupid Unicode trying to encode all of human written language in an interoperable way 😉 |
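The two look-alike characters mentioned above are easy to tell apart by code point, and Unicode compatibility normalization (NFKC) folds MICRO SIGN into GREEK SMALL LETTER MU. The sketch below uses only the standard `String.prototype.codePointAt` and `String.prototype.normalize`; `describe` and `foldCompatibility` are hypothetical helper names. Whether KaTeX input should be normalized this way is not something the thread settles, and NFKC also rewrites many other characters (superscript digits and ligatures, for example), so treat this as a diagnostic aid rather than a recommended preprocessing step.

```javascript
const microSign = "\u00b5"; // MICRO SIGN
const greekMu = "\u03bc"; // GREEK SMALL LETTER MU

// Format the first code point of a string as U+XXXX.
function describe(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

// Fold compatibility characters into their canonical equivalents.
function foldCompatibility(input) {
  return input.normalize("NFKC");
}

const before = describe(microSign);
const after = describe(foldCompatibility(microSign));
const sameAfterFolding = foldCompatibility(microSign) === greekMu;
```

Logging `before` and `after` for a suspicious character is a quick way to find out whether the input contains the compatibility variant.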
test case
the trivial
katex.renderToString('σ')
throws ParseError: KaTeX parse error: Unexpected character: 'σ' at position 0: ̲σ