
Labels do not correctly render languages that require text shaping #2521

Open
mramato opened this issue Feb 27, 2015 · 12 comments

Comments

@mramato
Contributor

mramato commented Feb 27, 2015

As discussed on the forum, JavaScript treats a surrogate pair as two characters, when in reality it encodes a single character (one code point).
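In plain JavaScript (Node.js or a browser console, no Cesium needed), the mismatch looks like this for U+1F4A9 PILE OF POO, the same character used in the TwitterCldr examples later in this thread. This is only an illustration of string handling, not Cesium code:

```javascript
// U+1F4A9 (PILE OF POO) lies outside the Basic Multilingual Plane,
// so JavaScript stores it as a surrogate pair of two UTF-16 code units.
const poo = '\uD83D\uDCA9';

const codeUnitCount = poo.length;               // counts UTF-16 code units
const naiveChars = poo.split('');                // splits between the two surrogates
const codePointCount = Array.from(poo).length;   // ES6 string iteration is by code point

console.log(codeUnitCount);   // 2
console.log(naiveChars.map(c => c.charCodeAt(0).toString(16))); // [ 'd83d', 'dca9' ]
console.log(codePointCount);  // 1
```

Anything that walks a string with `length`/`split('')` and draws each piece will therefore draw two broken halves for one character.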

To reproduce:

var viewer = new Cesium.Viewer('cesiumContainer');
var scene = viewer.scene;
var camera = scene.camera;

// Look straight down at central Bangkok from 50 km away.
camera.lookAt(Cesium.Cartesian3.fromDegrees(100.5382368, 13.7242002, 0),
              new Cesium.HeadingPitchRange(0.0, -Cesium.Math.PI_OVER_TWO, 50000.0));

// Thai text that LabelCollection does not render correctly.
var labels = scene.primitives.add(new Cesium.LabelCollection());
labels.add({
    position : Cesium.Cartesian3.fromDegrees(100.545624, 13.743179),
    text     : 'ตึก'
});

The easiest short-term workaround is to use a Billboard instead (writeTextToCanvas works fine; this problem is specific to LabelCollection).

@mramato
Contributor Author

mramato commented Feb 27, 2015

Using ES6 (and the polyfill available here: http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/) it should be possible to fix LabelCollection to detect this and draw surrogate pairs as a single glyph.
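A minimal sketch of that idea, assuming an ES2015 engine (modern browsers and Node.js no longer need the polyfill): split the label text by code point instead of by UTF-16 code unit. `splitIntoCodePoints` is a hypothetical helper, not LabelCollection's actual code, and as the rest of this thread shows, code points alone turn out not to be enough for every script.

```javascript
// Hypothetical helper: split label text into code points instead of
// UTF-16 code units, so a surrogate pair stays together as one unit.
function splitIntoCodePoints(text) {
  const units = [];
  // for...of iterates strings by code point (ES2015).
  for (const ch of text) {
    units.push(ch);
  }
  return units;
}

const parts = splitIntoCodePoints('a\uD83D\uDCA9b');
console.log(parts.length);                      // 3
console.log(parts.map(p => p.codePointAt(0)));  // [ 97, 128169, 98 ]
```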

@tonpo

tonpo commented Apr 11, 2015

I'm not sure the case in the example is a surrogate-pair issue.
(I think the issue is just misnamed, but I'm not sure it will be solved by ES6.)

According to the link in the forum post:

Astral code points are pretty easy to recognize: if you need more than 4 hexadecimal digits to represent the code point, it’s an astral code point.

But it seems that each character in the example string can be represented with 3 hexadecimal digits.
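The rule quoted above can be checked directly: a code point is astral exactly when it is greater than 0xFFFF, i.e. when it needs more than four hex digits. A small check comparing the issue's Thai string with PILE OF POO (the function names are just for this illustration):

```javascript
// A code point is "astral" (outside the BMP) when it exceeds 0xFFFF,
// i.e. when it needs more than four hexadecimal digits.
function isAstral(codePoint) {
  return codePoint > 0xFFFF;
}

function describe(text) {
  // Array.from iterates by code point, so astral characters are not split.
  return Array.from(text, ch => {
    const cp = ch.codePointAt(0);
    return { hex: cp.toString(16), astral: isAstral(cp) };
  });
}

console.log(describe('ตึก'));          // hex e15, e36, e01; none astral
console.log(describe('\uD83D\uDCA9')); // hex 1f4a9; astral
```

Since none of the Thai code points is astral, the string contains no surrogate pairs at all.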

Maybe the issue is similar to this:
https://mathiasbynens.be/notes/javascript-unicode#other-grapheme-clusters
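If the problem is grapheme clusters, the unit that must stay together is the user-perceived character rather than the code point. Current engines expose this through `Intl.Segmenter` (an API that did not exist when this comment was written; it assumes Node.js 16+ or a current browser). Splitting the example string by grapheme keeps the Thai vowel sign U+0E36, a combining mark, attached to its consonant. `splitGraphemes` is a hypothetical helper:

```javascript
// Split text into extended grapheme clusters (user-perceived characters).
function splitGraphemes(text, locale = 'th') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(text), s => s.segment);
}

const text = 'ตึก';
console.log(Array.from(text).length); // 3 code points
console.log(splitGraphemes(text));    // [ 'ตึ', 'ก' ]: U+0E36 stays with U+0E15
```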

BTW:
For surrogate-pair issues, one can use twitter-cldr, which @goergeBerg linked to in #2543.
It has some helpful functions like 'char_code_at', 'unpack_string', 'from_char_code' and 'pack_array'.

For example:

TwitterCldr.Utilities.char_code_at('\uD83D\uDCA9',0) //PILE OF POO

returns 128169

while

TwitterCldr.Utilities.char_code_at('\uD83D\uDCA9',1) //PILE OF POO

returns NaN, and

TwitterCldr.Utilities.unpack_string('\uD83D\uDCA9')  //PILE OF POO

returns [128169]

If we take the example string of this issue:

TwitterCldr.Utilities.unpack_string('ตึก')

we get [3605, 3638, 3585] (still 3 characters).
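For comparison, ES2015 has native functions that roughly correspond to these helpers: `String.prototype.codePointAt`, `Array.from` over a string, and `String.fromCodePoint`. One difference from the output shown above: `codePointAt` at the index of a low surrogate returns that surrogate's value rather than `NaN`. `unpack` and `pack` below are names made up for this sketch:

```javascript
const poo = '\uD83D\uDCA9'; // PILE OF POO, U+1F4A9

// Like char_code_at(str, 0): the full code point.
console.log(poo.codePointAt(0)); // 128169
// Index 1 lands on the low surrogate; ES6 returns its value, not NaN.
console.log(poo.codePointAt(1)); // 56489 (0xDCA9)

// Like unpack_string: the list of code points.
const unpack = str => Array.from(str, ch => ch.codePointAt(0));
console.log(unpack(poo));   // [ 128169 ]
console.log(unpack('ตึก')); // [ 3605, 3638, 3585 ]

// Like pack_array / from_char_code: code points back to a string.
const pack = codePoints => String.fromCodePoint(...codePoints);
console.log(pack([3605, 3638, 3585]) === 'ตึก'); // true
```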

@hpinkos
Contributor

hpinkos commented Dec 1, 2017

This appears to be fixed

@hpinkos hpinkos closed this as completed Dec 1, 2017
@cesium-concierge

Congratulations on closing the issue! I found these Cesium forum links in the comments above:

https://groups.google.com/d/msg/cesium-dev/6EA78tUxGRY/xMr9cfJGS1IJ

If this issue affects any of these threads, please post a comment like the following:

The issue at #2521 has just been closed and may resolve your issue. Look for the change in the next stable release of Cesium or get it now in the master branch on GitHub https://github.com/AnalyticalGraphicsInc/cesium.


I am a bot who helps you make Cesium awesome! Contributions to my configuration are welcome.

🌍 🌎 🌏

@scottnc27603

We are still seeing disconnected Arabic lettering. We've set enableRightToLeftDetection properly based on the browser's language. I really don't know much about RTL language characters, so some of the discussion here is over my head. Should Arabic labels appear connected? Does TwitterCldr fix this issue?

@hpinkos
Contributor

hpinkos commented Mar 16, 2018

@scottnc27603 Could you please paste a short code example to reproduce what you're seeing? Thanks!

@OmarShehata
Contributor

I think the original issue still exists, as @siloboula shows. writeTextToCanvas itself works fine. The problem is that the label collection writes one character at a time, which breaks the shaping of the text:

https://github.com/AnalyticalGraphicsInc/cesium/blob/4ee25edb8e7bf15e681d85bd53b73501b3acdc21/Source/Scene/LabelCollection.js#L177-L196

Here's a better example that shows it. The text in the billboard is correct. The one in the label is not.

var viewer = new Cesium.Viewer('cesiumContainer');

// Enable right-to-left text support for labels.
Cesium.Label.enableRightToLeftDetection = true;
var text = "عمر سامح شحاتة ";

// Rendered as a label: the letters come out disconnected.
viewer.entities.add({
    position : Cesium.Cartesian3.fromDegrees(10, 10),
    label : { text : text }
});

// Rendered as one canvas via PinBuilder: the letters are shaped correctly.
var pinBuilder = new Cesium.PinBuilder();

viewer.entities.add({
    position : Cesium.Cartesian3.fromDegrees(10, 10.000001),
    billboard : {
        image : pinBuilder.fromText(text, Cesium.Color.BLACK, 528).toDataURL(),
        verticalOrigin : Cesium.VerticalOrigin.BOTTOM
    }
});

viewer.zoomTo(viewer.entities);

Sandcastle

@mramato
Contributor Author

mramato commented Oct 31, 2019

I would have expected #7280 (which is only in master) to have fixed this, but that doesn't appear to be the case. Master does fix अनुच्छेद, for example.

My hunch is that we need to do something special for RTL, such as iterating in the other direction.
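One way to check whether finer-grained splitting could ever help here: grapheme-cluster segmentation (via `Intl.Segmenter`, assuming Node.js 16+ or a current browser), which keeps combining marks attached to their base letter, still yields exactly one unit per Arabic letter in the first word of the example name. A quick sketch:

```javascript
// Segment the first word of the label example into grapheme clusters.
const segmenter = new Intl.Segmenter('ar', { granularity: 'grapheme' });
const word = 'عمر';
const clusters = Array.from(segmenter.segment(word), s => s.segment);

// Each letter is its own cluster, so drawing clusters independently
// still cannot produce the joins between neighbouring letters.
console.log(clusters.length);                      // 3
console.log(clusters.every(c => c.length === 1));  // true
```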

@OmarShehata
Copy link
Contributor

My hunch is that we need to do something special for RTL, such as iterating in the other direction.

It's not a Unicode issue, and I don't think it's an RTL issue. These are the right characters in the right place, but they're not the right shape. The shaping step in a text rendering system figures out the right glyph for each character based on where it appears. This is not possible if each character is rendered separately.

This is a good article on this. Some quotes:

Step two assumes that for every character there is exactly one visual representation....assumptions are contradicted by Arabic, a language in which the same character will have different visual representations depending on where it appears in a word

Unfortunately, there is no single simple algorithm that will correctly shape text in every possible script. Human language is too complex and arbitrary for that, and the positioning of glyphs in many languages depends on intricate regional rules. To do it right, you will have to use a text shaping library that tells you which glyphs to display and where to place them.

If the Label is rendering each character to a separate canvas in order to reuse them, that's also an incorrect assumption (the same character can have up to 3 different representations in Arabic). I think we'd have to, at the very least, render each word together.
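To make "the same character can have different representations" concrete, here is a deliberately simplified sketch of the contextual-form step of Arabic shaping. It only knows the letters in the example name, classifies each as dual-joining or right-joining (a table assumed for this illustration), and ignores ligatures, diacritics and everything else a real shaping library such as HarfBuzz handles. It is an illustration, not something a label renderer should use:

```javascript
// Joining types for the letters used in the example name (assumed table).
// 'D' = dual-joining (connects on both sides),
// 'R' = right-joining (connects only to the preceding letter).
const JOINING = {
  'ع': 'D', 'م': 'D', 'س': 'D', 'ح': 'D', 'ش': 'D', 'ت': 'D',
  'ر': 'R', 'ا': 'R', 'ة': 'R',
};

// Return the contextual form of each letter of one word, in logical order.
function contextualForms(word) {
  const letters = Array.from(word);
  return letters.map((ch, i) => {
    const prev = JOINING[letters[i - 1]];
    const cur = JOINING[ch];
    const next = JOINING[letters[i + 1]];
    // Joins backward if the previous letter connects forward (D)
    // and this letter connects backward (D or R).
    const joinsPrev = prev === 'D' && cur !== undefined;
    // Joins forward if this letter is dual-joining and a joining letter follows.
    const joinsNext = cur === 'D' && next !== undefined;
    if (joinsPrev && joinsNext) return 'medial';
    if (joinsPrev) return 'final';
    if (joinsNext) return 'initial';
    return 'isolated';
  });
}

for (const word of 'عمر سامح شحاتة'.split(' ')) {
  console.log(word, contextualForms(word));
}
```

In this output ح is final in سامح but medial in شحاتة, so a cache holding one image per character cannot serve both positions.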

@OmarShehata OmarShehata reopened this Oct 31, 2019
@OmarShehata OmarShehata changed the title Labels do not work for surrogate pairs Labels do not correctly render languages that require text shaping Oct 31, 2019
@mramato
Contributor Author

mramato commented Oct 31, 2019

Spoke offline with @OmarShehata: the best solution is probably to add an option to render labels as whole strings, either at the Entity API level (which would be trivial) or at the LabelPrimitive level (which may be more involved).

This would definitely cause memory issues for large collections of labels, so care must be taken either way. The reason Cesium renders per character in the first place is that per-word rendering does not scale in texture usage, so per-character rendering needs to remain the default behavior. If we could efficiently auto-detect when whole-string rendering is needed, that would be great, but I doubt that's possible.
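The texture trade-off can be illustrated by counting how many distinct images a cache would need for the same labels when keyed per character, per word, or per whole label. The function and the sample labels below are hypothetical and only count cache keys; they say nothing about actual texture sizes, which also grow with the width of each image.

```javascript
// Hypothetical: count distinct cache entries needed for a set of label
// strings under different splitting strategies.
function countCacheEntries(labels, strategy) {
  const keys = new Set();
  for (const text of labels) {
    let units;
    if (strategy === 'character') {
      units = Array.from(text);                   // one image per code point
    } else if (strategy === 'word') {
      units = text.split(/\s+/).filter(Boolean);  // one image per word
    } else {
      units = [text];                             // one image per label
    }
    units.forEach(u => keys.add(u));
  }
  return keys.size;
}

// Sample data: numbered markers, as a large label collection might contain.
const labels = [];
for (let i = 0; i < 1000; i++) {
  labels.push(`Site ${i}`);
}

console.log(countCacheEntries(labels, 'character')); // 15
console.log(countCacheEntries(labels, 'word'));      // 1001
console.log(countCacheEntries(labels, 'label'));     // 1000
```

Per-character keys stay at a handful of images however many labels are added, while per-word and per-label keys grow with the number of distinct strings, which is the scaling concern described above.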

@siloboula we are certainly interested in fixing this, but there is currently no ETA on when that will happen. We welcome pull requests if you would like to propose a solution. See our Contributing Guide if you want to give it a try.


8 participants
@pjcozzi @scottnc27603 @mramato @OmarShehata @hpinkos @tonpo @cesium-concierge and others