Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect length of UTF-8 string for pango layout #393

Merged
merged 1 commit into from
Apr 15, 2020

Conversation

tomyun
Copy link
Contributor

@tomyun tomyun commented Apr 11, 2020

I believe this is also related to GiovineItalia/Gadfly.jl#1410.

julia> using Compose
julia> import Cairo, Fontconfig

before

julia> Compose.text_extents("", 8pt, "α")

(process:35143): Pango-WARNING **: 21:27:47.868: pango_layout_set_markup_with_accel: Error on line 1 char 10: Invalid UTF-8 encoded text in name — not valid “\xce”
1-element Array{Tuple{Measures.Length{:mm,Float64},Measures.Length{:mm,Float64}},1}:
 (1.5075737847222221mm, 1.522732204861111mm)

julia> Compose.text_extents("", 8pt, "αα")
1-element Array{Tuple{Measures.Length{:mm,Float64},Measures.Length{:mm,Float64}},1}:
 (1.5075737847222221mm, 1.522732204861111mm)

Note that the first case complained about a clipped string array, but still got a correct width. The length of UTF-8 string array for "α" supposed to be 2 ("\xce\xb1"), but pango_text_extents internally used a length of Julia String which was 1 for this.

The second case seemed working fine with no complaints, but due to spurious string length, only the first letter was accounted for layout. That's why we had same widths for them which are not we'd expect.

after

julia> Compose.text_extents("", 8pt, "α")
1-element Array{Tuple{Measures.Length{:mm,Float64},Measures.Length{:mm,Float64}},1}:
 (1.5075737847222221mm, 1.522732204861111mm)

julia> Compose.text_extents("", 8pt, "αα")
1-element Array{Tuple{Measures.Length{:mm,Float64},Measures.Length{:mm,Float64}},1}:
 (3.231499565972222mm, 1.522732204861111mm)

Now we no longer have pango warning and get correct widths for the both cases.

@bjarthur
Copy link
Member

thanks for this! sorry for the delay in merging. travis CI hung on julia 1.4 windows initially.

@bjarthur bjarthur merged commit 7a6a0a4 into GiovineItalia:master Apr 15, 2020
@ScottPJones
Copy link
Contributor

@tomyun The length is 1 (it is one character). If you want the number of bytes, you need to use the sizeof function instead.

Copy link
Contributor

@ScottPJones ScottPJones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is incorrect. The convert(String, text) should be left as-is.
Further down, the length needs to be changed to sizeof.

@tomyun
Copy link
Contributor Author

tomyun commented Apr 16, 2020

I don't think this change is necessarily incorrect. pango_layout_set_markup() requires an Ptr{UInt8} array and now we explicitly provide one with codeunits(). I believe this is a canonical way as Julia documentation explains. And textarray is now an AbstractVector{UInt8} wrapper of String text, so we can still use length() on it to get a correct length of the byte array without complication.

@tomyun
Copy link
Contributor Author

tomyun commented Apr 16, 2020

Maybe now I understand where your concern came from. If text was not a regular UTF-8 backed String, but something different like UTF16String/UTF32String from LegacyStrings.jl, code units would be no longer in UInt8 and their length shouldn't be used for counting byte array size. In such case, convert(String, text) is still needed to make sure we always have a UTF-8 String.

@bjarthur
Copy link
Member

should probably either open a new issue or PR so we don't forget about this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants