Unicode EM DASH symbol corrupted when copying #2845
Comments
andrewdavidwong added the bug, C: core, and P: minor labels on Jun 4, 2017
andrewdavidwong added this to the Release 3.2 updates milestone on Jun 4, 2017
qubesuser commented Jun 19, 2017
I'm seeing this issue too and it seems that the data goes correctly to dom0, and the problem lies in pasting.
In particular, as strange as it looks, it seems that the Xutf8TextListToTextProperty function in Xlib is just broken.
Here's a test program:
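#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s STRING\n", argv[0]);
        return 1;
    }
    Display* dpy = XOpenDisplay(NULL);
    if (!dpy) {
        fprintf(stderr, "cannot open display\n");
        return 1;
    }
    char* str = argv[1];
    XTextProperty ct;
    /* Convert one UTF-8 string to a text property in UTF8_STRING style. */
    int ret = Xutf8TextListToTextProperty(dpy, &str, 1, XUTF8StringStyle, &ct);
    /* Prints: return code, input length, output length, encoding atom, format, item count. */
    printf("%i %zu %zu %lu %i %lu\n", ret, strlen(str), strlen((char*)ct.value), ct.encoding, ct.format, ct.nitems);
    return 0;
}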
$ ./a.out abcdefghij
0 10 10 246 8 10
$ ./a.out abcdefghij$'\xe2\x80\x94'
0 13 10 246 8 10
So it silently drops some UTF-8 characters for no reason: the second input is 13 bytes long, but the resulting property still holds only 10, even though no conversion should be happening at all! This is with the Xlib shipped in Ubuntu 17.04.
It seems that GTK does not use Xlib to convert text, and instead does nothing for UTF-8 -> UTF-8, while for STRING it does not convert to Latin-1, but rather treats STRING as Compound Text and thus removes C0 and C1 characters.
The simplest thing for Qubes is probably to only support the UTF8_STRING target and just copy the data verbatim there.
starius commented Jun 4, 2017
Qubes OS version:
R3.2
Affected TemplateVMs:
fedora-23, debian-8
Expected behavior:
I copied and pasted text that includes the "—" symbol (hex "e2 80 94") using Ctrl+Shift+C and Ctrl+Shift+V. I expected the text to be pasted as-is, but the dash symbol was corrupted (see below).
Actual behavior:
The symbol was pasted as "â" plus two invisible control characters, U+0080 and U+0094 (hex "c3 a2 c2 80 c2 94").
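The pasted bytes are consistent with each byte of the original UTF-8 sequence being read as a Latin-1 character and then encoded as UTF-8 a second time. The sketch below only illustrates that arithmetic; the helper name `latin1_to_utf8` is hypothetical, and nothing here claims to show where in the copy/paste path the re-encoding actually happens.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: treat every input byte as a Latin-1 (ISO 8859-1)
 * character and write its UTF-8 encoding to out.
 * Returns the number of bytes written, or 0 if out is too small
 * (0 is also returned for empty input).
 */
size_t latin1_to_utf8(const unsigned char *in, size_t in_len,
                      unsigned char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    size_t n = 0;
    for (size_t i = 0; i < in_len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            /* ASCII range: one byte, unchanged. */
            if (n + 1 > out_cap) return 0;
            out[n++] = b;
        } else {
            /* U+0080..U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx. */
            if (n + 2 > out_cap) return 0;
            out[n++] = (unsigned char)(0xC0 | (b >> 6));
            out[n++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    return n;
}
```

Feeding it the three bytes e2 80 94 produces the six bytes c3 a2 c2 80 c2 94, which matches the hex dump reported above.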
Steps to reproduce the behavior:
In one VM:
select the symbol with the mouse, right-click, and choose "Copy". Then press Ctrl+Shift+C.
In another VM: press Ctrl+Shift+V. Open a terminal or a text editor (gedit), right-click, and choose "Paste". The symbol is pasted corrupted.
General notes:
Info about the symbol:
See also http://www.fileformat.info/info/unicode/char/2014/index.htm