Unicode EM DASH symbol corrupted when copying #2845

Open
starius opened this Issue Jun 4, 2017 · 1 comment

starius commented Jun 4, 2017

Qubes OS version:

R3.2

Affected TemplateVMs:

fedora-23, debian-8


Expected behavior:

I copied and pasted text that includes the "—" symbol (hex "e2 80 94") using Ctrl+Shift+C and Ctrl+Shift+V. I expected the text to be pasted as-is, but the dash symbol was corrupted (see below).

Actual behavior:

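The symbol was pasted as "�" (hex "c3 a2 c2 80 c2 94"). That matches what you get when each of the three original bytes is read as a separate Latin-1 character and then re-encoded as UTF-8.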
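To make the byte pattern concrete, here is a minimal sketch that reproduces the reported hex dump from the original three bytes. It only illustrates the Latin-1 to UTF-8 re-encoding that the numbers are consistent with; it is not taken from Qubes or Xlib, and the function name `latin1_to_utf8` and the `DEMO_MAIN` guard are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Treat each input byte as a Latin-1 code point (U+0000..U+00FF) and
 * re-encode it as UTF-8. `out` must have room for at least 2 * n bytes.
 * Returns the number of bytes written. (Hypothetical helper.) */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[w++] = in[i];                              /* ASCII: one byte */
        } else {
            out[w++] = (unsigned char)(0xC0 | (in[i] >> 6));   /* lead byte */
            out[w++] = (unsigned char)(0x80 | (in[i] & 0x3F)); /* continuation */
        }
    }
    return w;
}

#ifdef DEMO_MAIN
int main(void)
{
    const unsigned char em_dash[] = { 0xE2, 0x80, 0x94 };  /* U+2014 in UTF-8 */
    const unsigned char reported[] = { 0xC3, 0xA2, 0xC2, 0x80, 0xC2, 0x94 };
    unsigned char out[2 * sizeof em_dash];

    size_t n = latin1_to_utf8(em_dash, sizeof em_dash, out);
    assert(n == sizeof reported);
    assert(memcmp(out, reported, n) == 0);

    for (size_t i = 0; i < n; i++)
        printf("%02x ", out[i]);
    printf("\n");
    return 0;
}
#endif
```

Built with -DDEMO_MAIN, the program prints the same six bytes as in the report, which suggests the pasted data was run through a Latin-1 style conversion somewhere along the way.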

Steps to reproduce the behavior:

In one VM:

$ echo '0000000: e280 940a' | xxd -r
—

Select the symbol with the mouse, right-click, and choose "Copy". Then press Ctrl+Shift+C.

In another VM: press Ctrl+Shift+V. Open a terminal or a text editor (gedit), right-click, and choose "Paste". The symbol is pasted corrupted:

[screenshot: dash]

General notes:

Info about the symbol:

$ unicode —
U+2014 EM DASH
UTF-8: e2 80 94  UTF-16BE: 2014  Decimal: &#8212;
— (—)
Uppercase: U+2014
Category: Pd (Punctuation, Dash)
Bidi: ON (Other Neutrals)

See also http://www.fileformat.info/info/unicode/char/2014/index.htm
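For reference, the relation between the UTF-8 bytes e2 80 94 and the code point U+2014 can be checked by hand or with a few lines of C. This is only a sketch for the three-byte case; `decode_utf8_3` and the `DEMO_MAIN` guard are names invented for this example, and the function does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decode one three-byte UTF-8 sequence of the shape
 * 1110xxxx 10xxxxxx 10xxxxxx into a code point.
 * Returns 0xFFFFFFFF if the bytes do not have that shape.
 * (Hypothetical helper; no overlong or surrogate checks.) */
uint32_t decode_utf8_3(const unsigned char b[3])
{
    if ((b[0] & 0xF0) != 0xE0 || (b[1] & 0xC0) != 0x80 || (b[2] & 0xC0) != 0x80)
        return 0xFFFFFFFFu;
    return ((uint32_t)(b[0] & 0x0F) << 12) |   /* top 4 bits */
           ((uint32_t)(b[1] & 0x3F) << 6)  |   /* middle 6 bits */
            (uint32_t)(b[2] & 0x3F);           /* low 6 bits */
}

#ifdef DEMO_MAIN
int main(void)
{
    const unsigned char em_dash[3] = { 0xE2, 0x80, 0x94 };
    uint32_t cp = decode_utf8_3(em_dash);
    assert(cp == 0x2014);
    printf("U+%04X (decimal %u)\n", (unsigned)cp, (unsigned)cp);
    return 0;
}
#endif
```

The arithmetic is (0xE2 & 0x0F) << 12 = 0x2000, (0x80 & 0x3F) << 6 = 0, and 0x94 & 0x3F = 0x14, which together give 0x2014, or 8212 in decimal.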


qubesuser commented Jun 19, 2017

I'm seeing this issue too and it seems that the data goes correctly to dom0, and the problem lies in pasting.

In particular, as strange as it looks, it seems that the Xutf8TextListToTextProperty function in Xlib is just broken.

Here's a test program:

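#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv)
{
	if (argc < 2) {
		fprintf(stderr, "usage: %s STRING\n", argv[0]);
		return 1;
	}
	Display* dpy = XOpenDisplay(NULL);
	if (!dpy) {
		fprintf(stderr, "cannot open display\n");
		return 1;
	}
	char* str = argv[1];
	XTextProperty ct;
	/* XUTF8StringStyle requests the UTF8_STRING encoding for the output. */
	int ret = Xutf8TextListToTextProperty(dpy, &str, 1, XUTF8StringStyle, &ct);
	if (ret < 0) return 1; /* ct is not filled in on failure */
	/* Prints: return code, input length, output length, encoding atom, format, item count. */
	printf("%i %zu %zu %lu %i %lu\n", ret, strlen(str), strlen((char*)ct.value), ct.encoding, ct.format, ct.nitems);
	XFree(ct.value);
	XCloseDisplay(dpy);
	return 0;
}
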
$ ./a.out abcdefghij
0 10 10 246 8 10
$ ./a.out abcdefghij$'\xe2\x80\x94'
0 13 10 246 8 10

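In the second run the input is 13 bytes long, but the resulting property holds only 10 bytes (both strlen and nitems report 10), and the return value is 0, which indicates success. So it silently drops some UTF-8 characters for no apparent reason, despite the fact that it should be doing no conversion at all! This is with the Xlib shipped in Ubuntu 17.04.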

It seems that GTK does not use Xlib to convert text, and instead does nothing for UTF-8 -> UTF-8, while for STRING it does not convert to Latin-1, but rather treats STRING as Compound Text and thus removes C0 and C1 characters.

The simplest thing for Qubes is probably to only support the UTF8_STRING target and just copy the data verbatim there.
