WriteConsoleW used with ConEmu duplicates Chinese characters output #945

Nelson-numerical-software · 2016-11-09T19:39:57Z

Versions

ConEmu build: 161023 x64 stable
OS version: Windows 10 x64 (1607)
Microsoft Windows [version 10.0.14959] cmd

Problem description

WriteConsoleW duplicates Chinese characters when the program runs inside ConEmu.

Steps to reproduce

Actual results

Output: Traditional Chinese 漢漢字字

Expected results

Original string: Traditional Chinese 漢字

Additional files

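Build this code with VS 2015 C++:

#include <Windows.h>
#include <string>

int main()
{
    // Wide string literal containing two CJK glyphs.
    std::wstring msg = L"Traditional Chinese 漢字";
    HANDLE consoleHandle = GetStdHandle(STD_OUTPUT_HANDLE);
    // WriteConsoleW takes the length in characters as a DWORD.
    WriteConsoleW(consoleHandle, msg.c_str(), static_cast<DWORD>(msg.size()), NULL, NULL);
    return 0;
}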

Maximus5 · 2016-11-10T09:02:29Z

Why are you talking about WriteConsoleW? Have you checked the result in the RealConsole (Ctrl-Win-Alt-Space)?
Please run ConEmuC -checkunicode from ConEmu's prompt and post the result here.

Nelson-numerical-software · 2016-11-10T18:53:35Z

1]
This also seems to be a bug in the Windows 10 Insider builds 14959 and 14965.
With the stable Windows 10 version 1607 and the same ConEmu version 161023, it works.

2]
Please notice the duplicated characters 中中文文

ConEmuC -checkunicode
ConEmu 161022 x86
OS Version: 10.0.14965 (2:)
SM_IMMENABLED=1, SM_DBCSENABLED=0, ACP=1252, OEMCP=850
ConHWND=0x00090634, Class="ConsoleWindowClass"
Console font info: 0, {3x5}, 54, 400, "Lucida Console"
Handles: In=x8 (Mode=x1F7) Out=xC (x3) Err=x10 (x3)
Buffer={131,1000} Window={0,0}-{130,35} MaxSize={131,166}
Cursor: Pos={0,9} Size=25% Visible
ConsoleCP=850, ConsoleOutputCP=850
CP850: Max=1 Def=x3F,x00 UDef=x3F
Lead=x00,x00,x00,x00,x00,x00,x00,x00,x00,x00,x00,x00
Name="850 (OEM - latin multilingue I)"

123456789也也不不是是可可运运行行的的程程序序１１２２３３４４５５６６７７８８９９
Normal Reverse x7 x4007 Normal:x7 Reverse:x4007

Check AÀÀΑΑ╬╬豈豈ＡＡꊠꊠ黠黠だだ➀ጀะڰЯ09
Text: AÀÀΑΑ╬╬豈豈ＡＡꊠꊠ黠黠だだ➀ጀะڰЯ09
Read: A:x7 ÀÀ:x107 ΑΑ:x207 ╬╬:x107 豈豈:x207 ＡＡ:x107 ꊠꊠ:x207 黠黠:x107 だだ:x207 ➀:x107 ጀ:x207 ะ:x107 ڰ:x207 Я:x107 0:x207 9:x107
Blck: A:x7 ÀÀ:x107 ÀÀ:x207 ΑΑ:x107 ΑΑ:x207 ╬╬:x107 ╬╬:x207 豈豈:x107 豈豈:x207 ＡＡ:x107 ＡＡ:x207 ꊠꊠ:x107 ꊠꊠ:x207 黠黠:x107 黠黠:x207 だだ:x107 Info: 0,1,1,16,1,1,24,1

╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╦╦══
════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗║ 中中文文 ║中中
文文║╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
╩╩════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
══╝
Unicode check succeeded

Maximus5 · 2016-11-10T19:00:33Z

@miniksa Can you take a look at this? Reported already several times here.

miniksa · 2016-11-10T19:03:46Z

@Maximus5 I've filed it as MSFT:9751066 internally and assigned to myself. I'm currently in a deep thought on something else, so I'll probably get to it early next week. Thanks for the report.

miniksa · 2016-11-15T20:31:39Z

I see the issue. There appear to be duplicates coming out of ReadConsoleOutputW/A. I'm not sure what happened there. I'll have to keep investigating, but it looks like it will need a fix on our side once I figure it out.

Maximus5 · 2016-11-15T21:17:58Z

Perhaps this comes from changes in attribute processing. I noted some time ago (not sure where exactly) that new Windows builds process the high byte of console attributes "in a proper and better way"...
One of the weirdest things in conhost is the COMMON_LVB_LEADING_BYTE/COMMON_LVB_TRAILING_BYTE processing. It works differently on DBCS (Chinese/Japanese/...) Windows distros than on "European" distros. On DBCS systems, when certain CJK codepages are selected, each double-width glyph takes two (or more?) CHAR_INFOs (cells). That never happened on European distros, even if CJK support was installed and these codepages were selected in the console.
I can't reproduce this issue on my test Win 10 boxes yet.
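To make the leading/trailing cell idea above concrete, here is a minimal, portable sketch of how a reader of the screen buffer might fold a two-cell glyph back into one character. It is an illustration only, not ConEmu's or conhost's actual code. `Cell` is a hypothetical stand-in for CHAR_INFO, and `kLeadingByte`/`kTrailingByte` are hypothetical names that reuse the Windows SDK values of COMMON_LVB_LEADING_BYTE (0x0100) and COMMON_LVB_TRAILING_BYTE (0x0200) so the code compiles without Windows.h.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Same numeric values as COMMON_LVB_LEADING_BYTE / COMMON_LVB_TRAILING_BYTE
// in the Windows SDK; redefined under hypothetical names for portability.
constexpr std::uint16_t kLeadingByte  = 0x0100;
constexpr std::uint16_t kTrailingByte = 0x0200;

// Hypothetical stand-in for CHAR_INFO: one screen cell.
struct Cell {
    char16_t ch;
    std::uint16_t attr;
};

// Rebuild the text of one row. A cell is dropped only when it is flagged
// as trailing, directly follows a leading cell, and repeats its character.
std::u16string collapseCells(const std::vector<Cell>& row) {
    std::u16string text;
    bool prevWasLead = false;
    char16_t prevCh = 0;
    for (const Cell& cell : row) {
        const bool isSecondHalf = prevWasLead &&
                                  (cell.attr & kTrailingByte) != 0 &&
                                  cell.ch == prevCh;
        if (!isSecondHalf) {
            text.push_back(cell.ch);
        }
        prevWasLead = (cell.attr & kLeadingByte) != 0;
        prevCh = cell.ch;
    }
    return text;
}
```

A caller that assumes one cell per glyph and skips this step would see 漢漢字字 where 漢字 was written, which matches the symptom reported in this issue.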

miniksa · 2016-11-21T16:38:22Z

FYI, I haven't forgotten about this investigation. We've just suddenly got slammed with e-mails and bugs from all sources and so getting to investigating this may take me significantly longer than I originally predicted. I will be back when I get a chance.

miniksa · 2017-01-20T16:11:52Z

FYI, the fix for this should have just landed with Insider Build 15014 today.

ncihnegn · 2017-01-24T09:24:29Z

Just tested Build 15014. Not fixed yet.

miniksa · 2017-01-24T16:11:07Z

Hmmm. Not sure what's up. I'll dig into character handling stuff today.

Maximus5 · 2017-01-25T08:41:44Z

@miniksa Finally I managed to install insider build.

First, the expected behavior from the "stable" Win10 build: all glyphs are written and displayed properly, there are no doubled CJK glyphs, and the data fits on screen.

Now the 15014.

I'm still checking the results; here are my first notes.

Even though SM_DBCSENABLED is 0, COMMON_LVB_LEADING_BYTE and COMMON_LVB_TRAILING_BYTE are set. Is that intended on a non-DBCS-enabled OS? They were not used previously; only CJK versions of Windows (up to Win 10 14393) used them.
Worse, even conhost itself treats CJK glyphs inconsistently.

In some places it shows them (as squares, yep) as if they were two cells wide, in others as a single cell wide.
When ConEmu writes 80 characters (the console width) on non-CJK Windows, the data is expected to be written without wrapping. That's not true anymore. Even in conhost's window we can see that only 77 characters (I counted them) were written under the frame (the line with three CJK glyphs).

Finally, there are drawing bugs during selection in conhost's window. I selected cells one by one with the mouse. Cells have unexpected widths during selection, and, strangely, the line below the selection is broken while selecting.
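The 80-versus-77 count above is consistent with three glyphs each taking two cells: 3 × 2 + 74 × 1 = 80 cells hold only 77 characters. The sketch below works through that arithmetic. It is an assumption about the cause, not a confirmed diagnosis, and `isWideGlyph` and `charsThatFit` are hypothetical helpers. The width test is deliberately simplified to the CJK Unified Ideographs block.

```cpp
#include <cstddef>
#include <string>

// Simplifying assumption: only CJK Unified Ideographs (U+4E00..U+9FFF)
// are treated as wide. Real width tables cover many more ranges.
bool isWideGlyph(char16_t ch) {
    return ch >= 0x4E00 && ch <= 0x9FFF;
}

// Count how many leading characters of `line` fit into `columns` cells,
// with wide glyphs taking either one or two cells.
std::size_t charsThatFit(const std::u16string& line,
                         std::size_t columns,
                         bool wideTakesTwoCells) {
    std::size_t used = 0;
    std::size_t count = 0;
    for (char16_t ch : line) {
        const std::size_t width = (wideTakesTwoCells && isWideGlyph(ch)) ? 2 : 1;
        if (used + width > columns) {
            break;  // the next glyph would wrap to the following row
        }
        used += width;
        ++count;
    }
    return count;
}
```

With a line of three ideographs followed by 77 narrow characters, the single-cell policy fits all 80 characters in 80 columns, while the double-cell policy fits 77.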

Maximus5 · 2017-01-25T09:49:09Z

@miniksa The API is inconsistent... WriteConsoleOutputAttribute, WriteConsoleOutputCharacter, ReadConsoleOutputCharacter, ReadConsoleOutputAttribute, ReadConsoleOutput...
Some functions treat CJK glyphs as normal single-cell glyphs (WriteConsoleOutputCharacter, ReadConsoleOutputCharacter).
Some functions return COMMON_LVB_LEADING_BYTE/COMMON_LVB_TRAILING_BYTE and therefore double cells (ReadConsoleOutputAttribute, ReadConsoleOutput).
Some functions have undefined behavior (after WriteConsoleOutputAttribute followed by WriteConsoleOutputCharacter, glyphs are "written" after the cells filled with attributes).
All of this is on a non-CJK Insider build of Win 10.

miniksa · 2017-01-25T16:07:38Z

Yeah, I was finding bad behavior like this yesterday as well. Part of the deal is that it behaves differently with Raster Fonts vs. TrueType fonts. I'll probably be spending the rest of the week trying to fix this up and make it consistent. I don't know what SM_DBCSENABLED is/does. Console's DBCS check has always been based on the active code page (whether it equals 932, 936, 949, or 950), not on that system metric.

I'll try to keep you posted as I figure this out. Sorry about that. A few of us have been working on trying to fit UTF-8 support into the console (not done yet) and it appears to have messed up quite a few DBCS routes.
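The code-page check described above can be sketched in a few lines. This is only a reading of the comment, not conhost's source; `isCjkCodePage` is a hypothetical name, and the four values are the code pages listed in the comment.

```cpp
// Hypothetical helper: true for the four code pages named in the comment.
// 932 = Japanese (Shift-JIS), 936 = Simplified Chinese (GBK),
// 949 = Korean, 950 = Traditional Chinese (Big5).
bool isCjkCodePage(unsigned int codePage) {
    switch (codePage) {
        case 932:
        case 936:
        case 949:
        case 950:
            return true;
        default:
            return false;
    }
}
```

Under this check the diagnostic run earlier in the thread (ConsoleCP=850, ConsoleOutputCP=850) would not count as DBCS, so single-cell CJK glyphs would be the expected result there.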

Maximus5 · 2017-01-25T16:25:11Z

I used to check GetSystemMetrics(SM_DBCSENABLED), which was 1 only on Windows installations built for China, Japan, and Korea (CJK).
If SM_DBCSENABLED returned 0, CJK glyphs used only one cell in conhost, regardless of the codepage.
That was true before.
Now it is broken or changed.
What is the correct behavior?

miniksa · 2017-01-25T16:35:59Z

I'll have to get back to you on that. Everything you are telling me about SM_DBCSENABLED is 100% new information to me. I don't really know if that particular metric used to be a part of the console code in XP/Vista/7/8. I can look. I also don't know what in the system turns that metric on or off.

From what I know about the console from Win 8.1 to today, the console always did its conversions and width calculations based on code page. It's just that prior to recently, it used to prohibit changing into a CJK codepage unless your system's non-Unicode region was set to a CJK language (Control Panel-->Region-->Administrative-->Language for non-Unicode programs). I've been trying to remove that restriction to allow anyone to swap into any codepage no matter their "non-Unicode region" because in today's editions of Windows (as opposed to the CJK-specific ones of the 1990s), you can add just about any language pack and IME and font to any language edition of Windows, so the "non-Unicode" region doesn't really matter like it used to several decades ago.

My plan is:

1. Go back into the DBCS tests and expand them significantly across these APIs against the v1 console (legacy) since everything was "fine" before we mucked around with it.
2. Correct anything in v2 that is no longer compliant with the DBCS tests (including all the APIs you listed above).
3. Get that set of fixes and tests shipping up toward the insiders build.
4. Get you some documentation/research on how all this is supposed to work (including that SM_DBCSENABLED flag in the console context) and potentially get that sort of information published to MSDN.

miniksa · 2017-02-07T18:12:25Z

So I've got through 1, 2, and 3 in MSFT: 10187355 which is checked in and will start shipping up to Insiders builds. Probably be there in a few weeks. I've basically restored the console's behavior to the same as what it was for the legacy console. If it works against the console with the legacy box checked, it will work again against the updated one once the Insider build updates.

For part 4, I'm still working on it. I basically need to write up the way that the v1/legacy console did it and publish that.

rprichard · 2017-03-17T04:08:37Z

@miniksa @Maximus5 FWIW, this VSCode/winpty issue seems related: microsoft/vscode#19665. ConEmu is broken in exactly the same way (screenshot in this comment, microsoft/vscode#19665 (comment)). I wrote a test case demonstrating the new (broken?) behavior as of Win10 v15048.

bao-qian · 2017-06-30T04:07:48Z

Hi,
I had no such problem on the previous Windows build (15063.413) with Simplified Chinese.
I only noticed the issue after the latest stable Windows build, 15063.447, rolled out:
the alpha build works almost fine with the new console.

The stable and preview builds work fine with the legacy console.

faiz-lisp · 2020-03-17T01:26:10Z

I tried Chinese on the UTF-8 version of newLISP.
https://github.com/kosh04/newlisp/blob/develop/nl-utf8.c
It works well.

https://stackoverflow.com/questions/3911536/utf-8-unicode-whats-with-0xc0-and-0x80
(I hope it could help.)

Maximus5 added the drawing-cjk label Nov 10, 2016

Maximus5 mentioned this issue Jan 7, 2017

chinese word repeat in bash on windows 10 rs1 14393 #813

Open

Maximus5 closed this as completed Jan 25, 2017

Maximus5 reopened this Jan 25, 2017

Maximus5 added a commit that referenced this issue Jan 26, 2017

gh-945: Temporary fix for doubled CJK on non-CJK Win10 14959+.

ec85b37

Maximus5 added a commit that referenced this issue Feb 27, 2017

gh-945: Temporary fix for doubled CJK on non-CJK Win10 14959+.

9bdf360

terepanda mentioned this issue Apr 18, 2017

When I input Japanese characters, cursor was shown far away from the text I inputted. #1111

Closed

HerringtonDarkholme mentioned this issue Sep 30, 2017

Cmder duplicate character 'Á' cmderdev/cmder#1481

Closed

DanielRosenwasser mentioned this issue Oct 31, 2017

Chinese characters are repeated in localized diagnostics microsoft/TypeScript#19616

Closed

HBelusca mentioned this issue Jan 27, 2020

[CONSRV] Miscellaneous console fixes for CJK support and screenbuffer iteration. reactos/reactos#2278

Merged

burgerrg mentioned this issue Mar 16, 2020

Issue about the update of Chez Scheme 9.5.3 (10 Jan, 2020) cisco/ChezScheme#504

Open

Zeroes1 mentioned this issue Jul 12, 2022

trouble with "Fullwidth-aware rendering" option in FAR #2458

Open
