ReadConsole does not work with utf-8 codepage #16020

Closed
tholp opened this issue Sep 22, 2023 · 9 comments
Labels
Area-Input Related to input processing (key presses, mouse, etc.) Area-Output Related to output processing (inserting text into buffer, retrieving buffer text, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Conhost For issues in the Console codebase Resolution-Duplicate There's another issue on the tracker that's pretty much the same thing.

Comments

@tholp

tholp commented Sep 22, 2023

Windows Terminal version

No response

Windows build number

No response

Other Software

No response

Steps to reproduce

Use SetConsoleCP to set the console input code page to UTF-8 (or enable the UTF-8 beta mode in Windows).
Read the input using ReadConsole.

After inputting "abcæøå", the input buffer contains 0x61 0x62 0x63 0x00 0x00 0x00 0x0D 0x0A

Expected Behavior

utf-8 characters for the letters æøå.

Actual Behavior

nulls
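
For reference, the bytes one would expect for this input under UTF-8 can be computed without any console API. The following is a minimal sketch in Python, not the reporter's code; the names `text`, `expected`, `observed`, and `hexdump` are illustrative, and it assumes the line ends in CR LF as in the dump above.

```python
# Expected UTF-8 bytes for the test input, computed without touching the console.
text = "abc\u00e6\u00f8\u00e5\r\n"  # "abcæøå" followed by CR LF (assumed line ending)

expected = text.encode("utf-8")
# Bytes as reported in the issue description.
observed = bytes([0x61, 0x62, 0x63, 0x00, 0x00, 0x00, 0x0D, 0x0A])


def hexdump(data: bytes) -> str:
    """Format bytes as space-separated 0xNN values."""
    return " ".join(f"0x{b:02X}" for b in data)


print("expected:", hexdump(expected))
print("observed:", hexdump(observed))
```

Each of æ, ø, and å takes two bytes in UTF-8, so a correct buffer would be three bytes longer than the reported one, which holds a single 0x00 per letter.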

@tholp tholp added Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels Sep 22, 2023
@zadjii-msft
Member

Are you using ReadConsoleA or ReadConsoleW? (Does your main take chars or wchar_ts?)

I'd bet this is the thing that @lhecker just re-wrote the entire input buffer to fix ☺️

@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Sep 22, 2023
@tholp
Author

tholp commented Sep 22, 2023

I use ReadConsoleW, and I actually have no problems using UTF-16 or code page 1252. ReadConsoleW reads into a buffer, and I have written out the bytes the buffer contains, i.e. without interpreting anything as characters. So the types char and wchar_t are irrelevant.

@microsoft-github-policy-service microsoft-github-policy-service bot added Needs-Attention The core contributors need to come back around and look at this ASAP. and removed Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something labels Sep 22, 2023
@tholp
Author

tholp commented Sep 22, 2023

By the way, the problem is the same both for the Windows Terminal and the Windows Console Host.

@lhecker
Member

lhecker commented Sep 22, 2023

@tholp Some parts of your comments must be incorrect. You write that your buffer contains:

0x61 0x62 0x63 0x00 0x00 0x00 0x0D 0x0A

And you said (emphasis mine):

ReadConsoleW reads into a buffer, and I have written out the bytes the buffer contains

If you had truly used the W variant it would've filled your buffer with 16-bit characters which would result in a byte-wise buffer like this:

0x61 0x00, 0x62 0x00, 0x63 0x00, 0x00 0x00, 0x00 0x00, 0x00 0x00, 0x0D 0x00, 0x0A 0x00

That is, each 16-bit integer should have a 0x00 high byte.

So either you used the A variant, or you wrote out the 16-bit integers your buffer contains. Do you know which one it is?
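
The byte layout described above can be reproduced outside the console. This is a rough sketch in Python, assuming little-endian UTF-16 (the layout Windows uses for wchar_t); `looks_like_utf16le` is a hypothetical heuristic written for illustration, not an existing API.

```python
# Compare the reported bytes with what a UTF-16LE buffer would look like.
text = "abc\u00e6\u00f8\u00e5\r\n"  # "abcæøå" followed by CR LF
utf16 = text.encode("utf-16-le")


def looks_like_utf16le(data: bytes) -> bool:
    """Crude check: even length and every high byte is 0x00.

    This only holds for text whose code points are all below U+0100,
    which is the case for the test string used here.
    """
    return len(data) % 2 == 0 and all(b == 0 for b in data[1::2])


# Bytes as reported in the issue description.
reported = bytes([0x61, 0x62, 0x63, 0x00, 0x00, 0x00, 0x0D, 0x0A])

print(utf16.hex(" "))
print("utf16 looks like UTF-16LE:", looks_like_utf16le(utf16))
print("reported looks like UTF-16LE:", looks_like_utf16le(reported))
```

With working input, the three letters would appear as 0xE6 0x00, 0xF8 0x00, and 0xE5 0x00 in a W-variant buffer. The reported dump fails the check because its second byte is 0x62 rather than 0x00, which is consistent with a byte-oriented read.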


Furthermore, when you say:

Inputting "abcæøå"

how did you enter that string? Did you enter it regularly with your keyboard or did you use WriteConsoleInput? Depending on your answer I think I know which PR fixed your issue. 🙂

@tholp
Author

tholp commented Sep 25, 2023

Sorry, you are right. My code actually obtains the (bad) bytes using ReadFile.
I don't know whether that problem should be reported here or somewhere else.

Running echo abcæøå | myProgram gives the correct buffer contents, but typing the same text on the keyboard gives the null bytes.

@lhecker
Member

lhecker commented Sep 25, 2023

Ah I see... In that case your issue is fixed by the combination of #14745 and #15783. The former fixes UTF-8 not working via ReadFile and is included in 1.18 and later. The latter fixes interactive Unicode input and is included in 1.19 and later. 1.18 is currently available via Windows Terminal Preview. (It'll be stable in a few weeks.)

I'll mark it as a /duplicate of #14745, because that one affects you more than the other.

@lhecker lhecker closed this as completed Sep 25, 2023
@lhecker lhecker added Resolution-Duplicate There's another issue on the tracker that's pretty much the same thing. Product-Conhost For issues in the Console codebase Area-Input Related to input processing (key presses, mouse, etc.) Area-Output Related to output processing (inserting text into buffer, retrieving buffer text, etc.) and removed Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Attention The core contributors need to come back around and look at this ASAP. labels Sep 25, 2023
@microsoft-github-policy-service
Contributor

Hi! We've identified this issue as a duplicate of another one that already exists on this Issue Tracker. This specific instance is being closed in favor of tracking the concern over on the referenced thread. Thanks for your report!

@lhecker
Member

lhecker commented Sep 25, 2023

Oh and I should say: It's quite likely that Windows Terminal Preview 1.18 already fixes your issue. If you get a chance, please try it out! 🙂 (If you compare your issue description with #4551 for instance, you'll see that your description is very similar, and #4551 has been fixed in 1.18.)

@tholp
Author

tholp commented Sep 26, 2023

Yes, that sounds exactly like my issue. Thank you, and once more: sorry for the confusion with ReadConsole (I apparently didn't understand our own (old) code and should have written a minimal test first).

Not sure I can test it until the change is in the public version (I have in any case updated our software to get the right result).
