Read Russian text in console #47133

Open
cocahonka opened this issue Sep 7, 2021 · 9 comments
Labels
area-vm, os-windows

Comments

@cocahonka

cocahonka commented Sep 7, 2021

How can I read Russian text from the console? I tried this:

var answer = stdin.readLineSync(encoding: Encoding.getByName('utf-8')!)!;

and like this

stdin.transform(utf8.decoder);

but when I print the resulting string with print, nothing is output.

Additional:

answer.codeUnits.forEach(print);
// output
// 0
// 0
// 0
// 0

I enter simple words:

привет
нет
да

@lrhn
Member

lrhn commented Sep 7, 2021

Most likely your terminal input is not UTF-8. It could be Windows-1251. In that case, all the characters you input have codes >= 128, and most >= 160, which means that the bytes are likely to be invalid UTF-8. UTF-8 decoding then either fails, or if being lenient, gives you a replacement character, which could explain the 0 values.

Try decoding with a Cyrillic code page instead of UTF-8, possibly latinCyrillic from package:convert. Or maybe you need to create a code page table for Windows 1251 (which appears to be different from ISO-8859-5 Latin-Cyrillic).

(Or maybe it's something else, but try this first, or show us the actual bytes you receive from stdin).
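
A minimal sketch of how the two suggestions above could be tried together: dump the raw bytes that arrive from stdin, then decode them as Windows-1251. The helpers `toHex` and `decodeCp1251Cyrillic` are hypothetical names introduced for illustration, and the decoder is deliberately partial (it covers only ASCII, the А..я block at 0xC0..0xFF, Ё at 0xA8 and ё at 0xB8), so it is not a full code page table.

```dart
import 'dart:io';

/// Formats bytes as space-separated two-digit hex, e.g. [208, 191] -> "d0 bf".
String toHex(List<int> bytes) =>
    bytes.map((b) => b.toRadixString(16).padLeft(2, '0')).join(' ');

/// Partial Windows-1251 decoder (illustrative only, an assumption of this
/// sketch): ASCII passes through, 0xC0..0xFF map to U+0410..U+044F,
/// 0xA8 is U+0401 and 0xB8 is U+0451. Anything else becomes U+FFFD.
String decodeCp1251Cyrillic(List<int> bytes) {
  final out = StringBuffer();
  for (final b in bytes) {
    if (b < 0x80) {
      out.writeCharCode(b);
    } else if (b >= 0xC0 && b <= 0xFF) {
      out.writeCharCode(0x0410 + (b - 0xC0));
    } else if (b == 0xA8) {
      out.writeCharCode(0x0401);
    } else if (b == 0xB8) {
      out.writeCharCode(0x0451);
    } else {
      out.writeCharCode(0xFFFD);
    }
  }
  return out.toString();
}

void main() {
  // Read one line of raw bytes; readByteSync returns -1 at end of input.
  final bytes = <int>[];
  for (var b = stdin.readByteSync();
      b != -1 && b != 10;
      b = stdin.readByteSync()) {
    bytes.add(b);
  }
  print('raw bytes: ${toHex(bytes)}');
  print('as Windows-1251: ${decodeCp1251Cyrillic(bytes)}');
}
```

If the hex dump shows only `00` bytes, no choice of decoder can recover the text, because the information was lost before it reached the program.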

@cocahonka
Author

> Most likely your terminal input is not UTF-8. It could be Windows-1251. In that case, all the characters you input have codes >= 128, and most >= 160, which means that the bytes are likely to be invalid UTF-8. UTF-8 decoding then either fails, or if being lenient, gives you a replacement character, which could explain the 0 values.
>
> Try decoding with a Cyrillic code page instead of UTF-8, possibly latinCyrillic from package:convert. Or maybe you need to create a code page table for Windows 1251 (which appears to be different from ISO-8859-5 Latin-Cyrillic).
>
> (Or maybe it's something else, but try this first, or show us the actual bytes you receive from stdin).

My Windows terminals (cmd, PowerShell) are running with code page 65001 (UTF-8), so this shouldn't be the problem.

I also looked at the actual bytes that arrive from stdin: they are all zeros. I tried decoding with latinCyrillic as you suggested, and with other tables, but nothing works.

However, when I ran the same code in Git Bash instead of a Windows terminal, it did not work with latinCyrillic, but it worked once I switched back to utf-8. So the problem seems to be in the Windows terminal, even though its settings also specify UTF-8. I will look into it. Thank you very much for the answer.

When I find the cause I will write it here and close the issue. If you already know the answer, I would appreciate the help :)

@cocahonka
Author

As it turned out, the problem affects all of my Windows terminals. Even Git Bash running inside the VS Code integrated terminal does not work; it only works when I open Git Bash as an external window.

@greymag

greymag commented Dec 3, 2021

Hey @cocahonka, please reopen the issue; it's still there. You are not the only one with this problem: no one on Windows can read Cyrillic input. That is quite critical for cross-platform CLI tools.

@cocahonka cocahonka reopened this Dec 4, 2021
@lrhn lrhn added the area-vm and os-windows labels Dec 6, 2021
@a-siva
Contributor

a-siva commented Dec 7, 2021

//cc @aam

@aam
Contributor

aam commented Dec 7, 2021

I don't think this issue has much to do with Dart; rather, it comes from the very complicated story of how encoding works in the Windows Command Prompt and PowerShell. See https://stackoverflow.com/questions/57131654/using-utf-8-encoding-chcp-65001-in-command-prompt-windows-powershell-window to get a sense of what's involved in using UTF-8 in a Windows shell.

@greymag

greymag commented Dec 7, 2021

@aam Thanks for the answer and the link! Encodings on Windows are truly horrible, but I still think this is a Dart problem. Here is why:

  1. When you call stdin.readByteSync(), you just get zeros for Cyrillic characters.
  2. If you pass a Cyrillic string as an argument, e.g. utils.dart -t"Русская строка", it works.
  3. Interactive input in any other CLI, for example one written in Python, works.
  4. None of the actions described at the link make any difference: you can change chcp, system settings, or environment variables, and you still get zeros.

So stdin is simply unusable for this on Windows, and we can't get it working from Dart code or through system/environment settings.
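
Point 2 hints at a possible interim workaround: accept the text as a command-line argument and fall back to stdin only when no argument is given. The sketch below assumes a `-t<text>` option like the one in the example above; `textFromArgs` is a hypothetical helper, not part of any Dart library.

```dart
import 'dart:io';

/// Returns the value of a hypothetical `-t<text>` option, or null if absent.
String? textFromArgs(List<String> args) {
  for (final arg in args) {
    if (arg.startsWith('-t') && arg.length > 2) return arg.substring(2);
  }
  return null;
}

void main(List<String> args) {
  // Prefer the argument; fall back to interactive input when it is missing.
  final text = textFromArgs(args) ?? stdin.readLineSync() ?? '';
  print(text);
  print(text.codeUnits); // non-zero code units mean the text survived
}
```

This only sidesteps the problem for non-interactive use; it does not fix interactive input.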

@aam
Contributor

aam commented Dec 7, 2021

> 1. When you call stdin.readByteSync(), you just get zeros for Cyrillic characters.

Right, and it has little to do with Dart. You get the same result if you run copy con file.txt and type some Cyrillic characters: file.txt will contain nulls instead of the characters.
Note that if you pass the file's content to the Dart program on stdin, the bytes arrive correctly:

Microsoft Windows [Version 10.0.18363.1916]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\src\dart-sdk\sdk>type kuka.txt
привет

C:\src\dart-sdk\sdk>chcp 65001
Active code page: 65001

C:\src\dart-sdk\sdk>type kuka.txt
привет

C:\src\dart-sdk\sdk>type kuka.txt | out\ReleaseX64\dart line.dart
[208, 191, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130, 13, 10]

C:\src\dart-sdk\sdk>out\ReleaseX64\dart line.dart
привет
[0, 0, 0, 0, 0, 0, 13, 10]
^Z

C:\src\dart-sdk\sdk>type line.dart
import 'dart:io';

main() async {
        stdin.listen(
                (line) { print(line); }
        );
}
C:\src\dart-sdk\sdk>
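
As a quick check of the two byte lists in the transcript above, the following sketch decodes the piped bytes as UTF-8 and flags the all-zero line. `isAllZeroLine` is a hypothetical helper introduced here for illustration.

```dart
import 'dart:convert';

/// True when every byte before the trailing CR/LF is zero, which is the
/// symptom reported for interactive console input.
bool isAllZeroLine(List<int> line) {
  final payload = line.where((b) => b != 13 && b != 10);
  return payload.isNotEmpty && payload.every((b) => b == 0);
}

void main() {
  // Bytes copied from the transcript: piped input vs. typed input.
  const piped = [208, 191, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130, 13, 10];
  const typed = [0, 0, 0, 0, 0, 0, 13, 10];

  print(utf8.decode(piped).trim()); // prints привет
  print(isAllZeroLine(piped)); // prints false
  print(isAllZeroLine(typed)); // prints true
}
```

The piped bytes are valid UTF-8, while the typed line carries six zero bytes, one per character entered, so the characters are already gone by the time the program receives the data.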

@JoCat

JoCat commented May 8, 2022

I ran into a similar situation while rewriting a console program from Node.js to Dart.
I have two problems:

  1. Dart has no equivalent of the completer function in Node's readline, nor any comparable interface. I solved this with a workaround (see "Is there a way to check whether a key has been pressed without blocking the flow of the program?", timsneath/dart_console#42 (comment)).
  2. Lack of Cyrillic support. In Node.js this works correctly, even though I'm running the code on Windows 10.

In the Node.js app:
[screenshot]

In the Dart app, using stdin.transform(utf8.decoder).transform(LineSplitter()):
[screenshot]

Using stdin.readLineSync(encoding: utf8):
[screenshot]

Using stdin.readByteSync():
[screenshot]
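
For reference, a minimal sketch of the decoder pipeline mentioned above, run against a synthetic byte stream instead of the console. It suggests that the pipeline itself handles Cyrillic when the bytes are intact, including when a character is split across chunks. `utf8Lines` is a hypothetical wrapper name.

```dart
import 'dart:convert';

/// Decodes a byte stream as UTF-8 and splits it into lines.
Stream<String> utf8Lines(Stream<List<int>> bytes) =>
    bytes.transform(utf8.decoder).transform(const LineSplitter());

Future<void> main() async {
  // Synthetic input: "привет\r\nда\r\n", with the first chunk ending in the
  // middle of a two-byte character.
  final fake = Stream<List<int>>.fromIterable([
    [208, 191, 209, 128, 208, 184, 208],
    [178, 208, 181, 209, 130, 13, 10, 208, 180, 208, 176, 13, 10],
  ]);
  print(await utf8Lines(fake).toList()); // prints [привет, да]

  // With a real console, stdin from dart:io can be passed instead:
  //   await for (final line in utf8Lines(stdin)) { ... }
}
```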

6 participants