Serial monitor character encoding option #1728

Open
Ivan-Perez opened this issue Jan 19, 2016 · 3 comments · May be fixed by arduino/Arduino#8660
Labels: topic: code (Related to content of the project itself), topic: serial monitor (Related to the Serial Monitor), type: enhancement (Proposed improvement)

Comments

@Ivan-Perez
Ivan-Perez commented Jan 19, 2016

Describe the request

It would be great if the serial monitor had an option to change the character encoding it uses.

In my case, my sketches are saved as UTF-8, so the messages they print are sent in that encoding. By default, the serial monitor uses ISO-8859 (probably the Windows 7 default), so those messages are not displayed properly:

[Screenshot: serial-monitor-encoding-problems]
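The garbling described above can be reproduced without any hardware. The following is only a minimal sketch, assuming the monitor decodes with ISO-8859-1; the class and method names are made up for illustration and are not part of the Arduino code base. It encodes "é" as UTF-8 (bytes 0xC3 0xA9) and then decodes those bytes as ISO-8859-1, which yields the two characters "Ã©" instead of one.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Hypothetical helper: decodes raw bytes the way a monitor fixed to
    // ISO-8859-1 would, mapping every byte to exactly one character.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // What a UTF-8 sketch would send for the single character U+00E9.
        byte[] sent = "\u00e9".getBytes(StandardCharsets.UTF_8);
        String shown = decodeAsLatin1(sent);
        // One character was sent, but two are displayed.
        System.out.println(shown.length());
    }
}
```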

Additional context

I've found other issues that discuss this problem. Instead of detecting the encoding automatically, it might be easier to let the user select the charset they want to use. The option (a selectable list of the most commonly used character sets) could be placed in the bottom right corner, to the left of the two existing selects (baud rate and line feed).


@Ivan-Perez Ivan-Perez changed the title Change serial monitor encoding Serial monitor encoding option Jan 19, 2016
@Ivan-Perez Ivan-Perez changed the title Serial monitor encoding option Serial monitor character encoding option Jan 19, 2016
@lmihalkovic
The problem was identified for the editor in 2014 (arduino/Arduino#2430), but a May 2015 comment there refers to a limitation in the serial monitor's encoding support.

@aknrdureegaesr
I did some research on the UTF-8 behavior today. I found this code in arduino-core/src/processing/app/Serial.java, which converts the incoming bytes to the strings displayed by the UI:

  public synchronized void serialEvent(SerialPortEvent serialEvent) {
    if (serialEvent.isRXCHAR()) {
      try {
        byte[] buf = port.readBytes(serialEvent.getEventValue());
        if (buf.length > 0) {
          String msg = new String(buf);
          char[] chars = msg.toCharArray();
          message(chars, chars.length);
        }
      } catch (SerialPortException e) {
        errorMessage("serialEvent", e);
      }
    }
  }

According to its documentation, the String constructor used here decodes the incoming bytes with the platform's default charset.

I think the default should be to always use UTF-8, so that a sketch developed and programmed on a Mac works with a Linux or Windows box and vice versa. If we later want to add HEX display, as proposed by #1727, this would be one place to do it.
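As a rough illustration of the "always use UTF-8" idea, the sketch below passes an explicit charset to the String constructor so that the result no longer depends on the platform default. It is not a patch for Serial.java; the class and method names are hypothetical, and it does not yet deal with characters split across reads.

```java
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Hypothetical helper: decodes one chunk of received bytes as UTF-8,
    // independent of the platform's default charset.
    static String decodeUtf8(byte[] buf) {
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E9 ("e" with acute accent) is 0xC3 0xA9 in UTF-8.
        byte[] buf = {(byte) 0xC3, (byte) 0xA9};
        // Same result on Mac, Linux, and Windows.
        System.out.println(decodeUtf8(buf).equals("\u00e9"));
    }
}
```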

There is a catch here: in our context, there is no guarantee whatsoever that the incoming bytes do us the favor of splitting neatly at character boundaries. It is quite possible that a read contains only the first byte of a multi-byte character, with the remaining bytes arriving later in the next call.

In this case, the behavior of the String constructor is documented as unspecified :-).

In actual tests, I indeed see erratic output whenever I use non-ASCII characters in my sketches. Sometimes it works, sometimes it doesn't. (This is Linux, and I'm almost certain the platform encoding is UTF-8.)
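The split-character effect can be shown in isolation. This is a minimal sketch with hypothetical names; it uses the String constructor that takes an explicit charset, which is documented to replace malformed input with the charset's replacement string (the constructor without a charset leaves this case unspecified). Decoding the two bytes of "é" in separate calls produces replacement characters rather than the original character.

```java
import java.nio.charset.StandardCharsets;

class SplitCharacterDemo {
    // Hypothetical helper: decodes each chunk on its own, as the quoted
    // serialEvent code does, and concatenates the results.
    static String decodeChunkwise(byte[][] chunks) {
        StringBuilder sb = new StringBuilder();
        for (byte[] chunk : chunks) {
            sb.append(new String(chunk, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two UTF-8 bytes of U+00E9 arrive in two separate reads.
        byte[][] chunks = {{(byte) 0xC3}, {(byte) 0xA9}};
        String shown = decodeChunkwise(chunks);
        // The original character is lost.
        System.out.println(shown.equals("\u00e9"));
    }
}
```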

The String constructor documentation advises using a CharsetDecoder instead.

I think that's good advice. It would give control over the encoding used, and CharsetDecoder supports clean UTF-8 decoding even in the split-character case.
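One way the CharsetDecoder approach might look is sketched below. This is an assumption about a possible design, not the change that was eventually made to the IDE; the class name, the pending buffer, and the method signature are all hypothetical. The decoder is called with endOfInput set to false, so an incomplete trailing sequence is left in the input buffer, and the sketch carries those bytes over to the next call.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StreamingUtf8Decoder {
    // Genuinely malformed input is replaced instead of raising an error.
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // Bytes of an incomplete character left over from the previous call.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    // Decodes one chunk, returning only the characters that are complete.
    String decode(byte[] chunk) {
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + chunk.length);
        in.put(pending).put(chunk).flip();

        int maxChars = (int) Math.ceil(in.remaining() * (double) decoder.maxCharsPerByte());
        CharBuffer out = CharBuffer.allocate(maxChars);

        // endOfInput = false: a truncated sequence at the end stays in `in`.
        decoder.decode(in, out, false);

        // Keep the undecoded tail for the next call.
        byte[] rest = new byte[in.remaining()];
        in.get(rest);
        pending = ByteBuffer.wrap(rest);

        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        StreamingUtf8Decoder d = new StreamingUtf8Decoder();
        String first = d.decode(new byte[] {(byte) 0xC3});   // incomplete so far
        String second = d.decode(new byte[] {(byte) 0xA9});  // completes U+00E9
        System.out.println(first.isEmpty() && second.equals("\u00e9"));
    }
}
```

In a setup like this, one decoder instance would have to live as long as the open port, since the carried-over bytes are per-connection state.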

@AtosNicoS
Hi,
I also recently stumbled across that limitation. The Serial Monitor now supports reading in UTF-8, so writing should also default to UTF-8.

Pieter12345 referenced this issue in Pieter12345/Arduino Mar 15, 2019
- Add "send as <encoding>" dropdown menu.
- Add "receive as <encoding>" dropdown menu.

Sending and receiving can be done in any encoding specified in StandardCharsets with the additional option to send comma-separated bytes and receive newline-separated bytes directly.

This fixes #4452 and offers an easy implementation for issue #4632.
@per1234 per1234 transferred this issue from arduino/Arduino Dec 1, 2022
@per1234 per1234 added type: enhancement Proposed improvement topic: code Related to content of the project itself topic: serial monitor Related to the Serial Monitor labels Dec 1, 2022
@per1234 per1234 self-assigned this Dec 1, 2022
5 participants