Support unicode characters longer than a byte #60

@microbit-carlos

At the moment the String.charCodeAt() method is used to get the character codes of the user script, and its return value is stored in a Uint8Array.
If the returned value is larger than a byte (charCodeAt() returns a 16-bit UTF-16 code unit, so this happens for any character above U+00FF), the high bits are lost and the wrong character is encoded into the hex file.

// add header, pad to multiple of 16 bytes
data = new Uint8Array(4 + script.length + (16 - (4 + script.length) % 16));
data[0] = 77; // 'M'
data[1] = 80; // 'P'
data[2] = script.length & 0xff;
data[3] = (script.length >> 8) & 0xff;
for (var i = 0; i < script.length; ++i) {
    data[4 + i] = script.charCodeAt(i);
}
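
To see why the loop above loses data, here is a minimal standalone illustration of what happens when a code unit above 0xFF is assigned to a Uint8Array element. The variable names are made up for this example and are not part of the editor's code.

```javascript
// Illustration only: a Uint8Array element keeps just the low 8 bits of a value.
var code = 'Σ'.charCodeAt(0);                 // 0x03A3, a 16-bit UTF-16 code unit
var bytes = new Uint8Array(1);
bytes[0] = code;                              // stored modulo 256, high byte dropped
var truncated = bytes[0];                     // 0xA3
var decoded = String.fromCharCode(truncated); // '£' (U+00A3)
```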

This is easy to reproduce: create a hex file from a script containing a character that does not fit in a single byte, download the hex, and load it back into the editor.

# UTF-8 character longer than a byte: Σ

Becomes:

# UTF-8 character longer than a byte: £

This happens because the character U+03A3 (Σ) is truncated to its low byte, 0xA3, which is then read back as £.
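
One possible way to avoid the truncation is to encode the script to UTF-8 bytes before copying it into the buffer, and to decode it as UTF-8 when loading it back. The sketch below assumes that TextEncoder and TextDecoder are available in the target environment and that the length field in the header should count bytes rather than UTF-16 code units. The function names buildScriptBlock and readScriptBlock are hypothetical, and this is not necessarily how the project should or did resolve the issue.

```javascript
// Hypothetical helper, not part of the editor's code base.
function buildScriptBlock(script) {
    // Encode to UTF-8 first so that every element fits in one byte.
    var bytes = new TextEncoder().encode(script);
    var len = bytes.length;
    // add header, pad to multiple of 16 bytes (same formula as the original)
    var data = new Uint8Array(4 + len + (16 - (4 + len) % 16));
    data[0] = 77; // 'M'
    data[1] = 80; // 'P'
    data[2] = len & 0xff;        // assumption: length is counted in bytes
    data[3] = (len >> 8) & 0xff;
    data.set(bytes, 4);          // copy the UTF-8 bytes after the header
    return data;
}

// Hypothetical counterpart for loading the script back into the editor.
function readScriptBlock(data) {
    var len = data[2] | (data[3] << 8);
    return new TextDecoder('utf-8').decode(data.subarray(4, 4 + len));
}

// Round trip of the example from this issue.
var roundTripped = readScriptBlock(buildScriptBlock('# Σ'));
```

With this approach Σ is stored as the two bytes 0xCE 0xA3, so the byte length of the script can be larger than script.length. Any code that reads the header would need to interpret the length the same way.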
