Unicode library has trouble with "\0" #1545

mpmxyz · 2015-12-04T19:01:03Z

It looks like the library code only gets the part before the "\0".

local s = "abc\0abc"
print(string.len(s)) --> 7
print(unicode.len(s)) --> 3
print(unicode.wlen(s)) --> 3
print(unicode.lower(s)) --> "abc"
print(string.lower(s)) --> "abc\0abc"

print(unicode.wtrunc(s, 6)) --> Error: String index out of range: 3


MaHuJa · 2015-12-05T06:57:30Z

To me, this looks pretty much as expected.

The zero byte should not be present in text - it has no meaning, unlike a letter, or a space - and when you pass it to unicode libraries, you're saying it's text. More to the point I suspect the unicode library expects C-style zero terminators, which means that fixing this could involve replacing (with what?) or rewriting the unicode library.
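To illustrate the suspicion about C-style terminators, here is a minimal Java sketch of how a reader that stops at the first zero byte would see the string from the report. It is not the actual library or JNLua code; the class name and the helper `cStyleLength` are hypothetical and exist only for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class NulTerminatorDemo {
    // Hypothetical helper: the length a C-style reader would report,
    // i.e. the number of bytes before the first zero byte.
    static int cStyleLength(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == 0) {
                return i;
            }
        }
        return bytes.length;
    }

    public static void main(String[] args) {
        byte[] raw = "abc\0abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(raw.length);        // 7, matches string.len
        System.out.println(cStyleLength(raw)); // 3, matches unicode.len in the report
    }
}
```

If something along the path treats the bytes this way, that would be consistent with unicode.len returning 3 while string.len returns 7.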

Lua strings are - have to be - type agnostic. Text or byte sequence, it has to handle them both. Therefore you keep the \0 for a long time in normal string functions.

Technically, the specification could allow string.lower to return abc as well, but it would be surprising if the result did not contain string.len characters.

mpmxyz · 2015-12-05T09:37:41Z

https://en.m.wikipedia.org/wiki/Null_character
It is a valid Unicode character, and Lua does not interpret it as a magic end-of-string marker. For example, the whole string is drawn if you call gpu.set. Leaving things as they are makes wlen and wtrunc useless, because their results no longer relate to what is drawn on the screen. You would have to add special-case code in a lot of places to work around that.

Kubuxu · 2015-12-05T11:33:37Z

It might be a problem with how pushString works (https://github.com/MightyPirates/OpenComputers/blob/master-MC1.7.10/src/main/scala/li/cil/oc/server/machine/luac/UnicodeAPI.scala#L24). My wild guess is that pushing this string as a UTF-8 encoded byte array might fix it.

mpmxyz · 2015-12-05T12:45:01Z

But lua.checkString might also cause trouble, since unicode.len returned incorrect values:
https://github.com/MightyPirates/OpenComputers/blob/master-MC1.7.10/src/main/scala/li/cil/oc/server/machine/luac/UnicodeAPI.scala#L18
On the other hand: How does it work with gpu.set?

OpenComputers/src/main/scala/li/cil/oc/server/component/GraphicsCard.scala

Line 227 in efcd32d

val value = args.checkString(2)

OpenComputers/src/main/scala/li/cil/oc/common/component/TextBuffer.scala

Line 319 in d861552

def set(col: Int, row: Int, s: String, vertical: Boolean): Unit =

@fnuecke: Slightly off-topic, but why is it proxy.onBufferSet(x, row, truncated, vertical) and not proxy.onBufferSet(x, y, truncated, vertical)? See:

OpenComputers/src/main/scala/li/cil/oc/common/component/TextBuffer.scala

Line 333 in d861552

proxy.onBufferSet(x, row, truncated, vertical)

fnuecke · 2015-12-12T09:39:42Z

Assuming this happens in JNLua's checkString I suppose the unicode API could be adjusted to read the byte array and build a string out of that. If it happens in Java's byte array to UTF-8 conversion or even Java's string operations I'd say this is a limitation you'll have to live with :P
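As a rough check on the second case, the sketch below round-trips the reported string through Java's standard UTF-8 conversion. The helper names `fromLuaBytes` and `toLuaBytes` are hypothetical and not part of JNLua or OpenComputers. The sketch only exercises `java.lang.String`, so it says nothing about what checkString or pushString do internally.

```java
import java.nio.charset.StandardCharsets;

final class Utf8RoundTrip {
    // Hypothetical helper: build a Java String from the raw bytes of a Lua string.
    static String fromLuaBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: encode a Java String back into raw UTF-8 bytes.
    static byte[] toLuaBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {'a', 'b', 'c', 0, 'a', 'b', 'c'};
        String decoded = fromLuaBytes(raw);
        System.out.println(decoded.length());               // 7
        System.out.println(decoded.toLowerCase().length()); // 7
        System.out.println(toLuaBytes(decoded).length);     // 7
    }
}
```

In this sketch the zero byte survives decoding, lowercasing, and re-encoding, which suggests that reading the byte array and building the string on the Java side is a workable direction if the truncation does happen earlier.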

As for the row vs. y thing... because it's a bug, probably! Will double check and fix.

fnuecke · 2015-12-27T12:39:48Z

As of #1539, at least term.write avoids the issue by just stripping the \0s. I'd say wherever else this is an issue (which, frankly, should not be often, if ever), just do the same.
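A minimal sketch of the stripping approach, written in Java purely for illustration. The actual change in #1539 is not reproduced here, and `stripNul` is a hypothetical name.

```java
final class StripNul {
    // Hypothetical helper: remove every U+0000 character from the text
    // before handing it to code that cannot deal with embedded zeros.
    static String stripNul(String s) {
        return s.replace("\0", "");
    }

    public static void main(String[] args) {
        System.out.println(stripNul("abc\0abc")); // abcabc
    }
}
```

The trade-off is that the caller's string is altered rather than handled as-is, so lengths computed before stripping no longer match the text that is processed afterwards.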

mpmxyz mentioned this issue Dec 24, 2015

Added term windows and multiscreen support to keyboard lib #1539

Merged

fnuecke closed this as completed Dec 27, 2015
