Unicode library has trouble with "\0" #1545

Closed
mpmxyz opened this issue Dec 4, 2015 · 6 comments

@mpmxyz
Contributor

mpmxyz commented Dec 4, 2015

It looks like the library code only gets the part before the "\0".

local s = "abc\0abc"
print(string.len(s))        --> 7
print(unicode.len(s))       --> 3
print(unicode.wlen(s))      --> 3
print(unicode.lower(s))     --> "abc"
print(string.lower(s))      --> "abc\0abc"

print(unicode.wtrunc(s, 6)) --> Error: String index out of range: 3

@MaHuJa

MaHuJa commented Dec 5, 2015

To me, this looks pretty much as expected.

The zero byte should not be present in text: it has no meaning, unlike a letter or a space, and when you pass it to the unicode library, you're saying it's text. More to the point, I suspect the unicode library expects C-style zero terminators, which means that fixing this could involve replacing (with what?) or rewriting the unicode library.

Lua strings are - have to be - type agnostic. Text or byte sequence, it has to handle them both. Therefore you keep the \0 for a long time in normal string functions.

Technically, the specification could allow string.lower to return abc as well, but that could be surprising behavior when it didn't return string.len characters.
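
To make the distinction concrete, here is a minimal sketch of the two views of the same string. The helper `c_view` is hypothetical, not part of OpenComputers or Lua; it only emulates what a reader that stops at a C-style zero terminator would see.

```lua
-- Hypothetical helper (not an OpenComputers API): returns the part of s
-- that a NUL-terminated (C-style) reader would see.
local function c_view(s)
  -- Plain find (4th argument true), so no pattern matching is involved.
  local pos = s:find("\0", 1, true)
  if pos then
    return s:sub(1, pos - 1)
  end
  return s
end

local s = "abc\0abc"
print(#s)          -- 7: Lua stores the length explicitly
print(s:byte(4))   -- 0: the zero byte is ordinary data to Lua
print(#c_view(s))  -- 3: the same value unicode.len reported in the issue
```

If the unicode library really does hand the string to something that honors zero terminators, the results of 3 reported above are what this truncated view would predict.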

@mpmxyz
Contributor Author

mpmxyz commented Dec 5, 2015

https://en.m.wikipedia.org/wiki/Null_character
It is a valid Unicode character, and Lua does not interpret it as a magic end-of-string marker. E.g. the whole string is drawn if you call gpu.set. Leaving it as it is makes the functions wlen and wtrunc useless, because their results no longer relate to the output on a screen. You would have to add special code in a lot of places to work around that.
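
As an illustration of the expected behaviour, here is a rough sketch of a code point counter written in plain Lua. The name `utf8_len` is made up for this example and this is not how the unicode library is implemented; it simply counts every byte that is not a UTF-8 continuation byte (0x80 to 0xBF) and assumes the input is valid UTF-8. Since U+0000 is encoded as the single byte 0x00, it is counted like any other character.

```lua
-- Hypothetical helper (not the OpenComputers implementation): counts code
-- points in a string that is assumed to be valid UTF-8.
local function utf8_len(s)
  local n = 0
  for i = 1, #s do
    local b = s:byte(i)
    -- Bytes in 0x80..0xBF are continuation bytes; everything else
    -- (including 0x00) starts a new code point.
    if b < 0x80 or b >= 0xC0 then
      n = n + 1
    end
  end
  return n
end

print(utf8_len("abc\0abc"))  -- 7, the zero byte counts as one character
```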

@Kubuxu
Contributor

Kubuxu commented Dec 5, 2015

It might be a problem with how pushString works (https://github.com/MightyPirates/OpenComputers/blob/master-MC1.7.10/src/main/scala/li/cil/oc/server/machine/luac/UnicodeAPI.scala#L24). My wild guess is that pushing this string as a UTF-8 encoded byte array might fix it.

@mpmxyz
Contributor Author

mpmxyz commented Dec 5, 2015

But lua.checkString might also cause trouble, since unicode.len returned incorrect values:
https://github.com/MightyPirates/OpenComputers/blob/master-MC1.7.10/src/main/scala/li/cil/oc/server/machine/luac/UnicodeAPI.scala#L18
On the other hand: How does it work with gpu.set?


def set(col: Int, row: Int, s: String, vertical: Boolean): Unit =

@fnuecke: Slightly off-topic, but why is it proxy.onBufferSet(x, row, truncated, vertical) and not proxy.onBufferSet(x, y, truncated, vertical)?

@fnuecke
Member

fnuecke commented Dec 12, 2015

Assuming this happens in JNLua's checkString, I suppose the unicode API could be adjusted to read the byte array and build a string out of that. If it happens in Java's byte array to UTF-8 conversion or even Java's string operations, I'd say this is a limitation you'll have to live with :P

As for the row vs. y thing... because it's a bug, probably! Will double check and fix.

@fnuecke
Member

fnuecke commented Dec 27, 2015

As of #1539, at least term.write avoids the issue by just stripping the \0s. I'd say wherever else this is an issue (which, frankly, should not be often, if ever), just do the same.
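
For anyone who needs the same workaround in their own code, a minimal sketch follows. The helper name `strip_nul` is hypothetical and this is not the code from #1539; it just removes every zero byte before the string is handed to functions such as unicode.len or unicode.wtrunc. It uses plain string.find rather than a pattern so that it does not depend on how a given Lua version treats "\0" inside patterns.

```lua
-- Hypothetical helper (not the term.write code from #1539): returns s with
-- every zero byte removed.
local function strip_nul(s)
  local parts = {}
  local start = 1
  while true do
    -- Plain find (4th argument true): search for the literal zero byte.
    local pos = s:find("\0", start, true)
    if not pos then
      break
    end
    parts[#parts + 1] = s:sub(start, pos - 1)
    start = pos + 1
  end
  parts[#parts + 1] = s:sub(start)
  return table.concat(parts)
end

local s = "abc\0abc"
print(#strip_nul(s))  -- 6, the cleaned string is "abcabc"
```

On Lua 5.2 and later, where patterns may contain "\0", `(s:gsub("\0", ""))` should be a shorter equivalent.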

@fnuecke fnuecke closed this as completed Dec 27, 2015