New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve text encoding API #564
Conversation
std/assembly/encoding.ts
Outdated
|
||
// @ts-ignore: decorator | ||
@unsafe | ||
export function decodeRaw(buf: usize, len: i32): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this could be internal because decode
already provide more general interface. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that someone who gets just a str
and a str_len
from C has a way to make a string
from it, as the higher level decode
function wants an ArrayBuffer
which must have a runtime header with payloadLength.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So may be makes sense add optional len
to decode
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd look like UTF8.decode(changetype<ArrayBuffer>(str), false, str_len)
then. Not sure, feels somewhat wrong and would do different things depending of whether str_len
is given or not. For instance, the function does not know whether it is called with an actual ArrayBuffer or not, so it doesn't know when it can length-check and when it can't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UTF8.decode(changetype<ArrayBuffer>(str), 0, true)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually had something similar in mind initially, but gave up on it when I wasn't able to min(len, buf.byteLength)
because buf
could be a c pointer. Essentially, if len
is given, it must not check byteLength
because buf
might be something unsafe, in turn making the entire function unsafe when specifying buf
as an ArrayBuffer plus len
. Hence I made a dedicated unsafe function.
I split this into |
Great! Now much better |
Usage with a C-API would be: // having 'str' and 'str_len'
var theString = UTF8Decoder.decodeUnsafe(str, str_len);
// having 'str' (null terminated)
var theString = UTF8Decoder.decodeNullTerminatedUnsafe(str);
// having 'theString', wanting 'str' and 'str_len'
var buffer = UTF8Encoder.encode(theString);
myCAPI(changetype<usize>(buffer), buffer.byteLength);
// having 'theString', wanting 'str' (null terminated)
var buffer = UTF8Encoder.encode(theString, true);
myCAPI(changetype<usize>(buffer)); |
Superseded by #679 |
This PR reworks the text encoding API on top of the runtime branch so a user can specify how to handle null termination.
Also splits the API into safe and unsafe parts. Note that where an
ArrayBuffer
is returned, an unsafe user can simplychangetype<usize>(theBuffer)
due to new headerless buffer (and string) layout and pass it to a C-API. The alternative is to returnusize
from the___Raw
functions, which can't reuse the new runtime API for allocating/reallocation as well as just returning anArrayBuffer
can. Thoughts?(I don't particularly like the
TextEncoder
API btw because itnew
s)