Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve text encoding API #564

Closed
wants to merge 3 commits into from
Closed

Improve text encoding API #564

wants to merge 3 commits into from

Conversation

dcodeIO
Copy link
Member

@dcodeIO dcodeIO commented Mar 28, 2019

This PR reworks the text encoding API on top of the runtime branch so a user can specify how to handle null termination.

Also splits the API into safe and unsafe parts. Note that where an ArrayBuffer is returned, an unsafe user can simply changetype<usize>(theBuffer) due to new headerless buffer (and string) layout and pass it to a C-API. The alternative is to return usize from the ___Raw functions, which can't reuse the new runtime API for allocating/reallocation as well as just returning an ArrayBuffer can. Thoughts?

(I don't particularly like the TextEncoder API btw because it news)


// @ts-ignore: decorator
@unsafe
export function decodeRaw(buf: usize, len: i32): string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this could be internal because decode already provide more general interface. WDYT?

Copy link
Member Author

@dcodeIO dcodeIO Mar 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that someone who gets just a str and a str_len from C has a way to make a string from it, as the higher level decode function wants an ArrayBuffer which must have a runtime header with payloadLength.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So may be makes sense add optional len to decode as well?

Copy link
Member Author

@dcodeIO dcodeIO Mar 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd look like UTF8.decode(changetype<ArrayBuffer>(str), false, str_len) then. Not sure, feels somewhat wrong and would do different things depending of whether str_len is given or not. For instance, the function does not know whether it is called with an actual ArrayBuffer or not, so it doesn't know when it can length-check and when it can't.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UTF8.decode(changetype<ArrayBuffer>(str), 0, true) ?

Copy link
Member Author

@dcodeIO dcodeIO Mar 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually had something similar in mind initially, but gave up on it when I wasn't able to min(len, buf.byteLength) because buf could be a c pointer. Essentially, if len is given, it must not check byteLength because buf might be something unsafe, in turn making the entire function unsafe when specifying buf as an ArrayBuffer plus len. Hence I made a dedicated unsafe function.

@dcodeIO
Copy link
Member Author

dcodeIO commented Mar 28, 2019

I split this into UTF8Encoder and UTF8Decoder now (UTF16 respectively) and implemented the static methods. That better?

@MaxGraey
Copy link
Member

Great! Now much better

@dcodeIO
Copy link
Member Author

dcodeIO commented Mar 28, 2019

Usage with a C-API would be:

// having 'str' and 'str_len'
var theString = UTF8Decoder.decodeUnsafe(str, str_len);

// having 'str' (null terminated)
var theString = UTF8Decoder.decodeNullTerminatedUnsafe(str);

// having 'theString', wanting 'str' and 'str_len'
var buffer = UTF8Encoder.encode(theString);
myCAPI(changetype<usize>(buffer), buffer.byteLength);

// having 'theString', wanting 'str' (null terminated)
var buffer = UTF8Encoder.encode(theString, true);
myCAPI(changetype<usize>(buffer));

@dcodeIO
Copy link
Member Author

dcodeIO commented Jun 20, 2019

Superseded by #679

@dcodeIO dcodeIO closed this Jun 20, 2019
@dcodeIO dcodeIO deleted the runtime-encoding branch September 20, 2019 09:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants