Skip to content

Use WTF-8 as string encoding in binjs file and escape lone surrogate while processing (fixes #190)#193

Merged
arai-a merged 1 commit into
binast:masterfrom
arai-a:escaped_wtf8
Nov 14, 2018
Merged

Use WTF-8 as string encoding in binjs file and escape lone surrogate while processing (fixes #190)#193
arai-a merged 1 commit into
binast:masterfrom
arai-a:escaped_wtf8

Conversation

@arai-a
Copy link
Copy Markdown
Collaborator

@arai-a arai-a commented Oct 23, 2018

Applied the following changes:

  • introduces escaped_wtf8 module, which converts between WTF-8 and UTF-8 + escape sequence (\x7F + XXXX)
  • escape the generated JSON in Node side in shift.rs (we could output WTF-8 from Node side and decode/escape WTF-8 in Rust side, but I didn't want to add more module dependency in node side)
    • this means all string is escaped while processing in Rust side
      (but of course no effect if the input has no lone surrogate)
  • unescape while writing string to mutlipart file
    • which effectively changed the string encoding of multipart binjs file to WTF-8
    • this needs the change in JS engine side (I have patch, will post shortly)
  • escape while reading string from multipart file
  • added formatter (escaped_wtf8::for_print) for binjs_dump and used there, in order to see the escaped string in (incorrect but) more similar-to-usual way

@arai-a arai-a requested a review from Yoric October 23, 2018 07:33
@arai-a
Copy link
Copy Markdown
Collaborator Author

arai-a commented Oct 23, 2018

patch for SpiderMonkey side https://bugzilla.mozilla.org/show_bug.cgi?id=1501155

@Yoric
Copy link
Copy Markdown
Collaborator

Yoric commented Nov 6, 2018

@SimonSapin has agreed to review this patch. Let's see how I can add him to the mix.

Copy link
Copy Markdown
Collaborator

@SimonSapin SimonSapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I didn’t find any bug. I left inline comments about small nits that could make the code a bit more idiomatic or a bit more efficient (though maybe that’s premature optimization). Feel free to ignore any or all of them.

Comment thread crates/binjs_io/src/escaped_wtf8.rs Outdated
Comment thread crates/binjs_io/src/escaped_wtf8.rs Outdated
Comment thread crates/binjs_io/src/escaped_wtf8.rs
Comment thread crates/binjs_io/src/escaped_wtf8.rs Outdated
Comment thread crates/binjs_io/src/escaped_wtf8.rs Outdated
Comment thread crates/binjs_io/src/escaped_wtf8.rs Outdated
Comment thread crates/binjs_io/src/escaped_wtf8.rs Outdated
Comment thread crates/binjs_io/src/escaped_wtf8.rs Outdated
Comment thread crates/binjs_io/src/multipart/write.rs Outdated
Comment thread crates/binjs_io/src/escaped_wtf8.rs Outdated
@arai-a arai-a merged commit 80017a1 into binast:master Nov 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants