Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(node): add string_decoder module #11095

Closed
wants to merge 4 commits into from

Conversation

sgtcoolguy
Copy link
Contributor

JIRA: https://jira.appcelerator.org/browse/TIMOB-27286

Description:
Follow on to #11084 which adds Buffer

This adds the string_decoder module, which is sort of a safer way to interact with Buffers when dealing with character data.

Basically a string decoder will temporarily cache/buffer up incomplete multi-byte characters until they're completed before outputting them. This is handy if you expect multi-byte utf8/utf16/base64 data and want to spit it out as you go.

Caveats
V8 and Node handle invalid UTF-8 multi-byte characters by replacing invalid/incomplete multi-byte sequences with '\ufffd'. This implementation does not - and on a platform like iOS can result in undefined being returned. I've left comments about this in here for now - but the "fix" here is 2-fold:

  • Have iOS/Windows explicitly handle UTF-8 by injecting the replacement characters (there's a link to a gist that shows this for iOS, but it's a fair amount of code) - Note that this would fix our Ti.Buffer handling of invalid data as well.
  • Have this module do the checking/replacement. Given that the data lives across the JS/native boundary this could be performance-heavy. If our Buffer module got changed to use Uint8Array in the majority of cases, this may not be such an issue - but being backed by a Ti.Buffer and traversing the bridge for every byte means it would be pretty expensive to iterate through to find and validate every multi-byte char sequence.

The only "invalid" case we do handle is when you call decoder.end() and it has a buffered incomplete multi-byte character.

@sgtcoolguy
Copy link
Contributor Author

⚠️ Note that this should be held until #11084 is merged (and then this should be rebased)

@build build added this to the 8.2.0 milestone Jul 31, 2019
* - charLength: expected number of bytes for the incomplete character
* - index: index in the buffer where the incomplete character begins
* @param {Buffer} _buffer Buffer we are checking to see if it has an incompelte "character" at the end
* @returns {IncompleteCharObject}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • ⚠️ common/Resources/ti.internal/extensions/node/string_decoder.js line 104 – Unexpected @returns tag; function has no return statement. (valid-jsdoc)

@build
Copy link
Contributor

build commented Jul 31, 2019

Messages
📖

💾 Here's the generated SDK zipfile.

📖

✅ All tests are passing
Nice one! All 4400 tests are passing.
(There are 474 tests skipped)

📖 ✊ The commits in this PR match our conventions! Feel free to Rebase and Merge this PR when ready.

Generated by 🚫 dangerJS against b23069a

@sgtcoolguy sgtcoolguy modified the milestones: 8.2.0, 8.3.0 Sep 5, 2019
Copy link
Contributor

@janvennemann janvennemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just added two minor notes in the test suite but nothing important.

I guess we have to see if and how the caveats have any impact on third-party modules that use string_decoder. Regarding a fix for this i'd vote for the actual fix in Ti.Buffer on iOS to get the correct behavior natively, assuming that we don't make the switch to UIInt8Array for our buffers anytime soon.

});
});

// FIXME: I'm using ES6 classes which doesn't allow non-new constructor!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still an issue? Looks like you are just delegating to the ES6 classes so it should work fine?

should(decoder.write(Buffer.from([]))).eql('');
should(decoder.end()).eql('');
// now writing an empty buffer doesn't append anything!
// testEnd('base64', Buffer.of(0x61), Buffer.of(), 'YQ==');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these a placeholder for some missing tests?

@sgtcoolguy
Copy link
Contributor Author

Merged manually to master.

@sgtcoolguy sgtcoolguy closed this Sep 17, 2019
@sgtcoolguy sgtcoolguy deleted the node-string-decoder branch September 17, 2019 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants