Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Project C++ enums to Intl.js, implement Intl.Collator's "usage" option #5651

Merged

Conversation

jackhorton
Copy link
Contributor

@jackhorton jackhorton commented Aug 28, 2018

Also, some assorted fixes and cleanup:

  • We now correctly set the initSlotCapacity of the Intl native interface object
  • Moved around some of the asserts in GetLocaleData to be more defensive. None hit, just noticed they had gaps.
  • Added caching to Number.prototype.toLocaleString, which is now ~O(1) when calling with default arguments (locally on a debug build, time spent calling toLocaleString 100 times went from 140ms to 20ms)

Fixes #5648
Fixes #4007

@jackhorton jackhorton added Intl-ICU Bytecode-Update This PR updates bytecode and will cause merge conflicts with other PRs with this label labels Aug 28, 2018
@jackhorton jackhorton force-pushed the intl/projected-enums-and-usage branch from f590d1b to ee1b45a Compare August 28, 2018 23:19
Copy link
Contributor

@sethbrenith sethbrenith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Collaborator

@jefgen jefgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Jack.
cc: @Microsoft/target-dev as FYI.

int collationsCount = uenum_count(collations, &status);

// we expect collationsCount to have at least "standard" and "search" in it
ICU_ASSERT(status, collationsCount > 2);
Copy link
Collaborator

@jefgen jefgen Aug 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't if it is guaranteed that all locales will have 2 (unique to the locale) collation types. I think that in some cases the "search" might be aliased to the "default/standard" collation type.
However, that said, theres fallback to the root data well, so they might always have at least two, but based on falling back to root.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While debugging, I printed out all of the collation values for a few different locales and they all had at least two; if they don't, then our logic below could try to create a JavascriptArray of -1 elements (only if it has 0 collation values, which maybe is fine?), which would just failfast elsewhere (I hope...)

The Intl spec says that every locale has SearchLocaleData and SortLocaleData and usage must accept both search and sort for all locales, so I think the API was designed with the assumption that everyone would have both. It seems like a rare enough case, at least, that I would be interested in being alerted via a defensive failfast if at some point it doesn't hold.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jack. To clarify I mean that not every individual locale (in and of itself) would necessarily have multiple collation types. (Like for example "en" might not anything specific to that locale at all).
Since there is fallback to the root data, the net effect is that yes I think everybody would have multiple.

const char16 searchString[] = _u("search");
if (usage->BufferEquals(searchString, _countof(searchString) - 1)) // minus the null terminator
{
uloc_setKeywordValue("collation", "search", localeID, _countof(localeID), &status);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, it is probably better to only set the "search" type, that way the defaults for the locale would be used for the "standard" type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the intl spec isn't super clear about whether usage=search or -u-co takes precendence since its working at a different abstraction level with the SortLocaleData and SearchLocaleData; if nothing else, SpiderMonkey refuses to accept -u-co when usage == search

Copy link
Collaborator

@jefgen jefgen Aug 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it does seem a bit odd.. I was wondering about the following scenario, though perhaps it is an edge case.
For example, what if (somehow) the default user locale was set to something other than the standard type. Then if the usage is "standard", would it make sense to change it to force the standard type? If yes, then how would you get the "default" type? Seems a bit odd.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I guess you have to include the locale in the parameters first, before you can specify the options -- so maybe my scenario doesn't make sense, unless you can somehow pass "und" as the locale, and it picks up the host OS default locale.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also good point about what takes precedence as well.
Ex:
Intl.Collator('de-DE-u-co-phonebk', { usage: 'search' })

Will this give "search" type collation, or "phonebook" type collation?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for any confusion -- it was more of a question to @jackhorton about how the code/change currently works in the pull-request at the moment.

Right now the code only looks to see if the caller of the Intl API user passed in the "usage: search" in the options bag. If the caller passes in "usage: sort" then it doesn't explicitly set anything, and will use the locale/language tag "as-is".

My question was perhaps it should also explicitly set the collation keyword in the "usage: sort" case as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that to ICU, en@collation=standard was ~equivalent to en, so we don't need to do anything special in that case (since standard === sort === the default for both Intl and ICU)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, what about something like this:
new Intl.Collator('es-u-co-trad', { usage: 'sort' })

If "sort === standard", and the options bag takes precedence, it sounds like in the above case you should be effectively getting 'es-u-co-standard' instead of 'traditional' -- right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... everyone (chakra/v8/sm) currently agrees that results in locale === es-u-co-trad and usage === sort. But I see your point that if usage is specified it should override -u-co, even if the specified value is sort... I can bring it up on the original tc39 issue that prompted me to implement usage, or bring it up at the meeting.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can bring it up on the original tc39 issue that prompted me to implement usage, or bring it up at the meeting.

Yeah, would be good to discuss it. I guess the issue is that we don't want to alter unicode extensions you passed as we give them back in resolvedOptions. But maybe we should remove them? (those are the locales we matched, and those are the options we matched)?

Copy link
Contributor

@dilijev dilijev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chakrabot chakrabot merged commit 962624d into chakra-core:master Sep 6, 2018
chakrabot pushed a commit that referenced this pull request Sep 6, 2018
…l.Collator's "usage" option

Merge pull request #5651 from jackhorton:intl/projected-enums-and-usage

Also, some assorted fixes and cleanup:
- We now correctly set the initSlotCapacity of the Intl native interface object
- Moved around some of the asserts in GetLocaleData to be more defensive. None hit, just noticed they had gaps.
- Added caching to Number.prototype.toLocaleString, which is now ~O(1) when calling with default arguments (locally on a debug build, time spent calling toLocaleString 100 times went from 140ms to 20ms)

Fixes #5648
Fixes #4007
@jackhorton jackhorton deleted the intl/projected-enums-and-usage branch September 6, 2018 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bytecode-Update This PR updates bytecode and will cause merge conflicts with other PRs with this label Intl-ICU
Projects
None yet
7 participants