fix(intl): implement calendar algorithm data validation using AnyCalendarKind#4790
Conversation
Test262 conformance changes
Tested PR commit: |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4790 +/- ##
===========================================
+ Coverage 47.24% 57.26% +10.02%
===========================================
Files 476 554 +78
Lines 46892 60489 +13597
===========================================
+ Hits 22154 34641 +12487
- Misses 24738 25848 +1110 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Well... kind of. This would be correct if we only needed the calendar resolution data, which is kind of optional since it's just for optimization purposes. Unfortunately for us, we actually need the formatter data, containing the localized names of the years, months, weekdays and day periods in that calendar. After a fair bit of digging I created a very ugly utility function to check if a provider has localized data for a calendar in the specified locale fn has_calendar_data_for_locale<C: CldrCalendar>(
locale: &Locale,
provider: &(impl DynamicDryDataProvider<<C as CldrCalendar>::YearNamesV1> + ?Sized),
) -> bool {
let info = C::YearNamesV1::INFO;
let locale = info.make_locale(LocalePreferences::from(locale));
let req = DataRequest {
metadata: {
let mut md = DataRequestMetadata::default();
md.silent = true;
md
},
id: DataIdentifierBorrowed::for_marker_attributes_and_locale(&marker_attrs::ABBR, &locale),
};
let Ok(md) = provider.dry_load_data(info, req) else {
return false;
};
// In a non-fallbacking provider, `None` represents that the provider fully supports a locale.
// In a fallbacking provider, Some("und") represents that the provider doesn't support a locale.
md.locale.map_or(true, |loc| !loc.is_unknown())
} |
|
@jedel1043 , I'm trying to implement the way you did, will update you soon with the progress. I've already traced the path using icu_datetime::provider::neo::{ |
Yeah I kind of expect this at least. I think it's fine because we're kind of using unstable APIs from ICU4X, and having to match against all calendar variants is not that bad really |
|
@jedel1043 , Updated the implementation to use the yearname markers for each calendar and validate via dry_load_data. Matching all AnyCalendarKind variants explicitly and checking the returned locale metadata to ensure formatter data exists for that calendar in the requested locale. |
| use icu_datetime::provider::neo::{ | ||
| DatetimeNamesYearBuddhistV1, DatetimeNamesYearChineseV1, DatetimeNamesYearCopticV1, | ||
| DatetimeNamesYearDangiV1, DatetimeNamesYearEthiopianV1, DatetimeNamesYearGregorianV1, | ||
| DatetimeNamesYearHebrewV1, DatetimeNamesYearHijriV1, DatetimeNamesYearIndianV1, | ||
| DatetimeNamesYearJapaneseV1, DatetimeNamesYearJapanextV1, DatetimeNamesYearPersianV1, | ||
| DatetimeNamesYearRocV1, marker_attrs, |
There was a problem hiding this comment.
If you use the calendar types instead of AnyCalendar, you kinda don't need all the markers, since those are accessible through the CldrCalendar trait as C::YearNamesV1
So I would suggest doing it like this instead (based off of my hacky definition of has_calendar_data_for_locale:
match kind {
CalendarAlgorithm::Buddhist => has_calendar_data_for_locale::<Buddhist>(locale, provider),
CalendarAlgorithm::Chinese => has_calendar_data_for_locale::<Chinese>(locale, provider),
// rest of the calendars
}There was a problem hiding this comment.
This would be on the self.calendar_algorithm.take().filter line btw
There was a problem hiding this comment.
Sure, will refactor this
5474fd5 to
f61f877
Compare
f61f877 to
48ac8b9
Compare
|
@jedel1043 Addressed the review comment, refactored to use CldrCalendar types directly. Also is there any other issue I can work on? Most of the Intl issues seem to be blocked by ICU4X limitations. |
|
We kinda don't have more "issues" to pick up, but you can always try to optimize for performance :) Right now our Intl code is very "straightforward", and we haven't put much effort into performance and memory usage. If you want to still work on this part of the engine, you can start by adding benches for Intl APIs |
|
FWIW, If you're enjoying this area of work (INTL related) in general. There are some open bugs on boa-dev/temporal that you could take a stab at. Although, temporal is more intl adjacent then full intl |
|
@jedel1043 @nekevss Thanks for the guidance! |
|
@jedel1043 regarding the Intl benchmarks, what aspects should we focus on measuring? |
@jedel1043 I looked at the existing benchmark scripts like benches/scripts/strings/slice.js to understand the pattern something like constants outside main(), iteration counts etc... Based on that I've started with writing benchmarks for DateTimeFormat and NumberFormat across a few locales (en, de, ja, ar to cover different script systems), and a format() benchmark with a fixed date. Let me know if you think other aspects should be covered first or do i need to follow a completely different approach? Regarding the memory usage I looked at benches/benches/scripts.rs and the current infrastructure uses Criterion which only measures execution time. so i don't get any lead regarding this. |
…menter, DateTimeFormat and NumberFormat (#4898) As suggested in [#4790](#4790 (comment)), adds performance benchmarks for Intl APIs (` Intl.DateTimeFormat, Intl.NumberFormat `) to establish baselines for future optimization work. **Note** The `format()` benchmark uses a fixed date (`new Date(2024, 0, 1)`) for deterministic results. **Cache Optimization** Also caches the DateTimeFormatter on the DateTimeFormat struct instead of recreating it on every format() call. Previously, every format() invocation loaded locale data from the provider from scratch. <img width="836" height="194" alt="image" src="https://github.com/user-attachments/assets/b0bbcf65-45d0-4f37-8fc2-cc48d40487f1" />
Description:
Resolves the TODO in ServicePreferences::validate for the LDML unicode key ca (calendar algorithm).
To find the correct way to validate calendar data, similar to how numbering system validation uses validate_extension with DecimalSymbolsV1. However, calendars don't expose a DataMarkerAttributes-based validation path like numbering systems do — attempting dry_load with calendar markers like DatetimePatternsDateGregorianV1 consistently returned IdentifierNotFound regardless of request format.
Approach
After digging into DateTimeFormatter::try_new_internal in ICU4X, I found that CalendarAlgorithm implements TryInto, and AnyCalendar::try_new_with_buffer_provider performs the actual provider-level data validation. If the provider lacks data for the requested calendar kind, it returns an error and we clear calendar_algorithm, falling back to the locale default — consistent with how other extension keys are handled.
Output:
