Skip to content

fix(intl): implement calendar algorithm data validation using AnyCalendarKind#4790

Merged
jedel1043 merged 4 commits intoboa-dev:mainfrom
Lokesh0224:fix-intl-dtf-calendar-validation
Mar 5, 2026
Merged

fix(intl): implement calendar algorithm data validation using AnyCalendarKind#4790
jedel1043 merged 4 commits intoboa-dev:mainfrom
Lokesh0224:fix-intl-dtf-calendar-validation

Conversation

@Lokesh0224
Copy link
Contributor

@Lokesh0224 Lokesh0224 commented Mar 1, 2026

Description:
Resolves the TODO in ServicePreferences::validate for the LDML unicode key ca (calendar algorithm).

To find the correct way to validate calendar data, similar to how numbering system validation uses validate_extension with DecimalSymbolsV1. However, calendars don't expose a DataMarkerAttributes-based validation path like numbering systems do — attempting dry_load with calendar markers like DatetimePatternsDateGregorianV1 consistently returned IdentifierNotFound regardless of request format.

Approach
After digging into DateTimeFormatter::try_new_internal in ICU4X, I found that CalendarAlgorithm implements TryInto, and AnyCalendar::try_new_with_buffer_provider performs the actual provider-level data validation. If the provider lacks data for the requested calendar kind, it returns an error and we clear calendar_algorithm, falling back to the locale default — consistent with how other extension keys are handled.

Output:
image

@github-actions
Copy link

github-actions bot commented Mar 1, 2026

Test262 conformance changes

Test result main count PR count difference
Total 52,862 52,862 0
Passed 49,627 49,627 0
Ignored 2,263 2,263 0
Failed 972 972 0
Panics 0 0 0
Conformance 93.88% 93.88% 0.00%

Tested PR commit: 48ac8b9ee376629f821a12e3cf7235475d2fa210

@codecov
Copy link

codecov bot commented Mar 1, 2026

Codecov Report

❌ Patch coverage is 4.16667% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.26%. Comparing base (6ddc2b4) to head (48ac8b9).
⚠️ Report is 751 commits behind head on main.

Files with missing lines Patch % Lines
...gine/src/builtins/intl/date_time_format/options.rs 4.16% 23 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4790       +/-   ##
===========================================
+ Coverage   47.24%   57.26%   +10.02%     
===========================================
  Files         476      554       +78     
  Lines       46892    60489    +13597     
===========================================
+ Hits        22154    34641    +12487     
- Misses      24738    25848     +1110     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@jedel1043
Copy link
Member

jedel1043 commented Mar 1, 2026

Well... kind of.

This would be correct if we only needed the calendar resolution data, which is kind of optional since it's just for optimization purposes.

Unfortunately for us, we actually need the formatter data, containing the localized names of the years, months, weekdays and day periods in that calendar.

After a fair bit of digging I created a very ugly utility function to check if a provider has localized data for a calendar in the specified locale

fn has_calendar_data_for_locale<C: CldrCalendar>(
    locale: &Locale,
    provider: &(impl DynamicDryDataProvider<<C as CldrCalendar>::YearNamesV1> + ?Sized),
) -> bool {
    let info = C::YearNamesV1::INFO;
    let locale = info.make_locale(LocalePreferences::from(locale));

    let req = DataRequest {
        metadata: {
            let mut md = DataRequestMetadata::default();
            md.silent = true;
            md
        },
        id: DataIdentifierBorrowed::for_marker_attributes_and_locale(&marker_attrs::ABBR, &locale),
    };

    let Ok(md) = provider.dry_load_data(info, req) else {
        return false;
    };

    // In a non-fallbacking provider, `None` represents that the provider fully supports a locale.
    // In a fallbacking provider, Some("und") represents that the provider doesn't support a locale.
    md.locale.map_or(true, |loc| !loc.is_unknown())
}

@Lokesh0224
Copy link
Contributor Author

@jedel1043 , I'm trying to implement the way you did, will update you soon with the progress. I've already traced the path using icu_datetime::provider::neo::{
DatetimeNamesYearBuddhistV1, DatetimeNamesYearChineseV1,.... markers with dry_load_data — the main challenge is that since CldrCalendar is in a public module but the generic approach requires matching each AnyCalendarKind variant to its corresponding marker manually. Also since this is an intermediate implementation (calendar validation), the test scripts are expecting resolvedOptions() for the test cases to pass, so I can't fully verify whether my approach is right. It's taking more time than expected .

@jedel1043
Copy link
Member

the main challenge is that since CldrCalendar is in a public module but the generic approach requires matching each AnyCalendarKind variant to its corresponding marker manually

Yeah I kind of expect this at least. I think it's fine because we're kind of using unstable APIs from ICU4X, and having to match against all calendar variants is not that bad really

@Lokesh0224
Copy link
Contributor Author

@jedel1043 , Updated the implementation to use the yearname markers for each calendar and validate via dry_load_data. Matching all AnyCalendarKind variants explicitly and checking the returned locale metadata to ensure formatter data exists for that calendar in the requested locale.

Comment on lines +619 to +624
use icu_datetime::provider::neo::{
DatetimeNamesYearBuddhistV1, DatetimeNamesYearChineseV1, DatetimeNamesYearCopticV1,
DatetimeNamesYearDangiV1, DatetimeNamesYearEthiopianV1, DatetimeNamesYearGregorianV1,
DatetimeNamesYearHebrewV1, DatetimeNamesYearHijriV1, DatetimeNamesYearIndianV1,
DatetimeNamesYearJapaneseV1, DatetimeNamesYearJapanextV1, DatetimeNamesYearPersianV1,
DatetimeNamesYearRocV1, marker_attrs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use the calendar types instead of AnyCalendar, you kinda don't need all the markers, since those are accessible through the CldrCalendar trait as C::YearNamesV1

So I would suggest doing it like this instead (based off of my hacky definition of has_calendar_data_for_locale:

match kind {
        CalendarAlgorithm::Buddhist => has_calendar_data_for_locale::<Buddhist>(locale, provider),
        CalendarAlgorithm::Chinese => has_calendar_data_for_locale::<Chinese>(locale, provider),
        // rest of the calendars
    }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be on the self.calendar_algorithm.take().filter line btw

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will refactor this

@jedel1043 jedel1043 added A-Enhancement New feature or request C-Intl Changes related to the `Intl` implementation labels Mar 4, 2026
@jedel1043 jedel1043 added this to the v1.0.0 milestone Mar 4, 2026
@jedel1043 jedel1043 added the Waiting On Author Waiting on PR changes from the author label Mar 4, 2026
@Lokesh0224 Lokesh0224 force-pushed the fix-intl-dtf-calendar-validation branch from 5474fd5 to f61f877 Compare March 4, 2026 09:59
@Lokesh0224 Lokesh0224 force-pushed the fix-intl-dtf-calendar-validation branch from f61f877 to 48ac8b9 Compare March 4, 2026 10:49
@Lokesh0224
Copy link
Contributor Author

@jedel1043 Addressed the review comment, refactored to use CldrCalendar types directly. Also is there any other issue I can work on? Most of the Intl issues seem to be blocked by ICU4X limitations.

Copy link
Member

@jedel1043 jedel1043 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Looks great!

@jedel1043
Copy link
Member

We kinda don't have more "issues" to pick up, but you can always try to optimize for performance :)

Right now our Intl code is very "straightforward", and we haven't put much effort into performance and memory usage. If you want to still work on this part of the engine, you can start by adding benches for Intl APIs

@nekevss
Copy link
Member

nekevss commented Mar 5, 2026

FWIW, If you're enjoying this area of work (INTL related) in general. There are some open bugs on boa-dev/temporal that you could take a stab at. Although, temporal is more intl adjacent then full intl

@jedel1043 jedel1043 added this pull request to the merge queue Mar 5, 2026
Merged via the queue into boa-dev:main with commit d0630f0 Mar 5, 2026
19 checks passed
@Lokesh0224
Copy link
Contributor Author

@jedel1043 @nekevss Thanks for the guidance!
I'll start looking into adding benchmarks for the Intl APIs. Also curious about the boa-dev/temporal bugs, once I settle with these benchmarks, I'll start working on that.

@Lokesh0224
Copy link
Contributor Author

@jedel1043 regarding the Intl benchmarks, what aspects should we focus on measuring?
for ex:
Construction performance something like creating DateTimeFormat, NumberFormat instances? or
Format operation performance (calling format(), formatToParts())?
so it will be helpful to narrow down.

@Lokesh0224
Copy link
Contributor Author

Right now our Intl code is very "straightforward", and we haven't put much effort into performance and memory usage. If you want to still work on this part of the engine, you can start by adding benches for Intl APIs

@jedel1043 I looked at the existing benchmark scripts like benches/scripts/strings/slice.js to understand the pattern something like constants outside main(), iteration counts etc... Based on that I've started with writing benchmarks for DateTimeFormat and NumberFormat across a few locales (en, de, ja, ar to cover different script systems), and a format() benchmark with a fixed date. Let me know if you think other aspects should be covered first or do i need to follow a completely different approach?

Regarding the memory usage I looked at benches/benches/scripts.rs and the current infrastructure uses Criterion which only measures execution time. so i don't get any lead regarding this.

github-merge-queue bot pushed a commit that referenced this pull request Mar 12, 2026
…menter, DateTimeFormat and NumberFormat (#4898)

As suggested in
[#4790](#4790 (comment)),
adds performance benchmarks for Intl APIs (` Intl.DateTimeFormat,
Intl.NumberFormat `) to establish baselines for future optimization
work.

**Note**
The `format()` benchmark uses a fixed date (`new Date(2024, 0, 1)`) for
deterministic results.

**Cache Optimization**
Also caches the DateTimeFormatter on the DateTimeFormat struct instead
of recreating it on every format() call. Previously, every format()
invocation loaded locale data from the provider from scratch.

<img width="836" height="194" alt="image"
src="https://github.com/user-attachments/assets/b0bbcf65-45d0-4f37-8fc2-cc48d40487f1"
/>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Enhancement New feature or request C-Intl Changes related to the `Intl` implementation Waiting On Author Waiting on PR changes from the author

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants