Date32 doesn't parse date with large year #9960

@swanandx

Describe the bug

Date32Type::parse returns None for date strings whose year falls outside chrono::NaiveDate's range (Jan 1, 262145 BCE to Dec 31, 262143 CE).

Ideally, Date32 should be able to represent a much larger range (i32 days from the epoch; years ≈ ±5,881,580).
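
As a rough check of that figure, the sketch below converts the largest i32 day offset into an approximate year using the mean Gregorian year of 365.2425 days. The function name is hypothetical, and the result is an estimate rather than an exact calendar conversion.

```rust
/// Approximate calendar year of the largest Date32 value
/// (i32::MAX days after 1970-01-01).
/// Hypothetical helper for illustration; it uses the mean Gregorian
/// year of 365.2425 days instead of a real calendar conversion.
fn approx_max_date32_year() -> i64 {
    // Scale by 10_000 so dividing by 365.2425 stays in integer arithmetic.
    let whole_years = i64::from(i32::MAX) * 10_000 / 3_652_425;
    1970 + whole_years
}
```

The same span counted backwards from 1970 reaches roughly year -5,877,641, so the range is not quite symmetric around year 0.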

The extended-year branch in parse_date calls chrono::NaiveDate::from_ymd_opt, which limits the supported range:

return NaiveDate::from_ymd_opt(year, month, day);

In the end, all we need is an i32 day count, so we can avoid the NaiveDate detour.

To Reproduce

Try to parse a date outside NaiveDate's supported range:

use arrow_array::types::Date32Type;
use arrow_cast::parse::Parser;

fn main() {
    // Works: year is within chrono::NaiveDate's supported range.
    assert_eq!(Date32Type::parse("+29349-01-26"), Some(10_000_000));

    // Fails today: returns None. This should be accepted because the resulting
    // day offset is still representable by Date32.
    assert_eq!(Date32Type::parse("+2739877-01-03"), Some(1_000_000_000));
}

Expected behavior

Every date whose day offset fits in a Date32 should be parsed successfully.

Additional context

arrow-array = "58.2.0"
arrow-cast = "58.2.0"

An easy approach would be:
The Gregorian calendar repeats exactly every 400 years (146,097 days), so we can compute the current era (era = y.div_euclid(400)) and the year within that era (yoe = y.rem_euclid(400)). The year within the era is always in 0..=399, so it is always within chrono's supported range, and we can use chrono both to validate the date and to count the days within the era:

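// yoe is in 0..=399, so from_ymd_opt only fails for an invalid month/day.
let nd = NaiveDate::from_ymd_opt(yoe, month, day)?;
let in_era = i64::from(nd.num_days_from_ce() - EPOCH_DAYS_FROM_CE);
// Widen era to i64 before multiplying so the product cannot overflow.
let days = i32::try_from(i64::from(era) * 146_097 + in_era).ok();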
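
The era decomposition above can also be sketched without any dependency. This is a minimal sketch, not the arrow-cast implementation: it swaps the NaiveDate validation for a hand-written month/day check and uses the common days-from-civil formulation, in which years are shifted to start in March. The function names and the 719_468 offset (days from 0000-03-01 to 1970-01-01) are assumptions of this sketch and do not come from parse_date.

```rust
/// Days in one full 400-year Gregorian cycle.
const DAYS_PER_ERA: i64 = 146_097;
/// Days from 0000-03-01 to 1970-01-01 in the proleptic Gregorian calendar.
const EPOCH_SHIFT: i64 = 719_468;

fn is_leap_year(year: i32) -> bool {
    year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)
}

fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap_year(year) => 29,
        2 => 28,
        _ => 0, // invalid month
    }
}

/// Hypothetical helper: days since 1970-01-01 for a proleptic Gregorian date,
/// or None if the date is invalid or the offset does not fit in an i32.
fn date32_from_ymd(year: i32, month: u32, day: u32) -> Option<i32> {
    if day == 0 || day > days_in_month(year, month) {
        return None;
    }
    // Treat January and February as the end of the previous year,
    // so the leap day is always the last day of the shifted year.
    let y = i64::from(year) - i64::from(month <= 2);
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // 0..=399
    let shifted_month = (i64::from(month) + 9) % 12; // March = 0
    let day_of_year = (153 * shifted_month + 2) / 5 + i64::from(day) - 1;
    let day_of_era = yoe * 365 + yoe / 4 - yoe / 100 + day_of_year;
    // All arithmetic is in i64; only the final result is narrowed.
    i32::try_from(era * DAYS_PER_ERA + day_of_era - EPOCH_SHIFT).ok()
}
```

With this sketch, date32_from_ymd(29349, 1, 26) gives Some(10_000_000) and date32_from_ymd(2739877, 1, 3) gives Some(1_000_000_000), matching the two values in the reproduction. Invalid dates such as February 29 of a non-leap year, and years whose offset overflows i32, yield None.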

Another option would be to change the parse_date signature itself, or something else?
