Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why does chrono::NaiveDate::parse_from_str("2020-01-01 17", "%Y-%m-%d %H") pass? #1075

Open
MarcoGorelli opened this issue May 21, 2023 · 15 comments

Comments

@MarcoGorelli
Copy link

MarcoGorelli commented May 21, 2023

Here's a reproducible example:

use chrono;
fn main() {
    println!("{:?}", chrono::NaiveDate::parse_from_str("2020-01-01 17", "%Y-%m-%d %H"));
    println!("{:?}", chrono::NaiveDate::parse_from_str("2020-01", "%Y-%m"));
}

This outputs

Ok(2020-01-01)
Err(ParseError(NotEnough))

My question is - would it be possible to introduce a new ParseError, and have the first one throw ParseError(TooMany), instead of silently dropping '%H'?


As an aside, thanks for your work on this awesome project

@jtmoon79
Copy link
Contributor

@MarcoGorelli would you be willing to submit a PR for the 0.4.x branch?

@MarcoGorelli
Copy link
Author

totally! I'll give this a go, thanks, this would be useful in polars

@MarcoGorelli
Copy link
Author

closing per discussion in #1079

@MarcoGorelli
Copy link
Author

reopening as per #1079 (comment)

@MarcoGorelli MarcoGorelli reopened this May 23, 2023
@MarcoGorelli
Copy link
Author

MarcoGorelli commented May 23, 2023

I don't know what to propose as a solution, but what would be really useful in polars would be if there was a way to tell if the format string contains more elements than will end up in the result, so that chrono::NaiveDate::parse_from_str("2020-01-01 17", "%Y-%m-%d %H") won't silently drop the hour component

@jtmoon79
Copy link
Contributor

a way to tell if the format string contains more elements than will end up in the result

This is a safer approach to creating dates/datetimes so I like it.

Off-hand, I can think of three solutions:

  1. a flexible solution would be a parameter allow_unused: bool; when true it silently drops unused components, when false it returns None. (off-hand idea: Perhaps this new allow_unused: boolcould be default argument with help of crate default_args?)
  2. another solution is different functions; parse_from_str retains current behavior and new function parse_from_str_strict does not allow unused components and returns None if there are any.
  3. the least flexible is to change the behavior to return None in presence of unprocessed/unusable specifiers. This is the easiest to debug and upkeep.

@djc
Copy link
Contributor

djc commented May 24, 2023

We're definitely not going to depend on the default_args crate in chrono. I think we can have NaiveDate::parse_from_str() yield a ParseError with a new ParseErrorKind, you just can't change public API to get there.

@MarcoGorelli
Copy link
Author

I think we can have NaiveDate::parse_from_str() yield a ParseError with a new ParseErrorKind

That would be good, but in #1079 (comment) it was brought up that it might be a valid use case to only parse the date part

@pitdicker
Copy link
Collaborator

Related issue: #528

@MarcoGorelli
Copy link
Author

In case it helps, here's a minimal reproducer of the larger issue in polars:

use chrono;

fn parser(ts: &str, fmt: &str) -> Option<i64> {
    let res = chrono::NaiveDateTime::parse_from_str(ts, fmt)
        .ok()
        .map(|ndt| ndt.timestamp_millis());
    let res = res.or_else(|| {
        chrono::NaiveDate::parse_from_str(ts, fmt)
            .ok()
            .map(|nd| nd.and_hms_opt(0, 0, 0).unwrap().timestamp_millis())
    });
    res
}

fn main() {
    println!("res: {:?}", parser("2020-01-01 10", "%Y-%m-%d %H"));
    println!("res: {:?}", parser("2020-01-01", "%Y-%m-%d"));
}

This outputs

res: Some(1577836800000)
res: Some(1577836800000)

but ideally, in polars, the first one would need to be None.

On the polars side, would erroring if the given fmt contains ('%H' or '%I') and not '%M' be dangerous in any way?

Thanks all for your engagement here, much appreciated 🙏

@pitdicker
Copy link
Collaborator

pitdicker commented May 25, 2023

Ok. I have started playing with this, and most of it is easy to adjust.

  • When parsing a NaiveTime seconds and nanoseconds may already be omitted. With the note that seconds may only be omitted if nanoseconds are also omitted. It is easy to extend this to minutes.

  • When parsing DateTime we can default to the first day of the month when parsing year-month-day, or default to the first day of the week when parsing year-week-weekday.

  • Turning just a year into a full NaiveDate or NaiveDateTime seems a bit too much to me.

  • Parsing NaiveDateTime reuses the parts for DateTime and NaiveTime. It can be adjusted to default to midnight when parsing NaiveTime fails with NOT_ENOUGH. Edit: This is a bit more involved, it should instead check none of the time fields are populated.

Next I hit a guarantee in the documentation of NaiveDateTime::parse_from_str that seems a bit unfortunate:

Offset is ignored for the purpose of parsing.

assert_eq!(parse_from_str("2014-5-17T12:34:56+09:30", "%Y-%m-%dT%H:%M:%S%z"),
           Ok(NaiveDate::from_ymd_opt(2014, 5, 17).unwrap().and_hms_opt(12, 34, 56).unwrap()));

I would say input like this should just be parsed as DateTime<FixedOffset>. @djc Would it be akay to break this?

@pitdicker
Copy link
Collaborator

pitdicker commented May 25, 2023

One step further... When parsing NaiveDateTime it seems fine to add defaults. But when parsing a DateTime<FixedOffset> I would say that everything except seconds and nanoseconds need to be present. Defining an offset while the input doesn't contain minutes or even days doesn't make sense.

So better make it configurable. This can be done with an extra field on Parsed.

@djc
Copy link
Contributor

djc commented May 26, 2023

I would say input like this should just be parsed as DateTime<FixedOffset>. @djc Would it be akay to break this?

While I conceptually agree, I feel like semver does not allow us to break this documented guarantee in 0.4.x.

The rest of your proposed logic sounds good to me. Should clearly document that...

@MarcoGorelli
Copy link
Author

In addition to the above, would you consider parsing into NaiveDateTime if "hour" but no "minute" directive is provided?

This comes up sometimes in polars, e.g. pola-rs/polars#10178

@djc
Copy link
Contributor

djc commented Aug 12, 2023

@MarcoGorelli see #1094.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants