Skip to content

Conversation

@omus
Copy link
Member

@omus omus commented Dec 21, 2020

As reported in JuliaTime/TimeZones.jl#303 when users use Z without using the TimeZones package then the DateFormat treats Z as a delimiter:

julia> using Dates

julia> DateTime("2001-02-03T04:05:06Z", dateformat"yyyy-mm-ddTHH:MM:SSZ")
2001-02-03T04:05:06

When using the TimeZones package the same code will fail as Z is treated as a DatePart to be parsed:

julia> using Dates, TimeZones

julia> DateTime("2001-02-03T04:05:06Z", dateformat"yyyy-mm-ddTHH:MM:SSZ")
ERROR: ArgumentError: Unable to parse date time. Expected directive DatePart(Z) at char 20

Ever since Dates parsing/formatting was made extensible there has also existed a workaround to this problem by escaping any characters meant to be treated as literal characters (delimiters) instead of interpreted as character codes (e.g. dateformat"yyyy-mm-dd\THH:MM:SS\Z"). However, most users seem unaware of this functionality and are confused by the behaviour change when a package is imported.

This PR reserves alphabetic characters ([A-Za-z]) for use as character codes. If a user specifies one of these characters without a defined behavior an exception is now raised informing the user to either escape the character or import the package that defines the behavior.

Note: In the process of updating this I noticed a particular test ensuring that a character code used twice would result in an exception. Unfortunately the test was flawed and once addressed it revealed the bug:

julia> DateTime("18/05/2009 16:12", "mm/dd/yyyy HH:mm")  # Error handling, used "mm" for months AND minutes (note months are specified to be 18 and 12)
2009-12-05T16:00:00

Copy link
Member

@quinnj quinnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the change; is it breaking though? I don't quite understand the implications of the change wrt to whether it breaks existing code.

@mgkuhn
Copy link
Contributor

mgkuhn commented Dec 24, 2020

The current documentation for Dates.Format says:

When creating a format you can use any non-code characters as a separator. For example to generate the string "1996-01-15T00:00:00" you could use format: "yyyy-mm-ddTHH:MM:SS". Note that if you need to use a code character as a literal you can use the escape character backslash. The string "1996y01m" can be produced with the format "yyyy\ymm\m".

So this is clearly a breaking change; it even breaks an example given in the documentation. It is also a very useful change, but it ought to be introduced gradually.

Suggestion: Starting in release 1.6, change merely the documentation to

When creating a format you can use any non-code characters as a separator, however ASCII characters A–za–z are reserved as code characters and need to be escaped with a preceding backslash when used as separators. For example to generate the string "1996-01-15T00:00:00Z" you could use format: "yyyy-mm-dd\THH:MM:SS\Z".

Then add a deprecation warning in 1.7, and finally make it an error in 1.8

@mgkuhn
Copy link
Contributor

mgkuhn commented Dec 24, 2020

I would suggest to turn this into three separate PRs:

  • one for just the documentation change and test/bug fix (should be backported to 1.6)
  • one adding a depreciation warning (can be merged to 1.7)
  • one turning the depreciation warning into an exception (a breaking change to be merged later)

@Keno Keno added minor change Marginal behavior change acceptable for a minor release triage This should be discussed on a triage call labels Dec 25, 2020
@Keno
Copy link
Member

Keno commented Dec 25, 2020

There are no deprecation cycles in the 1.x series. If this is considered a "major change" (i.e. existing users are very likely to run into it), then it will have to wait until 2.x. The behavior could be introduced with a keyword argument though.

@@ -0,0 +1,9 @@
struct UndefDateFormatCode <: Exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick check.
Right now if open a repl and run subtypes(Exception)
there are 61 types shown.
The only ones that don't end with Exception or Error are:
SegmentationFault, and DimensionMismatch

Should we instead name this

Suggested change
struct UndefDateFormatCode <: Exception
struct DateFormatCodeUndefinedException <: Exception

@mgkuhn mgkuhn added the dates Dates, times, and the Dates stdlib module label Apr 20, 2021
@JeffBezanson
Copy link
Member

Triage thinks this is the right thing to do in theory, but if there are common date formats (e.g. including T) that will turn into errors we probably can't do it now.

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Jul 1, 2021
@PallHaraldsson
Copy link
Contributor

If this is better but the only problem is it's breaking, and the PR works otherwise, can this be slightly changed with a keyword argument to get this into 1.10.x? Might be to late for 1.10.0, but it probably will not become LTS anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dates Dates, times, and the Dates stdlib module minor change Marginal behavior change acceptable for a minor release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants