Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dates: Documentation, tests, and changed arithmetics for DateTime #50816

Merged
merged 2 commits into from Aug 22, 2023

Conversation

barucden
Copy link
Contributor

@barucden barucden commented Aug 7, 2023

Resolves #50785 EDIT: the PR changed over time, see the discussion

This is a proposal for fixing the linked issue. The PR consists of two commits.

The first commit just adds a few tests and one sentence to the docs of DateTime. I think this could be merged.

The second commit changes addition/subtraction of a DateTime and Microsecond/Nanosecond.
Before:

julia> DateTime(2023, 08, 07) + Microsecond(1) # truncates
2023-08-07T00:00:00

julia> DateTime(2023, 08, 07) + Nanosecond(1) # truncates
2023-08-07T00:00:00

julia> DateTime(2023, 08, 07) + Microsecond(10^3)
2023-08-07T00:00:00.001

julia> DateTime(2023, 08, 07) + Nanosecond(10^6)
2023-08-07T00:00:00.001

After:

julia> DateTime(2023, 08, 07) + Microsecond(1) # throws
ERROR: InexactError: +(Millisecond, 1 microsecond)

julia> DateTime(2023, 08, 07) + Nanosecond(1) # throws
ERROR: InexactError: +(Millisecond, 1 nanosecond)

julia> DateTime(2023, 08, 07) + Microsecond(10^3)
2023-08-07T00:00:00.001

julia> DateTime(2023, 08, 07) + Nanosecond(10^6)
2023-08-07T00:00:00.001

The second commit is breaking so I am not sure if it can be merged.

cc @jariji

@timholy timholy added the kind:breaking This change will break code label Aug 7, 2023
@jariji
Copy link
Contributor

jariji commented Aug 7, 2023

If the incorrect method can't be removed, at least it can be @deprecated, right?

@barucden
Copy link
Contributor Author

barucden commented Aug 8, 2023

Yes, I suppose we can at least deprecate the Nanosecond & Microsecond methods if throwing an error is deemed breaking.

What is the protocol for breaking changes anyway? Is any breaking change prohibited? Or do we run nanosoldier to see if any packages rely on the current behavior, and if not, then we can merge the change?

@Seelengrab
Copy link
Contributor

There is no formal protocol, though if something is explicitly documented in the manual (a docstring is not sufficient), we try very hard not to break the behavior. If possible, deprecation for (upcoming) breaking changes should be done rather than doing the breaking change.

@barucden
Copy link
Contributor Author

barucden commented Aug 9, 2023

Alright, thank you. How to proceed here then? I did not remove any methods, only added a potential throw. So I should comment out the throw for now and insert a depwarn instead?

@Seelengrab
Copy link
Contributor

Changing a non-error to an error is breaking, so my feeling is that the error path should not happen for now. Since DateTime handles millisecond resolution only, it seems good to deprecate the micro-/nanosecond path.

@barucden
Copy link
Contributor Author

barucden commented Aug 9, 2023

Can you see the new version, please? Instead of throwing an error, the methods now issue a deprecation warning. I put instructions into comments on what to do when the deprecation may turn into an error.

@Seelengrab
Copy link
Contributor

Seelengrab commented Aug 9, 2023

I'd just deprecate the entire Micro-/Nanosecond method. Only every 1.000/1.000.000 micro/nanoseconds are perfectly convertible to milliseconds, and a one in a thousand or one in a million chance to NOT get an error from addition (when it seems preferable to convert explicitly in all cases either way) seems better.

But then again, we don't have something like Nanosecond(1) % Millisecond or even round(Millilsecond, Nanosecond(1)), so I'm a bit torn on this.. Perhaps that's an argument for adding that, though not in this PR.

@barucden barucden force-pushed the datetime-arithmetics branch 2 times, most recently from 2184bef to b9835ba Compare August 9, 2023 09:59
@barucden
Copy link
Contributor Author

barucden commented Aug 9, 2023

Yes, that makes sense. It is done.

@Seelengrab
Copy link
Contributor

LGTM!

@Seelengrab Seelengrab added kind:deprecation This change introduces or involves a deprecation and removed kind:breaking This change will break code labels Aug 9, 2023
@barucden
Copy link
Contributor Author

barucden commented Aug 9, 2023

Thank you!

I see the tests passing locally, but the CI fails: https://buildkite.com/julialang/julia-master/builds/26597#0189d9f2-4590-4579-8d55-9a9208e78d17/775-1116

Do you have any idea what might be the reason?

@Seelengrab
Copy link
Contributor

Likely toms(y) should be Dates.toms(y), since the toms name is not exported from Dates.

@barucden
Copy link
Contributor Author

barucden commented Aug 9, 2023

But toms(y) is used from within Dates, so it should be available, right?

@Seelengrab
Copy link
Contributor

Good point - Maybe we'll be able to see the actual error if you modify the test to

conv = @test_deprecated (dt + Microsecond(10^3))
@test conv == Dates.DateTime(1999, 12, 27, 0, 0, 0, 1)

as currently there's an ErrorException being compared.

Looking at the rest of the file, it's a bit curious that the other deprecations in there are all commented out 🤔

@Seelengrab
Copy link
Contributor

Seeing the discussion in #29509, seems like the deprecations were just left like that for later 😬

@barucden
Copy link
Contributor Author

barucden commented Aug 9, 2023

Oh, and also see #37814. I could uncomment the depwarns (#37814 (comment)) in this PR, too?

I should also annotate the deprecated methods with @noinline (#29509 (comment))

@Seelengrab
Copy link
Contributor

Seelengrab commented Aug 9, 2023

Yes, since Jeff mentioned to just uncomment them, go for it. Do make sure to replace the nothing with the correct function symbol though, otherwise the deprecation warning will be confusing. You may need to add @test_deprecated for the methods too. Also, once uncommented, mention the other issue in your original comment as being resolved, so github closes it when this PR is merged.

@barucden
Copy link
Contributor Author

barucden commented Aug 9, 2023

I think I know what's happening. The buildkite tests run with --depwarn=error. In that case, depwarn throws an ErrorException instead of issuing a warning, and x = @test_deprecated expr produces x = @test_throws ErrorException expr. This test passes, and x is assigned the string "Test Passed [...]", which is then compared with the expected DateTime. See

# start with julia --depwarn=error
julia> using Test

julia> x = @test_deprecated Base.depwarn("", nothing);

julia> x
Test Passed
      Thrown: ErrorException

I am not sure how to use @test_deprecated correctly.
Edit: There is an issue for this #36167

@barucden
Copy link
Contributor Author

barucden commented Aug 9, 2023

The tests should pass now. We are testing that the deprecated methods are deprecated, but not their results. I think that's the best we can do right now since --depwarn=error makes it complicated.

Because of the same reason, uncommenting the other depwarns makes the test suite fail in several places. I will look into that in another PR.

@Seelengrab
Copy link
Contributor

Ah, that's a shame 😔 But well, nothing we can do about in this PR. Thanks for looking into it!

The windows failure seems to come from SparseArrays, so should be unrelated to this PR.

@Seelengrab
Copy link
Contributor

I've said my peace here, will let others discuss there

It would be great if you could cross post your stance to discourse, both so I don't accidentally misrepresent it and so that others can follow your argument without having to read through this very long PR.

@jariji
Copy link
Contributor

jariji commented Aug 20, 2023

I think the Dates docs introductory section needs to be rewritten if this gets merged. For example

Both Date and DateTime are basically immutable Int64 wrappers.

combined with /'s

/(x, y)

Right division operator: multiplication of x by the inverse of y on the right. Gives floating-point results for integer arguments.

to my eyes implies that Date division returns a floating-point value, and that's not what this PR does.

That whole section, with the analogy to Java, all the references to Int64, etc seem misleading and at least in spirit inconsistent with the new interpretation as fixed point.

@Seelengrab
Copy link
Contributor

Seelengrab commented Aug 20, 2023

If the new interpretation is inconsistent with existing, documented behavior, to the degree that it would require a large rewrite of that documentation, isn't that a good indicator for this not being a good/correct interpretation at all? Are we allowed to change the interpretation of any datatype in a non-breaking release, as long as we also "fix" the documentation to remove any mention of the old interpretation?

How much is the stability of our docs worth here?

@anowacki
Copy link
Contributor

If Date would truncate just like DateTime does, Julia would claim that you have to meet them at midnight, because Date wraps a UTInstant{Day} and simply can't represent values with non-integral Day. Luckily, Julia errors, saving you from a potentially disastrous bug that's silently giving the wrong result in a calendar application.

Nice example! There is no reason we couldn't define this to be +(::Date, ::Hour) -> DateTime

@timholy From my point of view, the reason not to do this is that a my understanding of a Date is that it is a specific day in history, not any particular time on that day (i.e., not 00:00). That is why currently you do DateTime(Day(2000, 1, 1)) + Hour(1), and I don't think that should change. If you wanted to represent midnight at the start of that day, then you can do DateTime(2000, 1, 1) and instead get a type which represents a full day and time.

I am not cognoscent of all the issues around maintaining a stdlib, but it does strike me that changing truncation to rounding will change people's programs, so I feel merging this carries some risk of breaking code. I agree that getting feedback from heavy users of Dates is a good course of action. I don't know that I qualify as a 'heavy user', however.

@timholy timholy merged commit a4309ca into JuliaLang:master Aug 22, 2023
5 of 7 checks passed
@timholy
Copy link
Sponsor Member

timholy commented Aug 22, 2023

Thanks, @barucden!

Note that all this does is switch from truncation to rounding. So now() + Microsecond(10^3 - 1) will be 1ms ahead of now().

@Seelengrab
Copy link
Contributor

Seelengrab commented Aug 22, 2023

Come on, there was a HUGE amount of discussion on discourse, you yourself said you want to remove yourself from this and now you just merge this? I thought we arrived at a consensus that increasing precision of DateTime was the best course of action!

Potentially rounding up is even worse than truncation, because now you can't guarantee that addition won't overcompensate! That is a big problem..

@timholy
Copy link
Sponsor Member

timholy commented Aug 22, 2023

Triage discussed it and made a decision. This implemented the decision. A consensus by people who have been contributing to Julia for years, and who have an established track record in designing large, coherent software stacks trumps any discourse discussion.

It is too bad that there is no way to make everyone happy here, but the amount of discussion over a tiny change is a bit excessive. Compare #31922, which also changed answers that Julia delivers for some operations but which somehow avoided becoming a big deal.

@Seelengrab
Copy link
Contributor

Seelengrab commented Aug 22, 2023

And that decision by triage is completely intransparent, it's unknown who participated, what actually was discussed and how the conclusion was reached, whether the arguments brought forth in this PR have been addressed, whether there was anyone with your requested DateTime experience present.. Not to mention that you explicitly said you wanted feedback from that discourse thread (which you got!). This is a very weak justification. I didn't have time last week to make triage - I can see if I make it next week.

@Seelengrab
Copy link
Contributor

I also want to point out that you added the triage label 3 hours before triage happened on that very same day. That is not at all enough time to figure out if people who care about this have time to join.

@timholy
Copy link
Sponsor Member

timholy commented Aug 22, 2023

Not to mention that you explicitly said you wanted feedback from that discourse thread (which you got!).

As far as I can tell, the impetus for the discourse discussion came from you: #50816 (comment). My only mention of the discourse discussion was in #50816 (comment). I guess if "few" = "3 or more" I should have waited until tomorrow, apologies.

But really, the amount of discussion here is excessive. Switching from truncation to rounding switches the asymptotic error from scaling like O(n) to O(sqrt(n)) without significant cost. It's very non-controversial that this is an improvement.

@Seelengrab
Copy link
Contributor

Seelengrab commented Aug 22, 2023

There shouldn't be any error at all though, which is why the discourse discussion settled on "let's improve the precision, to capture Nanosecond and prevent the problem entirely"! That's also what the original issue (#50785) this PR was meant to fix was about - making sure this arithmetic is exact. So this PR doesn't end up solving any issue, but just makes new ones, because now we potentially overcompensate in the other direction. That is strictly worse behavior.

@timholy
Copy link
Sponsor Member

timholy commented Aug 22, 2023

Sure, that's why I gave you a thumbs up on reopening that issue. You're welcome to work on a PR that fixes it for good. This is a manifest improvement from where we are now, and merging it does nothing to prevent future improvements.

@Seelengrab
Copy link
Contributor

No, it is not an improvement. Truncation is preferable to rounding - there certainly is an expectation that adding something to a DateTime won't make the result larger than that step. Why would I ever want to have timestamp arithmetic that gives me more than I asked for?

@timholy
Copy link
Sponsor Member

timholy commented Aug 22, 2023

Because if you add a million Microsecond(rand(0:999)) you won't advance at all? Whereas with rounding you'll advance approximately 0.5s, which is approximately the right answer?

@Seelengrab
Copy link
Contributor

Timestamps are exact points in time (up to a given precision). They are not "approximate". "Approximately the right answer" is not ok for timestamps, especially if that error can accumulate to more than the stated precision of the timestamp.

@timholy
Copy link
Sponsor Member

timholy commented Aug 22, 2023

I'm afraid there's very little we seem to agree on. I wish it were different. I do hope you can channel your frustration into making further improvements to Dates that ensure users get something truly awesome before the 1.11 feature freeze forces this choice on them.

@adienes
Copy link
Contributor

adienes commented Aug 22, 2023

I still believe that this option carries the the most mental burden to the user out of all presented. Whether or not it was the right choice to merge regardless, I think this

it's very non-controversial that this is an improvement.

is clearly not true. Multiple users have weighed in on this thread and the Discourse one (even over half?) suggesting that they think rounding is not the right call (some disfavoring truncation as well, preferring an error)

  1. User laborg

I don’t like that various operations with Micro- and Nanoseconds are available without throwing a warning. Its feels like wrong advertising. If Julia allows to add periods to dates the result is expected to be accurate or the operation should error.

  1. User colintbowers

Option 3 definitely for me. I know to be careful of inexact results when I’m working with floating point numbers. For pretty much everything else I tend to assume (right or wrong) that arithmetic is exact. So basically for the reasons you said, I don’t like option 1

  1. User jules

I also find it very confusing that you can add microseconds but they get lost. I know that dates are implemented with integers under the hood (and you notice that because you cannot instantiate Second(1.2), for example). So I would assume that if addition doesn’t error, it’s exact

  1. User anowacki

Secondly, if DateTime remains a type which represents a whole number of fractions of a second (i.e., if it remains resolved to ms, µs, ns, etc.), then silent rounding should not happen when they are added to. Instead, an error should be thrown telling you to appropriately round any Dates time periods to the smallest-resolved time period that DateTime supports. This avoids bugs due to not knowing how the Dates library works in great detail, which you currently need in order to work with it.

  1. User mihalybaci

yeah, it seems like a hybrid of 2 and 3 would be be best way to increase correctness/precision without breaking things (at least from my limited understanding).

  1. User dlakelan

Performance is not the issue when using DateTime, consistency and accuracy etc are. A typical use would be for things like time-stamping events or calculating interest payments.

  1. User tbeason

As for truncation versus rounding, it is somewhat important to people in this space. However, this debate only concerns what happens when the timestamp runs out of precision [, but in any case] DateTime is pretty much currently unusable in the area when nanosecond precision is required.

  1. User JeffreySarnoff

Millisecond truncates. That is the appropriate behavior, so it should not warn (nor should it deprecate).

  1. User jariji

Silently rounding after some arbitrary level of precision chosen without regard to the user's needs is not appropriate for a general-purpose simplicity-oriented model of a quotidian concept.

and of course 9. and 10. you know how I and @Seelengrab feel



In fact, as far as I can tell, there are essentially zero proponents of rounding except for whoever was on the triage call?

@timholy
Copy link
Sponsor Member

timholy commented Aug 22, 2023

Almost all of those are complaints that we lose data. I don't like that either. But without making a breaking change or promoting to a more-precise time type, there's literally nothing we can do about that---we can't actually disable those methods until Dates 2.0.

But let's be very, very precise here. None of those comments address the following question: if you have to live in a world you hate, which is worse, truncation or rounding? This is a very narrow question, and the one that actually faces us in this PR. While all of us might rather not live in that world, this is what we've got to deal with. This PR does nothing except decrease the rate at which we lose data, and thus is orthogonal to all of the comments you quoted. #50785 is still open and I support that.

Think of it this as a battle plan:

  1. Make an easy change that makes things slightly less horrible than where we are now
  2. Wait for some hero to really fix the situation

We just completed step 1. That doesn't mean we're not waiting for PRs to implement step 2. But depending on whether the solution is breaking or non-breaking, that could be multiple release cycles away.

So rather than be held hostage to a never-ending discussion, let's get a simple win now and we can keep talking about the longer-term picture. There's no reason @barucden should have to suffer through all this.

@adienes
Copy link
Contributor

adienes commented Aug 22, 2023

None of those comments address the following question: if you have to live in a world you hate, which is worse, truncation or rounding?

I don't wish to put words in their mouths, but I suspect that from among that list at least tbeason and possibly JeffreySarnoff would prefer truncation. I know I would

But anyway, you are right that the decision has been made and further back-and-forth probably won't be useful. Even though we don't agree in the end, I do appreciate that you took the issue seriously and took the time to respond in depth to raised concerns.

@Seelengrab
Copy link
Contributor

None of those comments address the following question: if you have to live in a world you hate, which is worse, truncation or rounding?

Barring breaking changes (hence my initial suggestion during review to use a deprecation instead), my position continues to be that truncation is better than rounding. I think I've written that twice in this PR alone, and likely mentioned it indirectly on discourse.

Regardless, I can see that this won't change your mind, just as lots of people on discourse don't, so this will be the last of me on this PR.

@timholy
Copy link
Sponsor Member

timholy commented Aug 22, 2023

I do know that some people have expressed a preference for truncation over rounding, and I'm genuinely sorry if you're disappointed by my decision here. But contrary to some of the expressed sentiments, this narrow issue is unaddressed by most of the discussion on discourse: a few people who have chimed in (mostly or exclusively here) on a specific technical issue and many more on the broader problems with Dates.

We've gone over this before, and presumably there is no point in rehashing it again, but to try one more time: by my reading the technical arguments for truncation are not as unambiguous as they are for rounding. Terms like "simple mental model" and "worse to increase than decrease" seem subjective and thus open to debate. In contrast, reducing the rate of error accumulation is not subjective; you can argue whether that metric has any utility ("is that what we really should be minimizing here?"), and indeed such debates can be extremely important. There are indeed cases where one can come up with two objectively-good things and have them not be simultaneously achievable (e.g., https://en.wikipedia.org/wiki/Arrow%27s_impossibility_theorem), and in that case the discussion has to turn about which of these objective goals has greater value, placing it back in the realm of the subjective.

But this is not such a case. So far, the only objective criterion that has been advanced in this discussion strongly favors rounding. It's quite hard to argue that a subjective preference should overrule an objective criterion, as subjective preferences are well-documented to depend strongly on past experience and two people coming to the same table may have very different past experiences. If someone proposes an objective criterion that obviously favors rounding, it would be very appropriate to reconsider this decision.

This decision is also consistent with what Julia does elsewhere. When we compare log2(BigFloat(x)) to log2(Float64(x)), what we ask is that the digits in the Float64 result agree with the BigFloat result up to rounding rather than truncation. Indeed, we think of any other answer as a bug, and we merge PRs that fix it. That's all I did here. It's just that most such PRs don't seem to generate such heated debate. (I should acknowledge that a lot of them did get merged before Julia 1.0, when breaking changes were allowed, but the process continues in some cases as I pointed out above.)

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Aug 22, 2023

I always fly merged airways 🍭.

@laborg
Copy link
Contributor

laborg commented Aug 23, 2023

I'm quite unhappy about the course of this PR and do support the previous behaviour of truncating (best: error or depwarn). Date times represent starting points of intervals not single points in time. Let's say it's the last day of January and you are asked for the current month you would not round up and say "February", right?

@timholy
Copy link
Sponsor Member

timholy commented Aug 23, 2023

In German, if it's 6:30 you say "halb sieben," which translates as "half before 7." This is why I say such views are subjective, not objective.

Nevertheless, the interval-perspective might be the most interesting countervailing argument anyone has raised yet. Deserves thought. The thing I hesitate about is that this perspective suggests some element of rigorous analysis. Truncation can guarantee the left edge but not the right edge; you'd be wrong to say that t + 0.8ms + 0.8ms < t + 1ms. Why is the left edge important but not the right? To continue your line of thinking, say you get paid on Jan 31st: if you make a bunch of financial plans to spend all that money in the month of January, and you get started spending it Jan 1, you're in big trouble, right?

@timholy
Copy link
Sponsor Member

timholy commented Aug 23, 2023

Thinking about this a bit more, the bottom line is there is no good solution with what we have now. I'd urge anyone who is sufficiently unhappy with the outcome here to just go ahead and implement PrecisionDateTime that's 128-bit and submit a PR. My reading of the discourse thread is that we can't make DateTime 128-bit by default---the costs in data size are too large. But we can have a PrecisionDateTime, and we can promote +(::DateTime, ::Union{Microsecond,Nanosecond}) to return a PrecisionDateTime. I think that's non-breaking as long as PrecisionDateTime behaves the same way as DateTime; it's a bit of a gray area in terms of whether changing return type is allowed, but we tend to say "types are an implementation detail" and it's behavior that we're protected. The fact that we're already pretty borked here may also justify it essentially as a bugfix.

That still won't fix range(now()-Year(5), stop=now(), length=1001), but that can be treated separately.

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Aug 23, 2023 via email

@LilithHafner LilithHafner removed the status:triage This should be discussed on a triage call label Oct 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:dates Dates, times, and the Dates stdlib module kind:deprecation This change introduces or involves a deprecation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DateTime + Nanosecond loses information