Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TimeDelta format quantity_str for Year Day Hour Minute Sec string #15264

Merged
merged 24 commits into from Oct 19, 2023

Conversation

taldcroft
Copy link
Member

@taldcroft taldcroft commented Sep 1, 2023

Description

This pull request adds a new string TimeDelta format "quantity_str" that represents the time delta as a string with one or more Quantity components. This format provides a human-readable multi-scale string representation of a time delta. It is convenient for applications like a configuration file or a command line option. It is NOT intended for high-precision applications since the internal calculations are done using 64-bit floating point numbers.

The driver for this is that it can be convenient to have a human-readable string representing a time delta which has natural scales so that delta from seconds to years are manageable. A string representation is useful in configuration files and application command line inputs.

This PR defines a new format which is like "[-+]? 1yr 2d 3hr 4min 5.6s". The format is specified as follows:

  • The string is a sequence of one or more components.
  • Each component is a number (float or int) followed by an astropy unit of time.
  • For input, whitespace within the string is allowed but optional.
  • For output, there is a single space between components.
  • The allowed components are listed below.
  • The order (yr, d, hr, min, s) is fixed but individual components are optional.

The allowed component units are shown below:

  • "yr": years (365.25 days)
  • "d": days (24 hours)
  • "hr": hours (60 minutes)
  • "min": minutes (60 seconds)
  • "s": seconds

These definitions correspond to physical units of time and are NOT calendar date intervals. Thus adding "1yr" to "2000-01-01 00:00:00" will give "2000-12-31 06:00:00" instead of "2001-01-01 00:00:00".

The format is defined internally as a single regular expression that could be used in other non-Python implementations for parsing.

Output subformats

The quantity_str format supports output sub-formats "multi", "yr", "d", "hr", "min", "s" to allow specifying the string output as a single unit or in the default multi-unit format.

The precision attribute (default=3) applies to the specified subformat. For example TimeDelta("100.0d 1.0123456789012345s", precision=9, out_subfmt="d").value is "100.000011717d".

Why not re-use an existing standard?

The only other existing standard that I am aware of for time intervals is defined in ISO 8601 durations that represent the duration between two dates. The ISO 8601 duration format is: P(n)Y(n)M(n)DT(n)H(n)M(n)S. The fundamental problem with this format is that it explicitly depends on the reference start date as a calendar date in Year Month day hour min sec. So this is not suitable for physical time deltas that are independent of the reference date.

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@github-actions github-actions bot added the time label Sep 1, 2023
@github-actions
Copy link

github-actions bot commented Sep 1, 2023

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • Is a milestone set? Milestone must be set but we cannot check for it on Actions; do not let the green checkmark fool you.
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@github-actions
Copy link

github-actions bot commented Sep 1, 2023

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

@pllim pllim added this to the v6.0 milestone Sep 1, 2023
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a tricky format. It is definitely most logical to use a year of 365.25 days (note the docstring states 365 or 366!). But will users really expect the following?

In [4]: td = TimeDelta(np.arange(364, 368), format='jd')

In [5]: td.ydhms
Out[5]: array(['+364d', '+365d', '+1yr 18hr', '+1yr 1d 18hr'], dtype='<U12')

I noticed that datetime.timedelta does not have years (or months), perhaps for this reason (it does have weeks). Though that format is pretty weird, with the sign applying just to the days, so one gets

In [9]: print(timedelta(hours=-6))
-1 day, 18:00:00

Anyway, that format we have already:

In [13]: np.vectorize(str)(td.datetime)
Out[13]: 
array(['364 days, 0:00:00', '365 days, 0:00:00', '366 days, 0:00:00',
       '367 days, 0:00:00'], dtype='<U17')

Overall, my tendency would be to not use years at all (at least not by default). I think I'd also shorten the notation, to use 01h02m03.04s for the time at least (similar to what is done for RA), and maybe just prepend sNNNd.

Also, I wonder a bit about the name: TimeYMDHMS returns a recarray - that would suggest TimeDeltaYDHMS should do the same. Though then the sign becomes annoying, like for angle tuples we just deprecated... In analogy with TimeYearDay (format yday), maybe TimeDeltaYearDay?

Anyway, not really sure what is the right solution! I do think an unambiguous string format would be nice to have, but wish there was a sensible standard we could simply implement, rather than rolling our own.

p.s. Looking a bit further, I found https://learn.microsoft.com/en-us/dotnet/standard/base-types/standard-timespan-format-strings - so, microsoft uses ±ddd.hh:mm:ss.ffff. Rather weird...

@taldcroft
Copy link
Member Author

@mhvk - interesting about the microsoft format. It just seems pretty unreadable, and I don't understand why they made the different flavors. Especially one with a period after the days and one with a colon after days. This format also does not formally support time deltas to astropy precision since it is limited to microsec.

About not supporting year and just use days. My thinking there was that year is a pretty common time unit in astronomy. Imagine a config file for some simulation that needs a time scale in years.

One alternate idea is to simply support any single float and astropy unit of time with space being optional, so e.g. 1e8yr, 2d, 3 wk, 3.1312311 wk, 1e18s. The main question would be the output representation. This could be controlled by out_subfmt to specify a unit, or by default choose the largest unit where the number is greater than 1.0 (or something like that?). This is certainly simpler and doesn't feel like inventing a new format.

@mhvk
Copy link
Contributor

mhvk commented Sep 4, 2023

A single unit string would be a possibility. Right now we have

In [6]: td.to(u.s).to_string()
Out[6]: '[31449600. 31536000. 31622400. 31708800.] s'

In [7]: td.to(u.yr).to_string()
Out[7]: '[0.99657769 0.99931554 1.00205339 1.00479124] yr'

But this is not an array of string. Possibly, Quantity should gain that option... Especially since Angle does have it (an inconsistency that has long annoyed me...). So, right now one would have to do,

In [9]: [str(_t) for _t in td.to(u.yr)]
Out[9]: 
['0.9965776865160848 yr',
 '0.999315537303217 yr',
 '1.002053388090349 yr',
 '1.0047912388774811 yr']

Of course, this would still loose precision, so I think it makes sense to just expand our long-string ability from TimeNumeric, which already works for format='jd' and 'sec':

In [17]: td = TimeDelta(np.arange(364, 368), 1e-10, format='jd')

In [18]: td.to_value('jd', subfmt='str')
Out[18]: 
array(['364.0000000001', '365.0000000001', '366.0000000001',
       '367.0000000001'], dtype='<U14')

One option would be to recognize td.to_value(unit, subfmt='str') (this currently errors, so we are free to adjust it). Not passing in a unit with subfmt='str' could in principle use a "logical" unit is chosen. Or one could put in u.physical.time for as a signal that a "good" unit should be chosen.

Although a proper format would make round-tripping easier. The only annoyance then is that somehow the format instance has to carry the unit. As you note, subfmt might be good for that, but that then looses the option to use it to select str, bytes, longdouble, etc.

@taldcroft
Copy link
Member Author

taldcroft commented Sep 4, 2023

@mhvk - it seems your focus has been on leveraging existing pathways for generating string representations of a TimeDelta, but for me the primary driver is instantiating a TimeDelta from a string. For that I think that a new format is the only way.

For me this is really a string format so I don't think TimeNumeric has any applicability here. Allowing bytes input is easy, but I think if the user needs bytes output they can do that themselves.

Fundamentally I think the key decision is having a multi-component format or something that looks basically like the string repr of Quantity.

Multi-component like -3d2h14m12.232342s (with format details TBD)

  • In this case precision would apply to the seconds component by default and so precision to 1e-9 sec is possible for arbitrarily long durations.
  • More complicated.
  • It's really a new format "standard" for time delta representation, though with plenty of precedent in astronomy.
  • The format you suggested with h, m, s is potentially ambiguous with an hour angle (if there is no other context). That's part of why I originally suggested the longer format with the labels being exactly the astropy unit strings.
  • out_subfmt=<unit> could force output in a single unit, with precision applying to that unit.
  • No need to infer out_subfmt based on the input, which seems cleaner to me.
  • Setting out_subfmt="yr" and precision=5 would give an output like 12.12345y (or 12.12345 yr if we used the longer unit string labels, thus giving an output that looks like a Quantity str.
  • We could also define a special out_subfmt that will force using a single unit in the output string which is chosen for each element according to the same rule that the Quantity-like. E.g. ["12.12345 yr", "8.5 hr", "3.0 s"].

Quantity-like 12.12345 yr

  • Simpler.
  • Strictly speaking this would be a new format "standard", but the overlap with Quantity representation makes it seem like not really new.
  • Precision would apply to the unit, so precision=5 could give 12.12345 yr.
  • It's a little messy with the out_subfmt. Imagine TimeDelta(["12.12345 yr", "0.5 s"]). Probably the answer is to infer the longest applicable unit but whatever choice we make will be a bit arbitrary and may surprise someone.

All in all I think multi-component is more flexible since it includes Quantity-like as a option, but there is always an argument for "simple is better than complex". Thoughts?

@mhvk
Copy link
Contributor

mhvk commented Sep 4, 2023

Agreed that reading the format back in is a requirement, and a separate format is indeed probably the best route.

My sense would be to start with the simple, single-unit case, in part because I don't see any ambiguity in how it is defined, while for multi-unit both the definition of year and how signs are interpreted are potentially confusing.

That said, I think the breaking up in days and times is something that would be nice to have, so we should try to ensure we leave the use of subfmt flexible enough that it can be added later.

@taldcroft
Copy link
Member Author

taldcroft commented Sep 4, 2023

I think if we make this format explicitly tied to astropy Quantity and the corresponding unit names then there is no ambiguity about the year definition even for the multi-component form. We are already accepting that implicit definition for the Quantity-like version 12.2 yr.

Likewise I don't really think the sign issue is a blocker for astronomers who all understand how to read -12d 30m 42.5s.

Having said that, I'm willing to go with Quantity-like, but only if we are agreed on not doing the multi-component later. I think it would be a mistake to patch Quantity-like later to handle multi-component with a sub-format since that would be messy [1]. Instead if we start with multi-component now then Quantity-like is a natural and simple sub-format of multicomponent (restricting to just one possible unit instead of all of them).

[1] The format name might even be quantity since it would be natural and nice to exactly share the string repr. Expanding to multi-component just wouldn't work. Also, "sub-format" really implies narrowing the format, not expanding it.

@taldcroft
Copy link
Member Author

And this might be the right point to invite comment on astropy-dev. I had expected this might be controversial.

@mhvk
Copy link
Contributor

mhvk commented Sep 4, 2023

@taldcroft - I'm not sure I see the problem with possible later expansion, but will admit I have not thought this through very well (am on holidays and just occasionally checking what's new). I think it is a good idea to ask at astropy-dev - in the end, we should be driven by the use cases people have!

@taldcroft
Copy link
Member Author

I'm not sure I see the problem with possible later expansion

Maybe I'm being a bit pedantic, but -3d 2hr 14min 12.232342s does not seem like a sub-format of a format like 3.12312 d. It feels more like a super-format or something entirely new.

@mhvk
Copy link
Contributor

mhvk commented Sep 5, 2023

I guess I saw it as effectively something we discussed for Angles too, where one has a "format string" that includes options like "hms". But let's see whether we get input on what is preferred first!

@taldcroft
Copy link
Member Author

@mhvk - having received no input from the astropy-dev solicitation, I am pressing ahead with the full-featured implementation to see how it would look. The code is in a better state now and I have uploaded a notebook that I am using to play around with this:
https://gist.github.com/taldcroft/8ef284f25fc8671bfd878793310f0bb8

You'll notice some creep into core functionality changes (you know how that goes). We can always separate those out into separate PR's if needed.

@taldcroft taldcroft marked this pull request as ready for review October 10, 2023 16:03
@taldcroft taldcroft changed the title Add TimeDelta format for Year Day Hour Minute Sec string Add TimeDelta format quantity_str for Year Day Hour Minute Sec string Oct 10, 2023
@taldcroft taldcroft requested a review from mhvk October 10, 2023 16:30
@taldcroft
Copy link
Member Author

@mhvk - this is ready for review. I've updated the PR description, added tests, and fixed some initial issues. The only thing obviously missing are the What's New and change log.

@taldcroft
Copy link
Member Author

BTW the Python 3.9 failure looks to be unrelated.

@pllim
Copy link
Member

pllim commented Oct 10, 2023

SAMP timeout expired failure in oldest deps job is indeed unrelated. I think it is transient.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sold on including years in the multi-format, and feel strongly the default should be days, hours, minutes, and seconds. I suggest inline to have "dhms" for this as a subfmt. I'm OK, though, with adding "ydhms" as a non-default format - for people who understand what they are doing, and that 1yr = 365.25 d - this will not be what most people think when they see it (and they won't read the manual to check).

Otherwise, this looks all pretty good, and my main comments are that one might as well try to keep precision, as it is relatively trivial to do so!

astropy/time/core.py Show resolved Hide resolved
astropy/time/core.py Show resolved Hide resolved
astropy/time/core.py Show resolved Hide resolved
astropy/time/formats.py Show resolved Hide resolved
astropy/time/formats.py Outdated Show resolved Hide resolved
astropy/time/formats.py Outdated Show resolved Hide resolved
astropy/time/formats.py Outdated Show resolved Hide resolved
astropy/time/tests/test_basic.py Outdated Show resolved Hide resolved
astropy/time/tests/test_delta.py Show resolved Hide resolved
docs/time/index.rst Outdated Show resolved Hide resolved
@taldcroft
Copy link
Member Author

taldcroft commented Oct 12, 2023

I'm still not sold on including years in the multi-format

Let me try to sell you on this. 😄

1yr = 365.25 d - this will not be what most people think when they see it.

Do you have data to back this assertion? I think the opposite, that most astronomers would pick 365.25 as the mean number of days in a year for doing calculations (but I also have no data). This format (via the name "quantity_str") explicitly makes the connection to astropy Quantity where indeed yr is 365.25d. Has there been feedback that this is a problem for Quantity?

(and they won't read the manual to check).

Even though I understand the realities, people will have to discover this format somehow and the year to days conversion will be clear from that. We can ensure that sphinx doc examples highlight this case, so that any search result shows this behavior.

I suggest inline to have "dhms" for this as a subfmt. I'm OK, though, with adding "ydhms" as a non-default format.

I think this adds complexity and confusion.

  • Code becomes more complex.
  • Why do we allow "ydmhs" and "dhms" but not "hms", and "ms"?
  • Explicitly enumerating the expected components "d", "h", "m", and "s" implies that they should all be in the output, bringing ambiguity to the format. Is it "1d 0h 0m 10.0s" or "1d 10.0s"?
  • Users like me would wonder why "dhms" is first. Seems arbitrary.

This would definitely surprise me:

>>> TimeDelta("1yr").value
'365d 6hr'

Given that this format supports yr as a component, I would expect to get back '1yr'. So we have to carefully explain that year is special-cased because we think it would be too confusing. Which would confuse some users.

comp = int(comp)
comps.append(f"{comp}{name}")

sec = np.round(jd * 86400.0, self.precision)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly related to the larger question, but this rounding at the end means you can wind up with 60.0s - do you want to special-case that? As is,

TimeDelta(365.9999999999, format='jd').quantity_str
# '1yr 17hr 59min 60.0s'

To avoid duplication, maybe consider using some of the code in astropy/coordinates/angles/formats.py; as is,

from astropy.coordinates.angles import formats
formats.hours_to_string(.99999999999*24, precision=3, sep=['hr ', 'min ', 's'])
# '24hr 00min 00.000s'

I think the rounding bit could be quite easily factored out of sexagesimal_to_string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that corner case this morning and agree this should be done right. Thanks for the heads-up on existing code, I'll have a look.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@mhvk
Copy link
Contributor

mhvk commented Oct 12, 2023

Hmm, you're last example certainly makes a good point... Perhaps I am too worried (though I so seem to recall us having had some issues raised about it before).

But the note about omitting zeros also makes me wonder: is that generically what would be wanted? For a table of this, I think I'd rather see all of the entries, as it is easier to compare (like we do with angles). In that sense, there is perhaps a case for dhms (and ydhms) as you interpreted it: give me the value in (year +) day + sexagesimal.

@taldcroft
Copy link
Member Author

@mhvk - the idea of making this like sexagesimal opens up a new question of 1yr 2d 3hr 0min 0.0s (minimal) or 1yr 002d 03h 00min 00.000s (fixed format)? Only the latter fixed format version would format nicely in a table column.

All in all, I think it's worth remembering that the biggest driver for this format is a way to input time deltas as a string in a flexible and readable format.

For the output since there are questions, maybe we could mark this feature API as experimental and see if there is any feedback from the community in the first release?

@mhvk
Copy link
Contributor

mhvk commented Oct 12, 2023

Yes, seems fair to see what people actually want! (My thought was indeed the fixed-format case.)

@mhvk
Copy link
Contributor

mhvk commented Oct 17, 2023

Sounds reasonable. Note that I'm stuck writing my next 5-year research grant (due Nov 1st) and will have very limited time for review. I think this was in good shape already, and am happy to try to review, but I better hold off looking further until you ping me explicitly.

@taldcroft taldcroft requested a review from mhvk October 18, 2023 11:24
@taldcroft
Copy link
Member Author

@mhvk - as far as I can see I have addressed all the review comments, fixed coverage issues, etc. The CI just started however so I need to see if that is good.

@taldcroft
Copy link
Member Author

BTW I definitely prefer to have the commits in this PR squashed. This has wandered a bit and there are a lot of small commits with little value on their own.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I think we could keep precision for the single-unit case fairly easily but think it is fine to postpone that to follow-up, so I'll approve now.

astropy/time/formats.py Show resolved Hide resolved

def parse_string(self, timestr):
"""Read time from a single string"""
# Datetime components required for conversion to JD by ERFA, along
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment looks like a copy & paste error...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

comps = self.get_multi_comps(jd1, jd2)

else:
value = (jd * u.day).to_value(subfmt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, we don't have to loose precision here...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another day. 😄

@@ -1257,11 +1261,22 @@ Use of the |TimeDelta| object is illustrated in the few examples below::
>>> t1 + 1 * u.hour
<Time object: scale='utc' format='iso' value=2010-01-01 01:00:00.000>

# Human-readable multi-scale format for string representation of a time delta.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we take the comment out of the block so there can be a proper link to TimeDeltaQuantityStr?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@pllim
Copy link
Member

pllim commented Oct 19, 2023

Is this merge-able?

@mhvk
Copy link
Contributor

mhvk commented Oct 19, 2023

OK, let's get this in. Thanks, @taldcroft!

p.s. Not sure what is up with code coverage these days - so many lines marked uncovered even though they clearly are.

@mhvk mhvk merged commit 082c6b5 into astropy:main Oct 19, 2023
25 of 26 checks passed
@pllim
Copy link
Member

pllim commented Oct 19, 2023

I think the codecov upload failed. It can be flaky. Don't worry about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants