Add TimeDelta format quantity_str for Year Day Hour Minute Sec string #15264
I think this is a tricky format. It is definitely most logical to use a year of 365.25 days (note the docstring states 365 or 366!). But will users really expect the following?
In [4]: td = TimeDelta(np.arange(364, 368), format='jd')
In [5]: td.ydhms
Out[5]: array(['+364d', '+365d', '+1yr 18hr', '+1yr 1d 18hr'], dtype='<U12')
I noticed that datetime.timedelta does not have years (or months), perhaps for this reason (it does have weeks). Though that format is pretty weird, with the sign applying just to the days, so one gets
In [9]: print(timedelta(hours=-6))
-1 day, 18:00:00
Anyway, that format we have already:
In [13]: np.vectorize(str)(td.datetime)
Out[13]:
array(['364 days, 0:00:00', '365 days, 0:00:00', '366 days, 0:00:00',
'367 days, 0:00:00'], dtype='<U17')
Overall, my tendency would be to not use years at all (at least not by default). I think I'd also shorten the notation, to use 01h02m03.04s for the time at least (similar to what is done for RA), and maybe just prepend sNNNd.
Also, I wonder a bit about the name: TimeYMDHMS returns a recarray - that would suggest TimeDeltaYDHMS should do the same. Though then the sign becomes annoying, like for angle tuples we just deprecated... In analogy with TimeYearDay (format yday), maybe TimeDeltaYearDay?
Anyway, not really sure what is the right solution! I do think an unambiguous string format would be nice to have, but wish there was a sensible standard we could simply implement, rather than rolling our own.
p.s. Looking a bit further, I found https://learn.microsoft.com/en-us/dotnet/standard/base-types/standard-timespan-format-strings - so, Microsoft uses ±ddd.hh:mm:ss.ffff. Rather weird...
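The ydhms output quoted in the comment above follows directly from taking 1 yr = 365.25 d. A minimal sketch of that arithmetic is shown here; ydh_string is a hypothetical helper written for illustration, it is not the PR's implementation, and it assumes a non-negative whole number of days.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 8766  # 365.25 d * 24 hr, the year length assumed in the discussion


def ydh_string(days):
    """Format a non-negative whole number of days as '+Nyr Nd Nhr'.

    Hypothetical helper for illustration only; zero components are omitted.
    """
    hours = days * HOURS_PER_DAY
    years, hours = divmod(hours, HOURS_PER_YEAR)
    whole_days, hours = divmod(hours, HOURS_PER_DAY)
    parts = [
        f"{value}{unit}"
        for value, unit in ((years, "yr"), (whole_days, "d"), (hours, "hr"))
        if value
    ]
    return "+" + (" ".join(parts) or "0d")


for n in range(364, 368):
    print(n, ydh_string(n))
```

The 366 d case leaves 18 hr after one 365.25 d year is subtracted, which is the '+1yr 18hr' that prompted the question about user expectations.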
@mhvk - interesting about the Microsoft format. It just seems pretty unreadable, and I don't understand why they made the different flavors, especially one with a period after the days and one with a colon after the days. This format also does not formally support time deltas to astropy precision since it is limited to microseconds. As for not supporting years and just using days: my thinking there was that the year is a pretty common time unit in astronomy. Imagine a config file for some simulation that needs a time scale in years. One alternate idea is to simply support any single float and astropy unit of time with the space being optional, so e.g. |
A single unit string would be a possibility. Right now we have
But this is not an array of string. Possibly,
Of course, this would still lose precision, so I think it makes sense to just expand our long-string ability from
One option would be to recognize Although a proper format would make round-tripping easier. The only annoyance then is that somehow the format instance has to carry the unit. As you note, |
@mhvk - it seems your focus has been on leveraging existing pathways for generating string representations of a For me this is really a string format so I don't think Fundamentally I think the key decision is having a multi-component format or something that looks basically like the string repr of Quantity. Multi-component like
|
Agreed that reading the format back in is a requirement, and a separate format is indeed probably the best route. My sense would be to start with the simple, single-unit case, in part because I don't see any ambiguity in how it is defined, while for multi-unit both the definition of year and how signs are interpreted are potentially confusing. That said, I think the breaking up in days and times is something that would be nice to have, so we should try to ensure we leave the use of |
I think if we make this format explicitly tied to astropy Likewise I don't really think the sign issue is a blocker for astronomers who all understand how to read -12d 30m 42.5s. Having said that, I'm willing to go with [1] The format name might even be |
And this might be the right point to invite comment on astropy-dev. I had expected this might be controversial. |
@taldcroft - I'm not sure I see the problem with possible later expansion, but will admit I have not thought this through very well (am on holidays and just occasionally checking what's new). I think it is a good idea to ask at astropy-dev - in the end, we should be driven by the use cases people have! |
Maybe I'm being a bit pedantic, but |
I guess I saw it as effectively something we discussed for |
@mhvk - having received no input from the astropy-dev solicitation, I am pressing ahead with the full-featured implementation to see how it would look. The code is in a better state now and I have uploaded a notebook that I am using to play around with this: You'll notice some creep into core functionality changes (you know how that goes). We can always separate those out into separate PR's if needed. |
54035db to fa25751
@mhvk - this is ready for review. I've updated the PR description, added tests, and fixed some initial issues. The only things obviously missing are the What's New and change log entries. |
BTW the Python 3.9 failure looks to be unrelated. |
SAMP timeout expired failure in oldest deps job is indeed unrelated. I think it is transient. |
I'm still not sold on including years in the multi-format, and feel strongly the default should be days, hours, minutes, and seconds. I suggest inline to have "dhms" for this as a subfmt. I'm OK, though, with adding "ydhms" as a non-default format - for people who understand what they are doing, and that 1yr = 365.25 d - this will not be what most people think when they see it (and they won't read the manual to check).
Otherwise, this looks all pretty good, and my main comments are that one might as well try to keep precision, as it is relatively trivial to do so!
Let me try to sell you on this. 😄
Do you have data to back this assertion? I think the opposite, that most astronomers would pick 365.25 as the mean number of days in a year for doing calculations (but I also have no data). This format (via the name "quantity_str") explicitly makes the connection to astropy
Even though I understand the realities, people will have to discover this format somehow and the year to days conversion will be clear from that. We can ensure that sphinx doc examples highlight this case, so that any search result shows this behavior.
I think this adds complexity and confusion.
This would definitely surprise me:
Given that this format supports |
astropy/time/formats.py
comp = int(comp)
comps.append(f"{comp}{name}")

sec = np.round(jd * 86400.0, self.precision)
Not directly related to the larger question, but this rounding at the end means you can wind up with 60.0s - do you want to special-case that? As is,
TimeDelta(365.9999999999, format='jd').quantity_str
# '1yr 17hr 59min 60.0s'
To avoid duplication, maybe consider using some of the code in astropy/coordinates/angles/formats.py; as is,
from astropy.coordinates.angles import formats
formats.hours_to_string(.99999999999*24, precision=3, sep=['hr ', 'min ', 's'])
# '24hr 00min 00.000s'
I think the rounding bit could be quite easily factored out of sexagesimal_to_string.
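One way to avoid the 60.0s corner case is to round once, on the total, before splitting into components, so that any carry propagates upward naturally. The sketch below illustrates that idea in plain Python; dhms_string is a hypothetical helper, not the fix that went into the PR and not the code in astropy/coordinates/angles/formats.py, and it leaves out the year component.

```python
def dhms_string(days, precision=3):
    """Format a time delta in days as 'Nd Nhr Nmin N.NNNs' (hypothetical helper)."""
    scale = 10**precision
    # Round once, on the total, to an integer count of 10**-precision seconds.
    ticks = round(abs(days) * 86400 * scale)
    # Integer divisions from here on, so no component can round up to 60 or 24.
    whole_sec, frac = divmod(ticks, scale)
    minutes, sec = divmod(whole_sec, 60)
    hours, minutes = divmod(minutes, 60)
    whole_days, hours = divmod(hours, 24)
    sec_text = f"{sec}.{frac:0{precision}d}" if precision > 0 else f"{sec}"
    sign = "-" if days < 0 else ""
    return f"{sign}{whole_days}d {hours}hr {minutes}min {sec_text}s"


print(dhms_string(365.9999999999))
```

With the value from the comment above, the seconds carry all the way up to the day component instead of stopping at 60 s.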
I was thinking about that corner case this morning and agree this should be done right. Thanks for the heads-up on existing code, I'll have a look.
Fixed
Hmm, your last example certainly makes a good point... Perhaps I am too worried (though I do seem to recall us having had some issues raised about it before). But the note about omitting zeros also makes me wonder: is that generically what would be wanted? For a table of this, I think I'd rather see all of the entries, as it is easier to compare (like we do with angles). In that sense, there is perhaps a case for |
@mhvk - the idea of making this like sexagesimal opens up a new question of All in all, I think it's worth remembering that the biggest driver for this format is a way to input time deltas as a string in a flexible and readable format. For the output since there are questions, maybe we could mark this feature API as experimental and see if there is any feedback from the community in the first release? |
Yes, seems fair to see what people actually want! (My thought was indeed the fixed-format case.) |
266405f to f07ade2
Sounds reasonable. Note that I'm stuck writing my next 5-year research grant (due Nov 1st) and will have very limited time for review. I think this was in good shape already, and am happy to try to review, but I better hold off looking further until you ping me explicitly. |
@mhvk - as far as I can see I have addressed all the review comments, fixed coverage issues, etc. The CI just started however so I need to see if that is good. |
BTW I definitely prefer to have the commits in this PR squashed. This has wandered a bit and there are a lot of small commits with little value on their own. |
This looks good. I think we could keep precision for the single-unit case fairly easily but think it is fine to postpone that to follow-up, so I'll approve now.
astropy/time/formats.py

def parse_string(self, timestr):
    """Read time from a single string"""
    # Datetime components required for conversion to JD by ERFA, along
This comment looks like a copy & paste error...
Fixed
    comps = self.get_multi_comps(jd1, jd2)

else:
    value = (jd * u.day).to_value(subfmt)
In principle, we don't have to lose precision here...
Another day. 😄
docs/time/index.rst
@@ -1257,11 +1261,22 @@ Use of the |TimeDelta| object is illustrated in the few examples below::
>>> t1 + 1 * u.hour
<Time object: scale='utc' format='iso' value=2010-01-01 01:00:00.000>

# Human-readable multi-scale format for string representation of a time delta.
Can we take the comment out of the block so there can be a proper link to TimeDeltaQuantityStr?
Done.
Is this merge-able? |
OK, let's get this in. Thanks, @taldcroft! p.s. Not sure what is up with code coverage these days - so many lines marked uncovered even though they clearly are. |
I think the codecov upload failed. It can be flaky. Don't worry about it. |
Description
This pull request adds a new string TimeDelta format "quantity_str" that represents the time delta as a string with one or more Quantity components. This format provides a human-readable multi-scale string representation of a time delta. It is convenient for applications like a configuration file or a command line option. It is NOT intended for high-precision applications since the internal calculations are done using 64-bit floating point numbers.
The driver for this is that it can be convenient to have a human-readable string representing a time delta which has natural scales, so that deltas from seconds to years are manageable. A string representation is useful in configuration files and application command line inputs.
This PR defines a new format which is like "[-+]? 1yr 2d 3hr 4min 5.6s". The format is specified as follows:
The allowed component units are shown below:
These definitions correspond to physical units of time and are NOT calendar date intervals. Thus adding "1yr" to "2000-01-01 00:00:00" will give "2000-12-31 06:00:00" instead of "2001-01-01 00:00:00".
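The date arithmetic in that statement can be checked with the standard library alone. This is a sketch assuming 1 yr = 365.25 d, the definition used throughout the review discussion; it uses naive datetime objects rather than astropy Time, so time scales and leap seconds are ignored.

```python
from datetime import datetime, timedelta

start = datetime(2000, 1, 1, 0, 0, 0)
physical_year = timedelta(days=365.25)  # a fixed duration, not a calendar interval

result = start + physical_year
print(result)  # 2000-12-31 06:00:00
```

Because 2000 is a leap year with 366 days, a 365.25 d interval ends 18 hours short of the next calendar New Year.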
The format is defined internally as a single regular expression that could be used in other non-Python implementations for parsing.
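To give a feel for what a single-regex definition of such a format can look like, here is an illustrative pattern. It is a hypothetical sketch, not the regular expression in the PR: the fixed yr, d, hr, min, s ordering, the plain decimal number syntax, and the parse_to_seconds helper are all assumptions made for this example.

```python
import re

# Hypothetical pattern written for illustration; NOT the regex used in astropy.
_NUM = r"\d+(?:\.\d*)?"
PATTERN = re.compile(
    rf"""^\s*(?P<sign>[-+])?\s*
    (?:(?P<yr>{_NUM})\s*yr\s*)?
    (?:(?P<d>{_NUM})\s*d\s*)?
    (?:(?P<hr>{_NUM})\s*hr\s*)?
    (?:(?P<min>{_NUM})\s*min\s*)?
    (?:(?P<s>{_NUM})\s*s\s*)?$""",
    re.VERBOSE,
)

# Physical (not calendar) unit lengths in seconds; 1 yr = 365.25 d is assumed.
SECONDS_PER_UNIT = {"yr": 365.25 * 86400, "d": 86400, "hr": 3600, "min": 60, "s": 1}


def parse_to_seconds(text):
    """Parse a string such as '1yr 2d 3hr 4min 5.6s' into float seconds."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    values = {unit: match.group(unit) for unit in SECONDS_PER_UNIT}
    if all(value is None for value in values.values()):
        raise ValueError(f"no time components in {text!r}")
    total = sum(
        float(value) * SECONDS_PER_UNIT[unit]
        for unit, value in values.items()
        if value is not None
    )
    return -total if match.group("sign") == "-" else total


print(parse_to_seconds("1yr 2d 3hr 4min 5.6s"))
```

Like the format described in the PR, this sketch works in 64-bit floats, so it is not suitable for high-precision use.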
Output subformats
The quantity_str format supports output sub-formats "multi", "yr", "d", "hr", "min", "s" to allow specifying the string output as a single unit or in the default multi-unit format.
The precision attribute (default=3) applies to the specified subformat. For example TimeDelta("100.0d 1.0123456789012345s", precision=9, out_subfmt="d").value is "100.000011717d".
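The value quoted in that example can be reproduced by hand with ordinary floating-point arithmetic. The snippet below is only a cross-check of the numbers in plain Python; it does not call astropy and says nothing about how the format is implemented.

```python
SECONDS_PER_DAY = 86400.0

# 100.0 d plus 1.0123456789012345 s, expressed in days.
days = 100.0 + 1.0123456789012345 / SECONDS_PER_DAY

# precision=9 means nine digits after the decimal point in the chosen unit.
text = f"{days:.9f}d"
print(text)  # 100.000011717d
```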
Why not re-use an existing standard?
The only other existing standard that I am aware of for time intervals is defined in ISO 8601 durations that represent the duration between two dates. The ISO 8601 duration format is: P(n)Y(n)M(n)DT(n)H(n)M(n)S. The fundamental problem with this format is that it explicitly depends on the reference start date as a calendar date in Year Month day hour min sec. So this is not suitable for physical time deltas that are independent of the reference date.
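The dependence on the start date is easy to see for a one-month duration (P1M in ISO 8601 notation). The sketch below uses a hypothetical add_one_calendar_month helper that keeps the day of month fixed, which is one common reading of calendar-month arithmetic and an assumption of this example; it only works for days that exist in the following month.

```python
from datetime import date


def add_one_calendar_month(day):
    """Return the same day-of-month in the following month (hypothetical helper)."""
    if day.month == 12:
        return day.replace(year=day.year + 1, month=1)
    return day.replace(month=day.month + 1)


for start in (date(2001, 1, 15), date(2001, 2, 15)):
    elapsed = add_one_calendar_month(start) - start
    print(start, "+ P1M =", elapsed.days, "days")
```

The same nominal duration spans 31 days from one start date and 28 from the other, so it cannot stand in for a fixed physical time delta.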