Error in num2date conversion #134

rkouznetsov · 2019-11-20T14:46:23Z

Originally posted as Unidata/netcdf4-python#981, but still reproducible with a freshly cloned cftime.

#!/usr/bin/env python
import numpy as np
import cftime as nc4
iarr = np.arange(86400, dtype=np.int32)
units = 'hours since 2018-01-01 00:00:00 UTC'
dates1 = nc4.num2date(iarr, units)[-5:]   # decode all values, keep the last five
dates2 = nc4.num2date(iarr[-5:], units)   # decode only the last five values
print(dates1 == dates2)

This prints
[ True False  True  True False]
but it should print
[ True  True  True  True  True]

Environment:
Ubuntu 18.04, python-numpy 1:1.13.3-2ubuntu1, cftime (revision ea68823).

Thank you!


jswhit · 2019-11-20T16:45:34Z

Microseconds are now included in the comparison (#118) - and the 2nd and 5th dates are off by 13 microseconds. The calculation itself has an accuracy of about 100 milliseconds. Perhaps we should modify the comparison to ignore differences of less than a specified tolerance? @davidhassell - what do you think?

rkouznetsov · 2019-11-20T20:20:10Z

Hi, thank you for the prompt response. I do not think the comparison is the issue. I can imagine a case where a 13-microsecond difference matters, so the comparison did a good job. I would say that in the given example the issue is in handling dates as float/double. An integer number of seconds from midnight should never result in non-zero microseconds. In Unix, the internal time format (struct timeval) is two integers: seconds and microseconds. It would be great to have something like that in cftime...
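For comparison, Python's own `datetime.timedelta` keeps days, seconds and microseconds as integers, which is close in spirit to the seconds/microseconds pair of `struct timeval`. A minimal sketch, assuming whole-hour offsets from a proleptic Gregorian reference date (the only calendar Python's `datetime` models) and ignoring the timezone part of the units string, decodes the last five values of the example exactly:

```python
from datetime import datetime, timedelta

# Reference date from 'hours since 2018-01-01 00:00:00 UTC'
# (timezone handling is omitted in this sketch).
base = datetime(2018, 1, 1)

# The last five values of np.arange(86400), as plain Python ints.
offsets_hours = range(86395, 86400)

# timedelta stores days/seconds/microseconds as integers, so adding a whole
# number of hours involves no floating-point rounding.
dates = [base + timedelta(hours=h) for h in offsets_hours]

for d in dates:
    print(d.isoformat())
```

This only covers what Python's `datetime` can represent (proleptic Gregorian, years MINYEAR to MAXYEAR); it is not a replacement for cftime's support of the other CF calendars.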

jswhit · 2019-11-21T18:38:42Z

cftime uses Julian day/fractional Julian day for its arithmetic, so it is subject to the usual issues of floating-point precision and comparison. A better way to compare dates given a specified tolerance would probably be something like this:

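import numpy as np
import cftime as nc4
iarr = np.arange(86400, dtype=np.int32)
units = 'hours since 2018-01-01 00:00:00 UTC'
dates1 = nc4.num2date(iarr, units)[-5:]
dates2 = nc4.num2date(iarr[-5:], units)
# use total_seconds(): timedelta.microseconds is only the microsecond component,
# and negative differences are normalized to e.g. (-1 day, 86399 s, 999987 us)
datediff = np.array([(d1 - d2).total_seconds() for d1, d2 in zip(dates1, dates2)])
print(np.abs(datediff) < 1e-4)  # tolerance of 100 microseconds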

[ True  True  True  True  True]

rkouznetsov · 2019-11-21T20:12:17Z

Thank you for the workaround! I have already implemented something like this, but in my applications such solutions tend to come back to bite me after a while... So please keep the comparison. It works perfectly well.

If the floating-point Julian day is not easy to fix, perhaps the dates produced by num2date could be rounded internally to, say, the nearest millisecond? Millisecond accuracy could then be declared a feature of num2date. Does that make sense?
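The rounding proposed here could look roughly like the sketch below. `round_to_millisecond` is a hypothetical helper, not part of cftime, and it only handles Python `datetime` objects; rounding the output does not by itself make the underlying Julian-day arithmetic any more accurate.

```python
from datetime import datetime, timedelta

def round_to_millisecond(dt):
    """Round a datetime to the nearest millisecond (ties round up).

    Hypothetical helper for illustration; not a cftime function.
    """
    remainder_us = dt.microsecond % 1000
    dt = dt - timedelta(microseconds=remainder_us)  # truncate to whole ms
    if remainder_us >= 500:
        dt = dt + timedelta(milliseconds=1)         # round half up
    return dt

# The 13-microsecond discrepancy from the example disappears after rounding.
print(round_to_millisecond(datetime(2027, 11, 9, 20, 0, 0, 13)))  # 2027-11-09 20:00:00
```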

jswhit · 2019-11-21T20:39:21Z

This is a very complicated issue that has been discussed often, and there are no easy solutions. Rounding to the nearest millisecond will not give you millisecond accuracy, since a 64-bit Julian day is only accurate to about O(10) milliseconds. There are certainly more accurate algorithms out there, but none that I have found so far that work with all the calendars available in the CF metadata standard.
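As a rough check of the representation limit: a Julian day around 2.46 million (roughly the dates in this example) stored as a 64-bit float can only take values a fixed step apart. The sketch below uses `math.ulp` (Python 3.9+) to measure that step. It is only the spacing of a single stored value; errors accumulated through the calendar arithmetic can be larger.

```python
import math

# Approximate Julian day of 2027-11-09 00:00 UTC (the dates in the example).
jd = 2461718.5

# Distance to the next representable 64-bit float at this magnitude.
ulp_days = math.ulp(jd)
ulp_us = ulp_days * 86400 * 1e6  # days -> microseconds

print(f"{ulp_days:.3e} days = {ulp_us:.1f} microseconds")  # 4.657e-10 days = 40.2 microseconds
```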

rkouznetsov · 2019-11-22T07:56:27Z

Thanks! Indeed, climatologists are quite creative in inventing other-planet calendars, and for those even an accuracy of a few days per year ("360_day" calendar) would be tolerable by design. But for calendars like proleptic_gregorian, accuracy does matter. Actually, I would be fine even with one-second accuracy, as long as it gives exact values where possible.

The issue in the example above is that num2date returns different values of datetime depending on the length of the input array it is fed with. The input values are exactly the same...

With the latest cftime, even the types of the returned objects differ:

In [1]: import numpy as np
   ...: import cftime as nc4
   ...: iarr=np.arange(86400, dtype=np.int32)
   ...: units='hours since 2018-01-01 00:00:00 UTC'
   ...: dates1 = nc4.num2date(iarr,units)[-5:]
   ...: dates2 = nc4.num2date(iarr[-5:],units)
   ...: print(dates1 == dates2)
   ...: 
[ True False  True  True False]

In [2]: dates1
Out[2]: 
array([real_datetime(2027, 11, 9, 19, 0),
       real_datetime(2027, 11, 9, 20, 0, 0, 13),
       real_datetime(2027, 11, 9, 21, 0),
       real_datetime(2027, 11, 9, 22, 0),
       real_datetime(2027, 11, 9, 23, 0, 0, 13)], dtype=object)

In [3]: dates2
Out[3]: 
array([datetime.datetime(2027, 11, 9, 19, 0),
       datetime.datetime(2027, 11, 9, 20, 0),
       datetime.datetime(2027, 11, 9, 21, 0),
       datetime.datetime(2027, 11, 9, 22, 0),
       datetime.datetime(2027, 11, 9, 23, 0)], dtype=object)

In [4]: nc4.__version__
Out[4]: '1.0.4.2'

I wonder whether this is the desired behaviour...

jswhit · 2019-11-22T13:58:17Z

That is caused by this line in cftime/cftime/_cftime.pyx (line 501 at commit cb16118):

# round to nearest second if within ms_eps microseconds

which was intended to fix another problem (#78).
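The idea behind that commented line can be sketched as follows. This illustrates the technique only and is not cftime's code: `snap_to_second` is a hypothetical name, and the 100-microsecond default tolerance is an assumption standing in for `ms_eps`, whose actual value is not shown here.

```python
from datetime import datetime, timedelta

def snap_to_second(dt, eps_us=100):
    """Round dt to the nearest whole second if within eps_us microseconds of it.

    Hypothetical illustration; eps_us stands in for cftime's ms_eps.
    """
    us = dt.microsecond
    if us <= eps_us:                      # just after a whole second
        return dt - timedelta(microseconds=us)
    if 1_000_000 - us <= eps_us:          # just before the next whole second
        return dt + timedelta(microseconds=1_000_000 - us)
    return dt                             # genuinely fractional: leave alone

# A value that should be midnight but came out slightly short of it.
print(snap_to_second(datetime(2027, 11, 9, 23, 59, 59, 999987)))  # 2027-11-10 00:00:00
```

Without such snapping, a date that should fall exactly on midnight can come out as 23:59:59.999987 on the previous day, which is visible whenever only the calendar date is formatted.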

If you comment out the if-block here, in cftime/cftime/_cftime.pyx (line 513 at commit cb16118):

if indxms.any():

you will see that all the dates are the same, but the calendar formatting may be messed up because the date will not land exactly on midnight.

The question is - which behaviour is less desirable?

jswhit · 2019-11-22T14:16:48Z

Sorry, I was mistaken: the cause is actually that Python datetime calculations are used in one case and cftime calculations in the other. If you use only_use_cftime_datetimes=True in num2date, you will get the same answer in both cases, including the extra 13 microseconds. The Python datetime module is used for the calculations whenever possible. Since the Python datetime module only allows for positive times, the first calculation cannot use it. The logic is here:

cftime/cftime/_cftime.pyx, line 272 at commit cb16118:

elif postimes and ((calendar == 'proleptic_gregorian' and basedate.year >= MINYEAR) or \

The issue with negative times was discovered here

Unidata/netcdf4-python#659

Arguably, only_use_cftime_datetimes=True should be the default to avoid surprises like this.

rkouznetsov · 2019-11-23T11:20:18Z

This looks like a mess... Returning different objects depending on subtle differences in the input is certainly confusing. Given two time-handling libraries, one precise but limited in range, the other imprecise but more flexible and universal, I would prefer to explicitly force the use of one or the other depending on my application, maybe even through different functions. In the current implementation one can force cftime, but unfortunately one cannot force datetime. That seems to be a missing feature.

Automatic choices and fallbacks lead to unexpected behaviour, so if datetime is forced and cannot be used, an exception should be raised.

Does that make sense?

rkouznetsov · 2019-11-23T11:28:00Z

"Since the python datetime module only allows for positive times, the first calculation cannot use it."

That is something I did not understand. The postimes check tests for positive intervals rather than positive times. The datetime module is perfectly capable of using negative intervals (timedeltas), or of multiplying positive intervals by negative numbers.

The reason for this check was probably to guard against ending up before datetime's MINYEAR with negative offsets.
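The point about negative offsets is easy to check with the standard library: `datetime` arithmetic accepts negative `timedelta` values and only fails, with `OverflowError`, when the result would fall before year `MINYEAR`. A minimal sketch:

```python
from datetime import datetime, timedelta, MINYEAR

base = datetime(2018, 1, 1)

# Negative intervals work directly...
print(base + timedelta(hours=-24))        # 2017-12-31 00:00:00
# ...as does multiplying a positive interval by a negative number.
print(base + (-24) * timedelta(hours=1))  # 2017-12-31 00:00:00

# Only results earlier than year MINYEAR are rejected.
try:
    base + timedelta(days=-800_000)
except OverflowError as exc:
    print(f"before year {MINYEAR}: {exc}")
```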

jswhit · 2019-11-23T15:11:33Z

Yes, that was the reason.

rkouznetsov · 2019-12-19T16:48:16Z

Thank you! Just coming back to the issue. Is there a way (or can it be implemented?) to force num2date to return datetime.datetime objects?

jswhit · 2019-12-24T23:10:12Z

If only_use_cftime_datetimes=False, Python datetime instances will be returned where possible (but not for negative times; extra logic would need to be added to check datetime's MINYEAR for that to work).

rkouznetsov · 2019-12-25T14:40:26Z

Thank you! Indeed, we regularly deal with negative times. I am looking for a much simpler option: a datetime instance where possible, and an exception when the resulting date does not fit in the datetime range. Not much logic is needed for that.
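The behaviour described here could be as small as the following sketch: decode with Python's `datetime` and raise instead of silently falling back to another type. `num2date_python_only` is a hypothetical name, not the cftime API, and the sketch assumes offsets in hours from an already-parsed proleptic Gregorian reference date.

```python
from datetime import datetime, timedelta

def num2date_python_only(offsets_hours, base):
    """Decode 'hours since base' into datetime.datetime objects.

    Hypothetical sketch: raises ValueError instead of falling back to another
    datetime type when a result does not fit in datetime's range.
    """
    results = []
    for h in offsets_hours:
        try:
            results.append(base + timedelta(hours=h))
        except OverflowError as exc:
            raise ValueError(
                f"offset {h} hours from {base} is outside the datetime range"
            ) from exc
    return results

base = datetime(2018, 1, 1)
print(num2date_python_only([-48, 0, 86399], base))
```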

jswhit · 2019-12-28T20:14:39Z

Are you looking for a use_only_python_datetimes option for num2date and date2num that raises an exception if calendar != proleptic_gregorian and always returns Python datetime instances?

Or, perhaps instead of adding a new kwarg, we could use use_only_cftimes=None.

rkouznetsov · 2019-12-30T06:46:22Z

Yep, that's what I was after. I guess it should also work for calendar == 'standard'.

date2num is not affected by the cftime vs datetime choice. It currently returns doubles, and np.round works fine within the datetime range if integer output is needed.
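The integer case can be sketched with the standard library alone: dividing one `timedelta` by another gives a float, and rounding it recovers the integer offset for dates in datetime's range, even when a few stray microseconds are present. `hours_since` is a hypothetical helper for illustration; the comment itself refers to `np.round` on `date2num` output.

```python
from datetime import datetime, timedelta

def hours_since(dt, base):
    """Return the offset of dt from base in whole hours (hypothetical helper)."""
    return round((dt - base) / timedelta(hours=1))

base = datetime(2018, 1, 1)

# Round trip: integer hours -> datetime -> integer hours.
for h in (-48, 0, 86399):
    assert hours_since(base + timedelta(hours=h), base) == h

# A 13-microsecond error is absorbed by the rounding.
print(hours_since(datetime(2027, 11, 9, 23, 0, 0, 13), base))  # 86399
```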

While it is possible to tell num2date which return type is needed, I wonder whether there is any way for num2date to consistently guess the right object to return. Consider processing a batch of files with different times. If an intelligent guess based on the time values is implemented, it is easy to imagine a set of files that would produce different time objects. Such inconsistency can lead to unexpected behaviour that is quite difficult to debug.

jswhit · 2019-12-30T16:17:20Z

We don't want different objects returned depending on the input; as you suggest, that leads to surprises and hard-to-debug errors. I can have num2date return Python datetime objects if use_only_python_datetimes=True, as long as calendar=proleptic_gregorian or calendar=standard and the reference date is after 1582.
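The conditions proposed here amount to a guard that runs before decoding. A minimal sketch with a hypothetical function name, assuming the reference date is already parsed into a `datetime`. Two assumptions are labelled in the code: "gregorian" is accepted as the CF synonym of "standard", and "after 1582" is read as on or after the Gregorian reform date, 1582-10-15.

```python
from datetime import datetime

# Calendars whose dates match Python's datetime (proleptic Gregorian).
# Assumption: "gregorian" is included as the CF synonym of "standard".
PYTHON_DATETIME_CALENDARS = {"proleptic_gregorian", "standard", "gregorian"}

# Assumption: "after 1582" means on or after the Gregorian reform date.
GREGORIAN_REFORM = datetime(1582, 10, 15)

def check_python_datetime_allowed(calendar, ref_date):
    """Raise ValueError unless datetime.datetime can represent this calendar.

    Hypothetical sketch: 'standard' switches from the Julian to the Gregorian
    calendar in 1582, so earlier reference dates are rejected for it.
    """
    if calendar not in PYTHON_DATETIME_CALENDARS:
        raise ValueError(f"calendar {calendar!r} cannot use datetime.datetime")
    if calendar != "proleptic_gregorian" and ref_date < GREGORIAN_REFORM:
        raise ValueError(f"reference date {ref_date} precedes the 1582 reform")

check_python_datetime_allowed("standard", datetime(2018, 1, 1))  # passes
```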

rkouznetsov · 2019-12-31T12:05:34Z

Great! That should be fine for my applications.
I guess gregorian should be fine even before 1582 (if someone needs it). In any case, I would check the resulting values after conversion rather than just the reference date.

jswhit · 2019-12-31T19:08:01Z

@rkouznetsov - could you test PR #139 (branch issue134) to make sure it does what you expect?

jswhit2 · 2020-01-31T13:18:55Z

@rkouznetsov - have you had a chance to try PR #139 (branch issue134) yet?

rkouznetsov · 2020-01-31T13:22:23Z

Yep. Sorry, I got a bit confused by the GitHub interface. Please see my comment at #139.

jswhit · 2020-02-04T14:50:16Z

PR #139 merged

rkouznetsov mentioned this issue Nov 20, 2019:
Error in num2date conversion Unidata/netcdf4-python#981 (Closed)

This was referenced Nov 22, 2019:
switch only_use_cftime_datetimes to True by default #135 (Merged)
always create cftime datetime instances by default in num2date #136 (Closed)

jswhit mentioned this issue Dec 30, 2019:
add use_only_python_datetimes kwarg to num2date #139 (Merged)

jswhit closed this as completed Feb 4, 2020

rkouznetsov mentioned this issue Mar 19, 2020:
Ncrcat misinterprets time units nco/nco#179 (Closed)

sebhahn mentioned this issue Apr 30, 2020:
read_dates returns dtype('O') instead of dtype('<M8[ns]') TUW-GEO/pynetcf#49 (Closed)

spencerkclark mentioned this issue May 10, 2020:
Decoding times in num2date exactly with timedelta arithmetic #171 (Closed)
