Nanosecond precision loss in 'parseDouble' #12

Closed
lsilvest opened this Issue Jan 27, 2017 · 17 comments


@lsilvest

parseDouble in some cases loses the last digits of the nanoseconds.

Example:

RcppCCTZ::parseDouble("2120-01-01T00:00:59.987654321+00:00")

##           [,1]      [,2]
## [1,] 4733510459 987653809

instead of the expected result:

##           [,1]      [,2]
## [1,] 4733510459 987654321

The problem comes from the coercion to double in parseDouble (utilities.cpp, lines 275-276):

double secs = std::trunc(nanoseconds / 1.0e9);
auto nanos = nanoseconds - static_cast<int64_t>(secs * 1e9);

Maybe the following works?

auto secs = nanoseconds / 1000000000;
auto nanos = nanoseconds - secs * 1000000000;
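
The rounding can be reproduced directly in R, since R numerics are doubles (assuming IEEE-754 arithmetic):

## secs for the example is 4733510459; the true product secs * 1e9 lands
## exactly halfway between two representable doubles, and round-to-even
## picks the one 512 too high:
sprintf("%.0f", 4733510459 * 1e9)
## [1] "4733510459000000512"
## hence 987654321 - 512 == 987653809, the wrong value shown above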

bug.report information:

Package: RcppCCTZ
 Version: 0.2.0
 Maintainer: Dirk Eddelbuettel <edd@debian.org>
 Built: R 3.3.1; x86_64-pc-linux-gnu; 2017-01-27 16:54:31 UTC; unix

R Version:
 platform = x86_64-pc-linux-gnu
 arch = x86_64
 os = linux-gnu
 system = x86_64, linux-gnu
 status = 
 major = 3
 minor = 3.1
 year = 2016
 month = 06
 day = 21
 svn rev = 70800
 language = R
 version.string = R version 3.3.1 (2016-06-21)
 nickname = Bug in Your Hair

Locale:
 LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

Search Path:
 .GlobalEnv, package:nanotime, ESSR, package:stats, package:graphics,
 package:grDevices, package:utils, package:datasets, package:methods,
 Autoloads, package:base
@eddelbuettel
Owner

Thanks for the report -- appreciate the detailed look at it.

Could you test your suggestion on a few input values and try a before/after?

@lsilvest
@eddelbuettel
Owner

I'm still at work and right now a little stunned by your ztsdb.org site. We'll probably email or message some more in the future.

What's your use case?

@eddelbuettel
Owner

Good suggestion for a fix:

R> library(RcppCCTZ)
R> packageVersion("RcppCCTZ")
[1] ‘0.2.0.1’
R> parseDouble("2120-01-01T00:00:59.987654321+00:00")
           [,1]      [,2]
[1,] 4733510459 987654321
R> 

Committed in fe8d609.

@lsilvest
@eddelbuettel
Owner
eddelbuettel commented Jan 28, 2017 (edited)

I work in finance and am aware of what's out there. This is rather interesting to me (of course because of the tooling).

The build just failed on 16.10 / amd64. It does not tolerate a missing crpcut. Once that was installed, the parser still did not build:

/home/edd/git/ztsdb/src/main_parser/../cow_ptr.hpp:138:30:   required from ‘arr::cow_ptr<T> arr::make_cow(unsigned int, Args&& ...) [with T = arr::Array<bool, std::less<bool> >; Args = {arr::Vector<long unsigned int, std::less<long unsigned int> >, arr::Vector<bool, std::less<bool> >, std::vector<arr::Vector<arr::ZString<128>, std::less<arr::ZString<128> > >, std::allocator<arr::Vector<arr::ZString<128>, std::less<arr::ZString<128> > > > >}]’
/home/edd/git/ztsdb/src/main_parser/../ast.hpp:404:47:   required from ‘EData<t, T>::EData(T, loc_t) [with Etype t = (Etype)5u; T = bool; loc_t = yy::location]’
parser.y:251:62:   required from here
/home/edd/git/ztsdb/src/main_parser/../array.hpp:269:19: error: ‘accumulate’ was not declared in this scope
src/main_parser/CMakeFiles/main_parser.dir/build.make:76: recipe for target 'src/main_parser/CMakeFiles/main_parser.dir/parser.cpp.o' failed
make[3]: *** [src/main_parser/CMakeFiles/main_parser.dir/parser.cpp.o] Error 1
CMakeFiles/Makefile2:187: recipe for target 'src/main_parser/CMakeFiles/main_parser.dir/all' failed
make[2]: *** [src/main_parser/CMakeFiles/main_parser.dir/all] Error 2
CMakeFiles/Makefile2:106: recipe for target 'src/CMakeFiles/ztsdb.dir/rule' failed
make[1]: *** [src/CMakeFiles/ztsdb.dir/rule] Error 2
Makefile:173: recipe for target 'ztsdb' failed
make: *** [ztsdb] Error 2
edd@brad:~/git/ztsdb/build(master)$ 

Can you enable issue tickets over at Gitlab?

@lsilvest
@eddelbuettel
Owner

I am not sure. Looks like a compiler error to me: it fails to create parser.cpp.o. But bison may be involved.

Very vanilla machine. If you could fire up Docker to replicate and chase this down, I would appreciate it. If need be I could downgrade to another bison or flex version. I'll also try on my main box, which is still on 16.04.

And issue tickets at Gitlab, pretty please?

@lsilvest
@eddelbuettel
Owner

I looked more closely; it is a 'pickier newer compiler' issue: std::accumulate is declared in <numeric>, which earlier toolchains happened to pull in transitively. Once I added #include <numeric> to array.hpp, that file compiled.

Now choking on

Scanning dependencies of target ztsdb_lib
[ 20%] Building CXX object src/CMakeFiles/ztsdb_lib.dir/timezone/localtime.cpp.o
[ 20%] Building CXX object src/CMakeFiles/ztsdb_lib.dir/timezone/zone.cpp.o
/home/edd/git/ztsdb/src/timezone/zone.cpp: In member function ‘std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long int, std::ratio<1l, 1000000000l> > >::duration tz::Zone::getoffset(Global::dtime, int&, int) const’:
/home/edd/git/ztsdb/src/timezone/zone.cpp:138:17: error: ‘copysign’ is not a member of ‘std’
     auto sign = std::copysign(1, dir);
                 ^~~
/home/edd/git/ztsdb/src/timezone/zone.cpp:159:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
src/CMakeFiles/ztsdb_lib.dir/build.make:93: recipe for target 'src/CMakeFiles/ztsdb_lib.dir/timezone/zone.cpp.o' failed
make[3]: *** [src/CMakeFiles/ztsdb_lib.dir/timezone/zone.cpp.o] Error 1
CMakeFiles/Makefile2:132: recipe for target 'src/CMakeFiles/ztsdb_lib.dir/all' failed
make[2]: *** [src/CMakeFiles/ztsdb_lib.dir/all] Error 2
CMakeFiles/Makefile2:106: recipe for target 'src/CMakeFiles/ztsdb.dir/rule' failed
make[1]: *** [src/CMakeFiles/ztsdb.dir/rule] Error 2
Makefile:173: recipe for target 'ztsdb' failed
make: *** [ztsdb] Error 2
edd@brad:~/git/ztsdb/build(master)$ 

Can you please open the ability to file bugs for YOUR project at YOUR repository rather than here?

I may switch to trying this on 16.04; it looks like you are not ready for g++ 6.2.0 yet. (Presumably the same class of fix applies here: std::copysign is declared in <cmath>.)

@lsilvest
@eddelbuettel
Owner

Double awesome. Building now, and will continue the conversation over there...

@lsilvest

Ran the following test; it's not exhaustive, but it exercises the representable date/time range, including negative offsets from the epoch:

library(RcppCCTZ)

res <- matrix(0, 0, 2, dimnames=list(NULL, c("nsec", "parse")))

nsec <- 1
for (y in 1678:2261)
    for (m in 1:12)
        for (d in 1:28) {
            s <- paste0(y,"-",sprintf("%02d", m), "-", sprintf("%02d", d),
                        "T00:00:01.", sprintf("%09d", nsec),"+00:00")
            
            ## pre-epoch times come back with a negative nanosecond
            ## component, so shift by 1e9 before comparing
            if (s < "1970-01-01T00:00:00.000000000+00:00")
                res <- rbind(res, c(nsec, 1e9 + parseDouble(s)[2]))
            else
                res <- rbind(res, c(nsec, parseDouble(s)[2]))
            
            nsec <- nsec + 1
        }

print(dim(res))
print(sum(res[,"parse"]-res[,"nsec"] != 0))

With the old code there were 50031 errors:

[1] 196224      2
[1] 50031

With the new code in branch 'bugfix/conversion' there were no errors:

[1] 196224      2
[1] 0

Also ran the test on a 32-bit VM, and it was correct there too.

@eddelbuettel
Owner
eddelbuettel commented Jan 29, 2017 (edited)

You are quite awesome, thank you. I was thinking I should add such a test to the regression tests. How long did this run?

(The test is a little costly as the 196k rbind calls each grow the matrix, triggering repeated memory allocations and copies.)

But it's a start, and the result is encouraging.
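
For the record, a preallocated version avoids the repeated copying -- an untested sketch along the same lines:

library(RcppCCTZ)

## same loop with a preallocated result matrix instead of rbind()
## (584 years * 12 months * 28 days = 196224 rows):
res <- matrix(0, 584 * 12 * 28, 2, dimnames=list(NULL, c("nsec", "parse")))
nsec <- 1
for (y in 1678:2261)
    for (m in 1:12)
        for (d in 1:28) {
            s <- sprintf("%d-%02d-%02dT00:00:01.%09d+00:00", y, m, d, nsec)
            p <- parseDouble(s)[2]
            if (y < 1970) p <- p + 1e9    # pre-epoch adjustment as above
            res[nsec, ] <- c(nsec, p)
            nsec <- nsec + 1
        }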

@lsilvest

Even without the rbind it's too slow to be in a unit test.

The issue was only with the multiplication 'secs * 1e9', for absolute values of secs large enough that the product carries more significant digits than a double can hold.

So to cover that case it is probably just as good to take one negative and one positive value that trigger the loss of precision: for example "1680-07-17T00:00:01.000000000+00:00" and the previously mentioned "2120-01-01T00:00:59.987654321+00:00".
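
For reference, the window in which the old code stays exact can be pinned down (assuming IEEE-754 doubles): secs * 1e9 multiplies by 2^9 * 5^9, and the 2^9 only shifts the exponent, so for odd secs the product is exact only while secs * 5^9 still fits in the 53-bit mantissa:

sprintf("%.0f", 2^53 / 5^9)
## [1] "4611686018"
## roughly 146 years either side of the epoch; both of the values above
## (2120: secs = 4733510459, 1680: secs ~ -9.1e9) fall outside it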

@eddelbuettel
Owner
eddelbuettel commented Jan 30, 2017 (edited)

Yes, and yes, we're on the same page. I was thinking I could call set.seed() with some fixed value, sample a number of possible offsets before and after the epoch, and check that each value is the same as what we get back from formatting and re-parsing.

The string creation and parsing will be moderately expensive, but we could still try a few hundred or thousand points. When I get some time ...
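
Something along these lines perhaps -- a rough sketch that builds the strings with sprintf rather than a formatter (so only parseDouble is exercised), reusing the pre-epoch adjustment from the big test above:

library(RcppCCTZ)

set.seed(42)                          # fixed seed, reproducible draws
for (i in 1:1000) {
    y    <- sample(1678:2261, 1)      # dates before and after the epoch
    nsec <- sample(999999999, 1)      # 1 .. 999999999
    s    <- sprintf("%d-01-01T00:00:01.%09d+00:00", y, nsec)
    got  <- parseDouble(s)[2]
    if (y < 1970) got <- got + 1e9    # pre-epoch nanos come back negative
    stopifnot(got == nsec)
}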

@lsilvest

Yes, that would definitely be a better test.
