Nanosecond precision loss in 'parseDouble' #12

Closed
lsilvest opened this issue Jan 27, 2017 · 17 comments

@lsilvest (Contributor) commented Jan 27, 2017

parseDouble in some cases loses the last digits of the nanosecond component.

Example:

RcppCCTZ::parseDouble("2120-01-01T00:00:59.987654321+00:00")

##           [,1]      [,2]
## [1,] 4733510459 987653809

instead of the expected result:

##           [,1]      [,2]
## [1,] 4733510459 987654321

The problem comes from the coercion to double in parseDouble (utilities.cpp, lines 275-276):

double secs = std::trunc(nanoseconds / 1.0e9);
auto nanos = nanoseconds - static_cast<int64_t>(secs * 1e9);

Maybe the following works?

auto secs = nanoseconds / 1000000000;
auto nanos = nanoseconds - secs * 1000000000;
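
The rounding can also be reproduced from R itself, since R's numeric type is the same IEEE-754 double used in the C++ code. A minimal sketch (the printed value is what round-to-nearest would produce, not output taken from the package):

secs <- 4733510459                 # the truncated seconds of the example above
## the exact product is 4733510459000000000, but doubles of that magnitude are
## spaced roughly 1024 apart, so the product rounds to a neighbouring value
sprintf("%.0f", secs * 1e9)
## expected: "4733510459000000512" -- 512 too high, which matches the
## 987654321 -> 987653809 discrepancy reported above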

bug.report information:

Package: RcppCCTZ
 Version: 0.2.0
 Maintainer: Dirk Eddelbuettel <edd@debian.org>
 Built: R 3.3.1; x86_64-pc-linux-gnu; 2017-01-27 16:54:31 UTC; unix

R Version:
 platform = x86_64-pc-linux-gnu
 arch = x86_64
 os = linux-gnu
 system = x86_64, linux-gnu
 status = 
 major = 3
 minor = 3.1
 year = 2016
 month = 06
 day = 21
 svn rev = 70800
 language = R
 version.string = R version 3.3.1 (2016-06-21)
 nickname = Bug in Your Hair

Locale:
 LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

Search Path:
 .GlobalEnv, package:nanotime, ESSR, package:stats, package:graphics,
 package:grDevices, package:utils, package:datasets, package:methods,
 Autoloads, package:base

@eddelbuettel (Owner) commented Jan 27, 2017

Thanks for the report -- appreciate the detailed look at it.

Could you test your suggestion on a few input values and try a before/after?

@lsilvest (Contributor, Author) commented Jan 27, 2017

@eddelbuettel (Owner) commented Jan 27, 2017

I'm still at work and right now a little stunned by your ztsdb.org site. We'll probably email or message some more in the future.

What's your use case?

eddelbuettel added a commit that referenced this issue Jan 27, 2017

@eddelbuettel (Owner) commented Jan 27, 2017

Good suggestion for a fix:

R> library(RcppCCTZ)
R> packageVersion("RcppCCTZ")
[1] ‘0.2.0.1’
R> parseDouble("2120-01-01T00:00:59.987654321+00:00")
           [,1]      [,2]
[1,] 4733510459 987654321
R> 

Committed in fe8d609.

@lsilvest (Contributor, Author) commented Jan 27, 2017

@eddelbuettel (Owner) commented Jan 28, 2017

I work in finance and am aware of what's out there. This is rather interesting to me (of course because of the tooling).

The build just failed on Ubuntu 16.10 / amd64. It does not tolerate a missing crpcut. Once that was installed, the parser still does not build:

/home/edd/git/ztsdb/src/main_parser/../cow_ptr.hpp:138:30:   required from ‘arr::cow_ptr<T> arr::make_cow(unsigned int, Args&& ...) [with T = arr::Array<bool, std::less<bool> >; Args = {arr::Vector<long unsigned int, std::less<long unsigned int> >, arr::Vector<bool, std::less<bool> >, std::vector<arr::Vector<arr::ZString<128>, std::less<arr::ZString<128> > >, std::allocator<arr::Vector<arr::ZString<128>, std::less<arr::ZString<128> > > > >}]’
/home/edd/git/ztsdb/src/main_parser/../ast.hpp:404:47:   required from ‘EData<t, T>::EData(T, loc_t) [with Etype t = (Etype)5u; T = bool; loc_t = yy::location]’
parser.y:251:62:   required from here
/home/edd/git/ztsdb/src/main_parser/../array.hpp:269:19: error: ‘accumulate’ was not declared in this scope
src/main_parser/CMakeFiles/main_parser.dir/build.make:76: recipe for target 'src/main_parser/CMakeFiles/main_parser.dir/parser.cpp.o' failed
make[3]: *** [src/main_parser/CMakeFiles/main_parser.dir/parser.cpp.o] Error 1
CMakeFiles/Makefile2:187: recipe for target 'src/main_parser/CMakeFiles/main_parser.dir/all' failed
make[2]: *** [src/main_parser/CMakeFiles/main_parser.dir/all] Error 2
CMakeFiles/Makefile2:106: recipe for target 'src/CMakeFiles/ztsdb.dir/rule' failed
make[1]: *** [src/CMakeFiles/ztsdb.dir/rule] Error 2
Makefile:173: recipe for target 'ztsdb' failed
make: *** [ztsdb] Error 2
edd@brad:~/git/ztsdb/build(master)$ 

Can you enable issue tickets over at GitLab?

@lsilvest (Contributor, Author) commented Jan 28, 2017

@eddelbuettel (Owner) commented Jan 28, 2017

I am not sure. It looks like a compiler error to me: "failing to create parser.cpp.o". But bison may be involved.

It's a very vanilla machine. If you could fire up Docker to replicate and chase this down, I would appreciate it. If need be, I could downgrade to another bison or flex version. I'll also try on my main box, which is still on 16.04.

And issue tickets at GitLab, pretty please?

@lsilvest (Contributor, Author) commented Jan 28, 2017

@eddelbuettel (Owner) commented Jan 28, 2017

I looked more closely; it is a "pickier, newer compiler" issue. Once I added #include <numeric> to array.hpp, it got past that file.

Now it is choking on:

Scanning dependencies of target ztsdb_lib
[ 20%] Building CXX object src/CMakeFiles/ztsdb_lib.dir/timezone/localtime.cpp.o
[ 20%] Building CXX object src/CMakeFiles/ztsdb_lib.dir/timezone/zone.cpp.o
/home/edd/git/ztsdb/src/timezone/zone.cpp: In member function ‘std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long int, std::ratio<1l, 1000000000l> > >::duration tz::Zone::getoffset(Global::dtime, int&, int) const’:
/home/edd/git/ztsdb/src/timezone/zone.cpp:138:17: error: ‘copysign’ is not a member of ‘std’
     auto sign = std::copysign(1, dir);
                 ^~~
/home/edd/git/ztsdb/src/timezone/zone.cpp:159:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
src/CMakeFiles/ztsdb_lib.dir/build.make:93: recipe for target 'src/CMakeFiles/ztsdb_lib.dir/timezone/zone.cpp.o' failed
make[3]: *** [src/CMakeFiles/ztsdb_lib.dir/timezone/zone.cpp.o] Error 1
CMakeFiles/Makefile2:132: recipe for target 'src/CMakeFiles/ztsdb_lib.dir/all' failed
make[2]: *** [src/CMakeFiles/ztsdb_lib.dir/all] Error 2
CMakeFiles/Makefile2:106: recipe for target 'src/CMakeFiles/ztsdb.dir/rule' failed
make[1]: *** [src/CMakeFiles/ztsdb.dir/rule] Error 2
Makefile:173: recipe for target 'ztsdb' failed
make: *** [ztsdb] Error 2
edd@brad:~/git/ztsdb/build(master)$ 

Can you please open the ability to file bugs for YOUR project at YOUR repository rather than here?

I may switch to trying this on 16.04; it looks like you are not ready yet for g++-6.2.0.

@lsilvest (Contributor, Author) commented Jan 28, 2017

@eddelbuettel (Owner) commented Jan 28, 2017

Double awesome. Building now, and I will continue the conversation over there...

@lsilvest (Contributor, Author) commented Jan 29, 2017

I ran the following test; it's not exhaustive, but it covers the representable
date/time range, including negative offsets:

library(RcppCCTZ)

res <- matrix(0, 0, 2, dimnames=list(NULL, c("nsec", "parse")))

nsec <- 1
for (y in 1678:2261)
    for (m in 1:12)
        for (d in 1:28) {
            s <- paste0(y,"-",sprintf("%02d", m), "-", sprintf("%02d", d),
                        "T00:00:01.", sprintf("%09d", nsec),"+00:00")
            
            ## pre-epoch timestamps: shift the nanosecond field back into [0, 1e9)
            if (s < "1970-01-01T00:00:00.000000000+00:00")
                res <- rbind(res, c(nsec, 1e9+parseDouble(s)[2]))
            else
                res <- rbind(res, c(nsec, parseDouble(s)[2]))
            
            nsec <- nsec + 1
        }

print(dim(res))
print(sum(res[,"parse"]-res[,"nsec"] != 0))

With the old code there were 50031 errors:

[1] 196224      2
[1] 50031

With the new code in branch 'bugfix/conversion' there were no errors:

[1] 196224      2
[1] 0

I also ran the test on a 32-bit VM, and it was correct there too.

@eddelbuettel (Owner) commented Jan 29, 2017

You are quite awesome, thank you. I was thinking I should add such a test to the regression tests. How long did this run?

(The test is a little strange, as it does roughly 196k rbind calls, which leads to a lot of memory operations.)

But it's a start, and the result is encouraging.

@lsilvest (Contributor, Author) commented Jan 30, 2017

Even without the rbind it's too slow to be in a unit test.

The issue was only with the multiplication secs * 1e9, for absolute values of secs large enough that the product has more significant digits than a double can hold.

So to test that case, it's probably just as good to take one negative and one positive timestamp that trigger the loss of precision: for example "1680-07-17T00:00:01.000000000+00:00" and the previously mentioned "2120-01-01T00:00:59.987654321+00:00".
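
A minimal regression check along those lines could look like the sketch below; the %% 1e9 wrap is an assumption that mirrors the 1e9 adjustment used in the larger test and normalises a possibly negative pre-epoch nanosecond field:

library(RcppCCTZ)

## one post-epoch and one pre-epoch timestamp known to trigger the rounding
stopifnot(parseDouble("2120-01-01T00:00:59.987654321+00:00")[2] == 987654321)
stopifnot(parseDouble("1680-07-17T00:00:01.000000000+00:00")[2] %% 1e9 == 0)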

@eddelbuettel (Owner) commented Jan 30, 2017

Yes, and yes, we're on the same page. I was thinking I could call set.seed() with some fixed value, sample a number of possible offsets, use them both before and after the epoch, and then check that each value matches what we get back from formatting and re-parsing.

The string creation and parsing will be moderately expensive, but we could still try a few hundred or a thousand points. When I get some time ...
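
A rough sketch of such a sampled test, with the timestamp strings built directly via sprintf rather than through a formatting call; the sample size, the fixed mid-year date, and the %% 1e9 normalisation of pre-epoch nanoseconds are assumptions of the sketch:

library(RcppCCTZ)

set.seed(42)                                    # fixed seed for reproducibility
n     <- 500
years <- sample(1678:2261, n, replace = TRUE)   # span the representable range
nsecs <- sample.int(1e9, n) - 1L                # nanosecond fractions in [0, 1e9)

for (i in seq_len(n)) {
    s   <- sprintf("%04d-06-15T12:00:00.%09d+00:00", years[i], nsecs[i])
    got <- parseDouble(s)[2] %% 1e9             # normalise pre-epoch values
    stopifnot(got == nsecs[i])
}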

@lsilvest (Contributor, Author) commented Jan 30, 2017

Yes, that would definitely be a better test.
