Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve fread for very small or very large fp numbers. #4165

Merged
merged 2 commits into from Feb 16, 2020

Conversation

QuLogic
Copy link
Contributor

@QuLogic QuLogic commented Jan 9, 2020

On non-x86 architectures (armv7hl and ppc64le), test 1018 fails with a slightly differently parsed number. In base R, R_strtod handles small numbers by pre-dividing numerator and divisor before applying the exponent part (instead of dividing all together.) However, it does not use a lookup table.

For fread, trim the exponent lookup table from ±350 to ±300, and if anything is in that removed range, do two multiplications instead. This results in approximately the same effect as in base R.

Removing some of the range from the lookup table also fixes several warnings such as:

freadLookups.h:57:1: warning: floating constant truncated to zero [-Woverflow]
   57 | 1.0E-324L,
      | ^~~~~~~~~
freadLookups.h:690:1: warning: floating constant exceeds range of 'long double' [-Woverflow]
  690 | 1.0E309L,
      | ^~~~~~~~

Closes #3492 now that #4213 closed the other issue in it.
The first part of #4032 is the same issue (test 1018) fixed here. A build on those architectures can be found here.
Closes #4097 too.

On non-x86 architectures (armv7hl and ppc64le), test 1018 fails with a
slightly differently parsed number. In base R, `R_strtod` handles small
numbers by pre-dividing numerator and divisor before applying the
exponent part (instead of dividing all together.) However, it does not
use a lookup table.

For `fread`, trim the exponent lookup table from ±350 to ±300, and if
anything is in that removed range, do two multiplications instead. This
results in approximately the same effect as in base R.

Removing some of the range from the lookup table also fixes several
warnings such as:

```
freadLookups.h:57:1: warning: floating constant truncated to zero [-Woverflow]
   57 | 1.0E-324L,
      | ^~~~~~~~~
freadLookups.h:690:1: warning: floating constant exceeds range of 'long double' [-Woverflow]
  690 | 1.0E309L,
      | ^~~~~~~~
```

See Rdatatable#3492 and Rdatatable#4032.
@codecov
Copy link

@codecov codecov bot commented Jan 9, 2020

Codecov Report

Merging #4165 into master will increase coverage by 0.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4165      +/-   ##
==========================================
+ Coverage   99.54%   99.61%   +0.07%     
==========================================
  Files          72       72              
  Lines       13937    13901      -36     
==========================================
- Hits        13873    13847      -26     
+ Misses         64       54      -10
Impacted Files Coverage Δ
R/merge.R 100% <ø> (ø) ⬆️
R/as.data.table.R 100% <ø> (ø) ⬆️
src/rbindlist.c 100% <100%> (ø) ⬆️
src/dogroups.c 100% <100%> (+3.14%) ⬆️
R/between.R 100% <100%> (ø) ⬆️
src/frank.c 100% <100%> (ø) ⬆️
R/frank.R 100% <100%> (ø) ⬆️
src/subset.c 100% <100%> (ø) ⬆️
src/freadR.c 100% <100%> (ø) ⬆️
src/nafill.c 100% <100%> (ø) ⬆️
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1aa990...dc7659f. Read the comment docs.

@jangorecki jangorecki requested a review from mattdowle Jan 9, 2020
@jangorecki
Copy link
Member

@jangorecki jangorecki commented Jan 9, 2020

no need to update manual? I see one example there related to precision

@QuLogic
Copy link
Contributor Author

@QuLogic QuLogic commented Jan 10, 2020

Which do you mean? Note this doesn't really change actual precision, just intermediary precision, so that final results are consistent.

@jangorecki
Copy link
Member

@jangorecki jangorecki commented Jan 10, 2020

The section in examples I meant

# Numerical precision :

@QuLogic
Copy link
Contributor Author

@QuLogic QuLogic commented Jan 30, 2020

Ah, I don't believe that section requires any changes. The only effect here is if the fread results were compared with expected values written in R code directly (and only on specific architectures), but there are no comparisons.

@mattdowle mattdowle added this to the 1.12.9 milestone Feb 16, 2020
// and then remove extra from e.
// This avoids having to store very small or very large constants that may
// fail to be encoded by the compiler, even though the values can actually
// be stored correctly.
Copy link
Member

@mattdowle mattdowle Feb 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clear comment and wording. Perfect.
I invited you to be project member in your other PR #4213, and added you to contributor list there too.
Many thanks for investigating and fixing this!

@mattdowle mattdowle merged commit ff3e7d4 into Rdatatable:master Feb 16, 2020
4 checks passed
@QuLogic QuLogic deleted the fread-alt-arch branch Feb 17, 2020
@jangorecki jangorecki removed this from the 1.12.11 milestone May 26, 2020
@jangorecki jangorecki added this to the 1.12.9 milestone May 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants