Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(r): Add support for bit64::integer64() conversions #293

Merged
merged 5 commits into from
Sep 8, 2023

Conversation

paleolimbot
Copy link
Member

@paleolimbot paleolimbot commented Sep 1, 2023

This PR implements support for bit64::integer64(). This vctr type is a mostly lossless representation of a 64-bit (signed) integer that is helpful for representing very large numbers (or uses like s2_cell() that just need a way to store 64 bits losslessly).

This type isn't part of the default conversions because it requires bit64, which has a few dependencies. It is also not the default because there are some limitations in R that cause that object type to behave unintuitively in certain places (e.g., when indexing a vector). Because convert_vector() always accepts a target vector type, users can (now) override the default behaviour with a safer option if that matters for their use case.

library(nanoarrow)
suppressPackageStartupMessages(library(bit64))

(vec <- as.integer64(c(-3:3, NA)))
#> integer64
#> [1] -3   -2   -1   0    1    2    3    <NA>
(array <- as_nanoarrow_array(vec))
#> <nanoarrow_array int64[8]>
#>  $ length    : int 8
#>  $ null_count: int 1
#>  $ offset    : int 0
#>  $ buffers   :List of 2
#>   ..$ :<nanoarrow_buffer validity<bool>[8][1 b]> `TRUE TRUE TRUE TRUE TRUE T...`
#>   ..$ :<nanoarrow_buffer data<int64>[8][64 b]> `-3.00e+00 -2.00e+00 -1.00e+0...`
#>  $ dictionary: NULL
#>  $ children  : list()
convert_array(array, integer64())
#> integer64
#> [1] -3   -2   -1   0    1    2    3    <NA>

Created on 2023-09-01 with reprex v2.0.2

@codecov-commenter
Copy link

codecov-commenter commented Sep 1, 2023

Codecov Report

Merging #293 (f68278f) into main (65bb552) will increase coverage by 0.00%.
The diff coverage is 90.80%.

@@           Coverage Diff            @@
##             main     #293    +/-   ##
========================================
  Coverage   86.89%   86.90%            
========================================
  Files          68       66     -2     
  Lines       10450    10179   -271     
========================================
- Hits         9081     8846   -235     
+ Misses       1369     1333    -36     
Files Changed Coverage Δ
r/R/convert-array.R 92.10% <ø> (ø)
r/src/materialize_int64.h 88.46% <88.46%> (ø)
r/R/as-array.R 94.65% <92.85%> (-0.32%) ⬇️
r/R/schema.R 95.93% <100.00%> (+0.03%) ⬆️
r/src/convert.c 92.17% <100.00%> (+0.05%) ⬆️
r/src/convert_array.c 86.92% <100.00%> (+0.10%) ⬆️
r/src/materialize.c 80.74% <100.00%> (+0.14%) ⬆️

... and 4 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

paleolimbot added a commit that referenced this pull request Sep 8, 2023
)

It looks like I had left this as a TODO after my initial push to get
most conversions implemented. The idea is to warn when converting very
large int64/uint64 values to double. This an important case to warn for
because double is the default conversion from int64 and values that
won't (or might not) roundtrip back to int64 are important to warn
about. See #293 implementing a safer conversion.

``` r
library(nanoarrow)

array <- as_nanoarrow_array(2^54, schema = na_int64())
convert_array(array, double())
#> Warning in convert_array.default(array, double()): 1 value(s) may have incurred
#> loss of precision in conversion to double()
#> [1] 1.80144e+16
```

<sup>Created on 2023-09-01 with [reprex
v2.0.2](https://reprex.tidyverse.org)</sup>
@paleolimbot paleolimbot merged commit 804630d into apache:main Sep 8, 2023
13 checks passed
@paleolimbot paleolimbot deleted the r-integer64 branch September 19, 2023 20:30
@nbenn nbenn mentioned this pull request Nov 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants