Skip to content

ARROW-3794 [R]: Consider mapping INT8 to integer() not raw()#4507

Closed
romainfrancois wants to merge 3 commits intoapache:masterfrom
romainfrancois:ARROW-3794/INT8
Closed

ARROW-3794 [R]: Consider mapping INT8 to integer() not raw()#4507
romainfrancois wants to merge 3 commits intoapache:masterfrom
romainfrancois:ARROW-3794/INT8

Conversation

@romainfrancois
Copy link
Contributor

R vectors of raw types are converted to arrays of type uint8

arrow::array(as.raw(1:10))$type
#> arrow::UInt8 
#> uint8

but otoh, arrays of type uint8 are converted to R integer vectors.

library(arrow, warn.conflicts = FALSE)

a <- array((-128):127)$cast(int8())
str(a$as_vector())
#>  int [1:256] -128 -127 -126 -125 -124 -123 -122 -121 -120 -119 ...

a <- array(0:255)$cast(uint8())
str(a$as_vector())
#>  int [1:256] 0 1 2 3 4 5 6 7 8 9 ...

Created on 2019-06-10 by the reprex package (v0.3.0.9000)

@romainfrancois
Copy link
Contributor Author

Not sure about the first thing, perhaps raw vectors should just be converted to int32 arrays ?

@codecov-io
Copy link

codecov-io commented Jun 10, 2019

Codecov Report

Merging #4507 into master will decrease coverage by 13%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #4507       +/-   ##
===========================================
- Coverage   88.11%   75.11%   -13.01%     
===========================================
  Files         850       54      -796     
  Lines      105774     3110   -102664     
  Branches     1253        0     -1253     
===========================================
- Hits        93203     2336    -90867     
+ Misses      12326      774    -11552     
+ Partials      245        0      -245
Impacted Files Coverage Δ
r/src/arrowExports.cpp 72.7% <ø> (ø) ⬆️
r/src/array_from_vector.cpp 78.29% <100%> (ø) ⬆️
r/src/array__to_vector.cpp 78.66% <100%> (+0.7%) ⬆️
python/pyarrow/ipc.pxi
cpp/src/arrow/csv/chunker-test.cc
cpp/src/parquet/column_page.h
cpp/src/parquet/bloom_filter-test.cc
cpp/src/arrow/array/builder_decimal.cc
cpp/src/plasma/client.cc
cpp/src/arrow/io/test-common.h
... and 790 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 48ee38f...19ff70b. Read the comment docs.

@nealrichardson
Copy link
Member

The original concern that int8 map to integer seems addressed here, but i'm wondering if uint8 is most appropriate for raw. Why not the binary type that Javier mentioned in the ticket? http://arrow.apache.org/docs/format/Metadata.html#utf8-and-binary

@romainfrancois
Copy link
Contributor Author

That does not look like the same thing to me. binary here looks more like a list of raw vectors rather than a single raw vector, which are closer to a fixed width (1) binary type: https://arrow.apache.org/docs/cpp/classarrow_1_1_fixed_size_binary_type.html

@nealrichardson
Copy link
Member

Alright, LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants