ARROW-14169: [R] altrep for factors #11402

Closed

romainfrancois
Contributor

Draft pull request, built on top of #11369.

Encouraging so far:

library(arrow, warn.conflicts = FALSE)
#> See arrow_info() for available features

f <- Array$create(iris$Species)$as_vector()
.Internal(inspect(f))
#> @7fe3c95c3860 13 INTSXP g0c0 [OBJ,REF(65535),ATT] arrow::ChunkedArray<0x7fe3c75fffc8, dictionary<values=string, indices=int8, ordered=0>, 1 chunks, 0 nulls> len=150
#> ATTRIB:
#>   @7fe3c95c3518 02 LISTSXP g0c0 [REF(1)] 
#>     TAG: @7fe3be80d5e0 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x4000] "levels" (has value)
#>     @7fe3c95c3550 16 STRSXP g0c0 [REF(65535)] arrow::ChunkedArray<0x7fe3c75e92c8, string, 1 chunks, 0 nulls> len=3
#>     TAG: @7fe3be80d9d0 01 SYMSXP g1c0 [MARK,REF(46357),LCK,gp=0x6000] "class" (has value)
#>     @7fe3ca3eefc0 16 STRSXP g0c1 [REF(1)] (len=1, tl=0)
#>       @7fe3be88d3e8 09 CHARSXP g1c1 [MARK,REF(383),gp=0x61] [ASCII] [cached] "factor"
f
#>   [1] setosa     setosa     setosa     setosa     setosa     setosa    
#>   [7] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [13] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [19] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [25] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [31] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [37] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [43] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [49] setosa     setosa     versicolor versicolor versicolor versicolor
#>  [55] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [61] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [67] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [73] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [79] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [85] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [91] versicolor versicolor versicolor versicolor versicolor versicolor
#>  [97] versicolor versicolor versicolor versicolor virginica  virginica 
#> [103] virginica  virginica  virginica  virginica  virginica  virginica 
#> [109] virginica  virginica  virginica  virginica  virginica  virginica 
#> [115] virginica  virginica  virginica  virginica  virginica  virginica 
#> [121] virginica  virginica  virginica  virginica  virginica  virginica 
#> [127] virginica  virginica  virginica  virginica  virginica  virginica 
#> [133] virginica  virginica  virginica  virginica  virginica  virginica 
#> [139] virginica  virginica  virginica  virginica  virginica  virginica 
#> [145] virginica  virginica  virginica  virginica  virginica  virginica 
#> Levels: setosa versicolor virginica
.Internal(inspect(f))
#> @7fe3c95c3860 13 INTSXP g0c0 [OBJ,REF(65535),ATT] arrow::ChunkedArray<0x7fe3c75fffc8, dictionary<values=string, indices=int8, ordered=0>, 1 chunks, 0 nulls> len=150
#> ATTRIB:
#>   @7fe3c95c3518 02 LISTSXP g0c0 [REF(1)] 
#>     TAG: @7fe3be80d5e0 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x4000] "levels" (has value)
#>     @7fe3c95c3550 16 STRSXP g0c0 [REF(65535)] arrow::ChunkedArray<0x7fe3c75e92c8, string, 1 chunks, 0 nulls> len=3
#>     TAG: @7fe3be80d9d0 01 SYMSXP g1c0 [MARK,REF(46450),LCK,gp=0x6000] "class" (has value)
#>     @7fe3ca3eefc0 16 STRSXP g0c1 [REF(65535)] (len=1, tl=0)
#>       @7fe3be88d3e8 09 CHARSXP g1c1 [MARK,REF(385),gp=0x61] [ASCII] [cached] "factor"
identical(f, iris$Species)
#> [1] TRUE

Created on 2021-10-13 by the reprex package (v2.0.1.9000)

@romainfrancois
Contributor Author

This currently handles only the single-chunk case, so we don't yet need to worry about unifying dictionaries across chunks, as Converter_Dictionary does.
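
To illustrate what unification means here, below is a minimal base R sketch, not the code in Converter_Dictionary. It assumes two hypothetical chunks, `chunk1` and `chunk2`, whose dictionaries (factor levels) differ, and uses an illustrative helper `remap()` to rewrite each chunk's integer codes against a shared set of levels. The order chosen for the unified levels (first appearance) is also an assumption.

```r
# Two hypothetical chunks, each a factor carrying its own levels (dictionary).
chunk1 <- factor(c("a", "b", "a"), levels = c("a", "b"))
chunk2 <- factor(c("c", "b"), levels = c("b", "c"))

# Unified dictionary: levels in order of first appearance across chunks
# (an assumed ordering, chosen for illustration).
unified_levels <- unique(c(levels(chunk1), levels(chunk2)))

# Hypothetical helper: translate a chunk's integer codes so that they index
# into the unified levels instead of the chunk's own levels.
remap <- function(chunk, unified) {
  match(levels(chunk), unified)[as.integer(chunk)]
}

codes <- c(remap(chunk1, unified_levels), remap(chunk2, unified_levels))

# Rebuild a single factor from the remapped codes.
unified <- structure(codes, levels = unified_levels, class = "factor")
print(unified)
```

With a single chunk there is only one dictionary, so no such remapping of codes is needed, which is the simplification this draft relies on.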

@pitrou
Member

pitrou commented Nov 18, 2021

This probably needs rebasing now?

@romainfrancois
Contributor Author

I cherry-picked the relevant commit in #11738.

@romainfrancois romainfrancois deleted the ARROW-14169_factors branch November 18, 2021 13:43
jonkeane added a commit that referenced this pull request Feb 3, 2022
replaces #11402

Closes #11738 from romainfrancois/ARROW_14169_factors_2

Lead-authored-by: Romain Francois <romain@rstudio.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request Feb 4, 2022