ARROW-3282: [R] initial R functionality
* Wrapping C++ pointers to arrow objects as R6 classes holding an R external pointer.
* Factory functions for the metadata types, int32(), ...
* Factories to create schemas and structs.
* Create Array, RecordBatch, Table from R vectors and data frames. Initially, only integer (int32), numeric (float64) and raw (int8) vectors are supported.
* Reading and writing record batches and tables to and from files.
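
The first bullet, together with the note on commit ef7cda1 ("class constructors only take the external pointers, logic moved to factory functions"), describes a pattern that can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the package's code: `DemoObject`, `demo_field`, `Demo__initialize` and `Demo__name` are hypothetical names, and an environment stands in for the external pointer that the real Rcpp bindings would return, so that the sketch runs in plain R.

```r
library(R6)

# Hypothetical stand-ins for Rcpp-generated bindings. In a real package these
# would create and read a C++ object through an external pointer; here an
# environment plays the role of the pointer (an assumption for illustration).
Demo__initialize <- function(name) {
  ptr <- new.env()
  ptr$name <- name
  ptr
}
Demo__name <- function(ptr) ptr$name

# The class constructor only stores the pointer; it carries no other logic.
DemoObject <- R6Class("DemoObject",
  public = list(
    ptr = NULL,
    initialize = function(ptr) {
      self$ptr <- ptr
    },
    # Methods forward to the binding, passing the stored pointer along.
    name = function() Demo__name(self$ptr)
  )
)

# The factory function is the user-facing entry point: it calls the binding
# and wraps the returned pointer in the R6 class.
demo_field <- function(name) {
  DemoObject$new(Demo__initialize(name))
}

f <- demo_field("x")
f$name()
```

Keeping construction logic in the factory means the class itself can wrap a pointer returned by any binding, which is how the `field()` helper in this diff appears to use `` `arrow::Field`$new() ``.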

Author: Romain Francois <romain@purrple.cat>

Closes #2596 from romainfrancois/r-dev-buffer and squashes the following commits:

9ab1882 <Romain Francois> mark Roxygen and Rcpp generated files
661f370 <Romain Francois> Using FirstTimeBitmapWriter instead of BitmapWriter.
e81b72b <Romain Francois> only set null_bitmap if null_count > 0
bfe853d <Romain Francois> using 0-based indices in the tests.
b391556 <Romain Francois> Also use arrow::internal::BitmapWriter
9e60555 <Romain Francois> name fixes. Using __ consistently
bf814bb <Romain Francois> Using arrow::internal::BitmapReader
c8aa703 <Romain Francois> Also use std::shared_ptr for MemoryPool.
2aa8a5f <Romain Francois> need dev version of `vctrs`
394bd33 <Romain Francois> 🐀 + RecordBatch$Slice
de93a4f <Romain Francois> RecordBatch tests
9d208a4 <Romain Francois> +Array$RangeEquals
f860063 <Romain Francois> Move each class to their own file
a89a9a8 <Romain Francois> Move RecordBatch impl to own file
a2f9f51 <Romain Francois> correctly handling offset()
8263c0d <Romain Francois> + tests for ChunkedArray
e02e24f <Romain Francois> +chunked_array and tests
b20e4b0 <Romain Francois> More tests
d11cda0 <Romain Francois> +R6 class ChunkedArray
29af2ea <Romain Francois> license headers
2f53ebf <Romain Francois> Additional tests for read_arrow / write_arrow
4237c32 <Romain Francois> Clear the bit for non NA.
ede8e44 <Romain Francois> Handle null buffer in R <-> Array conversions
a5b8190 <Romain Francois> update README with example of reading/writing arrow::Table
d951db8 <Romain Francois> "documentation" to quiet check()
908c2ac <Romain Francois> read_arrow and write_arrow now relate to arrow::Table.
110b00d <Romain Francois> resolving conflicts
ae55f8b <Romain Francois> ..
767e9d9 <Romain Francois> more generic print method
8d8cdd1 <Romain Francois> + read_arrow / write_arrow for now
c1385a0 <Romain Francois> export Array_as_vector, +Array$ToString
23fbd01 <Romain Francois> + column names
97659ff <Romain Francois> + as_tibble.arrow::RecordBatch
fa4ee22 <Romain Francois> + read_record_batch
f27eeba <Romain Francois> - MakeArray
4977bb2 <Romain Francois> no need to make ArrayData directly
ef7cda1 <Romain Francois> class constructors only take the external pointers, logic moved to factory functions
81e059a <Romain Francois> rebasing
421e471 <Romain Francois> +macro R_ERROR_NOT_OK similar to RETURN_NOT_OK but that Rcpp::stop()s
f5e3eff <Romain Francois> attempt RecordBatch$to_file
79205fb <Romain Francois> initial stab at arrow::table(data.frame)
f6f1775 <Romain Francois> s/data/.data/
b9c215b <Romain Francois> "document" array and record_batch
edf6098 <Romain Francois> Need to install `vctrs` from github for now
6aecdce <Romain Francois> skip using rpath linker option
b8dac54 <Romain Francois> +RecordBatch$schema
1fc3cc2 <Romain Francois> no longer need this
05da931 <Romain Francois> initial stab at record_batch
f4d0a34 <Romain Francois> must include arrow_types.h first
aee2d0a <Romain Francois> initial stab at arrow::array
a6ae2f3 <Romain Francois> cleanup
e14b546 <Romain Francois> follow up from @wesm comments on #2489
36e9801 <Romain Francois> + installation instructions
108caf9 <Romain Francois> not checking for headers on these files
b829bdf <Romain Francois> initial R 📦 with travis setup and testthat suite, that links to arrow c++ library and calls arrow::int32()
26e712d <Romain Francois> Initial work for type metadata, with tests.
e251299 <Romain Francois> + installation instructions
a9a8bbb <Romain Francois> not checking for headers on these files
e0a7eff <Romain Francois> initial R 📦 with travis setup and testthat suite, that links to arrow c++ library and calls arrow::int32()
b1c1109 <Romain Francois> finished rebasing after initial R patch merged
887df48 <Romain Francois> skip using rpath linker option
a6de975 <Romain Francois> cleanup
8526e51 <Romain Francois> follow up from @wesm comments on #2489
f03a277 <Romain Francois> + installation instructions
0995ca4 <Romain Francois> not checking for headers on these files
1cb547e <Romain Francois> initial R 📦 with travis setup and testthat suite, that links to arrow c++ library and calls arrow::int32()
705c125 <Romain Francois> exclude Rd files 🐀
605e302 <Romain Francois> time32 only handles second and millisecond time64 only handles microsecond and nanosecond
afdbae6 <Romain Francois> + licence header for R6.R file
65563f5 <Romain Francois> minimal documentation for check()
b7135c7 <Romain Francois> stop exporting everything
6aaf192 <Romain Francois> ignoring the .clang-format file
d854f2f <Romain Francois> + license headers for R files 🙊
d992b26 <Romain Francois> Initial work for type metadata, with tests.
614dd07 <Romain Francois> + installation instructions
afce06a <Romain Francois> initial R 📦 with travis setup and testthat suite, that links to arrow c++ library and calls arrow::int32()
romainfrancois authored and wesm committed Sep 25, 2018
1 parent 5167502 commit ea8940a
Showing 51 changed files with 4,714 additions and 32 deletions.
4 changes: 4 additions & 0 deletions .gitattributes
@@ -0,0 +1,4 @@
r/R/RcppExports.R linguist-generated=true
r/src/RcppExports.cpp linguist-generated=true
r/man/*.Rd linguist-generated=true

2 changes: 2 additions & 0 deletions .gitignore
@@ -39,3 +39,5 @@ python/.eggs/
.pytest_cache/
pkgs
.Rproj.user
arrow.Rcheck/

2 changes: 2 additions & 0 deletions dev/release/rat_exclude_files.txt
@@ -127,3 +127,5 @@ r/.Rbuildignore
r/arrow.Rproj
r/README.md
r/README.Rmd
r/man/*.Rd
.gitattributes
3 changes: 3 additions & 0 deletions r/.Rbuildignore
@@ -1,3 +1,6 @@
^.*\.Rproj$
^\.Rproj\.user$
^README\.Rmd$
src/.clang-format
LICENSE.md
^data-raw$
42 changes: 35 additions & 7 deletions r/DESCRIPTION
@@ -2,7 +2,7 @@ Package: arrow
 Title: R Integration to 'Apache' 'Arrow'
 Version: 0.0.0.9000
 Authors@R: c(
-person("Romain", "François", email = "romain@rstudio.com", role = c("aut", "cre")),
+person("Romain", "François", email = "romain@rstudio.com", role = c("aut", "cre")),
 person("Apache Arrow", email = "dev@arrow.apache.org", role = c("aut", "cph"))
 )
 Description: R Integration to 'Apache' 'Arrow'.
@@ -11,11 +11,39 @@ License: Apache License (>= 2.0)
 Encoding: UTF-8
 LazyData: true
 SystemRequirements: C++11
-LinkingTo:
-Rcpp
-Imports:
-Rcpp
+LinkingTo:
+Rcpp (>= 0.12.18)
+Imports:
+Rcpp (>= 0.12.18),
+rlang,
+purrr,
+assertthat,
+glue,
+R6,
+vctrs,
+fs,
+tibble,
+crayon
+Remotes:
+r-lib/vctrs
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 6.0.1.9000
-Suggests:
+RoxygenNote: 6.1.0.9000
+Suggests:
 testthat
+Collate:
+'enums.R'
+'R6.R'
+'ArrayData.R'
+'ChunkedArray.R'
+'Column.R'
+'Field.R'
+'List.R'
+'RcppExports.R'
+'RecordBatch.R'
+'Schema.R'
+'Struct.R'
+'Table.R'
+'array.R'
+'memory_pool.R'
+'reexports-tibble.R'
+'zzz.R'
57 changes: 57 additions & 0 deletions r/NAMESPACE
@@ -1,4 +1,61 @@
# Generated by roxygen2: do not edit by hand

S3method("!=","arrow::Object")
S3method("$","arrow-enum")
S3method("==","arrow::Array")
S3method("==","arrow::DataType")
S3method("==","arrow::Field")
S3method("==","arrow::RecordBatch")
S3method(as_tibble,"arrow::RecordBatch")
S3method(as_tibble,"arrow::Table")
S3method(length,"arrow::Array")
S3method(names,"arrow::RecordBatch")
S3method(print,"arrow-enum")
export(DateUnit)
export(StatusCode)
export(TimeUnit)
export(Type)
export(array)
export(as_tibble)
export(boolean)
export(chunked_array)
export(date32)
export(date64)
export(decimal)
export(float16)
export(float32)
export(float64)
export(int16)
export(int32)
export(int64)
export(int8)
export(list_of)
export(null)
export(read_arrow)
export(record_batch)
export(schema)
export(struct)
export(table)
export(time32)
export(time64)
export(timestamp)
export(uint16)
export(uint32)
export(uint64)
export(uint8)
export(utf8)
export(write_arrow)
importFrom(R6,R6Class)
importFrom(Rcpp,sourceCpp)
importFrom(assertthat,assert_that)
importFrom(glue,glue)
importFrom(purrr,map)
importFrom(purrr,map2)
importFrom(purrr,map_chr)
importFrom(purrr,map_int)
importFrom(rlang,dots_n)
importFrom(rlang,quo_name)
importFrom(rlang,seq2)
importFrom(rlang,set_names)
importFrom(tibble,as_tibble)
useDynLib(arrow, .registration = TRUE)
28 changes: 28 additions & 0 deletions r/R/ArrayData.R
@@ -0,0 +1,28 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

#' @include R6.R

`arrow::ArrayData` <- R6Class("arrow::ArrayData",
inherit = `arrow::Object`,
active = list(
type = function() `arrow::DataType`$dispatch(ArrayData__get_type(self)),
length = function() ArrayData__get_length(self),
null_count = function() ArrayData__get_null_count(self),
offset = function() ArrayData__get_offset(self)
)
)
46 changes: 46 additions & 0 deletions r/R/ChunkedArray.R
@@ -0,0 +1,46 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

#' @include R6.R

`arrow::ChunkedArray` <- R6Class("arrow::ChunkedArray", inherit = `arrow::Object`,
public = list(
length = function() ChunkedArray__length(self),
null_count = function() ChunkedArray__null_count(self),
num_chunks = function() ChunkedArray__num_chunks(self),
chunk = function(i) `arrow::Array`$new(ChunkedArray__chunk(self, i)),
chunks = function() purrr::map(ChunkedArray__chunks(self), `arrow::Array`$new),
type = function() `arrow::DataType`$dispatch(ChunkedArray__type(self)),
as_vector = function() ChunkedArray__as_vector(self),
Slice = function(offset, length = NULL){
if (is.null(length)) {
`arrow::ChunkedArray`$new(ChunkArray__Slice1(self, offset))
} else {
`arrow::ChunkedArray`$new(ChunkArray__Slice2(self, offset, length))
}
}
)
)

#' create an arrow::Array from an R vector
#'
#' @param \dots Vectors to coerce
#'
#' @export
chunked_array <- function(...){
`arrow::ChunkedArray`$new(ChunkedArray__from_list(rlang::list2(...)))
}
27 changes: 27 additions & 0 deletions r/R/Column.R
@@ -0,0 +1,27 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

#' @include R6.R

`arrow::Column` <- R6Class("arrow::Column", inherit = `arrow::Object`,
public = list(
length = function() Column__length(self),
null_count = function() Column__null_count(self),
type = function() `arrow::DataType`$dispatch(Column__type(self)),
data = function() `arrow::ChunkedArray`$new(Column__data(self))
)
)
50 changes: 50 additions & 0 deletions r/R/Field.R
@@ -0,0 +1,50 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

#' @include R6.R

`arrow::Field` <- R6Class("arrow::Field",
inherit = `arrow::Object`,
public = list(
ToString = function() {
Field__ToString(self)
},
name = function() {
Field__name(self)
},
nullable = function() {
Field__nullable(self)
},
Equals = function(other) {
inherits(other, "arrow::Field") && Field__Equals(self, other)
}
)
)

#' @export
`==.arrow::Field` <- function(lhs, rhs){
lhs$Equals(rhs)
}

field <- function(name, type) {
`arrow::Field`$new(Field__initialize(name, type))
}

.fields <- function(.list){
assert_that( !is.null(nms <- names(.list)) )
map2(nms, .list, field)
}
26 changes: 26 additions & 0 deletions r/R/List.R
@@ -0,0 +1,26 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

#' @include R6.R

`arrow::ListType` <- R6Class("arrow::ListType",
inherit = `arrow::NestedType`
)

#' @rdname DataType
#' @export
list_of <- function(type) `arrow::ListType`$new(list__(type))