[R] 93 - dplyr chapter feedback (#94)
* Fix bullet points

* Ensure it's obvious arrow is doing the work

* chunks
thisisnic committed Oct 26, 2021
1 parent 76ff1a6 commit d7bc6b230631488da7ee100402d7c8270463d2d5
Showing 1 changed file with 7 additions and 5 deletions.
@@ -55,14 +55,15 @@ test_that("dplyr_raw and dplyr_arrow chunk provide the same results", {

You'll notice we've used `collect()` in the Arrow pipeline above. That's because
one of the ways in which `arrow` is efficient is that it works out the instructions
-for the calculations it needs to perform (_expressions_) and only runs them once
-you actually pull the data into your R session. This means instead of doing
-lots of separate operations, it does them all at once in a more optimised way,
-_lazy evaluation_.
+for the calculations it needs to perform (_expressions_) and only runs them
+using arrow once you actually pull the data into your R session. This means
+instead of doing lots of separate operations, it does them all at once in a
+more optimised way, _lazy evaluation_.

It also means that you are able to manipulate data that is larger than you can
fit into memory on the machine you're running your code on, if you only pull
-data into R when you have selected the desired subset.
+data into R when you have selected the desired subset, or when using functions
+which can operate on chunks of data.
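
As a minimal sketch of the lazy evaluation described in this hunk (not part of the commit itself), the example below builds a query on an Arrow Table and only runs it when `collect()` is called. The `mtcars` data and the names `cars_tbl`, `query` and `result` are illustrative assumptions, and `arrow_table()` assumes a reasonably recent version of the arrow package.

```r
library(arrow)
library(dplyr)

# Illustrative data only: convert a built-in data frame to an Arrow Table
cars_tbl <- arrow_table(mtcars)

# Nothing is computed here; the verbs only record the expressions to run
query <- cars_tbl %>%
  filter(cyl == 4) %>%
  select(mpg, cyl)

# The pipeline is still a query object at this point, not a data frame
is_lazy <- inherits(query, "arrow_dplyr_query")

# collect() runs the recorded expressions in Arrow and pulls the result into R
result <- collect(query)
```

Only the filtered and selected subset is materialised in the R session, which is what makes the larger-than-memory workflow described above possible.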

You can also have data which is split across multiple files. For example, you
might have files which are stored in multiple Parquet or Feather files,
@@ -173,6 +174,7 @@ test_that("dplyr_func_warning", {
## Use arrow functions in dplyr verbs in arrow

You want to use a function which is implemented in Arrow's C++ library but either:
+
* it doesn't have a mapping to a base R or tidyverse equivalent, or
* it has a mapping but nevertheless you want to call the C++ function directly
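
A hedged sketch of the situation this section describes (not taken from the commit): inside dplyr verbs, the arrow package lets you call a C++ compute function by prefixing its name with `arrow_`. The column `name`, the sample values, and the choice of the `ascii_lpad` function with its `width` and `padding` options are illustrative assumptions; `list_compute_functions()` shows which functions your installed version actually provides.

```r
library(arrow)
library(dplyr)

# Illustrative data only
people <- arrow_table(data.frame(name = c("abc", "defgh")))

# Call the C++ function `ascii_lpad` directly via the `arrow_` prefix;
# options are passed as a named list
padded <- people %>%
  mutate(padded_name = arrow_ascii_lpad(
    name,
    options = list(width = 10, padding = "*")
  )) %>%
  collect()
```

The same call works on functions that already have an R mapping, which is useful when you want the C++ behaviour and options rather than the R-style interface.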
