GH-34775: [R] arrow_table: as.data.frame() sometimes returns a tbl and sometimes a data.frame #35173
This looks good! It seems like a cleaner solution than what we currently have. I like the idea of dropping metadata on the way in where possible because I seem to remember that we can skip some calls from C++ into R if there is no metadata to restore which speeds things up a bit.
Benchmark runs are scheduled for baseline = 2ee0345 and contender = 205ceb9. 205ceb9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
['Python', 'R'] benchmarks have high level of regressions.
…tbl and sometimes a data.frame (apache#35173) Features of this PR: * Ensures that calling `as.data.frame()` on Arrow objects returns base R `data.frame` objects. * Drops the `class` attribute metadata of input objects of `data.frame` class (i.e. that don't have inherit from any additional classes other than `data.frame`). This results in us sacrificing roundtrip class fidelity for `data.frame` objects (i.e. if we input a base R data.frame, convert it to an Arrow Table, and then convert it back to R, we get a tibble). However, we now have consistency in the type of returned objects, retain roundtrip fidelity for other (non-class) metadata, and guarantee that `as.data.frame()` returns a base R data.frame. Users who wish to input and return a `data.frame` object can call `as.data.frame()` on the returned object. * Implements `dplyr::collect()` for StructArrays so that these objects can still be returned as tibbles if needed. * Renames `expect_data_frame()` to `expect_equal_data_frame()` for clarity, and updates it to convert both the object and expected object to data.frames. * Closes: apache#34775 Authored-by: Nic Crane <thisisnic@gmail.com> Signed-off-by: Nic Crane <thisisnic@gmail.com>
Features of this PR:
* Ensures that calling `as.data.frame()` on Arrow objects returns base R `data.frame` objects.
* Drops the `class` attribute metadata of input objects of `data.frame` class (i.e. objects that don't inherit from any classes other than `data.frame`). This sacrifices roundtrip class fidelity for `data.frame` objects (i.e. if we input a base R data.frame, convert it to an Arrow Table, and then convert it back to R, we get a tibble). However, we now have consistency in the type of returned objects, retain roundtrip fidelity for other (non-class) metadata, and guarantee that `as.data.frame()` returns a base R data.frame. Users who wish to input and return a `data.frame` object can call `as.data.frame()` on the returned object.
* Implements `dplyr::collect()` for StructArrays so that these objects can still be returned as tibbles if needed.
* Renames `expect_data_frame()` to `expect_equal_data_frame()` for clarity, and updates it to convert both the object and the expected object to data.frames.
* Closes: [R] arrow_table: as.data.frame() sometimes returns a tbl and sometimes a data.frame #34775
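
The behaviour described in the list above can be illustrated with a minimal sketch. It assumes an arrow release that includes this change, with dplyr and tibble installed; the object names (`df`, `tab`, `out_df`, `out_tbl`) are illustrative and do not come from the PR.

```r
library(arrow)

# A plain base R data.frame (no classes other than "data.frame").
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tab <- arrow_table(df)

# With this change, as.data.frame() on an Arrow Table is expected to
# return a base R data.frame, whatever the class of the input was.
out_df <- as.data.frame(tab)
class(out_df)

# The same guarantee should hold when the input was a tibble.
out_from_tbl <- as.data.frame(arrow_table(tibble::as_tibble(df)))
class(out_from_tbl)

# dplyr::collect() remains the way to get a tibble back.
out_tbl <- dplyr::collect(tab)
class(out_tbl)

# Assumption: Array$create() on a data.frame yields a StructArray, which
# the PR says can now be collected into a tibble as well.
struct_arr <- Array$create(df)
out_struct <- dplyr::collect(struct_arr)
class(out_struct)
```

Code that needs a specific return type can pick the conversion function explicitly: `as.data.frame()` for a base R data.frame, `dplyr::collect()` for a tibble.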