Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-12901: [R] Follow on to more examples #10436

Closed
wants to merge 5 commits into from

Conversation

thisisnic
Copy link
Member

No description provided.

@thisisnic thisisnic marked this pull request as draft June 2, 2021 12:48
@github-actions
Copy link

github-actions bot commented Jun 2, 2021

@thisisnic thisisnic marked this pull request as ready for review June 2, 2021 13:45
@thisisnic
Copy link
Member Author

thisisnic commented Jun 2, 2021

WRT @jonkeane 's questions on the JIRA ticket:

https://github.com/apache/arrow/pull/10368/files#diff-9fe1a055f9a4df7088de4e0c759c20138422b8bdb9bba04ed74c7881aa787019R66-R67
Should this use @examplesIf ?

Yep - have updated now!

https://github.com/apache/arrow/pull/10368/files#diff-0229d62b9653ff7124e6e2f202e51d46d4d8fcaf14baf83ccf8c047c7c344579R78-R80
These are both genuine questions, I'm not sure we want to include either or both of them:

  • This might be too much indirection, but would it be better/easier to use `write_dataset() here?

Nice, that's made that code a LOT nicer to look at.

  • Do we want to talk about reading a partitioned dataset here?

Yeah, I will add something in about that next.

@thisisnic thisisnic marked this pull request as draft June 2, 2021 13:50
@thisisnic thisisnic marked this pull request as ready for review June 4, 2021 13:57
@thisisnic
Copy link
Member Author

@jonkeane - how does this look now?

@jonkeane
Copy link
Member

jonkeane commented Jun 4, 2021

@github-actions crossbow submit test-r-without-arrow

@github-actions
Copy link

github-actions bot commented Jun 4, 2021

Revision: d9bd85e

Submitted crossbow builds: ursacomputing/crossbow @ actions-465

Task Status
test-r-without-arrow Azure

Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. A few more suggestions / comments now that I'm seeing it written out. We should also probably wait for #10455 to merge before we send this one so we can rebase + run the tests here and make sure that's all sorted.

Additionally, I noticed that a lot of this content is in the long running + starting/stoping PR #9748 as well. The only thing that isn't here that is in #9748 is some reference to how df %>% group_by("foo", "bar") %>% write_dataset(...) is the same thing as write_dataset(df, partitioning = c("foo", "bar")) maybe we could add an example or two of that and then close there other PR?

#' # This line will work
#' open_dataset(tf2, format = "ipc")
#'
#' ## You can specify file partitioning to include it as a field in your dataset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the double-# intentional here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I was trying to use it to specify a new section - however, if this isn't widely done, maybe I should do it differently?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep it there — I don't think there's a standard pattern either in the package or elsewhere (that I know of, but let me know if you know of the state of art in other packages!)

r/R/dataset.R Outdated
#' md_schema <- schema(Month = int8(), Day = int8())
#'
#' # Now that partitioning has been specified, your dataset contains columns for Month and Day
#' open_dataset(tf3, partitioning = md_schema)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also include the simpler version of this too?

open_dataset(tf3, partitioning = c("Month", "Day"))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

@thisisnic
Copy link
Member Author

This is looking good. A few more suggestions / comments now that I'm seeing it written out. We should also probably wait for #10455 to merge before we send this one so we can rebase + run the tests here and make sure that's all sorted.

Additionally, I noticed that a lot of this content is in the long running + starting/stoping PR #9748 as well. The only thing that isn't here that is in #9748 is some reference to how df %>% group_by("foo", "bar") %>% write_dataset(...) is the same thing as write_dataset(df, partitioning = c("foo", "bar")) maybe we could add an example or two of that and then close there other PR?

That PR is for write_dataset, whereas this one is for open_dataset though. Maybe we could think more about what goes in which and get them more in sync?

@jonkeane
Copy link
Member

jonkeane commented Jun 8, 2021

That PR is for write_dataset, whereas this one is for open_dataset though. Maybe we could think more about what goes in which and get them more in sync?

Ah, right of course. Yeah, it might be good to harmonize them (or maybe even combine them into one page?) But regardless we can do that on a follow on (or on that other ticket), we don't need to do that in this one.

@jonkeane
Copy link
Member

jonkeane commented Jun 8, 2021

@github-actions crossbow submit test-r-without-arrow

@github-actions
Copy link

github-actions bot commented Jun 8, 2021

Revision: 355255b

Submitted crossbow builds: ursacomputing/crossbow @ actions-485

Task Status
test-r-without-arrow Azure

@jonkeane
Copy link
Member

jonkeane commented Jun 8, 2021

@github-actions crossbow submit test-r-minimal-build

@github-actions
Copy link

github-actions bot commented Jun 8, 2021

Revision: 355255b

Submitted crossbow builds: ursacomputing/crossbow @ actions-486

Task Status
test-r-minimal-build Azure

@jonkeane
Copy link
Member

jonkeane commented Jun 8, 2021

This looks good, I've run the two builds we had issues with in the past. This shouldn't be necessary, but just in case I'll also run crossbow -g r I don't think there are other builds that will be impacted by it, but just in case! If that goes fine I'll merge. Thanks!

@jonkeane
Copy link
Member

jonkeane commented Jun 8, 2021

@github-actions crossbow submit -g r

@github-actions
Copy link

github-actions bot commented Jun 8, 2021

Revision: 355255b

Submitted crossbow builds: ursacomputing/crossbow @ actions-487

Task Status
conda-linux-gcc-py36-cpu-r40 Azure
conda-linux-gcc-py37-cpu-r41 Azure
conda-osx-clang-py36-r40 Azure
conda-osx-clang-py37-r41 Azure
conda-win-vs2017-py36-r40 Azure
conda-win-vs2017-py37-r41 Azure
homebrew-r-autobrew Github Actions
test-r-devdocs Github Actions
test-r-install-local Github Actions
test-r-linux-as-cran Github Actions
test-r-linux-rchk Github Actions
test-r-linux-valgrind Azure
test-r-minimal-build Azure
test-r-rhub-ubuntu-gcc-release-latest Azure
test-r-rocker-r-base-latest Azure
test-r-rstudio-r-base-3.6-bionic Azure
test-r-rstudio-r-base-3.6-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-3.6-centos8 Azure
test-r-rstudio-r-base-3.6-opensuse15 Azure
test-r-rstudio-r-base-3.6-opensuse42 Azure
test-r-rtools-35 Github Actions
test-r-version-compatibility Github Actions
test-r-versions Github Actions
test-r-without-arrow Azure
test-ubuntu-18.04-r-sanitizer Azure

@jonkeane jonkeane closed this in 15c4f1f Jun 8, 2021
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
Closes apache#10436 from thisisnic/ARROW-12901_examples

Authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants