New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-37813: [R] add quoted_na argument to open_delim_dataset() #37828
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly a few opportunities for clarification but in general looks good to me!
@@ -253,7 +253,7 @@ test_that("readr parse options", { | |||
tsv_dir, | |||
partitioning = "part", | |||
format = "text", | |||
quo = "\"", | |||
del = "," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep - because passing in a parameter as quo
(not a valid parameter name) no longer constitutes a unique partial match to either the readr or arrow options, it is caught earlier on in input validation and so raises an "Unrecognized option" error instead of an "Ambiguous option" error. This input validation is something it would be nice to refactor in another PR, but I figured here we just want to replace quo
with something else that takes us down the "Ambiguous option" path to stick with the spirit of the test.
r/tests/testthat/test-dataset-csv.R
Outdated
df <- data.frame(text = c("one", "two", "", "four"), num = 1:4) | ||
write.csv(df, dst_file, row.names = FALSE, quote = FALSE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same as write("one\ntwo\nthree\n\nfour")
(It might be a tiny bit clearer what's actually getting tested to just write the contents of the file or put it in a comment...I forget what all the arguments of write.csv()
actually do!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, updated.
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 72c6497. There were 5 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
…pache#37828) ### Rationale for this change The `open_delim_dataset()` family of functions were implemented to have the same arguments as the `read_delim_arrow()` functions where possible, but `quoted_na` was missed. ### What changes are included in this PR? Adding `quoted_na` to those functions. ### Are these changes tested? Yes ### Are there any user-facing changes? Yes **This PR includes breaking changes to public APIs.** Empty strings in input datasets now default to being read in by `open_delim_dataset()` and its derivates as NAs and not empty string * Closes: apache#37813 Authored-by: Nic Crane <thisisnic@gmail.com> Signed-off-by: Nic Crane <thisisnic@gmail.com>
…pache#37828) ### Rationale for this change The `open_delim_dataset()` family of functions were implemented to have the same arguments as the `read_delim_arrow()` functions where possible, but `quoted_na` was missed. ### What changes are included in this PR? Adding `quoted_na` to those functions. ### Are these changes tested? Yes ### Are there any user-facing changes? Yes **This PR includes breaking changes to public APIs.** Empty strings in input datasets now default to being read in by `open_delim_dataset()` and its derivates as NAs and not empty string * Closes: apache#37813 Authored-by: Nic Crane <thisisnic@gmail.com> Signed-off-by: Nic Crane <thisisnic@gmail.com>
…pache#37828) ### Rationale for this change The `open_delim_dataset()` family of functions were implemented to have the same arguments as the `read_delim_arrow()` functions where possible, but `quoted_na` was missed. ### What changes are included in this PR? Adding `quoted_na` to those functions. ### Are these changes tested? Yes ### Are there any user-facing changes? Yes **This PR includes breaking changes to public APIs.** Empty strings in input datasets now default to being read in by `open_delim_dataset()` and its derivates as NAs and not empty string * Closes: apache#37813 Authored-by: Nic Crane <thisisnic@gmail.com> Signed-off-by: Nic Crane <thisisnic@gmail.com>
…pache#37828) ### Rationale for this change The `open_delim_dataset()` family of functions were implemented to have the same arguments as the `read_delim_arrow()` functions where possible, but `quoted_na` was missed. ### What changes are included in this PR? Adding `quoted_na` to those functions. ### Are these changes tested? Yes ### Are there any user-facing changes? Yes **This PR includes breaking changes to public APIs.** Empty strings in input datasets now default to being read in by `open_delim_dataset()` and its derivates as NAs and not empty string * Closes: apache#37813 Authored-by: Nic Crane <thisisnic@gmail.com> Signed-off-by: Nic Crane <thisisnic@gmail.com>
Rationale for this change
The
open_delim_dataset()
family of functions were implemented to have the same arguments as theread_delim_arrow()
functions where possible, butquoted_na
was missed.What changes are included in this PR?
Adding
quoted_na
to those functions.Are these changes tested?
Yes
Are there any user-facing changes?
Yes
This PR includes breaking changes to public APIs.
Empty strings in input datasets now default to being read in by
open_delim_dataset()
and its derivates as NAs and not empty string