ARROW-16690: [R][FlightRPC] Additional max_chunksize parameter in do_put method #13267

Merged
merged 8 commits into from
Aug 22, 2022

thatstatsguy
Contributor

Summary
Adds an optional max_chunksize parameter to the Flight do_put method in R so that callers can specify the chunk size.

Problem
Currently, all data is sent in a single message. Users will likely want to control batch sizes without having to build a custom do_put method.

Solution
Add an optional max_chunksize parameter that specifies the chunk size.
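
To make the intended behaviour concrete, here is a minimal, library-free sketch of what a chunk-size limit does to a payload. It is not the code from this PR: `iter_chunks` and its row-list input are hypothetical stand-ins, and the only idea taken from the description is that omitting the limit sends everything in one message while supplying it caps the size of each message.

```python
from typing import Iterator, List, Optional, Sequence, TypeVar

Row = TypeVar("Row")


def iter_chunks(rows: Sequence[Row],
                max_chunksize: Optional[int] = None) -> Iterator[List[Row]]:
    """Yield consecutive slices of `rows` with at most `max_chunksize` items each.

    Hypothetical helper: with max_chunksize=None the whole input is yielded
    as a single chunk, mirroring the "single message" behaviour described above.
    """
    if max_chunksize is None:
        yield list(rows)
        return
    if max_chunksize <= 0:
        raise ValueError("max_chunksize must be a positive integer")
    for start in range(0, len(rows), max_chunksize):
        yield list(rows[start:start + max_chunksize])


rows = list(range(10))
default_chunks = list(iter_chunks(rows))                 # no limit given
small_chunks = list(iter_chunks(rows, max_chunksize=4))  # limit of 4 rows

print([len(c) for c in default_chunks])  # [10]
print([len(c) for c in small_chunks])    # [4, 4, 2]
```

The last chunk may be shorter than the limit; the limit is an upper bound, not a fixed size.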

@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@thatstatsguy thatstatsguy changed the title Additional max_chunksize paramter to do_put method ARROW-${16690}: [${FlightRPC}] ${Additional max_chunksize paramter to do_put method} May 30, 2022
@thatstatsguy thatstatsguy changed the title ARROW-${16690}: [${FlightRPC}] ${Additional max_chunksize paramter to do_put method} ARROW-16690: [FlightRPC] Additional max_chunksize paramter to do_put method May 30, 2022
@github-actions

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@kou kou changed the title ARROW-16690: [FlightRPC] Additional max_chunksize paramter to do_put method ARROW-16690: [R][FlightRPC] Additional max_chunksize paramter to do_put method May 30, 2022
@thatstatsguy thatstatsguy changed the title ARROW-16690: [R][FlightRPC] Additional max_chunksize paramter to do_put method ARROW-16690: [R][FlightRPC] Additional max_chunksize parameter in do_put method Jun 1, 2022
@pitrou pitrou requested a review from paleolimbot August 8, 2022 15:54
Member

@paleolimbot paleolimbot left a comment

Thank you for this! Just a few changes before we can merge:

  • There's a trailing whitespace warning from the linter, which although minor, is something that needs fixing or else we'll get it for all our subsequent PRs
  • It needs another devtools::document() run to add the max_chunksize documentation item into the .Rd file
  • It needs a test to make sure it works! You can duplicate the existing block here: https://github.com/apache/arrow/blob/master/r/tests/testthat/test-python-flight.R#L31-L39 and pass a max_chunksize argument to check that it works when supplied.

Let me know if you need any help and I'm happy to help get it across the finish line!

Inline review comments on r/R/flight.R (two threads, not shown).

@thatstatsguy
Contributor Author

thatstatsguy commented Aug 10, 2022

@paleolimbot thanks for such a detailed review and prompts on how to resolve it - you were very kind :)

I've fixed the lint warning, added the unit tests, and rerun devtools::document(). Assuming the unit test passes, the only remaining issue is the second comment on the implementation.

@paleolimbot
Member

Awesome! The test looks great. I think ignoring max_chunksize with a warning is the way to go (you could error too...I think either is fine and I haven't actually used this function so I don't know which is more appropriate). Add a test for that and it's good to go!
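
The "ignore with a warning" option could look roughly like the sketch below. The thread does not say which input makes max_chunksize inapplicable, so the `already_batched` flag, the `plan_messages` name, and the warning text are all assumptions made for illustration. The point is only the pattern: warn, ignore the argument, and carry on, with a check that the warning really fires.

```python
import warnings
from typing import List, Optional, Sequence


def plan_messages(rows: Sequence[int], already_batched: bool,
                  max_chunksize: Optional[int] = None) -> List[List[int]]:
    """Return the list of messages that would be sent (hypothetical helper)."""
    if already_batched:
        # Assumed unsupported case: the input is sent as-is, so the limit
        # cannot be honoured. Warn rather than fail.
        if max_chunksize is not None:
            warnings.warn("max_chunksize is ignored for already-batched input",
                          UserWarning, stacklevel=2)
        return [list(rows)]
    if max_chunksize is None:
        return [list(rows)]
    if max_chunksize <= 0:
        raise ValueError("max_chunksize must be a positive integer")
    return [list(rows[i:i + max_chunksize])
            for i in range(0, len(rows), max_chunksize)]


# Test-style check: the argument is ignored, and exactly one warning is raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    messages = plan_messages(range(6), already_batched=True, max_chunksize=2)

print(len(messages), len(caught))  # 1 1
```

Switching to the "error" alternative mentioned above would mean replacing the `warnings.warn` call with a `raise ValueError(...)`; the rest stays the same.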

@thatstatsguy
Contributor Author

@paleolimbot thanks for your patience! All updated, assuming there are no build issues we should be good to go!

@paleolimbot paleolimbot merged commit 6d8624b into apache:master Aug 22, 2022
@thatstatsguy thatstatsguy deleted the thatstatsguy_add_max_chunksize branch August 22, 2022 19:28
@ursabot

ursabot commented Aug 23, 2022

Benchmark runs are scheduled for baseline = 5f84335 and contender = 6d8624b. 6d8624b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.27% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.14% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 6d8624b0 ec2-t3-xlarge-us-east-2
[Failed] 6d8624b0 test-mac-arm
[Failed] 6d8624b0 ursa-i9-9960x
[Finished] 6d8624b0 ursa-thinkcentre-m75q
[Finished] 5f84335f ec2-t3-xlarge-us-east-2
[Finished] 5f84335f test-mac-arm
[Failed] 5f84335f ursa-i9-9960x
[Finished] 5f84335f ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@kou
Member

kou commented Aug 23, 2022

The "R / AMD64 Windows R 4.2 RTools 42" CI job fails because of this commit:

https://github.com/apache/arrow/runs/7957718636?check_suite_focus=true#step:12:70

Error: Error: Not lint free
Warning: file=tests\testthat\test-python-flight.R,line=40,col=1,[trailing_whitespace_linter] Trailing whitespace is superfluous.
Warning: file=tests\testthat\test-python-flight.R,line=46,col=94,[trailing_whitespace_linter] Trailing whitespace is superfluous.

Could you remove trailing spaces from test-python-flight.R?
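
For locating such lines before pushing, a small standalone check along these lines may help. It is a sketch, not the linter used in CI: `find_trailing_whitespace` is a hypothetical helper, and it assumes the reported column is the 1-based position where the trailing run of spaces or tabs begins, which the log above suggests but does not confirm.

```python
import re
from typing import List, Tuple

# One or more spaces/tabs immediately before the end of a line.
_TRAILING = re.compile(r"[ \t]+$")


def find_trailing_whitespace(text: str) -> List[Tuple[int, int]]:
    """Return 1-based (line, column) pairs where trailing whitespace starts."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = _TRAILING.search(line)
        if match:
            hits.append((lineno, match.start() + 1))
    return hits


# Line 2 is whitespace only; line 3 ends with two stray spaces.
sample = "x <- 1\n   \ny <- 2  \nz <- 3\n"
print(find_trailing_whitespace(sample))  # [(2, 1), (3, 7)]
```

A whitespace-only line is reported at column 1, which matches the shape of the first warning in the log.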

kou pushed a commit that referenced this pull request Aug 24, 2022
After #13267 there is some trailing whitespace left over in one of the files.

Authored-by: Dewey Dunnington <dewey@voltrondata.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
zagto pushed a commit to zagto/arrow that referenced this pull request Oct 7, 2022
…put method (apache#13267)

**Summary**
An additional parameter in Flight do_put to specify chunk size in R.

**Problem**
Currently, all data is sent through in a single message. It's a likely scenario that users will want the ability to control the batch sizes without building a custom do_put method.

**Solution**
Additional (optional) parameter to specify chunk size.

Lead-authored-by: Christopher.Dunderdale <Christopher.Dunderdale@dyna-mo.com>
Co-authored-by: Christopher Dunderdale <47271795+thatstatsguy@users.noreply.github.com>
Signed-off-by: Dewey Dunnington <dewey@fishandwhistle.net>