ARROW-15979: [C++][Doc] Expose more functions of parquet::WriterProperties in doc #12673
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for it on JIRA? https://issues.apache.org/jira/browse/ARROW Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project. Could you then also rename the pull request title to match the project's title format?
cpp/src/parquet/properties.h
Outdated
    Builder* write_batch_size(int64_t write_batch_size) {
      write_batch_size_ = write_batch_size;
      return this;
    }

    /**
     * Specify the max row group length.
     * Default 64M.
Perhaps this default should be changed to 1M, as discussed in the user group a month ago with @westonpace.
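To make the setter pattern in the diff concrete, here is a minimal sketch of a chained builder with a 64M default for the max row group length that is overridden to 1M. `SketchWriterProperties`, `MakeExampleProperties`, the `max_row_group_length` setter body, and the 1024 default for the write batch size are assumptions for illustration only; this is a simplified stand-in, not the actual `parquet::WriterProperties` implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, simplified stand-in for a writer-properties class.
// It only models the two settings discussed in this review.
class SketchWriterProperties {
 public:
  class Builder {
   public:
    // Each setter returns `this` so calls can be chained with `->`,
    // mirroring the style of the setter shown in the diff.
    Builder* write_batch_size(int64_t write_batch_size) {
      write_batch_size_ = write_batch_size;
      return this;
    }

    Builder* max_row_group_length(int64_t max_row_group_length) {
      max_row_group_length_ = max_row_group_length;
      return this;
    }

    // Freeze the current settings into an immutable properties object.
    std::shared_ptr<SketchWriterProperties> build() const {
      return std::shared_ptr<SketchWriterProperties>(
          new SketchWriterProperties(write_batch_size_, max_row_group_length_));
    }

   private:
    int64_t write_batch_size_ = 1024;                  // assumed default
    int64_t max_row_group_length_ = 64 * 1024 * 1024;  // "Default 64M" per the doc comment
  };

  int64_t write_batch_size() const { return write_batch_size_; }
  int64_t max_row_group_length() const { return max_row_group_length_; }

 private:
  SketchWriterProperties(int64_t write_batch_size, int64_t max_row_group_length)
      : write_batch_size_(write_batch_size),
        max_row_group_length_(max_row_group_length) {}

  int64_t write_batch_size_;
  int64_t max_row_group_length_;
};

// Hypothetical helper: override the 64M default with the 1M value
// suggested in the review comment.
std::shared_ptr<SketchWriterProperties> MakeExampleProperties() {
  SketchWriterProperties::Builder builder;
  builder.write_batch_size(2048)->max_row_group_length(1024 * 1024);
  std::shared_ptr<SketchWriterProperties> props = builder.build();
  assert(props->max_row_group_length() == 1024 * 1024);
  return props;
}
```

With this shape, a caller who never touches `max_row_group_length` gets whatever default the builder carries, which is why the documented default value matters to users.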
Thanks! Just two small tweaks.
Just fixed them. A side question: I did not expose created_by in the docs, since I think it should not even be a public function exposed to users. It simply records "parquet-cpp-arrow version @ARROW_VERSION@" when writing the metadata. Is it meant to let someone change it to, e.g., "created by Shawn"?
Ah, I missed that. Yes, it just sets some metadata about what implementation produced the Parquet file. We can document it, but probably people won't want to/shouldn't change it.
I guess this can be merged? The default row group size can be fixed in another PR, and created_by does not need to be exposed in the docs.
Benchmark runs are scheduled for baseline = 864b54d and contender = 7711182. 7711182 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.