-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-35260: [C++][Python][R] Allow users to adjust S3 log level by environment variable #38267
Conversation
All implementing done in C++, documentation added for C++, Python, and R
These are already exported and IMO part of the public API
cpp/src/arrow/filesystem/s3fs.cc
Outdated
@@ -3016,5 +3023,35 @@ Result<std::string> ResolveS3BucketRegion(const std::string& bucket) { | |||
return resolver->ResolveRegion(bucket); | |||
} | |||
|
|||
S3LogLevel GetS3LogLevelFromEnvOrDefault() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to nit about this function name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like arrow::fs::S3GlobalOptions::Defaults()
.
From an R perspective, looks great! What do you think to the idea of adding a new H2-level heading to the bottom of the "debugging" doc in the R developer docs, briefly mentioning this? Doesn't need to be in this PR! |
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Great idea @thisisnic, filed as #38270. Thanks. |
cpp/src/arrow/filesystem/s3fs.cc
Outdated
@@ -3016,5 +3023,35 @@ Result<std::string> ResolveS3BucketRegion(const std::string& bucket) { | |||
return resolver->ResolveRegion(bucket); | |||
} | |||
|
|||
S3LogLevel GetS3LogLevelFromEnvOrDefault() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like arrow::fs::S3GlobalOptions::Defaults()
.
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Thanks again @kou, sorry this PR turned out to be a bit messy. I've addressed all of your comments and accepted all suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Thanks for the help getting this tidied up @kou! |
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 1e9f224. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…y environment variable (apache#38267) ### Rationale for this change It's useful when troubleshooting issues with Arrow's S3 filesystem implementation to raise the log level. Currently, this can only be done in C++ and Python, but not from R. In addition, the log level can only be set during S3 initialization and not directly so the user has to introduce explicit S3 initialization code to turn on logging and must make sure this code is called before S3 initialization. While discussing exposing control of log level to R, we realized that allowing the log level to be controlled by environment variable may be more intuitive and useful and would just be a good addition for C++, Python, and R. ### What changes are included in this PR? - A new environment variable `AWS_S3_LOG_LEVEL` with documentation for controlling S3 log level - Updated documentation for C++, Python, and R - A new `InitializeS3()` as a quality-of-life thing for C++ users. Feel free to ask me to remove this. No changes are needed directly for Python and R because these implementation uses the internal implicit initializer `EnsureS3Initialized` rather than the explicit form, `InitializeS3`. And it's the behavior of the `EnsureS3Initialized` routine that's changed here. ### Are these changes tested? Yes. I added a unit test for the new `GetS3LogLevelFromEnvOrDefault` and tested from Python and R manually. I didn't add a test to make sure the underlying `AwsInstance` gets set up correctly because it looked like it would require a refactor and didn't seem worth it. ### Are there any user-facing changes? Yes. A new way to turn on logging for S3 and matching docs in C++, Python, and R. * Closes: apache#35260 Lead-authored-by: Bryce Mecum <petridish@gmail.com> Co-authored-by: Sutou Kouhei <kou@cozmixng.org> Co-authored-by: Nic Crane <thisisnic@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…y environment variable (apache#38267) ### Rationale for this change It's useful when troubleshooting issues with Arrow's S3 filesystem implementation to raise the log level. Currently, this can only be done in C++ and Python, but not from R. In addition, the log level can only be set during S3 initialization and not directly so the user has to introduce explicit S3 initialization code to turn on logging and must make sure this code is called before S3 initialization. While discussing exposing control of log level to R, we realized that allowing the log level to be controlled by environment variable may be more intuitive and useful and would just be a good addition for C++, Python, and R. ### What changes are included in this PR? - A new environment variable `AWS_S3_LOG_LEVEL` with documentation for controlling S3 log level - Updated documentation for C++, Python, and R - A new `InitializeS3()` as a quality-of-life thing for C++ users. Feel free to ask me to remove this. No changes are needed directly for Python and R because these implementation uses the internal implicit initializer `EnsureS3Initialized` rather than the explicit form, `InitializeS3`. And it's the behavior of the `EnsureS3Initialized` routine that's changed here. ### Are these changes tested? Yes. I added a unit test for the new `GetS3LogLevelFromEnvOrDefault` and tested from Python and R manually. I didn't add a test to make sure the underlying `AwsInstance` gets set up correctly because it looked like it would require a refactor and didn't seem worth it. ### Are there any user-facing changes? Yes. A new way to turn on logging for S3 and matching docs in C++, Python, and R. * Closes: apache#35260 Lead-authored-by: Bryce Mecum <petridish@gmail.com> Co-authored-by: Sutou Kouhei <kou@cozmixng.org> Co-authored-by: Nic Crane <thisisnic@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…y environment variable (apache#38267) ### Rationale for this change It's useful when troubleshooting issues with Arrow's S3 filesystem implementation to raise the log level. Currently, this can only be done in C++ and Python, but not from R. In addition, the log level can only be set during S3 initialization and not directly so the user has to introduce explicit S3 initialization code to turn on logging and must make sure this code is called before S3 initialization. While discussing exposing control of log level to R, we realized that allowing the log level to be controlled by environment variable may be more intuitive and useful and would just be a good addition for C++, Python, and R. ### What changes are included in this PR? - A new environment variable `AWS_S3_LOG_LEVEL` with documentation for controlling S3 log level - Updated documentation for C++, Python, and R - A new `InitializeS3()` as a quality-of-life thing for C++ users. Feel free to ask me to remove this. No changes are needed directly for Python and R because these implementation uses the internal implicit initializer `EnsureS3Initialized` rather than the explicit form, `InitializeS3`. And it's the behavior of the `EnsureS3Initialized` routine that's changed here. ### Are these changes tested? Yes. I added a unit test for the new `GetS3LogLevelFromEnvOrDefault` and tested from Python and R manually. I didn't add a test to make sure the underlying `AwsInstance` gets set up correctly because it looked like it would require a refactor and didn't seem worth it. ### Are there any user-facing changes? Yes. A new way to turn on logging for S3 and matching docs in C++, Python, and R. * Closes: apache#35260 Lead-authored-by: Bryce Mecum <petridish@gmail.com> Co-authored-by: Sutou Kouhei <kou@cozmixng.org> Co-authored-by: Nic Crane <thisisnic@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Rationale for this change
It's useful when troubleshooting issues with Arrow's S3 filesystem implementation to raise the log level. Currently, this can only be done in C++ and Python, but not from R. In addition, the log level can only be set during S3 initialization and not directly so the user has to introduce explicit S3 initialization code to turn on logging and must make sure this code is called before S3 initialization.
While discussing exposing control of log level to R, we realized that allowing the log level to be controlled by environment variable may be more intuitive and useful and would just be a good addition for C++, Python, and R.
What changes are included in this PR?
AWS_S3_LOG_LEVEL
with documentation for controlling S3 log levelInitializeS3()
as a quality-of-life thing for C++ users. Feel free to ask me to remove this.No changes are needed directly for Python and R because these implementation uses the internal implicit initializer
EnsureS3Initialized
rather than the explicit form,InitializeS3
. And it's the behavior of theEnsureS3Initialized
routine that's changed here.Are these changes tested?
Yes. I added a unit test for the new
GetS3LogLevelFromEnvOrDefault
and tested from Python and R manually. I didn't add a test to make sure the underlyingAwsInstance
gets set up correctly because it looked like it would require a refactor and didn't seem worth it.Are there any user-facing changes?
Yes. A new way to turn on logging for S3 and matching docs in C++, Python, and R.