GH-35260: [C++][Python][R] Allow users to adjust S3 log level by environment variable #38267

Merged: 19 commits, Oct 17, 2023

@amoeba amoeba commented Oct 14, 2023

Rationale for this change

When troubleshooting issues with Arrow's S3 filesystem implementation, it's useful to raise the log level. Currently, this can be done from C++ and Python, but not from R. In addition, the log level can only be set during S3 initialization and not directly, so the user has to add explicit S3 initialization code to turn on logging and must make sure this code runs before S3 is otherwise initialized.

While discussing how to expose control of the log level to R, we realized that controlling it through an environment variable may be more intuitive and useful, and would be a good addition for C++, Python, and R alike.

What changes are included in this PR?

  • A new environment variable AWS_S3_LOG_LEVEL with documentation for controlling S3 log level
  • Updated documentation for C++, Python, and R
  • A new InitializeS3() as a quality-of-life thing for C++ users. Feel free to ask me to remove this.
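For readers who want a feel for the mechanism, here is a minimal standalone sketch of mapping an environment variable to a log level. It is not the code in this PR: `LogLevel` is a local stand-in that mirrors the names of `arrow::fs::S3LogLevel`, the helper names are hypothetical, the variable name is taken from the list above (check `docs/source/cpp/env_vars.rst` for the final spelling), and the trimming, case-insensitivity, and `Fatal` fallback are assumptions of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Local stand-in for arrow::fs::S3LogLevel so the sketch compiles without Arrow.
enum class LogLevel { Off, Fatal, Error, Warn, Info, Debug, Trace };

// Trim surrounding whitespace and lowercase the value (an assumption of this
// sketch, so that " Debug " and "DEBUG" are both accepted).
inline std::string Normalize(std::string s) {
  auto not_space = [](unsigned char c) { return std::isspace(c) == 0; };
  s.erase(s.begin(), std::find_if(s.begin(), s.end(), not_space));
  s.erase(std::find_if(s.rbegin(), s.rend(), not_space).base(), s.end());
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Map a raw string to a level; unrecognized values yield `fallback`.
inline LogLevel ParseLogLevel(const std::string& raw, LogLevel fallback) {
  const std::string v = Normalize(raw);
  if (v == "off") return LogLevel::Off;
  if (v == "fatal") return LogLevel::Fatal;
  if (v == "error") return LogLevel::Error;
  if (v == "warn") return LogLevel::Warn;
  if (v == "info") return LogLevel::Info;
  if (v == "debug") return LogLevel::Debug;
  if (v == "trace") return LogLevel::Trace;
  return fallback;
}

// Read the variable if it is set, otherwise return `fallback`.
// Both default arguments are assumptions made for this sketch.
inline LogLevel LogLevelFromEnvOrDefault(const char* name = "AWS_S3_LOG_LEVEL",
                                         LogLevel fallback = LogLevel::Fatal) {
  assert(name != nullptr);
  const char* raw = std::getenv(name);
  return raw == nullptr ? fallback : ParseLogLevel(raw, fallback);
}
```

Falling back on unrecognized values, rather than failing, means a typo in the variable cannot break S3 initialization. That is a design choice of the sketch and may differ from the merged behavior.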

No changes are needed directly for Python and R because these implementations use the internal implicit initializer EnsureS3Initialized rather than the explicit form, InitializeS3. It's the behavior of the EnsureS3Initialized routine that's changed here.
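The ordering issue between the explicit and implicit initializers can be illustrated with a toy model. This is a sketch only: `Subsystem`, `Initialize`, and `EnsureInitialized` are hypothetical names, it is not Arrow's `AwsInstance`, and thread safety is left out.

```cpp
#include <cassert>

enum class LogLevel { Off, Fatal, Error, Warn, Info, Debug, Trace };

// Toy model of a process-wide subsystem that can be initialized only once.
// It is not Arrow's AwsInstance; real code would also need thread safety.
class Subsystem {
 public:
  // Explicit form: returns true only if this call performed the initialization.
  bool Initialize(LogLevel level) {
    if (initialized_) return false;  // too late, the requested level is ignored
    level_ = level;
    initialized_ = true;
    return true;
  }

  // Implicit form: a no-op if something already initialized the subsystem.
  void EnsureInitialized(LogLevel default_level) { Initialize(default_level); }

  LogLevel level() const {
    assert(initialized_);
    return level_;
  }

 private:
  bool initialized_ = false;
  LogLevel level_ = LogLevel::Fatal;
};
```

In this model, a caller who reaches `Initialize` after the implicit path has already run gets `false` back and keeps the default level. If the default handed to the implicit path is itself read from the environment, callers that never initialize explicitly can still raise the level.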

Are these changes tested?

Yes. I added a unit test for the new GetS3LogLevelFromEnvOrDefault and tested from Python and R manually. I didn't add a test to make sure the underlying AwsInstance gets set up correctly because it looked like it would require a refactor and didn't seem worth it.
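As a rough idea of what testing an environment-driven default involves, the sketch below sets a variable for the duration of a scope and restores the previous state afterwards. It is not the test added in this PR: `ScopedEnvVar`, `LevelFromEnv`, and the variable name `SKETCH_S3_LOG_LEVEL` are hypothetical, the lookup is deliberately simplified, and `setenv`/`unsetenv` assume a POSIX platform.

```cpp
#include <cassert>
#include <stdlib.h>  // setenv/unsetenv are POSIX, not standard C++
#include <string>

enum class LogLevel { Off, Fatal, Error, Warn, Info, Debug, Trace };

// Simplified lookup for the sketch: exact lowercase names, two levels only.
inline LogLevel LevelFromEnv(const char* name, LogLevel fallback) {
  const char* raw = getenv(name);
  if (raw == nullptr) return fallback;
  const std::string v(raw);
  if (v == "debug") return LogLevel::Debug;
  if (v == "trace") return LogLevel::Trace;
  return fallback;
}

// Sets a variable for the lifetime of the guard, then restores the old state.
class ScopedEnvVar {
 public:
  ScopedEnvVar(const char* name, const char* value) : name_(name) {
    if (const char* old = getenv(name)) {
      had_old_ = true;
      old_ = old;
    }
    setenv(name, value, /*overwrite=*/1);
  }
  ~ScopedEnvVar() {
    if (had_old_) {
      setenv(name_.c_str(), old_.c_str(), 1);
    } else {
      unsetenv(name_.c_str());
    }
  }
  ScopedEnvVar(const ScopedEnvVar&) = delete;
  ScopedEnvVar& operator=(const ScopedEnvVar&) = delete;

 private:
  std::string name_;
  std::string old_;
  bool had_old_ = false;
};

// Exercises a recognized value and an unrecognized one.
inline void CheckLevelFromEnv() {
  const char* kVar = "SKETCH_S3_LOG_LEVEL";  // hypothetical name, tests only
  {
    ScopedEnvVar guard(kVar, "debug");
    assert(LevelFromEnv(kVar, LogLevel::Fatal) == LogLevel::Debug);
  }
  {
    ScopedEnvVar guard(kVar, "bogus");
    assert(LevelFromEnv(kVar, LogLevel::Fatal) == LogLevel::Fatal);
  }
}
```

Restoring the variable in the destructor keeps one test case from leaking state into the next, which matters because the environment is process-wide.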

Are there any user-facing changes?

Yes. A new way to turn on logging for S3 and matching docs in C++, Python, and R.

All implementing done in C++, documentation added for C++, Python, and R
These are already exported and IMO part of the public API
@@ -3016,5 +3023,35 @@ Result<std::string> ResolveS3BucketRegion(const std::string& bucket) {
  return resolver->ResolveRegion(bucket);
}

S3LogLevel GetS3LogLevelFromEnvOrDefault() {
Member Author

Feel free to nit about this function name.

Member

I like arrow::fs::S3GlobalOptions::Defaults().

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Oct 14, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Oct 14, 2023
@thisisnic
Member

From an R perspective, looks great! What do you think of the idea of adding a new H2-level heading at the bottom of the "debugging" doc in the R developer docs, briefly mentioning this? Doesn't need to be in this PR!

Co-authored-by: Nic Crane <thisisnic@gmail.com>
@amoeba
Member Author

amoeba commented Oct 14, 2023

Great idea @thisisnic, filed as #38270. Thanks.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Oct 14, 2023
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 14, 2023
amoeba and others added 2 commits October 14, 2023 13:48
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
@github-actions github-actions bot added the awaiting change review Awaiting change review label Oct 16, 2023
amoeba and others added 4 commits October 16, 2023 11:25
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
@amoeba
Member Author

amoeba commented Oct 16, 2023

Thanks again @kou, sorry this PR turned out to be a bit messy. I've addressed all of your comments and accepted all suggestions.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Oct 17, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes labels Oct 17, 2023
Member

@kou kou left a comment

+1

@kou kou merged commit 1e9f224 into apache:main Oct 17, 2023
35 checks passed
@kou kou removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Oct 17, 2023
@github-actions github-actions bot added the awaiting merge Awaiting merge label Oct 17, 2023
@amoeba
Member Author

amoeba commented Oct 17, 2023

Thanks for the help getting this tidied up @kou!

@conbench-apache-arrow
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 1e9f224.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.

JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
…y environment variable (apache#38267)

* Closes: apache#35260

Lead-authored-by: Bryce Mecum <petridish@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
Development

Successfully merging this pull request may close these issues.

[R] Allow users to adjust AWS S3 log level from R
3 participants