Contributor

@nija-at nija-at commented Nov 8, 2024

What changes were proposed in this pull request?

Expose configure_logging as a public API that can be used
to configure the log level for the PySpark Connect component.

Why are the changes needed?

We currently offer the mechanism to configure the connect-specific logger
based on the environment variable SPARK_CONNECT_LOG_LEVEL.

The logger is configured once, at the time of "module load". In some cases,
Python frameworks (e.g. IPythonKernel) can modify the Python log level after the
fact, leading to unintended log output.

Once that happens, there is no good way to restore the logger to its previous
behavior so that it again honors the configured environment variable.
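
The problem can be illustrated with a minimal, standalone sketch that uses only the standard library. It is not the PySpark implementation: the logger name `sketch.connect`, the optional `level` parameter, and the "disable when nothing is set" fallback are assumptions made for illustration. Only the function name and the SPARK_CONNECT_LOG_LEVEL variable come from this PR.

```python
import logging
import os
from typing import Optional

# Hypothetical logger name for this sketch; not the real PySpark logger.
logger = logging.getLogger("sketch.connect")


def configure_logging(level: Optional[str] = None) -> None:
    """Apply an explicit level, else the environment variable, else disable.

    The signature and fallback behavior are assumptions for illustration.
    """
    chosen = level or os.environ.get("SPARK_CONNECT_LOG_LEVEL")
    if chosen:
        # setLevel accepts standard level names such as "WARNING".
        logger.setLevel(chosen.upper())
        logger.disabled = False
    else:
        logger.disabled = True


# Simulate the one-time configuration done at "module load".
os.environ["SPARK_CONNECT_LOG_LEVEL"] = "warning"
configure_logging()
level_at_load = logger.level

# Simulate a framework changing the level after the fact.
logger.setLevel(logging.DEBUG)
level_after_framework = logger.level

# Calling the function again re-applies the environment variable.
configure_logging()
level_restored = logger.level
```

Without a callable entry point, the last step is not possible: the module-load configuration has already run and nothing re-reads the environment variable.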

Does this PR introduce any user-facing change?

Yes.

Provide a new API configure_logging in the module
pyspark.sql.connect.logging.
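
A hedged sketch of how an application might call this at startup follows. The helper `restore_connect_logging` is hypothetical, and passing the level as a single string argument is an assumption; this description only states that the function is called with different log levels. The lookup is guarded so the snippet also runs where PySpark is not installed or where the function is exposed under a different name.

```python
import importlib


def restore_connect_logging(level: str = "INFO") -> bool:
    """Call configure_logging if it is available; report whether it was called.

    Hypothetical helper. The single string argument is an assumption.
    """
    try:
        module = importlib.import_module("pyspark.sql.connect.logging")
    except ImportError:
        # PySpark (or one of its dependencies) is not installed.
        return False
    configure = getattr(module, "configure_logging", None)
    if configure is None:
        # The installed version does not expose the function under this name.
        return False
    configure(level)
    return True


# Run once during app startup, after any framework has adjusted logging.
was_applied = restore_connect_logging("INFO")
```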

How was this patch tested?

Local testing by calling configure_logging with different log levels.

Further tested with an IPythonKernel instance, which changes the log level,
and confirmed that calling this API during app startup restores the
correct log level.

Was this patch authored or co-authored using generative AI tooling?

No.

@nija-at nija-at changed the title from "expose configure_logging" to "[SPARK-50427][CONNECT][PYTHON] Expose configure_logging as a public API" Nov 26, 2024
@nija-at nija-at marked this pull request as ready for review November 26, 2024 09:59
@HyukjinKwon
Member

Merged to master.
