Allow users to set the logger level in XGBoost-PySpark #10065

Closed
danmar3 opened this issue Feb 22, 2024 · 4 comments · Fixed by #10077

Comments

@danmar3

danmar3 commented Feb 22, 2024

Hi, using XGBoost-PySpark in notebooks currently generates a lot of log messages, and I have not been able to turn them off. For example, when calling .transform, the notebook gets spammed with messages like:

2024-02-22 08:44:49,270 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs
2024-02-22 08:44:49,275 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs
2024-02-22 08:44:49,310 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs
2024-02-22 08:44:49,403 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs
2024-02-22 08:44:49,411 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs
2024-02-22 08:44:49,535 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs
2024-02-22 08:44:49,603 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs
2024-02-22 08:44:49,616 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs

Currently, every time get_logger is called (here), the logger level is set to INFO here.
This prevents the user from setting the logging level, because it is always reset to 'INFO'.

I think this can be solved by removing the setLevel line here.
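
To illustrate the behaviour described above, here is a minimal sketch using only the standard `logging` module. It is not XGBoost's actual code: the helper names `get_logger_resetting` and `get_logger_respecting` and the logger names are hypothetical. The second helper shows one possible alternative to removing the setLevel line outright, namely applying the default only when no level has been set yet.

```python
import logging


def get_logger_resetting(name, level="INFO"):
    """Hypothetical helper that mimics the reported behaviour:
    the level is overwritten on every call."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


def get_logger_respecting(name, level="INFO"):
    """Hypothetical helper that applies the default level only if
    no level has been configured on the logger yet."""
    logger = logging.getLogger(name)
    if logger.level == logging.NOTSET:
        logger.setLevel(level)
    return logger


# The user tries to silence INFO messages before the library runs.
logging.getLogger("demo-resetting").setLevel(logging.WARNING)
logging.getLogger("demo-respecting").setLevel(logging.WARNING)

# The first helper discards the user's choice, the second keeps it.
resetting_level = get_logger_resetting("demo-resetting").level
respecting_level = get_logger_respecting("demo-respecting").level

print(logging.getLevelName(resetting_level))
print(logging.getLevelName(respecting_level))
```

With the first helper, any level the user sets is lost the next time the helper is called, which matches what I see in the notebook.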

Thank you

@trivialfis
Member

Hmm, we need to find a way to unify all the logging levels.

@trivialfis
Member

There's XGB logging, Python logging, and Spark logging, among others.

@wbo4958
Contributor

wbo4958 commented Feb 26, 2024

Let me open a PR to fix this issue.

@wbo4958
Contributor

wbo4958 commented Feb 28, 2024

Hi @danmar3, previously it printed "Do the inference on the CPUs" for every partition, which is really annoying. So I made #10077 to rework the logging so that the message is only shown for partition 0, which means only one log line is printed for the inference. I think this is OK for debugging, especially in the GPU scenario, where inference sometimes falls back to the CPU because of the environment even though we have manually set it to use the GPU.
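
The idea of logging only on partition 0 can be sketched roughly as follows. This is a simplified illustration, not the code in #10077: `predict_partition`, `ListHandler`, and the logger name are hypothetical, and the partition id is passed in as a plain argument. In a real PySpark task it could be read with `TaskContext.get().partitionId()`.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages in a list so they can be counted."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("partition-demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
logger.addHandler(handler)


def predict_partition(partition_id, rows, use_gpu=False):
    """Hypothetical per-partition inference step.

    The device message is emitted only for partition 0, so a job with
    many partitions produces a single log line.
    """
    if partition_id == 0:
        logger.info("Do the inference on the %s", "GPUs" if use_gpu else "CPUs")
    return [row * 2 for row in rows]  # placeholder for the real prediction


# Simulate four partitions of a dataset.
partitions = [[1, 2], [3], [4, 5], [6]]
results = [predict_partition(i, rows) for i, rows in enumerate(partitions)]

print(handler.messages)
```

The message still tells you which device was used, so a silent fallback from GPU to CPU remains visible, but it no longer repeats once per partition.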
