[SPARK-51605][CONNECT] Try to create the parent directory before touching the logFile #50421
Conversation
```scala
Option(System.getenv("SPARK_LOG_DIR"))
  .orElse(Option(System.getenv("SPARK_HOME")).map(p => Paths.get(p, "logs").toString))
  .foreach { p =>
    Files.createDirectories(Paths.get(p))
  }
```
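The patched Scala above resolves the log directory from `SPARK_LOG_DIR`, falls back to `$SPARK_HOME/logs`, and then ensures the directory exists. A rough Java sketch of the same resolution chain (the class and method names here are illustrative, not Spark's API; the inputs stand in for the two environment variables):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class LogDirResolver {
    // Hypothetical helper mirroring the Scala snippet: prefer SPARK_LOG_DIR,
    // otherwise fall back to $SPARK_HOME/logs; empty if neither is set.
    static Optional<Path> resolveLogDir(String sparkLogDir, String sparkHome) {
        return Optional.ofNullable(sparkLogDir)
                .map(Paths::get)
                .or(() -> Optional.ofNullable(sparkHome).map(h -> Paths.get(h, "logs")));
    }

    public static void main(String[] args) throws Exception {
        // Simulate the env vars rather than reading the real environment.
        Path dir = resolveLogDir(null,
                System.getProperty("java.io.tmpdir") + "/spark-demo").get();
        Files.createDirectories(dir); // create the parent directory, as the patch does
        System.out.println(Files.isDirectory(dir)); // true
    }
}
```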
cc @HyukjinKwon
It seems that manually creating the server's log directory here resolves the issue, and even if the parent directory already exists, this call has no actual negative impact.
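For what it's worth, `Files.createDirectories` is documented not to fail when the directory already exists, which is why the extra call is harmless on subsequent launches. A small standalone check (the paths here are a throwaway temp directory, not Spark's):

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class IdempotentMkdirs {
    public static void main(String[] args) throws Exception {
        Path base = Files.createTempDirectory("spark-logs-demo");
        Path logs = base.resolve("logs");

        Files.createDirectories(logs); // first call creates the directory
        Files.createDirectories(logs); // second call is a no-op, no exception
        System.out.println(Files.isDirectory(logs)); // true
    }
}
```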
Or could we call `waitUntilFileExists(logDir)` before `waitUntilFileExists(logFile)`? Is it necessary to do so?
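For context, the `NoSuchFileException` in the stack trace comes from registering a `WatchService` on a directory that does not exist: `Path.register` requires the watched directory to be present. A minimal sketch of that failure mode (this is not Spark's `waitUntilFileExists`, just an illustration of why the directory must exist before it can be watched):

```java
import java.nio.file.FileSystems;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchService;

public class WatchMissingDir {
    public static void main(String[] args) throws Exception {
        // A path that is assumed not to exist on this machine.
        Path missing = Paths.get(System.getProperty("java.io.tmpdir"),
                "no-such-spark-logs-" + System.nanoTime());
        try (WatchService ws = FileSystems.getDefault().newWatchService()) {
            // Registering a watch on a nonexistent directory throws
            // NoSuchFileException, matching the failure in the stack trace.
            missing.register(ws, StandardWatchEventKinds.ENTRY_CREATE);
            System.out.println("registered"); // not reached
        } catch (NoSuchFileException e) {
            System.out.println("NoSuchFileException: " + e.getFile());
        }
    }
}
```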
[SPARK-51605][CONNECT] Try to create the parent directory before touching the logFile

### What changes were proposed in this pull request?

This PR tries to create the parent directory before touching the log file in `connect.SparkSession.withLocalConnectServer`, to avoid issues when the parent directory does not exist.

### Why are the changes needed?

When the `logs` directory does not exist under the `SPARK_HOME` path, executing `bin/spark-shell --remote local` results in the following error:

```
bin/spark-shell --remote local
WARNING: Using incubator modules: jdk.incubator.vector
Exception in thread "main" java.nio.file.NoSuchFileException: /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs
	at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
	at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
	at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
	at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
	at java.base/sun.nio.fs.PollingWatchService.doPrivilegedRegister(PollingWatchService.java:173)
	at java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:154)
	at java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:151)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
	at java.base/sun.nio.fs.PollingWatchService.register(PollingWatchService.java:150)
	at java.base/sun.nio.fs.UnixPath.register(UnixPath.java:885)
	at java.base/java.nio.file.Path.register(Path.java:894)
	at org.apache.spark.sql.connect.SparkSession$.waitUntilFileExists(SparkSession.scala:717)
	at org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13(SparkSession.scala:798)
	at org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13$adapted(SparkSession.scala:791)
	at scala.Option.foreach(Option.scala:437)
	at org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:791)
	at org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67)
	at org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57)
	at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
25/03/26 15:39:40 INFO ShutdownHookManager: Shutdown hook called
25/03/26 15:39:40 INFO ShutdownHookManager: Deleting directory /private/var/folders/j2/cfn7w6795538n_416_27rkqm0000gn/T/spark-fe4c9d71-b7d7-437e-b486-514cc538cccc
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Pass GitHub Actions
- Manual check:
  1. Package a Spark client using the command `dev/make-distribution.sh --tgz -Phive`.
  2. Execute `bin/spark-shell --remote local`. Although the `logs` directory does not exist, the aforementioned error is no longer reported.
  3. After exiting the Connect REPL, execute `bin/spark-shell --remote local` again. At this point, the `logs` directory already exists, and the shell starts successfully. (Due to the unresolved issue SPARK-51606, it is necessary to manually kill the Connect Server process after exiting the Connect REPL.)

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #50421 from LuciferYang/SPARK-51605.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 09e726d)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
Merged into master/branch-4.0. Thanks @HyukjinKwon