Skip to content

[SPARK-49825][SQL] Fix the default warehouse dir for case with white spaces.#54794

Open
fm100 wants to merge 1 commit intoapache:masterfrom
fm100:fix-default-spark-sql-warehouse-dir-url-encoded
Open

[SPARK-49825][SQL] Fix the default warehouse dir for case with white spaces.#54794
fm100 wants to merge 1 commit intoapache:masterfrom
fm100:fix-default-spark-sql-warehouse-dir-url-encoded

Conversation

@fm100
Copy link

@fm100 fm100 commented Mar 13, 2026

What changes were proposed in this pull request?

The default warehouse directory, the value of spark.sql.warehouse.sql conf, is always spark-warehouse/ under the current directory. The reported issue is for the case when the current directory path contains white spaces. Current behavior of the default conf for warehouse path is to resolve the URI of the current path, which causes the URL encoded string. Because we know that the default warehouse is always the local filesystem directory, this PR proposes to use directly the File object instead of Utils.resolveURI to avoid unnecessary URL encoded string.

Why are the changes needed?

Without this PR, when saving dataframe as table like df.saveAsTable("my_table"), the files for the table is not created under expected $PWD/spark-warehouse/my_table, but it creates a brand new directory that with URL encoded string.

Does this PR introduce any user-facing change?

Yes, when the spark process runs at the path with white spaces, the destination of df.saveAsTable changes.

How was this patch tested?

It was tricky to add a test case since this is related with the process path. Instead I did perform manual testing, making the directory with white spaces, reproduce the issue with the existing code, and then verify the fix. Followed through the reproduce path that the original jira ticket proposed.

Was this patch authored or co-authored using generative AI tooling?

No.

@fm100 fm100 force-pushed the fix-default-spark-sql-warehouse-dir-url-encoded branch from fb7a6d0 to be4d1ab Compare March 14, 2026 00:07
The default warehouse dir, when not set, is the spark-warehouse/ under the
current directory. The issue is about when the current directory path contains
white space, where the URI resolution makes the path URI encoded. This commit
is to fix by using File object directly for the default case. The default is
always local file system, and does not need to go through URI resolution.

Signed-off-by: Minkyu Park <minkyu.park.200@gmail.com>
@fm100 fm100 force-pushed the fix-default-spark-sql-warehouse-dir-url-encoded branch from be4d1ab to 2b1a18c Compare March 14, 2026 00:08
@fm100
Copy link
Author

fm100 commented Mar 14, 2026

@holdenk Thank you for your help. Please review 😄

@HyukjinKwon HyukjinKwon changed the title [SPARK-49825] Fix the default warehouse dir for case with white spaces. [SPARK-49825][SQL] Fix the default warehouse dir for case with white spaces. Mar 16, 2026
.version("2.0.0")
.stringConf
.createWithDefault(Utils.resolveURI("spark-warehouse").toString)
.createWithDefault(new File("spark-warehouse").getCanonicalFile.toString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I think it might be an issue when HDFS scheme is default. new File will always pick the local file as spark-warehouse. Utils.resolveURI("spark-warehouse").toString will result in HDFS's current working directory if default scheme is set as HDFS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So looking at the resolveURI code path it says:

   * Return a well-formed URI for the file described by a user input string.
   *
   * If the supplied path does not contain a scheme, or is a relative path, it will be
   * converted into an absolute path with a file:// scheme.

So it it shouldn't ever give an HDFS path right? Since this is only on the create with default path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok. makes sense

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK LGTM cc @cloud-fan for a second look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants