[SPARK-46971][SQL] When the compression is null, a NullPointerException should not be thrown #45015

Closed
panbingkun wants to merge 2 commits into apache:master from panbingkun:compression_is_null

@panbingkun panbingkun commented Feb 4, 2024

What changes were proposed in this pull request?

This PR aims to produce a clearer error message when the compression option is null.

Why are the changes needed?

With the current logic, if the compression option is null, Spark throws a NullPointerException that surfaces as an INTERNAL_ERROR, which is unfriendly to the user. To reproduce:

val df = (1 to 5).map(i => ((i % 2).toString)).toDF("a")
df.write.option("compression", null).text("test1")

Before:

scala> df.write.option("compression", null).text("test1")
org.apache.spark.SparkException: [INTERNAL_ERROR] Eagerly executed command failed. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. SQLSTATE: XX000
  at org.apache.spark.SparkException$.internalError(SparkException.scala:107)
  at org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:550)
  at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:562)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:119)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:109)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:442)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:83)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:442)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:34)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:271)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:267)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:34)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:34)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:418)
  at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:109)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:96)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:94)
  at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:156)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:892)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:389)
  at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:362)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:240)
  at org.apache.spark.sql.DataFrameWriter.text(DataFrameWriter.scala:834)
  ... 42 elided
Caused by: java.lang.NullPointerException: Cannot invoke "String.toLowerCase(java.util.Locale)" because "name" is null
  at org.apache.spark.sql.catalyst.util.CompressionCodecs$.getCodecClassName(CompressionCodecs.scala:38)
  at org.apache.spark.sql.execution.datasources.text.TextOptions.$anonfun$compressionCodec$1(TextOptions.scala:38)
  at scala.Option.map(Option.scala:242)
  ... 17 elided and 62 more

After:

scala> df.write.option("compression", null).text("test1")
org.apache.spark.SparkIllegalArgumentException: [CODEC_NOT_AVAILABLE.WITH_AVAILABLE_CODECS_SUGGESTION] The codec NULL is not available. Available codecs are bzip2, deflate, uncompressed, snappy, none, lz4, gzip. SQLSTATE: 56038
  at org.apache.spark.sql.errors.QueryExecutionErrors$.codecNotAvailableError(QueryExecutionErrors.scala:2716)
  at org.apache.spark.sql.catalyst.util.CompressionCodecs$.getCodecClassName(CompressionCodecs.scala:40)
  at org.apache.spark.sql.execution.datasources.text.TextOptions.$anonfun$compressionCodec$1(TextOptions.scala:38)
  at scala.Option.map(Option.scala:242)
  ... 79 elided
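
The behavior change shown above can be sketched without Spark. The following is a minimal, simplified illustration and not the actual Spark code: the object name `CodecLookupSketch`, the method `resolve`, and the use of a plain `IllegalArgumentException` are assumptions made for the example. The short names come from the error message above, and the sketch assumes they map to the standard Hadoop codec classes.

```scala
import java.util.Locale

// Hypothetical, simplified stand-in for a codec lookup; not Spark's implementation.
object CodecLookupSketch {
  // Short names as listed in the "After" error message.
  // None means "write without compression" in this sketch.
  private val shortNames: Map[String, Option[String]] = Map(
    "none" -> None,
    "uncompressed" -> None,
    "bzip2" -> Some("org.apache.hadoop.io.compress.BZip2Codec"),
    "deflate" -> Some("org.apache.hadoop.io.compress.DeflateCodec"),
    "gzip" -> Some("org.apache.hadoop.io.compress.GzipCodec"),
    "lz4" -> Some("org.apache.hadoop.io.compress.Lz4Codec"),
    "snappy" -> Some("org.apache.hadoop.io.compress.SnappyCodec")
  )

  private def unavailable(name: String): IllegalArgumentException = {
    val available = shortNames.keys.toSeq.sorted.mkString(", ")
    new IllegalArgumentException(
      s"The codec $name is not available. Available codecs are $available.")
  }

  // Checks for null before calling toLowerCase, so a null name yields a
  // descriptive error instead of a NullPointerException.
  def resolve(name: String): Option[String] = {
    if (name == null) {
      throw unavailable("NULL")
    } else {
      shortNames.getOrElse(name.toLowerCase(Locale.ROOT), throw unavailable(name))
    }
  }
}
```

The key point is only the ordering: the null check happens before any method is invoked on the name, so the caller sees an argument error rather than an internal error.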

Does this PR introduce any user-facing change?

Yes. When compression is null, Spark now reports a clearer error message instead of an internal error.

How was this patch tested?

  • Added a new unit test (UT).
  • Passed GitHub Actions (GA).

Was this patch authored or co-authored using generative AI tooling?

No.

@panbingkun panbingkun marked this pull request as ready for review February 4, 2024 09:40
  val compression: String = {
-   parameters.get(COMPRESSION).getOrElse(SQLConf.get.avroCompressionCodec)
+   val v = parameters.get(COMPRESSION).getOrElse(SQLConf.get.avroCompressionCodec)
+   if (v == null) {

Member

This problem actually occurs in many places, not only in AvroOptions. I believe many options in CSVOptions have the same problem.
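
One way to see why the pattern recurs across option classes: in Scala, `Map.get` on a key whose value is null returns `Some(null)`, so `getOrElse` never falls back to the default. The sketch below illustrates this with a plain `Map`; the object `NullOptionSketch` and the helper `nonNullOption` are hypothetical names for the example, not existing Spark APIs.

```scala
// Hypothetical shared helper; an illustration only, not part of Spark.
object NullOptionSketch {
  // Reads an option, falls back to the default when the key is absent,
  // and rejects an explicit null value with a descriptive error.
  def nonNullOption(
      parameters: Map[String, String],
      key: String,
      default: => String): String = {
    parameters.get(key) match {
      case Some(null) =>
        throw new IllegalArgumentException(s"Option '$key' must not be null.")
      case Some(value) => value
      case None => default
    }
  }
}

object NullOptionDemo {
  def main(args: Array[String]): Unit = {
    val params = Map[String, String]("compression" -> null)
    // The key is present, so getOrElse returns the stored null, not the default.
    println(params.get("compression").getOrElse("snappy") == null)
    // An absent key still falls back to the default.
    println(NullOptionSketch.nonNullOption(Map.empty, "compression", "snappy"))
  }
}
```

A helper like this would centralize the check rather than repeating it per option, which is the concern raised in this comment; whether that fits Spark's option classes is a design decision for the reviewers.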

Member

+1 for @HyukjinKwon's comment.

Contributor

Would there be compatibility issues if null values are prohibited in CaseInsensitiveMap?

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label May 19, 2024
@github-actions github-actions bot closed this May 20, 2024