Skip to content

[SPARK-28195][SQL] Fix CheckAnalysis not working for InsertIntoDataSourceDirCommand and report misleading error message#25019

Closed
liupc wants to merge 2 commits intoapache:masterfrom
liupc:Fix-SPARK-28195
Closed

[SPARK-28195][SQL] Fix CheckAnalysis not working for InsertIntoDataSourceDirCommand and report misleading error message#25019
liupc wants to merge 2 commits intoapache:masterfrom
liupc:Fix-SPARK-28195

Conversation

@liupc
Copy link

@liupc liupc commented Jul 1, 2019

What changes were proposed in this pull request?

This PR will try to fix the issue that the sub plan of InsertDataSourceDirCommand is not being checked for analysis and sometimes will report misleading error message.
An example sql is like
insert overwrite directory '/path' using parquet select * from table1

When "table1" does not exists, we will finally got a misleading error message:

Caused by: org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'kr.objective_id
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:440)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:440)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.sql.types.StructType$.fromAttributes(StructType.scala:440)
at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:159)
at org.apache.spark.sql.catalyst.plans.QueryPlan.schema(QueryPlan.scala:159)
at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:544)
at org.apache.spark.sql.execution.command.InsertIntoDataSourceDirCommand.run(InsertIntoDataSourceDirCommand.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.execution.adaptive.QueryStage.executeCollect(QueryStage.scala:246)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3277)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3276)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:277)
... 11 more

How was this patch tested?

exist UT(AnalysisSuite)

@maropu maropu changed the title [SPARK-28195]Fix CheckAnalysis not working for command and report misleading error message [SPARK-28195][SQL] Fix CheckAnalysis not working for command and report misleading error message Jul 1, 2019
@maropu
Copy link
Member

maropu commented Jul 1, 2019

ok to test

@maropu
Copy link
Member

maropu commented Jul 1, 2019

This issue only exists in INSERT command? Probably, you'd be better to make the title and description more concrete...

@SparkQA
Copy link

SparkQA commented Jul 1, 2019

Test build #107059 has finished for PR 25019 at commit 9a32d8e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liupc liupc changed the title [SPARK-28195][SQL] Fix CheckAnalysis not working for command and report misleading error message [SPARK-28195][SQL] Fix CheckAnalysis not working for insert command and report misleading error message Jul 1, 2019
@liupc
Copy link
Author

liupc commented Jul 1, 2019

I checked the command implementations, I'm not very sure, but it seems only affect the INSERT command, thank you! @maropu

@liupc liupc changed the title [SPARK-28195][SQL] Fix CheckAnalysis not working for insert command and report misleading error message [SPARK-28195][SQL] Fix CheckAnalysis not working for INSERT command and report misleading error message Jul 1, 2019
@maropu
Copy link
Member

maropu commented Jul 2, 2019

Also, can you add some tests? This pr will change exception messages, right?

@liupc
Copy link
Author

liupc commented Jul 2, 2019

OK, I will add some test later.

@liupc liupc changed the title [SPARK-28195][SQL] Fix CheckAnalysis not working for INSERT command and report misleading error message [SPARK-28195][SQL] Fix CheckAnalysis not working for InsertIntoDataSourceDirCommand and report misleading error message Jul 2, 2019
@SparkQA
Copy link

SparkQA commented Jul 2, 2019

Test build #107114 has finished for PR 25019 at commit ec711af.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val exception = intercept[AnalysisException](
analyzer.executeAndCheck(parser.parsePlan(query), new QueryPlanningTracker))
assert(exception.getMessage.contains("Table or view not found: table1"))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this into InsertSuite?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK

case e: InsertIntoDataSourceDirCommand => e.query
case _ => plan
}
super.checkAnalysis(planToCheck)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we resolve this issue in the InsertIntoDataSourceDirCommand side?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu Yes, I agree that resolving this issue inside InsertIntoDataSourceDirCommand is better, but as I have already questioned is the jira: https://issues.apache.org/jira/browse/SPARK-28195, I'm not sure whether there are some special consideration for making the children of InsertIntoDataSourceDirCommand to empty and use innerChildren instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You meant this issue only happened when spark.sql.runSQLOnFiles=true? What if spark.sql.runSQLOnFiles=false?

Copy link
Author

@liupc liupc Jul 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu This issue affect more aspects, what I point out in the jira is just for the case "table or view not found", actually, this issue may cause many other problems.
The root cause is that the the children of InsertIntoDataSourceDirCommand is empty, thus many analysis rules after InsertIntoDataSourceDirCommand being inserted(In DataSourceAnalysis rule) may not be effective and so was it for CheckAnalysis.
I think we can fix it better inside InsertIntoDataSourceDirCommand, but I should first make it clear that why we set it's children to empty, but use innerChildren instead? Is there any PR or issue for that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should investigate it by yourself. I don't like the current approach too. Appeartly issue is minor but the fix is pretty invasive. I doesn't need to touch SessionStateBuilder side at all.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@HyukjinKwon
Copy link
Member

Closing this due to inactivity from its author.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants