[SPARK-28195][SQL] Fix CheckAnalysis not working for InsertIntoDataSourceDirCommand and report misleading error message#25019
liupc wants to merge 2 commits into apache:master
ok to test
Does this issue only exist in the INSERT command? If so, it would be better to make the title and description more concrete.
Test build #107059 has finished for PR 25019 at commit
I checked the command implementations. I'm not entirely sure, but it seems to affect only the INSERT command. Thank you, @maropu!
Also, can you add some tests? This PR will change exception messages, right?
OK, I will add some tests later.
Test build #107114 has finished for PR 25019 at commit
    val exception = intercept[AnalysisException](
      analyzer.executeAndCheck(parser.parsePlan(query), new QueryPlanningTracker))
    assert(exception.getMessage.contains("Table or view not found: table1"))
  }
Can we move this into InsertSuite?
      case e: InsertIntoDataSourceDirCommand => e.query
      case _ => plan
    }
    super.checkAnalysis(planToCheck)
Can we resolve this issue on the InsertIntoDataSourceDirCommand side?
@maropu Yes, I agree that resolving this issue inside InsertIntoDataSourceDirCommand is better. But, as I already asked in the JIRA (https://issues.apache.org/jira/browse/SPARK-28195), I'm not sure whether there was some special consideration behind making the children of InsertIntoDataSourceDirCommand empty and using innerChildren instead.
Do you mean this issue only happens when spark.sql.runSQLOnFiles=true? What about when spark.sql.runSQLOnFiles=false?
@maropu This issue affects more than that. What I pointed out in the JIRA is just the "table or view not found" case; the issue may actually cause many other problems.
The root cause is that the children of InsertIntoDataSourceDirCommand are empty, so many analysis rules that run after InsertIntoDataSourceDirCommand is inserted (in the DataSourceAnalysis rule) may not take effect, and the same goes for CheckAnalysis.
I think we can fix it better inside InsertIntoDataSourceDirCommand, but I should first understand why we set its children to empty and use innerChildren instead. Is there any PR or issue for that?
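To make the root cause concrete, here is a minimal toy model of the problem. It is a sketch only and does not use Spark's real classes: `Plan`, `InsertDirCommand`, `CheckDemo`, `unresolved` and `unresolvedUnwrapped` are hypothetical names, and the traversal is assumed to follow `children` only, which is the behaviour described above.

```scala
// Toy model only: these are NOT Spark's classes, the names just mirror Catalyst.
sealed trait Plan {
  def children: Seq[Plan]
  // Nodes that are displayed with this node but are not part of `children`.
  def innerChildren: Seq[Plan] = Nil
  // Visit this node and everything reachable through `children`.
  def foreach(f: Plan => Unit): Unit = {
    f(this)
    children.foreach(_.foreach(f))
  }
}

case class UnresolvedRelation(name: String) extends Plan {
  val children: Seq[Plan] = Nil
}

case class Project(child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
}

// Hypothetical stand-in for InsertIntoDataSourceDirCommand:
// the query is exposed only through innerChildren, so children is empty.
case class InsertDirCommand(query: Plan) extends Plan {
  val children: Seq[Plan] = Nil
  override def innerChildren: Seq[Plan] = Seq(query)
}

object CheckDemo {
  // A children-based check: collect the unresolved relations it can see.
  def unresolved(plan: Plan): Seq[String] = {
    val found = scala.collection.mutable.ArrayBuffer.empty[String]
    plan.foreach {
      case UnresolvedRelation(n) => found += n
      case _ => ()
    }
    found.toSeq
  }

  // The shape of the workaround in this PR's diff: unwrap the command first.
  def unresolvedUnwrapped(plan: Plan): Seq[String] = plan match {
    case InsertDirCommand(q) => unresolved(q)
    case other => unresolved(other)
  }
}
```

With `InsertDirCommand(Project(UnresolvedRelation("table1")))`, `CheckDemo.unresolved` sees nothing because the traversal stops at the command, while `CheckDemo.unresolvedUnwrapped` finds `table1`. Fixing it on the command side would correspond, in this toy model, to returning `Seq(query)` from `children`, which is exactly the design question raised above.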
I think you should investigate it yourself. I don't like the current approach either. Apparently the issue is minor, but the fix is pretty invasive. It doesn't need to touch the SessionStateBuilder side at all.
Can one of the admins verify this patch?
Closing this due to inactivity from its author.
What changes were proposed in this pull request?
This PR tries to fix the issue that the sub-plan of InsertIntoDataSourceDirCommand is not checked during analysis, which sometimes produces a misleading error message.
An example SQL statement:
insert overwrite directory '/path' using parquet select * from table1
When "table1" does not exist, we end up with a misleading error message.
How was this patch tested?
Existing unit tests (AnalysisSuite).