[HUDI-1850][HUDI-3234] Fixing read of a empty table but with failed write #2903
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2903 +/- ##
=============================================
- Coverage 47.62% 27.41% -20.22%
+ Complexity 5502 1285 -4217
=============================================
Files 930 381 -549
Lines 41268 15107 -26161
Branches 4137 1304 -2833
=============================================
- Hits 19655 4141 -15514
+ Misses 19865 10667 -9198
+ Partials 1748 299 -1449
Force-pushed from 7f8c4ef to fa044aa.
I wonder what the right behavior here should be. Should we error out or return an empty dataframe?
If someone tries to read a hudi table from a path that does not even exist, we get this exception as of now. So, I guess the current fix should be fine. WDYT?
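The behavior being discussed — failing fast with a clear error when the path is not a valid table or holds only failed writes — can be sketched roughly as below. This is a minimal illustrative Java sketch, not Hudi's actual implementation; `TableGuard`, `basePathExists`, and `completedCommits` are hypothetical stand-ins for the real metadata checks.

```java
// Hypothetical sketch of a pre-read guard: fail with a descriptive
// exception instead of an obscure one when the table is invalid or empty.
class InvalidTableException extends RuntimeException {
    InvalidTableException(String msg) { super(msg); }
}

class TableGuard {
    // basePathExists / completedCommits stand in for real filesystem
    // and timeline lookups, which are out of scope for this sketch.
    static void validate(String path, boolean basePathExists, int completedCommits) {
        if (!basePathExists) {
            throw new InvalidTableException(
                "Invalid Hoodie Table: " + path + " does not exist");
        }
        if (completedCommits == 0) {
            throw new InvalidTableException(
                "Invalid Hoodie Table: no completed commits under " + path);
        }
    }
}
```

A caller (e.g. a datasource's relation factory) would run this check before attempting schema resolution, so the user sees "Invalid Hoodie Table" rather than a low-level failure.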
Force-pushed from 297875e to fcb7ab7.
Force-pushed from fcb7ab7 to f474fba.
My bad. I have no idea why I complicated it so much. Fixed it.
case e: InvalidTableException =>
  assertTrue(e.getMessage.contains("Invalid Hoodie Table"))
} finally {
  spark.stop()
@nsivabalan why are we stopping the spark session here? Is it not shared outside of a single test?
Yes, we initialize it at the beginning of each test. I know we have to fix this entire class for reuse in general; @hmit is working on that refactoring.
SQL DML extension tests are failing. I need time to check those out. Will update once I am able to make CI happy.
Force-pushed from f474fba to 0b48bd5.
@nsivabalan moving this back to ready for review. Please update when the tests are fixed.
Force-pushed from 0b48bd5 to 712b446.
Force-pushed from 712b446 to 2984599.
This patch needs to be redone a bit. Since with SQL DML, createRelation will be called upfront, the empty table check has to be moved to the SQL DML layer. I will sync up with @pengzhiwei2018 on how to go about this.
Guess we can remove the release blocker and critical labels. I don't think this is very critical; I understand it's nice to have.
Force-pushed from 83ca285 to fcafe23.
@xushiyan : the patch is good to review.
@YannByron : Can you review the patch when you get a chance? It should be a small one.
Force-pushed from fcafe23 to 295deba.
It's a very simple and forceful solution. There should have been ...
Force-pushed from 5da3cac to 6fa140c.
@YannByron : there are some failures in SQL DML related tests (TestMergeInfo etc.). After we create the table, the first merge into fails because, with the proposed fix, we return an empty relation which returns a NIL schema. So, I guess we might need some fix in the sql-dml classes.
@nsivabalan Are all the failed DML tests for an empty table with a failed first write? If the table was created by Spark SQL, the right schema can be returned correctly. If not, throwing an exception is right and we can't fix it.
Two TODOs we need to consider: ...
@YannByron : yeah, feel free to put up a patch with all fixes required; once you have your patch, we can close this one. It is good to think about the unified schema, but I wonder, for an empty table, does it really require unifying the schemas? Why can't the sql-dml layer intercept the empty table and take action appropriately?
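The interception idea floated here — have the SQL layer detect an empty table and fall back to the schema persisted at CREATE TABLE time instead of resolving it from data files — could look roughly like the sketch below. This is a hypothetical Java illustration (columns modeled as plain name lists); `EmptyTableSchemaResolver`, `catalogSchema`, and `commitSchema` are not Hudi APIs.

```java
// Hypothetical sketch: schema resolution that tolerates an empty table.
// For a non-empty table, use the schema resolved from completed commits;
// for an empty table, fall back to the schema persisted in the catalog,
// and fail clearly if neither source is available.
import java.util.List;
import java.util.Optional;

class EmptyTableSchemaResolver {
    static List<String> resolveSchema(int completedCommits,
                                      Optional<List<String>> catalogSchema,
                                      List<String> commitSchema) {
        if (completedCommits > 0) {
            // Normal path: the schema comes from committed data files.
            return commitSchema;
        }
        // Empty table (e.g. only a failed write): prefer the persisted schema.
        return catalogSchema.orElseThrow(() -> new IllegalStateException(
            "Invalid Hoodie Table: empty table with no persisted schema"));
    }
}
```

Under this scheme, a `merge into` against a freshly created but empty table would see the catalog schema instead of a NIL schema, which is the failure mode described for the SQL DML tests above.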
@nsivabalan spark-sql also calls the ... The PR I submitted to your ... For a table created by spark-sql, the right schema is persisted in ...
Awesome, thanks a ton. I will work on it and update the patch.
Force-pushed from f0f4cba to ab45189.
@YannByron : can you review the patch?
LGTM. |
What is the purpose of the pull request
Fixed the read of an empty table (one with only a failed write) to surface proper information.
Brief change log
Verify this pull request
Stacktrace before the fix: see the attached JIRA.
Result after the fix:
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.