[MINOR] Test case for hoodie.merge.allow.duplicate.on.inserts #6949
Changes from all commits:

- 7607e14
- afe56a9
- 6237e37
- d549364
- 562b126
Changes to `TestInsertTable`:

```diff
@@ -901,4 +901,47 @@ class TestInsertTable extends HoodieSparkSqlTestBase {
       }
     }
   }
+
+  test("Test enable hoodie.merge.allow.duplicate.on.inserts when write") {
+    spark.sql("set hoodie.datasource.write.operation = insert")
+    Seq("mor", "cow").foreach { tableType =>
+      withTempDir { tmp =>
+        val tableName = generateTableName
+        spark.sql(
+          s"""
+             | create table $tableName (
+             |   id int,
+             |   name string,
+             |   price double,
+             |   ts long,
+             |   dt string
+             | ) using hudi
+             | partitioned by (dt)
+             | location '${tmp.getCanonicalPath}/$tableName'
+             | tblproperties (
+             |   primaryKey = 'id',
+             |   preCombineField = 'ts',
+             |   type = '$tableType'
+             | )
+           """.stripMargin)
+        spark.sql(s"insert into $tableName partition(dt='2021-12-25') values (1, 'a1', 10, 1000)")
+        checkAnswer(s"select id, name, price, ts, dt from $tableName")(
+          Seq(1, "a1", 10, 1000, "2021-12-25")
+        )
+        spark.sql("set hoodie.merge.allow.duplicate.on.inserts = false")
+        spark.sql(s"insert into $tableName partition(dt='2021-12-25') values (1, 'a2', 20, 1001)")
+        checkAnswer(s"select id, name, price, ts, dt from $tableName")(
+          Seq(1, "a2", 20, 1001, "2021-12-25")
+        )
+        spark.sql("set hoodie.merge.allow.duplicate.on.inserts = true")
+        spark.sql(s"insert into $tableName partition(dt='2021-12-25') values (1, 'a3', 30, 1002)")
+        checkAnswer(s"select id, name, price, ts, dt from $tableName")(
+          Seq(1, "a2", 20, 1001, "2021-12-25"),
+          Seq(1, "a3", 30, 1002, "2021-12-25")
+        )
+      }
+    }
+    spark.sql("set hoodie.merge.allow.duplicate.on.inserts = false")
```
Review discussion on the trailing `set` statements:

- I don't think it makes sense to set these after the test is done.
- Since it defaults to false, we need to restore it after the test is complete.
- We should reset it at the beginning of the test.
- This is done to ensure that other test cases are not affected by this one, just like lines 945, 535, and so on.
- Many existing cases use this pattern: set a config at the beginning and restore it at the end.
- @YannByron OK, I created a ticket: https://issues.apache.org/jira/browse/HUDI-5055
```diff
+    spark.sql("set hoodie.datasource.write.operation = upsert")
+  }
 }
```
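The restore-after-the-test concern above is usually solved with a scoped helper that applies overrides and puts the previous values back in a `finally`, so no later test can observe leaked settings. Below is a minimal, self-contained sketch of that pattern (a hypothetical `withSQLConf`-style helper over a plain map standing in for the session conf; the real test would go through `spark.sql("set ...")`):

```scala
object ConfScope {
  // Stand-in for a mutable session conf (e.g. Spark's SQL conf).
  val conf = scala.collection.mutable.Map[String, String]()

  // Apply the given overrides for the duration of `body`, then restore
  // whatever values (or absence of values) were there before.
  def withSQLConf[T](pairs: (String, String)*)(body: => T): T = {
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(v)) => conf(k) = v    // key existed: restore old value
      case (k, None)    => conf.remove(k) // key was unset: remove again
    }
  }
}
```

With this shape, a test never needs trailing `set` statements: the restore happens even if an assertion inside the block throws.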
Changes to `TestMergeIntoTable2`:

```diff
@@ -609,7 +609,6 @@ class TestMergeIntoTable2 extends HoodieSparkSqlTestBase {
              | preCombineField = 'ts'
              | )
            """.stripMargin)
-    spark.sql("set hoodie.merge.allow.duplicate.on.inserts = true")
     // Insert data
```
Review discussion on the removed line:

- Why drop this line?
- Because this configuration does not take effect here and can be misleading.
```diff
     spark.sql(s"insert into $tableName select 1, 1, 'a1', 1, 10, '2022-08-18'")
     checkAnswer(s"select id1, id2, name, price, ts, dt from $tableName")(
```
- I'd suggest we extract the primary key into a common var to make the sharing apparent.
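The extract-to-a-common-var suggestion might look like the following sketch (hypothetical names, not code from this PR): the key column appears both in the column list and in `tblproperties`, so hoisting it into one `val` keeps the two in sync.

```scala
object PrimaryKeyExample {
  // Assumed key column name, shared by the DDL's column list and
  // its tblproperties so they cannot drift apart.
  val primaryKey = "id"

  def createTableDdl(tableName: String): String =
    s"""
       | create table $tableName (
       |   $primaryKey int,
       |   name string
       | ) using hudi
       | tblproperties (primaryKey = '$primaryKey')
       """.stripMargin
}
```

Any assertion or follow-up statement that needs the key column can then reference the same `primaryKey` val instead of repeating the literal.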