Flink: Refactor SimpleDataUtil#assertTableRecords method #1765
openinx merged 2 commits into apache:master from
Using hashCode to sort this list is correct if we only want to check whether the two lists are equal. But it does not help us debug the root cause when the two lists differ, because the elements end up in an arbitrary order. I'd prefer to compare the fields in lexicographical order if possible.
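A minimal sketch of the field-order comparison, assuming each record exposes an `id` and a `data` field (modelled here by a hypothetical `Row` class rather than Iceberg's `Record`): both lists are sorted by `id`, then `data`, before being compared, so a failing check lists the records in a predictable order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Sketch only: FieldOrderCompare and Row are hypothetical names. Row stands in
// for a record with the (id, data) columns used by SimpleDataUtil.
final class FieldOrderCompare {

  static final class Row {
    final int id;
    final String data;

    Row(int id, String data) {
      this.id = id;
      this.data = data;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Row)) {
        return false;
      }
      Row other = (Row) o;
      return id == other.id && Objects.equals(data, other.data);
    }

    @Override
    public int hashCode() {
      return Objects.hash(id, data);
    }

    @Override
    public String toString() {
      return "(" + id + ", " + data + ")";
    }
  }

  // Lexicographical order: compare id first, then data (null data sorts first).
  static final Comparator<Row> BY_FIELDS =
      Comparator.<Row>comparingInt(r -> r.id)
          .thenComparing((Row r) -> r.data, Comparator.nullsFirst(Comparator.naturalOrder()));

  // Returns a sorted copy so the caller's list is left untouched.
  static List<Row> sorted(List<Row> rows) {
    List<Row> copy = new ArrayList<>(rows);
    copy.sort(BY_FIELDS);
    return copy;
  }

  // Duplicates are kept, so [(1, a), (1, a)] is not equal to [(1, a)].
  static boolean sameRecords(List<Row> expected, List<Row> actual) {
    return sorted(expected).equals(sorted(actual));
  }
}
```

Because the lists are sorted rather than converted to sets, duplicate rows still count, and an assertion message built from `sorted(...)` shows both sides in the same order.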
Will there be some unknown data type in the Record that does not implement the Comparable interface (i.e. has no compareTo method)?
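One possible way to handle that concern, sketched under the assumption that a record's field values can be extracted into an `Object[]` (the `LexicographicRowComparator` name is hypothetical, not an Iceberg API): compare field by field, use `compareTo` when both values share a `Comparable` type, compare `byte[]` contents with `Arrays.compare` (Java 9+), and fall back to `String.valueOf` for anything else.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch only: rows are modelled as Object[] of field values, and each column
// is assumed to hold values of a single type.
final class LexicographicRowComparator implements Comparator<Object[]> {

  @Override
  public int compare(Object[] left, Object[] right) {
    int n = Math.min(left.length, right.length);
    for (int i = 0; i < n; i++) {
      int cmp = compareField(left[i], right[i]);
      if (cmp != 0) {
        return cmp;
      }
    }
    // A row that is a prefix of a longer row sorts first.
    return Integer.compare(left.length, right.length);
  }

  @SuppressWarnings({"unchecked", "rawtypes"})
  private static int compareField(Object a, Object b) {
    if (a == b) {
      return 0;
    }
    if (a == null) {
      return -1; // nulls first
    }
    if (b == null) {
      return 1;
    }
    // byte[] is not Comparable, so compare its contents explicitly.
    if (a instanceof byte[] && b instanceof byte[]) {
      return Arrays.compare((byte[]) a, (byte[]) b);
    }
    if (a instanceof Comparable && a.getClass() == b.getClass()) {
      return ((Comparable) a).compareTo(b);
    }
    // Fallback for other non-Comparable types: only meaningful if toString()
    // reflects the value.
    return String.valueOf(a).compareTo(String.valueOf(b));
  }
}
```

This works for any number of fields, but the string fallback is a weak spot, which is one argument for a comparison that does not need an ordering at all.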
Yeah, if this is just for comparing two lists, then it would be better to use a set.
The assertTableRecords method takes a List parameter, so a new user of this method may not realize that the data is compared as a set.
Consider this situation: the original data is (1,'a'),(1,'a'), but the actual result is (1,'a'). The test case passes, but the result is wrong.
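The false positive above can be reproduced in a few lines. This sketch models each row as a `List<Object>` (an assumption for illustration; `SetComparisonPitfall` is a hypothetical name): the `HashSet` collapses the duplicated `(1, 'a')` row, so the set-based check reports the two results as equal.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: shows why a set-based comparison hides a missing duplicate.
final class SetComparisonPitfall {

  static boolean equalAsSets(List<List<Object>> expected, List<List<Object>> actual) {
    Set<List<Object>> e = new HashSet<>(expected);
    Set<List<Object>> a = new HashSet<>(actual);
    return e.equals(a);
  }

  public static void main(String[] args) {
    List<List<Object>> expected =
        Arrays.asList(Arrays.<Object>asList(1, "a"), Arrays.<Object>asList(1, "a"));
    List<List<Object>> actual = Arrays.asList(Arrays.<Object>asList(1, "a"));
    System.out.println(equalAsSets(expected, actual)); // prints "true": the lost row goes unnoticed
    System.out.println(expected.size() == actual.size()); // prints "false"
  }
}
```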
Maybe we should remove the duplicate records from tests so that we can use a set?
We can generate test data without duplicates when writing test cases, but production data will certainly contain duplicates. Couldn't that lead to test cases that pass while the behavior in production is incorrect?
In this SimpleDataUtil, all Records should have the same schema (id and data columns), so I think we could just compare those two fields directly. We don't have to compare the hashCode.
About using a set to compare lists, I'd prefer to compare sorted lists, because in future equality-delete test cases we may delete the same row several times to make sure that we've guaranteed the correct semantics.
To test the multi-level partition filter, I constructed a two-level partition in TestRewriteDataFilesAction#testRewriteDataFilesWithFilter, so each record there has three fields. I don't know whether there will be similar test cases in the future.
Well, how about using com.google.common.collect.Multiset to assert those lists with duplicated records? We don't have to sort based on hashCode, and it also allows records with more fields.
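A minimal sketch of the multiset idea using only the JDK, as a stand-in for Guava's `Multiset`: each distinct row is mapped to its occurrence count, so ordering is ignored while duplicates still matter, and rows can have any number of fields as long as they implement value-based `equals`/`hashCode`. `MultisetCompare` and the `List<Object>` row model are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch only: a bag (multiset) comparison built from JDK collectors.
final class MultisetCompare {

  // Maps each distinct row to the number of times it occurs.
  static <T> Map<T, Long> counts(List<T> rows) {
    return rows.stream()
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
  }

  // Equal only if every row appears the same number of times in both lists.
  static <T> boolean sameRecords(List<T> expected, List<T> actual) {
    return counts(expected).equals(counts(actual));
  }
}
```

With Guava on the classpath, `HashMultiset.create(expected).equals(HashMultiset.create(actual))` performs the same order-insensitive, duplicate-aware check.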
Yes, that's great. I have updated the PR.
e52aaa8 to b32919f
If the parameter List<Record> contains repeated data, the Set will remove the duplicates, which differs from our expected value; we should compare the original list.