Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrupt BTree merge results when duplicate keys leak across common subtree boundaries #20

Closed
lorentey opened this issue Nov 5, 2016 · 2 comments

Comments

@lorentey
Copy link
Collaborator

lorentey commented Nov 5, 2016

Most of BTree's merge methods have subtle bugs in their common subtree detection logic, which can sometimes lead to corrupt results and/or assertions when duplicate keys cross the boundaries of a common subtree.

It is not easy to reproduce this bug using public API, but the following test manages to do it:

    func test_Intersection_withModifiedSelf() {
        var tree = BTree<Int, Int>(order: 5)
        for i in 0 ..< 10 {
            for j in 0 ..< 2 {
                tree.insert((i, j))
            }
        }

        for i in 0 ..< 10 {
            var other = tree
            other.withCursor(atOffset: 2 * i) { cursor in
                cursor.remove(2)
            }
            other.assertValid()
            let test = tree.intersection(other)
            XCTAssertEqual(test.map { $0.0 }, other.map { $0.0 })
        }
    }

Results on current master branch:

Test Case '-[BTreeTests.BTreeMergeTests test_Intersection_withModifiedSelf]' started.
XCTAssertEqual failed: ("[1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") is not equal to ("[1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") - 
XCTAssertEqual failed: ("[0, 0, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 0, 0, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") is not equal to ("[0, 0, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") - 
XCTAssertEqual failed: ("[0, 0, 1, 1, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 0, 0, 1, 1, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") is not equal to ("[0, 0, 1, 1, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") - 
XCTAssertEqual failed: ("[0, 0, 1, 1, 2, 2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 0, 0, 1, 1, 2, 2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") is not equal to ("[0, 0, 1, 1, 2, 2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") - 
XCTAssertEqual failed: ("[0, 0, 1, 1, 2, 2, 3, 3, 5, 5, 6, 6, 7, 7, 8, 8, 0, 0, 1, 1, 2, 2, 3, 3, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") is not equal to ("[0, 0, 1, 1, 2, 2, 3, 3, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]") - 
XCTAssertEqual failed: ("[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 6, 6, 7, 7, 8, 8, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 6, 6, 7, 7, 8, 8, 9, 9]") is not equal to ("[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 6, 6, 7, 7, 8, 8, 9, 9]") - 
XCTAssertEqual failed: ("[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 7, 7, 8, 8, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 7, 7, 8, 8, 9, 9]") is not equal to ("[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 7, 7, 8, 8, 9, 9]") - 
Test Case '-[BTreeTests.BTreeMergeTests test_Intersection_withModifiedSelf]' failed (0.076 seconds).

Note how the returned results are often not even sorted correctly.

The bug affects distinctUnion, subtracting, symmetricDifference, intersection, and the new bag variants, but not union.

lorentey added a commit that referenced this issue Nov 5, 2016
Merge methods (i.e., set operations) sometimes did not correctly handle duplicate keys at the boundaries of common subtrees.

This is extremely hard to reproduce using conventional means.

Affected methods:
- BTree.distinctUnion
- BTree.subtracting
- BTree.bagSubtracting
- BTree.symmetricDifference
- BTree.bagSymmetricDifference
- BTree.intersection
- BTree.bagIntersection

- BTreeMerger.copyCommonElementsFromSecond
- BTreeMerger.copyMatchingNumberOfCommonElementsFromSecond
- BTreeMerger.skipCommonElements
- BTreeMerger.skipMatchingNumberOfCommonElements

This fixes issue #20.
@lorentey
Copy link
Collaborator Author

lorentey commented Nov 5, 2016

There were two distinct sub-bugs:

  1. Sometimes BTree merges considered an entire subtree shared even after they have already processed a non-empty prefix of it. As a result, the prefix tree would be inserted twice into the result, breaking the ordering invariant. This happened when the shared subtree was a suffix tree of one of the input trees. Affected BTree methods: distinctUnion, intersection, and bagIntersection.
  2. When a merging method detected a shared subtree S, and exactly one of the input trees had at least one element immediately following S with the same key as S.last!.key, these extra elements would not be counted as having matching keys across the two input trees. Affected BTree methods: distinctUnion, subtracting, symmetricDifference and intersection.

@lorentey lorentey closed this as completed Nov 5, 2016
@lorentey
Copy link
Collaborator Author

lorentey commented Nov 5, 2016

(Unit tests to catch these errors have been added in 8680703.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant