Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctness issues #7

Open
demurgos opened this issue Sep 20, 2018 · 4 comments
Open

Correctness issues #7

demurgos opened this issue Sep 20, 2018 · 4 comments

Comments

@demurgos
Copy link

The lib currently produces incorrect results.
I am still looking into it but here are two examples:

  • See this gist. Merging input.json with itself should simply double the counts. Instead, it returns output.json.

  • The library uses function names as cache keys, but there are no guarantees about the names: they can be empty strings or duplicated. It causes the lib to try to merge this closure with this one because the coverage gives them both the name process.on.

@demurgos
Copy link
Author

demurgos commented Sep 22, 2018

I'm still working on getting the right algorithm, but it's a bit tricky to get something correct while minimizing perf impact. This write up is mostly to quickly document the current problem I am working on. I still plan to write a detailed explanation alongside the algorithm before sending a PR.

The basic idea is that in a function coverage from a single script coverage, the ranges form a tree based on the inclusion relation: if a range is included in another one, it's as if this range was a child in a tree.

ranges: [{start: 0, end: 9, count: 1}, {start: 1, end: 3, count: 2}, {start: 5, end: 7, count: 3}]

gives the tree (one root range with two child ranges):

0  1  2  3  4  5  6  7  8  9
[1------------------------)
      |           |
   [2---)      [3---)

I won't draw the links in the next parts: basically if a range is inside another, it's as if it was a child in a tree.

Now, if you look at ranges for the same function but across different script coverages, the ranges no longer form a tree: they have partial overlap such that no node is the parent of each other.
Here are two trees with partial overlap:

Tree A:
0  1  2  3  4  5  6  7  8  9
[1------------------------)
   [2---------)
Tree B:
0  1  2  3  4  5  6  7  8  9
[4------------------------)
         [7---------)

The root ranges are the same (from 0 to 9), this is a property of function coverages. But the child ranges of each tree have partial overlap. This kind of situation only happens when merging the results of sibling nodes. Splitting at the edges of each range is safe with regard to the relations to AST spans.
The expected result after merging the two trees is:

0  1  2  3  4  5  6  7  8  9
[5------------------------)
   [6---)[9---)[8---)

The count of 5 is obtained simply by summing the counts of each root ranges: these are the same. For count of 6, we merge the root range of B with the first part of the child range of A. 9 is found by summing the counts in the intersection of the children. And 8 is found with the root of A and last part of the child range of B.

The first version of the algorithm I had was over-splitting the ranges. This is something that is almost fixed but still requires testing. The issue occurs when trying to merge the following range trees:

Tree A:
0  1  2  3  4  5  6  7  8  9
[1------------------------)
   [2---------------)
Tree B:
0  1  2  3  4  5  6  7  8  9
[4------------------------)
         [7---)

The expected result is:

0  1  2  3  4  5  6  7  8  9
[5------------------------)
   [6---------------)
         [9---)

But my first version produced:

0  1  2  3  4  5  6  7  8  9
[5------------------------)
   [6---)[9---)[6---)

Chopping down ranges when there is no partial overlap is unsafe because it may break invariants about AST spans, for example by splitting an if or for statement. The tricky part is that you need to find all the pairs of ranges that are either nesting or having partial overlap. For nesting branches, you need to add an additional nesting layer while ranges with partial overlap need to be split at their boundaries. (Also merging occurs between any number of trees, not only 2).
I initially explored a solution trying to handle these cases separately (nest first, then split at boundaries) but it required to repeat it until reaching a fixed point because a range can both nest with some range and have partial overlap with another one. The current solution properly prepares the trees (without a fix point), but I need to adjust the summing part to account for the changes. I'll have a few days off but will report results.

@demurgos
Copy link
Author

demurgos commented Sep 26, 2018

If you support nesting and partial overlap of ranges across coverage from different processes, binary merge (two coverages at a time) is not associative. I have a working binary merge but some issues with n-ary merges. I need to reevaluate what properties apply to coverages.

@demurgos
Copy link
Author

demurgos commented Sep 26, 2018

Here are the correct merge rules (these rules should restore associativity and match the experimental results):

Disjoint:

A      B       C      D       E     F
[a---------------------------------)
       [b-----)
+
[c---------------------------------)
                      [d-----)
=
[a+c-------------------------------)
       [b+c---)       [a+d---)

Nested

A      B       C      D       E     F
[a---------------------------------)
       [b--------------------)
+
[c---------------------------------)
               [d----)
=
[a+c-------------------------------)
       [b+c------------------)
               [b+d--)

Partial overlap

A      B       C      D       E     F
[a---------------------------------)
       [b------------)
+
[c---------------------------------)
               [d------------)
=
[a+c-------------------------------)
       [b+c----------)[a+d---)
               [b+d--)

The main thing that bothers me is the left-side bias of partial overlaps. I couldn't find any real-world example where overlap should be resolved with a right-side bias, but I only checked a few examples. If one such example of real-world right-side bias is found, it would mean that it is impossible to write a merge function producing valid results without external input (such as the source code) because of the ambiguity of the overlaps.

Here is a small example producing overlapping ranges (that must be resolved with a left-side bias):

module.exports = function lib (n) {
  if (n > 0) {
    return true
    console.log('Hello World')
  }

  return false
}

By calling it with 0, 1 and 2, you get:

79    111   128   163   179   180   181
[1---------------------------------)
      [0---------)      [0---)
+
[1---------------------------------)
            [0---------------)
+
[1---------------------------------)
            [0---------------)
=
[3---------------------------------)
      [2---------)[1---)[0---)
            [0---)

But even here, it's not quite clear why the result wouldn't be:

79    111   128   163   179   180   181
[3---------------------------------)
      [2---------)[1---------)
            [0---)      [0---)

demurgos added a commit to demurgos/c8 that referenced this issue Oct 9, 2018
This commit uses the [new merge algorithm][merge] to handle sub-processes. It fixes the issues reported in bcoe/v8-coverage-merge#7 and improves performance.

merge: https://github.com/demurgos/v8-coverage/blob/master/docs/merge.md
demurgos added a commit to demurgos/c8 that referenced this issue Oct 9, 2018
This commit uses the [new merge algorithm][merge] to handle sub-processes. It fixes the issues reported in bcoe/v8-coverage-merge#7 and improves performance.

[merge]: https://github.com/demurgos/v8-coverage/blob/master/docs/merge.md
demurgos added a commit to demurgos/c8 that referenced this issue Oct 9, 2018
This commit uses the [new merge algorithm][merge] to handle sub-processes. It fixes the issues reported in bcoe/v8-coverage-merge#7 and improves performance.

[merge]: https://github.com/demurgos/v8-coverage/blob/master/docs/merge.md
demurgos added a commit to demurgos/c8 that referenced this issue Oct 12, 2018
This commit uses the [new merge algorithm][merge] to handle sub-processes. It fixes the issues reported in bcoe/v8-coverage-merge#7 and improves performance.

[merge]: https://github.com/demurgos/v8-coverage/blob/master/docs/merge.md

Closes bcoe#27
demurgos added a commit to demurgos/c8 that referenced this issue Oct 12, 2018
This commit uses the [new merge algorithm][merge] to handle sub-processes. It fixes the issues reported in bcoe/v8-coverage-merge#7 and improves performance.

[merge]: https://github.com/demurgos/v8-coverage/blob/master/docs/merge.md

Closes bcoe#27
@bcoe
Copy link
Owner

bcoe commented Dec 24, 2018

@demurgos perhaps we should just deprecate this repo in favor of @c88/v8-coverage?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants