
Make sure bisect-ppx-report handles corrupted coverage files gracefully #385

Closed
aantron opened this issue Oct 20, 2021 · 12 comments

aantron commented Oct 20, 2021

No description provided.

vch9 commented Oct 29, 2021

Regarding corrupted coverage files, do you have a hint about the difference between the following errors:

Info: found coverage files in '_coverage_output/'
bisect-ppx-report: internal error, uncaught exception:
                   Bisect_common.Invalid_file("_coverage_output/277891434.coverage", "unexpected end of file while reading magic number")

and

Info: found coverage files in '_coverage_output/'
bisect-ppx-report: internal error, uncaught exception:
                   Bisect_common.Invalid_file("_coverage_output/697300040.coverage", "exception reading data: End_of_file")

vch9 commented Oct 29, 2021

Well, I just had another exception raised. If you're not already working on this, @aantron, and need help (review, PR, etc.), I'll be happy to help, as we really need this feature.

aantron commented Oct 29, 2021

Regarding corrupted coverage files, do you have a hint about the difference between the following errors:

They are the same error (reading a truncated file, so End_of_file), but occurring during different parts of the read. The first is occurring during reading of a magic number in the file header, and the second during reading of the main file "payload." Neither should be fatal like this or reported in this way.
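
For illustration, both failures can be reproduced by a reader shaped roughly like the sketch below (this is not the actual Bisect_common code; the magic-number length and payload layout are invented for the example):

    (* Rough sketch only; the real Bisect_common reader differs, and the
       magic-number length and payload layout below are invented. *)
    let read_coverage path =
      let channel = open_in_bin path in
      Fun.protect ~finally:(fun () -> close_in_noerr channel) @@ fun () ->
      (* Stage 1: the header's magic number. A file truncated here yields
         "unexpected end of file while reading magic number". *)
      let _magic =
        try really_input_string channel 16
        with End_of_file ->
          failwith "unexpected end of file while reading magic number"
      in
      (* Stage 2: the main payload. A file truncated here yields
         "exception reading data: End_of_file". *)
      try
        let count = input_binary_int channel in
        List.init count (fun _ -> input_binary_int channel)
      with End_of_file ->
        failwith "exception reading data: End_of_file"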

I haven't started working on this in particular. What I've done so far is considerably refactored the reporter to make it easier to work on, because I dreaded working on it due to some legacy holdover from features deleted in the last couple of years. I've just removed all of those. You can see the progress in src/report in master.

Would you like to work on this directly? If not, my plan is to first clarify some of the I/O functions that are most responsible for all the I/O errors, and then fix this issue. If you want to work on this immediately, though, I will pause refactoring the reporter.

vch9 commented Oct 29, 2021

I haven't started working on this in particular. What I've done so far is considerably refactored the reporter to make it easier to work on, because I dreaded working on it due to some legacy holdover from features deleted in the last couple of years. I've just removed all of those. You can see the progress in src/report in master.

I saw this; this is great :)

Would you like to work on this directly? If not, my plan is to first clarify some of the I/O functions that are most responsible for all the I/O errors, and then fix this issue. If you want to work on this immediately, though, I will pause refactoring the reporter.

My priority would be to fix the bad writing of coverage files to avoid the loss of information. If you're already working on this or have plans to do it, that would be faster. However, I could test once the issue is fixed to see how well that kind of exception is handled.

aantron commented Oct 29, 2021

My priority would be to fix the bad writing of coverage files to avoid the loss of information. If you're already working on this or have plans to do it, that would be faster. However, I could test once the issue is fixed to see how well that kind of exception is handled.

I'm not working on that at all yet — I want to clean up the reporter a bit more first, before looking in detail at the writing of .coverage files. I also wanted to give time to get some more information/opinions in #384 before I do something that might turn out not to be necessary :)

aantron added a commit that referenced this issue Nov 3, 2021

aantron commented Nov 3, 2021

The errors should be somewhat clearer now, and I've added some documentation about them in d8917a3. I've simplified the code, so that it would be easier to implement graceful error handling. However, before adding it, it's better to work on making the writing as correct as possible instead, as per #385 (comment). Then, it will be clearer what the reader in the reporter has to actually adapt to. Any word on what from #384 (comment) makes sense in your testing environment?

vch9 commented Nov 4, 2021

Any word on what from #384 (comment) makes sense in your testing environment?

It is most certainly related to this issue: we launch several processes and kill them if something goes wrong.

aantron commented Nov 4, 2021

Any word on what from #384 (comment) makes sense in your testing environment?

It is most certainly related to this issue: we launch several processes and kill them if something goes wrong.

I mean what potential solution makes sense in your environment. Can you have the processes handle the signals better, for example?

vch9 commented Nov 4, 2021

Any word on what from #384 (comment) makes sense in your testing environment?

It is most certainly related to this issue: we launch several processes and kill them if something goes wrong.

I mean what potential solution makes sense in your environment. Can you have the processes handle the signals better, for example?

Yes, this is what I'm trying to do on my side. However, I still believe bisect-ppx-report could emit a warning instead of failing if, say, only one coverage file out of hundreds is corrupted.
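
Something like the following is what I have in mind (a rough sketch, not current bisect-ppx-report code; read_one stands for whatever per-file reader the reporter uses):

    (* Sketch of the suggested behavior: skip corrupted files with a warning
       instead of aborting the whole report. [read_one] is a hypothetical
       per-file reader; Invalid_file is the exception shown in the logs above. *)
    let read_coverage_files files =
      files |> List.filter_map (fun file ->
        match read_one file with
        | data -> Some data
        | exception Bisect_common.Invalid_file (path, reason) ->
          Printf.eprintf
            "Warning: skipping corrupted coverage file '%s': %s\n" path reason;
          None)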

aantron added a commit that referenced this issue Nov 7, 2021

aantron commented Nov 7, 2021

I've added a test that truncates a .coverage file to various lengths and shows the behavior of the reporter.
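
Roughly, it does something along these lines (a simplified sketch, not the actual test code):

    (* Simplified sketch of such a test (needs OCaml >= 4.14 for
       In_channel/Out_channel). It copies a valid .coverage file at every
       possible truncation length and runs the reporter on each copy to
       observe the resulting error. *)
    let truncate_and_report source =
      let data = In_channel.with_open_bin source In_channel.input_all in
      for length = 0 to String.length data do
        Out_channel.with_open_bin "truncated.coverage" (fun oc ->
          Out_channel.output_string oc (String.sub data 0 length));
        ignore (Sys.command "bisect-ppx-report summary truncated.coverage")
      done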

I still believe bisect-ppx-report could emit a warning instead of failing if, say, only one coverage file out of hundreds is corrupted.

We can probably add this as an option. I don't think it should be the default, because in a smaller project in which runs are more predictable:

  • corrupt coverage files should be extremely rare;
  • when they do occur, they suggest some kind of severe error that should not be ignored;
  • a missing coverage file causing only a warning (say, in CI) can silently skew final coverage statistics considerably in a smaller project.

aantron added a commit that referenced this issue Nov 7, 2021

arvidj commented Nov 15, 2021

Any word on what from #384 (comment) makes sense in your testing environment?

Broadly, we have three types of tests:

  • unit tests that neither fork nor spawn any processes
  • integration tests that spawn processes by forking
  • system tests that execute separate binaries

For the first type, I see no reason why we should have any corrupted file, nor have we seen that.

For the second type of test, of which we have very few, we've experimented with adding signal handlers, attempting to make sure that everything terminates gracefully w.r.t. bisect. However, it's been tricky to find a solution that works well, especially when running in gitlab CI for some reason. In the latest version, which seems to work OK, we have a signal handler that catches sigterms from the parent process and then makes sure that bisect's at_exit hook is finally called. It does seem like I need to mask sigterm in bisect's at_exit hook, though. You can read more about this here.
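
For reference, a handler of that shape could look roughly like the sketch below (simplified, not our actual code; it assumes the unix library is linked and that calling exit is enough to run bisect's at_exit dump):

    (* Simplified sketch of a SIGTERM handler that lets Bisect's at_exit
       coverage dump run before the process terminates. *)
    let install_sigterm_handler () =
      Sys.set_signal Sys.sigterm
        (Sys.Signal_handle (fun _signal ->
          (* Block further SIGTERMs so the at_exit hooks are not
             interrupted while the coverage file is being written. *)
          ignore (Unix.sigprocmask Unix.SIG_BLOCK [Sys.sigterm]);
          (* [exit] runs the at_exit hooks, then terminates. *)
          exit 0))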

Since we need to both install signal handlers and modify bisect's runtime for each test, I think I'll try to implement the --sigterm option you propose in #384 : this would give us a handier solution if similar problems pop up in the future.

Regarding the third set of tests, as long as the testing framework terminates the binaries gracefully, and as long as the terminated binary has a sigterm handler in place that does nothing crazy, it seems like no modifications are needed.

Our goal is to never have any corrupted traces written, to make sure each launched process writes a trace, and to fail fast when this is not the case. We figure this will give users more confidence in the coverage reports. We're just hoping this will not make our CI less robust and that we won't have to debug tests corrupting traces all the time.

aantron commented Nov 18, 2021

Thanks for that! I'm going to close this issue as likely unnecessary, since it seems it can be addressed by trying not to write corrupted coverage files. If that turns out to still not be practical, this issue can be revisited.

aantron closed this as completed Nov 18, 2021