Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hypothesis Hijacking of Coverage Collector Breaks Coverage of Unexecuted Files #1085

Closed
theaeolianmachine opened this issue Jan 24, 2018 · 9 comments
Labels
bug something is clearly wrong here

Comments

@theaeolianmachine
Copy link

Hey Hypothesis contributors - thanks for contributing to such a useful library.

Recently we were really befuddled that after using coverage for a long time, it was no longer respecting the --source flag, which searches for unexecuted Python files to include in the coverage report. For a team that's working up its code coverage, it's pretty essential to get a realistic picture of the code base.

However, when you use hypothesis in any of the tests, it only reports on files that are hit by tests, over inflating this number. Note this is specifically with coverage + py.test + hypothesis. I spent a decent bit of time debugging this, and found the following:

In collector.py in coverage, the Collector class has a save_data method.

Of note is that this function returns False if no activity has happened on any of its tracers, but returns True otherwise.

Hypothesis, in core.py has a hijack_collector function in StateForActualGivenExecution which adds on top of it. Of note is that this function does not return anything - it ignores the bool value returned from coverage's Collector.save_data, and being Python, None is technically returned.

Unfortunately, this breaks coverage's Coverage.get_data function which relies on it here because this conditional statement never gets run:

        if self.collector.save_data(self.data):
            self._post_save_work()

Because self.collector.save_data, being overwritten by Hypothesis, doesn't return a bool, and instead returns None.

I'm not sure what the best fix for hypothesis is, but a reasonable guess in hijack_collector would be:

    def hijack_collector(self, collector):  # pragma: no cover
        self.__existing_collector = collector
        original_save_data = collector.save_data

        def save_data(covdata):
            data_saved = original_save_data(covdata) # Save whether there was any activity
            if data_saved: # Only write hypothesis data if data was saved in the first place?
                if collector.branch:
                    covdata.add_arcs({
                        filename: {
                            arc: None
                            for arc in self.coverage_data.arcs(filename)}
                        for filename in self.files_to_propagate
                    })
                else:
                    covdata.add_lines({
                        filename: {
                            line: None
                            for line in self.coverage_data.lines(filename)}
                        for filename in self.files_to_propagate
                    })
            collector.save_data = original_save_data
            return data_saved # Make sure to return the correct boolean.

        collector.save_data = save_data

However, given my time investment up to this point, I'll leave it to you all to figure out the best way to fix this.

Thanks! Let me know if you all have any questions.

@Zac-HD Zac-HD added the bug something is clearly wrong here label Jan 24, 2018
@Zac-HD
Copy link
Member

Zac-HD commented Jan 24, 2018

Thanks for letting us know - this is a great bug report. 🎉

Based on a quick skim, any fix will require careful thought and testing to avoid breaking something else - there are some stateful parts in the way it clears saved data that could cause trouble.

If you could provide a small example of code+tests with this problem, that would make debugging a lot easier and let us know whether a fix actually resolves the issue 😄

@theaeolianmachine
Copy link
Author

No worries, feel free to fork this into HypothesisWorks!

https://github.com/theaeolianmachine/hypothesis_1085_repro/

@DRMacIver
Copy link
Member

Thanks for the great bug report seconded!

I'll try to look at this in the next week or so. I think your suggested fix looks good, but the more I look at the original code the more I'm suspicious of what it's doing (I mean beyond the basics that it's an obvious awful hack), so this is going to require some care.

@theaeolianmachine
Copy link
Author

Of course! Yeah I can't properly explain what my face looked like when I was looking at a function in coverage that clearly returns True or False, and instead it was actually returning None...and later pdb showed me I was in a function that started with "hijack" haha.

@DRMacIver
Copy link
Member

DRMacIver commented Jan 24, 2018

Yeah I can't properly explain what my face looked like when I was looking at a function in coverage that clearly returns True or False, and instead it was actually returning None...and later pdb showed me I was in a function that started with "hijack" haha.

😳 sorry. There's not really a good way to do what Hypothesis needs to do there right now, so I kinda just rammed it in. I have some open issues and an IOU to Ned to try and sort out a better API for this.

@theaeolianmachine
Copy link
Author

Haha no worries at all, I definitely laughed, and I absolutely get working with what you have - if anything, I appreciate the honesty as it made more sense when debugging it as to where I was! Definitely appreciate the time and investment you all put into the library.

@Zac-HD
Copy link
Member

Zac-HD commented Jul 15, 2018

The upstream issues for a better API are nedbat/coveragepy#603 and nedbat/coveragepy#604 - we're currently using internals, which is bad in principle and also harder in practice than a public API.
That's the proper fix for the ecosystem, but it would be nice to patch this sooner.


Therefore, a good first contribution would be to ensure that the return value from original_save_data(covdata) is returned from the calling function too, here:

def hijack_collector(self, collector): # pragma: no cover
self.__existing_collector = collector
original_save_data = collector.save_data
def save_data(covdata):
original_save_data(covdata)
if collector.branch:
covdata.add_arcs({
filename: {
arc: None
for arc in self.coverage_data.arcs(filename) or ()}
for filename in self.files_to_propagate
})
else:
covdata.add_lines({
filename: {
line: None
for line in self.coverage_data.lines(filename) or ()}
for filename in self.files_to_propagate
})
collector.save_data = original_save_data
collector.save_data = save_data

(and pssst, @DRMacIver, any idea why the new save_data function replaces itself with the original version the first time it's called?)

@garyd203
Copy link
Contributor

I have added a PR #1538 that has a test case for the coverage collector mistake, and also implements the simple solution proposed by @theaeolianmachine and @Zac-HD .

Do we want to leave this issue open pending a more complete refactor?

@Zac-HD
Copy link
Member

Zac-HD commented Aug 28, 2018

With the basic problem fixed and a regression test in place I'm happy to close this 😄

garyd203 added a commit to garyd203/hypothesis that referenced this issue Aug 28, 2018
Zac-HD pushed a commit to garyd203/hypothesis that referenced this issue Aug 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something is clearly wrong here
Projects
None yet
Development

No branches or pull requests

4 participants