New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to avoid coverage drift? #118

Closed
kbaum opened this Issue Sep 17, 2018 · 7 comments

Comments

Projects
None yet
2 participants
@kbaum
Collaborator

kbaum commented Sep 17, 2018

This is more of an idea than an issue. If we include the checksum as the key for each file within the redis store, we would remove the need for the user clearing the store. For example:

Digest::MD5.file('app/models/user.rb').hexdigest

This would also remove confusion around a files coverage not being accurate when the file has been changed but the store has not been cleared.

In order to avoid stale files hanging out forever within redis, we could also just supply a high default ttl and the stale files should automatically be cleaned up by redis.

@danmayer

This comment has been minimized.

Owner

danmayer commented Sep 18, 2018

interesting idea, we do have a ttl option but it defaults to forever... so something like 1 year as a high default would work...

I like the idea, I had been considering namespacing by deployed git hash for a while, but this in some ways is better as it doesn't impact files that weren't changed between deployments.

@danmayer danmayer changed the title from Include checksum as part of key for each file in store to How to avoid coverage drift? Nov 3, 2018

@danmayer

This comment has been minimized.

Owner

danmayer commented Nov 3, 2018

updated title from "Include checksum as part of key for each file in store"

@danmayer

This comment has been minimized.

Owner

danmayer commented Nov 3, 2018

I rephrased this as I think there are a couple different things that could be helpful.

One thing was that clearing Coverband coverage can be confusing as people don't know or trust how long the data has been there... The most often reason for clearing is because of the coverage drift where line numbers don't match the captured data anymore.

While the checksum solution solves the drift issue, it does cause a slightly different trust issue. Basically, that a frequently changed file might no have a very long history showing usage... So it could look like it isn't used if it was just part of a not often run cron job or something like that.

I guess in a perfect world we would really have line number level MD5 and only throw out coverage when the actual line has moved / changed. This is not realistic.

Ideas to help with understanding the time range that the coverage data is valid for, and to avoid the drift.

  • store some metadata like when the file first started to record coverage data...
  • tie this with the MD5 idea

update the UI so for any file you would have something like

app/controllers/users_controller.rb version XYZ first seen on 09-15-2018 08:05am, last coverage updated on 10-03-2018 08:05am

In theory, one could actually users see all versions of a file and navigate back and forth... seeing coverage change over file history... over being able to generate a coverage report based on time range... Show me coverage data between date X and Y.

I had been considering similar ideas based on git hashes at time of deploy... so you could query coverage based on related git versions or release time... but I do think the MD5 version has advantages as it wouldn't be tied to any code repository or deployment methods.

The benchmarks you shared @kbaum make it seem reasonably fast... I think when first introduced it still probably makes sense to do as an optional feature.

Related: I believe one still might want to be able to clear all coverage data, as changing routes or controllers could mean downstream models no longer receive coverage... That being said if we add the metadata about coverage time alongside the MD5 hash, this is where coverage queries over time could be very powerful... Never clear, but query the time of interest and know that MD5 shows only hits on most recent versions of files.

@kbaum

This comment has been minimized.

Collaborator

kbaum commented Nov 4, 2018

Love the idea of tying the md5 change with updating the UI to display "first seen" and "coverage last updated". Showing previous versions of a file would also be a great feature though not a must have for the first release.

Seeing coverage data over time could be useful eventually but not sure it's worth the complexity it introduces. I would imagine the size of the data stored in redis would greatly increase since we would need to store snapshots of data.

I agree that one might want to clear coverage data but with the features discussed hear, it feels like the exception.

@danmayer

This comment has been minimized.

Owner

danmayer commented Nov 4, 2018

Agreed on avoiding complexity at first, I think MD5 and metadata but only showing the current version would be a good first version of this. Showing old versions introduces some complexity on the code viewing side of the report so that might not ever be worth adding.

@kbaum

This comment has been minimized.

Collaborator

kbaum commented Nov 15, 2018

@danmayer guess we can close this one?

@danmayer

This comment has been minimized.

Owner

danmayer commented Nov 16, 2018

yeah just added the notes about putting meta data in the views later into the future changes file...

@danmayer danmayer closed this Nov 16, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment