De-increment nincluded/nreported after merge #47
Codecov Report
@@ Coverage Diff @@
## master #47 +/- ##
==========================================
+ Coverage 76.52% 76.65% +0.12%
==========================================
Files 7 7
Lines 6965 6968 +3
==========================================
+ Hits 5330 5341 +11
+ Misses 1635 1627 -8
You can actually set
Close this in favor of linking to the fixed hmmer library?
Nah, until a new numbered release of HMMER3 I'm not gonna bump the local version, so a temporary patch is welcome. It would be great if you could update the code as suggested above and add a reminder to remove the patch later, similar to: Lines 7881 to 7882 in e8f70f8
Thanks 🙏
@althonos Follow-on bug and a question for you. I am attempting to write a test but realize that users don't have read access to I thought instead to expose nincluded/nreported for all the hits in the state dictionary. Is this okay to add here?
    for i in range(self._th.N):
        offset = (<ptrdiff_t> self._th.hit[i] - <ptrdiff_t> &self._th.unsrt[0]) // sizeof(P7_HIT)
        hits.append(offset)
        hits_nreported.append(self._th.hit[i].nreported)
        hits_nincluded.append(self._th.hit[i].nincluded)
I've done this locally in a branch and found that, actually, to my surprise that
    for i, offset in enumerate(state["hit"]):
        self._th.hit[i] = &self._th.unsrt[offset]
        self._th.hit[i].nreported = state["hits_nreported"][i]
        self._th.hit[i].nincluded = state["hits_nincluded"][i]
Is this desirable to have, or do you want another way to return/set this data other than the state dictionary?
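The round trip discussed above can be sketched in plain Python. This is only a toy model, not the Cython code in the branch: `ToyHit` and `ToyTopHits` are hypothetical stand-ins for `P7_HIT` and the wrapped `P7_TOPHITS`, and the sorted view stores list indices where the real code stores pointer offsets. The `hit`, `hits_nreported` and `hits_nincluded` keys follow the snippets above; the `unsrt` key is an assumption added so the model is self-contained.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToyHit:
    """Hypothetical stand-in for a P7_HIT; only the two counters matter here."""
    name: str
    nreported: int = 0
    nincluded: int = 0


class ToyTopHits:
    """Toy model: `unsrt` is the storage, `hit` is the sorted view into it."""

    def __init__(self, unsrt: List[ToyHit]) -> None:
        self.unsrt = unsrt
        # The sorted view holds indices into `unsrt` (the "offsets" above).
        self.hit = list(range(len(unsrt)))

    def __getstate__(self) -> Dict[str, Any]:
        # Export the counters in sorted-view order, alongside the offsets.
        return {
            "unsrt": [h.name for h in self.unsrt],
            "hit": list(self.hit),
            "hits_nreported": [self.unsrt[o].nreported for o in self.hit],
            "hits_nincluded": [self.unsrt[o].nincluded for o in self.hit],
        }

    def __setstate__(self, state: Dict[str, Any]) -> None:
        self.unsrt = [ToyHit(name) for name in state["unsrt"]]
        self.hit = list(state["hit"])
        # Restore the counters through the sorted view, as in the loop above.
        for i, offset in enumerate(self.hit):
            hit = self.unsrt[offset]
            hit.nreported = state["hits_nreported"][i]
            hit.nincluded = state["hits_nincluded"][i]


original = ToyTopHits([ToyHit("a", nreported=2, nincluded=1), ToyHit("b", nreported=3)])
original.hit = [1, 0]  # pretend sorting reordered the view

restored = ToyTopHits.__new__(ToyTopHits)
restored.__setstate__(original.__getstate__())
print([restored.unsrt[o].nreported for o in restored.hit])  # [3, 2]
```

A test along these lines would fail if the counters were dropped from the state, which is the property the question above is trying to check.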
339f3af to b4a72d0
You can get the That's indeed a problem if the attributes are not reset after a
Actually that should be handled by
Looks like my updated test caught another bug in the number of included domains when merging with an empty TopHits object (and maybe more generally?) from my branch:
It looks like an empty TopHits gets initialized with domain-score inclusion thresholds that differ from the Pipeline defaults:
    empty.incdomT
    # 0.0
    empty_merged.incdomT
    # 0.0
    hits.incdomT
    # None
I understand this is an extreme edge case and the test could probably just be removed, but I'm noting it here because it looks like merging TopHits objects with different inclusion/reporting cutoffs isn't being explicitly handled, and maybe that should raise an error?
Good catch! Merging TopHits with different parameters should indeed be disallowed. But I'm still thinking about how to fix this when TopHits are created from the Python constructor; I can't think of a simple solution...
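One possible shape for such a guard, as a sketch only: `Cutoffs` and `check_mergeable` are hypothetical names, and of the fields below only `incdomT` appears in this thread (the others are assumed for illustration). It does not address the open question of what an empty, constructor-created TopHits should carry as its cutoffs.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Cutoffs:
    """Hypothetical bundle of the thresholds a set of hits was built with."""
    # Only `incdomT` is taken from the thread; the rest are assumed names.
    incT: Optional[float] = None
    incdomT: Optional[float] = None
    T: Optional[float] = None
    domT: Optional[float] = None


def check_mergeable(a: Cutoffs, b: Cutoffs) -> None:
    """Raise ValueError if the two sides were thresholded differently."""
    mismatched = [
        f.name for f in fields(Cutoffs)
        if getattr(a, f.name) != getattr(b, f.name)
    ]
    if mismatched:
        raise ValueError(
            "cannot merge hits with different cutoffs: " + ", ".join(mismatched)
        )


# Same situation as reported above: 0.0 on the empty side, None on the other.
try:
    check_mergeable(Cutoffs(incdomT=0.0), Cutoffs(incdomT=None))
except ValueError as err:
    print(err)  # cannot merge hits with different cutoffs: incdomT
```

Note that a strict comparison like this would also reject the empty-object merge from the failing test, so the empty case would need either matching defaults or a special case.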
@@ -71,6 +71,8 @@ def assertHitEqual(self, h1, h2):
"attribute {!r} differs".format(attr)
)
self.assertEqual(len(h1.domains), len(h2.domains))
self.assertEqual(len(h1.domains.reported), len(h2.domains.reported))
# self.assertEqual(len(h1.domains.included), len(h2.domains.included))
tests start failing if I uncomment this line
Actually again an issue with HMMER, in p7_tophits_Threshold:
if (p7_pli_DomainReportable(pli, th->hit[h]->dcl[d].bitscore, th->hit[h]->dcl[d].lnP))
th->hit[h]->dcl[d].is_reported = TRUE;
if ((th->hit[h]->flags & p7_IS_INCLUDED) &&
p7_pli_DomainIncludable(pli, th->hit[h]->dcl[d].bitscore, th->hit[h]->dcl[d].lnP))
th->hit[h]->dcl[d].is_included = TRUE;
This means that if a domain was previously included/reported, calling p7_tophits_Threshold again with a different threshold will not exclude it; it can only include/report new domains. Changing it to:
th->hit[h]->dcl[d].is_reported = p7_pli_DomainReportable(pli, th->hit[h]->dcl[d].bitscore, th->hit[h]->dcl[d].lnP);
th->hit[h]->dcl[d].is_included = (th->hit[h]->flags & p7_IS_INCLUDED) && p7_pli_DomainIncludable(pli, th->hit[h]->dcl[d].bitscore, th->hit[h]->dcl[d].lnP);
should fix it :)
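The difference between the two shapes can be reproduced with a small Python model. This is a sketch, not HMMER code: `ToyDomain` is hypothetical, and a plain `bitscore >= min_score` comparison is assumed as a stand-in for `p7_pli_DomainReportable`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyDomain:
    """Hypothetical stand-in for one entry of a hit's domain list."""
    bitscore: float
    is_reported: bool = False


def threshold_sticky(domains: List[ToyDomain], min_score: float) -> None:
    """Same shape as `if (...) flag = TRUE;`: the flag is never cleared."""
    for d in domains:
        if d.bitscore >= min_score:
            d.is_reported = True


def threshold_assign(domains: List[ToyDomain], min_score: float) -> None:
    """Same shape as `flag = predicate(...);`: the flag is set or cleared."""
    for d in domains:
        d.is_reported = d.bitscore >= min_score


sticky = [ToyDomain(10.0), ToyDomain(30.0)]
threshold_sticky(sticky, 5.0)    # permissive pass reports both domains
threshold_sticky(sticky, 20.0)   # stricter pass cannot un-report the first
print([d.is_reported for d in sticky])    # [True, True]

assigned = [ToyDomain(10.0), ToyDomain(30.0)]
threshold_assign(assigned, 5.0)
threshold_assign(assigned, 20.0)  # stricter pass clears the first flag
print([d.is_reported for d in assigned])  # [False, True]
```

The same reasoning applies to the `is_included` flag, with the extra condition on the hit-level flag shown in the C snippet.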
Okay, thanks a lot 👍
Stub of a fix for #46. It would probably be better if there was a true fix in libhmmer.
Does this look safe to do? I can write a formal test if so.