De-increment nincluded/nreported after merge #47

zdk123 · 2023-07-11T16:30:35Z

Stub of a fix for #46. It would probably be better if there was a true fix in libhmmer.

Does this look safe to do? I can write a formal test if so.

codecov · 2023-07-11T16:34:49Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.12% 🎉

Comparison is base (43e7d76) 76.52% compared to head (0d85341) 76.65%.
Report is 2 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #47      +/-   ##
==========================================
+ Coverage   76.52%   76.65%   +0.12%     
==========================================
  Files           7        7              
  Lines        6965     6968       +3     
==========================================
+ Hits         5330     5341      +11     
+ Misses       1635     1627       -8

Flag	Coverage Δ
v3.10	`76.65% <100.00%> (+0.12%)`	⬆️
v3.11	`76.65% <100.00%> (+0.12%)`	⬆️
v3.7	`76.62% <100.00%> (+0.12%)`	⬆️
v3.8	`76.65% <100.00%> (+0.12%)`	⬆️
v3.9	`76.65% <100.00%> (+0.12%)`	⬆️

Files Changed	Coverage Δ
pyhmmer/plan7.pyx	`73.87% <100.00%> (+0.26%)`	⬆️

althonos · 2023-07-17T15:32:11Z

You can actually set nincluded and nreported to zero for all hits before calling p7_tophits_Threshold; I think the threshold call will then set the counts to the right numbers on its own.
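
A toy model of that suggestion, in plain Python rather than Cython. It assumes, as the linked issue suggests, that thresholding increments the per-hit counters instead of overwriting them, so a second pass double-counts unless the counters are zeroed first. `Hit`, `threshold`, and `reset_counts` are hypothetical stand-ins, not HMMER or pyhmmer names, and the scores are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    # Bit scores of the hit's domains (made-up values in the demo below).
    domain_scores: List[float]
    nreported: int = 0
    nincluded: int = 0


def threshold(hits, report_t, include_t):
    """Assumed behaviour: counters are incremented, never overwritten."""
    for hit in hits:
        for score in hit.domain_scores:
            if score >= report_t:
                hit.nreported += 1
            if score >= include_t:
                hit.nincluded += 1


def reset_counts(hits):
    """Zero the per-hit counters so a fresh threshold pass starts clean."""
    for hit in hits:
        hit.nreported = 0
        hit.nincluded = 0


hits = [Hit([30.0, 12.0]), Hit([8.0])]
threshold(hits, report_t=10.0, include_t=25.0)
first_pass = [(h.nreported, h.nincluded) for h in hits]

# Thresholding again without a reset double-counts.
threshold(hits, report_t=10.0, include_t=25.0)
double_counted = [(h.nreported, h.nincluded) for h in hits]

# Resetting first restores the expected counts.
reset_counts(hits)
threshold(hits, report_t=10.0, include_t=25.0)
after_reset = [(h.nreported, h.nincluded) for h in hits]
```

In this model `after_reset` matches `first_pass`, while `double_counted` does not, which is the behaviour the temporary patch is meant to work around.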

zdk123 · 2023-07-18T21:29:07Z

Close this in favor of linking to the fixed hmmer library?

althonos · 2023-07-18T22:34:26Z

Nah, until a new numbered release of HMMER3 I'm not gonna bump the local version, so a temporary patch is welcome. It would be great if you could update the code as suggested above and add a reminder to remove the patch later, similar to:

pyhmmer/pyhmmer/plan7.pyx, lines 7881 to 7882 in e8f70f8:

    # TODO(@althonos): Replace with `p7_tophits_Clone` as implemented
    #                  in EddyRivasLab/hmmer#273 when formally released.

Thanks 🙏

zdk123 · 2023-07-20T16:09:47Z

@althonos Follow-on bug and a question for you.

I am attempting to write a test but realized that users don't have read access to tophits._th.hit; it only gets exposed by the internal HMMER domain table writer.

I thought instead to expose nincluded/nreported for all the hits in the state dictionary.

Is this okay to add here?

        for i in range(self._th.N):
            offset = (<ptrdiff_t> self._th.hit[i] - <ptrdiff_t> &self._th.unsrt[0]) // sizeof(P7_HIT)
            hits.append(offset)
            hits_nreported.append(self._th.hit[i].nreported)
            hits_nincluded.append(self._th.hit[i].nincluded)

I've done this locally in a branch and found, to my surprise, that nreported/nincluded might not be set for each hit after re-serializing in TopHits.__setstate__. I think we need the following addition here:

        for i, offset in enumerate(state["hit"]):
            self._th.hit[i] = &self._th.unsrt[offset]
            self._th.hit[i].nreported = state["hits_nreported"][i]
            self._th.hit[i].nincluded = state["hits_nincluded"][i]

Is this desirable to have, or would you prefer another way to return/set this data than the state dictionary?

althonos · 2023-07-21T21:14:06Z

You can get nreported and nincluded using len(hit.domains.reported) and len(hit.domains.included), because I made these properties return a sized iterator ☺️ These numbers should be read-only because the TopHits instances are immutable on the Python side, ignoring the option to sort the hits.

That's indeed a problem if the attributes are not reset after a __setstate__.
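
To illustrate what a "sized iterator" gives a test, here is a small stand-alone sketch. `SizedView` is a hypothetical class, not pyhmmer's implementation; it only shows how an object can be iterated and also answer `len()` without exposing the underlying counters for writing.

```python
class SizedView:
    """Read-only view over the items that satisfy a predicate."""

    def __init__(self, items, predicate):
        self._items = items
        self._predicate = predicate

    def __iter__(self):
        # Yield only the matching items; the source list is never modified.
        return (item for item in self._items if self._predicate(item))

    def __len__(self):
        # Count by iterating, so the length always agrees with iteration.
        return sum(1 for _ in self)


# Hypothetical domains as (bitscore, is_included) pairs.
domains = [(30.0, True), (12.0, False), (27.5, True)]
included = SizedView(domains, lambda d: d[1])

n_included = len(included)
scores = [score for score, _ in included]
```

A test can then compare `len(...)` of such views on two hits without needing any setter for the counts.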

althonos · 2023-07-22T18:59:40Z

> I've done this locally in a branch and found, to my surprise, that nreported/nincluded might not be set for each hit after re-serializing in TopHits.__setstate__. I think we need the following addition here:

Actually, that should be handled by p7_hit_Deserialize for each hit, I think! So no need to do it in the Cython code.

zdk123 · 2023-07-29T01:36:41Z

Looks like my updated test caught another bug in the number of included domains when merging with an empty TopHits object (and maybe more generally?)

From my branch:

from pyhmmer.plan7 import HMMFile, Pipeline, TopHits
from pyhmmer.easel import SequenceFile

thioesterase = HMMFile("pyhmmer/tests/data/hmms/db/Thioesterase.hmm").read()

with SequenceFile('pyhmmer/tests/data/seqs/938293.PRJEB85.HG003687.faa', digital=True) as seqfile:
    proteins = seqfile.read_block()

pli = Pipeline(thioesterase.alphabet)

hits = pli.search_hmm(thioesterase, proteins)

[len(hit.domains.reported) for hit in hits]
# 1
[len(hit.domains.included) for hit in hits]
# 0

empty = TopHits()
empty_merged = empty.merge(hits)
[len(hit.domains.reported) for hit in empty_merged]
# 1
[len(hit.domains.included) for hit in empty_merged]
# 1

It looks like the empty TopHits gets initialized with a domain-score inclusion threshold that differs from the Pipeline defaults:

empty.incdomT
# 0.0
empty_merged.incdomT
# 0.0
hits.incdomT
# None

I understand this is an extreme edge case and I could probably just remove the test, but I'm noting it here because it looks like merging TopHits objects with different inclusion/reporting cutoffs isn't being explicitly handled, and maybe that should raise an error?

althonos · 2023-07-29T18:27:47Z

Good catch! Merging TopHits with different parameters should indeed be disallowed. But I'm still thinking about how to fix this when TopHits are created from the Python constructor; I can't think of a simple solution...
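
One possible shape for such a guard, sketched in plain Python. Everything here is an assumption for illustration: `Thresholds`, `ToyTopHits`, and the choice of `ValueError` are hypothetical, and the sketch sidesteps the open constructor question by letting a parameterless empty object adopt the parameters of whatever it is merged with. That is one option, not a settled design.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Thresholds:
    incT: Optional[float] = None
    incdomT: Optional[float] = None


@dataclass
class ToyTopHits:
    thresholds: Optional[Thresholds] = None  # None: built empty, no pipeline
    hits: List[str] = field(default_factory=list)

    def merge(self, *others):
        merged = ToyTopHits(self.thresholds, list(self.hits))
        for other in others:
            if merged.thresholds is None:
                # Assumption: an empty, parameterless object adopts the
                # parameters of the first real result it is merged with.
                merged.thresholds = other.thresholds
            elif other.thresholds is not None and other.thresholds != merged.thresholds:
                raise ValueError("cannot merge hits obtained with different thresholds")
            merged.hits.extend(other.hits)
        return merged


a = ToyTopHits(Thresholds(incdomT=None), ["seq1"])
b = ToyTopHits(Thresholds(incdomT=0.0), ["seq2"])

# Merging into an empty object keeps a's thresholds instead of inventing new ones.
empty_merged = ToyTopHits().merge(a)

# Merging results produced with different cutoffs is rejected.
try:
    a.merge(b)
    error = None
except ValueError as exc:
    error = str(exc)
```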

zdk123 · 2023-07-31T14:35:23Z

pyhmmer/tests/test_plan7/test_tophits.py

@@ -71,6 +71,8 @@ def assertHitEqual(self, h1, h2):
                "attribute {!r} differs".format(attr)
            )
        self.assertEqual(len(h1.domains), len(h2.domains))
+        self.assertEqual(len(h1.domains.reported), len(h2.domains.reported))
+#        self.assertEqual(len(h1.domains.included), len(h2.domains.included))


Tests start failing if I uncomment this line.

Actually again an issue with HMMER, in p7_tophits_Threshold:

    if (p7_pli_DomainReportable(pli, th->hit[h]->dcl[d].bitscore, th->hit[h]->dcl[d].lnP))
        th->hit[h]->dcl[d].is_reported = TRUE;
    if ((th->hit[h]->flags & p7_IS_INCLUDED) &&
        p7_pli_DomainIncludable(pli, th->hit[h]->dcl[d].bitscore, th->hit[h]->dcl[d].lnP))
        th->hit[h]->dcl[d].is_included = TRUE;

This means that if a domain was previously included/reported, calling p7_tophits_Threshold again with a different threshold will not exclude it; it can only include/report new domains. Changing it to:

    th->hit[h]->dcl[d].is_reported = p7_pli_DomainReportable(pli, th->hit[h]->dcl[d].bitscore, th->hit[h]->dcl[d].lnP);
    th->hit[h]->dcl[d].is_included = (th->hit[h]->flags & p7_IS_INCLUDED) &&
        p7_pli_DomainIncludable(pli, th->hit[h]->dcl[d].bitscore, th->hit[h]->dcl[d].lnP);

should fix it :)
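
The difference between the two versions is easy to see in a stripped-down Python model. The function names and scores below are invented for illustration; only the set-if-true versus assign pattern mirrors the C code above.

```python
def rethreshold_sticky(domains, cutoff):
    """Original pattern: the flag is only ever switched on."""
    for dom in domains:
        if dom["bitscore"] >= cutoff:
            dom["is_included"] = True


def rethreshold_assign(domains, cutoff):
    """Proposed pattern: the flag is recomputed on every call."""
    for dom in domains:
        dom["is_included"] = dom["bitscore"] >= cutoff


def make_domains():
    return [{"bitscore": 30.0, "is_included": False},
            {"bitscore": 12.0, "is_included": False}]


sticky = make_domains()
rethreshold_sticky(sticky, cutoff=10.0)   # permissive pass: both included
rethreshold_sticky(sticky, cutoff=25.0)   # stricter pass cannot undo it
sticky_flags = [d["is_included"] for d in sticky]

assigned = make_domains()
rethreshold_assign(assigned, cutoff=10.0)
rethreshold_assign(assigned, cutoff=25.0)  # stricter pass excludes 12.0
assigned_flags = [d["is_included"] for d in assigned]
```

With the sticky pattern the 12.0 domain stays included after the stricter pass; with plain assignment it is dropped, which is the behaviour the included-domain test expects.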

althonos · 2023-08-02T07:47:40Z

Okay, thanks a lot 👍
I'm gonna merge as-is, then try to figure out a way to fix the second issue you raised :)

add hits nreported/nincluded to TopHits state data

b4a72d0

zdk123 force-pushed the patch-domain-counts branch from 339f3af to b4a72d0 Compare July 20, 2023 16:14

Zachary Kurtz added 5 commits July 28, 2023 19:21

rollback state setter

7f08a96

add comment

5ae4673

add merge test

8463c75

one more rollback

3daf0c3

update tests

0d85341

althonos merged commit 60776bc into althonos:master Aug 2, 2023
18 checks passed
