Fixes for two critical bugs in KOfam annotation step #95
This PR addresses the bugs described in issue #94.
To ensure the correct bit score threshold is used for distinguishing between strong and weak hits in the `hmmer_filter()` function, we've moved the code for obtaining the current model's threshold to within the loop that iterates over each hit from the HMMER results. To ensure that a numerical comparison is used to identify the best match to a given gene in the `best_match_selector()` function, we've added explicit float conversions to the bit scores loaded from the intermediate file of HMM hits.

To confirm that these fixes resolve the bugs, we ran the code from this branch on our test genome from issue #94, Bradyrhizobium manausense BR3351 (NCBI RefSeq GCF_001440035.1). We analyzed the results with the scripts (also provided in issue #94) that count the number of false positives (from bug 1) and incorrect 'best matches' (from bug 2), respectively. In both cases, the number of errors with the fixed code was 0.
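For readers who haven't looked at the diff, here is a minimal sketch of the two patterns described above. It is not the code from this PR: the function names `filter_hits` and `select_best_matches`, the tuple layouts, and the example model IDs and scores are all hypothetical, and Python is assumed as the language. It only illustrates why the threshold lookup belongs inside the per-hit loop and why bit scores read from a text file should be converted to `float` before they are compared.

```python
# Hypothetical illustration only; names and data layouts are assumptions,
# not the project's actual hmmer_filter() / best_match_selector() code.

def filter_hits(hits, thresholds):
    """Split hits into strong and weak using each hit's own model threshold.

    hits: list of (gene, model, bit_score) tuples (assumed layout).
    thresholds: dict mapping model name to its bit score threshold (assumed).
    """
    strong, weak = [], []
    for gene, model, score in hits:
        # Look up the threshold inside the loop so that it always belongs
        # to the model of the hit currently being examined.
        threshold = thresholds[model]
        if score >= threshold:
            strong.append((gene, model, score))
        else:
            weak.append((gene, model, score))
    return strong, weak


def select_best_matches(rows):
    """Keep the highest-scoring model for each gene.

    rows: list of (gene, model, bit_score_text) tuples, with the score still
    a string, as it would be after reading a text file (assumed layout).
    """
    best = {}
    for gene, model, score_text in rows:
        # Convert explicitly so the comparison is numeric, not lexicographic.
        score = float(score_text)
        if gene not in best or score > best[gene][1]:
            best[gene] = (model, score)
    return best


# Same bit score, different per-model thresholds.
hits = [("gene1", "K00001", 120.0), ("gene1", "K00002", 120.0)]
thresholds = {"K00001": 100.0, "K00002": 150.0}
strong, weak = filter_hits(hits, thresholds)

# As strings, "95.2" sorts above "120.5"; as floats it does not.
rows = [("gene1", "K00001", "95.2"), ("gene1", "K00002", "120.5")]
best = select_best_matches(rows)
```

In this sketch, the same bit score is a strong hit for one model and a weak hit for the other, because each hit is checked against its own model's threshold. In the second function, comparing the raw strings would pick the 95.2 hit over the 120.5 hit, since string comparison goes character by character.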