Chimeric alignment scores inconsistently checked #723

Closed
mengxiao opened this issue Aug 26, 2019 · 3 comments
Labels
issue: code (likely to be an issue with STAR code), resolved (problem or issue that has been resolved)

Comments

@mengxiao

mengxiao commented Aug 26, 2019

In STAR 2.7.2a, I believe there's a bug in chimeric alignment multimapper detection wherein the scores of candidate alignments are not consistently checked against all criteria.

For example, I observe the following line in Chimeric.out.junction:

6     91012516        +       2       89865676        +       2       0       0       NB501164:392:HHK3HBGX2:1:11106:2376:12280       91012498        18M20S  89865677        18S20M5372p10S28M       1       76      41      52      52      0

The default value of 20 was kept for --chimScoreDropMax, so the minimum acceptable chimeric alignment score is 76 - 20 = 56, yet this alignment is scored at 52.
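To make the arithmetic explicit, here is a minimal sketch of the threshold check. The helper names are hypothetical and are not STAR functions, and the inclusive comparison is an assumption based on the "minimum acceptable score" wording above.

```cpp
#include <cassert>

// Hypothetical helper (not a STAR function): the lowest chimeric score that
// passes a --chimScoreDropMax style filter. Assumes an inclusive bound.
int minAcceptableChimScore(int maxPossibleScore, int chimScoreDropMax) {
    assert(chimScoreDropMax >= 0);  // a negative allowed drop is not meaningful here
    return maxPossibleScore - chimScoreDropMax;
}

// Hypothetical helper: true when `score` is within the allowed drop
// from the best possible score.
bool withinScoreDrop(int score, int maxPossibleScore, int chimScoreDropMax) {
    return score >= minAcceptableChimScore(maxPossibleScore, chimScoreDropMax);
}
```

With the values from the line above, `minAcceptableChimScore(76, 20)` gives 56 and `withinScoreDrop(52, 76, 20)` is false, so under this reading the alignment should have been filtered out.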

This issue arises in the loop that identifies the best possible chimeric alignment. Specifically, at ChimericDetection_chimericDetectionMult.cpp#L67, four criteria are checked:

  1. score exceeds that of best non-chimeric alignment
  2. score is within allowable range of best possible alignment score (i.e. read length)
  3. score exceeds minimum allowable chimeric score
  4. score is acceptably close to best chimeric alignment score

However, after stitching, which may reduce the score, only condition 1 is rechecked. If the stitched alignment's score exceeds the current high-water mark but fails condition 2 or 3, later code can emit alignments that the parameters should have excluded, because that code checks only condition 4 when counting chimeric multimapping alignments or outputting chimeric junctions.
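The idea of applying the same four conditions both before and after stitching can be sketched roughly as follows. This is an illustration only and is not the code in #722. The struct, field, and function names are all hypothetical, and the exact comparisons (strict versus inclusive) are assumptions.

```cpp
#include <cassert>

// Hypothetical parameter bundle; field names are illustrative and do not
// mirror STAR's internal structures.
struct ChimScoreParams {
    int bestNonChimericScore;  // condition 1: score must exceed this
    int maxPossibleScore;      // condition 2: best possible score (read length)
    int scoreDropMax;          // condition 2: allowed drop from the best possible score
    int scoreMin;              // condition 3: minimum allowable chimeric score
    int multimapScoreRange;    // condition 4: allowed distance from the best chimeric score
};

// Checks all four conditions for one candidate score.
bool passesAllCriteria(int score, int bestChimericScore, const ChimScoreParams& p) {
    assert(p.scoreDropMax >= 0 && p.multimapScoreRange >= 0);
    return score > p.bestNonChimericScore                     // condition 1
        && score + p.scoreDropMax >= p.maxPossibleScore       // condition 2
        && score >= p.scoreMin                                // condition 3
        && score + p.multimapScoreRange >= bestChimericScore; // condition 4
}

// Hypothetical candidate evaluation: the full check runs on the score both
// before and after stitching, since stitching may lower the score.
bool acceptCandidate(int preStitchScore, int postStitchScore,
                     int& bestChimericScore, const ChimScoreParams& p) {
    if (!passesAllCriteria(preStitchScore, bestChimericScore, p)) return false;
    if (!passesAllCriteria(postStitchScore, bestChimericScore, p)) return false;
    // Only a candidate that passed every condition may raise the high-water mark.
    if (postStitchScore > bestChimericScore) bestChimericScore = postStitchScore;
    return true;
}
```

For instance, with `ChimScoreParams p{41, 76, 20, 0, 1}` and an invented pre-stitch score of 58, a post-stitch score of 52 is rejected by condition 2 and the high-water mark is left unchanged, whereas rechecking only condition 1 would have let it through.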

I have attempted a fix for this in #722, but I'm not sure I fully understand the intended behavior. @brianjohnhaas would you be willing to chime in? Please note that this PR includes the change I suggested in #721, which addresses a memory leak but should not change outputs.

After rerunning with the suggested patch, the line reported above no longer appears in Chimeric.out.junction. Most of the outputs are identical. A non-trivial number of alignments are now removed, and a small number of additional alignments are output for reads where, with consistent filtering, the cap on the number of multimappers is no longer exceeded.

Thanks!

@brianjohnhaas

Excellent find and fix!

alexdobin added the "resolved" label (problem or issue that has been resolved) Aug 29, 2019
@alexdobin

Thanks a lot, Meng Xiao.
Release 2.7.2b is out.

Cheers
Alex

@mengxiao

Thanks, Alex!

@BiocondaBot autobump star
