Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename SetNmAndUqTags and fix #622 (MergeBamAlignment MD tag). #636

Merged
merged 1 commit into from Sep 19, 2016

Conversation

nh13
Copy link
Collaborator

@nh13 nh13 commented Aug 20, 2016

@yfarjoun @tfenne

  1. Rename SetNmAndUqTags to SetNmMDAndUqTags.
    WIP
  2. Fixes: Update the NM and MD tags when clipping overlapping inserts in #622

Fixes #634 as an alternate to reverting (#635).

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 69.24% when pulling 9d2018a on nh_update_md_and_nm into f4fcb3e on master.

final boolean calculateMd = rec.getStringAttribute(SAMTag.MD.name()) != null;
if (calculateNm || calculateMd) {
final byte[] referenceBases = this.refSeq.get(this.refSeq.getSequenceDictionary().getSequenceIndex(rec.getReferenceName())).getBases();
SequenceUtil.calculateMdAndNmTags(rec, referenceBases, calculateNm, calculateMd);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this code that you removed calculateNm, calculateMd were reversed!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it. I really like scala where we can have named params. Sigh.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. especially important for booleans. but all params could benefit. not sure why java don't have it too....

@coveralls
Copy link

Coverage Status

Coverage increased (+0.03%) to 69.289% when pulling 6463c53 on nh_update_md_and_nm into f4fcb3e on master.

@yfarjoun
Copy link
Contributor

if I understand correctly, this reverts the fixing of tags when the cigar changes, but includes MD in the fixTags CLP. Should we purge MD NM and UQ when we change the cigar? At-least we will not be leaving around incorrect information....

@nh13
Copy link
Collaborator Author

nh13 commented Aug 20, 2016

@yfarjoun: If downstream users are impacted (i.e. they want MD tags but no longer have them), then they have been relying on potentially the wrong values. I vote for removing NM/MD/UQ when we clip overlapping reads.

Also, should we do the same for clipping adapters (see updateCigarForTrimmedOrClippedBases and this.clipAdapters)?

public class SetNmAndUqTags extends CommandLineProgram {
static final String USAGE_SUMMARY = "Fixes the UQ and NM tags in a SAM file. ";
static final String USAGE_DETAILS = "This tool takes in a SAM or BAM file (sorted by coordinate) and calculates the NM and UQ tags by comparing with the reference."+
public class SetNmMDAndUqTags extends CommandLineProgram {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to drive me crazy that it's Nm, Uq and MD in the name. Can you change to Md please?

@tfenne
Copy link
Collaborator

tfenne commented Aug 20, 2016

@nh13 @yfarjoun Agreed that MergeBamAlignment should probably remove tags rather than output invalid values. The other option is to add back @nh13's change but have it use either the indexed fasta file (potentially loading the same region many times) or load the entire reference into memory (requiring more memory).

@yfarjoun
Copy link
Contributor

that would require a new type of ReferenceFile reader, no?

On Sat, Aug 20, 2016 at 5:24 AM, Tim Fennell notifications@github.com
wrote:

@nh13 https://github.com/nh13 @yfarjoun https://github.com/yfarjoun
Agreed that MergeBamAlignment should probably remove tags rather than
output invalid values. The other option is to add back @nh13
https://github.com/nh13's change but have it use either the indexed
fasta file (potentially loading the same region many times) or load the
entire reference into memory (requiring more memory).


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#636 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ACnk0ptUenR033on5rmpL9wnQw8ju-nWks5qhsfZgaJpZM4Jo-Dm
.

@nh13
Copy link
Collaborator Author

nh13 commented Aug 23, 2016

@yfarjoun I think I will remove the tags, and in the case folks output queryname, the old behavior stands. I will add some documentation somewhere about this too.

@nh13 nh13 self-assigned this Aug 23, 2016
@yfarjoun
Copy link
Contributor

but even if outputting in queryname the tags will be incorrect...so why not
remove them from the reads that are clipped?

On Tue, Aug 23, 2016 at 2:34 PM, Nils Homer notifications@github.com
wrote:

@yfarjoun https://github.com/yfarjoun I think I will remove the tags,
and in the case folks output queryname, the old behavior stands. I will add
some documentation somewhere about this too.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#636 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ACnk0kp__OdulpB9mvhUqpjn68QZJkFzks5qiz0agaJpZM4Jo-Dm
.

@nh13
Copy link
Collaborator Author

nh13 commented Aug 26, 2016

@yfarjoun see my latest changes. I remove the NM/MD/UQ tags for reads that are clipped due to adapters or overlapping bases. I also added some doc to MergeBamAlignment.

@nh13 nh13 assigned yfarjoun and unassigned nh13 Aug 26, 2016
@@ -477,7 +476,7 @@ public void mergeAlignment(final File referenceFasta) {

for (final SAMRecord rec : sink.sorter) {
if (!rec.getReadUnmappedFlag() && refSeq != null) {
fixNMandUQ(rec, refSeq, bisulfiteSequence);
fixNmMdAndUQ(rec, refSeq, bisulfiteSequence);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UQ -> Uq

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@yfarjoun
Copy link
Contributor

can I haz A small test that shows that the tags are actually removed?

@yfarjoun
Copy link
Contributor

back to you @nh13

@yfarjoun
Copy link
Contributor

why assigning me here? this has some work to be done:

  • tests are not passing
  • need to add a small test
  • add a deprecated wrapper with the old name?

Thanks! :-)

@yfarjoun yfarjoun removed their assignment Aug 29, 2016
@nh13
Copy link
Collaborator Author

nh13 commented Aug 29, 2016

@yfarjoun you need to un-assign yourself even if you assign someone else. Multiple folks can be assigned an issue now.

@nh13
Copy link
Collaborator Author

nh13 commented Sep 6, 2016

@yfarjoun added the test and rebased onto master. After a final 👍 this is ready to go.

@nh13 nh13 assigned yfarjoun and unassigned nh13 Sep 6, 2016
@coveralls
Copy link

Coverage Status

Coverage increased (+0.05%) to 69.394% when pulling 57c0ac1 on nh_update_md_and_nm into 302da16 on master.


private final Log log = Log.getInstance(SetNmMdAndUqTags.class);

public static void main(final String[] argv) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this main still needed with the new system?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but I am going to leave it here as other classes have it; cleaning this up should be in all classes could be in another PR.


final ReferenceSequenceFileWalker refSeq = new ReferenceSequenceFileWalker(REFERENCE_SEQUENCE);

StreamSupport.stream(reader.spliterator(),false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra space missing here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aligning the line, and after comma.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@yfarjoun
Copy link
Contributor

sorry for the delay. thanks!! 👍

1. Rename SetNmAndUqTags to SetNmMDAndUqTags.
2. Fixes: #622
@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 70.007% when pulling 0ea9d4c on nh_update_md_and_nm into e7d6c8f on master.

@nh13 nh13 merged commit 915ffa7 into master Sep 19, 2016
@nh13 nh13 deleted the nh_update_md_and_nm branch September 19, 2016 02:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants