Omitting slash mates (/1 or /2) in read names before QNAME truncating #265

Closed
ronin-gw opened this issue Sep 24, 2019 · 3 comments

Comments

@ronin-gw

In printReadName, read names are truncated in the following order:

  1. Omit slash mate (/1 or /2) from the end of a read name
  2. Keep first 255 characters
  3. Ignore characters after the first whitespace

However, read names in some FASTQ files (possibly following the old Illumina format) can have both a slash mate and additional text after it. For example, these are the first records of the paired-end FASTQ files I got from ENCODE:

@BI:SL-HEA:C3BFGACXX:2:2316:10503:45861/1 1:X:0:CATAGCGA
TAGGGTTAGGGTTAGGGTTAGGGTT
+
@@@FFADDFHHCFHJJJCGDHIJGG
@BI:SL-HEA:C3BFGACXX:2:2316:10503:45861/2 1:X:0:CATAGCGA
TAACCCTAACCCTAACCCTAACCCT
+
CCCFFFFFHHHHHJJIJIIJJJJII

So I think the slash mate should be omitted after the other truncations, so that both reads of a pair end up with the same name.
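To make the ordering problem concrete, here is a minimal Python sketch of the two orders of operations. It is not the project's actual code: the function names and the `MAX_QNAME_LEN` constant are hypothetical, and the logic only mirrors the three steps as described in this issue.

```python
MAX_QNAME_LEN = 255  # length limit quoted in the issue


def strip_mate_suffix(name: str) -> str:
    """Drop a trailing /1 or /2, if the name ends with one."""
    if name.endswith(("/1", "/2")):
        return name[:-2]
    return name


def cut_at_whitespace(name: str) -> str:
    """Keep only the characters before the first whitespace."""
    for i, ch in enumerate(name):
        if ch.isspace():
            return name[:i]
    return name


def truncate_reported_order(name: str) -> str:
    # Order described in the issue: suffix, then length, then whitespace.
    name = strip_mate_suffix(name)
    name = name[:MAX_QNAME_LEN]
    return cut_at_whitespace(name)


def truncate_proposed_order(name: str) -> str:
    # Proposed order: length, then whitespace, then suffix.
    name = name[:MAX_QNAME_LEN]
    name = cut_at_whitespace(name)
    return strip_mate_suffix(name)


# Read names from the example records (without the leading '@').
mate1 = "BI:SL-HEA:C3BFGACXX:2:2316:10503:45861/1 1:X:0:CATAGCGA"
mate2 = "BI:SL-HEA:C3BFGACXX:2:2316:10503:45861/2 1:X:0:CATAGCGA"

# Reported order: the suffix is hidden behind the trailing text, so it survives.
print(truncate_reported_order(mate1))
print(truncate_reported_order(mate2))

# Proposed order: both mates reduce to the same name.
print(truncate_proposed_order(mate1))
print(truncate_proposed_order(mate2))
```

With the reported order, the name still ends in `1:X:0:CATAGCGA` when the suffix check runs, so `/1` and `/2` are only exposed after the whitespace cut and are never removed.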

@ch4rr0
Collaborator

ch4rr0 commented Nov 27, 2019

Good catch. I'll have this fix committed soon and will include this change in the next release.

@ch4rr0
Collaborator

ch4rr0 commented Nov 28, 2019

I think this recent commit should address this issue. Here's the output using the example reads you provided:

BI:SL-HEA:C3BFGACXX:2:2316:10503:45861  77      *       0       0       *       *       0       0       TAGGGTTAGGGTTAGGGTTAGGGTT       @@@FFADDFHHCFHJJJCGDHIJGG       YT:Z:UP
BI:SL-HEA:C3BFGACXX:2:2316:10503:45861  141     *       0       0       *       *       0       0       TAACCCTAACCCTAACCCTAACCCT       CCCFFFFFHHHHHJJIJIIJJJJII       YT:Z:UP
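For readers less familiar with SAM flags, the values 77 and 141 in the second column of the output above can be decoded with a short sketch like this one. The bit meanings follow the SAM specification; `FLAG_BITS` and `decode_flag` are illustrative names and not part of the project's code.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails quality checks",
    0x400: "duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list[str]:
    """Return the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


print(decode_flag(77))   # flags of the first record
print(decode_flag(141))  # flags of the second record
```

Both records decode as paired with read and mate unmapped, and they differ only in the first-in-pair versus second-in-pair bit, which is consistent with the two reads now sharing one name and being treated as mates.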

@ch4rr0
Collaborator

ch4rr0 commented Apr 28, 2020

This change is available in v2.4.0 and later.

@ch4rr0 ch4rr0 closed this as completed Apr 28, 2020