Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filterRNAstrand processe the bad strand #376

Closed
lelouar opened this issue Jun 20, 2016 · 7 comments
Closed

filterRNAstrand processe the bad strand #376

lelouar opened this issue Jun 20, 2016 · 7 comments

Comments

@lelouar
Copy link

lelouar commented Jun 20, 2016

Hi,

I tried to use bamCoverage for single end RNA-seq data and test the --filterRNAstrand option but the results are inverted as what I expected. I plot this picture to show you the behavior :

selection_054

What is wrong ? Is it a mistake of me ?

bamCoverage --version : 2.2.4

@dpryan79
Copy link
Collaborator

That looks correct. Read #1 is antisense in 99.99% of currently produced libraries. So, if you exclude alignments with bit 16 set, then you're getting only reads arising from the - strand, which is why the track is identical to that produced by --filterRNAstrand reverse. Yes, this looks backward, but it's correct (and also one of the single most confusing things in bioinformatics).

@lelouar
Copy link
Author

lelouar commented Jun 21, 2016

Sorry but flag 16 is read mapped on reverse strand (-) so if you exclude 16 you are in the forward strand (+). I read an other time the doc of samtools to be sure that I am not confusing and also the readthedoc of DeepTools for bamCoverage and indeed you write that --samFlagExclude 16 = forward strand and --samFlagInclude 16 = reverse one.
Thx

@dpryan79
Copy link
Collaborator

A flag of 16 only means that a read is reverse complemented. In RNAseq, that happens for read #1 when it arises from a + strand fragment. This is due to how library prep. works. This is also why you have to set -s reverse with current RNAseq datasets if you use htseq-count.

Further, the results visibly agree with the genes. Your reverse (-) strand gene visibly corresponds to the reverse track, and the forward strand gene to the forward track.

@lelouar
Copy link
Author

lelouar commented Jun 21, 2016

It totally depend on the RNA protocol of library preparation. There is currently 2 different protocol. The picture from my first post is a reverse strand protocol meaning that read mapped in the forward strand came from genes positioned in the minus strand (and the filterRNAstrand give you, in this case, the "good result" but not what you expected when you take in account the protocol). I add a new picture made with the other protocol of library preparation (read and transcript came from the same strand) :

selection_055

In this case the result is not what we expected.

For the htseq-count software you have to take in account of the protocol of library preparation to be sure that you count the "right" reads and use the good parameters.

So, if you want to keep read on the + strand when you use --filterRNAstrand forward you do a mistake, if you want to filter/remove such reads it's only a misunderstanding of me.

@dpryan79
Copy link
Collaborator

As the overwhelming majority of produced libraries are dUTP-based, that's what the --filterRNAstrand option assumes. This will not be changed and people will need to simply understand their data.

@vivekbhr
Copy link
Member

That's right the filterRNAstrand method is based on standard Illumina protocol (R2 in strand orientation), while other protocols also exist. we should probably mention this in the documentation.

@lelouar
Copy link
Author

lelouar commented Jun 21, 2016

ok, sorry. I agree with vivekbhr that you should at least mention this in the documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants