RNA-Seq by Example, error in "5. Feature counting in RNA-Seq" #316

Closed
BioinfGuru opened this issue Jan 5, 2024 · 4 comments

Comments

@BioinfGuru

Hi,

First, apologies if this issue has already been raised; I couldn't find an existing report.

In the section "How to count features", the commands should include --countReadPairs.

When I ran the commands as shown in the book, all counts were approximately double those shown in the example results image.

So the following command:
cat ids.txt | parallel -j 1 echo "bam/{}.bam" | xargs featureCounts -p -a refs/features.gff -o counts.txt

Should be:
cat ids.txt | parallel -j 1 echo "bam/{}.bam" | xargs featureCounts -p --countReadPairs -a refs/features.gff -o counts.txt
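
A quick way to confirm the doubling is to total one sample column from each run and compare. The sketch below assumes the usual featureCounts table layout (a leading `#` comment line, a header row starting with `Geneid`, and the first sample in column 7). The helper name `sum_counts` and the two file names in the usage comment are hypothetical.

```shell
# sum_counts FILE: total of the first sample column (column 7)
# in a tab-separated featureCounts table.
# Skips the leading "#" comment line and the "Geneid" header row.
sum_counts() {
    awk -F '\t' '!/^#/ && $1 != "Geneid" { total += $7 } END { print total + 0 }' "$1"
}

# Usage (hypothetical file names, one output per command form):
# echo "with -p only:            $(sum_counts counts_p_only.txt)"
# echo "with -p --countReadPairs: $(sum_counts counts_pairs.txt)"
```

If the first total is roughly twice the second, the installed featureCounts is counting individual reads when given `-p` alone.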

I don't understand enough about featureCounts. Could anyone explain why --countReadPairs has this effect?

Thanks for the great book

Regards,
Bioinfguru

@ialbert
Member

ialbert commented Jan 5, 2024

Correct.

Unfortunately, what is going on here is that the featureCounts program changed its behavior via an ill-informed decision.

Starting with a specific version, one needs to pass both the -p and --countReadPairs flags, but before that version, only -p was needed for the same behavior, and the presence of the second flag raised an error.

Even as of last year, the installation process installed the older version of featureCounts, so I chose the first form. Now it seems the updated version gets installed, so the second, two-flag form will be required.

I will test the various installation methods and will make the change soon.
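
Because the required flags depend on which version is installed, a script can choose them at run time. This is a minimal sketch, not the book's method: it assumes that running `featureCounts` with no arguments prints its usage text, and that the text mentions `--countReadPairs` only in versions that accept that flag. The helper name `pair_flags` is hypothetical.

```shell
# pair_flags HELP_TEXT: print the paired-end flags to use,
# based on whether the help text mentions --countReadPairs.
pair_flags() {
    case "$1" in
        *--countReadPairs*) printf '%s\n' "-p --countReadPairs" ;;
        *)                  printf '%s\n' "-p" ;;
    esac
}

# Usage (commented out so the sketch runs without featureCounts installed).
# $FLAGS is left unquoted on purpose so it splits into separate arguments.
# FLAGS=$(pair_flags "$(featureCounts 2>&1)")
# cat ids.txt | parallel -j 1 echo "bam/{}.bam" | xargs featureCounts $FLAGS -a refs/features.gff -o counts.txt
```

Probing the help text avoids hard-coding a version number, at the cost of relying on the wording of the usage message.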

@ialbert
Member

ialbert commented Jan 5, 2024

I made the changes in the book, thanks for reporting and reminding me to make this change.

Nice job noticing it.

You are well on your way to bioinfoguru-ness, the bioinformatics world is full of inconsistencies!

@ialbert ialbert closed this as completed Jan 5, 2024
@ialbert
Member

ialbert commented Jan 5, 2024

I just realized that I did not explain the effect itself.

When we run single-end sequencing, each read corresponds to one transcript fragment.

When we run paired-end sequencing, each transcript fragment produces two reads.

Note that in the second case, two measurements come from a single fragment. Hence, for the same number of reads, paired-end sequencing samples only half as many independent transcript fragments. This is why the counts are halved when read pairs are counted rather than individual reads.

This is to say that paired-end sequencing is disadvantageous in any situation where we are counting reads since we lose half the data. We might gain more mapping accuracy - though that is debatable - but the net effect is losing half the coverage and we lose a lot of statistical power. So in general paired-end RNA-Seq is not advisable.

The only time paired-end RNA-Seq makes sense is when we are assembling transcripts; in all other cases it leads to coverage loss.
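
The arithmetic behind the halving can be shown with a toy example. The data below are made up for illustration (three fragments, each sequenced as two mates), and the pipeline only mimics the difference between counting reads and counting fragments; it is not how featureCounts works internally.

```shell
# Six reads from three paired-end fragments that overlap the same gene.
# Columns: fragment name, mate number.
reads='frag1 1
frag1 2
frag2 1
frag2 2
frag3 1
frag3 2'

# Counting every read separately (one count per line).
read_count=$(printf '%s\n' "$reads" | wc -l | tr -d ' ')

# Counting each fragment once (unique fragment names).
fragment_count=$(printf '%s\n' "$reads" | cut -d ' ' -f 1 | sort -u | wc -l | tr -d ' ')

echo "reads=$read_count fragments=$fragment_count"
# prints: reads=6 fragments=3
```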

@BioinfGuru
Author

Thanks for that, cheers.

So for well annotated genomes, there's really no need for paired-end, especially if the goal is differential expression. Makes perfect sense. Thanks.
