RNA-Seq by Example, error in "5. Feature counting in RNA-Seq" #316

BioinfGuru · 2024-01-05T17:06:51Z

Hi,

First, apologies if this issue has already been raised; I couldn't find an existing report.

In the section "How to count features", the commands should include --countReadPairs.

When I ran the commands as shown in the book, all counts were approximately double those shown in the example results image.

So the following command:
cat ids.txt | parallel -j 1 echo "bam/{}.bam" | xargs featureCounts -p -a refs/features.gff -o counts.txt

Should be:
cat ids.txt | parallel -j 1 echo "bam/{}.bam" | xargs featureCounts -p --countReadPairs -a refs/features.gff -o counts.txt

I don't understand enough about featureCounts. Could anyone explain why --countReadPairs has this effect?

Thanks for the great book

Regards,
Bioinfguru


ialbert · 2024-01-05T17:37:36Z

Correct.

Unfortunately, what is going on here is that the featureCounts program changed its behavior via an ill-informed decision.

Starting with a specific version, one needs to pass both the -p and --countReadPairs flags. Before that version, -p alone produced the same behavior, and the presence of the second flag raised an error.

Even as of last year, the installation process installed the older version of featureCounts, so I chose the first form. Now it seems the updated version gets installed, so the second, two-flag form will be required.

I will test the various installation methods and will make the change soon.
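For anyone who has to support both old and new installations, here is a minimal sketch of one way to pick the flags automatically. It assumes that newer releases mention --countReadPairs in their usage text and older ones do not, and that featureCounts prints that usage text when run without arguments. The helper name pair_flags is made up for this example and is not part of featureCounts.

```shell
# Hypothetical helper: read featureCounts usage text on stdin and print
# the paired-end flags to use.
# Assumption: only versions that support --countReadPairs mention it in
# their usage text.
pair_flags() {
    if grep -q -e '--countReadPairs'; then
        echo "-p --countReadPairs"
    else
        echo "-p"
    fi
}

# Possible usage (assumes featureCounts prints its usage when run with no arguments):
# FLAGS=$(featureCounts 2>&1 | pair_flags)
# $FLAGS is left unquoted on purpose so it splits into separate flags:
# cat ids.txt | parallel -j 1 echo "bam/{}.bam" | xargs featureCounts $FLAGS -a refs/features.gff -o counts.txt
```

Checking the usage text rather than parsing a version number avoids hard-coding which release introduced the flag.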

ialbert · 2024-01-05T17:54:07Z

I made the changes in the book. Thanks for reporting this and reminding me to make the change.

Nice job noticing it.

You are well on your way to bioinfoguru-ness; the bioinformatics world is full of inconsistencies!

ialbert · 2024-01-05T18:04:19Z

I just realized that I did not explain the effect itself.

When we run single-end sequencing, each read corresponds to one transcript fragment.

When we run paired-end sequencing, each transcript fragment produces two reads.

Note that in the second case two measurements come from a single fragment. Hence, during paired-end sequencing at the same coverage, only half as many independent transcript fragments will be sampled. This is why the counts are halved when read pairs are counted instead of individual reads.
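The arithmetic can be illustrated with a toy example. This is only a sketch on made-up data (the fragment and gene names are invented), not what featureCounts does internally: each line stands for one aligned read, and every fragment contributes two mates.

```shell
# Toy data: "fragment_name gene", two mates per fragment (names are made up).
reads='frag1 geneA
frag1 geneA
frag2 geneA
frag2 geneA
frag3 geneB
frag3 geneB'

# Count every read separately (analogous to the doubled counts reported above).
count_reads() {
    awk '{ n[$2]++ } END { for (g in n) print g, n[g] }' | sort
}

# Collapse the two mates of each fragment first, then count
# (analogous to counting read pairs).
count_fragments() {
    sort -u | awk '{ n[$2]++ } END { for (g in n) print g, n[g] }' | sort
}

printf '%s\n' "$reads" | count_reads       # geneA 4, geneB 2
printf '%s\n' "$reads" | count_fragments   # geneA 2, geneB 1
```

The per-read counts are exactly twice the per-fragment counts, which matches the factor of two seen between the two forms of the command.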

This is to say that paired-end sequencing is disadvantageous in any situation where we are counting reads, since we lose half the data. We might gain some mapping accuracy, though that is debatable, but the net effect is that we lose half the coverage and a lot of statistical power. So in general, paired-end RNA-Seq is not advisable.

The only time paired-end RNA-Seq makes sense is when we are assembling transcripts; in all other cases it leads to coverage loss.

BioinfGuru · 2024-01-20T12:59:48Z

Thanks for that, cheers.

So for well-annotated genomes, there's really no need for paired-end sequencing, especially if the goal is differential expression. Makes perfect sense. Thanks.

ialbert closed this as completed Jan 5, 2024
