counts of rows of fpkm_matrix inconsistent with that of counts_matrix #7

yusukesano46 · 2020-07-19T07:10:19Z

Hello,

I ran the commands below. Why is the number of rows in "fpkm_matrix" inconsistent with that in "counts"?

================
library(countToFPKM)
# Read counts were calculated with htseq-count; the annotation file was "gencode.v22.annotation.gtf".
counts <- read.delim("XXX.txt", header=TRUE, sep="\t", row.names=1)

# Feature lengths were calculated with "GenomicFeatures" from the same annotation file.
gene.annotations <- read.table("featurelength.txt", sep="\t", header=TRUE)
featureLength <- gene.annotations$featurelength

# meanFragmentLength was calculated with Picard.
samples.metrics <- read.table("meanFragmentLength_adapter.txt", sep="\t", header=TRUE)
meanFragmentLength <- samples.metrics$meanFragmentLength

fpkm_matrix <- fpkm(counts, featureLength, meanFragmentLength)

nrow(counts)
[1] 60483

nrow(gene.annotations)
[1] 60483

nrow(fpkm_matrix)
[1] 42954

=================
Why are these results ("nrow(counts)" and "nrow(fpkm_matrix)") not consistent?

AAlhendi1707 · 2020-07-19T08:22:37Z

Hi there

Thanks for reporting this issue.

For accurate FPKM quantification of RNA-Seq data, the read counts need to be normalised by the effective feature length (Lee et al. 2011). The effective length is computed by subtracting the meanFragmentLength from the feature length, so features shorter than the meanFragmentLength are dropped automatically. In other words, FPKM cannot be calculated for features shorter than the meanFragmentLength, and that is why your fpkm_matrix has fewer rows than counts.
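To make the filtering concrete, here is a minimal sketch in base R using made-up toy values. It is not the package's actual code: the `+ 1` in the effective length and the "positive in every sample" rule are assumptions chosen for illustration, and the exact threshold used by `fpkm()` may differ.

```r
# Toy inputs (made-up values for illustration only).
counts <- matrix(
  c(10, 20,
     5,  0,
    30, 40),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)
featureLength      <- c(geneA = 2000, geneB = 150, geneC = 900)  # bp
meanFragmentLength <- c(s1 = 180, s2 = 220)                      # bp

# Effective length per feature (rows) and sample (columns).
# Assumption: feature length minus mean fragment length, plus 1.
effLength <- outer(featureLength, meanFragmentLength, "-") + 1

# Assumption: keep a feature only if its effective length is
# positive in every sample.
keep <- apply(effLength, 1, function(x) all(x > 0))

kept_counts <- counts[keep, , drop = FALSE]
dropped     <- names(keep)[!keep]

nrow(counts)       # rows before filtering
nrow(kept_counts)  # rows after filtering
dropped            # features shorter than the mean fragment length
```

In this toy example geneB (150 bp) is shorter than both mean fragment lengths, so it is the only feature removed. The same mechanism would explain a drop from 60483 to 42954 rows.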

To get statistics on the genes that are dropped because featureLength < meanFragmentLength, please try the latest version from GitHub:

if(!require(devtools)) install.packages("devtools")
devtools::install_github("AAlhendi1707/countToFPKM", build_vignettes = TRUE)
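
Independently of the package version, the dropped features can be listed by comparing row names. This is only a sketch: it assumes the matrix returned by `fpkm()` keeps the row names of `counts`, and the small objects below are made-up stand-ins for the real data.

```r
# Stand-ins for the real objects (made-up values for illustration only).
counts <- matrix(1:6, nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
featureLength <- c(2000, 150, 900)  # same row order as counts
# Pretend fpkm() output in which geneB was dropped.
fpkm_matrix <- counts[c("geneA", "geneC"), , drop = FALSE]

# Row names present in the input but absent from the output.
dropped_ids <- setdiff(rownames(counts), rownames(fpkm_matrix))

# Lengths of the dropped features, looked up by their position in counts.
dropped_len <- featureLength[match(dropped_ids, rownames(counts))]

dropped_summary <- data.frame(feature = dropped_ids,
                              featureLength = dropped_len)
print(dropped_summary)
```

Comparing `dropped_len` with the values in meanFragmentLength should show whether every missing row is indeed a short feature.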

Hope it helps!
A

yusukesano46 · 2020-07-21T00:26:32Z

Dear Ahmed Alhendi,

Many thanks for your reply; this is all clear now. I understand.

On Sun, Jul 19, 2020 at 17:22, Ahmed Alhendi <notifications@github.com> wrote:

Hi there

Thanks for reporting this issue.

For accurate FPKM quantification of RNA-Seq data, the read counts need to be normalised by the effective feature length (Lee et al. 2011 paper <https://academic.oup.com/nar/article/39/2/e9/2409022>). To compute the effective length, the meanFragmentLength is subtracted from the feature length. Thus, features shorter than the meanFragmentLength are dropped automatically. You cannot calculate the FPKM for features shorter than the meanFragmentLength, and that is why your fpkm_matrix is shorter than counts.

I'll make sure that version 1.2 of countToFPKM returns a summary of the features for which fpkm() cannot return an FPKM value because featureLength < meanFragmentLength.

Hope it helps!

Golden-proteogenomics · 2020-09-17T01:05:29Z

Hello,
I have a question about countToFPKM: what is meanFragmentLength? It is used in the example code. Could you describe in more detail what it is and how to obtain it?
I sincerely hope for your reply.
Thanks!

AAlhendi1707 · 2021-07-13T10:31:59Z

Hi there,

Please find the answer at the link below:
#1

Kind regards,
A

AAlhendi1707 closed this as completed Jul 13, 2021

AAlhendi1707 pinned this issue Jul 13, 2021

AAlhendi1707 unpinned this issue Jul 13, 2021

AAlhendi1707 reopened this Jul 13, 2021
