counts of rows of fpkm_matrix inconsistent with that of counts_matrix #7

Open
yusukesano46 opened this issue Jul 19, 2020 · 4 comments

Comments

@yusukesano46

yusukesano46 commented Jul 19, 2020

Hello,

I ran the commands below. Why is the number of rows in "fpkm_matrix" inconsistent with that of "counts"?

================
library(countToFPKM)
counts <- read.delim("XXX.txt", header=TRUE, sep="\t", row.names=1) # read counts computed with htseq-count; annotation file: gencode.v22.annotation.gtf

gene.annotations <- read.table("featurelength.txt", sep="\t", header=TRUE) # feature lengths computed with GenomicFeatures; annotation file: gencode.v22.annotation.gtf
featureLength <- gene.annotations$featurelength

samples.metrics <- read.table("meanFragmentLength_adapter.txt", sep="\t", header=TRUE) # mean fragment lengths computed with Picard
meanFragmentLength <- samples.metrics$meanFragmentLength

fpkm_matrix <- fpkm(counts, featureLength, meanFragmentLength)

nrow(counts)
[1] 60483

nrow(gene.annotations)
[1] 60483

nrow(fpkm_matrix)
[1] 42954

=================
Why are these results ("nrow(counts)" and "nrow(fpkm_matrix)") not consistent?

@AAlhendi1707
Owner

AAlhendi1707 commented Jul 19, 2020

Hi there

Thanks for reporting this issue.

For accurate FPKM quantification of RNA-Seq data, the read counts need to be normalised by the effective feature length (Lee et al., 2011). The effective length is obtained by subtracting the meanFragmentLength from the feature length. As a result, features shorter than the meanFragmentLength are automatically dropped. In other words, FPKM cannot be calculated for features shorter than the meanFragmentLength, which is why your fpkm_matrix has fewer rows than counts.
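To make the filtering step concrete, here is a minimal base-R sketch of effective-length FPKM. It is an illustration of the idea described above, not the source code of countToFPKM's fpkm(); the function name fpkm_sketch is hypothetical. It assumes the effective length is featureLength - meanFragmentLength + 1, that a feature is kept only if that value exceeds 1 in every sample, and that the library size is summed over the retained features only.

```r
# Minimal sketch of effective-length FPKM (illustrative; NOT countToFPKM's source).
# Assumption: effective length = featureLength - meanFragmentLength + 1.
fpkm_sketch <- function(counts, featureLength, meanFragmentLength) {
  counts <- as.matrix(counts)
  stopifnot(length(featureLength) == nrow(counts),
            length(meanFragmentLength) == ncol(counts))

  # Effective length of every feature in every sample (rows = features, cols = samples)
  effLen <- outer(featureLength, meanFragmentLength, function(len, mu) len - mu + 1)

  # Keep a feature only if its effective length is > 1 in every sample,
  # i.e. it is longer than the mean fragment length of every library
  keep <- apply(effLen, 1, min) > 1
  counts <- counts[keep, , drop = FALSE]
  effLen <- effLen[keep, , drop = FALSE]

  # Library size per sample, summed over the retained features (assumption)
  N <- colSums(counts)

  # FPKM = count * 1e9 / (effective length * library size)
  sweep(counts * 1e9 / effLen, 2, N, "/")
}

# Toy example: geneC (150 bp) is shorter than both mean fragment lengths
counts <- matrix(c(100, 50, 10,
                   200, 80, 20), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
res <- fpkm_sketch(counts, featureLength = c(2000, 1000, 150),
                   meanFragmentLength = c(200, 250))
nrow(counts)  # 3
nrow(res)     # 2
```

Because the filter is applied before normalisation, the output can only have as many rows as there are features longer than every sample's mean fragment length, which matches the behaviour reported in this issue.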

To get statistics on the genes that are dropped because featureLength < meanFragmentLength, please try the latest version from GitHub:

if(!require(devtools)) install.packages("devtools")
devtools::install_github("AAlhendi1707/countToFPKM", build_vignettes = TRUE)
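If upgrading is not an option, similar statistics can be approximated directly from the inputs. The base-R helper below is hypothetical (summarise_dropped is not a countToFPKM function). It assumes a feature is kept only when featureLength - meanFragmentLength + 1 > 1 in every sample, which reduces to featureLength > max(meanFragmentLength), and that featureLength is in the same order as the rows of counts.

```r
# Hypothetical helper (not part of countToFPKM): list the features that would be
# dropped, assuming effLen = featureLength - meanFragmentLength + 1 and that only
# features with effLen > 1 in every sample are kept.
summarise_dropped <- function(featureLength, meanFragmentLength,
                              featureNames = names(featureLength)) {
  if (is.null(featureNames)) featureNames <- seq_along(featureLength)
  # effLen > 1 in every sample  <=>  featureLength > max(meanFragmentLength)
  dropped <- featureLength <= max(meanFragmentLength)
  cat(sprintf("%d of %d features dropped (%.1f%%)\n",
              sum(dropped), length(dropped), 100 * mean(dropped)))
  data.frame(feature = featureNames[dropped],
             featureLength = unname(featureLength[dropped]))
}

# Toy example; prints: 1 of 3 features dropped (33.3%)
dropped_df <- summarise_dropped(c(geneA = 2000, geneB = 1000, geneC = 150), c(200, 250))

# With the objects from the original post (assuming matching row order):
# summarise_dropped(featureLength, meanFragmentLength, rownames(counts))
```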

Hope it helps!
A

@yusukesano46
Author

yusukesano46 commented Jul 21, 2020 via email

@Golden-proteogenomics

Hello,
I have a question about countToFPKM: what is meanFragmentLength? It is used in the example code. Could you describe in more detail what it is and how to obtain it?
I sincerely hope for your reply.
Thanks!

@AAlhendi1707
Owner

Hi there,

Please find the answer in the link below:
#1

Kind regards
A

AAlhendi1707 pinned this issue Jul 13, 2021
AAlhendi1707 unpinned this issue Jul 13, 2021
AAlhendi1707 reopened this Jul 13, 2021

3 participants