This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Inconsistent gene ordering in pbta-gene-expression-rsem-fpkm.stranded.rds #370

Closed
jashapiro opened this issue Dec 23, 2019 · 2 comments

@jashapiro
Member

What data file(s) does this issue pertain to?

pbta-gene-expression-rsem-fpkm.stranded.rds

What release are you using?

v12

Put a link to the relevant section of the [OpenPBTA-manuscript]

Put your question or report your issue here.

The gene order in pbta-gene-expression-rsem-fpkm.stranded.rds does not match the other RSEM data files. In particular, the following gene_ids are in different positions in the polyA (column 1) and stranded (column 2) FPKM files:

> cbind(fpkm_rsem_polyA$gene_id, fpkm_rsem_stranded$gene_id)[fpkm_rsem_polyA$gene_id!=fpkm_rsem_stranded$gene_id,]
      [,1]                                [,2]                               
 [1,] "ENSG00000124333.15_PAR_Y_VAMP7"    "ENSG00000124333.15_VAMP7"         
 [2,] "ENSG00000124333.15_VAMP7"          "ENSG00000124333.15_PAR_Y_VAMP7"   
 [3,] "ENSG00000167393.17_PAR_Y_PPP2R3B"  "ENSG00000167393.17_PPP2R3B"       
 [4,] "ENSG00000167393.17_PPP2R3B"        "ENSG00000167393.17_PAR_Y_PPP2R3B" 
 [5,] "ENSG00000168939.11_PAR_Y_SPRY3"    "ENSG00000168939.11_SPRY3"         
 [6,] "ENSG00000168939.11_SPRY3"          "ENSG00000168939.11_PAR_Y_SPRY3"   
 [7,] "ENSG00000169100.13_PAR_Y_SLC25A6"  "ENSG00000169100.13_SLC25A6"       
 [8,] "ENSG00000169100.13_SLC25A6"        "ENSG00000169100.13_PAR_Y_SLC25A6" 
 [9,] "ENSG00000182378.13_PAR_Y_PLCXD1"   "ENSG00000182378.13_PLCXD1"        
[10,] "ENSG00000182378.13_PLCXD1"         "ENSG00000182378.13_PAR_Y_PLCXD1"  
[11,] "ENSG00000182484.15_PAR_Y_WASH6P"   "ENSG00000182484.15_WASH6P"        
[12,] "ENSG00000182484.15_WASH6P"         "ENSG00000182484.15_PAR_Y_WASH6P"  
[13,] "ENSG00000185203.12_PAR_Y_WASIR1"   "ENSG00000185203.12_WASIR1"        
[14,] "ENSG00000185203.12_WASIR1"         "ENSG00000185203.12_PAR_Y_WASIR1"  
[15,] "ENSG00000185960.13_PAR_Y_SHOX"     "ENSG00000185960.13_SHOX"          
[16,] "ENSG00000185960.13_SHOX"           "ENSG00000185960.13_PAR_Y_SHOX"    
[17,] "ENSG00000214717.11_PAR_Y_ZBED1"    "ENSG00000214717.11_ZBED1"         
[18,] "ENSG00000214717.11_ZBED1"          "ENSG00000214717.11_PAR_Y_ZBED1"   
[19,] "ENSG00000223274.6_PAR_Y_RNA5SP498" "ENSG00000223274.6_RNA5SP498"      
[20,] "ENSG00000223274.6_RNA5SP498"       "ENSG00000223274.6_PAR_Y_RNA5SP498"
[21,] "ENSG00000223484.7_PAR_Y_TRPC6P"    "ENSG00000223484.7_TRPC6P"         
[22,] "ENSG00000223484.7_TRPC6P"          "ENSG00000223484.7_PAR_Y_TRPC6P"   
[23,] "ENSG00000225661.7_PAR_Y_RPL14P5"   "ENSG00000225661.7_RPL14P5"        
[24,] "ENSG00000225661.7_RPL14P5"         "ENSG00000225661.7_PAR_Y_RPL14P5"  
@jashapiro jashapiro added the data label Dec 23, 2019
@kgaonkar6
Collaborator

Sorry, I just checked: the inconsistency arises from the way I merged the gene FPKM files.

I used tximport (https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html) in RSEM mode to merge all (new and existing) stranded RSEM gene FPKM files, while the RSEM polyA FPKM files were merged by @tkoganti using this script: https://github.com/kgaonkar6/OpenPBTA-analysis/blob/master/analyses/collapse-rnaseq/00-create-rsem-files.R

I'll redo the merging using the same script, which will make the stranded file consistent with the other RSEM files.
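
Until a re-merged file is available, anyone who needs matching row order could align the two tables on gene_id themselves. The following is only a minimal sketch of that workaround, not the merge script referenced above: the data frames `polya` and `stranded`, their sample columns, and the FPKM values are hypothetical stand-ins for the real files, and it assumes both tables contain the same set of unique gene_ids (as the swapped pairs reported above suggest).

```r
# Toy stand-ins for the two expression tables (hypothetical values).
polya <- data.frame(
  gene_id = c("ENSG00000124333.15_PAR_Y_VAMP7",
              "ENSG00000124333.15_VAMP7",
              "ENSG00000185960.13_SHOX"),
  sample_a = c(0.0, 12.5, 3.1),
  stringsAsFactors = FALSE
)
stranded <- data.frame(
  gene_id = c("ENSG00000124333.15_VAMP7",
              "ENSG00000124333.15_PAR_Y_VAMP7",
              "ENSG00000185960.13_SHOX"),
  sample_b = c(10.2, 0.0, 2.7),
  stringsAsFactors = FALSE
)

# Reordering is only safe if both tables hold the same unique gene_ids.
stopifnot(
  !anyDuplicated(polya$gene_id),
  !anyDuplicated(stranded$gene_id),
  setequal(polya$gene_id, stranded$gene_id)
)

# match() gives, for each polyA gene_id, its row position in the stranded
# table; indexing by those positions puts the stranded rows in polyA order.
stranded_aligned <- stranded[match(polya$gene_id, stranded$gene_id), , drop = FALSE]
rownames(stranded_aligned) <- NULL

# Same check as in the issue report, now on the aligned table.
identical(polya$gene_id, stranded_aligned$gene_id)
```

Using `match()` rather than sorting both tables keeps the polyA file's existing order as the reference, so only the inconsistent table is changed.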

@jharenza jharenza mentioned this issue Dec 23, 2019
@jaclyn-taroni
Member

Looking at v13:

> identical(`pbta-gene-expression-rsem-fpkm.polya`$gene_id, `pbta-gene-expression-rsem-fpkm.stranded`$gene_id)
[1] TRUE

Closing this issue.
