This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Collapsed RNA-seq matrices with unique gene symbols #248

Closed
komalsrathi opened this issue Nov 8, 2019 · 17 comments

@komalsrathi
Collaborator

File(s)

Collapsed RNA-seq matrices (merging multiple Ensembl identifiers to get unique gene symbols)

Release

v9

Link to OpenPBTA-manuscript

Put a link to the relevant section of the OpenPBTA manuscript here.

Question/issue

Regarding #198, should we make the collapsed RNA-seq matrices available in v10?

@jharenza
Collaborator

Updated with #273

@jaclyn-taroni
Member

I am reopening this issue. Version 10 includes the following files:

pbta-gene-expression-rsem-fpkm-collapsed_table.polya.rds
pbta-gene-expression-rsem-fpkm-collapsed_table.stranded.rds

These files are the tables that report whether or not a gene was dropped, not the summarized matrices themselves, which would have been:

pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds

See

# -m ~/Projects/OpenPBTA-analysis/data/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds \

I think it's fine to continue to include the tables above, but we need to include the summarized matrices as well.

@komalsrathi
Collaborator Author

@jaclyn-taroni Actually, that was exactly why I had created this issue - to include the summarized matrices (not the dropped tables). Should I upload those files here: https://cavatica.sbgenomics.com/u/cavatica/pbta/files/#q?path=processed-data-merge%2FV10-data

@jaclyn-taroni
Member

That was my understanding of this issue when you filed it too @komalsrathi. I am better equipped to speak to the download-data.sh method for obtaining data. On that front, we probably want to have a new release (v11) that includes these matrices. Tagging @jharenza @yuankunzhu for input on the CAVATICA side of things.

@jharenza
Collaborator

> @jaclyn-taroni Actually, that was exactly why I had created this issue - to include the summarized matrices (not the dropped tables). Should I upload those files here: https://cavatica.sbgenomics.com/u/cavatica/pbta/files/#q?path=processed-data-merge%2FV10-data

Ahhhh, sorry I thought the summarized matrices were the ones we put into CAVATICA and the data download - yes, we should swap those out, but I will make a V11 folder @komalsrathi - can you add them here: https://cavatica.sbgenomics.com/u/cavatica/pbta/files/#q?path=processed-data-merge%2FV11-data

@jharenza
Collaborator

Thinking back, I think this was a miscommunication. I asked @kgaonkar6 to grab the collapsed files, but I never realized there were also tables in the analysis from that PR. We will swap them out. Thanks!

@jharenza jharenza mentioned this issue Nov 22, 2019
2 tasks
@komalsrathi
Collaborator Author

@jharenza done!

@jharenza
Collaborator

thanks!

@komalsrathi
Collaborator Author

@jharenza I have also added the new collapsed tables with correlations in V11 (discussed here). @jaclyn-taroni will submit a pull request with updated code tomorrow.

@jharenza
Collaborator

@jaclyn-taroni @komalsrathi do you want the collapsed tables in the release or just the summarized matrices? Was thinking just the latter...

@komalsrathi
Collaborator Author

komalsrathi commented Nov 26, 2019 via email

@jharenza
Collaborator

Hmm, I don't know if we want to add all processed data to releases, just the data people will need for downstream work. If we do release those tables, I would suggest we rename them to something like "collapsed matrices-genes removed". Thoughts, @jaclyn-taroni?

@komalsrathi
Collaborator Author

komalsrathi commented Nov 26, 2019 via email

@jharenza
Collaborator

Ahh, ok. Sounds like a good thing to reference in the paper. I just don't want people to use those tables by accident, which would defeat the purpose of the collapsing.

@jashapiro
Member

jashapiro commented Nov 26, 2019

I feel like there is some miscommunication happening here. There are two kinds of files we are discussing:

  1. Tables of which genes were kept and removed, with no expression data, e.g.
    ‘pbta-gene-expression-rsem-fpkm-collapsed_table.stranded.rds’

  2. Collapsed expression matrices with only one of each duplicated set of genes, e.g.
    ‘pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds’

What was included in the data download was (1). What I think we want is (2), but we may be using different words.

The reasoning here is that we want people not to have to recreate the logic of the collapsing that @komalsrathi did, or run the scripts repeatedly. Having them in the data repository also makes them “approved” in a way that is helpful to new contributors.
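
For readers new to the distinction, here is a minimal sketch of how the two kinds of output relate. It is written in plain Python rather than taken from the project's own scripts. The selection rule (keep the identifier with the highest mean expression), the function name `collapse_to_gene_symbols`, and the toy identifiers are all assumptions made for illustration; this thread does not spell out the rule actually used.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_gene_symbols(rows):
    """Collapse rows sharing a gene symbol down to one row per symbol.

    rows: list of (ensembl_id, gene_symbol, [expression values...]).
    Returns (matrix, table):
      matrix: dict of gene_symbol -> expression values (kind 2 above)
      table:  list of (ensembl_id, gene_symbol, kept) with no
              expression data (kind 1 above)
    """
    by_symbol = defaultdict(list)
    for ensembl_id, symbol, values in rows:
        by_symbol[symbol].append((ensembl_id, values))

    matrix = {}
    table = []
    for symbol, candidates in by_symbol.items():
        # Assumed rule, for illustration only: keep the identifier
        # with the highest mean expression across samples.
        best_id, best_values = max(candidates, key=lambda c: mean(c[1]))
        matrix[symbol] = best_values
        for ensembl_id, _ in candidates:
            table.append((ensembl_id, symbol, ensembl_id == best_id))
    return matrix, table


# Hypothetical identifiers and values, not real PBTA data.
rows = [
    ("ENSG_A1", "GENE_A", [1.0, 2.0, 3.0]),
    ("ENSG_A2", "GENE_A", [4.0, 5.0, 6.0]),
    ("ENSG_B1", "GENE_B", [0.5, 0.5, 0.5]),
]
matrix, table = collapse_to_gene_symbols(rows)
```

In this sketch, `table` only records which identifiers were kept or dropped, while `matrix` is what downstream analyses would actually consume, which is why shipping only the former is not enough.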

@komalsrathi
Collaborator Author

komalsrathi commented Nov 26, 2019 via email

@jharenza
Collaborator

jharenza commented Dec 2, 2019

closed with #293

@jharenza jharenza closed this as completed Dec 2, 2019