This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

v23 run generate analysis files (1/n) #1631

Merged
merged 4 commits into master from v23-analysis-files
Jan 5, 2023

Conversation

Collaborator

@jharenza jharenza commented Dec 14, 2022

Purpose/implementation Section

What scientific question is your analysis addressing?

Prep for v23 release

What was your approach?

Run generate-analysis-files.sh for V23

What GitHub issue does your pull request address?

NA

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

I will look into the file changes to see whether anything is strikingly wrong or whether a change was missed previously.
Update: these look OK; the changes are mainly due to updates in the independent specimens and notebook HTML files. The cnv_consensus.tsv file has changed, but the final CNV files used in the release have not:

fa0adcd26f408d840f25339d2a1ea09f  consensus_seg_annotated_cn_autosomes.tsv.gz
9589cd18d0e6c1f7c2d939126c63ec6c  consensus_seg_annotated_cn_x_and_y.tsv.gz
b8b97483b4d65e65c8ae34ff89b3ef94  consensus_seg_with_status.tsv
b9284650be04df3538e6c6dba29b8eb0  pbta-cnv-consensus.seg.gz

I am going to investigate the SNV consensus MAF changes, but my hunch is that row ordering is the culprit, because I would not expect those files to change either.
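One way to test a row-ordering hunch like this is to compare checksums of the sorted rows rather than of the files themselves. The following is a minimal sketch, not part of the repository: `sorted_md5` is a hypothetical helper name, and it assumes `md5sum`, `gzip`, and `sort` are available on the PATH.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the repo): checksum of a plain or gzipped
# table that ignores row order. LC_ALL=C makes the sort byte-wise, so the
# result does not depend on the user's locale.
sorted_md5() {
  local file=$1
  case "$file" in
    *.gz) gzip -dc "$file" ;;
    *)    cat "$file" ;;
  esac | LC_ALL=C sort | md5sum | cut -d' ' -f1
}

# Demo on two throwaway tables with the same rows in a different order.
tmp=$(mktemp -d)
printf 'gene\tsample\nTP53\tS1\nEGFR\tS2\n' > "$tmp/old.tsv"
printf 'gene\tsample\nEGFR\tS2\nTP53\tS1\n' > "$tmp/new.tsv"
if [ "$(sorted_md5 "$tmp/old.tsv")" = "$(sorted_md5 "$tmp/new.tsv")" ]; then
  echo "same rows, different order"
fi
rm -rf "$tmp"
```

If the sorted checksums match while the plain checksums differ, only the row order changed. The header line is sorted along with the data rows, which is harmless for an equality check.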

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

yes

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Files being updated in release:

fusion_summary_ependymoma_foi.tsv (expected with #1619)
ae559544f5e8baf0f0f21ab7c00f0041  independent-specimens.rnaseq.primary-plus-stranded.tsv
44b1e1d483465798221799cf63b93fb8  independent-specimens.wgs.primary-plus.tsv
43bf4bd1f0d073607276ce6eef989951  independent-specimens.wgs.primary.tsv
372db726c453efded2340da8ad536e81  independent-specimens.wgswxs.primary-plus.tsv
94283581188cc87427b3b58b1fe75860  independent-specimens.wgswxs.primary.tsv
d4251fcd7f7bea0f64a9a247a40a21e0  pbta-cnv-consensus-gistic.zip
757159a9d864d78ef65c8b68453e2f86  pbta-fusion-recurrently-fused-genes-byhistology.tsv
95d6b0c3401f8c6c10c4c013cb78e275  pbta-fusion-recurrently-fused-genes-bysample.tsv
2be7929f8fc130fc2048cf8f8d0b1c55  pbta-snv-consensus-mutation.maf.tsv.gz
21126513a05c43427af774884aaeeb46  tcga-snv-consensus-snv.maf.tsv.gz
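A list like the one above could be produced by comparing checksums across two release directories. This is a rough sketch under assumptions: `changed_files` is a hypothetical helper, not a script from the repository, and it only looks at regular files directly inside each directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the repo): report files in the new release
# directory that are absent from, or differ from, the old one.
# Files that exist only in the old directory are not reported.
changed_files() {
  local old=$1 new=$2 f name
  for f in "$new"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    if [ ! -f "$old/$name" ]; then
      echo "added    $name"
    elif [ "$(md5sum < "$f")" != "$(md5sum < "$old/$name")" ]; then
      echo "changed  $name"
    fi
  done
}

# Demo on throwaway directories.
tmp=$(mktemp -d)
mkdir "$tmp/old" "$tmp/new"
printf 'a\n' > "$tmp/old/kept.tsv"
printf 'a\n' > "$tmp/new/kept.tsv"
printf 'a\n' > "$tmp/old/edited.tsv"
printf 'b\n' > "$tmp/new/edited.tsv"
changed_files "$tmp/old" "$tmp/new"
rm -rf "$tmp"
```

With the directory names that appear later in this thread, a call might look like `changed_files release-v22-20220505 release-v23-20230115`.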

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jaclyn-taroni
Member

@jharenza the CI file-related updates should go into a different PR (analyses/create-subset-files/create_subset_files.sh and analyses/create-subset-files/biospecimen_ids_for_subset.RDS)

@jharenza
Collaborator Author

@jharenza the CI file-related updates should go into a different PR (analyses/create-subset-files/create_subset_files.sh and analyses/create-subset-files/biospecimen_ids_for_subset.RDS)

Oh, let me remove those. I think they were left over from testing that script in a different branch.

Member

@jaclyn-taroni jaclyn-taroni left a comment

In general, I will wait to hear back from you about the MAF changes. I expect many of the changes are related to the independent specimen files, including those to the fusion results because of:

# remove samples which have WGS/WXS because those would have been captured from the independent-wgswxs-sample
clinical_rna_v2 <- clinical %>%
  dplyr::filter(experimental_strategy == "RNA-Seq",
                !Kids_First_Participant_ID %in% clinical_wgs$Kids_First_Participant_ID)

However, the new EWSR1--FLI1 fusion in a specimen that was already captured in the recurrent fusions file surprised me?

@jharenza
Collaborator Author

unzipped mafs are identical:

harenzaj@38f9d38f36c9 data % md5sum release-v22-20220505/*snv-consensus*maf.tsv   
337cc86a1c62eb2cef3cc9d8669c2eda  release-v22-20220505/pbta-snv-consensus-mutation.maf.tsv
f979aad447c9f8bcbac70b1fd2270a73  release-v22-20220505/tcga-snv-consensus-snv.maf.tsv
harenzaj@38f9d38f36c9 data % md5sum release-v23-20230115/*snv-consensus*maf.tsv
337cc86a1c62eb2cef3cc9d8669c2eda  release-v23-20230115/pbta-snv-consensus-mutation.maf.tsv
f979aad447c9f8bcbac70b1fd2270a73  release-v23-20230115/tcga-snv-consensus-snv.maf.tsv
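The same check can be run without writing unzipped copies to disk by hashing the decompressed stream. A likely reason for `.gz` checksums to differ when the contents match is that gzip stores the original file name and modification time in its header by default (`gzip -n` omits both). The sketch below is illustrative only: `content_md5` and `same_content` are hypothetical helper names, and the demo files are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: md5 of the decompressed content of a .gz file.
content_md5() {
  gzip -dc "$1" | md5sum | cut -d' ' -f1
}

# Hypothetical helper: succeeds when two .gz files decompress to the same bytes.
same_content() {
  [ "$(content_md5 "$1")" = "$(content_md5 "$2")" ]
}

# Demo: identical tables compressed under different names.
tmp=$(mktemp -d)
printf 'Hugo_Symbol\tChromosome\nTP53\tchr17\n' > "$tmp/v22.maf.tsv"
cp "$tmp/v22.maf.tsv" "$tmp/v23.maf.tsv"
gzip "$tmp/v22.maf.tsv" "$tmp/v23.maf.tsv"
md5sum "$tmp"/*.gz   # archive checksums may differ because of header metadata
if same_content "$tmp/v22.maf.tsv.gz" "$tmp/v23.maf.tsv.gz"; then
  echo "decompressed content is identical"
fi
rm -rf "$tmp"
```

Compressing with `gzip -n` in the generating script would be one way to keep archive checksums stable across reruns, assuming nothing downstream relies on the stored name or timestamp.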

@jharenza jharenza changed the title from "v23 run generate analysis files" to "v23 run generate analysis files (1/n)" Dec 16, 2022
Member

@jaclyn-taroni jaclyn-taroni left a comment

The changes seem to be expected. The zip file change (i.e., the GISTIC results) makes sense since other file attributes are probably being included, regardless of whether or not the underlying results change.

I would not merge this yet; I would wait until all the PRs that will be stacked on this branch are merged into it first.
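The point about the GISTIC zip file can be checked by comparing what the archives contain rather than the archives themselves, since a zip stores per-file metadata such as modification times. This is a minimal sketch, assuming `unzip` and `diff` are installed; `same_zip_contents` is a hypothetical helper name and not part of the repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds when two zip archives extract to identical
# file trees (same paths, same bytes), ignoring archive-level metadata.
same_zip_contents() {
  local a b status=0
  a=$(mktemp -d)
  b=$(mktemp -d)
  unzip -q "$1" -d "$a"
  unzip -q "$2" -d "$b"
  # diff -r exits non-zero when any file differs or exists on one side only.
  diff -r "$a" "$b" > /dev/null || status=$?
  rm -rf "$a" "$b"
  return "$status"
}

# Example call (paths are placeholders):
#   same_zip_contents old/pbta-cnv-consensus-gistic.zip new/pbta-cnv-consensus-gistic.zip \
#     && echo "contents identical"
```

A success here together with a changed archive checksum would support the explanation that only file attributes differ.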

* v23 tp53 score

* V23 mb subtyping (3/n) (#1634)

* V23 MB subtyping

* v23 epn add YAP1--MAML2 to fusions list and YAP1 calling, rerun (4/n) (#1638)

* v23 epn add YAP1--MAML2 to fusions list and YAP1 calling, rerun

* v23 chordoma (updating to 5/n) (#1635) (#1647)

* v23 chordoma

* v23 ews, neurocytoma, hgg (updating to 6/n) (#1636)

* v23 ews, neurocytoma, hgg

* v23 LGG (updating to 7/n) (#1637)

* v23 LGG

* v23 LGG

* V23 compile (8/n) (#1639)

* v23 compile

* V23 integrate (9/n) (#1640)

* add aliquot extraction status files and code

* add extraction type to histology file

* add comments to code

* update README

* add table of counts

* remove old input file

* add notebook output /rerun

* add all RNA samples with NA for extraction type

* add 00- to bash script, use base hist, rerun

response to review

Co-Authored-By: Jaclyn Taroni <19534205+jaclyn-taroni@users.noreply.github.com>

Co-authored-by: Jo Lynne Rokita <jolynnerokita@d3b.center>
@jharenza jharenza merged commit 18c7064 into master Jan 5, 2023
@jharenza jharenza deleted the v23-analysis-files branch January 5, 2023 23:26