This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Updated analysis: GISTIC with ploidy instead of NA #1180

Closed
kgaonkar6 opened this issue Sep 21, 2021 · 3 comments · Fixed by #1400

Comments

@kgaonkar6
Collaborator

What analysis module should be updated and why?

The run-gistic module errors out with:

Running GISTIC on the entire OpenPBTA cohort...
GISTIC version 2.0.23
required parameters:
      base_dir: ../results/pbta-cnv-consensus-gistic
       segfile: ../../../scratch/uncompressed-seg-for-gistic/consensus_seg_file_for_gistic.seg
marker spacing: 10000
   refgenefile: /home/rstudio/gistic_install/refgenefiles/hg38.UCSC.add_miR.160920.refgene.mat
optional_params = 
         array_list_file: []
                cnv_file: []
                   t_amp: 0.1000
                   t_del: 0.1000
       join_segment_size: 2
                     ext: []
               qv_thresh: 0.2500
                remove_X: 0
                 markers: []
      max_marker_spacing: 10000
      run_broad_analysis: 1
        broad_len_cutoff: 0.9800
                   ziggs: [1x1 struct]
                     res: 0.0500
              conf_level: 0.9000
                     cap: 1.5000
          do_gene_gistic: 1
     conserve_disk_space: 0
         save_data_files: 1
            use_segarray: 1
        write_gene_files: 1
           use_two_sided: 1
          do_arbitration: 1
           save_seg_data: 1
                   fname: []
              peak_types: {'robust'}
             genepattern: 1
             arm_peeloff: 1
    gene_collapse_method: 'extreme'
           sample_center: 'median'

Checking inputs...
Making D struct
No markers file specified: generating pseudo-markers!
Reading Seg File '../../../scratch/uncompressed-seg-for-gistic/consensus_seg_file_for_gistic.seg'
Warning: Shortened 17856 segments in '../../../scratch/uncompressed-seg-for-gistic/consensus_seg_file_for_gistic.seg' that overlap by one marker.
> In make_D_from_segseq_data at 122
  In run_gistic2_from_seg at 193
  In gp_gistic2_from_seg at 97 
D = 
               sdesc: {915x1 cell}
                chrn: [317435x1 double]
                 pos: [317435x1 double]
    suppress_history: 1
               islog: 0
                isMB: 0
                 dat: [317435x915 SegArray]

D = 
               sdesc: {915x1 cell}
                chrn: [317435x1 double]
                 pos: [317435x1 double]
    suppress_history: 1
               islog: 0
                isMB: 0
                 dat: [317435x915 SegArray]

Matrix size 317435     915
Removing NaN probes...
Removing 317435 markers with NaNs
Matrix size 0  915
 
GISTIC 2.0 input error detected:
All input data were removed after NaN processing.
updating: pbta-cnv-consensus-gistic/ (stored 0%)
updating: pbta-cnv-consensus-gistic/gistic_inputs.mat (deflated 2%)
Running GISTIC on specific histologies...

We believe that the change from 2 to NA in the consensus CNV file might be causing this issue: GISTIC then removes all the markers marked as NaN.
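To illustrate why every marker could disappear, here is a minimal sketch on a toy markers-by-samples matrix. It assumes (this is not confirmed anywhere in this thread) that a marker is dropped as soon as any single sample is NaN at that position. The matrix, sample names, and marker names are all made up for illustration.

```r
# Toy markers-by-samples matrix: rows are markers, columns are samples.
# NA stands for a neutral call, as in the updated consensus file.
dat <- matrix(
  c(NA,  3,  2,
     1, NA,  2,
     2,  2, NA,
    NA, NA,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("marker", 1:4), paste0("sample", 1:3))
)

# ASSUMPTION: a marker is removed if ANY sample is NA/NaN at that marker.
keep <- rowSums(is.na(dat)) == 0
filtered <- dat[keep, , drop = FALSE]

# Each marker is neutral (NA) in at least one sample, so nothing survives.
dim(filtered)
```

With 915 samples, it would only take one neutral (NA) call per marker across the whole cohort for the same thing to happen, which would be consistent with the "Matrix size 0  915" line in the log.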

What changes need to be made? Please provide enough detail for another participant to make the update.

As suggested by @jaclyn-taroni, we could try converting NA to ploidy, as we do in the chromothripsis analysis (or should we update the consensus CNV file to use ploidy instead of NA?).
#1127 (comment)

What input data should be used? Which data were used in the version being updated?

pbta-cnv-consensus.seg.gz

When do you expect the revised analysis will be completed?

Who will complete the updated analysis?

@runjin326
Collaborator

@kgaonkar6, thanks for looking into this! I did some digging, and I am pretty sure the error comes from the fact that we now set copy.num=NA for all the neutral calls. Last night I tried changing those back to 2, and GISTIC ran just fine.
I see that you previously ran some analyses showing that copy.num=NA gives the least noise in the focal CN annotation, so I think we can keep that for that module. I was thinking we could use a separate seg file just for GISTIC, so that the other module is not affected, and either use ploidy or keep copy.num=2. Any thoughts, @jaclyn-taroni, @kgaonkar6?
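A rough sketch of the "separate seg file just for GISTIC" idea, using base R on a toy table. Only the ID and copy.num column names come from this thread; the other columns, the sample IDs, and the helper name make_gistic_seg are hypothetical. Whether any other column would also need filling is not something this thread settles.

```r
# Hypothetical helper: return a copy of the seg table with NA copy numbers
# replaced by a neutral value, leaving the original table untouched.
make_gistic_seg <- function(seg, neutral_copy = 2) {
  seg$copy.num[is.na(seg$copy.num)] <- neutral_copy
  seg
}

# Toy stand-in for the consensus seg data (values are made up).
seg <- data.frame(
  ID        = c("S1", "S1", "S2"),
  chrom     = c("chr1", "chr1", "chr2"),
  loc.start = c(1, 5001, 1),
  loc.end   = c(5000, 9000, 8000),
  copy.num  = c(NA, 3, NA),
  stringsAsFactors = FALSE
)

gistic_seg <- make_gistic_seg(seg)

# Write the GISTIC-only copy to its own tab-delimited file so that other
# modules keep reading the original file with NA for neutral calls.
gistic_file <- tempfile(fileext = ".seg")
write.table(gistic_seg, gistic_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Keeping the replacement in a small function makes it easy to swap the constant 2 for a per-sample ploidy later without touching the file that the focal CN module reads.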

@jaclyn-taroni changed the title from "Updated analysis: GISTIC with ploudy instead of NA" to "Updated analysis: GISTIC with ploidy instead of NA" on Sep 21, 2021
@kgaonkar6
Collaborator Author

I think that, as recommended by @jaclyn-taroni, we should replace NA with the tumor ploidy in an internal step of run-gistic, as we did for chromothripsis here:

# Replace rows of NA copy number with ploidy for that particular tumor
# This is necessary because the latest cnvconsensus data reports all regions lacking CNVs as "NA",
# but ShatterSeek needs complete CN data to identify oscillating CN regions.
# Here, we assume the NA regions match the tumor ploidy. (Although it's important to note that some
# regions marked NA could be uncallable regions, in addition to the regions lacking CNVs.)
cnvconsensus <- metadata %>%
  dplyr::select(Kids_First_Biospecimen_ID, tumor_ploidy) %>%
  dplyr::rename(ID = Kids_First_Biospecimen_ID) %>%
  dplyr::inner_join(cnvconsensus, by = "ID") %>%
  dplyr::mutate(copy.num = dplyr::case_when(is.na(copy.num) ~ tumor_ploidy, TRUE ~ copy.num))
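For anyone adapting this to run-gistic, here is a small self-contained sketch of the same join on toy data. The sample IDs and values are made up, and it uses dplyr::coalesce in place of the case_when call above, which should give the same result for this column. It also highlights one side effect worth checking: inner_join silently drops any sample that is missing from the metadata.

```r
suppressPackageStartupMessages(library(dplyr))

# Toy metadata and consensus tables (hypothetical IDs and values).
metadata <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_A", "BS_B"),
  tumor_ploidy = c(2, 4)
)
cnvconsensus <- data.frame(
  ID = c("BS_A", "BS_A", "BS_B", "BS_C"),
  copy.num = c(NA, 3, NA, NA)
)

# Fill NA copy numbers with the ploidy of the matching tumor.
filled <- metadata %>%
  select(ID = Kids_First_Biospecimen_ID, tumor_ploidy) %>%
  inner_join(cnvconsensus, by = "ID") %>%
  mutate(copy.num = coalesce(copy.num, tumor_ploidy))

# Samples present in the seg data but absent from the metadata are lost
# by the inner join; list them so the drop is visible.
dropped_ids <- setdiff(cnvconsensus$ID, filled$ID)
```

In this toy case BS_C has no metadata row, so it disappears from filled. A sample whose tumor_ploidy is itself NA would keep its NA copy numbers, so checking anyNA(filled$copy.num) before handing the file to GISTIC seems worthwhile.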

@sjspielman
Member

sjspielman commented Mar 29, 2022

Partly addressed by updates here: https://github.com/PediatricOpenTargets/OpenPedCan-analysis/pull/119/files
