Updated analysis: GISTIC with ploidy instead of NA #1180

kgaonkar6 · 2021-09-21T04:58:16Z

What analysis module should be updated and why?

The run-gistic module errors out with the following output:

Running GISTIC on the entire OpenPBTA cohort...
GISTIC version 2.0.23
required parameters:
      base_dir: ../results/pbta-cnv-consensus-gistic
       segfile: ../../../scratch/uncompressed-seg-for-gistic/consensus_seg_file_for_gistic.seg
marker spacing: 10000
   refgenefile: /home/rstudio/gistic_install/refgenefiles/hg38.UCSC.add_miR.160920.refgene.mat
optional_params = 
         array_list_file: []
                cnv_file: []
                   t_amp: 0.1000
                   t_del: 0.1000
       join_segment_size: 2
                     ext: []
               qv_thresh: 0.2500
                remove_X: 0
                 markers: []
      max_marker_spacing: 10000
      run_broad_analysis: 1
        broad_len_cutoff: 0.9800
                   ziggs: [1x1 struct]
                     res: 0.0500
              conf_level: 0.9000
                     cap: 1.5000
          do_gene_gistic: 1
     conserve_disk_space: 0
         save_data_files: 1
            use_segarray: 1
        write_gene_files: 1
           use_two_sided: 1
          do_arbitration: 1
           save_seg_data: 1
                   fname: []
              peak_types: {'robust'}
             genepattern: 1
             arm_peeloff: 1
    gene_collapse_method: 'extreme'
           sample_center: 'median'

Checking inputs...
Making D struct
No markers file specified: generating pseudo-markers!
Reading Seg File '../../../scratch/uncompressed-seg-for-gistic/consensus_seg_file_for_gistic.seg'
Warning: Shortened 17856 segments in '../../../scratch/uncompressed-seg-for-gistic/consensus_seg_file_for_gistic.seg' that overlap by one marker.
> In make_D_from_segseq_data at 122
  In run_gistic2_from_seg at 193
  In gp_gistic2_from_seg at 97 
D = 
               sdesc: {915x1 cell}
                chrn: [317435x1 double]
                 pos: [317435x1 double]
    suppress_history: 1
               islog: 0
                isMB: 0
                 dat: [317435x915 SegArray]

D = 
               sdesc: {915x1 cell}
                chrn: [317435x1 double]
                 pos: [317435x1 double]
    suppress_history: 1
               islog: 0
                isMB: 0
                 dat: [317435x915 SegArray]

Matrix size 317435     915
Removing NaN probes...
Removing 317435 markers with NaNs
Matrix size 0  915
 
GISTIC 2.0 input error detected:
All input data were removed after NaN processing.
updating: pbta-cnv-consensus-gistic/ (stored 0%)
updating: pbta-cnv-consensus-gistic/gistic_inputs.mat (deflated 2%)
Running GISTIC on specific histologies...

We believe that the change of copy.num from 2 to NA might be causing this issue: GISTIC removes every marker flagged as NaN, which leaves no input data.
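One way to check this hypothesis is to look at how widespread the NA values are in the seg file before it reaches GISTIC. The following is a minimal base R sketch, assuming the seg data has `ID` and `copy.num` columns as referenced elsewhere in this thread. The helper name `summarize_na_copy_num` and the toy values are hypothetical and are not part of the run-gistic module.

```r
# Hypothetical helper: per-sample count and fraction of segments
# whose copy.num is NA.
summarize_na_copy_num <- function(seg_df) {
  na_flag <- is.na(seg_df$copy.num)
  n_segments <- tapply(na_flag, seg_df$ID, length)
  n_na <- tapply(na_flag, seg_df$ID, sum)
  data.frame(
    ID = names(n_segments),
    n_segments = as.integer(n_segments),
    n_na = as.integer(n_na),
    frac_na = as.numeric(n_na / n_segments),
    stringsAsFactors = FALSE
  )
}

# Toy data standing in for the consensus seg file (values are made up).
toy_seg <- data.frame(
  ID = c("S1", "S1", "S1", "S2", "S2"),
  copy.num = c(NA, 3, NA, NA, 1),
  stringsAsFactors = FALSE
)

na_summary <- summarize_na_copy_num(toy_seg)
print(na_summary)
```

If most samples show a high `frac_na`, that would be consistent with GISTIC discarding all markers during its NaN processing step.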

What changes need to be made? Please provide enough detail for another participant to make the update.

As suggested by @jaclyn-taroni, we could try converting NA to ploidy, as we do in chromothripsis. (Or should we update the consensus CNV file to use ploidy instead of NA?)
#1127 (comment)

What input data should be used? Which data were used in the version being updated?

pbta-cnv-consensus.seg.gz

When do you expect the revised analysis will be completed?

Who will complete the updated analysis?

runjin326 · 2021-09-21T12:10:03Z

@kgaonkar6, thanks for looking into this! I did some digging, and I am pretty sure the error comes from the fact that we now set copy.num=NA for all the neutral calls. Last night I tried changing those back to 2, and it ran just fine.
I see that in your previous analyses copy.num=NA gave the least noise in focal CN annotation, so I think we can keep that for that module. I was thinking we could use a separate seg file just for GISTIC, so that the other module is not affected, and either use ploidy or just keep copy.num=2. Any thoughts, @jaclyn-taroni, @kgaonkar6?

kgaonkar6 · 2021-09-21T13:46:23Z

I think, as recommended by @jaclyn-taroni, we should replace NA with tumor ploidy in an internal step within run-gistic, as we did for chromothripsis here:

OpenPBTA-analysis/analyses/chromothripsis/02-run-shatterseek-and-classify-confidence.R

Lines 67 to 76 in d31c927

    # Replace rows of NA copy number with ploidy for that particular tumor
    # This is necessary because the latest cnvconsensus data reports all regions lacking CNVs as "NA",
    # but ShatterSeek needs complete CN data to identify oscillating CN regions.
    # Here, we assume the NA regions match the tumor ploidy. (Although it's important to note that some
    # regions marked NA could be uncallable regions, in addition to the regions lacking CNVs.)
    cnvconsensus <- metadata %>%
      dplyr::select(Kids_First_Biospecimen_ID, tumor_ploidy) %>%
      dplyr::rename(ID = Kids_First_Biospecimen_ID) %>%
      dplyr::inner_join(cnvconsensus, by="ID") %>%
      dplyr::mutate(copy.num = dplyr::case_when(is.na(copy.num) ~ tumor_ploidy, TRUE ~ copy.num))
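The same replacement could, in principle, be applied as an internal step before the seg file is handed to GISTIC. Below is a rough base R sketch of that idea, assuming the metadata carries `Kids_First_Biospecimen_ID` and `tumor_ploidy` as in the chromothripsis snippet and that there is one metadata row per biospecimen. The function name `fill_na_with_ploidy` and the toy values are hypothetical. This is not the code merged for this issue.

```r
# Hypothetical helper: replace NA copy.num with the tumor ploidy of the
# corresponding sample. Assumes one metadata row per biospecimen ID.
fill_na_with_ploidy <- function(seg_df, metadata) {
  # Look up the metadata row for each segment's sample.
  idx <- match(seg_df$ID, metadata$Kids_First_Biospecimen_ID)

  # Drop segments from samples missing in metadata, similar in effect
  # to the inner join in the chromothripsis code.
  keep <- !is.na(idx)
  seg_df <- seg_df[keep, , drop = FALSE]
  ploidy <- metadata$tumor_ploidy[idx[keep]]

  # Fill only the NA entries; called copy numbers are left untouched.
  # If tumor_ploidy itself is NA, copy.num stays NA.
  needs_fill <- is.na(seg_df$copy.num)
  seg_df$copy.num[needs_fill] <- ploidy[needs_fill]
  seg_df
}

# Toy inputs (values are made up for illustration).
toy_metadata <- data.frame(
  Kids_First_Biospecimen_ID = c("S1", "S2"),
  tumor_ploidy = c(2, 3),
  stringsAsFactors = FALSE
)
toy_seg <- data.frame(
  ID = c("S1", "S1", "S2", "S3"),
  chrom = c("chr1", "chr1", "chr2", "chr1"),
  copy.num = c(NA, 4, NA, NA),
  stringsAsFactors = FALSE
)

filled <- fill_na_with_ploidy(toy_seg, toy_metadata)
print(filled)
```

Doing this inside run-gistic, rather than in the consensus file itself, would leave the NA convention intact for the other modules that read that file.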

sjspielman · 2022-03-29T15:41:08Z

Partly addressed by updates here: https://github.com/PediatricOpenTargets/OpenPedCan-analysis/pull/119/files

kgaonkar6 added the updated analysis label Sep 21, 2021

kgaonkar6 mentioned this issue Sep 21, 2021

Run gistic update d3b-center/OpenPedCan-analysis#119

Merged

5 tasks

jaclyn-taroni changed the title ~~Updated analysis: GISTIC with ploudy instead of NA~~ Updated analysis: GISTIC with ploidy instead of NA Sep 21, 2021

sjspielman closed this as completed Mar 29, 2022

jaclyn-taroni reopened this Mar 29, 2022

jaclyn-taroni mentioned this issue May 11, 2022

Use ploidy for copy.num when running GISTIC #1400

Merged

5 tasks

jaclyn-taroni closed this as completed in #1400 May 21, 2022
