Mutation data all "1" for gCSI_2017 using either summary.stat #71

Open
khughitt opened this issue Oct 2, 2020 · 5 comments

khughitt commented Oct 2, 2020

Greetings!

In going through the gCSI_2017 dataset, I noticed that the mutation data appears to have been either incorrectly parsed or is otherwise not very informative: all non-missing values returned by a call to summarizeMolecularProfiles have the same value, "1".

To Reproduce:

library(PharmacoGx)
library(SummarizedExperiment)

pset <- downloadPSet('gCSI_2017', saveDir = '/tmp')

# summary.stat = 'or'
se <- summarizeMolecularProfiles(pset, mDataType = 'mutation', summary.stat = 'or')
dat <- assay(se, 1)

#table(dat == 1)
# 
#  TRUE 
# 13480 
#

# summary.stat = 'and'
se <- summarizeMolecularProfiles(pset, mDataType = 'mutation', summary.stat = 'and')
dat <- assay(se, 1)
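
For anyone who wants to check another dataset for the same symptom, here is a minimal sketch in base R. The helper `count_assay_values` is hypothetical (it is not part of PharmacoGx), and the toy matrix only stands in for the result of `assay(se, 1)`, so no download is needed to run it.

```r
# Hypothetical helper (not part of PharmacoGx): tabulate the distinct
# non-missing values in an assay matrix and count missing entries separately.
count_assay_values <- function(mat) {
  vals <- as.vector(mat)
  non_missing <- vals[!is.na(vals)]
  list(
    n_missing = sum(is.na(vals)),
    counts    = table(non_missing),
    # TRUE when every non-missing entry holds the same value
    constant  = length(unique(non_missing)) <= 1
  )
}

# Toy stand-in for assay(se, 1): character calls with some missing entries.
toy <- matrix(
  c("1", NA, "1", "1", NA, "1"),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3"))
)

res <- count_assay_values(toy)
print(res$n_missing)
print(res$counts)
print(res$constant)
```

With the real data, the same check would be `count_assay_values(dat)` after either of the `summarizeMolecularProfiles` calls above. A `constant` value of `TRUE` matches what is described in this report.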

System information:

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS:   /usr/lib/libopenblasp-r0.3.10.so
LAPACK: /usr/lib/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] SummarizedExperiment_1.19.6 DelayedArray_0.15.7         matrixStats_0.56.0          Matrix_1.2-18
 [5] Biobase_2.49.1              GenomicRanges_1.41.6        GenomeInfoDb_1.25.11        IRanges_2.23.10
 [9] S4Vectors_0.27.12           BiocGenerics_0.35.4         PharmacoGx_2.1.10           CoreGx_1.1.4
[13] nvimcom_0.9-102

loaded via a namespace (and not attached):
 [1] lsa_0.73.2             bitops_1.0-6           RColorBrewer_1.1-2     SnowballC_0.7.0        repr_1.1.0
 [6] tools_4.0.2            R6_2.4.1               DT_0.15                KernSmooth_2.23-17     sm_2.2-5.6
[11] colorspace_1.4-1       tidyselect_1.1.0       gridExtra_2.3          curl_4.3               compiler_4.0.2
[16] shinyjs_2.0.0          slam_0.1-47            caTools_1.18.0         scales_1.1.1           relations_0.6-9
[21] stringr_1.4.0          digest_0.6.25          XVector_0.29.3         base64enc_0.1-3        pkgconfig_2.0.3
[26] htmltools_0.5.0        plotrix_3.7-8          fastmap_1.0.1          limma_3.45.14          maps_3.3.0
[31] htmlwidgets_1.5.1      rlang_0.4.7            shiny_1.5.0            visNetwork_2.0.9       generics_0.0.2
[36] jsonlite_1.7.1         txtplot_1.0-4          BiocParallel_1.23.2    gtools_3.8.2           dplyr_1.0.2
[41] RCurl_1.98-1.2         magrittr_1.5           GenomeInfoDbData_1.2.3 celestial_1.4.6        Rcpp_1.0.5
[46] munsell_0.5.0          lifecycle_0.2.0        stringi_1.5.3          piano_2.5.0            MASS_7.3-53
[51] RJSONIO_1.3-1.4        zlibbioc_1.35.0        plyr_1.8.6             gplots_3.0.4           grid_4.0.2
[56] gdata_2.18.0           promises_1.1.1         shinydashboard_0.7.1   crayon_1.3.4           lattice_0.20-41
[61] mapproj_1.2.7          knitr_1.29             pillar_1.4.6           fgsea_1.15.2           tcltk_4.0.2
[66] igraph_1.2.5           reshape2_1.4.4         marray_1.67.0          fastmatch_1.1-0        NISTunits_1.0.1
[71] glue_1.4.2             downloader_0.4         data.table_1.13.0      BiocManager_1.30.10    vctrs_0.3.4
[76] httpuv_1.5.4           testthat_2.3.2         RANN_2.6.1             gtable_0.3.0           purrr_0.3.4
[81] ggplot2_3.3.2          xfun_0.17              mime_0.9               skimr_2.1.2            xtable_1.8-4
[86] pracma_2.2.9           later_1.1.0.1          tibble_3.0.3           sets_1.0-18            cluster_2.1.0
[91] ellipsis_0.3.1         magicaxis_2.0.10
@ChristopherEeles
Contributor

Hi @khughitt,

I will look into this and get back to you early next week.

Best,
Chris

ChristopherEeles self-assigned this Oct 6, 2020
@ChristopherEeles
Contributor

Hi @khughitt,

I just ran through debugging for your code. Looks like the issue is with the gCSI_2017 PharmacoSet mutation data. I am reaching out to my colleagues now to look into resolving the issue.

I will keep you updated on our progress and share the correct data as soon as it is available.

Best,
Chris

@khughitt
Author

khughitt commented Oct 6, 2020

Great! Thanks for taking the time to look into the issue and report it upstream!

@ChristopherEeles
Contributor

Hey @khughitt,

Just checking in so you know we didn't forget about you. The problem with the mutation data goes all the way upstream to Genentech. We are currently working with them to resolve the issue, but it may take some time.

Best,
Chris

@khughitt
Author

Hi @ChristopherEeles

No problem -- Thanks for taking the time to follow up!

Cheers,
Keith
