Difference between the example object and the one parsed from file #68

Closed
afaissa opened this issue Jun 9, 2021 · 4 comments

afaissa commented Jun 9, 2021

Hi, thank you for the package!

The structure of the object from the example is different from that of an object parsed from a file.

In the second case I can't retrieve the column metadata correctly; all of the metadata is in the format "REP.A001_A375_24H:A03".

What am I doing wrong?

library(cmapR)
Example_dsPath <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR")
Example_ds <- parse_gctx(Example_dsPath)
reading ......Documents/R/win-library/4.1/cmapR/extdata/modzs_n25x50.gctx
done

GSE70138_Level5_Path <- "./GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328.gctx"
GSE70138_Level5_ds <- parse_gctx(GSE70138_Level5_Path)
reading ./GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328.gctx
done

Example_ds
Formal class 'GCT' [package "cmapR"] with 7 slots
..@ mat : num [1:50, 1:25] -1.145 -1.165 0.437 0.139 -0.673 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:50] "200814_at" "222103_at" "201453_x_at" "204131_s_at" ...
.. .. ..$ : chr [1:25] "CPC004_PC3_24H:BRD-A51714012-001-03-1:10" "BRAF001_HEK293T_24H:BRD-U73308409-000-01-9:0.625" "CPC006_HT29_24H:BRD-U88459701-000-01-8:10" "CVD001_HEPG2_24H:BRD-U88459701-000-01-8:10" ...
..@ rid : chr [1:50] "200814_at" "222103_at" "201453_x_at" "204131_s_at" ...
..@ cid : chr [1:25] "CPC004_PC3_24H:BRD-A51714012-001-03-1:10" "BRAF001_HEK293T_24H:BRD-U73308409-000-01-9:0.625" "CPC006_HT29_24H:BRD-U88459701-000-01-8:10" "CVD001_HEPG2_24H:BRD-U88459701-000-01-8:10" ...
..@ rdesc :'data.frame': 50 obs. of 6 variables:
.. ..$ id : chr [1:50] "200814_at" "222103_at" "201453_x_at" "204131_s_at" ...
.. ..$ is_bing : int [1:50] 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ is_lm : int [1:50] 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ pr_gene_id : int [1:50] 5720 466 6009 2309 387 3553 427 5898 23365 6657 ...
.. ..$ pr_gene_symbol: chr [1:50] "PSME1" "ATF1" "RHEB" "FOXO3" ...
.. ..$ pr_gene_title : chr [1:50] "proteasome (prosome, macropain) activator subunit 1 (PA28 alpha)" "activating transcription factor 1" "Ras homolog enriched in brain" "forkhead box O3" ...
..@ cdesc :'data.frame': 25 obs. of 16 variables:
.. ..$ brew_prefix : chr [1:25] "CPC004_PC3_24H" "BRAF001_HEK293T_24H" "CPC006_HT29_24H" "CVD001_HEPG2_24H" ...
.. ..$ cell_id : chr [1:25] "PC3" "HEK293T" "HT29" "HEPG2" ...
.. ..$ distil_cc_q75 : num [1:25] 0.05 0.1 0.17 0.45 0.24 ...
.. ..$ distil_nsample : int [1:25] 5 9 4 3 4 5 2 3 2 2 ...
.. ..$ distil_ss : num [1:25] 2.9 1.88 2.71 4.06 3.83 ...
.. ..$ id : chr [1:25] "CPC004_PC3_24H:BRD-A51714012-001-03-1:10" "BRAF001_HEK293T_24H:BRD-U73308409-000-01-9:0.625" "CPC006_HT29_24H:BRD-U88459701-000-01-8:10" "CVD001_HEPG2_24H:BRD-U88459701-000-01-8:10" ...
.. ..$ is_gold : int [1:25] 0 0 0 1 1 0 1 0 0 0 ...
.. ..$ ngenes_modulated_dn_lm: int [1:25] 11 3 8 38 36 23 12 11 33 13 ...
.. ..$ ngenes_modulated_up_lm: int [1:25] 10 7 25 40 16 17 23 14 37 22 ...
.. ..$ pct_self_rank_q25 : num [1:25] 26.904 17.125 7.06 0.229 4.686 ...
.. ..$ pert_id : chr [1:25] "BRD-A51714012" "BRD-U73308409" "BRD-U88459701" "BRD-U88459701" ...
.. ..$ pert_idose : chr [1:25] "10 M" "500 nM" "10 M" "10 M" ...
.. ..$ pert_iname : chr [1:25] "venlafaxine" "vemurafenib" "atorvastatin" "atorvastatin" ...
.. ..$ pert_itime : chr [1:25] "24 h" "24 h" "24 h" "24 h" ...
.. ..$ pert_type : chr [1:25] "trt_cp" "trt_cp" "trt_cp" "trt_cp" ...
.. ..$ pool_id : chr [1:25] "epsilon" "epsilon" "epsilon" "epsilon" ...
..@ version: chr(0)
..@ src : chr "....../Documents/R/win-library/4.1/cmapR/extdata/modzs_n25x50.gctx"
GSE70138_Level5_ds
Formal class 'GCT' [package "cmapR"] with 7 slots
..@ mat : num [1:12328, 1:118050] 4.2641 0.0572 -1.0125 0.3089 -0.1041 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:12328] "780" "7849" "2978" "2049" ...
.. .. ..$ : chr [1:118050] "REP.A001_A375_24H:A03" "REP.A001_A375_24H:A04" "REP.A001_A375_24H:A05" "REP.A001_A375_24H:A06" ...
..@ rid : chr [1:12328] "780" "7849" "2978" "2049" ...
..@ cid : chr [1:118050] "REP.A001_A375_24H:A03" "REP.A001_A375_24H:A04" "REP.A001_A375_24H:A05" "REP.A001_A375_24H:A06" ...
..@ rdesc :'data.frame': 12328 obs. of 1 variable:
.. ..$ id: chr [1:12328] "780" "7849" "2978" "2049" ...
..@ cdesc :'data.frame': 118050 obs. of 1 variable:
.. ..$ id: chr [1:118050] "REP.A001_A375_24H:A03" "REP.A001_A375_24H:A04" "REP.A001_A375_24H:A05" "REP.A001_A375_24H:A06" ...
..@ version: chr(0)
..@ src : chr "./GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328.gctx"

sessionInfo()
R version 4.1.0 Patched (2021-05-29 r80415)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] cmapR_1.5.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 XVector_0.33.0
[3] GenomicRanges_1.45.0 BiocGenerics_0.39.0
[5] zlibbioc_1.39.0 IRanges_2.27.0
[7] flowCore_2.4.0 lattice_0.20-44
[9] GenomeInfoDb_1.29.0 tools_4.1.0
[11] SummarizedExperiment_1.23.0 parallel_4.1.0
[13] grid_4.1.0 rhdf5_2.36.0
[15] Biobase_2.53.0 matrixStats_0.59.0
[17] RcppParallel_5.1.4 Matrix_1.3-4
[19] GenomeInfoDbData_1.2.6 Rhdf5lib_1.14.0
[21] cytolib_2.4.0 RProtoBufLib_2.4.0
[23] rhdf5filters_1.4.0 S4Vectors_0.31.0
[25] bitops_1.0-7 RCurl_1.98-1.3
[27] DelayedArray_0.19.0 compiler_4.1.0
[29] MatrixGenerics_1.5.0 stats4_4.1.0

@tnat1031
Contributor

tnat1031 commented Jun 9, 2021

Hi @afaissa, it looks like you're doing everything right and both files are being parsed correctly. Can you provide more detail about what you think is the problem?

Thanks,
Ted

@afaissa
Author

afaissa commented Jun 9, 2021

Hi Ted

Thank you for your quick reply.

I would like to subset the object by drug, time, etc. I can do that with the example object but not with the real one. In other words, the object parsed from the file gives me only one column of IDs.

Please let me know if I should provide any additional information.

Thank you,
Alex

col_metaExample <- read_gctx_meta(Example_dsPath, dim="col")
head(col_metaExample)
brew_prefix cell_id distil_cc_q75 distil_nsample distil_ss
1 CPC004_PC3_24H PC3 0.05 5 2.904230
2 BRAF001_HEK293T_24H HEK293T 0.10 9 1.879494
3 CPC006_HT29_24H HT29 0.17 4 2.707330
4 CVD001_HEPG2_24H HEPG2 0.45 3 4.061800
5 NMH001_NEU_6H NEU 0.24 4 3.833890
6 CPC020_VCAP_6H VCAP 0.05 5 2.958420
id is_gold ngenes_modulated_dn_lm
1 CPC004_PC3_24H:BRD-A51714012-001-03-1:10 0 11
2 BRAF001_HEK293T_24H:BRD-U73308409-000-01-9:0.625 0 3
3 CPC006_HT29_24H:BRD-U88459701-000-01-8:10 0 8
4 CVD001_HEPG2_24H:BRD-U88459701-000-01-8:10 1 38
5 NMH001_NEU_6H:BRD-K69726342-001-02-6:10 1 36
6 CPC020_VCAP_6H:BRD-A82307304-001-01-8:10 0 23
ngenes_modulated_up_lm pct_self_rank_q25 pert_id pert_idose pert_iname
1 10 26.9041100 BRD-A51714012 10 M venlafaxine
2 7 17.1252003 BRD-U73308409 500 nM vemurafenib
3 25 7.0596299 BRD-U88459701 10 M atorvastatin
4 40 0.2293578 BRD-U88459701 10 M atorvastatin
5 16 4.6864233 BRD-K69726342 10 M atorvastatin
6 17 26.9961987 BRD-A82307304 10 M atorvastatin
pert_itime pert_type pool_id
1 24 h trt_cp epsilon
2 24 h trt_cp epsilon
3 24 h trt_cp epsilon
4 24 h trt_cp epsilon
5 6 h trt_cp epsilon
6 6 h trt_cp epsilon

col_metaGSE70138 <- read_gctx_meta(GSE70138_Level5_Path, dim="col")

head(col_metaGSE70138)
id
1 REP.A001_A375_24H:A03
2 REP.A001_A375_24H:A04
3 REP.A001_A375_24H:A05
4 REP.A001_A375_24H:A06
5 REP.A001_A375_24H:A07
6 REP.A001_A375_24H:A08

@tnat1031
Contributor

tnat1031 commented Jun 9, 2021

Hi @afaissa, OK, I see what the problem is: the second GCTX file is not annotated. That is, the file contains only the data matrix and no sample annotations. The sample annotations are available from the same GEO repository where you downloaded the Level 5 data file; you will want the 'siginfo.txt' file. Please see this section of the tutorial for how to add annotations to the GCT object once you've read them in.

Hopefully that helps but please let me know if you have any other questions.

Thanks a lot,
Ted
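
For readers who land here later, the annotation step described above can be sketched roughly as follows. This is only an illustration under assumptions, not necessarily the exact code from the tutorial: it uses the example file shipped with cmapR, read with `matrix_only = TRUE` to imitate an unannotated file, and attaches the column metadata with `annotate_gct`. The commented-out lines show how the same call might look for the GEO data; the file path there is a placeholder, and `sig_id` as the key column is an assumption to check against the actual file.

```r
library(cmapR)

# Example GCTX file shipped with cmapR (same file as in the original post).
ds_path <- system.file("extdata", "modzs_n25x50.gctx", package = "cmapR")

# Read only the matrix, which imitates an unannotated file.
ds <- parse_gctx(ds_path, matrix_only = TRUE)

# Column annotations as a data.frame. For the GEO data these would instead
# come from the sig info text file, for example (placeholder path, and the
# key column name "sig_id" is an assumption):
#   col_meta <- read.delim("path/to/siginfo.txt", stringsAsFactors = FALSE)
#   ds <- annotate_gct(ds, col_meta, dim = "col", keyfield = "sig_id")
col_meta <- read_gctx_meta(ds_path, dim = "col")

# Attach the annotations by matching the key column to the column ids.
annotated <- annotate_gct(ds, col_meta, dim = "col", keyfield = "id")

# Subset by drug using the annotated column metadata.
keep <- annotated@cid[annotated@cdesc$pert_iname == "atorvastatin"]
atorva <- subset_gct(annotated, cid = keep)
```

After this, `atorva@cdesc` holds the metadata for the selected columns only, so the same pattern can be used for other fields such as `pert_itime` or `cell_id`.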

@tnat1031 tnat1031 closed this as completed Jun 9, 2021
@afaissa
Author

afaissa commented Jun 9, 2021

That worked! Thank you!
