Following tutorial yields error #32

Closed
tstoeger opened this issue Mar 4, 2018 · 12 comments

Comments

@tstoeger

tstoeger commented Mar 4, 2018

Following the tutorial cmapPy_pandasGEXpress_tutorial.ipynb currently (as of 2018-03-03) yields an error.

Since the tutorial uses an external data set, GEO GSE70138 (rather than a test file contained within cmapPy), it isn't clear whether this error reflects an update or a problem within cmapPy, the tutorial, or GSE70138. (Beyond the tutorial itself failing, this makes it difficult for new users to become familiar with gctx files and cmapPy.)

Works: upper part of the tutorial

import pandas as pd
sig_info = pd.read_csv("GSE70138_Broad_LINCS_sig_info.txt", sep="\t") # updated file name

vorinostat_ids = sig_info["sig_id"][sig_info["pert_iname"] == "vorinostat"]
# Let us additionally report on the data
print("number of samples treated with vorinostat:", len(vorinostat_ids))
print('\n---- show first ones for debugging ----')
for x in vorinostat_ids.values[:5]:
    print(x)

number of samples treated with vorinostat: 210

---- show first ones for debugging ----
LJP007_A375_24H:A03
LJP007_A549_24H:A03
LJP007_ASC.C_24H:A03
LJP007_ASC_24H:A03
LJP007_CD34_24H:A03

Fails: loading the records

from cmapPy.pandasGEXpress import parse
vorinostat_only_gctoo = parse(
    "GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328.gctx",   # updated file name
    cid=vorinostat_ids)
/Users/tstoeger/apps/anaconda/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
some of the ids being used to subset the data are not present in the metadata for the file being parsed - mismatch_ids:  {'LJP009_HT29_24H:A03', 'LJP007_SKL.C_24H:A03', 'LJP008_PC3_24H:G08', 'LJP008_HCC515_24H:G07', 'LJP008_HA1E_24H:G07', 'LJP008_ASC_24H:G10', 'LJP008_NPC.CAS9_24H:A03', 'LJP008_SKL_24H:G08', 'LPROT003_PC3_6H:O11', 'LJP008_A549_24H:G09', 'LJP008_HCC515_24H:G10', 'LJP008_MCF7_24H:G07', 'LJP008_PC3_24H:G11', 'LJP008_HUVEC_24H:G11', 'LJP008_PC3_24H:A03', 'LJP009_A375_24H:A03', 'LJP008_ASC_24H:G11', 'LJP008_A549_24H:G11', 'LJP008_HEPG2_24H:G11', 'LJP008_HT29_24H:G07', 'LPROT003_A549_6H:O12', 'LJP008_HUVEC_24H:G07', 'LJP008_HUVEC_24H:G10', 'LJP008_HME1_24H:G12', 'LJP007_A375_24H:A03', 'LJP008_SKL.C_24H:G09', 'LJP008_NPC.CAS9_24H:G09', 'LJP007_HEPG2_24H:A03', 'LJP007_CD34_24H:A03', 'LPROT003_NPC_6H:P11', 'LJP008_HT29_24H:G12', 'LPROT001_A375_6H:P11', 'LJP008_HUVEC_24H:G12', 'LJP008_PC3_24H:G07', 'LJP008_ASC.C_24H:G11', 'LJP008_NEU_24H:G11', 'LJP008_SKL_24H:G10', 'LPROT003_A375_6H:P08', 'LPROT002_MCF7_6H:P12', 'LJP008_NEU_24H:G08', 'LJP008_HCC515_24H:G09', 'LJP008_ASC_24H:G08', 'LJP008_HME1_24H:A03', 'LJP008_NEU_24H:G09', 'LPROT001_PC3_6H:P10', 'LJP008_HEPG2_24H:G08', 'LJP008_HCC515_24H:A03', 'LJP009_SKL.C_24H:A03', 'LPROT003_A549_6H:O08', 'LJP009_HCC515_24H:A03', 'LJP008_ASC.C_24H:G10', 'LJP008_SKL.C_24H:G08', 'LJP008_CD34_24H:G12', 'LJP007_MCF7_24H:A03', 'LJP008_NPC_24H:G08', 'LJP008_SKL.C_24H:A03', 'LJP008_HEPG2_24H:G09', 'LJP008_HT29_24H:A03', 'LJP008_HA1E_24H:A03', 'LJP008_NPC_24H:G12', 'LJP008_A375_24H:G11', 'LJP009_CD34_24H:A03', 'LJP007_HME1_24H:A03', 'LJP009_MCF7_24H:A03', 'LJP008_A549_24H:G07', 'LJP008_NEU_24H:G12', 'LJP007_HT29_24H:A03', 'LJP008_HUVEC_24H:G08', 'LJP008_HUVEC_24H:A03', 'LJP008_A375_24H:G08', 'LJP008_HT29_24H:G10', 'LJP008_NPC.CAS9_24H:G11', 'LJP008_A375_24H:G09', 'LJP008_NEU_24H:G07', 'LJP008_SKL.C_24H:G10', 'LJP008_NEU_24H:A03', 'LJP009_NPC.CAS9_24H:A03', 'LPROT002_A549_6H:O09', 'LJP008_CD34_24H:G11', 'LJP008_NPC.CAS9_24H:G12', 
'LJP009_ASC_24H:A03', 'LJP008_ASC_24H:G09', 'LJP008_HA1E_24H:G08', 'LJP008_SKL_24H:G07', 'LPROT001_MCF7_6H:O11', 'LJP008_A375_24H:A03', 'LJP008_CD34_24H:G07', 'LJP008_NPC.TAK_24H:G08', 'LPROT001_MCF7_6H:O07', 'LJP008_ASC_24H:A03', 'LJP008_PC3_24H:G10', 'LPROT001_A375_6H:P07', 'LPROT003_A375_6H:P10', 'LJP009_ASC.C_24H:A03', 'LPROT002_NPC.TAK_6H:O10', 'LJP009_SKL_24H:A03', 'LJP008_HT29_24H:G08', 'LJP008_PC3_24H:G09', 'LJP008_HCC515_24H:G08', 'LJP008_HME1_24H:G07', 'LJP008_SKL.C_24H:G07', 'LJP008_ASC.C_24H:G07', 'LJP008_ASC.C_24H:G09', 'LJP008_A375_24H:G12', 'LPROT003_NPC_6H:P09', 'LJP008_HT29_24H:G09', 'LPROT001_MCF7_6H:O09', 'LJP009_HA1E_24H:A03', 'LPROT003_PC3_6H:O07', 'LJP008_CD34_24H:A03', 'LJP007_A549_24H:A03', 'LJP008_HA1E_24H:G11', 'LJP007_HUES3_24H:A03', 'LPROT002_A375_6H:P07', 'LJP008_CD34_24H:G08', 'LJP008_MCF7_24H:G11', 'LJP008_A549_24H:G08', 'LJP009_HEPG2_24H:A03', 'LPROT001_PC3_6H:P08', 'LPROT003_NPC_6H:P07', 'LJP008_HME1_24H:G10', 'LJP007_SKL_24H:A03', 'LJP008_HA1E_24H:G10', 'LJP008_PC3_24H:G12', 'LJP008_SKL_24H:G09', 'LPROT001_PC3_6H:P12', 'LJP008_ASC_24H:G07', 'LPROT002_A375_6H:P11', 'LPROT003_A375_6H:P12', 'LJP008_NPC.TAK_24H:G11', 'LJP009_HUVEC_24H:A03', 'LJP009_HME1_24H:A03', 'LJP008_HCC515_24H:G12', 'LJP007_MNEU.E_24H:A03', 'LJP008_SKL_24H:G12', 'LJP008_A375_24H:G10', 'LJP009_NPC_24H:A03', 'LJP008_CD34_24H:G09', 'LJP008_HME1_24H:G09', 'LJP008_NEU_24H:G10', 'LJP008_MCF7_24H:G10', 'LJP008_A549_24H:A03', 'LJP008_HEPG2_24H:A03', 'LJP008_HME1_24H:G08', 'LJP008_NPC_24H:G07', 'LJP008_NPC.CAS9_24H:G08', 'LPROT002_MCF7_6H:P08', 'LJP008_NPC_24H:G09', 'LPROT001_A375_6H:P09', 'LJP008_ASC.C_24H:G08', 'LJP009_PC3_24H:A03', 'LJP008_HT29_24H:G11', 'LJP008_MCF7_24H:A03', 'LJP007_ASC_24H:A03', 'LJP008_NPC.CAS9_24H:G07', 'LPROT002_A549_6H:O07', 'LJP009_NPC.TAK_24H:A03', 'LJP007_NPC.TAK_24H:A03', 'LJP008_HEPG2_24H:G12', 'LJP008_NPC.CAS9_24H:G10', 'LPROT002_NPC.TAK_6H:O12', 'LJP008_NPC.TAK_24H:G10', 'LJP008_SKL_24H:A03', 'LJP008_SKL.C_24H:G11', 
'LPROT001_NPC.TAK_6H:O10', 'LJP008_HCC515_24H:G11', 'LJP008_SKL.C_24H:G12', 'LJP008_ASC.C_24H:G12', 'LJP008_NPC_24H:A03', 'LJP007_NPC_24H:A03', 'LJP008_NPC.TAK_24H:G12', 'LPROT002_A549_6H:O11', 'LJP008_NPC.TAK_24H:A03', 'LJP008_HME1_24H:G11', 'LJP007_ASC.C_24H:A03', 'LJP008_MCF7_24H:G08', 'LJP007_HA1E_24H:A03', 'LJP008_MCF7_24H:G09', 'LJP008_ASC.C_24H:A03', 'LJP008_SKL_24H:G11', 'LJP008_A549_24H:G12', 'LPROT003_PC3_6H:O09', 'LJP007_HUVEC_24H:A03', 'LJP008_NPC_24H:G11', 'LPROT003_A549_6H:O10', 'LJP008_NPC.TAK_24H:G09', 'LJP008_HUVEC_24H:G09', 'LPROT001_NPC.TAK_6H:O08', 'LJP007_NEU_24H:A03', 'LJP008_NPC_24H:G10', 'LJP008_HA1E_24H:G09', 'LJP008_HEPG2_24H:G07', 'LJP008_A375_24H:G07', 'LJP008_MCF7_24H:G12', 'LJP008_NPC.TAK_24H:G07', 'LJP008_HEPG2_24H:G10', 'LPROT001_NPC.TAK_6H:O12', 'LJP007_JURKAT_24H:A03', 'LJP009_A549_24H:A03', 'LJP007_PC3_24H:A03', 'LPROT002_A375_6H:P09', 'LPROT002_NPC.TAK_6H:O08', 'LJP007_NPC.CAS9_24H:A03', 'LPROT002_MCF7_6H:P10', 'LJP008_HA1E_24H:G12', 'LJP009_NEU_24H:A03', 'LJP008_CD34_24H:G10', 'LJP007_HCC515_24H:A03', 'LJP008_ASC_24H:G12', 'LJP008_A549_24H:G10'}
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-3-f03c31e62771> in <module>()
      2 vorinostat_only_gctoo = parse(
      3     "GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328.gctx",   # updated file name
----> 4     cid=vorinostat_ids)

~/apps/anaconda/anaconda3/lib/python3.6/site-packages/cmapPy/pandasGEXpress/parse.py in parse(file_path, convert_neg_666, rid, cid, ridx, cidx, row_meta_only, col_meta_only, make_multiindex)
     60     elif file_path.endswith(".gctx"):
     61         curr = parse_gctx.parse(file_path, convert_neg_666, rid, cid, ridx, cidx, row_meta_only, col_meta_only,
---> 62                                 make_multiindex)
     63     else:
     64         err_msg = "File to parse must be .gct or .gctx!"

~/apps/anaconda/anaconda3/lib/python3.6/site-packages/cmapPy/pandasGEXpress/parse_gctx.py in parse(gctx_file_path, convert_neg_666, rid, cid, ridx, cidx, row_meta_only, col_meta_only, make_multiindex)
    101 
    102         # validate optional input ids & get indexes to subset by
--> 103         (sorted_ridx, sorted_cidx) = check_and_order_id_inputs(rid, ridx, cid, cidx, row_meta, col_meta)
    104 
    105         data_dset = gctx_file[data_node]

~/apps/anaconda/anaconda3/lib/python3.6/site-packages/cmapPy/pandasGEXpress/parse_gctx.py in check_and_order_id_inputs(rid, ridx, cid, cidx, row_meta_df, col_meta_df)
    140     ordered_ridx = get_ordered_idx(row_type, row_ids, row_meta_df)
    141 
--> 142     col_ids = check_and_convert_ids(col_type, col_ids, col_meta_df)
    143     ordered_cidx = get_ordered_idx(col_type, col_ids, col_meta_df)
    144     return (ordered_ridx, ordered_cidx)

~/apps/anaconda/anaconda3/lib/python3.6/site-packages/cmapPy/pandasGEXpress/parse_gctx.py in check_and_convert_ids(id_type, id_list, meta_df)
    173         if id_type == "id":
    174             id_list = convert_ids_to_meta_type(id_list, meta_df)
--> 175             check_id_validity(id_list, meta_df)
    176         else:
    177             check_idx_validity(id_list, meta_df)

~/apps/anaconda/anaconda3/lib/python3.6/site-packages/cmapPy/pandasGEXpress/parse_gctx.py in check_id_validity(id_list, meta_df)
    189             mismatch_ids)
    190         logger.error(msg)
--> 191         raise Exception("parse_gctx check_id_validity " + msg)
    192 
    193 

Exception: parse_gctx check_id_validity some of the ids being used to subset the data are not present in the metadata for the file being parsed - mismatch_ids:  {'LJP009_HT29_24H:A03', 'LJP007_SKL.C_24H:A03', 'LJP008_PC3_24H:G08', 'LJP008_HCC515_24H:G07', 'LJP008_HA1E_24H:G07', 'LJP008_ASC_24H:G10', 'LJP008_NPC.CAS9_24H:A03', 'LJP008_SKL_24H:G08', 'LPROT003_PC3_6H:O11', 'LJP008_A549_24H:G09', 'LJP008_HCC515_24H:G10', 'LJP008_MCF7_24H:G07', 'LJP008_PC3_24H:G11', 'LJP008_HUVEC_24H:G11', 'LJP008_PC3_24H:A03', 'LJP009_A375_24H:A03', 'LJP008_ASC_24H:G11', 'LJP008_A549_24H:G11', 'LJP008_HEPG2_24H:G11', 'LJP008_HT29_24H:G07', 'LPROT003_A549_6H:O12', 'LJP008_HUVEC_24H:G07', 'LJP008_HUVEC_24H:G10', 'LJP008_HME1_24H:G12', 'LJP007_A375_24H:A03', 'LJP008_SKL.C_24H:G09', 'LJP008_NPC.CAS9_24H:G09', 'LJP007_HEPG2_24H:A03', 'LJP007_CD34_24H:A03', 'LPROT003_NPC_6H:P11', 'LJP008_HT29_24H:G12', 'LPROT001_A375_6H:P11', 'LJP008_HUVEC_24H:G12', 'LJP008_PC3_24H:G07', 'LJP008_ASC.C_24H:G11', 'LJP008_NEU_24H:G11', 'LJP008_SKL_24H:G10', 'LPROT003_A375_6H:P08', 'LPROT002_MCF7_6H:P12', 'LJP008_NEU_24H:G08', 'LJP008_HCC515_24H:G09', 'LJP008_ASC_24H:G08', 'LJP008_HME1_24H:A03', 'LJP008_NEU_24H:G09', 'LPROT001_PC3_6H:P10', 'LJP008_HEPG2_24H:G08', 'LJP008_HCC515_24H:A03', 'LJP009_SKL.C_24H:A03', 'LPROT003_A549_6H:O08', 'LJP009_HCC515_24H:A03', 'LJP008_ASC.C_24H:G10', 'LJP008_SKL.C_24H:G08', 'LJP008_CD34_24H:G12', 'LJP007_MCF7_24H:A03', 'LJP008_NPC_24H:G08', 'LJP008_SKL.C_24H:A03', 'LJP008_HEPG2_24H:G09', 'LJP008_HT29_24H:A03', 'LJP008_HA1E_24H:A03', 'LJP008_NPC_24H:G12', 'LJP008_A375_24H:G11', 'LJP009_CD34_24H:A03', 'LJP007_HME1_24H:A03', 'LJP009_MCF7_24H:A03', 'LJP008_A549_24H:G07', 'LJP008_NEU_24H:G12', 'LJP007_HT29_24H:A03', 'LJP008_HUVEC_24H:G08', 'LJP008_HUVEC_24H:A03', 'LJP008_A375_24H:G08', 'LJP008_HT29_24H:G10', 'LJP008_NPC.CAS9_24H:G11', 'LJP008_A375_24H:G09', 'LJP008_NEU_24H:G07', 'LJP008_SKL.C_24H:G10', 'LJP008_NEU_24H:A03', 'LJP009_NPC.CAS9_24H:A03', 'LPROT002_A549_6H:O09', 
'LJP008_CD34_24H:G11', 'LJP008_NPC.CAS9_24H:G12', 'LJP009_ASC_24H:A03', 'LJP008_ASC_24H:G09', 'LJP008_HA1E_24H:G08', 'LJP008_SKL_24H:G07', 'LPROT001_MCF7_6H:O11', 'LJP008_A375_24H:A03', 'LJP008_CD34_24H:G07', 'LJP008_NPC.TAK_24H:G08', 'LPROT001_MCF7_6H:O07', 'LJP008_ASC_24H:A03', 'LJP008_PC3_24H:G10', 'LPROT001_A375_6H:P07', 'LPROT003_A375_6H:P10', 'LJP009_ASC.C_24H:A03', 'LPROT002_NPC.TAK_6H:O10', 'LJP009_SKL_24H:A03', 'LJP008_HT29_24H:G08', 'LJP008_PC3_24H:G09', 'LJP008_HCC515_24H:G08', 'LJP008_HME1_24H:G07', 'LJP008_SKL.C_24H:G07', 'LJP008_ASC.C_24H:G07', 'LJP008_ASC.C_24H:G09', 'LJP008_A375_24H:G12', 'LPROT003_NPC_6H:P09', 'LJP008_HT29_24H:G09', 'LPROT001_MCF7_6H:O09', 'LJP009_HA1E_24H:A03', 'LPROT003_PC3_6H:O07', 'LJP008_CD34_24H:A03', 'LJP007_A549_24H:A03', 'LJP008_HA1E_24H:G11', 'LJP007_HUES3_24H:A03', 'LPROT002_A375_6H:P07', 'LJP008_CD34_24H:G08', 'LJP008_MCF7_24H:G11', 'LJP008_A549_24H:G08', 'LJP009_HEPG2_24H:A03', 'LPROT001_PC3_6H:P08', 'LPROT003_NPC_6H:P07', 'LJP008_HME1_24H:G10', 'LJP007_SKL_24H:A03', 'LJP008_HA1E_24H:G10', 'LJP008_PC3_24H:G12', 'LJP008_SKL_24H:G09', 'LPROT001_PC3_6H:P12', 'LJP008_ASC_24H:G07', 'LPROT002_A375_6H:P11', 'LPROT003_A375_6H:P12', 'LJP008_NPC.TAK_24H:G11', 'LJP009_HUVEC_24H:A03', 'LJP009_HME1_24H:A03', 'LJP008_HCC515_24H:G12', 'LJP007_MNEU.E_24H:A03', 'LJP008_SKL_24H:G12', 'LJP008_A375_24H:G10', 'LJP009_NPC_24H:A03', 'LJP008_CD34_24H:G09', 'LJP008_HME1_24H:G09', 'LJP008_NEU_24H:G10', 'LJP008_MCF7_24H:G10', 'LJP008_A549_24H:A03', 'LJP008_HEPG2_24H:A03', 'LJP008_HME1_24H:G08', 'LJP008_NPC_24H:G07', 'LJP008_NPC.CAS9_24H:G08', 'LPROT002_MCF7_6H:P08', 'LJP008_NPC_24H:G09', 'LPROT001_A375_6H:P09', 'LJP008_ASC.C_24H:G08', 'LJP009_PC3_24H:A03', 'LJP008_HT29_24H:G11', 'LJP008_MCF7_24H:A03', 'LJP007_ASC_24H:A03', 'LJP008_NPC.CAS9_24H:G07', 'LPROT002_A549_6H:O07', 'LJP009_NPC.TAK_24H:A03', 'LJP007_NPC.TAK_24H:A03', 'LJP008_HEPG2_24H:G12', 'LJP008_NPC.CAS9_24H:G10', 'LPROT002_NPC.TAK_6H:O12', 'LJP008_NPC.TAK_24H:G10', 
'LJP008_SKL_24H:A03', 'LJP008_SKL.C_24H:G11', 'LPROT001_NPC.TAK_6H:O10', 'LJP008_HCC515_24H:G11', 'LJP008_SKL.C_24H:G12', 'LJP008_ASC.C_24H:G12', 'LJP008_NPC_24H:A03', 'LJP007_NPC_24H:A03', 'LJP008_NPC.TAK_24H:G12', 'LPROT002_A549_6H:O11', 'LJP008_NPC.TAK_24H:A03', 'LJP008_HME1_24H:G11', 'LJP007_ASC.C_24H:A03', 'LJP008_MCF7_24H:G08', 'LJP007_HA1E_24H:A03', 'LJP008_MCF7_24H:G09', 'LJP008_ASC.C_24H:A03', 'LJP008_SKL_24H:G11', 'LJP008_A549_24H:G12', 'LPROT003_PC3_6H:O09', 'LJP007_HUVEC_24H:A03', 'LJP008_NPC_24H:G11', 'LPROT003_A549_6H:O10', 'LJP008_NPC.TAK_24H:G09', 'LJP008_HUVEC_24H:G09', 'LPROT001_NPC.TAK_6H:O08', 'LJP007_NEU_24H:A03', 'LJP008_NPC_24H:G10', 'LJP008_HA1E_24H:G09', 'LJP008_HEPG2_24H:G07', 'LJP008_A375_24H:G07', 'LJP008_MCF7_24H:G12', 'LJP008_NPC.TAK_24H:G07', 'LJP008_HEPG2_24H:G10', 'LPROT001_NPC.TAK_6H:O12', 'LJP007_JURKAT_24H:A03', 'LJP009_A549_24H:A03', 'LJP007_PC3_24H:A03', 'LPROT002_A375_6H:P09', 'LPROT002_NPC.TAK_6H:O08', 'LJP007_NPC.CAS9_24H:A03', 'LPROT002_MCF7_6H:P10', 'LJP008_HA1E_24H:G12', 'LJP009_NEU_24H:A03', 'LJP008_CD34_24H:G10', 'LJP007_HCC515_24H:A03', 'LJP008_ASC_24H:G12', 'LJP008_A549_24H:G10'}
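One way to tell whether the requested ids are genuinely absent from the file (for example, because sig_info and the matrix come from different releases) or merely fail to compare equal is to load only the column metadata and diff the ids yourself before subsetting. This is a minimal sketch: `find_missing_ids` and `load_col_ids` are hypothetical helpers, and `load_col_ids` assumes that cmapPy's `parse()` accepts `col_meta_only=True` (the keyword appears in the signature shown in the traceback above) and returns a DataFrame indexed by column id.

```python
def find_missing_ids(requested_ids, available_ids):
    """Split requested ids into (present, missing), preserving request order.

    Ids that arrive as bytes (as some HDF5 readers return them under
    Python 3) are decoded to str first, so an encoding mismatch is not
    mistaken for ids that are genuinely absent from the file.
    """
    available = {
        x.decode("utf8") if isinstance(x, bytes) else x for x in available_ids
    }
    present = [x for x in requested_ids if x in available]
    missing = [x for x in requested_ids if x not in available]
    return present, missing


def load_col_ids(gctx_path):
    """Hypothetical helper: read only the column metadata of a .gctx file.

    Assumes parse(..., col_meta_only=True) returns a DataFrame whose index
    holds the column ids; cmapPy is imported lazily so the helper above
    can be used without it.
    """
    from cmapPy.pandasGEXpress.parse import parse
    col_meta = parse(gctx_path, col_meta_only=True)
    return list(col_meta.index)
```

With the names from the tutorial, this would be called as `present, missing = find_missing_ids(vorinostat_ids, load_col_ids("GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328.gctx"))`. A non-empty `missing` list points at the data (mismatched releases or file names); an empty one while `parse(..., cid=vorinostat_ids)` still fails points at the parsing code.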

@tstoeger
Author

tstoeger commented Mar 4, 2018

I had overlooked the need for a very specific Python 2.7 environment, outlined in https://clue.io/cmapPy/build.html#install. Those instructions go beyond the information provided in the README, and they are inconsistent with the tutorial: following them sets up a cmapPy version that requires parse.parse() instead of parse().

To add to the confusion, the file names had changed between the tutorial and the public version of GSE70138, which raised the possibility that the file format had changed as well.

@tstoeger tstoeger closed this as completed Mar 4, 2018
@oena
Contributor

oena commented Mar 5, 2018

Hi @tstoeger, sorry you had difficulties using the tutorial. If you have suggestions for making the installation instructions clearer, feel free to let us know; the README currently links out to ReadTheDocs to help us keep the documentation in a centralized place and (hopefully) up to date.

Regarding the tutorial, I'll fix the inconsistencies in the use of the parse methods. With regard to scope, we definitely hope to add more tutorials in the future, but for the time being we only have one, using GEO data, because we guessed that would be the most common use case for the package. For the record, should you want to investigate error messages or bugs without dealing with external datasets in the future, we already have a variety of test files for disambiguating code issues from file issues; these are located in cmapPy/cmapPy/pandasGEXpress/tests/functional_tests.
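Those bundled files can be located with a few lines of Python. This is a sketch under stated assumptions: `list_test_files` is a hypothetical helper, `repo_root` is assumed to be a local clone of the repository (so the directory mentioned above sits at `cmapPy/pandasGEXpress/tests/functional_tests` inside it), and nothing is assumed about which .gct/.gctx files it actually contains.

```python
from pathlib import Path


def list_test_files(repo_root, subdir="cmapPy/pandasGEXpress/tests/functional_tests"):
    """Return the sorted .gct/.gctx files found (recursively) under the test directory.

    Hypothetical helper: repo_root is assumed to be the top of a local
    clone of the cmapPy repository.
    """
    test_dir = Path(repo_root) / subdir
    return sorted(
        p for p in test_dir.rglob("*")
        if p.is_file() and p.suffix in {".gct", ".gctx"}
    )
```

Each returned path can then be handed to the same parse call used in the tutorial, which separates problems in the code from problems in a downloaded file.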

@tstoeger
Copy link
Author

tstoeger commented Mar 5, 2018

Hi @oena, first let me thank you, both for your reply and for the existing cmapPy documentation, which has already been very useful. The tutorial is a very nice extra.

My troubles arose from running into several slightly different problems and noticing that at least three distinct aspects seemed to have changed: the version of the dataset used, something related to external Python code, and something related to cmapPy. Since I took the tutorial as the reference, this suggested that I was overlooking something, but I did not know for sure which aspect I should trust or follow.

Possibly, the tutorial could:

  • prominently state that cmapPy, somewhat unexpectedly, needs a Python 2.7 environment, possibly adding a check (I misread the statement on the virtual environment as: "Follow good practice, set up a dedicated virtual environment for individual tasks, and then import the packages listed in requirements.txt")
  • check the cmapPy version number and print it
  • be complemented by a second tutorial (similar to the supplied unit tests) that exercises the most basic functionality using one of the supplied data sets rather than an external one
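The first two suggestions could take the form of a short preamble cell. This is a minimal sketch, not part of cmapPy: `describe_environment` is a hypothetical helper, the required major version is passed in by the caller (2 reflects the install instructions discussed in this thread as of 2018), and it assumes the package exposes a `__version__` attribute, reporting "unknown" otherwise. It uses only `importlib` and `sys` and single-argument `print` calls, so it should run under both Python 2.7 and 3.

```python
import importlib
import sys


def describe_environment(module_name="cmapPy", required_major=None):
    """Return (python_version, package_version, problems) for a tutorial preamble.

    Hypothetical helper: required_major is supplied by the caller, and the
    package version is read from a __version__ attribute if one exists.
    """
    problems = []
    py_version = "%d.%d.%d" % tuple(sys.version_info[:3])
    if required_major is not None and sys.version_info[0] != required_major:
        problems.append(
            "Python %s found, but Python %d.x is expected" % (py_version, required_major)
        )
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        problems.append("%s cannot be imported" % module_name)
        return py_version, None, problems
    # Not every package defines __version__; fall back to "unknown".
    pkg_version = getattr(module, "__version__", "unknown")
    return py_version, pkg_version, problems


if __name__ == "__main__":
    py_version, pkg_version, problems = describe_environment(required_major=2)
    print("Python: %s" % py_version)
    print("cmapPy: %s" % pkg_version)
    for problem in problems:
        print("WARNING: %s" % problem)
```

Returning the problems instead of raising lets the tutorial print every issue at once, which addresses the "which aspect should I trust" uncertainty described above.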

@oena
Contributor

oena commented Mar 5, 2018

Those points all seem very reasonable to me, thanks! I'll see what we can do to address them better than we do currently.

@benanbardak

Hi,
Although I'm using Python 2.7, I get the same error you received above ("Exception: parse_gctx check_id_validity some of the ids being used to subset the data are not present in the metadata for the file being parsed - mismatch_ids: ..."). The file I'm trying to run is GSE92742. I would appreciate it if you could tell me how you solved this problem.

@tstoeger
Copy link
Author

I made a Python 3 compatible version of cmapPy; credit for identifying the critical section goes to @heltena.

In my usage scenario, a single-line addition was sufficient:

curr_dset.read_direct(temp_array)
temp_array = np.core.defchararray.decode(temp_array, 'utf8')  # <- introduced for Python3 compatibility
header_values[str(k)] = temp_array

My usage scenario was restricted to gctx files, which simplifies the problem of Python 3 compatibility. I didn't check the gctx definition regarding future compatibility of its encoding. I have only constructed tests with GSE92742 Level 5, and I bypassed GCToo instances as output: I have only ever used the data frame contained within them, so I did not check their creation for compatibility with Python 3. The above covers my usage of cmapPy.
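Why a missing decode step produces exactly this kind of mismatch_ids error can be shown without any cmapPy code. The sketch assumes, as the patch above implies, that under Python 3 the id strings read from the .gctx file arrive as NumPy byte strings (dtype 'S'); `mismatched` is a hypothetical stand-in for cmapPy's id check, and the two ids are copied from the error message in this issue. `np.char.decode` is the public alias of the `np.core.defchararray.decode` function used in the patch.

```python
import numpy as np

# Stand-in for an id column read from a .gctx file: assumed to come back
# as NumPy byte strings (dtype 'S') under Python 3.
raw_cids = np.array([b"LJP007_A375_24H:A03", b"LJP007_A549_24H:A03"])
requested = ["LJP007_A375_24H:A03", "LJP007_A549_24H:A03"]


def mismatched(requested_ids, meta_ids):
    """Hypothetical stand-in for the id check: requested ids absent from meta_ids."""
    return set(requested_ids) - set(meta_ids.tolist())


# str never compares equal to bytes, so every requested id is reported missing.
print(sorted(mismatched(requested, raw_cids)))

# Decoding to unicode (dtype 'U') makes the same ids match.
decoded = np.char.decode(raw_cids, "utf8")
print(sorted(mismatched(requested, decoded)))
```

The first print lists both ids and the second prints an empty list, which is consistent with every vorinostat id appearing in the mismatch_ids set when the file is parsed under Python 3.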

@saksham219
Copy link
Contributor

Hi @benanbardak
It would be helpful if you could mention which file from GEO you are using to read in the metadata. There are five files given here.

@benanbardak

First, thank you for the response.
I am using "GSE92742_Broad_LINCS_Level3_INF_mlr12k_n1319138x12328.gctx.gz", and I get the error "Exception: parse_gctx check_id_validity some of the ids being used to subset the data are not present in the metadata for the file being parsed - mismatch_ids:.."

@saksham219
Copy link
Contributor

That is a 48 GB file, so it will take me some time to download it. I tried another file from the same series, "GSE92742_Broad_LINCS_Level2_GEX_delta_n49216x978.gctx.gz", and metadata parsing works in Python 2.
If you try it with this file and it fails, then the issue might be with your version of cmapPy. If it does not fail with this smaller file, the 48 GB file might have something different going on that the package is not able to handle.

@benanbardak
Copy link

Also, please check your email, @tstoeger.

@benanbardak
Copy link

@saksham219 I tried to run the tutorial with this file, "GSE92742_Broad_LINCS_Level2_GEX_delta_n49216x978.gctx.gz", but I get the same error again. How can I solve this problem? What do you mean by "the issue might be with your version of cmapPy"? How can I fix my version of cmapPy?
Thank you so much.

@saksham219
Copy link
Contributor

saksham219 commented Aug 23, 2019

@benanbardak What I mean is that you might not be using the latest version from the master branch of this repo.
You can try running this from the terminal:

$ git clone https://github.com/cmap/cmapPy
$ pip install cmapPy/

and then try reading the file again in a new Python environment.

If the problem persists, it would be helpful if you could list the versions of the packages in your Python environment by running
$ pip freeze
