mismatch between metadata and gctx #62

ElyasMo · 2019-12-12T16:05:29Z

I am trying to parse the following data files:

1-GSE70138_Broad_LINCS_Level3_INF_mlr12k_n345976x12328_2017-03-06.gctx.gz
2-GSE70138_Broad_LINCS_Level3_INF_mlr12k_n78980x22268_2015-06-30.gct.gz
3-GSE70138_Broad_LINCS_Level4_ZSPCINF_mlr12k_n113012x22268_2015-12-31.gct.gz

using one of these metadata files:

1-GSE70138_Broad_LINCS_sig_info_2017-03-06.txt.gz
or
2-GSE70138_Broad_LINCS_inst_info_2017-03-06.txt.gz

I want to extract a subset of each file so that the data is small enough to handle.

import pandas as pd
from cmapPy.pandasGEXpress.parse import parse

sig_info = pd.read_csv("GSE70138_Broad_LINCS_sig_info_2017-03-06.txt", sep="\t")
# Keep the MCF7 signatures treated at 10 um for 24 h.
mask = (sig_info["cell_id"] == "MCF7") & (sig_info["pert_idose"] == "10.0 um") & (sig_info["pert_itime"] == "24 h")
mcf7_cell = sig_info.loc[mask, "pert_id"]
# Parse only the columns whose ids are in mcf7_cell.
MCF7_details = parse("GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328_2017-03-06.gctx", cid=mcf7_cell)

Each time I do this with the file
GSE70138_Broad_LINCS_Level3_INF_mlr12k_n345976x12328_2017-03-06.gctx.gz
I get this error:

some of the ids being used to subset the data are not present in the metadata for the file being parsed - mismatch_ids: {'neratinib'}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sysmedicine/anaconda3/envs/my_conda/lib/python3.8/site-packages/cmapPy/pandasGEXpress/parse.py", line 65, in parse
out = parse_gctx.parse(file_path, convert_neg_666=convert_neg_666,
File "/home/sysmedicine/anaconda3/envs/my_conda/lib/python3.8/site-packages/cmapPy/pandasGEXpress/parse_gctx.py", line 107, in parse
(sorted_ridx, sorted_cidx) = check_and_order_id_inputs(rid, ridx, cid, cidx, row_meta, col_meta)
File "/home/sysmedicine/anaconda3/envs/my_conda/lib/python3.8/site-packages/cmapPy/pandasGEXpress/parse_gctx.py", line 146, in check_and_order_id_inputs
col_ids = check_and_convert_ids(col_type, col_ids, col_meta_df)
File "/home/sysmedicine/anaconda3/envs/my_conda/lib/python3.8/site-packages/cmapPy/pandasGEXpress/parse_gctx.py", line 179, in check_and_convert_ids
check_id_validity(id_list, meta_df)
File "/home/sysmedicine/anaconda3/envs/my_conda/lib/python3.8/site-packages/cmapPy/pandasGEXpress/parse_gctx.py", line 195, in check_id_validity
raise Exception("parse_gctx check_id_validity " + msg)
Exception: parse_gctx check_id_validity some of the ids being used to subset the data are not present in the metadata for the file being parsed - mismatch_ids: {'neratinib'}
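
The exception says that every value passed as cid has to match a column id stored in the file being parsed. Below is a minimal sketch of checking candidate ids before calling parse. It is written in plain Python so that it runs without the data files. The helper name split_ids and the sample ids are hypothetical. It also assumes that the file's columns are keyed by something other than a compound name such as 'neratinib' (for example a signature or instance id), which nothing in this post confirms.

```python
def split_ids(requested, available):
    """Split requested ids into (present, missing) relative to available ids.

    Order of first appearance is preserved and duplicates are dropped.
    Both arguments can be any iterable, e.g. a list or a pandas Series.
    """
    available = set(available)
    present, missing, seen = [], [], set()
    for item in requested:
        if item in seen:
            continue
        seen.add(item)
        (present if item in available else missing).append(item)
    return present, missing


# Hypothetical column ids standing in for the ids stored in a .gctx file.
column_ids = ["SIG_A", "SIG_B", "SIG_C"]
# Hypothetical request mixing a valid column id with a compound name.
requested_ids = ["SIG_B", "neratinib", "SIG_B"]

present, missing = split_ids(requested_ids, column_ids)
print("usable as cid:", present)
print("not in column metadata:", missing)
```

With the real data, available would be the column ids of the file and requested would be the column selected from the metadata table. As far as I know, cmapPy's parse accepts col_meta_only=True to read only the column metadata, which would give those ids without loading the matrix. Anything that ends up in missing would trigger the check_id_validity exception shown above.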

Comments

ElyasMo commented Dec 12, 2019

How can I fix this problem?