mismatch between metadata and gctx #62

Open
ElyasMo opened this issue Dec 12, 2019 · 0 comments

I am trying to parse the following files:

1-GSE70138_Broad_LINCS_Level3_INF_mlr12k_n345976x12328_2017-03-06.gctx.gz
2-GSE70138_Broad_LINCS_Level3_INF_mlr12k_n78980x22268_2015-06-30.gct.gz
3-GSE70138_Broad_LINCS_Level4_ZSPCINF_mlr12k_n113012x22268_2015-12-31.gct.gz

using one of these metadata files:

1-GSE70138_Broad_LINCS_sig_info_2017-03-06.txt.gz
or
2-GSE70138_Broad_LINCS_inst_info_2017-03-06.txt.gz

I want to extract a subset of each file so that the data is small enough to handle.

import pandas as pd
sig_info = pd.read_csv("GSE70138_Broad_LINCS_sig_info_2017-03-06.txt", sep="\t")
mcf7_cell = sig_info["pert_id"][sig_info["cell_id"] == "MCF7"][sig_info["pert_idose"]=="10.0 um"][sig_info["pert_itime"]=="24 h"]
from cmapPy.pandasGEXpress.parse import parse
MCF7_details = parse("GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328_2017-03-06.gctx", cid=mcf7_cell)
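
For comparison, here is a minimal sketch of the same selection written with a single boolean mask. It assumes that the column IDs of a Level 5 GCTX file correspond to the `sig_id` column of the sig_info table rather than to `pert_id`; that is an assumption, not something confirmed in this issue. The toy rows and the `SIG_*` / `BRD-*` values are invented so the snippet runs without the real files.

```python
import pandas as pd

# Toy stand-in for the sig_info table; these rows are invented for illustration.
sig_info = pd.DataFrame({
    "sig_id": ["SIG_A", "SIG_B", "SIG_C", "SIG_D"],
    "pert_id": ["BRD-1", "BRD-2", "BRD-1", "BRD-3"],
    "cell_id": ["MCF7", "MCF7", "A375", "MCF7"],
    "pert_idose": ["10.0 um", "1.0 um", "10.0 um", "10.0 um"],
    "pert_itime": ["24 h", "24 h", "24 h", "24 h"],
})

# Combine the three conditions into one boolean mask instead of chaining indexers.
mask = (
    (sig_info["cell_id"] == "MCF7")
    & (sig_info["pert_idose"] == "10.0 um")
    & (sig_info["pert_itime"] == "24 h")
)

# Select signature IDs (assumed to match the GCTX column IDs), not perturbagen IDs.
mcf7_sig_ids = sig_info.loc[mask, "sig_id"].tolist()
print(mcf7_sig_ids)
```

A list built this way could then be passed as `cid=` to `parse`. For the Level 3 and Level 4 files, the analogous identifier is presumably the `inst_id` column of the inst_info table, which is likewise an assumption.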

Each time I run this against
GSE70138_Broad_LINCS_Level3_INF_mlr12k_n345976x12328_2017-03-06.gctx.gz
I get the following error:

some of the ids being used to subset the data are not present in the metadata for the file being parsed - mismatch_ids: {'neratinib'}
Traceback (most recent call last):
File "", line 1, in
File "/home/sysmedicine/anaconda3/envs/my_conda/lib/python3.8/site-packages/cmapPy/pandasGEXpress/parse.py", line 65, in parse
out = parse_gctx.parse(file_path, convert_neg_666=convert_neg_666,
File "/home/sysmedicine/anaconda3/envs/my_conda/lib/python3.8/site-packages/cmapPy/pandasGEXpress/parse_gcx.py", line 107, in parse
(sorted_ridx, sorted_cidx) = check_and_order_id_inputs(rid, ridx, cid, cidx, row_meta, col_meta)
File "/home/sysmedicine/anaconda3/envs/my_conda/lib/python3.8/site-packages/cmapPy/pandasGEXpress/parse_gcx.py", line 146, in check_and_order_id_inputs
col_ids = check_and_convert_ids(col_type, col_ids, col_meta_df)
File "/home/sysmedicine/anaconda3/envs/my_conda/lib/python3.8/site-packages/cmapPy/pandasGEXpress/parse_gcx.py", line 179, in check_and_convert_ids
check_id_validity(id_list, meta_df)
File "/home/sysmedicine/anaconda3/envs/my_conda/lib/python3.8/site-packages/cmapPy/pandasGEXpress/parse_gcx.py", line 195, in check_id_validity
raise Exception("parse_gctx check_id_validity " + msg)
Exception: parse_gctx check_id_validity some of the ids being used to subset the data are not present in the metadata for the file being parsed - mismatch_ids: {'neratinib'}
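
The message says that at least one value passed as `cid` is not among the column IDs stored in the file. The sketch below illustrates that kind of check as a plain set difference. It is not cmapPy's actual implementation; `find_mismatched_ids` is a hypothetical helper, and the `SIG_*` IDs are invented.

```python
def find_mismatched_ids(requested_ids, file_column_ids):
    """Return the requested IDs that are absent from the file's column IDs."""
    return set(requested_ids) - set(file_column_ids)


# Invented column IDs standing in for the column metadata index of a GCTX file.
file_column_ids = ["SIG_A", "SIG_B", "SIG_C"]

# One valid ID plus the value reported in the error above.
requested = ["SIG_A", "neratinib"]

mismatch = find_mismatched_ids(requested, file_column_ids)
print(mismatch)
```

With the real data, the file's column IDs could be read without loading the matrix, for example from the index of the DataFrame returned by `parse(path, col_meta_only=True)`, and compared against the intended `cid` list before subsetting.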

How can I fix this problem?
