<a href="https://polly.elucidata.io/manage/workspaces?action=open_polly_notebook&source=github&path=path_place_holder&kernel=elucidata/Python 3.10&machine=medium" target="_parent"><img src="https://elucidatainc.github.io/PublicAssets/open_polly.svg" alt="Open in Polly"/></a>


# Ontology recommendations for disease and tissue using polly-python

Polly-Python now supports ontology recommendations for disease and tissue. Within an existing SQL query, users can call the function 'recommend' on the disease and tissue columns of the metadata to get recommendations.

Usage of the 'recommend' function:

recommend(field_name, search_term, key)

field_name -> One of disease, tissue, curated_disease or curated_tissue, depending on whether the V1 or V2 API is queried.

search_term -> Disease or tissue name for which recommendations are required.

key -> Either "match" or "related".

    match - Only terms that contain an exact match of the keyword are returned.

    related - The expanded list of terms contains the matched terms plus the synonyms and hypernyms of the keyword, as per the MeSH ontology.

## For users querying V2 infrastructure

Example of a 'match' query on disease:

query = """SELECT * FROM geo.datasets WHERE CONTAINS(curated_disease, recommend('curated_disease', 'breast neoplasms', 'match'))"""

Example of a 'related' query on tissue:

query = """SELECT * FROM geo.datasets WHERE CONTAINS(curated_tissue, recommend('curated_tissue', 'liver', 'related'))"""
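The call shape described above can also be assembled programmatically. The following is a minimal sketch and not part of polly-python: `recommend_clause` is a hypothetical helper that only builds the SQL text, and it assumes the key is restricted to the two documented values.

```python
VALID_KEYS = ("match", "related")


def recommend_clause(field_name, search_term, key):
    """Build a CONTAINS(..., recommend(...)) fragment as plain SQL text.

    Hypothetical helper for illustration; it is not part of polly-python.
    """
    if key not in VALID_KEYS:
        raise ValueError(f"key must be one of {VALID_KEYS}, got {key!r}")
    if "'" in search_term:
        # Quoting rules inside recommend() are not documented here, so refuse
        # terms that would break the single-quoted SQL literal.
        raise ValueError("search_term must not contain single quotes")
    return (
        f"CONTAINS({field_name}, "
        f"recommend('{field_name}', '{search_term}', '{key}'))"
    )


query = "SELECT * FROM geo.datasets WHERE " + recommend_clause(
    "curated_disease", "breast neoplasms", "match"
)
print(query)
```

Validating the key before the query is sent turns a typo such as 'relate' into an immediate local error instead of a failed remote query.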


In [1]:
# please do not modify
from IPython.display import display_html


def restartkernel():
    display_html("<script>Jupyter.notebook.kernel.restart()</script>", raw=True)

# Install polly-python

In [2]:
!sudo pip3 install polly-python  #Restart kernel after the cell executes.

Collecting polly-python
  Downloading polly_python-0.1.5-py3-none-any.whl (53 kB)
Collecting retrying==1.3.3
  Downloading retrying-1.3.3.tar.gz (10 kB)
Collecting black
  Downloading black-22.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB)
Collecting pytz==2021.1
  Downloading pytz-2021.1-py2.py3-none-any.whl (510 kB)
Collecting elucidatacloudpathlib==0.6.6
  Downloading elucidatacloudpathlib-0.6.6-py3-none-any.whl (50 kB)
Collecting rst2txt
  Downloading rst2txt-1.1.0-py2.py3-none-any.whl (12 kB)
Collecting Deprecated
  Downloading Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Collecting Cerberus==1.3.2
  Downloading Cerberus-1.3.2.tar.gz (52 kB)

In [None]:
restartkernel() #Pause for a few seconds before the kernel is refreshed
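After the kernel restarts, it can be useful to confirm that the package is visible to the interpreter before importing it. This is a small sketch using only the standard library; `installed_version` is a hypothetical helper name, and the distribution name `polly-python` is taken from the install command above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("polly-python")
print("polly-python version:", version or "not installed")
```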

# Import Dependencies

In [1]:
import os
from polly.auth import Polly
from polly.omixatlas import OmixAtlas

# Auth With Token on Polly

In [2]:
POLLY_REFRESH_TOKEN = os.environ['POLLY_REFRESH_TOKEN']
omixatlas = OmixAtlas(POLLY_REFRESH_TOKEN)
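Reading the token with `os.environ[...]` raises a bare `KeyError` when the variable is missing. A possible guard is sketched below; `read_refresh_token` is a hypothetical helper, not part of polly-python, and the error wording is only an example.

```python
import os


def read_refresh_token(env=None, name="POLLY_REFRESH_TOKEN"):
    """Return the token from the environment, or raise a descriptive error."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(
            f"{name} is not set; export it before creating the OmixAtlas client"
        )
    return token


# Intended use in the notebook (left as a comment so the sketch runs anywhere):
# omixatlas = OmixAtlas(read_refresh_token())
```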

# SQL Queries for V2 storage infrastructure

## Previous query on V2 infrastructure
Before this feature was implemented, users queried for a given tissue and disease as shown below.

This query fetches 1388 datasets for the given disease and tissue combination.

In [4]:
sql_query = """SELECT dataset_id, curated_disease, curated_tissue FROM geo.datasets WHERE 
        CONTAINS(curated_disease,'Breast Neoplasms') AND 
        CONTAINS(curated_tissue,'breast')""" 
result = omixatlas.query_metadata(sql_query)
result

Query execution succeeded (time taken: 1.96 seconds, data scanned: 0.908 MB)
Fetched 1388 rows


Unnamed: 0,dataset_id,curated_disease,curated_tissue
0,GSE9691_GPL3921,[Breast Neoplasms],[breast]
1,GSE97221_GPL10558,[Breast Neoplasms],[breast]
2,GSE97317_GPL11154,[Breast Neoplasms],[breast]
3,GSE9734_GPL4742,"[Breast Neoplasms, Carcinoma, Pancreatic Duc...","[pancreas, breast, kidney, colon]"
4,GSE97482_GPL10332,[Breast Neoplasms],[breast]
...,...,...,...
1383,GSE9483_GPL6071,"[Neoplasms, Basal Cell, Breast Cancer, Fami...",[breast]
1384,GSE95035_GPL10558,[Breast Neoplasms],[breast]
1385,GSE95087_GPL16956,[Breast Neoplasms],[breast]
1386,GSE95304_GPL11154,[Breast Neoplasms],[breast]


## New queries after implementation of ontology recommendations
Users can now query as shown below.

With ontology recommendations, the query fetches 2223 datasets for the same disease and tissue combination, about 60% more than the 1388 datasets returned by the previous query.

In [6]:
sql_query = """SELECT dataset_id, curated_disease, curated_tissue FROM geo.datasets WHERE 
        CONTAINS(curated_disease, recommend('curated_disease', 'breast neoplasms', 'related')) AND 
        CONTAINS(curated_tissue, recommend('curated_tissue', 'breast', 'related'))""" 
result = omixatlas.query_metadata(sql_query)
result

Query execution succeeded (time taken: 2.04 seconds, data scanned: 0.908 MB)
Fetched 2223 rows


Unnamed: 0,dataset_id,curated_disease,curated_tissue
0,GSE96058_GPL11154,"[Triple Negative Breast Neoplasms, Brittle co...",[breast]
1,GSE96058_GPL18573,"[Triple Negative Breast Neoplasms, Brittle co...",[breast]
2,GSE96085_GPL15084,"[Carcinoma, Neoplasms, Second Primary, Brea...",[mammary gland]
3,GSE96520_GPL4135,"[Mammary Neoplasms, Animal, Breast Neoplasms]",[mammary gland]
4,GSE96567_GPL15084,"[Breast Neoplasms, Carcinoma, Neoplasms, Se...",[mammary gland]
...,...,...,...
2218,GSE38912_GPL11154,[Breast Neoplasms],"[colon, breast]"
2219,GSE38912_GPL15433,[Breast Neoplasms],"[colon, breast]"
2220,GSE38912_GPL3921,[Breast Neoplasms],"[colon, breast]"
2221,GSE38912_GPL9052,[Breast Neoplasms],"[breast, colon]"
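The roughly 60% gain quoted for this query follows directly from the two row counts reported in the outputs (1388 without recommendations, 2223 with them). As a quick check:

```python
def percent_increase(before, after):
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


baseline_rows = 1388     # plain CONTAINS query
recommended_rows = 2223  # query using recommend(..., 'related')
gain = percent_increase(baseline_rows, recommended_rows)
print(f"{gain:.1f}% more datasets")
```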


## Other query examples on V2 infrastructure

In [7]:
sql_query = """SELECT * FROM geo.datasets WHERE 
                CONTAINS(curated_disease, recommend('curated_disease', 'hepatitis', 'match'))"""
result = omixatlas.query_metadata(sql_query)
result

Query execution succeeded (time taken: 2.07 seconds, data scanned: 38.845 MB)
Fetched 554 rows


Unnamed: 0,data_matrix_available,curated_organism,src_uri,total_num_samples,year,description,curated_cell_line,data_table_name,data_table_version,platform,...,abstract,version,curated_strain,bucket,curated_tissue,dataset_source,data_type,overall_design,is_current,region
0,,[Homo sapiens],polly:data://GEO_data_lake/data/Microarray/GSE...,34.0,2018.0,Role of Humoral Immunity against Hepatitis B V...,[None],geo__gse96851_gpl570,0.0,Microarray,...,,0,[None],discover-prod-datalake-v1,[liver],GEO,Transcriptomics,Liver samples were obtained from 4 patients wi...,True,us-west-2
1,False,[Homo sapiens],polly:data://GEO_data_lake/data/GEO_metadata/G...,54.0,2017.0,A Pharmacogenomic Landscape in Human Liver Can...,"[SK-HEP-1, CLC33, CLC26, CLC17, CLC30, HL...",,,RNAseq,...,,0,[None],discover-prod-datalake-v1,[None],GEO,Transcriptomics,RNAseq for 81 liver cancer cell models was per...,True,us-west-2
2,False,[Homo sapiens],polly:data://GEO_data_lake/data/GEO_metadata/G...,3.0,2017.0,A Pharmacogenomic Landscape in Human Liver Can...,"[SK-HEP-1, CLC49, CLC26, SNU-354, Mahlavu,...",,,RNAseq,...,,0,[None],discover-prod-datalake-v1,[None],GEO,Transcriptomics,RNAseq for 81 liver cancer cell models was per...,True,us-west-2
3,False,[Homo sapiens],polly:data://GEO_data_lake/data/GEO_metadata/G...,16.0,2018.0,A Pharmacogenomic Landscape in Human Liver Can...,"[SK-HEP-1, SNU-398, JHH-4, SNU-886, CLC43,...",,,RNAseq,...,,0,[None],discover-prod-datalake-v1,[None],GEO,Transcriptomics,RNAseq for 81 liver cancer cell models was per...,True,us-west-2
4,,[Mus musculus],polly:data://GEO_data_lake/data/RNASeq/GSE9723...,10.0,2018.0,Pyroptosis by Caspase11/4-Gasdermin-D Pathway ...,[None],geo__gse97234_gpl13112,0.0,RNASeq,...,,0,[C57BL/6],discover-prod-datalake-v1,[liver],GEO,Transcriptomics,"9 total samples = 3 AH liver, 3 ASH liver, 3 c...",True,us-west-2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
549,,[Mus musculus],polly:data://GEO_data_lake/data/Microarray/GSE...,20.0,2019.0,Analysis of differentially expressed genes in ...,[None],geo__gse138916_gpl21163,0.0,Microarray,...,,0,[None],discover-prod-datalake-v1,[liver],GEO,Transcriptomics,snap-frozen liver samples were obtained from g...,True,us-west-2
550,,[Mus musculus],polly:data://GEO_data_lake/data/RNASeq/GSE1389...,10.0,2019.0,MBOAT7's role in the progression of Non-alcoho...,[None],geo__gse138945_gpl13112,0.0,RNASeq,...,Recent studies have identified a genetic varia...,0,[None],discover-prod-datalake-v1,[liver],GEO,Transcriptomics,RNAseq of liver homogenates from high fat diet...,True,us-west-2
551,,[Mus musculus],polly:data://GEO_data_lake/data/RNASeq/GSE1389...,10.0,2019.0,LPI's role in the progression of Non-alcoholic...,[None],geo__gse138946_gpl13112,0.0,RNASeq,...,Recent studies have identified a genetic varia...,0,[None],discover-prod-datalake-v1,[liver],GEO,Transcriptomics,RNAseq of Liver homogenate +/- 18:0 Lysophosph...,True,us-west-2
552,False,[Homo sapiens],polly:data://GEO_data_lake/data/GEO_metadata/G...,3.0,2019.0,Rimonabant suppresses RNA transcription of hep...,[Hep-G2],,,RNAseq,...,,0,[None],discover-prod-datalake-v1,[None],GEO,Transcriptomics,Transcriptome analysis of PHH treated with DMS...,True,us-west-2


In [19]:
sql_query = """SELECT dataset_id, curated_tissue FROM geo.datasets WHERE 
            CONTAINS(curated_tissue, recommend('curated_tissue', 'liver', 'related'))"""
result = omixatlas.query_metadata(sql_query)
result

Query execution succeeded (time taken: 3.07 seconds, data scanned: 0.574 MB)
Fetched 7216 rows


Unnamed: 0,dataset_id,curated_tissue
0,GSE9581_GPL6119,"[brain, liver, testis, heart]"
1,GSE9581_GPL6120,"[brain, liver, testis, heart]"
2,GSE9588_GPL4372,[liver]
3,GSE96059_GPL17021,[liver]
4,GSE96093_GPL17021,[liver]
...,...,...
7211,GSE75277_GPL1261,[liver]
7212,GSE75285_GPL16298,"[liver, blood, blastema]"
7213,GSE75285_GPL570,"[blood, liver, blastema]"
7214,GSE75285_GPL6801,"[blood, liver, blastema]"


In [9]:
sql_query = """SELECT dataset_id, curated_disease, curated_tissue FROM geo.datasets 
WHERE (CONTAINS(curated_disease, recommend('curated_disease', 'breast neoplasms', 'related')) OR
CONTAINS(curated_disease, recommend('curated_disease', 'pancreatic neoplasms', 'related')))AND 
(CONTAINS(curated_tissue, recommend('curated_tissue', 'breast', 'related')) OR 
CONTAINS(curated_tissue, recommend('curated_tissue', 'pancreas', 'related')))"""
result = omixatlas.query_metadata(sql_query)
result

Query execution succeeded (time taken: 2.15 seconds, data scanned: 0.908 MB)
Fetched 2565 rows


Unnamed: 0,dataset_id,curated_disease,curated_tissue
0,GSE96058_GPL11154,"[Triple Negative Breast Neoplasms, Brittle co...",[breast]
1,GSE96058_GPL18573,"[Triple Negative Breast Neoplasms, Brittle co...",[breast]
2,GSE96085_GPL15084,"[Carcinoma, Neoplasms, Second Primary, Brea...",[mammary gland]
3,GSE96520_GPL4135,"[Mammary Neoplasms, Animal, Breast Neoplasms]",[mammary gland]
4,GSE96567_GPL15084,"[Breast Neoplasms, Carcinoma, Neoplasms, Se...",[mammary gland]
...,...,...,...
2560,GSE95304_GPL11154,[Breast Neoplasms],[breast]
2561,GSE95472_GPL6244,[Triple Negative Breast Neoplasms],[breast]
2562,GSE95554_GPL17117,[Breast Neoplasms],"[breast, oil secretion]"
2563,GSE95700_GPL570,[Triple Negative Breast Neoplasms],[breast]
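The combined query above repeats the same CONTAINS/recommend pattern once per search term, which gets error-prone as more terms are added. One way to generate it is sketched here; `any_of` and `build_dataset_query` are hypothetical helpers that only produce SQL text, and the default key of 'related' is a choice made for this example rather than a documented default.

```python
def any_of(field_name, terms, key):
    """OR together one recommend() filter per search term."""
    if not terms:
        raise ValueError("at least one search term is required")
    clauses = [
        f"CONTAINS({field_name}, recommend('{field_name}', '{term}', '{key}'))"
        for term in terms
    ]
    return "(" + " OR ".join(clauses) + ")"


def build_dataset_query(diseases, tissues, key="related"):
    """Dataset-level query: any listed disease AND any listed tissue."""
    return (
        "SELECT dataset_id, curated_disease, curated_tissue FROM geo.datasets "
        f"WHERE {any_of('curated_disease', diseases, key)} "
        f"AND {any_of('curated_tissue', tissues, key)}"
    )


sql_query = build_dataset_query(
    ["breast neoplasms", "pancreatic neoplasms"],
    ["breast", "pancreas"],
)
print(sql_query)
```

The resulting string can be passed to `omixatlas.query_metadata(sql_query)` in the same way as the handwritten query.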


# Examples of Ontology Recommendation for Sample-level queries

In [3]:
sql_query = """SELECT sample_id, curated_tissue, curated_disease, curated_cell_line FROM geo.samples
WHERE (CONTAINS(curated_disease, recommend('curated_disease', 'neo', 'match'))
AND CONTAINS(curated_tissue, recommend('curated_tissue', 'breast', 'match')))"""
result = omixatlas.query_metadata(sql_query)
result

Query execution succeeded (time taken: 76.97 seconds, data scanned: 6.683 MB)
Fetched 59628 rows


Unnamed: 0,sample_id,curated_tissue,curated_disease,curated_cell_line
0,GSM3495050,breast,[Breast Neoplasms],none
1,GSM3495051,breast,[Breast Neoplasms],none
2,GSM3495054,breast,[Breast Neoplasms],none
3,GSM3495055,breast,[Breast Neoplasms],none
4,GSM3495057,breast,[Breast Neoplasms],none
...,...,...,...,...
59623,GSM810959,breast,[Breast Neoplasms],MCF-7
59624,GSM810960,breast,[Breast Neoplasms],MCF-7
59625,GSM810961,breast,[Breast Neoplasms],MCF-7
59626,GSM810962,breast,[Breast Neoplasms],MCF-7


In [9]:
sql_query = """SELECT * FROM geo.samples WHERE CONTAINS(curated_disease, recommend('curated_disease', 'Breast Neoplasms'))"""
result = omixatlas.query_metadata(sql_query)
result

Query execution succeeded (time taken: 39.76 seconds, data scanned: 128.050 MB)


Fetched 97885 rows


Unnamed: 0,growth_protocol_ch1,src_uri,sample_id,curated_gene_modified,dose_ch1,curated_cohort_name,curated_control,src_dataset_id,extract_protocol_ch1,characteristics_ch2,...,label_ch1,time_point_ch1,characteristics_ch1_3,characteristics_ch1_2,curated_tissue,curated_drug_smiles_code,hyb_protocol,platform_id,is_current,characteristics_ch1_1
0,MCF7 cells were cultivated in Dulbecco’s modif...,polly:data://GEO_data_lake/data/Microarray/GSE...,GSM1152733,[none],,,0.0,GSE47583_GPL10558,RNA isolation was done using mirVana™ (Life te...,,...,Cyanine3-streptavidin,,,,none,[],Standard Illumina hybridization protocol. 750 ...,GPL10558,True,culture method: 2D monolayer cell culture
1,MCF7 cells were cultivated in Dulbecco’s modif...,polly:data://GEO_data_lake/data/Microarray/GSE...,GSM1152734,[none],,,0.0,GSE47583_GPL10558,RNA isolation was done using mirVana™ (Life te...,,...,Cyanine3-streptavidin,,,,none,[],Standard Illumina hybridization protocol. 750 ...,GPL10558,True,culture method: Matrigel on-top 3D culture for...
2,MCF7 cells were cultivated in Dulbecco’s modif...,polly:data://GEO_data_lake/data/Microarray/GSE...,GSM1152735,[none],,,0.0,GSE47583_GPL10558,RNA isolation was done using mirVana™ (Life te...,,...,Cyanine3-streptavidin,,,,none,[],Standard Illumina hybridization protocol. 750 ...,GPL10558,True,culture method: Matrigel on-top 3D culture for...
3,MCF7 cells were cultivated in Dulbecco’s modif...,polly:data://GEO_data_lake/data/Microarray/GSE...,GSM1152736,[none],,,0.0,GSE47583_GPL10558,RNA isolation was done using mirVana™ (Life te...,,...,Cyanine3-streptavidin,,,,none,[],Standard Illumina hybridization protocol. 750 ...,GPL10558,True,culture method: polyHEMA anchorage independent...
4,MCF7 cells were cultivated in Dulbecco’s modif...,polly:data://GEO_data_lake/data/Microarray/GSE...,GSM1152737,[none],,,0.0,GSE47583_GPL10558,RNA isolation was done using mirVana™ (Life te...,,...,Cyanine3-streptavidin,,,,none,[],Standard Illumina hybridization protocol. 750 ...,GPL10558,True,culture method: polyHEMA anchorage independent...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97880,3T3-L1 cells were cultured in DMEM supplemente...,polly:data://GEO_data_lake/data/Microarray/GSE...,GSM2526597,[none],,MDA-MB-; 3T3-L1_cocultured with_MDA-MB-,0.0,GSE95827_GPL16570,Trizol extraction of total RNA was performed a...,,...,biotin,,,cocultured with: MDA-MB-231,none,[],Approximately 5.5 μg of labeled DNA target was...,GPL16570,True,cell type: adipocyte cell line
97881,3T3-L1 cells were cultured in DMEM supplemente...,polly:data://GEO_data_lake/data/Microarray/GSE...,GSM2526598,[none],,MDA-MB-; 3T3-L1_cocultured with_MDA-MB-,0.0,GSE95827_GPL16570,Trizol extraction of total RNA was performed a...,,...,biotin,,,cocultured with: MDA-MB-231,none,[],Approximately 5.5 μg of labeled DNA target was...,GPL16570,True,cell type: adipocyte cell line
97882,3T3-L1 cells were cultured in DMEM supplemente...,polly:data://GEO_data_lake/data/Microarray/GSE...,GSM2526599,[none],,MDA-MB-; 3T3-L1_cocultured with_MDA-MB-,0.0,GSE95827_GPL16570,Trizol extraction of total RNA was performed a...,,...,biotin,,,cocultured with: MDA-MB-231,none,[],Approximately 5.5 μg of labeled DNA target was...,GPL16570,True,cell type: adipocyte cell line
97883,MCF-7 and MDA-MB-231 are maintained in a base ...,polly:data://GEO_data_lake/data/RNASeq/GSE7801...,GSM2064540,[none],,MCF-; MCF-7 cell line,1.0,GSE78011_GPL18573,After treatment medium is removed and a total ...,,...,,,,,none,[],,GPL18573,True,


In [17]:
sql_query = """SELECT * FROM geo.samples WHERE CONTAINS(curated_tissue, recommend('curated_tissue', 'liver'))"""
result = omixatlas.query_metadata(sql_query)
result

Query execution succeeded (time taken: 37.99 seconds, data scanned: 128.050 MB)


Fetched 138351 rows


Unnamed: 0,growth_protocol_ch1,src_uri,sample_id,curated_gene_modified,dose_ch1,curated_cohort_name,curated_control,src_dataset_id,extract_protocol_ch1,characteristics_ch2,...,label_ch1,time_point_ch1,characteristics_ch1_3,characteristics_ch1_2,curated_tissue,curated_drug_smiles_code,hyb_protocol,platform_id,is_current,characteristics_ch1_1
0,Adult (5-6 weeks of age) male B6C3F1/J mice we...,polly:data://GEO_data_lake/data/RNASeq/GSE1002...,GSM2677453,[none],0.22,,0.0,GSE100296_GPL13112,Total RNA was extracted from liver or kidney s...,,...,,,dose: 0.22,chemical (mmol/kg ): TCE,liver,[ClC=C(Cl)Cl],,GPL13112,True,tissue: liver
1,Adult (5-6 weeks of age) male B6C3F1/J mice we...,polly:data://GEO_data_lake/data/RNASeq/GSE1002...,GSM2677454,[none],0.22,,0.0,GSE100296_GPL13112,Total RNA was extracted from liver or kidney s...,,...,,,dose: 0.22,chemical (mmol/kg ): TCE,liver,[ClC=C(Cl)Cl],,GPL13112,True,tissue: liver
2,Adult (5-6 weeks of age) male B6C3F1/J mice we...,polly:data://GEO_data_lake/data/RNASeq/GSE1002...,GSM2677455,[none],0.22,,0.0,GSE100296_GPL13112,Total RNA was extracted from liver or kidney s...,,...,,,dose: 0.22,chemical (mmol/kg ): TCE,liver,[ClC=C(Cl)Cl],,GPL13112,True,tissue: liver
3,Adult (5-6 weeks of age) male B6C3F1/J mice we...,polly:data://GEO_data_lake/data/RNASeq/GSE1002...,GSM2677456,[none],0.67,,0.0,GSE100296_GPL13112,Total RNA was extracted from liver or kidney s...,,...,,,dose: 0.67,chemical (mmol/kg ): TCE,liver,[ClC=C(Cl)Cl],,GPL13112,True,tissue: liver
4,Adult (5-6 weeks of age) male B6C3F1/J mice we...,polly:data://GEO_data_lake/data/RNASeq/GSE1002...,GSM2677457,[none],0.67,,0.0,GSE100296_GPL13112,Total RNA was extracted from liver or kidney s...,,...,,,dose: 0.67,chemical (mmol/kg ): TCE,liver,[ClC=C(Cl)Cl],,GPL13112,True,tissue: liver
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
138346,Fetal liver progenitors from e14.5 mouse embry...,polly:data://GEO_data_lake/data/RNASeq/GSE6534...,GSM1592863,[SPI1],,Retroviral vector expressing PU.1-ETS construct,0.0,GSE65344_GPL13112,Sorted cells were lysed in the TRIzol reagent ...,,...,,,,retroviral transduction: Retroviral vector exp...,liver,[],,GPL13112,True,cell type: In-vitro differentiated DN2 cells
138347,Fetal liver progenitors from e14.5 mouse embry...,polly:data://GEO_data_lake/data/RNASeq/GSE6534...,GSM1592864,[SPI1],,Retroviral vector expressing PU.1-Eng construct,0.0,GSE65344_GPL13112,Sorted cells were lysed in the TRIzol reagent ...,,...,,,,retroviral transduction: Retroviral vector exp...,liver,[],,GPL13112,True,cell type: In-vitro differentiated DN2 cells
138348,Fetal liver progenitors from e14.5 mouse embry...,polly:data://GEO_data_lake/data/RNASeq/GSE6534...,GSM1592859,[none],,Retroviral vector backbone without insert,1.0,GSE65344_GPL13112,Sorted cells were lysed in the TRIzol reagent ...,,...,,,,retroviral transduction: Retroviral vector bac...,liver,[],,GPL13112,True,cell type: In-vitro differentiated DN2 cells
138349,Fetal liver progenitors from e14.5 mouse embry...,polly:data://GEO_data_lake/data/RNASeq/GSE6534...,GSM1592861,[SPI1],,Retroviral vector expressing PU.1-Eng construct,0.0,GSE65344_GPL13112,Sorted cells were lysed in the TRIzol reagent ...,,...,,,,retroviral transduction: Retroviral vector exp...,liver,[],,GPL13112,True,cell type: In-vitro differentiated DN2 cells


In [4]:
sql_query = """SELECT * FROM geo.samples WHERE CONTAINS(curated_cell_line, recommend('curated_cell_line', 'neo', 'match disease'))"""
result = omixatlas.query_metadata(sql_query)
result

Query execution succeeded (time taken: 8.69 seconds, data scanned: 128.054 MB)
Fetched 77 rows


Unnamed: 0,growth_protocol_ch1,src_uri,sample_id,curated_gene_modified,dose_ch1,curated_cohort_name,curated_control,src_dataset_id,extract_protocol_ch1,characteristics_ch2,...,label_ch1,time_point_ch1,characteristics_ch1_3,characteristics_ch1_2,curated_tissue,curated_drug_smiles_code,hyb_protocol,platform_id,is_current,characteristics_ch1_1
0,,polly:data://GEO_data_lake/data/RNASeq/GSE1219...,GSM3450407,[none],,cells transfected with unloaded plasmid,1,GSE121951_GPL21290,"RNA was extracted by RNeasy Kits (Qiagen, Hild...",,...,,,,,none,[],,GPL21290,True,treatment: cells transfected with unloaded pla...
1,,polly:data://GEO_data_lake/data/RNASeq/GSE1219...,GSM3450408,[none],,cells transfected with unloaded plasmid,1,GSE121951_GPL21290,"RNA was extracted by RNeasy Kits (Qiagen, Hild...",,...,,,,,none,[],,GPL21290,True,treatment: cells transfected with unloaded pla...
2,,polly:data://GEO_data_lake/data/RNASeq/GSE1219...,GSM3450410,[none],,cells transfected with overexpressed lncRNA,0,GSE121951_GPL21290,"RNA was extracted by RNeasy Kits (Qiagen, Hild...",,...,,,,,none,[],,GPL21290,True,treatment: cells transfected with overexpresse...
3,,polly:data://GEO_data_lake/data/RNASeq/GSE1219...,GSM3450411,[none],,cells transfected with overexpressed lncRNA,0,GSE121951_GPL21290,"RNA was extracted by RNeasy Kits (Qiagen, Hild...",,...,,,,,none,[],,GPL21290,True,treatment: cells transfected with overexpresse...
4,"HTR-8/SVneo cells, maintained at passage 38-45...",polly:data://GEO_data_lake/data/RNASeq/GSE1750...,GSM437411,[none],,invasive cytotrophoblast; mRNA from invasive h...,0,GSE17501_GPL6104,Total RNA was isolated using the RNeasy kit (Q...,,...,Biotin,,,sample type: mRNA from invasive human cytotrop...,none,[],Hybridization to Illumina Sentrix Expression B...,GPL6104,True,cell line: HTR-8/SVneo
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
72,,polly:data://GEO_data_lake/data/GEO_metadata/G...,GSM5573056,[none],,HTR-8/SVneo; Trophoblast cells; Untreated,1,GSE183879_GPL24676,Total RNA from each sample was extracted using...,,...,,,,,placenta,[],,GPL24676,True,
73,,polly:data://GEO_data_lake/data/GEO_metadata/G...,GSM5573057,[none],,HTR-8/SVneo; Trophoblast cells; Untreated,1,GSE183879_GPL24676,Total RNA from each sample was extracted using...,,...,,,,,placenta,[],,GPL24676,True,
74,,polly:data://GEO_data_lake/data/GEO_metadata/G...,GSM5573058,[none],,HTR-8/SVneo; Trophoblast cells; Treated with 1...,0,GSE183879_GPL24676,Total RNA from each sample was extracted using...,,...,,,,,placenta,[],,GPL24676,True,
75,,polly:data://GEO_data_lake/data/GEO_metadata/G...,GSM5573059,[none],,HTR-8/SVneo; Trophoblast cells; Treated with 1...,0,GSE183879_GPL24676,Total RNA from each sample was extracted using...,,...,,,,,placenta,[],,GPL24676,True,
