<a href="https://polly.elucidata.io/manage/workspaces?action=open_polly_notebook&amp;source=github&amp;path=ElucidataInc%2Fpolly-python%2Fblob%2Fmain%2FDiscover%2Fontology_driven_recommendation.ipynb&amp;kernel=elucidata%2FPython+3.10&amp;machine=medium" target="_parent"><img alt="Open in Polly" src="https://elucidatainc.github.io/PublicAssets/open_polly.svg"/></a>


# Ontology-driven recommendations using polly-python

### What is Ontology-driven Recommendation?

Ontology-driven recommendation lets users find terms that are similar or related to a given input term. It is "ontology-driven" because it uses the synonyms and relationships between terms that are defined in the underlying ontology.

### Which biomedical entities are currently enabled for recommendations?

The feature is currently enabled on the following **curated fields** on Polly:
1. Disease
2. Tissue
3. Cell-line
4. Drug
5. Cell-type

### Special features for cell-line, drug and cell-type recommendations

1. **Find cell-lines using disease and tissue:**

   **Why is this useful?**
       A user may not know all the cell-lines that have been studied in the context of a disease, or all the cell-lines that have been sampled from a tissue.

   **How is this used?**
       A user enters a disease or tissue to retrieve all the related cell-lines. The input is semantically enriched and specific terms are recommended.
       
2. **Find related cell-lines:**

   **How is this used?**
       A user enters a cell-line to retrieve all the related cell-lines. The cell-lines returned are related to the input by a common disease or tissue.
   
3. **Find drugs using interacting genes:**

   **Why is this useful?**
       A user may want to find drugs that have known interactions with a particular gene of interest. This can help uncover previously unknown associations.

   **How is this used?**
       A user enters a gene identifier (HUGO Symbol, Entrez ID, Ensembl ID or Gene Alias) to retrieve all the related drugs. The input is semantically enriched and specific terms are recommended.
       
4. **Find related drugs that are similar in their 3D structure:**

   **Why is this useful?**
       The drug recommendation system can recommend drugs that are similar in structure. This is useful for exploring drugs for repurposing.

   **How is this used?**
       When a user enters a drug identifier of interest, drugs that are related to it by 3D structure are recommended.
        
5. **Find cell-types that are related to a tissue:**

   **Why is this useful?**
       The cell-type recommendation system can recommend cell-types that are part of a given tissue group. This is useful for exploring the many cell-types that belong to a tissue.

   **How is this used?**
       When a user enters a tissue identifier of interest, related cell-types are recommended.

In [1]:
# please do not modify
from IPython.display import display_html
def restartkernel() :
    display_html("<script>Jupyter.notebook.kernel.restart()</script>",raw=True)

## Install polly-python

In [3]:
!pip3 install polly-python --quiet #Restart kernel after the cell executes.


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.3.1[0m[39;49m -> [0m[32;49m23.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.10 -m pip install --upgrade pip[0m


In [None]:
restartkernel() #Pause for a few seconds before the kernel is refreshed

## Import Dependencies

In [1]:
import os
import pandas as pd
from polly.auth import Polly
from polly.omixatlas import OmixAtlas

## Authentication

In [2]:
# Read the refresh token from the environment instead of hard-coding it in the notebook.
POLLY_REFRESH_TOKEN = os.environ['POLLY_REFRESH_TOKEN']
omixatlas = OmixAtlas(POLLY_REFRESH_TOKEN)
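
A slightly more defensive way to read the token is sketched below. `load_refresh_token` is a hypothetical helper, not part of polly-python; it only wraps the environment lookup so that a missing or empty variable produces a clear error instead of a failed authentication later on.

```python
import os


def load_refresh_token(env=None, name="POLLY_REFRESH_TOKEN"):
    """Return the refresh token stored in the given environment mapping."""
    # Hypothetical helper: defaults to the real process environment.
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return token


# Demonstration with a stand-in mapping, so no real token is needed here.
example_token = load_refresh_token({"POLLY_REFRESH_TOKEN": " example-token "})
print(example_token)
```

In the notebook itself, the call would be `OmixAtlas(load_refresh_token())`.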

## SQL query examples for finding datasets using ontology-driven recommendations

## Find datasets using disease and tissue

**Query Structure**

`field`: the curated field on which to query  
`keyword`: the input keyword for which datasets are to be queried  
`match|related`: `match` returns curated terms that contain the keyword; `related` returns terms that are related to the keyword, such as its hypernyms and hyponyms  

`recommend(field, keyword, 'match|related')`
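
The template above can also be filled in programmatically. The sketch below uses two hypothetical helpers (`recommend_clause` and `build_query` are not part of polly-python) that only assemble the SQL string. It assumes that a single quote inside a keyword should be doubled, as in standard SQL; this has not been verified against Polly's query engine.

```python
def recommend_clause(field, keyword, mode="match"):
    """Return a CONTAINS(... recommend(...)) condition for one curated field."""
    # Assumption: standard SQL escaping (double any single quote in the keyword).
    safe_keyword = keyword.replace("'", "''")
    return f"CONTAINS({field}, recommend('{field}', '{safe_keyword}', '{mode}'))"


def build_query(table, columns, clauses):
    """Combine the selected columns and the AND-joined clauses into one SQL string."""
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {' AND '.join(clauses)}"


sql_query = build_query(
    "geo.datasets",
    ["dataset_id", "curated_disease"],
    [recommend_clause("curated_disease", "hepatitis", "match")],
)
print(sql_query)
```

The resulting string can be passed to `omixatlas.query_metadata(sql_query)` in the same way as the hand-written queries in this notebook.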

### Example 1: Find datasets using disease and tissue

In [3]:
sql_query = """SELECT dataset_id, curated_disease, curated_tissue FROM geo.datasets WHERE 
        CONTAINS(curated_disease, recommend('curated_disease', 'breast neoplasms', 'related')) AND 
        CONTAINS(curated_tissue, recommend('curated_tissue', 'breast', 'related'))""" 
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 2.64 seconds, data scanned: 1.222 MB)
Fetched 2672 rows


Unnamed: 0,dataset_id,curated_disease,curated_tissue
0,GSE96859_GPL11154,"[Fibrocystic Breast Disease, Triple Negative B...","[mammary gland, epithelium]"
1,GSE96860_GPL11154,"[Fibrocystic Breast Disease, Triple Negative B...","[mammary gland, epithelium]"
2,GSE96867_GPL11154,"[Fibrocystic Breast Disease, Triple Negative B...","[mammary gland, epithelium]"
3,GSE9691_GPL3921,[Breast Neoplasms],[breast]
4,GSE97177_GPL6947,"[Thoracic Neoplasms, Carcinoma, Ductal, Breast...",[breast]


### Example 2: Find datasets using diseases that match the keyword

In [14]:
sql_query = """SELECT dataset_id, curated_disease FROM geo.datasets WHERE 
                CONTAINS(curated_disease, recommend('curated_disease', 'hepatitis', 'match'))"""
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 2.38 seconds, data scanned: 1.023 MB)
Fetched 675 rows


Unnamed: 0,dataset_id,curated_disease
0,GSE95403_GPL14844,"[Hepatitis, Alcoholic, Albinism deafness syndr..."
1,GSE95424_GPL21103,"[Liver Neoplasms, Liver Cirrhosis, Hepatitis B..."
2,GSE96851_GPL570,"[Acute-On-Chronic Liver Failure, Hepatitis B, ..."
3,GSE97098_GPL11154,"[CATSHL syndrome, Cornea Plana 1, Liver and in..."
4,GSE97098_GPL20301,"[CATSHL syndrome, Cornea Plana 1, Liver and in..."


### Example 3: Find datasets using tissues that are related to the keyword

In [13]:
sql_query = """SELECT dataset_id, curated_tissue FROM geo.datasets WHERE 
            CONTAINS(curated_tissue, recommend('curated_tissue', 'liver', 'related'))"""
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 3.00 seconds, data scanned: 0.629 MB)
Fetched 8185 rows


Unnamed: 0,dataset_id,curated_tissue
0,GSE95341_GPL19057,[liver]
1,GSE95345_GPL6885,[liver]
2,GSE95359_GPL19057,[liver]
3,GSE9536_GPL96,[liver]
4,GSE95401_GPL17021,"[lung, kidney, adult, spinal cord, brain, liver]"


## Find datasets using cell-lines

### Query Structure

`field`: the curated field on which to query   
`keyword`: the input keyword for which datasets are to be queried  
`keyword_field`: the field used to recommend cell-lines (default: `cell_line`)  
- `match|related disease`: find cell-lines using a matching or related disease  
- `match|related tissue`: find cell-lines using a matching or related tissue  
        
`recommend(field, keyword, keyword_field)`

### Example 1: Find datasets using cell-lines that match the keyword

In [23]:
sql_query = """SELECT dataset_id, curated_cell_line FROM geo.datasets 
    WHERE CONTAINS(curated_cell_line, recommend('curated_cell_line', 'a549', 'match'))"""
result = pd.DataFrame(omixatlas.query_metadata(sql_query))
result.head()

Query execution succeeded (time taken: 2.45 seconds, data scanned: 0.593 MB)
Fetched 954 rows


Unnamed: 0,dataset_id,curated_cell_line
0,GSE96649_GPL18573,[A-549]
1,GSE96677_GPL17021,[A-549]
2,GSE96774_GPL18573,[A-549]
3,GSE96779_GPL18573,[A-549]
4,GSE96781_GPL18573,[A-549]


### Example 2: Find datasets using cell-lines that are related to the keyword

In [36]:
sql_query = """SELECT dataset_id, curated_cell_line FROM geo.datasets 
    WHERE CONTAINS(curated_cell_line, recommend('curated_cell_line', 'a549', 'related'))"""
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 5.11 seconds, data scanned: 0.593 MB)
Fetched 1703 rows


Unnamed: 0,dataset_id,curated_cell_line
0,GSE95558_GPL11154,"[None, HCC827]"
1,GSE95592_GPL11154,[HCC827]
2,GSE95856_GPL4134,[MLE-12]
3,GSE96649_GPL18573,[A-549]
4,GSE96677_GPL17021,[A-549]


*With the `related` argument, datasets for cell-lines related to the input are also returned, so more rows are fetched (1703 here, versus 954 with `match`).*
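
To see which cell-lines `related` adds, the terms in the two results can be compared as sets. The sketch below is a minimal illustration that uses only the first rows printed in the two outputs above. `unique_terms` is a hypothetical helper, and it assumes each `curated_cell_line` cell is a Python list of strings, which may differ from what `query_metadata` actually returns.

```python
def unique_terms(rows):
    """Collect the distinct terms from list-valued cells, skipping missing values."""
    return {term for cell in rows for term in cell if term not in (None, "None")}


# Values copied from the first five rows of the `match` and `related` outputs above.
match_rows = [["A-549"], ["A-549"], ["A-549"], ["A-549"], ["A-549"]]
related_rows = [["None", "HCC827"], ["HCC827"], ["MLE-12"], ["A-549"], ["A-549"]]

# Terms that appear only when the `related` argument is used.
extra_terms = unique_terms(related_rows) - unique_terms(match_rows)
print(sorted(extra_terms))  # ['HCC827', 'MLE-12']
```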

### Example 3: Find datasets using cell-lines that are related to the disease

In [38]:
sql_query = """SELECT dataset_id, curated_cell_line, curated_disease FROM geo.datasets 
    WHERE CONTAINS(curated_cell_line, recommend('curated_cell_line', 'thyroid cancer', 'related disease'))"""
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 2.65 seconds, data scanned: 1.177 MB)
Fetched 35 rows


Unnamed: 0,dataset_id,curated_cell_line,curated_disease
0,GSE97002_GPL10332,"[B-CPAP, 8505C]","[Thyroid Carcinoma, Anaplastic, Thyroid Neopla..."
1,GSE97028_GPL10332,"[B-CPAP, 8505C]","[Alzheimer Disease, Familial, 1, Alzheimer dis..."
2,GSE97030_GPL10332,"[B-CPAP, 8505C]","[Saprochaete clavata infection, Bipolar Disord..."
3,GSE97031_GPL10332,"[B-CPAP, 8505C]","[Thyroid Neoplasms, Atkin syndrome, Thyroid Ca..."
4,GSE97427_GPL10332,"[B-CPAP, 8505C]","[Thyroid Carcinoma, Anaplastic, Thyroid Neopla..."


### Example 4: Find datasets using cell-lines that are related to the tissue

In [41]:
sql_query = """SELECT dataset_id, curated_cell_line, curated_tissue FROM geo.datasets 
    WHERE CONTAINS(curated_cell_line, recommend('curated_cell_line', 'liver', 'related tissue'))"""
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 3.25 seconds, data scanned: 0.782 MB)
Fetched 1413 rows


Unnamed: 0,dataset_id,curated_cell_line,curated_tissue
0,GSE9539_GPL6087,[Hep-G2],[None]
1,GSE95406_GPL6246,[Hepa 1-6],[liver]
2,GSE95698_GPL17077,"[Hep-G2, L-02]",[liver]
3,GSE96760_GPL16791,[H-STS],[None]
4,GSE96792_GPL10558,"[SNU-761, Huh-7, Hep 3B2.1-7, Hep-G2]",[None]


## Find datasets using drugs

### Query Structure

`field`: the curated field on which to query   
`keyword`: the input keyword for which datasets are to be queried  
`keyword_field`: the field used to recommend drugs (default: `drug`)  
- `match|related by structure`: find drugs that match the keyword or are related to it by 3D structure  
- `match gene`: find drugs that interact with the input gene identifier *(enter a HUGO Symbol, Alias, Entrez ID or Ensembl ID)*  

`recommend(field, keyword, keyword_field)`

### Example 1: Find datasets using drugs that match the keyword

In [52]:
sql_query = """SELECT dataset_id, curated_drug FROM geo.datasets 
    WHERE CONTAINS(curated_drug, recommend('curated_drug', 'formaldehyde', 'match'))"""
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 3.30 seconds, data scanned: 0.259 MB)
Fetched 2 rows


Unnamed: 0,dataset_id,curated_drug
0,GSE2083_GPL891,"[krypton(0), Paraffin, argon(0), 4-(azidoanili..."
1,GSE109211_GPL13938,"[sorafenib, formaldehyde, L-tyrosine]"


### Example 2: Find datasets using drugs that are related by structure to the keyword

In [53]:
sql_query = """SELECT dataset_id, curated_drug FROM geo.datasets 
    WHERE CONTAINS(curated_drug, recommend('curated_drug', 'chembl1255', 'related by structure'))"""
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 2.61 seconds, data scanned: 0.364 MB)
Fetched 5 rows


Unnamed: 0,dataset_id,curated_drug
0,GSE109211_GPL13938,"[sorafenib, formaldehyde, L-tyrosine]"
1,GSE7171_GPL4733,"[amine, histaminium, gamma-aminobutyric acid, ..."
2,GSE2083_GPL891,"[krypton(0), Paraffin, argon(0), 4-(azidoanili..."
3,GSE158613_GPL23038,[isoflurane]
4,GSE158614_GPL23038,"[isoflurane, phosphate(3-)]"


### Example 3: Find datasets using drugs that are known to interact with a gene

In [56]:
sql_query = """SELECT dataset_id, curated_drug, curated_gene FROM geo.datasets 
    WHERE CONTAINS(curated_drug, recommend('curated_drug', 'egfr', 'match gene'))"""
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 3.12 seconds, data scanned: 0.289 MB)
Fetched 1 rows


Unnamed: 0,dataset_id,curated_drug,curated_gene
0,GSE3068_GPL85,"[streptozocin, vanadium(0), sulfate, vanadyl s...",[]


## Find datasets using cell-types

### Query Structure

`field`: the curated field on which to query   
`keyword`: the input keyword for which datasets are to be queried  
`keyword_field`: the field used to recommend cell-types (default: `tissue`)  
- `match|related tissue`: find cell-types using a tissue that matches or is related to the given tissue term  

`recommend(field, keyword, keyword_field)`
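
Because the accepted third argument differs by curated field, a small lookup can catch typos before a query is sent. The table below is only a sketch assembled from the query-structure sections and examples in this notebook; it is not read from polly-python or the Polly API, and the service may accept values that are not listed here. `SUPPORTED_MODES` and `check_mode` are hypothetical names.

```python
# Assumption: only the values shown in this notebook are listed for each field.
SUPPORTED_MODES = {
    "curated_disease": {"match", "related"},
    "curated_tissue": {"match", "related"},
    "curated_cell_line": {
        "match", "related",
        "match disease", "related disease",
        "match tissue", "related tissue",
    },
    "curated_drug": {"match", "related by structure", "match gene"},
    "curated_cell_type": {"match", "related", "match tissue", "related tissue"},
}


def check_mode(field, mode):
    """Return mode unchanged if this notebook lists it for the given field."""
    allowed = SUPPORTED_MODES.get(field)
    if allowed is None:
        raise ValueError(f"Unknown curated field: {field!r}")
    if mode not in allowed:
        raise ValueError(f"{mode!r} is not listed for {field}; expected one of {sorted(allowed)}")
    return mode


print(check_mode("curated_drug", "match gene"))
```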

### Example 1: Find datasets using cell-types that match the keyword

In [6]:
sql_query = """SELECT dataset_id, curated_cell_type FROM geo.datasets 
    WHERE CONTAINS(curated_cell_type, recommend('curated_cell_type', 'neuron', 'match'))"""
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 3.19 seconds, data scanned: 0.626 MB)
Fetched 4457 rows


Unnamed: 0,dataset_id,curated_cell_type
0,GSE96722_GPL13112,"[GABAergic interneuron, inhibitory neuron]"
1,GSE96826_GPL570,"[foreskin fibroblast, neuron, pluripotent stem..."
2,GSE96886_GPL4135,"[neuron of cerebral cortex, pericyte cell]"
3,GSE96938_GPL17021,"[microglial cell, neuron, neutrophil]"
4,GSE96939_GPL17021,"[neuron, perivascular macrophage, neutrophil, ..."


### Example 2: Find datasets using cell-types that are related to the keyword

In [8]:
sql_query = """SELECT dataset_id, curated_cell_type FROM geo.datasets 
    WHERE CONTAINS(curated_cell_type, recommend('curated_cell_type', 'neuron', 'related'))"""
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 2.76 seconds, data scanned: 0.626 MB)
Fetched 5017 rows


Unnamed: 0,dataset_id,curated_cell_type
0,GSE96722_GPL13112,"[GABAergic interneuron, inhibitory neuron]"
1,GSE96826_GPL570,"[foreskin fibroblast, neuron, pluripotent stem..."
2,GSE96845_GPL7202,[neural progenitor cell]
3,GSE96853_GPL570,"[retinal progenitor cell, retinal ganglion cel..."
4,GSE96886_GPL4135,"[neuron of cerebral cortex, pericyte cell]"


### Example 3: Find datasets using cell-types that match a tissue term

In [10]:
sql_query = """SELECT dataset_id, curated_cell_type FROM geo.datasets 
    WHERE CONTAINS(curated_cell_type, recommend('curated_cell_type', 'brain', 'match tissue'))"""
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 2.98 seconds, data scanned: 0.626 MB)
Fetched 2577 rows


Unnamed: 0,dataset_id,curated_cell_type
0,GSE96738_GPL17021,[oligodendrocyte]
1,GSE96899_GPL16570,"[lymphocyte, astrocyte]"
2,GSE96938_GPL17021,"[microglial cell, neuron, neutrophil]"
3,GSE96939_GPL17021,"[neuron, perivascular macrophage, neutrophil, ..."
4,GSE96961_GPL17021,[glial cell]


### Example 4: Find datasets using cell-types that are related to a tissue term

In [11]:
sql_query = """SELECT dataset_id, curated_cell_type FROM geo.datasets 
    WHERE CONTAINS(curated_cell_type, recommend('curated_cell_type', 'brain', 'related tissue'))"""
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 4.04 seconds, data scanned: 0.626 MB)
Fetched 35492 rows


Unnamed: 0,dataset_id,curated_cell_type
0,GSE151390_GPL16791,[B cell]
1,GSE151390_GPL18573,[B cell]
2,GSE151390_GPL21697,[B cell]
3,GSE151395_GPL17021,[T cell]
4,GSE15139_GPL8300,"[neutrophil, macrophage]"


## Examples of ontology-driven recommendations for sample-level queries

### Example 1: Find samples using disease and tissue

In [60]:
sql_query = """SELECT sample_id, curated_tissue, curated_disease, curated_cell_line FROM geo.samples
WHERE (CONTAINS(curated_disease, recommend('curated_disease', 'thyroid cancer', 'match'))
AND CONTAINS(curated_tissue, recommend('curated_tissue', 'thyroid', 'match')))"""
result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 3.36 seconds, data scanned: 8.737 MB)
Fetched 191 rows


Unnamed: 0,curated_cell_line,curated_disease,curated_tissue,sample_id
0,none,"[Thyroid Cancer, Papillary]",thyroid gland,GSM831759
1,none,"[Thyroid Cancer, Papillary]",thyroid gland,GSM831760
2,none,"[Thyroid Cancer, Papillary]",thyroid gland,GSM831761
3,none,"[Thyroid Cancer, Papillary]",thyroid gland,GSM831762
4,none,"[Thyroid Cancer, Papillary]",thyroid gland,GSM831763
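
In the sample-level output above, some curated cells are printed as plain strings (for example `thyroid gland` or `none`) while others are lists. The sketch below shows one way to normalize both shapes before further analysis. `as_terms` is a hypothetical helper, and the assumption that cells arrive as Python strings or lists is not confirmed by this notebook.

```python
def as_terms(cell):
    """Return the real terms in a curated cell, whether it is a scalar or a list."""
    values = cell if isinstance(cell, (list, tuple)) else [cell]
    # Drop missing values, including the literal string "none" seen in the output.
    return [v for v in values if v is not None and str(v).strip().lower() != "none"]


print(as_terms("thyroid gland"))      # ['thyroid gland']
print(as_terms("none"))               # []
print(as_terms(["B-CPAP", "None"]))   # ['B-CPAP']
```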


### Example 2: Find samples using cell-lines

In [67]:
sql_query = """SELECT sample_id, curated_tissue, curated_disease, curated_cell_line FROM geo.samples 
    WHERE CONTAINS(curated_cell_line, recommend('curated_cell_line', 'hepatitis', 'match disease'))"""

result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 3.96 seconds, data scanned: 8.737 MB)
Fetched 2171 rows


Unnamed: 0,curated_cell_line,curated_disease,curated_tissue,sample_id
0,HepaRG,[Hepatitis C],none,GSM2944963
1,HepaRG,[Hepatitis C],none,GSM2944964
2,HepaRG,[Hepatitis C],none,GSM2944965
3,HepaRG,[Hepatitis C],none,GSM2944966
4,HepaRG,[Hepatitis C],none,GSM2944967


### Example 3: Find samples for drugs 

In [66]:
sql_query = """SELECT sample_id, curated_tissue, curated_disease, curated_drug FROM geo.samples 
    WHERE CONTAINS(curated_drug, recommend('curated_drug', 'sorafenib', 'match'))"""

result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 2.28 seconds, data scanned: 7.737 MB)
Fetched 42 rows


Unnamed: 0,curated_disease,curated_drug,curated_tissue,sample_id
0,"[Carcinoma, Hepatocellular]",[sorafenib],none,GSM2544170
1,"[Carcinoma, Hepatocellular]",[sorafenib],none,GSM2544171
2,"[Carcinoma, Hepatocellular]",[sorafenib],none,GSM2544172
3,"[Carcinoma, Hepatocellular]",[sorafenib],none,GSM2544152
4,"[Carcinoma, Hepatocellular]",[sorafenib],none,GSM2544153
5,"[Carcinoma, Hepatocellular]",[sorafenib],none,GSM2544154
6,"[Carcinoma, Hepatocellular]",[sorafenib],none,GSM2544158
7,"[Carcinoma, Hepatocellular]",[sorafenib],none,GSM2544159
8,"[Carcinoma, Hepatocellular]",[sorafenib],none,GSM2544160
9,"[Carcinoma, Hepatocellular]",[sorafenib],none,GSM2544164


### Example 4: Find samples for cell-types 

In [16]:
sql_query = """SELECT sample_id, curated_tissue, curated_disease, curated_cell_type FROM geo.samples 
    WHERE CONTAINS(curated_cell_type, recommend('curated_cell_type', 'germ cell', 'related'))"""

result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 4.09 seconds, data scanned: 8.905 MB)
Fetched 9582 rows


Unnamed: 0,curated_cell_type,curated_disease,curated_tissue,sample_id
0,[spermatogonium],"[[""Normal""]]",testis,GSM2918168
1,[spermatogonium],"[[""Normal""]]",testis,GSM2918169
2,[spermatogonium],"[[""Normal""]]",testis,GSM2918170
3,[spermatogonium],"[[""Normal""]]",testis,GSM2918171
4,[spermatogonium],"[[""Normal""]]",testis,GSM2918172


### Example 5: Find samples for cell-types related to a tissue

In [18]:
sql_query = """SELECT sample_id, curated_tissue, curated_disease, curated_cell_type FROM geo.samples 
    WHERE CONTAINS(curated_cell_type, recommend('curated_cell_type', 'immune system', 'related tissue')) LIMIT 10"""

result = omixatlas.query_metadata(sql_query)
result.head()

Query execution succeeded (time taken: 2.53 seconds, data scanned: 8.905 MB)
Fetched 10 rows


Unnamed: 0,curated_cell_type,curated_disease,curated_tissue,sample_id
0,[melanocyte],[Normal],none,GSM6428922
1,[melanocyte],[Normal],none,GSM6428923
2,[melanocyte],[Normal],none,GSM6428924
3,[myoblast],[Normal],none,GSM2592304
4,[myoblast],[Normal],none,GSM2592305
