## Question

Which genes express in the Pancreas are related to diabetes?
For that we first query [humanmine.org](https://www.humanmine.org/humanmine), an integrated database of *Homo sapiens* genomic data.

We use the intermine API and the IDR API. Inspired by [Workshop_Pax6Workflow](https://github.com/intermine/intermine-ws-python-docs/blob/master/Workshop_Pax6Workflow.ipynb)

### Install dependencies if required
The cell below will install dependencies if you choose to run the notebook in [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb#recent=true).

In [13]:
%pip install intermine

Note: you may need to restart the kernel to use updated packages.


### Import libraries 

In [14]:
# libraries to interact with intermine
from intermine.webservice import Service

# libraries to interact with IDR
import requests
import json

## Intermine queries
Our first query looked at whether the set of **Pax6 targets** is expressed in the pancreas. 

In [15]:
TARGET_LIST = "PL_Pax6_Targets"
TISSUE = "Pancreas"
DISEASE = "diabetes"

In [16]:
service = Service("https://www.humanmine.org/humanmine/service")

In [17]:
query = service.new_query("Gene")

In [18]:
query.add_view(
    "primaryIdentifier", "symbol", "proteinAtlasExpression.cellType",
    "proteinAtlasExpression.level", "proteinAtlasExpression.reliability",
    "proteinAtlasExpression.tissue.name"
)

<intermine.query.Query at 0x7f20c9ed59d0>

Specify the tissue and the expression level.

In [19]:
query.add_constraint("Gene", "IN", TARGET_LIST, code = "A")
query.add_constraint("proteinAtlasExpression.tissue.name", "=", TISSUE, code = "B")
query.add_constraint("proteinAtlasExpression.level", "ONE OF", ["Medium", "High"], code = "C")

<MultiConstraint: Gene.proteinAtlasExpression.level ONE OF ['Medium', 'High']>

Collect the genes that are upregulated.

In [20]:
upin_pancreas = list()
for row in query.rows():
    upin_pancreas.append(row["primaryIdentifier"])

The second query looks for genes that are associated with the disease **diabetes**. 

In [21]:
query = service.new_query("Gene")
query.add_view("primaryIdentifier", "symbol")
query.add_constraint("organism.name", "=", "Homo sapiens", code = "A")
query.add_constraint("diseases.name", "CONTAINS", DISEASE, code = "B")

<BinaryConstraint: Gene.diseases.name CONTAINS diabetes>

In [22]:
diabetes_genes = list()
for row in query.rows():
    diabetes_genes.append(row["primaryIdentifier"])

Next, we look for those genes that are upregulated in the specified tissue e.g. Pancreas that are also associated with the specified disease e.g. diabetes. We need to intersect both list of results.

In [23]:
combined = {}
joined_list = diabetes_genes + upin_pancreas
for key in joined_list:
    combined[key] = combined.get(key, 0) + 1

intersected_list = list()
for key, value in combined.items():
    if value == 1:
        intersected_list.append(key)

### Second Query: GWAS
Finally, we fed the intersected list from above back into another query to see if there was any association of these genes with diabetes phenotypes according to GWAS studies. Note that we now start our query from the GWAS class:

In [24]:
query = service.new_query("GWAS")

In [25]:
query.add_view(
    "results.associatedGenes.primaryIdentifier",
    "results.associatedGenes.symbol", "results.associatedGenes.name",
    "results.SNP.primaryIdentifier", "results.pValue", "results.phenotype",
    "firstAuthor", "name", "publication.pubMedId",
    "results.associatedGenes.organism.shortName"
)

<intermine.query.Query at 0x7f20d818c310>

In [26]:
query.add_constraint("results.pValue", "<=", "1e-04", code = "B")
query.add_constraint("results.phenotype", "CONTAINS", "diabetes", code = "C")

<BinaryConstraint: GWAS.results.phenotype CONTAINS diabetes>

In [27]:
genes_list = list()
for row in query.rows():
    value = row["results.associatedGenes.primaryIdentifier"]
    if value in intersected_list:
        genes_list.append(row["results.associatedGenes.symbol"])
genes = set(genes_list)

## IDR queries

From the list of genes found using the intermine API, we are now looking in [Image Data Resource](https://idr.openmicroscopy.org/) for studies linked to those genes and with *tissue* as a ``Sample Type`` 

In [28]:
TYPE = "gene"
SAMPLE_TYPE = "tissue"
KEYS = {"phenotype":
    ("Phenotype",
     "Phenotype Term Name",
     "Phenotype Term Accession",
     "Phenotype Term Accession URL", 
    )
}

### Set up base URLS so can use shorter variable names later on

In [30]:
URL = "https://idr.openmicroscopy.org/mapr/api/{key}/?value={value}&case_sensitive=false&orphaned=true"
SCREENS_PROJECTS_URL = "https://idr.openmicroscopy.org/mapr/api/{key}/?value={value}"
ATTRIBUTES_URL = "https://idr.openmicroscopy.org/webclient/api/annotations/?type=map&{object}={object_id}"

### Set up where to query and session

In [31]:
INDEX_PAGE = "https://idr.openmicroscopy.org/webclient/?experimenter=-1"

# create http session
with requests.Session() as session:
    request = requests.Request('GET', INDEX_PAGE)
    prepped = session.prepare_request(request)
    response = session.send(prepped)
    if response.status_code != 200:
        response.raise_for_status()

### Helper methods

In [32]:
def find_type(data_type, id):
    '''
    Find the studies of type specified by the SAMPLE_TYPE parameter
    '''
    qs = {'object': data_type, 'object_id': id}
    url = ATTRIBUTES_URL.format(**qs)
    for a in session.get(url).json()['annotations']:
        for v in a['values']:
            if v[0] == "Sample Type" and v[1] == SAMPLE_TYPE:
                return id
    return -1

In [33]:
screens = list()
projects = list()
for gene in genes:
    qs1 = {'key': TYPE, 'value': gene}
    url1 = URL.format(**qs1)
    json = session.get(url1).json()
    for m in json['maps']:
        qs2 = {'key': TYPE, 'value': gene}
        url2 = SCREENS_PROJECTS_URL.format(**qs2)
        json = session.get(url2).json()
        for s in json['screens']:
            value = find_type("screen", s['id'])
            if value > -1 and value not in screens:
                screens.append(value)
        for p in json['projects']:
            value = find_type("project", p['id'])
            if value > -1 and value not in projects:
                projects.append(value) 

In [34]:
print(projects)
print(screens)

[501, 1104]
[]
