# Diabetes-related genes expressed in the pancreas

This notebook shows how to integrate genomic and image data resources.
It addresses the question **Which genes are expressed in the pancreas in relation to diabetes?**
The tissue and the disease can be changed to ask the same question of other data.


Steps:

* Query [humanmine.org](https://www.humanmine.org/humanmine), an integrated database of *Homo sapiens* genomic data, using the InterMine API to find the genes.
* Using the list of genes found, search the Image Data Resource (IDR) for images linked to the genes, tissue and disease.

 
We use the InterMine API and the IDR API. This notebook is inspired by [Workshop_Pax6Workflow](https://github.com/intermine/intermine-ws-python-docs/blob/master/Workshop_Pax6Workflow.ipynb).

## Summary:
![Overview](./includes/HumanMineIDR.png)

## Settings:


### Auxiliary libraries used
* [nb_conda_kernels](https://github.com/Anaconda-Platform/nb_conda_kernels): Enables a Jupyter Notebook or JupyterLab application in one conda environment to access kernels for Python, R, and other languages found in other environments.
* [jupyter_contrib_nbextensions](https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/index.html): Package containing a collection of community-contributed unofficial extensions that add functionality to the Jupyter notebook.

## Launch

### Binder

If the notebook is not already running, you can launch it by clicking on the logo [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/IDR/idr-notebooks/master?urlpath=notebooks%2Fhumanmine.ipynb)

### Run locally using repo2docker

With ``jupyter-repo2docker`` installed, run:

```
git clone https://github.com/IDR/idr-notebooks.git
cd idr-notebooks
repo2docker .
```

### Install dependencies if required
The cell below will install dependencies if you choose to run the notebook in [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb#recent=true).

In [1]:
%pip install intermine

Note: you may need to restart the kernel to use updated packages.


### Import libraries 

In [2]:
# libraries to interact with intermine
from intermine.webservice import Service

# libraries to interact with IDR
import requests
import json

# Display the images (IPython.core.display is deprecated for these names)
from IPython.display import display, HTML

## Search for genes in HumanMine

We first define the output columns, then add the constraints, i.e. the tissue and the disease.

In [3]:
TISSUE = "Pancreas" # "Cerebellum" # "little brain"
DISEASE = "diabetes" #"MICROCEPHALY"

In [4]:
service = Service("https://www.humanmine.org/humanmine/service")

In [5]:
query = service.new_query("Gene")

In [6]:
query.add_view(
    "primaryIdentifier", "symbol", "proteinAtlasExpression.cellType",
    "proteinAtlasExpression.level", "proteinAtlasExpression.reliability",
    "proteinAtlasExpression.tissue.name"
)

<intermine.query.Query at 0x7fbc07149fd0>

We look for human genes expressed at a medium or high level in the specified tissue that are also associated with the specified disease.

In [7]:
query.add_constraint("proteinAtlasExpression.tissue.name", "=", TISSUE)
query.add_constraint("proteinAtlasExpression.level", "ONE OF", ["Medium", "High"])
query.add_constraint("organism.name", "=", "Homo sapiens")
query.add_constraint("diseases.name", "CONTAINS", DISEASE)

<BinaryConstraint: Gene.diseases.name CONTAINS diabetes>

Collect the gene symbols in a set to remove duplicates, then sort them in reverse alphabetical order.

In [8]:
upin_tissue = set()
for row in query.rows():
    upin_tissue.add(row["symbol"])
genes = sorted(upin_tissue, reverse=True)
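
A set is useful here because a query that lists expression details can return several rows for the same gene (for example, one per cell type). The offline sketch below illustrates the same de-duplication and reverse sort on hand-written rows. The rows and the helper name `unique_symbols` are made up for illustration and are not HumanMine output.

```python
def unique_symbols(rows):
    """Return the distinct gene symbols in rows, in reverse alphabetical order."""
    return sorted({row["symbol"] for row in rows}, reverse=True)


# Hand-written example rows (hypothetical, not real query results).
example_rows = [
    {"symbol": "INS", "cellType": "cell type A"},
    {"symbol": "PDX1", "cellType": "cell type A"},
    {"symbol": "INS", "cellType": "cell type B"},
    {"symbol": "GCK", "cellType": "cell type A"},
]

print(unique_symbols(example_rows))
```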

Print out the list of genes

In [9]:
for i, a in enumerate(genes):
    print(a, end=' ')
    if i % 8 == 7: 
        print("")

WFS1 VEGFA TCF7L2 TBC1D4 SOD2 SLC30A8 PTPN22 PDX1 
MIA3 KCNJ11 IRS2 IRS1 INSR INS IGF2BP2 IER3IP1 
HNF4A HNF1B HMGA1 HFE GPD2 GCK ENPP1 EIF2AK3 
DNAJC3 CEL CAPN10 APPL1 AKT2 ABCC8 

## Search for images in IDR associated with the genes found in HumanMine

Using the list of genes found with the InterMine API, we now search the [Image Data Resource](https://idr.openmicroscopy.org/) for studies linked to those genes that have `tissue` as their ``Sample Type``.

In [22]:
TYPE = "gene"
SAMPLE_TYPE = "tissue"
EXPRESSION_KEY = "Expression Pattern Description"
EXPRESSION = "Islets" # "Brain"
STAGE = "Developmental Stage"

### Set up base URLs so that shorter variable names can be used later on

In [11]:
URL = "https://idr.openmicroscopy.org/mapr/api/{key}/?value={value}&case_sensitive=false&orphaned=true"
SCREENS_PROJECTS_URL = "https://idr.openmicroscopy.org/mapr/api/{key}/?value={value}"
ATTRIBUTES_URL = "https://idr.openmicroscopy.org/webclient/api/annotations/?type=map&{object}={object_id}"
DATASETS_URL = "https://idr.openmicroscopy.org/mapr/api/{key}/datasets/?value={value}&id={project_id}"
IMAGES_URL = "https://idr.openmicroscopy.org/mapr/api/{key}/images/?value={value}&node={parent_type}&id={parent_id}"
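
Each template is filled with `str.format` before the request is sent. As a minimal sketch of that step, the hypothetical helper `build_url` below fills one of the templates and also percent-encodes the values, which the notebook itself does not do; gene symbols such as `PDX1` need no encoding, so this only matters for values containing spaces or `&`.

```python
from urllib.parse import quote

# Template copied from the cell above.
DATASETS_URL = "https://idr.openmicroscopy.org/mapr/api/{key}/datasets/?value={value}&id={project_id}"


def build_url(template, **params):
    """Fill a URL template, percent-encoding every parameter value."""
    encoded = {name: quote(str(value), safe="") for name, value in params.items()}
    return template.format(**encoded)


example_url = build_url(DATASETS_URL, key="gene", value="PDX1", project_id=1104)
print(example_url)
```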


### Set up where to query and session

In [12]:
INDEX_PAGE = "https://idr.openmicroscopy.org/webclient/?experimenter=-1"

# Create an HTTP session that stays open for the cells below
session = requests.Session()
response = session.get(INDEX_PAGE)
# Raise an error if the index page cannot be reached
response.raise_for_status()

### Helper methods

In [13]:
def find_type(data_type, object_id):
    '''
    Return object_id if the object has a "Sample Type" annotation
    equal to SAMPLE_TYPE, otherwise return -1
    '''
    qs = {'object': data_type, 'object_id': object_id}
    url = ATTRIBUTES_URL.format(**qs)
    for a in session.get(url).json()['annotations']:
        for v in a['values']:
            if v[0] == "Sample Type" and v[1] == SAMPLE_TYPE:
                return object_id
    return -1
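
The matching step inside `find_type` can be separated from the HTTP call, which makes it easy to check offline. The sketch below assumes the response shape read above (an `annotations` list whose entries hold `values` as key/value pairs). The payload and the helper name `has_sample_type` are illustrative and are not real IDR data.

```python
def has_sample_type(payload, sample_type):
    """Return True if any annotation in payload has 'Sample Type' equal to sample_type."""
    for annotation in payload.get("annotations", []):
        for key, value in annotation.get("values", []):
            if key == "Sample Type" and value == sample_type:
                return True
    return False


# Hand-written payload mimicking the shape read by find_type (hypothetical values).
example_payload = {
    "annotations": [
        {"values": [["Study Type", "example"], ["Sample Type", "tissue"]]},
    ]
}

print(has_sample_type(example_payload, "tissue"))
print(has_sample_type(example_payload, "cell"))
```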

In [40]:
def find_images(json_data, data_type):
    '''
    Find the images associated with a gene and group their ids by stage.
    Relies on the globals gene (current gene symbol) and images
    (a defaultdict(list)) defined in the calling cell.
    '''
    import collections
    for data in json_data[data_type]:
        parent_id = data['id']
        # data_type[:-1] turns e.g. "datasets" into "dataset"
        qs4 = {'key': TYPE, 'value': gene,
               'parent_type': data_type[:-1], 'parent_id': parent_id}
        url4 = IMAGES_URL.format(**qs4)
        for i in session.get(url4).json()['images']:
            image_id = i['id']
            qs5 = {'object': "image", 'object_id': image_id}
            url5 = ATTRIBUTES_URL.format(**qs5)
            for a in session.get(url5).json()['annotations']:
                # Sort the key/value pairs so that STAGE is read
                # before EXPRESSION_KEY
                values = collections.OrderedDict(sorted(a['values']))
                stage = ""
                for k, v in values.items():
                    if k == STAGE:
                        stage = v
                    if k == EXPRESSION_KEY and EXPRESSION in v:
                        images[stage].append(image_id)
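
The grouping logic in `find_images` can also be sketched as a pure function that takes annotations that have already been fetched and returns the mapping from developmental stage to image ids. The function name `group_by_stage`, its parameters and the sample input are assumptions for illustration. It looks up the stage directly by key, so it does not rely on the order of the key/value pairs.

```python
from collections import defaultdict


def group_by_stage(annotations_by_image, stage_key, expression_key, expression):
    """Group image ids by stage, keeping images whose expression text contains expression."""
    grouped = defaultdict(list)
    for image_id, annotations in annotations_by_image.items():
        for annotation in annotations:
            values = dict(annotation["values"])
            if expression in values.get(expression_key, ""):
                grouped[values.get(stage_key, "")].append(image_id)
    return dict(grouped)


# Hypothetical input: image id -> list of map annotations (not real IDR data).
example = {
    1: [{"values": [["Developmental Stage", "9PCW"],
                    ["Expression Pattern Description", "Islets"]]}],
    2: [{"values": [["Developmental Stage", "9PCW"],
                    ["Expression Pattern Description", "Other"]]}],
    3: [{"values": [["Developmental Stage", "CS16"],
                    ["Expression Pattern Description", "Islets and ducts"]]}],
}

result = group_by_stage(example, "Developmental Stage",
                        "Expression Pattern Description", "Islets")
print(result)
```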

Search for the studies linked to the genes found in HumanMine and collect the id of each matching project.

In [15]:
projects = set()
for gene in genes:
    qs1 = {'key': TYPE, 'value': gene}
    # Check first that IDR has an annotation matching the gene
    maps = session.get(URL.format(**qs1)).json()['maps']
    if not maps:
        continue
    # List the projects annotated with the gene
    url2 = SCREENS_PROJECTS_URL.format(**qs1)
    for p in session.get(url2).json()['projects']:
        # Keep only the projects whose Sample Type is SAMPLE_TYPE
        value = find_type("project", p['id'])
        if value > -1:
            projects.add(value)


Print out the ids.

In [16]:
print(projects)

{1104, 501}


## Find the images
Find the images linked to the selected genes in one of the projects found.
Below, we look at the gene **PDX1**.

In [48]:
from collections import defaultdict
project_id = 1104
selected_genes = {"PDX1"}
images = defaultdict(list)
for gene in selected_genes:
    qs3 = {'key': TYPE, 'value': gene, 'project_id': project_id}
    url3 = DATASETS_URL.format(**qs3)
    find_images(session.get(url3).json(), "datasets")

print(images)

defaultdict(<class 'list'>, {'15PCW': [9841210, 9841211], '9PCW': [9841212, 9841219, 9841213, 9841218], 'CS16': [9841216, 9841217], 'CS21': [9841215, 9841214]})


In [92]:
BASE_URL = "https://idr.openmicroscopy.org/webclient"
IMAGE_DATA_URL = BASE_URL + "/render_thumbnail/{id}"
LINK_URL = BASE_URL + "/?show=image-{id}"

## Display the images

In [120]:
html = "<table>"
for k, v in images.items():
    html += '<tr><td><h4>Development stage: '+k+'</h4></td></tr><tr>'
    for i in v:
        qs = {'id': i}
        url = IMAGE_DATA_URL.format(**qs)
        url_link = LINK_URL.format(**qs)
        html += '<td><a href="'+url_link+'" target="_blank"><img src="'+url+'"/></a></td>'
    html += "</tr>"
html += "</table>"
display(HTML(html))

Output: an HTML table of clickable image thumbnails grouped by development stage, with one heading row each for 15PCW, 9PCW, CS16 and CS21.
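
The table-building loop can be wrapped in a function that returns the HTML string, which keeps the markup checkable outside a notebook. This is a sketch under assumptions: the function name `thumbnails_table` is hypothetical, the URL templates are copied from the cell above, and `html.escape` is added here to guard the stage label.

```python
from html import escape

# Templates copied from the cell above.
BASE_URL = "https://idr.openmicroscopy.org/webclient"
IMAGE_DATA_URL = BASE_URL + "/render_thumbnail/{id}"
LINK_URL = BASE_URL + "/?show=image-{id}"


def thumbnails_table(images_by_stage):
    """Return an HTML table with a heading row and a thumbnail row per stage."""
    parts = ["<table>"]
    for stage, image_ids in images_by_stage.items():
        parts.append(f"<tr><td><h4>Development stage: {escape(stage)}</h4></td></tr>")
        cells = "".join(
            f'<td><a href="{LINK_URL.format(id=i)}" target="_blank">'
            f'<img src="{IMAGE_DATA_URL.format(id=i)}"/></a></td>'
            for i in image_ids
        )
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table>")
    return "".join(parts)


# Image ids taken from the CS16 entry printed earlier in the notebook.
table_html = thumbnails_table({"CS16": [9841216, 9841217]})
```

In the notebook, the result could then be rendered with `display(HTML(table_html))`.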


### License (BSD 2-Clause)

Copyright (C) 2021 University of Dundee. All Rights Reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.