# Cytokine mappings

This notebook cross-checks Leon's mappings from Luminex labels to Ensembl gene names, Ensembl gene IDs, and UniProt IDs.

I will use two different services: BridgeDb and BioThings.

## BridgeDb 
BridgeDb is a framework to map identifiers between various biological databases.

Website: https://bridgedb.github.io/

### API
https://bridgedb.github.io/swagger/

## BioThings
FAIR API ecosystem for biomedical knowledge.

Website: https://biothings.io/

### API
https://mygene.info/



### Preamble
#### Imports

In [22]:
import os
import pandas as pd

# BridgeDb
import requests
import json
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

# BioThings
from biothings_client import get_client

#### Variables

In [14]:
indir = os.path.join(os.getcwd(), 'in')
outdir = os.path.join(os.getcwd(), 'out')
os.makedirs(outdir, exist_ok=True)  # create the output directory if it does not exist

#### Functions

#### Workflow
##### Retrieve luminex cytokines list

In [6]:
# read input file
df = pd.read_csv('{}/cytokines_luminex.csv'.format(indir))
print(df.shape)
df.head(2)

(105, 1)


Unnamed: 0,entity luminex gene name
0,CX3CL1
1,CCL26


##### BridgeDb: Map label to Ensembl and UniProt

In [18]:
# api address
api = 'https://webservice.bridgedb.org'
endpoint = 'sourceDataSources'

In [28]:
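# list the data sources BridgeDb supports for human
r = requests.get('{}/Human/{}'.format(api,endpoint))
# NOTE: this second call overwrites `r`. The xrefsBatch endpoint expects a POST
# request with the identifiers in the body, so this GET is rejected with
# 405 (Method Not Allowed).
r = requests.get('https://webservice.bridgedb.org/Human/xrefsBatch/L?dataSource=En')
#r.headers
r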

<Response [405]>
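
The 405 above most likely comes from calling `xrefsBatch` with GET. A minimal sketch of a POST-based batch call follows. It rests on several assumptions that are not confirmed in this notebook: that the labels are HGNC symbols (system code `H`, whereas `L` is Entrez Gene), that the request body is one identifier per line, and that the response is tab-separated with a comma-separated list of `code:identifier` pairs in the third column. The helper names `build_batch_url`, `parse_batch_response`, and `map_symbols` are hypothetical.

```python
from urllib.parse import quote

BRIDGEDB_API = 'https://webservice.bridgedb.org'


def build_batch_url(organism, system_code, target_code):
    """Build the xrefsBatch URL for one source system code and one target data source."""
    return '{}/{}/xrefsBatch/{}?dataSource={}'.format(
        BRIDGEDB_API, quote(organism), quote(system_code), quote(target_code))


def parse_batch_response(text):
    """Parse the assumed tab-separated response into {identifier: [target ids]}."""
    mappings = {}
    for line in text.splitlines():
        parts = line.split('\t')
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        identifier, targets = parts[0], parts[2]
        # keep only entries of the form "code:identifier"; anything else
        # (for example a "no mapping" marker) yields an empty list
        mappings[identifier] = [t.split(':', 1)[1]
                                for t in targets.split(',') if ':' in t]
    return mappings


def map_symbols(symbols, system_code='H', target_code='En'):
    """POST the symbols to BridgeDb and return the parsed mappings."""
    import requests  # imported here so the helpers above can be used offline

    url = build_batch_url('Human', system_code, target_code)
    r = requests.post(url, data='\n'.join(symbols))
    r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return parse_batch_response(r.text)
```

Calling `map_symbols(sorted(cytokines_l))` would then return a dictionary keyed by Luminex label, which can be compared against the BioThings result. Passing `target_code='S'` should request UniProt identifiers instead, again assuming the BridgeDb system codes apply as described.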

##### BioThings: Map label to Ensembl and UniProt

In [7]:
# get the list of cytokines
cytokines_l = set(df['entity luminex gene name'])  # unique Luminex labels
print('total unique targets:',len(cytokines_l))

total unique targets: 105


In [11]:
# retrieve ensembl gene id and uniprot id from mygene.info
mg = get_client('gene')
bt_df = mg.querymany(cytokines_l, scopes = 'symbol,alias,retired', fields = 'ensembl.gene,uniprot.Swiss-Prot', size = 1, as_dataframe = True)
print(bt_df.shape)
bt_df.head()

querying 1-105...done.
Finished.
14 input query terms found no hit:
	['Chitinase3like1', 'sIL6Rb', 'sTNFR2', 'Pentraxin3', 'SCGFb', 'sCD163', 'bNGF', 'sTNFR1', 'sIL6Ra',
Pass "returnall=True" to return complete lists of duplicate or missing query terms.
(105, 6)




query,notfound,_id,_score,ensembl.gene,uniprot.Swiss-Prot,ensembl
Chitinase3like1,True,,,,,
CXCL16,,58191.0,94.342514,ENSG00000161921,Q9H2A7,
CXCL2,,2920.0,106.81165,ENSG00000081041,P19875,
IL6,,3569.0,89.23337,ENSG00000136244,P05231,
IL17F,,112744.0,92.74649,ENSG00000112116,Q96PD4,


In [12]:
# rename columns and subset biothings dataframe
ids = (bt_df.reset_index()
       .rename(columns={'query': 'luminex', 
                        'uniprot.Swiss-Prot': 'uniprot'})
       [['luminex', 'ensembl.gene', 'uniprot']]
       .copy()
      )
print(ids.shape)
ids.head()

(105, 3)


Unnamed: 0,luminex,ensembl.gene,uniprot
0,Chitinase3like1,,
1,CXCL16,ENSG00000161921,Q9H2A7
2,CXCL2,ENSG00000081041,P19875
3,IL6,ENSG00000136244,P05231
4,IL17F,ENSG00000112116,Q96PD4
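
The subset above keeps only `ensembl.gene`. The separate `ensembl` column in the BioThings output suggests that some hits carry several Ensembl genes as a list, in which case `ensembl.gene` would be empty for those rows. A minimal sketch for collecting the gene IDs in both cases, working on the default (non-DataFrame) record shape, is given here. The helper name `ensembl_gene_ids` and the placeholder IDs in the second record are hypothetical.

```python
def ensembl_gene_ids(record):
    """Return all Ensembl gene IDs in a MyGene.info-style record as a list."""
    ens = record.get('ensembl')
    if ens is None:
        return []
    if isinstance(ens, dict):
        ens = [ens]  # a single mapping comes back as a dict, several as a list
    return [entry['gene'] for entry in ens if 'gene' in entry]


# toy records: the first uses values from the table above,
# the second uses placeholder IDs to illustrate the list case
single = {'query': 'IL6', 'ensembl': {'gene': 'ENSG00000136244'}}
multiple = {'query': 'EXAMPLE', 'ensembl': [{'gene': 'ENSG_PLACEHOLDER_1'},
                                            {'gene': 'ENSG_PLACEHOLDER_2'}]}

print(ensembl_gene_ids(single))
print(ensembl_gene_ids(multiple))
```

The same dict-or-list pattern may apply to `uniprot.Swiss-Prot`, so it is worth checking that column for list values before exporting.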


In [16]:
# save df
ids.fillna('NA').to_csv('{}/biothings_mappings.tsv'.format(outdir), sep= '\t', index = False, header = True)

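#### Conclusion
MyGene.info returned a hit for 91 of the 105 Luminex labels; 14 labels found no hit. The unmatched labels shown in the output (for example `Chitinase3like1`, `sTNFR1`, `Pentraxin3`) look like Luminex analyte names rather than gene symbols, so they need to be mapped manually. The BridgeDb request returned a 405 response, so no BridgeDb mappings were retrieved in this run.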
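
To collect the unmatched labels for manual curation, the records can be split on the `notfound` flag. This is a minimal sketch that assumes the default `querymany` output, a list of dictionaries in which a missed term appears as `{'query': ..., 'notfound': True}`. The helper name `split_hits` is hypothetical.

```python
def split_hits(records):
    """Separate querymany-style records into matched records and unmatched query terms."""
    hits, missing = [], []
    for rec in records:
        if rec.get('notfound'):
            missing.append(rec['query'])
        else:
            hits.append(rec)
    return hits, missing


# toy records shaped like the default querymany output, using values from this notebook
records = [
    {'query': 'IL6', '_id': '3569', 'ensembl': {'gene': 'ENSG00000136244'}},
    {'query': 'Chitinase3like1', 'notfound': True},
]
hits, missing = split_hits(records)
print(missing)
```

With `as_dataframe = True`, as used in this notebook, the equivalent is to filter the rows whose `notfound` column is `True` and take their index.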