# DUD-E: UniProt-ChEMBL Mapping

This notebook maps the 102 DUD-E targets to ChEMBL target entries, which we can then use to query ChEMBL for binding data. For now we fetch the data directly from DUD-E and ChEMBL, but we could instead provide students with the files.

In [1]:
import json

from urllib.request import urlopen
from urllib.error import URLError

DUD-E provides a list of protein UniProt IDs associated with each target. We first fetch these for each of the 102 targets.

In [2]:
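# targets.txt holds one DUD-E target name per line; skip blank lines
with open('targets.txt') as f:
    targets = [line.strip() for line in f if line.strip()]

target_uniprot_ids = {}

for target in targets:
    url = f'http://dude.docking.org/targets/{target}/uniprot.txt'
    try:
        data = urlopen(url).read().decode('utf-8')
    except URLError as e:
        # Report which target failed, then carry on with the rest
        print(f'Could not fetch {url}: {e.reason}')
    else:
        # One UniProt ID per line; drop empty lines and stray whitespace
        target_uniprot_ids[target] = [
            uid.strip() for uid in data.splitlines() if uid.strip()
        ]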
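
The parsing step can be checked without network access. The following is a minimal sketch, assuming each `uniprot.txt` response is a plain one-ID-per-line listing; `parse_id_list` is a hypothetical helper name and the sample string is illustrative, not actual DUD-E output.

```python
def parse_id_list(text):
    """Split a one-ID-per-line listing into a list of clean IDs.

    Handles Windows line endings, blank lines, and surrounding whitespace.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


# Illustrative response body (assumption), not fetched from DUD-E
sample = 'P00533\r\n\nP04626\n'
print(parse_id_list(sample))
```

Keeping the parsing in a small function like this makes it easy to reuse the same logic if the files are later handed to students instead of being downloaded.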

Next, we identify those UniProt IDs that correspond to a ChEMBL target entry. Fortunately, ChEMBL provides a mapping of UniProt IDs to ChEMBL IDs that we can use.

In [3]:
uniprot_chembl_mapping = {}

url = 'ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_uniprot_mapping.txt'

try:
    data = urlopen(url).read().decode('utf-8')
except URLError as e:
    print(e.reason)
else:
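    for line in data.splitlines():
        # Skip blank lines and any '#' comment lines
        if not line.strip() or line.startswith('#'):
            continue
        fields = line.split('\t')
        # The last column is the target type; keep single-protein targets only
        if fields[-1] == 'SINGLE PROTEIN':
            uniprot_chembl_mapping[fields[0]] = fields[1]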
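
To see what the filter keeps and drops, here is a small offline sketch of the same parsing logic. It assumes the mapping file is tab-separated with the UniProt ID first, the ChEMBL ID second, and the target type last, as the cell above does. `parse_mapping` is a hypothetical helper name, and the sample rows are illustrative rather than copied from the real file.

```python
def parse_mapping(text, target_type='SINGLE PROTEIN'):
    """Build a UniProt ID -> ChEMBL ID dict from tab-separated mapping text."""
    mapping = {}
    for line in text.splitlines():
        # Ignore blank lines and comment lines
        if not line.strip() or line.startswith('#'):
            continue
        fields = line.split('\t')
        # Need at least an accession and a ChEMBL ID; type is the last column
        if len(fields) >= 2 and fields[-1] == target_type:
            mapping[fields[0]] = fields[1]
    return mapping


# Illustrative rows (assumption): one single-protein entry, one complex
sample = (
    '# illustrative header line\n'
    'P00533\tCHEMBL203\tExample kinase\tSINGLE PROTEIN\n'
    'P00533\tCHEMBL0000000\tExample complex\tPROTEIN COMPLEX\n'
)
print(parse_mapping(sample))
```

Note that a plain dictionary keeps only one ChEMBL ID per UniProt ID. If a UniProt ID ever appeared in more than one single-protein row, the later row would overwrite the earlier one.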

Finally, we construct a list of target ChEMBL IDs associated with each DUD-E target.

In [7]:
target_chembl_ids = {}
for target in target_uniprot_ids:
    chembl_ids = []
    for uniprot_id in target_uniprot_ids[target]:
        if uniprot_id in uniprot_chembl_mapping:
            chembl_ids.append(uniprot_chembl_mapping[uniprot_id])
    if chembl_ids:
        target_chembl_ids[target] = chembl_ids
    else:
        print('No ChEMBL IDs found for target', target)


No ChEMBL IDs found for target inha
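
The join performed in this cell can be sketched on toy data. This is a minimal illustration, not the notebook's actual data: `map_targets` is a hypothetical helper, and the `demo` target and `X00000` ID are made up to show the unmapped case.

```python
def map_targets(target_uniprot_ids, uniprot_chembl_mapping):
    """Return (mapped, unmapped): ChEMBL IDs per target, and targets with none."""
    mapped, unmapped = {}, []
    for target, uniprot_ids in target_uniprot_ids.items():
        # Keep only UniProt IDs that have a ChEMBL entry
        chembl_ids = [
            uniprot_chembl_mapping[uid]
            for uid in uniprot_ids
            if uid in uniprot_chembl_mapping
        ]
        if chembl_ids:
            mapped[target] = chembl_ids
        else:
            unmapped.append(target)
    return mapped, unmapped


# Toy inputs (assumption): 'demo' and 'X00000' are placeholders
toy_uniprot = {'egfr': ['P00533'], 'demo': ['X00000']}
toy_mapping = {'P00533': 'CHEMBL203'}

mapped, unmapped = map_targets(toy_uniprot, toy_mapping)
print(mapped)    # {'egfr': ['CHEMBL203']}
print(unmapped)  # ['demo']
```

Returning the unmapped targets as a list, rather than only printing them, makes it easier to follow up on cases like `inha` later.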


We'll save our dictionary in JSON format for later use.

In [8]:
with open('dude_target_chembl_ids.json', 'w') as f:
    json.dump(target_chembl_ids, f)
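
A later notebook can read the file back with `json.load`. The sketch below shows the round trip on a toy dictionary written to a temporary directory, so it does not touch `dude_target_chembl_ids.json`; the file name and contents here are illustrative assumptions.

```python
import json
import os
import tempfile

# Toy stand-in for target_chembl_ids (assumption)
toy = {'egfr': ['CHEMBL203']}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_target_chembl_ids.json')

    # indent and sort_keys are optional; they make the file easier to diff
    with open(path, 'w') as f:
        json.dump(toy, f, indent=2, sort_keys=True)

    with open(path) as f:
        reloaded = json.load(f)

# The round trip preserves string keys and lists of strings
print(reloaded == toy)
```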