# Get DrugBank Drug-Target Interactions

This notebook extracts the drug-target interactions from DrugBank using Bio2BEL DrugBank and exports them as a JSON file and as a tab-separated file.

## Installation

Bio2BEL DrugBank must first be installed from GitHub by running the following command in a terminal:

```bash
python3 -m pip install git+https://github.com/bio2bel/drugbank.git@master
```

## Imports

In [1]:
import json
import sys
import time
import rdkit
import bio2bel
import bio2bel_drugbank
from bio2bel_drugbank.models import *

## Runtime Environment

In [2]:
print(sys.version)

3.7.0 (default, Jul 23 2018, 20:22:55) 
[Clang 9.1.0 (clang-902.0.39.2)]


In [3]:
print(time.asctime())

Mon Nov  5 15:15:56 2018


In [4]:
print('Bio2BEL version:', bio2bel.get_version())
print('Bio2BEL DrugBank version:', bio2bel_drugbank.get_version())

Bio2BEL version: 0.1.6-dev
Bio2BEL DrugBank version: 0.1.1-dev


In [5]:
drugbank_manager = bio2bel_drugbank.Manager()
drugbank_manager

<DrugbankManager url=postgresql+psycopg2://cthoyt@localhost:5432/cthoyt>

## Data Download

If you'd like to populate DrugBank yourself, first download the file at https://www.drugbank.ca/releases/5-1-0/downloads/all-full-database and place it in a folder called `~/.pybel/bio2bel/drugbank`.
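
As a convenience, the folder can be created and inspected from Python before populating. The following is only a minimal sketch: the helper name `ensure_data_directory` is hypothetical and not part of Bio2BEL, it does not download anything, and it makes no assumption about the name of the downloaded file.

```python
from pathlib import Path


def ensure_data_directory(base=None):
    """Create the DrugBank data folder if it is missing.

    Returns the folder path and the sorted names of the files in it.
    `base` is a hypothetical override; by default the folder named in
    the text above (~/.pybel/bio2bel/drugbank) is used.
    """
    if base is None:
        directory = Path.home() / '.pybel' / 'bio2bel' / 'drugbank'
    else:
        directory = Path(base)

    # parents=True also creates ~/.pybel and ~/.pybel/bio2bel if needed
    directory.mkdir(parents=True, exist_ok=True)

    files = sorted(path.name for path in directory.iterdir() if path.is_file())
    return directory, files


if __name__ == '__main__':
    directory, files = ensure_data_directory()
    print(directory, files)
```

If the returned list of files is empty, the download still has to be placed in the folder by hand.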

In [6]:
if not drugbank_manager.is_populated():
    drugbank_manager.populate()

In [7]:
drugbank_manager.summarize()

{'drugs': 11033,
 'types': 2,
 'aliases': 147715,
 'atc_codes': 3987,
 'groups': 7,
 'categories': 3625,
 'patents': 5256,
 'xrefs': 62926,
 'proteins': 4723,
 'species': 528,
 'actions': 52,
 'drug_protein_interactions': 24175}

## Data Processing

In [8]:
drugbank_manager.list_groups()

[approved,
 investigational,
 withdrawn,
 vet_approved,
 nutraceutical,
 illicit,
 experimental]

In [9]:
# Any of the other groups listed above can be used here instead
approved = drugbank_manager.get_group_by_name('approved')

## Output

In [10]:
%%time

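# Build one record per drug in the group, each with its targets
# and the PubMed identifiers of the supporting articles
output_json = [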
    {
        'drugbank_id': drug.drugbank_id,
        'name': drug.name,
        'cas_number': drug.cas_number,
        'inchi': drug.inchi,
        'inchikey': drug.inchikey,
        'targets': [
            {
                'uniprot_id': interaction.protein.uniprot_id,
                'uniprot_accession': interaction.protein.uniprot_accession,
                'name': interaction.protein.name,
                'hgnc_id': interaction.protein.hgnc_id,
                'articles': [
                    article.pubmed_id 
                    for article in interaction.articles
                ]
            }
            for interaction in drug.protein_interactions
        ]
    }
    for drug in approved.drugs
]

with open('drugbank-targets.json', 'w') as f:
    json.dump(output_json, f)

CPU times: user 15.6 s, sys: 779 ms, total: 16.4 s
Wall time: 28.8 s
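
To illustrate how the resulting JSON might be consumed downstream, here is a minimal sketch that counts the targets listed for each drug. The helper `count_targets` is hypothetical, and the record is a made-up placeholder that only mirrors the structure written above; it is not real DrugBank data.

```python
import json


def count_targets(records):
    """Map each DrugBank identifier to the number of targets listed for it."""
    return {
        record['drugbank_id']: len(record['targets'])
        for record in records
    }


# Made-up placeholder record with the same keys as the export above
example_records = [
    {
        'drugbank_id': 'DB_EXAMPLE',
        'name': 'example drug',
        'cas_number': None,
        'inchi': None,
        'inchikey': None,
        'targets': [
            {
                'uniprot_id': 'EXAMPLE_HUMAN',
                'uniprot_accession': 'X00000',
                'name': 'example protein',
                'hgnc_id': None,
                'articles': ['0000000'],
            },
        ],
    },
]

# Round-trip through JSON text, as json.dump and json.load do with a file
loaded = json.loads(json.dumps(example_records))
target_counts = count_targets(loaded)
print(target_counts)
```

With the real file, `json.load` on an open handle to `drugbank-targets.json` would take the place of the round trip.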


In [11]:
%%time

with open('drugbank-interactions.tsv', 'w') as file:
    print(
        'drug_name',
        'drug_drugbank_id', 
        'protein_name', 
        'protein_uniprot_id',
        'protein_species',
        'pubmed_id', 
        sep='\t', 
        file=file
    )
    
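    # Iterate over the drug-protein interactions returned by the manager
    for drug_target_interaction in drugbank_manager.list_drug_protein_interactions():
        drug = drug_target_interaction.drug
        protein = drug_target_interaction.protein

        # Write one row per supporting article; interactions without articles produce no rows
        for article in drug_target_interaction.articles: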
            print(
                drug.name, 
                drug.drugbank_id, 
                protein.name, 
                protein.uniprot_id, 
                protein.species.name,
                article.pubmed_id,
                sep='\t', 
                file=file,
            )

CPU times: user 24.1 s, sys: 1.05 s, total: 25.2 s
Wall time: 34.1 s
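
Because the TSV holds one row per supporting article, a consumer will often want to regroup the rows by interaction. The following minimal sketch does this with the standard `csv` module. The helper `group_articles` is hypothetical, and the rows are made-up placeholders that reuse the column names from the header above.

```python
import csv
import io
from collections import defaultdict


def group_articles(handle):
    """Collect PubMed identifiers per (DrugBank ID, UniProt ID) pair."""
    # QUOTE_NONE matches a file written with plain print(..., sep='\t')
    reader = csv.DictReader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
    grouped = defaultdict(list)
    for row in reader:
        key = row['drug_drugbank_id'], row['protein_uniprot_id']
        grouped[key].append(row['pubmed_id'])
    return dict(grouped)


# Made-up placeholder rows, not real DrugBank data
example_tsv = '\n'.join([
    '\t'.join(['drug_name', 'drug_drugbank_id', 'protein_name',
               'protein_uniprot_id', 'protein_species', 'pubmed_id']),
    '\t'.join(['drug a', 'DB_A', 'protein x', 'X_HUMAN', 'Human', '111']),
    '\t'.join(['drug a', 'DB_A', 'protein x', 'X_HUMAN', 'Human', '222']),
    '\t'.join(['drug b', 'DB_B', 'protein y', 'Y_HUMAN', 'Human', '333']),
]) + '\n'

articles_by_interaction = group_articles(io.StringIO(example_tsv))
print(articles_by_interaction)
```

For the real file, an open handle to `drugbank-interactions.tsv` would replace the `io.StringIO` object.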
