# Process OMOP to DrugBank mappings

This Jupyter Notebook downloads and preprocesses the OMOP to DrugBank mapping files so that they can be transformed to BioLink RDF.

## Download files

The download can be defined:
* in this Jupyter Notebook using Python
* as a Bash script in the `download/download.sh` file, and executed using `d2s download omop-drugbank`



In [9]:
import os

# Dataset identifier and the folders used for its input files and mappings
dataset_id = 'omop-drugbank'
input_folder = os.path.join('/notebooks/workspace/input', dataset_id)
mapping_folder = os.path.join('/notebooks/datasets', dataset_id, 'mapping')
# Create the input folder if it does not exist yet
os.makedirs(input_folder, exist_ok=True)

In [None]:
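import subprocess

# Use the input folder as the working folder
os.chdir(input_folder)

# Download the OMOP to DrugBank mappings (wget -N skips files that are already up to date).
# check=True raises CalledProcessError if wget fails, instead of failing silently.
subprocess.run(['wget', '-N', 'https://raw.githubusercontent.com/OHDSI/KnowledgeBase/master/LAERTES/terminology-mappings/RxNORM-to-UNII-PreferredName-To-DrugBank/rxnorm-drugbank-omop-mapping-CLEANED.tsv'], check=True)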
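
Before running the workflow, it can help to check that the download produced a readable tab-separated file. The snippet below is a minimal sketch using only the Python standard library; the helper name `preview_tsv` is hypothetical and not part of d2s, and nothing is assumed about the columns of the file.

```python
import csv
import os
from itertools import islice


def preview_tsv(path, n_rows=5):
    """Return the first n_rows lines of a tab-separated file as lists of fields."""
    with open(path, newline='', encoding='utf-8') as handle:
        return list(islice(csv.reader(handle, delimiter='\t'), n_rows))


# File name taken from the download URL above; only previewed if it is present
tsv_file = 'rxnorm-drugbank-omop-mapping-CLEANED.tsv'
if os.path.exists(tsv_file):
    for fields in preview_tsv(tsv_file):
        print(fields)
```

Because `islice` stops after `n_rows` lines, the preview stays cheap even for large files.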

## Process and load concepts

We will use CWL workflows to integrate the data with SPARQL queries. The structured data is first converted to generic RDF that mirrors the source data structure, then mapped to BioLink using SPARQL. The SPARQL queries are defined in `.rq` files and can be [accessed on GitHub](https://github.com/MaastrichtU-IDS/d2s-project-template/tree/master/datasets/omop-drugbank/mapping).
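
To make the idea of "generic RDF based on the data structure" more concrete, here is a rough, illustrative sketch in plain Python. It is not the conversion that d2s performs: the namespaces, the function names, and the sample column names and values are all hypothetical, and the input is assumed to have a header row. Each row becomes a subject, each column a predicate, and each cell a literal, written as N-Triples.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces, chosen for illustration only
DATA_NS = 'https://example.org/data/'
MODEL_NS = 'https://example.org/model/'

# Made-up sample data; the real file's columns may differ
SAMPLE_TSV = 'drug_name\tdrugbank_id\nexample drug\tDB00000\n'


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace('\\', '\\\\').replace('"', '\\"')
                 .replace('\n', '\\n').replace('\r', '\\r'))


def rows_to_generic_ntriples(tsv_text, dataset_id):
    """Turn each cell into one triple: row URI, column URI, cell value."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
    triples = []
    for index, row in enumerate(reader, start=1):
        subject = f'<{DATA_NS}{dataset_id}/row/{index}>'
        for column, value in row.items():
            # Skip empty cells and fields that have no matching header
            if column is None or not value:
                continue
            predicate = f'<{MODEL_NS}{quote(column)}>'
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


for triple in rows_to_generic_ntriples(SAMPLE_TSV, 'omop-drugbank'):
    print(triple)
```

A SPARQL mapping query can then match these structure-based predicates and construct triples that use BioLink classes and properties instead.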

Start the required services (here on our server, selected with the `-d trek` argument):

```bash
d2s start tmp-virtuoso drill -d trek
```

Run the following d2s command in the d2s-project folder:

```bash
d2s run csv-virtuoso.cwl omop-drugbank
```

[HCLS metadata](https://www.w3.org/TR/hcls-dataset/) can be computed for the omop-drugbank graph:

```bash
d2s run compute-hcls-metadata.cwl omop-drugbank
```

## Load the BioLink model

Load the [BioLink model ontology as Turtle](https://github.com/biolink/biolink-model/blob/master/biolink-model.ttl) into the graph `https://w3id.org/biolink/biolink-model` in the triplestore.
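
One possible way to do this is a SPARQL 1.1 Update `LOAD` request. The sketch below only builds the update string; the raw-file URL is an assumption derived from the GitHub link above, the helper names are hypothetical, and the endpoint URL, credentials and permission to run `LOAD` depend on how your triplestore is configured.

```python
import urllib.request

BIOLINK_GRAPH = 'https://w3id.org/biolink/biolink-model'
# Assumed raw-file URL for the Turtle file linked above
BIOLINK_TTL_URL = 'https://raw.githubusercontent.com/biolink/biolink-model/master/biolink-model.ttl'


def build_load_update(source_url, graph_uri):
    """Build a SPARQL 1.1 Update request that loads a document into a named graph."""
    return f'LOAD <{source_url}> INTO GRAPH <{graph_uri}>'


def send_update(endpoint_url, update):
    """POST the update to a SPARQL update endpoint (no authentication handled here)."""
    request = urllib.request.Request(
        endpoint_url,
        data=update.encode('utf-8'),
        headers={'Content-Type': 'application/sparql-update'},
        method='POST',
    )
    with urllib.request.urlopen(request) as response:
        return response.status


print(build_load_update(BIOLINK_TTL_URL, BIOLINK_GRAPH))
```

`send_update` is deliberately not called here, since the endpoint address is specific to each deployment; the printed statement can also be pasted into the triplestore's own SPARQL interface.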

## Link more concepts

See the OHDSI vocabulary repository: https://github.com/OHDSI/Vocabulary-v5.0
