

Download and install all libraries within the requirements.txt file before continuing.

pip install -r requirements.txt

Step 1: Download the data

Download the drug label data from DailyMed. Use the full prescription label data from the full releases, which currently consists of 4 zip files. Extract these zip files into a prescription folder.

Note: Test this pipeline on one of the monthly updates first. A monthly update contains only the newest drug labels, so the smaller subset makes the first run faster.

Step 2: Extract the information

Each .zip file represents an SPL label. To extract the information we want, we must first extract the XML files from the .zip files.

cd OptimAL\src



This extracts the XML files from all of the zip files, then goes through the content of each XML file that contains an "Indications and Usage" section. The result is a file called "XMLProduct.csv".
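The extraction step described above can be sketched roughly as follows. This is a minimal illustration and not the repository's script: the function names `indications_text` and `extract_labels` and the output column names are hypothetical, and it assumes the section can be recognized by its title inside an SPL document that uses the HL7 v3 namespace.

```python
import csv
import zipfile
import xml.etree.ElementTree as ET

# SPL documents use the HL7 v3 namespace.
NS = {"hl7": "urn:hl7-org:v3"}


def indications_text(xml_bytes):
    """Return the text of the first 'Indications and Usage' section, or None."""
    root = ET.fromstring(xml_bytes)
    for section in root.iter("{urn:hl7-org:v3}section"):
        title = section.find("hl7:title", NS)
        title_text = "".join(title.itertext()) if title is not None else ""
        # Titles appear as "INDICATIONS & USAGE" or "Indications and Usage".
        normalized = title_text.lower().replace("&", "and")
        if "indications" in normalized and "usage" in normalized:
            body = section.find("hl7:text", NS)
            if body is None:
                return ""
            # Join all nested text and collapse whitespace.
            return " ".join("".join(body.itertext()).split())
    return None


def extract_labels(zip_paths, out_csv):
    """Write one row (zip name, XML name, indications text) per matching label."""
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # Hypothetical column names, chosen for this sketch only.
        writer.writerow(["zip_file", "xml_file", "indications"])
        for zip_path in zip_paths:
            with zipfile.ZipFile(zip_path) as archive:
                for name in archive.namelist():
                    if not name.lower().endswith(".xml"):
                        continue
                    text = indications_text(archive.read(name))
                    if text is not None:
                        writer.writerow([str(zip_path), name, text])
```

Reading each XML member straight from the archive avoids unpacking thousands of label folders to disk; labels without a matching section are simply skipped.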

Step 3: Gather the DrugBank ID using the UNII code

Use the active ingredient and UNII code extracted from the XML files to look up the DrugBank ID (DB_ID).


This produces the "GSD_DBID.csv" file.
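A UNII-to-DrugBank lookup of this kind could look like the sketch below. It assumes a mapping table with `unii` and `drugbank_id` columns and product rows with a `unii` key; those column names and both function names are hypothetical, and the actual script may work differently.

```python
import csv
import io


def load_unii_to_dbid(handle):
    """Build a UNII -> DrugBank ID dict from a CSV with 'unii' and 'drugbank_id' columns."""
    mapping = {}
    for row in csv.DictReader(handle):
        unii = row["unii"].strip().upper()
        if unii:
            mapping[unii] = row["drugbank_id"].strip()
    return mapping


def attach_dbid(products, mapping):
    """Return copies of the product dicts with a 'DB_ID' key (None when unmatched)."""
    annotated = []
    for product in products:
        unii = product["unii"].strip().upper()
        annotated.append({**product, "DB_ID": mapping.get(unii)})
    return annotated


# Placeholder identifiers, made up for illustration only.
table = io.StringIO("unii,drugbank_id\nUNII0001,DB-EXAMPLE-1\n")
lookup = load_unii_to_dbid(table)
labelled = attach_dbid([{"ingredient": "example drug", "unii": "unii0001"}], lookup)
```

Normalizing the UNII code to upper case on both sides keeps the match from failing on casing differences between the two sources.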

Step 4: Clean up the text

Tidy the gathered context text using this code:


We clean up the text at this stage so that the BioPortal annotation API can annotate it reliably and so that it reads well in the future microtasks.
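As an illustration of what such a cleanup might involve, the sketch below decodes HTML entities, normalizes Unicode, and collapses whitespace. The function name `clean_text` and the specific rules are assumptions; the repository's cleanup script may apply different ones.

```python
import html
import re
import unicodedata


def clean_text(raw):
    """Normalize label text: decode entities, unify Unicode, collapse whitespace."""
    text = html.unescape(raw)
    text = unicodedata.normalize("NFKC", text)
    # Drop control characters but keep ordinary whitespace.
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    # Collapse runs of whitespace (including newlines and tabs) to one space.
    text = re.sub(r"\s+", " ", text)
    # Remove stray spaces before punctuation left over from XML extraction.
    text = re.sub(r"\s+([,.;:])", r"\1", text)
    return text.strip()
```

For example, `clean_text("  Indicated&nbsp;for\n\tpain ,  fever .")` returns `"Indicated for pain, fever."`.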

Step 5: Gather the BioPortal annotations
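A request to the BioPortal Annotator could be sketched as below. It assumes the public REST endpoint at `https://data.bioontology.org/annotator` with the `text`, `ontologies`, and `apikey` query parameters, restricts matches to the Disease Ontology (`DOID`) because the next step relies on DOID mappings, and assumes each item of the JSON response carries an `annotatedClass` object and an `annotations` list. The function names are hypothetical, and you need your own API key.

```python
import json
import urllib.parse
import urllib.request

ANNOTATOR_URL = "https://data.bioontology.org/annotator"


def build_annotator_url(text, api_key, ontologies=("DOID",)):
    """Build the Annotator request URL for one piece of text."""
    query = urllib.parse.urlencode({
        "text": text,
        "ontologies": ",".join(ontologies),
        "apikey": api_key,
    })
    return f"{ANNOTATOR_URL}?{query}"


def fetch_annotations(text, api_key, ontologies=("DOID",), timeout=30):
    """Call the Annotator and return the decoded JSON response."""
    url = build_annotator_url(text, api_key, ontologies)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def matched_classes(payload):
    """Return (class IRI, matched text) pairs; the response shape is an assumption."""
    pairs = []
    for item in payload:
        class_id = item.get("annotatedClass", {}).get("@id")
        for annotation in item.get("annotations", []):
            pairs.append((class_id, annotation.get("text")))
    return pairs
```

Keeping URL construction and response parsing in separate functions makes both testable without network access.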


Step 6: Gather the UMLS ID using the DOID mappings

Without these UMLS IDs we cannot run the Drug Repurposing Pipeline (DRP).
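Because rows without a UMLS ID cannot be used downstream, the mapping step might separate mapped from unmapped annotations, as in this sketch. The `doid` and `umls_id` keys and the function name `add_umls_ids` are hypothetical, and the mapping dict stands in for whatever DOID mapping file the pipeline actually reads.

```python
def add_umls_ids(annotations, doid_to_umls):
    """Split annotation rows into those with a UMLS ID and those without one."""
    mapped, unmapped = [], []
    for row in annotations:
        umls_id = doid_to_umls.get(row["doid"])
        if umls_id:
            # Copy the row so the input list is left unchanged.
            mapped.append({**row, "umls_id": umls_id})
        else:
            unmapped.append(row)
    return mapped, unmapped
```

Returning the unmapped rows as well makes it easy to check how many annotations are lost at this step.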


Step 7: Use the REMZI Pipeline to obtain predictions on the most informative instances

python drugindication_ml/src/ -g drugindication_ml/data/input/unified-gold-standard-umls.txt -t data/output/final_unlabeled.csv -dr drugindication_ml/data/features/drugs-fingerprint.txt drugindication_ml/data/features/drugs-targets.txt -di drugindication_ml/data/features/diseases-ndfrt-meddra.txt -m rf -p 2 -s 100 -o data/output/predictions_for_unlabeled.csv

#NOTE: Why did we remove one of the feature matrices again? Also, we are not using the selected features matrix.

Step 8: Build the dataset using positive and negative examples

Use the match relations file to exclude instances that are already in the gold standard dataset.
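Excluding gold standard instances amounts to a set difference on (drug, disease) pairs. The sketch below shows one way to do it, assuming each row is a dict with `drug` and `disease` keys; those keys and the function names are hypothetical and do not describe the match relations file itself.

```python
def load_pairs(rows, drug_key="drug", disease_key="disease"):
    """Collect the (drug, disease) pairs present in a list of row dicts."""
    return {(row[drug_key], row[disease_key]) for row in rows}


def exclude_known(candidates, gold_pairs, drug_key="drug", disease_key="disease"):
    """Keep only candidate rows whose (drug, disease) pair is not in the gold standard."""
    return [
        row for row in candidates
        if (row[drug_key], row[disease_key]) not in gold_pairs
    ]
```

Building the gold standard pairs as a set once keeps each membership check constant time, which matters when the candidate list is large.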
