# Predictions evaluation

On reproduction/2-predictions:

    * Model (logistic regression): P: 0.966, R: 0.966, F: 0.964 (10-fold cross-validation)
    * Train set: shape = 1:10 size = 8306 dim = (8306, 201)
    * Predictions: size = 210706  

In [218]:
import pandas as pd

### Gold standard

    * source: https://raw.githubusercontent.com/dhimmel/indications/11d535ba0884ee56c3cd5756fdfb4985f313bd80/catalog/indications.tsv
    

In [219]:
# read indications catalog used in the graph
df = pd.read_table('./gold-standard/indications.tsv')
gold_df = df[['doid_id','drugbank_id','disease', 'drug', 'category']].copy()
print(gold_df.size, gold_df.shape)
gold_df.head()

6940 (1388, 5)


Unnamed: 0,doid_id,drugbank_id,disease,drug,category
0,DOID:10652,DB00843,Alzheimer's disease,Donepezil,DM
1,DOID:10652,DB00674,Alzheimer's disease,Galantamine,DM
2,DOID:10652,DB01043,Alzheimer's disease,Memantine,DM
3,DOID:10652,DB00989,Alzheimer's disease,Rivastigmine,DM
4,DOID:10652,DB00245,Alzheimer's disease,Benzatropine,SYM


In [220]:
gold_df.groupby('category').count()

Unnamed: 0_level_0,doid_id,drugbank_id,disease,drug
category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
DM,755,755,755,755
NOT,243,243,243,243
SYM,390,390,390,390


In [221]:
# set true indications dataframe
indications_df = gold_df.query('category == "DM"').reset_index().copy()
print(indications_df.shape)
print('diseases: {}, drugs: {}'.format(indications_df.disease.nunique(), indications_df.drug.nunique()))
indications_df.head()

(755, 6)
diseases: 77, drugs: 387


Unnamed: 0,index,doid_id,drugbank_id,disease,drug,category
0,0,DOID:10652,DB00843,Alzheimer's disease,Donepezil,DM
1,1,DOID:10652,DB00674,Alzheimer's disease,Galantamine,DM
2,2,DOID:10652,DB01043,Alzheimer's disease,Memantine,DM
3,3,DOID:10652,DB00989,Alzheimer's disease,Rivastigmine,DM
4,16,DOID:9206,DB00736,Barrett's esophagus,Esomeprazole,DM


In [222]:
# unique diseases
print(indications_df.disease.unique())

["Alzheimer's disease" "Barrett's esophagus" "Crohn's disease"
 "Graves' disease" 'Kawasaki disease' "Paget's disease of bone"
 'acquired immunodeficiency syndrome' 'alcohol dependence'
 'allergic rhinitis' 'alopecia areata' 'amyotrophic lateral sclerosis'
 'ankylosing spondylitis' 'asthma' 'atherosclerosis' 'atopic dermatitis'
 'azoospermia' 'bone cancer' 'brain cancer' 'breast cancer'
 'cervical cancer' 'chronic obstructive pulmonary disease' 'colon cancer'
 'coronary artery disease' 'dilated cardiomyopathy' 'epilepsy syndrome'
 'esophageal cancer' 'focal segmental glomerulosclerosis'
 'germ cell cancer' 'gestational diabetes' 'glaucoma' 'gout'
 'head and neck cancer' 'hematologic cancer' 'hepatitis B' 'hypertension'
 'hypothyroidism' 'kidney cancer' 'leprosy' 'liver cancer' 'lung cancer'
 'lymphatic system cancer' 'malaria' 'malignant glioma' 'melanoma'
 'metabolic syndrome X' 'migraine' 'multiple sclerosis' 'muscle cancer'
 'nephrolithiasis' 'nicotine dependence' 'obesity' 'ocular 

In [223]:
# unique drugs
print(indications_df.drug.unique())

['Donepezil' 'Galantamine' 'Memantine' 'Rivastigmine' 'Esomeprazole'
 'Omeprazole' 'Azathioprine' 'Balsalazide' 'Mercaptopurine' 'Mesalazine'
 'Prednisone' 'Sulfasalazine' 'Methimazole' 'Propylthiouracil'
 'Acetylsalicylic acid' 'Alendronate' 'Etidronic acid' 'Pamidronate'
 'Risedronate' 'Tiludronate' 'Zoledronate' 'Abacavir' 'Amprenavir'
 'Delavirdine' 'Didanosine' 'Efavirenz' 'Indinavir' 'Lamivudine'
 'Lopinavir' 'Nelfinavir' 'Nevirapine' 'Ritonavir' 'Saquinavir' 'Stavudine'
 'Zidovudine' 'Acamprosate' 'Citalopram' 'Disulfiram' 'Naltrexone'
 'Betamethasone' 'Cetirizine' 'Cyproheptadine' 'Desloratadine'
 'Dexamethasone' 'Dimenhydrinate' 'Diphenhydramine' 'Flunisolide'
 'Hydrocortisone' 'Loratadine' 'Methylprednisolone' 'Montelukast'
 'Olopatadine' 'Prednisolone' 'Triamcinolone' 'Riluzole' 'Methotrexate'
 'Aminophylline' 'Arformoterol' 'Beclomethasone' 'Budesonide' 'Ciclesonide'
 'Cromoglicic acid' 'Dyphylline' 'Fluticasone Propionate'
 'Fluticasone furoate' 'Formoterol' 'Indacaterol' 

### Select instances

The selected indications should be diverse: the diseases should belong to different disease classes and have different amounts of drug information available for the training step.
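
One way to make such a selection reproducible is to bin diseases by their number of disease-modifying drugs and take one disease per bin. The following is only a minimal sketch under stated assumptions: the toy `counts` Series stands in for the real per-disease counts, and the bin edges, stratum labels, and the helper name `pick_per_stratum` are hypothetical. It is not the procedure used in this notebook, where the five diseases were chosen by hand.

```python
import pandas as pd

def pick_per_stratum(counts, edges, labels):
    """Return, for each drug-count stratum, the disease with the most drugs."""
    strata = pd.cut(counts, bins=edges, labels=labels)
    table = pd.DataFrame({'n_drugs': counts, 'stratum': strata})
    # sort so that the first row of each stratum is its best-covered disease
    table = table.sort_values('n_drugs', ascending=False, kind='stable')
    return table.groupby('stratum', observed=True).head(1)

# toy counts (hypothetical stand-in for the groupby output)
counts = pd.Series(
    {'hypertension': 68, 'epilepsy syndrome': 25, 'malaria': 11,
     'obesity': 11, 'thyroid cancer': 4},
    name='n_drugs',
)
picked = pick_per_stratum(counts, edges=[0, 5, 20, 100],
                          labels=['few', 'some', 'many'])
print(picked)
```

Disease class (for example the `pathophysiology` column of the disease table) could be added as a second grouping key in the same way.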


In [224]:
# group by disease: drug counts range from 68 down to 1
indications_df.groupby('disease')['drug'].count().sort_values(ascending=False)

disease
hypertension                          68
hematologic cancer                    51
asthma                                37
breast cancer                         29
coronary artery disease               28
epilepsy syndrome                     25
type 2 diabetes mellitus              22
psoriasis                             21
glaucoma                              21
prostate cancer                       21
ulcerative colitis                    16
lung cancer                           16
atopic dermatitis                     16
allergic rhinitis                     15
osteoporosis                          15
rheumatoid arthritis                  15
acquired immunodeficiency syndrome    14
kidney cancer                         13
systemic lupus erythematosus          13
peripheral nervous system neoplasm    12
multiple sclerosis                    11
obesity                               11
urinary bladder cancer                11
testicular cancer                     11
malaria 

In [225]:
# 1. hypertension DOID:10763 (68 drugs)
indications_df.query('doid_id == "DOID:10763"')

Unnamed: 0,index,doid_id,drugbank_id,disease,drug,category
338,628,DOID:10763,DB01193,hypertension,Acebutolol,DM
339,629,DOID:10763,DB00594,hypertension,Amiloride,DM
340,630,DOID:10763,DB00381,hypertension,Amlodipine,DM
341,631,DOID:10763,DB01076,hypertension,Atorvastatin,DM
342,632,DOID:10763,DB00542,hypertension,Benazepril,DM
343,633,DOID:10763,DB00436,hypertension,Bendroflumethiazide,DM
344,634,DOID:10763,DB01244,hypertension,Bepridil,DM
345,635,DOID:10763,DB00195,hypertension,Betaxolol,DM
346,636,DOID:10763,DB00887,hypertension,Bumetanide,DM
347,637,DOID:10763,DB00796,hypertension,Candesartan,DM


In [226]:
# 2. epilepsy DOID:1826 (25 drugs)
indications_df.query('doid_id == "DOID:1826"')

Unnamed: 0,index,doid_id,drugbank_id,disease,drug,category
211,472,DOID:1826,DB00819,epilepsy syndrome,Acetazolamide,DM
212,473,DOID:1826,DB01351,epilepsy syndrome,Amobarbital,DM
213,474,DOID:1826,DB00564,epilepsy syndrome,Carbamazepine,DM
214,475,DOID:1826,DB00349,epilepsy syndrome,Clobazam,DM
215,476,DOID:1826,DB01068,epilepsy syndrome,Clonazepam,DM
216,477,DOID:1826,DB00829,epilepsy syndrome,Diazepam,DM
217,478,DOID:1826,DB00949,epilepsy syndrome,Felbamate,DM
218,479,DOID:1826,DB01320,epilepsy syndrome,Fosphenytoin,DM
219,480,DOID:1826,DB00996,epilepsy syndrome,Gabapentin,DM
220,481,DOID:1826,DB06218,epilepsy syndrome,Lacosamide,DM


In [227]:
# 3. malaria DOID:12365 (11 drugs)
indications_df.query('doid_id == "DOID:12365"')

Unnamed: 0,index,doid_id,drugbank_id,disease,drug,category
454,808,DOID:12365,DB06697,malaria,Artemether,DM
455,809,DOID:12365,DB01190,malaria,Clindamycin,DM
456,810,DOID:12365,DB00250,malaria,Dapsone,DM
457,811,DOID:12365,DB00254,malaria,Doxycycline,DM
458,812,DOID:12365,DB00806,malaria,Pentoxifylline,DM
459,813,DOID:12365,DB01131,malaria,Proguanil,DM
460,814,DOID:12365,DB00205,malaria,Pyrimethamine,DM
461,815,DOID:12365,DB00908,malaria,Quinidine,DM
462,816,DOID:12365,DB01346,malaria,Quinidine barbiturate,DM
463,817,DOID:12365,DB00468,malaria,Quinine,DM


In [228]:
# 4. thyroid cancer DOID:1781 (4 drugs)
indications_df.query('doid_id == "DOID:1781"')

Unnamed: 0,index,doid_id,drugbank_id,disease,drug,category
693,1311,DOID:1781,DB00997,thyroid cancer,Doxorubicin,DM
694,1312,DOID:1781,DB00445,thyroid cancer,Epirubicin,DM
695,1313,DOID:1781,DB00398,thyroid cancer,Sorafenib,DM
696,1314,DOID:1781,DB05294,thyroid cancer,Vandetanib,DM


In [229]:
# 5. obesity DOID:9970 (11 drugs)
indications_df.query('doid_id == "DOID:9970"')

Unnamed: 0,index,doid_id,drugbank_id,disease,drug,category
501,918,DOID:9970,DB00865,obesity,Benzphetamine,DM
502,919,DOID:9970,DB01156,obesity,Bupropion,DM
503,920,DOID:9970,DB00501,obesity,Cimetidine,DM
504,921,DOID:9970,DB00937,obesity,Diethylpropion,DM
505,922,DOID:9970,DB01577,obesity,Methamphetamine,DM
506,923,DOID:9970,DB01083,obesity,Orlistat,DM
507,924,DOID:9970,DB01579,obesity,Phendimetrazine,DM
508,925,DOID:9970,DB00191,obesity,Phentermine,DM
509,926,DOID:9970,DB00397,obesity,Phenylpropanolamine,DM
510,927,DOID:9970,DB01105,obesity,Sibutramine,DM


### Predictions

In [230]:
# read predictions
predictions_df = pd.read_csv('./reproduction/2-predictions/predictions_mapped.csv', sep=',', header=None)
predictions_df = predictions_df.rename(
    columns={ 
        0: 'drug', 
        1: 'disease', 
        2: 'actual', 
        3: 'predicted', 
        4: 'error', 
        5: 'prediction'
    }
)
predictions_df = predictions_df[['drug', 'disease', 'predicted', 'prediction']]
predictions_df['predicted'] = predictions_df.predicted.apply(lambda x: 'true' if x.split(':')[1] == 't' else 'false')

# Include entity names alongside IDs
# Mapping disease names
url = 'https://raw.githubusercontent.com/dhimmel/disease-ontology/75050ea2d4f60e745d3f3578ae03560a2cc0e444/data/slim-terms.tsv'
disease_df = pd.read_table(url)
disease_df = disease_df[['doid','name','pathophysiology']] 
# match the DOID_<n> style used in the predictions file
disease_df['doid'] = disease_df.doid.apply(lambda y: y.replace(':', '_'))
disease_df = disease_df.rename(columns={'doid': 'disease', 'name': 'disease_name', 'pathophysiology': 'disease_pathophysiology'})

# Mapping drug names
url = 'https://raw.githubusercontent.com/dhimmel/drugbank/3e87872db5fca5ac427ce27464ab945c0ceb4ec6/data/drugbank-slim.tsv'
compound_df = pd.read_table(url)
compound_df = compound_df[['drugbank_id','name','categories']]
compound_df = compound_df.rename(columns={'drugbank_id': 'drug', 'name': 'drug_name', 'categories': 'drug_categories'})

# mapping names to predictions dataframe
predictions_df = pd.merge(predictions_df,compound_df, how='left', on='drug')
predictions_df = pd.merge(predictions_df,disease_df,how='left',on='disease')

# explore true
print('TRUE INDICATIONS')
print(indications_df.shape)
print('diseases: {}, drugs: {}'.format(indications_df.disease.nunique(), indications_df.drug.nunique()))
print()

# explore predictions
print('PREDICTIONS')
print(predictions_df.shape)
print('diseases: {}, drugs: {}'.format(predictions_df.disease.nunique(), predictions_df.drug.nunique()))
predictions_df.head()

TRUE INDICATIONS
(755, 6)
diseases: 77, drugs: 387

PREDICTIONS
(210706, 8)
diseases: 137, drugs: 1538


Unnamed: 0,drug,disease,predicted,prediction,drug_name,drug_categories,disease_name,disease_pathophysiology
0,DB00843,DOID_10652,False,0.958,Donepezil,,Alzheimer's disease,degenerative
1,DB00843,DOID_9206,False,0.98,Donepezil,,Barrett's esophagus,neoplastic
2,DB00843,DOID_8778,False,0.932,Donepezil,,Crohn's disease,immunologic
3,DB00843,DOID_12361,False,0.98,Donepezil,,Graves' disease,immunologic
4,DB00843,DOID_13378,False,0.991,Donepezil,,Kawasaki disease,immunologic
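
The overlap cells in this notebook compare drug names taken from two different sources (`drug` in the indications catalog and `drug_name` from the DrugBank table). Names may be spelled differently across sources, so matching on identifiers could be more robust. A minimal sketch on toy frames that only mimic the column layout above; `normalize_doid` is a hypothetical helper, not part of the original analysis.

```python
import pandas as pd

def normalize_doid(doid):
    """Convert 'DOID:10763' (catalog style) to 'DOID_10763' (predictions style)."""
    return doid.replace(':', '_')

# toy stand-ins for indications_df and predictions_df
indications = pd.DataFrame({
    'doid_id': ['DOID:10763', 'DOID:10763', 'DOID:1826'],
    'drugbank_id': ['DB01193', 'DB00594', 'DB00819'],
})
predictions = pd.DataFrame({
    'drug': ['DB01193', 'DB00594', 'DB00819', 'DB01274'],
    'disease': ['DOID_10763', 'DOID_10763', 'DOID_1826', 'DOID_10763'],
    'predicted': ['true', 'false', 'true', 'true'],
})

# bring the gold standard to the same key columns as the predictions
gold = indications.assign(disease=indications.doid_id.map(normalize_doid))
gold = gold.rename(columns={'drugbank_id': 'drug'})[['drug', 'disease']]

# keep every gold-standard pair and attach the model's label, if any
merged = gold.merge(predictions, how='left', on=['drug', 'disease'])
recovered = merged.query('predicted == "true"')
print(len(recovered), 'of', len(gold), 'indications predicted as true')
```

Gold-standard pairs that are absent from the predictions file end up with a missing `predicted` value after the left merge, which makes them easy to list separately.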


In [246]:
# 1. hypertension DOID:10763 (68 drugs)
print('Indications: %s' % len(indications_df.query('doid_id == "DOID:10763"')))

# predictions
print('True predictions: {} ({}%)'.format(len(predictions_df.query('disease == "DOID_10763" & predicted == "true"')),round(len(predictions_df.query('disease == "DOID_10763" & predicted == "true"'))*100/len(predictions_df.query('disease == "DOID_10763"')))))

# overlap
indi = set(indications_df.query('doid_id == "DOID:10763"')[['drug']].drug)
pred = set(predictions_df.query('disease == "DOID_10763" & predicted == "true"')[['drug_name']].drug_name)
overlap = pred & indi
print('Indications predicted: {}\n'.format(len(overlap)))
#print(overlap)
#print('\nIndications not predicted: {}'.format(indi - pred))
print()
true = predictions_df.query('disease == "DOID_10763" & predicted == "true"')
print('Percentage of true predictions (P = 1): {}%'.format(round(
            100*len(true.query('prediction == 1'))/len(true)))
     )
print('Number of true predictions (P = 1): {}'.format(
            len(true.query('prediction == 1')))
     )

# overlap restricted to true predictions with P = 1
indi = set(indications_df.query('doid_id == "DOID:10763"')[['drug']].drug)
pred = set(predictions_df.query('disease == "DOID_10763" & predicted == "true" & prediction == 1.')[['drug_name']].drug_name)
overlap = pred & indi
print('Indications predicted: {}\n'.format(len(overlap)))
#print(overlap)
#print('\nIndications not predicted: {}'.format(indi - pred))
print()
predictions_df.query('disease == "DOID_10763" & predicted == "true" & prediction == 1.').sort_values(by= 'prediction', ascending= False)

Indications: 68
True predictions: 1154 (75%)
Indications predicted: 58


Percentage of true predictions (P = 1): 63%
Number of true predictions (P = 1): 731
Indications predicted: 31




Unnamed: 0,drug,disease,predicted,prediction,drug_name,drug_categories,disease_name,disease_pathophysiology
7843,DB01274,DOID_10763,true,1.0,Arformoterol,Adrenergic beta-2 Receptor Agonists|Bronchodil...,hypertension,idiopathic
143610,DB01141,DOID_10763,true,1.0,Micafungin,Antifungal Agents,hypertension,idiopathic
144021,DB01147,DOID_10763,true,1.0,Cloxacillin,Penicillins,hypertension,idiopathic
144158,DB01148,DOID_10763,true,1.0,Flavoxate,Parasympatholytics,hypertension,idiopathic
144295,DB01153,DOID_10763,true,1.0,Sertaconazole,Antifungal Agents,hypertension,idiopathic
144432,DB01154,DOID_10763,true,1.0,Thiamylal,,hypertension,idiopathic
144569,DB01157,DOID_10763,true,1.0,Trimetrexate,Antifungal Agents|Antiprotozoal Agents|Folic A...,hypertension,idiopathic
144706,DB01158,DOID_10763,true,1.0,Bretylium,Antihypertensive Agents|Anti-Arrhythmia Agents...,hypertension,idiopathic
144843,DB01159,DOID_10763,true,1.0,Halothane,"Anesthetics, Inhalation|Anesthetics",hypertension,idiopathic
144980,DB01160,DOID_10763,true,1.0,Dinoprost Tromethamine,"Oxytocics|Abortifacient Agents, Nonsteroidal|A...",hypertension,idiopathic


In [249]:
# 2. epilepsy DOID:1826 (25 drugs)
print('Indications: %s' % len(indications_df.query('doid_id == "DOID:1826"')))

# predictions
print('True predictions: {} ({}%)'.format(len(predictions_df.query('disease == "DOID_1826" & predicted == "true"')),round(len(predictions_df.query('disease == "DOID_1826" & predicted == "true"'))*100/len(predictions_df.query('disease == "DOID_1826"')))))

# overlap
indi = set(indications_df.query('doid_id == "DOID:1826"')[['drug']].drug)
pred = set(predictions_df.query('disease == "DOID_1826" & predicted == "true"')[['drug_name']].drug_name)
overlap = pred & indi
print('Indications predicted: {}\n'.format(len(overlap)))
#print(overlap)
#print('\nIndications not predicted: {}'.format(indi - pred))
print()
true = predictions_df.query('disease == "DOID_1826" & predicted == "true"')
print('Percentage of true predictions (P = 1): {}%'.format(round(
            100*len(true.query('prediction == 1'))/len(true)))
     )
print('Number of true predictions (P = 1): {}'.format(
            len(true.query('prediction == 1')))
     )

# overlap restricted to true predictions with P = 1
indi = set(indications_df.query('doid_id == "DOID:1826"')[['drug']].drug)
pred = set(predictions_df.query('disease == "DOID_1826" & predicted == "true" & prediction == 1')[['drug_name']].drug_name)
overlap = pred & indi
print('Indications predicted: {}\n'.format(len(overlap)))
#print(overlap)
#print('\nIndications not predicted: {}'.format(indi - pred))
print()
predictions_df.query('disease == "DOID_1826" & predicted == "true" & prediction == 1.').sort_values(by= 'prediction', ascending= False)

Indications: 25
True predictions: 1092 (71%)
Indications predicted: 19


Percentage of true predictions (P = 1): 62%
Number of true predictions (P = 1): 679
Indications predicted: 13




Unnamed: 0,drug,disease,predicted,prediction,drug_name,drug_categories,disease_name,disease_pathophysiology
7833,DB01274,DOID_1826,true,1.0,Arformoterol,Adrenergic beta-2 Receptor Agonists|Bronchodil...,epilepsy syndrome,unspecific
146340,DB01183,DOID_1826,true,1.0,Naloxone,Narcotic Antagonists|Central Nervous System De...,epilepsy syndrome,unspecific
144559,DB01157,DOID_1826,true,1.0,Trimetrexate,Antifungal Agents|Antiprotozoal Agents|Folic A...,epilepsy syndrome,unspecific
144696,DB01158,DOID_1826,true,1.0,Bretylium,Antihypertensive Agents|Anti-Arrhythmia Agents...,epilepsy syndrome,unspecific
144833,DB01159,DOID_1826,true,1.0,Halothane,"Anesthetics, Inhalation|Anesthetics",epilepsy syndrome,unspecific
144970,DB01160,DOID_1826,true,1.0,Dinoprost Tromethamine,"Oxytocics|Abortifacient Agents, Nonsteroidal|A...",epilepsy syndrome,unspecific
145107,DB01161,DOID_1826,true,1.0,Chloroprocaine,"Anesthetics, Local",epilepsy syndrome,unspecific
145381,DB01169,DOID_1826,true,1.0,Arsenic trioxide,Antineoplastic Agents|Homeopathic Agents,epilepsy syndrome,unspecific
146066,DB01180,DOID_1826,true,1.0,Rescinnamine,,epilepsy syndrome,unspecific
146203,DB01182,DOID_1826,true,1.0,Propafenone,Anti-Arrhythmia Agents|Voltage-Gated Sodium Ch...,epilepsy syndrome,unspecific


In [250]:
# 3. malaria DOID:12365 (11 drugs)
print('Indications: %s' % len(indications_df.query('doid_id == "DOID:12365"')))

# predictions
print('True predictions: {} ({}%)'.format(len(predictions_df.query('disease == "DOID_12365" & predicted == "true"')),round(len(predictions_df.query('disease == "DOID_12365" & predicted == "true"'))*100/len(predictions_df.query('disease == "DOID_12365"')))))

# overlap
indi = set(indications_df.query('doid_id == "DOID:12365"')[['drug']].drug)
pred = set(predictions_df.query('disease == "DOID_12365" & predicted == "true"')[['drug_name']].drug_name)
overlap = pred & indi
print('Indications predicted: {}\n'.format(len(overlap)))
#print(overlap)
#print('\nIndications not predicted: {}'.format(indi - pred))
print()
true = predictions_df.query('disease == "DOID_12365" & predicted == "true"')
print('Percentage of true predictions (P = 1): {}%'.format(round(
            100*len(true.query('prediction == 1'))/len(true)))
     )
print('Number of true predictions (P = 1): {}'.format(
            len(true.query('prediction == 1')))
     )

# overlap restricted to true predictions with P = 1
indi = set(indications_df.query('doid_id == "DOID:12365"')[['drug']].drug)
pred = set(predictions_df.query('disease == "DOID_12365" & predicted == "true" & prediction == 1.')[['drug_name']].drug_name)
overlap = pred & indi
print('Indications predicted: {}\n'.format(len(overlap)))
#print(overlap)
#print('\nIndications not predicted: {}'.format(indi - pred))
print()
predictions_df.query('disease == "DOID_12365" & predicted == "true" & prediction == 1.').sort_values(by= 'prediction', ascending= False)

Indications: 11
True predictions: 987 (64%)
Indications predicted: 11


Percentage of true predictions (P = 1): 58%
Number of true predictions (P = 1): 571
Indications predicted: 6




Unnamed: 0,drug,disease,predicted,prediction,drug_name,drug_categories,disease_name,disease_pathophysiology
8398,DB01003,DOID_12365,true,1.0,Cromoglicic acid,Anti-Asthmatic Agents,malaria,infectious
146083,DB01180,DOID_12365,true,1.0,Rescinnamine,,malaria,infectious
144439,DB01154,DOID_12365,true,1.0,Thiamylal,,malaria,infectious
144713,DB01158,DOID_12365,true,1.0,Bretylium,Antihypertensive Agents|Anti-Arrhythmia Agents...,malaria,infectious
144850,DB01159,DOID_12365,true,1.0,Halothane,"Anesthetics, Inhalation|Anesthetics",malaria,infectious
144987,DB01160,DOID_12365,true,1.0,Dinoprost Tromethamine,"Oxytocics|Abortifacient Agents, Nonsteroidal|A...",malaria,infectious
145124,DB01161,DOID_12365,true,1.0,Chloroprocaine,"Anesthetics, Local",malaria,infectious
145398,DB01169,DOID_12365,true,1.0,Arsenic trioxide,Antineoplastic Agents|Homeopathic Agents,malaria,infectious
146220,DB01182,DOID_12365,true,1.0,Propafenone,Anti-Arrhythmia Agents|Voltage-Gated Sodium Ch...,malaria,infectious
144028,DB01147,DOID_12365,true,1.0,Cloxacillin,Penicillins,malaria,infectious


In [251]:
# 4. thyroid cancer DOID:1781 (4 drugs)
print('Indications: %s' % len(indications_df.query('doid_id == "DOID:1781"')))

# predictions
print('True predictions: {} ({}%)'.format(len(predictions_df.query('disease == "DOID_1781" & predicted == "true"')),round(len(predictions_df.query('disease == "DOID_1781" & predicted == "true"'))*100/len(predictions_df.query('disease == "DOID_1781"')))))

# overlap
indi = set(indications_df.query('doid_id == "DOID:1781"')[['drug']].drug)
pred = set(predictions_df.query('disease == "DOID_1781" & predicted == "true"')[['drug_name']].drug_name)
overlap = pred & indi
print('Indications predicted: {}\n'.format(len(overlap)))
#print(overlap)
#print('\nIndications not predicted: {}'.format(indi - pred))
print()
true = predictions_df.query('disease == "DOID_1781" & predicted == "true"')
print('Percentage of true predictions (P = 1): {}%'.format(round(
            100*len(true.query('prediction == 1'))/len(true)))
     )
print('Number of true predictions (P = 1): {}'.format(
            len(true.query('prediction == 1')))
     )

# overlap restricted to true predictions with P = 1
indi = set(indications_df.query('doid_id == "DOID:1781"')[['drug']].drug)
pred = set(predictions_df.query('disease == "DOID_1781" & predicted == "true" & prediction == 1.')[['drug_name']].drug_name)
overlap = pred & indi
print('Indications predicted: {}\n'.format(len(overlap)))
#print(overlap)
#print('\nIndications not predicted: {}'.format(indi - pred))
print()
predictions_df.query('disease == "DOID_1781" & predicted == "true" & prediction == 1.').sort_values(by= 'prediction', ascending= False)

Indications: 4
True predictions: 964 (63%)
Indications predicted: 4


Percentage of true predictions (P = 1): 58%
Number of true predictions (P = 1): 555
Indications predicted: 1




Unnamed: 0,drug,disease,predicted,prediction,drug_name,drug_categories,disease_name,disease_pathophysiology
8427,DB01003,DOID_1781,true,1.0,Cromoglicic acid,Anti-Asthmatic Agents,thyroid cancer,neoplastic
146112,DB01180,DOID_1781,true,1.0,Rescinnamine,,thyroid cancer,neoplastic
144468,DB01154,DOID_1781,true,1.0,Thiamylal,,thyroid cancer,neoplastic
144742,DB01158,DOID_1781,true,1.0,Bretylium,Antihypertensive Agents|Anti-Arrhythmia Agents...,thyroid cancer,neoplastic
144879,DB01159,DOID_1781,true,1.0,Halothane,"Anesthetics, Inhalation|Anesthetics",thyroid cancer,neoplastic
145016,DB01160,DOID_1781,true,1.0,Dinoprost Tromethamine,"Oxytocics|Abortifacient Agents, Nonsteroidal|A...",thyroid cancer,neoplastic
145153,DB01161,DOID_1781,true,1.0,Chloroprocaine,"Anesthetics, Local",thyroid cancer,neoplastic
145427,DB01169,DOID_1781,true,1.0,Arsenic trioxide,Antineoplastic Agents|Homeopathic Agents,thyroid cancer,neoplastic
146249,DB01182,DOID_1781,true,1.0,Propafenone,Anti-Arrhythmia Agents|Voltage-Gated Sodium Ch...,thyroid cancer,neoplastic
149537,DB01227,DOID_1781,true,1.0,Levomethadyl Acetate,"Analgesics, Opioid|Narcotics",thyroid cancer,neoplastic


In [252]:
# 5. obesity DOID:9970 (11 drugs)
print('Indications: %s' % len(indications_df.query('doid_id == "DOID:9970"')))

# predictions
print('True predictions: {} ({}%)'.format(len(predictions_df.query('disease == "DOID_9970" & predicted == "true"')),round(len(predictions_df.query('disease == "DOID_9970" & predicted == "true"'))*100/len(predictions_df.query('disease == "DOID_9970"')))))

# overlap
indi = set(indications_df.query('doid_id == "DOID:9970"')[['drug']].drug)
pred = set(predictions_df.query('disease == "DOID_9970" & predicted == "true"')[['drug_name']].drug_name)
overlap = pred & indi
print('Indications predicted: {}\n'.format(len(overlap)))
#print(overlap)
#print('\nIndications not predicted: {}'.format(indi - pred))
print()
true = predictions_df.query('disease == "DOID_9970" & predicted == "true"')
print('Percentage of true predictions (P = 1): {}%'.format(round(
            100*len(true.query('prediction == 1'))/len(true)))
     )
print('Number of true predictions (P = 1): {}'.format(
            len(true.query('prediction == 1')))
     )

# overlap restricted to true predictions with P = 1
indi = set(indications_df.query('doid_id == "DOID:9970"')[['drug']].drug)
pred = set(predictions_df.query('disease == "DOID_9970" & predicted == "true" & prediction == 1.')[['drug_name']].drug_name)
overlap = pred & indi
print('Indications predicted: {}\n'.format(len(overlap)))
#print(overlap)
#print('\nIndications not predicted: {}'.format(indi - pred))
print()
predictions_df.query('disease == "DOID_9970" & predicted == "true" & prediction == 1.').sort_values(by= 'prediction', ascending= False)

Indications: 11
True predictions: 1062 (69%)
Indications predicted: 8


Percentage of true predictions (P = 1): 60%
Number of true predictions (P = 1): 642
Indications predicted: 4




Unnamed: 0,drug,disease,predicted,prediction,drug_name,drug_categories,disease_name,disease_pathophysiology
7859,DB01274,DOID_9970,true,1.0,Arformoterol,Adrenergic beta-2 Receptor Agonists|Bronchodil...,obesity,metabolic
162532,DB01581,DOID_9970,true,1.0,Sulfamerazine,Sulfonamides,obesity,metabolic
144448,DB01154,DOID_9970,true,1.0,Thiamylal,,obesity,metabolic
144722,DB01158,DOID_9970,true,1.0,Bretylium,Antihypertensive Agents|Anti-Arrhythmia Agents...,obesity,metabolic
144859,DB01159,DOID_9970,true,1.0,Halothane,"Anesthetics, Inhalation|Anesthetics",obesity,metabolic
144996,DB01160,DOID_9970,true,1.0,Dinoprost Tromethamine,"Oxytocics|Abortifacient Agents, Nonsteroidal|A...",obesity,metabolic
145133,DB01161,DOID_9970,true,1.0,Chloroprocaine,"Anesthetics, Local",obesity,metabolic
145407,DB01169,DOID_9970,true,1.0,Arsenic trioxide,Antineoplastic Agents|Homeopathic Agents,obesity,metabolic
146092,DB01180,DOID_9970,true,1.0,Rescinnamine,,obesity,metabolic
146229,DB01182,DOID_9970,true,1.0,Propafenone,Anti-Arrhythmia Agents|Voltage-Gated Sodium Ch...,obesity,metabolic


### Discussion

In all five cases, roughly 60% of the drugs predicted as true (58-63%) have a predicted probability of P = 1.0.
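
This share can be computed for every disease at once rather than one cell per disease. A minimal sketch on a toy frame that has the same `disease`, `predicted`, and `prediction` columns as `predictions_df`; the function name `share_certain` and the toy values are hypothetical.

```python
import pandas as pd

def share_certain(df):
    """Per disease: count of 'true' predictions and percentage with prediction == 1."""
    true = df[df.predicted == 'true'].assign(certain=lambda d: d.prediction == 1)
    summary = true.groupby('disease').certain.agg(n_true='size', n_certain='sum')
    summary['pct_certain'] = (100 * summary.n_certain / summary.n_true).round()
    return summary

# toy data (made-up values, same column layout as predictions_df)
toy = pd.DataFrame({
    'disease': ['DOID_10763'] * 4 + ['DOID_1826'] * 2,
    'predicted': ['true', 'true', 'true', 'false', 'true', 'true'],
    'prediction': [1.0, 1.0, 0.8, 0.9, 1.0, 0.7],
})
summary = share_certain(toy)
print(summary)
```

If the probabilities in the predictions file are rounded (the head of `predictions_df` shows three decimals), `prediction == 1` would also match values slightly below 1, which is worth keeping in mind when interpreting the share.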

### NGLY1 deficiency DOID:0060728

In [253]:
# predictions
print('True predictions: {} ({}%)'.format(len(predictions_df.query('disease == "DOID_0060728" & predicted == "true"')),round(len(predictions_df.query('disease == "DOID_0060728" & predicted == "true"'))*100/len(predictions_df.query('disease == "DOID_0060728"')))))
print()
true = predictions_df.query('disease == "DOID_0060728" & predicted == "true"')
print('Percentage of true predictions (P = 1): {}%'.format(round(
            100*len(true.query('prediction == 1'))/len(true)))
     )
print('Number of true predictions (P = 1): {}'.format(
            len(true.query('prediction == 1')))
     )
predictions_df.query('disease == "DOID_0060728" & predicted == "true" & prediction == 1.').sort_values(by= 'prediction', ascending= False)

True predictions: 158 (10%)

Percentage of true predictions (P = 1): 34%
Number of true predictions (P = 1): 53


Unnamed: 0,drug,disease,predicted,prediction,drug_name,drug_categories,disease_name,disease_pathophysiology
8630,DB00651,DOID_0060728,True,1.0,Dyphylline,Phosphodiesterase Inhibitors|Bronchodilator Ag...,,
139191,DB01088,DOID_0060728,True,1.0,Iloprost,,,
140561,DB01102,DOID_0060728,True,1.0,Arbutamine,,,
146452,DB01183,DOID_0060728,True,1.0,Naloxone,Narcotic Antagonists|Central Nervous System De...,,
147548,DB01201,DOID_0060728,True,1.0,Rifapentine,"Leprostatic Agents|Antibiotics, Antitubercular...",,
149055,DB01220,DOID_0060728,True,1.0,Rifaximin,Gastrointestinal Agents|Anti-Infective Agents,,
153850,DB01329,DOID_0060728,True,1.0,Cefoperazone,Anti-Bacterial Agents|Cephalosporins,,
154261,DB01333,DOID_0060728,True,1.0,Cefradine,Anti-Bacterial Agents|Cephalosporins,,
157275,DB01382,DOID_0060728,True,1.0,Glycodiazine,,,
170564,DB04920,DOID_0060728,True,1.0,Clevidipine,,,
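
The empty `disease_name` and `disease_pathophysiology` columns in this table are what a left merge produces when an ID has no match in the lookup table; presumably DOID_0060728 is not present in the slim-terms file. A minimal sketch, on toy frames, of listing such unmatched IDs with the `indicator` option of `merge`; the frame names and rows are illustrative only.

```python
import pandas as pd

# toy stand-in for predictions_df
predictions = pd.DataFrame({
    'drug': ['DB00651', 'DB01088', 'DB01193'],
    'disease': ['DOID_0060728', 'DOID_0060728', 'DOID_10763'],
})
# toy lookup table that lacks DOID_0060728
disease_names = pd.DataFrame({
    'disease': ['DOID_10763'],
    'disease_name': ['hypertension'],
})

# indicator=True adds a '_merge' column telling where each row found a match
merged = predictions.merge(disease_names, how='left', on='disease', indicator=True)
unmatched = sorted(merged.loc[merged['_merge'] == 'left_only', 'disease'].unique())
print(unmatched)
```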
