### Sort AMIE rules

In [1]:
def sort_and_store_amie_rules(in_path, out_path):
    with open(in_path) as file:
        amie_rules = file.readlines()
    # Copy the AMIE header lines (everything before the first rule)
    heading = []
    for line in amie_rules:
        if line.startswith("?"):
            break
        heading.append(line)
    # Filter and sort rules w.r.t. Partial Completeness Assumption (PCA) confidence,
    # compared as numbers rather than as strings
    amie_rules = [line for line in amie_rules if line.startswith("?")]
    amie_rules = sorted(amie_rules, key=lambda x: float(x.split("\t")[3]), reverse=True)
    print("\nFound {} rules".format(len(amie_rules)))
    # Store sorted rules
    with open(out_path, "w") as file:
        file.writelines(heading + amie_rules)

In [10]:
sort_and_store_amie_rules("./amie/out_amie_v3_sameAs_v1.txt", "./amie/amie_sorted_rules_v3.txt")


Found 4329 rules


In [3]:
sort_and_store_amie_rules("./amie/out_amie_v2.txt", "./amie/amie_sorted_rules_v2.txt")


Found 4361 rules


### Sort AnyBURL rules

java -Xmx3G -cp AnyBURL-RE.jar de.unima.ki.anyburl.LearnReinforced config-wn.properties

java -Xmx3G -cp AnyBURL-RE.jar de.unima.ki.anyburl.Apply config-apply.properties

java -Xmx3G -cp AnyBURL-RE.jar de.unima.ki.anyburl.Eval config-eval.properties

In [65]:
def sort_and_store_anyBURL_rules(in_path, out_path):
    with open(in_path) as file:
        anyBURL_rules = file.readlines()
    # Sort rules w.r.t. confidence (third tab-separated field), compared as numbers
    #anyBURL_rules = list(filter(lambda x: x.split("<=")[-1].strip() != "", anyBURL_rules))
    anyBURL_rules = sorted(anyBURL_rules, key=lambda x: float(x.split("\t")[2]), reverse=True)
    print("\nFound {} rules".format(len(anyBURL_rules)))
    # Store sorted rules
    with open(out_path, "w") as file:
        file.writelines(anyBURL_rules)
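
When rule lines are sorted by a numeric column, comparing the raw strings can misorder values written in scientific notation. The sketch below illustrates the difference with made-up confidence strings. Whether a given rule file actually contains such values is an assumption.

```python
# Hypothetical confidence strings; "1.0E-4" mimics scientific notation.
confidences = ["0.5", "1.0E-4", "0.25"]

# Lexicographic comparison puts "1.0E-4" first because "1" > "0".
as_strings = sorted(confidences, reverse=True)

# Numeric comparison orders the same strings by their value.
as_floats = sorted(confidences, key=float, reverse=True)

print(as_strings)  # ['1.0E-4', '0.5', '0.25']
print(as_floats)   # ['0.5', '0.25', '1.0E-4']
```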

In [None]:
#sort_and_store_anyBURL_rules("./anyBURL/rules/alpha-100", "./anyBURL/rules/sorted_rules_v1")

### Rule learning with enriched KGs

#### Link prediction with AnyBURL-RE

WN18RR `Hits@1: 0.4693 Hits@10: 0.5265 MRR: 0.5936`

FB15K `Hits@1: 0.2779  Hits@10: 0.4241 MRR: 0.5689`

Mutagenesis: `Hits@1: 0.4772  Hits@10: 0.5473 MRR: 0.6394`

Carcinogenesis: `Hits@1: 0.4834 Hits@10: 0.5330 MRR: 0.6300`

YAGO3: `Hits@1: 0.3608 Hits@10: 0.4415 MRR: 0.5249` 

#### After enriching KG

WN18RR `Hits@1: 0.4674 Hits@10: 0.5240 MRR: 0.5933`        

FB15K `Hits@1: 0.2769  Hits@10: 0.4210 MRR: 0.5649`

Mutagenesis: `Hits@1: 0.4747  Hits@10: 0.5463 MRR: 0.6374` 

Carcinogenesis: `Hits@1: 0.4447 Hits@10: 0.5056 MRR: 0.5949`

YAGO3: `Hits@1: 0.3718 Hits@10: 0.4572 MRR: 0.5307`


### Counting new rules and lost rules

In [1]:
def compare_rules_anyburl(before_enrich_rules, after_enrich_rules, min_score=0.0):
    """Compare two AnyBURL rule files, keeping only rules with confidence >= min_score."""
    def load_rules(path):
        # Lines are tab-separated: the confidence is the third field, the rule text the last
        with open(path) as file:
            lines = [x for x in file if float(x.split('\t')[2]) >= min_score]
        return {x.split('\t')[-1].strip('\n') for x in lines}

    rules_1 = load_rules(before_enrich_rules)
    rules_2 = load_rules(after_enrich_rules)

    new_rules = rules_2 - rules_1
    dropped_rules = rules_1 - rules_2
    same_rules = rules_1 & rules_2

    print(f'There were {len(rules_1)} rules before, there are {len(rules_2)} after enrichment')
    print()
    print(f'There are {len(same_rules)} same rules, {len(new_rules)} new rules, {len(dropped_rules)} dropped rules')

    return same_rules, rules_1, rules_2, new_rules, dropped_rules

In [85]:
same_rules_wn18rr, rules_before_wn18rr, rules_after_wn18rr, new_rules_anyburl_wn18rr, dropped_rules_anyburl_wn18rr = compare_rules_anyburl('AnyBURL/rules_wn18rr/alpha-500', 'AnyTrans/rules_wn18rr/alpha-500')

There were 35164 rules before, there are 40539 after enrichment

There are 30542 same rules, 9997 new rules, 4622 dropped rules


In [86]:
same_rules_mut, rules_before_mut, rules_after_mut, new_rules_anyburl_mut, dropped_rules_anyburl_mut = compare_rules_anyburl('AnyBURL/rules_mutagenesis/alpha-500', 'AnyTrans/rules_mutagenesis/alpha-500')

There were 269660 rules before, there are 125910 after enrichment

There are 42533 same rules, 83377 new rules, 227127 dropped rules


In [87]:
same_rules_car, rules_before_car, rules_after_car, new_rules_anyburl_car, dropped_rules_anyburl_car = compare_rules_anyburl('AnyBURL/rules_carcinogenesis/alpha-500', 'AnyTrans/rules_carcinogenesis/alpha-500')

There were 177924 rules before, there are 101030 after enrichment

There are 45682 same rules, 55348 new rules, 132242 dropped rules


In [88]:
same_rules_fb, rules_before_fb, rules_after_fb, new_rules_anyburl_fb, dropped_rules_anyburl_fb = compare_rules_anyburl('AnyBURL/rules_fb15k/alpha-500', 'AnyTrans/rules_fb15k/alpha-500')

There were 1414568 rules before, there are 2000603 after enrichment

There are 648984 same rules, 1351619 new rules, 765584 dropped rules


In [89]:
same_rules_yago, rules_before_yago, rules_after_yago, new_rules_anyburl_yago, dropped_rules_anyburl_yago = compare_rules_anyburl('AnyBURL/rules_yago3/alpha-500', 'AnyTrans/rules_yago3/alpha-500')

There were 852324 rules before, there are 732241 after enrichment

There are 241359 same rules, 490882 new rules, 610965 dropped rules


In [90]:
same_rules_drkg, rules_before_drkg, rules_after_drkg, new_rules_anyburl_drkg, dropped_rules_anyburl_drkg = compare_rules_anyburl('AnyBURL/rules_drkg/alpha-500', 'AnyTrans/rules_drkg/alpha-500')

There were 1573497 rules before, there are 1417026 after enrichment

There are 389627 same rules, 1027399 new rules, 1183870 dropped rules


In [91]:
same_rules_obl, rules_before_obl, rules_after_obl, new_rules_anyburl_obl, dropped_rules_anyburl_obl = compare_rules_anyburl('AnyBURL/rules_openbiolink/alpha-500', 'AnyTrans/rules_openbiolink/alpha-500')

There were 402256 rules before, there are 448267 after enrichment

There are 278572 same rules, 169695 new rules, 123684 dropped rules
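
The counts printed above can be turned into two ratios per dataset: the share of the original rules that survive enrichment and the share of the post-enrichment rules that are new. The following is a minimal sketch. The helper name `overlap_stats` is hypothetical, and the numbers are copied from the outputs above.

```python
# (rules before, rules after, same rules), copied from the outputs above
counts = {
    "WN18RR": (35164, 40539, 30542),
    "Mutagenesis": (269660, 125910, 42533),
    "Carcinogenesis": (177924, 101030, 45682),
    "FB15k": (1414568, 2000603, 648984),
    "YAGO3": (852324, 732241, 241359),
    "DRKG": (1573497, 1417026, 389627),
    "OpenBioLink": (402256, 448267, 278572),
}

def overlap_stats(before, after, same):
    """Return (fraction of old rules retained, fraction of current rules that are new)."""
    retained = same / before
    new_share = (after - same) / after
    return retained, new_share

for name, (before, after, same) in counts.items():
    retained, new_share = overlap_stats(before, after, same)
    print(f"{name}: {retained:.1%} retained, {new_share:.1%} of the rules after enrichment are new")
```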


### AMIE+

In [13]:
def compare_rules_amie(rules_before, rules_after):
    with open(rules_before) as file:
        lines_1 = file.read().strip().split('\n')
    with open(rules_after) as file:
        lines_2 = file.read().strip().split('\n')
    # Keep only rule lines; AMIE rules start with a variable such as "?a"
    rules_1 = [x for x in lines_1 if x.startswith("?")]
    rules_2 = [x for x in lines_2 if x.startswith("?")]
    # Unique rule expressions (first tab-separated field)
    Rules_1 = set(x.split('\t')[0] for x in rules_1)
    Rules_2 = set(x.split('\t')[0] for x in rules_2)

    new_exprs = Rules_2 - Rules_1
    dropped_exprs = Rules_1 - Rules_2
    same_exprs = Rules_1 & Rules_2

    # Full rule lines per group, sorted by PCA confidence (fourth field), highest first
    def by_pca(t):
        return float(t.split('\t')[3])

    new_rules = sorted((x for x in rules_2 if x.split('\t')[0] in new_exprs), key=by_pca, reverse=True)
    dropped_rules = sorted((x for x in rules_1 if x.split('\t')[0] in dropped_exprs), key=by_pca, reverse=True)
    same_rules = sorted((x for x in rules_1 if x.split('\t')[0] in same_exprs), key=by_pca, reverse=True)
    rules_1 = sorted(rules_1, key=by_pca, reverse=True)
    rules_2 = sorted(rules_2, key=by_pca, reverse=True)

    # The totals count unique rule expressions, whereas same/new/dropped count rule lines,
    # so the numbers only add up when no expression occurs on more than one line
    print(f'There were {len(Rules_1)} rules before, there are {len(Rules_2)} after enrichment')
    print()
    print(f'There are {len(same_rules)} same rules, {len(new_rules)} new rules, {len(dropped_rules)} dropped rules')

    #print('\nNew rules', new_rules)
    #print('\nDropped rules', dropped_rules)
    return same_rules, rules_1, rules_2, new_rules, dropped_rules
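
The column indices used in this notebook (0 for the rule, 2 and 3 for the two confidences) are easier to follow with named fields. The sketch below parses one rule line of the kind processed here. The field names are an assumption about AMIE's tab-separated output layout and do not come from this notebook, and `AmieRule` and `parse_amie_line` are hypothetical helpers. The two printed ratios are a quick check that the assumed layout agrees with the numbers on the line.

```python
from typing import NamedTuple

class AmieRule(NamedTuple):
    # Field names are assumed, not taken from the notebook
    rule: str
    head_coverage: float
    std_confidence: float
    pca_confidence: float
    positive_examples: int
    body_size: int
    pca_body_size: int
    functional_variable: str

def parse_amie_line(line: str) -> AmieRule:
    """Split one tab-separated AMIE rule line into typed fields."""
    f = line.rstrip("\n").split("\t")
    return AmieRule(f[0], float(f[1]), float(f[2]), float(f[3]),
                    int(f[4]), int(f[5]), int(f[6]), f[7])

line = ("?b  _verb_group  ?a   => ?a  _verb_group  ?b"
        "\t0.931458699\t0.931458699\t0.980573543\t1060\t1138\t1081\t?b")
parsed = parse_amie_line(line)

# Under the assumed layout, each confidence is positive examples / corresponding body size
print(parsed.std_confidence, parsed.positive_examples / parsed.body_size)
print(parsed.pca_confidence, parsed.positive_examples / parsed.pca_body_size)
```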

## WN18RR

In [14]:
same_rules_amie_wn18rr, rules_amie_before_wn18rr, rules_amie_after_wn18rr, \
new_rules_amie_wn18rr, dropped_rules_amie_wn18rr = compare_rules_amie('AMIE/rules_wn18rr/rules.txt', 'TransAMIE/rules_wn18rr/rules.txt')

There were 34 rules before, there are 90 after enrichment

There are 24 same rules, 66 new rules, 10 dropped rules


In [16]:
same_rules_amie_wn18rr[:10]

['?b  _verb_group  ?a   => ?a  _verb_group  ?b\t0.931458699\t0.931458699\t0.980573543\t1060\t1138\t1081\t?b',
 '?b  _derivationally_related_form  ?a   => ?a  _derivationally_related_form  ?b\t0.932222783\t0.932222783\t0.95163008\t27701\t29715\t29109\t?b',
 '?b  _also_see  ?a   => ?a  _also_see  ?b\t0.637413395\t0.637413395\t0.883671291\t828\t1299\t937\t?b',
 '?a  _instance_hypernym  ?h  ?h  _synset_domain_topic_of  ?b   => ?a  _synset_domain_topic_of  ?b\t0.032092426\t0.395256917\t0.862068966\t100\t253\t116\t?a',
 '?a  _hypernym  ?h  ?h  _synset_domain_topic_of  ?b   => ?a  _synset_domain_topic_of  ?b\t0.178754814\t0.285056295\t0.809593023\t557\t1954\t688\t?a',
 '?a  _has_part  ?h  ?h  _synset_domain_topic_of  ?b   => ?a  _synset_domain_topic_of  ?b\t0.013478819\t0.259259259\t0.79245283\t42\t162\t53\t?a',
 '?b  _hypernym  ?h  ?a  _member_of_domain_usage  ?h   => ?a  _member_of_domain_usage  ?b\t0.049284579\t0.196202532\t0.756097561\t31\t158\t41\t?b',
 '?g  _has_part  ?a  ?g  _synset_do

In [18]:
dropped_rules_amie_wn18rr[:10]

['?h  _synset_domain_topic_of  ?b  ?a  _verb_group  ?h   => ?a  _synset_domain_topic_of  ?b\t0.012516046\t0.375\t0.75\t39\t104\t52\t?a',
 '?g  _synset_domain_topic_of  ?b  ?g  _verb_group  ?a   => ?a  _synset_domain_topic_of  ?b\t0.012516046\t0.386138614\t0.75\t39\t101\t52\t?a',
 '?g  _hypernym  ?b  ?a  _member_of_domain_region  ?g   => ?a  _member_of_domain_region  ?b\t0.01191766\t0.021072797\t0.733333333\t11\t522\t15\t?b',
 '?g  _hypernym  ?b  ?a  _member_of_domain_usage  ?g   => ?a  _member_of_domain_usage  ?b\t0.028616852\t0.04400978\t0.72\t18\t409\t25\t?b',
 '?a  _has_part  ?h  ?b  _instance_hypernym  ?h   => ?a  _has_part  ?b\t0.029900332\t0.41025641\t0.52173913\t144\t351\t276\t?b',
 '?g  _derivationally_related_form  ?b  ?a  _derivationally_related_form  ?g   => ?a  _verb_group  ?b\t0.422671353\t0.008085393\t0.145537065\t481\t59490\t3305\t?b',
 '?a  _also_see  ?h  ?b  _also_see  ?h   => ?a  _also_see  ?b\t0.16243264\t0.094831461\t0.115997801\t211\t2225\t1819\t?b',
 '?h  _also_se

In [80]:
# Dropped rules: mean PCA confidence and mean standard confidence ("Std")
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), dropped_rules_amie_wn18rr)))/len(dropped_rules_amie_wn18rr))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), dropped_rules_amie_wn18rr)))/len(dropped_rules_amie_wn18rr))

PCA: 0.4063505224
Std: 0.15840383379999998
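
The same pair of averages is computed for every rule group in this notebook, and it is undefined for an empty group. A small helper could cover both cases. This is a sketch only: `mean_confidences` is a hypothetical name, and the two rule lines are made up to mimic the tab-separated layout.

```python
def mean_confidences(rules):
    """Return (mean PCA confidence, mean standard confidence), or None for an empty list."""
    if not rules:
        return None
    pca = [float(r.split('\t')[3]) for r in rules]
    std = [float(r.split('\t')[2]) for r in rules]
    return sum(pca) / len(pca), sum(std) / len(std)

# Two made-up rule lines in the same tab-separated layout
example = [
    "?a  r1  ?b   => ?a  r2  ?b\t0.1\t0.2\t0.4\t10\t50\t25\t?a",
    "?a  r3  ?b   => ?a  r2  ?b\t0.1\t0.4\t0.8\t20\t50\t25\t?a",
]
print(mean_confidences(example))
print(mean_confidences([]))  # None instead of a ZeroDivisionError
```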


In [20]:
new_rules_amie_wn18rr[-10:]

['?a  _member_meronym  ?b   => ?a  _similar_to  ?b\t0.460526316\t0.009140768\t0.707070707\t70\t7658\t99\t?b',
 '?a  _hypernym  ?b   => ?a  _similar_to  ?b\t0.414473684\t0.001971091\t0.692307692\t63\t31962\t91\t?b',
 '?a  _also_see  ?b   => ?a  _similar_to  ?b\t0.513157895\t0.06824147\t0.684210526\t78\t1143\t114\t?b',
 '?h  _hypernym  ?b  ?a  _member_of_domain_usage  ?h   => ?a  _member_of_domain_usage  ?b\t0.01610306\t0.02617801\t0.666666667\t10\t382\t15\t?b',
 '?a  _derivationally_related_form  ?b   => ?a  _similar_to  ?b\t0.513157895\t0.003055469\t0.527027027\t78\t25528\t148\t?b',
 '?a  _member_meronym  ?b   => ?a  _has_part  ?b\t0.015331462\t0.00927135\t0.360406091\t71\t7658\t197\t?b',
 '?a  _derivationally_related_form  ?b   => ?a  _member_meronym  ?b\t0.010446592\t0.003133814\t0.283687943\t80\t25528\t282\t?b',
 '?a  _hypernym  ?b   => ?a  _also_see  ?b\t0.058617673\t0.002096239\t0.201201201\t67\t31962\t333\t?b',
 '?a  _derivationally_related_form  ?b   => ?a  _has_part  ?b\t0.0166

In [79]:
# New rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), new_rules_amie_wn18rr)))/len(new_rules_amie_wn18rr))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), new_rules_amie_wn18rr)))/len(new_rules_amie_wn18rr))

PCA: 0.8880675046818184
Std: 0.6548039246212123


In [78]:
# Same rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), same_rules_amie_wn18rr)))/len(same_rules_amie_wn18rr))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), same_rules_amie_wn18rr)))/len(same_rules_amie_wn18rr))

PCA: 0.5201082241249999
Std: 0.2835829905833334


In [76]:
# Before
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_before_wn18rr)))/len(rules_amie_before_wn18rr))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_before_wn18rr)))/len(rules_amie_before_wn18rr))

PCA: 0.4866500765588235
Std: 0.24676559152941172


In [75]:
# After enrichment
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_after_wn18rr)))/len(rules_amie_after_wn18rr))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_after_wn18rr)))/len(rules_amie_after_wn18rr))

PCA: 0.7775618269444449
Std: 0.5406861664222223


## Mutagenesis

In [26]:
same_rules_amie_mut, rules_amie_before_mut, rules_amie_after_mut, \
new_rules_amie_mut, dropped_rules_amie_mut = compare_rules_amie('AMIE/rules_mutagenesis/rules.txt', 'TransAMIE/rules_mutagenesis/rules.txt')

There were 9 rules before, there are 17 after enrichment

There are 9 same rules, 8 new rules, 0 dropped rules


In [29]:
# Dropped rules: undefined (no rules were dropped, so the mean would divide by zero)
#sum(list(map(lambda x: float(x.split('\t')[3]), dropped_rules_amie_mut)))/len(dropped_rules_amie_mut)

In [71]:
# Before enrichment
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_before_mut)))/len(rules_amie_before_mut))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_before_mut)))/len(rules_amie_before_mut))

PCA: 0.5425176136666666
Std: 0.43713459811111105


In [72]:
# After enrichment
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_after_mut)))/len(rules_amie_after_mut))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_after_mut)))/len(rules_amie_after_mut))

PCA: 0.5532036077058823
Std: 0.47268887935294107


In [73]:
# Same rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), same_rules_amie_mut)))/len(same_rules_amie_mut))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), same_rules_amie_mut)))/len(same_rules_amie_mut))

PCA: 0.5425176136666666
Std: 0.43713459811111105


In [74]:
# New rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), new_rules_amie_mut)))/len(new_rules_amie_mut))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), new_rules_amie_mut)))/len(new_rules_amie_mut))

PCA: 0.5625
Std: 0.47321428574999996


## Carcinogenesis

In [34]:
same_rules_amie_car, rules_amie_before_car, rules_amie_after_car, \
new_rules_amie_car, dropped_rules_amie_car = compare_rules_amie('AMIE/rules_carcinogenesis/rules.txt', 'TransAMIE/rules_carcinogenesis/rules.txt')

There were 185 rules before, there are 317 after enrichment

There are 147 same rules, 170 new rules, 38 dropped rules


In [66]:
# Before enrichment
print('PCA:', sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_before_car)))/len(rules_amie_before_car))
print('Std:', sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_before_car)))/len(rules_amie_before_car))

PCA: 0.7375154245621616
Std: 0.5162657981027027


In [67]:
# After enrichment
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_after_car)))/len(rules_amie_after_car))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_after_car)))/len(rules_amie_after_car))

PCA: 0.695144739694006
Std: 0.4452791533312304


In [68]:
# New rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), new_rules_amie_car)))/len(new_rules_amie_car))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), new_rules_amie_car)))/len(new_rules_amie_car))

PCA: 0.6813794434588235
Std: 0.4150670346941179


In [69]:
# Dropped rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), dropped_rules_amie_car)))/len(dropped_rules_amie_car))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), dropped_rules_amie_car)))/len(dropped_rules_amie_car))

PCA: 0.7535989299210527
Std: 0.5282689622105262


In [70]:
# Same rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), same_rules_amie_car)))/len(same_rules_amie_car))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), same_rules_amie_car)))/len(same_rules_amie_car))

PCA: 0.7333577837210885
Std: 0.5131629393537415


## FB15k

In [40]:
same_rules_amie_fb, rules_amie_before_fb, rules_amie_after_fb, \
new_rules_amie_fb, dropped_rules_amie_fb = compare_rules_amie('AMIE/rules_fb15k/rules.txt', 'TransAMIE/rules_fb15k/rules.txt')

There were 7591 rules before, there are 10109 after enrichment

There are 7591 same rules, 4905 new rules, 0 dropped rules


In [81]:
# Before enrichment
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_before_fb)))/len(rules_amie_before_fb))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_before_fb)))/len(rules_amie_before_fb))

PCA: 0.5633267253446179
Std: 0.3071214942907393


In [82]:
# After enrichment
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_after_fb)))/len(rules_amie_after_fb))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_after_fb)))/len(rules_amie_after_fb))

PCA: 0.5629404381041404
Std: 0.27093583009590555


In [44]:
# Dropped rules: undefined (no rules were dropped, so the mean would divide by zero)
#sum(list(map(lambda x: float(x.split('\t')[3]), dropped_rules_amie_fb)))/len(dropped_rules_amie_fb)

In [83]:
# New rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), new_rules_amie_fb)))/len(new_rules_amie_fb))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), new_rules_amie_fb)))/len(new_rules_amie_fb))

PCA: 0.5460132989651448
Std: 0.1516606972941899


In [84]:
# Same rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), same_rules_amie_fb)))/len(same_rules_amie_fb))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), same_rules_amie_fb)))/len(same_rules_amie_fb))

PCA: 0.5633267253446179
Std: 0.3071214942907393


## YAGO3

In [47]:
same_rules_amie_yago, rules_amie_before_yago, rules_amie_after_yago, \
new_rules_amie_yago, dropped_rules_amie_yago = compare_rules_amie('AMIE/rules_yago3/rules.txt', 'TransAMIE/rules_yago3/rules.txt')

There were 235 rules before, there are 310 after enrichment

There are 116 same rules, 194 new rules, 119 dropped rules


In [85]:
# Before enrichment
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_before_yago)))/len(rules_amie_before_yago))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_before_yago)))/len(rules_amie_before_yago))

PCA: 0.3242538606127661
Std: 0.19154214194468083


In [86]:
# After enrichment
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_after_yago)))/len(rules_amie_after_yago))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_after_yago)))/len(rules_amie_after_yago))

PCA: 0.6707416839645169
Std: 0.6007938599645165


In [87]:
# New rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), new_rules_amie_yago)))/len(new_rules_amie_yago))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), new_rules_amie_yago)))/len(new_rules_amie_yago))

PCA: 0.8879264069587636
Std: 0.8583710295567017


In [88]:
# Dropped rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), dropped_rules_amie_yago)))/len(dropped_rules_amie_yago))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), dropped_rules_amie_yago)))/len(dropped_rules_amie_yago))

PCA: 0.2895052532268907
Std: 0.14846234643697478


In [89]:
# Same rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), same_rules_amie_yago)))/len(same_rules_amie_yago))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), same_rules_amie_yago)))/len(same_rules_amie_yago))

PCA: 0.35990113887931047
Std: 0.23573607009482755


## OpenBioLink

In [53]:
same_rules_amie_op, rules_amie_before_op, rules_amie_after_op, \
new_rules_amie_op, dropped_rules_amie_op = compare_rules_amie('AMIE/rules_openbiolink/rules.txt', 'TransAMIE/rules_openbiolink/rules.txt')

There were 713 rules before, there are 611 after enrichment

There are 510 same rules, 101 new rules, 203 dropped rules


In [90]:
# Before enrichment
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_before_op)))/len(rules_amie_before_op))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_before_op)))/len(rules_amie_before_op))

PCA: 0.31666675416129036
Std: 0.2804260248106593


In [91]:
# After enrichment
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_after_op)))/len(rules_amie_after_op))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_after_op)))/len(rules_amie_after_op))

PCA: 0.3353037292291323
Std: 0.2905857795957448


In [92]:
# Dropped rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), dropped_rules_amie_op)))/len(dropped_rules_amie_op))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), dropped_rules_amie_op)))/len(dropped_rules_amie_op))

PCA: 0.24821895901477792
Std: 0.21569599121182248


In [93]:
# New rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), new_rules_amie_op)))/len(new_rules_amie_op))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), new_rules_amie_op)))/len(new_rules_amie_op))

PCA: 0.6501759515148514
Std: 0.5803524768019802


In [94]:
# Same rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), same_rules_amie_op)))/len(same_rules_amie_op))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), same_rules_amie_op)))/len(same_rules_amie_op))

PCA: 0.3439116608568628
Std: 0.306191116615686


## DRKG

In [60]:
same_rules_amie_drkg, rules_amie_before_drkg, rules_amie_after_drkg, \
new_rules_amie_drkg, dropped_rules_amie_drkg = compare_rules_amie('AMIE/rules_drkg/rules.txt', 'TransAMIE/rules_drkg/rules.txt')

There were 2804 rules before, there are 2776 after enrichment

There are 1303 same rules, 1473 new rules, 1501 dropped rules


In [95]:
# Before enrichment
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_before_drkg)))/len(rules_amie_before_drkg))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_before_drkg)))/len(rules_amie_before_drkg))

PCA: 0.2894146629104849
Std: 0.16976895299358047


In [96]:
# After enrichment
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), rules_amie_after_drkg)))/len(rules_amie_after_drkg))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), rules_amie_after_drkg)))/len(rules_amie_after_drkg))

PCA: 0.28688603990669986
Std: 0.17173715055079278


In [97]:
# New rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), new_rules_amie_drkg)))/len(new_rules_amie_drkg))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), new_rules_amie_drkg)))/len(new_rules_amie_drkg))

PCA: 0.2713560277094369
Std: 0.1468777073951121


In [98]:
# Dropped rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), dropped_rules_amie_drkg)))/len(dropped_rules_amie_drkg))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), dropped_rules_amie_drkg)))/len(dropped_rules_amie_drkg))

PCA: 0.27349933227981377
Std: 0.14287335827581618


In [99]:
# Same rules
print('PCA:',sum(list(map(lambda x: float(x.split('\t')[3]), same_rules_amie_drkg)))/len(same_rules_amie_drkg))
print('Std:',sum(list(map(lambda x: float(x.split('\t')[2]), same_rules_amie_drkg)))/len(same_rules_amie_drkg))

PCA: 0.3077484397920187
Std: 0.20075152219646988


In [9]:
#def check_grounding(new_rules, kg_name):
#    with open('../datasets/'+kg_name) as file:
#        triples = file.read()
#    for rule in new_rules:
#        body, head = rule.split('=>')
#        head = head.split('\t')[0]
#        body_rels = [x for x in body.split(' ') if not '?' in x]
#        for rel in

In [1]:
#for rule in new_rules_amie_wn18rr:
#    #pieces = rule.split(' ')
#    #print(pieces)
#    #print(pieces[2], pieces[10])
#    body, head = rule.split('=>')
#    head = head.split('\t')[0]
#    body_rels = [x for x in body.split(' ') if not '?' in x and x!='']
#    head_rels = [x for x in head.split(' ') if not '?' in x and x!='']
#    print(body_rels + head_rels)

In [8]:
import pandas as pd
from tqdm import tqdm

In [13]:
def check_grounding(new_rules, kg_name=''):
    with open('datasets/'+kg_name+'/train_completed_TransE.txt') as file:
        triples = file.read().strip().split('\n')
    triples = pd.DataFrame(triples, columns=['triple'])
    # Relation of each triple (middle tab-separated field), computed once
    relations = triples['triple'].apply(lambda x: x.split('\t')[1])
    grounding = {}
    for rule in tqdm(new_rules, desc='New rules...'):
        body, head = rule.split('=>')
        head = head.split('\t')[0]
        body_rels = [x for x in body.split(' ') if '?' not in x and x != '']
        head_rels = [x for x in head.split(' ') if '?' not in x and x != '']

        #potential_triples_body = [triple for triple in triples if any([rel in triple for rel in body_rels])]
        #potential_triples_head = [triple for triple in triples if any([rel in triple for rel in head_rels])]
        # Select triples by relation; a list stays iterable even with zero or one match
        all_body_triples = triples.loc[relations.isin(body_rels), 'triple'].tolist()
        all_head_triples = triples.loc[relations.isin(head_rels), 'triple'].tolist()

        # Entities (subject and object) occurring in any triple with a body relation
        body_entities = set()
        for triple in all_body_triples:
            body_entities.update(triple.split('\t')[0::2])

        # Count head triples whose entities form a proper subset of the body entities
        binding_head_triples = 0.0
        for triple in all_head_triples:
            binding_head_triples += float(set(triple.split('\t')[0::2]) < body_entities)
        # Avoid dividing by zero when no triple uses the head relation
        confidence = binding_head_triples/len(all_head_triples) if all_head_triples else 0.0
        grounding[rule.split('\t')[0]] = confidence
    return dict(sorted(grounding.items(), key=lambda x: x[1], reverse=True))
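
The relation extraction inside `check_grounding` can be tried in isolation on a single rule line. The sketch below wraps the same splitting logic in a hypothetical helper, `rule_relations`, and applies it to one of the WN18RR rules printed earlier.

```python
def rule_relations(rule):
    """Return (body relations, head relations) of one AMIE rule line."""
    body, head = rule.split('=>')
    head = head.split('\t')[0]  # drop the tab-separated statistics after the head atom
    body_rels = [x for x in body.split(' ') if '?' not in x and x != '']
    head_rels = [x for x in head.split(' ') if '?' not in x and x != '']
    return body_rels, head_rels

rule = ('?a  _instance_hypernym  ?h  ?h  _synset_domain_topic_of  ?b   '
        '=> ?a  _synset_domain_topic_of  ?b'
        '\t0.032092426\t0.395256917\t0.862068966\t100\t253\t116\t?a')
print(rule_relations(rule))
# (['_instance_hypernym', '_synset_domain_topic_of'], ['_synset_domain_topic_of'])
```

Because the atoms are split on single spaces, a relation name that itself contains a space (for example `INTACT::DEPHOSPHORYLATION REACTION::Gene:Gene`) would be broken into several tokens by this approach.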

### SAFRAN

Steps: 1) run `./SAFRAN calcjacc config-file`, 2) run `./SAFRAN learnnrnoisy config-file`, and 3) run `./SAFRAN applynrnoisy config-file`.

Link prediction: `python SAFRAN/python/eval.py safran/results_drkg/predictions.txt datasets/drkg/test.txt`

### Link prediction results before KG enrichment using SAFRAN

WN18RR
`MRR: 0.506
Hits@1: 0.465
Hits@3: 0.526
Hits@10: 0.595`

DRKG
`MRR: 0.429
Hits@1: 0.385
Hits@3: 0.453
Hits@10: 0.533`

MUTAGENESIS
`MRR: 0.529
Hits@1: 0.478
Hits@3: 0.556
Hits@10: 0.650`

CARCINOGENESIS
`MRR: 0.524
Hits@1: 0.482
Hits@3: 0.538
Hits@10: 0.639`

OPENBIOLINK
`MRR: 0.246
Hits@1: 0.170
Hits@3: 0.283
Hits@10: 0.440`

FB15k
`MRR: 0.380
Hits@1: 0.288
Hits@3: 0.437
Hits@10: 0.581`

YAGO3
`MRR: 0.427
Hits@1: 0.375
Hits@3: 0.462
Hits@10: 0.533`

### After KG enrichment

WN18RR
`MRR: 0.508
Hits@1: 0.469
Hits@3: 0.527
Hits@10: 0.597`

MUTAGENESIS
`MRR: 0.527
Hits@1: 0.475
Hits@3: 0.559
Hits@10: 0.646`

CARCINOGENESIS
`MRR: 0.489
Hits@1: 0.447
Hits@3: 0.506
Hits@10: 0.602`


FB15k
`MRR: 0.384
Hits@1: 0.290
Hits@3: 0.444
Hits@10: 0.587`
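
For the four datasets that have SAFRAN results both before and after enrichment, the per-metric change can be tabulated directly from the numbers reported above. A minimal sketch, with the values copied by hand from those results:

```python
# Metrics copied from the SAFRAN results above: (MRR, Hits@1, Hits@3, Hits@10)
metrics = ("MRR", "Hits@1", "Hits@3", "Hits@10")
before = {
    "WN18RR": (0.506, 0.465, 0.526, 0.595),
    "MUTAGENESIS": (0.529, 0.478, 0.556, 0.650),
    "CARCINOGENESIS": (0.524, 0.482, 0.538, 0.639),
    "FB15k": (0.380, 0.288, 0.437, 0.581),
}
after = {
    "WN18RR": (0.508, 0.469, 0.527, 0.597),
    "MUTAGENESIS": (0.527, 0.475, 0.559, 0.646),
    "CARCINOGENESIS": (0.489, 0.447, 0.506, 0.602),
    "FB15k": (0.384, 0.290, 0.444, 0.587),
}

# Change per metric (after minus before), rounded to the reported precision
deltas = {
    name: {m: round(a - b, 3) for m, b, a in zip(metrics, before[name], after[name])}
    for name in after
}
for name, d in deltas.items():
    print(name, d)
```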


### Compare mined rules (AMIE+)

In [2]:
# import numpy as np
with open("./amie/out_amie_v1.txt") as file:
    rules_v1 = file.readlines()
    rules_v1 = list(filter(lambda x: x.startswith("?"), rules_v1))
with open("./amie/out_amie_v3_sameAs_v1.txt") as file:
    rules_v2 = file.readlines()
    rules_v2 = list(filter(lambda x: x.startswith("?"), rules_v2))


In [3]:
rules_v1[0]

'?b  INTACT::DEPHOSPHORYLATION REACTION::Gene:Gene  ?a   => ?a  INTACT::DEPHOSPHORYLATION REACTION::Gene:Gene  ?b\t0.227722772\t0.227722772\t0.250909091\t69\t303\t275\t?a\n'

In [4]:
import numpy as np
rule_expression1 = set(list(map(lambda x: x.split("\t")[0], rules_v1)))
rule_expression2 = set(list(map(lambda x: x.split("\t")[0], rules_v2)))
index = len(rule_expression1.intersection(rule_expression2))/len(rule_expression1.union(rule_expression2))
print("Jaccard index of mined rules v1 and v2: ", index)

# Confidence of mined rules
avg_conf1 = np.mean([float(rule.split("\t")[3]) for rule in rules_v1])
avg_conf2 = np.mean([float(rule.split("\t")[3]) for rule in rules_v2])

print("\nAvg confidence v1: {}, v2: {}".format(avg_conf1, avg_conf2))

Jaccard index of mined rules v1 and v2:  0.9876004592422503

Avg confidence v1: 0.3118219945172175, v2: 0.3118627327964888


### Compare mined rules (AnyBURL)

In [70]:
def return_confidence(rule):
    return float(rule.split("\t")[2])

In [81]:
threshold = 1.
anyBURL_rules_filt = list(filter(lambda x: return_confidence(x)>=threshold, anyBURL_rules_v1))

anyBURL_rules_filt[-1]

'5\t5\t1.0\tSTRING::OTHER::Gene:Gene(Gene::83998,Y) <= STRING::OTHER::Gene:Gene(Y,Gene::83998)\n'
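
An AnyBURL rule line like the one above can be split into its parts as follows. This is a sketch under an assumption: the first two columns are read as the number of predictions and the number of correct predictions, which is consistent with the confidence values shown but is not stated in this notebook. `parse_anyburl_line` is a hypothetical helper.

```python
def parse_anyburl_line(line):
    """Split an AnyBURL rule line into (predicted, correct, confidence, head, body)."""
    predicted, correct, confidence, rule = line.rstrip("\n").split("\t")
    # The head atom precedes " <= ", the body atoms follow it
    head, _, body = rule.partition(" <= ")
    return int(predicted), int(correct), float(confidence), head, body

line = '5\t5\t1.0\tSTRING::OTHER::Gene:Gene(Gene::83998,Y) <= STRING::OTHER::Gene:Gene(Y,Gene::83998)\n'
predicted, correct, confidence, head, body = parse_anyburl_line(line)
print(head)
print(body)
# Under the assumed column meaning, confidence equals correct / predicted
print(correct / predicted == confidence)
```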

In [87]:
# import numpy as np
with open("./anyBURL/rules/sorted_rules_v1") as file:
    anyBURL_rules_v1 = file.readlines()
with open("./anyBURL/rules/sorted_rules_v2") as file:
    anyBURL_rules_v2 = file.readlines()

# Jaccard index
#anyBURL_rules_v1 = list(filter(lambda x: return_confidence(x)>=threshold, anyBURL_rules_v1))
#anyBURL_rules_v2 = list(filter(lambda x: return_confidence(x)>=threshold, anyBURL_rules_v2))

rule_expression1 = set(list(map(lambda x: x.split("\t")[3], anyBURL_rules_v1)))
rule_expression2 = set(list(map(lambda x: x.split("\t")[3], anyBURL_rules_v2)))
index = len(rule_expression1.intersection(rule_expression2))/len(rule_expression1.union(rule_expression2))
print("Jaccard index of mined rules v1 and v2: ", index)

# Confidence of mined rules
avg_conf1 = np.mean([float(rule.split("\t")[2]) for rule in anyBURL_rules_v1])
avg_conf2 = np.mean([float(rule.split("\t")[2]) for rule in anyBURL_rules_v2])

print("\nAvg confidence v1: {}, v2: {}".format(avg_conf1, avg_conf2))

Jaccard index of mined rules v1 and v2:  0.2712462672137839

Avg confidence v1: 0.5285819316297514, v2: 0.534223820369677


In [83]:
anyBURL_rules_v1

['50\t50\t1.0\tSTRING::BINDING::Gene:Gene(Gene::51077,Y) <= STRING::BINDING::Gene:Gene(Y,Gene::51077)\n',
 '211\t211\t1.0\tSTRING::CATALYSIS::Gene:Gene(X,Gene::1233) <= STRING::REACTION::Gene:Gene(X,Gene::1233)\n',
 '211\t211\t1.0\tSTRING::CATALYSIS::Gene:Gene(X,Gene::1233) <= STRING::REACTION::Gene:Gene(Gene::1233,X)\n',
 '13\t13\t1.0\tSTRING::OTHER::Gene:Gene(X,Gene::5140) <= STRING::OTHER::Gene:Gene(Gene::5140,X)\n',
 '105\t105\t1.0\tSTRING::BINDING::Gene:Gene(Gene::23063,Y) <= STRING::BINDING::Gene:Gene(Y,Gene::23063)\n',
 '6\t6\t1.0\tGNBR::E+::Gene:Gene(Gene::6287,Y) <= GNBR::E+::Gene:Gene(Y,Gene::6287)\n',
 '17\t17\t1.0\tSTRING::OTHER::Gene:Gene(Gene::788,Y) <= STRING::OTHER::Gene:Gene(Y,Gene::788)\n',
 '3\t3\t1.0\tGNBR::E::Compound:Gene(Compound::DB11257,Y) <= GNBR::E+::Compound:Gene(Compound::DB11257,Y)\n',
 '112\t112\t1.0\tSTRING::BINDING::Gene:Gene(X,Gene::2072) <= STRING::BINDING::Gene:Gene(Gene::2072,X)\n',
 '26\t26\t1.0\tSTRING::REACTION::Gene:Gene(X,Gene::9984) <= STRING:

In [84]:
anyBURL_rules_v2[:20]

['50\t50\t1.0\tSTRING::BINDING::Gene:Gene(Gene::51077,Y) <= STRING::BINDING::Gene:Gene(Y,Gene::51077)\n',
 '3\t3\t1.0\tGNBR::Rg::Gene:Gene(Gene::23191,Y) <= GNBR::Rg::Gene:Gene(Y,Gene::23191)\n',
 '2\t2\t1.0\tbioarx::HumGenHumGen:Gene:Gene(Gene::27284,Y) <= INTACT::PHYSICAL ASSOCIATION::Gene:Gene(Gene::27284,Y)\n',
 '211\t211\t1.0\tSTRING::CATALYSIS::Gene:Gene(X,Gene::1233) <= STRING::REACTION::Gene:Gene(X,Gene::1233)\n',
 '211\t211\t1.0\tSTRING::CATALYSIS::Gene:Gene(X,Gene::1233) <= STRING::REACTION::Gene:Gene(Gene::1233,X)\n',
 '3\t3\t1.0\tSTRING::CATALYSIS::Gene:Gene(Gene::23774,Y) <= STRING::INHIBITION::Gene:Gene(Y,Gene::23774)\n',
 '13\t13\t1.0\tSTRING::OTHER::Gene:Gene(X,Gene::5140) <= STRING::OTHER::Gene:Gene(Gene::5140,X)\n',
 '105\t105\t1.0\tSTRING::BINDING::Gene:Gene(Gene::23063,Y) <= STRING::BINDING::Gene:Gene(Y,Gene::23063)\n',
 '112\t112\t1.0\tSTRING::BINDING::Gene:Gene(X,Gene::2072) <= STRING::BINDING::Gene:Gene(Gene::2072,X)\n',
 '26\t26\t1.0\tSTRING::REACTION::Gene:Gene

In [86]:
len(set(anyBURL_rules_v1).intersection(set(anyBURL_rules_v2)))

57124

In [88]:
len(anyBURL_rules_v1)

290590

In [89]:
len(anyBURL_rules_v2)

301137