
# Rule suggestions


This notebook presents some possible rules that could be extacted, paired with their appropriate datasets.

---
### **Rule 1**: (A, "parents", B) and (C, "parents", B) and (A, "gender", "male") --> (C, "gender", "female")
"If some A has different parents B and C, and B is male, then C is female."


**Dataset**: fb13

This rule is somewhat homophobic, but I assume that the embedding will support it.


---
### **Rule 2**: (A, "daughterOf", B) and (B, "motherOf", C) --> (A, "sisterOf", C)
"If some A is the daughter of some B, and that B is a mother of C, then A is a sister of C."

**Dataset**: kinship

***Problem***: dataset is synthetic and quite small.

---
### **Rule 3**: (A, "wasBornIn", B) and (A, "livesIn", B) --> (A, "isCitizenOf", B)
"If some A was born in B and lives in B, then they are a citizen of B."

***Problem***: no data that fulfills rule.

**Dataset**: yago3_10

---
### **Rule 4**: (A, "livesIn", B) and (B, "hasCapital", C) --> (A, "livesIn", C)
"If some A lives in B, which is the capital of C, then A lives in C."

***Problem***: no data that fulfills rule. There are no true consequents of the rule in the dataset. SO if we have two triplets that fulfill the antecedent (A, "livesIn", B) and (B, "hasCapital", C), then there is no triplet in the dataset that can fulfill (A, "livesIn", C).

**Dataset**: yago3_10

---
### **Rule 5**: (A, "hasCapital", B) and (C, "isLeaderOf" B) --> (C, "isCitizenOf", B)
"If A is the capitol of B and C is the leader of B, then C is a citizen of B." (You cannot be a leader of a country without being a citizen of that country.)

***Problematic***: B is not nessecarily a country in this dataset. Could have written (B "hasCurrency" A), but then we encounter the same problem.

**Dataset**: yago3_10


---
### **Rule 6**: (A, "playsFor", B) --> (A, "gender", "male")
"If A plays for B, then A is a male."

This is an odd rule which is supported by the yago3_10 dataset. There are 334684 triplets that can be used to train the rule. 99% of the data is of players that are men, so the rule could be learnt from this dataset.


**Dataset**: yago3_10

In [24]:
import numpy as np
import ampligraph
import tensorflow as tf
from ampligraph.datasets import load_yago3_10, load_fb13
from ampligraph.evaluation import evaluate_performance
from ampligraph.evaluation import train_test_split_no_unseen 
from ampligraph.evaluation import mr_score, mrr_score, hits_at_n_score
from ampligraph.latent_features import save_model
from signature_tools import subset_by_signature, subset_by_strict_signature, subset_by_frequency, most_frequent_objects, most_frequent_predicates, most_frequent_targets

## fb13

In [25]:
fb13 = load_fb13()
fb13 = np.concatenate([fb13['train'], fb13['valid'], fb13['test']]) # combine the split data

In [29]:
most_frequent_objects(fb13, n = 10)

array([[37, 'albert_einstein'],
       [29, 'winston_churchill'],
       [29, 'theodore_roosevelt'],
       [28, 'edgar_allan_poe'],
       [28, 'thomas_jefferson'],
       [27, 'paul_newman'],
       [27, 'carl_sagan'],
       [26, 'harold_pinter'],
       [26, 'george_iii_of_the_united_kingdom'],
       [25, 'michael_crichton']], dtype=object)

In [28]:
most_frequent_predicates(fb13, n =10)

array([[73903, 'gender'],
       [62432, 'profession'],
       [61321, 'nationality'],
       [40579, 'place_of_death'],
       [37970, 'place_of_birth'],
       [28783, 'location'],
       [21150, 'institution'],
       [14897, 'cause_of_death'],
       [11158, 'religion'],
       [6546, 'ethnicity']], dtype=object)

In [27]:
most_frequent_targets(fb13, n = 10)

array([[59666, 'male'],
       [21075, 'united_states'],
       [14237, 'female'],
       [5973, 'politician'],
       [5815, 'germany'],
       [5049, 'writer'],
       [4501, 'united_kingdom'],
       [3894, 'england'],
       [3801, 'france'],
       [3600, 'paris']], dtype=object)

### **Rule 1**: (A, "parents", B) and (C, "parents", B) and (A, "gender", "male") --> (C, "gender", "female")

In [31]:
# extract triplets with relevant predicates for rule
parents_subset = subset_by_signature(fb13, [], ["parents"], [])
gender_subset = subset_by_signature(fb13, [], ["gender"], [])
gender_female_subset = subset_by_signature(fb13, [], ["gender"], ["female"])
gender_male_subset = subset_by_signature(fb13, [], ["gender"], ["male"])



# extract the objects and subjects that appear in the relevant triplets
parents_subjects = parents_subset[:,2]
gender_subjects = gender_subset[:,2]
parents_objects = parents_subset[:,0]
gender_objects = gender_subset[:,0]
gender_female_objects = gender_female_subset[:,0]
gender_male_objects = gender_male_subset[:,0]

# extract the objects and subjects that appear in multiple of the relevant predicates
parent_subjects_and_female_objects = np.intersect1d(parents_subjects, gender_female_objects)
parent_subjects_and_male_objects = np.intersect1d(parents_objects, gender_male_objects)
male_and_female_parents = np.concatenate((parent_subjects_and_female_objects, parent_subjects_and_male_objects))

# extract triplets that share subjects across all the relevant predicates
parents = subset_by_signature(parents_subset, list(male_and_female_parents), [], [])
gender_female = subset_by_signature(gender_female_subset, list(parent_subjects_and_female_objects), [], [])
gender_male = subset_by_signature(gender_male_subset, list(parent_subjects_and_male_objects), [], [])

# final dataset to be used to learn the gendered players rule
parents_female_male_dataset = np.concatenate((parents, gender_female, gender_male))
parents_female_male_dataset

array([['anna_e_roosevelt', 'parents', 'eleanor_roosevelt'],
       ['ethel_lilian_voynich', 'parents', 'george_boole'],
       ['prince_sigismund_of_prussia_kiel', 'parents',
        'princess_irene_of_hesse_and_by_rhine'],
       ...,
       ['joan_whitney_payson', 'gender', 'male'],
       ['ferdinando_stanley_5th_earl_of_derby', 'gender', 'male'],
       ['ferdinando_stanley_5th_earl_of_derby', 'gender', 'female']],
      dtype=object)

## Yago3_10

In [2]:
yago = load_yago3_10()
yago = np.concatenate([yago['train'], yago['valid'], yago['test']]) # combine the split data

In [3]:
most_frequent_objects(yago, n = 2)

array([[264, 'Frankfurt_Airport'],
       [259, 'Amsterdam_Airport_Schiphol']], dtype=object)

In [20]:
most_frequent_predicates(yago, n =2)

array([[377143, 'isAffiliatedTo'],
       [324048, 'playsFor']], dtype=object)

In [5]:
most_frequent_targets(yago, n = 2)

array([[61599, 'male'],
       [12309, 'United_States']], dtype=object)

### **Rule 3**: (A, "wasBornIn", B) and (A, "livesIn", B) --> (A, "isCitizenOf", B)

In [10]:
# extract triplets with relevant predicates for rule
born_subset = subset_by_signature(yago, [], ["wasBornIn"], [])
lives_subset = subset_by_signature(yago, [], ["livesIn"], [])
citizen_subset = subset_by_signature(yago, [], ["isCitizenOf"], [])

# extract the objects and subjects that appear in the relevant triplets
born_objects = born_subset[:,0]
born_subjects = born_subset[:,2]
lives_objects = lives_subset[:,0]
lives_subjects = lives_subset[:,2]
citizen_objects = citizen_subset[:,0]
citizen_subjects = citizen_subset[:,2]

# extract the objects that are common for wasBornIn, livesIn and isCitizenOf predicates
born_lives_objects = np.intersect1d(born_objects, lives_objects)
born_citizen_objects = np.intersect1d(born_objects, citizen_objects)
lives_citizen_objects = np.intersect1d(lives_objects, citizen_objects)
born_lives_citizen_objects_incomplete = np.intersect1d(born_lives_objects, born_citizen_objects)
born_lives_citizen_objects = np.intersect1d(born_lives_citizen_objects_incomplete, lives_citizen_objects)

# extract the subjects that are common for wasBornIn, livesIn and isCitizenOf predicates
born_lives_subjects = np.intersect1d(born_subjects, lives_subjects)
born_citizen_subjects = np.intersect1d(born_subjects, citizen_subjects)
lives_citizen_subjects = np.intersect1d(lives_subjects, citizen_subjects)
born_lives_citizen_subjects_incomplete = np.intersect1d(born_lives_subjects, born_citizen_subjects)
born_lives_citizen_subjects = np.intersect1d(born_lives_citizen_subjects_incomplete, lives_citizen_subjects)

# extract triplets that share objects and subjects across all the relevant predicates
born_filtered = subset_by_strict_signature(born_subset, list(born_lives_citizen_objects), [], list(born_lives_citizen_subjects))
lives_filtered = subset_by_strict_signature(lives_subset, list(born_lives_citizen_objects), [], list(born_lives_citizen_subjects))
citizen_filtered = subset_by_strict_signature(citizen_subset, list(born_lives_citizen_objects), [], list(born_lives_citizen_subjects))

# final dataset to be used to learn rule 3
born_lives_citizen_dataset = np.concatenate((born_filtered, lives_filtered, citizen_filtered))
born_lives_citizen_dataset

array([], shape=(0, 3), dtype=object)

No data that fulfills rule.

### **Rule 4**: (A, "livesIn", B) and (B, "hasCapital", C) --> (A, "livesIn", C)


In [15]:
# extract triplets with relevant predicates for rule
capital_subset = subset_by_signature(yago, [], ["hasCapital"], [])
lives_subset = subset_by_signature(yago, [], ["livesIn"], [])

# extract the objects and subjects that appear in the relevant triplets
capital_subjects = capital_subset[:,2]
lives_subjects = lives_subset[:,2]
capital_objects = capital_subset[:,0]
lives_objects = lives_subset[:,0]

# extract objects that appear in a playsFor and hasGender triplet
capital_and_lives_subjects = np.intersect1d(capital_subjects, lives_subjects)
capital_objects_and_lives_subjects = np.intersect1d(capital_objects, lives_subjects)

# extract triplets that have subjects as objects of other preicates or vice versa
lives_with_capital_objects_as_subjects = subset_by_signature(lives_subset, [], [], list(capital_objects))
capitals_with_lives_subjects_as_objects = subset_by_signature(capital_subset, list(lives_subjects), [], [])

# find possible subjects and objects for a livesIn triplet that could be the consequent of a true example of the rule.
A = lives_with_capital_objects_as_subjects[:,0]
C = capitals_with_lives_subjects_as_objects[:,2]

# true consequents of rule
A_livesIn_C = subset_by_strict_signature(lives_subset, list(A), [], list(C))

In [16]:
len(lives_with_capital_objects_as_subjects)

1343

In [17]:
len(capitals_with_lives_subjects_as_objects)

239

In [19]:
len(A_livesIn_C)

0

No true consequents of rule in dataset.

### **Rule 5**: (A, "hasCapital", B) and (C, "isLeaderOf" B) --> (C, "isCitizenOf", B)

In [21]:
# extract triplets with relevant predicates for rule
capital_subset = subset_by_signature(yago, [], ["hasCapital"], [])
leader_subset = subset_by_signature(yago, [], ["isLeaderOf"], [])
citizen_subset = subset_by_signature(yago, [], ["isCitizenOf"], [])

# extract the objects and subjects that appear in the relevant triplets
capital_subjects = capital_subset[:,2]
leader_subjects = leader_subset[:,2]
leader_objects = leader_subset[:,0]
citizen_objects = citizen_subset[:,0]
citizen_subjects = citizen_subset[:,2]

# extract the objects and subjects that appear in multiple of the relevant predicates
capital_and_leader_subjects = np.intersect1d(capital_subjects, leader_subjects)
citizen_and_leader_subjects = np.intersect1d(citizen_subjects, leader_subjects)
citizen_and_leader_objects = np.intersect1d(citizen_objects, leader_objects)
capital_citizen_and_leader_subjects = np.intersect1d(capital_and_leader_subjects, citizen_and_leader_subjects)

# extract triplets that share subjects across all the relevant predicates
capitals_with_leaders = subset_by_signature(capital_subset, [], [], list(capital_citizen_and_leader_subjects))
leader_of_B_is_citizen_of_B = subset_by_signature(leader_subset, [], [], list(capital_citizen_and_leader_subjects))
citizen_of_B_is_leader_of_B = subset_by_signature(citizen_subset, [], [], list(capital_citizen_and_leader_subjects))

# final dataset to be used to learn the gendered players rule
capital_leader_citizen_dataset = np.concatenate((capitals_with_leaders, leader_of_B_is_citizen_of_B, citizen_of_B_is_leader_of_B))
capital_leader_citizen_dataset

array([['Straits_Settlements', 'hasCapital', 'Singapore'],
       ['Lee_Hsien_Loong', 'isLeaderOf', 'Singapore'],
       ['Tony_Tan', 'isLeaderOf', 'Singapore'],
       ['Edwin_Thumboo', 'isCitizenOf', 'Singapore'],
       ['Gong_Li', 'isCitizenOf', 'Singapore']], dtype=object)

### **Rule 6**: (A, "playsFor", B) --> (A, "gender", "male")

In [22]:
playsFor_subset = subset_by_signature(yago, [], ["playsFor"], [])
hasGender_subset = subset_by_signature(yago, [], ["hasGender"], [])
playsFor_objects = playsFor_subset[:,0]
hasGender_objects = hasGender_subset[:,0]

# extract objects that appear in a playsFor and hasGender triplet
playsFor_and_gender_objects = np.intersect1d(playsFor_objects, hasGender_objects)

players_with_gender = subset_by_signature(playsFor_subset, list(playsFor_and_gender_objects), [], [])
genders_that_are_players = subset_by_signature(hasGender_subset, list(playsFor_and_gender_objects), [], [])

# final dataset to be used to learn the gendered players rule
gendered_players = np.concatenate((players_with_gender, genders_that_are_players))
print("Size of dataset that can be used to learn the rule:", len(gendered_players))

print(most_frequent_targets(genders_that_are_players, n = 2))

Size of dataset that can be used to learn the rule: 334684
[[40858 'male']
 [638 'female']]


In [23]:
genders = []
for triplet in players_with_gender:
    player = triplet[0]
    genders.append(genders_that_are_players[genders_that_are_players[:,0]==player, :][0][2])

unique, counts = np.unique(genders, return_counts=True)
frequencies = np.asarray((unique, counts)).T
frequencies

array([['female', '3641'],
       ['male', '289547']], dtype='<U21')

99% of the datapoints support the rule, out of a dataset of size 334684.

Not enough data to support this rule. This dataset doesn't even contain a single example of three triplets that fulfill the rule.