<a href="https://colab.research.google.com/github/MMesgar/Knowledge_Based_Systems/blob/main/lm_as_kb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**LMs and Knowledge Bases** 
This notebook walks through a simple solution for Practice II of Lecture 4. The exercise asks us to focus on three relation types: "place of birth (POB)", "date of birth (DOB)", and "place of death (POD)". This solution is designed for POD, but you should be able to design a similar solution for POB and DOB in the same way. 

First we need to install the transformers library, because we are going to use BERT. 

In [None]:
!pip install datasets transformers[sentencepiece]

Now we should read the Google RE file. To do so, we first upload the file manually from our local machine to Google Colab. 

Then we read the content of the file using the pandas package. 
To use pandas (like any other Python package), we must first import it. 

In [None]:
import pandas as pd

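Now let's read the content of the file. The path below assumes the file was uploaded to `/content/sample_data/`; adjust it if you uploaded the file elsewhere. 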

In [None]:
pod = pd.read_json('/content/sample_data/place_of_death_test.jsonl', lines=True)
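
The `lines=True` argument is needed because the file is in JSON Lines format, where every line is a standalone JSON object. A minimal sketch of that format, using only the standard library and two made-up records (the values are placeholders for illustration, not rows from the Google RE file):

```python
import json

# Two made-up records in JSON Lines form: one JSON object per line.
# The values are placeholders for illustration, not real dataset rows.
sample_jsonl = (
    '{"sub_label": "Person A", "obj_label": "City X"}\n'
    '{"sub_label": "Person B", "obj_label": "City Y"}\n'
)

# Parse each non-empty line independently.
records = [json.loads(line) for line in sample_jsonl.splitlines() if line.strip()]

print(len(records))
print(records[0]["sub_label"], "->", records[0]["obj_label"])
```

`pd.read_json(..., lines=True)` does the same line-by-line parsing and puts the result in a DataFrame with one row per line.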

Let's see how many facts exist in this dataset.

In [None]:
num_facts = len(pod)
print(f"Number of POD (place of death) facts is: {num_facts}")

Let's take a look at the first 5 rows of the `pod` DataFrame. The ``head()`` method does this for us. 

In [None]:
pod.head()

Look at the column names. Two columns are important for us: 'sub_label' and 'obj_label'. They show which person died in which city. 

In [None]:
pod_data_samples = pod[['sub_label','obj_label']]

In [None]:
pod_data_samples.head()

Now we create a list of subjects, which are persons, and a list of their corresponding objects, which are the cities in which those persons died. 

In [None]:
subjects = pod['sub_label'].to_list()
reference_objects = pod['obj_label'].to_list()

Now we are ready to evaluate BERT. The idea is to give each subject to BERT and see what it returns as the city in which the person died. 

Let's tell the pipeline to use BERT.

In [None]:
from transformers import pipeline
unmasker = pipeline("fill-mask", model="bert-base-uncased")

For each subject in the list, we fill in the template ``{s} died in [MASK].`` and give it to BERT to see which token it predicts as the object. 

In [None]:
predictions = []
for s in subjects:
  # The pipeline returns candidates sorted by score; keep the top one's token.
  predicted_obj = unmasker(f"{s} died in [MASK].")[0]["token_str"]
  predictions.append(predicted_obj)
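
The fill-mask pipeline returns a ranked list of candidates for the masked position, each a dictionary with keys such as `score` and `token_str`, and the loop above keeps only the first one. The same loop can be sketched as a reusable function with the template as a parameter. The names `predict_objects` and `fake_unmasker` are hypothetical: `fake_unmasker` is a stand-in that only mimics the output shape, so the sketch runs without downloading a model.

```python
def predict_objects(unmasker, subjects, template="{s} died in [MASK]."):
    """Return the top-ranked token for each subject.

    `unmasker` is any callable that takes a sentence and returns a list of
    candidate dicts sorted by score, each with a "token_str" key.
    """
    predictions = []
    for s in subjects:
        candidates = unmasker(template.format(s=s))
        predictions.append(candidates[0]["token_str"])
    return predictions


def fake_unmasker(sentence):
    # Hypothetical stand-in for the real pipeline: always returns the same
    # two made-up candidates, ordered by score.
    return [
        {"score": 0.6, "token_str": "paris",
         "sequence": sentence.replace("[MASK]", "paris")},
        {"score": 0.3, "token_str": "london",
         "sequence": sentence.replace("[MASK]", "london")},
    ]


demo_predictions = predict_objects(fake_unmasker, ["Person A", "Person B"])
print(demo_predictions)
```

In the notebook you would pass the real pipeline instead, as in `predict_objects(unmasker, subjects)`. Passing a different `template` is one possible starting point for the POB and DOB relations.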

Let's look at the first 10 predicted labels and the first 10 reference objects. With so few items, we can easily compare them manually.

In [None]:
print(predictions[:10])
print(reference_objects[:10])

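Although we can manually compare short lists of predicted and reference labels, we need code to compare them automatically when the lists contain many items. The following loop does this comparison, ignoring case, and reports precision at 1 (p@1): the percentage of facts for which BERT's top prediction matches the reference object.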

In [None]:
# Count the facts whose top prediction matches the reference (case-insensitive).
correct = 0
for predicted, reference in zip(predictions, reference_objects):
  if predicted.lower() == reference.lower():
    correct += 1

# Precision at 1, expressed as a percentage.
p_at_1 = 100 * correct / len(predictions)
print(f"number of facts is: {num_facts}, p@1 = {p_at_1:.2f}%")
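
To reuse the same evaluation for the other relation types, the comparison above can be wrapped in a function. This is a minimal sketch, not part of the original solution. The name `precision_at_1` is hypothetical, the demo lists are made up, and unlike the loop above it also strips surrounding whitespace before comparing.

```python
def precision_at_1(predictions, references):
    """Percentage of positions where prediction and reference match,
    ignoring case and surrounding whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(predictions)


# Made-up lists for illustration only: two of the three pairs match.
demo_score = precision_at_1(["paris", "london", "rome"], ["Paris", "Berlin", "Rome"])
print(f"p@1 = {demo_score:.2f}%")
```

In the notebook, the call would be `precision_at_1(predictions, reference_objects)`.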