## LLM-Based Classifier

The following is the implementation of the relationship classifier. To run it, you need the pandas, openai, and scikit-learn libraries installed, as well as a valid OpenAI API key.

In [1]:
# Importing the required libraries
import pandas as pd
from openai import OpenAI
from sklearn.metrics import precision_score, recall_score

def load_data(file_path):
    """
    This function loads a dataset from a given .tsv file.

    :param file_path: The path of the file
    :return: A dataframe containing the dataset
    """
    
    # Retrieving data
    relation_dataset = pd.read_csv(file_path, sep='\t')

    # Renaming some columns
    relation_dataset.rename(columns={"text":"sentence", "relation_in_sentence": "original_relation"}, inplace=True)
    
    return relation_dataset


def find_relation(entity_1, entity_2, sentence, openai_key):
    """
    This function estimates and returns the relation that holds between two entities in a sentence,
    using an OpenAI chat model (specifically gpt-4o-mini).

    :param entity_1: The name of the first entity
    :param entity_2: The name of the second entity
    :param sentence: The sentence to be examined
    :param openai_key: The OpenAI client object (created with the API key) used to access the API
    :return: The estimated relation
    """

    # Sending the chat completion request
    completion = openai_key.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {
            "role": "system",
            "content": """
            You are a system that identifies relationships between two entities in a sentence. The possible relationships are:\n\
                1. 'spouse': relates a person to the persons they are currently married to or have been married to in the past. Not in the future.\n\
                2. 'schools_attended': relates a person to the schools they are currently attending or have attended in the past. Not in the future.\n\
                3. 'employee_of': relates a person to the organizations they are currently employees of or have been in the past. Not in the future.\n\
                4. 'cities_of_residence': relates a person to the cities they currently live or have lived in the past. Not in the future.\n\
                \n\
            Important Instructions:\n\
                - If the relationship is negated in the sentence (e.g., 'John is not married to Jane'), return 'unknown'.\n\
                - If the relationship is expressed with uncertainty or probability (e.g., 'John may be employed by Google' or 'John could have lived in San Francisco'), return 'unknown'.\n\
                - If none of the above relationships are expressed in the sentence, return 'unknown'.\n\
                - Return only a single word that represents the relationship (e.g., 'spouse', 'employee_of', or 'unknown'). Do not include any extra text or explanations.\n\
            Below are some examples for how to handle these cases:\n\
                Example 1:\n\
                Sentence: 'John and Jane were married two weeks ago.'\n\
                Subject: John\n\
                Object: Jane\n\
                Return: spouse\n\
                \n\
                Example 2:\n\
                Sentence: 'Alice graduated from MIT in 2005.'\n\
                Subject: Alice\n\
                Object: MIT\n\
                Return: schools_attended\n\
                \n\
                Example 3:\n\
                Sentence: 'Mark lives in San Francisco and also spent several years in New York.'\n\
                Subject: Mark\n\
                Object: San Francisco\n\
                Return: cities_of_residence\n\
                \n\
                Example 4:\n\
                Sentence: 'Maria works for IBM as a software engineer.'\n\
                Subject: Maria\n\
                Object: IBM\n\
                Return: employee_of\n\
                \n\
                Example 5 (Negation):\n\
                Sentence: 'John is not married to Jane.'\n\
                Subject: John\n\
                Object: Jane\n\
                Return: unknown\n\
                \n\
                Example 6 (Uncertainty):\n\
                Sentence: 'John may work at Google.'\n\
                Subject: John\n\
                Object: Google\n\
                Return: unknown
            """
            },
            {"role": "user", "content": f"Given the sentence: {sentence}, what is the relationship between the entities {entity_1} and {entity_2}?"},
        ],
    )

    return completion.choices[0].message.content


def calculate_metrics(labels, categories):
    """
    This function computes the precision and recall metrics for each relation based on a given dataset.

    :param labels: A dataframe containing the real and the estimated relations
    :param categories: A list with the possible relations between the entities
    :return: The precision and recall of each relation
    """
    
    # Converting the original and estimated relations of the 'labels' dataframe to lists
    original_labels = labels["original_relation"].to_list()
    estimated_labels = labels["estimated_relation"].to_list()

    # Calculating precision and recall for each relation
    precision = precision_score(original_labels, estimated_labels, average=None, labels=categories)
    recall = recall_score(original_labels, estimated_labels, average=None, labels=categories)

    return precision, recall

To run task 1 with the dataset 'relation_extraction_dataset.tsv', keep the second line of code commented out. To run task 2 with the new dataset 'new_relation_extraction_dataset.tsv', comment out the first line and uncomment the second.

In [2]:

# Extracting the relation dataset
relation_dataset = load_data("relation_extraction_dataset.tsv")
# relation_dataset = load_data("new_relation_extraction_dataset.tsv")

Now we use the ChatGPT model to get the estimated relations. To use the classifier and get answers, you must specify your OpenAI API key in the following lines of code.
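As an alternative to pasting the key into the notebook, it can be read from an environment variable. The sketch below assumes the key is stored in a variable named `OPENAI_API_KEY`; the helper `read_api_key` is hypothetical and not part of the original notebook.

```python
import os


def read_api_key(env_var="OPENAI_API_KEY"):
    """
    Hypothetical helper: return the API key stored in an environment variable.

    :param env_var: The name of the environment variable (assumed name)
    :return: The API key with surrounding whitespace removed
    """
    key = os.environ.get(env_var, "").strip()

    # Failing early gives a clearer error than a rejected API request later on
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set.")

    return key
```

With this helper, the client in the next cell could be created as `OpenAI(api_key=read_api_key())`, which keeps the key out of the saved notebook.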

In [None]:
# Creating the OpenAI client (replace the placeholder with your own API key)
openai_key = OpenAI(
    api_key="Put your OpenAI API key here"
)

# Finding the estimated relation for each record of the relation dataset
relation_dataset["estimated_relation"] = relation_dataset.apply(lambda row: find_relation(row["subject"], row["object"], row["sentence"], openai_key), axis=1)

# Printing some relation estimations
print("Here are the first 30 relation estimations:\n", relation_dataset.head(30))

# Storing the dataset together with the estimated relations
relation_dataset.to_csv("classifier_results.csv", index=False)
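The prompt asks the model for a single word, but a reply can still arrive with quotes, trailing punctuation, or different casing, and such a reply would not match any label during evaluation. The following is a minimal sketch of a clean-up step. The helper `normalize_relation` is hypothetical, and mapping every unexpected answer to 'unknown' is an assumption that affects the metrics of that class.

```python
# The labels the classifier is allowed to return, taken from the system prompt
ALLOWED_RELATIONS = {
    "spouse",
    "schools_attended",
    "employee_of",
    "cities_of_residence",
    "unknown",
}


def normalize_relation(raw_answer, allowed=ALLOWED_RELATIONS):
    """
    Hypothetical helper: map a raw model reply to one of the allowed labels.

    :param raw_answer: The text returned by the model (may be None or NaN)
    :param allowed: The set of valid relation labels
    :return: A valid label, or 'unknown' when the reply is not recognised
    """
    # Missing replies (None, NaN) cannot be cleaned, so they count as 'unknown'
    if not isinstance(raw_answer, str):
        return "unknown"

    # Removing surrounding whitespace, quotes, backticks and full stops
    cleaned = raw_answer.strip().strip("'\"`.").strip().lower()

    return cleaned if cleaned in allowed else "unknown"
```

It could be applied to the stored estimations with `relation_dataset["estimated_relation"].map(normalize_relation)` before the metrics are computed.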

Now, we calculate the precision and recall for all the relations of the dataset.

In [None]:
# Setting the possible relations
relations = ["spouse", "employee_of", "cities_of_residence", "schools_attended", "unknown"]

# Generating the precision and recall for each possible relation
precision, recall = calculate_metrics(relation_dataset[["original_relation", "estimated_relation"]], relations)

# Iterating through the list and printing the precision and recall of each relation
for index, relation in enumerate(relations):
    print(f"Relation '{relation}' - Precision: {precision[index]:.2f}, Recall: {recall[index]:.2f}")
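To make the two metrics concrete, here is a small hand-rolled sketch of per-relation precision and recall on made-up labels. The function `per_class_precision_recall` and the toy data are illustrative assumptions, not part of the notebook or its dataset. Returning 0.0 when a denominator is zero is a choice made here for simplicity.

```python
def per_class_precision_recall(true_labels, predicted_labels, categories):
    """
    Illustrative helper: compute (precision, recall) for each category.

    :param true_labels: A list with the real relations
    :param predicted_labels: A list with the estimated relations
    :param categories: A list with the relations to evaluate
    :return: A dictionary mapping each relation to a (precision, recall) tuple
    """
    results = {}
    for category in categories:
        # True positives: records where both the real and the estimated label match
        true_positives = sum(
            1 for real, estimated in zip(true_labels, predicted_labels)
            if real == category and estimated == category
        )
        predicted_count = sum(1 for estimated in predicted_labels if estimated == category)
        actual_count = sum(1 for real in true_labels if real == category)

        # Precision: share of the predictions for this relation that are correct
        precision = true_positives / predicted_count if predicted_count else 0.0
        # Recall: share of the real instances of this relation that were found
        recall = true_positives / actual_count if actual_count else 0.0

        results[category] = (precision, recall)
    return results


# Made-up labels, for illustration only
toy_true = ["spouse", "spouse", "employee_of", "unknown"]
toy_predicted = ["spouse", "unknown", "employee_of", "unknown"]
toy_metrics = per_class_precision_recall(
    toy_true, toy_predicted, ["spouse", "employee_of", "unknown"]
)
```

In this toy case one of the two real 'spouse' records is predicted as 'unknown', so 'spouse' gets precision 1.0 and recall 0.5, while 'unknown' gets precision 0.5 and recall 1.0.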